Search Algorithms: PC
Introduction
The PC algorithm was designed to discover causal relationships using observational data. Discrete and continuous data sets are allowed, and a graph can also be used as input to this algorithm. Data points should be independent and identically distributed. Also, any input given to this algorithm should satisfy these three main assumptions:
Some of these assumptions are not testable using observational data. They should come from prior knowledge or from partial experiments. For general information about model-building algorithms, consult the Search Algorithms page.
Entering PC parameters
Consider the following example:
When the PC algorithm is chosen from the Search Object combo box,
the following window appears:
The parameters that are used by the PC algorithm can be specified in this window. The parameters are as follows:
Execute the search either by clicking "OK", which brings up another (redundant) window, or by clicking "Execute" and then "OK."
The PC algorithm returns a partially oriented graph where the nodes represent the variables given as input. In our example, the outcome should be as follows if the sample is representative of the population:
There are basically two types of edges that can appear in PC output:
In this case, the PC algorithm deduced that A is a direct cause of B, i.e., the causal effect goes from A to B and is not mediated by any of the other observed variables.
In this case, the PC algorithm cannot tell if A causes B or if B causes A.
The absence of an edge between any pair of nodes means that they are independent, or that the causal effect of one node on the other is mediated by other observed variables.
Notice, however, that a double-directed edge sometimes appears in a PC search output:
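The second case (an effect that is entirely mediated by another observed variable) can be illustrated with simulated data. The following is a minimal sketch, not the test that the PC implementation runs: it assumes a hypothetical linear chain A -> M -> B with Gaussian noise, and it uses a first-order partial correlation as a stand-in for a conditional independence test. The variable names, coefficients and sample size are illustrative assumptions.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

random.seed(0)
n = 20000

# Hypothetical chain A -> M -> B: A influences B only through M.
A = [random.gauss(0, 1) for _ in range(n)]
M = [0.8 * a + random.gauss(0, 1) for a in A]
B = [0.8 * m + random.gauss(0, 1) for m in M]

r_marginal = pearson(A, B)        # clearly non-zero: A and B are dependent
r_partial = partial_corr(A, B, M)  # close to zero: independent given M

print(f"corr(A, B)     = {r_marginal:.3f}")
print(f"corr(A, B | M) = {r_partial:.3f}")
```

A and B are correlated, yet the correlation nearly vanishes once M is held fixed. Under the assumptions above, this is the kind of evidence that leads a constraint-based search to leave A and B unconnected even though A is an indirect cause of B.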
Such edges are the result of a partial failure of the PC search. They may appear because some assumption fails (e.g., relationships are non-linear, the population graph is cyclic, etc.) or because the sample is not large enough and some statistical decisions are inconsistent with one another. In such a situation, the user may introduce prior knowledge to constrain the direction such an edge may assume, collect more data, or use a different algorithm. Knowledge of the domain will be essential.
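The idea of using prior knowledge to settle the direction of such an edge can be sketched as follows. This is an illustration only and not the program's own knowledge interface: the edge representation (tuples with the markers "-->", "---" and "<->"), the function name resolve_bidirected and the set of forbidden (cause, effect) pairs are all hypothetical.

```python
def resolve_bidirected(edges, forbidden):
    """Orient '<->' edges when knowledge forbids exactly one direction.

    edges: list of (a, b, kind) with kind in {'-->', '---', '<->'}.
    forbidden: set of (cause, effect) pairs ruled out by domain knowledge.
    """
    resolved = []
    for a, b, kind in edges:
        if kind == "<->":
            a_to_b_ok = (a, b) not in forbidden
            b_to_a_ok = (b, a) not in forbidden
            if a_to_b_ok and not b_to_a_ok:
                resolved.append((a, b, "-->"))
                continue
            if b_to_a_ok and not a_to_b_ok:
                resolved.append((b, a, "-->"))
                continue
        # Other edge types, and '<->' edges that knowledge cannot settle,
        # are passed through unchanged.
        resolved.append((a, b, kind))
    return resolved

# Hypothetical search output with two double-directed edges.
edges = [("A", "B", "<->"), ("B", "C", "-->"), ("C", "D", "<->")]
# Hypothetical domain knowledge: B cannot be a cause of A.
forbidden = {("B", "A")}

result = resolve_bidirected(edges, forbidden)
for a, b, kind in result:
    print(a, kind, b)
```

The edge between C and D stays double-directed because the knowledge supplied says nothing about it. Such edges still call for more data, a different algorithm, or further domain knowledge.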
Finally, a triplet of nodes may assume the following pattern:
In other words, in such patterns, A and B are connected by an undirected edge, A and C are connected by an undirected edge, and B and C are not connected by an edge. By the PC search assumptions, this means that B and C cannot both be causes of A. The three possible scenarios are:
In our example, some edges were compelled to be directed: X2 and X3 are causes of X4, and X4 is a cause of X5. We cannot tell much about the triplet (X1, X2, X3), except that X2 and X3 cannot both be causes of X1.
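The orientations that remain possible for the triplet (X1, X2, X3) can be enumerated mechanically. The following is a minimal sketch: it tries each direction for the two undirected edges and discards the one combination in which both X2 and X3 point into X1. The function name and the string form of the edges are illustrative choices, not part of the program.

```python
from itertools import product

def non_collider_orientations(center, left, right):
    """List orientations of left --- center --- right,
    excluding the collider left --> center <-- right."""
    results = []
    # Each undirected edge can point into or out of the center node.
    for left_into, right_into in product([True, False], repeat=2):
        if left_into and right_into:
            continue  # both neighbours cause the center: ruled out
        e1 = f"{left} --> {center}" if left_into else f"{center} --> {left}"
        e2 = f"{right} --> {center}" if right_into else f"{center} --> {right}"
        results.append((e1, e2))
    return results

scenarios = non_collider_orientations("X1", "X2", "X3")
for e1, e2 in scenarios:
    print(e1, ",", e2)
```

Three scenarios remain: a chain from X2 through X1 to X3, a chain from X3 through X1 to X2, and X1 as a common cause of X2 and X3. Observational data alone cannot distinguish among them under the PC assumptions.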