The PC algorithm was designed to discover causal relationships using observational or mixed observational and experimental data. Discrete and continuous data sets are allowed, and a graph can also be used as input to this algorithm. The consistency of the algorithm has been proved under several assumptions, which include:
Entering PC parameters
Create a new search object as illustrated below:
When the PC algorithm is chosen from the Search Object combo box,
the following window appears:
The parameters that are used by the PC algorithm can be specified in this window. The parameters are as follows:
Execute the search.
The PC algorithm returns a partially oriented graph where the nodes represent the variables given as input. In our example, the outcome should be as follows if the sample is representative of the population:
The are basically three types of edges that can appear in PC output:
In this case, the PC algorithm deduced that A is a direct cause of B, i.e., the causal effect goes from A to B and it is not intermediated by any of the other observed variable
In this case, the PC algorithm cannot tell if A causes B or if B causes A.
The absence of an edge between any pair of nodes means they are independent, or that the causal effect of one modelNode in the other is intermediate by other observed variables.
A doubly directed edge sometimes appear in a PC search output:
Such edges can arise when the assumption that there are no latent
common causes of A and B is false. They may arise in other cases
because of chance patterns in a sample or because of other violations
of assumptions. Such edges indicate something is amiss.
Finally, a triplet of nodes may assume the following pattern:
In other words, in such patterns, A and B are connected by an undirected edge, A and C are connected by an undirected edge, and B and C are not connected by an edge. By the PC search assumptions, this means that B and C cannot both be cause of A. The three possible scenarios are:
In our example, some edges were compelled to be directed: X2 and X3 are causes of X4, and X4 is a cause of X5. However, we cannot tell much about the triplet (X1, X2, X3), but we know that X2 and X3 cannot both be causes of X1.
Tetrad also provides the GES Algorithm, an alternative algorithm
that works under the same assumptions as PC. It is a good idea to
always run a search in both algorithms and compare the outputs.
The Statistical
Tests Used
In
deciding whether an indepedence or conditional independence
relations holds, the program uses a Table Reducing Chi
Square Test
This is a simple variation of the standard multivariate chi square
tests of conditional independence that has some useful properties from
the point of view of causal search. This test handles zero
cross-tabulation cells by eliminating rows and columns from every
conditional crosstabulation considered consisting entirely of zeros. X
square and degrees of freedom are then calculated from and summed over
the reduced conditional crosstabulation tables. No special handling is
done to adjust for small sample size with respect to the number of
cells in the table.
This test performs as well as a typical G Square test at large sample
sizes but degrades more gracefully. If less data is used, Type I edge
errors go up not at all or very little; instead, the number of correct
edges in the graph simply goes down. Even at very small sample sizes
(down to 30, with a true graph of 10 variables and 10 edges), if the PC
algorithm using this test returns a graph with one or two edges, there
is a strong probability that these edges are correct. From a
performance standpoint, this test runs more smoothly and more quickly
than other tests we've compared it to and does not hang under normal
conditions, even for large graphs.
The algorithm for this test is appended.
ALGORITHM:
Inputs:
1. Question: X _||_ Y | Z1, ..., Zn ?
2. Alpha level.
Output:
Judgement as to whether "X _||_ Y | Z1, ..., Zn" is true or false.
Algorithm:
df := 0
chisquare := 0.0
For each combination of values of the conditioning variables {
table := the conditional table for these values of the
conditioning variables.
reduced_table = table with any rows or columns consisting
entirely of zeros removed.
if (reduced_table is empty) {
skip to next combination of values for
conditioning variables.
}
num_reduced_rows = number of rows of reduced_table
num_reduced_cols = number of cols of reduced_table
df += (num_reduced_rows - 1)(num_reduced_cols - 1)
chisquare += the chi square value for reduced_table
}
Return true iff chisquare < x0 such that area under Chi Square
distribution with degrees of freedom = df for x > x0 equals alpha.
The following is pseudocode representing the way PC is implemented in Tetrad.
Step A: Form the complete undirected graph G over v1,...,vn. Step B (Fast Adjacency Search): For each depth d = 0, 1, ...: For for each variable x: "next_y": For each adjacent modelNode y to v: Let adjX = adj(x) - {y} Let adjY = adj(y) - {x} For each subset Sx of adjX up to size d: If x _||_ y | Sx, remove x---y from G. Continue "next_y." For each subset Sy of adjY up to size d: if x _||_ y | Sy, remove x---y from G. Continue "next_y." Step C: Orient colliders in G, as follows: For each modelNode x: For each pair of nodes y, z adjacent to x: If y and z are not adjacent: If ~(y _||_ z | x): Orient y-->x<--z as a collider. Step D: Apply orientation rules until no more orientations are possible. Rules to use: away from collider, away from cycle, kite1, kite2. (These are Meek's rules R1, R2, R3, and R4.) Away from collider: For each modelNode a: For each b, c in adj(a): If b-->a---c: Orient b-->a-->c. Else if c-->a---b: Orient c-->a-->b. Away from cycle: For each modelNode a: For b, c in adj(a): If a-->b-->c and a---c: Orient a-->c. Else if c-->b-->a and c---a: Orient c-->a. Kite 1: For each modelNode a: For each nodes b, c, d in adj(a) such that a---b, a---c, a---d, and !(c---d): If c-->b and d-->b: Orient a-->b. Kite 2: For each modelNode a: For each nodes b, c, d in adj(a) such that a---b, a---d, b is not adjacent to d, and either a---c, a-->c, or c-->a, If b-->c and c-->d: Orient a-->d. Else if d-->c and c-->b: Orient a-->b.