The search box in the main workspace area looks like this:

if the search is being conducted over a data set, or this:

if the search is being conducted directly over a graph (as a source of true conditional independence facts, to see how the algorithm behaves ideally).
Tetrad has a variety of search algorithms to assist in
searching for causal explanations of a body of data.
It should be noted that the Tetrad search procedures are exponential in the worst case (when all pairs of variables are dependent conditional on every other set of variables.) The search procedures may take a good bit of time, and there is no guarantee beforehand as to how long that will be.
These search algorithms are
different from those conventionally used in statistics.
- There are several search algorithms, differing in the assumptions they make.
- Many of the search algorithms allow the user to specify background information that will be used in the search procedure. In many cases the search results will be uninformative unless such background assumptions are explicitly made. This design not only provides for more flexibility, it also encourages the user to be conscious of the additional assumptions imposed in deriving a model from data.
- Even with background assumptions, data often do not determine a unique best or robust explanation. The search algorithms take in data and return information about a collection of alternative causal graphs that can explain features of the data. They do not usually return a unique graph, although they sometimes will if sufficient prior knowledge is specified. In contrast, if one searches for a regression model of the influences of a set of variables on a selected variable, a regression model will certainly be found (provided there are sufficient data points), specifying which variables influence the target and which do not.
- The algorithms are in some respects cautious. Search algorithms such as FCI and PC , described below, will often say, correctly, that it cannot be determined whether or not a particular variable influences another.
- The algorithms are not just useful guesses. Under explicit assumptions (which often hold at best only approximately), the algorithms are "pointwise consistent"--they converge almost surely to the correct answer. The conditions for this sort of consistency of the seaarch procedures are described in the references. Conventional model search algorithms--stepwise regression, for example--have such guarantees only under very strong prior assumptions about the causal structure.
- The output of the search algorithms provides a variety of indirect information about how much the conclusions of the algorithm can be trusted. They can, for example, be run repeatedly for a variety of specifications of alpha values in their statistical tests, to gain insight about robustness. For search algorithms such as PC, CCD, GES and MimBuild, described below, if the search algorithms return "spaghetti"--a highly connected graph--that indicates the serach cannot determine whether all of the connected variables may be influenced by a common unmeasured cause. If the PC algoirthm returns an edge with two arrowheads, that indicates a latent variable may be acting; if searches other than CCD return graphs with cycles, that indicates the assumptions of the search algoirthm are violated.
- Some of the search procedures are robust against common difficulties in sampling designs--they give correct, but reduced information in such cases. For example, the FCI algorithm allows that there may be unobserved latent common causes of measured variables--or not--and that the sample may have been formed by a process in which the values of measured variables invlucne whether or not a unit is included in the sample (sample selection bias). The CCD algorithm allows that the correct causal structure may be "non-recursive"--essentially a cyclic graphical mode, a folded up time series.
- The output of the algorithms is not an estiamted model with parameter values, but a discription of a class of causall graphs that can explain statistical features of the data considered by the search procedures. That information can be converted by hand into particular graphical models in the form of directed graphs, which can then be estimated by the program and tested.
The search procedures available are named:
- PC --for
searching for Bayes net or SEM
models when it is assumed there is no latent (unrecorded) variable that
contributes to the association of two or more measured variables.
- GES -- for
searching for Bayes net or SEM
models when it
is assumed there is no latent (unrecorded) variable that contributes to
the association of two or more measured variables.
- FCI --which performs a search
similar to PC but allowing that
there may be latent variables.
- CCD--for
searching for non-recursive SEM
models (models of feedback systems using cyclic graphs) without latent
variables
- Build Pure
Clusters--for searching for sets
of variables that share a single latent common cause
- Purify
Clusters--for searching for sets of
variables that share a
single latent common cause
- MimBuild--for
searching for latent
structure from the output of Build Pure Clusters or Purify Clusters
- MB Fan Search -- for searching for Markov blankets DAGs for a given target T over a list of variables <v1,...,vn,T>.
Inputs to the Search Box
There are two possible inputs for a search
algorithm: a data set or a graph. If a graph is input, the program
allows searches the program computes implied independence and
conditional independence relations and allows you to conduct any search
that uses only such constraints--the PC, FCI and CCD algorithms.
Why would you apply a Search procedure to a
model you already know? For a very important reason: The Search
procedures will find the graphical representation of alternative
models to your model that imply the same constraints.
The more usual use of the search algorithms requires a data set as
input. Here is an example.
- Select the Search button.

- Click in the workbench to create a Search
icon.
- Use the Flow Charter button to connect the
Data icon to the Search icon.

- Double-click the Search icon to choose an
search procedure.

Selecting a Search procedure
Tetrad offers the following choices of search
algorithms. For more details about the assumptions and parameters
needed for each algorithm, click in the respective links.
There are two main classes of algorithms. The
first one is designed for general graphs with or without assuming the
possibility of hidden common causes:
- PC algorithm:
this method assumes that there are no hidden common causes between observed
variables in the input (i.e., variables from the data set, or observed variables
in the input graph) and that the graphical structure sought has no cycles.
- FCI algorithm:
this method does not assume that there are no hidden common causes between
observed variables in the input (i.e., variables from the data set, or observed
variables in the input graph); it does assume that the graphical strucutre
sought has no cycles.
- CCD algorithm:
this method assumes there are no hidden common causes; it allows cycles; it
is only correct for discrete variables under a restrictive assumtptions
- GES algorithm:
same assumptions as the PC algorithm, except that this one performs search
by scoring a graph by its asymptotic posterior probability.
The second class concerns algorithms to search
for latent variable structural equation models from data and
background knowledge.
- MIM Build algorithm:
learns the causal relationships among latent variables, when the true (unknown)
data generation process is believed to be a pure measurement/structural model.
- Build Pure Clusters
algorithm: a complement to MIM Build and Purify, this algorithm learns
the causal relationships from latent variables to observed variables, when
the true (unknown) data generation process is believed to be contain a pure
measurement/structural submodel--i.e. a model in which each
- Purify algorithm:
given a measurement model, this method searches for a submodel in which there
are no every measured variable is influenced by one and only one latent variable.
Select the desired algorithm that meets your assumptions from the
Search list. An initial dialog box showing the search parameters you
can set is displayed. The following figure illustrates the one that is
displayed when PC Algorithm is selected.

After the proper parameters are set, if the user checks the box
"Execute searches automatically", the automated search procedure will
start when the OK button is clicked. The respective Help button can be
used to get instructions about that specific algorithm. The next window
displays the result of the procedure, and can also be used to fire new
searches. The following figure illustrates an output for the PC
algorithm.

Inserting background knowledge
Besides the assumptions underlying each
algorithm, another source of constraints that can be used by the search
procedures to narrow down the search and return a more informative
output is making use of background knowledge provided by the user. To see how to specify background knowledge for a search algorithm, see Editing Knowledge.
Assumptions
A search procedure is pointwise consistent if as the sample size
increases without bound, the output of the algorithm converges with
probability 1 to
true information about the data generating structure. For all of the
Tetrad search algorithms, available proofs of pointwise consistency,
assume at least the following:
1. The sample is i.i.d--the probaiblity of any unit in a population
being sampled is the same as any other, and the joint probability
distribution of the variables is the same for all units.
2. The joint probability distribution is locally Markov. In acyclic
cases, this is equivalent to a simpler "global" Markov condition:
that a variable is independent of all variables that are not its
effects conditional on the direct causes of the variable in the causal
graph (its "parents"). In cyclic cases, the local Markov condition has
a related but more technical definition. (See Spirtes, et al., 2000).
3. All of the independence and conditional independence relations in
the joint probability distribution are consequences of the local Markov
condition for the true causal graph.
In addition, various specific search algorithms impose other assumptions. Of course,
the search algorithms may give correct in information when
these assumptions do not strictly hold, and in some cases will do so when they
are grossly violated--the PC algoirthm, for example, will sometimes correctly
identify the presence of unrecorded common causes of recorded variables.