Sunday, March 22, 2026

Causality-focused basic statistics

 

This is an outline of a presentation prepared for the SEM Working Group Meeting 2026, in Warsaw, Poland, 15–17 April 2026. It was developed with colleagues from the University of Medicine, Pharmacy, Sciences and Technology of Târgu Mureș: Ioan-Bogdan Bacos, Manuela Rozalia Gabor, Laura Barcutean & Petru-Alexandru Curta.

    The arguments are old [i] but less known: statistical modeling is not statistical testing. Modeling is done more intuitively in a graphical, structural way, whereas statistical tests are just ‘hammers’ one uses for different nails… David Kenny showed [ii] nearly four decades ago that models can be easily expressed like [iii]

independent variable   ->   dependent variable

He defined a model as “a formal representation of a set of relationships between variables” (there is also Model Theory [iv]).

As Jim Jaccard and Jacob Jacoby have shown [v] (see pic in footnotes), many statistical tests really tackle the same model, commonly some xcont -> ycont relation (xcont means x is continuous; x01 instead is a binary x; for more causal-focused discussions, go to Tinyurl.com/ONCAUSALITY ).

To make this ‘visible’, we show how to ‘run’ several statistical tests, and that they necessarily reach the same conclusion, in terms of the ‘p value’, about whether or not we decide that a relation is non-null. We share a link to the Copilot.AI chat that implements the technical parts of our illustration (one does not need to ‘know’ software coding in the age of AI…).

     We first generated data in the very flexible and intuitive graphical modeling software Onyx[vi] using a data generating model

ivcont -> xcont -> mcont -> ycont [& xcont ->  ycont]

which saves a csv file. We then dichotomized all variables in Excel, around their means, to create binary counterparts, and computed xbym as the product xcont*mcont. These data are then read into R and used for the demonstration that follows.
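For readers without Onyx or Excel at hand, the same two steps can be roughly sketched in base R. This is only an illustration: the path coefficients, the seed and the helper name dichotomize are our assumptions, not the values or tools used for the presentation, and n = 100 is chosen only because it is consistent with the df = 98 reported for the Pearson correlation.

```r
# Sketch: simulate ivcont -> xcont -> mcont -> ycont [& xcont -> ycont]
# All coefficients and the seed are illustrative assumptions.
set.seed(2026)
n <- 100
ivcont <- rnorm(n)
xcont  <- 0.5 * ivcont + rnorm(n)
mcont  <- 0.5 * xcont + rnorm(n)
ycont  <- 0.4 * mcont + 0.2 * xcont + rnorm(n)
dat <- data.frame(ivcont, xcont, mcont, ycont)

# Mean split: 1 if the value is above the variable's mean, 0 otherwise
# (dichotomize is a hypothetical helper name)
dichotomize <- function(v) as.integer(v > mean(v))
dat$x01 <- dichotomize(dat$xcont)
dat$m01 <- dichotomize(dat$mcont)
dat$y01 <- dichotomize(dat$ycont)

str(dat)
```

A real csv file exported from Onyx would instead be loaded with read.csv() and dichotomized in the same way.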

We show the model equivalence of the following statistical tests:

STATISTICAL TEST             STRUCTURAL MODEL

(1) t-test                   x01 -> y01 (x01 -> ycont similar)

(2) F-test                   x01 -> y01 (x01 -> ycont similar)

(3) chi-squared test         x01 <-> y01 (cannot run x01 -> ycont)

(4) simple regression        x01 -> y01 (the correlations x01 <-> y01 & xcont <-> ycont should reach a similar conclusion)

(5) a path model             x01 -> y01 (x01 -> ycont similar)

And then add a third variable and show that it can play several distinct roles:

(6) A mediator                          xcont -> mcont -> ycont [& xcont -> ycont]

(7) An instrumental variable (IV) model  ivcont -> xcont -> ycont [no ivcont -> ycont path]

(8) Pearl’s mediating IV model   xcont -> mcont -> ycont [no xcont -> ycont]
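The mediator and IV roles in (6) and (7) can be previewed with nothing more than lm() and covariances. The sketch below is not the lavaan code from our appendix; it uses simulated data with coefficients we made up, and the manual two-stage procedure gives the right point estimate but not the right standard errors.

```r
# Sketch only: coefficients, seed and n are assumptions, not the Onyx values
set.seed(42)
n      <- 500
ivcont <- rnorm(n)
xcont  <- 0.6 * ivcont + rnorm(n)
mcont  <- 0.5 * xcont + rnorm(n)
ycont  <- 0.4 * mcont + 0.2 * xcont + rnorm(n)

# (6) Mediator: xcont -> mcont -> ycont [& xcont -> ycont]
a      <- unname(coef(lm(mcont ~ xcont))["xcont"])   # x -> m
fit_y  <- lm(ycont ~ mcont + xcont)
b      <- unname(coef(fit_y)["mcont"])               # m -> y, given x
direct <- unname(coef(fit_y)["xcont"])               # x -> y, given m
total  <- unname(coef(lm(ycont ~ xcont))["xcont"])   # x -> y, m ignored
print(c(indirect = a * b, direct = direct, total = total))
print(all.equal(total, direct + a * b))  # least-squares identity

# (7) IV: ivcont -> xcont -> ycont [no ivcont -> ycont path]
iv_ratio <- cov(ivcont, ycont) / cov(ivcont, xcont)  # ratio of covariances
xhat     <- fitted(lm(xcont ~ ivcont))               # first stage
iv_2sls  <- unname(coef(lm(ycont ~ xhat))["xhat"])   # second stage
print(c(ratio = iv_ratio, two_stage = iv_2sls))
print(all.equal(iv_ratio, iv_2sls))
```

In the mediation part the total effect splits into direct plus indirect (a*b); in the IV part the covariance ratio and the two-stage estimate are the same number obtained in two ways.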

Beyond this, adding an xcont*mcont interaction term opens up modeling options for ‘causal’ mediation too (an Mplus translation of Tyler VanderWeele’s SAS decomposition code is on SEMNET; AIs can do this right away now, and one for R exists already [vii]).

The results of simulation and analyses are:

(1) t-test                     t = 0.39753, df = 95.447, p-value = 0.6919

(2) F-test                    F value 0.158, p-value = 0.692

(3) Chi-squared test   X-squared = 0.16103, df = 1, p-value = 0.6882

(4) Simple regression t value -0.398, Pr(>|t|) = 0.692

(The Pearson correlation necessarily mirrors the regression findings: t = -0.39757, df = 98, p-value = 0.6918)

(5) Path model (with lavaan)    z-value -0.402 P(>|z|) = 0.688

Their p-values align [viii]: we would conclude the same thing.
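Why they align can be checked on any two binary variables. The sketch below uses simulated x01 and y01 (not our csv data) and makes two choices of ours explicit: var.equal = TRUE requests the pooled-variance t-test (R defaults to Welch's version, which is what produces a fractional df such as 95.447), and correct = FALSE switches off the continuity correction in the chi-squared test.

```r
# Simulated binary data; seed and probabilities are assumptions
set.seed(1)
n   <- 100
x01 <- rbinom(n, 1, 0.5)
y01 <- rbinom(n, 1, 0.5)

tt  <- t.test(y01 ~ x01, var.equal = TRUE)           # (1) pooled t-test
ff  <- anova(lm(y01 ~ factor(x01)))                  # (2) one-way F-test
cc  <- chisq.test(table(x01, y01), correct = FALSE)  # (3) chi-squared test
reg <- summary(lm(y01 ~ x01))$coefficients           # (4) simple regression
rr  <- cor.test(x01, y01)                            #     Pearson correlation

p_values <- c(t_test      = tt$p.value,
              F_test      = ff[["Pr(>F)"]][1],
              regression  = reg["x01", "Pr(>|t|)"],
              correlation = rr$p.value,
              chi_squared = cc$p.value)
print(round(p_values, 4))

# Identities among the test statistics
print(all.equal(unname(tt$statistic)^2, ff[["F value"]][1]))  # t^2 = F
print(all.equal(unname(cc$statistic), n * cor(x01, y01)^2))   # X^2 = N * r^2
```

The t, F, regression and correlation p-values coincide; the chi-squared p-value differs slightly because it refers to a chi-squared(1) distribution rather than a t distribution with n - 2 degrees of freedom.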

All of them however can be replaced by a ‘walk through’ the path model “x01 -> y01”, using as ‘raw’ data the variances and covariance between the variables. This ‘tracing rule visual estimation’ will replicate the regression and path analysis results, in terms of the actual estimate; the tracing rule does not run statistical significance tests, however.

The effects estimated in R were: Regression: -0.040 & Path analysis: -0.03982

The tracing rule simply leads to the solution

Effect x01 -> y01 = Covariance(x01, y01)/Variance(x01)

which yields the same result: [Tracing rule] -0.04025
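The tracing-rule formula is easy to check by hand in R. The sketch below again uses simulated binary data (an assumption; it will not reproduce the numbers above) and also shows that, for a binary x01, the same effect is simply the difference between the two group means of y01.

```r
# Simulated binary data; seed and probabilities are assumptions
set.seed(7)
n   <- 100
x01 <- rbinom(n, 1, 0.5)
y01 <- rbinom(n, 1, 0.5)

# Tracing rule: effect = Covariance(x01, y01) / Variance(x01)
tracing <- cov(x01, y01) / var(x01)

# Regression slope from lm()
ols <- unname(coef(lm(y01 ~ x01))["x01"])

# Difference in group means (valid because x01 is coded 0/1)
mean_diff <- mean(y01[x01 == 1]) - mean(y01[x01 == 0])

print(c(tracing = tracing, regression = ols, mean_diff = mean_diff))
print(all.equal(tracing, ols))
```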

For (6)-(8), the code is in the R appendix r_Poland.txt; all of it is easy to ‘grab’ with an AI assisting.

The file contains two more ‘free gifts’: dagitty and MIIVsem code to investigate which ‘statistical adjustments/controls’ have to be done, and NOT done, when focusing on specific causal effects of interest.

Of course, each test is better suited to some combinations of continuous/categorical variable pairs; e.g., the t-test, the F-test and the z-test in the regression model commonly use a continuous outcome (but they work with a binary one too).

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

PROMPT used in Copilot:

Using the notation ivcont, xcont, mcont, ycont, for 4 continuous variables, and x01, m01, y01 for 3 binary variables, generate R code to analyze some of them using the following tests:

(1) t-test for x01 -> y01

(2) F-test for x01 -> y01

(3) chi-squared test for x01 <-> y01

(4) simple regression for x01 -> y01

(5) a path model (with lavaan) x01 -> y01

(6) a mediation model (lavaan)  xcont -> mcont -> ycont [& xcont -> ycont]

(7) a instrumental variable (IV) model (lavaan)  ivcont -> xcont -> ycont [no ivcont -> ycont path]

(8) Pearl’s mediating IV model (lavaan) xcont -> mcont -> ycont [no xcont -> ycont]

[then asked for Pearson correlation for x01 <-> y01]


[i] Robin Beaumont showed this in great detail in 2017: “SEM equivalent to basic statistical procedures”.

[ii] Kenny, D. A. (1987). Statistics for the social and behavioral sciences. Boston: Little, Brown. Posted by the author at https://davidakenny.net/doc/statbook/kenny87.pdf

[iii] “Research in the behavioral and social sciences often involves testing statistical models.

What Is a Model?

A statistical model is a formal representation of a set of relationships between variables. Statistical models contain an outcome variable that is the focus of study. […]

A very simple model is one in which the dependent variable equals a constant plus the residual variable.

dependent variable = constant variable + residual variable

[…]  In simple equation form the model is

dependent variable = effect of the independent variable + residual variable

Instead of expressing the model as an equation, the model could be just as easily specified by a diagram; arrows could be drawn from cause to effect, as follows:

independent variable   ->   dependent variable    <-   residual variable

A representation of a model that uses arrows is called a path diagram.”  (Kenny, 1987, pp. 184–185)

[iv] Rizza, D. (2025). Model Theory: The Algebraic Basics: Springer.

[v] Jaccard, J., & Jacoby, J. (2009). Theory construction and model-building skills: A practical guide for social scientists: Guilford Press.

[vi] The Onyx steps are simple; Robin Beaumont has a series of trainings on YouTube, see WWW1

[vii] Software choice can be of course expanded at will, see e.g. Python and Stata

[viii] The t, z, F, and chi-squared tests are special cases of each other and can be mathematically derived one from another under special conditions; Gemini.AI confirmed this to us (but you can verify it too).
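One way to verify two of these special cases yourself is with R's distribution functions: a squared t with df degrees of freedom is an F with (1, df), and a squared z is a chi-squared with 1 df. The sketch plugs in the regression t (with df = 98, our reading of n = 100) and the lavaan z reported above; only the equalities are the point here.

```r
# t vs. F: two-sided p-value of t equals upper-tail p-value of t^2 under F(1, df)
t_val <- -0.398
df    <- 98   # assumed residual df for n = 100
p_t   <- 2 * pt(-abs(t_val), df)
p_F   <- pf(t_val^2, df1 = 1, df2 = df, lower.tail = FALSE)
print(c(p_t = p_t, p_F = p_F))

# z vs. chi-squared: two-sided p-value of z equals upper-tail p of z^2 under chi2(1)
z_val  <- -0.402
p_z    <- 2 * pnorm(-abs(z_val))
p_chi2 <- pchisq(z_val^2, df = 1, lower.tail = FALSE)
print(c(p_z = p_z, p_chi2 = p_chi2))
```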

 

 



