EvaluationHelp: Causality focused basic statistics

This is an outline of a presentation prepared for the SEM Working Group Meeting 2026, in Warsaw, Poland, 15–17 April 2026. It was developed with colleagues from the University of Medicine, Pharmacy, Sciences and Technology of Târgu Mureș: Ioan-Bogdan Bacos, Manuela Rozalia Gabor, Laura Barcutean & Petru-Alexandru Curta. A video walking through the process is at Tinyurl.com/STATSCAUSAL2

The arguments are old [i] but little known: statistical modeling is not statistical testing. Modeling is done most intuitively graphically, in a structural way, whereas statistical tests are just ‘hammers’ one uses for different nails… David Kenny showed [ii] nearly four decades ago that models can easily be expressed as [iii]

independent variable -> dependent variable

He defined a model as “a formal representation of a set of relationships between variables” (there is also Model Theory [iv]).

As Jim Jaccard and Jacob Jacoby have shown [v] (see pic in footnotes), many statistical tests really tackle the same model, commonly some xcont -> ycont relation (xcont means x is continuous, whereas x01 is a binary x; for more causality-focused discussions, go to Tinyurl.com/ONCAUSALITY).

To make this ‘visible’, we show how to ‘run’ several statistical tests, and that they necessarily reach the same conclusion in terms of the ‘p value’, that is, whether or not we decide that a relation is non-null. We share the link to the ChatGPT.AI chat that implements the technical parts of our illustration (one does not need to ‘know’ software coding in the age of AI… Claude even draws the models for you, see all the way down, but its link cannot be opened publicly: Claude.AI).

We first generated data in the very flexible and intuitive graphical modeling software Onyx [vi], using the data-generating model
ivcont -> xcont -> mcont -> ycont [& xcont -> ycont]
Onyx saves the generated data as a csv file. We then dichotomized all variables in Excel, around their means, to create binary counterparts, and computed xbym as the product xcont*mcont. These data are then read into R and used for the demonstration that follows.
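For readers without Onyx or Excel, the same kind of data can be sketched directly in base R. This is only an illustration: the path coefficients, the seed, and the data frame name `sim` are hypothetical (the values used in Onyx are not given here), and n = 100 is inferred from the df = 98 of the correlation test reported further down.

```r
set.seed(2026)   # arbitrary seed, for reproducibility only
n <- 100         # assumed sample size

# Data-generating model: ivcont -> xcont -> mcont -> ycont [& xcont -> ycont]
# All coefficients below are hypothetical illustration values.
ivcont <- rnorm(n)
xcont  <- 0.5 * ivcont + rnorm(n)
mcont  <- 0.5 * xcont + rnorm(n)
ycont  <- 0.5 * mcont + 0.3 * xcont + rnorm(n)

sim <- data.frame(ivcont, xcont, mcont, ycont)

# Dichotomize each variable around its mean (1 = above the mean, 0 = otherwise)
sim$iv01 <- as.integer(sim$ivcont > mean(sim$ivcont))
sim$x01  <- as.integer(sim$xcont  > mean(sim$xcont))
sim$m01  <- as.integer(sim$mcont  > mean(sim$mcont))
sim$y01  <- as.integer(sim$ycont  > mean(sim$ycont))

head(sim)
```

Writing `sim` out with `write.csv()` would give a file with the same layout of continuous and binary variables as the one read in the R appendix.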
We show the model equivalence of the following statistical tests:
STATISTICAL TEST               STRUCTURAL MODEL
(1) t-test for                 x01 -> y01 (x01 -> ycont similar)
(2) F-test for                 x01 -> y01 (x01 -> ycont similar)
(3) chi-squared test for       x01 <-> y01 (cannot run x01 -> ycont)
(4) simple regression for      x01 -> y01 (the correlation x01 <-> y01, and likewise xcont <-> ycont, should reach a similar conclusion)
(5) a path model               x01 -> y01 (x01 -> ycont similar)

We then add a third variable and show that it can play several distinct roles:
(6) A mediator xcont -> mcont -> ycont [& xcont -> ycont]
(7) An instrumental variable (IV) model ivcont -> xcont -> ycont [no ivcont -> ycont path]
(8) Pearl’s mediating IV model xcont -> mcont -> ycont [no xcont -> ycont]
Beyond this, adding an xcont*mcont interaction term opens up modeling options for ‘causal’ mediation too (an Mplus translation of Tyler VanderWeele’s SAS decomposition code is on SEMNET; AIs can do this right away now, and one for R exists already [vii]).
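The roles in (6) and (7) can also be sketched with base R regressions and covariances, without lavaan. The simulated data, the coefficients, and the unmeasured confounder `u` below are hypothetical and serve only to make the two estimators concrete; this is a sketch under those assumptions, not the analysis reported in this post.

```r
set.seed(7)      # arbitrary seed
n <- 500         # hypothetical sample size

u      <- rnorm(n)   # hypothetical unmeasured confounder of x and y
ivcont <- rnorm(n)
xcont  <- 0.6 * ivcont + 0.5 * u + rnorm(n)
mcont  <- 0.5 * xcont + rnorm(n)
ycont  <- 0.4 * mcont + 0.2 * xcont + 0.5 * u + rnorm(n)

# (6) Mediator: product-of-coefficients decomposition
a       <- unname(coef(lm(mcont ~ xcont))["xcont"])   # xcont -> mcont
fit_y   <- lm(ycont ~ mcont + xcont)
b       <- unname(coef(fit_y)["mcont"])               # mcont -> ycont
c_prime <- unname(coef(fit_y)["xcont"])               # direct xcont -> ycont
indirect <- a * b
total    <- c_prime + indirect

# With least squares, this total equals the slope of ycont on xcont alone
total_ols <- unname(coef(lm(ycont ~ xcont))["xcont"])
all.equal(total, total_ols)

# (7) Instrumental variable: ratio of covariances (Wald estimator)
b_iv <- cov(ivcont, ycont) / cov(ivcont, xcont)
c(total_ols = total_ols, b_iv = b_iv)
```

Because `u` affects both xcont and ycont here, the plain regression slope and the IV ratio will generally differ; the IV ratio relies on ivcont having no path to ycont other than through xcont.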

The results of simulation and analyses are:
(1) t-test t = 0.39753, df = 95.447, p-value = 0.6919
(2) F-test F value 0.158, p-value = 0.692
(3) Chi-squared test X-squared = 0.16103, df = 1, p-value = 0.6882
(4) Simple regression t value -0.398, Pr(>|t|) = 0.692
(The Pearson correlation necessarily mirrors the regression findings: t = -0.39757, df = 98, p-value = 0.6918)
(5) Path model (with lavaan) z-value -0.402 P(>|z|) = 0.688

Their p-values align [viii]: we would conclude the same thing.
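A minimal base R sketch of why the p-values line up, using hypothetical simulated binary data rather than the csv file from this post. The fractional df = 95.447 above indicates that R's default Welch t-test was used; with `var.equal = TRUE` (pooled variances) the t-test, the ANOVA F-test, the regression slope test and the Pearson correlation test return the same p-value, and the uncorrected chi-squared statistic equals n times the squared correlation.

```r
set.seed(1)      # arbitrary seed
n   <- 100       # hypothetical sample size
x01 <- rbinom(n, 1, 0.5)
y01 <- rbinom(n, 1, 0.5)   # generated independently of x01

tt  <- t.test(y01 ~ x01, var.equal = TRUE)   # pooled-variance t-test
av  <- summary(aov(y01 ~ x01))[[1]]          # one-way ANOVA table
reg <- summary(lm(y01 ~ x01))$coefficients   # regression coefficient table
ct  <- cor.test(x01, y01)                    # Pearson correlation test
chi <- chisq.test(table(x01, y01), correct = FALSE)

p_t   <- tt$p.value
p_F   <- av[["Pr(>F)"]][1]
p_reg <- reg["x01", "Pr(>|t|)"]
p_cor <- ct$p.value

# Same p-value four times
c(t = p_t, F = p_F, regression = p_reg, correlation = p_cor)

# t squared equals F; chi-squared equals n * r^2
t_sq    <- unname(tt$statistic)^2
F_val   <- av[["F value"]][1]
chi_val <- unname(chi$statistic)
n_r2    <- n * cor(x01, y01)^2
c(t_sq = t_sq, F = F_val, chi_sq = chi_val, n_r2 = n_r2)
```

The chi-squared p-value and the z-based p-value of a path model rest on large-sample reference distributions, which is why they come out close to, but not exactly equal to, the t- and F-based ones.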

   All of them, however, can be replaced by a ‘walk through’ the path model “x01 -> y01”, using as ‘raw’ data the variances and the covariance of the variables. This ‘tracing rule visual estimation’ replicates the regression and path analysis results in terms of the actual estimate; the tracing rule does not run statistical significance tests, however.
   The effects estimated in R were: Regression: -0.040 & Path analysis: -0.03982
   The tracing rule simply leads to the solution

Effect x01 -> y01 = Covariance(x01, y01)/Variance(x01)

which yields the same result: [Tracing rule] -0.04025
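The tracing rule can be checked in a few lines of base R. The data here are hypothetical simulated binaries, not the csv file used in this post; the point is only that the ratio Covariance/Variance reproduces the regression slope.

```r
set.seed(42)     # arbitrary seed
n   <- 100       # hypothetical sample size
x01 <- rbinom(n, 1, 0.5)
y01 <- rbinom(n, 1, 0.5)

# Tracing rule: effect x01 -> y01 = Covariance(x01, y01) / Variance(x01)
b_trace <- cov(x01, y01) / var(x01)

# Ordinary least squares slope from a simple regression
b_lm <- unname(coef(lm(y01 ~ x01))["x01"])

c(tracing = b_trace, regression = b_lm)
all.equal(b_trace, b_lm)
```

The ratio does not depend on whether variances and covariances are computed with divisor n or n - 1, since the divisor cancels; small discrepancies in hand calculations typically come from rounding the variance and covariance before dividing.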
   For (6)-(8), the code is in the R appendix r_Poland.txt – all of it is easy to ‘grab’ with an AI assisting.
   The file contains two more ‘free gifts’, dagitty and MIIVsem code to investigate which ‘statistical adjustments/controls’ have to be done, and NOT done, when focused on specific causal effects of interest.
   Of course, each test is better suited to some combinations of continuous/categorical variables than to others, e.g. the t-test, the F-test and the z-test in the regression model commonly use a continuous outcome (but they work with a binary one too).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PROMPT used in Copilot:
Using the notation ivcont, xcont, mcont, ycont, for 4 continuous variables, and x01, m01, y01 for 3 binary variables, generate R code to analyze some of them using the following tests:
(1) t-test for x01 -> y01
(2) F-test for x01 -> y01
(3) chi-squared test for x01 <-> y01
(4) simple regression for x01 -> y01
(5) a path model (with lavaan) x01 -> y01
(6) a mediation model (lavaan) xcont -> mcont -> ycont [& xcont -> ycont]
(7) a instrumental variable (IV) model (lavaan) ivcont -> xcont -> ycont [no ivcont -> ycont path]
(8) Pearl’s mediating IV model (lavaan) xcont -> mcont -> ycont [no xcont -> ycont]
[then asked for Pearson correlation for x01 <-> y01]

[i] Robin Beaumont showed this in great detail in 2017: SEM equivalent to basic statistical procedures

[ii] Kenny, D. A. (1987). Statistics for the social and behavioral sciences. Boston: Little, Brown. Posted by the author at https://davidakenny.net/doc/statbook/kenny87.pdf

[iii] “Research in the behavioral and social sciences often involves testing statistical models.

What Is a Model?

A statistical model is a formal representation of a set of relationships between variables. Statistical models contain an outcome variable that is the focus of study. […]

A very simple model is one in which the dependent variable equals a constant plus the residual variable.

dependent variable = constant variable + residual variable

[…] In simple equation form the model is

dependent variable = effect of the independent variable + residual variable

Instead of expressing the model as an equation, the model could be just as easily specified by a diagram; arrows could be drawn from cause to effect, as follows:

independent variable -> dependent variable <- residual variable

A representation of a model that uses arrows is called a path diagram.” (Kenny, 1987, pp. 184-5)

[iv] Rizza, D. (2025). Model Theory: The Algebraic Basics. Springer.

[v] Jaccard, J., & Jacoby, J. (2009). Theory construction and model-building skills: A practical guide for social scientists. Guilford Press.

[vi] The Onyx steps are simple; Robin Beaumont has a series of trainings on YouTube, see WWW1

[vii] Software choice can be of course expanded at will, see e.g. Python and Stata

[viii] Gemini.AI confirmed to us that the t, z, F, and chi-squared tests are special cases of each other and can be mathematically derived from one another under special conditions (but you can verify this too).

# R CODE FOR ALL STEPS: first read data
cont01s <- read.csv("C:/data/4vars.iv.med.2.csv")

View(cont01s) ## view the data in a separate window (note the capital V)
names(cont01s) ## view the variables in the data
# xcont mcont ycont ivcont xbym x01 m01 y01 iv01

# (1) t-test: x01 -> y01
# Compare mean of y01 across levels of x01 (both 0/1)
t.test(y01 ~ x01, data = cont01s)
# (x01 -> ycont similar) t.test(ycont ~ x01, data = cont01s)

# (2) F-test: x01 -> y01
# One-way ANOVA (equivalent to regression F-test for binary x01)
fit_aov <- aov(y01 ~ x01, data = cont01s)
summary(fit_aov)
# (x01 -> ycont similar)

# (3) chi-squared test: x01 <-> y01
# Treat both as categorical
tab_xy <- table(cont01s$x01, cont01s$y01)
chisq.test(tab_xy, correct = FALSE)
# (cannot run x01 -> ycont)

# (4) simple regression: x01 -> y01
fit_lm <- lm(y01 ~ x01, data = cont01s)
summary(fit_lm)
# (x01 -> ycont similar)

# Pearson correlation for two binary variables
cor(cont01s$x01, cont01s$y01, method = "pearson")
# This returns the correlation, its confidence interval, and the p-value
cor.test(cont01s$x01, cont01s$y01, method = "pearson")
# (x01 -> ycont similar)

# (5) path model (lavaan): x01 -> y01
install.packages("lavaan")
library(lavaan)

model_path <- '
y01 ~ x01
'

fit_path <- sem(model_path, data = cont01s)
summary(fit_path, standardized = FALSE, fit.measures = FALSE)

# (6) mediation model (lavaan):
# xcont -> mcont -> ycont, plus direct xcont -> ycont

model_med <- '
# Regressions
mcont ~ a * xcont
ycont ~ b * mcont + c_prime * xcont

# Indirect, direct, total effects
ind := a * b
direct := c_prime
total := ind + direct
'

fit_med <- sem(model_med, data = cont01s)
summary(fit_med, standardized = FALSE, fit.measures = FALSE)

# (7) IV model (lavaan):
# ivcont -> xcont -> ycont, no direct ivcont -> ycont

model_iv <- '
# First stage
xcont ~ a * ivcont

# Second stage
ycont ~ b * xcont

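# (No ycont ~ ivcont path)
# Note: as written, b is the ordinary regression slope of ycont on xcont.
# To obtain the IV estimate of b, free the x-y residual covariance
# by uncommenting the next line:
# xcont ~~ ycont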

# Indirect effect of ivcont on ycont via xcont
iv_ind := a * b
'

fit_iv <- sem(model_iv, data = cont01s)
summary(fit_iv, standardized = FALSE, fit.measures = FALSE)

# (8) Pearl’s mediating IV-style model (lavaan):
# xcont -> mcont -> ycont, no direct xcont -> ycont

model_pearl <- '
# Regressions
mcont ~ a * xcont
ycont ~ b * mcont # no xcont -> ycont path

# Indirect effect only
ind := a * b
'

fit_pearl <- sem(model_pearl, data = cont01s)
summary(fit_pearl, standardized = FALSE, fit.measures = FALSE)

# (8.a) MIIVsem
install.packages("dagitty")
install.packages("MIIVsem")
library("dagitty")
library("MIIVsem")

m_iv2 <- '
# IV part
xcont ~ ivcont

# m part
mcont ~ xcont

# y part
ycont ~ xcont + mcont
'
# This lists for each Y<-X the IV(s) needed as Y X IV1 IV2 etc
# Uses a model-implied instrumental variable (MIIV) search
miivs(m_iv2)
# Estimates using two stage least squares (2SLS)
miive(m_iv2 , cont01s)

# (8.b) MIIVsem mediation
m_med1 <- '
# m part
mcont ~ xcont

# y part
ycont ~ xcont + mcont
'
miivs(m_med1 )
# Interpretation of the miivs() output
# LHS      RHS             MIIVs
# mcont    xcont           xcont
# ycont    xcont, mcont    mcont, xcont
#
# For the mcont equation (mcont <- xcont), the MIIV is xcont itself.
# For the ycont equation (ycont <- xcont + mcont), the MIIVs are mcont and xcont.
# In this recursive model each predictor serves as its own instrument.

# Estimates using two stage least squares (2SLS)
miive(m_med1, cont01s)

# (8.b.d.) dagitty for Mediation
xymed <- dagitty('dag {
mcont [pos="1.1,1"]
xcont [pos="1,1.1"]
ycont [pos="1.2,1.1"]
xcont -> ycont
mcont -> ycont
xcont -> mcont
}')
plot(xymed)

adjustmentSets( xymed, "xcont", "ycont", type="all" ) ## total effect: should be the empty set {}, nothing to adjust for
# adjusting for the mediator gives you the direct effect, not the total effect.
adjustmentSets( xymed, "xcont", "ycont", effect="direct" ) ## should be the mediator
# !! { mcont }
adjustmentSets( xymed, "xcont", "ycont", effect="total" ) ## should be {}

# (8.b) MIIVsem nonrecursive/feedback/cyclical
# Nonrecursive
m_nonrec <- '
ycont ~ xcont + mcont
xcont ~ ycont + ivcont
'
miivs(m_nonrec)
miive(m_nonrec , cont01s)

# (8.b) dagitty
library("dagitty")
## This can generate data according to the model, use some classic parameter values

xydag <- dagitty('dag {
ivcont [pos="1,1"]
mcont [pos="1,1.5"]
xcont [pos="1.5,1"]
ycont [pos="1.5,1.5"]
xcont -> ycont
mcont -> ycont
ycont -> xcont
ivcont -> xcont
}')

plot(xydag)

adjustmentSets( xydag, "xcont", "ycont", type="all" )
adjustmentSets( xydag, "ycont", "xcont", type="all" )

# Console output:
# > adjustmentSets( xydag, "x", "y", type="all" )
# {}
# {}


Sunday, March 22, 2026
