Here we will walk through: (1). How to get the data; (2).
How to map and analyze the spatial data using spatial regression models; (3).
Create spatially lagged variables for all effects in the mediation model (M and
Y pretty much); (4). Run spatial mediation models (SEM).
Notes on software: some choices for software are
habit/utility, you can switch them: (i). I use Stata for SEM/mediation, I
normally use Mplus, R\lavaan would do too, or Onyx or others. (ii). GeoDa is
the only one I know that can generate for you a spatially lagged variable, AND
allow you to save it, then use it in another software for analyses; it also
does mapping the quickest (Stata can do maps nicely too, even over time, see http://bit.ly/covidCT_video ); (iii). To
show 2 variables (‘layers’) at a time in a map, Tableau seems to fastest and
most flexible option (see an example here http://bit.ly/debtct_zip
); we have here however 3 such pairs of variables… and visualizing spatial
mediation overall I don’t think has been done yet by someone (visualizing plain
mediation wasn’t accomplished well either). So let’s proceed with the steps:
1. How to get the data
Download some free data to work with; I chose for this CDC's Social Vulnerability Index
(SVI): free, covering the whole US; you can analyze county level US data, e.g., but with thousands of counties in it, it’s less ‘visible’ what we are doing, and the map is too busy too: I will use CT data, Census Tract level for this (aggregating up to ZIP would be an option); to download it as such, choose from Data Documentation; get BOTH, shape file (Connecticut.zip in my case) and the CSV one.From the many files you need to extract and use 3: *.shp & *.dbf: & *.shx
NOTE: If you have Shape files separately from your own data, you would need to MERGE them, GeoDa can easily do it, you need to make sure you have the SAME spatial/region code in both (e.g. census tract, or ZIP code): a worked example is here.
2. How to map and analyze
the spatial data using spatial regression models
* Get GeoDa (if you haven’t yet), and install it. Open the Shape file, a map will pop up too. A GeoDa intro is here.
2.0. Target a specific theoretical model to investigate,
here we want “racial/ethnic minority -> income -> uninsured” or EP_MINRTY
-> EP_PCI (inc_sqrt) -> EP_UNINSUR
(variables are nicely described in documentation).
2.a. Visualize them first: as a trio, using Scatter plot matrix
- here also one can see regression coefficients,
both-directions, for each pair: there is no ‘correlation’ here, and there is a
good reason: with spatial data a ‘bidirectional’ coefficient like a correlation
is less meaningful, because one needs an ‘effect’ (outcome) and a ‘cause’
(predictor), so one can then add a proper NEEDED spatially lagged variable for
the effect: only then one can talk about a X->Y spatial effect, after
accounting for LagY like: X + Xlag -> Y.
2.b. There are some -999 values in that need to be made
BLANK/delete
One way to handle them is to click in EACH cell and
delete the -999 values: 3 found and deleted.
+ At this point is better to save the project (which carries the table behind it):
2.c. Also, it would be better for estimation ease to rescale income (EP_PCI ) to be e.g. US$10,000’s; but I am not sure how to do it in here: so will get the SQUARE ROOT instead: Go to Table view, Right click and choose Add Variable, type inc_sqrt (e.g.); then repeat and choose ‘Calculator’, and select as Pperator SQUARE ROOT. Btw, these were the variables:
EP_MINRTY
Percentage minority (all persons
except white, non-Hispanic) estimate, 2014-2018 ACS
This calculation resulted in
some division by 0 errors in cases where E_HH equals 0. These rows were revised
with the estimated proportions set to 0 and their corresponding MOEs set to
-999.
EP_UNINSUR
Adjunct variable - Percentage
uninsured in the total civilian noninstitutionalized population estimate,
2014-2018 ACS
EP_PCI
Per capita income estimate,
2014-2018 ACS
2.d. Now we need a ‘weight matrix’ that tells GeoDa what we think counts as a ‘neighbor’; there are some options here, we’ll go with Queen: Go to Tools tab, Choose Weight manager; click on Create; then Select ID variable & choose FIPS; leave Queen contiguity, order of contiguity 1; click Create: a window pops up to SAVE this new file (*.gal) with your name: say CTCensusTracts_queen; Close
2.e. Now one can examine these ‘weights’ or ‘who is neighboring whom’ (will not show here for now).
3. Create spatially lagged
variables for all effects in the mediation model (M and Y pretty much)
3.a. Create spatially lagged variables; lagincom: for inc_sqrt and lagunins: for EP_UNINSUR; no need for a XLag for the cause/predictor, if no variable is pointing into it.
Right click in Table: choose Calculator, choose Spatial lag tab; the new weight matrix is shown up in ‘Weight’ box now; Click on Add variable, enter say ‘lagincom’ then click on ‘Add’. NOW you can define what to compute in it: click in Variable for the target Y (here inc_sqrt); leave ‘Use row-standardized weights’ checked; then done: can click Apply now.
You have now 3 new variables in the data: so SAVE the project.
NOTE: These lagged variables are NOT needed for GeoDa, it
uses them behind your back: they are needed to SAVE and then analyze in a SEM
program.
3.b. Save the new data, and bring in another software for SEM analyses: for spatial mediation.
Before this, run EP_MINRTY -> EP_UNINSUR as Classic
regression; then click on Weight box: and run Spatial Lag, then Spatial Error:
save them.
Tip: try running ‘Classic’ GeoDa regression EP_MINRTY + lagincom -> EP_UNINSUR
and then compare to the ‘Spatial lag’ GeoDa regression
EP_MINRTY -> EP_UNINSUR: any guess what they will show??? (yes, identical results; both GeoDa and Stata of course).
4. Run spatial mediation
models (SEM)
4.a. To save the file, Save As, click on yellow ‘open’ icon, choose ‘Comma Separated Value’ csv; click ‘OK’ give it a name (will see ‘Saved Successfully’ pop up!); now better open CSV in Excel, and save as *.xls first, then import in Stata with Import\Excel
- Stata can handle now for 2 lagged effects in the model,
not only one for the final effect/outcome, as in Geoda, this is the power of
SEM to handle simultaneous effects). So we fit 2 equations:
EP_MINRTY + lagincom
-> inc_sqrt
EP_MINRTY + lagunins
+ inc_sqrt -> EP_UNINSUR
The naïve effect NonWhite-Uninsured is: for 10% points
increase in percent non-White, there is a 1.2% points increase in uninsured
(SE=0.05, t=24.7).
The proper spatially lagged effect however is about ½:
0.55% points increase (SE=0.06, t=9.8).
In SEM, the total effect is a 0.47% points increase
(SE=0.06, t=7.5, differs because we properly also lagged spatially M=income!),
of which:
* 10% is indirect: 0.05% points increase (SE=0.02,
t=6.0),
* and the rest of 90% is direct (residual), 0.42% points
increase (SE=0.07, t=6.0). I paste the Stata code and results:
sem EP_UNINSUR <- EP_MINRTY inc_sqrt lagunins, nocapslatent
(3 observations with missing values excluded)
Endogenous variables
Observed: EP_UNINSUR
Exogenous variables
Observed: EP_MINRTY inc_sqrt lagunins
Fitting target model:
Iteration 0: log likelihood = -12439.076
Iteration 1: log likelihood = -12439.076
Structural equation model Number of obs = 827
Estimation method = ml
Log likelihood = -12439.076
-------------------------------------------------------
| OIM
| Coef. SE z P>|z| [95%C.I.]
-------------------------------------------------------
Structural |
EP_UNINSUR |
EP_MINRTY | .043 .007 6.00 0.000 .029 .057
inc_sqrt | -.010 .003 -2.96 0.003 -.016 -.003
lagunins | .753 .040 19.04 0.000 .676 .831
_cons | 1.892 .799 2.37 0.018 .325 3.459
-------------------------------------------------------
var(e.EP_UNINSUR)| 12.556 .618 11.403 13.827
-------------------------------------------------------
LR test of model vs. saturated: chi2(0) = 0.00, Prob > chi2 = .
* Now full SEm mediation model, with gsem:
gsem (inc_sqrt <- EP_MINRTY lagincom ) (EP_UNINSUR <- EP_MINRTY inc_sqrt lagunins) , nocapslatent
gsem, coeflegend
Generalized structural equation model Number of obs = 828
Response : inc_sqrt Number of obs = 828
Family : Gaussian
Link : identity
Response : EP_UNINSUR Number of obs = 827
Family : Gaussian
Link : identity
Log likelihood = -6130.5325
-------------------------------------------------------
| Coef. Legend
-------------------------------------------------------
inc_sqrt |
EP_MINRTY | -.4799262 _b[inc_sqrt:EP_MINRTY]
lagincom | .8298569 _b[inc_sqrt:lagincom]
_cons | 49.46046 _b[inc_sqrt:_cons]
--------------------------------------------------------
EP_UNINSUR |
inc_sqrt | -.0096035 _b[EP_UNINSUR:inc_sqrt]
EP_MINRTY | .0425863 _b[EP_UNINSUR:EP_MINRTY]
lagunins | .7530868 _b[EP_UNINSUR:lagunins]
_cons | 1.891827 _b[EP_UNINSUR:_cons]
-------------------------------------------------------
var(e.inc_sqrt)| 741.4005 _b[/var(e.inc_sqrt)]
var(e.EP_UNINSUR)| 12.55649 _b[/var(e.EP_UNINSUR)]
-------------------------------------------------------
* IE = indirect effect
nlcom _b[inc_sqrt:EP_MINRTY]*_b[EP_UNINSUR:inc_sqrt]
_nl_1: _b[inc_sqrt:EP_MINRTY]*_b[EP_UNINSUR:inc_sqrt]
-------------------------------------------------------
| Coef. SE z P>|z| [95%C.I.]
-------------------------------------------------------
_nl_1 | .005 .002 2.86 0.004 .001 .008
-------------------------------------------------------
* TE = total effect
nlcom _b[EP_UNINSUR:EP_MINRTY] + _b[inc_sqrt:EP_MINRTY]*_b[EP_UNINSUR:inc_sqrt]
_nl_1: _b[EP_UNINSUR:EP_MINRTY] + _b[inc_sqrt:EP_MINRTY]*_b[EP_UNINSUR:inc_sqrt]
-------------------------------------------------------
| Coef. SE z P>|z| [95%C.I.]
-------------------------------------------------------
_nl_1 | .047 .006 7.51 0.000 .035 .060
-------------------------------------------------------
No comments:
Post a Comment