Friday, December 27, 2024

Run a Spatial lag regression in Excel

 

This is an explanatory note for the ‘How to’ training video tinyurl.com/intrstats3 (or directly Youtube3). The Excel worksheet is posted online at Tinyurl.com/SPATIALSSM (or directly on Dataverse).

The Research Question we start with is simply:

“Do states with more residents in poverty have longer/shorter life expectancies? By how much?”

First, the spoiler: naïve or a-spatial analyses will almost always overestimate the effect (depending on the extent of the spatial ‘excess similarity’ of values, in both variables, between neighboring states)[i]. A visual 'proof' is below: neighboring states 'push up/down' their neighbors' values, one variable at a time.




* To run a spatial lag regression in Excel, one needs two distinct pieces of data: (1) the 2 variables for the US states; (2) the ‘shape file’ of the US states, i.e. the geographic information systems (GIS) set of files encoding the location of, and boundaries between, the states.

*** The steps involved in this are:

1. Obtain a 49x49 matrix data file marking which state neighbors which other states

* Find a ‘shape file’ for the US states online: e.g. Census – States level (cb_2018_us_state_20m.zip)

 

   * (To go from the full 51-unit states file to the contiguous 49, use QGIS[ii].) Unzip the file into a folder, then open it in GeoDa (free); in Tools \ Weights Manager \ Create, select an ID variable (e.g. GEOID, or better the 2-letter state abbreviation), and choose Contiguity Weight \ Queen Contiguity: what gets saved is a *.gal file, in essence a text file: open it in Notepad, e.g., to see its structure; for CT, e.g., it is 2 lines: the state name and the total number of its ‘queen contiguity’ neighbors, then, on the next line, the names of those neighbors:

CT 3
NY MA RI

* This is the data we’ll process in Excel into a 49 x 49 matrix, full of 0’s except in the spots where the column state is a neighbor of the row state (plus, we scale the non-zero entries in each row to add up to 1: so each of CT’s 3 neighbors gets a .33): all this is shown in the Excel file Poverty_lifeexp_matrix_reg.xlsx in successive worksheets: queen_49Orig, HowTo, ProcessStandardize, STANDimport49b49. This last one will be used to turn an OLS regression into a spatial lag regression: that’s all!
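For readers who prefer to script this step rather than do it in Excel, here is a minimal Python sketch that builds the same row-standardized matrix from a .gal-style neighbor list (the file name is hypothetical, and skipping the first header line of the .gal file is an assumption about its layout):

import numpy as np

def read_gal(path):
    """Return {state: [neighbor names]} from a GeoDa .gal neighbor file."""
    with open(path) as f:
        lines = [ln.split() for ln in f if ln.strip()]
    lines = lines[1:]                      # assume the first line is the .gal header row
    neighbors = {}
    for head, neigh in zip(lines[0::2], lines[1::2]):
        state, k = head[0], int(head[1])   # e.g. 'CT 3'
        assert len(neigh) == k             # sanity check: count matches the neighbor line
        neighbors[state] = neigh           # e.g. ['NY', 'MA', 'RI']
    return neighbors

def weight_matrix(neighbors):
    """Row-standardized W: W[i, j] = 1/k_i if state j is a neighbor of state i."""
    states = sorted(neighbors)
    idx = {s: i for i, s in enumerate(states)}
    W = np.zeros((len(states), len(states)))
    for s, neigh in neighbors.items():
        for n in neigh:
            W[idx[s], idx[n]] = 1.0 / len(neigh)
    return states, W

states, W = weight_matrix(read_gal("us_states_queen.gal"))  # hypothetical file name
print(W.sum(axis=1))   # each row with at least one neighbor sums to 1 (e.g. CT: 3 cells of 1/3)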

2. Generate the spatial lag variables

* Multiply each variable, a column of 49 rows (a column vector), by the standardized weight matrix (in worksheet Reg_Lag): the formula is simply =MMULT(B2:AX50, P2:P50), entered as an array formula in older Excel versions; the result is the spatial lag of the original Life Expectancy variable found in P2:P50!
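The same multiplication in miniature (a toy 3-state chain A-B-C, just to show what the MMULT step computes; the numbers are made up):

import numpy as np

# Row-standardized W for 3 states in a chain: A borders B, B borders A and C, C borders B
W = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
y = np.array([78.0, 80.0, 76.0])   # made-up life expectancies for A, B, C

y_lag = W @ y                      # the Excel MMULT step
print(y_lag)                       # [80.0, 77.0, 80.0]: each entry is the mean of that state's neighbors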

3. Run the spatial lag regression

* Use the mean-centered data in columns B, C, and D to run a multiple regression ‘by hand’ in Excel; it merely means implementing the formula for the beta/regression coefficients found in Greene, p. 23, eq. 3-10 (in steps, however: the formula entered in one chunk did not run!).

β_{(p+1)×1} = (X′_{(p+1)×N} · X_{N×(p+1)})^{-1} · X′_{(p+1)×N} · y_{N×1}

for p predictors (here p = 2), N = 49 states; X is the matrix of predictors and y is the outcome. The “+1” is there because a ‘vector of 1s’ (the intercept) needs to be added to the matrix of predictors; this is done in the Excel file.
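In script form, the same formula looks like this (simulated numbers, not the actual poverty/life-expectancy data; the column of 1s plays the role of the intercept mentioned above):

import numpy as np

rng = np.random.default_rng(0)
N, p = 49, 2                                   # 49 states, p = 2 predictors (e.g. poverty + its spatial lag)
X = np.column_stack([np.ones(N),               # the added 'vector of 1s'
                     rng.normal(size=(N, p))])
true_beta = np.array([78.0, -0.3, 0.4])        # made-up coefficients
y = X @ true_beta + rng.normal(scale=0.5, size=N)

beta = np.linalg.inv(X.T @ X) @ (X.T @ y)      # Greene eq. 3-10: (X'X)^{-1} X'y
print(beta)                                    # recovers roughly [78, -0.3, 0.4]
print(np.linalg.lstsq(X, y, rcond=None)[0])    # cross-check with a library solver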

* What we see is that the naïve β = -0.44, while the proper spatial β = -0.31.  

Conclusion:

States with 10 percentage points more residents in poverty have a lower life expectancy at birth by 3.7 months (0.31 years × 12 ≈ 3.7 months); naïve analyses would instead yield an inflated (biased upward) value of 5.3 months (0.44 × 12 ≈ 5.3).

* Now anyone can run a spatial regression without much fuss; working in Stata or R this can be done quite quickly, but what’s happening behind the scenes would be lost: we unpacked it here.

Some more details:

A. Keeping track of the matching by state is essential: many options exist for this; the best in this instance is to use the 2-letter state abbreviation, and to keep checking at each step whether the order got scrambled: copy and paste the abbreviations alongside the columns to check. Alternatively, Excel can also do the ‘matching’ (e.g. with VLOOKUP or INDEX/MATCH), see e.g. WWW.

          * For larger files, like the ~3,080 US counties or the ~65,000 US census tracts, this ‘by-hand’ process becomes a little cumbersome (Excel could still do it… ), so other, automated options are recommended: Stata’s sp module is a simple and instructive one: see Chuck Huber’s ‘how to’ blog posting. See also Di Liu’s post.

B. Checking the results can be done in GeoDa straight away: see Luc Anselin’s Guide (a PDF here)

* There are two ways to check this in GeoDa:  B.a. Run a Classic Regression, then a Spatial Lag (with Weight File defined); B.b. Create a spatial lag variable using the Calculator \ Spatial lag option.

  

C. Accounting for the spatial ‘auto’-correlation is much like accounting for prior time values (which is where the true meaning of ‘auto’ comes from: prior values of the same variable are the main ‘driver’ of its current values); one can easily add a prior-time (= time lag) outcome as a co-predictor too, along with the spatial lag co-predictor.

*******Additional resources****************************

Some books to refer to when needing stats reviewing/reminding

* Kenny, D. A. (1987). Statistics for the social and behavioral sciences: Little, Brown Boston.

* Greene, W. H. (2002). Econometric Analysis. Prentice Hall.

*Reference cited**

Cameron, A., & Trivedi, P. (2009). Microeconometrics Using Stata. College Station, TX: Stata Press.

Footnotes:


[i] This is commonly called ‘nonindependence’ or, less intuitively, ‘auto’-correlation, even though the concept applies to 1 variable at a time: % poverty exhibits it, and separately life expectancy exhibits it too; the extent of this is (commonly) given by Moran’s I, which is ‘kind of’ a correlation, meaning it theoretically ranges from -1 to +1. At least two features however make it quite different: (1). Its ‘null’ (no non-independence…) value is not 0, but -1/(N-1) (about -0.02 for N = 49 states); (2). The ‘what correlates with what’ is less visible; economists call it more properly “correlated observations”, Cov(yi, yj) ≠ 0 for i ≠ j, see (Cameron & Trivedi, 2009), p. 81.

[ii] Handling 'shape files’ to delete unwanted regions, and for ‘joining’ and other operations, can be best done in QGIS; this is another task, see e.g. WWW.

Sunday, December 22, 2024

Intro to Statistics only in Excel

 

This is an explanatory note for the ‘How to’ training videos tinyurl.com/intrstats1 (or directly Youtube1 ) & tinyurl.com/intrstats2  (or Youtube2 ).

I provide details to assist in answering some research questions (RQs), using simulated data, with several basic statistical tests: the chi-square test (then McNemar) and the t-test, for ‘independent’ and ‘dependent’ samples. The Excel worksheet is posted online at Tinyurl.com/101statsexcel (or Osf.Io). The RQs are motivated by a study on weight loss, whose data is also posted online at Dataverse (Coman, 2024), and center around body mass index (weight), Hemoglobin A1c (blood glucose), gender, and time. I asked: RQ.1: Are there more overweight males than females? (and RQ.1.b: Do males and females differ in body mass index?); RQ.2: Do BMI levels change over time?; RQ.3: Is the level of HgA1c predicted by BMI?

These RQs directly invite the analyses best equipped to answer them.[i]

All these tests merely compare differences against some standard reference level of similarity (no difference):

1. Do cases (persons/patients) differ in their values on a single variable, say BMI? They may, they may not: if all had the same BMI, there would be nothing to explain. If half the sample has somewhat similar high BMI values, and the other half somewhat similar low BMI values, the differences are mainly between the low and high ‘clusters’ (we may have 2 classes of folks, and within-class differences are rather small compared to the between-class ones).

1.a. These questions beat around a causal bush, to be honest: differences in BMI are of interest mostly because of the obesity epidemic in several countries, so what we truly want to know is not just ‘what explains differences in male BMI’, but what determines John’s BMI and Jake’s BMI, so that we can tell John to exercise 30 min/day and tell Jake to exercise 45 min/day (whatever comes out of the analyses), if they want to drop their BMI by some 5 kg/m2 (the unit for BMI).

1.b. Eventually, this ‘what drives differences’ knowledge is needed for another practical (and causal) inquiry: how much average weight loss would prevent, say, half of those who (are not diabetic now, but) would become diabetic within a year from actually becoming diabetic?

2. Are ‘these folks’ different from ‘those folks’ (diabetic vs. ‘normal’) in terms of something else, like weight (BMI)?

This “2 variable” question can take on different ‘shapes’ depending on how we ‘carve out’ each variable: from a ‘both continuous’ first step, we can look at a graph like below, where each diamond is a person, and split it into 2 halves, either vertically, or horizontally, or into 4 quadrants, using some ‘arbitrary’ lines (in our case set at the sample means).


2.a. If we ignore where diamonds sit in each quadrant, and just compare the 4 ‘groups’ of folks, we fall back on a 2-categorical-variable RQ setup: this is handled in the Excel file we work through in the Youtube-Training-1, in the “Are there more overweight males than females?” section.[ii]
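For readers who want a scripted cross-check of that 2x2 setup, here is a minimal Python/scipy sketch on made-up counts (the table is purely illustrative, not the workbook’s data):

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: rows = normal-weight / overweight, columns = females / males
table = np.array([[30, 22],
                  [20, 28]])

chi2, p, dof, expected = chi2_contingency(table, correction=False)  # Pearson chi-square, no Yates correction
print(chi2, p)      # test of 'no association' between weight category and gender
print(expected)     # expected counts under independence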

*** Note that a 2x1 table of counts (of the 2 combinations (0,.) and (1,.) of normal/over-weight) in which one instead enters the means of the other variable, HgA1c here, turns the data into a format ripe for a comparison-of-means line of questioning: a t-test of independent samples fits here like a glove (a one-way ANOVA on the same two groups yields identical results)!
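That equivalence is easy to verify with simulated data (hypothetical HgA1c values; the only point is that the ANOVA F equals the squared t and the p-values match):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
normal_wt = rng.normal(5.6, 0.6, size=40)      # hypothetical HgA1c, normal-weight group
over_wt   = rng.normal(6.0, 0.6, size=40)      # hypothetical HgA1c, overweight group

t, p_t = stats.ttest_ind(normal_wt, over_wt)   # independent-samples t-test (pooled variances)
F, p_F = stats.f_oneway(normal_wt, over_wt)    # one-way ANOVA on the same two groups
print(t**2, F)     # F = t^2
print(p_t, p_F)    # identical p-values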

*** Also note that, if we add a 3rd variable, say blood pressure, to a 2x2 table (like normal/over-weight by normal/diabetic), in the form of the mean of each cross-group, one ends up with a two-way ANOVA structure with 2 ‘main effects’ on blood pressure[iii].

2.b. The 2 continuous variables shown in the scatter plot invite questions of ‘going hand in hand’: are most of the folks situated in the Low&Low (0,0) and High&High (1,1) quadrants, with only a few in the other 2? Then we have a positive relation; if we push this mental exercise to placing ALL the diamonds on a straight line (at a 45-degree angle), the 2 variables become identical[iv].

*** We show how to run a simple linear regression analysis using Excel’s built-in ‘powers’, but also how to run a multiple regression using Excel’s matrix multiplication (and Greene’s formula, p. 23, eq. 3-10).
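A script-based counterpart of that last step, on simulated (hypothetical) BMI/HgA1c-style numbers rather than the workbook’s data: the simple regression via scipy, and the multiple regression via the same matrix formula used in the spatial post above:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
bmi   = rng.normal(28, 4, size=100)                   # hypothetical predictor
hba1c = 4.5 + 0.08 * bmi + rng.normal(0, 0.5, 100)    # hypothetical outcome

# Simple linear regression (slope and intercept, as Excel's SLOPE/INTERCEPT would give)
res = stats.linregress(bmi, hba1c)
print(res.slope, res.intercept)

# Multiple regression 'by hand': beta = (X'X)^-1 X'y, with a column of 1s for the intercept
age = rng.normal(50, 10, size=100)                    # a second, made-up predictor
X = np.column_stack([np.ones(100), bmi, age])
beta = np.linalg.inv(X.T @ X) @ (X.T @ hba1c)
print(beta)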

Some cold showers:

A. The statistical tests themselves are related, and each ‘falls back’ on another under some limiting constraints[v]; they also rest on specific assumptions, which may or may not be relaxed handily (e.g. equality of variances in t-tests); a better way to ‘open up the black box’ of such mathematical straitjackets is to model all the ‘parts’ flexibly, e.g. in multiple-group structural equation models (SEM), like in this article (Coman et al., 2014).

B. Using mathematical formulas to derive specific estimates (e.g. the standard error of the mean difference) can only take us so far: statistics is not as exact as arithmetic/algebra[vi].

*Additional resources**

Some books to refer to when needing stats reviewing/reminding

* Devore, J. L. (2016). Probability and Statistics for Engineering and the Sciences.

* Kenny, D. A. (1987). Statistics for the social and behavioral sciences: Little, Brown Boston.

* Hernán MA, Robins JM (2019). Causal Inference. Boca Raton: Chapman & Hall/CRC. (SAS , Stata R, Python)

* Greene, W. H. (2002). Econometric Analysis. Prentice Hall.

* Barreto, H., & Howland, F. (2005). Introductory Econometrics: Using Monte Carlo Simulation with Microsoft Excel.

 *References cited**

Coman, E. (2024). Data and appendix for: "Restructuring basic statistical curricula: mixing older analytic methods with modern software tools in psychological research". Retrieved from: https://doi.org/10.7910/DVN/QDXM7U

Coman, E. N., Iordache, E., Dierker, L., Fifield, J., Schensul, J. J., Suggs, S., & Barbour, R. (2014). Statistical power of alternative structural models for comparative effectiveness research: advantages of modeling unreliability. Journal of Modern Applied Statistical Methods, 13(1), 71-90. https://pubmed.ncbi.nlm.nih.gov/26640421/

Stevens, J. (2009). Applied multivariate statistics for the social sciences: Lawrence Erlbaum.

Footnotes:

[i] Note that one puts the cart before the horse when “dichotomizing a continuous variable and then using statistical tests for a categorical variable”! One in fact either asks the question in a continuous framework (Does BMI differ between biological genders?) OR in a categorical framework (Are there more/fewer overweight persons among males vs. females?). It is the RQ that should trigger transforming a variable, not the search for a convenient analytic model. The additional research question of ‘what does overweight mean?’ is buried when one gallantly splits a continuous variable around some convenient value, like the sample mean: for some specific variables, like HgA1c, this becomes essential: what HgA1c value qualifies a patient as ‘diabetic’? (i.e. “When does diabetes ‘come into existence’?”)

[ii] Note that we used ‘biological gender’ here, where we could have used ‘diabetic vs. not’, just to give more weight to this ‘categorical’ variable meaning: biological gender itself however can be conceptualized as a continuous measure, and it has been, in cases where the gender assignment is questioned (like this tennis example, or the recent Olympics boxing controversy, see ‘unspecified gender eligibility tests’).

[iii] More generally, Anova is a special case of the log linear model where the cell frequencies are replaced by the cell means of a third variable (see (Stevens, 2009), ch.14 Categorical Data Analysis: The Log Linear Model).

[iv] This is the end point of the problem called multi-collinearity: we use two variables in statistical models, but, unbeknownst to us, they are correlated at 1.0, i.e. one is a linear combination of the other, so we don’t have 2 variables, but 1!

[v] * t and z tests are approximately equivalent for samples of n > 30; t uses the sample variance, z needs the population variance (www);

* F = t²: If you square a t-statistic, you get an F-statistic with 1 degree of freedom in the numerator (www1 & www2).

* When the denominator degrees of freedom in an F-statistic become very large, the F-distribution approaches a chi-square distribution: chi-squared = (numerator degrees of freedom) * F (www).
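These relations can be checked numerically with scipy’s distribution functions (illustrative critical values at alpha = .05; a small check, not part of the original workbook):

from scipy import stats

df = 30
print(stats.t.ppf(0.975, df), stats.norm.ppf(0.975))        # t vs z: about 2.04 vs 1.96 once n > 30
print(stats.t.ppf(0.975, df)**2, stats.f.ppf(0.95, 1, df))  # t^2 critical value = F(1, df) critical value
print(3 * stats.f.ppf(0.95, 3, 10**7), stats.chi2.ppf(0.95, 3))  # df1 * F -> chi-square(df1) as df2 grows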

[vi] In math 1 ≠ 2, ever, while statistically 1 = 2 can sometimes ‘happen’: if 1 and 2 represent the mean $cash that boys and girls in a classroom have on them, we may conclude they ‘have the same amount of cash’, depending on the variability of the individual values (i.e. if one mean is within 1.96 standard errors of the other). The t-test formula for 2 independent-sample means is t = (mean1 - mean2) / sqrt((sd1^2/n1) + (sd2^2/n2)), where sd1 and sd2 are the 2 standard deviations; for, say, 10 boys and 10 girls, with sd1 = sd2 = 1.2, t = 1 / 0.5367 = 1.863, which is smaller than the 1.96 value that corresponds to a very small chance (< .05) of observing such a difference between the sample means if the two population means were in fact equal (the ‘null’ hypothesis): we hence cannot reject the ‘null’, so 1 and 2 are statistically (significantly) indistinguishable.
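The arithmetic above is easy to reproduce in a few lines (same made-up numbers as in the footnote):

import math

mean_girls, mean_boys = 2.0, 1.0   # the '2' and '1' mean $cash amounts from the example
sd1 = sd2 = 1.2
n1 = n2 = 10

se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)   # 0.5367, the standard error of the difference
t = (mean_girls - mean_boys) / se           # 1.863 < 1.96, so the 'null' is not rejected
print(se, t)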

Monday, February 15, 2021

Spatial mediation demonstration

This posting is a ‘how & why to’ analyze spatial data (at the census tract, ZIP, town, or county level, e.g.) in the more flexible structural equation modeling (SEM) manner. The technical details will only be hinted at; this is a pretty specialized area, so much so that it has been branded with a solid name (spatial econometrics).

Here we will walk through: (1). How to get the data; (2). How to map and analyze the spatial data using spatial regression models; (3). Create spatially lagged variables for all effects in the mediation model (M and Y pretty much); (4). Run spatial mediation models (SEM).

Notes on software: some choices of software are habit/utility, you can switch them: (i). I use Stata here for SEM/mediation; I normally use Mplus, and R\lavaan would do too, or Onyx, or others. (ii). GeoDa is the only one I know of that can generate a spatially lagged variable for you AND allow you to save it, then use it in another software package for analyses; it also does mapping the quickest (Stata can do maps nicely too, even over time, see http://bit.ly/covidCT_video ); (iii). To show 2 variables (‘layers’) at a time on a map, Tableau seems to be the fastest and most flexible option (see an example here http://bit.ly/debtct_zip ); here however we have 3 such pairs of variables… and I don’t think visualizing spatial mediation overall has been done by anyone yet (visualizing plain mediation hasn’t been accomplished well either). So let’s proceed with the steps:

1. How to get the data

Download some free data to work with; I chose CDC's Social Vulnerability Index (SVI) for this: it is free and covers the whole US. You can analyze county-level US data, e.g., but with thousands of counties in it, what we are doing is less ‘visible’ and the map is too busy: I will use Connecticut (CT) data at the Census Tract level for this (aggregating up to ZIP would be an option); to download it as such, choose from Data Documentation; get BOTH the shape file (Connecticut.zip in my case) and the CSV one.

From the many files, you need to extract and use 3: *.shp, *.dbf, & *.shx

NOTE: If you have Shape files separate from your own data, you would need to MERGE them; GeoDa can easily do it, but you need to make sure you have the SAME spatial/region code in both (e.g. census tract or ZIP code): a worked example is here.

2. How to map and analyze the spatial data using spatial regression models

* Get GeoDa (if you haven’t yet), and install it. Open the Shape file, a map will pop up too. A GeoDa intro is here.   




2.0. Target a specific theoretical model to investigate: here we want “racial/ethnic minority -> income -> uninsured”, or EP_MINRTY -> EP_PCI (inc_sqrt) -> EP_UNINSUR (the variables are nicely described in the documentation).

2.a. Visualize them first: as a trio, using Scatter plot matrix 



- this shows out-of-range values, and one can click on the dots in the scatterplots and bring up BOTH the table and the map to see ‘who’ those offending cases are, and why.

- here one can also see regression coefficients, in both directions, for each pair: there is no ‘correlation’ here, and for a good reason: with spatial data a ‘bidirectional’ coefficient like a correlation is less meaningful, because one needs an ‘effect’ (outcome) and a ‘cause’ (predictor), so that one can then add the needed spatially lagged variable for the effect: only then can one talk about an X -> Y spatial effect, after accounting for LagY, as in: X + LagY -> Y.

2.b. There are some -999 values in the data that need to be made BLANK/deleted

One way to handle them is to click in EACH cell and delete the -999 values: 3 found and deleted.

+ At this point it is better to save the project (which carries the table behind it):


2.c. Also, for estimation ease it would be better to rescale income (EP_PCI) to, e.g., US$10,000’s; but I am not sure how to do that in here, so I will take the SQUARE ROOT instead: go to Table view, right-click and choose Add Variable, type inc_sqrt (e.g.); then repeat, choose ‘Calculator’, and select SQUARE ROOT as the Operator. Btw, these were the variables:


EP_MINRTY: Percentage minority (all persons except white, non-Hispanic) estimate, 2014-2018 ACS

(Documentation note: this calculation resulted in some division-by-0 errors in cases where E_HH equals 0; those rows were revised with the estimated proportions set to 0 and their corresponding MOEs set to -999.)

EP_UNINSUR: Adjunct variable - Percentage uninsured in the total civilian noninstitutionalized population estimate, 2014-2018 ACS

EP_PCI: Per capita income estimate, 2014-2018 ACS

2.d. Now we need a ‘weight matrix’ that tells GeoDa what we think counts as a ‘neighbor’; there are several options here, we’ll go with Queen: go to the Tools tab, choose Weights Manager; click on Create; then Select ID variable & choose FIPS; leave Queen contiguity, order of contiguity 1; click Create: a window pops up to SAVE this new file (*.gal) under a name of your choice, say CTCensusTracts_queen; then Close.

Can close this window now.


2.e. Now one can examine these ‘weights’ or ‘who is neighboring whom’ (will not show here for now).

3. Create spatially lagged variables for all effects in the mediation model (M and Y pretty much)

3.a. Create spatially lagged variables: lagincom for inc_sqrt and lagunins for EP_UNINSUR; no need for a lag of the cause/predictor (X), since no variable points into it.

Right-click in the Table: choose Calculator, then the Spatial Lag tab; the new weight matrix now shows up in the ‘Weight’ box; click on Add Variable, enter say ‘lagincom’, then click ‘Add’. NOW you can define what to compute in it: click in Variable on the target Y (here inc_sqrt); leave ‘Use row-standardized weights’ checked; then done: you can click Apply now.




You have now 3 new variables in the data: so SAVE the project.


NOTE: These lagged variables are NOT needed by GeoDa itself, which computes them behind the scenes: they are needed so we can SAVE them and then analyze them in a SEM program.

3.b. Save the new data, and bring it into another software package for the SEM analyses: for spatial mediation.

Before this, run EP_MINRTY -> EP_UNINSUR as a Classic regression; then click on the Weight box and run Spatial Lag, then Spatial Error: save the results.

Tip: try running the ‘Classic’ GeoDa regression EP_MINRTY + lagunins -> EP_UNINSUR

and then compare to the ‘Spatial lag’ GeoDa regression

EP_MINRTY -> EP_UNINSUR: any guess what they will show???  (yes, identical results; both GeoDa and Stata of course). 

4. Run spatial mediation models (SEM)

4.a. To save the file: Save As, click on the yellow ‘open’ icon, choose ‘Comma Separated Value’ (csv); click ‘OK’, give it a name (you will see ‘Saved Successfully’ pop up!); it is better to then open the CSV in Excel and save it as *.xls first, then import it into Stata with Import \ Excel.

- Stata can now handle 2 lagged effects in the model, not only one for the final effect/outcome as in GeoDa (this is the power of SEM: handling simultaneous equations). So we fit 2 equations:

EP_MINRTY +  lagincom -> inc_sqrt

EP_MINRTY +  lagunins + inc_sqrt -> EP_UNINSUR

Quick results (a fuller results table is available) are:

The naïve effect of NonWhite on Uninsured is: for a 10-percentage-point increase in percent non-White, there is a 1.2-percentage-point increase in percent uninsured (SE=0.05, t=24.7).

The proper spatially lagged effect however is about half that: a 0.55-percentage-point increase (SE=0.06, t=9.8).

In SEM, the total effect is a 0.47-percentage-point increase (SE=0.06, t=7.5; it differs because we also properly spatially lagged M = income!), of which:

* 10% is indirect: a 0.05-percentage-point increase (SE=0.02, z=2.9 per the nlcom output below),

* and the remaining 90% is direct (residual): a 0.42-percentage-point increase (SE=0.07, t=6.0). I paste the Stata code and results below:

* Y <- X M lag [with sem]:
sem EP_UNINSUR <- EP_MINRTY inc_sqrt lagunins, nocapslatent
(3 observations with missing values excluded)
Endogenous variables
Observed:  EP_UNINSUR
Exogenous variables
Observed:  EP_MINRTY inc_sqrt lagunins
Fitting target model:
Iteration 0:   log likelihood = -12439.076
Iteration 1:   log likelihood = -12439.076
Structural equation model                       Number of obs     =        827
Estimation method  = ml
Log likelihood     = -12439.076
-------------------------------------------------------
                 |                 OIM
                 |      Coef. SE      z    P>|z|       [95%C.I.]
-------------------------------------------------------
Structural       |
  EP_UNINSUR     |
       EP_MINRTY |   .043   .007     6.00   0.000     .029    .057
        inc_sqrt |  -.010   .003    -2.96   0.003    -.016   -.003
        lagunins |   .753   .040    19.04   0.000     .676    .831
           _cons |  1.892   .799     2.37   0.018     .325   3.459
-------------------------------------------------------
var(e.EP_UNINSUR)|   12.556   .618                   11.403   13.827
-------------------------------------------------------
LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .
* Now the full SEM mediation model, with gsem:
gsem (inc_sqrt <- EP_MINRTY lagincom ) (EP_UNINSUR <- EP_MINRTY inc_sqrt lagunins) , nocapslatent
gsem, coeflegend
Generalized structural equation model           Number of obs     =        828
Response       : inc_sqrt                       Number of obs     =        828
Family         : Gaussian
Link           : identity
Response       : EP_UNINSUR                     Number of obs     =        827
Family         : Gaussian
Link           : identity
Log likelihood = -6130.5325
-------------------------------------------------------
                  |      Coef.  Legend
-------------------------------------------------------
inc_sqrt          |
        EP_MINRTY |  -.4799262  _b[inc_sqrt:EP_MINRTY]
         lagincom |   .8298569  _b[inc_sqrt:lagincom]
            _cons |   49.46046  _b[inc_sqrt:_cons]
--------------------------------------------------------
EP_UNINSUR        |
         inc_sqrt |  -.0096035  _b[EP_UNINSUR:inc_sqrt]
        EP_MINRTY |   .0425863  _b[EP_UNINSUR:EP_MINRTY]
         lagunins |   .7530868  _b[EP_UNINSUR:lagunins]
            _cons |   1.891827  _b[EP_UNINSUR:_cons]
-------------------------------------------------------
   var(e.inc_sqrt)|   741.4005  _b[/var(e.inc_sqrt)]
 var(e.EP_UNINSUR)|   12.55649  _b[/var(e.EP_UNINSUR)]
-------------------------------------------------------
*       IE = indirect effect
nlcom _b[inc_sqrt:EP_MINRTY]*_b[EP_UNINSUR:inc_sqrt]
       _nl_1:  _b[inc_sqrt:EP_MINRTY]*_b[EP_UNINSUR:inc_sqrt]
-------------------------------------------------------
             |      Coef. SE       z      P>|z|      [95%C.I.]
-------------------------------------------------------
       _nl_1 |    .005   .002     2.86   0.004     .001   .008
-------------------------------------------------------
 
*       TE = total effect
nlcom _b[EP_UNINSUR:EP_MINRTY] + _b[inc_sqrt:EP_MINRTY]*_b[EP_UNINSUR:inc_sqrt]
       _nl_1:  _b[EP_UNINSUR:EP_MINRTY] + _b[inc_sqrt:EP_MINRTY]*_b[EP_UNINSUR:inc_sqrt]
-------------------------------------------------------
             |      Coef. SE      z     P>|z|     [95%C.I.]
-------------------------------------------------------
       _nl_1 |   .047   .006     7.51   0.000     .035    .060
-------------------------------------------------------
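As a quick arithmetic check of the nlcom results, using the coefficients pasted above (plain Python, just multiplying and adding the estimates):

a = -0.4799262   # _b[inc_sqrt:EP_MINRTY]    (X -> M path)
b = -0.0096035   # _b[EP_UNINSUR:inc_sqrt]   (M -> Y path)
c =  0.0425863   # _b[EP_UNINSUR:EP_MINRTY]  (direct X -> Y path)

indirect = a * b          # 0.0046, matching nlcom's .005 (rounded)
total = c + indirect      # 0.0472, matching nlcom's .047
print(indirect, total, indirect / total)   # the indirect share is roughly 10%, as stated above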