WEEK 10:
STATISTICAL INFERENCE
We rely on public opinion polls because, unlike the Census, we don’t have
enough money to talk to everyone. Even with a randomly drawn
sample, our estimates of public opinion will not be as exact as if we had
interviewed everyone in the population. Statistical inference measures tell us
something very simple: whether a relationship between two variables that
exists in the poll also exists in the population. So, if you
find that women are more liberal than men in your poll, does that relationship
also exist in the population? Statistical tools are limited in the information that
they provide, in the sense that they only test whether or not a
relationship of any strength exists in the population. So, for example, you
might find that in the sample, women vote 5% more Democratic than do men. If
your findings are statistically significant, that just means that a relationship
in this direction likely exists in the entire population. But that percentage
difference in the population might be only 1%, or it might be 10%. Either way, the statistics we
use (chi-squared, t-tests) would say that your results are statistically
significant. Because of the limited information these statistics provide, some
researchers do not find them very valuable, and we are not talking about them until
the tail end of the course. They are important, however, because they must be
reported in all of your tables and mentioned in the text of your paper. And
they do at least tell you that your relationship is probably not limited to your
sample, but exists in the entire population.
To recap: Statistical inference is our ability to generalize a relationship found
in a sample to the entire population from which that sample was drawn. That is,
can we infer population characteristics from sample data? If our statistical
inference test suggests that in the population the relationship between the two
variables is nonrandom, the relationship is said to be statistically
significant.
One
measure of statistical inference is Chi Squared. An example of
statistical inference using Chi Squared is drawn from the 2010 Mississippi Poll,
which sampled only 601 adult Mississippians from an adult population of over
two million. We found a definite relationship in the sample between gender and
seat belt use. 83% of women said they "always" used their seat belts,
compared to 76% of men. 12% of men said they "never" or
"seldom" used their seat belts, compared to only 5% of women. The
magnitude of this relationship between gender and seat belt use was 7%:
[(83-76) + (12-5)] / 2. But can we generalize this relationship found in the
sample to the entire population? Is there a relationship between gender and
seat belt use in the entire population? Statistical inference is the procedure
we use to determine if any relationship exists in the entire population.
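The magnitude arithmetic above can be sketched in a few lines of Python. This is only an illustration; the function name is made up for this example and is not part of SPSS or the Mississippi Poll materials.

```python
def average_percentage_gap(gaps):
    """Average the percentage-point gaps found at the extreme categories."""
    return sum(abs(first - second) for first, second in gaps) / len(gaps)

# (women, men) who "always" use seat belts,
# then (men, women) who "never" or "seldom" use them
seat_belt_gaps = [(83, 76), (12, 5)]
magnitude = average_percentage_gap(seat_belt_gaps)
print(magnitude)  # 7.0
```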
In this example, the
chi-squared (Pearson) value is 10.8 with 3 df, which is significant at the .05 level.
This means that if no relationship existed in the population, there would be
fewer than 5 chances in one hundred of drawing a sample that showed a
relationship this strong. So in the entire population, it is quite likely that
women are indeed more likely to use their seat belts than are men.
In other words, the Chi Squared significance level is one of those statistics where
the lower the value, the better it is. A .01 significance level indicates that
a sample relationship this strong would turn up only one time in one hundred
if no relationship existed in the population. A .001 level of significance
indicates that it would turn up only one time in one thousand. As such, values
lower than .05 make it even more likely that a relationship found in a sample
also exists in the population.
How
do you find the Chi Squared significance level in your computer output? Take a
look at one of your crosstabs tables. Under it are two more tables, one for chi
squared statistics and one for gamma.
Take a look at one student’s computer output from the 2020 class, which looks
at the relationship between religiosity and ideology. There is a clear
relationship in the poll between these two variables, since 68.0% of weekly
church attenders are self-described conservatives, compared to only 42.4% of
yearly church attenders. Conversely, 24.8% of the yearly church attenders are
liberals, compared to only 12% of weekly church attenders. How statistically
significant is this relationship?
ideology1 Ideology * religfre1 Religiosity recoded Crosstabulation

| ideology1 Ideology | | 1.00 Weekly | 2.00 Monthly (codes 2,3) | 3.00 Yearly (codes 4,5) | Total |
|---|---|---|---|---|---|
| 1.00 Liberal | Count | 27 | 36 | 31 | 94 |
| | % within religfre1 Religiosity recoded | 12.0% | 25.2% | 24.8% | 19.1% |
| 2.00 Moderate | Count | 45 | 48 | 41 | 134 |
| | % within religfre1 Religiosity recoded | 20.0% | 33.6% | 32.8% | 27.2% |
| 3.00 Conservative | Count | 153 | 59 | 53 | 265 |
| | % within religfre1 Religiosity recoded | 68.0% | 41.3% | 42.4% | 53.8% |
| Total | Count | 225 | 143 | 125 | 493 |
| | % within religfre1 Religiosity recoded | 100.0% | 100.0% | 100.0% | 100.0% |

Chi-Square Tests

| | Value | df | Asymptotic Significance (2-sided) |
|---|---|---|---|
| Pearson Chi-Square | 34.359 (a) | 4 | .000 |
| Likelihood Ratio | 34.946 | 4 | .000 |
| Linear-by-Linear Association | 23.922 | 1 | .000 |
| N of Valid Cases | 493 | | |

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 23.83.

Using a formula to
calculate chi-squared, you (the computer program) get a value of 34.359 at 4
degrees of freedom. Degrees of freedom come from the number of categories of
each variable, shown as the rows and columns of the table. You subtract one
from each number (3 rows minus 1 = 2; 3 columns minus 1 = 2) and multiply the
results together. 2 times 2 = 4 gives you four degrees of
freedom. A table in a textbook or on-line source, or in your case your computer
program, gives you the significance level of this statistic. In this case, the
results are so statistically significant that the significance level is
displayed as .000, which is basically zero.
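If you are curious how the program arrives at these numbers, the following is a minimal sketch of the Pearson chi-squared calculation applied to the counts in the crosstab above. It is an illustration of the textbook formula, not SPSS's own code, and the function name is an assumption made for this example.

```python
def chi_squared(table):
    """Return Pearson chi-squared, degrees of freedom, and smallest expected count."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
            min_expected = min(min_expected, expected)
    # (rows minus 1) times (columns minus 1)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, min_expected

# Rows: Liberal, Moderate, Conservative; columns: Weekly, Monthly, Yearly
ideology_by_religiosity = [
    [27, 36, 31],
    [45, 48, 41],
    [153, 59, 53],
]
statistic, df, min_expected = chi_squared(ideology_by_religiosity)
print(round(statistic, 3), df, round(min_expected, 2))
```

The printed values should land very close to the 34.359, 4 df, and minimum expected count of 23.83 that SPSS reports for this table.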
In published papers, we
typically report only one of four values. Is the significance level the best,
at < .001? Or is it < .01? Or is it < .05? The “<” symbol
means “less than.” We also report a rejected hypothesis, which is > .05;
the “>” symbol means “greater than.” In this case, zero is less than
.001, which is the best-case scenario, so in your tables and text you just report
Chi-squared sig. < .001.
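The four reporting values can be written as a small decision rule. This is a sketch only; the function name is hypothetical, and placing a value of exactly .05 in the "> .05" group is an assumption of this example rather than a rule from these notes.

```python
def reported_significance(p_value):
    """Map the significance shown in the computer output to the label you report."""
    if p_value < 0.001:
        return "< .001"
    if p_value < 0.01:
        return "< .01"
    if p_value < 0.05:
        return "< .05"
    # Assumption: exactly .05 falls here; treat that boundary case with care
    return "> .05"

print(reported_significance(0.000))  # < .001
print(reported_significance(0.009))  # < .01
print(reported_significance(0.645))  # > .05
```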
Symmetric Measures

| | | Value | Asymptotic Standard Error (a) | Approximate T (b) | Approximate Significance |
|---|---|---|---|---|---|
| Ordinal by Ordinal | Gamma | -.331 | .058 | -5.481 | .000 |
| N of Valid Cases | | 493 | | | |

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

This third table gives the gamma value,
which is in the first column. Remember with gamma values, the higher the
absolute value of the number, the better; the highest possible is 1.0 or
-1.0. This value in the first column of these Gamma tables is what you report
in your tables and the text of your paper.
Normally, we won’t be reporting the significance
levels for gamma (the last columns). The only exception is in certain cases
where the significance levels for gamma and chi-squared are different, and that
difference reflects something important. Chi-squared is a nominal level
measurement, so it reports any deviation from chance for all of the cells in
your table, even for the middle categories. However, your hypotheses are all
directional, meaning that you posit that one extreme category of one variable
is related to one extreme category of another variable. So, occasionally, gamma
significance might be worth reporting.
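For those who want to see where a gamma value comes from, here is a minimal sketch of the standard gamma calculation (concordant pairs minus discordant pairs, divided by their sum), applied to the ideology by religiosity counts. It assumes the rows and columns are listed in their ordinal order, and the function name is made up for this illustration.

```python
def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a table with ordered rows and columns."""
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair this cell with every cell in a lower row
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    pairs = table[i][j] * table[k][m]
                    if m > j:
                        concordant += pairs   # higher on both variables
                    elif m < j:
                        discordant += pairs   # higher on one, lower on the other
    return (concordant - discordant) / (concordant + discordant)

# Rows: Liberal, Moderate, Conservative; columns: Weekly, Monthly, Yearly
ideology_by_religiosity = [
    [27, 36, 31],
    [45, 48, 41],
    [153, 59, 53],
]
gamma = goodman_kruskal_gamma(ideology_by_religiosity)
print(round(gamma, 3))  # -0.331
```

The negative sign simply reflects how the categories are coded: moving toward less frequent church attendance goes with moving away from the conservative category.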
The next student example looks at sex differences in educational level. Historically,
you might hypothesize that men tend to have a higher education level than do
women. And indeed, 21.1% of men are college graduates, compared to 16.9% of
women, which is consistent with your hypothesis. However, 21.3% of men are
high school dropouts, compared to only 18.9% of women. This is the opposite
of the hypothesis. So what happens?
educate1 Education Level * sex Gender Respondent Crosstabulation

| educate1 Education Level | | 1 MALE | 2 FEMALE | Total |
|---|---|---|---|---|
| 3.00 < Hi Sch | Count | 137 | 140 | 277 |
| | % within sex Gender Respondent | 21.3% | 18.9% | 20.0% |
| 4.00 Hi Sch Grad | Count | 213 | 236 | 449 |
| | % within sex Gender Respondent | 33.1% | 31.9% | 32.4% |
| 5.00 Some College | Count | 158 | 239 | 397 |
| | % within sex Gender Respondent | 24.5% | 32.3% | 28.7% |
| 6.00 College Grad + > | Count | 136 | 125 | 261 |
| | % within sex Gender Respondent | 21.1% | 16.9% | 18.9% |
| Total | Count | 644 | 740 | 1384 |
| | % within sex Gender Respondent | 100.0% | 100.0% | 100.0% |

Well, Chi-Squared says that the relationship is significant at the .01
level. But would you really say that the
hypothesis was upheld? After all, there really are no substantively significant
differences between the sexes in education levels.
Chi-Square Tests

| | Value | df | Asymptotic Significance (2-sided) |
|---|---|---|---|
| Pearson Chi-Square | 11.598 (a) | 3 | .009 |
| Likelihood Ratio | 11.654 | 3 | .009 |
| Linear-by-Linear Association | .093 | 1 | .760 |
| N of Valid Cases | 1384 | | |

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 121.45.

The gamma table in this case is very
informative. Note that the value is only .019. That value, which measures the
strength of this relationship between ordinal variables, is NOT
statistically significant (its significance level is .645). In this rare case,
I would conclude that the
hypothesis was rejected. And I would explain that there is a statistically
insignificant curvilinear relationship between sex and education level.
Men are both slightly more educated and slightly less educated than women,
while women are more likely than men to be in a middle category of having “some
college.” But the overall relationship between sex and education is so weak
that it is statistically insignificant.
Symmetric Measures

| | | Value | Asymptotic Standard Error (a) | Approximate T (b) | Approximate Significance |
|---|---|---|---|---|---|
| Ordinal by Ordinal | Gamma | .019 | .041 | .461 | .645 |
| N of Valid Cases | | 1384 | | | |

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

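One way to see the curvilinear pattern for yourself is to compute the male minus female percentage gap in each education category. The sketch below does this from the counts in the education crosstab; the variable names are made up for this illustration.

```python
# (male count, female count) for each education level, lowest to highest
education_counts = {
    "< Hi Sch": (137, 140),
    "Hi Sch Grad": (213, 236),
    "Some College": (158, 239),
    "College Grad +": (136, 125),
}
male_total = sum(male for male, _ in education_counts.values())
female_total = sum(female for _, female in education_counts.values())

# Positive gap: men are more likely than women to be in that category
gaps = {
    level: 100 * male / male_total - 100 * female / female_total
    for level, (male, female) in education_counts.items()
}
for level, gap in gaps.items():
    print(f"{level}: {gap:+.1f}")
```

The gaps are positive at both ends of the education scale and negative for "Some College," so the differences largely cancel out when gamma looks for a single direction.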
Further information about Chi-Squared and how it
is calculated is available in my obsolete
class notes. You do not need to know that material for the final exam, so I
am not including it in this week’s notes.
A
second test of statistical inference that is often found in published articles
is the t-test for differences between means.
The t-test is
an interval statistic (dependent variable must be interval). It tests the
hypothesis that two groups have different means, and that the inter-group
difference can be generalized to the population.
Two-sample t-test
(SPSS-independent sample) means that each group is considered a sample.
A one-tailed t-test
means that your hypothesis has a direction for the relationship. A two-tailed
t-test is used to test nondirectional hypotheses. SPSS reports only the
two-tailed test. Because the two-tailed test is stricter, results that are
significant for the 2-tailed test will also be significant for the
1-tailed test, provided the difference runs in the hypothesized direction.
Two statistics are
reported in the SPSS program: one for two populations having equal variances,
and one for unequal variances.
The t-test is computed
using the formula in textbooks or on-line.
Degrees of freedom equals the sum of the two sample sizes minus two.
The t-value must be
larger than the table entry to be significant at the specified level. We are
more concerned with application than with how the t-test is calculated, since
our SPSS computer program will compute the statistics for us.
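For anyone who wants to see the mechanics anyway, here is a minimal sketch of the equal-variances (pooled) two-sample t-test described above. The income codes are HYPOTHETICAL values invented for this illustration, not Mississippi Poll data, and the function and variable names are assumptions of this example.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_test(group_a, group_b):
    """Two-sample t-value assuming equal variances, plus degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2  # sum of the two sample sizes minus two
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / df
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    t_value = (mean(group_a) - mean(group_b)) / standard_error
    return t_value, df

# HYPOTHETICAL family income codes (1 to 8) for two made-up groups
group_one = [5, 6, 7, 6, 8, 7]
group_two = [3, 4, 5, 4, 2, 4]
t_value, df = pooled_t_test(group_one, group_two)

# Table entry for a two-tailed test at the .05 level with 10 df
CRITICAL_T_DF10 = 2.228
print(round(t_value, 2), df, abs(t_value) > CRITICAL_T_DF10)
```

With six cases in each made-up group there are 10 degrees of freedom, so the t-value is compared with the table entry of 2.228.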
Using the SPSS program.
Use the Compare Means - Independent Samples Statistics Menu. Your Test Variable is
your dependent variable, which should be interval level. Your Grouping Variable
should be a dichotomous independent variable (recode it, when necessary). Check
Levene's test: if its significance is > .05, use the equal variances row;
otherwise (p <= .05), use the
unequal variances row. Cite the t-value and 2-tail sig. level in any research paper you
do for graduate school. The significance level must be <= .05.
You don’t have to
understand the mechanics of this, except for the following example of a test
question. My test questions are very straightforward.
Example of a t-test problem (drawn from 2008-2010 Mississippi Poll
data).
Examining predictors of family income. Family income is an
interval variable, coded from a low of 1 for under $10,000 to a high of 8 for over
$70,000. The following indicates what the average income codes
are for pairs of categories of each predictor, as well as what the t-test
significance level is. Answer the following two questions: For each predictor,
which group has the higher family income? Is the t-test statistically
significant for each of the following five predictors (remember, it must be
significant at least at the .05 level)?
Lab work, focusing on
your individual papers:
The
next Findings and Tables section of your paper is the most critical part, since
it basically counts for half of your overall paper grade. The literature review
counts for one-fourth of your overall paper grade. So if you get pressed for
time, you might want to put more time into the Findings and Tables section.
The bivariate part of the Findings section is pretty straightforward, as you can
see from the sample
student paper. You have one paragraph for each of your hypotheses. It is
probably most readable to put the table first, and then have the text
paragraph. Put the next table in, then have the next text paragraph. And so on.
Feel free to renumber the table numbers to conform with the hypothesis numbers
in your model and hypotheses, and literature review sections. You will each
have 5 of these bivariate tables, testing each of your hypotheses.
The most complicated part of the paper is the multivariate section. First, take a
look at the computer output and how each of these multivariate tables has only
a portion of the sample, based on the category of the variable that I
controlled for. In your paper, these tables are MUCH more readable to a
reader than the multivariate crosstabs table that SPSS usually displays.
We
talked about multivariate tables and why we do these analyses previously in
this class. But here are some examples from previous student papers showing the
value of multivariate analyses.
One
project in 2020 looked at Abortion as the dependent variable, with sex and
religiosity as independent early variables and ideology as the middle,
intervening variable. Interestingly enough, sex may not affect attitudes toward
abortion (don’t worry about rejected hypotheses; as in this case, such findings
are still very interesting and valuable to know). However, both religiosity and
ideology may affect abortion attitudes, with liberals and the least religious
being more pro-choice than conservatives and the most religious. Now the
question is, do these bivariate relations exist in a multivariate sense? That
is, are both of these predictors important? Or is ideology the only important
predictor, and highly religious people are more pro-life only because they are
more conservative than the seculars? This is when we use multivariate tables.
In this case, we produced three multivariate tables that broke up the sample
into three groups (weekly church attenders, monthly church attenders, and
rarely attenders), and for each group we looked at whether ideology affected
abortion attitudes. In this case, it looks like ideology does affect abortion
attitudes, for each of these three religiosity groups examined separately. So
ideology is important in your final model. Then, we produce three more tables,
this time separating the sample into three ideology groups (liberal, moderate,
conservative), only this time, we look at whether religiosity is important in
affecting abortion attitudes. In these cases, it looks like religiosity is
important in affecting abortion attitudes for each of these three ideology
groups examined separately. So in your final model, religiosity is also
important. This final redrawn model would therefore keep both links of ideology
affecting abortion attitudes and religiosity affecting abortion attitudes. Your
Conclusion section will have your redrawn model, but the Conclusion section is not
due until the entire rewritten paper is turned in.
You
can take a look at the student sample paper for some ideas on how to write up
each of your multivariate tables. It would probably be easiest to just talk
about each table separately. Then, have a summing up paragraph on what they all
tell you, as the sample paper does.
You
might have only two predictors that are significantly related to your dependent
variable in your bivariate analyses, so you may end up controlling for each of
them to produce two series of multivariate tables. But you might ask, why
reproduce the same percentages but in a slightly different format in these two
sets of multivariate tables? Isn’t that redundant? Yes, but like in the test
you just took, it is easier for you and a reader to follow the results. You then
don’t have to compare across three multivariate tables. You can just look at
each table separately. So some of you will have these types of repetitive multivariate
tables.
Another
project from 2020 looked at a complex but valuable subject like support for
defense spending, and might examine the important variables of sex, age, and
ideology. It is just as valuable to have rejected hypotheses as accepted hypotheses,
since we may learn to our surprise that in Mississippi at this time, maybe
there were no sex or even ideological differences in support for defense
spending. But age may be a critical factor. In that case, we might control for
ideology and sex separately to see whether age is still important for each
ideological and sex group. The information is valuable, as attitudes of the
older generation (policy makers) and the younger generation (future leaders) are
very important to know. The results may show that the generational gap in
support for defense spending exists for only two ideological groups, and that the
gap may be especially strong for one sex group (but also exist for the other).
So even if you think that your results are weak, you can use conditional
variables to explain the conditions under which relationships exist.
Your paper findings and tables are due April 8.