WEEK 10:

STATISTICAL INFERENCE

We rely on public opinion polls a lot because, unlike the Census, we do not have enough money to talk to everyone. Even with a randomly drawn sample, our estimates of public opinion will not be as exact as if we had interviewed everyone in the population. Statistical inference measures tell us something very simple: whether a relationship between two variables that shows up in the poll also exists in the population. So, if you find that women are more liberal than men in your poll, does that relationship also exist in the population?

Statistical tools are limited in the information they provide, because they only test whether a relationship of any strength exists in the population. For example, you might find that in the sample, women vote 5% more Democratic than men do. If your findings are statistically significant, that just means that a relationship in this direction very likely exists in the entire population. But the percentage difference in the population might be only 1%, or it might be 10%. Either way, the statistics we use (chi-squared, t-tests) would say that your results are statistically significant. Because of the limited information these statistics provide, some researchers do not find them very valuable, and we are not talking about them until the tail end of the course. They are important, however, because they must be reported in all of your tables and mentioned in the text of your paper. And they do at least tell you that your relationship is probably not limited to your sample, but exists in the entire population.

To recap: Statistical inference is our ability to generalize a relationship found in a sample to the entire population from which that sample was drawn. That is, can we infer population characteristics from sample data? If our statistical inference test suggests that in the population the relationship between the two variables is nonrandom, the relationship is said to be statistically significant.

One measure of statistical inference is Chi Squared. An example of statistical inference using Chi Squared is drawn from the 2010 Mississippi Poll, which sampled only 601 adult Mississippians from an adult population of over two million. We found a definite relationship in the sample between gender and seat belt use. 83% of women said they "always" used their seat belts, compared to 76% of men. 12% of men said they "never" or "seldom" used their seat belts, compared to only 5% of women. The magnitude of this relationship between gender and seat belt use was 7%: [(83-76) + (12-5)] / 2. But can we generalize this relationship found in the sample to the entire population? Is there a relationship between gender and seat belt use in the entire population? Statistical inference is the procedure we use to determine if any relationship exists in the entire population.
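The magnitude arithmetic in the seat belt example can be sketched in a few lines of Python. The function name `magnitude` and its parameter names are made up for illustration; the percentages are the ones quoted from the 2010 Mississippi Poll.

```python
def magnitude(always_women, always_men, seldom_men, seldom_women):
    """Average the percentage gaps at the two extreme categories."""
    return ((always_women - always_men) + (seldom_men - seldom_women)) / 2

# 83% vs. 76% "always" use seat belts; 12% vs. 5% "never" or "seldom" do.
seat_belt_gap = magnitude(83, 76, 12, 5)
print(seat_belt_gap)  # 7.0
```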

In this example, the chi-squared (Pearson) value is 10.8 with 3 df, which is significant at the .05 level. This means that if no relationship existed in the population, a sample relationship this strong would turn up by chance fewer than 5 times in one hundred. We can therefore be about 95% confident that this relationship does exist in the entire population. So in the entire population, it is quite likely that women are indeed more likely to use their seat belts than are men.
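As a rough illustration of how a chi-squared value is checked against a textbook table, the sketch below compares 10.8 with the standard critical values for 3 degrees of freedom. The function name `best_level` is hypothetical, and the cutoffs are the usual published table entries rather than anything taken from the poll output.

```python
# Standard chi-squared critical values for 3 degrees of freedom,
# listed from the strictest significance level to the most lenient.
CRITICAL_VALUES_DF3 = [(0.001, 16.266), (0.01, 11.345), (0.05, 7.815)]

def best_level(chi_squared, critical_values):
    """Return the strictest level whose critical value the statistic exceeds."""
    for level, cutoff in critical_values:
        if chi_squared > cutoff:
            return level
    return None  # not significant even at the .05 level

level = best_level(10.8, CRITICAL_VALUES_DF3)
print(level)  # 0.05
```

The value 10.8 clears the .05 cutoff of 7.815 but not the .01 cutoff of 11.345, which is why the result is reported as significant at the .05 level.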

In other words, the Chi Squared significance level is one of those statistics where the lower the value, the better it is. A .01 significance level indicates that there is only one chance in one hundred of finding a sample relationship this strong if no relationship exists in the population. A .001 level of significance indicates that there is only one chance in one thousand. As such, values lower than .05 show that it is even more likely that a relationship found in a sample also exists in the population.

How do you find the Chi Squared significance level in your computer output? Take a look at one of your crosstabs tables. Under it are two more tables, one for chi squared statistics and one for gamma.

Take a look at one student’s computer output from the 2020 class, which looks at the relationship between religiosity and ideology. There is a clear relationship in the poll between these two variables, since 68.0% of weekly church attenders are self-described conservatives, compared to only 42.4% of yearly church attenders. Conversely, 24.8% of the yearly church attenders are liberals, compared to only 12.0% of weekly church attenders. How statistically significant is this relationship?

ideology1 Ideology * religfre1 Religiosity recoded Crosstabulation
			religfre1 Religiosity recoded			Total
			1.00 Weekly	2.00 Monthly (codes2,3)	3.00 Yearly (codes4,5)	Total
ideology1 Ideology	1.00 Liberal	Count	27	36	31	94
	1.00 Liberal	% within religfre1 Religiosity recoded	12.0%	25.2%	24.8%	19.1%
	2.00 Moderate	Count	45	48	41	134
	2.00 Moderate	% within religfre1 Religiosity recoded	20.0%	33.6%	32.8%	27.2%
	3.00 Conservative	Count	153	59	53	265
	3.00 Conservative	% within religfre1 Religiosity recoded	68.0%	41.3%	42.4%	53.8%
Total		Count	225	143	125	493
Total		% within religfre1 Religiosity recoded	100.0%	100.0%	100.0%	100.0%
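The "% within" rows in this crosstab are column percentages: each count divided by the total for its religiosity group. A minimal sketch of that arithmetic, using the counts from the table (the variable names are made up for illustration):

```python
# Counts copied from the crosstab: rows are ideology, columns are religiosity.
counts = {
    "Liberal":      [27, 36, 31],
    "Moderate":     [45, 48, 41],
    "Conservative": [153, 59, 53],
}
columns = ["Weekly", "Monthly", "Yearly"]

# Column totals are the number of respondents in each religiosity group.
col_totals = [sum(row[j] for row in counts.values()) for j in range(len(columns))]

# Percentage of each religiosity group that falls in each ideology category.
col_pct = {
    label: [round(100 * n / total, 1) for n, total in zip(row, col_totals)]
    for label, row in counts.items()
}

for label, pcts in col_pct.items():
    print(label, pcts)
```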

Chi-Square Tests
	Value	df	Asymptotic Significance (2-sided)
Pearson Chi-Square	34.359^a	4	.000
Likelihood Ratio	34.946	4	.000
Linear-by-Linear Association	23.922	1	.000
N of Valid Cases	493
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 23.83.

Using a formula to calculate chi-squared, you (the computer program) get a value of 34.359 with 4 degrees of freedom. Degrees of freedom come from the number of rows and columns, which are the number of categories of each variable. Subtract one from each number and multiply the results together: (3 rows minus 1 = 2) times (3 columns minus 1 = 2) gives 2 times 2 = 4 degrees of freedom. A table in a textbook or on-line source, or in your case your computer program, gives you the significance level of this statistic. In this case, the results are so statistically significant that the significance level, shown as .000, is basically zero.
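For anyone curious what the computer is doing, here is a minimal sketch of the standard Pearson chi-squared calculation applied to the counts in the ideology by religiosity crosstab above. The function name is hypothetical, and this is the textbook formula, not necessarily SPSS's internal code. The closed-form tail probability used at the end holds only for exactly 4 degrees of freedom.

```python
import math

def pearson_chi_squared(table):
    """Return (chi-squared, degrees of freedom, smallest expected count)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect in this cell if the variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            min_expected = min(min_expected, expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, min_expected

# Ideology (rows) by religiosity (columns), counts from the crosstab.
ideology_by_religiosity = [
    [27, 36, 31],
    [45, 48, 41],
    [153, 59, 53],
]
chi2, df, min_expected = pearson_chi_squared(ideology_by_religiosity)

# Tail probability for a chi-squared value with exactly 4 df (closed form).
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

print(round(chi2, 2), df, round(min_expected, 2), p_value < 0.001)
```

The result should agree with the SPSS table to within rounding: a chi-squared value near 34.36, 4 degrees of freedom, and a minimum expected count near 23.83.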

In published papers, we typically report only one of four values. Is the significance level the best, at < .001? Or is it < .01? Or is it < .05? The < symbol means “less than.” We also report a rejected hypothesis, which is > .05; the > symbol means “greater than.” In this case, zero is less than .001, which is the best-case scenario, so in your tables and text you just report Chi-squared sig. < .001.
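The reporting rule can be written as a small lookup. This is only a sketch; the function name `significance_label` is made up, and treating a value of exactly .05 as significant is an assumption.

```python
def significance_label(sig):
    """Map an SPSS significance value to the label reported in a paper."""
    if sig < 0.001:
        return "< .001"
    if sig < 0.01:
        return "< .01"
    if sig <= 0.05:  # assumption: exactly .05 still counts as significant
        return "< .05"
    return "> .05"   # hypothesis rejected

for sig in (0.000, 0.009, 0.03, 0.645):
    print(sig, significance_label(sig))
```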

Symmetric Measures
		Value	Asymptotic Standard Error^a	Approximate T^b	Approximate Significance
Ordinal by Ordinal	Gamma	-.331	.058	-5.481	.000
N of Valid Cases		493
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

This third table gives the gamma value, which is in the first column. Remember with gamma values, the higher the absolute value of the number, the better; the highest possible is 1.0 or -1.0. This value in the first column of these Gamma tables is what you report in your tables and the text of your paper.
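Assuming the Gamma that SPSS reports is the usual Goodman and Kruskal measure, it can be sketched as the balance of concordant and discordant pairs in the crosstab. The function name is hypothetical; the counts are from the ideology by religiosity table above.

```python
def goodman_kruskal_gamma(table):
    """Gamma = (concordant - discordant) / (concordant + discordant)."""
    concordant = discordant = 0
    for i, row in enumerate(table):
        for j, cell in enumerate(row):
            for lower_row in table[i + 1:]:
                # Cells below and to the right rank higher on both variables.
                concordant += cell * sum(lower_row[j + 1:])
                # Cells below and to the left rank higher on one, lower on the other.
                discordant += cell * sum(lower_row[:j])
    return (concordant - discordant) / (concordant + discordant)

# Ideology (rows: liberal to conservative) by religiosity (columns: weekly to yearly).
ideology_by_religiosity = [
    [27, 36, 31],
    [45, 48, 41],
    [153, 59, 53],
]
gamma_value = goodman_kruskal_gamma(ideology_by_religiosity)
print(round(gamma_value, 3))  # -0.331
```

The negative sign only reflects how the categories are coded: higher ideology codes (conservative) go with lower religiosity codes (weekly attendance).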

Normally, we won’t be reporting the significance level for gamma (the last column). The only exception is in certain cases where the significance levels for gamma and chi-squared are different, and that difference reflects something important. Chi-squared is a nominal-level measure, so it picks up any deviation from chance in any of the cells in your table, even in the middle categories. However, your hypotheses are all directional, meaning that you posit that one extreme category of one variable is related to one extreme category of another variable. So, occasionally, gamma significance might be worth reporting.

This student example is looking at sex differences in educational level. Historically, you might hypothesize that men tend to have a higher education level than do women. And indeed, 21.1% of men are college graduates, compared to 16.9% of females, which is consistent with your hypothesis. However, 21.3% of men are high school dropouts, compared to only 18.9% of females. This is the opposite of the hypothesis. So what happens?

educate1 Education Level * sex Gender Respondent Crosstabulation
			sex Gender Respondent		Total
			1 MALE	2 FEMALE	Total
educate1 Education Level	3.00 < Hi Sch	Count	137	140	277
	3.00 < Hi Sch	% within sex Gender Respondent	21.3%	18.9%	20.0%
	4.00 Hi Sch Grad	Count	213	236	449
	4.00 Hi Sch Grad	% within sex Gender Respondent	33.1%	31.9%	32.4%
	5.00 Some College	Count	158	239	397
	5.00 Some College	% within sex Gender Respondent	24.5%	32.3%	28.7%
	6.00 College Grad + >	Count	136	125	261
	6.00 College Grad + >	% within sex Gender Respondent	21.1%	16.9%	18.9%
Total		Count	644	740	1384
Total		% within sex Gender Respondent	100.0%	100.0%	100.0%

Well, as you can see from the Chi-Square Tests table, the relationship is significant at the .01 level. But would you really say that the hypothesis was upheld? After all, there really are no substantively significant differences between the sexes in education levels.

Chi-Square Tests
	Value	df	Asymptotic Significance (2-sided)
Pearson Chi-Square	11.598^a	3	.009
Likelihood Ratio	11.654	3	.009
Linear-by-Linear Association	.093	1	.760
N of Valid Cases	1384
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 121.45.

The gamma table in this case is very informative. Note that the gamma value is only .019, and its approximate significance of .645 is far above .05. So gamma, which measures the strength of the relationship between ordinal variables, is NOT statistically significant. In this rare case, I would conclude that the hypothesis was rejected. And I would explain that there is a curvilinear relationship between sex and education level rather than the directional one that was hypothesized. Men are both slightly more likely than women to be college graduates and slightly more likely to be high school dropouts, while women are more likely than men to be in a middle category of having “some college.” But the overall directional relationship between sex and education is so weak that it is statistically insignificant.

Symmetric Measures
		Value	Asymptotic Standard Error^a	Approximate T^b	Approximate Significance
Ordinal by Ordinal	Gamma	.019	.041	.461	.645
N of Valid Cases		1384
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
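The contrast between the two statistics can be reproduced from the education by sex counts. This is a sketch using the textbook formulas for Pearson chi-squared and Goodman and Kruskal's gamma, with hypothetical function names; the tail probability formula shown is valid only for exactly 3 degrees of freedom.

```python
import math

# Education (rows, low to high) by sex (columns: male, female), from the crosstab.
education_by_sex = [
    [137, 140],   # < Hi Sch
    [213, 236],   # Hi Sch Grad
    [158, 239],   # Some College
    [136, 125],   # College Grad +
]

def chi_squared(table):
    """Pearson chi-squared: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total

def tail_probability_3df(x):
    """Upper-tail probability of a chi-squared value with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def gamma(table):
    """Goodman and Kruskal's gamma from concordant and discordant pairs."""
    concordant = discordant = 0
    for i, row in enumerate(table):
        for j, cell in enumerate(row):
            for lower_row in table[i + 1:]:
                concordant += cell * sum(lower_row[j + 1:])
                discordant += cell * sum(lower_row[:j])
    return (concordant - discordant) / (concordant + discordant)

chi2 = chi_squared(education_by_sex)
chi2_sig = tail_probability_3df(chi2)
gamma_value = gamma(education_by_sex)
print(round(chi2, 2), round(chi2_sig, 3), round(gamma_value, 3))
```

Chi-squared reacts to the bulge of women in the "some college" row, while gamma, which only rewards a consistent direction, comes out close to zero.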

Further information about Chi-Squared and how it is calculated is available in my obsolete class notes. You do not need to know that material for the final exam, so I am not including it in this week’s notes.

A second test of statistical inference that is often found in published articles is the t-test for differences between means.

The t-test is an interval statistic (dependent variable must be interval). It tests the hypothesis that two groups have different means, and that the inter-group difference can be generalized to the population.

The two-sample t-test (called Independent-Samples in SPSS) means that each group is treated as a separate sample.

A one-tailed t-test means that your hypothesis specifies a direction for the relationship. A two-tailed t-test is used to test nondirectional hypotheses. SPSS reports only the two-tailed test, which is stricter. Hence, if your results are significant on the 2-tailed test and the difference is in the direction you hypothesized, they will also be significant on the 1-tailed test.

SPSS reports two versions of the statistic: one that assumes the two populations have equal variances, and one that does not.

The t-test is computed using the formula in textbooks or on-line.
Degrees of freedom equals the sum of the two sample sizes minus two.
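For reference, the textbook formula for the equal-variances case can be sketched as follows. The two groups below are hypothetical numbers chosen to make the arithmetic clean; they are NOT Mississippi Poll data, and the function name `pooled_t` is made up for illustration.

```python
import math
import statistics

def pooled_t(group_a, group_b):
    """Two-sample t statistic assuming equal variances, plus its df."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its own df.
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_error
    return t, df

# Hypothetical income codes for two tiny groups (illustration only).
group_a = [2, 3, 4, 5, 6]
group_b = [4, 5, 6, 7, 8]
t, df = pooled_t(group_a, group_b)
print(round(t, 3), df)
```

Here the t-value works out to about -2.0 with 8 degrees of freedom. The two-tailed .05 table entry for 8 df is 2.306, so a difference of this size in such small groups would not be statistically significant.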

To be significant at the specified level, the t-value (ignoring its sign) must be larger than the table entry. We are more concerned with application than with how the t-test is calculated, since our SPSS computer program will compute the statistics for us.

Using the SPSS program: use the Compare Means, Independent-Samples T Test menu. Your Test Variable is your dependent variable, which should be interval level. Your Grouping Variable should be a dichotomous independent variable (recode it when necessary). Check Levene's test for equality of variances: if its significance is greater than .05, read the equal variances row; if it is .05 or less, the variances differ, so read the unequal variances row. Cite the t-value and 2-tailed sig. level in any research paper you do for graduate school. The significance level must be <= .05.
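The decision steps for reading the SPSS t-test output can be sketched as two small rules. Levene's test is significant when the two group variances differ, so a small Levene significance points to the unequal-variances row. The function names and the sample values are hypothetical; the row labels are the ones SPSS prints.

```python
def choose_row(levene_sig, alpha=0.05):
    """Pick which row of the SPSS independent-samples output to read."""
    # A significant Levene's test means the two group variances differ.
    if levene_sig <= alpha:
        return "Equal variances not assumed"
    return "Equal variances assumed"

def is_significant(two_tailed_sig, alpha=0.05):
    """A difference between means is significant at .05 or below."""
    return two_tailed_sig <= alpha

# Hypothetical output values, for illustration only.
print(choose_row(0.30), is_significant(0.012))
```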

You don’t have to understand the mechanics of this, except for the following example of a test question. My test questions are very straightforward.

Example of a t-test problem (drawn from 2008-2010 Mississippi Poll data).

Examining predictors of family income. Family income is treated as interval data, coded from a low of 1 for under $10,000 to a high of 8 for over $70,000. The following gives the average income code for a pair of categories of each predictor, along with the t-test significance level. Answer the following two questions for each of the five predictors: Which group has the higher family income? Is the t-test statistically significant (remember, it must be significant at least at the .05 level)?

Education: high school dropout income mean is 2.68; college graduate income mean is 6.31; t-test is statistically significant at .001 level. The answer is: college graduates have the higher income; it is statistically significant.
Sex: male income mean is 4.68; female income mean is 4.20; t-test is statistically significant at .01 level. Males have the higher income; it is statistically significant.
Race: white income mean is 4.92; black income mean is 3.15; t-test is statistically significant at .001 level. Whites have the higher income; it is statistically significant.
Ideology: moderates' income mean is 4.26; conservatives' income mean is 4.75; t-test is statistically significant at .05 level. Conservatives have the higher income; it is statistically significant.
Number of adults living in household: 1 adult households' income mean is 3.16; 2 adult households' income mean is 4.84; t-test is statistically significant at .001 level. Two adult households have the higher income; it is statistically significant.
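The logic of the test question can be sketched as a short loop over the five predictors. The means and significance levels are the ones given in the example; the function name `answer` and the tuple layout are made up for illustration.

```python
# (predictor, group 1, mean 1, group 2, mean 2, t-test significance level)
predictors = [
    ("Education", "high school dropout", 2.68, "college graduate", 6.31, 0.001),
    ("Sex", "male", 4.68, "female", 4.20, 0.01),
    ("Race", "white", 4.92, "black", 3.15, 0.001),
    ("Ideology", "moderate", 4.26, "conservative", 4.75, 0.05),
    ("Adults in household", "1 adult", 3.16, "2 adults", 4.84, 0.001),
]

def answer(group_1, mean_1, group_2, mean_2, level):
    """Return the group with the higher mean and whether the test is significant."""
    higher = group_1 if mean_1 > mean_2 else group_2
    return higher, level <= 0.05

answers = {name: answer(*rest) for name, *rest in predictors}
for name, (higher, significant) in answers.items():
    print(f"{name}: {higher} higher; significant: {significant}")
```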

Lab work, focusing on your individual papers:

The next Findings and Tables section of your paper is the most critical part, since it basically counts for half of your overall paper grade. The literature review counts for one-fourth of your overall paper grade. So if you get pressed for time, you might want to put more time into the Findings and Tables section.

The bivariate part of the Findings section is pretty straightforward, as you can see from the sample student paper. You have one paragraph for each of your hypotheses. It is probably most readable to put the table first, and then have the text paragraph. Put the next table in, then have the next text paragraph. And so on. Feel free to renumber the table numbers to conform with the hypothesis numbers in your model and hypotheses, and literature review sections. You will each have 5 of these bivariate tables, testing each of your hypotheses.

The most complicated part of the paper is the multivariate section. First, take a look at the computer output and how each of these multivariate tables has only a portion of the sample, based on the category of the variable that I controlled for. Putting these tables in your paper is MUCH more readable than the multivariate crosstabs table that SPSS usually displays.

We talked about multivariate tables and why we do these analyses previously in this class. But here are some examples from previous student papers showing the value of multivariate analyses.

One project in 2020 looked at abortion as the dependent variable, with sex and religiosity as the early independent variables and ideology as the middle, intervening variable. Interestingly enough, sex may not affect attitudes toward abortion (don’t worry about rejected hypotheses; as in this case, such findings are still very interesting and valuable to know). However, both religiosity and ideology may affect abortion attitudes, with liberals and the least religious being more pro-choice than conservatives and the most religious. Now the question is, do these bivariate relationships hold in a multivariate sense? That is, are both of these predictors important? Or is ideology the only important predictor, with highly religious people being more pro-life only because they are more conservative than secular people? This is when we use multivariate tables.

In this case, we produced three multivariate tables that broke the sample into three groups (weekly church attenders, monthly church attenders, and those who rarely attend), and for each group we looked at whether ideology affected abortion attitudes. It looks like ideology does affect abortion attitudes for each of these three religiosity groups examined separately, so ideology is important in your final model. Then we produced three more tables, this time separating the sample into three ideology groups (liberal, moderate, conservative) and looking at whether religiosity affects abortion attitudes. It looks like religiosity is important in affecting abortion attitudes for each of these three ideology groups examined separately, so religiosity is also important in your final model. The final redrawn model would therefore keep both links: ideology affecting abortion attitudes and religiosity affecting abortion attitudes. Your Conclusion section will have your redrawn model, but the Conclusion section is not due until the entire rewritten paper is turned in.
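The idea of controlling for a variable can be sketched as splitting the sample by the control variable and building a separate crosstab for each group. The respondents below are invented for illustration and are not drawn from any poll; the function name `control_tables` is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical respondents: (religiosity, ideology, abortion attitude).
respondents = [
    ("weekly", "liberal", "pro-choice"),
    ("weekly", "conservative", "pro-life"),
    ("weekly", "conservative", "pro-life"),
    ("yearly", "liberal", "pro-choice"),
    ("yearly", "liberal", "pro-choice"),
    ("yearly", "conservative", "pro-life"),
]

def control_tables(rows):
    """Build one ideology-by-abortion crosstab per religiosity group."""
    tables = defaultdict(Counter)
    for religiosity, ideology, abortion in rows:
        tables[religiosity][(ideology, abortion)] += 1
    return tables

tables = control_tables(respondents)
for group, table in tables.items():
    print(group, dict(table))
```

Each printed group corresponds to one multivariate table: the same ideology by abortion comparison, limited to the respondents in one religiosity category.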

You can take a look at the student sample paper for some ideas on how to write up each of your multivariate tables. It would probably be easiest to just talk about each table separately. Then, have a summing up paragraph on what they all tell you, as the sample paper does.

You might have only two predictors that are significantly related to your dependent variable in your bivariate analyses, so you may end up controlling for each of them to produce two series of multivariate tables. But you might ask, why reproduce the same percentages but in a slightly different format in these two sets of multivariate tables? Isn’t that redundant? Yes, but like in the test you just took, it is easier for you and a reader to follow the results. You then don’t have to compare across three multivariate tables. You can just look at each table separately. So some of you will have these types of repetitive multivariate tables.

Another project from 2020 looked at a complex but valuable subject, support for defense spending, and might examine the important variables of sex, age, and ideology. It is just as valuable to have rejected hypotheses as accepted hypotheses, since we may learn to our surprise that in Mississippi at this time, maybe there were no sex or even ideological differences in support for defense spending. But age may be a critical factor. In that case, we might control for ideology and sex separately to see whether age is still important for each ideological and sex group. The information is valuable, as the attitudes of the older generation (policy makers) and the younger generation (future leaders) are very important to know. The results may show that the generational gap in support for defense spending exists for only two ideological groups, and that the gap may be especially strong for one sex group (but also exist for the other). So even if you think that your results are weak, you can use conditional variables to explain the conditions under which relationships exist.

Your paper findings and tables are due April 8.