WEEK 8:

BIVARIATE AND MULTIVARIATE CONTINGENCY TABLES

 

CONTINGENCY TABLES- BIVARIATE RELATIONS

Contingency tables can be used with nominal level measures, though we usually employ ordinal or interval level data having a limited number of categories. Contingency tables permit you to view the data in an easily interpretable manner.

Percentage difference is a measure of the strength of a relationship. It ranges from a low of 0 to a high of 100. Always put the independent variable at the top of the table and the dependent variable at the side, then calculate the column percentages. For ordinal and interval level indicators, take the two extreme categories of the predictor and compare their column percentages within the same category of your dependent variable. Make this comparison for each of the two extreme categories of your dependent variable, and take the average of the two differences. If one of these comparisons runs contrary to your hypothesis, make that difference negative.
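The arithmetic can be sketched in a few lines of Python. This is only one reading of the rule above, not an official formula: the function name and its arguments are hypothetical, and the numbers are the column percentages for the youngest and oldest age groups in Table 1 of this handout. The hypothesis assumed is that younger people are more likely than older people to favor more spending.

```python
# Column percentages for the two extreme predictor categories (Table 1).
# Keys are dependent-variable categories; values are column percents.
youngest = {"Less": 10, "Same": 18, "More": 72}   # ages 18-35
oldest = {"Less": 8, "Same": 34, "More": 58}      # ages 56 and over


def percentage_difference(col_a, col_b, low_cat, high_cat):
    """Average percent difference across the two extreme DV categories.

    Assumed hypothesis: col_a is MORE likely than col_b to fall in high_cat,
    and therefore LESS likely to fall in low_cat. A comparison that runs
    against that hypothesis comes out negative.
    """
    diff_high = col_a[high_cat] - col_b[high_cat]   # hypothesis expects > 0
    diff_low = col_b[low_cat] - col_a[low_cat]      # hypothesis expects > 0
    return (diff_high + diff_low) / 2


result = percentage_difference(youngest, oldest, "Less", "More")
print(result)  # (14 + -2) / 2 = 6.0
```

The "More" comparison supports the hypothesis (72% versus 58%), while the "Less" comparison runs slightly against it (10% versus 8%), so it enters the average as a negative.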

Other Measures of Association to use (Source: Research Methods in Political Science: An Introduction Using MicroCase, 2nd edition, by Michael Corbett; p. 139-144; copyrighted by MicroCase Corporation):

All measures range from 0 for no relationship to 1 for perfect relationship. A positive or negative sign is a function of the direction of the coding of the variables and whether your hypothesis is upheld.

The following are nine examples of bivariate tables. In class, we will review three features of each table. 1) Is the relationship statistically significant? That is, is Chi-squared significant at the .05 level or below? 2) What is the magnitude of the relationship? That is, what is the gamma value? To determine the relative importance of the predictors (which predictor is most and least important), use the absolute value of the gamma, and ignore the sign. 3) What is the direction of the relationship? That is, devise a hypothesis for each table that reflects how the two variables are related. Example for Table 1: Younger people are more likely than older people to favor spending more on health care. It would be less accurate to say that younger people are more likely than older people to favor spending less on health care, because the percentage difference between the two extreme age groups for the "Less" category is only 2% (10%-8%), while the difference for the "More" category is 14% (72%-58%).
Note: The tables in your research paper should look like these tables in format.

Table 1

Age Differences in State Spending Preferences for Health Care

STATE SPENDING                       AGE
DESIRED:              18-35        36-55        56 and Over
Less                   10%           7%            8%
Same                   18%          18%           34%
More                   72%          75%           58%
N Size                (555)        (571)         (524)

Gamma = -.16
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.
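For readers who want to see where a gamma value comes from, here is a minimal Python sketch applied to Table 1. It assumes the standard Goodman-Kruskal definition, gamma = (C - D) / (C + D), where C and D are the numbers of concordant and discordant pairs. The cell counts are rebuilt from the rounded column percentages and N sizes, so they are approximations, and the function names are hypothetical. SPSS works from the raw survey data, so its result can differ slightly.

```python
def counts_from_percents(col_percents, col_ns):
    """Approximate cell counts: column percent times column N."""
    return [[pct / 100 * n for pct, n in zip(row, col_ns)] for row in col_percents]


def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a table with ordered rows and columns."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # only rows below row i
                for m in range(n_cols):
                    if m > j:                   # lower and to the right
                        concordant += table[i][j] * table[k][m]
                    elif m < j:                 # lower and to the left
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


# Table 1: rows are Less, Same, More; columns are 18-35, 36-55, 56 and over.
table1 = counts_from_percents(
    [[10, 7, 8], [18, 18, 34], [72, 75, 58]],
    [555, 571, 524],
)
gamma = goodman_kruskal_gamma(table1)
print(round(gamma, 2))
```

The result should land close to the reported gamma of -.16. Any small gap reflects the rounding of the published percentages.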

Table 2

Income Differences in State Spending Preferences for Health Care

STATE SPENDING                         FAMILY INCOME
DESIRED:              < $20,000    $20-40,000    $40-60,000    > $60,000
Less                     10%           4%            7%           10%
Same                     13%          17%           30%           36%
More                     77%          79%           63%           54%
N Size                  (365)        (363)         (222)         (333)

Gamma = -.28
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.

Table 3

Ideological Differences in State Spending Preferences for Health Care

STATE SPENDING              SELF-IDENTIFIED IDEOLOGY
DESIRED:              Liberal      Moderate      Conservative
Less                     3%           6%             12%
Same                    15%          17%             31%
More                    82%          77%             57%
N Size                 (262)        (495)           (808)

Gamma = -.41
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.

Table 4

Race Differences in State Spending Preferences for Health Care

STATE SPENDING                    RACE
DESIRED:              White        African-American
Less                   10%                3%
Same                   31%               10%
More                   59%               87%
N Size               (1050)             (555)

Gamma = .63
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.

Table 5

Sex Differences in State Spending Preferences for Health Care

STATE SPENDING                SEX
DESIRED:               Men          Women
Less                   12%            5%
Same                   27%           20%
More                   61%           75%
N Size                (772)         (889)

Gamma = .33
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.
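With Tables 1 through 5 in hand, the predictors of health care spending preferences can be ranked by the absolute value of gamma, ignoring the sign. A short Python sketch of that comparison follows. The gamma values are copied from the five tables above, and the variable names are only illustrative.

```python
# Gamma values reported under Tables 1-5 (dependent variable: state spending on health care).
gammas = {
    "Age": -0.16,
    "Family income": -0.28,
    "Ideology": -0.41,
    "Race": 0.63,
    "Sex": 0.33,
}

# Rank predictors from most to least important by |gamma|; the sign is ignored.
ranked = sorted(gammas.items(), key=lambda item: abs(item[1]), reverse=True)

for name, g in ranked:
    print(f"{name:15s} |gamma| = {abs(g):.2f}")
```

By this yardstick race is the most important predictor and age the least important.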

Table 6

Income Differences in Having Access to a Personal Computer

HAVE ACCESS                            FAMILY INCOME
TO A PC?              < $20,000    $20-40,000    $40-60,000    > $60,000
Yes                      54%          67%           85%           94%
No                       46%          33%           15%            6%
N Size                  (370)        (368)         (232)         (341)

Gamma = -.59
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.

Table 7

Race Differences in Having Access to a Personal Computer

HAVE ACCESS                       RACE
TO A PC?              White        African-American
Yes                    74%               69%
No                     26%               31%
N Size               (1084)             (560)

Gamma = .12
Chi-squared significance < .05
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.

Table 8

Sex Differences in Having Access to a Personal Computer

HAVE ACCESS                   SEX
TO A PC?               Men          Women
Yes                    74%           70%
No                     26%           30%
N Size                (790)         (910)

Gamma = .10
Chi-squared significance < .06; Not Significant at .05 level.
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.
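Tables 7 and 8 look nearly identical, yet one is significant at the .05 level and the other is not. The sketch below recomputes the Pearson chi-squared statistic for both by hand in Python. It rests on two assumptions: the cell counts are rebuilt from rounded column percentages and N sizes, so they are approximate, and significance is judged against 3.841, the standard .05 critical value for a table with 1 degree of freedom. The function names are hypothetical.

```python
CRITICAL_05_DF1 = 3.841  # chi-squared critical value: 1 degree of freedom, .05 level


def counts(col_percents, col_ns):
    """Approximate cell counts: column percent times column N."""
    return [[pct / 100 * n for pct, n in zip(row, col_ns)] for row in col_percents]


def chi_squared(table):
    """Pearson chi-squared statistic for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Rows are Yes, No; columns follow the order shown in each table.
table7 = counts([[74, 69], [26, 31]], [1084, 560])   # race by PC access
table8 = counts([[74, 70], [26, 30]], [790, 910])    # sex by PC access

chi7 = chi_squared(table7)
chi8 = chi_squared(table8)
print("Table 7 significant at .05:", chi7 > CRITICAL_05_DF1)
print("Table 8 significant at .05:", chi8 > CRITICAL_05_DF1)
```

Under these assumptions the race table clears the critical value and the sex table falls just short of it, which agrees with the significance notes printed under Tables 7 and 8.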

Table 9

Age Differences in Having Access to a Personal Computer

HAVE ACCESS                          AGE
TO A PC?              18-35        36-55        56 and Over
Yes                    82%          79%            55%
No                     18%          21%            45%
N Size                (564)        (585)          (538)

Gamma = .41
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.

Now repeat the SPSS analyses for these bivariate tables for the two most recent polls- the combined 2012 and 2014 samples. Do this assignment in class.

A note about your research papers. The Findings and Tables portion of your paper is due April 8, which is after the second test. Check the sample student paper for how you do this section of the paper.

 

MULTIVARIATE CONTINGENCY TABLES

Multivariate crosstabulations:
Multivariate analysis involves one dependent variable and more than one independent variable (predictor).

Controlling- multivariate tables always permit you to examine the relationship between a predictor and a dependent variable, after taking into account the impact of a second predictor.

For example, African-Americans tend to have a lower turnout than whites. A possible control variable is socioeconomic status (SES). Perhaps African-Americans have a lower average turnout than whites because of the generally lower socioeconomic status of blacks, and we know that people of all races having a lower SES tend to have lower turnout compared to people of all races having a higher SES. To determine whether a lower SES level explains why African-Americans tend to have lower turnouts than whites, we examine the relationship between race and turnout, controlling for SES. Do whites and blacks of the same SES level have the same turnout level? If so, SES is more important than race in shaping turnout.

RACE ...............> SES ...............> TURNOUT

RACE ........................................> TURNOUT

Three types of variables that one can control for:
1) Outside variables- a variable that has an effect on one of your predictors and on your dependent variable. In the model above, race is an outside variable. You can control for SES to determine if race has a direct, causal effect on turnout, or whether the race-turnout effect is spurious. If spurious, then racial differences in turnout exist only because there are racial differences in SES.
2) Intervening variable- a variable that is located between a predictor and a dependent variable, and that explains why the "early" predictor is related to the dependent variable. SES is an intervening variable in the above model, as it explains why race is related to turnout.
3) Specifying or Conditional variables- a predictor that changes the relationship between another predictor and the dependent variable. That is, the relationship has a different direction or magnitude for different categories of the specifying variable. If a race gap in turnout exists only among college grads in Mississippi but not among other educational groups, then education is the specifying variable.

Examples of Multivariate Tables (cell entries are completely artificial, non-real data)

MODEL TESTED FOR ALL THREE SCENARIOS THAT FOLLOW

RACE.............................> SES ...................................................> PARTICIPATION

RACE ..........................................................................................> PARTICIPATION

 

SCENARIO 1:

BIVARIATE (includes low, medium, and high SES groups):

                        White Race    Black Race
Low Participation          40%           60%
High Participation         60%           40%
Column % Totalled         100%          100%

MULTIVARIATE (Low SES group only):

                        White Race    Black Race
Low Participation          70%           70%
High Participation         30%           30%
Column % Totalled         100%          100%

MULTIVARIATE (Medium SES group only):

                        White Race    Black Race
Low Participation          50%           50%
High Participation         50%           50%
Column % Totalled         100%          100%

MULTIVARIATE (High SES group only):

                        White Race    Black Race
Low Participation          20%           20%
High Participation         80%           80%
Column % Totalled         100%          100%

In scenario 1, race has a bivariate relationship to participation, and whites tend to have a higher participation level compared to African Americans. We then control for a possible intervening variable of socioeconomic status. We divide our sample into three groups- low SES, medium SES, and high SES. We now see that within each of these SES groups, race has no impact on participation. Within each SES group, whites and blacks have the same level of participation. Therefore, we redraw our model and eliminate the direct link between race and participation. Instead, we have the model below. We keep the direct link between SES and participation, since SES has a definite impact on participation. Looking across the three multivariate tables and focusing only on whites, you can see that 80% of high SES whites were high in participation, compared to only 30% of low SES whites who were high in participation. Looking only at blacks, you can also compare across the multivariate tables and see a similar impact of SES on participation.

RACE ...................................> SES .....................................> PARTICIPATION

 

SCENARIO 2:

BIVARIATE (includes low, medium, and high SES groups):

                        White Race    Black Race
Low Participation          40%           60%
High Participation         60%           40%
Column % Totalled         100%          100%

MULTIVARIATE (Low SES group only):

                        White Race    Black Race
Low Participation          40%           60%
High Participation         60%           40%
Column % Totalled         100%          100%

MULTIVARIATE (Medium SES group only):

                        White Race    Black Race
Low Participation          40%           60%
High Participation         60%           40%
Column % Totalled         100%          100%

MULTIVARIATE (High SES group only):

                        White Race    Black Race
Low Participation          40%           60%
High Participation         60%           40%
Column % Totalled         100%          100%

Scenario 2 is the same kind of analysis, but it shows you how your data may exhibit a different pattern. In this scenario, you can see that within each multivariate table showing a different SES level, race differences in participation persist. That is, for each SES grouping, a higher percentage of whites are high in participation, compared to African Americans. Therefore, we redraw our model, and show that race exerts a direct impact on participation, even after controlling for SES. We do not have any direct link between SES and participation, because when we compare across the three multivariate tables, we find that SES has no impact on participation. Among whites, regardless of SES level, 60% of whites are high in participation. Among blacks, regardless of SES level, 40% of blacks are high in participation. So in Scenario 2, SES exerts no direct impact on participation.

RACE............................................................> SES

RACE............................................................> PARTICIPATION

 

SCENARIO 3:

BIVARIATE (includes low, medium, and high SES groups):

                        White Race    Black Race
Low Participation          40%           70%
High Participation         60%           30%
Column % Totalled         100%          100%

MULTIVARIATE (Low SES group only):

                        White Race    Black Race
Low Participation          70%           80%
High Participation         30%           20%
Column % Totalled         100%          100%

MULTIVARIATE (Medium SES group only):

                        White Race    Black Race
Low Participation          50%           60%
High Participation         50%           40%
Column % Totalled         100%          100%

MULTIVARIATE (High SES group only):

                        White Race    Black Race
Low Participation          30%           40%
High Participation         70%           60%
Column % Totalled         100%          100%


Scenario 3 is yet another pattern that the data may form. In this scenario, you can see that both race and SES affect participation levels. Therefore, your final model retains both of these linkages- a direct link between SES and participation, and a direct link between race and participation.

RACE .........................................> SES .................................> PARTICIPATION

RACE .....................................................................................> PARTICIPATION
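The reasoning used in the three scenarios can be summarized as a small decision rule, sketched here in Python. It is an illustration only: the 5-point cutoff for calling a difference real is an assumption made for this sketch, the function name is hypothetical, and the numbers are the artificial "High Participation" percentages from the multivariate tables above.

```python
THRESHOLD = 5  # percentage points; an assumed cutoff for "a real difference"

# Percent HIGH in participation within each SES group, as (white, black) pairs.
scenarios = {
    "Scenario 1": {"Low": (30, 30), "Medium": (50, 50), "High": (80, 80)},
    "Scenario 2": {"Low": (60, 40), "Medium": (60, 40), "High": (60, 40)},
    "Scenario 3": {"Low": (30, 20), "Medium": (50, 40), "High": (70, 60)},
}


def direct_links(by_ses):
    """Return (race_link, ses_link): which direct links to PARTICIPATION remain."""
    # Race link: within any SES group, do whites and blacks still differ?
    race_link = any(abs(w - b) >= THRESHOLD for w, b in by_ses.values())
    # SES link: within either race, does participation change across SES groups?
    whites = [w for w, _ in by_ses.values()]
    blacks = [b for _, b in by_ses.values()]
    ses_link = (max(whites) - min(whites) >= THRESHOLD
                or max(blacks) - min(blacks) >= THRESHOLD)
    return race_link, ses_link


for name, by_ses in scenarios.items():
    race_link, ses_link = direct_links(by_ses)
    print(f"{name}: RACE -> PARTICIPATION = {race_link}, SES -> PARTICIPATION = {ses_link}")
```

Scenario 1 keeps only the SES link, Scenario 2 keeps only the race link, and Scenario 3 keeps both, matching the three redrawn models.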


We now turn to a realistic example of multivariate tables that gets to an important personnel management question. A public university out West suffered an enrollment loss because of the pandemic, so it was forced to fire many of its professors. Since half of the women professors were fired, while only 22% of the men professors were fired (see the first bivariate table below), the women professors sued the university. The university lawyer argued that qualifications based on years of experience (seniority) were the major factor used in firing faculty, and he pointed to the third bivariate table below. That table shows that 70% of low seniority faculty were fired, while only 10% of high seniority faculty were fired. So, which factor was more important in deciding who got fired: the gender of the faculty member, or the faculty member’s seniority?

MODEL OF GENDER AND SENIORITY AFFECTING JOB SECURITY

GENDER ....................................> SENIORITY ..............................> JOB

GENDER ...........................................................................................> SECURITY

 

BIVARIATE: Gender .......> Job Security

                       Men             Women
Fired                22% (55)        50% (75)
Kept Job             78% (195)       50% (75)
Total               100% (250)      100% (150)

BIVARIATE: Gender .......> Seniority

                       Men             Women
Low Seniority        20% (50)        67% (100)
High Seniority       80% (200)       33% (50)
Total               100% (250)      100% (150)

BIVARIATE: Seniority .......> Job Security

                  Low Seniority    High Seniority
Fired                70% (105)       10% (25)
Kept Job             30% (45)        90% (225)
Total               100% (150)      100% (250)

To answer which of the two independent variables is more important in explaining the dependent variable, we can break up our sample into four groups defined by gender and seniority levels. In short, control for seniority to examine the relationship between sex and job security. Looking at the two multivariate tables below, does sex have a direct impact on job security? No. If you had low seniority, you were 70% likely to be fired, regardless of whether you were a man or a woman. If you had high seniority, you were only 10% likely to be fired, again regardless of sex.

Now compare across the two multivariate tables to see if seniority has a direct effect on job security. Does it? Yes. Fully 70% of low seniority men were fired, compared to only 10% of high seniority men. So seniority affected the job security of men. For women, 70% of low seniority women were fired, compared to only 10% of high seniority women. So seniority also affected the job security of women.

MULTIVARIATE (Low Seniority Group Only):

                       Men             Women
Fired                70% (35)        70% (70)
Kept Job             30% (15)        30% (30)
Total               100% (50)       100% (100)

MULTIVARIATE (High Seniority Group Only):

                       Men             Women
Fired                10% (20)        10% (5)
Kept Job             90% (180)       90% (45)
Total               100% (200)      100% (50)

 

So, we now redraw our final model, and show that seniority has a direct effect on job security, and that gender does not have a direct effect on job security.

GENDER ................... > SENIORITY ........................> JOB SECURITY
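The same control analysis can be reproduced from the raw cell counts. The Python sketch below pools the four gender-by-seniority groups to recover the bivariate percentages, then looks within each seniority level. The counts come from the artificial multivariate tables above, and the helper name is hypothetical.

```python
# (gender, seniority) -> (fired, kept job), from the two multivariate tables.
counts = {
    ("Men", "Low"): (35, 15),
    ("Women", "Low"): (70, 30),
    ("Men", "High"): (20, 180),
    ("Women", "High"): (5, 45),
}


def pct_fired(groups):
    """Percent fired after pooling the given (gender, seniority) groups."""
    fired = sum(counts[g][0] for g in groups)
    total = sum(sum(counts[g]) for g in groups)
    return round(100 * fired / total)


# Bivariate view: pool across seniority levels.
men_all = pct_fired([("Men", "Low"), ("Men", "High")])         # 55 of 250
women_all = pct_fired([("Women", "Low"), ("Women", "High")])   # 75 of 150
print("All faculty: men", men_all, "women", women_all)

# Multivariate view: compare men and women within each seniority level.
for level in ("Low", "High"):
    print(level, "seniority: men", pct_fired([("Men", level)]),
          "women", pct_fired([("Women", level)]))
```

Pooled, the gap is 22% versus 50%. Within each seniority level it disappears (70% versus 70%, and 10% versus 10%), which is why the direct gender link is dropped from the model.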

So, the university lawyer is celebrating his expected win in court, and breaks open the champagne bottle. But the lawyer for the faculty women responds that the university is still at fault, because of the direct link between gender and seniority. That is, women are more likely to have lower seniority than men, because the university historically discriminated against women in the hiring and promotion process. She points to the second bivariate table above, which shows that 80% of the men were high in seniority, compared to only 33% of the women. Whoops! The university lawyer decides to settle out of court. Some of the women faculty get their jobs back, and the university changes its hiring and promotion practices to eliminate any discrimination between the sexes. Though this is a hypothetical example with artificial data, it does show how the statistical analyses that you are learning in this class are job relevant. Indeed, past students have even gotten jobs as analysts for state governments and private corporations.

Now, let’s turn to a question that we asked on my test a few years ago. See if you can answer all of these questions, before we go over this in class.

 

TEST QUESTION. (25 points) Please study the following three bivariate tables and four multivariate tables, and answer each of the lettered questions. This information is drawn from the 2010-2014 Mississippi Polls. A difference of 5% or higher constitutes statistical significance. The tables bear on the following model:

 

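AGE (old) ..........> MARITAL STATUS (married) ..........> SEAT BELT USAGE (high)

AGE (old) ...............................................> SEAT BELT USAGE (high)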

 

(Note: age is the earliest, outside variable; marital status is the intervening variable; seat belt usage is the latest, dependent variable. There are three arrowed lines- age affects marital status; marital status affects seat belt usage; age affects seat belt usage. The categories that go together are old, married, high usage)

BIVARIATE RELATION BETWEEN AGE AND SEAT BELT USAGE:

SEAT BELT USAGE       18-30        31-64        65 and older
Always                 69%          80%             85%
Sometimes              31%          20%             15%
N Size                (351)        (745)           (243)

 

A.      What is the direction of the relationship between age and seat belt usage?

 

BIVARIATE RELATION BETWEEN AGE AND MARITAL STATUS:

MARITAL STATUS        18-30        31-64        65 and older
Married                18%          64%             56%
Not Married            82%          36%             44%
N Size                (348)        (752)           (254)

 

B.      What is the direction of the relationship between age and marital status?

 

BIVARIATE RELATION BETWEEN MARITAL STATUS AND SEAT BELT USAGE:

SEAT BELT USAGE       Married      Not Married
Always                  82%           74%
Sometimes               18%           26%
N Size                 (692)         (667)

 

C.      What is the direction of the relationship between marital status and seat belt usage?

 

RELATIONSHIP BETWEEN AGE AND SEAT BELT USAGE, AMONG THE MARRIED

SEAT BELT USAGE       18-30        31-64        65 and older
Always                 78%          80%             88%
Sometimes              22%          20%             12%
N Size                 (64)        (477)           (139)

D.      Among those who are married, is there any relationship between age and seat belt usage? Yes, or no? If so, what is the direction of the relationship?

 

RELATIONSHIP BETWEEN AGE AND SEAT BELT USAGE, AMONG THOSE NOT MARRIED

SEAT BELT USAGE       18-30        31-64        65 and older
Always                 67%          79%             81%
Sometimes              33%          21%             19%
N Size                (284)        (267)           (104)

E.       Among those who are unmarried, is there any relationship between age and seat belt usage? Yes, or no? If so, what is the direction of the relationship?

 

RELATIONSHIP BETWEEN MARITAL STATUS AND SEAT BELT USAGE, AMONG ADULTS 18-30

SEAT BELT USAGE       Married      Not Married
Always                  78%           67%
Sometimes               22%           33%
N Size                  (64)         (284)

F.       Among adults 18-30 years old, is there any relationship between marital status and seat belt usage? Yes, or no? If so, what is the direction of the relationship?

 

RELATIONSHIP BETWEEN MARITAL STATUS AND SEAT BELT USAGE, AMONG ADULTS 65 AND OLDER

SEAT BELT USAGE       Married      Not Married
Always                  88%           81%
Sometimes               12%           19%
N Size                 (139)         (104)

 

G.      Among adults 65 and older, is there any relationship between marital status and seat belt usage? Yes, or no? If so, what is the direction of the relationship?

 

H.      So what factor has some effect on seat belt usage? Is it age, marital status, or both? Just circle the correct response.