WEEK 8:
BIVARIATE AND MULTIVARIATE CONTINGENCY TABLES
CONTINGENCY TABLES- BIVARIATE RELATIONS
Contingency
tables can be used with nominal level measures, though we usually employ
ordinal or interval level data having a limited number of categories.
Contingency tables permit you to view the data in an easily interpretable and
understood manner.
Percentage Difference is a measure of strength of the
relationship. It ranges from a low of 0 to a high of 100. Always put the
independent variable at the top of the table, and the dependent variable at the
side. Then, calculate the column percentages. For ordinal and interval level
indicators, compare the column percents (for the two extreme categories of the
predictor) across the same category of your dependent variable. Make this
comparison for the two extreme categories of your dependent variable, and take
the average. If one of these comparisons runs contrary to your hypothesis, enter
that difference as a negative number before averaging.
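The procedure above can be sketched in a few lines of Python. This is only an illustration under one reading of the rule: the function name `percentage_difference` and its arguments are hypothetical, and the figures are the 18-35 and 56 and Over columns of Table 1, with the hypothesis that younger people favor more spending.

```python
def percentage_difference(col_a, col_b, cat_for_a, cat_for_b):
    """Average percentage difference between two extreme predictor columns.

    The hypothesis is stated as: group A should score higher than group B on
    `cat_for_a`, and group B should score higher than group A on `cat_for_b`.
    A comparison that runs against the hypothesis comes out negative.
    """
    diff_a = col_a[cat_for_a] - col_b[cat_for_a]
    diff_b = col_b[cat_for_b] - col_a[cat_for_b]
    return (diff_a + diff_b) / 2


# Column percentages for the two extreme age groups in Table 1.
young = {"Less": 10, "Same": 18, "More": 72}
old = {"Less": 8, "Same": 34, "More": 58}

# Hypothesis: the young are higher on "More", the old are higher on "Less".
result = percentage_difference(young, old, "More", "Less")
print(result)  # 6.0
```

Here the "More" comparison supports the hypothesis (72 - 58 = 14), while the "Less" comparison runs against it (8 - 10 = -2), so the average is 6.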
Other Measures of
Association to use
(Source: Research Methods in Political Science: An Introduction Using
MicroCase, 2nd edition, by Michael Corbett; p. 139-144; copyrighted by
MicroCase Corporation):
All
measures range from 0 for no relationship to 1 for perfect relationship. A
positive or negative sign is a function of the direction of the coding of the
variables and whether your hypothesis is upheld.
The
following are nine examples of bivariate tables. In class, we will review three
features of each table. 1) Is the relationship statistically significant? Is
Chi-squared significant at the .05 level or below? 2) What is the magnitude of
the relationship? That is, what is the gamma value? To determine the relative
importance of the predictors-- which predictor is most and least important--
use the absolute value of the gamma, and ignore the sign. 3) What is the
direction of the relationship? That is, devise a hypothesis for each table that
reflects how the two variables are related. Example for table 1: People younger
in age are more likely to favor spending more on health care, compared to
people older in age. It would not have been as accurate to say that: People younger in age are more likely to favor spending less on health care, compared to those older in age, because the percentage difference is only 2% (10%-8%). The percentage difference between the two extreme age groups for the "More" category is 14% (72%-58%).
Note: The tables in your research paper should look like these tables in
format.
Table 1
Age Differences in State Spending Preferences for Health Care
AGE
| STATE SPENDING DESIRED: | 18-35 | 36-55 | 56 and Over |
| Less | 10% | 7% | 8% |
| Same | 18% | 18% | 34% |
| More | 72% | 75% | 58% |
| N Size | (555) | (571) | (524) |
Gamma = -.16
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.
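For readers who want to see where a gamma value comes from, here is a minimal Python sketch of Goodman and Kruskal's gamma, computed from concordant and discordant pairs. The function names are hypothetical. The cell counts are rebuilt from Table 1's rounded column percentages and N sizes, so they are approximations and the result will only come close to the reported value.

```python
def gamma(table):
    """Goodman and Kruskal's gamma for a table of counts.

    table[i][j] is the count in row i, column j, with rows and columns
    both listed from the lowest to the highest category.
    """
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:      # higher on both variables
                        concordant += table[i][j] * table[k][l]
                    elif l < j:    # higher on one, lower on the other
                        discordant += table[i][j] * table[k][l]
    return (concordant - discordant) / (concordant + discordant)


def counts_from_percents(percents, n_sizes):
    """Approximate cell counts from column percentages and column N sizes."""
    return [[p / 100 * n for p, n in zip(row, n_sizes)] for row in percents]


# Rows: Less, Same, More. Columns: 18-35, 36-55, 56 and Over.
table1_percents = [[10, 7, 8], [18, 18, 34], [72, 75, 58]]
table1_n = [555, 571, 524]

g = gamma(counts_from_percents(table1_percents, table1_n))
print(round(g, 2))
```

The sign is negative because, with this coding, older respondents fall more often in the lower spending categories. Any small gap from the reported -.16 reflects the rounding in the published percentages.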
Table 2
Income Differences in State Spending Preferences for Health Care
FAMILY INCOME
| STATE SPENDING DESIRED: | < $20,000 | $20-40,000 | $40-60,000 | > $60,000 |
| Less | 10% | 4% | 7% | 10% |
| Same | 13% | 17% | 30% | 36% |
| More | 77% | 79% | 63% | 54% |
| N Size | (365) | (363) | (222) | (333) |
Gamma = -.28
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.
Table 3
Ideological Differences in State Spending Preferences for Health
Care
SELF-IDENTIFIED IDEOLOGY
| STATE SPENDING DESIRED: | Liberal | Moderate | Conservative |
| Less | 3% | 6% | 12% |
| Same | 15% | 17% | 31% |
| More | 82% | 77% | 57% |
| N Size | (262) | (495) | (808) |
Gamma = -.41
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.
Table 4
Race Differences in State Spending Preferences for Health Care
RACE
| STATE SPENDING DESIRED: | White | African-American |
| Less | 10% | 3% |
| Same | 31% | 10% |
| More | 59% | 87% |
| N Size | (1050) | (555) |
Gamma = .63
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.
Table 5
Sex Differences in State Spending Preferences for Health Care
SEX
| STATE SPENDING DESIRED: | Men | Women |
| Less | 12% | 5% |
| Same | 27% | 20% |
| More | 61% | 75% |
| N Size | (772) | (889) |
Gamma = .33
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.
Table 6
Income Differences in Having Access to a Personal Computer
FAMILY INCOME
| HAVE ACCESS TO A PC? | < $20,000 | $20-40,000 | $40-60,000 | > $60,000 |
| Yes | 54% | 67% | 85% | 94% |
| No | 46% | 33% | 15% | 6% |
| N Size | (370) | (368) | (232) | (341) |
Gamma = -.59
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.
Table 7
Race Differences in Having Access to a Personal Computer
RACE
| HAVE ACCESS TO A PC? | White | African-American |
| Yes | 74% | 69% |
| No | 26% | 31% |
| N Size | (1084) | (560) |
Gamma = .12
Chi-squared significance < .05
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.
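The chi-squared test behind a significance line like the one under Table 7 can be sketched in plain Python. The function name is hypothetical, and the cell counts are rebuilt from the rounded percentages and N sizes in Table 7, so they are approximate. The value 3.841 is the standard .05 critical value for a table with one degree of freedom (two rows by two columns).

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Approximate counts: 74% of 1084 whites and 69% of 560 African-Americans
# report PC access. Columns: White, African-American.
table7 = [[802, 386],   # Yes
          [282, 174]]   # No

CRITICAL_05_DF1 = 3.841  # .05 critical value, 1 degree of freedom
stat = chi_squared(table7)
print(round(stat, 2), stat > CRITICAL_05_DF1)
```

A statistic above the critical value means the relationship is significant at the .05 level, which agrees with the line reported under Table 7.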
Table 8
Sex Differences in Having Access to a Personal Computer
SEX
| HAVE ACCESS TO A PC? | Men | Women |
| Yes | 74% | 70% |
| No | 26% | 30% |
| N Size | (790) | (910) |
Gamma = .10
Chi-squared significance < .06; Not Significant at .05 level.
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.
Table 9
Age Differences in Having Access to a Personal Computer
AGE
| HAVE ACCESS TO A PC? | 18-35 | 36-55 | 56 and Over |
| Yes | 82% | 79% | 55% |
| No | 18% | 21% | 45% |
| N Size | (564) | (585) | (538) |
Gamma = .41
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.
Now repeat the SPSS analyses for these bivariate tables for the two most recent
polls, the combined 2012 and 2014 samples. Do this assignment in class.
A note about your research papers: The Findings and Tables portion of your paper
is due April 8, which is after the second test. Check the sample
student paper for how to do this section of the paper.
MULTIVARIATE CONTINGENCY TABLES
Multivariate crosstabulations:
Multivariate analysis involves one dependent variable and more than one
independent variable (predictor).
Controlling-
multivariate tables always permit you to examine the relationship between a
predictor and a dependent variable, after taking into account the impact of a
second predictor.
For example,
African-Americans tend to have a lower turnout than whites. A possible control
variable is socioeconomic status (SES). Perhaps African-Americans have a lower
average turnout than whites because of the generally lower socioeconomic status
of blacks, and we know that people of all races having a lower SES tend to have
lower turnout compared to people of all races having a higher SES. To determine
whether a lower SES level explains why African-Americans tend to have lower
turnout than whites, we examine the relationship between race and turnout,
controlling for SES. Do whites and blacks of the same SES level have the same
turnout level? If so, SES is more important than race in shaping turnout.
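RACE ..............................> SES ..............................> TURNOUT
RACE .........................................................................> TURNOUT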
Three types of variables
that one can control for:
1) Outside variables- a variable that has an effect on one of your
predictors and on your dependent variable. In the model above, race is an outside variable.
You can control for SES to determine if race has a direct, causal effect on
turnout, or whether the race-turnout effect is spurious. If spurious, then racial
differences in turnout exist only because there are racial differences in SES.
2) Intervening
variable- a variable that is located between a predictor and a dependent
variable, and that explains why the "early" predictor is related to
the dependent variable. SES is an intervening variable in the above model, as it explains why
race is related to turnout.
3) Specifying or Conditional variables- a predictor that changes
the relationship between another predictor and the dependent variable. That is,
the relationship has a different direction or magnitude for different
categories of the specifying variable. If a race gap in turnout exists only among
college grads in Mississippi but not among other educational groups, then
education is the specifying variable.
Examples of Multivariate Tables (cell entries are completely
artificial, non-real data)
MODEL TESTED FOR ALL THREE SCENARIOS THAT FOLLOW
RACE ..............................> SES ..............................> PARTICIPATION
RACE .........................................................................> PARTICIPATION
SCENARIO 1:
BIVARIATE (includes low, medium, and high SES groups):
| | White Race | Black Race |
| Low Participation | 40% | 60% |
| High Participation | 60% | 40% |
| Column % Totalled | 100% | 100% |
MULTIVARIATE (Low SES group only):
| | White Race | Black Race |
| Low Participation | 70% | 70% |
| High Participation | 30% | 30% |
| Column % Totalled | 100% | 100% |
MULTIVARIATE (Medium SES group only):
| | White Race | Black Race |
| Low Participation | 50% | 50% |
| High Participation | 50% | 50% |
| Column % Totalled | 100% | 100% |
MULTIVARIATE (High SES group only):
| | White Race | Black Race |
| Low Participation | 20% | 20% |
| High Participation | 80% | 80% |
| Column % Totalled | 100% | 100% |
In
scenario 1, race has a bivariate relationship to participation, and whites tend
to have a higher participation level compared to African Americans. We then
control for a possible intervening variable of socioeconomic status. We divide
our sample into three groups- low SES, medium SES, and high SES. We now see
that within each of these SES groups, race has no impact on participation.
Within each SES group, whites and blacks have the same level of participation.
Therefore, we redraw our model and eliminate the direct link between race and
participation. Instead, we have the model below. We keep the direct link
between SES and participation, since SES has a definite impact on
participation. Looking across the three multivariate tables and focusing only
on whites, you can see that 80% of high SES whites were high in participation,
compared to only 30% of low SES whites who were high in participation. Looking
only at blacks, you can also compare across the multivariate tables and see a
similar impact of SES on participation.
RACE ..............................> SES ..............................> PARTICIPATION
SCENARIO 2:
BIVARIATE (includes low, medium, and high SES groups):
| | White Race | Black Race |
| Low Participation | 40% | 60% |
| High Participation | 60% | 40% |
| Column % Totalled | 100% | 100% |
MULTIVARIATE (Low SES group only):
| | White Race | Black Race |
| Low Participation | 40% | 60% |
| High Participation | 60% | 40% |
| Column % Totalled | 100% | 100% |
MULTIVARIATE (Medium SES group only):
| | White Race | Black Race |
| Low Participation | 40% | 60% |
| High Participation | 60% | 40% |
| Column % Totalled | 100% | 100% |
MULTIVARIATE (High SES group only):
| | White Race | Black Race |
| Low Participation | 40% | 60% |
| High Participation | 60% | 40% |
| Column % Totalled | 100% | 100% |
Scenario 2 is the same kind of analysis, but it shows you
how your data may exhibit a different pattern. In this scenario, you can see
that within each multivariate table showing a different SES level, race
differences in participation persist. That is, for each SES grouping, a higher
percentage of whites are high in participation, compared to African Americans.
Therefore, we redraw our model, and show that race exerts a direct impact on
participation, even after controlling for SES. We do not have any direct link
between SES and participation, because when we compare across the three
multivariate tables, we find that SES has no impact on participation. Among
whites, regardless of SES level, 60% of whites are high in participation. Among
blacks, regardless of SES level, 40% of blacks are high in participation. So in
Scenario 2, SES exerts no direct impact on participation.
RACE ..............................> SES
RACE .........................................................................> PARTICIPATION
SCENARIO 3:
BIVARIATE (includes low, medium, and high SES groups):
| | White Race | Black Race |
| Low Participation | 40% | 70% |
| High Participation | 60% | 30% |
| Column % Totalled | 100% | 100% |
MULTIVARIATE (Low SES group only):
| | White Race | Black Race |
| Low Participation | 70% | 80% |
| High Participation | 30% | 20% |
| Column % Totalled | 100% | 100% |
MULTIVARIATE (Medium SES group only):
| | White Race | Black Race |
| Low Participation | 50% | 60% |
| High Participation | 50% | 40% |
| Column % Totalled | 100% | 100% |
MULTIVARIATE (High SES group only):
| | White Race | Black Race |
| Low Participation | 30% | 40% |
| High Participation | 70% | 60% |
| Column % Totalled | 100% | 100% |
Scenario 3 is yet another pattern
that the data may form. In this scenario, you can see that both race and SES
affect participation levels. Therefore, your final model retains both of these
linkages- a direct link between SES and participation, and a direct link
between race and participation.
RACE ..............................> SES ..............................> PARTICIPATION
RACE .........................................................................> PARTICIPATION
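The way the three scenarios are read can be summarized in a short Python sketch. It is an illustration only: the function name and the data layout are hypothetical, and the figures are the artificial "High Participation" percentages from the multivariate tables in Scenarios 1 through 3. Any nonzero gap is counted as an effect here, which suits artificial data but not a real sample.

```python
def within_group_gaps(high_pct):
    """high_pct[ses][race] = percent high in participation.

    Returns the white-black gap inside each SES group, and the
    high-SES minus low-SES gap inside each race.
    """
    race_gaps = {ses: row["White"] - row["Black"] for ses, row in high_pct.items()}
    ses_gaps = {race: high_pct["High"][race] - high_pct["Low"][race]
                for race in ("White", "Black")}
    return race_gaps, ses_gaps


scenarios = {
    1: {"Low": {"White": 30, "Black": 30},
        "Medium": {"White": 50, "Black": 50},
        "High": {"White": 80, "Black": 80}},
    2: {"Low": {"White": 60, "Black": 40},
        "Medium": {"White": 60, "Black": 40},
        "High": {"White": 60, "Black": 40}},
    3: {"Low": {"White": 30, "Black": 20},
        "Medium": {"White": 50, "Black": 40},
        "High": {"White": 70, "Black": 60}},
}

summary = {}
for number, table in scenarios.items():
    race_gaps, ses_gaps = within_group_gaps(table)
    race_effect = any(gap != 0 for gap in race_gaps.values())
    ses_effect = any(gap != 0 for gap in ses_gaps.values())
    summary[number] = (race_effect, ses_effect)
    print(number, "race effect:", race_effect, "SES effect:", ses_effect)
```

Scenario 1 shows only an SES effect, Scenario 2 only a race effect, and Scenario 3 both. In Scenario 3 the race gap inside each SES group is 10 points, smaller than the 30-point gap in the bivariate table.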
We now turn to a real-world
example of multivariate tables that gets to an important personnel management type
of question. A public university out West suffered an enrollment loss because
of the pandemic, so they were forced to fire many of their professors. Since
half of the women professors were fired, while only 22% of the men professors
were fired (see the first bivariate table below), the women professors sued the
university. The university lawyer argued that qualifications based on years of
experience (seniority) was the major factor used in firing faculty, and he
pointed to the third bivariate table below. That table shows that 70% of low
seniority faculty were fired, while only 10% of high seniority faculty were
fired. So, what factor was more important in deciding who got fired- the gender
of the faculty member, or the faculty member’s seniority?
MODEL OF GENDER AND SENIORITY AFFECTING JOB SECURITY
GENDER ..............................> SENIORITY ..............................> JOB SECURITY
GENDER ...............................................................................> JOB SECURITY
BIVARIATE: Gender .......> Job Security
| | Men | Women |
| Fired | 22% (55) | 50% (75) |
| Kept Job | 78% (195) | 50% (75) |
| Total | 100% (250) | 100% (150) |
BIVARIATE: Gender .......> Seniority
| | Men | Women |
| Low Seniority | 20% (50) | 67% (100) |
| High Seniority | 80% (200) | 33% (50) |
| Total | 100% (250) | 100% (150) |
BIVARIATE: Seniority .......> Job Security
| | Low Seniority | High Seniority |
| Fired | 70% (105) | 10% (25) |
| Kept Job | 30% (45) | 90% (225) |
| Total | 100% (150) | 100% (250) |
To answer which of the two independent variables is more
important in explaining the dependent variable, we can break up our sample into
four groups defined by gender and seniority levels. In short, control for
seniority to examine the relationship between sex and job security. Looking at
the two multivariate tables below, does sex have a direct impact on job security?
No. If you had low seniority, you were 70% likely to be fired, regardless of
whether you were a man or a woman. If you had high seniority, you were 10% likely
to be fired, again regardless of sex.
Now compare across the two multivariate tables to see if
seniority has a direct effect on job security. Does it? Yes. Fully 70% of low
seniority men were fired, compared to only 10% of high seniority men. So
seniority affected the job security of men. For women, 70% of low seniority
women were fired, compared to only 10% of high seniority women. So seniority
also affected the job security of women.
MULTIVARIATE (Low Seniority Group Only):
| | Men | Women |
| Fired | 70% (35) | 70% (70) |
| Kept Job | 30% (15) | 30% (30) |
| Total | 100% (50) | 100% (100) |
MULTIVARIATE (High Seniority Group Only):
| | Men | Women |
| Fired | 10% (20) | 10% (5) |
| Kept Job | 90% (180) | 90% (45) |
| Total | 100% (200) | 100% (50) |
So,
we now redraw our final model, and show that seniority has a direct effect on
job security, and that gender does not have a direct effect on job security.
GENDER ..............................> SENIORITY ..............................> JOB SECURITY
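The arithmetic behind this example can be checked with a short Python sketch. The variable and function names are hypothetical, and the counts are the artificial cell counts from the two multivariate tables. Adding the four groups back together reproduces the bivariate tables, which shows how a large gender gap can arise even though men and women at the same seniority level were fired at the same rate.

```python
# Counts from the multivariate tables, keyed by (gender, seniority).
fired = {("Men", "Low"): 35, ("Women", "Low"): 70,
         ("Men", "High"): 20, ("Women", "High"): 5}
total = {("Men", "Low"): 50, ("Women", "Low"): 100,
         ("Men", "High"): 200, ("Women", "High"): 50}


def percent_fired(gender=None, seniority=None):
    """Percent fired among the groups matching the given gender and/or seniority."""
    keys = [k for k in total
            if (gender is None or k[0] == gender)
            and (seniority is None or k[1] == seniority)]
    return 100 * sum(fired[k] for k in keys) / sum(total[k] for k in keys)


# Bivariate: gender and job security (22% of men, 50% of women fired).
print(percent_fired(gender="Men"), percent_fired(gender="Women"))

# Bivariate: seniority and job security (70% of low, 10% of high fired).
print(percent_fired(seniority="Low"), percent_fired(seniority="High"))

# Multivariate: within each seniority level the gender gap disappears.
for level in ("Low", "High"):
    print(level, percent_fired("Men", level), percent_fired("Women", level))
```

Women were fired more often overall only because two-thirds of them (100 of 150) were in the low seniority group, compared to one-fifth of the men (50 of 250).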
So, the university lawyer is celebrating his
expected win in court, and breaks open the champagne bottle. But the lady
lawyer for the faculty women responds that the university is still at fault,
because of the direct link between gender and seniority. That is, women are
more likely to have lower seniority than men, because the university
historically discriminated against women in the hiring and promotion process.
She points to the second bivariate table above that shows that 80% of the men
were high in seniority, compared to only 33% of women who were high in
seniority. Whoops! The university lawyer decides to settle out of court. Some of
the women faculty get their jobs back, and the university changes its hiring
and promotion practices to eliminate any discrimination between the sexes. Though
this is a hypothetical example with artificial data, it does show how the
statistical analyses that you are learning in this class are job relevant.
Indeed, past students have even gotten jobs as analysts for state governments
and private corporations.
Now, let’s turn to a question
that we asked on my test a few years ago. See if you can answer all of these
questions, before we go over this in class.
TEST QUESTION. (25 points) Please
study the following three bivariate tables and four multivariate tables, and
answer each of the lettered questions. This info is drawn from the 2010-2014
Mississippi Polls. For this question, treat a difference of 5 percentage points
or more as statistically significant. The tables bear on the model:
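AGE (old) ..............................> MARITAL STATUS (married) ..............................> SEAT BELT USAGE (high)
AGE (old) ..........................................................................................> SEAT BELT USAGE (high)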
(Note: age is the earliest, outside variable; marital status is the intervening variable; seat belt usage is the latest, dependent variable. There are three arrowed lines- age affects marital status; marital status affects seat belt usage; age affects seat belt usage. The categories that go together are old, married, high usage)
BIVARIATE RELATION BETWEEN AGE AND SEAT BELT USAGE:
| SEAT BELT USAGE | 18-30 | 31-64 | 65 and older |
| Always | 69% | 80% | 85% |
| Sometimes | 31% | 20% | 15% |
| N Size | (351) | (745) | (243) |
A. What is the direction of the relationship between age and seat belt usage?
BIVARIATE RELATION BETWEEN AGE AND MARITAL STATUS:
| MARITAL STATUS | 18-30 | 31-64 | 65 and older |
| Married | 18% | 64% | 56% |
| Not Married | 82% | 36% | 44% |
| N Size | (348) | (752) | (254) |
B. What is the direction of the relationship between age and marital status?
BIVARIATE RELATION BETWEEN MARITAL STATUS AND SEAT BELT USAGE:
| | Married | Not Married |
| Always | 82% | 74% |
| Sometimes | 18% | 26% |
| N Size | (692) | (667) |
C. What is the direction of the relationship between marital status and seat belt usage?
RELATIONSHIP BETWEEN AGE AND SEAT BELT USAGE, AMONG THE MARRIED
| SEAT BELT USAGE | 18-30 | 31-64 | 65 and older |
| Always | 78% | 80% | 88% |
| Sometimes | 22% | 20% | 12% |
| N Size | (64) | (477) | (139) |
D. Among those who are married, is there any relationship between age and seat belt usage?
Yes, or no? If so, what is the direction of the relationship?
RELATIONSHIP BETWEEN AGE AND SEAT BELT USAGE, AMONG THOSE NOT MARRIED
| SEAT BELT USAGE | 18-30 | 31-64 | 65 and older |
| Always | 67% | 79% | 81% |
| Sometimes | 33% | 21% | 19% |
| N Size | (284) | (267) | (104) |
E. Among those who are unmarried, is there any relationship between age and seat belt usage?
Yes, or no? If so, what is the direction of the relationship?
RELATIONSHIP BETWEEN MARITAL STATUS AND SEAT BELT USAGE, AMONG ADULTS 18-30
| SEAT BELT USAGE | Married | Not Married |
| Always | 78% | 67% |
| Sometimes | 22% | 33% |
| N Size | (64) | (284) |
F. Among adults 18-30 years old, is there any relationship between marital status and seat belt usage?
Yes, or no? If so, what is the direction of the relationship?
RELATIONSHIP BETWEEN MARITAL STATUS AND SEAT BELT USAGE, AMONG ADULTS 65 AND OLDER
| SEAT BELT USAGE | Married | Not Married |
| Always | 88% | 81% |
| Sometimes | 12% | 19% |
| N Size | (139) | (104) |
G. Among adults 65 and older, is there any relationship between marital status and seat belt usage?
Yes, or no? If so, what is the direction of the relationship?
H. So what factor has some effect on seat belt usage? Is it age, marital status, or both?
Just circle the correct response.