Updated January 5, 2021. Also, print out the two learning modules on Statistical Inference and on Regression, Experimental, or Other Notes.

TOPIC ONE: INTRODUCTION TO THE COURSE

Mention the research paper, the in-class tests, why the poll is not being conducted this year, and the need to attend every class. Call the roll, and learn a little about each student.

TOPIC TWO: SCIENCE AND THE HISTORY OF THE DISCIPLINE OF POLITICAL SCIENCE

THE HISTORY OF THE DISCIPLINE OF POLITICAL SCIENCE

Philosophical orientation: asks the "ought" (how things should be), asks what justice is, who should rule (the wise or the multitude), and what the obligations of citizens and of government are.

Traditional approach, focus on institutional process, how a bill becomes a law, the structure of the government, a legalistic case-study approach, a nation is seen as unitary and as a rational actor, very descriptive approach, very historical method.

The Traditionalist approach to analysis combines the classical and institutional eras.

Transitional era- 1900 to 1945. Problems of irony of form, pluralism exists.

1) Science, theory, predictions, explanation, patterns:
Examples:
a) Theories of presidential voting behavior: sociological theory of voting, such as race and income affecting a voter's vote choice; social-psychological University of Michigan model using party identification, issues, and candidate evaluations; simple satisfaction versus dissatisfaction predicting vote for presidential party's candidate.
b) Southern state legislative groups: white Republicans are fairly conservative; African-American Democrats are fairly liberal; and white Democrats are essentially centrist or moderate, depending on the issues.
c) Mass-elite study, which party organization is closer to the average voter on issues? In 1991 and 2001 Alabama-Mississippi study, it was Democrats, though their organization has moved to the left over the years. A contrary study examined why Republicans now control a majority of U.S. House and Senate seats in the South; it found that Democratic congress members and Senators had steadily moved ideologically to the left since 1970, suggesting that Democratic "elites" in today's South may have become too liberal for many white southern voters.

2) Data gathering and research are theory directed
Examples:
a) For presidential voting behavior, a national survey of voters was conducted, asking their party identification, their attitudes on public issues, and their likes and dislikes of the major party candidates; such a national study would also ask voters their race and income, and whether they were financially satisfied or dissatisfied.
b) For southern state legislative factions, we identified legislators' party from their websites, their race from their pictures, and their roll call votes from newspapers' reports.
c) For mass-elite study, we conducted mail surveys of Democratic and Republican county executive committee members in Mississippi and Alabama, and statewide telephone polls of average adults in both states. We asked identically worded questions on about twenty different public policy issues, including ideological self-identification and party identification. In the contrary study, we used the ratings of the ADA and ACA ideological pressure groups over the past forty years to determine the liberal or conservative voting behavior of Southern Democratic U.S. House and Senate members.

3) Value free
Examples:
a) We simply seek to predict and explain how the political world operates; we do not let our own opinions about how it should operate influence our research. Hence, though conservatives may claim that Reagan won in 1980 because of his conservative philosophy, and liberals may claim that Obama won in 2008 because of his liberal philosophy, our research may indicate that each won merely because voters were dissatisfied with the economic recessions and financial crises (and in 1980 the foreign policy crises).
b) A researcher may be a disillusioned liberal who believes that African-American lawmakers are isolated from all other lawmakers, but the data may show that white Democrats often vote with them on education, race, and election issues, and that Republican lawmakers lose on more roll call votes than do black Democrats (if Democrats control the legislature numerically, which Democrats did in the South until the turn of the century).
c) A researcher may be a conservative Republican who has friends in the state Republican party headquarters, but the data may show that average voters are essentially moderate, that Democratic party organization members in the South until the turn of the century were moderate liberal, while Republican party organization members were conservative. Hence, especially on education and health care issues, Democratic party members were closer to average southern voters than were Republicans, at least until the turn of the century. A researcher may be a liberal, but the data would show how Democratic U.S. House members and Senators from the South had more and more liberal voting records as the decades passed from 1970 to 2010; hence, today's Democratic elites are too liberal for many conservative white southern voters.

4) Interdisciplinary- sociology, psychology, economics
Examples:
a) The earlier American presidential election studies of the 1940s relied heavily on sociology, proposing that group membership affected the party voted for outside of the South. Rurality, Protestantism, and higher income predicted more Republican votes, while urban residence, Catholicism, and lower income predicted more Democratic votes.
b) My study of Balance Theory drew on psychology. People tend to acquire and retain psychologically consistent beliefs and attitudes. If a person likes a candidate, they tend to believe that the candidate agrees with their own positions on issues regardless of whether the candidate actually does; if a voter dislikes a candidate, they tend to believe that they are in disagreement with the candidate on the issue.
c) Another psychological theory is Social Judgment Theory, which proposes that people not only have most preferred political viewpoints, but that they also have latitudes of acceptance for views that are close to their own, and latitudes of rejection for views opposite to their own. Those having intense views have larger latitudes of rejection, so they reject more viewpoints unlike their own. People also engage in assimilation of views falling in their latitude of acceptance, and contrast of others' views falling in their latitude of rejection. Contrast means that they misperceive those persons' positions to be even further from their own views than they are in reality. Hence, the big ideological split between the two political parties today is made worse by ideologues in each party demonizing their opponents.
d) Shaffer worked with an economics professor (Chressanthis) in studying whether U.S. Senate election margins were accountable to the public, which they were in an indirect sense. Elections were affected by presidential coattails, campaign spending, divisive primaries, and preceding election margin. Economic conditions in the state and federal pork barrel dollars did not affect the elections.

5) Methodological sophistication-
Examples:
a) We conduct national public opinion polls that are representative of the nation's diversity. We do not conduct shopping mall polls, or phone-in or internet polls that fail to reflect the views of lower socioeconomic classes. So we can test whether the sociological group, social-psychological, or economic models of presidential voting are upheld. In yet another published study, I used such national polls from 1960 thru 1976 to explain how voter turnout declined due to decreased political efficacy, decreased partisan intensity, and decreased newspaper readership.
b) The southern state legislative factions research started with one southern state, Mississippi, in only a few years. We expanded to a twenty year time frame in Mississippi. Then, we added other southern states, like Georgia, Florida, Arkansas, and Texas.
c) My mass-elite linkage study started with just Mississippi in one year, but then I contacted Pat Cotter at the University of Alabama and we had a second state for confirmation. We also did the study originally in 1991, and then repeated it in 2001. In 2001 we also examined all 11 southern states with a national opinion poll.
d) Shaffer's study of balance theory relied on the 1994-1996 American national panel study to examine cognition change over time; panel studies follow the same people over time.
e) The Shaffer and Chressanthis study of Senate accountability used a pooled time-series, cross-sectional approach. All even-numbered years from 1976 thru 1986 were included, as were all 33 state contests in each election year. Regression and probit were used.

6) Individual and group level of analysis
a) The presidential voting studies used the individual voter as the unit of analysis.
b) The southern state legislative factions also looked at individuals (legislators in this case), but they combined them into three groups based on their race and party.
c) Balance theory and voter turnout studies also looked at individuals.
d) Mass-elite study looked at individuals of different types, the mass voter versus the elite party member.

Criticisms of Behavioralism- are people and events predictable, can we be value free; discuss

Remind students about the need to choose a topic for the research paper. Remind them that they need to choose four variables from the Mississippi Poll, and that those four variables must all be asked in the same year(s). Also, they should try to choose two or three adjacent years, in order to increase the sample size and reduce the sample error. A nice summary of the questions asked in the Mississippi Poll, and what years they are asked in is available on-line.
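The benefit of pooling adjacent years can be illustrated with the common simple-random-sampling approximation for the 95% margin of error of a proportion, 1.96 * sqrt(p(1-p)/n) with p = 0.5. This is only a sketch under that assumption, not necessarily the formula used by the Mississippi Poll itself; the function name is hypothetical. The sample sizes are the ones reported for the 2000 and 2004 polls in the sample Methods paragraph later in these notes.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes simple random sampling; ignores weighting and design effects.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Sample sizes reported for the 2000 and 2004 Mississippi Polls
n_2000, n_2004 = 613, 523

moe_2000 = margin_of_error(n_2000)
moe_2004 = margin_of_error(n_2004)
moe_pooled = margin_of_error(n_2000 + n_2004)  # both years combined
moe_likely_voters = margin_of_error(765)       # pooled likely voters only

for label, moe in [
    ("2000 only", moe_2000),
    ("2004 only", moe_2004),
    ("2000 and 2004 pooled", moe_pooled),
    ("765 pooled likely voters", moe_likely_voters),
]:
    print(f"{label}: +/- {moe * 100:.1f} percentage points")
```

Under this simplified formula the 765 pooled likely voters give roughly 3.5 points, close to the 3.6% reported in the sample paragraph, and the full pooled sample gives a smaller error than either year alone. Small differences from reported figures may reflect weighting or rounding conventions.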

Most papers will have one outside "early" variable, such as a demographic characteristic, two intervening variables (in the middle), and one dependent variable. See this sample research paper available here. The model's visual picture is messed up, due to computer problems.

Some papers will have two outside variables, one intervening variable, and one dependent variable. See this other sample research paper.

TOPIC THREE: ETHICAL CONCERNS

Anonymity versus Confidentiality-
Anonymity- no one can identify a person with their responses
Confidentiality- researcher knows who the respondent is, but promises not to tell anyone

Examples of informed consent:
1) Mississippi Poll
2) NSF Grassroots Party Activists cover letter

One must never harm subjects.
MSU Human Subjects form approval
Subpoena problem: if data are confidential, convert them into anonymous data as soon as possible

Studies having ethics problems:
1) MSU literacy study, when suspected interviewer fraud results in Attorney General request for respondent info
2) Ray Cleere's IHL university workplace study included identifiable questions and political questions, and MSU dropped out of it
3) NSF Grassroots Party Activists study- ICPSR deleted county and state variables

Political biases are a major problem in funded research:
1) Media sensationalism- 1982 Clarion-Ledger Senate poll; 1988 Lott-Dowdy Senate race controversy.
2) Official suppression of studies they disagree with- Mabus governmental child care study suppressed by Fordice administration

ASPA Code of Ethics: 5 sources of ethics
1) Serve Public Interest: oppose discrimination and harassment, promote affirmative action; public right to know; involve citizens in decisionmaking
2) Respect Law and Constitution: change obsolete, counterproductive laws; prevent mismanagement of public funds, need audits; protect privileged information; protect whistleblowers
3) Personal Integrity: give others credit for their work-plagiarism; avoid appearance of conflict-of-interest, such as nepotism, gift acceptance, misusing public resources, improper outside employment; act nonpartisan in actions; admit own errors
4) Ethical Organizations: promote creativity, open communication among workers; permit dissent, no reprisal, use due process; use merit
5) Professional Excellence: keep current on new issues, problems, upgrade professional competence; be active in professional associations; help public service students, such as by providing internships

TOPIC FOUR: MSU IRB REQUIREMENTS

TOPIC FIVE: THEORY BUILDING

1. Explanation- why does something happen
Examples:
a) Presidential voting models. People vote Democratic because they psychologically identify with the Democratic party, because they are liberal, and because they prefer the Democratic presidential candidate's characteristics. Or, people hold the President's party responsible for economic conditions in the country, so they tend to vote for the President or his party's successor when things are going well, and they tend to vote against him or his party's successor when things are going badly.
b) Southern state legislative factions. White conservatives are gravitating toward the more conservative party nationally, the Republicans, therefore white Republican legislators tend to vote conservatively. Liberal African-Americans tend to join the more liberal party nationally, the Democrats, so African-American Democratic legislators tend to vote liberally. Moderate whites tend to join the more ideologically inclusive party in the modern South, so they tend to be Democrats; hence, white Democratic legislators tend to vote moderately.

2) Prediction- if we know people's positions on the independent variables, we can predict their positions on the dependent variables.
Examples:
a) In presidential vote model, if a voter is a Democrat, a liberal, and prefers the Democratic candidate's attributes, we predict that they would vote for the Democratic presidential candidate. If a voter is a Republican, a conservative, and prefers the Republican candidate's attributes, we predict they would vote for the Republican presidential candidate.
b) In the southern state legislative project, we predict that African-American Democratic legislators will tend to vote more liberally, against anti-crime measures, for public education projects, and for affirmative action programs. We predict that white Republican legislators will tend to vote in the opposite manner, in a conservative direction. We also predict that white Democrats will tend to vote somewhere in between these two groups, being supportive of pro-education and anti-crime measures.
c) Clinton impeachment vote was very partisan in committee. In House Judiciary Committee, conservative white male Republicans opposed demographically diverse liberal Democrats. Click here for info about the Judiciary Committee members.

3) Generalizability- does theory apply to different situations and circumstances and time and geographic areas
Examples:
a) Presidential vote model. Can apply to other offices, such as U.S. Congress, governor, state legislature. Applies to any time span; 19th century would have different parties though (Whigs and Democrats, Federalists and Democratic-Republicans). Can apply to different geographic areas, such as other nations (Ohio State professor Bradley Richardson used party identification model in Japan, Netherlands, Germany, France, Britain, Italy).
b) Southern state legislative factions project. Can be generalized to other southern states, even to northern states and the Congress, as the literature indicates. Can be generalized over time, such as 1980 to present. Can it be generalized to other nations having a newly empowered group, such as South Africa?

4) Parsimony- simple with few independent variables, simplest theory is best if everything else is equal
Examples:
a) Presidential Vote models. Social-psychological model is parsimonious, as it has only three predictors--party identification, issues, candidates. The economic dissatisfaction model has even fewer predictors--one.
b) Southern state legislative factions project. It has only two predictors--party and race of legislator. The dependent variable is less parsimonious, as it is not merely ideology, but different types of issues such as education, crime, race issues.

The party identification model. The five presidential elections from 1992 thru 2008 were very competitive with Democrats winning three and Republicans winning two. So if we had no other information about a state like Mississippi, we would predict that a Mississippi survey respondent would have a 50-50 chance of voting Democratic or Republican. Our predictive success improves once we ask a respondent what their party identification (a 7-point scale, which we recode into 5 categories) is. How they vote follows (using the Mississippi Poll data):

We can then apply this theory to the most recent presidential election of 2012. The results were very similar to previous years (independent leaners are omitted in this analysis):

Independent variable is the predictor; it comes first temporally and causally, it causes the dependent variable.

Dependent variable is the effect, it is being caused by the independent variable.

Ideology --------------------------> Presidential Vote
(Independent var.).......................(Dependent Variable)

Example: self-identified conservatives are more likely to vote Republican, compared to self-identified liberals.

Hypothesis test- example with crosstabulations, put independent variable at top, dependent variable at the side. Calculate column percents.

VOTE FOR:	LIBERAL	MODERATE	CONSERVATIVE
BARACK OBAMA	65%	54%	32%
JOHN MCCAIN	35%	46%	68%
	100%	100%	100%
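
Column percentages like those in the table above can be computed from raw cell counts, as in the following sketch. The counts are hypothetical, chosen only so that the resulting percentages match the table; the actual cell counts are not given in these notes, and the function name is an illustration.

```python
# Hypothetical cell counts (not the real survey counts).
# Rows are the dependent variable (vote); columns are the independent variable (ideology).
counts = {
    "Barack Obama": {"Liberal": 130, "Moderate": 162, "Conservative": 128},
    "John McCain":  {"Liberal": 70,  "Moderate": 138, "Conservative": 272},
}

def column_percents(table):
    """Convert counts to percentages that total 100 down each column."""
    columns = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        r: {c: 100 * row[c] / totals[c] for c in columns}
        for r, row in table.items()
    }

percents = column_percents(counts)
for vote, row in percents.items():
    print(vote, {ideology: f"{pct:.0f}%" for ideology, pct in row.items()})

# Compare across a row to test the hypothesis: liberals versus conservatives
obama_gap = percents["Barack Obama"]["Liberal"] - percents["Barack Obama"]["Conservative"]
print(f"Liberal minus conservative Obama vote: {obama_gap:.0f} points")
```

Because the percentages run down each column of the independent variable, the hypothesis is tested by comparing across a row, for example the Obama share among liberals versus conservatives.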

..............Concept <------------------------> Concept
.......................(Relationship between concepts)

............Indicator <------------------------> Indicator
..............(Relationship between indicators; hypothesis testing)

Operationalizing your concept is to select specific indicators of your abstract concepts. Hypothesis testing occurs at the indicator level, and it measures the relationship between the indicators.

At the theoretical level, the two principal concepts are Social Deprivation and Religiosity. The principal hypothesis at the theoretical level is that people who are socially deprived are more likely to be intensely religious than are people who are not socially deprived.

Operationalizing the concepts is to choose valid, specific indicators of those concepts. One indicator of religiosity might be frequency of church attendance. An indicator of social deprivation might be annual family income before taxes. The major problem with operationalizing one's concepts is whether the indicators are valid measures of those theoretical concepts. Is a person who attends church twice a week necessarily more religious than someone who never attends church, but who reads the Bible and prays daily? Is a person with a large family income, but who also has a large family size, necessarily well-off financially? Can you think of more valid indicators of these concepts of social deprivation and religiosity?

Hypothesis Testing measures the relationship between the indicators. Are people with low family incomes more likely to attend church weekly, compared to people with high family incomes? Are people with lower net financial worth more likely to pray daily, compared to people with high net financial worth? If your hypothesis is rejected, there may be two reasons. Perhaps your theory is rejected, or perhaps your indicators are not valid measures of your concepts.

Using the 2004-2010 Mississippi Poll, no significant relationship was found between reported family income and reported frequency of church attendance.

TOPIC FIVE CONTINUED: INTRODUCTION TO RESEARCH PAPER, MODEL, HYPOTHESES

1) Introduction- discuss the importance of your subject. Discuss your initial expectations. Example of gender gap in presidential voting--why are women voting slightly more Democratic than are men? Why is this subject important? Why do you think this female Democratic bias is occurring?

2) Your model and hypotheses. List all five of your hypotheses, and draw your model.

Example of a model and its hypotheses:
Assume that sex is the earliest, independent variable; presidential vote is the latest, dependent variable; ideology and income are the two intervening variables located between sex and vote.

SEX........(H1).......> Ideology .....(H2).....> PRESIDENTIAL
Male or...................(H3)..............................> VOTE
Female.....(H4)........> Income ......(H5)........> (D or R)

The hypotheses are:
H1: Women are more likely to be liberal, compared to men.
H2: Liberals are more likely to vote Democratic for President, compared to conservatives.
H3: Women are more likely to vote Democratic for President, compared to men.
H4: Women are more likely to have lower incomes, compared to men.
H5: Lower income people are more likely to vote Democratic for president, compared to higher income people.

3) Literature review. Need at least 10 academic sources. The articles should be grouped by hypothesis, even if you must discuss the same article more than once. For my on-line bibliography of articles since 1975 in four political science journals, click here. When in the internet, click on EDIT at top of page, then click on FIND (ON THIS PAGE), and then type in the keyword in the FIND WHAT box. Keep clicking on the FIND NEXT box to find multiple articles. Also, use different keywords for each of your variables (concepts).

4) Methods section. Provide information for each of the years of the Mississippi Poll that you are using. For information about the questions included in the polls, click here. Information on the sampling methods used in each year is provided here. Three sample paragraphs for your paper follow:

To test my model, I used information drawn from The Mississippi Poll project, a series of statewide public opinion polls conducted by the Survey Research Unit of the Social Science Research Center (SSRC) at Mississippi State University and directed by political science professor Stephen D. Shaffer. In order to maximize my sample size and therefore minimize my sample error, I combined or pooled telephone surveys conducted in two years-- 2000 and 2004. The 2000 Mississippi Poll surveyed 613 adult Mississippi residents from April 3 to April 16, 2000 and had a response rate of 49%, while the 2004 Mississippi Poll surveyed 523 adult Mississippi residents from April 5 to April 21, 2004 for a response rate of 48%. The two years combined contained only 765 likely voters- respondents whose responses to three questionnaire items indicated that they were likely to vote in the presidential election, and to vote for candidates of the two major parties. With 765 likely voters interviewed, the sample error is 3.6%, which means that if every Mississippi likely voter had been interviewed, the results could differ from those reported here by as much as 3.6%. The pooled sample was adjusted or weighted by demographic characteristics to ensure that social groups less likely to answer the surveys or to own telephones were also represented in the sample in rough proportion to their presence in the state population. In both years, a random sampling technique was used to select the households and each individual within the household to be interviewed, and no substitutions were permitted. The SSRC's Computer Assisted Telephone Interviewing System (CATI) was used to collect the data.

I relied on four variables included in both years of the Mississippi Poll. Sex is very straightforward, while income was measured by reported total family income before taxes in the year before each survey. The presidential vote asked respondents six months before the election which of the two major party candidates they planned to vote for if the election were held today. Ideology was a self-identification question, asking respondents the following questions: "What about your political beliefs? Do you consider yourself very liberal, somewhat liberal, moderate or middle of the road, somewhat conservative, or very conservative?"

In order to have enough people to analyze using multivariate tables, I recoded or combined categories of two of the variables. Eight income categories were recoded into three levels--low income was defined as families making less than $20,000 a year, middle income was considered as $20-40,000 per year, and high income included families making over $40,000 annually. Five ideological self-identification categories were combined into three groups-- liberals included those considering themselves as "very" or "somewhat" liberal, conservatives were those identifying themselves as "somewhat" or "very" conservative, and the middle category of "moderate/middle of the road" constituted an intermediate "moderate" grouping. Sex and presidential vote already had only two categories for each, so they did not have to be recoded.
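
The recoding described in the sample paragraph above can be sketched as a lookup table. The category labels and function names are hypothetical, since the poll's actual codes are not listed here. The income function takes a dollar amount purely for illustration (the poll records eight income categories), and the treatment of the exact $20,000 and $40,000 boundaries is an assumption.

```python
# Collapse five ideology categories into three (labels are assumed, not the poll's codes)
IDEOLOGY_RECODE = {
    "very liberal": "liberal",
    "somewhat liberal": "liberal",
    "moderate or middle of the road": "moderate",
    "somewhat conservative": "conservative",
    "very conservative": "conservative",
}

def recode_ideology(response):
    """Map a five-category ideology response onto three groups."""
    return IDEOLOGY_RECODE[response.strip().lower()]

def recode_income(dollars):
    """Map family income onto low / middle / high.

    Boundary handling at exactly 20,000 and 40,000 is an assumption.
    """
    if dollars < 20_000:
        return "low"
    if dollars <= 40_000:
        return "middle"
    return "high"

print(recode_ideology("Somewhat liberal"))
print(recode_income(15_000), recode_income(30_000), recode_income(55_000))
```

Combining categories this way leaves more respondents in each cell of a multivariate table, at the cost of some detail in the original measures.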

5) Findings-- bivariate. Test each of your 5 hypotheses using crosstabs. Compare percentages using complete sentences, which test your hypotheses. Mention the direction of the relationship, the magnitude of the relation using gamma or average percentage difference, and statistical significance level using chi-squared. Also, draw all tables and provide variable and value labels, and column percents and totals.

Gamma = -.04
Chi-squared significance level > .05
Note: Percentages total 100% down each column.
Source: 2000 and 2004 Mississippi Polls, conducted by Mississippi State University.

Hypothesis 3 of my model states that women will be more likely to vote Democratic for president, compared to men. In the 2000 and 2004 Mississippi Polls, 43% of women indicated that they intended to vote for Democratic presidential candidates, compared to a slightly smaller 41% of men who indicated an intended Democratic vote. However, this percentage difference in Democratic vote between the sexes is only 2%, and the gamma value reflecting the magnitude of the relationship between sex and the presidential vote is a mere -.04. Furthermore, the Chi-squared statistic is not significant at the .05 level, indicating that we cannot generalize this weak relationship between sex and the presidential vote, found in the 2000 and 2004 statewide polls, to the entire population. Hence, my hypothesis that women are more likely than men to vote Democratic for president is rejected.
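
The statistics cited in the sample paragraph above can be illustrated for a two-by-two table. This is a sketch with hypothetical counts (100 men and 100 women, matching only the reported 41% and 43%); the real cell counts are not given here, so the chi-squared value below is illustrative and would differ with the actual sample size. The function names are assumptions, and the sign of gamma depends on how the categories are ordered.

```python
def gamma_2x2(a, b, c, d):
    """Goodman-Kruskal gamma for the 2x2 table [[a, b], [c, d]]."""
    concordant = a * d
    discordant = b * c
    return (concordant - discordant) / (concordant + discordant)

def chi_squared(table):
    """Pearson chi-squared statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts. Rows: Democratic, Republican. Columns: men, women.
table = [[41, 43],
         [59, 57]]

gamma = gamma_2x2(41, 43, 59, 57)
chi2 = chi_squared(table)

# Standard critical value for 1 degree of freedom at the .05 level
CRITICAL_05_DF1 = 3.841
significant = chi2 > CRITICAL_05_DF1

print(f"gamma = {gamma:.2f}")
print(f"chi-squared = {chi2:.3f}, significant at .05: {significant}")
```

With these counts gamma rounds to -.04, the same weak magnitude reported above, and the chi-squared statistic falls well below the critical value, so the relationship would not be generalized to the population.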

6) Findings- multivariate. At least control for your two intervening variables. Provide information listed in 5. What do these multivariate tables tell you about which of the variables is important in influencing the dependent variable, and about how important each is.

7) Conclusions- Redraw your model, discuss your findings and literature, suggestions for future research.

8) References- alphabetize your references by authors' last name. Give full citations for scholarly articles, books, and other citations.

Review the Mississippi Poll codebook, and choose four variables that will constitute the model that you will do your research paper on. Now, draw up the model, and type the exact wording of your five hypotheses.

The methods used in each year of the Mississippi Poll, which constitute part of the Methods section of your paper, are found here.

TOPIC SIX: RESEARCH DESIGN

1) Problem Formulation- what are you studying, why is it important. Rivenbark article, casino gambling, importance due to tax regressivity, which hurts the poor, plus it can cause addiction.

2) Literature review- thorough. Political science journals are: American Political Science Review, American Journal of Political Science, Journal of Politics, American Politics Quarterly, Public Opinion Quarterly. For a list of on-line political science articles, click here. Another great tool for your literature review where you can search for studies on your hypotheses is at jstor.

3) Identify Unit of Analysis- what are you collecting data on, getting information about what units.

The four units of analysis are: Individual, county, state, nation
a) Individual level examples are public opinion polls.
b) County level example is a public policy study examining spending in each of Mississippi's 82 counties.
c) State level example is a public policy study examining spending in each of the nation's 50 states.
d) Nation unit of analysis example may be relating each nation's suicide rate to the absence of Catholicism in its population.

Test your ability to identify the unit of analysis of ten different studies by going back to the directory for this class, and accessing one of the sample tests for Test 1 for the similar Research Methods class. They are also here.

4) Design data collection mode- survey, roll call, aggregate (unit of analysis above the individual), content analysis:
a) Survey is a public opinion survey. It can be of the mass population, or of a more specialized group, such as government workers.
b) Roll call mode deals with congressional or state legislative votes on public issues, and often includes demographic characteristics of their districts' constituents.
c) Aggregate mode deals with a level of analysis higher than the individual. It deals with cases that combine numbers of individuals, such as counties, states, etc. The data are often secondary data analysis, collected by government agencies.
d) Content analysis is a study of the characteristics of messages, such as how ideologically biased is the mass media, and how many liberal or conservative themes are voiced by a President or governor

5) Pre-test survey anticipates validity problems with indicators, and suggests variables you left out. For a statewide public opinion poll of 600 Mississippians who are asked 100 questions, you might ask a random sample of 25 Starkville residents the 100 questions, and then ask the interviewers whether the respondents had difficulty answering any of the questions, and if so why.

6) Data collection, surveys use CATI system, or secondary data analysis (use existing dataset).
CATI stands for Computer-Assisted Telephone Interviewing system, and is used for the researcher to collect her own data on an original study.
Secondary data analysis relies on existing data sources, such as the University of Michigan National Election Studies conducted every two years, or the MSU Mississippi Poll conducted every two years.

7) Data reduction, usually obsolete with CATI, often needed with in-person and mail surveys; enter data into SPSS program.

8) Design statistical analysis technique, do a simple one first such as crosstabs.

10) Conclusions- what you found, so what, importance, theory upheld or rejected, future research directions.

TOPIC SEVEN: LEVELS OF MEASUREMENT

NOMINAL- lowest level of measurement, mere classification. No ability to order the categories.
Examples are religion. Use crosstabulations.

ORDINAL- able to order the categories of the variable in terms of a category having more of something than the next category. But can't determine how much more of that quality one category has compared to another.
Example is rating job performance of public officials into excellent, good, fair, or poor categories.

INTERVAL- able to order the categories, and also determine how much of the quality the category has. Usually has numbers that have meaning to denote how much of the quality each category has.
Example is income. Use regression techniques.

Test your ability to classify indicators by nominal, ordinal, and interval levels of measurement by turning to the sample tests, test 1. Click here.

TOPIC EIGHT: RELIABILITY AND VALIDITY

RELIABILITY

Definition- repeated measurements of a concept (the indicator) should yield similar results.

1) Test-Retest- using the same indicator on the same people at two or more time points. Should have consistent responses at both time points.

(Note: the following table is derived from Herbert B. Asher's Presidential Elections and American Politics, 5th edition, page 71; Brooks/Cole co., 1992)

How much stability is there in this table? How many people have given the same response at both time points? Count the number of people in the diagonal. The number remaining stable in attitudes = (9 + 13 + 4 + 5 + 5 + 7 + 6) = 49. The total number of people in the table is 100. Hence, 49% of the sample has remained stable in attitudes. Is 49% high or low reliability? The stable percent must be compared to chance alone. Chance stability is the number of stable cells, divided by the total number of cells in the table. Hence, chance stability is 7 / 49 = 14%. Since 49% is significantly higher than 14%, this indicator is reliable.

Conduct a test-retest reliability test for the party identification indicator, as measured at the individual level in a panel between 1982 and 1997. (Source of this info is: Political Behavior of the American Electorate, 12th edition, by William H. Flanigan and Nancy Zingale, p. 104; data originally are from the Youth-Parent Socialization Panel Study, 1965-1997, Youth Wave, data provided by the ICPSR). First, construct a cross-tabulation that shows how many people (of 100 total) fell into each of 9 cells in a table; the two years are 1982 and 1997; this party identification indicator has only three categories- Democrats, Independents, and Republicans. The cell entries follow:

How many people kept the same partisanship at both time points? Add up 23 + 27 + 17 = 67. What percentage of stability do you have? 67/100 = 67%. What is chance stability? 3 stable cells / 9 total cells = 33%. Is actual stability significantly greater than chance alone? Yes, since 67% is much higher than 33%. So the party identification indicator remains a reliable indicator.
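The arithmetic in the two test-retest examples above can be sketched in Python. This is only an illustration: the function name `stability` is made up here, and "chance" follows the rule of thumb used in these notes (stable cells divided by total cells), not a formal significance test.

```python
def stability(diagonal_counts, total_respondents, n_categories):
    """Return (observed, chance) stability as proportions.

    Chance stability uses the rule of thumb from these notes:
    number of stable (diagonal) cells divided by total cells.
    """
    observed = sum(diagonal_counts) / total_respondents
    chance = n_categories / (n_categories ** 2)
    return observed, chance


# Seven-category example: diagonal cells as listed in the notes, 100 people.
seven_observed, seven_chance = stability([9, 13, 4, 5, 5, 7, 6], 100, 7)

# Three-category party identification panel, 1982 to 1997, 100 people.
party_observed, party_chance = stability([23, 27, 17], 100, 3)

print(f"7 categories: observed {seven_observed:.0%}, chance {seven_chance:.0%}")
print(f"3 categories: observed {party_observed:.0%}, chance {party_chance:.0%}")
```

The same function applies to the alternate forms and split-half crosstabulations, since both compare the share of people in the diagonal to the share of cells that are diagonal.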

2) Alternate Forms (Parallel Forms)- using two or more indicators on the same people at one time point. Should have consistent responses for both indicators.

Consistent responses for both indicators are Democrats who believe that the Democratic party is best for people like themselves, Republicans who believe that the Republican party is best for people like themselves, and Independents who believe that both parties are equally good for people like themselves. The number of consistent responses is (172 + 40 + 157) = 369.

The total number of people is 519. The percentage of people who give consistent responses is:

369 / 519 = 71%. How reliable is the party identification indicator compared to chance alone? Chance is the number of consistent cells divided by the total number of cells: 3 / 9 = 33%. Since 71% is significantly greater than 33%, the party identification indicator is reliable.

3) Split Half- using multiple indicators of a concept on the same people at one time point. Forms two scales with each combining people's responses on half of the indicators. The two scales' scores should be consistent for people.

Health care example. In 2004 the Mississippi Poll included seven questions about how important people thought a number of health care issues were, and they rated them from scores of 1 for Very Important to scores of 4 for Not Important. An item on Recruiting and Retaining Doctors was not highly related to the other six items, so we excluded it from analysis. The other six items were:

These six indicators were divided into two groups: Group A included items 1, 3, and 5; and Group B included items 2, 4, and 6. Responses to all three items in each group were added together. Since each item was coded to range from 1 to 4, the scale for each group ranges from 3 to 12. The Pearson correlation between the two scales is .71, which is pretty respectable.
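A minimal sketch of the split-half procedure follows. The eight respondents below are invented purely for illustration (they are not Mississippi Poll data), and `pearson_r` is a hand-written helper rather than an SPSS routine.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(variation_x * variation_y)


# Hypothetical respondents (rows) answering six items, each coded 1 to 4.
responses = [
    [1, 1, 2, 1, 1, 2],
    [2, 1, 2, 2, 1, 1],
    [1, 2, 1, 1, 2, 1],
    [3, 2, 3, 3, 2, 3],
    [4, 3, 4, 4, 4, 3],
    [2, 2, 1, 2, 3, 2],
    [1, 1, 1, 2, 1, 1],
    [3, 4, 3, 2, 3, 4],
]

# Group A sums items 1, 3, 5; Group B sums items 2, 4, 6 (each scale runs 3 to 12).
scale_a = [row[0] + row[2] + row[4] for row in responses]
scale_b = [row[1] + row[3] + row[5] for row in responses]

split_half_r = pearson_r(scale_a, scale_b)
print(f"Split-half correlation: {split_half_r:.2f}")
```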

Another way of testing consistency is with a crosstabulation. Looking at the frequency distributions of each scale, I combined each scale's codes as follows: 3 and 4 were coded as High Priority; 5 and 6 were coded as Medium; 7 thru 12 were coded as Low Priority. The crosstabulation follows:

Notice that 292 people (141 + 104 + 47) gave consistent responses to both of the scales. They fall in the diagonal, being high-high, medium-medium, or low-low. The total number of people in the table is 458. Therefore, 292/458 people gave consistent responses, or 64% of the sample. Chance alone would predict about one-third or 33%. So the six indicators of the importance of health care demonstrate some reliability.

4) Cronbach's Alpha- used for multi-indicator indexes, calculates how reliable the component indicators are. Ranges from 0 for unreliable to 1 for most reliable. The Cronbach's Alpha for the six health care items included in the 2004 Mississippi Poll analysis discussed earlier was .80.
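For readers who want to see how Cronbach's Alpha is computed, here is a short sketch using the standard formula: alpha = (k / (k - 1)) * (1 - sum of item variances / variance of the total score), where k is the number of items. The responses are hypothetical, so the result will not match the .80 reported for the actual poll.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical respondents (rows) answering six items, each coded 1 to 4.
responses = [
    [1, 1, 2, 1, 1, 2],
    [2, 1, 2, 2, 1, 1],
    [1, 2, 1, 1, 2, 1],
    [3, 2, 3, 3, 2, 3],
    [4, 3, 4, 4, 4, 3],
    [2, 2, 1, 2, 3, 2],
    [1, 1, 1, 2, 1, 1],
    [3, 4, 3, 2, 3, 4],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```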

VALIDITY

1) Face Validity- on its face, it appears to be valid. Simple concepts, such as a ruler. Just use it.

2) Construct (Criterion) Validity- relate your questionable indicator to more well established indicators, and see whether it behaves as you expect it to behave.

Note: Cell entries are percentage vote for Republican candidate among each of the seven party identification categories. These data are from the Mississippi Poll.

Our expectations are that the percentage Republican vote would increase steadily as one moves from the most Democratic party identification category of Strong Democrat to the most Republican party identification category of Strong Republican. Examining the 1988 presidential vote indicator, we see a steady increase in Republican vote as we move from Strong Dem. to Strong Rep., with two exceptions. Only 87% of Weak Republicans voted for Republican Bush, while 94% of Independent Republicans voted for Bush. Those two categories should have reversed percentages, so circle both of those cells, since they involve validity problems with the party identification indicator. Examine the 1996 presidential vote and you find two sets of validity problems among Democrats and Republicans. Circle the four cells having validity problems.

Repeat this validity test for the other vote indicators, and discuss the validity problems with the party identification indicator that you find.

3) Convergent-Discriminant Validity Test- different measures of the same concept should yield similar results; the same measures of different concepts should yield different results. Examine correlation matrix.

Note: data are based on the 1981-1999 Mississippi Poll, with some fictitious data included to simplify table interpretation.

Convergent-discriminant validity tests help to determine if your multiple indicators of one concept are actually measuring only one concept, or whether your indicators are measuring more than one concept (a multi-dimensional concept). Generate a correlation matrix as indicated above, and remember that the correlations range from 0 for no relationship to 1 for highest relationship. Then, pick out the highest correlations in order of their size. In the above table, the validity test shows that spending is a multi-dimensional concept involving four separate dimensions (concepts). Those dimensions are: social welfare (poor, day care, health), education (elementary-secondary and college), economic development (industry, tourism), and public order (police, prisons). The environment and highways indicators do not relate to any of these four, above the .2 correlation level. Hence, any researcher combining all eleven spending indicators into one scale that supposedly measures one concept of public support for government programs has validity problems, since there are four dimensions rather than one dimension of state spending.

In this updated correlation matrix, note that only two dimensions emerge, and that three spending items are unrelated to both dimensions. The highest correlations are between health care and poverty spending (.40) and between health care and universities (.42). Elementary/secondary and universities are correlated at .31. The three other correlations between these four spending items range from .26 to .30 in value. These items of elementary-secondary, universities, health care, and poverty spending form one dimension. The second dimension is tourism and industry, which are correlated at .36. The three spending items that are uncorrelated with these two dimensions are police and highways, where the correlations with other spending items never exceed .18, and the environment (correlations never exceed .23). Unlike ten years ago, people appear to see the relevance of education for social welfare programs, in that people with a better education are less likely to need social welfare programs. Also note that we no longer ask the prisons spending item, and we did not ask the day care spending item in the early 2000s.

Update with 2004, 2006, and 2008 data. The correlation matrix (not shown) again shows only two dimensions. One is a social welfare-education dimension with the poor, health care, day care (item is now being asked again), elementary and secondary education, and higher education spending items. The second dimension is tourism and industry. Other items like police and roads are uncorrelated with any of these items (though environmental spending has a slight correlation with the items in the first dimension). Also note that ideological self-identification is correlated with the first dimension items, but not the second dimension items.

Sample test question on convergent-discriminant validity test:

Conduct a convergent-discriminant validity test with the 10 state government spending items listed below, as measured in the most recent pooled Mississippi Poll dataset from 2004, 2006, 2008, and 2010.

CORRELATION MATRIX, SPENDING ITEMS, 2004-2010

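	Poor	Health Care	Elem./Second. Schools	Universities	Day Care	Environment	Tourism	Industry	Roads
Poor
Health Care	.45
Elem./Second. Schools	.32	.31
Universities	.23	.41	.39
Day Care	.37	.44	.33	.35
Environment	.32	.27	.23	.21	.28
Tourism	.01	.06	.11	.10	.12	.08
Industry	.11	.19	.10	.19	.09	.16	.23
Roads	(see note)						.15	.17
Police	(see note)						.13	.15	.21

Note: For Roads and Police, only five of the six correlations with the social welfare items (Poor through Environment) appear in the source, so their column positions are uncertain. In source order they are .17, .15, .17, .21, .13 for Roads and .13, .16, .13, .18, .17 for Police.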

How many dimensions do you get? I'd say three. What do they pertain to? I'd say social welfare (poor, health, schools, universities, day care, environment), economic development (tourism, industry), and public safety (roads, police). What are the intra-cluster item correlations? For social welfare, take the average of the 15 correlations among these 6 items; this value is 4.91/15 = .33. For economic development, it is .23. For public safety, it is .21. The inter-cluster correlations are as follows. For social welfare-economic development, the average of the 12 correlations is 1.32/12 = .11. For social welfare-public safety, it is 1.93/12 = .16. For economic development-public safety, it is (.15 + .17 + .13 + .15)/4 = .60/4 = .15. Intra-cluster averages that exceed the inter-cluster averages support treating the clusters as separate dimensions.
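The cluster averages can be checked with a few lines of Python. The correlations are copied from the 2004-2010 matrix; the social welfare-public safety average is left out because not all twelve of those correlations are legible in the matrix. The variable names are chosen here for illustration.

```python
from statistics import mean

# 15 correlations among the six social welfare items (lower triangle, row by row).
social_welfare_intra = [
    .45,
    .32, .31,
    .23, .41, .39,
    .37, .44, .33, .35,
    .32, .27, .23, .21, .28,
]

# 12 correlations between the six social welfare items and tourism, then industry.
welfare_vs_econ = [
    .01, .06, .11, .10, .12, .08,   # tourism row
    .11, .19, .10, .19, .09, .16,   # industry row
]

# 4 correlations between (tourism, industry) and (roads, police).
econ_vs_safety = [.15, .17, .13, .15]

intra_welfare = round(mean(social_welfare_intra), 2)
inter_welfare_econ = round(mean(welfare_vs_econ), 2)
inter_econ_safety = round(mean(econ_vs_safety), 2)

print(intra_welfare, inter_welfare_econ, inter_econ_safety)
```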

4) Factor Analysis- can be used as a validity test for testing whether a concept is multi-dimensional.

2004 health care example. The six relevant items were subjected to a Principal Components Factor Analysis with Varimax Rotation. Only 457 of the 523 respondents were analyzed, since others lacked responses on one or more of the six items. Thus, 13% of the respondents were excluded from this factor analysis. Only one factor emerged, explaining 51% of the variance in all six items. Other factors explained less of the variance than each item did, so they were dropped from the analysis. The factor loadings for each item ranged from a low of .66 for public education to encourage nutrition and exercise to a high of .78 for providing health care for adults who can't afford it.

These results suggest that it is valid to combine these six health care importance indicators into one scale measuring one dimension. If we had included the third health care item on the importance of recruiting and retaining doctors in Mississippi, we would have still ended up with one dimension, but the loading of that item on the factor was only .47, clearly the lowest of the factor loadings. This suggests that that item does not measure the one dimension very well, so we excluded it from the scale.
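To make the idea of a single factor "explaining" a share of the variance more concrete, here is a rough sketch of the principal components step only (no rotation, which does not matter when one factor is kept). Because the correlations among the six health care items are not given in these notes, the sketch borrows the six social welfare spending correlations from the 2004-2010 matrix instead, so its numbers will not match the 51% or the loadings reported above. The function `first_component` is a hand-written power iteration, not SPSS's procedure.

```python
from math import sqrt

ITEMS = ["Poor", "Health Care", "Elem./Second. Schools",
         "Universities", "Day Care", "Environment"]

# Correlation matrix for the six spending items (1.00 on the diagonal).
R = [
    [1.00, .45, .32, .23, .37, .32],
    [.45, 1.00, .31, .41, .44, .27],
    [.32, .31, 1.00, .39, .33, .23],
    [.23, .41, .39, 1.00, .35, .21],
    [.37, .44, .33, .35, 1.00, .28],
    [.32, .27, .23, .21, .28, 1.00],
]


def first_component(matrix, iterations=200):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    n = len(matrix)
    vector = [1.0 / sqrt(n)] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(value * value for value in product))
        vector = [value / norm for value in product]
    # Rayleigh quotient gives the eigenvalue for the converged vector.
    eigenvalue = sum(
        vector[i] * sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)
    )
    return eigenvalue, vector


eigenvalue, vector = first_component(R)
share_of_variance = eigenvalue / len(R)          # proportion of total variance
loadings = [sqrt(eigenvalue) * v for v in vector]  # item-component correlations

print(f"First component explains {share_of_variance:.0%} of the variance")
for item, loading in zip(ITEMS, loadings):
    print(f"{item}: {loading:.2f}")
```

An eigenvalue below 1 means a factor explains less variance than a single item does, which is the rule the notes describe for dropping the remaining factors.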

TOPIC NINE: SURVEY RESEARCH--SAMPLING AND SURVEY TYPES

(Source of table: Survey Research Methods, by Earl R. Babbie, Wadsworth Publishing Co., 1973, page 376)

Types of Surveys: In-person; Telephone; Mail; Mixed Methods; briefly discuss each.

For further information, see Mail and Telephone Surveys, by Don Dillman, John Wiley and Sons Co, 1978.

PROS AND CONS OF SURVEY TYPES

In-person-- pros:
1) Observe and clear up R's confusion
2) Obtain objective information about R's (respondent) lifestyle
3) Visual Aids use
4) Establish rapport? High response rate?

Telephone-- pros:
1) Quick
2) Cost effective
3) Centralized interviewing- no fraud
4) Interviewer safety

Telephone-- cons:
1) Excludes those without telephones or owning only cell phones
2) No visual aids-- voice dependent

Mail-- cons:
1) Excludes illiterates
2) Can't control who answers survey
3) Can't control order of questions answered
4) Slow
5) Incomplete forms
6) Low response rate?

Probability Sampling. Definition of probability sample: each population unit has some chance of being in the sample, and that chance can be calculated. Types of probability samples:

Sampling within the household:
1) Kish method, ask household resident to list first names of all adults, then toss dice to select adult to interview;
2) Troldahl-Carter method: multiple selection tables asking number of adults and number of men in household;
3) Sociological last birthday method; problem that it oversamples women.

A nice example of the results of weighting your sample, which also shows the growing problem of cell phone use among young adults, is provided on-line.

By 2014 the Mississippi poll was using a dual frame of sampling of both land lines and cell phones, so the sampling bias against young adults was minimized. See the analysis of that year's sample.

TOPIC TEN: SURVEY RESEARCH-QUESTIONNAIRE CONSTRUCTION, IMPLEMENTATION

ACTUAL EXAMPLES:
(From Survey Research for Public Administration, by David H. Folz, Sage Publishers)

1) Perceptions of local problems- p. 5, 22, 107
A) No problem, Minor Problem, Major Problem
B) Most serious problem
C) Agree-disagree with problem statements

3) Policy preferences- p. 5, 22
A) Single most important change
B) How improve quality of life- not important, somewhat important, very important
C) One policy- oppose or favor, strong or some.

4) Funding priorities- p. 5, 22
A) Single choice, reduce funding first
B) City spending- too little, about right, too much

6) Citizen usage satisfaction- p. 8
A) Filter question, did they use service?
B) Satisfied or dissatisfied, very or somewhat
C) How often policy met expectations

7) Business usage satisfaction- p. 6
A) Survey gov't workers about complaints heard
B) Survey businesses about specific problems, Overall satisfaction

8) Wording problems- p. 99
A) Loaded or leading
B) Double barreled
C) Too complex, double negative (Miss Poll)
D) Unbalanced alternatives (Blacks treated same as whites or worse)
E) Acquiescence bias (agreement bias)- especially on agree-disagree items
F) Sensitive items- use income categories
G) Social desirability- race items

Review the most recent results from The Mississippi Poll, which also examines attitude changes over the past few decades.

TOPIC ELEVEN: REVIEW OF MATERIAL FOR FIRST EXAM

Sample outline of first test from one year ago. Each question is ten points. THIS YEAR'S SAMPLE TO BE POSTED LATER.

TOPIC TWELVE: DESCRIPTIVE STATISTICS

DISPERSION- diversity, how divided or united the cases are, the form of the distribution (interval level)

Identify the mode and median categories in each of the following examples drawn from the 2010 Mississippi Poll:

Punishment favored in cases of first-degree murder:
Death penalty............... = 51%
Life without parole......... = 42%
A shorter jail term than life = 7%

How rate President Obama's job performance:
Excellent = 14%
Good .... = 24%
Fair .... = 23%
Poor .... = 39%

Ideological self-identification:
Very Liberal......... = 6%
Somewhat Liberal..... = 8%
Moderate............. = 34%
Somewhat Conservative = 26%
Very Conservative.... = 26%

Education Level:
High School Dropout. = 23%
High School Graduate = 30%
Some College........ = 29%
College Graduate.... = 13%
Some Graduate Work.. = 5%

Annual Family Income
Under $10,000 = 14%
$10-20,000... = 11%
$20-30,000... = 14%
$30-40,000... = 16%
$40-50,000... = 10%
$50-60,000... = 6%
$60-70,000... = 8%
Over $70,000. = 21%

Likelihood of Living in the Current Community in Five Years:
Definitely No. = 8%
Probably No... = 13%
Probably Yes.. = 30%
Definitely Yes = 49%

Population of the Community You Live In:
Farm or ranch = 11%
Rural area... = 30%
Under 2,500.. = 12%
2,500-10,000. = 18%
10,000-50,000 = 22%
Over 50,000.. = 7%
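As a check on your answers, the mode and median categories can be found mechanically from a percentage distribution. The sketch below uses two of the 2010 distributions listed above; the function name is invented for illustration. Remember that the median only makes sense when the categories can be ordered (ordinal level or higher).

```python
def mode_and_median(distribution):
    """distribution: ordered list of (category, percent) pairs.

    The mode is the category with the largest percent; the median is the
    first category at which the cumulative percent reaches half the total.
    """
    mode = max(distribution, key=lambda pair: pair[1])[0]
    half = sum(percent for _, percent in distribution) / 2
    cumulative = 0
    for category, percent in distribution:
        cumulative += percent
        if cumulative >= half:
            return mode, category
    return mode, None  # only reached for an empty distribution


obama_rating = [("Excellent", 14), ("Good", 24), ("Fair", 23), ("Poor", 39)]
ideology = [
    ("Very Liberal", 6),
    ("Somewhat Liberal", 8),
    ("Moderate", 34),
    ("Somewhat Conservative", 26),
    ("Very Conservative", 26),
]

obama_mode, obama_median = mode_and_median(obama_rating)
ideology_mode, ideology_median = mode_and_median(ideology)
print(obama_mode, obama_median)
print(ideology_mode, ideology_median)
```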

Now identify the mode and median for the more recent results in the 2014 Mississippi Poll.

MEANS- What follows is a verbal interpretation of means, using the ideological self-identification and ideological perception questions.

Question wording: "What about your political beliefs? Do you consider yourself: very liberal, somewhat liberal, moderate or middle of the road, somewhat conservative, or very conservative?" Question wordings: "Please label the following political figures as very liberal, somewhat liberal, moderate (or middle of the road), somewhat conservative, or very conservative." "Democratic Presidential hopeful Hillary Clinton." "Democratic Presidential hopeful Barack Obama." "Republican Presidential hopeful John McCain." Ideological perception questions were not asked for the U.S. senate candidates. However, such questions were asked in previous years' polls for Musgrove, for when he was lieutenant governor (1998) and governor (2000, 2002 polls). For comparison purposes, we also include the perceptions of previous Democratic presidential candidates, asked in previous Mississippi polls.

The values below are "means" or averages for the ideological variables, all of which are coded as 1 for very liberal, 2 for somewhat liberal, 3 for moderate, 4 for somewhat conservative, and 5 for very conservative.

Now identify the means in words of the perceived ideologies of candidates in the more recent Mississippi polls:

RANGE is the distance between the extreme categories. It requires an interval level measurement. Thus, merely subtract the lowest number representing the category at one end of the indicator from the highest number representing the category at the other end of the indicator. Examples follow:

A test of your knowledge of VARIANCE. Remember the example of Mississippi's party organization members in 2001. The mean for Democrats was 2.69, which was between somewhat liberal and moderate, but closer to moderate. The mean for Republicans was 4.45, which was between somewhat conservative and very conservative, but closer to somewhat conservative. However, remember the form of the distribution. Nearly 10% of Democrats were very conservative, and almost 20% were somewhat conservative, so there was considerable diversity or dispersion of ideologies in the Democratic party. Therefore, the variance of Democrats' ideology scores was a relatively higher number, a variance of 1.351. For Republicans on the other hand, less than 2% of them were very liberal or somewhat liberal. So there was much unity and clustering of ideological scores for the Republicans, and little diversity or dispersion of scores. Therefore, the variance of Republicans' ideology scores was a relatively low number, a variance of .493. Therefore, Democrats were more divided in ideology (a higher variance), and Republicans were more united on ideology (a lower variance).
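The link between the form of a distribution and its variance can be illustrated with a short calculation. The two percentage distributions below are hypothetical (the full 2001 party organization distributions are not reproduced in these notes), and the sketch divides by N rather than N - 1, so values from a statistics package may differ slightly.

```python
def mean_and_variance(percent_by_code):
    """Mean and variance of a variable given {code: percent of cases}."""
    total = sum(percent_by_code.values())
    mean = sum(code * pct for code, pct in percent_by_code.items()) / total
    variance = sum(
        pct * (code - mean) ** 2 for code, pct in percent_by_code.items()
    ) / total
    return mean, variance


# Hypothetical distributions on the 1 (very liberal) to 5 (very conservative) scale.
dispersed_group = {1: 20, 2: 25, 3: 25, 4: 20, 5: 10}   # spread across categories
clustered_group = {1: 1, 2: 1, 3: 8, 4: 40, 5: 50}      # bunched at one end

dispersed_mean, dispersed_variance = mean_and_variance(dispersed_group)
clustered_mean, clustered_variance = mean_and_variance(clustered_group)

print(f"Dispersed: mean {dispersed_mean:.2f}, variance {dispersed_variance:.3f}")
print(f"Clustered: mean {clustered_mean:.2f}, variance {clustered_variance:.3f}")
```

The more divided group has the higher variance, which is the same logic applied to Democrats and Republicans above.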

Test Question 4A. (5 points) The following two questions are based on the last three Mississippi Polls, all conducted in the 21st century. Using the statistic of variance, are Democrats or Republicans more divided on each of the following five variables:

Test Question 4B. (5 points) Using the statistic of variance, are whites or blacks more united on each of the following five variables:

TOPIC THIRTEEN: CONTINGENCY TABLES- BIVARIATE RELATIONS

Contingency tables can be used with nominal level measures, though we usually employ ordinal or interval level data having a limited number of categories. Contingency tables permit you to view the data in an easily interpretable and understood manner.

Percentage Difference is a measure of strength of the relationship. It ranges from a low of 0 to a high of 100. Always put the independent variable at the top of the table, and the dependent variable at the side. Then, calculate the column percentages. For ordinal and interval level indicators, compare the column percents (for the two extreme categories of the predictor) across the same category of your dependent variable. Make this comparison for the two extreme categories of your dependent variable, and take the average. If one of these comparisons is contrary to your hypothesis, make the difference a negative.
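One reading of this procedure can be sketched in Python, using the column percentages from Table 1 of these notes (age by health care spending) and the hypothesis that younger people are more likely to favor more spending. The variable names are illustrative, and the sign convention below is an interpretation of the rule stated above.

```python
# Column percentages for the two extreme categories of the predictor (age).
youngest = {"Less": 10, "Same": 18, "More": 72}   # ages 18-35
oldest = {"Less": 8, "Same": 34, "More": 58}      # ages 56 and over

# Hypothesis: the young favor MORE spending, so we expect the youngest group
# to be higher on "More" and the oldest group to be higher on "Less".
diff_more = youngest["More"] - oldest["More"]   # upheld, so positive
diff_less = oldest["Less"] - youngest["Less"]   # contrary, so negative

# Average the comparisons for the two extreme categories of the dependent variable.
percentage_difference = (diff_more + diff_less) / 2
print(diff_more, diff_less, percentage_difference)
```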

Other Measures of Association to use (Source: Research Methods in Political Science: An Introduction Using MicroCase, 2nd edition, by Michael Corbett; p. 139-144; copyrighted by MicroCase Corporation):

All measures range from 0 for no relationship to 1 for perfect relationship. A positive or negative sign is a function of the direction of the coding of the variables and whether your hypothesis is upheld.

The following are nine examples of bivariate tables. In class, we will review three features of each table.
1) Is the relationship statistically significant? Is Chi-squared significant at the .05 level or below?
2) What is the magnitude of the relationship? That is, what is the gamma value? To determine the relative importance of the predictors-- which predictor is most and least important-- use the absolute value of the gamma, and ignore the sign.
3) What is the direction of the relationship? That is, devise a hypothesis for each table that reflects how the two variables are related. Example for table 1: People younger in age are more likely to favor spending more on health care, compared to people older in age.
Note: The tables in your research paper should look like these tables in format.

Table 1

Age Differences in State Spending Preferences for Health Care

AGE

STATE SPENDING DESIRED:	18-35	36-55	56 and Over
Less	10%	7%	8%
Same	18%	18%	34%
More	72%	75%	58%
N Size	(555)	(571)	(524)

Gamma = -.16
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.
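For those curious about where gamma comes from, it is (concordant pairs - discordant pairs) / (concordant pairs + discordant pairs). The sketch below rebuilds approximate cell counts for Table 1 from the rounded column percentages and N sizes, so the result should land close to, but not exactly on, the reported -.16. The function is a hand-written illustration, not SPSS output.

```python
def gamma(table):
    """Goodman-Kruskal gamma for a table of counts.

    table[i][j] is the count in row i and column j, with both the rows and
    the columns listed in the order they appear in the printed table.
    """
    rows, cols = len(table), len(table[0])
    concordant = discordant = 0.0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):      # cells in lower rows
                for m in range(cols):
                    if m > j:                 # lower and to the right
                        concordant += table[i][j] * table[k][m]
                    elif m < j:               # lower and to the left
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


# Table 1: rows are Less, Same, More; columns are 18-35, 36-55, 56 and over.
percents = [[10, 7, 8], [18, 18, 34], [72, 75, 58]]
n_sizes = [555, 571, 524]

# Approximate counts: column percent times that column's N size.
counts = [[pct / 100 * n for pct, n in zip(row, n_sizes)] for row in percents]

table1_gamma = gamma(counts)
print(f"Gamma for Table 1 (approximate): {table1_gamma:.2f}")
```

The negative sign reflects the coding: as age goes up (moving right), support for more spending (moving down) goes down.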

Table 2

Income Differences in State Spending Preferences for Health Care

FAMILY INCOME

STATE SPENDING DESIRED:	< $20,000	$20-40,000	$40-60,000	> $60,000
Less	10%	4%	7%	10%
Same	13%	17%	30%	36%
More	77%	79%	63%	54%
N Size	(365)	(363)	(222)	(333)

Gamma = -.28
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.

Table 3

Ideological Differences in State Spending Preferences for Health Care

SELF-IDENTIFIED IDEOLOGY

STATE SPENDING DESIRED:	Liberal	Moderate	Conservative
Less	3%	6%	12%
Same	15%	17%	31%
More	82%	77%	57%
N Size	(262)	(495)	(808)

Gamma = -.41
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.

Table 4

Race Differences in State Spending Preferences for Health Care

RACE

STATE SPENDING DESIRED:	White	African-American
Less	10%	3%
Same	31%	10%
More	59%	87%
N Size	(1050)	(555)

Gamma = .63
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.

Table 5

Sex Differences in State Spending Preferences for Health Care

SEX

STATE SPENDING DESIRED:	Men	Women
Less	12%	5%
Same	27%	20%
More	61%	75%
N Size	(772)	(889)

Gamma = .33
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.

Table 6

Income Differences in Having Access to a Personal Computer

FAMILY INCOME

HAVE ACCESS TO A PC?	< $20,000	$20-40,000	$40-60,000	> $60,000
Yes	54%	67%	85%	94%
No	46%	33%	15%	6%
N Size	(370)	(368)	(232)	(341)

Gamma = -.59
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.

Table 7

Race Differences in Having Access to a Personal Computer

RACE

HAVE ACCESS TO A PC?	White	African-American
Yes	74%	69%
No	26%	31%
N Size	(1084)	(560)

Gamma = .12
Chi-squared significance < .05
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.
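The Chi-squared test behind the significance levels in these tables can also be sketched by hand. The example uses Table 7, with cell counts rebuilt from the rounded percentages and N sizes, so the statistic is approximate. The p-value shortcut with `erfc` applies only to a two-by-two table (1 degree of freedom); larger tables need a chi-squared distribution routine from a statistics package.

```python
from math import erfc, sqrt


def chi_squared(table):
    """Pearson chi-squared statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Table 7: rows are Yes, No; columns are White, African-American.
percents = [[74, 69], [26, 31]]
n_sizes = [1084, 560]
counts = [[pct / 100 * n for pct, n in zip(row, n_sizes)] for row in percents]

statistic = chi_squared(counts)
# With 1 degree of freedom, the p-value equals erfc(sqrt(chi-squared / 2)).
p_value = erfc(sqrt(statistic / 2))

print(f"Chi-squared = {statistic:.2f}, significance = {p_value:.3f}")
```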

Table 8

Sex Differences in Having Access to a Personal Computer

SEX

HAVE ACCESS TO A PC?	Men	Women
Yes	74%	70%
No	26%	30%
N Size	(790)	(910)

Gamma = .10
Chi-squared significance < .06; Not Significant at .05 level.
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.

Table 9

Age Differences in Having Access to a Personal Computer

AGE

HAVE ACCESS TO A PC?	18-35	36-55	56 and Over
Yes	82%	79%	55%
No	18%	21%	45%
N Size	(564)	(585)	(538)

Gamma = .41
Chi-squared significance < .001
Note: Cell entries total 100% down each column.
Source: 2006, 2008, 2010 Mississippi Poll.

Now repeat the SPSS analyses for these bivariate tables for the two most recent polls- the combined 2012 and 2014 samples. Do this assignment in class.

TOPIC FOURTEEN: MULTIVARIATE CONTINGENCY TABLES

Multivariate crosstabulations:
Multivariate analysis involves one dependent variable and more than one independent variable (predictor).

Controlling- multivariate tables always permit you to examine the relationship between a predictor and a dependent variable, after taking into account the effect of a second predictor.

For example, African-Americans tend to have a lower turnout than whites. A possible control variable is socioeconomic status (SES). Perhaps African-Americans have a lower average turnout than whites because of the lower socioeconomic status of blacks, and we know that people of all races having a lower SES tend to have lower turnout compared to people of all races having a higher SES. To determine whether a lower SES level explains why African-Americans tend to have lower turnouts than whites, we examine the relationship between race and turnout, controlling for SES. Do whites and blacks of the same SES level have the same turnout level? If so, SES is more important than race in shaping turnout.

Three types of variables that one would control for:
1) Outside variables- a variable that has an effect on one of your predictors and on your dependent variable. Here, race is an outside variable. You would control for it to determine if SES has a direct, causal effect on turnout, or whether the SES-turnout relationship is spurious. If spurious, then race directly affects or causes both SES and turnout, but SES does not have a direct causal effect on turnout.
2) Intervening variable- a variable that is located between a predictor and a dependent variable, and that explains why the "early" predictor is related to the dependent variable. SES is an intervening variable here, as it explains why race is related to turnout.
3) Specifying or Conditional variables- a predictor that changes the relationship between another predictor and the dependent variable. That is, the relationship has a different direction or magnitude for different categories of the specifying variable. If a race gap in turnout exists only among college grads in Mississippi but not among other educational groups, then education is the specifying variable.

Examples of Multivariate Tables (cell entries are completely artificial, non-real data)

RACE....................................> SES ...................................................> PARTICIPATION

RACE .................................................................................................> PARTICIPATION

BIVARIATE (includes low, medium, and high SES groups):

	White Race	Black Race
Low Participation	40%	60%
High Participation	60%	40%
Column % Totalled	100%	100%

MULTIVARIATE (Low SES group only):

	White Race	Black Race
Low Participation	70%	70%
High Participation	30%	30%
Column % Totalled	100%	100%

MULTIVARIATE (Medium SES group only):

	White Race	Black Race
Low Participation	50%	50%
High Participation	50%	50%
Column % Totalled	100%	100%

MULTIVARIATE (High SES group only):

	White Race	Black Race
Low Participation	20%	20%
High Participation	80%	80%
Column % Totalled	100%	100%

RACE ...................................> SES .....................................> PARTICIPATION

BIVARIATE (includes low, medium, and high SES groups):

	White Race	Black Race
Low Participation	40%	60%
High Participation	60%	40%
Column % Totalled	100%	100%

MULTIVARIATE (Low SES group only):

	White Race	Black Race
Low Participation	40%	60%
High Participation	60%	40%
Column % Totalled	100%	100%

MULTIVARIATE (Medium SES group only):

	White Race	Black Race
Low Participation	40%	60%
High Participation	60%	40%
Column % Totalled	100%	100%

MULTIVARIATE (High SES group only):

	White Race	Black Race
Low Participation	40%	60%
High Participation	60%	40%
Column % Totalled	100%	100%

BIVARIATE (includes low, medium, and high SES groups):

	White Race	Black Race
Low Participation	40%	70%
High Participation	60%	30%
Column % Totalled	100%	100%

MULTIVARIATE (Low SES group only):

	White Race	Black Race
Low Participation	70%	80%
High Participation	30%	20%
Column % Totalled	100%	100%

MULTIVARIATE (Medium SES group only):

	White Race	Black Race
Low Participation	50%	60%
High Participation	50%	40%
Column % Totalled	100%	100%

MULTIVARIATE (High SES group only):

	White Race	Black Race
Low Participation	30%	40%
High Participation	70%	60%
Column % Totalled	100%	100%

................................................................(40% multivariate)

RACE .........................................> SES .................................> PARTICIPATION

RACE .....................................................................................> PARTICIPATION

GENDER ......> SENIORITY ......> JOB SECURITY

GENDER ......> JOB SECURITY

BIVARIATE: Gender .......> Job Security

	Men	Women
Fired	22% (55)	50% (75)
Kept Job	78% (195)	50% (75)
	100% (250)	100% (150)

BIVARIATE: Gender .......> Seniority

	Men	Women
Low Seniority	20% (50)	67% (100)
High Seniority	80% (200)	33% (50)
	100% (250)	100% (150)

BIVARIATE: Seniority .......> Job Security

	Low Seniority	High Seniority
Fired	70% (105)	10% (25)
Kept Job	30% (45)	90% (225)
	100% (150)	100% (250)

MULTIVARIATE (Low Seniority Group Only):

	Men	Women
Fired	70% (35)	70% (70)
Kept Job	30% (15)	30% (30)
	100% (50)	100% (100)

MULTIVARIATE (High Seniority Group Only):

	Men	Women
Fired	10% (20)	10% (5)
Kept Job	90% (180)	90% (45)
	100% (200)	100% (50)
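The logic of controlling can be traced through the cell counts of the gender, seniority, and job security tables above. In this sketch (names chosen for illustration), the gender gap in firing is computed first for everyone and then separately within each seniority group.

```python
# Cell counts from the two multivariate tables: (seniority, gender) -> outcome counts.
counts = {
    ("Low", "Men"): {"Fired": 35, "Kept": 15},
    ("Low", "Women"): {"Fired": 70, "Kept": 30},
    ("High", "Men"): {"Fired": 20, "Kept": 180},
    ("High", "Women"): {"Fired": 5, "Kept": 45},
}


def percent_fired(cells):
    """Percent fired across one or more cells of counts."""
    fired = sum(cell["Fired"] for cell in cells)
    total = sum(cell["Fired"] + cell["Kept"] for cell in cells)
    return 100 * fired / total


# Bivariate: combine both seniority groups for each gender.
bivariate = {
    gender: percent_fired([counts[(level, gender)] for level in ("Low", "High")])
    for gender in ("Men", "Women")
}

# Multivariate: look at each gender within one seniority group at a time.
controlled = {
    level: {gender: percent_fired([counts[(level, gender)]]) for gender in ("Men", "Women")}
    for level in ("Low", "High")
}

print("Bivariate percent fired:", bivariate)
print("Controlling for seniority:", controlled)
```

The 28-point gender gap in the bivariate table disappears within each seniority group, so seniority acts as an intervening variable between gender and job security.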

PARTY ID ......> PARDON ......> 1976 PRESIDENTIAL VOTE

PARTY ID ......> 1976 PRESIDENTIAL VOTE

BIVARIATE: Party Id .......> Presidential Vote

	Democratic Party Id	Republican Party Id
Carter (Dem) Vote	80% (400)	10% (30)
Ford (Rep) Vote	20% (100)	90% (270)
	100% (500)	100% (300)

BIVARIATE: Party Id .......> Attitude toward Ford Pardon of Nixon

	Democratic Party Id	Republican Party Id
For Pardon	10% (50)	83% (250)
Against Pardon	90% (450)	17% (50)
	100% (500)	100% (300)

BIVARIATE: Attitude to Pardon .......> Presidential Vote

	For Pardon	Against Pardon
Carter (Dem) Vote	22% (65)	73% (365)
Ford (Rep) Vote	78% (235)	27% (135)
	100% (300)	100% (500)

MULTIVARIATE (Among Democrats Only)

	For Pardon	Against Pardon
Carter (Dem) Vote	80% (40)	80% (360)
Ford (Rep) Vote	20% (10)	20% (90)
	100% (50)	100% (450)

MULTIVARIATE (Among Republicans Only)

	For Pardon	Against Pardon
Carter (Dem) Vote	10% (25)	10% (5)
Ford (Rep) Vote	90% (225)	90% (45)
	100% (250)	100% (50)
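
A consistency check on the pardon example: the two control tables, added together cell by cell, should reproduce the bivariate pardon-by-vote table. The Python sketch below does that collapsing with the artificial counts from the tables above. The dictionary layout and the helper carter_pct are assumptions made for illustration.

```python
# (Carter, Ford) counts by pardon attitude, from the control tables above.
democrats = {"for": (40, 10), "against": (360, 90)}
republicans = {"for": (25, 225), "against": (5, 45)}

def carter_pct(carter, ford):
    """Percent voting for Carter, rounded to a whole number."""
    return round(100 * carter / (carter + ford))

# Collapse over party id by adding the two control tables cell by cell.
pooled = {
    attitude: (
        democrats[attitude][0] + republicans[attitude][0],
        democrats[attitude][1] + republicans[attitude][1],
    )
    for attitude in ("for", "against")
}

for attitude, (carter, ford) in pooled.items():
    print(attitude, carter, ford, carter_pct(carter, ford))
```

The pooled counts (65 and 235 for the pardon, 365 and 135 against) match the bivariate table, while within each party the Carter percentage is the same for both pardon groups. The apparent effect of the pardon attitude comes entirely from party identification.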

TOPIC FIFTEEN: REVIEW FOR TEST

TOPIC SIXTEEN: REGRESSION- BIVARIATE AND MULTIPLE REGRESSION

BIVARIATE REGRESSION

This technique finds the best-fitting straight line through a set of points. Best fitting is defined as minimizing the sum of the squared vertical distances between the points and the regression line.

The equation of the line is Y = a + (b * X), where Y is the dependent variable, X is the independent variable, a is the Y-intercept, and b is the slope of the line, or (change in Y) / (change in X).
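
To illustrate how a and b can be obtained, here is a minimal Python sketch using the standard least squares formulas: b = sum of (X - mean X)(Y - mean Y) divided by sum of (X - mean X)², and a = mean Y - b * mean X. The data points are hypothetical (they are not from these notes) and were chosen so the result is easy to verify by hand.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line Y = a + b * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope: change in Y per unit change in X
    a = mean_y - b * mean_x  # intercept: predicted Y when X is zero
    return a, b

# Hypothetical data: years of service (X) and salary in $1,000s (Y).
years = [0, 5, 10, 15, 20]
salary = [42, 50, 57, 66, 70]

a, b = fit_line(years, salary)
predicted_at_30 = a + b * 30
print(round(a, 2), round(b, 2), round(predicted_at_30, 1))
```

With these made-up numbers the slope is about 1.44 and the intercept about 42.6, so the predicted value at X = 30 is about 85.8.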

R² is explained variance: the proportion of the variance in Y explained by the independent variable's regression line.

Total Variation = sum of squared distances between the mean of Y and each case's Y value

Unexplained Variation (Residual) = sum of squared distances between each case's Y value and each case's predicted Y value (from the regression equation)

Explained Variation = sum of squared distances between each case's predicted Y value and the mean of Y.
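
The three quantities just defined can be computed directly. The Python sketch below uses hypothetical data (not from these notes) to show that Total Variation = Explained Variation + Unexplained Variation, and that R² is the explained share of the total. The function name variation_parts is illustrative.

```python
def variation_parts(xs, ys):
    """Return (total, explained, unexplained) variation for a bivariate fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    predicted = [a + b * x for x in xs]
    total = sum((y - mean_y) ** 2 for y in ys)
    unexplained = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    explained = sum((p - mean_y) ** 2 for p in predicted)
    return total, explained, unexplained

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

total, explained, unexplained = variation_parts(xs, ys)
r_squared = explained / total
print(round(total, 2), round(explained, 2), round(unexplained, 2), round(r_squared, 2))
```

For these made-up points the total variation is 6, of which 3.6 is explained and 2.4 is unexplained, giving an R² of .60.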

b = unstandardized regression coefficient = slope = (change in Y) / (change in X)

Beta = standardized regression coefficient = b * (sd_x/sd_y), where sd means standard deviation. It adjusts for the differing ranges and scales of the variables.

Beta ranges from -1 to +1 with 0 being no relationship between the independent and dependent variables. The sign depends on the direction of the coding of your variables. A +1 or -1 is a perfect relationship. b values have a greater range which is not confined to 1 or -1.

Pearson R is the correlation coefficient. It equals the Beta in the bivariate case only.
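
A small Python sketch, using hypothetical data, of the claim that Beta equals Pearson R in the bivariate case. It computes b, converts it to Beta with the formula b * (sd_x/sd_y), and compares the result with the correlation coefficient computed separately.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x, mean_y = mean(xs), mean(ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

b = sxy / sxx                     # unstandardized slope
beta = b * stdev(xs) / stdev(ys)  # standardized coefficient
r = sxy / sqrt(sxx * syy)         # Pearson correlation

print(round(b, 3), round(beta, 3), round(r, 3))  # 0.6 0.775 0.775
```

Squaring the correlation (.775) also gives the R² for the same data, about .60.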

R² is the explained variation as a proportion of the total variation (Explained Variation / Total Variation). It measures the predictive ability of your independent variable.

Adjusted R² shrinks the value of R² by penalizing for each additional independent variable, and is statistically preferable to the R².
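
These notes do not give a formula for the adjustment. The sketch below assumes the commonly used one, Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - k - 1), where n is the number of cases and k is the number of independent variables. The input values are hypothetical.

```python
def adjusted_r_squared(r_squared, n_cases, n_predictors):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)

# Hypothetical values: the same R² of .40 from 51 cases,
# with 1 predictor versus 5 predictors.
one_predictor = adjusted_r_squared(0.40, 51, 1)
five_predictors = adjusted_r_squared(0.40, 51, 5)
print(round(one_predictor, 3))    # 0.388
print(round(five_predictors, 3))  # 0.333
```

The same R² is shrunk more when it took five predictors to reach it than when it took one.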

The F statistic tests the statistical significance of the regression equation as a whole; its significance level (p value) must be below .05.

Problem of outlying or deviant cases. Discuss faculty example involving previous senior faculty who have since retired.

You are asked to examine the relationship between years of service since receiving a PhD degree and the nine-month salaries of ten history professors. Plot the following points on graph paper, then calculate the b value (the unstandardized regression coefficient, or slope) and the Y-intercept. Also calculate what salary would have to be offered to a senior professor with 30 years of service since the PhD who is hired from another university, and what the starting salary would be (for someone with zero years of service who just received the PhD):

MULTIPLE REGRESSION

Multiple Regression is linear regression applied to more than one independent variable. With two independent variables, the predicted values comprise a plane (instead of a line in the one independent variable case).

b value is the unstandardized regression coefficient, controlling for the effects of all other predictors. It is used to predict the value of the dependent variable from the known values of the independent variables.
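
A minimal sketch of using b values for prediction: the predicted Y is the intercept plus each b multiplied by that case's value on the corresponding independent variable. The equation and the variable names (years, books) are hypothetical and are not taken from any analysis in these notes.

```python
def predict(intercept, coefficients, values):
    """Predicted Y = a + b1*X1 + b2*X2 + ... for one case."""
    return intercept + sum(b * x for b, x in zip(coefficients, values))

# Hypothetical equation: salary (in $1,000s) = 40 + 1.5*years + 2.0*books
a = 40.0
b_values = [1.5, 2.0]

# A case with 10 years of service and 3 books.
predicted = predict(a, b_values, [10, 3])
print(predicted)  # 61.0
```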

b value is also used in making comparisons across subsamples. For example, it can show whether an independent variable is more important in affecting the dependent variable among men or among women.

Beta is the standardized regression coefficient, controlling for the effects of all other predictors. It tells the relative importance of the independent variables in influencing the dependent variable. Its absolute value generally ranges from 0 to 1, with 1 being most important and 0 being least important. Negative signs reflect the direction of the variables' coding.

Multiple r is the correlation between the actual Y value and the predicted Y value from the multiple regression equation.

R² is the proportion of the variance in the dependent variable explained by all of the independent variables.

Example of a multiple regression equation problem (taken from the 2006-2010 Mississippi Polls).

Predicting who believes they have been racially profiled. This dependent variable is coded 1 for reported being profiled, and 2 for reported not being profiled. The independent variables and their coding follow:

Now repeat this multiple regression analysis of racial profiling reports with the most recent combined 2012-14 polls. Do this in class.

CAUSAL MODELING

Multiple regression provides only the direct effects that independent variables exert on dependent variables. Yet outside variables may also affect the dependent variable by affecting an intervening variable in the model. Hence, an outside variable may exert an indirect effect on the dependent variable.

Total effects of an independent variable are equal to the sum of the direct effect of that variable and all of its indirect effects. Each indirect effect is the product of the effect that that outside variable has on an intervening variable, and the effect that the intervening variable has on the dependent variable.

Causal Modeling procedures.
1) Devise a model that shows temporal-causal ordering of the variables
2) Use the SPSS multiple regression program to regress each dependent variable in the model on all of the independent variables that are "earlier" than it is
3) Draw arrows for all statistically significant linkages. Put Betas just above each line.
4) Indirect effects involve multiplying the relevant Betas together
5) Total effect = direct effect + indirect effects
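
Steps 4 and 5 can be sketched in Python as follows. The three-variable model (education, interest, turnout) and its Betas are hypothetical and serve only to show the arithmetic.

```python
# Hypothetical Betas for a three-variable model:
# education -> interest -> turnout, plus education -> turnout directly.
betas = {
    ("education", "interest"): 0.40,
    ("interest", "turnout"): 0.50,
    ("education", "turnout"): 0.20,
}

def indirect_effect(path):
    """Multiply the Betas along one path, e.g. [A, B, C] for A -> B -> C."""
    effect = 1.0
    for cause, outcome in zip(path, path[1:]):
        effect *= betas[(cause, outcome)]
    return effect

direct = betas[("education", "turnout")]
indirect = indirect_effect(["education", "interest", "turnout"])
total = direct + indirect
print(round(direct, 2), round(indirect, 2), round(total, 2))  # 0.2 0.2 0.4
```

In this made-up model half of education's total effect on turnout is indirect, working through interest. A multiple regression alone would report only the direct effect of .20.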

TOPIC SEVENTEEN: STATISTICAL INFERENCE

Statistical inference is our ability to generalize a relationship found in a sample to the entire population from which that sample was drawn. That is, can we infer population characteristics from sample data? If our statistical inference test suggests that in the population the relationship between the two variables is nonrandom, the relationship is said to be statistically significant.

For example, our 2010 Mississippi Poll sampled only 601 adult Mississippians from an adult population of over two million. We found a definite relationship in the sample between gender and seat belt use. 83% of women said they "always" used their seat belts, compared to 76% of men. 12% of men said they "never" or "seldom" used their seat belts, compared to only 5% of women. The magnitude of this relationship between gender and seat belt use was 7%: [(83-76) + (12-5)] / 2. But can we generalize this relationship found in the sample to the entire population? Is there a relationship between gender and seat belt use in the entire population? Statistical inference is the procedure we use to determine if any relationship exists in the entire population.
In this example, the chi-squared (Pearson) is 10.8 with 3 df, and is significant at the .05 level. This means that if no relationship existed in the population, a sample relationship this strong would arise by chance fewer than 5 times in one hundred; thus, we can be 95% confident that this relationship does exist in the entire population.

1) Chi-squared is for nominal level variables. Hence, it does not provide information about the direction of the relationship, it simply indicates that a relationship exists in the population. Since the value of chi-squared tends to increase as sample size increases, it does not measure the strength of the association between variables.

Chi-squared = Σ [ (fo - fe)² / fe ], where fo is the observed frequency and fe is the expected frequency in each cell.
For the expected frequency for each cell, multiply the column total and the row total for that cell, and divide by the table total.
Degrees of freedom equal (number of columns - 1) * (number of rows - 1).
Consult the Chi-squared chart on page 577 of the text.
On the SPSS output, use the Pearson chi-squared, which is the most widely used form.

Warning: chi-squared should not be used if any cell has an expected value less than 1, or if more than 20% of the cells have expected values less than 5.
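
The warning above can be written as a simple check on a table of expected frequencies. This Python sketch is an illustration only. The function name and the example tables are made up, and the thresholds are the ones stated in the warning.

```python
def chi_squared_usable(expected):
    """Apply the rule of thumb to rows of expected frequencies.

    Returns False if any expected value is below 1, or if more than
    20% of the cells have expected values below 5.
    """
    cells = [value for row in expected for value in row]
    if any(value < 1 for value in cells):
        return False
    share_small = sum(1 for value in cells if value < 5) / len(cells)
    return share_small <= 0.20

# Hypothetical tables of expected frequencies.
print(chi_squared_usable([[24.5, 77.5], [48.5, 153.5]]))  # True
print(chi_squared_usable([[0.8, 12.0], [9.0, 30.0]]))     # False: a cell is below 1
print(chi_squared_usable([[4.0, 12.0], [9.0, 30.0]]))     # False: 25% of cells below 5
```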

Example from Berman, Evan M., Public Administration Review, March/April 1997, Vol. 57 Issue 2, pages 105-113, "Dealing with Cynical Citizens" article, table 3, where he examines whether there is a link between the number of strategies that cities use to keep people informed about local government's actions and how much trust they have in city government.

The chi-squared computation for each cell is:
(37-24.5)²/24.5 = 156.25/24.5 = 6.4
(65-77.5)²/77.5 = 156.25/77.5 = 2.0
(36-48.5)²/48.5 = 156.25/48.5 = 3.2
(166-153.5)²/153.5 = 156.25/153.5 = 1.0
Summate these four cell results: 6.4+2+3.2+1 = 12.6

Chi-squared value is 12.6 with 1 degree of freedom: (2-1) * (2-1) = 1 df. Checking the chart in the textbook, this value is significant at the .001 level.
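
The hand computation above can be reproduced with a short Python sketch. It takes the four observed counts used in the cell computations (37, 65, 36, 166), derives the expected frequencies from the row and column totals, and sums the cell results. The function name chi_squared is illustrative. The sketch does not look up the significance level.

```python
def chi_squared(observed):
    """Return (chi-squared, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (fo - fe) ** 2 / fe
    df = (len(col_totals) - 1) * (len(row_totals) - 1)
    return chi2, df

# Rows: low trust, medium or high trust.
# Columns: few strategies, some or many strategies.
observed = [[37, 65], [36, 166]]
chi2, df = chi_squared(observed)
print(round(chi2, 1), df)  # 12.6 1
```

The unrounded value is slightly above 12.6 because the hand computation rounds each expected frequency to one decimal place.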

2) The t-test is an interval statistic (dependent variable must be interval). It tests the hypothesis that two groups have different means, and that the inter-group difference can be generalized to the population.

Two-sample t-test (SPSS-independent sample) means that each group is considered a sample.

A one-tailed t-test means that your hypothesis has a direction for the relationship. A two-tailed t-test is used to test nondirectional hypotheses. A two-tailed test is stricter, and SPSS reports only the two-tailed test. Hence, if your results are significant for the 2-tailed test and the difference is in the hypothesized direction, they will also be significant for the 1-tailed test.

Two statistics are reported: one assuming the two populations have equal variances, and one assuming unequal variances.

The t-test is computed using the formula in your textbook.
Degrees of freedom equals the sum of the two sample sizes minus two.
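
The notes defer to the textbook for the formula. As an illustration only, the Python sketch below assumes the common pooled-variance (equal variances) form of the two-sample t-test, which matches the degrees of freedom rule just given. The income codes are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group1, group2):
    """Two-sample t statistic assuming equal variances; returns (t, df)."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / standard_error
    return t, df

# Hypothetical income codes (1 to 8) for two groups of five people.
group_a = [5, 6, 7, 6, 6]
group_b = [3, 4, 5, 4, 4]

t, df = pooled_t(group_a, group_b)
print(round(t, 2), df)
```

The resulting t-value (about 4.47 with 8 df for these made-up numbers) would then be compared with the table entry for the chosen significance level.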

t-value must be larger than table entry to be significant at the specified level. See page 576 of your textbook.

Using SPSS program. Use Compare Means- Independent Samples Statistics Menu. Your Test Variable is your dependent variable, which should be interval level. Your Grouping Variable should be a dichotomous independent variable (recode it, when necessary). Check the Levene test: if its significance is greater than .05, use the equal variances row; if it is .05 or less, the variances differ, so use the unequal variances row. Cite t-value and 2-tail sig. in papers. Significance Level must be <= .05.

Examining predictors of family income. Family income is an interval-level variable, coded from a low of 1 for under $10,000 to a high of 8 for over $70,000. The following indicates what the average income codes are for pairs of categories of each predictor, as well as what the t-test significance level is. Answer the following two questions: (1) For each predictor, which group has the higher family income? (2) Is the t-test statistically significant for each of the following five predictors (remember, it must be significant at least at the .05 level)?

Now repeat this analysis of demographic differences in income with the most recent poll data from the pooled 2012 and 2014 Mississippi Polls. Do this in class.

TOPIC EIGHTEEN: EXPERIMENTAL DESIGNS

Classical Experimental Design:

Pre-test ---------------> Stimulus ----------------> Post-test
Experimental Group

Pre-test ---------------------------------------------> Post-test
Control Group

Also, both groups must be equal in composition. Ensure equality by: matching; random assignment.
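
Random assignment can be sketched in a few lines of Python. Shuffle the list of subjects, then split it in half. The subject labels, the 50/50 split, and the fixed seed are assumptions made for this illustration.

```python
import random

def random_assignment(subjects, seed=None):
    """Randomly split subjects into (experimental, control) groups."""
    rng = random.Random(seed)  # a seed makes the split reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Twenty hypothetical subjects.
subjects = [f"subject_{i}" for i in range(1, 21)]
experimental, control = random_assignment(subjects, seed=42)
print(len(experimental), len(control))  # 10 10
```

Because chance alone decides who lands in each group, the groups should be similar on average in every characteristic, including ones the researcher did not think to match on.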

Internal invalidity problems- inferences (conclusions) drawn are not an accurate reflection of what actually happened.

(Note: internal and external invalidity problems derived from Donald T. Campbell and Julian C. Stanley's Experimental and Quasi-Experimental Designs for Research, Houghton Mifflin Co., 1963, pages 5-6.)

Solomon 4 Group Design- use same two groups from the classical experimental design, include two more groups. One having stimulus-posttest only, and another having only the posttest. Must have equal groups, which then assumes equal pre-test scores.

Post-Test Only Design- one experimental and one control group, no pre-tests. Groups must be equal, use randomization

Classical Experimental Design is strong on internal validity, but weak on external validity

TOPIC NINETEEN: QUASI-EXPERIMENTAL DESIGNS

Quasi-experimental designs are only moderate on internal validity, since they are naturally occurring experiments, and people cannot be randomly assigned to the groups.

1) Time Series Design- multiple pre-tests before stimulus; multiple post-tests after stimulus; no control group. It fails to control for numerous threats to the internal validity of the quasi-experiment.

2) Control Series Design- two time series, one for experimental group, one for control group. Must have groups that are as comparable as possible. Controls for many internal validity problems.

Correlational Design- extensive social science research, such as survey research.
This is a post-test only design, with statistical controls used to simulate experimental and control group. However, random assignment is not used to create groups.

One shot case study is a pre-experiment weak on both internal and external validity. It consists of a stimulus and a post-test.

TOPIC TWENTY: PANEL STUDIES

Problems with cross-sectional surveys that gather data at only one time point:
1) Inability to study change
2) Hard to make recursive causal assumptions

Panel design definition: the same people, asked the same questions, at two or more time points. Each time point is called a wave.

Examples of panel studies:
1) National election studies panels of 1956-58-60, of 1972-74-76, and of 1992, 1994, and 1996. The second set was able to study the effects of Watergate. A major finding of these panels is that party and issue attitudes affect each other in a reciprocal sense. Also, efficacy affects participation (turnout, campaigning), and participation affects external efficacy.
2) 1980, 4 wave U.S. national election study. It examined the effects of campaigns on voters. Major finding was that Carter lost because of dissatisfaction and failed leadership perceptions, not because of ideological issues.
3) The M. Kent Jennings panel of high school seniors and their parents. Wave 1 was in 1965, wave 2 in 1973, and wave 3 in 1982. Subject was socialization and persistence of attitudes over time. Major finding was that political attitudes (including partisanship) tend to stabilize around age 30.

TOPIC TWENTY-ONE: AGGREGATE DATA (ECOLOGICAL FALLACY)

Ecological fallacy is the incorrect assumption that relationships existing at the aggregate level also exist at the individual level.

Example of religion and presidential vote in the 1940s. Two tables showing individual level relations and aggregate marginal results.

First example from 1990 census- foreign born and college degrees aggregate relationship
STATE.....% FOREIGN BORN.....% COLLEGE DEGREE
Mass...................9%......................20%
N.H....................5%......................18%
Vermont................4%......................19%
N.Y...................14%......................18%
N.J...................10%......................18%
Alab...................1%......................12%
Ark....................1%......................11%
La.....................2%......................14%
Miss...................1%......................12%
Ga.....................2%......................15%
S.C....................2%......................13%

The above table suggests that the foreign born are more likely to have college degrees than are U.S.-born adults. Such a conclusion would be committing the ecological fallacy. In reality, the data are merely indicating that states (not people) with a higher percentage of foreign born residents are also states that happen to have a population that contains a greater percentage of college educated adults, compared to states with a lower percentage of foreign born residents. The relationship between foreign born and education is a spurious (non-causal) one; states with well-funded education systems tend to be located in the Northeast and Midwest, and those are the same states where many immigrants settle.
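
To see how strong the aggregate relationship in the table is, the Python sketch below computes the Pearson correlation between the two state-level percentages. The state is the unit of analysis, so the result describes states, not individuals. The helper pearson_r is written out here only for illustration.

```python
from math import sqrt

# (% foreign born, % college degree) for the eleven states in the table.
states = {
    "Mass": (9, 20), "N.H.": (5, 18), "Vermont": (4, 19),
    "N.Y.": (14, 18), "N.J.": (10, 18), "Alab": (1, 12),
    "Ark": (1, 11), "La": (2, 14), "Miss": (1, 12),
    "Ga": (2, 15), "S.C.": (2, 13),
}

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

foreign_born = [pair[0] for pair in states.values()]
college = [pair[1] for pair in states.values()]
r = pearson_r(foreign_born, college)
print(round(r, 2))
```

The state-level correlation is strongly positive, yet it says nothing about whether foreign born individuals are more or less likely than U.S.-born individuals to hold a degree. Answering that question would require individual-level data.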

Second example from 2010 census- % black and % Republican presidential vote at state level of analysis
STATE.....% BLACK.....% REPUBLICAN PRES. VOTE IN 2008
Alabama........26%.............60%
Arkansas.......15%.............59%
Georgia........31%.............52%
Miss...........37%.............56%
Iowa............3%.............45%
Minn............5%.............44%
Penn...........11%.............44%
Wash............4%.............41%
Wisc............6%.............42%

The above table suggests that African-Americans are more likely to vote Republican for President than are whites. Such a conclusion would be committing the ecological fallacy, since the table provides aggregate data, not individual-level data. The table in reality is merely showing that states having a high percentage of African-Americans are also states that just happen to be more likely to vote Republican for President, compared to states having a lower percentage of African-Americans. The relationship between race and vote at the state unit of analysis is a spurious, non-causal one. African-Americans merely happen to be concentrated in southern states, since such states historically relied on slavery on large plantations, and southern whites tend to be more conservative politically than are whites in the north.

TOPIC TWENTY-TWO: UNOBTRUSIVE MEASURES AND CONTENT ANALYSIS

Problems with obtrusive measures: (derived from the book Unobtrusive Measures: Nonreactive Research in the Social Sciences, by Eugene Webb, Donald T. Campbell, Richard D. Schwartz, and Lee Sechrest; Rand McNally Co., 1972)
1) Guinea Pig or Testing Effect: subjects may feel that they must leave a good impression, or test may make them interested in subject.
2) Role Selection: nonrepresentative role selected, especially by less educated and those less familiar with subject of test.
3) Response Sets, such as acquiescence bias, or similarly worded response categories: example of Mississippi Poll, state program spending items.
4) Interviewer Effect: race, age, and sex of interviewer may affect responses.

Unobtrusive Measures directly remove the researcher from the research setting.

Types of Unobtrusive Measures (Source: Research Methods in the Social Sciences, 5th edition, by Chava Frankfort-Nachmias and David Nachmias; St. Martin's, 1996, pages 315-324)

	Male Sex	Female Sex
Gore or Kerry (D) Vote	41%	43%
George Bush Jr. (R) Vote	59%	57%
N Size	(359)	(406)

1972 Party Id	Strong Dem.	Weak Dem.	Indep. Dem.	Pure Indep.	Indep. Rep.	Weak Rep.	Strong Rep.
Strong Dem.	9	4	1	0	0	0	0
Weak Dem.	5	13	3	2	1	1	0
Indep. Dem.	2	3	4	1	1	0	0
Pure Indep.	1	1	2	5	2	1	0
Indep. Rep.	1	0	1	3	5	2	1
Weak Rep.	0	1	0	1	3	7	2
Strong Rep.	0	0	0	0	1	4	6

SAMPLE SIZE	50/50	60/40	70/30	80/20	90/10
100	10	9.8	9.2	8	6
200	7.1	6.9	6.5	5.7	4.2
300	5.8	5.7	5.3	4.6	3.5
400	5	4.9	4.6	4	3
500	4.5	4.4	4.1	3.6	2.7
600	4.1	4	3.7	3.3	2.4
700	3.8	3.7	3.5	3	2.3
800	3.5	3.5	3.2	2.8	2.1
900	3.3	3.3	3.1	2.7	2
1000	3.2	3.1	2.9	2.5	1.9
1100	3	3	2.8	2.4	1.8
1200	2.9	2.8	2.6	2.3	1.7
1300	2.8	2.7	2.5	2.2	1.7
1400	2.7	2.6	2.4	2.1	1.6
1500	2.6	2.5	2.4	2.1	1.5
1600	2.5	2.4	2.3	2	1.5
1700	2.4	2.4	2.2	1.9	1.5
1800	2.4	2.3	2.2	1.9	1.4
1900	2.3	2.2	2.1	1.8	1.4
2000	2.2	2.2	2	1.8	1.3
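
The table above carries no caption in these notes. Its entries are consistent with a sampling error of two standard errors of a proportion, 2 * sqrt(p * q / n), expressed in percentage points. That reading is an inference from the numbers, not something the notes state. Under that assumption, a short Python sketch reproduces the entries:

```python
from math import sqrt

def sampling_error(n, split):
    """Approximate sampling error in percentage points: 2 * sqrt(p*q/n).

    n is the sample size; split is the larger (or smaller) percentage
    of the split, e.g. 50 for a 50/50 split or 80 for an 80/20 split.
    """
    p = split / 100
    return round(100 * 2 * sqrt(p * (1 - p) / n), 1)

print(sampling_error(600, 50))   # 4.1
print(sampling_error(1000, 50))  # 3.2
print(sampling_error(2000, 90))  # 1.3
```

Note that quadrupling the sample size (for example, from 400 to 1600) only cuts the error in half, and that the error is largest for a 50/50 split.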

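Observed frequencies (Berman 1997, table 3), with column percentages in parentheses:

	Few Strategies	Some or Many Strategies	Row N Sizes
Trust Low	37 (50.7%)	65 (28.1%)	102
Medium or High Trust	36 (49.3%)	166 (71.9%)	202
Column N	73	231	304

Expected frequencies for the same table (row total * column total / table total):

	Few Strategies	Some or Many Strategies	Row N Sizes
Trust Low	24.5	77.5	102
Medium or High Trust	48.5	153.5	202
Column N	73	231	304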

POLITICAL ANALYSIS CLASS NOTES

TOPIC TWENTY-THREE: Students Make Classroom Presentations of Their Research Papers

TOPIC TWENTY-FOUR: REVIEW FOR FINAL EXAM

FINAL NON-CUMULATIVE EXAMINATION

Party that is best for "People like you"	Democratic	Independent	Republican
Democrats	172	51	7
Both are Equal	18	40	29
Republican	6	39	157

Well Established Indicators	Strong Dem	Weak Dem	Indep. Dem.	Pure Indep.	Indep. Rep.	Weak Rep.	Strong Rep.
Pres. Vote
1984-1992	13%	54%	49%	77%	95%	91%	95%
1996-2004	7	32	22	58	87	92	94
1984	15	65	48	90	94	83	91
1988	13	46	52	68	94	87	98
1992	7	49	47	50	100	98	95
1996	7	26	23	45	90	84	92
2000	7	30	30	63	84	97	93
2004	7	40	15	69	86	96	97
2008	18	20	24	72	85	96	86
2012	2	36	0	50	93	93	92
Senate Vote
1984-1994	29%	54%	55%	80%	86%	80%	92%
1984	25	53	44	79	80	63	93
1988	15	40	50	80	83	76	92
1994	50	73	77	80	93	93	94
2014	13	30	40	67	79	97	98

	Day Care	Envir	Health	Industry	Police	Poor	Prisons	Highways	E&S Educ.	Tourism
Day Care	-
Envir	.19	-
Hlth.	.36	.17	-
Indus.	.06	.07	.10	-
Police	.08	.10	.08	.09	-
Poor	.39	.11	.39	.03	.05	-
Prison	.15	.06	.13	.07	.22	.12	-
Highways	.15	.11	.12	.14	.16	.08	.12	-
E&S Educ.	.11	.13	.15	.08	.15	.13	.08	.09	-
Tourism	.07	.10	.02	.25	.15	0	.12	.13	.01	-
University	.14	.12	.15	.14	.10	.18	.07	.13	.33	.07