WEEK 12:

EXPERIMENTAL AND QUASI-EXPERIMENTAL DESIGNS

 

The Classical Experimental Design is really kind of cool. It is supposed to be the most scientific and accurate design for showing whether an independent variable causes a change in a dependent variable. We use the classical experimental design to test the safety and effectiveness of new drugs. For example, when they developed a vaccine for coronavirus, they had to determine whether this one variable (the vaccine, which is called a Stimulus) had an effect on the dependent variable (how severe your illness was when you got the disease, measured by hospitalization or death). They needed two groups of people who were equal in every other respect (overall health, age, sex, race, whatever). Their health was tested in a pre-test administered to both groups. Obviously, if the groups were equal, their pre-test scores should be equal. One of these groups gets the vaccine (experimental group), and the other does not (the control group, which gets a placebo, a fake treatment). After some time, both groups get tested once again. If the stimulus (vaccine) had an effect, the post-test would show lower hospitalization and death rates in the experimental group than in the control group.

That is the process that we normally use to test drugs. Politically, though, it is difficult to tell people that they are in the control group and will not receive the “wonder drug”. Plus, sometimes you have an emergency and go right to human testing rather than use animal testing first, for example. So the classical experimental design is not used as much as it should be in studies of public policy, such as educational policy and welfare-workfare programs.

 

Classical Experimental Design:

Experimental Group:  Pre-test ----------> Stimulus ----------> Post-test

Control Group:       Pre-test -------------------------------> Post-test

 

Both groups must be equal in composition. You can ensure equality by matching or by random assignment. With matching, you match on the characteristics you think are important, such as race, sex, and age. Random assignment of subjects to the two groups is cool, since it also covers an important variable that you may have left out (such as overall good health). It works like random selection in a poll: if you have a large enough sample, the two groups should be representative of the pool and of each other. If you have a small sample, the groups might come out unbalanced on one variable; so you can start with random assignment, and then match on those characteristics where the groups are unbalanced.
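Here is a minimal Python sketch of random assignment followed by a balance check. The subject records, the ages, and the random_assign helper are all made up for illustration; they are assumptions, not part of any real study.

```python
import random
from statistics import mean

def random_assign(subjects, seed=0):
    """Shuffle a copy of the subjects and split it into two halves.

    If the count is odd, the second group gets the extra subject.
    """
    rng = random.Random(seed)      # fixed seed so the split can be reproduced
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical pool of 200 subjects with a made-up age for each one.
pool_rng = random.Random(42)
subjects = [{"id": i, "age": pool_rng.randint(18, 80)} for i in range(200)]

experimental, control = random_assign(subjects, seed=1)

# Balance check: with a large pool, the group averages should be close.
age_gap = abs(mean(s["age"] for s in experimental) - mean(s["age"] for s in control))
print(f"Mean age gap between groups: {age_gap:.1f} years")
```

If the gap on some characteristic looks too large, that is the point where you would fall back on matching for that characteristic.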

A public policy example of a classical experimental design would be a university wishing to improve student learning in American Government. They randomly assign students to two different American Government classes, and check to ensure that the characteristics of the students are comparable in the two groups. They decide to implement a more innovative teaching technique for the experimental group (the stimulus), while the control group gets the standard in-class lecture. Indeed, part of the stimulus may be distance learning, computer simulations, games, whatever.

There are 11 Internal Invalidity problems that can cause you to draw incorrect conclusions about the effectiveness of your stimulus, if you look at only the experimental group. For example, maybe you are teaching the Government class during the 9/11 terror attacks (or a pandemic, or an insurrection). Students might be so stunned that they really become more interested and knowledgeable about politics and government. If the pre-tests and post-tests were tests of their political knowledge, the scores in both groups would go up over time. If you just had the experimental group, you’d think that the scores went up because of the innovative teaching technique. In reality, their environment outside of the classroom affected their test scores. The innovative teaching technique had no beneficial impact. This is called the problem of History. This is why you need the control group. History would affect both of the groups, so both of their test scores would go up because of History. If the experimental group’s scores went up even more than the control group’s, that additional increase would be because of the stimulus.
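The control group logic comes down to simple arithmetic: take the gain in the experimental group and subtract the gain in the control group. A minimal Python sketch, where the stimulus_effect helper and all of the scores are made up for illustration:

```python
from statistics import mean

def stimulus_effect(exp_pre, exp_post, ctrl_pre, ctrl_post):
    """Gain in the experimental group minus gain in the control group."""
    exp_gain = mean(exp_post) - mean(exp_pre)
    ctrl_gain = mean(ctrl_post) - mean(ctrl_pre)
    return exp_gain - ctrl_gain

# Made-up test scores. An outside event lifts both groups by 10 points.
exp_pre, exp_post = [60, 65, 70], [75, 80, 85]    # average gain of 15
ctrl_pre, ctrl_post = [60, 65, 70], [70, 75, 80]  # average gain of 10

effect = stimulus_effect(exp_pre, exp_post, ctrl_pre, ctrl_post)
print(effect)  # the extra 5 points are what we credit to the stimulus
```

With only the experimental group, you would have credited the stimulus with all 15 points.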

Another internal invalidity problem is Maturation. That is, students are now on their own, living apart from their parents. They grow up, and become more interested in politics, which affects their own lives. So maturation alone causes their test scores to go up. So, you need the control group: because the two groups are equal, Maturation should cause both groups’ scores to go up. Any additional increase in the experimental group’s scores is because of the stimulus.

An example of the Testing effect problem is when people may be humiliated by their poor scores on the pre-test. So taking the test itself may cause them to study more, and to do better later on. The stimulus may have no impact on them. So you need the control group because both of the groups will be affected by testing. Again, for the stimulus to be effective, the test scores should increase more in the experimental group than in the control group.

An example of Instrumentation is when the teacher feels bad that the students did so poorly on the pre-test. So she decides to make the post-test easier. Well, duh! What do you think will happen to the post-test scores? They’ll go up. If you just have the experimental group, you’ll conclude that the stimulus had an effect. Whoops. They didn’t learn anything; the stimulus actually didn’t have an effect. In the classical experimental design, we can determine this because we gave the same, easier post-test to the control group, and their scores went up just as much. So, you need the control group to determine the effect of the stimulus. In this case, you may also want to come up with a bank of 100 questions beforehand, draw 50 of them for the pre-test and the other 50 for the post-test, and make sure that the two tests are equal in difficulty. In any event, just make sure that both groups get the same test.
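One way to get two tests of similar difficulty out of a 100-question bank is sketched below in Python. This is only one possible approach, and it assumes you already have some difficulty rating for each question; the ratings, the split_bank helper, and the pairing rule are all made up for illustration.

```python
import random
from statistics import mean

def split_bank(bank, seed=0):
    """Split a question bank into two tests of similar difficulty.

    Questions are sorted by difficulty and taken in adjacent pairs; a random
    shuffle decides which question of each pair goes to which test. If the
    bank has an odd number of questions, the hardest one is left out.
    """
    rng = random.Random(seed)
    ordered = sorted(bank, key=lambda q: q["difficulty"])
    pre_test, post_test = [], []
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)
        pre_test.append(pair[0])
        post_test.append(pair[1])
    return pre_test, post_test

# Hypothetical bank of 100 questions; difficulty is a made-up 0-to-1 rating.
bank = [{"id": i, "difficulty": i / 100} for i in range(100)]
pre_test, post_test = split_bank(bank, seed=7)

# Compare the average difficulty of the two tests.
gap = abs(mean(q["difficulty"] for q in pre_test)
          - mean(q["difficulty"] for q in post_test))
print(len(pre_test), len(post_test), round(gap, 3))
```

Because each pair holds two questions of nearly the same difficulty, neither test can end up much harder than the other.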

Statistical regression to the mean. In evaluating public programs for needy people, we often deal with very poor, long-unemployed folks who are at a low point in their lives. It’s kind of like taking a really hard, unfamiliar class. You may have one really low grade. But it doesn’t reflect how you usually perform. What happens when we test you again? Wow, your score goes up. Take a job training program for the hard-core unemployed: if we only have the experimental group, regression to the mean would predict that their situation would improve over time, even if the new program had no effect. So we again need the control group. And make sure that both groups are equal, and that they are both hard-core unemployed types.
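You can see regression to the mean in a small simulation. The Python sketch below is only an illustration under made-up assumptions: each simulated person has a stable "true" level plus some luck on any given test, and nothing at all happens between the two tests.

```python
import random
from statistics import mean

rng = random.Random(2024)  # fixed seed so the run can be repeated

# Each simulated person has a stable true level plus luck on any given test.
N = 5000
true_level = [rng.gauss(50, 10) for _ in range(N)]
first_test = [t + rng.gauss(0, 10) for t in true_level]
second_test = [t + rng.gauss(0, 10) for t in true_level]  # no program in between

# Pick the people who scored in roughly the bottom 10% on the first test.
cutoff = sorted(first_test)[N // 10]
lowest = [i for i in range(N) if first_test[i] < cutoff]

first_mean = mean(first_test[i] for i in lowest)
second_mean = mean(second_test[i] for i in lowest)
print(f"First test: {first_mean:.1f}, retest: {second_mean:.1f}")
```

The lowest scorers were picked partly because of bad luck on the first test, so their average rises on the retest even though no program was run.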

Selection biases. Don’t let people choose what group they want to be a part of. The most motivated students might volunteer for the innovative teaching technique group. The two groups would then no longer be equal in composition. The scores of the experimental group may go up more, simply because the students are more motivated. In reality, the stimulus had no effect. So, use matching or random assignment of students to the two groups, to ensure that the two groups are equal in composition.

Experimental mortality. The test scores in the experimental group are higher in the post-test than in the pre-test, so you think that the stimulus had an effect. But some of the weaker students dropped the class. You’re only left with the more motivated students, who had higher pre-tests than the other students to begin with. So watch for who leaves the experimental (and control) group. You may have to recalculate the pre-test scores for only those who have stayed in the group throughout. Plus, you may have to weight the scores of the groups to ensure that the two groups are comparable.
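Recalculating the pre-test for only those who stayed is easy to sketch in Python. The student records below are made up for illustration; a post score of None marks a student who dropped the class.

```python
from statistics import mean

# Made-up records: post is None for students who dropped the class.
students = [
    {"pre": 40, "post": None},
    {"pre": 50, "post": None},
    {"pre": 60, "post": 62},
    {"pre": 70, "post": 71},
    {"pre": 80, "post": 81},
    {"pre": 90, "post": 90},
]

stayers = [s for s in students if s["post"] is not None]
post_mean = mean(s["post"] for s in stayers)

# Naive gain compares the post-test with the pre-test of everyone who started.
naive_gain = post_mean - mean(s["pre"] for s in students)
# Corrected gain compares it with the pre-test of the stayers only.
corrected_gain = post_mean - mean(s["pre"] for s in stayers)

print(naive_gain, corrected_gain)
```

In this made-up case the apparent 11-point gain shrinks to 1 point once the two weakest students, who dropped, are taken out of the pre-test average.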

So, we have a control group, equal in composition to the experimental group. And we don’t have experimental mortality, or we have corrected for it. Great! But wait, there’s more, as Mr. Wonderful on Shark Tank would say. Students in the experimental group love their class so much that they brag about it to the students in the control group, and those students start playing the same games or whatever. Now, their scores go up as well. The worn-out experimenter would see scores going up equally in both groups, and would incorrectly conclude that the stimulus had no effect. In reality, it had a great effect. So, you have to prevent this Diffusion or Imitation effect. Don’t permit any communication between the groups. How can you do this? Maybe have the two groups at two different branches of your university in different cities. But would the groups be equal? Students at our Meridian campus may be non-traditional students. So, yet another thing for the researcher to worry about.

Compensatory Rivalry. Ever see the movie Rocky, about the unknown boxer who didn’t have a prayer? He fought harder, to make up for his disadvantage. If people know they are in the Control group, they may similarly be inspired and challenged to just work harder. Their scores go up, simply because of their knowledge that they are in the control group. The scores in the experimental group go up because the stimulus had a positive effect, but the scores also go up in the control group because of Compensatory Rivalry. We draw the incorrect conclusion that the stimulus had no effect, because the scores went up in both groups. So we have to ensure that the subjects of the experiment are not aware of which group they are in.

Demoralization effect. The reverse could happen. The medical patients, for example, might learn that they are not getting the Wonder Drug (vaccine). They get demoralized. I’m going to die, they think! Their scores go down, because knowing that they are in the control group hurts their emotional state, and that hurts their health. Even if the scores did not change at all in the experimental group, that group now looks better than the control group, so you incorrectly conclude that the stimulus had some positive effect. So you have to ensure that the subjects do not know what group they are in.

So for both compensatory rivalry and demoralization, we have to ensure that people do not know what group they are in. So in the medical experiment, the control group is given a placebo. Also, the doctors and nurses interacting with the patients cannot know what group the patient is in. Otherwise, they might give more TLC to the control group, because they feel sorry for them. That better bedside manner might cause their test scores to go up. In short, if they know that their patients are in the control group, they are providing Compensation to them. That compensation is actually like a second stimulus. It can cause their scores to go up. So don’t give compensation. Don’t even let health care professionals know what group they are working with.

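So, to sum up. Internal invalidity problems are inferences (conclusions) drawn that are not an accurate reflection of what actually happened. There are 11 internal invalidity problems:

1) History
2) Maturation
3) Testing
4) Instrumentation
5) Statistical regression to the mean
6) Selection biases
7) Experimental mortality
8) Diffusion or Imitation
9) Compensatory Rivalry
10) Demoralization
11) Compensation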

Well, the researcher has gone through all of that work, and now she has something else to worry about. Can she generalize the results of her experiment to the entire population that she cares about? These are called External Invalidity problems.

For example, political scientists did an experimental study of college students, and gave the two groups descriptions of political candidates to vote for that differed only with respect to the race or sex of the candidate. They found that the candidate’s race or sex had no effect on the students’ vote preferences. They concluded, great, racism and sexism are dead in America. Are they right? Well, no. First, they have the problem of Sample Bias. College students are younger, and they are more educated, than the average American. Both younger people and more educated people tend to be more tolerant or liberal on racial and gender issues than other people. So you can’t really generalize to the population. It would be better to draw a sample from the general population, and do the classical experimental design on them.

A second problem is that it is an Artificial Experiment. The subjects know that they are not really voting for someone. Try to make the experiment more realistic. That may be impossible. I used Unobtrusive Measures (rather than experimental design) to determine racism-sexism in a university setting on one occasion years ago. I successfully urged the hiring of the first (and only) woman Dean of Arts and Sciences, who was the best qualified candidate. Most department heads treated her like any other Dean, but one was intimidated and seemed to bow to her and rush to shake her hand at the interviews (no virus back then). Another head was flustered, and kept interrupting her at department meetings, so she told him to shut up, and finally fired him. I also successfully urged the hiring of the first African American political science department head, who was also the best qualified candidate. In that situation all of the faculty treated him just like any department head, they talked only about his qualifications, and they unanimously urged his hiring. So, this was a very realistic situation, since these administrators were their bosses.

Long-lasting effect is a third problem with external invalidity. Even if you can generalize to the population, does the stimulus have a long-lasting effect, or is it merely temporary? If you wanted to increase civics knowledge in the general population, and you did the classical experimental design, and post-test scores were higher, would they stay higher? Or would they drop down over time to what they were initially? That is why multiple post-tests can help. Sometimes, the scores will drop down over time, but they will still be higher than they were in the pre-test. So the stimulus did have some effect, but not as great as you thought. This is the case with the coronavirus vaccine, which was found to be effective for only about 6 months; therefore, people need booster shots.
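With multiple post-tests, you can track how much of the initial gain is still there each time. A minimal Python sketch, where the retained_share helper and the civics scores are made up for illustration, and where the first post-test is assumed to be higher than the pre-test:

```python
def retained_share(pre, post_tests):
    """For each post-test, the share of the first post-test gain that remains.

    Assumes the first post-test differs from the pre-test (nonzero first gain).
    """
    first_gain = post_tests[0] - pre
    return [(score - pre) / first_gain for score in post_tests]

# Made-up average civics scores: one pre-test, then three post-tests over time.
pre = 50
post_tests = [70, 65, 60]

shares = retained_share(pre, post_tests)
print(shares)  # [1.0, 0.75, 0.5]
```

Here the scores drop over time, but by the third post-test half of the original gain is still there, so the stimulus had some lasting effect.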

I won’t even mention the 4th problem, which is too complex to explain, and we’re running out of time.

To summarize: External Invalidity Problems are an inability to generalize to a population. There are 3 external invalidity problems that we talked about:

1) Sample Bias
2) Artificial Experiment
3) Long-lasting effect

(Source Note: internal and external invalidity problems derived from Donald T. Campbell and Julian C. Stanley's Experimental and Quasi-Experimental Designs for Research, Houghton Mifflin Co., 1963, pages 5-6.)

The Classical Experimental Design is strong on internal validity, but weak on external validity.

 

Quasi-experimental designs are only moderate on internal validity, since they are naturally occurring experiments, and people cannot be randomly assigned to the groups.

Two major types of quasi-experiments:

1) Time Series Design: multiple pre-tests before stimulus; multiple post-tests after stimulus; no control group. This design’s problem is a failure to control for numerous threats to the internal validity of the quasi-experiment.

2) Control Series Design: two time series, one for experimental group, one for control group. You must have groups that are as comparable as possible. This design controls for many internal validity problems.

For example, you want to save lives on our highways by greatly increasing the fine for not using seat belts. You use a Time Series Design, and look at the highway fatality rate per mile driven in each of the three years before the new state law increasing fines. Then, over the next three years, you look at the highway fatality rate again. So you have three pre-tests, and three post-tests. If the rates go down in the post-tests, you say that the stimulus (the new law) had an effect.
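The Time Series comparison is just the average of the post-tests minus the average of the pre-tests. A minimal Python sketch, with fatality rates that are made up for illustration (they are not real state data):

```python
from statistics import mean

# Made-up fatality rates (deaths per 100 million miles driven), one per year.
before_law = [1.50, 1.52, 1.48]   # three pre-tests
after_law = [1.40, 1.38, 1.42]    # three post-tests

change = mean(after_law) - mean(before_law)
print(f"Average change after the law: {change:+.2f}")
```

A negative change means the rate went down after the law. With no control group, the number by itself cannot tell you why it went down.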

But what if other things were going on in the environment that were also causing fatality rates to go down? Maybe the cars are being built better, with more safety devices. Maybe there is more police enforcement. Maybe the weather is less rainy, or people are becoming more considerate. Whatever. These other factors could cause fatality rates to go down even if the new law had no effect.

So a Control Series Design is an even better design. Say the new law was passed in Mississippi. Find a state whose composition is similar to Mississippi’s. How about Alabama? They would also be affected by many of these other factors. But they did not have the law change. So you have an experimental group and a control group. Did the fatality rate go down more in Mississippi than in Alabama? If so, the stimulus had an effect.
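The Control Series comparison can be sketched the same way: compute the before-and-after change in each state, then subtract the control state's change from the experimental state's change. In the Python sketch below, the series_change helper and all of the fatality rates are made up for illustration; they are not real Mississippi or Alabama data.

```python
from statistics import mean

def series_change(before, after):
    """Average of the post-tests minus average of the pre-tests."""
    return mean(after) - mean(before)

# Made-up fatality rates per 100 million miles driven, three years each.
mississippi = {"before": [1.50, 1.52, 1.48], "after": [1.40, 1.38, 1.42]}  # law changed
alabama = {"before": [1.46, 1.44, 1.45], "after": [1.41, 1.40, 1.39]}      # no law change

ms_change = series_change(mississippi["before"], mississippi["after"])
al_change = series_change(alabama["before"], alabama["after"])

# The part of the drop that the control state does not share.
law_effect = ms_change - al_change
print(f"Mississippi {ms_change:+.2f}, Alabama {al_change:+.2f}, difference {law_effect:+.2f}")
```

In these made-up numbers, both states improve, but the experimental state improves by more; only that extra drop is credited to the law.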