WEEK 12:
EXPERIMENTAL, QUASI EXPERIMENTAL DESIGNS
The Classical Experimental Design is really kind of
cool. It is supposed to be the most scientific and accurate design for testing
whether an independent variable causes a change in a dependent variable. We use the
classical experimental design to test the safety and effectiveness of new
drugs. For example, when they developed a vaccine for coronavirus, they had to
determine whether this one variable (the vaccine, which is called a Stimulus)
had an effect on the dependent variable (how severe your illness was when you
got the disease, measured by hospitalization or death). They needed two groups
of people who were equal in every other respect (overall health, age, sex,
race, whatever). Their health was tested in a pre-test administered to both
groups. Obviously, if the groups were equal, their pre-test scores should be
equal. One of these groups gets the vaccine (the experimental group), and the other
does not (the control group, which gets a placebo, a fake treatment). After some time, both groups get
tested once again. If the stimulus (vaccine) had an effect, that would show up in
the post-test: the experimental group would have lower
hospitalization and death rates than the control group. That is the process
that we normally use to test drugs. Politically, though, it is difficult to
tell people that they are in the control group and will not receive the “wonder
drug”, plus sometimes you have an emergency and go right to human testing
rather than use animal testing first, for example. So the classical
experimental design is not used as much as it should be used in studies of
public policy, such as in educational policy and welfare-workfare programs.
Classical Experimental Design:
Experimental Group: Pre-test ---------------> Stimulus ----------------> Post-test
Control Group:      Pre-test ------------------------------------------> Post-test
Both groups must be equal in composition.
Ensure equality by matching or by random assignment. You can match on the
characteristics you think are important, such as race, sex, and age. Random
assignment of subjects to the two groups is cool, since you may have left out
an important variable (such as overall good health), and random assignment works
like the random sample in a poll: you should get representative groups, if you have a large enough
sample. If you have a small sample, one group might come out unrepresentative
on some variable; so you can start with random assignment, but then match
on those characteristics where the groups turned out unequal.
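Here is a minimal Python sketch of random assignment followed by a quick check on one characteristic. Everything in it is an assumption for illustration: the 200 subjects, their made-up ages, and the function name random_assignment are not from any real study.

```python
import random
from statistics import mean


def random_assignment(subjects, seed=0):
    """Shuffle a copy of the subjects and split it into two equal halves."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subjects with made-up ages (not real data).
data_rng = random.Random(42)
subjects = [{"id": i, "age": data_rng.randint(18, 80)} for i in range(200)]

experimental, control = random_assignment(subjects, seed=1)

# Balance check: with a large enough sample, the average ages should be close.
age_gap = abs(mean(s["age"] for s in experimental) - mean(s["age"] for s in control))
print(len(experimental), len(control))
print("difference in average age:", round(age_gap, 2))
```

If the gap on some characteristic looks too large, which is more likely with a small sample, that is the point where you would fall back on matching for that characteristic.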
A public policy example
of a classical experimental design would be a university wishing to improve
student learning in American Government. They randomly assign students to two
different American Government classes, and check to ensure that the
characteristics of the students are identical in the two groups. They decide to
implement a more innovative teaching technique for the experimental group (the
stimulus) than the standard in-class lecture for the control group. Indeed,
part of the stimulus may be distance learning, computer simulations, games,
whatever.
There are 11 Internal
Invalidity problems that can cause you to draw incorrect
conclusions about the effectiveness of your stimulus, if you look at only the
experimental group. For example, maybe you are teaching the Government class
during the 9-11 terror attack (or a pandemic, or an insurrection). Students
might be so stunned that they really become more interested and knowledgeable
about politics and government. If the pre- and post-tests were tests of their
political knowledge, their scores in both groups would go up over time. If you
just had the experimental group, you’d think that the scores went up because of
the innovative teaching technique. In reality, their environment outside of the
classroom affected their test scores. The innovative teaching technique had no
beneficial impact. This is called the problem of History. This is why you need
the control group. History would affect both of the groups, so both of
their test scores would go up because of History. If the experimental group's
scores went up even more than the control group's, that additional increase would
be because of the stimulus.
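The logic of "additional increase" can be sketched as simple arithmetic: subtract the control group's gain from the experimental group's gain. The test scores and the function name stimulus_effect below are made up for illustration; this is a sketch of the reasoning, not a full statistical test.

```python
from statistics import mean


def stimulus_effect(exp_pre, exp_post, ctrl_pre, ctrl_post):
    """Gain in the experimental group minus gain in the control group."""
    exp_gain = mean(exp_post) - mean(exp_pre)
    ctrl_gain = mean(ctrl_post) - mean(ctrl_pre)
    return exp_gain - ctrl_gain


# Hypothetical political knowledge scores (not real data).
exp_pre = [50, 55, 60, 45]
exp_post = [70, 75, 80, 65]   # up 20 points on average
ctrl_pre = [50, 55, 60, 45]
ctrl_post = [60, 65, 70, 55]  # up 10 points on average (History alone)

effect = stimulus_effect(exp_pre, exp_post, ctrl_pre, ctrl_post)
print(effect)  # 10.0
```

In this made-up case, History accounts for 10 of the 20 points, and only the remaining 10 points would be credited to the stimulus. The same subtraction applies to Maturation, Testing, and any other problem that affects both groups equally.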
Another internal
invalidity problem is Maturation. That is, students are now on their
own, living apart from their parents. They grow up, and become more interested
in politics which impacts their own lives. So maturation alone causes their
test scores to go up. So, you need the control group, because the groups are
both equal, so Maturation should cause both of the groups’ scores to go up. Any
additional increase in the experimental group’s scores is because of the
stimulus.
An example of the Testing
effect problem is when people may be humiliated by their poor scores on the
pre-test. So taking the test itself may cause them to study more, and to do
better later on. The stimulus may have no impact on them. So you need the
control group because both of the groups will be affected by testing. Again,
for the stimulus to be effective, the test scores should increase more in the
experimental group than in the control group.
An example of Instrumentation
is when the teacher feels bad that the students did so poorly on the
pre-test. So she decides to make the post-test easier. Well, duh! What do you
think will happen to the post-test scores? They’ll go up. If you just have the
experimental group, you’ll conclude that the stimulus had an effect. Whoops.
They didn’t learn anything; the stimulus actually didn’t have an effect. In the
classical experimental design, we can determine this because we gave the same,
easier post-test to the control group, and their scores went up just as much.
So, you need the control group to determine the effect of the stimulus. In this
case, you may also want to come up with a bank of 100 questions, and draw up 50
beforehand for the pre-test and 50 for the post-test, and make sure that the
two tests are equal in difficulty. In any event, just make sure that both
groups get the same test.
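One possible way to split a question bank into two tests of about equal difficulty is sketched below. It assumes that each question already has a difficulty rating from 1 (easy) to 5 (hard); the ratings, the bank itself, and the pairing approach are assumptions for illustration, not a required procedure.

```python
import random
from statistics import mean

rng = random.Random(0)

# Hypothetical bank of 100 questions with made-up difficulty ratings.
bank = [{"id": q, "difficulty": rng.randint(1, 5)} for q in range(100)]

# Sort by difficulty, then send one question from each adjacent pair
# to the pre-test and the other to the post-test, chosen at random.
ordered = sorted(bank, key=lambda q: q["difficulty"])
pre_test, post_test = [], []
for i in range(0, len(ordered), 2):
    pair = [ordered[i], ordered[i + 1]]
    rng.shuffle(pair)
    pre_test.append(pair[0])
    post_test.append(pair[1])

gap = abs(mean(q["difficulty"] for q in pre_test)
          - mean(q["difficulty"] for q in post_test))
print(len(pre_test), len(post_test))
print("difference in average difficulty:", round(gap, 3))
```

Because each pair holds two questions of nearly the same difficulty, the two 50-question tests end up very close in average difficulty, whichever way the coin flips go.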
Statistical regression
to the mean. In evaluating public programs for needy people, we often deal
with very poor, long-term unemployed folks who are at a low point in their lives. It’s
kind of like taking a really hard, unfamiliar class. You may get one really low grade, but it doesn’t reflect how you
usually perform. What happens when we test you again? Wow, your score goes up.
In the example of a job training program for the hard-core
unemployed, if we only have the experimental group, regression to the mean
would predict that their situation would improve over time, even if the new
program had no effect. So we again need the control group. And make sure that both
groups are equal, and that they are both made up of hard-core unemployed people.
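Regression to the mean is easy to see in a small simulation. The sketch below assumes that each person's score is a stable "true level" plus random luck on the day of the test; the numbers (1,000 people, averages of 50, spreads of 10) are made up. Nobody receives any program at all.

```python
import random
from statistics import mean

rng = random.Random(0)

# Assumed model: observed score = stable true level + luck on that day.
true_level = [rng.gauss(50, 10) for _ in range(1000)]
first_test = [t + rng.gauss(0, 10) for t in true_level]
second_test = [t + rng.gauss(0, 10) for t in true_level]

# Pick the 100 people with the lowest scores on the first test,
# the way a program for the neediest would select its clients.
lowest = sorted(range(1000), key=lambda i: first_test[i])[:100]

first_mean = mean(first_test[i] for i in lowest)
second_mean = mean(second_test[i] for i in lowest)
print("first test average:", round(first_mean, 1))
print("second test average:", round(second_mean, 1))
```

The second average comes out higher than the first even though nothing was done for these people, because part of what put them at the bottom was bad luck that does not repeat. A control group drawn from the same low scorers would show the same rise, which is why you need it.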
Selection biases. Don’t let people
choose what group they want to be a part of. The most motivated students might
volunteer for the innovative teaching technique group. The two groups would then
no longer be equal in composition. The scores of the experimental group may go
up more, simply because the students are more motivated. In reality, the
stimulus had no effect. So, use matching or random assignment of students to
the two groups, to ensure that the two groups are equal in composition.
Experimental mortality. The test scores in the
experimental group are higher in the post-test than in the pre-test, so you
think that the stimulus had an effect. But some of the weaker students dropped
the class. You’re only left with the more motivated students, who had higher
pre-tests than the other students to begin with. So watch for who leaves the
experimental (and control) group. You may have to recalculate the pre-test
scores for only those who have stayed in the group throughout. Plus, you may
have to weight the scores of the groups to ensure that the two groups are
comparable.
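Recalculating the pre-test for only those who stayed can be sketched as follows. The five students and their scores are made up for illustration; None marks a student who dropped the class before the post-test.

```python
from statistics import mean

# Hypothetical records: student -> (pre-test score, post-test score or None).
scores = {
    "A": (40, None),  # dropped the class
    "B": (45, None),  # dropped the class
    "C": (70, 72),
    "D": (75, 76),
    "E": (80, 82),
}

stayers = {name: s for name, s in scores.items() if s[1] is not None}

naive_pre = mean(pre for pre, _ in scores.values())     # everyone who started
stayer_pre = mean(pre for pre, _ in stayers.values())   # only those who finished
post_mean = mean(post for _, post in stayers.values())

naive_gain = post_mean - naive_pre
corrected_gain = post_mean - stayer_pre
print("naive gain:", round(naive_gain, 2))
print("corrected gain:", round(corrected_gain, 2))
```

In this made-up class the naive comparison suggests a gain of almost 15 points, but the students who stayed improved by less than 2 points. Most of the apparent gain came from the weaker students leaving, not from the stimulus.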
So, we have a control
group, equal in composition to the experimental group. And we don’t have
experimental mortality, or we have corrected for it. Great! But wait, there’s
more, as Mr. Wonderful on Shark Tank would say. Students in the experimental
group love their class so much that they brag about it to the students in the
control group, and those students start playing the same games or whatever.
Now, their scores go up as well. The worn-out experimenter would see scores
going up equally in both groups, and would incorrectly conclude that the
stimulus had no effect. In reality, it had a great effect. So, you have to
prevent this Diffusion or Imitation effect. Don’t permit any
communication between the groups. How can you do this? Maybe have the two
groups at two different branches of your university in different cities. But
would the groups be equal? Students at our Meridian campus may be
non-traditional students. So, yet another thing for the researcher to worry
about.
Compensatory Rivalry. Ever see the movie
Rocky, about the unknown boxer who didn’t have a prayer? He fought harder,
to make up for his disadvantage. If people know they are in the Control group,
they may similarly be inspired and challenged to just work harder. Their scores
go up, simply because of their knowledge that they are in the control group.
The scores in the experimental group go up because the stimulus had a positive effect, but the scores also go up in the control group because of Compensatory Rivalry. We draw the incorrect conclusion that the stimulus had no effect, because the scores went up in both groups. So we have to ensure that the
subjects of the experiment are not aware of which group they are in.
Demoralization effect. The reverse
could happen. The medical patients, for example, might learn that they are not
getting the Wonder Drug (vaccine). They get demoralized. I’m going to die, they
think! Their scores go down, because knowing that they are in the control group hurts their emotional state,
and that hurts their health. The experimental group’s scores may not have changed at all, but because the control group’s scores dropped, the experimental group looks better by comparison, and you incorrectly conclude that the stimulus had some positive effect. So you have to ensure
that the subjects do not know what group they are in.
So for both compensatory
rivalry and demoralization, we have to ensure that people do not know what
group they are in. So in the medical experiment, the control group is given a
placebo. Also, the doctors and nurses interacting with the patients cannot know what
group the patient is in. Otherwise, they might give more TLC to the control
group, because they feel sorry for them. That better bedside manner might cause
their test scores to go up. In short, if they know that their patients are in
the control group, they may end up providing Compensation to them. That
compensation is actually like a second stimulus. It can cause their scores to
go up. So don’t give compensation. Don’t even let health care professionals
know what group they are working with.
So, to sum up. Internal invalidity problems are
inferences (conclusions) drawn that are not an accurate reflection of what
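actually happened. There are 11 internal invalidity problems:
1) History
2) Maturation
3) Testing
4) Instrumentation
5) Statistical regression to the mean
6) Selection biases
7) Experimental mortality
8) Diffusion or Imitation
9) Compensatory Rivalry
10) Demoralization
11) Compensation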
Well, the researcher has
gone through all of that work, and now she has something else to worry about.
Can she generalize the results of her experiment to the entire population that she
cares about? If she cannot, she has External Invalidity problems.
For example, political
scientists did an experimental study of college students, and gave
the two groups descriptions of political candidates to vote for that differed only with respect to the race or sex of the
candidate. They found that the candidate’s race or sex had no effect on the
students’ vote preferences. They concluded, great, racism and sexism are dead in
America. Are they right? Well, no. First, they have the problem of Sample
Bias. College students are younger, and they are more educated, than the
average American. Both younger people and more educated people tend to be more tolerant or liberal on racial and gender issues
than other people. So you can’t really generalize to the population. It would
be better to draw a sample from the general population, and do the classical
experimental design on them.
A second problem is that
it is an Artificial Experiment. The subjects know that they are not
really voting for someone. Try to make the experiment more realistic, though that may be
impossible. I used Unobtrusive Measures (rather than experimental design) to
determine racism-sexism in a university setting on one occasion years ago. I successfully
urged the hiring of the first (and only) woman Dean of Arts and Sciences, who
was the best qualified candidate. Most department heads treated her like any
other Dean, but one was intimidated and seemed to bow to her and rush to shake
her hand at the interviews (no virus back then). Another head was flustered, and
kept interrupting her at department meetings, so she told him to shut up, and
finally fired him. I also successfully urged the hiring of the first African
American political science department head, who was also the best qualified
candidate. In that situation all of the faculty treated him just like any
department head, they talked only about his qualifications, and they
unanimously urged his hiring. So, very realistic situation, since these
administrators were their bosses.
Long-lasting effect is a third problem with
external invalidity. Even if you can generalize to the population, does the stimulus have
a long-lasting effect, or is it merely temporary? If you wanted to increase
civics knowledge in the general population, and you did the classical
experimental design, and post-test scores were higher, would they stay higher?
Or would they drop down over time to what they were initially? That is why
multiple post-tests can help. Sometimes, the scores will drop down over time,
but they will still be higher than they were in the pre-test. So the stimulus
did have some effect, but not as great as you thought. This is the case with the
coronavirus vaccine, which was found to be effective for only about 6 months;
therefore, people need booster shots.
I won’t even mention the
4th problem, which is too complex to explain, and we’re running out
of time.
To summarize: External Invalidity Problems are an inability
to generalize to a population. There are 3 external invalidity problems
that we talked about:
1) Sample Bias
2) Artificial Experiment
3) Long-lasting effect
(Source Note: internal and external invalidity
problems derived from Donald T. Campbell and Julian C. Stanley's Experimental and Quasi-Experimental Designs
for Research, Houghton Mifflin Co., 1963, pages 5-6.)
The Classical
Experimental Design is strong on internal validity, but weak on external
validity.
Quasi-experimental
designs are only moderate on internal validity, since they are
naturally occurring experiments, and people cannot be randomly assigned to the
groups.
Two major types of quasi-experiments:
1) Time Series
Design: multiple pre-tests before stimulus; multiple post-tests after
stimulus; no control group. This design’s problem is a failure to control for
numerous threats to the internal validity of the quasi-experiment.
2) Control Series
Design: two time series, one for experimental group, one for control
group. You must have groups that are as comparable as possible. This design controls
for many internal validity problems.
For example, you want to
save lives on our highways by greatly increasing the fine for not using seat
belts. You use a Time Series Design, and look at the highway fatality rate per
mile driven in each of the three years before the new state law increasing fines.
Then, you look at the highway fatality rate in each of the next three years. So you
have three pre-tests, and three post-tests. If the rates go down in the
post-tests, you say that the stimulus (the new law) had an effect.
But, what if other
things were going on in the environment that were also causing fatality rates
to go down? Maybe the cars are being built better, with more safety devices.
Maybe there is more police enforcement. Maybe the weather is less rainy, or
people are becoming more considerate. Whatever. These other factors, and not the
new law, may be what causes fatality rates to go down.
So a Control Series
Design is an even better design. Find a state whose composition is similar to
Mississippi. How about Alabama? They would also be affected by many of these
other factors. But they did not have the law change. So you have an
experimental and control group. Did the fatality rate go down more in
Mississippi than in Alabama? If so, the stimulus had an effect.
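The comparison between the two designs can be sketched with made-up numbers. The fatality rates below (deaths per 100 million miles driven) are invented for illustration and are not real Mississippi or Alabama figures; the function name change is also just a label for this sketch.

```python
from statistics import mean


def change(pre_years, post_years):
    """Average of the post-test years minus average of the pre-test years."""
    return mean(post_years) - mean(pre_years)


# Hypothetical fatality rates, three years before and three years after the law.
ms_pre, ms_post = [1.8, 1.7, 1.7], [1.5, 1.4, 1.3]   # state with the new law
al_pre, al_post = [1.7, 1.7, 1.6], [1.6, 1.5, 1.5]   # similar state, no new law

# Time Series Design: only the experimental state.
ms_change = change(ms_pre, ms_post)

# Control Series Design: subtract the change in the comparison state.
al_change = change(al_pre, al_post)
net_effect = ms_change - al_change

print("time series estimate:", round(ms_change, 2))
print("control series estimate:", round(net_effect, 2))
```

With these invented numbers, the time series alone would credit the law with the whole drop in the experimental state. The control series credits it only with the part of the drop that goes beyond what the comparison state experienced, since safer cars, weather, and similar factors affected both states.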