Last changed 12 March 2003. Length about 3,000 words (21,000 bytes).
This is a WWW document maintained by Steve Draper, installed at http://www.psy.gla.ac.uk/~steve/hawth.html. You may copy it.

The Hawthorne effect: a note

by Stephen W. Draper,   Department of Psychology,   University of Glasgow.

Preface

This is a note on the Hawthorne effect, which is often mentioned but for which a simple account is not so easy to find.

 

Finding and referring to the Hawthorne effect in the literature

Note that "Hawthorne" is not the name of a researcher, but of the factory where the effect was first observed and described: the Hawthorne works of the Western Electric Company in Chicago.

One definition of the Hawthorne effect is: An experimental effect in the direction expected but not for the reason expected; i.e. a significant positive effect that turns out to have no causal basis in the theoretical motivation for the intervention, but is apparently due to the effect on the participants of knowing themselves to be studied in connection with the outcomes measured.

The short way to refer to the Hawthorne effect is:
Mayo,E. (1933) The human problems of an industrial civilization (New York: MacMillan) ch.3.

or

Roethlisberger,F.J. & Dickson,W.J. (1939) Management and the Worker (Cambridge, Mass.: Harvard University Press).

The longer way is:
The studies were done 1924-1933(?). Roethlisberger & Dickson give a great amount of detail, and little interpretation. Mayo gives a shorter account, and additionally the interpretation which has been so influential: essentially, that the workers' feeling of being closely attended to was the cause of the improvements in performance.

The Hawthorne effect comes from management research. More comments on this in a later section.

What was the original Hawthorne effect?

Basically, a series of studies on the productivity of workers manipulated various conditions (pay, light levels, rest breaks etc.), but each change resulted on average over time in productivity rising, including eventually a return to the original conditions. This was true of each of the individual workers as well as of the group mean.

Clearly the variables the experimenters manipulated were neither the only nor the dominant causes of productivity. One interpretation, mainly due to Mayo, was that the important effect here was the feeling of being studied: it is this that is now referred to by "the Hawthorne effect".

More detail

From 1924 to 1927 there were 2.5 years of illumination level experiments. In 1927 four studies began on selected small groups. In 1932 there was a questionnaire and interview study of 20,000(?) employees.

Illumination studies pp.14-18 (part of ch.1) of Roethlisberger & Dickson (1939)
Study 1a-d. a-c were experiments on whole departments.
1a) No control group, experimental groups in 3 different departments. All showed an increase of productivity (from an initial base period), which didn't decrease with illumination.
1b) 2 groups. The control group got stable illumination; the other got a sequence of increasing levels. Got a substantial rise in production in both, but no difference between the groups.
1c) Experimental and control groups. Experimental group got a sequence of decreasing light levels. Both groups steadily increased production, until finally light in experimental group so low they protested and production fell off.
1d) 2 girls only. Their production stayed constant under widely varying light levels, but their stated preferences followed the experimenter's suggestions: (1) if the experimenter said bright was good, then the brighter they believed the light to be the more they liked it; (2) then ditto when he said dimmer was good. And if they were deceived about a change, they said they preferred it: i.e. what mattered was their belief about the light level, not the actual light level, and what they thought the experimenter expected to be good, not what was materially good.

Study 2: the relay assembly experiments (2a,b) on a group of 1+5 female operators.
2a Rest pauses and hours of work (in a separate room). Small group piecework the only experimental variable.
2b About a piecework payment system (on a separate bench, but normal room).
2c Mica splitting test room. Like 2a: separate room, but already and constantly on piecework rates.
2d Bank wiring: pure observation of a 14 man team. Group piecework. Could always easily see their own rate.

Study 2a: a group of 6 experienced female workers segregated; 1 serving, 5 assembling telephone relays: a 1 min. task in good conditions. Output carefully measured. 5 year study. Output (time for every relay produced) was secretly measured for 2 weeks before moving them to the experimental room. Then 5 weeks of measures; then manipulations of pay rules (group piecework for the 5 person group); then two 5 min. breaks (after a discussion with them on the best length of time); then two 10 min. breaks (not their preference), which again produced improvement; then six 5 min. rests (disliked, reduced output); then (free?) food in the breaks; shortened the day by 30 mins (output up); shortened it more (output per hour up, but overall down); return to earlier condition (output peaked); etc. etc. Attitudes as well as behaviour and output were measured.

Parsons (1974) argues that in 2a,2d they had feedback on their work rates; but in 2b they didn't. He argues that in the studies 2a-d, there is at least some evidence that the following factors were potent:

  1. Rest periods
  2. Learning, given feedback i.e. skill acquisition
  3. Piecework pay where an individual does get more pay for more work, without counter-pressures (e.g. believing that management will just lower pay rates).

He (re)defines "the Hawthorne effect as the confounding that occurs if experimenters fail to realize how the consequences of subjects' performance affect what subjects do" [i.e. learning effects, both permanent skill improvement and feedback-enabled adjustments to suit current goals]. So he is saying it is not attention or warm regard from experimenters, but either a) an actual change in rewards or b) a change in the provision of feedback on performance. His key argument is that in 2a the "girls" had access to the counters of their work rate, which they didn't previously know at all well.

It is notable however that he refuses to analyse the illumination experiments, which don't fit his analysis, on the grounds that they haven't been properly published and so he can't get at details, whereas he had extensive personal communication with Roethlisberger & Dickson.

Possibly a longitudinal learning effect. But Mayo says it is to do with the fact that the workers felt better in the situation, because of the sympathy and interest of the observers. He does say that this experiment is about testing overall effect, not testing factors separately. He also discusses it not really as an experimenter effect but as a management effect: how management can make workers perform differently because they feel differently. A lot to do with feeling free, not feeling supervised but more in control as a group. The experimental manipulations were important in convincing the workers to feel this way: that conditions were really different. The experiment was repeated with similar effects on mica splitting workers.

When we refer to "the Hawthorne effect" we are pretty much referring to Mayo's interpretation in terms of workers' perceptions, but the data show strikingly continuous improvement. It seems quite a different interpretation might be possible: learning, expertise, reflection -- all processes independent of the experimental intervention? However the usual Mayo interpretation is certainly a real possible issue in designing studies in education and other areas, regardless of the truth of the original Hawthorne study.

Recently the issue of "implicit social cognition" i.e. how much weight we actually give to what is implied by others' behaviour towards us (as opposed to what they say e.g. flattery) has been discussed: this must be an element here too.

Clark & Sugrue (1991, p.333) in a review of educational research say that uncontrolled novelty (i.e. halo) effects cause on average a rise of 30% of a standard deviation (SD) (i.e. 50%-63% score rise), which decays to a small level after 8 weeks. In more detail: 50% of a SD for up to 4 weeks; 30% of a SD for 5-8 weeks; and 20% of a SD for > 8 weeks (which is < 1% of the variance).
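As a rough check on what these effect sizes mean, the sketch below converts a shift of d standard deviations into a percentile and into a share of variance. It is only an illustration under stated assumptions, not Clark & Sugrue's own calculation: it assumes normally distributed scores, reads the "50%-63%" figure as a move from the 50th percentile upwards, and uses the standard equal-group-size conversion r = d / sqrt(d^2 + 4). The function names are invented for this example.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def percentile_after_shift(d):
    """Percentile reached by someone who starts at the median and gains
    d standard deviations (assumption: scores are normally distributed)."""
    return 100.0 * normal_cdf(d)


def variance_explained(d):
    """Proportion of score variance associated with a two-group difference
    of d SDs, using r = d / sqrt(d^2 + 4) (assumption: equal group sizes)."""
    r = d / math.sqrt(d * d + 4.0)
    return r * r


# Effect sizes as quoted in the text, in SD units.
novelty_effects = [("up to 4 weeks", 0.5), ("5-8 weeks", 0.3), ("> 8 weeks", 0.2)]

for period, d in novelty_effects:
    print(
        f"{period}: d = {d}, "
        f"percentile = {percentile_after_shift(d):.1f}, "
        f"variance explained = {100 * variance_explained(d):.2f}%"
    )
```

On these assumptions a 0.3 SD rise takes a median learner to roughly the 62nd percentile, close to the quoted 63%, and a 0.2 SD effect accounts for just under 1% of the variance, which agrees with the figure in the text.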

 

Can we trust the research?


Candice Gleim says:
Broad experimental effects and their classifications can be found in Campbell, D. T., & Stanley, J. C. (1966). Experimental and quasi-experimental designs for research. Chicago: Rand McNally. and Cook, T.D., & Campbell, D.T. (1979), Quasi-Experimentation : Design and Analysis Issues. Houghton Mifflin Co.

A summary is provided at http://www.valdosta.peachnet.edu/~whuitt/psy702/intro/valdgn.html and a newer version at http://chiron.valdosta.edu/whuitt/col/intro/research.html

 


Michael L. Kamil says:
You might want to be a bit careful about the scientific basis for the Hawthorne effect. Lee Ross has brought the concept into some question. There was a popular news story in the New York Times a couple of years ago.

 


David Carter-Tod says:
Interestingly in the process of doing a quick search on this I came across the following quote:
"A psychology professor at the University of Michigan, Dr. Richard Nisbett, calls the Hawthorne effect 'a glorified anecdote.' 'Once you've got the anecdote,' he said, 'you can throw away the data.'" A dismissive comment which back-handedly tells you something about the power of anecdote and narrative. There is, however, no doubt that there is a Hawthorne effect in education particularly.

The original newspaper piece
Some references to it: http://www.cquest.utoronto.ca/env/aera/aera-lists/aera-c/98-12/0015.html
http://dhp.com/~laflemm/hmco/Ch7quiz2.htm
http://www.felician.edu/instres/Data/Math%20Stats/Lectures/The%20Nature%20of%20Statistics.doc


Don Smith says:
I recall studying the Hawthorne Effect as an undergraduate for a management degree years ago. At that time the message was that if a group knew they were being studied the results may be biased.

However, I found Harry Braverman's comments in his book "Labor and Monopoly Capital" more interesting. According to Braverman, the Hawthorne tests were based on behaviorist psychology and were supposed to confirm that workers' performance could be predicted by pre-hire testing. However, the Hawthorne study showed "that the performance of workers had little relation to ability and in fact often bore a reverse relation to test scores...".

What the studies really showed was that the workplace was not "a system of bureaucratic formal organization on the Weberian model, nor a system of informal group relations, as in the interpretation of Mayo and his followers but rather a system of power, of class antagonisms".

According to Braverman this discovery was a blow to those hoping to apply the behavioral sciences to manipulate workers in the interest of management.

 

My view: What is wrong about the quoted dismissiveness is that there was not 1 study, but 3 illumination experiments, and 4 other experiments: only 1 of these 7 is alluded to. What is right is that a) there certainly are significant criticisms of the method that can be made and b) most subsequent writing shows a predisposition to believe in the Hawthorne effect, and a failure to read the actual original studies.

So, can we trust the literature?

The experiments were quite well enough done to establish that there were large effects due to causal factors other than the simple physical ones the experiments had originally been designed to study. The output ("dependent") variables were human work, and we can expect educational effects to be similar (but it is not so obvious that medical effects would be). The experiments stand as a warning against treating simple experiments on human participants as if they were only material systems. There is less certainty about the nature of the surprise factor, other than that it certainly depended on the mental states of the participants: their knowledge, beliefs, etc.

Candidate causes are:

  1. Material factors, as originally studied e.g. illumination, ...
  2. Motivation or goals e.g. piecework, ...
  3. Feedback: one cannot learn a skill without good feedback. Simply providing proper feedback can be a big factor. This can often be a side effect of an experiment, and good ethical practice promotes this further. Yet perhaps providing the feedback with nothing else may be a powerful factor.
  4. The attention of experimenters.

Parsons implies that (4) might be a "factor" as a major heading in our thinking, but as a cause can be reduced to a mixture of (2) and (3). That is: people might take on pleasing the experimenter as a goal, at least if it doesn't conflict with any other motive; but also, improving their performance by improving their skill will be dependent on getting feedback on their performance, and an experiment may give them this for the first time. So you often won't see any Hawthorne effect -- only when it turns out that with the attention came either usable feedback or a change in motivation.

Adair (1984)
Warns of gross factual inaccuracy in most secondary publications on the Hawthorne effect, and notes that many studies failed to find it, although some did.

Argues that we should look at it as a variant of Orne's (1973) experimental demand characteristics. So for Adair, the issue is that an experimental effect depends on the participants' interpretation of the situation; that this may not be at all like the experimenter's interpretation and the right method is to do post-experimental interviews in depth and with care to discover participants' interpretation.

So he thinks it is not awareness per se; nor special attention per se; but you have to investigate participants' interpretation in order to discover if/how the experimental conditions interact with the participants' goals (in participants' view). This can affect whether participants believe something, whether they act on it or don't see it as in their interest, etc.

Its interpretation in management research

The research was and is relevant firstly in the 'Human Resources Management' movement. The discovery of the effect was most immediately a blow to those hoping to apply the behavioral sciences to manipulate workers in the interest of management.

Other interpretations it has been linked to are: Durkheim's 'anomie' concept; the Weberian model of a system of bureaucratic formal organization; a system of informal group relations, as in the interpretation of Mayo and his followers; a system of power, of class antagonisms.

What does it mean in education?

We might distinguish between:

The placebo and Hawthorne effects compare and contrast in these ways:

  • Both are psychological effects on the participants, producing an outcome even when the material intervention has no effect.
  • Both are effects produced by the learners' perceptions and reactions; but the former emphasises their response to new equipment or methods, while the latter emphasises their response simply to being studied.
  • The cause in the placebo effect is the participants' false belief in the material efficacy of the intervention. The cause in the Hawthorne effect is the participants' response to being studied i.e. to the human attention.
  • In both cases, the experimenter may be deceiving the participants, or may be mistakenly sincere, or neutral with respect to the effects of the technology or intervention. In general however, the experimenter appearing to the participant to believe in the efficacy of the intervention, while not essential, may be more important, or more often important, to the placebo effect than to the Hawthorne effect.

What does it mean for educational research and evaluation?

These are some notes stimulated by a valuable chapter by Shayer (1992).

There are two different aims for research:

  • [Science]: Finding the causes, testing a (causal) model
  • [Engineering]: Discovering and proving the generalisability of the effect.

Science studies

If you want just to find causes and laws, not to achieve any useful practical effect, then the focus is on isolating causes by controlling experiments and avoiding things such as the Hawthorne effect. Hence, in medical research, double blind trials etc.

Note that double blind trials (where neither experimenter nor patient knows which intervention/treatment the patient is getting during the trial) are quite practicable for testing pills (where a dummy sugar pill can easily be made that the patient cannot tell apart from other pills); but not for major surgery, nor usually for educational interventions that require actions by the learner: in these cases participants necessarily know which treatment they have been given.

Double (or triple) blind trials "control for" all 4 of the above effects in the sense of making them equal for all groups by removing the ability of both experimenter and participants to even know which treatment they are getting, much less to believe they know which is more effective.

They may tend to abolish the placebo effect by removing the patient's knowledge that they are getting the active treatment. However they do NOT remove the Hawthorne effect (only make it equal for all groups in the trial), since on the contrary the experiment almost certainly makes participants very aware of receiving special attention. This could mean that the effect sizes measured in some groups are misleading, and would not be seen later in normal practice. The trial would be a fair comparison between groups, but the (size of) effect measured would not be predictive of the effect seen in non-experimental conditions, due to a similar "error" (i.e. effect due to the Hawthorne effect) applying to both groups.

This could, at least in theory, matter. A case in point could be comparing homeopathic and conventional medicine. Generally a patient will get about 50 minutes of the practitioner's attention in the former case, and 5 minutes in the latter. It is not hard to imagine that this could have a significant effect on patient recovery. A standard double blind experiment would be most seriously misleading in a case where both a drug and the Hawthorne effect of attention were of similar size, but not additive (i.e. either one was effective, but getting both gave no extra benefit): then a conventional trial would see similar and useful effect sizes in all groups, but would not be able to tell that in fact either giving the drug or giving an hour's attention to the patient were alternative effective therapies.
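The non-additive case can be made concrete with a toy model. The sketch below is purely illustrative: the function `recovery`, its parameters and the benefit sizes (10 units each) are invented for the example, and the non-additivity is modelled, as one possible assumption, by taking the larger of the two benefits rather than their sum.

```python
def recovery(drug, attention, drug_benefit=10.0, attention_benefit=10.0):
    """Hypothetical recovery score for one patient.

    Assumption: the two benefits are NOT additive. A patient gets the
    larger of whichever benefits apply, never their sum.
    """
    from_drug = drug_benefit if drug else 0.0
    from_attention = attention_benefit if attention else 0.0
    return max(from_drug, from_attention)


# Inside a double blind trial, every participant receives the attention.
trial_drug_arm = recovery(drug=True, attention=True)
trial_placebo_arm = recovery(drug=False, attention=True)
measured_drug_effect = trial_drug_arm - trial_placebo_arm

# In normal practice (a brief consultation), nobody receives it.
practice_treated = recovery(drug=True, attention=False)
practice_untreated = recovery(drug=False, attention=False)
real_world_drug_effect = practice_treated - practice_untreated

print("Trial arms:", trial_drug_arm, trial_placebo_arm)
print("Drug effect measured in the trial:", measured_drug_effect)
print("Drug effect in normal practice:", real_world_drug_effect)
```

In this toy model both trial arms recover equally well, so the trial attributes nothing to the drug, even though the drug alone is fully effective outside the trial and so is the attention alone.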

Finally, neither medicine nor education habitually employs counter-balanced experimental designs, where all participants get both treatments: one group gets A then B, and the other gets B then A. This is because of the possibility of asymmetric transfer effects i.e. the effect of B (say) is different depending on whether or not the participant had A first. For instance, learning French vocabulary first then reading French literature is not likely to have the same effect as receiving them the other way round.
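A minimal sketch of asymmetric transfer in a counter-balanced design follows. All the numbers and the function `outcome` are hypothetical, chosen only to show the shape of the problem: it assumes that reading literature gains a bonus from prior vocabulary study, while vocabulary study gains nothing from prior literature reading.

```python
def outcome(order, vocab_gain=5.0, literature_gain=5.0, transfer_bonus=4.0):
    """Hypothetical total learning gain for a sequence of treatments.

    Assumption: 'literature' is worth an extra transfer_bonus if
    'vocabulary' came earlier in the sequence; there is no transfer
    in the opposite direction.
    """
    total = 0.0
    seen = set()
    for treatment in order:
        if treatment == "vocabulary":
            total += vocab_gain
        elif treatment == "literature":
            total += literature_gain
            if "vocabulary" in seen:
                total += transfer_bonus
        else:
            raise ValueError(f"unknown treatment: {treatment}")
        seen.add(treatment)
    return total


# The two arms of a counter-balanced design: A then B, and B then A.
group_a_then_b = outcome(["vocabulary", "literature"])
group_b_then_a = outcome(["literature", "vocabulary"])

print("Vocabulary then literature:", group_a_then_b)
print("Literature then vocabulary:", group_b_then_a)
```

Because the two orders give different totals, the design cannot recover a single order-independent effect for either treatment, which is the difficulty described above.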

Applied or engineering studies

Shayer thinks there are distinct questions and stages to address in applied as opposed to "scientific" research -- i.e. in research on being able to generalise the creation of a desired effect:
  • 1. Study primary effect: Is there an effect (whatever the cause), what effect, what size of effect?
  • 2a Replication: can it be done by other enthusiasts (not only by the original researcher)?
  • 2b Generalisability: can it be transferred via training to the general population of teachers? i.e. without special enthusiasm or skills.

One danger is the Hawthorne effect: you get an effect, but not due to the theory. The opposite danger is to get a null effect even though the theory is correct, because transfer/training didn't work. So you need to do projects in several stages, showing effects at each.

In stage 1 you do an experiment and show there really is an effect, defensible against all worries. But you still haven't shown what it is caused by: whether the factors described in your theory, or by the experimenter: i.e. no defence against Hawthorne. Use 1 or 2 teachers, and control like crazy.

In 2a you show it can be done by others: so at least not just a Papert charisma effect, but it still might be a learner enthusiasm effect (halo). Use say 12 teachers.

In 2b you are testing whether training can be done.

Note that if what you care about is improving learning and the learners' experience, then you may want to maximise, not avoid, halo and Hawthorne effects. If you can improve learning by changing things every year, telling students this is the latest thing, then that is the ethical and practically effective thing to do.

My summary view

In the light of the various critiques, I think we could see the Hawthorne effect at several levels.

At the top level, it seems clear that in some cases there is a large effect that experimenters did not anticipate, that is due to participants' reactions to the experiment itself. This is the analogue to the uncertainty principle BUT (unlike in quantum mechanics) it only happens sometimes. So as a methodological heuristic (you should always think about this issue) it is useful, but as an exact predictor of effects, it is not: often there is no Hawthorne effect of any kind. To understand when and why we will see a Hawthorne or experimenter effect, we need more detailed considerations.

At a middle level, I would go with Adair (1984), and say that the most important (though not the only) aspect of this is how the participants interpret the situation. Interviewing them (after the "experiment" part) would be the way to investigate this.

This is important because factory workers, students, and most experimental participants are doing things at the request of the experimenter. What they do depends on what their personal goals are, how they understand the task requested, whether they want to please the experimenter and/or whether they see this task as impinging on other interests and goals they hold, and what they think the experimenter really wants. Besides all those issues that determine their goals and intentions in the experiment, further aspects of how they understand the situation can be important by affecting what they believe about the effects of their actions. Thus the experimenter effect is really not one of interference, but of a possible difference in the meaning of the situation for participants and experimenter. Since all voluntary action (i.e. actions in most experiments) depends upon the actor's goals AND on their beliefs about the effects of their actions, differences in understanding of the situation can have big effects.

At the lowest level is the question of what the direct causal factors might be. These could include:

  • Material ones that are intended by the experimenter
  • Feedback that an experiment might make available to the participants
  • Changes to goals, motivation, and beliefs about action effects induced by the experimental situation.

Parsons' argument, primarily about feedback provision, is that learning (improving a skill) requires plenty of feedback on your performance. If an experiment provides that (as a side effect of making experimental measurements) where it wasn't readily available before, you may see performance improvement due to that alone. A related issue in education may be that if a student does not believe they can improve at something then they won't try (e.g. "I can't do maths", "Perfect pitch is an innate ability so there is no point in me practising"), but an experiment might make them change this assumption and so start making an effort to learn (placebo effect, halo effect).

References

Adair,G. (1984) "The Hawthorne effect: A reconsideration of the methodological artifact" J. Appl. Psych. vol.69 (2), 334-345 [Reviews references to Hawthorne in the psychology methodology literature.]

Clark,R.E. & Sugrue,B.M. (1991) "Research on instructional media, 1978-1988" in G.J.Anglin (ed.) Instructional technology: past, present, and future ch.30 pp.327-343 (Libraries unlimited: Englewood, Colorado).

Mayo,E. (1933) The human problems of an industrial civilization (New York: MacMillan)

Orne,M.T. (1973) "Communication by the total experimental situation: Why is it important, how it is evaluated, and its significance for the ecological validity of findings" in P.Pliner, L.Krames & T.Alloway (eds.) Communication and affect pp.157-191 (New York: Academic Press).

Parsons,H.M. (1974) "What happened at Hawthorne?" Science vol.183, 922-932 [A very detailed description, in a more accessible source, of some of the experiments; used to argue that the effect was due to feedback-promoted learning.]

Roethlisberger,F.J. & Dickson,W.J. (1939) Management and the Worker (Cambridge, Mass.: Harvard University Press).
[This is a large book (more than 600 pages) of details of the study.]

Schön, D.A. (1983) The reflective practitioner: How professionals think in action (Temple Smith: London) (Basic books?)

Shayer,M. (1992) "Problems and issues in intervention studies" in Demetriou,A., Shayer,M. & Efklides,A. (eds.) Neo-Piagetian theories of cognitive development: implications and applications for education ch. 6 pp.107-121 (London : Routledge)

Zdep,S.M. & Irvine,S.H. (1970) "A reverse Hawthorne effect in educational evaluation" Journal of School Psychology vol.8 pp.89-95