The Stroop Effect Research Paper

To see and interact with the world, we first need to understand it. Visual processing is one way we do this, and is composed of many parts. When we see an object, we don’t just see its physical attributes, we also comprehend the meaning behind them. We know that a chair needs legs because the seat needs to be raised, we know that the wood comes from trees, we know we could sit in it, and so on. There is information that we process about the things we see without even being aware of that processing.

So when John Ridley Stroop asked people to read words on a sheet of paper in 1929, he knew that their automatic processing would come into play, and could offer a breakthrough insight into brain function. Research from as early as 1894 had shown that associations of even nonsense syllables would become embedded into a person’s understanding, and could interfere with how they processed and recalled these syllables, despite no real meaning being attached to them. It was therefore clear, even in the beginnings of contemporary psychological research, that associations are powerful and pervasive.

What is the Stroop Effect?

Stroop’s innovation was to show, clearly and definitively, that our embedded knowledge about our environment impacts how we interact with it. His research method is now one of the best-known psychological tests, and is elegant in its simplicity.

First, the participant reads a list of words for colors, but the words are printed in a color different to the word itself. For example, the word “orange” would be listed as text, but printed in green. The participant’s reading time of the words on the list is then recorded. Next, the participant has to repeat the test with a new list of words, but should name the colors that the words are printed in. So, when the word “orange” is printed in green, the participant should say “green” and move on to the next word.
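The two tasks described above can be sketched in a few lines of Python. This is only an illustration: the palette, the function names, and the random sampling are assumptions made for the example, not part of Stroop's materials or of any particular software.

```python
import random

COLORS = ["red", "blue", "green", "orange", "purple"]  # illustrative palette

def make_incongruent_list(n_items, colors=COLORS, seed=None):
    """Return (word, ink) pairs in which the ink never matches the word."""
    rng = random.Random(seed)
    items = []
    for _ in range(n_items):
        word = rng.choice(colors)
        ink = rng.choice([c for c in colors if c != word])
        items.append((word, ink))
    return items

def expected_response(item, task):
    """Correct spoken answer: the word when reading, the ink when naming colors."""
    word, ink = item
    return word if task == "read" else ink

stimuli = make_incongruent_list(10, seed=1)
for item in stimuli[:3]:
    print(item,
          "-> read:", expected_response(item, "read"),
          "| name color:", expected_response(item, "name_color"))
```

The same list serves both tasks; only the rule for the correct answer changes, which is what makes the two timings comparable.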

Below is a brief example of the Stroop test. Try it out!

First, time yourself while you read the following text, ignoring the colors the words are printed in.






Now time yourself while you state the colors of the following words, ignoring the actual text (as best as you can!).






In most cases, it takes longer to state the colors of the words than to read the text they are printed in, despite the incongruence being essentially the same across both lists (i.e. both show words in the wrong color). It appears we are more influenced by the word itself than by the text color.


Why does this happen?

What this reveals is that the brain can’t help but read. As habitual readers, we encounter and comprehend words on such a persistent basis that the reading occurs almost effortlessly, whereas declaration of a color requires more cognitive effort. When there is a conflict between these two sources of information, our cognitive load is increased, and our brains have to work harder to resolve the required difference. Performing these tasks (preventing reading, processing word color, and resolving information conflict) ultimately slows down our responses, and makes the task take longer.

There are a few theories that differ slightly in their accounts of the Stroop Effect, yet their differences mostly lie in which part they emphasize. For example, one theory emphasizes the automaticity of reading as the principal cause of Stroop interference, while another emphasizes the mental prioritizing that we perform when reading, as compared to naming colors. While the theories differ in emphasis, all essentially converge on the central premise that reading is a simpler and more automatic task than stating colors, and that a conflict between the two will increase the time needed for processing.

What can we use it for?

Using this paradigm, we can assess an individual’s cognitive processing speed, their attentional capacity, and their level of cognitive control (otherwise known as their executive function). These skills and facets are implicit in so many ways in which we interact with the world, suggesting that this test reveals a brief – yet incisive – view into human thought and behavior.

The test is also used in forms that differ from the original, in an effort to exploit the experimental setup to reveal more about, for example, a clinical population. Even neurodevelopmental disorders such as schizophrenia and autism have been examined with the Stroop test.

Furthermore, there are several variations and differing implementations of the test available, allowing different aspects of cognition to be homed in on. One of these variations is the “emotional Stroop test”, in which participants complete both the original Stroop and a version that mixes neutral and emotionally charged words. The resulting text features words such as “pain” or “joy” amongst everyday words. Research has shown that anxious people were likely to experience more interference (i.e. more time spent declaring word color) with emotionally charged words, suggesting that the emotional content of those words captures more of their attention.

Experimental designs like this allow researchers to target and observe cognitive processes that underlie explicit thought. The test reveals the working of non-conscious brain function and reduces some of the biases that can otherwise emerge in testing.

Other experimental setups utilize the lessons of the Stroop Effect – that incongruent information will require more mental resources to resolve correctly – with numbers, rather than words. Termed the “Numerical Stroop Effect”, this experiment has shown that presenting numbers of incongruent sizes next to each other will slow down reading and comprehension. For an example, see the image below:

Examples of the different test types that are used in the Numerical Stroop.


This experiment shows that, with all else being controlled for, incongruence in numerical size will cause the greatest interference, increasing the delay in comprehension. An interesting feature with the Numerical Stroop is that the interference is found for both types of incongruence – when the numbers are incongruent with size, then a delay is shown for reporting the size, as well as for reporting the numbers. This effect reveals that the automatic processing is not just limited to words, suggesting that the brain looks for normal patterns in a variety of presented stimuli, as it appears to struggle when this doesn’t occur.
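As a rough sketch of how Numerical Stroop trials might be labeled, the snippet below compares numerical value with physical size for a pair of digits. The function name, the tuple layout, and the "neutral" category (used when one of the two dimensions is equal) are assumptions made for illustration.

```python
def classify_pair(left, right):
    """Classify a Numerical Stroop pair; each item is (value, font_size)."""
    (left_value, left_size), (right_value, right_size) = left, right
    # Each comparison yields -1, 0, or 1
    value_diff = (left_value > right_value) - (left_value < right_value)
    size_diff = (left_size > right_size) - (left_size < right_size)
    if value_diff == 0 or size_diff == 0:
        return "neutral"
    return "congruent" if value_diff == size_diff else "incongruent"

print(classify_pair((3, 12), (5, 24)))  # larger number is printed larger
print(classify_pair((3, 24), (5, 12)))  # larger number is printed smaller
```

Labeling trials this way makes it possible to compare response times between congruent and incongruent pairs for both tasks (reporting size and reporting value).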

How can the Stroop test be used?

The Stroop test can be simply administered with a basic experimental setup. At its most fundamental, all you need is an image of the Stroop test words, a stopwatch, and someone to record the time and answers (and a willing participant!). However, if you want to gain more insights from the data, there are plenty of ways to take the test further. With iMotions you can simply set up and present the Stroop test, while also expanding the data collection possibilities. Using the survey function, the test can be quickly and simply added. This can be done with either the built-in iMotions survey tool, or with the Qualtrics survey tool, which allows even more metrics to be taken into account.

The ability to record from various synchronized biometric devices opens up new avenues for research. For example, with an eye-tracking tool, you can examine exactly how long each participant looks at each word, and their precise speed of comprehension. Using areas of interest (AOIs) can be of particular use as this allows you to analyze specific parts of the scene in isolation, or compared to the data for the scene as a whole, or even with other AOIs. It’s then possible to determine which words demanded the most visual attention, allowing you to accurately dissect the data in fine detail.
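To make the AOI idea concrete, here is a minimal sketch of how time to first fixation and total dwell time could be computed from exported fixation data. The `Fixation` and `AOI` classes, the field names, and the sample values are hypothetical and do not reflect the iMotions export format or API.

```python
from dataclasses import dataclass

@dataclass
class Fixation:
    start: float     # seconds from stimulus onset
    duration: float  # seconds
    x: float         # screen position in pixels
    y: float

@dataclass
class AOI:
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fix):
        """True when the fixation lands inside this rectangle."""
        return (self.left <= fix.x <= self.right
                and self.top <= fix.y <= self.bottom)

def aoi_metrics(fixations, aoi):
    """Return (time to first fixation, total dwell time) for one AOI.

    The first value is None when the AOI was never fixated.
    """
    hits = [f for f in fixations if aoi.contains(f)]
    if not hits:
        return None, 0.0
    ttff = min(f.start for f in hits)
    dwell = sum(f.duration for f in hits)
    return ttff, dwell

# Made-up fixations for one participant
fixations = [
    Fixation(0.10, 0.25, 120, 80),
    Fixation(0.40, 0.30, 320, 80),
    Fixation(0.75, 0.20, 130, 85),
]
word_1 = AOI("word_1", 100, 60, 200, 100)
word_2 = AOI("word_2", 300, 60, 400, 100)
print(aoi_metrics(fixations, word_1))
print(aoi_metrics(fixations, word_2))
```

Running the same function over every word's AOI gives a per-word table that can be compared across participants or conditions.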

Below are a few examples of that idea in practice, each of which took only minutes to set up and start.

First, we’ve added an image of a Stroop test to the survey function. One version is essentially the same as the original, while another has neutral words mixed with food-related words. This version of the Stroop test would require that the participant verbally declare the color of each word, so audio recording could help in accurately measuring participant responses. We have also included an example using a multiple choice paradigm that is detailed below, and one using the Qualtrics survey function below that.

The normal Stroop test inserted as a survey image into iMotions.


A modified Stroop test inserted as a survey image into iMotions.


After we’ve set up eye-tracking and added a participant list, we can add AOIs to the words, so that we can view and analyze data for each. Below is an image of how this looks:

The Stroop test in iMotions with AOIs placed over the color words.


After running through a few participants, we can start to visualize and analyze their data, producing both detailed AOI data, and heatmaps showing overview data. Below are examples of what this data could look like. Of course, more detailed data is available to export and analyze, if desired.

Data displayed in iMotions showing the time-to-first-fixation (TTFF), the time spent in seconds looking at the AOI (which is only shown to one decimal point in the image above), and the ratio of participants who viewed the AOI.



A heatmap showing the level of fixation across the words shown in the Stroop test.


Alternatively, we can insert each word of the Stroop test within the survey setup, and use the keyboard input function for the participant to answer each word color. This would also allow us to investigate the error rate in a more systematic manner. This is shown across the two images below.

The survey setup with an incongruent word-color stimulus. Several of these surveys can be quickly arranged for multiple tests.


How the above survey appears to the participant. The participant is required to choose one of the predefined colors before advancing to the next question.


Within this paradigm, eye movements can also be measured, providing information about the amount of time taken to process the information. The approach may take longer for each participant, and remembering the keyboard-color combinations may encumber their cognitive processing (although this shouldn’t present a problem if this approach is used with the correct controls). However, it does allow a finer dissection of eye movement for each word, and also informs us about the error rate from incorrect answers.
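With one keyboard response per word, the error rate and the mean response time on correct trials are simple to compute. The sketch below assumes each trial is stored as a dictionary with `word`, `ink`, `response`, and `rt` (seconds) keys; that layout and the sample data are invented for the example.

```python
from statistics import mean

def summarize(trials):
    """Return (error_rate, mean_rt_of_correct_trials) for a list of trials."""
    if not trials:
        raise ValueError("at least one trial is required")
    # A response is correct when it names the ink, not the word
    correct = [t for t in trials if t["response"] == t["ink"]]
    error_rate = 1 - len(correct) / len(trials)
    mean_rt = mean(t["rt"] for t in correct) if correct else None
    return error_rate, mean_rt

trials = [
    {"word": "red", "ink": "blue", "response": "blue", "rt": 0.82},
    {"word": "green", "ink": "red", "response": "green", "rt": 0.61},  # read the word instead
    {"word": "blue", "ink": "green", "response": "green", "rt": 0.90},
    {"word": "red", "ink": "green", "response": "green", "rt": 0.78},
]
error_rate, mean_rt = summarize(trials)
print(error_rate, mean_rt)
```

Excluding error trials from the mean response time is one common choice; whether to do so depends on the analysis plan.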

Using Qualtrics

Finally, we can see how this test is implemented in iMotions using the Qualtrics survey function. This is easily implemented, and appears in a similar way to the above surveys that are built by iMotions. One of the advantages of using Qualtrics is that feedback to participant answers can be immediately provided, should this be desired. The following image shows how the stimulus presentation appears on screen.

Qualtrics implementation of the Stroop test.


The participant can then click on the corresponding color to answer the question. If an incorrect answer is chosen, the response would be shown as below.

Feedback for participant in Qualtrics.


The participant can then proceed to complete other questions, and their answers will be recorded, allowing later analysis and visualization of the results.

With all of the information completed and data analyzed, we can now start to discern which words showed the greatest amount of Stroop interference (the latency produced when naming the color that the word is printed in). Having several paradigms with different colors, words, and with only blocks of colors will provide more baseline information and control for experimental error. Ultimately this gives a good basis for the participant data to be normalized, and compared with more validity. We can now test if there is any difference with the words of interest and potentially start to draw conclusions about the implicit thoughts of participants (with the example above, it could be that participants who are hungrier would spend a longer duration in naming the colors of the words, suggesting those words are more salient to them).
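One simple way to normalize, sketched here under stated assumptions, is to express each participant's color-naming time relative to their own baseline (for example, the color-block list). The function name and the timings are hypothetical.

```python
def interference(condition_time, baseline_time):
    """Return (absolute, relative) interference against a baseline time."""
    if baseline_time <= 0:
        raise ValueError("baseline_time must be positive")
    absolute = condition_time - baseline_time
    return absolute, absolute / baseline_time

# Hypothetical totals in seconds for one participant
baseline = 30.0        # naming plain color blocks
neutral_words = 42.0   # naming ink colors of neutral words
food_words = 45.0      # naming ink colors of food-related words

print(interference(neutral_words, baseline))
print(interference(food_words, baseline))
```

Because the relative score is scaled by each participant's own baseline, a generally slow participant and a generally fast one can be compared on the same footing.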


The Stroop test is a widely used, well-established methodology that reveals various brain functions and implicit cognitive workings. The original article has now been cited over 13,000 times and that number will surely continue to rise well into the future. With iMotions, it’s easy to start asking questions with the Stroop Task and to get to the answers quickly. Contact us and hear how we can help with your research needs and questions.


Hi there! I'm the Science Editor at iMotions. I've previously spent my time as a neuroscientist / psychologist, where I found and developed my love for good science. I have a PhD in neuroscience and developmental biology, alongside a bachelor's degree in psychology, and a master's degree in cognitive and computational neuroscience. I'm a big fan of the brain and mind. I believe in the power of well-captured data to provide answers about who we are, what we think, and why we behave in the way that we do.

An internet resource developed by
Christopher D. Green
York University, Toronto, Ontario


Studies of Interference in Serial Verbal Reactions

J. Ridley Stroop [1] (1935)

George Peabody College

First published in Journal of Experimental Psychology, 18, 643-662.


Interference or inhibition (the terms seem to have been used almost indiscriminately) has been given a large place in experimental literature. The investigation was begun by the physiologists prior to 1890 (Bowditch and Warren, J. W., 1890) and has been continued to the present, principally by psychologists (Lester, 1932). Of the numerous studies that have been published during this period only a limited number of the most relevant reports demand our attention here.

Münsterberg (1892) studied the inhibiting effects of changes in common daily habits such as opening the door of his room, dipping his pen in ink, and taking his watch out of his pocket. He concluded that a given association can function automatically even though some effect of a previous contrary association remains.

Müller and Schumann (1894) discovered that more time [p. 644] was necessary to relearn a series of nonsense syllables if the stimulus syllables had been associated with other syllables in the meantime. From their results they deduced the law of associative inhibition which is quoted by Kline (1921, p. 270) as follows: "If a is already connected with b, then it is difficult to connect it with k, b gets in the way." Nonsense syllables were also used by Shepard and Fogelsonger (1913) in a series of experiments in association and inhibition. Only three subjects were used in any experiment and the changes introduced to produce the inhibition were so great in many cases as to present novel situations. This latter fact was shown by the introspections. The results showed an increase in time for the response which corresponded roughly to the increase in the complexity of the situation. The only conclusion was stated thus: "We have found then that in acquiring associations there is involved an inhibitory process which is not a mere result of divided paths but has some deeper basis yet unknown" (p. 311).

Kline (1921) used 'meaningful' material (states and capitals, counties and county seats, and books and authors) in a study of interference effects of associations. He found that if the first associative bond had a recall power of 10 percent or less it facilitated the second association; if it had a recall power of 15 percent to 40 percent the inhibitory power was small; if it had a recall power of 45 percent to 70 percent the inhibiting strength approached a maximum; and if the recall power was 70 percent to 100 percent the inhibition was of medium strength and in some cases might disappear or even facilitate the learning of a new association.

In card sorting Bergström (1893 and 1894), Brown (1914), Bair (1902), and Culler (1912) found that changing the arrangement of compartments into which cards were being sorted produced interference effects. Bergström (1894, p. 441) concluded that "the interference effect of an association bears a constant relation to the practice effect, and is, in fact, equivalent to it." Both Bair and Culler found that the interference of the opposing habits disappeared if the habits were practiced alternately.

[p. 645] Culler (1912), in the paper already referred to, reported two other experiments. In one experiment the subjects associated each of a series of numbers with striking a particular key on the typewriter with a particular finger; then the keys were changed so that four of the numbers had to be written with fingers other than those formerly used to write them. In the other experiment the subjects were trained to react with the right hand to 'red' and with the left hand to 'blue.' Then the stimuli were interchanged. In the former experiment an interference was found which decreased rapidly with practice. In the latter experiment the interference was overbalanced by the practice effect.

Hunter and Yarbrough (1917), Pearce (1917), and Hunter (1922) in three closely related studies of habit interference in the white rat in a T-shaped discrimination box found that a previous habit interfered with the formation of an 'opposite' habit.

Several studies have been published which were not primarily studies of interference, but which employed materials that were similar in nature to those employed in this research, and which are concerned with why it takes more time to name colors than to read color names. Several of these studies have been reviewed by Telford (1930) and by Ligon (1932). Only the vital point of these studies will be mentioned here.

The difference in time for naming colors and reading color names has been variously explained. Cattell (1886) and Lund (1927) have attributed the difference to 'practice.' Woodworth and Wells (1911, p. 52) have suggested that, "The real mechanism here may very well be the mutual interference of the five names, all of which, from immediately preceding use, are 'on the tip of the tongue,' all are equally ready and likely to get in one another's way." Brown (1915, p. 51) concluded "that the difference in speed between color naming and word reading does not depend upon practice" but that (p. 34) "the association process in naming simple objects like colors is radically different from the association process in reading printed words."

[p. 646] Garrett and Lemmon (1924, p. 438) have accounted for their findings in these words, "Hence it seems reasonable to say that interferences which arise in naming colors are due not so much to an equal readiness of the color names as to an equal readiness of the color recognitive processes. Another factor present in interference is very probably the present strength of the associations between colors and their names, already determined by past use." Peterson (1918 and 1925) has attributed the difference to the fact that, "One particular response habit has become associated with each word while in the case of colors themselves a variety of response tendencies have developed." (1925, p. 281.) As pointed out by Telford (1930), the results published by Peterson (1925, p. 281) and also published by Lund (1927, p. 425) confirm Peterson's interpretation.

Ligon (1932) has published results of a 'genetic study' of naming colors and reading color names in which he used 638 subjects from school grades 1 to 9 inclusive. In the light of his results he found all former explanations untenable (He included no examination of or reference to Peterson's data and interpretation.) and proceeded to set up a new hypothesis based upon a three factor theory, a common factor which he never definitely describes and special factors of word reading and color naming. He points out that the common factor is learned but the special factors are organic. He promises further evidence from studies now in progress.

The present problem grew out of experimental work in color naming and word reading conducted in Jesup Psychological Laboratory at George Peabody College For Teachers. The time for reading names of colors had been compared with the time for naming colors themselves. This suggested a comparison of the interfering effect of color stimuli upon reading names of colors (the two types of stimuli being presented simultaneously) with the interfering effect of word stimuli upon naming colors themselves. In other words, if the word 'red' is printed in blue ink how will the interference of the ink-color 'blue' upon reading the printed word 'red' compare with the interference of the [p. 647] printed word 'red' upon calling the name of the ink-color 'blue?' The increase in time for reacting to words caused by the presence of conflicting color stimuli is taken as the measure of the interference of color stimuli upon reading words. The increase in the time for reacting to colors caused by the presence of conflicting word stimuli is taken as the measure of the interference of word stimuli upon naming colors. A second problem grew out of the results of the first. The problem was, What effect would practice in reacting to the color stimuli in the presence of conflicting word stimuli have upon the reaction times in the two situations described in the first problem?


The materials employed in these experiments are quite different from any that have been used to study interference.[2] In former studies the subjects were given practice in responding to a set of stimuli until associative bonds were formed between the stimuli and the desired responses, then a change was made in the experimental 'set up' which demanded a different set of responses to the same set of stimuli. In the present study pairs of conflicting stimuli, both being inherent aspects of the same symbols, are presented simultaneously (a name of one color printed in the ink of another color -- a word stimulus and a color stimulus). These stimuli are varied in such a manner as to maintain the potency of their interference effect. Detailed descriptions of the materials used in each of the three experiments are included in the reports of the respective experiments.


The Effect of Interfering Color Stimuli Upon Reading Names of Colors Serially


When this experiment was contemplated, the first task was to arrange suitable tests. The colors used on the Woodworth-Wells color-sheet were considered but two changes were deemed advisable. As the word test to be used in comparison with the [p. 648] color test was to be printed in black it seemed well to substitute another color for black as an interfering stimulus. Also, because of the difficulty of printing words in yellow that would approximate the stimulus intensity of the other colors used, yellow was discarded. After consulting with Dr. Peterson, black and yellow were replaced by brown and purple. Hence, the colors used were red, blue, green, brown, and purple. The colors were arranged so as to avoid any regularity of occurrence and so that each color would appear twice in each column and in each row, and that no color would immediately succeed itself in either column or row. The words were also arranged so that the name of each color would appear twice in each line. No word was printed in the color it named but an equal number of times in each of the other four colors; i.e. the word 'red' was printed in blue, green, brown, and purple inks; the word 'blue' was printed in red, green, brown, and purple inks; etc. No word immediately succeeded itself in either column or row. The test was printed from fourteen point Franklin lower case type. The word arrangement was duplicated in black print from the same type. Each test was also printed in the reverse order which provided a second form. The tests will be known as "Reading color names where the color of the print and the word are different" (RCNd),[3] and "Reading color names printed in black" (RCNb).
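The arrangement rules just listed can be expressed as a checker. The sketch below assumes a 10 x 10 sheet (100 items) and builds a deliberately regular layout purely to exercise the checks; it is not Stroop's arrangement, which was designed to avoid regularity, and all function names are illustrative.

```python
from collections import Counter

COLORS = ["red", "blue", "green", "brown", "purple"]
SIZE = 10  # assumed 10 x 10 sheet, giving the 100 items of one form

def build_sheet():
    """Build an illustrative sheet of (word, ink) pairs. Not Stroop's layout."""
    # One word-to-ink offset per 5 x 5 quadrant, so that each word meets
    # each of the other four inks equally often.
    offsets = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
    sheet = []
    for i in range(SIZE):
        row = []
        for j in range(SIZE):
            ink = (i + j) % 5
            word = (ink + offsets[(i // 5, j // 5)]) % 5
            row.append((COLORS[word], COLORS[ink]))
        sheet.append(row)
    return sheet

def twice_each(values):
    """True when every color name occurs exactly twice."""
    counts = Counter(values)
    return len(counts) == len(COLORS) and set(counts.values()) == {2}

def no_repeat(values):
    """True when no value immediately succeeds itself."""
    return all(a != b for a, b in zip(values, values[1:]))

def check_sheet(sheet):
    """Return True if the sheet satisfies the constraints stated in the text."""
    rows = sheet
    cols = [list(c) for c in zip(*sheet)]
    for line in rows + cols:
        words = [w for w, _ in line]
        inks = [k for _, k in line]
        # Inks: twice per row and column; no immediate repeats of ink or word
        if not (twice_each(inks) and no_repeat(inks) and no_repeat(words)):
            return False
    # Words: each name twice in every line (row)
    if not all(twice_each([w for w, _ in row]) for row in rows):
        return False
    pairs = Counter(cell for row in sheet for cell in row)
    if any(word == ink for word, ink in pairs):
        return False
    # 5 words x 4 other inks = 20 pairings, each used 5 times
    return len(pairs) == 20 and set(pairs.values()) == {5}

sheet = build_sheet()
print(check_sheet(sheet))
```

A checker of this kind could also be paired with a randomized search to produce irregular sheets that still meet every rule.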

Subjects and Procedure:

Seventy college undergraduates (14 males and 56 females) were used as subjects. Every subject read two whole sheets (the two forms) of each test at one sitting. One half of the subjects of each sex, selected at random, read the tests in the order RCNb (form 1), RCNd (form 2), RCNd (form 1) and RCNb (form 2), while the other half reversed the order thus equating for practice and fatigue on each test and form. All subjects were seated so as to have good daylight illumination from the left side only. All subjects were in the experimental room a few minutes before beginning work to allow the eyes to adjust to light conditions. The subjects were volunteers and apparently the motivation was good.

A ten-word sample was read before the first reading of each test. The instructions were to read as quickly as possible and to leave no errors uncorrected. When an error was left the subject's attention was called to that fact as soon as the sheet was finished. On the signal "Ready! Go!" the sheet which the subject held face down was turned by the subject and read aloud. The words were followed on another sheet (in black print) by the experimenter and the time was taken with a stop watch to a fifth of a second. Contrary to instructions 14 subjects left a total of 24 errors uncorrected on the RCNd test, 4 was the maximum for any subject, and 4 other subjects left 1 error each on the RCNb test. As each subject made 200 reactions on each test this small number of errors was considered negligible. The work was done under good daylight illumination.

Results: Table I gives the means (m), standard deviations (σ), differences (D), probable error of the difference (PE_d), and the reliability of the difference (D / PE_d) for the whole group and for each sex.

Observation of the bottom line on the table shows that it [p. 649] took an average of 2.3 seconds longer to read 100 color names printed in colors different from that named by the word than to read the same names printed in black. This difference is not reliable which is in agreement with Peterson's prediction made when the test was first proposed.

The means for the sex groups show no particular difference. An examination of the means and standard deviations for the two tests shows that the interference factor caused a slight increase in the variability for the whole group and for the female group, but a slight decrease for the male group.

Table II presents the same data arranged on the basis of college classification. Only college years one and two contain a sufficient number of cases for comparative purposes. They show no differences that approach reliability.


The Effect of Interfering Word Stimuli upon Naming Colors Serially


For this experiment the colors of the words in the RCNd test, described in Experiment I, were printed in the same order but in the form of solid squares from 24 point type instead of words. This sort of problem will be referred to as the [p. 650] "Naming color test" (NC). The RCNd test was employed also but in a very different manner from that in Experiment I. In this experiment the colors of the print of the series of names were to be called in succession ignoring the colors named by the words; e.g. where the word 'red' was printed in blue it was to be called 'blue,' where it was printed in green it was to be called 'green,' where the word 'brown' was printed in red it was to be called 'red,' etc. Thus color of the print was to be the controlling stimulus and not the name of the color spelled by the word. This is to be known as the "Naming color or word test where the color of the print and the word are different" (NCWd). (See Appendix B. [sic - A?])

Subjects and Procedure:

One hundred students (88 college undergraduates, 29 males and 59 females, and 12 graduate students, all females) served as subjects. Every subject read two whole sheets (the two forms) of each test at one sitting. Half of the subjects read in the order NC, NCWd, NCWd, NC, and the other half in the order NCWd, NC, NC, NCWd, thus equating for practice and fatigue on the two tests. All subjects were seated (in their individual tests) near the window so as to have good daylight illumination from the left side. Every subject seemed to make a real effort.
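The counterbalancing described here can be sketched as a random split of the subject list into two halves, one per test order. The random assignment and the function name are assumptions for illustration; the text states only that half of the subjects followed each order.

```python
import random

ORDER_1 = ["NC", "NCWd", "NCWd", "NC"]
ORDER_2 = ["NCWd", "NC", "NC", "NCWd"]

def assign_orders(subject_ids, seed=None):
    """Randomly assign half of the subjects to each test order."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    assignment = {s: ORDER_1 for s in shuffled[:half]}
    assignment.update({s: ORDER_2 for s in shuffled[half:]})
    return assignment

assignment = assign_orders(range(1, 101), seed=0)
n_first = sum(1 for order in assignment.values() if order == ORDER_1)
print(n_first, len(assignment) - n_first)
```

In both orders each test occupies one early and one late position, which is how practice and fatigue are balanced across the two tests.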

A ten-word sample of each test was read before reading the test the first time. The instructions were to name the colors as they appeared in regular reading line as quickly as possible and to correct all errors. The methods of starting, checking errors, and timing were the same as those used in Experiment 1. The errors were recorded and for each error not corrected, twice the average time per word for the reading of the sheet on which the error was made was added to the time taken by the stop watch. This plan of correction was arbitrary but seemed to be justified by the situation. There were two kinds of failures to be accounted for: first, the failure to see the error; and second, the failure to correct it. Each phase of the situation gave the subject a time advantage which deserved taking note of. Since no accurate objective measure was obtainable and the number of errors was small the arbitrary plan was adopted. Fifty-nine percent of the group left an average of 2.6 errors uncorrected on the NCWd test (200 reactions) and 32 percent of the group left an average of 1.2 errors uncorrected on the NC test (200 reactions). The correction changed the mean on the NCWd test from 108.7 to 110.3 and the mean of the NC test from 63.0 to 63.3.
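The correction rule can be written out as a short function. This is a sketch under the assumption of 100 items per sheet (each subject made 200 reactions over two sheets); the function name and the example figures are hypothetical.

```python
def corrected_time(raw_seconds, uncorrected_errors, n_items=100):
    """Add twice the average time per item for each uncorrected error."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    average_per_item = raw_seconds / n_items
    return raw_seconds + 2 * average_per_item * uncorrected_errors

# Hypothetical sheet: 110.0 s by stop watch with 2 errors left uncorrected
print(corrected_time(110.0, 2))
```

Because the penalty scales with the subject's own average time per word, slower readers receive a proportionally larger correction per error.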


The means of the times for the NC and NCWd tests for the whole group and for each sex are presented in Table III along with the difference, the probable error of the [p. 651] difference, the reliability of the difference, and difference divided by the mean time for the naming color test.


The comparison of the results for the whole group on the NC and NCWd test given in the bottom line of the table indicates the strength of the interference of the habit of calling words upon the activity of naming colors. The mean time for 100 responses is increased from 63.3 seconds to 110.3 seconds or an increase of 74 percent. (The medians on the two tests are 61.9 and 110.4 seconds respectively.) The standard deviation is increased in approximately the same ratio from 10.8 to 18.8. The coefficient of variability remains the same to the third decimal place (σ / m = .171). The difference between means may be better evaluated when expressed in terms of the variability of the group. The difference of 47 seconds is 2.5 standard deviation units in terms of the NCWd test or 4.35 standard deviation units on the NC test. The former shows that 99 percent of the group on the NCWd test was above the mean on the NC test (took more time); and the latter shows that the group as scored on the NC test was well below the mean on the NCWd test. These results are shown graphically in Fig. 1 where histograms and normal curves (obtained by the Gaussian formula) of the two sets of data are superimposed.
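The arithmetic in this paragraph can be reproduced from the reported means and standard deviations. The sketch below uses the rounded figures quoted in the text and assumes a normal distribution for the 99 percent statement, so small rounding differences from the published values are to be expected.

```python
from statistics import NormalDist

nc_mean, nc_sd = 63.3, 10.8        # naming colors (NC)
ncwd_mean, ncwd_sd = 110.3, 18.8   # naming colors of conflicting words (NCWd)

diff = ncwd_mean - nc_mean                 # difference between means, seconds
pct_increase = 100 * diff / nc_mean        # increase relative to NC
diff_in_ncwd_sd = diff / ncwd_sd           # difference in NCWd SD units
diff_in_nc_sd = diff / nc_sd               # difference in NC SD units
cv_nc = nc_sd / nc_mean                    # coefficient of variability, NC
cv_ncwd = ncwd_sd / ncwd_mean              # coefficient of variability, NCWd

# Share of a normal NCWd distribution lying above the NC mean
share_above = 1 - NormalDist(ncwd_mean, ncwd_sd).cdf(nc_mean)

print(f"increase: {pct_increase:.0f} percent")
print(f"SD units: {diff_in_ncwd_sd:.2f} (NCWd), {diff_in_nc_sd:.2f} (NC)")
print(f"coefficients of variability: {cv_nc:.3f}, {cv_ncwd:.3f}")
print(f"share of NCWd above NC mean: {share_above:.3f}")
```

Run as written, this gives an increase of about 74 percent, differences of about 2.5 and 4.35 standard deviation units, and a share of roughly 0.994, in line with the text. The two coefficients of variability both come out near 0.17 when computed from the rounded table values.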

The small area in which the curves overlap and the 74 percent increase in the mean time for naming colors caused by the presence of word stimuli show the marked interference effect of the habitual response of calling words.

[p. 652] The means for the sex groups on the NCWd test show a difference of 3.6 seconds which is only 1.16 times its probable error; but the means on the NC test have a difference of 8.2 seconds which is 5.17 times its probable error. This reliable sex-difference favoring the females in naming colors agrees with the findings of Woodworth-Wells (1911), Brown (1915), Ligon (1932), etc.

The same data are arranged according to college classification in Table IV. There is some indication of improvement of the speed factor for both tests as the college rank improves. The relative difference between the two tests, however, remains generally the same except for fluctuations which are probably due to the variation in the number of cases.



The Effects of Practice upon Interference


The tests used were the same in character as those described in Experiments 1 and 2 (RCNb, RCNd, NC, and NCWd) with some revision. The NC test was printed in swastikas instead of squares. Such a modification allowed white to appear in the figure with the color, as is the case when the color is presented in the printed word. This change also made it possible to print the NC test in shades which more nearly match those in the NCWd test. The order of colors was determined under one restriction other than those given in section 2. Each line contained one color whose two appearances were separated by only one other color. This was done to equate, as much as possible, the difficulty of the different lines of the test so that any section of five lines would approximate the difficulty of any other section of five lines. Two forms of the tests were printed; in one the order was the inverse of that in the other.

[p. 653] Subjects and Procedure:

Thirty-two undergraduates in the University of Arizona (17 males and 15 females), who offered their services, were the subjects. At each day's sitting 4 half-sheets of the same test were read, and the average time (after correction was made for errors according to the plan outlined in Experiment 2) was recorded as the day's score. Only a few errors were left uncorrected. The largest correction made on the practice test changed the mean from 49.3 to 49.6. The plan of experimentation was as follows:

On the 1st day the RCNb test was used to acquaint the subjects with the experimental procedure and improve the reliability of the 2d day's test. The RCNd test was given the 2d day and the 13th day to obtain a measure of the interference developed by practice on the NC and NCWd tests. The RCNd test was given the 14th day to get a measure of the effect of a day's practice upon the newly developed interference. The NC test was given the 3d and 12th days, just before and just after the real practice series, so that actual change in interference on the NCWd test might be known. The test schedule was followed in regular daily order with two exceptions. There were two days between test days 3 and 4, and also two between test days 8 and 9, in which no work was done. These irregularities were occasioned by week-ends. Each subject was assigned a regular time of day for his work throughout the experiment. All but two subjects followed the schedule with very little irregularity. These two were finally dropped from the group and their data rejected.

All of the tests were given individually by the author. The subject was seated near a window so as to have good daylight illumination from the left side. There was no other source of light. Every subject was in the experimental room a few minutes before beginning work to allow his eyes to adapt to the light conditions. To aid eye-adaptation and also to check for clearness of vision each subject read several lines in a current magazine. Every subject was given Dr. Ishihara's test for color vision. One subject was found to have some trouble with red-green color vision; and her results were discarded though they differed from others of her sex only in the number of errors made and corrected.


Results:

The general results for the whole series of tests are shown in Table V which presents the means, standard deviations, and coefficients of variability for the whole group and for each sex separately, together with a measure of sex differences in terms of the probable error of the difference. Table VI, which is derived from Table V, summarizes the practice effects upon the respective tests. The graphical representation of the results in the practice series gives the learning curve presented in Fig. 2.
[p. 654]


The Effect of Practice on the NCWd Test upon Itself

The data to be considered here are those given in the section of Table V under the caption "Days of Practice on the NCWd Test." They are also presented in summary in the left section of Table VI and graphically in Fig. 2. From all [p. 655] three presentations it is evident that the time score is lowered considerably by practice.



Reference to Table VI shows a gain of 16.8 seconds or 33.9 percent of the mean of the 1st day's practice. The practice curve is found to resemble very much the 'typical' learning curve when constructed on [p. 656] time units.
The coefficient of variability is increased from .14 ± .012 to .19 ± .015. This difference divided by its probable error gives 2.60 which indicates that it is not reliable. The probability of a real increase in variability, however, is 24 to 1. Hence, practice on the NCWd test serves to increase individual differences.
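
The reliability figures in this paragraph can be retraced with a minimal sketch. It assumes that the probable error of the difference is obtained by combining the two probable errors as independent errors, and that the odds are read from a normal curve on which one probable error equals about 0.6745 standard deviations. Neither formula is spelled out in this section, so both are assumptions, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# One probable error (PE) bounds the middle 50 percent of a normal curve,
# so PE is about 0.6745 standard deviations.
PE_PER_SIGMA = NormalDist().inv_cdf(0.75)

def critical_ratio(a, pe_a, b, pe_b):
    """Difference between two values divided by the probable error of the difference.

    Assumption: the two probable errors combine as independent errors.
    """
    pe_diff = sqrt(pe_a ** 2 + pe_b ** 2)
    return abs(b - a) / pe_diff

def odds_of_real_difference(ratio):
    """Odds (to 1) that the true difference has the same sign as the observed one."""
    p = NormalDist().cdf(ratio * PE_PER_SIGMA)
    return p / (1 - p)

# Coefficient of variability on the first and last practice days, from the text.
ratio = critical_ratio(0.14, 0.012, 0.19, 0.015)
odds = odds_of_real_difference(ratio)
print(f"D / PEd = {ratio:.2f}, odds of a real increase = {odds:.0f} to 1")
```

With the values quoted above, the sketch gives a ratio of about 2.60 and odds of about 24 to 1, which agrees with the reported figures and suggests that the assumed formulas are close to the ones actually used.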

An examination of the data of the sex groups reveals a difference in speed on the NCWd test which favors the females. This is to be expected as there is a difference in favor of females in naming colors. Though the difference is not reliable in any one case it exists throughout the practice series, indicating that the relative improvement is approximately the same for the two groups. This latter fact is also shown by the ratio of the difference between the halves of the practice series to the first half. It is .185 for the males and .180 for the females.

The Effect of Practice on the NCWd Test upon the NC Test

The middle section of Table VI shows a gain on the NC test of 4.0 seconds or 13.9 percent of the initial score. This is only 23.7 percent of the gain on the NCWd test which means that less than one fourth of the total gain on the NCWd test is due to increase in speed in naming colors. The improvement is greater for the males, which is accounted for by the fact that there is more difference between naming colors and reading names of colors for the males than for the females.

The Effects in the RCNd Test of Practice on the NCWd and NC Tests

The right section of Table VI shows that the practice on the NCWd and NC tests resulted in heavy loss in speed on the RCNd test. A comparison of the right and left sections of the table shows that the loss on the RCNd test, when measured in absolute units, is practically equal to the gain on the NCWd test; when measured in relative units it is much greater. It is interesting to find that in ten short practice periods the relative values of opposing stimuli can be modified so greatly. [p. 657] There is little relation, however, between the gain in one case and the loss in the other. The correlation between gain and loss in absolute units is .262 ± .11, while the correlation between percent of gain and percent of loss is .016 ± .17, or zero. This is what one might expect.

From a consideration of the results of the two applications of the RCNd test given in the final tests of Table V, it is evident that the newly developed interference disappears very rapidly with practice. From one day to the next the mean decreases from 34.8 to 22.0 seconds. This indicates that renewing the effectiveness of old associations which are being opposed by newly formed ones is easier than strengthening new associations in opposition to old well established ones.

The variability of the group is increased by the increase in interference due to practice on the NCWd test. The coefficient of variability increases from .15 ± .013 to .34 ± .031, the difference divided by its probable error being 5.65. This is not surprising as the degree of the interference varies widely for different subjects. Its degree is determined by the learning on the practice series which is shown by the individual results to vary considerably. One day's practice on the RCNd test reduced the variability from .34 ± .031 to .25 ± .022. The decrease in variability is 2.3 times its probable error.

The data from this experiment present interesting findings on the effect of practice upon individual differences. The results which have already been discussed separately are presented for comparison in Table VII.


[p. 658] These results show that practice increases individual differences where a stimulus to which the subjects have an habitual reaction pattern is interfering with reactions to a stimulus for which the subjects do not have an habitual reaction pattern (the word stimulus interfering with naming colors, NCWd test); but decreases individual differences where a stimulus to which the subjects do not have an habitual reaction pattern is interfering with reactions to a stimulus for which the subjects have an habitual reaction pattern (the color stimulus interfering with reading words -- RCNd test). There are two other variables involved, however: initial variability and length of practice. Thus in the NCWd test the initial variability was less, the difficulty greater, and the practice greater than in the RCNd test. These findings lend some support to Peterson's hypothesis, "Subjects of normal heterogeneity would become more alike with practice on the simpler processes or activities, but more different on the more complex activities" (Peterson and Barlow, 1928, p. 228).

A sex difference in naming colors has been found by all who have studied color naming and has been generally attributed to the greater facility of women in verbal reactions than of men. There is some indication in our data that this sex difference may be due to the difference in the accustomed reaction of the two sexes to colors as stimuli. In other words responding to a color stimulus by naming the color may be more common with females than with males. This difference is probably built up through education. Education in color is much more intense for girls than for boys as observing, naming, and discussing colors relative to dress is much more common among girls than among boys. The practice in naming colors in the NCWd test decreased the difference between the sex groups on the NC test from a difference 5.38 times its probable error to a difference 2.99 times its probable error. This decrease in the difference due to practice favors the view that the difference has been acquired and is therefore a product of training.

[p. 659] SUMMARY

1. Interference in serial verbal reactions has been studied by means of newly devised experimental materials. The source of the interference is in the materials themselves. The words red, blue, green, brown, and purple are used on the test sheet. No word is printed in the color it names but an equal number of times in each of the other four colors; i.e. the word 'red' is printed in blue, green, brown, and purple inks; the word 'blue' is printed in red, green, brown, and purple inks; etc. Thus each word presents the name of one color printed in ink of another color. Hence, a word stimulus and a color stimulus both are presented simultaneously. The words of the test are duplicated in black print and the colors of the test are duplicated in squares or swastikas. The difference in the time for reading the words printed in colors and the same words printed in black is the measure of the interference of color stimuli upon reading words. The difference in the time for naming the colors in which the words are printed and the same colors printed in squares (or swastikas) is the measure of the interference of conflicting word stimuli upon naming colors.

2. The interference of conflicting color stimuli upon the time for reading 100 words (each word naming a color unlike the ink-color of its print) caused an increase of only 2.3 seconds or 5.6 percent over the normal time for reading the same words printed in black. This increase is not reliable. But the interference of conflicting word stimuli upon the time for naming 100 colors (each color being the print of a word which names another color) caused an increase of 47.0 seconds or 74.3 percent of the normal time for naming colors printed in squares.

These tests provide a unique basis (the interference value) for comparing the effectiveness of the two types of associations. Since the presence of the color stimuli caused no reliable increase over the normal time for reading words (D / PEd = 3.64) and the presence of word stimuli caused a considerable increase over the normal time for naming colors (4.35 standard deviation units) the associations that have been [p. 660] formed between the word stimuli and the reading response are evidently more effective than those that have been formed between the color stimuli and the naming response. Since these associations are products of training, and since the difference in their strength corresponds roughly to the difference in training in reading words and naming colors, it seems reasonable to conclude that the difference in speed in reading names of colors and in naming colors may be satisfactorily accounted for by the difference in training in the two activities. The word stimulus has been associated with the specific response 'to read,' while the color stimulus has been associated with various responses: 'to admire,' 'to name,' 'to reach for,' 'to avoid,' etc.

3. As a test of the permanency of the interference of conflicting word stimuli to naming colors, eight days' practice (200 reactions per day) was given in naming the colors of the print of words (each word naming a color unlike the ink-color of its print). The effects of this practice were as follows: (1) It decreased the interference of conflicting word stimuli to naming colors but did not eliminate it. (2) It produced a practice curve comparable to that obtained in many other learning experiments. (3) It increased the variability of the group. (4) It shortened the reaction time to colors presented in color squares. (5) It increased the interference of conflicting color stimuli upon reading words.

4. Practice was found either to increase or to decrease the variability of the group depending upon the nature of the material used.

5. Some indication was found that the sex difference in naming colors is due to the difference in the training of the two sexes.
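
The composition of the conflicting-word sheet described in item 1 of the summary can be sketched in code. This is only an illustration of the pairing rule (each word printed an equal number of times in each of the other four colors, 100 items in all); it does not reproduce the restrictions on the order of colors mentioned elsewhere in the paper, and the function name, the `repeats` and `seed` parameters, and the random shuffle are assumptions made for the example.

```python
import random
from collections import Counter
from itertools import permutations

# The five color words named in the summary.
COLORS = ["red", "blue", "green", "brown", "purple"]

def make_ncwd_items(repeats=5, seed=0):
    """Return (word, ink) pairs in which the ink never matches the word.

    permutations(COLORS, 2) yields every ordered pair of distinct colors,
    so each word is paired once with each of the other four inks (20 pairs).
    Repeating the pairs 5 times gives the 100 items of one test sheet.
    """
    pairs = list(permutations(COLORS, 2))
    items = pairs * repeats
    # A plain shuffle stands in for the paper's ordering restrictions (assumption).
    random.Random(seed).shuffle(items)
    return items

items = make_ncwd_items()
pair_counts = Counter(items)
word_counts = Counter(word for word, ink in items)
ink_counts = Counter(ink for word, ink in items)
print(len(items), "items;", len(pair_counts), "distinct word/ink pairs")
```

In this sketch every word and every ink color occurs 20 times and no word appears in its own color, so reading the words aloud corresponds to the RCNd task and naming the inks to the NCWd task.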

(Manuscript received August 15, 1934)


[1] The writer wishes to acknowledge the kind assistance received in the preparation of this thesis. He is indebted to Dr. Joseph Peterson for encouragement, helpful suggestions, and criticism of the manuscript; to Major H. W. Fenker, a graduate student in psychology, for helpful suggestions relative to preparation of the manuscript; to Drs. J. Peterson, S. C. Garrison, M. R. Schneck, J. E. Caster, O. A. Simley, W. F. Smith, and to Miss M. Nichol for aid in securing subjects; to some three hundred college students who served as subjects; and to William Fitzgerald of The Peabody Press for substantial assistance in the printing of the test materials.

[2] Descoeudres (1914) and also Goodenough and Brian (1929) presented color and form simultaneously in studying their relative values as stimuli.

[3] In Appendix A will be found a key to all symbols and abbreviations used in this paper.


References

BAIR, J. H., The practice curve: A study of the formation of habits. Psychol. Rev. Monog. Suppl., 1902 (No. 19), 1-70.

BERGSTRÖM, J. A., Experiments upon physiological memory. Amer. J. Psychol., 1893, 5, 356-359.

BERGSTRÖM, J. A., The relation of the interference of the practice effect of an association. Amer. J. Psychol., 1894, 6, 433-442.

BOWDITCH, H. P., and WARREN, J. W., The knee-jerk and its physiological modifications. J. Physiology, 1890, 11, 25-46.

BROWN, WARNER, Practice in associating color names with colors. Psychol. Rev., 1915, 22, 45-55.

BROWN, WARNER, Habit interference in card sorting. Univ. of Calif. Studies in Psychol., 1914, V. i, No. 4.

CATTELL, J. McK., The time it takes to see and name objects. Mind, 1886, 11, 63-65.

CULLER, A. J., Interference and adaptability. Arch. of Psychol., 1912, 3 (No. 24), 1-80.

DESCOEUDRES, A., Couleur, forme, ou nombre. Arch. de psychol., 1914, 14, 305-341.

GARRETT, H. E., and LEMMON, V. W., An analysis of several well-known tests. J. Appld. Psychol., 1924, 8, 424-438.

GOODENOUGH, F. L., and BRIAN, C. R., Certain factors underlying the acquisition of motor skill by pre-school children. J. EXPER. PSYCHOL., 1929, 12, 127-155.

HUNTER, W. S., and YARBROUGH, J. U., The interference of auditory habits in the white rat. J. Animal Behav., 1917, 7, 49-65.

HUNTER, W. S., Habit interference in the white rat and in the human subject. J. Comp. Psychol., 1922, 2, 29-59.

KLINE, L. W., An experimental study of associative inhibition. J. EXPER. PSYCHOL., 1921, 4, 270-299.

LESTER, O. P., Mental set in relation to retroactive inhibition. J. EXPER. PSYCHOL., 1932, 15, 681-699.

LIGON, E. M., A genetic study of color naming and word reading. Amer. J. Psychol., 1932, 44, 103-121.

LUND, F. H., The role of practice in speed of association. J. EXPER. PSYCHOL., 1927, 10, 424-433.

MÜLLER, G. E., and SCHUMANN, F., Experimentelle Beiträge zur Untersuchung des Gedächtnisses. Zsch. f. Psychol., 1894, 6, 81-190.

MÜNSTERBERG, HUGO, Gedächtnisstudien. Beiträge zur Experimentellen Psychologie, 1892, 4, 70.

PEARCE, BENNIE D., A note on the interference of visual habits in the white rat. J. Animal Behav., 1917, 7, 169-177.

PETERSON, J., and BARLOW, M. C., The effects of practice on individual differences. The 27th Year Book of Nat. Soc. Study of Educ., Part II, 1928, 211-230.

PETERSON, J., LANIER, L. H., and WALKER, H. M., Comparisons of white and negro children. J. Comp. Psychol., 1925, 5, 271-283.

PETERSON, J., and DAVID, Q. J., The psychology of handling men in the army. Minneapolis, Minn. Perine Book Co., 1918, pp. 146.

SHEPARD, J. F., and FOGELSONGER, H. M., Association and inhibition. Psychol. Rev., 1913, 20, 291-311.

TELFORD, C. W., Differences in responses to colors and their names. J. Genet. Psychol., 1930, 37, 151-159.

WOODWORTH, R. S., and WELLS, F. L., Association tests. Psychol. Rev. Monog. Suppl., 1911, 13 (No. 57), pp. 85.

Appendix A

A Key to Symbols and Abbreviations

NC        Naming Colors.
NCWd      Naming the Colors of the Print of Words Where the Color of the Print and the Word are Different.
RCNb      Reading Color Names Printed in Black Ink.
RCNd      Reading Color Names Where the Color of the Print and the Word are Different.
D         Difference.
D / PEd   Difference divided by the probable error of the difference.
M & F     Males and Females.
PEd       Probable error of the difference.
σ         Sigma or standard deviation.
σ / m     Standard deviation divided by the mean.





