In spite of the negative connotation caused by the two words, author and researcher Richard Phelps is a true believer in standardized testing. At the 15th Congress of the World Association for Educational Research in Marrakesh, Morocco, in June of 2007, Phelps presented his theories on the benefits of standardized testing.
As part of his early remarks, Phelps acknowledges that standardized testing is the “greatest single social contribution of modern psychology, and may be the most useful evaluation method available for human resource-intensive endeavors.” However, he does manage to address both sides of the issue.
In his research, Phelps brings some interesting data to the table, including a study on the evaluation of student work, which shows an incredible range between the lowest and highest scores in several instances. To my surprise, his studies date from the early 1910s to today. As the author notes, studies on teacher grading objectivity have been going on for decades, and the results are always the same. As far as standardized tests are concerned, Phelps dates their existence all the way back to the Chinese civil service exam, which began centuries ago.
It is evident that this author brings much research to the table to show both sides of the issue. He takes the trouble to examine the arguments against “test bias” and “discrimination.” Like any good persuasive arguer, Phelps defends his argument while answering the critics in the process.
The Role and Importance of Standardized Testing in the World of Teaching and Training
Phelps begins his study by asking one simple question: “Why standardized testing?” He readily admits that standardized tests are not perfect evaluation tools, but argues that they can provide information that no other evaluation can.
His main argument is a strong one, and the author has surprising evidence backing him up. Phelps’ main supporting argument for standardized testing is that without it, we would have to rely more on individual teacher grading and testing. At face value, this doesn’t seem valid, but there are almost 100 years of research backing up this point. The first study used comes from researchers Starch and Elliott (1912), who made copies of two actual English examinations and sent them to teachers to grade and return. To their surprise, the grades ranged from 50 to 98 percent. Of the 142 teachers used for this study, 14 scored the paper below 80 percent, while 14 scored it above 94.
Surprised by the results, the pair repeated the procedure with an exam from another content area (geometry). The results were more stunning, as these grades ranged from 28 to 92 percent. In this case, 20 of the 116 papers were scored below 60 percent, and 9 above 85. Later researchers found the same results. In essence, teachers’ marks are an unreliable means of measurement.
Further research on the topic has shed light on why this occurs. Other studies have shown that American teachers consider “nearly everything” when grading student work, including class participation, perceived effort, student progress, and other factors. In one particular study, 66 percent of teachers felt that their perception of a student’s ability should be taken into consideration in awarding the final grade (Frary, Cross, & Weber, 1993). Needless to say, standardized tests do not reflect how many absences a student has or how well a student participates in class. There is no room for any type of bias, whether based on gender, ethnicity, or class. Phelps wraps up this section of his defense of standardized tests by stating that “it is more than an antidote to biased judgment. We need standardized tests because each of us is a prisoner of our own limited experiences and observations.” He also goes on to say that these tests provide an opportunity to be free of subjectivity, whether it is due to bias or Bayesian (time-saving) shortcuts.
Looking Far Into The Past
As previously mentioned, Phelps goes far back for research to support his points. The earliest use of the standardized test that he cites is the administration of the Chinese civil service exam many centuries ago (Zeng, 1999, 8). This is a rudimentary example of the test, and the author adds that the “scientific” standardized test is actually about 100 years old.
Because of the long use of standardized tests, Phelps’ second argument is that testing technology has improved at an amazing rate in a brief period. There are many reasons for this, including increased complexity and sophistication in the product, the ability to provide more information for the price, and a better format, with more reliability, fairness, and validity than its predecessors.
While admitting that quick improvement in a product carries some risk, the author also argues that the tests have improved in quality and convenience, and have actually become more difficult for the average person or policymaker to understand. Phelps does harbor negative feelings toward policymakers, especially when he discusses the No Child Left Behind Act. He feels that the newfound complexity of testing for public purposes has been lost on the politicians and policymakers who have chosen other reasons to use standardized testing.
The Debate Continues: Are There Special Interests?
Phelps continues his study with a long discussion of the ongoing debate over standardized testing, and of how the debates are “primitive and one-sided.” He goes on to explain the reason for this by citing a theory from the late economist Mancur Olson (1965, 1982), which explained the political power of “special interests” in democratic societies.
Here is Olson’s argument. Individuals join specialized groups with political power, such as a professional association of educators. The members receive benefits and become entrenched in the status quo. Increased benefits, such as the absence of standardized testing programs, come at a cost (lowered student achievement). Over time, the wealthy and powerful groups become more accepting of the faulty system because of the benefits they have received in the past.
Since governance in the educational system is extensively divided among the federal, state, and local levels, there are numerous opportunities to saturate the country with preferred policy-related information while blocking out contrary points of view. Drawing on Olson’s theory, Phelps feels that the importance of standardized testing got lost in the political shuffle, and he makes it a point to argue that the supporting literature is hard to locate. Phelps views this as unnecessary censorship.
Response – Is Phelps Credible?
I was impressed with the breadth of Phelps’ findings. It is remarkable to uncover findings from a 90-year-old study and realize that the findings are arguably valid in 2008. Phelps appears more credible by stating that many other studies over the past 90 years have supported the argument of wide variance in teacher grading. I do wish Phelps had chosen to identify more of these studies, but realize that this paper was presented at a global conference and may have required parameters, including a content limit.
Still, the author does a good job of making his argument for standardized testing and directing his points toward the most explosive topic in education today, the No Child Left Behind Act. Personally, I believe that there can be some teacher bias in grading, but I am surprised to find that one credible (I assume) teacher scored a paper 98 while another gave the same paper a 50.
I am not sure that a national standardized testing system is the best answer. I am not sure that Phelps is convinced of this either. It appears that his point is that the forces of censorship and suppression should be removed so that the public can have a better look at the benefits of standardized testing. With all of the knowledge in hand, the American public will be in a better position to make up its collective mind. Without all of the information, we leave these decisions to the policymakers and keepers of the status quo who may not have the best credentials to make these decisions.
References
Frary, R. B., Cross, L. H., & Weber, L. J. (1993). Testing and grading practices and opinions of secondary school teachers of academic subjects: Implications for instruction in measurement. Educational Measurement: Issues and Practice, 12(3), 23+.
Olson, M. (1965). The logic of collective action: Public goods and the theory of groups. Cambridge, MA, USA: Harvard University Press.
Olson, M. (1982). The rise and decline of nations: Economic growth, stagflation, and social rigidities. New Haven, CT, USA: Yale University Press.
Phelps, R. P. (2003). Kill the messenger: The war on standardized testing. New Brunswick, NJ, USA: Transaction Publishers.
Phelps, R. P., Ed. (2005a). Defending standardized testing. Mahwah, NJ, USA: Lawrence Erlbaum.
Phelps, R. P. (2007a). The dissolution of education knowledge. Educational Horizons, 85(4), 232–247.
Phelps, R. P. (2008). Educational achievement testing fallacies. Chapter 3 in R. P. Phelps (Ed.), Correcting fallacies about educational and psychological testing. Washington, DC, USA: American Psychological Association.
Phelps, R. P. (2008). The role and importance of standardized testing in the world of teaching and training. Nonpartisan Education Review / Essays, 4(3). Retrieved [date] from: http://npe.educationnews.org/Review/Essays/v4n3.htm