
AI Error Causes Incorrect Scores for Over 1,000 MCAS Essays

A technical glitch in an AI grading system led to incorrect scores for approximately 1,400 MCAS essays across 145 Massachusetts school districts.

By Jessica Albright

Jessica Albright is an education technology correspondent for Neurozzio. She reports on the integration of emerging technologies like AI in educational systems, focusing on policy, classroom application, and student data privacy.


The Massachusetts Department of Elementary and Secondary Education (DESE) has confirmed that a technical issue with an artificial intelligence grading system resulted in approximately 1,400 student essays receiving incorrect scores. The error affected the Massachusetts Comprehensive Assessment System (MCAS) and was identified during a routine review period.

Key Takeaways

  • An AI grading system incorrectly scored approximately 1,400 MCAS student essays.
  • The error was caused by a temporary technical issue with the testing contractor, Cognia.
  • A teacher in Lowell first discovered the discrepancy, prompting a statewide investigation.
  • All affected essays have been rescored, and the issue was resolved in early August.
  • The incident has opened a discussion on the reliability of AI for high-stakes educational assessments.

State Confirms Scoring Inaccuracy

Officials from the Massachusetts DESE announced that a flaw in its AI-powered grading software led to inaccurate scores for a portion of the 750,000 essays submitted for the MCAS tests. The problem impacted an average of one to two students per grade level in each of the 145 districts involved.

In a statement, a DESE spokesperson explained the department's quality control process. "As one way of checking that MCAS scores are accurate, the Department of Elementary and Secondary Education releases preliminary MCAS results to districts and gives them time to report any issues during a discrepancy period each year," the spokesperson said.

It was during this designated review period that the scoring anomalies came to light. The department confirmed the issue was fully resolved by early August.

By the Numbers

  • 1,400: Approximate number of essays scored incorrectly.
  • 750,000: Total number of essays graded in the system.
  • 145: Number of school districts affected by the error.
  • 0.18%: Percentage of total essays impacted by the technical issue.

How a Local Teacher Uncovered the Error

The statewide correction was initiated thanks to the sharp eye of a teacher at the Reilly Elementary School in Lowell. According to Wendy Crocker-Roberge, Assistant Superintendent of Lowell Public Schools, the teacher noticed a significant gap between the quality of student essays and the preliminary scores they received in mid-July.

The teacher reported the findings to the school principal and district leadership, who then escalated the concern to the state. Crocker-Roberge highlighted a particularly glaring example of the scoring problem.

"Two essays appeared to be off by one and two points each, but the third essay, which was originally scored a zero out of seven, I re-rated as a six out of seven," Crocker-Roberge stated.

This single observation triggered a broader investigation by DESE and its testing contractor, Cognia, leading to the discovery of the systemic technical glitch.

Corrective Actions and System Review

Once DESE was notified of the potential problem, it instructed Cognia to launch a full investigation. The contractor identified the technical issue and immediately began a rescoring process for all affected essays.

In addition to correcting the known errors, Cognia also reviewed score distributions from randomly selected batches of essays to ensure no other discrepancies were present. DESE officials have assured school districts that the problem has been contained and corrected ahead of the official release of MCAS scores to families this fall.
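
Cognia has not described the mechanics of that review, but a batch-level distribution check of this kind typically works by drawing random samples of scored essays and comparing each sample's score distribution against a trusted reference, such as human-verified scores or prior-year results. The Python sketch below is illustrative only; the function names, batch size, and drift threshold are assumptions, not details of Cognia's actual process.

```python
import random
from collections import Counter

def score_distribution(scores, max_score=7):
    """Fraction of essays at each score point from 0 to max_score."""
    counts = Counter(scores)
    total = len(scores)
    return [counts.get(point, 0) / total for point in range(max_score + 1)]

def distribution_drift(batch_scores, reference_scores, max_score=7):
    """Total absolute difference between two score distributions (0 means identical)."""
    batch = score_distribution(batch_scores, max_score)
    reference = score_distribution(reference_scores, max_score)
    return sum(abs(b - r) for b, r in zip(batch, reference))

def audit_random_batches(ai_scores, reference_scores,
                         batch_size=500, n_batches=20, threshold=0.15):
    """Flag randomly selected batches of AI scores whose distribution drifts from the reference."""
    flagged = []
    for batch_id in range(n_batches):
        batch = random.sample(ai_scores, batch_size)
        drift = distribution_drift(batch, reference_scores)
        if drift > threshold:
            flagged.append((batch_id, round(drift, 3)))
    return flagged
```

A flagged batch would not prove an error on its own, but it would be a signal to pull those essays for human rescoring, which is the kind of containment step DESE says was taken here.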

The Role of AI in Standardized Testing

The use of AI to grade standardized tests like the MCAS is a relatively new practice. The system is trained on a large set of essays that have been scored by human experts. The AI then learns to apply the same scoring rubric to new essays. To maintain accuracy, a percentage of AI-graded essays are cross-checked by human graders. In this case, 10% of essays were reviewed by a person to check for discrepancies.
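
The article gives only the 10% human-review figure, so the following Python sketch is a hypothetical illustration of how such a cross-check might work: sample roughly one in ten AI-graded essays for a human read, then flag any essay where the two scores diverge by more than a chosen tolerance. The function names and the one-point tolerance are assumptions, not DESE's or Cognia's actual procedure.

```python
import random

def select_for_human_review(essay_ids, review_rate=0.10, seed=None):
    """Randomly choose a fraction of AI-graded essays for human rescoring."""
    rng = random.Random(seed)
    sample_size = max(1, round(len(essay_ids) * review_rate))
    return rng.sample(essay_ids, sample_size)

def flag_discrepancies(ai_scores, human_scores, tolerance=1):
    """Return essays where the AI and human scores differ by more than `tolerance` points."""
    flagged = []
    for essay_id, human_score in human_scores.items():
        gap = abs(ai_scores[essay_id] - human_score)
        if gap > tolerance:
            flagged.append((essay_id, ai_scores[essay_id], human_score, gap))
    return flagged
```

A gap like the one found in Lowell, where an essay the system scored zero was re-rated a six by a teacher, is exactly the kind of discrepancy a check of this sort is designed to surface.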

Debate on AI in High-Stakes Assessments

This incident has renewed conversations among educators about the readiness of AI for high-stakes testing. While AI offers speed and efficiency, its reliability remains a key concern.

Crocker-Roberge acknowledged the potential benefits of the technology. "AI grading certainly has the potential to turn important student performance data around for schools expeditiously, which helps schools to plan for improvement," she said.

However, she also questioned whether the current safeguards are sufficient. "With time, AI will become more accurate at scoring, but it is possible that the 90-10 ratio of AI to human scoring is not yet sufficient to achieve the accuracy rates desired for high-stakes reporting," she added.

As school districts across the country explore AI integration, this event in Massachusetts serves as a critical case study on the importance of robust human oversight in automated educational assessment tools. The final, corrected MCAS score reports are expected to be available to students and their families in the coming weeks.