Researchers from the University of Cambridge and the Hebrew University of Jerusalem have uncovered surprising insights into how artificial intelligence models process information. When they presented ChatGPT with a 2,400-year-old mathematical problem, the AI's mistake on a follow-up question suggested a form of learner-like improvisation rather than simple data retrieval.
Key Takeaways
- Scientists tested ChatGPT with the ancient Greek "doubling the square" geometry problem.
- While the AI solved the initial problem, it failed a related task involving a rectangle, making a novel error.
- Researchers believe this mistake was not from its training data, suggesting the AI was improvising a solution.
- The findings indicate that large language models may exhibit behaviors similar to human learners navigating new challenges.
- The study highlights the importance of teaching students to critically evaluate AI-generated information.
A 2,400-Year-Old Test for a Modern AI
The experiment centered on a classic geometry puzzle first documented by the Greek philosopher Plato around 385 B.C.E. Known as the "doubling the square" problem, it challenges a student to construct a square with exactly twice the area of an original square.
In Plato's dialogue, a student initially makes the intuitive but incorrect assumption of simply doubling the length of the sides. The correct, non-obvious solution involves using the diagonal of the original square as the side for the new, larger square.
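In modern algebraic terms (a standard restatement for clarity, not part of the study itself), the diagonal works because of the Pythagorean theorem: squaring the diagonal's length gives exactly twice the original area.

```latex
% Why the diagonal of a square doubles its area (standard result, not from the study)
\[
d = \sqrt{s^2 + s^2} = s\sqrt{2}
\qquad\Longrightarrow\qquad
d^2 = 2s^2 ,
\]
% so a square built on the diagonal d has twice the area of the square with side s.
```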
Researchers selected this problem specifically because its solution is more conceptual than textual. Large language models (LLMs) like ChatGPT are trained primarily on vast amounts of text, making it less likely that they would have encountered this geometric proof in a format they could easily replicate. This setup provided a unique opportunity to test the AI's problem-solving capabilities.
The Philosophical Debate
For centuries, philosophers have used the "doubling the square" problem to debate the nature of knowledge. The core question is whether knowledge is innate and uncovered through reason, or if it is acquired purely through experience and instruction. Posing this problem to an AI extends this ancient debate into the modern technological era.
An Unexpected Mistake Reveals Deeper Insights
Initially, ChatGPT successfully navigated the classic problem. However, the research team, led by Nadav Marco and Andreas Stylianides, then presented the AI with a related but distinct challenge: double the area of a rectangle using similar geometric principles.
In response, ChatGPT incorrectly stated that no geometric solution was possible. It reasoned that because the diagonal of a rectangle cannot be used to double its area in the same way a square's diagonal can, the problem was unsolvable through geometry.
This answer was factually wrong, as a geometric solution does exist. More importantly, the researchers noted that the likelihood of this specific false claim appearing in ChatGPT's training data was "vanishingly small." This suggests the AI was not merely repeating information it had learned. Instead, it appeared to be generating a new, albeit flawed, hypothesis based on its experience with the previous square problem.
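To see why a construction does exist, here is one standard argument (our own illustration; the construction the researchers had in mind may differ): scale both sides of the rectangle by the square root of two, a length that is itself constructible with compass and straightedge as the diagonal of a square.

```latex
% One way to double a rectangle geometrically (illustrative sketch, not necessarily the paper's construction)
\[
\text{Rectangle with sides } a, b:\quad \text{area} = ab .
\]
\[
\text{Scale each side by } \sqrt{2}:\quad
(\sqrt{2}\,a)(\sqrt{2}\,b) = 2ab ,
\]
\[
\text{where } \sqrt{2}\,a \text{ is the diagonal of a square of side } a,
\text{ and is therefore constructible.}
\]
```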
"When we face a new problem, our instinct is often to try things out based on our past experience," stated Nadav Marco of the Hebrew University of Jerusalem. "In our experiment, ChatGPT seemed to do something similar. Like a learner or scholar, it appeared to come up with its own hypotheses and solutions."
Mimicking Human Learning Patterns
The AI's behavior mirrors a known concept in educational psychology called the Zone of Proximal Development (ZPD). This theory describes the gap between what a student can do independently and what they can achieve with guidance.
The researchers propose that ChatGPT may be spontaneously operating within a similar framework. When given the right prompts, it can tackle novel problems not explicitly covered in its training data by building upon related concepts. Its mistakes, like those of a human student, are part of this learning-like process.
The AI Black Box
This study touches upon the "black box" problem in AI, where the internal processes an AI uses to arrive at an answer are often untraceable and opaque to its creators. Observing these learner-like behaviors provides clues about what might be happening inside these complex systems.
This observation has significant implications for how we understand AI capabilities. It suggests a more dynamic process than simple pattern recognition, pointing toward a system that can generalize and improvise when faced with unfamiliar tasks.
Implications for Education and Future Research
The study, published in the International Journal of Mathematical Education in Science and Technology, underscores a critical need for new skills in an AI-driven world. As students increasingly turn to AI for help, they must learn to approach these tools with a critical mindset.
"Unlike proofs found in reputable textbooks, students cannot assume that ChatGPT's proofs are valid," explained Professor Andreas Stylianides of the University of Cambridge. "Understanding and evaluating AI-generated proofs are emerging as key skills that need to be embedded in the mathematics curriculum."
The research team advocates for better prompt engineering in educational settings. For instance, instructing an AI with phrases like "I want us to explore this problem together" rather than "tell me the answer" can foster a more collaborative and effective learning environment.
Future Directions for Study
While the researchers caution against over-interpreting the results to mean that LLMs "think" like humans, they identify several areas for future investigation:
- Testing newer and more advanced AI models on a broader range of mathematical challenges.
- Integrating LLMs with dynamic geometry software to create richer, more interactive learning environments.
- Exploring how teachers and students can use AI collaboratively to enhance intuitive exploration and problem-solving in the classroom.
Ultimately, the study shifts the focus from whether an AI gets the right answer to how it arrives at its conclusions, revealing a process that is surprisingly similar to human learning.