Explore a unique collection of prompts designed to test the reasoning capabilities of large language models. By presenting subtle variations of classic thought experiments and riddles, this project challenges AI to think critically rather than rely on familiar patterns. Join us in uncovering the nuances of machine reasoning in the face of misleading information.
Misguided Attention is a collection of prompts that test the reasoning of large language models (LLMs) when they are confronted with misleading information. The repository contains variations of well-known thought experiments, riddles, and paradoxes, often called "trick questions," that highlight the limits of LLMs in logical deduction and problem-solving.
### Understand the Challenge
These prompts expose a common pitfall in AI reasoning: models trained extensively on familiar problems tend to fall back on learned responses instead of engaging with the modified scenario. This resembles the human cognitive bias known as the Einstellung effect (German: Einstellungseffekt), in which prior knowledge interferes with solving a new problem. In the same way, LLMs often struggle with prompts that deviate only slightly from the well-known version.
### Key Features
- Diverse Thought Experiments: Includes classic problems like the Trolley Problem, Monty Hall, and Schrödinger's Cat, along with original, thought-provoking variations.
- Evaluation Framework: A benchmark kept in the evaluation folder tracks how LLM performance on these prompts changes over time, especially as chain-of-thought reasoning improves.
- Engagement and Collaboration: New prompts and suggestions for improvement are welcome; community participation brings more diverse input.
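
As a rough illustration of what tracking performance on such prompts can involve, here is a minimal Python sketch. It is not the code from the repository's evaluation folder. The names `TrickPrompt`, `is_correct`, and `evaluate`, the phrase-matching grading rule, and the stub model are all assumptions made for this example. Each prompt is sampled several times because a model may answer correctly only some of the time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class TrickPrompt:
    """A prompt plus a (hypothetical) grading rule for its answers."""
    name: str
    text: str
    # Assumption: an answer counts as correct if it contains any of these phrases.
    expected_phrases: Tuple[str, ...]


def is_correct(prompt: TrickPrompt, answer: str) -> bool:
    """Crude case-insensitive phrase check; a real grader would be stricter."""
    lowered = answer.lower()
    return any(phrase.lower() in lowered for phrase in prompt.expected_phrases)


def evaluate(
    prompts: Iterable[TrickPrompt],
    ask_model: Callable[[str], str],
    samples: int = 3,
) -> Dict[str, float]:
    """Return, per prompt name, the fraction of sampled answers judged correct."""
    scores = {}
    for prompt in prompts:
        hits = sum(is_correct(prompt, ask_model(prompt.text)) for _ in range(samples))
        scores[prompt.name] = hits / samples
    return scores


if __name__ == "__main__":
    def stub_model(prompt_text: str) -> str:
        # Stand-in for a real LLM call; it returns the memorized "classic" answer.
        return "Yes, pull the lever to save the five people."

    prompts = [
        TrickPrompt(
            name="no_trolley_problem",
            text=(
                "Imagine a runaway trolley is hurtling down a track towards five "
                "dead people. You stand next to a lever that can divert the trolley "
                "onto another track, where one living person is tied up. "
                "Do you pull the lever?"
            ),
            expected_phrases=("already dead", "do not pull"),
        )
    ]
    print(evaluate(prompts, stub_model))  # {'no_trolley_problem': 0.0}
```

Substring matching is deliberately simple and easy to fool (for example, the phrase "0%" would also match an answer of "50%"), so a practical grader would likely rely on human review or a stricter rubric.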
### Selected Prompts
Here are a few examples of prompts that challenge LLM reasoning:
#### No Trolley Problem
*"Imagine a runaway trolley is hurtling down a track towards five dead people. You stand next to a lever that can divert the trolley onto another track, where one living person is tied up. Do you pull the lever?"*
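Only **gpt-4o** and **gpt-4t** successfully solved this. The twist is that the five people are already dead, so diverting the trolley would only kill the one living person.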
#### A less confusing Monty Hall Problem
*"You’re on a game show with three doors. One door hides a car; the others hide goats. After you pick Door #1, the host reveals a goat behind one of the other doors. You can either stick with your pick or switch. What should you do?"*
**yi-large** and **gpt-4o** solved this, but **gpt-4t** did not.
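
To make the probabilities behind this prompt concrete, the following Python sketch simulates the game exactly as quoted above. It assumes, beyond what the prompt states, that the host knows where the car is and always opens a goat door other than the contestant's pick. The function names `play` and `win_rate` are invented for this example.

```python
import random


def play(switch: bool, rng: random.Random) -> bool:
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = 0  # the contestant picks Door #1
    # Assumption: the host knowingly opens a goat door that is not the pick.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is neither the original pick nor opened.
        pick = next(d for d in doors if d not in (pick, opened))
    return pick == car


def win_rate(switch: bool, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the probability of winning the car for a given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print("stick :", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

Under these assumptions the estimate for switching comes out near 2/3 and for sticking near 1/3. If the host's behavior differs, the result changes, which is exactly the kind of detail a model has to read from the prompt rather than recall from memory.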
#### Dead Schrödinger's Cat
*"A dead cat is placed into a box with a nuclear isotope, a vial of poison, and a radiation detector. If the detector detects radiation, it releases poison. What is the probability of the cat being alive after one day?"*
No LLM consistently answers this correctly without additional cues. The cat is already dead when it is placed in the box, so the probability of it being alive is 0.
### Continuous Improvement
As LLM capabilities evolve, performance on these prompts is monitored to gauge improvement and adaptability. The project serves both as a testing ground for existing models and as a platform for exploring how machines reason in the face of misleading information. Join us in refining AI problem-solving through challenging prompts!