Visualizing AI vs. Human Performance in Technical Tasks

The gap between human and machine reasoning is narrowing, and fast.

Over the past year, AI systems have continued to advance rapidly, surpassing human performance in technical tasks where they previously fell short, such as advanced math and visual reasoning.

This graphic, via Visual Capitalist's Kayla Zhu, visualizes AI systems' performance relative to human baselines for eight AI benchmarks measuring the following tasks:

  1. Image classification

  2. Visual reasoning

  3. Medium-level reading comprehension

  4. English language understanding

  5. Multitask language understanding

  6. Competition-level mathematics

  7. PhD-level science questions

  8. Multimodal understanding and reasoning

This visualization is part of Visual Capitalist’s AI Week, sponsored by Terzo. Data comes from the Stanford University 2025 AI Index Report.

An AI benchmark is a standardized test used to evaluate the performance and capabilities of AI systems on specific tasks.
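
To make the "relative to the human baseline" framing concrete, here is a minimal sketch (not the AI Index's actual methodology) of how a raw benchmark score can be normalized against a human baseline; the scores in the example are hypothetical, chosen purely for illustration:

```python
def relative_performance(model_score: float, human_baseline: float) -> float:
    """Express a model's raw benchmark score as a percentage of the
    human baseline, so 100% means parity with humans and values
    above 100% mean the model outperforms the human baseline."""
    return model_score / human_baseline * 100

# Hypothetical scores for illustration: a model scoring 75.0 on a
# benchmark where humans average 80.0 sits at 93.75% of human performance.
print(f"{relative_performance(75.0, 80.0):.2f}%")  # 93.75%
```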

AI Models Are Surpassing Humans in Technical Tasks

Below, we show how AI models have performed relative to the human baseline in various technical tasks in recent years.

Year | Performance relative to the human baseline (100%) | Task
2012 | 89.15% | Image classification
2013 | 91.42% | Image classification
2014 | 96.94% | Image classification
2015 | 99.47% | Image classification
2016 | 100.74% | Image classification
2016 | 80.09% | Visual reasoning
2017 | 101.37% | Image classification
2017 | 82.35% | Medium-level reading comprehension
2017 | 86.49% | Visual reasoning
2018 | 102.85% | Image classification
2018 | 96.23% | Medium-level reading comprehension
2018 | 86.70% | Visual reasoning
2019 | 103.75% | Image classification
2019 | 36.08% | Multitask language understanding
2019 | 103.27% | Medium-level reading comprehension
2019 | 94.21% | English language understanding
2019 | 90.67% | Visual reasoning
2020 | 104.11% | Image classification
2020 | 60.02% | Multitask language understanding
2020 | 103.92% | Medium-level reading comprehension
2020 | 99.44% | English language understanding
2020 | 91.38% | Visual reasoning
2021 | 104.34% | Image classification
2021 | 7.67% | Competition-level mathematics
2021 | 66.82% | Multitask language understanding
2021 | 104.15% | Medium-level reading comprehension
2021 | 101.56% | English language understanding
2021 | 102.48% | Visual reasoning
2022 | 103.98% | Image classification
2022 | 57.56% | Competition-level mathematics
2022 | 83.74% | Multitask language understanding
2022 | 101.67% | English language understanding
2022 | 104.36% | Visual reasoning
2023 | 47.78% | PhD-level science questions
2023 | 93.67% | Competition-level mathematics
2023 | 96.21% | Multitask language understanding
2023 | 71.91% | Multimodal understanding and reasoning
2024 | 108.00% | PhD-level science questions
2024 | 108.78% | Competition-level mathematics
2024 | 102.78% | Multitask language understanding
2024 | 94.67% | Multimodal understanding and reasoning
2024 | 101.78% | English language understanding
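
For readers who want to re-plot these trends themselves, here is a minimal sketch using pandas and matplotlib. It assumes the table above has been saved to a hypothetical ai_benchmarks.csv file with Year, Performance, and Task columns, with percentages stored as plain numbers (e.g. 89.15):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumes the table above was saved as ai_benchmarks.csv with columns:
# Year, Performance (numeric, e.g. 89.15), Task.
df = pd.read_csv("ai_benchmarks.csv")

fig, ax = plt.subplots(figsize=(9, 5))

# One line per benchmark task, tracking its score relative to humans over time.
for task, group in df.groupby("Task"):
    ax.plot(group["Year"], group["Performance"], marker="o", label=task)

# The human baseline is 100% by definition.
ax.axhline(100, color="gray", linestyle="--", label="Human baseline (100%)")
ax.set_xlabel("Year")
ax.set_ylabel("Performance relative to human baseline (%)")
ax.legend(fontsize=8)
plt.tight_layout()
plt.show()
```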

From ChatGPT to Gemini, many of the world’s leading AI models are surpassing the human baseline in a range of technical tasks.

The only task where AI systems still haven’t caught up to humans is multimodal understanding and reasoning, which involves processing and reasoning across multiple formats and disciplines, such as images, charts, and diagrams.

However, the gap is closing quickly.

In 2024, OpenAI’s o1 model scored 78.2% on MMMU, a benchmark that evaluates models on multi-discipline tasks demanding college-level subject knowledge.

This was just 4.4 percentage points below the human benchmark of 82.6%. The o1 model also has one of the lowest hallucination rates of any AI model.
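
These two figures also line up with the 2024 multimodal row in the table above; a quick arithmetic check:

```python
o1_mmmu = 78.2         # OpenAI o1's MMMU score (%)
human_baseline = 82.6  # human benchmark on MMMU (%)

gap = human_baseline - o1_mmmu
relative = o1_mmmu / human_baseline * 100

print(f"Gap: {gap:.1f} percentage points")       # Gap: 4.4 percentage points
print(f"Relative performance: {relative:.2f}%")  # 94.67% -- matches the 2024
                                                 # multimodal row in the table
```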

This was a major jump from the end of 2023, when Google Gemini scored just 59.4%, highlighting how quickly AI performance is improving in these technical tasks.

To dive into all the AI Week content, visit our AI content hub, brought to you by Terzo.

To learn more about the global AI industry, check out this graphic that visualizes which countries are winning the AI patent race.