Incident 21: Tougher Turing Test Exposes Chatbots’ Stupidity (migrated to Issue)

Description: In the 2016 Winograd Schema Challenge, even the most successful AI systems entered were correct only 3 percentage points more often than random chance. This incident has been downgraded to an issue because it does not meet current ingestion criteria.

Entities

Alleged: Researchers developed and deployed an AI system, which harmed Researchers.

Incident Stats

Incident ID

Report Count

Incident Date

2016-07-14

Editors

Sean McGregor

GMF Taxonomy Classifications

Known AI Goal

Question Answering

Known AI Technology

Language Modeling, Distributional Learning

Potential AI Technology

Transformer

Potential AI Technical Failure

Generalization Failure, Dataset Imbalance, Underfitting, Context Misidentification

CSETv0 Taxonomy Classifications

Full Description

The 2016 Winograd Schema Challenge highlighted the shortcomings of artificially intelligent systems in understanding context. The Challenge presents ambiguous sentences and asks AI systems to resolve them. The two winning entries were correct 48% of the time, while random chance was correct 45% of the time. Quan Liu of the University of Science and Technology of China (partnering with the University of Toronto and the National Research Council of Canada) and Nicos Isaak of the Open University of Cyprus presented the most successful systems. Notably, Google and Facebook did not participate.
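To make the task concrete, here is a minimal Python sketch of how an ambiguous sentence of this kind might be represented and scored. The sentence pair is the widely cited "trophy and suitcase" schema, not an item confirmed to be from the 2016 test set. The names `WinogradSchema`, `accuracy`, and `first_candidate_baseline` are hypothetical, introduced only for illustration; they are not the Challenge's official format or any entrant's method.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class WinogradSchema:
    """One ambiguous sentence plus the referents its pronoun could point to."""
    sentence: str
    pronoun: str
    candidates: Tuple[str, ...]
    answer: str  # the candidate a human reader would pick


def accuracy(schemas: List[WinogradSchema], predictions: List[str]) -> float:
    """Fraction of schemas whose predicted referent matches the answer."""
    if len(schemas) != len(predictions):
        raise ValueError("need exactly one prediction per schema")
    correct = sum(1 for s, p in zip(schemas, predictions) if p == s.answer)
    return correct / len(schemas)


def first_candidate_baseline(schema: WinogradSchema) -> str:
    """Hypothetical naive system: always pick the first-mentioned candidate."""
    return schema.candidates[0]


# Changing one word ("big" -> "small") flips the correct referent,
# so surface word order alone cannot solve both halves of the pair.
EXAMPLES = [
    WinogradSchema(
        sentence="The trophy doesn't fit in the brown suitcase because it is too big.",
        pronoun="it",
        candidates=("the trophy", "the suitcase"),
        answer="the trophy",
    ),
    WinogradSchema(
        sentence="The trophy doesn't fit in the brown suitcase because it is too small.",
        pronoun="it",
        candidates=("the trophy", "the suitcase"),
        answer="the suitcase",
    ),
]

predictions = [first_candidate_baseline(s) for s in EXAMPLES]
print(accuracy(EXAMPLES, predictions))  # 0.5
```

The naive baseline gets exactly one of the two paired sentences right, which illustrates why systems that do not model context tend to score near chance on this kind of test.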

Short Description

In the 2016 Winograd Schema Challenge, even the most successful AI systems entered were correct only 3 percentage points more often than random chance.
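The margin over chance can be read two ways, and the short Python sketch below works through both using the 48% and 45% figures from the full description. The function name `margin_over_chance` is hypothetical and exists only for this illustration.

```python
def margin_over_chance(system_accuracy: float, chance_accuracy: float):
    """Return the (absolute, relative) improvement of a system over chance."""
    absolute = system_accuracy - chance_accuracy  # difference in accuracy
    relative = absolute / chance_accuracy         # improvement as a share of chance
    return absolute, relative


# Figures reported for the 2016 Challenge: winners 48%, random chance 45%.
absolute, relative = margin_over_chance(0.48, 0.45)
print(f"{absolute * 100:.1f} percentage points above chance")  # 3.0
print(f"{relative * 100:.1f}% relative improvement")           # 6.7
```

The "3%" in the description corresponds to the absolute difference, that is, 3 percentage points.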

Severity

Unclear/unknown

AI System Description

Artificially intelligent systems meant to understand ambiguous English sentences.

Sector of Deployment

Professional, scientific and technical activities

Relevant AI functions

Perception, Cognition, Action

Location

New York, NY

Named Entities

Winograd Schema Challenge, University of Science and Technology of China, Quan Liu, University of Toronto, National Research Council of Canada, Nicos Isaak, Open University of Cyprus

Technology Purveyor

Quan Liu, Nicos Isaak

Beginning Date

2016-01-01T00:00:00.000Z

Ending Date

2016-01-01T00:00:00.000Z

Near Miss

Unclear/unknown

Intent

Unclear

Lives Lost

Incident Reports

AI Incident Database Incidents Converted to Issues

github.com · 2022

The following former incidents have been converted to "issues" following an update to the incident definition and ingestion criteria.

21: Tougher Turing Test Exposes Chatbots’ Stupidity

Description: The 2016 Winograd Schema Challenge highli…

Variants

A "variant" is an incident that shares the same causative factors, produces similar harms, and involves the same intelligent systems as a known AI incident. Rather than index variants as entirely separate incidents, we list variations of incidents under the first similar incident submitted to the database. Unlike other submission types to the incident database, variants are not required to have reporting in evidence external to the Incident Database. Learn more from the research paper.

Similar Incidents

By textual similarity

Inappropriate Gmail Smart Reply Suggestions

TayBot

Gender Biases in Google Translate
