Incident 12: Common Biases of Vector Embeddings
CSETv1 Taxonomy Classifications
Harm Distribution Basis
sex
Sector of Deployment
professional, scientific and technical activities
CSETv0 Taxonomy Classifications
Full Description
The most common techniques used to embed words for natural language processing (NLP) exhibit gender bias, according to researchers from Boston University and Microsoft Research New England. The primary embedding studied was a 300-dimensional word2vec embedding trained on a corpus of Google News text, chosen because it is open-source and widely used in NLP applications. After demonstrating gender bias in the embedding, the researchers show that the bias is captured by a small number of geometric features, which can be used to define a bias subspace. This finding allows them to design several debiasing algorithms that remove that subspace from gender-neutral words.
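The geometric idea is straightforward to illustrate. Below is a minimal sketch in Python, with toy 4-dimensional vectors standing in for the actual 300-dimensional word2vec embedding; the words, the values, and the use of a single averaged direction (rather than the PCA-derived subspace the paper constructs) are all simplifying assumptions, not the researchers' exact procedure.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

# Toy 4-dimensional "embeddings" standing in for the 300-dimensional
# word2vec vectors studied by the researchers; all values are invented
# for illustration.
E = {
    "he":       np.array([ 1.0, 0.2, 0.1, 0.0]),
    "she":      np.array([-1.0, 0.2, 0.1, 0.0]),
    "man":      np.array([ 0.9, 0.1, 0.3, 0.1]),
    "woman":    np.array([-0.9, 0.1, 0.3, 0.1]),
    "engineer": np.array([ 0.4, 0.8, 0.2, 0.0]),
    "nurse":    np.array([-0.5, 0.7, 0.1, 0.1]),
}
E = {w: normalize(v) for w, v in E.items()}

# 1. Estimate a gender direction. The paper defines the bias subspace
#    via PCA over difference vectors of many definitional pairs
#    (he/she, man/woman, ...); averaging two pairs is the simplest
#    one-dimensional stand-in for that construction.
g = normalize((E["he"] - E["she"]) + (E["man"] - E["woman"]))

# 2. Quantify bias as a word's projection onto the gender direction.
for w in ("engineer", "nurse"):
    print(f"{w}: bias = {E[w] @ g:+.3f}")

# 3. Neutralize a gender-neutral word by removing its component along
#    the bias direction -- the core step of hard debiasing.
def neutralize(v, g):
    return normalize(v - (v @ g) * g)

for w in ("engineer", "nurse"):
    print(f"{w}: bias after debiasing = {neutralize(E[w], g) @ g:+.3f}")
```

After neutralization, "engineer" and "nurse" project to zero along the gender direction, while definitional words such as "he" and "she" are left untouched; the paper additionally equalizes such definitional pairs so they remain equidistant from every neutralized word.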
Short Description
Researchers from Boston University and Microsoft Research New England demonstrated gender bias in the most common techniques used to embed words for natural language processing (NLP).
Severity
Unclear/unknown
Harm Distribution Basis
Sex
AI System Description
Machine learning algorithms that create word embeddings from a text corpus.
Relevant AI functions
Unclear
AI Techniques
Vector word embedding
AI Applications
Natural language processing
Location
Global
Named Entities
Microsoft, Boston University, Google News
Technology Purveyor
Microsoft
Beginning Date
2016-01-01
Ending Date
2016-01-01
Near Miss
Unclear/unknown
Intent
Unclear
Lives Lost
No
Incident Reports
The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors which has been used in many machine learning…
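The bias the report describes surfaces most visibly in the embedding's well-known analogy arithmetic. Below is a short sketch using the third-party gensim library against the same pretrained Google News word2vec model; the model file must be downloaded separately (the filename shown is the conventional one), and exact outputs depend on the model.

```python
# Analogy arithmetic over the pretrained Google News word2vec model.
# Assumes the model file has been downloaded separately; gensim is a
# third-party library, installable with `pip install gensim`.
from gensim.models import KeyedVectors

model = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# "man is to king as woman is to ?" -- vector arithmetic answers "queen".
print(model.most_similar(positive=["woman", "king"], negative=["man"], topn=1))

# Applying the same arithmetic to occupation words surfaces the gender
# stereotypes the report describes.
print(model.most_similar(positive=["woman", "computer_programmer"],
                         negative=["man"], topn=3))
```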