Word Embeddings trained on English Wikipedia

Data citation

Språkbanken (2024). Word Embeddings trained on English Wikipedia (updated: 2024-01-25). [Data set]. Enriched and distributed by Språkbanken. https://doi.org/10.23695/z9cm-xc45

See https://zenodo.org/record/6542975

Caveats

Machine learning models trained on uncurated data inevitably learn biases, both hidden and obvious. As a result, the models shared here might exhibit sexism, racism, antisemitism, homophobia, and other unacceptable biases. I encourage anyone using these models to make sure such biases are actually removed before using them in production settings (see e.g. https://aclanthology.org/N19-1061/).

References

Hengchen, Simon. (2022). Word2vec models trained on English Wikipedia [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6542975

Download

File	Size	Modified	Licence
wiki_300_5_word2vec.model	112.01 MB	2024-01-25	CC-BY-4.0
wiki_300_5_word2vec.model.syn1neg.npy	3.75 GB	2024-01-25	CC-BY-4.0
wiki_300_5_word2vec.model.wv.vectors.npy	3.75 GB	2024-01-25	CC-BY-4.0
wiki_300_50_word2vec.model	28.04 MB	2024-01-25	CC-BY-4.0
wiki_300_50_word2vec.model.syn1neg.npy	949.26 MB	2024-01-25	CC-BY-4.0
wiki_300_50_word2vec.model.wv.vectors.npy	949.26 MB	2024-01-25	CC-BY-4.0
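
Each .model file in the table comes with two .npy files that share its name. The layout looks like what gensim's Word2Vec.save produces for large models, but that is an assumption: this page does not name the library used. Under that assumption, a minimal sketch for checking that all three files sit in the same directory before loading could look like this (the helper names missing_files and load_vectors are made up for illustration):

```python
from pathlib import Path

# Suffixes of the companion arrays listed next to each .model file
# in the Download table.
SIDECAR_SUFFIXES = (".syn1neg.npy", ".wv.vectors.npy")


def missing_files(model_path):
    """Return the names of the model file and companion arrays not found on disk."""
    model_path = Path(model_path)
    expected = [model_path] + [
        model_path.with_name(model_path.name + suffix)
        for suffix in SIDECAR_SUFFIXES
    ]
    return [p.name for p in expected if not p.is_file()]


def load_vectors(model_path):
    """Load the model with gensim (assumed format) and return its word vectors."""
    missing = missing_files(model_path)
    if missing:
        raise FileNotFoundError("Missing files: " + ", ".join(missing))
    # Imported lazily so the file check above works without gensim installed.
    from gensim.models import Word2Vec
    return Word2Vec.load(str(model_path)).wv
```

With the files downloaded to the working directory, load_vectors("wiki_300_50_word2vec.model") would return the word vectors. From there, wv.vector_size reports the dimensionality and wv.most_similar("language", topn=5) lists nearest neighbours, provided the query word is in the vocabulary.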
