Word Embeddings trained on English Wikipedia
Citation
Språkbanken Text (2024). Word Embeddings trained on English Wikipedia (updated: 2024-01-25). [Data set]. Språkbanken Text. https://doi.org/10.23695/z9cm-xc45
Caveats
Machine learning models trained on uncurated data inevitably learn biases, whether hidden or obvious. As a result, the models shared here might exhibit sexism, racism, antisemitism, homophobia, and other unacceptable biases. I encourage anyone using these models to make sure such biases are actually removed before using them in production settings (see e.g. https://aclanthology.org/N19-1061/).
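As a rough illustration of how such a bias might be probed, the sketch below compares how close a target word sits to two attribute words in embedding space. It is only a minimal sketch: the function names (`cosine`, `association_gap`) and the three-dimensional toy vectors are invented for this example and do not come from the shared models. In practice the vectors would be read from the downloaded model files.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association_gap(target, attr_a, attr_b):
    """Positive if target is closer to attr_a, negative if closer to attr_b."""
    return cosine(target, attr_a) - cosine(target, attr_b)


# Toy vectors, made up for illustration only (NOT taken from the models).
toy_vectors = {
    "he": [1.0, 0.0, 0.2],
    "she": [0.0, 1.0, 0.2],
    "engineer": [0.8, 0.2, 0.5],
    "nurse": [0.2, 0.8, 0.5],
}

for word in ("engineer", "nurse"):
    gap = association_gap(toy_vectors[word], toy_vectors["he"], toy_vectors["she"])
    print(f"{word}: {gap:+.3f}")
```

A gap far from zero on occupation words would be one sign of the kind of bias described above. A gap near zero on a handful of words does not show that a model is free of bias.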
References
Hengchen, Simon. (2022). Word2vec models trained on English Wikipedia [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6542975
File | Size | Modified | Licence |
---|---|---|---|
| | 112.01 MB | 2024-01-25 | CC BY 4.0 (attribution) |
| | 3.75 GB | 2024-01-25 | CC BY 4.0 (attribution) |
| | 3.75 GB | 2024-01-25 | CC BY 4.0 (attribution) |
| | 28.04 MB | 2024-01-25 | CC BY 4.0 (attribution) |
| | 949.26 MB | 2024-01-25 | CC BY 4.0 (attribution) |
| | 949.26 MB | 2024-01-25 | CC BY 4.0 (attribution) |