Språkbanken Text is a department within Språkbanken.

New Strix release

Posted by Maria Öhrman on 2024-11-20

A new major version of Strix was released on November 20, 2024. The interface has a completely new look, and adding new corpora is faster than ever. There are also a number of new features.

Document search


Document search is one of the main features in this release. It uses vector search to find documents that are semantically close to a given query. The query can be a single token, a multiword expression, a sentence, a paragraph, or a whole document. The output is a list of the 50 documents that are semantically closest to the query.

When a corpus is added to Strix, a vector is created for each document in the corpus using KBLab's KB-SBERT (https://huggingface.co/KBLab/sentence-bert-swedish-cased). These vectors are later compared with the vector of the given query.
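The ranking step can be illustrated with a minimal sketch. It is not Strix's actual implementation: the post does not say which similarity measure is used, so cosine similarity is an assumption here, and the three-dimensional vectors, the document ids and the function names (`cosine_similarity`, `top_k_documents`) are hypothetical stand-ins for real KB-SBERT embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_documents(query_vector, document_vectors, k=50):
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in document_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors (assumption): in practice each one would come from the embedding model.
document_vectors = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}
query_vector = [1.0, 0.0, 0.0]

results = top_k_documents(query_vector, document_vectors, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

With `k=50` the same function would return a list of the same size as the one shown in the interface. A production system would normally use an approximate nearest-neighbour index rather than scoring every document.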

Related documents 


Strix has a related documents feature for finding documents that are semantically similar to another document. It uses the same vector search technique described under “Document search”, except that the search query is the whole text of the current document. Users can look for similar documents either in the same corpus or in the selected corpora.
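A rough sketch of how such a lookup could work is given below. It is an illustration under assumptions, not the Strix code: the data layout, the corpus names, the cosine similarity measure and the choice to exclude the current document from its own results are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_documents(current_id, documents, corpora=None, k=50):
    """Rank other documents by similarity to the current document's vector.

    If `corpora` is None, only the current document's own corpus is searched;
    otherwise the search is restricted to the given set of corpus names.
    """
    current = documents[current_id]
    allowed = corpora if corpora is not None else {current["corpus"]}
    candidates = [
        (doc_id, cosine_similarity(current["vector"], doc["vector"]))
        for doc_id, doc in documents.items()
        if doc_id != current_id and doc["corpus"] in allowed
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]


# Toy data (assumption): two hypothetical corpora with two-dimensional vectors.
documents = {
    "news-1": {"corpus": "news", "vector": [1.0, 0.0]},
    "news-2": {"corpus": "news", "vector": [0.8, 0.6]},
    "blog-1": {"corpus": "blogs", "vector": [0.99, 0.1]},
    "blog-2": {"corpus": "blogs", "vector": [0.0, 1.0]},
}

same_corpus = related_documents("news-1", documents)
selected = related_documents("news-1", documents, corpora={"news", "blogs"})
print(same_corpus)
print(selected)
```

The only difference from a free-text query is where the query vector comes from: here it is the stored vector of the current document, so no new embedding has to be computed.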

Map 


This version of Strix also introduces another new feature: Maps. It shows all the locations mentioned in your own corpora or in the open access corpora currently available in Strix. Our analysis platform Sparv (https://spraakbanken.gu.se/sparv/) is used to detect each occurrence of a geographical place and annotate it with its coordinates. To try this feature, follow these steps:

  • Click Overview -> Maps.
  • Click one of the orange points on the map to display information about the place.
  • Click the “Show hits” button to view the documents that contain the location name.

Not all corpora include geographical annotations, and this feature is not yet available for Mink users. However, it will be accessible soon.

Current and upcoming corpora 

For now, the following corpora are available: 

And soon we will add: 

As before, users can upload their own materials through Mink and install them in Strix. For those who have installed corpora in the previous version of Strix, the data is already available. 

Please contact sb-info@spraakbanken.gu.se with any questions or comments.
