FM-SBLEX consists of three computational morphology tools: one for modern Swedish (SALDO), one for 19th-century Swedish (Dalin), and one for Old Swedish. FM-SBLEX has been developed using the Functional Morphology library.
All tools in FM-SBLEX provide:
Retrieve the source code via anonymous Subversion:
$ svn co https://svn.spraakdata.gu.se/repos/sblex/pub/fm
Compile the software with the following commands. Compilation requires the Glasgow Haskell Compiler (GHC).
$ cd sblex
$ ./configure
$ make
$ sudo make install
This installs three binaries in /usr/local/bin named saldo, dalin, and fsv.
FM-SBLEX is also available on Hackage.
Install with:
$ cabal install FM-SBLEX
Retrieve the development versions of the dictionaries (UTF-8 encoded):
Try it out (these commands print the dictionaries in linewise JSON format):
$ saldo saldo.dict -p lex
$ dalin dalin.dict -p lex
$ fsv fsv.dict -p lex
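To get a first look at what the dictionaries contain, the linewise JSON output can be read one record at a time. The following is a minimal sketch in Python, assuming that each non-empty output line is a standalone JSON value. The function names `read_json_lines` and `key_counts` are hypothetical helpers, and no particular field names are assumed: the sketch only counts which keys occur.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def read_json_lines(lines: Iterable[str]) -> Iterator[object]:
    """Yield one decoded JSON value per non-empty line.

    Assumption: every non-empty line is a complete JSON value.
    """
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def key_counts(records: Iterable[object]) -> Counter:
    """Count how often each top-level key occurs across JSON objects."""
    counts: Counter = Counter()
    for record in records:
        # Values that are not JSON objects (lists, strings, ...) are skipped.
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts
```

To apply it to real output, pipe the output of `saldo saldo.dict -p lex` into a script that calls `key_counts(read_json_lines(sys.stdin))` and prints the result.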
The following commands show how to use saldo together with a PoS tagger to reduce the number of analyses (e.g., for lemmatization). The procedure is analogous for the other tools.
$ cat data.txt | hunpos-tag suc2_parole_utf8.hunpos > data.txt.hunpos
$ cat data.txt | saldo saldo.dict -t norm -e parole -r data.txt.hunpos > data_saldo.txt
Example output. The '+' sign separates alternative analyses of the same word.
Jag jag..pn.1:PF@00S@S kan kunna..vb.1:V@IPAS tänka tänka..vb.1:V@N0AS mig jag..pn.1:PF@00O@S att att..sn.1:CSS en en..al.1:D0@US@S massa massa..nn.1+massa..nn.2:NCUSN@IS bedömare bedömare..nn.1:NCUPN@IS och och..kn.1:CCS politiska politisk..av.1:AQP0PN0S aktörer aktör..nn.1:NCUPN@IS
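For lemmatization, the output above can be split into words, candidate entries, and tags. The following Python sketch assumes the layout seen in the example: whitespace-separated fields that alternate between a word and its analysis, where the analysis has the form `entry+entry:TAG`, and the part of an entry before the first `..` is the lemma. The names `Token`, `parse_tagged`, and `lemma` are hypothetical, and the sketch does not handle words that lack an analysis.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str            # surface form, e.g. "massa"
    entries: List[str]   # alternatives split on '+', e.g. ["massa..nn.1", "massa..nn.2"]
    tag: str             # PoS tag after the last ':', e.g. "NCUSN@IS"


def parse_tagged(text: str) -> List[Token]:
    """Parse alternating 'word analysis' fields into Token records."""
    fields = text.split()
    if len(fields) % 2 != 0:
        raise ValueError("expected an even number of fields (word/analysis pairs)")
    tokens = []
    for word, analysis in zip(fields[0::2], fields[1::2]):
        # Split at the last ':' so the tag is whatever follows it.
        ids, sep, tag = analysis.rpartition(":")
        if not sep:
            raise ValueError(f"analysis without ':' separator: {analysis!r}")
        tokens.append(Token(word, ids.split("+"), tag))
    return tokens


def lemma(entry: str) -> str:
    """Return the part of an entry before the first '..' (assumed to be the lemma)."""
    return entry.split("..", 1)[0]
```

For the word `massa` in the example, `parse_tagged` yields two entries, `massa..nn.1` and `massa..nn.2`, which both map to the lemma `massa`.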