
Rarely Asked Questions about Swedish UD

Submitted by Arianna Masciolini on 2025-11-20

Språkbanken Text is getting more and more involved in Universal Dependencies (UD): the Eukalyptus Treebank of Written Swedish is currently being converted to UD and the first ~500 sentences of UD_Swedish-SweLL, a treebank based on the Swedish Learner Language corpus, have just been released as part of UD 2.17 (you can download UD_Swedish-SweLL together with the other UD 2.17 treebanks via LINDAT/CLARIAH-CZ or by itself from this same website). Working on these two projects, we stumbled upon a few questions - ranging from annoyingly broad to oddly specific - that we could not easily find answers to in the extensive UD guidelines. Some of these sparked long discussions, which are summarized in this blog post.

Comparative constructions (they are simpler than you think!)

[go to discussion on GitHub]

Comparative constructions such as

  1. att annotera dessa konstruktioner är enklare än du tror and
  2. vissa konstruktioner är enklare än andra

look tricky, but the guidelines for them have recently gotten more comprehensive and, at least when it comes to Swedish, easier to understand and follow.

The first question might be what UPOS tag to assign to the word än. The answer to that is SCONJ. In (1), this is straightforward, as än introduces a subordinate clause, än du tror. As for (2), the guidelines state that "if the same conjunction is used with bare nominals, we still tag it SCONJ".

When it comes to the dependency structure of the construction, the clause or nominal introduced by än should always be attached to the property whose degree is compared:

token          UPOS   deprel
X
är             AUX    cop
(mer)          ADV    advmod
enkel/enklare  ADJ    root
än             SCONJ
Y

The specific labels depend on whether the standard of comparison is a clause or a nominal.

For sentences like (1), we use advcl for the subordinate clause and mark for än:

token           UPOS   deprel
att             PART   mark
annotera        VERB   csubj
dessa           DET    det
konstruktioner  NOUN   obj
är              AUX    cop
enklare         ADJ    root
än              SCONJ  mark
du              PRON   nsubj
tror            VERB   advcl

In cases such as (2), we use obl and case:

token           UPOS   deprel
vissa           DET    det
konstruktioner  NOUN   nsubj
är              AUX    cop
enklare         ADJ    root
än              SCONJ  case
andra           PRON   obl

The only remaining issue is where to draw the line between clausal and nominal comparison. Sentences like

  3. parsern annoterar dessa konstruktioner bättre än jag

can be rephrased as both

  • parsern annoterar dessa konstruktioner bättre än jag skulle göra (clausal) and
  • parsern annoterar dessa konstruktioner bättre än mig (nominal).

Ambiguous cases like (3) are treated like (2).
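
The labelling rule for the standard of comparison can be condensed into a small lookup. The Python sketch below is only an illustration: the function name comparison_labels and its boolean flag are hypothetical and not part of any UD tool, and deciding whether a standard is clausal is still up to the annotator. Ambiguous cases like (3) would be passed as nominal.

```python
def comparison_labels(standard_is_clause: bool) -> dict:
    """Return the UPOS of 'än' plus the deprels of 'än' and of the standard.

    Hypothetical helper for illustration. In both cases the standard of
    comparison attaches to the compared property (e.g. 'enklare').
    """
    if standard_is_clause:
        # (1) ... enklare än du tror
        return {"marker_upos": "SCONJ", "marker_deprel": "mark", "standard_deprel": "advcl"}
    # (2) ... enklare än andra; ambiguous cases like (3) end up here too
    return {"marker_upos": "SCONJ", "marker_deprel": "case", "standard_deprel": "obl"}


print(comparison_labels(True))
print(comparison_labels(False))
```

Note that the UPOS tag stays the same in both branches; only the relation labels change.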

Går att VERBa and other equally tough constructions

[go to discussion on GitHub]

How to annotate sentences like detta går att debattera? Well, that sparked a whole debate.

Semantically, detta is the object of debattera. If we rephrase the sentence into det går att debattera detta or att debattera detta går, the syntactic analysis coincides with the semantic one:

token      UPOS  deprel
det        PRON  expl
går        VERB  root
att        PART  mark
debattera  VERB  csubj
detta      PRON  obj

token      UPOS  deprel
att        PART  mark
debattera  VERB  csubj
detta      PRON  obj
går        VERB  root

In detta går att debattera, on the other hand, detta acts like the syntactic subject of går, which becomes the head of the construction. It remains to decide how to annotate the subordinate clause att debattera. Since its subject is not controlled by that of the superordinate clause, we use ccomp rather than xcomp:

token      UPOS  deprel
detta      PRON  nsubj
går        VERB  root
att        PART  mark
debattera  VERB  ccomp

It turns out that this is similar to tough-movement, so called because the prototypical English example sentences for the phenomenon involve the word tough:

  1. this problem is tough to solve
  2. it is tough to solve this problem
  3. to solve this problem is tough

In sentences like (1), the noun problem is the syntactic subject of the main verb is but logically the object of the embedded non-finite verb solve (although in UD, the root of the sentence would be tough, not is), whereas in paraphrases (2) and (3) logical and grammatical structure coincide. General guidelines for annotating tough-constructions, though, are still being debated at the time of writing.

Participles

[go to the discussion on GitHub]

Participles may look nearly as tough as tough-constructions because they work as different parts of speech depending on the context. Consider the following cases:

  1. skolan får ökade möjligheter
  2. jag blev bjuden på te
  3. flickan var strålande glad
  4. detta sökande gav inget resultat

Cases like (1) are by far the most frequent. The past participle ökade clearly modifies the noun möjligheter and should therefore be tagged as ADJ. However, since it is derived from the verb öka, it also takes the typically verbal features Tense and VerbForm.

In (2), bjuden may also be seen as an adjective, but bli + participle passive constructions are treated differently: the participle is tagged as VERB (with Tense and VerbForm) and bli is annotated as a passive auxiliary:

token   UPOS  deprel
jag     PRON  nsubj:pass
blev    AUX   aux:pass
bjuden  VERB  root
på      ADP   case
te      NOUN  obl

In (3), the present participle strålande modifies an adjective, glad. We therefore give it the ADV UPOS tag, again with the Tense and VerbForm features.

Finally, in cases like (4), the participle should be tagged as NOUN. Its morphological analysis should be consistent with the UPOS tag, so rather than using Tense and VerbForm we annotate for Case, Definiteness, Gender and Number.

In all four cases, including when the UPOS tag is VERB, the lemma of the participial form is the participial form itself.
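
The four cases can be summarized as a mapping from syntactic context to UPOS tag. The following Python sketch is only a compact restatement of the rules above: the context labels and the function name are hypothetical, and recognizing the context in a real sentence is left to the annotator.

```python
# Context labels are hypothetical names for the four cases listed above.
# Each value is (UPOS, whether the form keeps Tense and VerbForm).
PARTICIPLE_RULES = {
    "modifies_noun": ("ADJ", True),        # (1) ökade möjligheter
    "bli_passive": ("VERB", True),         # (2) blev bjuden
    "modifies_adjective": ("ADV", True),   # (3) strålande glad
    "nominal": ("NOUN", False),            # (4) detta sökande
}


def participle_upos(context: str) -> tuple:
    """Return (UPOS, keeps_tense_and_verbform) for a participle in a context."""
    if context not in PARTICIPLE_RULES:
        raise ValueError(f"unknown context: {context!r}")
    return PARTICIPLE_RULES[context]


for context in PARTICIPLE_RULES:
    upos, verbal = participle_upos(context)
    # Nominal uses get nominal features instead of the verbal ones.
    feats = "Tense, VerbForm" if verbal else "Case, Definite, Gender, Number"
    print(f"{context}: {upos} ({feats})")
```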

Att VERBa själv

[go to the discussion on GitHub]

Constructions like att bestämma själv are clear when it comes to dependency structure (the root is the verb and själv is one of its direct dependents). Talbanken and LinES, the two largest Swedish UD treebanks, used to label this edge differently: the former used amod, the latter advmod.

But while two dispute, the third enjoys! Recent discussion led to re-analyzing this as secondary predication, which implies using a clausal relation type. Since själv is optional, the relation of choice is advcl. This is consistent with the pre-existing use of acl in cases like du borde vara dig själv, where the head is a nominal (dig).

Subword-level coordination

[go to the discussion on GitHub]

How to analyze constructions like levnads- och beteendemässigt?

Ideally, we would want the conjuncts levnads- and beteende(-) to form a compound with mässigt, but this is currently beyond the expressive capacity of UD. To circumvent the problem, we lemmatize levnads- as levnadsmässigt, obtaining a conjunction of two adverbs.
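
As a rough illustration of the workaround, the Python sketch below builds the lemma of the truncated conjunct from the part it shares with the full one. The helper name is hypothetical, and the shared tail (here mässigt) has to be supplied by hand: nothing in this sketch detects it automatically.

```python
def expand_truncated_conjunct(truncated: str, full_conjunct: str, shared_tail: str) -> str:
    """Build a lemma for a hyphen-truncated conjunct such as 'levnads-'.

    Hypothetical helper: the caller states which tail of the full
    conjunct is shared by both conjuncts.
    """
    if not truncated.endswith("-"):
        raise ValueError("expected a form ending in a hyphen")
    if not full_conjunct.endswith(shared_tail):
        raise ValueError("the shared tail must end the full conjunct")
    # Drop the hyphen and append the shared tail.
    return truncated[:-1] + shared_tail


print(expand_truncated_conjunct("levnads-", "beteendemässigt", "mässigt"))  # levnadsmässigt
```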

Morphological analysis of syncretic adjective forms

[go to the discussion on GitHub]

A handful of Swedish adjectives, such as bra and äkta, are indeclinable, or rather, they inflect for degree (and, if nominalized, case), but not gender, number or definiteness. Other adjectives, such as nyttig, inflect for the latter three features as well, but with a certain degree of syncretism: the form nyttiga, for example, can be a singular definite (of either gender) or a plural (irrespective of both gender and definiteness).

In (Swedish) UD, a general principle is to ground morphological annotation on the observed word form and avoid inferring features based on the context. Adjectives like bra should therefore only be annotated for Case and Degree. This amounts to saying "this form works just as well for every combination of gender, number and definiteness".

Annotation of adjectives like nyttig, on the other hand, partially deviates from this idea. When assigning morphological features to -a forms like nyttiga, we would ideally want to convey that they can be definite and/or plural (but not singular and indefinite!). The problem is that UD v2 allows expressing disjunctions of values for a single feature (e.g. Number=Sing,Plur) but not of combinations of several feature-value pairs. Leaving -a forms unannotated for number and definiteness would be misleading, as it would imply that they can be used in the indefinite singular case too. As a consequence, these two features are annotated contextually. The current practice can be summarized as follows (case and degree are ignored for the sake of compactness):

form     features
nyttig   Definite=Ind|Gender=Com|Number=Sing
nyttigt  Definite=Ind|Gender=Neut|Number=Sing
nyttiga  Definite=Def in definite contexts; Definite=Ind|Number=Plur in plural contexts

Nyttiga regler för nyttiga adjektiv!
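
To make the contextual part of this practice concrete, here is a minimal Python sketch of the table above, together with a tiny reader for feature strings. Both function names and the context labels "definite" and "plural" are hypothetical. Case and Degree are left out, as in the table.

```python
def parse_feats(feats: str) -> dict:
    """Split a UD feature string into {feature: set of values}."""
    if feats == "_":
        return {}
    parsed = {}
    for pair in feats.split("|"):
        name, values = pair.split("=", 1)
        # A comma separates alternative values of ONE feature.
        parsed[name] = set(values.split(","))
    return parsed


def nyttig_feats(form: str, context: str = None) -> str:
    """Return the features for a form of 'nyttig', following the table."""
    if form == "nyttig":
        return "Definite=Ind|Gender=Com|Number=Sing"
    if form == "nyttigt":
        return "Definite=Ind|Gender=Neut|Number=Sing"
    if form == "nyttiga":
        # The -a form is the only one annotated contextually.
        if context == "definite":
            return "Definite=Def"
        if context == "plural":
            return "Definite=Ind|Number=Plur"
        raise ValueError("nyttiga needs a context: 'definite' or 'plural'")
    raise ValueError(f"unexpected form: {form!r}")


print(parse_feats(nyttig_feats("nyttiga", "plural")))
print(parse_feats("Number=Sing,Plur"))
```

The last call shows the one kind of disjunction the format does allow: several values for a single feature, with no way of tying them to the values of another feature.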

Att vara X år gammal or to be X years old

[go to the discussion on GitHub]

The expression to be X years old/att vara X år gammal used to be treated inconsistently across English and Swedish treebanks. As of UD 2.16, annotation has been standardized to

token       UPOS  deprel
att/to      PART  mark
vara/be     AUX   cop
X           NUM   nummod
år/years    NOUN  obl
gammal/old  ADJ   root

Most importantly, gammal/old is the head and år/years is assigned the deprel obl. Some English treebanks specify the subtype obl:unmarked (adpositionless oblique).

If you speak any other languages where a similar construction is used, check how it is annotated!
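
One possible way to run such a check is to scan a CoNLL-U file for adjectives that govern an obl noun which in turn has a nummod dependent. The Python sketch below uses only the standard library. The function names and the sample sentence (att vara tio år gammal) are made up for illustration, the head indices in the sample are our assumption based on the analysis above, and the reader assumes every HEAD column is filled in.

```python
def read_sentences(text):
    """Yield sentences as lists of token dicts from CoNLL-U text."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword token ranges and empty nodes
        sentence.append({"id": int(cols[0]), "form": cols[1], "upos": cols[3],
                         "head": int(cols[6]), "deprel": cols[7]})
    if sentence:
        yield sentence


def find_measure_adjectives(text):
    """Return (noun, adjective) pairs matching the 'X år gammal' pattern."""
    hits = []
    for sent in read_sentences(text):
        by_id = {tok["id"]: tok for tok in sent}
        counted = {tok["head"] for tok in sent if tok["deprel"] == "nummod"}
        for tok in sent:
            head = by_id.get(tok["head"])
            is_obl = tok["deprel"].split(":")[0] == "obl"  # also obl:unmarked
            if (tok["upos"] == "NOUN" and is_obl and tok["id"] in counted
                    and head is not None and head["upos"] == "ADJ"):
                hits.append((tok["form"], head["form"]))
    return hits


# Hypothetical sample; columns are joined with tabs as CoNLL-U requires.
SAMPLE = "\n".join("\t".join(row) for row in [
    ("1", "att", "_", "PART", "_", "_", "5", "mark", "_", "_"),
    ("2", "vara", "_", "AUX", "_", "_", "5", "cop", "_", "_"),
    ("3", "tio", "_", "NUM", "_", "_", "4", "nummod", "_", "_"),
    ("4", "år", "_", "NOUN", "_", "_", "5", "obl", "_", "_"),
    ("5", "gammal", "_", "ADJ", "_", "_", "0", "root", "_", "_"),
])

print(find_measure_adjectives(SAMPLE))  # [('år', 'gammal')]
```

Treebanks that still use a different analysis will simply return no hits for sentences of this kind, which is itself a hint that they are worth a closer look.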