1. Background and motivation
Access to multi-layered lexical, grammatical and semantic information representing text content is a prerequisite for lexicological and linguistic research, as well as for many language technology (LT) applications. Information about the types of lexical frames of the words of a language, and about the frame elements of each frame type, described in terms of their semantic roles (semantic valence) and their syntactic manifestations (syntactic valence), is arguably a necessary component of a full-fledged modern computational lexical resource. The earliest and best-known such resource is, without doubt, FrameNet, built by the International Computer Science Institute in Berkeley (henceforth BFN) (Ruppenhofer et al., 2006; Fillmore, 2008). Dictionary compilation, text understanding and natural language generation are some of the applications that can benefit from the information provided by a framenet.
Swedish FrameNet is a free, full-scale, multi-functional resource covering the morphological and semantic description of almost 40,000 lexical units, exemplified in more than 8,500 sentences, with information accessible to both human users and LT systems. To make the work on this project cost- and time-effective, we reuse freely available digital resources and software. A novel feature of the project is that Swedish FrameNet will be an integral part of a larger, many-faceted lexical resource, hence the name Swedish FrameNet++ (SweFN++). In addition to information on modern Swedish, this larger resource will encompass lexical data on 19th century Swedish and Old Swedish.
The theoretical approach underlying SweFN is based on frame semantics, put forward by Charles J. Fillmore. BFN, the original English version of FrameNet, is documented in FrameNet II: Extended Theory and Practice (Ruppenhofer et al., 2010; available at http://framenet.icsi.berkeley.edu). It provides valuable guidelines for the construction of SweFN, as BFN contains more than 13,000 lexical units and over 1,200 frames, related hierarchically and otherwise, exemplified in more than 200,000 sentences. A lexical unit (LU) is here a pairing of a simplex word or multiword expression with its meaning. Each sense of a polysemous word or multiword expression should evoke a different semantic frame. A frame may be described as a script-like conceptual structure describing a particular type of situation, object or event along with its typical participants and props. The participants of a frame are described in terms of semantic roles, here represented as frame elements.
In the work on SweFN frames, we follow the BFN frame specifications concerning: (i) the name of the frame, (ii) its definition, which points out the semantic relations between the frame elements, and (iii) the specification of the frame elements, including their definitions. We also take advantage of the meta-information provided on the types of semantic relations between frames. The SweFN project focuses on creating frames describing verbs, nouns and adjectives and on finding lexical units which evoke these frames.
As Swedish is a compounding language, special attention is paid to semantic relations within solid (one word) compounds evoking a frame. An implicit semantic relation existing between the co-components of a compound is made explicit by indicating the type of semantic relation established between the compound constituents. This information is of particular relevance for understanding the meaning of compositional compounds and for capturing the range of semantic and syntactic alternations which are brought about in texts in the process of compounding.
The work on SweFN is in progress, which means that information and contents are continuously subject to refinement and modification. SweFN is available for inspection in Karp. The contents are updated daily and are released under an open content licence.
2. About SweFN on the SweFN++ website
In the left-side menu on the SweFN++ website there are several links providing access to:
- Search SweFN++ // Sök i SweFN++ displays a current version of SweFN in Karp's editing tool.
- Documentation // Dokumentation links to the present document.
- Previous workshops // Tidigare workshopar lists SweFN++-related workshops.
- Publications // Publikationer lists related publications on SweFN by authors from the Department of Swedish.
3. Overview of the content provided in the development version of SweFN
3.1 Content template
As mentioned above, SweFN focuses on elaborating the semantic description of frames and populating the frames with the Swedish lexical units that evoke them. The approach used has been that of extension, as most of the meta-data on the corresponding BFN frames has been re-used in creating the Swedish frames. To make this connection clear, we keep the English names for frames and frame elements in SweFN. This makes linking to the corresponding English frames straightforward and provides easy access to the original definitions of frames and frame elements in BFN.
The Swedish frames are presented in Karp with the following content (fields are left out if empty):
ID (the frame name)
The name of the frame is identical to the corresponding one in BFN when no modification has been made to the frame description, the frame elements or the frame-to-frame relations. See below for a list of, and information on, new or modified frames.
Domain // Domän
This field, unique to SweFN, provides information about the domain. Including domain information makes it possible to create sub-framenets for special vocabularies, e.g. art and medicine, in contrast to the general language domain.
Semantic type // Semantisk typ
Ontological classification from the SIMPLE ontology. Field unique to SweFN.
Core elements // Kärnelement
A list of the frame's core elements, whose names are identical to the names of the Core and Core-Unexpressed frame elements in the equivalent BFN frames.
The core frame elements, together with the frame description, define the frame. The frame elements are specific to each frame. The frame Communication has the following core elements: Communicator, Medium, Message, Topic.
Peripheral elements // Periferielement
A list of the frame's peripheral elements, whose names are identical to the names of the non-core frame elements in the equivalent BFN frames. Peripheral (non-core) frame elements are often not specific to a particular frame.
The frame Communication has the following peripheral elements: Addressee, Amount_of_information, Depictive, Duration, Frequency, Manner, Means, Place, Purpose, Time.
If the set of FEs is not identical to the set of FEs in the corresponding BFN frame, the frame name is modified in SweFN.
Inheritance // Arv
The name of the frame from which it inherits.
Example sentences // Exempelmeningar
Semantically annotated example sentences taken from corpus texts.
In the example field, the parts of an example sentence, clause or phrase which match semantic roles are marked with the name of the corresponding frame element. The range of the expression is indicated by square brackets.
Compound patterns // Sammansättningsmönster
List of instantiated compound patterns. This list is defined by the type of frame element preceding the head of a compound. Each compound pattern is followed by relevant examples.
Lexical units // Lexikala enheter
Lexical units evoking the frame, each corresponding to an entry in SALDO. Through the equivalent SALDO entries, information on semantic associative relations and morphology may be accessed.
Suggestions for LUs // LU-förslag
List of Swedish lexical units that evoke the frame but currently have no entry in SALDO.
Berkeley-LUs
Lexical units in the corresponding BFN frame.
Comment
This field is reserved for comments. New or modified frames are usually provided with explanations.
Created // Skapad
The date the frame was created in SweFN.
Modified // Ändrad
The date of the latest modification of the frame.
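The field list above can be pictured as a simple record type. The following Python sketch is only an illustration: the class name `FrameEntry`, the attribute names and the helper `non_empty_fields` are assumptions made here and do not reflect Karp's actual data model. It shows how empty fields might be left out on display, using only the values stated above for the frame Communication.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import Dict, List, Optional


@dataclass
class FrameEntry:
    """Hypothetical record mirroring the fields of the content template."""
    frame_id: str                                   # ID (the frame name)
    domain: Optional[str] = None                    # Domain // Domän
    semantic_type: Optional[str] = None             # Semantic type // Semantisk typ
    core_elements: List[str] = field(default_factory=list)
    peripheral_elements: List[str] = field(default_factory=list)
    inheritance: Optional[str] = None               # Inheritance // Arv
    example_sentences: List[str] = field(default_factory=list)
    compound_patterns: List[str] = field(default_factory=list)
    lexical_units: List[str] = field(default_factory=list)
    lu_suggestions: List[str] = field(default_factory=list)
    berkeley_lus: List[str] = field(default_factory=list)
    comment: Optional[str] = None
    created: Optional[date] = None
    modified: Optional[date] = None

    def non_empty_fields(self) -> Dict[str, object]:
        """Return only the fields that have content, in template order."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name)  # None, "" and [] count as empty
        }


communication = FrameEntry(
    frame_id="Communication",
    core_elements=["Communicator", "Medium", "Message", "Topic"],
    peripheral_elements=[
        "Addressee", "Amount_of_information", "Depictive", "Duration",
        "Frequency", "Manner", "Means", "Place", "Purpose", "Time",
    ],
)

for name, value in communication.non_empty_fields().items():
    print(f"{name}: {value}")
```

Fields that were never filled in, such as the domain or the comment in this toy entry, simply do not appear in the output.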
3.2 New and modified frames in SweFN
The aim is that Swedish FrameNet should be a resource which is suitable for Swedish language technology applications. We therefore reserve the right to modify frames in relation to BFN and to add new frames in cases where corresponding frames have not yet been developed in BFN. In these cases the corresponding frames in SweFN and BFN do not bear identical frame names.
3.2.1 Modified frames
A number of frames in SweFN have been modified in relation to the original frames in BFN. There are several motivations for this. Frames are made: (i) more homogeneous semantically, for example by splitting Medical_conditions into Health_status and Medical_disorders, and Cure into Cure_mod and Medical_treatment; (ii) more specific as to their content, e.g. the three subvariants of Change_position_on_a_scale: Change_position_on_a_scale_increase, Change_position_on_a_scale_decrease and Change_position_on_a_scale_fluctuation; (iii) less specific as to their content, e.g. the frame Jury_deliberation was renamed Deliberation. In other cases, (iv) the name of a frame element within a frame was changed, or (v) a frame element was added to a frame.
The following SweFN frames have been created by splitting BFN frames into several more specific frames:
Original BFN frame | Short definition | SweFN frames | Manner of modification |
---|---|---|---|
Cause_change_position_on_a_scale | An Agent or a Cause affects the position of an Item on some scale. | Cause_change_position_on_a_scale_increase, Cause_change_position_on_a_scale_decrease, Cause_change_position_on_a_scale_fluctuation | Position increases, decreases, or fluctuates. |
Change_position_on_a_scale | Change of an Item's position on a scale. | Change_position_on_a_scale_increase, Change_position_on_a_scale_decrease, Change_position_on_a_scale_fluctuation | Position increases, decreases, or fluctuates. |
People_by_morality | Persons whose acts or behavior are evaluated against standards of morality or rightness. | People_by_morality_positive, People_by_morality_negative | Evaluated positively or negatively. |
Stimulus_focus | Stimulus brings about an emotion or experience in the Experiencer. | Stimulus_focus_positive, Stimulus_focus_negative | Evaluated positively or negatively. |
Cause_expansion | An Agent or Cause causes an Item to change its physical size. | Cause_expansion_mod, Cause_contraction | The resulting size is bigger or smaller. |
Expertise | Concerns a Protagonist's Knowledge or Skill in certain domains. | Expertise_positive, Expertise_negative | Evaluated positively or negatively. |
Expansion | An Item changes its physical size. | Expansion_mod, Contraction | The resulting size is bigger or smaller. |
Active_substance | A Substance causes an Effect. | Active_substance_medical, Active_substance_mod | Medical substances or other substances. |
Cure | Healer treating and curing an Affliction of the Patient. | Cure_mod, Medical_treatment | Treatment resulting in cure or only treatment. |
Noise_makers | Artifact used to produce sound, especially for musical effect. | Musical_instruments, Sound_makers | Artifact for producing music or other sound. |
Medical_conditions | Medical conditions or Ailments that a Patient suffers from. | Medical_disorders, Health_status | Medical disorders or other health description. |
Some SweFN frames have been given additional frame elements or have been modified in other ways compared to corresponding frames in BFN. These frames are listed in the table below.
Original BFN frame | Short definition | SweFN frame | Modification in SweFN |
---|---|---|---|
Non-commutative_statement | Non-commutative statement of arithmetic, e.g. subtraction and division. | Non-commutative_statement_mod | The FE Terms has been added to the existing FEs Term_1 and Term_2. |
Jury_deliberation | The Jury/Deliberating_group discusses the Case in order to evaluate the Possible_sentence of the accused. | Deliberation | The frame is renamed to reflect a more abstract level. The FE Jury is renamed Deliberating_group. |
3.2.2 New frames
A number of frames have been created to cover concepts that were not yet covered in BFN (Release 1.5). These new frames are listed below. Occasionally, new frames have been constructed in BFN for which a corresponding newly created frame already existed in SweFN. If the differences were minor, or if the BFN frame was more elaborate, the corresponding SweFN frame has been adjusted. In a few cases (e.g. Animals) the SweFN frame is more elaborate and has been left as is.
- Activity_in_progress
- Administration_of_medication_conveyance
- Administration_of_medication_specification
- Animals
- Artifact_sport_and_leisure
- Artifact_tool
- Car_brands
- Compensating
- Countries
- Entity_specific_change_of_state
- Entity_specific_modes_of_being
- Establish_a_basis_for
- Family_name
- Foreign_influence
- Furniture
- Geographical_area
- Given_name
- Grammatical_relations
- Health_status
- Human_settlements
- Inner_parts_of_body
- Languages
- Locating_in_time
- Medical_disorders
- Natural_features_named
- Overcoming_misunderstandings
- Physiological_processes
- Physiological_systems
- Plants
- Plant_subpart
- Providing_food
- Religious_belief_systems
- Social_care_scenario
- Use_as_a_starting_point
4. Encoding conventions for annotation
The encoding conventions described below are meant to provide technical guidance for the annotation process carried out with the editing tool Karp. They may also be useful for interpreting annotations in the web version of SweFN. As some of the encoding conventions were introduced at a later stage of the project, older annotations may in some cases still be present.
4.1 Encoding of lexical units
LU marks a target word evoking the frame. It may, in principle, be any part of speech (POS).
Den 30 september 2010 försvinner [femtioöringen]LU . (Money)
Han har investerat i en [elektrisk]LU rullstol. (Electricity)
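As a rough illustration of the notation, flat (non-nested) annotations of the form `[text]LABEL` can be picked out with a regular expression. The pattern and the function name `extract_chunks` are assumptions made for this sketch and are not part of Karp. The sketch also assumes that a label consists of ASCII letters, digits and underscores and may carry an optional `:index` suffix.

```python
import re

# Assumed shape of one annotated chunk: [text]LABEL or [text]LABEL:index.
# Nested brackets are deliberately not handled by this pattern.
CHUNK = re.compile(r"\[([^\[\]]+)\]\s?([A-Za-z][A-Za-z0-9_]*)(?::(\d+))?")


def extract_chunks(annotated: str):
    """Return (text, label, index) triples; index is None when absent."""
    return [
        (m.group(1).strip(), m.group(2), m.group(3))
        for m in CHUNK.finditer(annotated)
    ]


money = "Den 30 september 2010 försvinner [femtioöringen]LU . (Money)"
electricity = "Han har investerat i en [elektrisk]LU rullstol. (Electricity)"

print(extract_chunks(money))
print(extract_chunks(electricity))
```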
Regardless of the POS of the target word, example sentences may contain more than one LU. Indexing is kept as simple as possible without causing ambiguities. If all LUs in an example sentence have the same FEs associated with them, there is no indexing.
Även [vi mammor]Attendee måste få [gå ut]LU och [svira]LU ibland ! (Social_event)
Indexing. We index LUs and FEs to mark alignments between different LUs and FEs when there are ambiguities in a sentence.
Att oktober är en månad då fritidshusområdena [tomma]LU:1 [på folk]Contents:1 och [ fyllda]LU:2 [ med värdesaker]Contents:2 vet tjuvarna om. (Fullness)
Nesting. Nested annotation does not cause indexing.
[ [Luxemburg]LU]Country och[ [Norge]LU]Country [är]COP [de säkraste länderna för bilsemestern]Locale . (Countries)
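Nested brackets cannot be read with a single flat pattern, but a small stack-based reader is enough. The sketch below is an assumption-laden illustration, not the tool's own parser: the function name `parse_nested` is invented here, and the label is assumed to follow the closing bracket directly (or after one space) and to end at the first character that is not an ASCII letter, digit or underscore.

```python
import re

# Label directly after a closing bracket, with an optional :index suffix.
LABEL = re.compile(r"\s?([A-Za-z][A-Za-z0-9_]*)(?::(\d+))?")


def parse_nested(annotated: str):
    """Return (text, label, index) for every bracket pair, inner pairs first."""
    spans = []
    stack = []  # one list of text pieces per currently open bracket
    i = 0
    while i < len(annotated):
        ch = annotated[i]
        if ch == "[":
            stack.append([])
            i += 1
        elif ch == "]" and stack:
            text = "".join(stack.pop())
            m = LABEL.match(annotated, i + 1)
            if m is None:
                raise ValueError(f"bracket closed at position {i} has no label")
            spans.append((text.strip(), m.group(1), m.group(2)))
            if stack:
                # The inner text is also part of the enclosing span.
                stack[-1].append(text)
            i = m.end()
        else:
            if stack:
                stack[-1].append(ch)
            i += 1
    return spans


countries = ("[ [Luxemburg]LU]Country och[ [Norge]LU]Country [är]COP "
             "[de säkraste länderna för bilsemestern]Locale . (Countries)")

for span in parse_nested(countries):
    print(span)
```

Each nested unit is reported twice, once per label, which matches the convention that nesting needs no index.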
Discontinuity. Constituents of discontinuous LUs are indexed.
[Tjejerna]Theme packar ihop sina saker och [går]LU:1 [snabbt]Manner [ut]LU:1 [ur restaurangen]Source. (Departing)
[Jag] Theme [går]LU:1 [aldrig] Frequency [ut]LU:1 [på kvällarna]Time. (Motion)
[Jag]Ingestor [äter]LU:1 [alltid]Frequency:1[upp]LU:1 och [dricker]LU:2 [alltid]Frequency:2 [upp]LU:2. (Ingestion)
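The role of the index numbers can be made concrete by collecting all chunks that share a label and an index. This is a minimal sketch under stated assumptions: the regular expression and the function name `group_by_index` are illustrative inventions, and only flat (non-nested) annotation is handled.

```python
import re
from collections import defaultdict

# Assumed chunk shape: [text]LABEL or [text]LABEL:index (no nesting).
CHUNK = re.compile(r"\[([^\[\]]+)\]\s?([A-Za-z][A-Za-z0-9_]*)(?::(\d+))?")


def group_by_index(annotated: str):
    """Map (label, index) to the list of text parts carrying that index."""
    groups = defaultdict(list)
    for m in CHUNK.finditer(annotated):
        text, label, index = m.group(1).strip(), m.group(2), m.group(3)
        if index is not None:  # unindexed chunks stand on their own
            groups[(label, index)].append(text)
    return dict(groups)


ingestion = ("[Jag]Ingestor [äter]LU:1 [alltid]Frequency:1[upp]LU:1 och "
             "[dricker]LU:2 [alltid]Frequency:2 [upp]LU:2. (Ingestion)")

for key, parts in group_by_index(ingestion).items():
    print(key, parts)
```

In the Ingestion example the two particle verbs come out as separate units, each with its own Frequency element, while the unindexed Ingestor is left out of the grouping.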
According to older annotation principles, cases of discontinuity within an LU were coded by using internal brackets with frame annotation for the interposed expression. There may still be examples of such annotation in SweFN.
Elliptical constructions. In sentences with elliptical constructions, the missing element is added in braces { }, as is the case in the examples from the frames Text_creation and Path_shape.
Det ser alltid trevligare ut att [skriva]LU:1 [med bläck]Manner:1 än {att [skriva]LU:2 } [med blyerts]Manner:2. (Text_creation)
Det är den kraft som bland annat förklarar varför [vinden]Theme [böjer av]LU:1 [mot öster]Direction:1 [på norra halvklotet]Place:1och [{böjer av}]LU:2 [åt vänster]Direction:2 [på södra halvklotet]Place:2. (Path_shape)
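Because material in braces is supplied by the annotator and is not part of the corpus text, it can be useful to identify which annotated chunks are elided. The sketch below covers the two placements seen in the examples above, brackets inside braces and braces inside brackets. The function name `find_elided_chunks` and both patterns are assumptions for illustration only.

```python
import re

# Assumed chunk shape: [text]LABEL or [text]LABEL:index (no nesting).
CHUNK = re.compile(r"\[([^\[\]]+)\]\s?([A-Za-z][A-Za-z0-9_]*)(?::(\d+))?")
BRACES = re.compile(r"\{[^{}]*\}")


def find_elided_chunks(annotated: str):
    """Return (text, label, index) for chunks that were added in braces."""
    brace_spans = [m.span() for m in BRACES.finditer(annotated)]
    elided = []
    for m in CHUNK.finditer(annotated):
        raw = m.group(1)
        # Case 1: the whole chunk sits inside a brace pair, e.g. {att [skriva]LU:2 }
        inside = any(start <= m.start() < end for start, end in brace_spans)
        # Case 2: the braces sit inside the brackets, e.g. [{böjer av}]LU:2
        if inside or "{" in raw:
            elided.append((raw.strip("{} "), m.group(2), m.group(3)))
    return elided


text_creation = ("Det ser alltid trevligare ut att [skriva]LU:1 [med bläck]Manner:1 "
                 "än {att [skriva]LU:2 } [med blyerts]Manner:2. (Text_creation)")
path_shape = ("[vinden]Theme [böjer av]LU:1 [mot öster]Direction:1 "
              "[på norra halvklotet]Place:1och [{böjer av}]LU:2 "
              "[åt vänster]Direction:2 [på södra halvklotet]Place:2. (Path_shape)")

print(find_elided_chunks(text_creation))
print(find_elided_chunks(path_shape))
```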
4.1.1 Encoding of nouns as targets
When encoding nouns as targets, articles and prepositions are left outside the LU brackets.
Ett [alternativ]LU är [att flagga ut]Event . (Alternatives)
En kvinna kastar myntet i [hans]Wearer [hatt]LU utan att stanna. (Accoutrements)
4.1.2 Encoding of verbs as targets
Auxiliary verb forms such as ha 'have', skola 'shall', komma 'come' and vara 'be', as well as the infinitive marker att 'to', are not marked as part of the LU.
Allt går bra och [såret]Affliction [läker]LU [fint]Manner . (Recovery)
[Han]Patient har [återhämtat sig]LU [efter en depression]Affliction . (Recovery)
4.2 Encoding of frame elements
Words or phrases representing frame elements (FEs) are annotated with the name of the FE in question. If the FE is a prepositional phrase, the preposition is included in the FE.
In cases where the LU and a frame element are represented by the same unit, this is annotated as both LU and the frame element in question.
Jag ska bli bäst på att skjuta med[[pistol]LU]Weapon! (Weapon)
4.2.1 Encoding of prepositional phrases
For the encoding of prepositional phrases in coordinated conjunctive expressions, the range of a coordinated construction is marked by additional square brackets and the FE code. It is assumed that the preposition used in the first expression applies also to the other expressions.
[Hon]Wearer [är]COP [klädd] LU [i flipflops, shorts och linne]Clothing . (Wearing)
It should be observed that in some frames a preposition can be an LU, for example in the frame Wearing.
Men [mannen]Wearer [i]LU [kepsen]Clothing framför det 160 år gamla rådhuset av trä tog sig snabbt. (Wearing)
4.2.2 Encoding of syntactically complex frame elements
Frame elements whose syntactic manifestations are discontinuous, elliptic or coordinated require more intricate encoding. Here are some conventions used for annotation of such constructions.
Discontinuity. Cases of discontinuity of FE are marked with an index number. Chunks with the same index number are parts of the same instance of an FE.
[Mikron]Place:1 är som gjord för att [smälta]LU [choklad]Patient [i]Place:1. (Cause_change_of_phase)
Coordinated FEs. Coordinated constructions are annotated as one FE without index numbers.
[Hanna]Child [kom till världen]LU [med ett allvarligt hjärtfel, avbruten aortabåge och med hål i kammarskiljeväggen]Depictive. (Being_born)
Non-coordinated FEs. There may be several instances of the same FE type, which are not constituents of a common phrase. These are annotated separately without index number.
Antalet vikariat kan minska nu , i och med att [redaktionerna]Employer kan [lasa in]LU [personer som är äldre]Employee , [de som har längre erfarenhet och de som redan passerat barnaåren]Employee. (Employing)
According to older annotation principles, double encoding was used: the coordinated construction as a whole was enclosed in square brackets, and each of the coordinated constituents within it was annotated separately. There may still be examples of this annotation.
4.3 Other encoding
Indexing is kept as sparse as possible. If there is no risk of misinterpretation, no indexing is done. If there are several LUs in one example sentence and they are not discontinuous, they are not indexed unless there are FEs whose scope does not extend over all LUs. If there are several LUs and non-indexed FEs, the FEs should have scope over all LUs.
Vid vår sida står [två poliser som]Agent tar foton, märker, [för protokoll]LU:1 och [registrerar]LU:2 [kläderna]Entity:2. (Recording)
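The scope rule can be sketched as a small alignment function: an FE without an index applies to every LU, while an indexed FE applies only to the LU with the same index. The function name `fe_scope` and the regular expression are assumptions made for this illustration. The sketch also assumes flat annotation, treats COP and SUPP as non-FE tags, and lets an indexed FE apply to an unindexed LU (as with discontinuous FEs).

```python
import re

# Assumed chunk shape: [text]LABEL or [text]LABEL:index (no nesting).
CHUNK = re.compile(r"\[([^\[\]]+)\]\s?([A-Za-z][A-Za-z0-9_]*)(?::(\d+))?")
NON_FE_TAGS = {"LU", "COP", "SUPP"}


def fe_scope(annotated: str):
    """Map each LU (joined text) to the FE labels that have scope over it.

    Note: two different LUs with identical text would share one key here.
    """
    lus = []       # (index, [text parts]) in order of first appearance
    indexed = {}   # index -> text parts of an indexed LU
    fes = []       # (label, index)
    for m in CHUNK.finditer(annotated):
        text, label, index = m.group(1).strip(), m.group(2), m.group(3)
        if label == "LU":
            if index is not None and index in indexed:
                indexed[index].append(text)  # further part of a discontinuous LU
            else:
                parts = [text]
                lus.append((index, parts))
                if index is not None:
                    indexed[index] = parts
        elif label not in NON_FE_TAGS:
            fes.append((label, index))
    scope = {}
    for lu_index, parts in lus:
        labels = [
            label for label, fe_index in fes
            if fe_index is None or lu_index is None or fe_index == lu_index
        ]
        scope[" ".join(parts)] = list(dict.fromkeys(labels))  # drop repeats
    return scope


recording = ("Vid vår sida står [två poliser som]Agent tar foton, märker, "
             "[för protokoll]LU:1 och [registrerar]LU:2 [kläderna]Entity:2. (Recording)")

print(fe_scope(recording))
```

For the Recording example, the unindexed Agent is attached to both LUs, whereas Entity is attached only to the second one.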
4.3.1 Encoding of compounds
Compounds which are lexical units in the frame in question are annotated as a whole. If any constituent is a frame element or a lexical unit in the frame, it is also annotated. In cases where the compound as a whole is not an LU in the frame, one or both of the constituents can still be annotated as a frame element or LU. Below are examples of the different cases.
Non-compositional, lexicalized compounds
Non-compositional, lexicalized compounds are not given any internal analysis. An example from the Medical_disorders frame is helveteseld 'herpes zoster' lit: hellfire.
Min sambo lider av en tillfällig sjukdom som kallas [helveteseld]LU . (Medical_disorders)
Compositional compounds
Compositional compounds are annotated as far as is compatible with the frame they evoke. Most of the compositional compounds have the pattern FE+LU, that is the modifier is an FE and the compound head an LU in the frame in question. Usually the whole compound and the compound head are LUs evoking the same frame. In these cases the whole compound is tagged as LU as well as the compound head, while the modifier is tagged as the appropriate FE.
magsjukdom 'stomach disease'
[Phaedra]Patient lider av en [sällsynt]Descriptor[[mag]Body_Part[sjukdom]LU]Ailment. (Medical_disorders)
Partially transparent compounds
Compounds which are only partially transparent are given internal analysis for the constituents which are transparent. We have an example in the Medical_disorders frame: ryggskott 'lumbago', where the first constituent, rygg, is an FE 'back' and the syntactic compound head is opaque.
[Han]Patient fick[[rygg]Body_Partskott]LU [under uppvärmningen]Time och tvingades vila. (Medical_disorders)
In some compounds the semantic head is not the same as the syntactic head. An example is hästkrake 'horse wretch' in the Animals frame. The first constituent, häst 'horse', evokes the frame, as does the whole compound.
Han fick syn på en [gammal]Age [vit]Persistent_characteristics[[häst]LUkrake]LU som stod bunden vid sidan av vägen. (Animals)
Compound modifier as LU
Annotation of sentences is done with regard to the frame in question. This entails that a lexical unit may be annotated as LU even when it is not prominent to the meaning of the whole sentence. For example, with regard to the Substance frame the modifier gas 'gas' of the compound gasdetektor 'gas detector' is an LU in the sentence below.
En vätesensor är en [gas]LUdetektor som visar närvaron av väte. (Substance)
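Inside compounds a label can be followed directly by the rest of the word, as in [rygg]Body_Partskott, so the end of the label cannot be found from the characters alone. One possible way to read such annotation, sketched below, is to match against a known inventory of labels for the frame and prefer the longest match. The function name `parse_compound_annotation` and the idea of passing the inventory as a parameter are assumptions made here, the label sets are taken only from the examples above, and index suffixes are ignored for brevity.

```python
def parse_compound_annotation(annotated: str, known_labels):
    """Return (text, label) pairs, resolving labels against known_labels."""
    labels = sorted(known_labels, key=len, reverse=True)  # longest label first
    spans = []
    stack = []  # one list of text pieces per currently open bracket
    i = 0
    while i < len(annotated):
        ch = annotated[i]
        if ch == "[":
            stack.append([])
            i += 1
        elif ch == "]" and stack:
            text = "".join(stack.pop())
            i += 1
            label = next((l for l in labels if annotated.startswith(l, i)), None)
            if label is None:
                raise ValueError(f"no known label at position {i}")
            i += len(label)
            spans.append((text, label))
            if stack:
                # The constituent is also part of the enclosing compound.
                stack[-1].append(text)
        else:
            if stack:
                stack[-1].append(ch)
            i += 1
    return spans


lumbago = ("[Han]Patient fick[[rygg]Body_Partskott]LU "
           "[under uppvärmningen]Time och tvingades vila.")
detector = "En vätesensor är en [gas]LUdetektor som visar närvaron av väte."

print(parse_compound_annotation(lumbago, {"LU", "Body_Part", "Patient", "Time"}))
print(parse_compound_annotation(detector, {"LU"}))
```

In the first example the opaque head skott ends up inside the compound-level LU ryggskott, while in the second only the modifier gas is reported, since the compound as a whole is not annotated.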
4.3.2 Encoding of copula verbs, support words, etc.
COP is used for annotating copula verbs such as vara 'be' or bli 'become'.
[Halsband]LU [är]COP ett [smycke]LU [att bäras runt halsen]Use. (Accoutrements)
With the SUPP tag we mark support words such as verbs and prepositions. An example of a support verb in a collocation where the noun is the semantically dominant element is ta 'take' in ta beslut 'take a decision' (make a decision).
Men [han]Avenger skulle [ta]SUPP [hämnd]LU. (Revenge)
An example of a support preposition is i 'in' in the frame Evaluative_comparison.
[Det totala bidraget av växthusgaser från äggproduktion]Profiled_item [är]COP [i]SUPP [nivå]LU [med kycklingproduktion]Standard_item. (Evaluative_comparison)
4.3.3 Domain encoding
We assign a domain tag to each frame: medicine (MED), art (ART) or general (GEN). In the case of mixed LU lists, the dominant domain is given first. Thus ART/GEN implies that most of the LUs belong to the art domain, but that words from the general domain also occur.
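A domain tag of this kind is straightforward to interpret programmatically. The following sketch assumes that the three tags named above are the complete inventory and that mixed tags are separated by a slash. The function name `parse_domain_tag` is hypothetical.

```python
KNOWN_DOMAINS = {"MED", "ART", "GEN"}  # medicine, art, general


def parse_domain_tag(tag: str):
    """Split a tag such as 'ART/GEN' into (dominant domain, other domains)."""
    parts = [p.strip().upper() for p in tag.split("/") if p.strip()]
    if not parts:
        raise ValueError("empty domain tag")
    unknown = [p for p in parts if p not in KNOWN_DOMAINS]
    if unknown:
        raise ValueError(f"unknown domain(s): {unknown}")
    return parts[0], parts[1:]


dominant, others = parse_domain_tag("ART/GEN")
print(dominant, others)
```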