Access to multi-layered lexical, grammatical and semantic information representing text content is a prerequisite for lexicological and linguistic research, as well as for many LT applications. Information about the types of lexical frames of the words of a language, and about the frame elements of each frame type, described in terms of their semantic roles (semantic valence) and their syntactic manifestations (syntactic valence), is arguably a necessary component of a full-fledged modern computational lexical resource. The earliest and best-known such resource is, without doubt, the Berkeley FrameNet (Ruppenhofer et al., 2006; Fillmore, 2008). Dictionary compilation, text understanding and natural language generation by computers are some of the applications that can benefit from the information provided by a framenet.
Currently, FrameNet-like resources exist for a few languages, including some domain-specific and multilingual initiatives (Boas, 2009; Uematsu et al., 2009; Venturi et al., 2009), but they are still unavailable for most languages, including Swedish, apart from some pilot studies exploring the semi-automatic acquisition of Swedish frames (Johansson & Nugues, 2006; Borin et al., 2007). At the University of Gothenburg, we are now embarking on a project to build a Swedish FrameNet-like resource. It is intended to be a free, full-scale, multi-functional resource covering the morphological, syntactic and semantic description of 50,000 lexical units, with information accessible to both human users and LT systems. To make the work on this project cost- and time-effective, we intend to reuse freely available digital resources and software. A novel feature of this project is that the Swedish FrameNet will be an integral part of a larger, many-faceted lexical resource, hence the name Swedish FrameNet++ (SweFN++). Besides information on modern Swedish, this larger resource will encompass lexical data on 19th-century Swedish and, eventually, on Old Swedish. For information on the ongoing work on the Old Swedish strand, see Borin & Forsberg (2009a).
The theoretical approach underlying SweFN is based on frame semantics, the brainchild of Charles J. Fillmore. The English FrameNet (henceforth FN), elaborated by the Berkeley research group and documented in FrameNet II: Extended Theory and Practice (Ruppenhofer et al., 2010; available at http://framenet.icsi.berkeley.edu), provides valuable guidelines for the construction of SweFN. It contains more than 10,000 lexical units and nearly 800 hierarchically related frames, exemplified in more than 135,000 sentences. According to FN, a lexical unit (LU) is a pairing of a simplex word or multiword expression with its meaning. Each sense of a polysemous word or multiword expression evokes a different semantic frame, a script-like conceptual structure which describes a particular type of situation, object or event along with its typical participants and props. The participants of a frame are described in terms of semantic roles, called frame elements.
In our work on frames, we follow the Berkeley frame specifications concerning: (i) the name of the frame, (ii) its definition, which points out the semantic relations between the core elements, and (iii) the specification of frame elements, including their definitions. We also take advantage of the meta-information provided on the types of semantic relations between the frames. In the initial phase of the SweFN project, we focus on the semantic description of frames for verbs, nouns and adjectives and on populating these frames with lexical units. A syntactic description will be added at a later stage of the project.
As Swedish is a compounding language, special attention is paid to semantic relations within one-word compounds populating a frame. The implicit semantic relation between the components of a compound is made explicit by indicating the type of relation that holds between the head of the compound, which evokes the frame, and its co-component, which represents one of the frame elements. This information is of particular relevance for understanding the meaning of a compound and for capturing the range of semantic and syntactic alternations which are brought about in texts in the course of compounding.
The work on SweFN is in progress, which means that the information is subject to further refinement and modification. The lexical data is available for inspection at: http://spraakbanken.gu.se/swe/forskning/swefn/utvecklingsversion. The information provided there is updated four times a day. It is released under an open content licence.
In the menu part on the website for SweFN++, there are several items that provide access to the information on frames in SweFN.
As already hinted at, the very first phase of the project is focused on elaborating the semantic description of frames and populating the frames with Swedish lexical units. The approach used so far has been that of extension, as most of the meta-data on the English frames have been re-used in the process of creating Swedish frames. To make this meta-connection clear, we use the English names for frames and frame elements in SweFN. This makes linking to the corresponding English frames more straightforward and provides direct access to the original definitions of frames and frame elements in FN.
The Swedish frames are presented on the website in tables with the following content fields:
Since we intend to create a resource which suits Swedish LT applications, we reserve the right to modify the repository of English frames and to add new frames to it. New frames are created: (i) to make the original frames more semantically homogeneous, as in the splitting of Medical_conditions into Health_status and Diseases and the further splitting of Cure into Cure_mod and Medical_treatment; (ii) to make them more specific as to their content, e.g. Change_position_on_scale with subvariants (...Increase …/Decrease…/Fluctuation); (iii) to make them less specific as to their content, e.g. changing Jury_deliberation to Deliberation (the frames Dimension and Position_on_scale are being reconsidered); and (iv) to cover the 50,000 units from SALDO, for example Social_care_scenario.
The list of new or modified frames in SweFN:

The coding conventions described below are meant to provide technical guidance in the annotation process, which is performed with the help of an editing tool. They may also be useful for interpreting the colour-based annotations in the web version of SweFN.
Lists of elements. In the lists of core and peripheral frame elements, each frame element must be followed by a letter code in brackets that is unique within the frame. For example, the following frame elements are listed and letter-coded for the Communication frame:
(i) Core FEs: Communicator (C), Medium (M), Message (ME), Topic (T); (ii) Non-core FEs: Addressee (A), Amount_of_information (AI), Depictive (D), Duration (DU), Frequency (F), Manner (MA), Means (MS), Place (P), Purpose (PU), Time (TI).
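The uniqueness requirement on letter codes can be checked mechanically. The following is a minimal sketch, assuming that a list of frame elements is available as a comma-separated string in the form shown above; the function names `parse_fe_list` and `parse_frame` are hypothetical and are not part of the SweFN editing tool.

```python
import re

# Hypothetical pattern for one entry such as "Amount_of_information (AI)".
FE_ENTRY = re.compile(r"\s*([A-Za-z_]+)\s*\(([A-Z]+)\)\s*")


def parse_fe_list(text):
    """Map letter code -> FE name; reject malformed entries and reused codes."""
    codes = {}
    for entry in text.split(","):
        match = FE_ENTRY.fullmatch(entry)
        if match is None:
            raise ValueError(f"malformed frame element entry: {entry!r}")
        name, code = match.groups()
        if code in codes:
            raise ValueError(f"code {code!r} is used twice")
        codes[code] = name
    return codes


def parse_frame(core_text, non_core_text):
    """Parse both lists and check that no code is shared between them."""
    core = parse_fe_list(core_text)
    non_core = parse_fe_list(non_core_text)
    shared = sorted(core.keys() & non_core.keys())
    if shared:
        raise ValueError(f"codes used in both lists: {shared}")
    return core, non_core


core, non_core = parse_frame(
    "Communicator (C), Medium (M), Message (ME), Topic (T)",
    "Addressee (A), Amount_of_information (AI), Depictive (D), Duration (DU), "
    "Frequency (F), Manner (MA), Means (MS), Place (P), Purpose (PU), Time (TI)",
)
```

With the Communication lists above, both dictionaries are built without error; an entry such as "Manner (M)" next to "Medium (M)" would raise a `ValueError`.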
Examples. In the example field, the parts of an example sentence, clause or phrase which match semantic roles are marked with respective letter codes. The letter code precedes the word or expression and the semantic range of the expression is indicated by square brackets.
[C EG-domstolen] [LU meddelade] [ME sin dom i Laval-målet].
[MS Språket] [LU säger] [AI mycket] [T om människan]!
LU marks a target word evoking the frame.
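For simple, non-nested examples, the bracket notation can be read with a single regular expression. This is only a sketch under the assumption that each span has the form `[CODE text]` with exactly one space after the code; `parse_flat` is a hypothetical helper name.

```python
import re

# One annotated span: "[", an upper-case code (optionally with "/"),
# a space, then text that contains no further brackets, then "]".
SPAN = re.compile(r"\[([A-Z/]+) ([^\[\]]+)\]")


def parse_flat(annotated):
    """Return (code, text) pairs for every non-nested span in the example."""
    return SPAN.findall(annotated)


example = "[C EG-domstolen] [LU meddelade] [ME sin dom i Laval-målet]."
spans = parse_flat(example)

# The target word(s) evoking the frame are the spans coded LU.
targets = [text for code, text in spans if code == "LU"]
```

Here `spans` holds the three pairs for C, LU and ME, and `targets` contains only "meddelade". Nested constructions need a different approach, since this pattern deliberately excludes brackets inside a span.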
COP is used for annotating copula verbs such as 'vara' or 'bli'. SUPP is used for support verbs in collocations where the noun is the semantically dominant element, e.g. 'ta' in the collocation 'ta beslut'. Auxiliary verb forms like 'ha', 'skola', 'komma' and the infinitive marker 'att' are not marked as part of the LU: Han har [LU återhämtat sig] efter en depression.
When encoding nouns, the indefinite article is left outside the brackets of the LU.
For the encoding of prepositional phrases in coordinated conjunctive expressions, the range of a coordinated construction is marked by additional square brackets and the FE code. It is assumed that the preposition used in the first expression applies also to the other expressions.
[W Han] [COP var] [LU klädd] [C [C i jeans], [C jacka] och [C mössa]].
Där stod [W mannen] [C [C i skinnjacka], [C jeans], [C cowboyhatt] och [C cowboystövlar]] invid en knallröd Porsche.
Frame elements whose syntactic manifestations are discontinuous, elliptic or coordinated require more intricate encoding. Here are some conventions used for annotation of such constructions.
Discontinuity. Cases of discontinuity within LU are coded by using internal brackets with frame annotation for the interposed expression. In cases where the interposed expression does not match any frame element, X is used to mark such an expression.
[LU gick [TI på morgonen] ut].
[LU steg [X varken] in] eller [LU {steg} ut].
Coordinated FEs. Double encoding is used: the first square bracket and its FE code (D in the example below) mark the beginning of the coordinated construction, and the last bracket marks its end, so that the outer brackets enclose a number of coordinated phrases representing the same frame element. Each of the coordinated phrases is also annotated separately.
[C Hanna] [LU kom till världen] [D [D med ett allvarligt hjärtfel], [D avbruten aortabåge] och [D med hål i kammarskiljeväggen]]. (The example is from the Birth frame; D stands for a peripheral frame element called Depictive.)
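Discontinuous and coordinated annotations nest brackets inside brackets, so reading them requires a small recursive parser rather than a flat pattern. The sketch below assumes that every opening bracket is followed by a code and one space; `parse_nested` and `surface_text` are hypothetical names, and the tree representation is one possible choice among many.

```python
def parse_nested(annotated):
    """Parse an annotated example into a list of items.

    An item is either a plain string or a (code, items) pair for a
    bracketed span, whose items may again contain nested spans.
    """
    pos = 0

    def parse_items(inside_span):
        nonlocal pos
        items, buffer = [], []
        while pos < len(annotated):
            char = annotated[pos]
            if char == "[":
                if buffer:
                    items.append("".join(buffer))
                    buffer = []
                code_end = annotated.index(" ", pos + 1)
                code = annotated[pos + 1:code_end]
                pos = code_end + 1
                items.append((code, parse_items(True)))
            elif char == "]":
                if not inside_span:
                    raise ValueError("closing bracket without opening bracket")
                pos += 1
                if buffer:
                    items.append("".join(buffer))
                return items
            else:
                buffer.append(char)
                pos += 1
        if inside_span:
            raise ValueError("opening bracket without closing bracket")
        if buffer:
            items.append("".join(buffer))
        return items

    return parse_items(False)


def surface_text(items):
    """Rebuild the plain text of an example by dropping all annotation."""
    return "".join(
        item if isinstance(item, str) else surface_text(item[1]) for item in items
    )


tree = parse_nested(
    "[C Hanna] [LU kom till världen] [D [D med ett allvarligt hjärtfel], "
    "[D avbruten aortabåge] och [D med hål i kammarskiljeväggen]]."
)
outer_code, outer_items = tree[4]
coordinated = [item for item in outer_items if not isinstance(item, str)]
```

For the Birth example, the outer D span contains three inner D spans, and `surface_text(tree)` returns the sentence without brackets or codes. The same function handles the discontinuous LU in `[LU gick [TI på morgonen] ut].`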
Elliptical constructions. In sentences with elliptical constructions, the missing element is added in braces { }.
[R Vägen] [LU slingrar sig upp] och [LU {slingrar sig} ned] [P längs sjöar och åkrar].
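Because the material in braces does not occur in the original sentence, a tool may need both the attested wording and the expanded one. A minimal sketch, assuming that braces are never nested; `attested_form` and `expanded_form` are hypothetical names.

```python
import re


def attested_form(annotated):
    """Drop the added material, together with one following space."""
    return re.sub(r"\{[^{}]*\}\s?", "", annotated)


def expanded_form(annotated):
    """Keep the added material but remove the braces around it."""
    return re.sub(r"\{([^{}]*)\}", r"\1", annotated)


example = (
    "[R Vägen] [LU slingrar sig upp] och [LU {slingrar sig} ned] "
    "[P längs sjöar och åkrar]."
)
attested = attested_form(example)
expanded = expanded_form(example)
```

In `attested`, the second LU is `[LU ned]`, as in the original text; in `expanded`, it is `[LU slingrar sig ned]`.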
Conflated frame elements. Conflated FEs are marked with double letter codes separated by a slash. In the sentence below, taken from the Cure frame, Patient and Affliction are conflated frame elements.
[T Transplantation] kan [LU ha botat] [P/A hiv-smittad].
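A conflated code can be resolved by splitting it at the slash and looking up each part. The sketch below assumes a code-to-name table for the frame; only the two codes named in the text (P and A) are included, and `resolve_code` is a hypothetical helper.

```python
# Partial, illustrative code table for the Cure frame (only the FEs named above).
CURE_CODES = {"P": "Patient", "A": "Affliction"}


def resolve_code(code, table):
    """Return the FE names behind a simple or conflated letter code."""
    return [table[part] for part in code.split("/")]


conflated = resolve_code("P/A", CURE_CODES)
simple = resolve_code("P", CURE_CODES)
```

A simple code yields a one-element list, so callers can treat simple and conflated codes uniformly; an unknown code raises a `KeyError`.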
End of line. Double semicolons mark the end of a line in all cases except the last line in a field.
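Under this convention, a field with several lines can be split and rebuilt as follows. This is a sketch that assumes each double semicolon is followed by a line break in the stored field, which the text does not specify; `split_field` and `join_field` are hypothetical names.

```python
def split_field(field):
    """Split a field into its lines at each double semicolon."""
    return [line.strip() for line in field.split(";;")]


def join_field(lines):
    """Join lines so that every line except the last ends in ';;'."""
    return ";;\n".join(lines)


field = (
    "[C EG-domstolen] [LU meddelade] [ME sin dom i Laval-målet].;;\n"
    "[MS Språket] [LU säger] [AI mycket] [T om människan]!"
)
lines = split_field(field)
```

Splitting the two-line field above gives two examples, and `join_field(lines)` reproduces the original field.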
Domain coding. MED medicine, ART art, GEN general. In the case of mixed LU lists, the dominant domain is given first. Thus ART/GEN implies that most of the LUs belong to the art domain, but that general words also occur there.
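The domain code can be interpreted in the same spirit. The sketch below uses only the three domain codes listed above and assumes that a mixed code lists domains in order of dominance; `parse_domain` is a hypothetical name.

```python
DOMAINS = {"MED": "medicine", "ART": "art", "GEN": "general"}


def parse_domain(code):
    """Return (dominant domain, list of other domains) for a domain code."""
    names = [DOMAINS[part] for part in code.split("/")]
    return names[0], names[1:]


dominant, others = parse_domain("ART/GEN")
```

For ART/GEN, `dominant` is "art" and `others` is the one-element list containing "general".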