Welcome to FinnTK’s documentation!

finntk

finntk.get_token_positions(tokenised, text)

Returns the start positions of a series of tokens produced by Omorfi.tokenise(…).
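
To illustrate what "start positions" means here, the following is a minimal sketch and not FinnTK's implementation. It assumes the tokens are plain strings that occur in order in the text. The real function takes the output of Omorfi.tokenise(…), whose token type is not described here, and the name token_start_positions is hypothetical.

```python
def token_start_positions(tokens, text):
    """Return the start offset of each token in text, scanning left to right.

    Assumption: tokens are plain strings appearing in order in text.
    """
    positions = []
    cursor = 0  # never search before the end of the previous token
    for token in tokens:
        start = text.find(token, cursor)
        if start == -1:
            raise ValueError(f"token {token!r} not found at or after offset {cursor}")
        positions.append(start)
        cursor = start + len(token)
    return positions


print(token_start_positions(["Hyvää", "huomenta", "!"], "Hyvää huomenta!"))
# prints [0, 6, 14]
```

Advancing the cursor past each match keeps repeated tokens from all resolving to the first occurrence.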

finntk.get_omorfi()

Gets an Omorfi instance with everything possible enabled. Reuses the existing instance if already called once.
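
The reuse behaviour described above is a common memoisation pattern. A minimal sketch, assuming a hypothetical FakeAnalyser class in place of the real Omorfi object (the names here are illustrative and are not FinnTK's code):

```python
from functools import lru_cache


class FakeAnalyser:
    """Hypothetical stand-in for an analyser that is expensive to construct."""

    instances_built = 0

    def __init__(self):
        FakeAnalyser.instances_built += 1


@lru_cache(maxsize=None)
def get_analyser():
    """Build the analyser on the first call and return the cached one afterwards."""
    return FakeAnalyser()


first = get_analyser()
second = get_analyser()
print(first is second, FakeAnalyser.instances_built)
# prints True 1
```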

finntk.analysis_to_subword_dicts(ana)

Returns a list of lists of dicts. Each element of the outer list is an analysis. Each analysis is a list of subwords, and each subword is a dict containing an Omorfi analysis.
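
The nesting can be walked as in the sketch below. The sample data is hand-written and the key "k" is a placeholder, since the actual keys of the Omorfi analysis dicts are not documented here. The helper subword_counts is hypothetical.

```python
def subword_counts(analyses):
    """Return the number of subwords in each analysis (outer list element)."""
    return [len(subwords) for subwords in analyses]


# Two analyses: the first splits the word into two subwords, the second into one.
# The dict contents are placeholders, not real Omorfi output.
example = [
    [{"k": "a"}, {"k": "b"}],
    [{"k": "c"}],
]

print(subword_counts(example))
# prints [2, 1]
```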

finntk.extract_lemmas(word_form)

Extracts lemmas specifically mentioned by OMorFi.

finntk.extract_lemmas_combs(word_form)

Works like extract_lemmas, but also tries to combine adjacent subwords to make lemmas which may be out of vocabulary for OMorFi.

Note that this will overgenerate (by design). For example, voileipäkakku will generate voi, voileipä and voileipäkakku as desired, but will also spuriously generate leipäkakku.
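
The overgeneration follows directly from combining every run of adjacent subwords. A minimal sketch, assuming the subwords are already available as plain strings and can simply be concatenated (the real function works on OMorFi analyses, and contiguous_combinations is a hypothetical name):

```python
def contiguous_combinations(subwords):
    """Return every concatenation of one or more adjacent subwords."""
    combos = []
    n = len(subwords)
    for start in range(n):
        for end in range(start + 1, n + 1):
            combos.append("".join(subwords[start:end]))
    return combos


print(contiguous_combinations(["voi", "leipä", "kakku"]))
# prints ['voi', 'voileipä', 'voileipäkakku', 'leipä', 'leipäkakku', 'kakku']
```

With n subwords this yields n(n+1)/2 candidates, so spurious forms such as leipäkakku are unavoidable unless the candidates are filtered afterwards.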

finntk.extract_lemmas_recurs(word_form)

Works like extract_lemmas, but also tries to expand each lemma into more lemmas. This helps in some cases, but can overgenerate even more. For example, synnyinkaupunkini will generate synty, kaupunki, synnyinkaupunki, synnyin and syntyä.
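
The recursive expansion can be pictured as repeatedly looking up each newly found lemma until nothing new appears. The sketch below is an assumption about the general idea, not FinnTK's code. TOY_LOOKUP is a hand-written table standing in for OMorFi, chosen only so that the result matches the example above.

```python
def expand_recursively(word, lookup, seen=None):
    """Collect lemmas reachable from word by repeated lookup."""
    if seen is None:
        seen = set()
    for lemma in lookup.get(word, ()):
        if lemma not in seen:  # skip lemmas already expanded, which also guards against cycles
            seen.add(lemma)
            expand_recursively(lemma, lookup, seen)
    return seen


# Hand-written toy data, not real OMorFi output.
TOY_LOOKUP = {
    "synnyinkaupunkini": ["synnyinkaupunki", "synnyin", "kaupunki"],
    "synnyin": ["synty", "syntyä"],
}

print(sorted(expand_recursively("synnyinkaupunkini", TOY_LOOKUP)))
# prints ['kaupunki', 'synnyin', 'synnyinkaupunki', 'synty', 'syntyä']
```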

finntk.omor.extract

Functions for extracting lemmas from OMorFi analyses.

finntk.omor.inst

Function for obtaining an OMorFi instance.

finntk.omor.tok

Functions for basic processing of OMorFi tokens.

finntk.omor.seg

Functions for basic processing of OMorFi analyses in the segment-labelling style.

finntk.wordnet

Utilities for working with FinnWordNet.
