Welcome to FinnTK’s documentation!
finntk
finntk.analysis_to_subword_dicts(ana)
Returns a list of lists of dicts. Each element of the outer list is an analysis, and each analysis is a list of subwords. Each dict contains an Omorfi analysis.
finntk.extract_lemmas_combs(word_form)
Works like extract_lemmas, but also tries to combine adjacent subwords to make lemmas which may be out of vocabulary for OMorFi.
Note that this will overgenerate (by design). For example, voileipäkakku will generate voi, voileipä and voileipäkakku as desired, but will also spuriously generate leipäkakku.
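The combination step described above can be illustrated with a minimal sketch. It is not FinnTK's implementation: the helper name adjacent_combinations is hypothetical, and it simply joins surface strings, whereas the real function works on Omorfi analyses. It shows why joining every run of adjacent subwords yields both the wanted and the spurious candidates.

```python
def adjacent_combinations(subwords):
    """Return every concatenation of one or more adjacent subwords.

    Hypothetical helper for illustration only; not part of FinnTK.
    """
    combos = []
    for start in range(len(subwords)):
        for end in range(start + 1, len(subwords) + 1):
            # Join the contiguous slice subwords[start:end] into one candidate.
            combos.append("".join(subwords[start:end]))
    return combos


# Assumed segmentation of voileipäkakku into three subwords.
combos = adjacent_combinations(["voi", "leipä", "kakku"])
```

With three subwords this produces six candidates, including the spurious leipäkakku mentioned above.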
finntk.extract_lemmas_recurs(word_form)
Works like extract_lemmas, but also tries to expand each lemma into more lemmas. This helps in some cases, but can overgenerate even more. For example, synnyinkaupunkini will generate synty, kaupunki, synnyinkaupunki, synnyin and syntyä.
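The idea of expanding lemmas until nothing new appears can be sketched as follows. This is only an illustration under stated assumptions: expand_recursively and TOY_LEMMAS are hypothetical names, and the lookup table is hand-written to reproduce the example above rather than taken from real Omorfi output.

```python
def expand_recursively(word, analyse):
    """Collect lemmas of `word`, then lemmas of those lemmas, and so on.

    `analyse` is any callable mapping a word form to a list of lemmas.
    Hypothetical helper for illustration only; not part of FinnTK.
    """
    seen = set()
    pending = [word]
    while pending:
        current = pending.pop()
        for lemma in analyse(current):
            if lemma not in seen:
                # Only queue lemmas we have not expanded yet, so cycles terminate.
                seen.add(lemma)
                pending.append(lemma)
    return seen


# Hand-written stand-in for an analyser; NOT real Omorfi output.
TOY_LEMMAS = {
    "synnyinkaupunkini": ["synnyinkaupunki", "synnyin", "kaupunki"],
    "synnyin": ["synty", "syntyä"],
}

lemmas = expand_recursively("synnyinkaupunkini", lambda w: TOY_LEMMAS.get(w, []))
```

Tracking already-seen lemmas keeps the loop finite even when a lemma analyses back to itself.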
finntk.get_omorfi()
Gets an Omorfi instance with everything possible enabled. Reuses the existing instance if already called once.
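The reuse behaviour described above is a common caching pattern. The sketch below shows one way to get it with the standard library, assuming a hypothetical ExpensiveAnalyser class standing in for the Omorfi instance; it does not claim to be how get_omorfi is written.

```python
from functools import lru_cache


class ExpensiveAnalyser:
    """Hypothetical stand-in for an object that is costly to construct."""

    instances_created = 0

    def __init__(self):
        # Count constructions so the caching effect is observable.
        ExpensiveAnalyser.instances_created += 1


@lru_cache(maxsize=None)
def get_analyser():
    """Build the analyser on the first call and return the same object afterwards."""
    return ExpensiveAnalyser()


first = get_analyser()
second = get_analyser()
```

Both calls return the same object, so the costly construction runs only once per process.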
finntk.get_token_positions(tokenised, text)
Returns the start positions of a series of tokens produced by Omorfi.tokenise(…)
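A minimal sketch of recovering start positions follows. It assumes the tokens are plain strings that occur in the text in order; the tokens Omorfi actually returns may be richer objects, and token_start_positions is a hypothetical name rather than FinnTK's code.

```python
def token_start_positions(tokens, text):
    """Return the start offset in `text` of each token, scanning left to right.

    Hypothetical helper for illustration only; not part of FinnTK.
    """
    positions = []
    cursor = 0
    for token in tokens:
        # str.index raises ValueError if the token is not found after `cursor`.
        start = text.index(token, cursor)
        positions.append(start)
        # Continue searching after this token so repeated tokens map to later offsets.
        cursor = start + len(token)
    return positions


positions = token_start_positions(["Hyvää", "huomenta", "!"], "Hyvää huomenta!")
```

Advancing the cursor past each match is what lets a repeated token resolve to its later occurrence rather than the first one.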
Functions for extracting lemmas from OMorFi analyses.
Function to obtain an OMorFi instance.
Functions for basic processing of OMorFi tokens.
Functions for basic processing of OMorFi segment-labelling-style analyses.
Utilities for working with FinnWordNet.