💥 RuERRANT
RuERRANT is an adaptation of the ERRANT metric to the Russian language. The adaptation was primarily done in https://github.com/Askinkaty/errant and further developed within SAGE. The changes to the original ERRANT implementation for English are the following:
- Basic parsing model changed to spaCy’s ru_core_news_lg.
- Included a dictionary of Russian words (main forms).
- Introduced detection of error correction types specific to Russian (degrees of adjectives, verb aspect).
- [our contribution] Introduced a simplified error correction typology:
  - CASE: spelling corrections that only change character case;
  - PUNCT: punctuation corrections;
  - YO: spelling corrections involving “е”/“ё” substitutions;
  - SPELL: all other word-level spelling corrections.
- [our contribution] Introduced detection of multiple error correction types per word, e.g. “федор” -> “Фёдор” contains both CASE and YO corrections.
- [our contribution] Introduced detection of inner-word punctuation corrections, which cover joint (“AB”) vs. hyphenated (“A-B”) vs. spaced (“A B”) word spellings. Corrections of this type are attributed to the SPELL category.
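To make the simplified typology concrete, here is a minimal sketch of how a single correction could be mapped to the four categories. It is not the RuERRANT implementation: the function name `classify_correction` and its helpers are hypothetical, it works on plain strings rather than parsed tokens, and it assumes that any difference beyond case and “е”/“ё” is simply SPELL.

```python
import unicodedata
from typing import Set


def _is_punct(text: str) -> bool:
    # True when the string is non-empty and made of Unicode punctuation only.
    return bool(text) and all(unicodedata.category(ch).startswith("P") for ch in text)


def _fold_yo(text: str) -> str:
    # Replace "ё"/"Ё" with "е"/"Е" so that YO differences disappear.
    return text.replace("ё", "е").replace("Ё", "Е")


def classify_correction(source: str, target: str) -> Set[str]:
    """Hypothetical helper: label one source -> target correction."""
    if source == target:
        return set()
    # Punctuation replaced, inserted (empty source) or deleted (empty target).
    if (not source or _is_punct(source)) and (not target or _is_punct(target)):
        return {"PUNCT"}
    if source.lower() == target.lower():
        return {"CASE"}
    if _fold_yo(source) == _fold_yo(target):
        return {"YO"}
    # Both differences at once, e.g. "федор" -> "Фёдор".
    if _fold_yo(source.lower()) == _fold_yo(target.lower()):
        return {"CASE", "YO"}
    # Everything else, including joint / hyphen / space variants.
    return {"SPELL"}


print(sorted(classify_correction("федор", "Фёдор")))  # ['CASE', 'YO']
print(sorted(classify_correction("по моему", "по-моему")))  # ['SPELL']
```

The checks run from the narrowest category to the widest, so SPELL acts as the fallback. Unlike the detection described above, this sketch never combines SPELL with another label.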
Scoring
To score a model’s corrections against gold corrections, use a Scorer instance:
from sage.evaluation.scorer import Scorer

s = Scorer()
s.score(
    ["спел Кейс ее .", "спел Кейс ее ."],    # source sentences
    ["спелл кейс её !", "спелл кейс её !"],  # gold corrections
    ["спел кейс её .", "спелл Кейс ее !"],   # model corrections
    metrics=["errant"]
)
>>> {'CASE_Precision': 100.0, 'CASE_Recall': 50.0, 'CASE_F1': 66.67,
'YO_Precision': 100.0, 'YO_Recall': 50.0, 'YO_F1': 66.67,
'SPELL_Precision': 100.0, 'SPELL_Recall': 50.0, 'SPELL_F1': 66.67,
'PUNCT_Precision': 100.0, 'PUNCT_Recall': 50.0, 'PUNCT_F1': 66.67}
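The numbers above can be checked by hand. In each category the gold corrections contain two edits (one per sentence), and the model reproduces exactly one of them without proposing any wrong edit, which gives 1 true positive, 0 false positives and 1 false negative. The sketch below recomputes the reported values from those counts. The helper `prf_percent` is hypothetical and not part of SAGE, and returning 0.0 for an empty denominator is an assumption of this sketch.

```python
from typing import Tuple


def prf_percent(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Hypothetical helper: precision, recall and F1 as rounded percentages."""
    # Assumption: an empty denominator yields 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return (
        round(100 * precision, 2),
        round(100 * recall, 2),
        round(100 * f1, 2),
    )


# Counts inferred from the example for any one category (CASE, YO, SPELL or PUNCT).
print(prf_percent(tp=1, fp=0, fn=1))  # (100.0, 50.0, 66.67)
```

The result matches the F1 keys in the output: with precision 1.0 and recall 0.5 the harmonic mean is 2 · 1.0 · 0.5 / 1.5 ≈ 0.6667.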