correctors.universal

This module contains correction rules that are used in more than one language:

- UniversalCorrector, containing rules that can be applied in all languages
- Corrector classes for groups of languages

Caution: functions in a language-specific corrector must never have the same name as one of the functions here (otherwise only one of them gets called in the case of multiple inheritance).
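The name clash described in the caution can be illustrated with a minimal sketch. The three classes below are hypothetical stand-ins, not part of this module; they only show that Python's method resolution order picks a single method when two base classes define the same name.

```python
import re


class UniversalSketch:
    """Hypothetical stand-in for a shared corrector class."""

    def correct_spaces(self, text: str) -> str:
        """Reduce multiple spaces to one space"""
        return re.sub(r" {2,}", " ", text)


class LanguageSketch:
    """Hypothetical language-specific corrector that reuses the same method name."""

    def correct_spaces(self, text: str) -> str:
        """Remove a space before an exclamation mark"""
        return text.replace(" !", "!")


class CombinedSketch(LanguageSketch, UniversalSketch):
    """Inherits from both; attribute lookup follows the MRO."""


def resolved_owner() -> str:
    # Name of the class whose correct_spaces() is found first in the MRO
    return CombinedSketch.correct_spaces.__qualname__.split(".")[0]
```

Here only `LanguageSketch.correct_spaces` is reachable through `CombinedSketch`, so the double space in `"a  b !"` would stay uncorrected. Giving each rule a unique name avoids this.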

Each function should have a documentation string which will be used for print_stats()

Module Contents

Classes

UniversalCorrector

Has language-independent correction functions

NoSpaceBeforePunctuationCorrector

This is an extra class only for !?:; punctuation marks that must not be preceded by a space.

RTLCorrector

Corrections for right-to-left languages

class correctors.universal.UniversalCorrector

Has language-independent correction functions

correct_wrong_capitalization(self, text: str) → str

Fix wrong capitalization at the beginning of a sentence or after a colon. Only do that if the text ends with a dot, to avoid correcting single words or short phrases.
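A rough sketch of such a rule, written as a standalone function. The name `capitalize_sentence_starts` and the choice of `.`, `!` and `?` as sentence-ending marks are assumptions made for illustration; the actual method may work differently.

```python
import re


def capitalize_sentence_starts(text: str) -> str:
    # Leave single words and short phrases alone: only act on text ending with a dot
    if not text.endswith("."):
        return text
    # Capitalize the very first character
    text = text[:1].upper() + text[1:]
    # Capitalize the first word character after ". ", "! ", "? " or ": "
    return re.sub(
        r"([.!?:]\s+)(\w)",
        lambda match: match.group(1) + match.group(2).upper(),
        text,
    )
```

This simplified version would also capitalize the word after an abbreviation such as "e.g. ", so a production rule would likely need extra checks.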

correct_multiple_spaces_also_in_title(self, text: str) → str

Reduce multiple spaces to one space

correct_missing_spaces(self, text: str) → str

Insert missing spaces between punctuation and characters

correct_spaces_before_comma_and_dot(self, text: str) → str

Erase redundant spaces before commas and dots
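The three spacing rules above could be approximated with regular expressions as follows. This is a sketch under assumptions: the function names are hypothetical, and the set of punctuation marks that trigger an inserted space is deliberately narrow to avoid touching numbers like "3.5".

```python
import re


def collapse_spaces(text: str) -> str:
    # Two or more consecutive spaces become one
    return re.sub(r" {2,}", " ", text)


def insert_missing_spaces(text: str) -> str:
    # Add a space when a letter directly follows , ; ! or ?
    # ([^\W\d_] matches any Unicode letter)
    return re.sub(r"([,;!?])([^\W\d_])", r"\1 \2", text)


def strip_spaces_before_comma_and_dot(text: str) -> str:
    # Remove spaces that directly precede a comma or a dot
    return re.sub(r" +([.,])", r"\1", text)
```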

correct_wrong_dash_also_in_title(self, text: str) → str

When a normal dash ( - ) is surrounded by spaces, replace it with an en dash ( – ).
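As a minimal illustration, assuming the rule only targets a hyphen with exactly one space on each side (the function name is hypothetical):

```python
EN_DASH = "\u2013"


def hyphen_to_en_dash(text: str) -> str:
    # Only a hyphen with a space on both sides is replaced;
    # hyphens inside words such as "well-known" are left alone
    return text.replace(" - ", f" {EN_DASH} ")
```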

correct_missing_final_dot(self, text: str, original: str) → str

If the original has a trailing dot, the translation also needs one at the end.
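A sketch of how this comparison against the original might look. The function name is hypothetical, and the sketch assumes the target language uses the ASCII dot as its full stop, which does not hold for every script.

```python
def add_missing_final_dot(text: str, original: str) -> str:
    original_has_dot = original.rstrip().endswith(".")
    stripped = text.rstrip()
    # Only add a dot when the original has one and the translation
    # is non-empty and does not already end with one
    if original_has_dot and stripped and not stripped.endswith("."):
        return stripped + "."
    return text
```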

correct_mediawiki_bold_italic(self, text: str) → str

Replace mediawiki formatting '''bold''' with <b>bold</b> and ''italic'' with <i>italic</i>
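One possible way to express this replacement, where bold is marked by three apostrophes and italic by two. The function name is hypothetical, and the sketch does not handle nested or combined bold-italic markup.

```python
import re


def mediawiki_to_html(text: str) -> str:
    # Handle bold (three apostrophes) first so it is not mistaken for italic
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    # Then handle italic (two apostrophes)
    return re.sub(r"''(.+?)''", r"<i>\1</i>", text)
```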

make_lowercase_extension_in_filename(self, text: str) → str

Convert the file extension to lower case

remove_spaces_in_filename(self, text: str) → str

Replace spaces in file name with single underscore

remove_multiple_underscores_in_filename(self, text: str) → str

Replace multiple consecutive underscores with single underscore in file name
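The three file name rules can be pictured as one combined helper. This is only a sketch: `normalize_filename` is a hypothetical name, and it assumes the input is a bare file name whose extension is everything after the last dot.

```python
import re


def normalize_filename(name: str) -> str:
    stem, dot, extension = name.rpartition(".")
    if not dot:
        # No dot at all: the whole name is the stem, there is no extension
        stem, extension = name, ""
    # Spaces become underscores
    stem = stem.replace(" ", "_")
    # Runs of underscores collapse to a single one
    stem = re.sub(r"_{2,}", "_", stem)
    # The extension is lowercased
    return stem + dot + extension.lower()
```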

class correctors.universal.NoSpaceBeforePunctuationCorrector

This is an extra class only for the punctuation marks !?:; which must not be preceded by a space. Removing spaces before commas and dots is already covered by UniversalCorrector.correct_spaces_before_comma_and_dot(). This class is separate because some languages, e.g. French, require non-breaking spaces before these marks (in contrast to most other languages, which have no spaces before them).

correct_no_spaces_before_punctuation(self, text: str) → str

Erase redundant spaces before punctuation marks.
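A minimal sketch of this rule for the marks ! ? : ; follows. The function name is hypothetical, and treating non-breaking spaces like ordinary spaces is an assumption.

```python
import re


def strip_spaces_before_punctuation(text: str) -> str:
    # Remove ordinary and non-breaking spaces directly before ! ? : ;
    return re.sub(r"[ \u00a0]+([!?:;])", r"\1", text)
```

A rule like this would be wrong for French, which is why it lives in a class that only some languages inherit from.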

class correctors.universal.RTLCorrector

Corrections for right-to-left languages

correct_wrong_spaces_in_rtl(self, text: str) → str

Erase redundant spaces before RTL punctuation marks
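A hedged sketch of what such a rule could look like. Which punctuation marks the real method covers is not stated here; the sketch assumes the Arabic comma (U+060C), Arabic semicolon (U+061B) and Arabic question mark (U+061F), and the function name is hypothetical.

```python
import re

# Assumed set of RTL punctuation marks: Arabic comma, semicolon, question mark
RTL_PUNCTUATION = "\u060C\u061B\u061F"


def strip_spaces_before_rtl_punctuation(text: str) -> str:
    # Remove spaces that directly precede one of the assumed RTL marks
    return re.sub(rf" +([{RTL_PUNCTUATION}])", r"\1", text)
```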

fix_rtl_title(self, text: str) → str

When the title ends with a closing parenthesis, add an RTL mark at the end

fix_rtl_filename(self, text: str) → str

When the file name has a closing parenthesis before the file ending, make sure we have an RTL mark afterwards!
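The two RTL mark rules might be sketched as below. The sketch assumes that "RTL mark" means the Unicode RIGHT-TO-LEFT MARK (U+200F) and that, in a file name, the mark goes between the closing parenthesis and the dot of the file ending. Both function names are hypothetical.

```python
import re

RLM = "\u200f"  # Unicode RIGHT-TO-LEFT MARK (assumed to be the mark in question)


def add_rlm_after_title_parenthesis(title: str) -> str:
    # A title that already ends with the mark no longer ends with ")",
    # so calling this twice does not add a second mark
    if title.endswith(")"):
        return title + RLM
    return title


def add_rlm_in_filename(name: str) -> str:
    # Insert the mark between ")" and the file ending; if the mark is
    # already there, ")" is not directly followed by "." and nothing changes
    return re.sub(r"\)(\.[A-Za-z0-9]+)$", ")" + RLM + r"\1", name)
```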