Change log
0.4.2
(2023-04-14)
Transferred ownership of the repository and moved it from Slimmer-AI to sybrenjansen
0.4.1
(2022-05-27)
Documentation issue prevented GitHub from uploading text-scrubber to PyPI
0.4.0
(2022-05-27)
Common country replacements have been updated to allow for fuzzy matching
Country codes added such that they can be matched through text_scrubber.geo.normalize_country()
The geo normalize functions now return the canonical name together with the matched name
The geo normalize functions now return a text_scrubber.geo.normalize.Location object
The geo find in string functions now return a text_scrubber.geo.find_in_string.ExtractedLocation object
0.3.2
(2022-05-19)
Updated MANIFEST.in file to include the Cython files
0.3.1
(2022-05-19)
Optimized Levenshtein and trigram similarity functions, and all normalization functions. Speed-ups of 20-40x are to be expected
0.3.0
(2022-04-13)
Renamed normalize_state to text_scrubber.geo.normalize_region(), as it now handles all kinds of regions
Expanded countries, regions, and cities with geonames database, increasing the completeness of the geo database
text_scrubber.geo.normalize_country(), text_scrubber.geo.normalize_region(), and text_scrubber.geo.normalize_city() now return the match scores as well
text_scrubber.geo.normalize_region() and text_scrubber.geo.normalize_city() also return the corresponding normalized country
Added text_scrubber.geo.find_country_in_string(), text_scrubber.geo.find_city_in_string(), and text_scrubber.geo.find_region_in_string() functions that find a location in a string
Updated cleaning pipeline of text_scrubber.geo.clean_country(), text_scrubber.geo.clean_city(), and text_scrubber.geo.clean_region()
Added case_sensitive boolean flag to text_scrubber.text_scrubber.TextScrubber.remove_stop_words()
Improved speed of trigram matching by mapping trigrams to integer indices
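The idea behind the trigram speed-up in 0.3.0 can be illustrated with a minimal sketch. This is not the library's actual code: the helper names (trigrams, to_indices, similarity), the padding scheme, and the use of Jaccard similarity are all assumptions made for illustration. The point is only that once each trigram is replaced by a small integer, comparisons operate on sets of ints rather than sets of strings.

```python
from typing import Dict, Set


def trigrams(text: str) -> Set[str]:
    """Return the character trigrams of a lowercased, space-padded string."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def to_indices(text: str, vocab: Dict[str, int]) -> Set[int]:
    """Map each trigram to an integer index, extending the vocabulary on demand."""
    return {vocab.setdefault(tri, len(vocab)) for tri in trigrams(text)}


def similarity(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity computed on integer trigram indices (an assumed metric)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical usage: index the candidates once, then match queries against them.
vocab: Dict[str, int] = {}
candidates = {
    name: to_indices(name, vocab)
    for name in ["Netherlands", "Germany", "Denmark"]
}
query = to_indices("Netherland", vocab)
best = max(candidates, key=lambda name: similarity(query, candidates[name]))
```

In this sketch the candidate names are converted to index sets once, so each query only pays for its own conversion plus integer set operations.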
0.2.1
(2022-03-02)
Information about the cities in a country is loaded on the fly.
0.2.0
(2021-05-10)
Replaced unidecode with anyascii, which has a more permissive license. The output of to_ascii may change as a result
0.1.1
(2020-09-10)
Removed Python 3.5 support
0.1.0
(2020-09-10)
First release