geodata.Normalize module

Provides functions that normalize text strings by converting them to lowercase and removing noise words.
These functions are used by the lookup functions, the database build functions, and match scoring.
noise_words is a list of replacements used only for match scoring.
phrase_cleanup is a list of replacements used for database build, lookup, and match scoring.
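
The split between the two replacement lists can be illustrated with a minimal sketch. The table contents and the helper names (PHRASE_CLEANUP, NOISE_WORDS, apply_replacements, cleanup_for_lookup, cleanup_for_scoring) are hypothetical and chosen for illustration only; the actual entries in noise_words and phrase_cleanup are not shown on this page.

```python
import re

# Hypothetical tables: the real phrase_cleanup and noise_words entries
# are not documented here, so these are illustrative stand-ins.
PHRASE_CLEANUP = [
    (r"\bsaint\b", "st"),
    (r"\bmount\b", "mt"),
]
NOISE_WORDS = [
    (r"\bcity of\b", ""),
    (r"\bcounty of\b", ""),
]


def apply_replacements(text, replacements):
    """Apply each (pattern, replacement) pair in order, then collapse spaces."""
    for pattern, repl in replacements:
        text = re.sub(pattern, repl, text)
    return " ".join(text.split())


def cleanup_for_lookup(text):
    """Phrase cleanup only: the form assumed for db build and lookup."""
    return apply_replacements(text.lower(), PHRASE_CLEANUP)


def cleanup_for_scoring(text):
    """Phrase cleanup plus noise-word removal: the form assumed for scoring."""
    return apply_replacements(cleanup_for_lookup(text), NOISE_WORDS)


print(cleanup_for_lookup("City of Saint Louis"))
print(cleanup_for_scoring("City of Saint Louis"))
```

In this sketch the lookup form keeps "city of" while the scoring form drops it, so noise words do not affect how closely two names appear to match.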

Module variables

var alias_list

var noise_words

var phrase_cleanup

Functions

def add_aliases_to_database(

geo_files)

def admin1_normalize(

admin1_name, iso)

Normalize historic or colloquial Admin1 names to current geoname standard

def admin2_normalize(

admin2_name, iso)

Normalize historic or colloquial Admin2 names to standard

Args:

admin2_name: Admin2 name to normalize
iso: ISO country code

Returns:

Tuple (result, modified): result is the new string, modified is True if the name was modified

def country_normalize(

country_name)

Normalize a local-language country name to the standardized English country name for lookups

Args:

country_name: Country name to normalize

Returns:

Tuple (result, modified): result is the new string, modified is True if the name was modified
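
The (result, modified) return convention can be sketched as follows. The mapping table COUNTRY_NAMES and the function country_normalize_sketch are hypothetical, and treating modified as "a table entry was applied" is an assumption; the module's real data and exact rule are not shown here.

```python
# Hypothetical mapping from local-language names to English names.
COUNTRY_NAMES = {
    "deutschland": "germany",
    "norge": "norway",
    "nederland": "netherlands",
}


def country_normalize_sketch(country_name):
    """Return (result, modified) for a country name.

    Assumption: modified is True only when a mapping entry was applied.
    """
    key = country_name.strip().lower()
    if key in COUNTRY_NAMES:
        return COUNTRY_NAMES[key], True
    return key, False


result, modified = country_normalize_sketch("Deutschland")
print(result, modified)
```

Returning the flag alongside the string lets a caller tell whether the name it looked up is the one the user typed or a substituted standard name.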

def normalize(

text, remove_commas)

Normalize text: convert from UTF-8 to lowercase ASCII.
Remove commas if remove_commas is set.
Remove all non-alphanumeric characters except $ and *.
Then call _phrase_normalize(), which normalizes common phrases that have multiple spellings, such as saint to st.

Args:

text:  Text to normalize   
remove_commas:   True if commas should be removed

Returns:

Normalized text
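
The steps described for normalize() can be approximated with the standard library. This is a sketch under stated assumptions, not the module's code: normalize_sketch is a hypothetical name, spaces and commas are assumed to survive the character filter, and a single saint to st substitution stands in for _phrase_normalize().

```python
import re
import unicodedata


def normalize_sketch(text, remove_commas):
    """Approximate the documented steps of normalize()."""
    # Decompose accented characters, then drop anything that is not ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    if remove_commas:
        text = text.replace(",", "")
    # Assumption: spaces and commas are kept; $ and * are kept as documented.
    text = re.sub(r"[^a-z0-9 ,$*]", "", text)
    # Stand-in for _phrase_normalize(): one example phrase replacement.
    text = re.sub(r"\bsaint\b", "st", text)
    # Collapse any runs of whitespace left behind by the removals.
    return " ".join(text.split())


print(normalize_sketch("Saint Étienne, France", True))
print(normalize_sketch("Saint Étienne, France", False))
```

With remove_commas set to False the comma separating place levels is preserved, which matters when later code splits the string into city, admin, and country parts.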

def normalize_for_scoring(

text, iso)

Normalize the title used to determine how close a match is. See normalize() for details.
Also removes noise words such as City Of.

Args:

text: text to normalize
iso: ISO country code

Returns:

def remove_aliase(

input_words, res_words)