geodata.MatchScore module
Calculate a heuristic score for how well a result place name matches a target place name.
Functions
def full_normalized_title(
place)
def is_street(
text)
def remove_if_input_empty(
target_tokens, res_tokens)
Classes
class MatchScore
Calculate a heuristic score for how well a result place name matches a target place name. The score is based on the percentage of characters that didn't match, plus other items described in match_score().
Ancestors (in MRO)
- MatchScore
- builtins.object
Static methods
def __init__(
self)
Initialize self. See help(type(self)) for accurate signature.
def match_score(
self, target_place, result_place)
Calculate a heuristic score for how well a result place name matches a target place name. The score is based on the percentage of characters that didn't match in the input and the output, plus the other items described below. The mismatch score ranges from 0 to 100% and reflects the percent mismatch between the user input and the result. It is then adjusted by feature type (a large city gives the best score) and other items to give a final heuristic score, where -10 is a perfect match of a large city and 100 is no match.
A) Heuristic:
1) Create a 5 part title (prefix, city, county, state/province, country).
2) Normalize the text with Normalize.normalize_for_scoring().
3) Remove sequences of 2 or more characters that match in target and result.
4) Calculate in_score: the percent of characters in the input that didn't match the result, weighted by term (city, county, state, country). An exact match of the city term gets a bonus.
5) Calculate the result score (out_score): the percent of characters in the DB result that didn't match the input.
B) Score components (all are weighted in the final score):
- in_score (0-100): percent of characters in the input that didn't match the output.
- out_score (0-100): percent of characters in the output that didn't match the input.
- feature_score (0-100): more important features get a lower score. A city with 1M population is zero. A valley is 100. See Geodata.feature_priority().
- wildcard_penalty: the score is raised by X if the input includes a wildcard.
- prefix_penalty: the score is raised by the length of the prefix.
C) A standard text difference, such as Levenshtein, was not used because those treat both strings as equal, whereas this heuristic treats the user text as more important than the DB result text and also weights each token. A user's text might commonly be something like: Paris, France with a DB result of Paris, Paris, Ile De France, France. The Levenshtein distance would be large, but with this heuristic the middle terms can have lower weights, and having all of the input matched can be weighted higher than mismatches on the county and province. This heuristic gives a score of -9 for Paris, France.
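Steps 3 to 5 of the heuristic can be illustrated with a minimal sketch. This is not the module's implementation: it assumes difflib.SequenceMatcher as the matching technique, skips normalization, per-term weighting, and the city bonus, and every name in it (remove_matching_sequences, percent_unmatched, mismatch_scores, and the two gap sentinels) is hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical sentinels that mark removed spans. They differ per string,
# so text on either side of a removed span cannot join into a new match.
TARGET_GAP = "\x00"
RESULT_GAP = "\x01"


def remove_matching_sequences(target: str, result: str, min_len: int = 2):
    """Strip common substrings of at least min_len characters, longest first."""
    while True:
        matcher = SequenceMatcher(None, target, result, autojunk=False)
        m = matcher.find_longest_match(0, len(target), 0, len(result))
        if m.size < min_len:
            return target, result
        target = target[:m.a] + TARGET_GAP + target[m.a + m.size:]
        result = result[:m.b] + RESULT_GAP + result[m.b + m.size:]


def percent_unmatched(original: str, remainder: str, gap: str) -> float:
    """Percentage of the original characters still left after removal."""
    if not original:
        return 0.0
    return 100.0 * len(remainder.replace(gap, "")) / len(original)


def mismatch_scores(target: str, result: str):
    """Return (in_score, out_score) as unweighted 0-100 percentages."""
    t_rem, r_rem = remove_matching_sequences(target, result)
    in_score = percent_unmatched(target, t_rem, TARGET_GAP)
    out_score = percent_unmatched(result, r_rem, RESULT_GAP)
    return in_score, out_score


in_score, out_score = mismatch_scores(
    "paris,france", "paris,paris,ile de france,france"
)
print(in_score, out_score)  # 0.0 62.5
```

In this sketch every character of the user input is matched (in_score of 0), while most of the longer DB result is not (out_score of 62.5). That asymmetry is what lets the input side be weighted more heavily than the result side.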
Args:
target_place: Loc with the user's entry.
result_place: Loc with the DB result.
Returns:
score
def set_weighting(
self, token_weight, prefix_weight, feature_weight, result_weight)
Set the weighting of the scoring components. See match_score for details of the weighting. All weights are positive.
Args:
token_weight: List of weights, relative to City, for County, State/Province, and Country. City is 1.0
prefix_weight: Weighting for prefix score
feature_weight: Weighting for Feature match score
result_weight: Weighting for the percentage of the DB result that didn't match the target
Returns:
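To make the role of the weights concrete, here is a rough sketch of one way weighted components could be folded into a single score. The exact formula used by match_score is not documented here, so the plain weighted sum, the Components class, the combine function, and all default weight values below are assumptions for illustration only. The sketch also omits the city bonus, so it never produces the negative scores that match_score can return.

```python
from dataclasses import dataclass


@dataclass
class Components:
    """Hypothetical container for the score components of one comparison."""
    in_score: float       # 0-100, percent of input that didn't match
    out_score: float      # 0-100, percent of DB result that didn't match
    feature_score: float  # 0-100, lower means a more important feature
    prefix_len: int       # length of the prefix
    has_wildcard: bool    # True if the input included a wildcard


def combine(
    c: Components,
    input_weight: float = 1.0,       # assumed value, not the library default
    result_weight: float = 0.2,      # assumed value
    feature_weight: float = 0.1,     # assumed value
    prefix_weight: float = 0.5,      # assumed value
    wildcard_penalty: float = 10.0,  # assumed value
) -> float:
    """Assumed combination: a weighted sum plus a flat wildcard penalty."""
    score = (
        input_weight * c.in_score
        + result_weight * c.out_score
        + feature_weight * c.feature_score
        + prefix_weight * c.prefix_len
    )
    if c.has_wildcard:
        score += wildcard_penalty
    return score


exact = combine(Components(0, 0, 0, 0, False))
partial = combine(Components(50, 20, 100, 4, False))
print(exact, partial)
```

With a small result_weight relative to input_weight, unmatched text in the DB result costs far less than unmatched text in the user's input, which is the behavior the weighting parameters are meant to control.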
Instance variables
var feature_weight
var input_weight
var logger
var prefix_weight
var result_weight
var score_diags
var token_weight
var wildcard_penalty
class Score
Ancestors (in MRO)
- Score
- builtins.object
Class variables
var GOOD
var POOR
var VERY_GOOD
var VERY_POOR