comparators¶
-
class
comparators.similarity.Similarity(**kwargs)¶ Provides methods to decide whether two objects are similar. All similarity classes used in the integration system must conform to the interface of this class and use it as their parent class.
-
fields¶ a list of attributes of data, to which metrics will be applied.
-
-
Similarity.are_similar(object1, object2)¶ examines if two objects are similar
- Parameters
object1 – a dictionary with extracted data
object2 – a dictionary with extracted data
- Returns
True if all metrics are greater than the thresholds specified in the fields, otherwise False
-
abstract
Similarity.get_similarities(object1, object2)¶ Returns all similarity metrics for the two objects. This is useful if you need to cache the results of a comparison.
- Parameters
object1 – a dictionary with extracted data
object2 – a dictionary with extracted data
- Returns
dict of similarities
-
abstract
Similarity.are_similar_based_on_scores(scores)¶ examines if scores are greater than thresholds in the fields
- Parameters
scores – list of similarity scores of two objects
- Returns
True if all scores are greater than the thresholds specified in the fields, otherwise False
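The interface above can be illustrated with a minimal sketch. The Similarity class below is a self-contained stand-in written for this example, not the real comparators.similarity.Similarity; the way are_similar chains the two abstract methods, the dict-of-scores shape, and the ExactMatchSimilarity subclass are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class Similarity(ABC):
    """Stand-in for comparators.similarity.Similarity (assumed shape)."""

    def __init__(self, **kwargs):
        # 'fields' lists the attributes the metrics are applied to.
        self.fields = kwargs.get("fields", [])

    def are_similar(self, object1, object2):
        # Assumed wiring: compute the scores, then check them.
        scores = self.get_similarities(object1, object2)
        return self.are_similar_based_on_scores(scores)

    @abstractmethod
    def get_similarities(self, object1, object2):
        ...

    @abstractmethod
    def are_similar_based_on_scores(self, scores):
        ...


class ExactMatchSimilarity(Similarity):
    """Hypothetical comparator: 1.0 if a field matches exactly, else 0.0."""

    def get_similarities(self, object1, object2):
        # Assumption: scores are returned as a dict keyed by field name.
        return {
            f["name"]: 1.0 if object1.get(f["name"]) == object2.get(f["name"]) else 0.0
            for f in self.fields
        }

    def are_similar_based_on_scores(self, scores):
        # Every score must be strictly greater than its field's threshold.
        return all(scores[f["name"]] > f["minimum_score"] for f in self.fields)


comparator = ExactMatchSimilarity(fields=[{"name": "name", "minimum_score": 0.5}])
same = comparator.are_similar({"name": "Ada"}, {"name": "Ada"})
different = comparator.are_similar({"name": "Ada"}, {"name": "Grace"})
```

Because get_similarities is separate from the threshold check, the scores can be computed once, cached, and re-evaluated later with are_similar_based_on_scores.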
-
class
comparators.factory.ComparatorFactory¶ Instantiates a comparator according to the full module and class name passed in params
-
ComparatorFactory.get(params)¶ Instantiates a comparator according to the full module and class name passed in params
- Parameters
params – a dictionary containing name (the full name of the module and class) and the arguments to pass to that class
- Returns
An instance of the class given by the name key in params, constructed with the supplied arguments
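A factory of this kind is commonly built on dynamic imports. The function below is a hypothetical stand-in for ComparatorFactory.get, not its actual implementation; in particular, it assumes that every key in params other than name is forwarded to the class as a keyword argument, which the reference does not confirm.

```python
import importlib


def get_comparator(params):
    """Hypothetical stand-in for ComparatorFactory.get."""
    kwargs = dict(params)            # copy so the caller's dict is untouched
    full_name = kwargs.pop("name")   # e.g. "package.module.ClassName"
    module_name, class_name = full_name.rsplit(".", 1)
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    # Assumption: the remaining keys are the constructor arguments.
    return cls(**kwargs)


# Demonstrated with a standard-library class so the sketch runs anywhere.
value = get_comparator(
    {"name": "fractions.Fraction", "numerator": 3, "denominator": 4}
)
```

With the real package, name would presumably be something like comparators.levenshtein_similarity.LevenshteinSimilarity.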
-
class
comparators.levenshtein_similarity.LevenshteinSimilarity(**kwargs)¶ See base class comparators.similarity.Similarity. Calculates Levenshtein similarity normalized by the maximum length of the two strings.
-
fields¶ a list of attributes of data, to which metrics will be applied.
-
name¶ a name of attribute to which metric is applied
-
minimum_score¶ the threshold that every metric must exceed for the objects to be considered similar
-
-
LevenshteinSimilarity.are_similar(object1, object2)¶ examines if two objects are similar
- Parameters
object1 – a dictionary with extracted data
object2 – a dictionary with extracted data
- Returns
True if all metrics are greater than the thresholds specified in the fields, otherwise False
-
LevenshteinSimilarity.get_similarities(object1, object2)¶ See base class.
-
LevenshteinSimilarity.are_similar_based_on_scores(scores)¶ See base class.
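To make "normalized by max length" concrete, here is a minimal sketch of one common definition: 1 minus the edit distance divided by the length of the longer string. The function names are hypothetical, and the exact formula used by LevenshteinSimilarity is an assumption; the class may normalize differently.

```python
def levenshtein_distance(a, b):
    """Edit distance computed with a two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a, b):
    """Assumed normalization: 1 - distance / max(len(a), len(b))."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest
```

For "kitten" and "sitting" the distance is 3 and the longer string has 7 characters, so the similarity under this definition is 4/7, roughly 0.57. A field with minimum_score 0.5 would accept that pair; one with 0.6 would reject it.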
-
class
comparators.agreement_disagreement.AgreementDisagreement(**kwargs)¶ See base class comparators.similarity.Similarity
-
fields¶ a list of attributes of data, to which metrics will be applied.
-
name¶ a name of attribute to which metric is applied
-
minimum_score¶ the threshold that every metric must exceed for the objects to be considered similar
-
weight¶ a number that denotes the weight of the attribute
-
temporal_coefficient¶ a dictionary that contains:
span - an integer length of the time span (for example, 3 when the span is 3 months)
type - a string, “agreement” or “disagreement”, which denotes the type of metric
span_type - a string giving the unit of the span (for example: “year”, “month” or “day”)
value - the probability of the event happening
Example:
    fields = [
        {
            "name": "name",
            "weight": 0.4,
            "temporal_coefficient": {
                'span': 4,
                'span_type': 'month',
                'type': 'agreement',
                'value': 0.001
            }
        },
        {
            "name": "profession",
            "weight": 0.2,
            "temporal_coefficient": {
                'span': 4,
                'span_type': 'month',
                'type': 'disagreement',
                'value': 0.8
            }
        },
        {
            "name": "home_airport",
            "weight": 0.2,
            "temporal_coefficient": {
                'span': 4,
                'span_type': 'month',
                'type': 'agreement',
                'value': 0.01
            }
        },
        {
            "name": "co_travellers",
            "weight": 0.2
        }
    ]
    parameters = {
        'fields': fields,
        'minimum_score': "0.85",
    }
-
-
AgreementDisagreement.are_similar(object1, object2)¶ examines if two objects are similar
- Parameters
object1 – a dictionary with extracted data
object2 – a dictionary with extracted data
- Returns
True if all metrics are greater than the thresholds specified in the fields, otherwise False
-
AgreementDisagreement.get_similarities(object1, object2)¶ See base class.
Returns a single score as a one-element list
-
AgreementDisagreement.are_similar_based_on_scores(scores)¶ See base class.
Expects a list of scores containing a single value
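The single-score convention can be sketched as follows. This is not the formula AgreementDisagreement uses: the sketch assumes plain exact-match agreement per field, sums the weights of the agreeing fields, and ignores temporal_coefficient entirely. The function names and the float conversion of minimum_score are likewise assumptions for illustration.

```python
def weighted_score(object1, object2, fields):
    """Hypothetical scoring: sum the weights of fields whose values agree."""
    total = 0.0
    for field in fields:
        name = field["name"]
        if object1.get(name) == object2.get(name):
            total += field["weight"]
    return [total]  # one score, wrapped in a list


def are_similar_based_on_scores(scores, minimum_score):
    # Assumption: the threshold may arrive as a string such as "0.85".
    return scores[0] > float(minimum_score)


fields = [
    {"name": "name", "weight": 0.4},
    {"name": "profession", "weight": 0.2},
    {"name": "home_airport", "weight": 0.2},
    {"name": "co_travellers", "weight": 0.2},
]
a = {"name": "Ada", "profession": "pilot", "home_airport": "LHR", "co_travellers": 2}
b = dict(a, name="Grace")  # differs from a only in the highest-weighted field

full_match = weighted_score(a, a, fields)
partial_match = weighted_score(a, b, fields)
```

Under these assumptions two identical objects score about 1.0 and pass a minimum_score of "0.85", while a mismatch on name alone drops the score to about 0.6 and fails.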