comparators

class comparators.similarity.Similarity(**kwargs)

Provides methods to decide whether two objects are similar. Every similarity used in the integration system must conform to the interface of this class and use it as a parent class.

fields

A list of data attributes to which the metrics are applied.

Similarity.are_similar(object1, object2)

Checks whether two objects are similar.

Parameters
  • object1 – a dictionary with extracted data

  • object2 – a dictionary with extracted data

Returns

True if all metrics are greater than the thresholds specified in the fields, otherwise False.

abstract Similarity.get_similarities(object1, object2)

Returns all similarity metrics of the two objects. This is useful if you need to cache the results of a comparison.

Parameters
  • object1 – a dictionary with extracted data

  • object2 – a dictionary with extracted data

Returns

dict of similarities

abstract Similarity.are_similar_based_on_scores(scores)

Checks whether the scores are greater than the thresholds specified in the fields.

Parameters

scores – list of similarity scores of two objects

Returns

True if all scores are greater than the thresholds specified in the fields, otherwise False.
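
The contract above can be sketched with a stand-in base class and a hypothetical subclass. This is an illustration only: the stand-in `Similarity`, the `ExactMatchSimilarity` class, the way `fields` is read from the keyword arguments, the way `are_similar` composes the two abstract methods, and passing the scores as the dict returned by `get_similarities` are all assumptions, not the package's actual implementation.

```python
from abc import ABC, abstractmethod


class Similarity(ABC):
    """Stand-in for comparators.similarity.Similarity (assumed shape)."""

    def __init__(self, **kwargs):
        # Assumption: the field list arrives as the "fields" keyword argument.
        self.fields = kwargs.get("fields", [])

    def are_similar(self, object1, object2):
        # Assumption: the base method composes the two abstract methods.
        scores = self.get_similarities(object1, object2)
        return self.are_similar_based_on_scores(scores)

    @abstractmethod
    def get_similarities(self, object1, object2):
        """Return the similarity scores of the two objects."""

    @abstractmethod
    def are_similar_based_on_scores(self, scores):
        """Return True if every score is greater than its threshold."""


class ExactMatchSimilarity(Similarity):
    """Hypothetical comparator: 1.0 if attribute values are equal, else 0.0."""

    def get_similarities(self, object1, object2):
        scores = {}
        for field in self.fields:
            name = field["name"]
            equal = object1.get(name) == object2.get(name)
            scores[name] = 1.0 if equal else 0.0
        return scores

    def are_similar_based_on_scores(self, scores):
        # Strictly greater than the threshold, as the description states.
        return all(
            scores[field["name"]] > field["minimum_score"]
            for field in self.fields
        )


comparator = ExactMatchSimilarity(
    fields=[{"name": "name", "minimum_score": 0.5}]
)
first = {"name": "Ada", "age": 36}
second = {"name": "Ada", "age": 37}

# Compute the scores once, keep them, and decide later without recomputing.
cached_scores = comparator.get_similarities(first, second)
decision = comparator.are_similar_based_on_scores(cached_scores)
```

Splitting the comparison into a scoring step and a decision step is what makes caching possible: the scores can be stored and re-evaluated without touching the original objects again.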

class comparators.factory.ComparatorFactory

Instantiates a comparator from the full name of a module and class passed in params.

ComparatorFactory.get(params)

Instantiates a comparator from the full name of a module and class passed in params.

Parameters

params – a dictionary containing name (the full name of the module and class) and the arguments to pass to that class

Returns

An instance of the class given by the name key in params, constructed with the supplied arguments
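
A factory of this kind can be sketched with `importlib`. The class below is a stand-in, not the package's code: it assumes that `name` has the form `package.module.ClassName` and that every other key in `params` is a keyword argument for the constructor. It is demonstrated with a standard-library class so that it runs without the comparators package installed.

```python
import importlib


class SketchComparatorFactory:
    """Stand-in illustrating dynamic instantiation by dotted name."""

    @staticmethod
    def get(params):
        # Copy so the caller's dictionary is not modified.
        kwargs = dict(params)
        # Split "package.module.ClassName" at the last dot.
        module_name, _, class_name = kwargs.pop("name").rpartition(".")
        module = importlib.import_module(module_name)
        cls = getattr(module, class_name)
        # Assumption: the remaining keys are constructor keyword arguments.
        return cls(**kwargs)


# Any importable class works; fractions.Fraction is used for illustration.
half = SketchComparatorFactory.get(
    {"name": "fractions.Fraction", "numerator": 1, "denominator": 2}
)
```

With the real package, the same call shape would presumably take a name such as `comparators.levenshtein_similarity.LevenshteinSimilarity` together with its `fields`.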

class comparators.levenshtein_similarity.LevenshteinSimilarity(**kwargs)

See base class comparators.similarity.Similarity. Calculates the Levenshtein similarity, normalized by the length of the longer string.
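
A minimal sketch of a Levenshtein similarity normalized by the maximum string length follows. The function names are hypothetical, and the formula `1 - distance / max_length` is a common normalization assumed here; the exact expression used by this class is not stated in the documentation.

```python
def levenshtein_distance(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a, b):
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        # Assumption: two empty strings count as identical.
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


score = normalized_similarity("kitten", "sitting")  # distance 3, length 7
```

Dividing by the longer length keeps the score between 0 and 1, so one threshold can be applied to attributes of very different lengths.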

fields

A list of data attributes to which the metrics are applied.

name

The name of the attribute to which the metric is applied.

minimum_score

The threshold that every metric must exceed for the objects to be considered similar.

LevenshteinSimilarity.are_similar(object1, object2)

Checks whether two objects are similar.

Parameters
  • object1 – a dictionary with extracted data

  • object2 – a dictionary with extracted data

Returns

True if all metrics are greater than the thresholds specified in the fields, otherwise False.

LevenshteinSimilarity.get_similarities(object1, object2)

See base class.

LevenshteinSimilarity.are_similar_based_on_scores(scores)

See base class.

class comparators.agreement_disagreement.AgreementDisagreement(**kwargs)

See base class comparators.similarity.Similarity

fields

A list of data attributes to which the metrics are applied.

name

The name of the attribute to which the metric is applied.

minimum_score

The threshold that every metric must exceed for the objects to be considered similar.

weight

A number that denotes the weight of the attribute.

temporal_coefficient

A dictionary that contains:

  • span - an integer length of the time span (for example, 3 for a span of three months)

  • type - a string, “agreement” or “disagreement”, which denotes the type of metric

  • span_type - a string giving the unit of the span (for example, “year”, “month” or “day”)

  • value - the probability that the event happens

Example:

fields = [
    {
        "name": "name",
        "weight": 0.4,
        "temporal_coefficient": {
            'span': 4,
            'span_type': 'month',
            'type': 'agreement',
            'value': 0.001
        }
    },
    {
        "name": "profession",
        "weight": 0.2,
        "temporal_coefficient": {
            'span': 4,
            'span_type': 'month',
            'type': 'disagreement',
            'value': 0.8
        }
    },
    {
        "name": "home_airport",
        "weight": 0.2,
        "temporal_coefficient": {
            'span': 4,
            'span_type': 'month',
            'type': 'agreement',
            'value': 0.01
        }
    },
    {
        "name": "co_travellers",
        "weight": 0.2
    }
]
parameters = {
    'fields': fields,
    'minimum_score': "0.85",
}
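
A configuration like the one above can be checked before it is passed to the comparator. The helper below is hypothetical and not part of the package; it only enforces what the field descriptions state (an integer `span`, a `type` of "agreement" or "disagreement", a string `span_type`, and a `value` that is a probability) and treats `temporal_coefficient` as optional, as in the `co_travellers` entry.

```python
def validate_fields(fields):
    """Return a list of problems found in an AgreementDisagreement field list."""
    problems = []
    for field in fields:
        name = field.get("name", "<unnamed>")
        coefficient = field.get("temporal_coefficient")
        if coefficient is None:
            # Assumption: the coefficient is optional, as in the example.
            continue
        if not isinstance(coefficient.get("span"), int):
            problems.append(f"{name}: span must be an integer")
        if coefficient.get("type") not in ("agreement", "disagreement"):
            problems.append(f"{name}: type must be agreement or disagreement")
        if not isinstance(coefficient.get("span_type"), str):
            problems.append(f"{name}: span_type must be a string")
        value = coefficient.get("value")
        if not isinstance(value, (int, float)) or not 0 <= value <= 1:
            problems.append(f"{name}: value must be a probability in [0, 1]")
    return problems


# A shortened field list in the same shape as the example.
sample_fields = [
    {
        "name": "name",
        "weight": 0.4,
        "temporal_coefficient": {
            "span": 4, "span_type": "month",
            "type": "agreement", "value": 0.001,
        },
    },
    {"name": "co_travellers", "weight": 0.2},
]
sample_problems = validate_fields(sample_fields)
```

Returning a list of messages rather than raising on the first error lets a caller report every misconfigured field at once.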

AgreementDisagreement.are_similar(object1, object2)

Checks whether two objects are similar.

Parameters
  • object1 – a dictionary with extracted data

  • object2 – a dictionary with extracted data

Returns

True if all metrics are greater than the thresholds specified in the fields, otherwise False.

AgreementDisagreement.get_similarities(object1, object2)

See base class.

Returns a list containing a single score.

AgreementDisagreement.are_similar_based_on_scores(scores)

See base class.

Expects a list of scores containing a single value.