Aleph filter

This module skips publications that are already in Aleph.

Note

The module uses fuzzy lookup; see name_to_vector() and compare_names().

harvester.filters.aleph_filter.name_to_vector(name)

Convert a name to an ASCII vector.

Example

>>> name_to_vector("ing. Franta Putšálek")
['putsalek', 'franta', 'ing']
Parameters: name (str) – Name which will be vectorized.
Returns: Vector created from name.
Return type: list
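
The following is a minimal sketch of one way such a vectorization could work, written for Python 3. It is not taken from the module source: the function name sketch_name_to_vector is hypothetical, and the steps (lowercasing, stripping diacritics with NFKD normalization, dropping punctuation, sorting tokens by length) are assumptions chosen only because they reproduce the documented example.

```python
import unicodedata


def sketch_name_to_vector(name):
    """Hypothetical re-creation of the documented behaviour, not the real code."""
    name = name.lower()

    # Decompose accented characters (e.g. "š" -> "s" + combining caron),
    # then drop everything that is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")

    # Keep only letters and spaces, so "ing." becomes "ing".
    cleaned = "".join(ch for ch in ascii_name if ch.isalpha() or ch == " ")

    # Assumption: tokens are ordered from longest to shortest, which matches
    # the ordering shown in the documented example.
    return sorted(cleaned.split(), key=len, reverse=True)


print(sketch_name_to_vector("ing. Franta Putšálek"))
# ['putsalek', 'franta', 'ing']
```

Sorting by length tends to push titles and initials to the end of the vector, so the most distinctive parts of the name come first.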
harvester.filters.aleph_filter.compare_names(first, second)

Compare two names in a more complicated, but more error-tolerant way.

The algorithm uses vector comparison.

Example

>>> compare_names("Franta Putšálek", "ing. Franta Putšálek")
100.0
>>> compare_names("F. Putšálek", "ing. Franta Putšálek")
50.0
Parameters:
  • first (str) – First name as a string.
  • second (str) – Second name as a string.
Returns:

Similarity of the two names, as a percentage.

Return type:

float
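
A vector comparison that yields the two documented results can be sketched as follows (Python 3). This is an illustration under assumptions, not the module's confirmed implementation: sketch_compare_names and _to_vector are hypothetical names, and the scoring rule (the share of paired tokens from the first name that also occur in the second) is inferred from the examples above.

```python
import unicodedata


def _to_vector(name):
    """Hypothetical helper: lowercase ASCII tokens, longest first."""
    decomposed = unicodedata.normalize("NFKD", name.lower())
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = "".join(ch for ch in ascii_name if ch.isalpha() or ch == " ")
    return sorted(cleaned.split(), key=len, reverse=True)


def sketch_compare_names(first, second):
    """Return an assumed similarity score in percent."""
    first_vec = _to_vector(first)
    second_vec = _to_vector(second)

    # Pairing truncates both vectors to the length of the shorter one.
    pairs = list(zip(first_vec, second_vec))
    if not pairs:
        return 0.0

    # Count tokens of the first name that appear anywhere in the second.
    hits = sum(1 for token, _ in pairs if token in second_vec)
    return hits / len(pairs) * 100


print(sketch_compare_names("Franta Putšálek", "ing. Franta Putšálek"))  # 100.0
print(sketch_compare_names("F. Putšálek", "ing. Franta Putšálek"))      # 50.0
```

In this sketch an initial such as "F." does not match "Franta", which is why the second call scores only 50.0.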

harvester.filters.aleph_filter.filter_publication(publication, cmp_authors=True)

Filter publications based on data from Aleph.

Parameters: publication (obj) – Publication instance.
Returns: None if the publication was found in Aleph, otherwise the publication itself.
Return type: obj/None
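
Because the function returns either the publication or None, a caller can drop known publications by discarding the None results. The sketch below only illustrates that calling pattern. Everything in it is hypothetical: Publication, KNOWN_TITLES, and stub_filter_publication are stand-ins so the example runs without an Aleph connection, and the title-only lookup is an assumption, not how the real filter decides.

```python
from collections import namedtuple

# Hypothetical stand-in for a publication object.
Publication = namedtuple("Publication", ["title", "authors"])

# Hypothetical stand-in for the Aleph catalogue.
KNOWN_TITLES = {"Babička"}


def stub_filter_publication(publication, cmp_authors=True):
    """Mimic the documented contract: None if known, else the publication."""
    # cmp_authors only mirrors the documented signature; it is unused here.
    if publication.title in KNOWN_TITLES:
        return None
    return publication


def drop_known(publications, filter_func=stub_filter_publication):
    """Keep only publications for which the filter did not return None."""
    results = (filter_func(pub) for pub in publications)
    return [pub for pub in results if pub is not None]


harvested = [
    Publication("Babička", ["Božena Němcová"]),
    Publication("Nová kniha", ["Franta Putšálek"]),
]
new_only = drop_known(harvested)
print([pub.title for pub in new_only])  # ['Nová kniha']
```

In real use, the actual filter_publication would be passed as filter_func in place of the stub.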