utils

This module contains a number of functions that are used in multiple places in autoparser.

harvester.autoparser.utils.handle_encodnig(html)[source]

Look for the encoding declared in the given HTML and try to convert the HTML to UTF-8.

Parameters: html (str) – HTML code as string.
Returns: HTML code encoded in UTF-8.
Return type: str
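
The detect-then-convert idea described above can be sketched roughly as follows. This is not the module's implementation: `to_utf8_sketch`, its `fallback` parameter, and the regular expression are hypothetical, and the sketch assumes the input arrives as raw bytes with the charset declared somewhere in the markup.

```python
import re

# Hypothetical pattern: matches charset=... in a <meta> tag or similar.
_CHARSET_RE = re.compile(rb"""charset\s*=\s*["']?([\w-]+)""", re.IGNORECASE)


def to_utf8_sketch(raw_html: bytes, fallback: str = "utf-8") -> bytes:
    """Find a declared charset and re-encode the markup as UTF-8 bytes."""
    match = _CHARSET_RE.search(raw_html)
    declared = match.group(1).decode("ascii") if match else fallback
    try:
        text = raw_html.decode(declared)
    except (LookupError, UnicodeDecodeError):
        # Unknown or incorrect declaration: decode leniently with the fallback.
        text = raw_html.decode(fallback, errors="replace")
    return text.encode("utf-8")
```

The sketch leaves the charset declaration inside the markup untouched; whether the real function rewrites it is not stated here.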
harvester.autoparser.utils.content_matchs(tag_content, content_transformer=None)[source]

Generate a function that checks whether the content of a tag matches tag_content.

Parameters:
  • tag_content (str) – Content of the tag, which will be matched throughout the whole DOM.
  • content_transformer (fn, default None) – Function used to transform all tags before matching.
Returns:

True for every matching tag.

Return type:

bool

Note

This function can be used as parameter for .find() method in HTMLElement.
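
A minimal sketch of the closure pattern this describes, under stated assumptions: `FakeElement` is a hypothetical stand-in that models only a `getContent()` method, `content_matches_sketch` is an illustrative name, and the transformer is assumed to be applied to the content of each tag rather than to the tag object itself.

```python
class FakeElement:
    """Hypothetical stand-in for HTMLElement; only getContent() is modelled."""

    def __init__(self, content):
        self._content = content

    def getContent(self):
        return self._content


def content_matches_sketch(tag_content, content_transformer=None):
    """Return a predicate that compares an element's content to tag_content."""

    def matcher(element):
        content = element.getContent()
        if content_transformer is not None:
            # Normalise the content (e.g. strip whitespace) before comparing.
            content = content_transformer(content)
        return content == tag_content

    return matcher


# Usage: build the predicate once, then apply it to each candidate element.
is_hello = content_matches_sketch("hello", content_transformer=str.strip)
matching = [e for e in (FakeElement("  hello\n"), FakeElement("bye")) if is_hello(e)]
```

Returning a predicate instead of a result is what lets the generated function be handed to a search routine that calls it on every element.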

harvester.autoparser.utils.is_equal_tag(element, tag_name, params, content)[source]

Check whether the element object matches the rest of the parameters.

Each check is performed only if the corresponding attribute is set in the HTMLElement.

Parameters:
  • element (obj) – HTMLElement instance.
  • tag_name (str) – Tag name.
  • params (dict) – Parameters of the tag.
  • content (str) – Content of the tag.
Returns:

True if everything matches, False otherwise.

Return type:

bool
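
One possible reading of these checks, as a hedged sketch rather than the actual code: a criterion passed as None is skipped, parameters are compared as a subset, and content is compared after stripping whitespace. `FakeElement` and its attribute names are hypothetical, as are the default values of the arguments.

```python
class FakeElement:
    """Hypothetical stand-in for HTMLElement with plain attributes."""

    def __init__(self, tag_name, params=None, content=""):
        self.tag_name = tag_name
        self.params = params or {}
        self.content = content


def is_equal_tag_sketch(element, tag_name=None, params=None, content=None):
    """Return True if every criterion that was supplied matches the element."""
    if tag_name is not None and element.tag_name != tag_name:
        return False
    if params is not None:
        # Assumption: the element may carry extra parameters (subset match).
        for key, value in params.items():
            if key not in element.params or element.params[key] != value:
                return False
    if content is not None and element.content.strip() != content.strip():
        return False
    return True


link = FakeElement("a", {"href": "/about", "class": "nav"}, " About ")
```

With this sketch, `is_equal_tag_sketch(link, "a", {"class": "nav"}, "About")` passes because the extra `href` parameter is ignored and the surrounding whitespace is stripped.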

harvester.autoparser.utils.has_neigh(tag_name, params=None, content=None, left=True)[source]

Generate a function that matches all tags with a neighbour defined by the parameters.

Parameters:
  • tag_name (str) – Tag has to have a neighbour with this tag name.
  • params (str) – Tag has to have a neighbour with these parameters.
  • content – Tag has to have a neighbour with this content.
  • left (bool, default True) – If True, the tag has to have the neighbour on its left; set to False to require it on the right.
Returns:

True for every matching tag.

Return type:

bool

Note

This function can be used as parameter for .find() method in HTMLElement.
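
The neighbour test can be sketched as below. Everything here is an assumption made for illustration: `FakeElement` models siblings through hypothetical `parent` and `childs` attributes, the params criterion is left out for brevity, and "neighbour" is taken to mean the immediately adjacent sibling.

```python
class FakeElement:
    """Hypothetical stand-in for HTMLElement with parent/child links."""

    def __init__(self, tag_name, content="", childs=()):
        self.tag_name = tag_name
        self.content = content
        self.parent = None
        self.childs = list(childs)
        for child in self.childs:
            child.parent = self


def has_neigh_sketch(tag_name, content=None, left=True):
    """Return a predicate matching elements whose adjacent sibling fits."""

    def matcher(element):
        if element.parent is None:
            return False
        siblings = element.parent.childs
        # Locate the element by identity, then step one position left or right.
        index = next(i for i, sib in enumerate(siblings) if sib is element)
        neigh_index = index - 1 if left else index + 1
        if not 0 <= neigh_index < len(siblings):
            return False
        neigh = siblings[neigh_index]
        if neigh.tag_name != tag_name:
            return False
        return content is None or neigh.content == content

    return matcher


label = FakeElement("b", "Name:")
value = FakeElement("i", "Alice")
row = FakeElement("p", childs=[label, value])
```

Here `has_neigh_sketch("b", content="Name:")` accepts `value`, since its left sibling is the `b` tag with that content, while `label` has no left neighbour at all.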