generator

This module contains a number of template generators, which generate all the Python code for the parser.

harvester.autoparser.generator.IND = ' '

Indentation.

harvester.autoparser.generator._index_idiom(el_name, index, alt=None)

Generate a string of code in which el_name is indexed by index if it contains enough items; otherwise alt is used.

Parameters:
  • el_name (str) – Name of the container which is indexed.
  • index (int) – Index of the item you want to obtain from container.
  • alt (any type, default None) – Alternative value, used when the container does not have enough items.
Returns:

Python code.

Return type:

str

Live example::
>>> import generator as g
>>> print g._index_idiom("xex", 0)
    # pick element from list
    xex = xex[0] if xex else None
>>> print g._index_idiom("xex", 1, "something")
    # pick element from list
    xex = xex[1] if len(xex) - 1 >= 1 else 'something'
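
To see what the generated idiom does at run time, the two generated lines from the example above can be wrapped in small functions and called with containers of different lengths. The function names pick_first and pick_second are hypothetical and exist only for this illustration; they are not part of the module.

```python
# The two generated lines from the example above, wrapped in functions
# so their behaviour can be checked directly. The wrapper names are
# hypothetical and are not produced by the generator itself.

def pick_first(xex):
    # Generated for index 0: an empty container falls back to alt (None).
    xex = xex[0] if xex else None
    return xex


def pick_second(xex):
    # Generated for index 1 with alt="something".
    xex = xex[1] if len(xex) - 1 >= 1 else 'something'
    return xex


print(pick_first(["a", "b"]))   # a
print(pick_first([]))           # None
print(pick_second(["a", "b"]))  # b
print(pick_second(["a"]))       # something
```

In both cases a container that is too short yields alt instead of raising IndexError.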

harvester.autoparser.generator._required_idiom(tag_name, index, notfoundmsg)

Generate code which makes sure that tag_name has enough items.

Parameters:
  • tag_name (str) – Name of the container.
  • index (int) – Index of the item you want to obtain from container.
  • notfoundmsg (str) – Message for the UserWarning, which is raised together with debug data when there are not enough items.
Returns:

Python code.

Return type:

str
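
The text this function actually emits is not shown here, so the following is only a rough sketch of how such a generator could look. The name required_idiom_sketch, the four-space IND value and the exact shape of the emitted check are all assumptions, and the debug data mentioned above is left out for brevity.

```python
IND = "    "  # assumption: a four-space indentation unit


def required_idiom_sketch(tag_name, index, notfoundmsg):
    """Hypothetical stand-in for _required_idiom(); not the module's code."""
    lines = [
        # Emit a length check that fails when the index is out of range.
        IND + "if len(%s) - 1 < %d:" % (tag_name, index),
        IND + IND + "raise UserWarning(%r)" % notfoundmsg,
        # Emit the indexing itself, which is now known to be safe.
        IND + "%s = %s[%d]" % (tag_name, tag_name, index),
    ]
    return "\n".join(lines) + "\n"


code = required_idiom_sketch("el", 2, "Title not found!")
print(code)
```

Unlike the _index_idiom() output, code of this kind stops with an exception rather than substituting an alternative value.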

harvester.autoparser.generator._find_template(parameters, index, required=False, notfoundmsg=None)

Generate .find() call for HTMLElement.

Parameters:
  • parameters (list) – List of parameters for .find().
  • index (int) – Index of the item you want to get from .find() call.
  • required (bool, default False) – Apply _required_idiom() to the returned data.
  • notfoundmsg (str, default None) – Message which will be used for _required_idiom() if the item is not found.
Returns:

Python code.

Return type:

str

Live example::
>>> print g._find_template(["<xex>"], 3)
    el = dom.find('<xex>')
    # pick element from list
    el = el[3] if len(el) - 1 >= 3 else None
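
The generated text is a fragment of a function body, so it needs a surrounding function and a dom object before it can run. The sketch below wraps the output from the example above in a function and runs it against a stub. StubDom and get_item are hypothetical names used only for this illustration, and the stub does not search anything; it simply returns canned data.

```python
class StubDom(object):
    """Hypothetical stand-in for an HTMLElement; only .find() is provided."""

    def __init__(self, found):
        self._found = found

    def find(self, *args):
        # A real element would search its subtree; the stub returns canned data.
        return list(self._found)


# Generated code copied from the example above, wrapped in a function body.
generated = (
    "def get_item(dom):\n"
    "    el = dom.find('<xex>')\n"
    "    # pick element from list\n"
    "    el = el[3] if len(el) - 1 >= 3 else None\n"
    "    return el\n"
)

namespace = {}
exec(generated, namespace)
get_item = namespace["get_item"]

print(get_item(StubDom(["a", "b", "c", "d"])))  # d
print(get_item(StubDom(["a"])))                 # None
```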

harvester.autoparser.generator._wfind_template(use_dom, parameters, index, required=False, notfoundmsg=None)

Generate .wfind() call for HTMLElement.

Parameters:
  • use_dom (bool) – Use dom as tag name. If False, el is used.
  • parameters (list) – List of parameters for .wfind().
  • index (int) – Index of the item you want to get from .wfind() call.
  • required (bool, default False) – Apply _required_idiom() to the returned data.
  • notfoundmsg (str, default None) – Message which will be used for _required_idiom() if the item is not found.
Returns:

Python code.

Return type:

str

Live example::
>>> print g._wfind_template(True, ["<xex>"], 3)
    el = dom.wfind('<xex>').childs
    # pick element from list
    el = el[3] if len(el) - 1 >= 3 else None

harvester.autoparser.generator._match_template(parameters, index, required=False, notfoundmsg=None)

Generate .match() call for HTMLElement.

Parameters:
  • parameters (list) – List of parameters for .match().
  • index (int) – Index of the item you want to get from .match() call.
  • required (bool, default False) – Apply _required_idiom() to the returned data.
  • notfoundmsg (str, default None) – Message which will be used for _required_idiom() if the item is not found.
Returns:

Python code.

Return type:

str

Live example::
>>> print g._match_template(["<xex>"], 3)
    el = dom.match('<xex>')
    # pick element from list
    el = el[3] if len(el) - 1 >= 3 else None

harvester.autoparser.generator._neigh_template(parameters, index, left=True, required=False, notfoundmsg=None)

Generate a neighbour-matching call for HTMLElement, which returns only elements with the required neighbours.

Parameters:
  • parameters (list) – List of parameters for .match().
  • index (int) – Index of the item you want to get from .match() call.
  • left (bool, default True) – Look for the neighbour on the left side of el.
  • required (bool, default False) – Apply _required_idiom() to the returned data.
  • notfoundmsg (str, default None) – Message which will be used for _required_idiom() if the item is not found.
Returns:

Python code.

Return type:

str

harvester.autoparser.generator._get_parser_name(var_name)

Compose the name of the parser function for var_name.

Parameters: var_name (str) – Name of the variable.
Returns: Parser function name.
Return type: str

harvester.autoparser.generator._generate_parser(name, path, required=False, notfoundmsg=None)

Generate a parser named name for the given path.

Parameters:
Returns:

Python code for parsing path.

Return type:

str

harvester.autoparser.generator._unittest_template(config)

Generate unit tests for all of the generated code.

Parameters: config (dict) – Original configuration dictionary. See conf_reader for details.
Returns: Python code.
Return type: str

harvester.autoparser.generator.generate_parsers(config, paths)

Generate parsers for all paths.

Parameters:
Returns:

Python code containing all parsers for paths.

Return type:

str