generator
This module contains a number of template generators that produce all of the Python code for the parser.
- harvester.autoparser.generator.IND = ' '
Indentation used in the generated code.
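- harvester.autoparser.generator._index_idiom(el_name, index, alt=None)
Generate a string in which el_name is indexed by index if it has enough items; otherwise alt is used.
Parameters: - el_name (str) – Name of the variable holding the list.
- index (int) – Index of the item you want to pick.
- alt (obj, default None) – Value used if the list does not have enough items.
Returns: Python code.
Return type: str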
- Live example::

    >>> import generator as g
    >>> print(g._index_idiom("xex", 0))
    # pick element from list
    xex = xex[0] if xex else None
    >>> print(g._index_idiom("xex", 1, "something"))
    # pick element from list
    xex = xex[1] if len(xex) - 1 >= 1 else 'something'
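The two outputs above can be reproduced by a short string template. The sketch below is a hypothetical re-implementation (the name `index_idiom` is not part of the module), written only to make the generated idiom explicit; the real function may differ in details.

```python
def index_idiom(el_name, index, alt=None):
    """Return Python source that picks item `index` from list `el_name`.

    Hypothetical sketch mirroring the documented output of _index_idiom().
    """
    if index == 0:
        # A non-empty list always has item 0, so truthiness is enough.
        cond = el_name
    else:
        # Otherwise the generated code checks that the list is long enough.
        cond = "len(%s) - 1 >= %d" % (el_name, index)

    # %r renders the fallback as a Python literal ('something', None, ...).
    return "# pick element from list\n%s = %s[%d] if %s else %r\n" % (
        el_name, el_name, index, cond, alt
    )


print(index_idiom("xex", 0))
print(index_idiom("xex", 1, "something"))
```

Special-casing index 0 keeps the generated check short, since the truthiness of the list already guarantees that item 0 exists.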
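- harvester.autoparser.generator._required_idiom(tag_name, index, notfoundmsg)
Generate code that makes sure tag_name has enough items.
Parameters: - tag_name (str) – Name of the variable holding the list.
- index (int) – Index of the item that must be present.
- notfoundmsg (str) – Message used if the item is not found.
Returns: Python code.
Return type: str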
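No example is given for this idiom. A minimal sketch of what such a guard could look like follows; the function name `required_idiom`, the four-space indentation and the choice of `UserWarning` as the raised exception are all assumptions, not taken from the harvester source.

```python
def required_idiom(tag_name, index, notfoundmsg, ind="    "):
    """Return Python source that fails loudly if `tag_name` is too short.

    Hypothetical sketch: the exception type (UserWarning) and the layout
    of the generated code are assumptions.
    """
    cond = "not %s" % tag_name
    if index > 0:
        # Also fail when the list exists but lacks item `index`.
        cond += " or len(%s) - 1 < %d" % (tag_name, index)

    # %r turns the message into a valid Python string literal.
    return "if %s:\n%sraise UserWarning(%r)\n" % (cond, ind, notfoundmsg)


code = required_idiom("el", 2, "Can't find the title.")
print(code)
```

Running the emitted snippet in a namespace where `el` has at least three items does nothing; with fewer items it raises with the given message.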
- harvester.autoparser.generator._find_template(parameters, index, required=False, notfoundmsg=None)
Generate a .find() call for HTMLElement.
Parameters: - parameters (list) – List of parameters for .find().
- index (int) – Index of the item you want to get from .find() call.
- required (bool, default False) – Apply _required_idiom() to the returned data.
- notfoundmsg (str, default None) – Message which will be used for _required_idiom() if the item is not found.
Returns: Python code.
Return type: str
- Live example::

    >>> print(g._find_template(["<xex>"], 3))
    el = dom.find('<xex>')
    # pick element from list
    el = el[3] if len(el) - 1 >= 3 else None
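A .find() template can be assembled by rendering each parameter as a Python literal and appending the indexing idiom. The hypothetical `find_template` below reproduces the live example output; it is a sketch that skips the `required`/`notfoundmsg` branch, not the module's implementation.

```python
def index_idiom(el_name, index, alt=None):
    # Compact form of the "pick element from list" idiom from the live examples.
    cond = el_name if index == 0 else "len(%s) - 1 >= %d" % (el_name, index)
    return "# pick element from list\n%s = %s[%d] if %s else %r\n" % (
        el_name, el_name, index, cond, alt
    )


def find_template(parameters, index):
    """Hypothetical sketch of _find_template() without `required` handling."""
    # repr() renders each parameter as a Python literal, e.g. '<xex>'.
    args = ", ".join(repr(p) for p in parameters)
    return "el = dom.find(%s)\n" % args + index_idiom("el", index)


print(find_template(["<xex>"], 3))
```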
- harvester.autoparser.generator._wfind_template(use_dom, parameters, index, required=False, notfoundmsg=None)
Generate a .wfind() call for HTMLElement.
Parameters: - use_dom (bool) – Call .wfind() on dom. If False, el is used.
- parameters (list) – List of parameters for .wfind().
- index (int) – Index of the item you want to get from .wfind() call.
- required (bool, default False) – Apply _required_idiom() to the returned data.
- notfoundmsg (str, default None) – Message which will be used for _required_idiom() if the item is not found.
Returns: Python code.
Return type: str
- Live example::

    >>> print(g._wfind_template(True, ["<xex>"], 3))
    el = dom.wfind('<xex>').childs
    # pick element from list
    el = el[3] if len(el) - 1 >= 3 else None
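Judging from the live examples, the .wfind() variant differs from the .find() one only in the receiver (`dom` or `el`, chosen by use_dom) and the trailing `.childs`. A hypothetical sketch reproducing the output above, again without the `required` branch:

```python
def index_idiom(el_name, index, alt=None):
    # Compact form of the "pick element from list" idiom from the live examples.
    cond = el_name if index == 0 else "len(%s) - 1 >= %d" % (el_name, index)
    return "# pick element from list\n%s = %s[%d] if %s else %r\n" % (
        el_name, el_name, index, cond, alt
    )


def wfind_template(use_dom, parameters, index):
    """Hypothetical sketch of _wfind_template() without `required` handling."""
    receiver = "dom" if use_dom else "el"  # object that .wfind() is called on
    args = ", ".join(repr(p) for p in parameters)
    code = "el = %s.wfind(%s).childs\n" % (receiver, args)
    return code + index_idiom("el", index)


print(wfind_template(True, ["<xex>"], 3))
print(wfind_template(False, ["td"], 0))
```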
- harvester.autoparser.generator._match_template(parameters, index, required=False, notfoundmsg=None)
Generate a .match() call for HTMLElement.
Parameters: - parameters (list) – List of parameters for .match().
- index (int) – Index of the item you want to get from .match() call.
- required (bool, default False) – Apply _required_idiom() to the returned data.
- notfoundmsg (str, default None) – Message which will be used for _required_idiom() if the item is not found.
Returns: Python code.
Return type: str
- Live example::

    >>> print(g._match_template(["<xex>"], 3))
    el = dom.match('<xex>')
    # pick element from list
    el = el[3] if len(el) - 1 >= 3 else None
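Because the templates emit plain Python source, a generated snippet can be checked by executing it against a stand-in object. The sketch below builds the .match() variant (hypothetical `match_template`) and runs it with `exec()` against `StubDom`, a hypothetical class that only mimics a `.match()` method returning a list. It shows that an out-of-range index yields `None` instead of raising `IndexError`.

```python
def index_idiom(el_name, index, alt=None):
    # Compact form of the "pick element from list" idiom from the live examples.
    cond = el_name if index == 0 else "len(%s) - 1 >= %d" % (el_name, index)
    return "# pick element from list\n%s = %s[%d] if %s else %r\n" % (
        el_name, el_name, index, cond, alt
    )


def match_template(parameters, index):
    """Hypothetical sketch of _match_template() without `required` handling."""
    args = ", ".join(repr(p) for p in parameters)
    return "el = dom.match(%s)\n" % args + index_idiom("el", index)


class StubDom(object):
    """Stand-in for a parsed HTMLElement; .match() returns fixed results."""

    def __init__(self, matches):
        self.matches = matches

    def match(self, *args):
        return list(self.matches)


code = match_template(["<xex>"], 3)

# Run the generated snippet with too few, then enough, matches.
for matches in (["a", "b"], ["a", "b", "c", "d"]):
    namespace = {"dom": StubDom(matches)}
    exec(code, namespace)
    print(namespace["el"])  # prints None, then d
```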
- harvester.autoparser.generator._neigh_template(parameters, index, left=True, required=False, notfoundmsg=None)
Generate a neighbour-matching call for HTMLElement that returns only elements with the required neighbours.
Parameters: - parameters (list) – List of parameters for .match().
- index (int) – Index of the item you want to get from .match() call.
- left (bool, default True) – Look for the neighbour on the left side of el.
- required (bool, default False) – Apply _required_idiom() to the returned data.
- notfoundmsg (str, default None) – Message which will be used for _required_idiom() if the item is not found.
Returns: Python code.
Return type: str
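The source gives no example for this template. Purely to illustrate the idea of neighbour matching (not the code _neigh_template() actually emits, and not the HTMLElement API), the sketch below filters a plain list of sibling tag names and keeps the positions whose left or right neighbour satisfies a predicate; `with_neighbour` and its arguments are hypothetical.

```python
def with_neighbour(siblings, is_neighbour, left=True):
    """Return indexes of siblings whose left (or right) neighbour matches.

    Conceptual sketch only: `siblings` is a plain list standing in for the
    children of one parent element.
    """
    result = []
    for i in range(len(siblings)):
        j = i - 1 if left else i + 1  # position of the neighbour to check
        if 0 <= j < len(siblings) and is_neighbour(siblings[j]):
            result.append(i)
    return result


row = ["th", "td", "th", "td", "td"]

# Cells whose left neighbour is a <th> header cell.
print(with_neighbour(row, lambda tag: tag == "th"))              # [1, 3]
# Cells whose right neighbour is a <td>.
print(with_neighbour(row, lambda tag: tag == "td", left=False))  # [0, 2, 3]
```

Keying on the neighbour rather than on the element itself is what lets such a matcher pick a value cell by the label next to it, which is the typical use in scraped tables.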
- harvester.autoparser.generator._get_parser_name(var_name)
Compose the name of the parser function.
Parameters: var_name (str) – Name of the variable.
Returns: Parser function name.
Return type: str
- harvester.autoparser.generator._generate_parser(name, path, required=False, notfoundmsg=None)
Generate a parser named name for the given path.
Parameters: - name (str) – Basename for the parsing function (see _get_parser_name() for details).
- path (obj) – PathCall or Chained instance.
- required (bool, default False) – Apply _required_idiom() to the returned data.
- notfoundmsg (str, default None) – Message which will be used for _required_idiom() if the item is not found.
Returns: Python code for parsing path.
Return type: str
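Presumably a generated parser wraps template code like the snippets above in a function definition. The sketch below shows one way to do that wrapping; `parser_name()` with its `get_` prefix, the `dom` argument, the four-space `IND` and the convention that the body leaves its result in `el` are all assumptions for illustration.

```python
IND = "    "  # assumed four-space indentation for generated code


def parser_name(var_name):
    # Hypothetical naming scheme; the real _get_parser_name() may differ.
    return "get_%s" % var_name


def generate_parser(name, body):
    """Wrap generated lookup code into a parser function.

    Hypothetical sketch: `body` is a string of statements (as produced by
    the *_template() helpers) that leaves its result in `el`.
    """
    lines = ["def %s(dom):" % parser_name(name)]
    for line in body.splitlines():
        # Indent non-empty lines; keep blank lines blank.
        lines.append(IND + line if line.strip() else "")
    lines.append(IND + "return el")
    return "\n".join(lines) + "\n"


body = (
    "el = dom.find('title')\n"
    "# pick element from list\n"
    "el = el[0] if el else None\n"
)
source = generate_parser("title", body)
print(source)
```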
- harvester.autoparser.generator._unittest_template(config)
Generate unit tests for all of the generated code.
Parameters: config (dict) – Original configuration dictionary. See conf_reader for details.
Returns: Python code.
Return type: str
- harvester.autoparser.generator.generate_parsers(config, paths)
Generate parsers for all paths.
Parameters: - config (dict) – Original configuration dictionary used to get matches for unittests. See conf_reader for details.
- paths (dict) – Output from select_best_paths().
Returns: Python code containing all parsers for paths.
Return type: str
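When many parsers are concatenated into one module, it is cheap to check that the result is at least valid Python before writing it out. The sketch below uses a hypothetical `generate_parsers` that takes a dict of ready-made parser sources (unlike the real function, which takes config and the output of select_best_paths()), joins them in a stable order and validates the result with `ast.parse()`.

```python
import ast


def generate_parsers(parsers):
    """Join parser sources into one module and validate its syntax.

    Hypothetical sketch: `parsers` maps a name to the source of one
    parser function.
    """
    chunks = []
    for name in sorted(parsers):  # deterministic order keeps diffs stable
        chunks.append(parsers[name].rstrip() + "\n")
    module = "\n\n".join(chunks)

    # Raises SyntaxError if any template produced invalid Python.
    ast.parse(module)
    return module


parsers = {
    "title": "def get_title(dom):\n    return dom.find('title')\n",
    "author": "def get_author(dom):\n    return dom.find('author')\n",
}
module = generate_parsers(parsers)
print(module)
```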