zonerpress_cz scraper

Module for parsing information from zonerpress.cz.

harvester.scrappers.zonerpress_cz._get_max_page(dom)

Try to guess how many pages the book listing has.

Parameters:dom (obj) – HTMLElement container of the page with book list.
Returns:Number of pages for given category.
Return type:int
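The guess can be illustrated with a minimal sketch. It is not the module's actual implementation: it assumes the pagination links carry the page number in a hypothetical `page=N` query parameter, and it works on raw HTML with the standard library instead of on the HTMLElement container.

```python
import re


def guess_max_page(html):
    """Return the highest page number found in pagination links, or 1."""
    # ASSUMPTION: page numbers appear as `page=N` inside href attributes.
    found = re.findall(r'href="[^"]*[?&;]page=(\d+)', html)
    numbers = [int(n) for n in found]
    # A listing without pagination links is treated as a single page.
    return max(numbers, default=1)


listing = '<a href="/books?page=2">2</a> <a href="/books?page=7">7</a>'
print(guess_max_page(listing))
```

Falling back to 1 keeps the caller's loop over pages valid even when a category fits on one page.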

Parse links to publication details from the page with the book list.

Parameters:dom (obj) – HTMLElement container of the page with book list.
Returns:List of strings / absolute links to book details.
Return type:list
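A rough sketch of this kind of link extraction, using only the standard library. The `book-detail` class name and the `https://example.com` base URL are placeholders invented for the illustration; the real page markup is not described here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://example.com"  # placeholder, not the real site address


class BookLinkParser(HTMLParser):
    """Collect absolute URLs of anchors that point to book details."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # ASSUMPTION: detail links look like <a class="book-detail" href="...">.
        if tag == "a" and attrs.get("class") == "book-detail" and attrs.get("href"):
            # urljoin turns relative hrefs into absolute links.
            self.links.append(urljoin(self.base_url, attrs["href"]))


def parse_book_links(html, base_url=BASE_URL):
    parser = BookLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

For example, `parse_book_links('<a class="book-detail" href="/kniha-1">A</a>')` yields a one-item list with the absolute link.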

Go through the category links and return a list of links to all publications in the given categories.

Parameters:links (list) – List of strings (absolute links to categories).
Returns:List of strings / absolute links to book details.
Return type:list
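The aggregation step might look roughly like the sketch below. The `fetch_links` callable is a hypothetical stand-in for downloading a category and parsing its book links, and the de-duplication is an assumption, since the description does not say whether duplicates are removed.

```python
def collect_book_links(category_links, fetch_links):
    """Return book links from every category, in first-seen order."""
    seen = set()
    result = []
    for category in category_links:
        # fetch_links is a hypothetical helper: category URL -> list of links.
        for link in fetch_links(category):
            # ASSUMPTION: a book listed in several categories is kept once.
            if link not in seen:
                seen.add(link)
                result.append(link)
    return result


fake_site = {
    "https://example.com/cat/a": ["https://example.com/b1", "https://example.com/b2"],
    "https://example.com/cat/b": ["https://example.com/b2", "https://example.com/b3"],
}
print(collect_book_links(list(fake_site), fake_site.__getitem__))
```

Passing the fetcher in as an argument keeps the sketch testable without network access.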
harvester.scrappers.zonerpress_cz._strip_content(el)

Call the .getContent() method of el and strip the surrounding whitespace. Return None if the content is -.

Parameters:el (obj) – HTMLElement instance.
Returns:Clean string.
Return type:str/None
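The behaviour described above can be sketched as follows. `FakeElement` is a hypothetical stand-in that only provides `getContent()`, and comparing with `-` after stripping is an assumption about the order of operations.

```python
class FakeElement:
    """Hypothetical stand-in for HTMLElement; only offers getContent()."""

    def __init__(self, content):
        self._content = content

    def getContent(self):
        return self._content


def strip_content(el):
    """Return the element's text without surrounding whitespace, or None."""
    content = el.getContent().strip()
    # ASSUMPTION: a lone dash marks a missing value on the page.
    return None if content == "-" else content


print(strip_content(FakeElement("  Some title \n")))
print(strip_content(FakeElement(" - ")))
```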
harvester.scrappers.zonerpress_cz._parse_authors(authors)

Parse information about the authors of the book.

Parameters:authors (obj) – HTMLElement containing the slice of the page with author details.
Returns:List of Author objects. Empty if no author is found.
Return type:list
harvester.scrappers.zonerpress_cz._process_book(link)

Download and parse the available information about a book from the publisher's web pages.

Parameters:link (str) – URL of the book on the publisher's web pages.
Returns:Publication instance with book details.
Return type:obj
harvester.scrappers.zonerpress_cz.get_publications()

Get a list of publications offered by zonerpress.cz.

Returns:List of structures.Publication objects.
Return type:list
harvester.scrappers.zonerpress_cz.self_test()

Perform a basic self-test.

Returns:True when everything is OK.
Return type:bool
Raises:AssertionError – When there is some problem.
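An assertion-based self-test with this contract could be shaped like the sketch below. The checks are illustrative assumptions, not the module's real ones, and the `fetch` parameter is a hypothetical hook that stands in for get_publications() so the example runs offline.

```python
def self_test_sketch(fetch):
    """Return True if fetch() yields usable data, else raise AssertionError."""
    publications = fetch()
    # ASSUMPTION: a healthy scraper returns at least one publication.
    assert publications, "scraper returned no publications"
    # ASSUMPTION: no entry in the result may be None.
    assert all(p is not None for p in publications), "empty publication found"
    return True


print(self_test_sketch(lambda: ["publication placeholder"]))
```

Returning True on success while raising on failure lets a caller treat the function both as a boolean check and as a test case.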