zonerpress_cz scrapper

Module for parsing information from zonerpress.cz.

harvester.scrappers.zonerpress_cz._get_max_page(dom)

Try to guess how many pages are in the book listing.

Parameters:	dom (obj) – HTMLElement container of the page with book list.
Returns:	Number of pages for given category.
Return type:	int
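A minimal sketch of one way such a guess could work. It is not the module's actual implementation: it assumes pagination links carry a `page=N` query parameter (a hypothetical URL scheme, not confirmed for zonerpress.cz) and works on a raw HTML string instead of an HTMLElement. The name `guess_max_page` is illustrative.

```python
import re


def guess_max_page(html):
    """Return the highest page number found in pagination links, or 1."""
    # Assumption: pagination links look like href="...?page=3" or
    # href="...&page=3"; the real site may use a different scheme.
    pages = [int(n) for n in re.findall(r'href="[^"]*[?&]page=(\d+)', html)]

    # A listing without pagination links is treated as a single page.
    return max(pages, default=1)
```

Falling back to 1 keeps the caller's page loop simple when a category fits on one page.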

harvester.scrappers.zonerpress_cz._parse_book_links(dom)

Parse links to publication details from a page with a book list.

Parameters:	dom (obj) – HTMLElement container of the page with book list.
Returns:	List of strings (absolute links to book details).
Return type:	list
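The idea of turning relative hrefs into absolute links can be sketched with the standard library alone. This is an illustration under stated assumptions, not the module's code: the CSS class `book-detail`, the names `BookLinkParser` and `parse_book_links`, and the use of `html.parser` instead of an HTMLElement tree are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BookLinkParser(HTMLParser):
    """Collect absolute hrefs of anchors marked with a given CSS class."""

    def __init__(self, base_url, css_class="book-detail"):
        super().__init__()
        self.base_url = base_url
        self.css_class = css_class  # hypothetical class name
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.css_class in classes and attrs.get("href"):
            # urljoin resolves relative hrefs against the listing URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))


def parse_book_links(html, base_url):
    """Return absolute links to book details found in the listing HTML."""
    parser = BookLinkParser(base_url)
    parser.feed(html)
    return parser.links
```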

harvester.scrappers.zonerpress_cz.get_book_links(links)

Go through the links to categories and return a list of links to all publications in all given categories.

Parameters:	links (list) – List of strings (absolute links to categories).
Returns:	List of strings (absolute links to book details).
Return type:	list
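A rough sketch of the aggregation step, assuming the per-category download and parsing is supplied as a callable. Both `collect_book_links` and `fetch_links` are hypothetical names, and the de-duplication shown here is an assumption about desirable behavior, not something the documentation above states.

```python
def collect_book_links(category_links, fetch_links):
    """Return book links from all categories, in order, without duplicates.

    fetch_links is a hypothetical callable mapping a category URL to a
    list of absolute book URLs.
    """
    seen = set()
    result = []

    for category in category_links:
        for link in fetch_links(category):
            # Assumption: a book may be listed in several categories.
            if link not in seen:
                seen.add(link)
                result.append(link)

    return result
```

Passing the fetcher in as an argument makes the loop testable without network access.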

harvester.scrappers.zonerpress_cz._strip_content(el)

Call the .getContent() method of el and strip whitespace. Return None if the content is -.
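The described behavior can be sketched as follows. `FakeElement` is a hypothetical stand-in for any object exposing `.getContent()`, and `strip_content` mirrors the description above rather than reproducing the module's source.

```python
class FakeElement:
    """Hypothetical stand-in for an element with a .getContent() method."""

    def __init__(self, content):
        self._content = content

    def getContent(self):
        return self._content


def strip_content(el):
    """Return stripped content of el, or None when the content is '-'."""
    content = el.getContent().strip()

    # A lone dash is treated as a placeholder for a missing value.
    return None if content == "-" else content
```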

harvester.scrappers.zonerpress_cz._parse_authors(authors)

Parse information about the authors of the book.

Parameters:	authors (obj) – HTMLElement containing slice of the page with details.
Returns:	List of `Author` objects. Empty if no author is found.
Return type:	list

harvester.scrappers.zonerpress_cz._process_book(link)

Download and parse available information about a book from the publisher's web pages.

Parameters:	link (str) – URL of the book on the publisher's web pages.
Returns:	`Publication` instance with book details.
Return type:	obj

harvester.scrappers.zonerpress_cz.get_publications()

Get the list of publications offered by zonerpress.cz.

Returns:	List of `structures.Publication` objects.
Return type:	list

harvester.scrappers.zonerpress_cz.self_test()

Perform a basic self-test.

Returns:	True when everything is OK.
Return type:	bool
Raises:	`AssertionError` – When there is a problem.