zonerpress_cz scrapper
Module for parsing information from zonerpress.cz.
- harvester.scrappers.zonerpress_cz._get_max_page(dom)
Try to guess how many pages there are in the book listing.
Parameters: dom (obj) – HTMLElement container of the page with the book list. Returns: Number of pages for the given category. Return type: int
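One way such a guess could work is to scan the pagination links and take the highest page number found. The sketch below is an illustration only, not the module's implementation: it works on a raw HTML string with the standard library instead of an HTMLElement, and the `?page=N` link format, `LinkCollector`, and `guess_max_page` are all hypothetical names and assumptions.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in the document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def guess_max_page(html):
    """Return the highest page number found in pagination links, or 1."""
    collector = LinkCollector()
    collector.feed(html)

    max_page = 1  # a listing without pagination is treated as one page
    for href in collector.hrefs:
        # ASSUMPTION: pagination links look like "...?page=N"; the real
        # site may use a different scheme.
        match = re.search(r"[?&]page=(\d+)", href)
        if match:
            max_page = max(max_page, int(match.group(1)))
    return max_page
```

Falling back to 1 when no pagination link is present keeps the caller's page loop simple, since every category then has at least one page to visit.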
- harvester.scrappers.zonerpress_cz._parse_book_links(dom)
Parse links to publication details from the page with the book list.
Parameters: dom (obj) – HTMLElement container of the page with the book list. Returns: List of strings (absolute links to book details). Return type: list
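The step of turning links found on a listing page into absolute URLs can be sketched with the standard library. This is a hedged illustration rather than the module's code: the `detail` CSS class used to recognise book links, the `BASE_URL` value, and the names `BookLinkCollector` and `parse_book_links` are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# ASSUMPTION: scheme and host are guessed from the site name only.
BASE_URL = "http://www.zonerpress.cz/"


class BookLinkCollector(HTMLParser):
    """Collect absolute URLs of anchors that look like book detail links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        href = attrs.get("href")
        # ASSUMPTION: detail links carry a hypothetical "detail" class.
        if href and "detail" in classes:
            # urljoin leaves absolute URLs untouched and resolves relative ones.
            self.links.append(urljoin(self.base_url, href))


def parse_book_links(html, base_url=BASE_URL):
    """Return a list of absolute links to book details found in html."""
    collector = BookLinkCollector(base_url)
    collector.feed(html)
    return collector.links
```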
- harvester.scrappers.zonerpress_cz.get_book_links(links)
Go through the links to categories and return a list of links to all publications in all given categories.
Parameters: links (list) – List of strings (absolute links to categories). Returns: List of strings (absolute links to book details). Return type: list
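The overall traversal (every category, every page of that category, every book link on the page) might be structured as in the following sketch. It is an assumption-laden outline, not the actual function: `collect_book_links` and its two callables, `get_max_page` and `parse_page`, are hypothetical stand-ins for the downloading and parsing helpers, and the de-duplication is an addition of this example.

```python
def collect_book_links(category_links, get_max_page, parse_page):
    """Gather book links from all pages of all given categories.

    get_max_page(category_url) -> int and
    parse_page(category_url, page_number) -> list of str
    are hypothetical callables injected by the caller, so the traversal
    can be exercised without network access.
    """
    book_links = []
    for category in category_links:
        # Pages are assumed to be numbered from 1 up to the guessed maximum.
        for page in range(1, get_max_page(category) + 1):
            for link in parse_page(category, page):
                # Skip books already seen in another category or page.
                if link not in book_links:
                    book_links.append(link)
    return book_links
```

Passing the helpers in as arguments is only a convenience for this illustration; it lets the loop be checked against canned data instead of live pages.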
- harvester.scrappers.zonerpress_cz._strip_content(el)
Call the .getContent() method of el and strip surrounding whitespace. Return None if the content is a single dash (-).
Parameters: el (obj) – HTMLElement instance. Returns: Clean string. Return type: str/None
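The described behaviour can be illustrated with a small self-contained sketch. `FakeElement` is a hypothetical stand-in that models only the `.getContent()` method mentioned above, and `strip_content` is an approximation written for this example, assuming the dash is compared after stripping.

```python
class FakeElement:
    """Hypothetical stand-in for an HTMLElement; only getContent() is modelled."""

    def __init__(self, content):
        self._content = content

    def getContent(self):
        return self._content


def strip_content(el):
    """Return the stripped content of el, or None if it is just a dash."""
    content = el.getContent().strip()
    # ASSUMPTION: a lone "-" marks a missing value on the page.
    return None if content == "-" else content
```

For example, `strip_content(FakeElement("  Foo \n"))` gives `"Foo"`, while `strip_content(FakeElement(" - "))` gives `None`.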
Parse information about the authors of the book.
Parameters: dom (obj) – HTMLElement containing the slice of the page with details. Returns: List of Author objects. Empty if no author is found. Return type: list
- harvester.scrappers.zonerpress_cz._process_book(link)
Download and parse the available information about a book from the publisher's web pages.
Parameters: link (str) – URL of the book on the publisher's web pages. Returns: Publication instance with book details. Return type: obj