ben.cz scrapper
This module downloads the last 100 books published by ben.cz.
- harvester.scrappers.ben_cz.URL = 'http://shop.ben.cz/Produkty.aspx?lang=cz&nak=BEN+-+technick%u00e1+literatura'
Base URL of the e-shop.
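The nak parameter in this URL uses the non-standard %uXXXX escape (here %u00e1 for "á"), which Python's urllib.parse leaves undecoded. A minimal sketch of reading the query string, assuming Python 3; the helper name decode_percent_u is hypothetical and not part of the module:

```python
import re
from urllib.parse import parse_qs, urlsplit

URL = ("http://shop.ben.cz/Produkty.aspx?lang=cz"
       "&nak=BEN+-+technick%u00e1+literatura")


def decode_percent_u(text):
    """Replace non-standard %uXXXX escapes with the matching character."""
    return re.sub(
        r"%u([0-9a-fA-F]{4})",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )


# Decode the %uXXXX escapes first, then let parse_qs handle "+" and "&".
query = decode_percent_u(urlsplit(URL).query)
params = parse_qs(query)

print(params["lang"][0])  # cz
print(params["nak"][0])   # BEN - technická literatura
```

Decoding the escapes before calling parse_qs keeps the standard library responsible for the rest of the query-string handling.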
- harvester.scrappers.ben_cz._get_last_td(el)
Return the last <td> found in the el DOM.
Parameters: el (obj) – dhtmlparser.HTMLElement instance.
Returns: HTMLElement instance if found, or None if there are no <td> tags.
Return type: obj
- harvester.scrappers.ben_cz._get_td_or_none(details, ID)
Get the <tr> tag with the given ID and return the content of the last <td> tag under that <tr>.
Parameters: - details (obj) – dhtmlparser.HTMLElement instance.
- ID (str) – id property of the <tr> tag.
Returns: Content of the last <td> as string.
Return type: str
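The two helpers above can be illustrated with a short sketch. It is not the module's implementation: it uses the standard library's xml.etree.ElementTree as a stand-in for dhtmlparser so that it runs without extra dependencies, and the sample markup, row ids and function names are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample of a details table.
SAMPLE = """
<table>
  <tr id="row_empty"></tr>
  <tr id="row_publisher"><td>Publisher</td><td>Example Press</td></tr>
</table>
"""


def get_last_td(el):
    """Return the last <td> under `el`, or None if there is none."""
    cells = el.findall(".//td")
    return cells[-1] if cells else None


def get_td_or_none(details, tr_id):
    """Return the text of the last <td> in the <tr> with the given id."""
    row = details.find(".//tr[@id='{}']".format(tr_id))
    if row is None:
        return None
    cell = get_last_td(row)
    if cell is None:
        return None
    return "".join(cell.itertext()).strip()


details = ET.fromstring(SAMPLE)
print(get_td_or_none(details, "row_publisher"))  # Example Press
print(get_td_or_none(details, "row_missing"))    # None
```

Returning None for a missing row or cell lets callers treat every optional field the same way.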
- harvester.scrappers.ben_cz._parse_title(dom, details)
Parse title/name of the book.
Parameters: - dom (obj) – HTMLElement containing whole HTML page.
- details (obj) – HTMLElement containing slice of the page with details.
Returns: Book’s title.
Return type: str
Raises: AssertionError – If title not found.
Parse authors of the book.
Parameters: details (obj) – HTMLElement containing slice of the page with details.
Returns: List of structures.Author objects. Blank if no author found.
Return type: list
- harvester.scrappers.ben_cz._parse_publisher(details)
Parse publisher of the book.
Parameters: details (obj) – HTMLElement containing slice of the page with details.
Returns: Publisher’s name as string or None if not found.
Return type: str/None
- harvester.scrappers.ben_cz._parse_price(details)
Parse price of the book.
Parameters: details (obj) – HTMLElement containing slice of the page with details.
Returns: Price as string with currency or None if not found.
Return type: str/None
- harvester.scrappers.ben_cz._parse_pages_binding(details)
Parse number of pages and binding of the book.
Parameters: details (obj) – HTMLElement containing slice of the page with details.
Returns: Tuple of two strings, or of two None values if not found.
Return type: (pages, binding)
- harvester.scrappers.ben_cz._parse_ISBN_EAN(details)
Parse ISBN and EAN.
Parameters: details (obj) – HTMLElement containing slice of the page with details.
Returns: Tuple of two strings, or of two None values if not found.
Return type: (ISBN, EAN)
- harvester.scrappers.ben_cz._parse_edition(details)
Parse edition (vydání) of the book.
Parameters: details (obj) – HTMLElement containing slice of the page with details.
Returns: Edition as string or None if not found.
Return type: str/None
- harvester.scrappers.ben_cz._parse_description(details)
Parse description of the book.
Parameters: details (obj) – HTMLElement containing slice of the page with details.
Returns: Description as string or None if not found.
Return type: str/None
- harvester.scrappers.ben_cz._process_book(book_url)
Parse available information about the book from the book details page.
Parameters: book_url (str) – Absolute URL of the book.
Returns: structures.Publication instance with book details.
Return type: obj
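A rough sketch of how the values returned by the _parse_* helpers might be combined into one record. The Publication namedtuple below is a hypothetical stand-in for structures.Publication, whose real fields are not shown on this page, and build_publication is an illustrative name rather than part of the module.

```python
from collections import namedtuple

# Hypothetical stand-in for structures.Publication; the field names are
# assumptions derived from the parser functions listed in this module.
Publication = namedtuple(
    "Publication",
    "url title authors publisher price pages binding "
    "isbn ean edition description",
)


def build_publication(url, title, authors=(), publisher=None, price=None,
                      pages_binding=(None, None), isbn_ean=(None, None),
                      edition=None, description=None):
    """Combine already-parsed values into a single record."""
    # Mirrors the documented contract: a missing title is an AssertionError.
    assert title, "title not found"
    pages, binding = pages_binding  # two strings, or (None, None)
    isbn, ean = isbn_ean            # two strings, or (None, None)
    return Publication(url, title, list(authors), publisher, price,
                       pages, binding, isbn, ean, edition, description)


# Placeholder values for illustration only, not real catalogue data.
book = build_publication(
    "http://example.com/book",
    "Example title",
    pages_binding=("200", "paperback"),
)
print(book.pages, book.binding, book.publisher)  # 200 paperback None
```

Unpacking the two-element tuples at one point keeps the "two strings or two None" contract visible to the caller.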