cpress.cz scraper
This module downloads book metadata from cpress.cz.
- harvester.scrappers.cpress_cz._parse_alt_title(html_chunk)
Parse title from alternative location if not found where it should be.
Parameters: html_chunk (obj) – HTMLElement containing slice of the page with details. Returns: Book’s title. Return type: str
- harvester.scrappers.cpress_cz._parse_alt_url(html_chunk)
Parse URL from alternative location if not found where it should be.
Parameters: html_chunk (obj) – HTMLElement containing slice of the page with details. Returns: Book’s URL. Return type: str
- harvester.scrappers.cpress_cz._parse_title_url(html_chunk)
Parse title/name of the book and URL of the book.
Parameters: html_chunk (obj) – HTMLElement containing slice of the page with details. Returns: (title, url), both as strings. Return type: tuple
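The two alternative-location helpers above suggest a fallback chain: look in the usual place first, then in the alternative one. The following is a minimal sketch of that pattern only. The names `first_found`, `primary_title` and `alt_title` are hypothetical, and a plain dict stands in for the HTMLElement, so this should not be read as the module's actual implementation.

```python
def first_found(html_chunk, *parsers):
    """Return the first truthy result from the given parsers, else None."""
    for parser in parsers:
        result = parser(html_chunk)
        if result:
            return result
    return None


# Hypothetical stand-ins: a dict plays the role of the parsed page slice.
def primary_title(chunk):
    return chunk.get("h1")


def alt_title(chunk):
    return chunk.get("img_alt")


# The primary location is empty here, so the alternative one is used.
chunk = {"h1": None, "img_alt": "Example Book"}
title = first_found(chunk, primary_title, alt_title)
```

Keeping the parsers as separate callables means each location can be tested on its own, and the order of attempts stays visible at the call site.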
Parse authors of the book.
Parameters: html_chunk (obj) – HTMLElement containing slice of the page with details. Returns: List of structures.Author objects. Blank if no author found. Return type: list
- harvester.scrappers.cpress_cz._parse_price(html_chunk)
Parse price of the book.
Parameters: html_chunk (obj) – HTMLElement containing slice of the page with details. Returns: Price as string with currency or None if not found. Return type: str/None
- harvester.scrappers.cpress_cz._parse_from_table(html_chunk, what)
Go through the table data in html_chunk and try to locate the content of the cell neighboring the cell that contains what.
Returns: Table data or None. Return type: str/None
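The neighbor-cell lookup can be illustrated with a small standalone sketch. It assumes raw HTML text as input and uses the standard library's `html.parser` rather than whatever HTMLElement type the module works with. `find_neighbor_cell` and `_CellCollector` are hypothetical names, and the sample table is made up for illustration.

```python
from html.parser import HTMLParser


class _CellCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def find_neighbor_cell(html, what):
    """Return the text of the cell following the one containing `what`."""
    collector = _CellCollector()
    collector.feed(html)
    collector.close()
    cells = [cell.strip() for cell in collector.cells]
    # Pair each cell with its successor. This simplification ignores row
    # boundaries, so it suits two-column label/value tables.
    for label, value in zip(cells, cells[1:]):
        if what in label:
            return value or None
    return None


# Made-up sample data, not taken from cpress.cz.
sample = (
    "<table>"
    "<tr><td>EAN:</td><td>1234567890123</td></tr>"
    "<tr><td>Format:</td><td>paperback</td></tr>"
    "</table>"
)
ean = find_neighbor_cell(sample, "EAN")
```

Returning None for a missing label lets callers treat absent table rows the same way as absent fields elsewhere on the page.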
- harvester.scrappers.cpress_cz._parse_ean(html_chunk)
Parse EAN.
Parameters: html_chunk (obj) – HTMLElement containing slice of the page with details. Returns: EAN as string or None if not found. Return type: str/None
- harvester.scrappers.cpress_cz._parse_date(html_chunk)
Parse date.
Parameters: html_chunk (obj) – HTMLElement containing slice of the page with details. Returns: Date as string or None if not found. Return type: str/None
- harvester.scrappers.cpress_cz._parse_format(html_chunk)
Parse format.
Parameters: html_chunk (obj) – HTMLElement containing slice of the page with details. Returns: Format as string or None if not found. Return type: str/None
- harvester.scrappers.cpress_cz._parse_description(html_chunk)
Parse description of the book.
Parameters: html_chunk (obj) – HTMLElement containing slice of the page with details. Returns: Description as string or None if not found. Return type: str/None
- harvester.scrappers.cpress_cz._process_book(html_chunk)
Parse the available information about a book from the book details page.
Parameters: html_chunk (obj) – HTMLElement containing slice of the page with details. Returns: structures.Publication instance with book details. Return type: obj
- harvester.scrappers.cpress_cz.get_publications()
Get a list of publications offered by cpress.cz.
Returns: List of Publication objects. Return type: list