cpress.cz scraper

This module downloads book metadata from cpress.cz.

harvester.scrappers.cpress_cz._parse_alt_title(html_chunk)

Parse the title from an alternative location when it is not found in the expected place.

Parameters:	html_chunk (obj) – HTMLElement containing slice of the page with details.
Returns:	Book’s title.
Return type:	str

harvester.scrappers.cpress_cz._parse_alt_url(html_chunk)

Parse the URL from an alternative location when it is not found in the expected place.

Parameters:	html_chunk (obj) – HTMLElement containing slice of the page with details.
Returns:	Book’s URL.
Return type:	str

harvester.scrappers.cpress_cz._parse_title_url(html_chunk)

Parse title/name of the book and URL of the book.

Parameters:	html_chunk (obj) – HTMLElement containing slice of the page with details.
Returns:	(title, url), both as strings.
Return type:	tuple
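The two `_parse_alt_*` helpers suggest a fallback pattern: try the expected location first, then an alternative one. The following is a minimal sketch of that pattern only, not the module's implementation. The names `first_found`, `primary_title` and `alt_title` are hypothetical, and plain strings stand in for the `HTMLElement` objects that the real functions receive.

```python
from typing import Callable, Optional


def first_found(chunk: str, *parsers: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the first non-empty result produced by the given parsers."""
    for parser in parsers:
        value = parser(chunk)
        if value:
            return value.strip()
    return None


# Hypothetical stand-in for the parser that looks in the expected place.
def primary_title(chunk: str) -> Optional[str]:
    marker = "<h1>"
    start = chunk.find(marker)
    if start < 0:
        return None
    end = chunk.find("</h1>", start)
    return chunk[start + len(marker):end] if end >= 0 else None


# Hypothetical stand-in for the parser that looks in an alternative place.
def alt_title(chunk: str) -> Optional[str]:
    marker = 'title="'
    start = chunk.find(marker)
    if start < 0:
        return None
    start += len(marker)
    end = chunk.find('"', start)
    return chunk[start:end] if end >= 0 else None
```

Calling `first_found(chunk, primary_title, alt_title)` only consults `alt_title` when `primary_title` finds nothing, which keeps the ordering of the fallbacks explicit in one place.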

harvester.scrappers.cpress_cz._parse_authors(html_chunk)

Parse authors of the book.

Parameters:	html_chunk (obj) – HTMLElement containing slice of the page with details.
Returns:	List of `structures.Author` objects. Empty list if no author is found.
Return type:	list

harvester.scrappers.cpress_cz._parse_price(html_chunk)

Parse price of the book.

Parameters:	html_chunk (obj) – HTMLElement containing slice of the page with details.
Returns:	Price as string with currency or None if not found.
Return type:	str/None

harvester.scrappers.cpress_cz._parse_from_table(html_chunk, what)

Go through the table data in html_chunk and try to locate the content of the cell neighboring the cell that contains what.

Returns:	Content of the neighboring table cell, or None if not found.
Return type:	str/None
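The neighbor-cell lookup can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the module's code: it takes an HTML string instead of an `HTMLElement`, the names `_CellCollector` and `neighbor_cell` are hypothetical, and "neighbor" is assumed to mean the next cell in the same row.

```python
from html.parser import HTMLParser
from typing import List, Optional


class _CellCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate cells that appear outside a <tr>
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def neighbor_cell(html: str, what: str) -> Optional[str]:
    """Return the text of the cell right after the first cell containing `what`."""
    collector = _CellCollector()
    collector.feed(html)
    collector.close()
    for row in collector.rows:
        # The last cell of a row has no right-hand neighbor, so skip it.
        for index, cell in enumerate(row[:-1]):
            if what in cell:
                return row[index + 1].strip() or None
    return None
```

With a two-column details table, `neighbor_cell(html, "EAN")` would return the value cell next to the label cell, and `None` when no label matches. Helpers such as `_parse_ean`, `_parse_date` and `_parse_format` could plausibly be thin wrappers around a lookup of this kind, although the source does not confirm that.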

harvester.scrappers.cpress_cz._parse_ean(html_chunk)

Parse EAN.

Parameters:	html_chunk (obj) – HTMLElement containing slice of the page with details.
Returns:	EAN as string or None if not found.
Return type:	str/None

harvester.scrappers.cpress_cz._parse_date(html_chunk)

Parse date.

Parameters:	html_chunk (obj) – HTMLElement containing slice of the page with details.
Returns:	Date as string or None if not found.
Return type:	str/None

harvester.scrappers.cpress_cz._parse_format(html_chunk)

Parse format.

Parameters:	html_chunk (obj) – HTMLElement containing slice of the page with details.
Returns:	Format as string or None if not found.
Return type:	str/None

harvester.scrappers.cpress_cz._parse_description(html_chunk)

Parse description of the book.

Parameters:	html_chunk (obj) – HTMLElement containing slice of the page with details.
Returns:	Description as string or None if not found.
Return type:	str/None

harvester.scrappers.cpress_cz._process_book(html_chunk)

Parse available information about the book from the book details page.

Parameters:	html_chunk (obj) – HTMLElement containing slice of the page with details.
Returns:	`structures.Publication` instance with book details.
Return type:	obj

harvester.scrappers.cpress_cz.get_publications()

Get the list of publications offered by cpress.cz.

Returns:	List of `Publication` objects.
Return type:	list

harvester.scrappers.cpress_cz.self_test()

Perform a basic self-test.

Returns:	True when everything is OK.
Return type:	bool
Raises:	`AssertionError` – When there is some problem.
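The "return True or raise `AssertionError`" contract can be sketched as follows. Which checks the real `self_test()` performs is not stated here, so the assertions below are assumptions. The name `self_test_sketch` is hypothetical, and plain dictionaries stand in for `Publication` objects.

```python
from typing import Iterable, Mapping


def self_test_sketch(publications: Iterable[Mapping[str, str]]) -> bool:
    """Return True if the sample looks sane, otherwise raise AssertionError."""
    publications = list(publications)

    # Assumed check: the scraper should find at least one publication.
    assert publications, "no publications were parsed"

    # Assumed check: every publication carries a title and a URL.
    for pub in publications:
        assert pub.get("title"), "publication without a title"
        assert pub.get("url"), "publication without a URL"

    return True
```

A caller can treat any `AssertionError` as a signal that the page layout may have changed and the parsers need attention.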