ben.cz scrapper

This module downloads the last 100 books published by ben.cz.

harvester.scrappers.ben_cz.URL = 'http://shop.ben.cz/Produkty.aspx?lang=cz&nak=BEN+-+technick%u00e1+literatura'

Base URL of the e-shop.

harvester.scrappers.ben_cz._get_last_td(el)

Return the last <td> found in the el DOM.

Parameters:	el (obj) – `dhtmlparser.HTMLElement` instance.
Returns:	HTMLElement instance if found, or None if there are no <td> tags.
Return type:	obj

harvester.scrappers.ben_cz._get_td_or_none(details, ID)

Get the <tr> tag with the given ID and return the content of the last <td> tag under that <tr>.

Parameters:	details (obj) – `dhtmlparser.HTMLElement` instance.
	ID (str) – id attribute of the <tr> tag.
Returns:	Content of the last <td> as string.
Return type:	str
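
The lookup described above (find the <tr> with a given id, then take the content of its last <td>) can be sketched without the module itself. This is a minimal illustration built on the standard library's `html.parser`, not the `dhtmlparser`-based code the module uses; `last_td_content` and `_LastTdFinder` are hypothetical names, and returning None for a missing row is an assumption suggested by the name `_get_td_or_none`.

```python
from html.parser import HTMLParser


class _LastTdFinder(HTMLParser):
    """Collect the text of every <td> inside the <tr> with a given id."""

    def __init__(self, row_id):
        super().__init__()
        self.row_id = row_id
        self.in_row = False  # True while inside the matching <tr>
        self.in_td = False   # True while inside a <td> of that row
        self.cells = []      # text of each <td> seen in the row so far

    def handle_starttag(self, tag, attrs):
        if tag == "tr" and dict(attrs).get("id") == self.row_id:
            self.in_row = True
        elif tag == "td" and self.in_row:
            self.in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False
        elif tag == "tr":
            self.in_row = False

    def handle_data(self, data):
        if self.in_td:
            self.cells[-1] += data


def last_td_content(html, row_id):
    """Return the stripped text of the last <td> in the row, or None."""
    finder = _LastTdFinder(row_id)
    finder.feed(html)
    return finder.cells[-1].strip() if finder.cells else None
```

The sketch ignores nested tables and keeps only text nodes, which is enough to show the "last cell holds the value" layout the helpers rely on.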

harvester.scrappers.ben_cz._parse_title(dom, details)

Parse the title/name of the book.

Parameters:	dom (obj) – HTMLElement containing the whole HTML page.
	details (obj) – HTMLElement containing the slice of the page with details.
Returns:	Book’s title.
Return type:	str
Raises:	`AssertionError` – If the title is not found.

harvester.scrappers.ben_cz._parse_authors(details)

Parse the authors of the book.

Parameters:	details (obj) – HTMLElement containing the slice of the page with details.
Returns:	List of `structures.Author` objects. Empty if no author is found.
Return type:	list

harvester.scrappers.ben_cz._parse_publisher(details)

Parse the publisher of the book.

Parameters:	details (obj) – HTMLElement containing the slice of the page with details.
Returns:	Publisher’s name as a string, or None if not found.
Return type:	str/None

harvester.scrappers.ben_cz._parse_price(details)

Parse the price of the book.

Parameters:	details (obj) – HTMLElement containing the slice of the page with details.
Returns:	Price as a string including the currency, or None if not found.
Return type:	str/None

harvester.scrappers.ben_cz._parse_pages_binding(details)

Parse the number of pages and the binding of the book.

Parameters:	details (obj) – HTMLElement containing the slice of the page with details.
Returns:	Tuple of two strings, or of two None values if not found.
Return type:	(pages, binding)

harvester.scrappers.ben_cz._parse_ISBN_EAN(details)

Parse the ISBN and EAN of the book.

Parameters:	details (obj) – HTMLElement containing the slice of the page with details.
Returns:	Tuple of two strings, or of two None values if not found.
Return type:	(ISBN, EAN)
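
Both `_parse_pages_binding()` and `_parse_ISBN_EAN()` are documented as returning either two strings or two None values, so callers can always unpack the result into two names. The following is only a sketch of that return contract; `split_pair` is a hypothetical helper, and the comma separator and the placeholder input are assumptions, not the format actually used on ben.cz.

```python
def split_pair(raw, sep=","):
    """Split ``raw`` into two stripped strings, or return (None, None).

    The separator is an assumption made for this illustration.
    """
    if not raw or sep not in raw:
        return None, None

    first, second = raw.split(sep, 1)
    return first.strip(), second.strip()


# Unpacking works the same way whether or not the values were found.
isbn, ean = split_pair("ISBN-VALUE, EAN-VALUE")  # placeholder input
missing_isbn, missing_ean = split_pair(None)
```

Returning a pair of None values instead of a single None keeps the unpacking at the call site free of special cases.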

harvester.scrappers.ben_cz._parse_edition(details)

Parse the edition (vydání) of the book.

Parameters:	details (obj) – HTMLElement containing the slice of the page with details.
Returns:	Edition as a string, or None if not found.
Return type:	str/None

harvester.scrappers.ben_cz._parse_description(details)

Parse the description of the book.

Parameters:	details (obj) – HTMLElement containing the slice of the page with details.
Returns:	Description as a string, or None if not found.
Return type:	str/None

harvester.scrappers.ben_cz._process_book(book_url)

Parse the available information about the book from the book details page.

Parameters:	book_url (str) – Absolute URL of the book.
Returns:	`structures.Publication` instance with book details.
Return type:	obj

harvester.scrappers.ben_cz.get_publications()

Get a list of publications offered by ben.cz.

Returns:	List of `structures.Publication` objects.
Return type:	list

harvester.scrappers.ben_cz.self_test()

Perform a basic self-test.

Returns:	True when everything is OK.
Return type:	bool
Raises:	`AssertionError` – When a problem is detected.
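
The "return True or raise `AssertionError`" contract can be illustrated with a small stand-alone check. This is a sketch under stated assumptions, not the module's actual `self_test()`: `check_publications` and `FakePublication` are hypothetical names, and the only thing verified is that every item carries a non-empty `title`, which mirrors the fact that `_parse_title()` treats a missing title as an error.

```python
from collections import namedtuple

# Hypothetical stand-in for `structures.Publication`; only `title` is used.
FakePublication = namedtuple("FakePublication", "title")


def check_publications(publications):
    """Return True if the list looks sane, raise AssertionError otherwise."""
    assert publications, "no publications were returned"

    for pub in publications:
        # A publication without a title is treated as a parsing failure.
        assert getattr(pub, "title", None), "publication without a title"

    return True
```

A caller could pass the result of `get_publications()` to such a check and let any `AssertionError` propagate to the test runner.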