content_parser

class content_parser.rule.Rule(path)

Class that helps extract data from HTML using rules written in the Parslepy rule description language.

path

a string path to the rule file

Rule.get()

Gets the rule, caching the result so that later calls reuse it.

Returns

a dictionary of rules describing how to extract data
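
The caching described for get() can be sketched as follows. This is a minimal illustration, not the actual implementation: `CachedRule` is a hypothetical name, and the sketch assumes the rule file is JSON, which the documentation above does not state.

```python
import json


class CachedRule:
    """Hypothetical stand-in illustrating the caching behaviour of Rule.get()."""

    def __init__(self, path):
        self.path = path      # string path to the rule file
        self._rules = None    # filled in on the first call to get()

    def get(self):
        # Read and decode the file only once; later calls reuse the dictionary.
        if self._rules is None:
            with open(self.path, encoding="utf-8") as rule_file:
                self._rules = json.load(rule_file)  # assumption: JSON rule file
        return self._rules
```

Because the dictionary is kept on the instance, repeated calls return the same object and do not touch the file again.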

Rule.parse(html)

Extracts data from HTML using the extraction rule.

Parameters

html – a string containing HTML

Returns

a dictionary with the data extracted from the HTML
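
A rough sketch of how parse() could hand a rule dictionary to parslepy is shown here. It assumes the rules are valid input for `parslepy.Parselet`. The names `parslepy_extractor` and `extract_with_rules`, and the `extractor` parameter, are hypothetical and exist only so the sketch can be exercised without the library installed.

```python
def parslepy_extractor(rules, html):
    # Imported lazily: parslepy is a third-party package and must be installed.
    import parslepy

    # Parselet compiles the rule dictionary; parse_fromstring applies it
    # to an HTML string and returns the extracted structure.
    return parslepy.Parselet(rules).parse_fromstring(html)


def extract_with_rules(rules, html, extractor=parslepy_extractor):
    """Hypothetical helper: apply extraction rules to an HTML string."""
    # The extractor is injectable so another backend (or a test stub)
    # can replace parslepy without changing the caller.
    return dict(extractor(rules, html))
```

With parslepy installed, a call such as `extract_with_rules({"title": "h1"}, "<h1>Hello</h1>")` would run the rule against the given HTML string.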

class content_parser.spiders.common_scraper.CommonScrapper(**kwargs)

Class that extracts data from HTML. It is part of the Scrapy framework.

rule_path

a string path to the rule file

schema_path

a string path to the schema file that describes the domain