pywikitools.htmltools.beautify_html
Module Contents
Classes
BeautifyHTML: Take the original HTML coming from MediaWiki and remove unnecessary tags or attributes.
- class pywikitools.htmltools.beautify_html.BeautifyHTML(img_src_base: str = '/files/', change_hrefs: Dict[str, str] = None, img_src_rewrite: Dict[str, str] = None)
Take the original HTML coming from MediaWiki and remove unnecessary tags or attributes.
This involves removing comments, removing some CSS classes and rewriting the src attribute of <img> elements so that the resulting HTML can be used elsewhere.
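The clean-up described above can be illustrated with a minimal, standalone sketch. It is not the BeautifyHTML implementation: the standard-library html.parser is swapped in here so the example runs without extra packages, whereas the real class works on a BeautifulSoup tree. The names CleanupSketch and clean_html are hypothetical, and the rule "new src = img_src_base + file name" is an assumption about what the src rewriting does. Removal of CSS classes and the change_hrefs replacements are left out.

```python
from html import escape
from html.parser import HTMLParser
from posixpath import basename
from typing import List, Optional, Tuple

Attrs = List[Tuple[str, Optional[str]]]


class CleanupSketch(HTMLParser):
    """Hypothetical stand-in: re-emits HTML without comments or srcset."""

    def __init__(self, img_src_base: str = "/files/") -> None:
        # Keep character references as written so they can be re-emitted.
        super().__init__(convert_charrefs=False)
        self.img_src_base = img_src_base
        self.parts: List[str] = []

    def _emit_tag(self, tag: str, attrs: Attrs, closing: str = ">") -> None:
        rendered = []
        for name, value in attrs:
            if tag == "img" and name == "srcset":
                continue  # drop srcset, it is not needed
            if tag == "img" and name == "src" and value:
                # Assumption: keep only the file name, prefixed with the base
                value = self.img_src_base + basename(value)
            rendered.append(
                name if value is None else f'{name}="{escape(value)}"'
            )
        self.parts.append("<" + " ".join([tag] + rendered) + closing)

    def handle_starttag(self, tag: str, attrs: Attrs) -> None:
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag: str, attrs: Attrs) -> None:
        self._emit_tag(tag, attrs, "/>")

    def handle_endtag(self, tag: str) -> None:
        self.parts.append(f"</{tag}>")

    def handle_data(self, data: str) -> None:
        self.parts.append(data)

    def handle_entityref(self, name: str) -> None:
        self.parts.append(f"&{name};")

    def handle_charref(self, name: str) -> None:
        self.parts.append(f"&#{name};")

    # handle_comment is deliberately not overridden: the inherited
    # implementation does nothing, so comments are dropped from the output.


def clean_html(text: str, img_src_base: str = "/files/") -> str:
    parser = CleanupSketch(img_src_base)
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)
```

With this sketch, an <img> whose src is "/mediawiki/images/a/ab/Hand.png" comes out with src="/files/Hand.png" and without its srcset attribute, and any HTML comments in the input are gone.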
- process_html(self, text: str) → str
Entry function: expects input from fortraininglib.get_page_html() and returns improved HTML.
TODO: For English pages, use fortraininglib.get_page_html("Prayer/en"). Don't use fortraininglib.get_page_html("Prayer"), as we would then need to remove the [edit] sections.
TODO: Think of a better architecture?
- img_rewrite_handler(self, element)
Do some rewriting of <img> elements
In our default implementation we remove the srcset attribute (as we don’t need it) and apply replacements for the src attribute.
You can customize the behaviour by subclassing BeautifyHTML and overriding this method.
@param element: Part of the BeautifulSoup data structure; it is modified in place.
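A minimal sketch of such a subclass follows, under stated assumptions. The class name LazyImageBeautifier, the helper make_lazy and the choice of adding loading="lazy" are all hypothetical and only illustrate the override mechanism. The element is assumed to be a BeautifulSoup Tag, whose attrs dictionary holds the attributes. The fallback stub exists only so that the sketch runs where pywikitools is not installed; it is not the real implementation.

```python
from typing import Dict, MutableMapping, Optional

try:
    from pywikitools.htmltools.beautify_html import BeautifyHTML
except ImportError:
    class BeautifyHTML:  # stand-in stub, NOT the real implementation
        def __init__(self, img_src_base: str = '/files/',
                     change_hrefs: Optional[Dict[str, str]] = None,
                     img_src_rewrite: Optional[Dict[str, str]] = None):
            self.img_src_base = img_src_base

        def img_rewrite_handler(self, element):
            # The stub only mimics the documented removal of srcset.
            element.attrs.pop("srcset", None)


def make_lazy(attrs: MutableMapping[str, str]) -> None:
    """Hypothetical helper: drop srcset and mark the image as lazy-loading."""
    attrs.pop("srcset", None)
    attrs["loading"] = "lazy"


class LazyImageBeautifier(BeautifyHTML):
    """Hypothetical subclass that extends the default <img> rewriting."""

    def img_rewrite_handler(self, element):
        # Run the default rewriting first so src replacements still apply.
        super().img_rewrite_handler(element)
        # Assumption: element is a bs4 Tag, so element.attrs is a dict.
        make_lazy(element.attrs)
```

Calling super().img_rewrite_handler() first keeps the default behaviour and only adds to it; leaving that call out would replace the default rewriting entirely, including the src replacements.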