Parsing HTML from Microsoft Products (Like FrontPage, etc.)
Ugh. When you try to parse MS-generated HTML, you find some extension syntax that is completely befuddling.
I've tried a few approaches in the past, and none of them worked particularly well.
While reading a file recently, I found that even Beautiful Soup was unable to parse or prettify it.
The document was …