S.Lott -- Software Architect

Spreadsheet Regrets

Date Tue 08 October 2019 Tags #python / data conversion / beautiful soup / spreadsheet

I can't emphasize this enough.

Some people, when confronted with a problem, think

“I know, I'll use a spreadsheet.” Now they have two problems.

(This was originally about regular expressions. And AWK. See http://regex.info/blog/2006-09-15/247)

Fiction writer F. L. Stevens got a list of literary agents …


Another HTML Cleanup

Date Tue 10 November 2009 Tags HTML / #python / beautiful soup

Browsers are required to skip over bad HTML and render something.

Consequently, many web sites have significant HTML errors that don't show up until you try to scrape their content.

Beautiful Soup has a handy hook for doing markup massage prior to parsing. This is a way of fixing site-specific …
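The general idea of massaging markup before parsing can be sketched with the standard library alone. This is only an illustration, not Beautiful Soup's own hook: the `MASSAGE` list, the two example patterns, and the `massage` and `visible_text` helpers are all hypothetical names chosen for this sketch.

```python
import re
from html.parser import HTMLParser

# Hypothetical site-specific fixes: (compiled pattern, replacement function) pairs.
MASSAGE = [
    # Repair a malformed comment opener such as "<!-oops" into "<!--oops".
    (re.compile(r"<!-([^-])"), lambda m: "<!--" + m.group(1)),
    # Put a space before the slash in tags written like "<br/>".
    (re.compile(r"<([^<>\s/]+)/>"), lambda m: "<" + m.group(1) + " />"),
]


def massage(markup: str) -> str:
    """Apply each fix in order and return the cleaned-up markup."""
    for pattern, fix in MASSAGE:
        markup = pattern.sub(fix, markup)
    return markup


class TextCollector(HTMLParser):
    """Collect the non-blank text chunks seen while parsing."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def visible_text(markup: str) -> list:
    """Massage the markup first, then parse it and return its text chunks."""
    parser = TextCollector()
    parser.feed(massage(markup))
    parser.close()
    return parser.chunks
```

Keeping the fixes in a plain list makes it easy to add one more pattern each time a site turns up a new kind of broken markup.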


Parsing HTML from Microsoft Products (Like Front Page, etc.)

Date Fri 06 November 2009 Tags HTML / #python / beautiful soup

Ugh. When you try to parse MS-generated HTML, you find some extension syntax that is completely befuddling.

I've tried a few things in the past; none were particularly good.

In reading a file recently, I found that even Beautiful Soup was unable to prettify or parse it.

The document was …


What I love about Python == what I hate about the world of open source

Date Wed 04 July 2007 Tags open-source / Python / Community / HTML / Beautiful Soup

The problem with Python is the vastness of the Open Source community. You may think you have something cool for HTML parsing, but then someone tells you about Beautiful Soup, which already does it.

In my defense, I actually did a version of this HTML parsing back in '02. Indeed …


What I love about Python == What I hate about the HTML mixed-content model

Date Tue 03 July 2007 Tags open-source / Python / HTML / Beautiful Soup

The mixed content model, defined succinctly in the XML standards, is pleasant enough for human communication, but leaves a lot to be desired. For example, mapping a mixed content model to a relational database is a hard problem.
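To make the mixed content model concrete, here is a minimal sketch using the standard library's `xml.etree.ElementTree`. The `mixed_content` helper and the sample paragraph are assumptions made up for illustration. The point is that text and child elements interleave, so a faithful record of one element is an ordered sequence of pieces rather than a fixed set of columns.

```python
import xml.etree.ElementTree as ET


def mixed_content(element):
    """Flatten one element's direct content into (kind, value) pairs, in document order."""
    parts = []
    if element.text:
        # Text that appears before the first child element.
        parts.append(("text", element.text))
    for child in element:
        parts.append(("element", child.tag))
        if child.tail:
            # Text that follows this child, up to the next child or the closing tag.
            parts.append(("text", child.tail))
    return parts


para = ET.fromstring("<p>Mixed <b>content</b> interleaves <i>tags</i> and text.</p>")
for kind, value in mixed_content(para):
    print(kind, repr(value))
```

Each pair would need at least a parent key and a position to survive in a table, which hints at why the relational mapping gets awkward.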

The problem is made worse when the document is HTML. HTML doesn't …



© 2019 S.Lott · Powered by pelican-bootstrap3, Pelican, Bootstrap
