This came from someone in the process of becoming a data scientist. They had a question on regular expressions, which made almost no sense. It appears that the core concepts of ETL (Extracting source data, Transforming it into a useful form, and then Loading it into some persistent storage for long-term analysis) had not been embraced. It appears the design pattern was unknown.

All I could gather from the sketchy email chain was that something involving regular expressions had become difficult. I wrote this in response: Handling Irregular File Formats.

Here's part of the follow-up.

"I have been focusing on the math associated w/ math optimization. I have been using spreadsheets to perform the computations."

Really. Spreadsheets.

The ETL pipeline question/rant/complaint was part of loading a spreadsheet? That seems somehow wrong.

There are real tools available that do real data science work. The word "optimization" hints that scipy.optimize might be a more useful exercise than hacking around with spreadsheets.

Perhaps some advice from a real data scientist might help: http://www.becomingadatascientist.com
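
For what it's worth, here is a minimal sketch of the Extract, Transform, Load pattern described above, using only the Python standard library. Everything specific in it is an assumption made for illustration: the input line layout, the field names, the regular expression, and the `sales` table are all hypothetical, since the email chain never made the real file format clear.

```python
import re
import sqlite3

# Hypothetical raw input; the layout and field names are invented for this sketch.
RAW_LINES = [
    "2016-03-01  widget-A   qty=12  price=3.50",
    "# a comment line the pattern should skip",
    "2016-03-02  widget-B   qty=7   price=10.00",
]

# One pattern with named groups keeps the regular expression in a single place.
LINE_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<item>\S+)\s+"
    r"qty=(?P<qty>\d+)\s+price=(?P<price>\d+\.\d+)"
)


def extract(lines):
    """Extract: yield a dict of raw strings for each line that matches."""
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match:
            yield match.groupdict()


def transform(rows):
    """Transform: convert strings to numbers and derive a total per row."""
    for row in rows:
        qty = int(row["qty"])
        price = float(row["price"])
        yield (row["date"], row["item"], qty, price, qty * price)


def load(records, connection):
    """Load: persist the cleaned records in a SQLite table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(date TEXT, item TEXT, qty INTEGER, price REAL, total REAL)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", records)
    connection.commit()


def run_pipeline(lines, connection):
    """Chain the three stages, then read back what was stored."""
    load(transform(extract(lines)), connection)
    query = "SELECT item, total FROM sales ORDER BY date"
    return connection.execute(query).fetchall()


if __name__ == "__main__":
    # An in-memory database stands in for real persistent storage.
    print(run_pipeline(RAW_LINES, sqlite3.connect(":memory:")))
```

Keeping the three stages as separate functions means the regular expression lives in exactly one place. When the file format turns out to be irregular, only `extract` has to change.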