This workshop is geared toward people who have a small to medium amount of programming experience, or a willingness to learn, but little familiarity with text parsing itself.
Topics will include:
- In-depth testing and use of regular expressions
- Basic examples of how to automatically "scrape" data from a web page
- Discussion of the most painless ways to parse XML, including a brief introduction to XPath
- Figuring out workflows that make sense for projects of different sizes and people of different experience levels
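To give a flavour of the first and third topics, here is a minimal Python sketch, not taken from the workshop materials. The sample text, the sample XML, and the helper names `extract_emails` and `extract_titles` are all made up for illustration, and the email pattern is deliberately simplistic.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
SAMPLE_TEXT = "Contact alice@example.com or bob@example.org for details."
SAMPLE_XML = """
<catalog>
  <book id="b1"><title>Parsing Basics</title></book>
  <book id="b2"><title>XPath in Practice</title></book>
</catalog>
"""

def extract_emails(text):
    # A deliberately simple pattern; real-world email matching is messier.
    return re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)

def extract_titles(xml_text):
    root = ET.fromstring(xml_text)
    # ElementTree supports only a limited subset of XPath.
    return [el.text for el in root.findall("./book/title")]

if __name__ == "__main__":
    print(extract_emails(SAMPLE_TEXT))
    print(extract_titles(SAMPLE_XML))
```

The standard library is used here only to keep the sketch self-contained; the workshop itself may use different tools.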
After this workshop, you'll be able to crawl web pages or other data sources for specific chunks of text and extract them in bulk. This is useful for data collection, data cleaning, or any other text parsing challenge you might encounter.
KEY, SFU's Big Data Initiative, provides tools, training and expertise to unlock the potential of Big Data. We are engaging people in advanced computing for innovation in teaching, research and community impact.