This workshop is in the past and registrations are unavailable.
All times are in Pacific Time (Vancouver, BC, Canada).
About the workshop
PyTables is a free and open-source Python library for managing large hierarchical datasets. It is built on top of NumPy and the HDF5 scientific data library, and it focuses on both performance and interactive analysis of very large datasets. For large data streams (think multi-dimensional arrays or billions of records) it can outperform traditional databases in speed, memory usage and I/O bandwidth, although it is not a replacement for relational databases, as PyTables does not support broad relationships between dataset variables. PyTables can even be used to organize a workflow with many (thousands to millions of) small files: you can create a PyTables database of nodes that behave like regular open files in Python. This lets you store a large number of arbitrary files in a single PyTables database with on-the-fly compression, making it very efficient for handling huge amounts of data.
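To give a flavour of what this looks like in practice, here is a minimal sketch rather than actual workshop material. The file name `demo.h5`, the `Reading` record layout, the group and node names, and the compression settings are all illustrative assumptions. It writes a small compressed table and an array into one HDF5 file, runs an in-kernel query, and stores a file-like node using the `tables.nodes.filenode` module.

```python
import os
import tempfile

import numpy as np
import tables
from tables.nodes import filenode


# Hypothetical record layout, chosen only for this illustration.
class Reading(tables.IsDescription):
    sensor_id = tables.Int32Col()
    value = tables.Float64Col()


path = os.path.join(tempfile.mkdtemp(), "demo.h5")  # assumed file name

# Compress every dataset in the file on the fly with zlib.
filters = tables.Filters(complevel=5, complib="zlib")

with tables.open_file(path, mode="w", filters=filters) as h5:
    # Groups play the role of directories in the hierarchy.
    group = h5.create_group("/", "experiment", "Example group")

    # A table holds records with a fixed set of typed columns.
    table = h5.create_table(group, "readings", Reading, "Sensor readings")
    row = table.row
    for i in range(1000):
        row["sensor_id"] = i % 10
        row["value"] = i * 0.5
        row.append()
    table.flush()

    # A plain multi-dimensional NumPy array stored next to the table.
    h5.create_array(group, "grid", np.arange(12).reshape(3, 4), "Small array")

    # A node that can be written to like a regular binary file.
    fnode = filenode.new_node(h5, where="/", name="notes")
    fnode.write(b"hello from a file node\n")
    fnode.close()

with tables.open_file(path, mode="r") as h5:
    table = h5.root.experiment.readings
    # The condition string is evaluated by PyTables while scanning the table.
    selected = [r["value"] for r in table.where("(sensor_id == 3) & (value > 400)")]

    grid = h5.root.experiment.grid.read()

    fnode = filenode.open_node(h5.root.notes, "r")
    content = fnode.read()
    fnode.close()

print(len(selected), grid.shape, content)
```

The query is passed as a string so that only matching rows need to be materialised as Python objects, which is what makes this style of access practical for tables far larger than the one shown here.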
This workshop will guide you through the basics and assumes no previous knowledge of PyTables or HDF5.
- Bring your own laptop.
- Some basic Python knowledge would be useful, although many attendees will probably pick it up on the fly, as we'll try to go slowly.