Transforming, Analyzing, and Visualizing Data with a Fast and Expressive DataFrame API
Jeroen Janssens / Thijs Nieuwdorp
Overview
Unlock the power of Polars, a Python package for transforming, analyzing, and visualizing data. In this hands-on guide, Jeroen Janssens and Thijs Nieuwdorp walk you through every feature of Polars, showing you how to use it for real-world tasks like data wrangling, exploratory data analysis, building pipelines, and more.
Whether you're a seasoned data professional or new to data science, you'll quickly master Polars' expressive API and its underlying concepts. You don't need to have experience with pandas, but if you do, this book will help you make a seamless transition. The many practical examples and real-world datasets are available on GitHub, so you can easily follow along.
Process data from CSV, Parquet, spreadsheets, databases, and the cloud
Get a solid understanding of Expressions, the building blocks of every query
Handle complex data types, including text, time, and nested structures
Use both eager and lazy APIs, and know when to use each
Visualize your data with Altair, hvPlot, plotnine, and Great Tables
Extend Polars with your own Python functions and Rust plugins
Leverage GPU acceleration to boost performance even further
Table of contents
1. First steps
Overview
Using Polars in a Docker Container
Crash Course JupyterLab
Keyboard Shortcuts
Installing Polars
Compiling Polars from Scratch
Edge case: Very large datasets (4.2 billion rows+)
Edge case: Processors lacking AVX support (e.g. CPUs older than 2011, Apple Silicon)
Configuring Polars
Temporary configuration using a context manager
Local configuration using a decorator
Downloading datasets and code examples
Conclusion
2. Data Types and Data Structures
Arrow Data Types
Nested data types
Series, DataFrame, and LazyFrame
Data Type Conversion
Conclusion
3. Reading and Writing Data
Reading CSV Files
Parsing Missing Values Correctly
Reading Files with Encodings Other than UTF-8
Reading Excel Spreadsheets
Working With Multiple Files
Reading Parquet
Reading JSON and NDJSON
JSON
NDJSON
Other File Formats
Querying Databases
Writing Data
CSV Format
Excel Format
Parquet Format
Other Considerations
Conclusion
About the Authors