5 DuckDB CSV→Parquet Conversions That Slash Storage

Practical, lossless rewrites that turn messy CSVs into lean, query-ready Parquet — often reducing footprint by ~70%.

5 min read · Oct 7, 2025

Five DuckDB patterns to convert CSV to Parquet with big savings: tight typing, dictionary-friendly enums, sorted pages, partition+compaction, and decimal tuning.

You know the drill: CSVs sprawl, files multiply, and every join starts with a full scan. Then someone asks, “Why is this dataset 300 GB?” The fastest, cheapest win I’ve shipped for teams is a DuckDB-powered CSV→Parquet rewrite with a few careful transformations. It’s boring — in the best possible way. Do it once, and your storage bill and query times both fall.

Below are five conversion designs that consistently deliver large, lossless reductions (often ~50–70%, sometimes more) while making downstream analytics faster and saner.

1) Tight typing at ingest (ditch string-everything)

What it fixes: CSV has no types, so everything looks like VARCHAR. Parquet compresses best when columns have the right types: BOOLEAN, DATE, TIMESTAMP, INT, DECIMAL(p,s).

How to do it (fully lazy & accurate):

-- Scan with full-file inference for…
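The snippet above is cut off in the source, but the pattern it starts is straightforward: scan the whole CSV so type inference is accurate, cast each column to its tightest type, then write Parquet with a strong codec. A minimal sketch of that flow (file path and column names are hypothetical, not from the original):

```sql
-- Infer types from the entire file, not just a sample.
CREATE OR REPLACE VIEW typed AS
SELECT
    CAST(id AS BIGINT)            AS id,
    CAST(event_date AS DATE)      AS event_date,
    CAST(amount AS DECIMAL(12,2)) AS amount,   -- exact, compresses far better than VARCHAR
    CAST(is_active AS BOOLEAN)    AS is_active
FROM read_csv('events.csv',
              header = true,
              sample_size = -1);  -- -1 = scan the full file for inference

-- Write query-ready Parquet with ZSTD compression.
COPY typed TO 'events.parquet' (FORMAT PARQUET, COMPRESSION ZSTD);
```

The explicit `CAST`s matter even when inference gets things right: they document intent and pin the schema, so a future dirty row fails loudly at conversion time instead of silently widening a column back to `VARCHAR`.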
Written by Bhagya Rana
