Gathering detailed insights and metrics for hyparquet
npm install hyparquet
JavaScript (100%)
MIT License
534 Stars
421 Commits
22 Forks
9 Watchers
5 Branches
11 Contributors
Updated on Jul 12, 2025
Latest Version: 1.17.1
Package Id: hyparquet@1.17.1
Unpacked Size: 162.96 kB
Size: 41.17 kB
File Count: 64
NPM Version: 11.3.0
Node Version: 22.16.0
Published on: Jul 02, 2025
Dependency free since 2023!
Hyparquet is a lightweight, dependency-free, pure JavaScript library for parsing Apache Parquet files. Apache Parquet is a popular columnar storage format that is widely used in data engineering, data science, and machine learning applications for efficiently storing and processing large datasets.
Hyparquet aims to be the world's most compliant parquet parser. And it runs in the browser.
Try hyparquet online: Drag and drop your parquet file onto hyperparam.app to view it directly in your browser. This service is powered by hyparquet's in-browser capabilities.
Parquet is widely used in data engineering and data science for its efficient storage and processing of large datasets. What if you could use parquet files directly in the browser, without needing a server or backend infrastructure? That's what hyparquet enables.
Existing JavaScript-based parquet readers (like parquetjs) are no longer actively maintained, may not support streaming or in-browser processing efficiently, and often rely on dependencies that can inflate your bundle size. Hyparquet is actively maintained and designed with modern web usage in mind.
Check out a minimal parquet viewer demo that shows how to integrate hyparquet into a React web application using HighTable.
In the browser use `asyncBufferFromUrl` to wrap a url for reading asynchronously over the network.
It is recommended that you filter by row and column to limit fetch size:
```javascript
const { asyncBufferFromUrl, parquetReadObjects } = await import('https://cdn.jsdelivr.net/npm/hyparquet/src/hyparquet.min.js')

const url = 'https://hyperparam-public.s3.amazonaws.com/bunnies.parquet'
const file = await asyncBufferFromUrl({ url }) // wrap url for async fetching
const data = await parquetReadObjects({
  file,
  columns: ['Breed Name', 'Lifespan'],
  rowStart: 10,
  rowEnd: 20,
})
```
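
When paging through a large file, the row window can be derived from a page number. The following is a minimal sketch, not part of hyparquet: `pageWindow` is a hypothetical helper, and it assumes `rowStart` is inclusive and `rowEnd` is exclusive, which is worth confirming against hyparquet's type definitions.

```javascript
// Hypothetical helper (not part of hyparquet): compute the row window for one page.
// Assumes rowStart is inclusive and rowEnd is exclusive.
function pageWindow(page, pageSize, numRows) {
  if (!Number.isInteger(page) || page < 0) {
    throw new RangeError('page must be a non-negative integer')
  }
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new RangeError('pageSize must be a positive integer')
  }
  // Clamp both ends so the window never runs past the end of the file
  const rowStart = Math.min(page * pageSize, numRows)
  const rowEnd = Math.min(rowStart + pageSize, numRows)
  return { rowStart, rowEnd }
}

// Page 1 (the second page) of 10 rows per page in a 25-row file
const { rowStart, rowEnd } = pageWindow(1, 10, 25)
console.log(rowStart, rowEnd) // 10 20
```

The result can be spread into the read options, for example `parquetReadObjects({ file, columns, ...pageWindow(1, 10, numRows) })`.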
To read the contents of a local parquet file in a node.js environment use `asyncBufferFromFile`:
```javascript
const { asyncBufferFromFile, parquetReadObjects } = await import('hyparquet')

const file = await asyncBufferFromFile('example.parquet')
const data = await parquetReadObjects({ file })
```
Note: hyparquet is published as an ES module, so dynamic `import()` may be required for old versions of node.
To create parquet files from javascript, check out the hyparquet-writer package.
You can read just the metadata, including schema and data statistics, using the `parquetMetadataAsync` function.
To load parquet metadata in the browser from a remote server:
```javascript
import { asyncBufferFromUrl, parquetMetadataAsync, parquetSchema } from 'hyparquet'

const url = 'https://hyperparam-public.s3.amazonaws.com/bunnies.parquet'
const file = await asyncBufferFromUrl({ url })
const metadata = await parquetMetadataAsync(file)
// Get total number of rows (convert bigint to number)
const numRows = Number(metadata.num_rows)
// Get nested table schema
const schema = parquetSchema(metadata)
// Get top-level column header names
const columnNames = schema.children.map(e => e.element.name)
```
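
For nested schemas, the top-level names alone do not show the leaf columns. The sketch below walks a schema tree and lists dotted leaf paths. It assumes each node has the shape `{ element: { name }, children: [...] }`, inferred from the snippet above; `leafColumnPaths` is a hypothetical helper and the example tree is made up for illustration.

```javascript
// Hypothetical helper: collect dotted paths of all leaf columns in a schema tree.
// Assumes each node looks like { element: { name }, children: [...] }.
function leafColumnPaths(node, prefix = []) {
  const paths = []
  for (const child of node.children) {
    const path = [...prefix, child.element.name]
    if (child.children.length === 0) {
      paths.push(path.join('.')) // leaf column
    } else {
      paths.push(...leafColumnPaths(child, path)) // recurse into nested group
    }
  }
  return paths
}

// Made-up schema tree for illustration only
const exampleSchema = {
  element: { name: 'root' },
  children: [
    { element: { name: 'id' }, children: [] },
    {
      element: { name: 'address' },
      children: [
        { element: { name: 'city' }, children: [] },
        { element: { name: 'zip' }, children: [] },
      ],
    },
  ],
}

console.log(leafColumnPaths(exampleSchema)) // id, address.city, address.zip
```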
You can also read the metadata synchronously using `parquetMetadata` if you have an array buffer with the parquet footer:
```javascript
import { parquetMetadata } from 'hyparquet'

const metadata = parquetMetadata(arrayBuffer)
```
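
To know how many trailing bytes make up the footer, it helps to recall the Parquet file layout: a file ends with the metadata, then a 4-byte little-endian metadata length, then the magic bytes `PAR1`. The sketch below reads that length from the last 8 bytes. It illustrates the file format only and is not hyparquet's internal code; `footerLength` is a hypothetical helper, and files with an encrypted footer (which use a different magic) are not handled.

```javascript
// Hypothetical helper: read the metadata length from the last 8 bytes of a parquet file.
// File tail layout: [metadata][4-byte little-endian metadata length]["PAR1"]
function footerLength(arrayBuffer) {
  if (arrayBuffer.byteLength < 8) throw new Error('buffer too short to be a parquet file')
  const view = new DataView(arrayBuffer)
  const tail = arrayBuffer.byteLength - 8
  const magic = String.fromCharCode(
    view.getUint8(tail + 4),
    view.getUint8(tail + 5),
    view.getUint8(tail + 6),
    view.getUint8(tail + 7)
  )
  if (magic !== 'PAR1') throw new Error('missing PAR1 magic at end of file')
  return view.getUint32(tail, true) // true = little-endian
}

// Synthetic file tail for illustration: 100 placeholder bytes, length field, magic
const bytes = new Uint8Array(108)
new DataView(bytes.buffer).setUint32(100, 100, true)
bytes.set([0x50, 0x41, 0x52, 0x31], 104) // "PAR1"
console.log(footerLength(bytes.buffer)) // 100
```

Under this layout, the last `footerLength + 8` bytes of the file cover the metadata plus the length field and magic.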
Hyparquet requires an argument `file` of type `AsyncBuffer`. An `AsyncBuffer` is similar to a JavaScript `ArrayBuffer`, but its `slice` method may return a `Promise<ArrayBuffer>`.
```typescript
type Awaitable<T> = T | Promise<T>
interface AsyncBuffer {
  byteLength: number
  slice(start: number, end?: number): Awaitable<ArrayBuffer>
}
```
In most cases, you should probably use `asyncBufferFromUrl` or `asyncBufferFromFile` to create an `AsyncBuffer` for hyparquet.
If you want to read a parquet file remotely over http, use `asyncBufferFromUrl` to wrap an http url as an `AsyncBuffer` using http range requests.
- `requestInit`: option to provide additional fetch headers for authentication (optional)
- `byteLength`: provide if you know the file size, to save a round trip HEAD request (optional)
```typescript
import { asyncBufferFromUrl, parquetReadObjects } from 'hyparquet'
import type { AsyncBuffer } from 'hyparquet'

const url = 'https://s3.hyperparam.app/wiki_en.parquet'
const requestInit = { headers: { Authorization: 'Bearer my_token' } } // auth header
const byteLength = 415958713 // optional
const file: AsyncBuffer = await asyncBufferFromUrl({ url, requestInit, byteLength })
const data = await parquetReadObjects({ file })
```
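
To make the range-request idea concrete, here is a rough sketch of how a `slice` call could map onto an HTTP `Range` header. It is not how `asyncBufferFromUrl` is implemented: `rangeRequestBuffer` and its `fetchFn` parameter are hypothetical names, and a real implementation would also need to handle servers that ignore `Range` and return the whole file.

```javascript
// Hypothetical sketch of an AsyncBuffer backed by HTTP range requests.
// fetchFn is injectable so the sketch can be exercised without a network.
function rangeRequestBuffer({ url, byteLength, fetchFn = fetch }) {
  return {
    byteLength,
    async slice(start, end = byteLength) {
      if (end <= start) return new ArrayBuffer(0) // empty range, nothing to fetch
      // HTTP Range ends are inclusive, while slice ends are exclusive
      const headers = { Range: `bytes=${start}-${end - 1}` }
      const res = await fetchFn(url, { headers })
      if (!res.ok) throw new Error(`fetch failed: ${res.status}`)
      return res.arrayBuffer()
    },
  }
}
```

Note the off-by-one between the two conventions: `slice(0, 8)` becomes `Range: bytes=0-7`.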
If you are in a node.js environment, use `asyncBufferFromFile` to wrap a local file as an `AsyncBuffer`:
```typescript
import { asyncBufferFromFile, parquetReadObjects } from 'hyparquet'
import type { AsyncBuffer } from 'hyparquet'

const file: AsyncBuffer = await asyncBufferFromFile('example.parquet')
const data = await parquetReadObjects({ file })
```
You can provide an `ArrayBuffer` anywhere that an `AsyncBuffer` is expected. This is useful if you already have the entire parquet file in memory.
You can implement your own `AsyncBuffer` to create a virtual file that can be read asynchronously by hyparquet.
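
As one illustration of a custom `AsyncBuffer`, the sketch below wraps another buffer and serves repeated slices of the same range from memory. `cachedBuffer` is a hypothetical name chosen for this example, and it relies only on the `byteLength` and `slice` members described above.

```javascript
// Hypothetical wrapper: serve repeated slices of the same range from memory.
// Works with any object that has byteLength and slice(start, end).
function cachedBuffer(inner) {
  const cache = new Map() // "start-end" -> Promise<ArrayBuffer>
  return {
    byteLength: inner.byteLength,
    slice(start, end = inner.byteLength) {
      const key = `${start}-${end}`
      if (!cache.has(key)) {
        // Store the promise itself so concurrent requests share one read
        cache.set(key, Promise.resolve(inner.slice(start, end)))
      }
      return cache.get(key)
    },
  }
}
```

Because the cache is unbounded and keyed by exact range, this sketch suits small files or short-lived readers; a longer-lived reader would want an eviction policy.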
`parquetReadObjects` is a convenience wrapper around `parquetRead` that returns the complete rows as `Promise<Record<string, any>[]>`. This is the simplest way to read parquet files.
```typescript
parquetReadObjects({ file }): Promise<Record<string, any>[]>
```
`parquetRead` is the "base" function for reading parquet files.
It returns a `Promise<void>` that resolves when the file has been read, or rejects if an error occurs.
Data is returned via the `onComplete`, `onChunk`, or `onPage` callbacks passed as arguments.
The reason for this design is that parquet is a column-oriented format, and returning data in row-oriented format requires transposing the column data. This is an expensive operation in JavaScript. If you don't pass in an `onComplete` argument to `parquetRead`, hyparquet will skip this transpose step and save memory.
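
To see why the transpose costs time and memory, consider a naive version of it: every value is copied once and one new array is allocated per row. This is an illustrative sketch only, not hyparquet's implementation; `columnsToRows` is a hypothetical helper and the data is made up.

```javascript
// Hypothetical helper: transpose column-oriented data into row arrays.
// Allocates one array per row and copies every value once.
function columnsToRows(columns) {
  const numRows = columns.length ? columns[0].length : 0
  const rows = new Array(numRows)
  for (let r = 0; r < numRows; r++) {
    const row = new Array(columns.length)
    for (let c = 0; c < columns.length; c++) row[c] = columns[c][r]
    rows[r] = row
  }
  return rows
}

// Made-up column data for illustration
const columns = [
  ['Dutch', 'Lop', 'Rex'], // values of column 0
  [8, 9, 6], // values of column 1
]
const rows = columnsToRows(columns)
console.log(rows[0]) // first row: 'Dutch', 8
```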
The `onChunk` callback returns column-oriented data as it is ready. `onChunk` will always return top-level columns, including structs, assembled as a single column. This may require waiting for multiple sub-columns to all load before assembly can occur.
The `onPage` callback returns column-oriented page data as it is ready. `onPage` will NOT assemble struct columns and will always return individual sub-column data. Note that `onPage` will assemble nested lists.
In some cases, `onPage` can return data sooner than `onChunk`.
```typescript
import { parquetRead } from 'hyparquet'

interface ColumnData {
  columnName: string
  columnData: ArrayLike<any>
  rowStart: number
  rowEnd: number
}
await parquetRead({
  file,
  onChunk(chunk: ColumnData) {
    console.log('chunk', chunk)
  },
  onPage(chunk: ColumnData) {
    console.log('page', chunk)
  },
})
```
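
If you want to keep the data column-oriented, one option is to accumulate chunks per column as they arrive. The sketch below builds an `onChunk` handler that does this, using only the `ColumnData` fields shown above. `createColumnCollector` is a hypothetical helper; it sorts by `rowStart` in case chunks arrive out of order and assumes chunks of one column do not overlap.

```javascript
// Hypothetical helper: build an onChunk handler that accumulates chunks per column.
function createColumnCollector() {
  const chunksByColumn = new Map() // columnName -> ColumnData[]
  return {
    onChunk(chunk) {
      if (!chunksByColumn.has(chunk.columnName)) chunksByColumn.set(chunk.columnName, [])
      chunksByColumn.get(chunk.columnName).push(chunk)
    },
    // Concatenate each column's chunks in row order
    columns() {
      const result = {}
      for (const [name, chunks] of chunksByColumn) {
        chunks.sort((a, b) => a.rowStart - b.rowStart)
        result[name] = chunks.flatMap(chunk => Array.from(chunk.columnData))
      }
      return result
    },
  }
}
```

A possible usage is `const collector = createColumnCollector()`, then `await parquetRead({ file, onChunk: collector.onChunk })`, then `collector.columns()` once the promise resolves.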
By default, the `onComplete` function returns an array of values for each row: `[value]`. If you would prefer each row to be an object: `{ columnName: value }`, set the option `rowFormat` to `'object'`.
```javascript
import { parquetRead } from 'hyparquet'

await parquetRead({
  file,
  rowFormat: 'object',
  onComplete: data => console.log(data),
})
```
The `parquetReadObjects` function defaults to `rowFormat: 'object'`.
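
The two row formats carry the same values and differ only in shape. As a rough illustration, array rows can be turned into object rows given the column names; `rowsToObjects` is a hypothetical helper and the data is made up.

```javascript
// Hypothetical helper: convert array-format rows into object-format rows.
function rowsToObjects(columnNames, rows) {
  return rows.map(row =>
    Object.fromEntries(columnNames.map((name, i) => [name, row[i]]))
  )
}

// Made-up rows for illustration
const arrayRows = [
  ['Dutch', 8],
  ['Lop', 9],
]
const objectRows = rowsToObjects(['Breed Name', 'Lifespan'], arrayRows)
console.log(objectRows[0]['Breed Name']) // Dutch
```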
The parquet format is known to be a sprawling format which includes options for a wide array of compression schemes, encoding types, and data structures. Hyparquet supports all parquet encodings: plain, dictionary, rle, bit packed, delta, etc.
Hyparquet is the most compliant parquet parser on earth: it can open more files than pyarrow, rust, and duckdb.
By default, hyparquet supports uncompressed and snappy-compressed parquet files. To support the full range of parquet compression codecs (gzip, brotli, zstd, etc), use the hyparquet-compressors package.
Codec | hyparquet | with hyparquet-compressors |
---|---|---|
Uncompressed | ✅ | ✅ |
Snappy | ✅ | ✅ |
GZip | ❌ | ✅ |
LZO | ❌ | ✅ |
Brotli | ❌ | ✅ |
LZ4 | ❌ | ✅ |
ZSTD | ❌ | ✅ |
LZ4_RAW | ❌ | ✅ |
For faster snappy decompression, try hysnappy, which uses WASM for a 40% speed boost on large parquet files.
You can include support for ALL parquet compressors plus hysnappy using the hyparquet-compressors package.
```javascript
import { parquetReadObjects } from 'hyparquet'
import { compressors } from 'hyparquet-compressors'

const data = await parquetReadObjects({ file, compressors })
```
Contributions are welcome! If you have suggestions, bug reports, or feature requests, please open an issue or submit a pull request.
Hyparquet development is supported by an open-source grant from Hugging Face 🤗