npmpackage.info

Gathering detailed insights and metrics for hyparquet

Other packages similar to hyparquet

hyparquet-compressors 1.1.1: Decompressors for hyparquet
hyparquet-writer 0.6.0: Parquet file writer for JavaScript
icebird 0.3.0: Apache Iceberg client for JavaScript
@iqmo/hyparquet 1.13.1: Parquet file parser for JavaScript

hyparquet - 1.17.1 | npmpackage.info

hyparquet

parquet file parser for javascript

1.17.1

534

MIT

JavaScript

162.96 kB

Installations

npm install hyparquet

Developer Guide

BETA

Typescript

Yes

Module System

ESM

Node Version

22.16.0

NPM Version

11.3.0

Pull Requests

Open

4

Total

50

Closed

3

Merged

43

Issues

Open

9

Total

50

Closed

41

Languages

JavaScript

JavaScript (100%)

Developer

hyparam

GitHub Statistics

MIT License

534 Stars

421 Commits

22 Forks

9 Watchers

5 Branches

11 Contributors

Updated on Jul 12, 2025

Package Meta Information

Latest Version

1.17.1

Package Id

hyparquet@1.17.1

Unpacked Size

162.96 kB

Size

41.17 kB

File Count

NPM Version

11.3.0

Node Version

22.16.0

Published on

Jul 02, 2025

Dev Dependencies

@types/node @vitest/coverage-v8 eslint eslint-plugin-jsdoc hyparquet-compressors typescript vitest

hyparquet

Dependency free since 2023!

What is hyparquet?

Hyparquet is a lightweight, dependency-free, pure JavaScript library for parsing Apache Parquet files. Apache Parquet is a popular columnar storage format that is widely used in data engineering, data science, and machine learning applications for efficiently storing and processing large datasets.

Hyparquet aims to be the world's most compliant parquet parser. And it runs in the browser.

Parquet Viewer

Try hyparquet online: Drag and drop your parquet file onto hyperparam.app to view it directly in your browser. This service is powered by hyparquet's in-browser capabilities.

Features

Browser-native: Built to work seamlessly in the browser, opening up new possibilities for web-based data applications and visualizations.
Performant: Designed to efficiently process large datasets by only loading the required data, making it suitable for big data and machine learning applications.
TypeScript: Includes TypeScript definitions.
Dependency-free: Hyparquet has zero dependencies, making it lightweight and easy to use in any JavaScript project. Only 9.7 kB minified and gzipped!
Highly Compliant: Supports all parquet encodings, compression codecs, and can open more parquet files than any other library.

Why hyparquet?

Parquet is widely used in data engineering and data science for its efficient storage and processing of large datasets. What if you could use parquet files directly in the browser, without needing a server or backend infrastructure? That's what hyparquet enables.

Existing JavaScript-based parquet readers (like parquetjs) are no longer actively maintained, may not support streaming or in-browser processing efficiently, and often rely on dependencies that can inflate your bundle size. Hyparquet is actively maintained and designed with modern web usage in mind.

Demo

Check out a minimal parquet viewer demo that shows how to integrate hyparquet into a React web application using HighTable.

Live Demo: https://hyparam.github.io/demos/hyparquet/
Demo Source Code: https://github.com/hyparam/demos/tree/master/hyparquet

Quick Start

Browser Example

In the browser, use asyncBufferFromUrl to wrap a URL for asynchronous reading over the network. Filtering by row and column is recommended to limit how much data is fetched:

const { asyncBufferFromUrl, parquetReadObjects } = await import('https://cdn.jsdelivr.net/npm/hyparquet/src/hyparquet.min.js')

const url = 'https://hyperparam-public.s3.amazonaws.com/bunnies.parquet'
const file = await asyncBufferFromUrl({ url }) // wrap url for async fetching
const data = await parquetReadObjects({
  file,
  columns: ['Breed Name', 'Lifespan'],
  rowStart: 10,
  rowEnd: 20,
})

Node.js Example

To read the contents of a local parquet file in a Node.js environment, use asyncBufferFromFile:

const { asyncBufferFromFile, parquetReadObjects } = await import('hyparquet')

const file = await asyncBufferFromFile('example.parquet')
const data = await parquetReadObjects({ file })

Note: hyparquet is published as an ES module, so dynamic import() may be required when loading it from CommonJS code, for example on old versions of Node.js.

Parquet Writing

To create parquet files from JavaScript, check out the hyparquet-writer package.

Advanced Usage

Reading Metadata

You can read just the metadata, including the schema and data statistics, using the parquetMetadataAsync function. To load parquet metadata in the browser from a remote server:

import { asyncBufferFromUrl, parquetMetadataAsync, parquetSchema } from 'hyparquet'

// url is the address of a remote parquet file, as in the browser example
const file = await asyncBufferFromUrl({ url })
const metadata = await parquetMetadataAsync(file)
// Get total number of rows (convert bigint to number)
const numRows = Number(metadata.num_rows)
// Get nested table schema
const schema = parquetSchema(metadata)
// Get top-level column header names
const columnNames = schema.children.map(e => e.element.name)

You can also read the metadata synchronously using parquetMetadata if you have an array buffer with the parquet footer:

import { parquetMetadata } from 'hyparquet'

const metadata = parquetMetadata(arrayBuffer)
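
For context on what "the parquet footer" refers to: in the Parquet file format, a file ends with the serialized metadata, followed by a 4-byte little-endian metadata length and the 4-byte magic string "PAR1". The sketch below works out which byte range holds the metadata, given only the last 8 bytes of a file. The helper name footerRange and the sample sizes are hypothetical; this is not part of hyparquet and is not necessarily how hyparquet locates the footer.

```javascript
// Hypothetical helper (not part of hyparquet): given the last 8 bytes of a
// parquet file and the total file size, return the byte range of the metadata.
function footerRange(tail, byteLength) {
  if (tail.byteLength !== 8) throw new Error('expected the last 8 bytes of the file')
  const view = new DataView(tail)
  // Bytes 4..7 must spell the magic string "PAR1"
  const magic = String.fromCharCode(view.getUint8(4), view.getUint8(5), view.getUint8(6), view.getUint8(7))
  if (magic !== 'PAR1') throw new Error('not a parquet file: missing PAR1 magic')
  // Bytes 0..3 hold the metadata length as a little-endian uint32
  const metadataLength = view.getUint32(0, true)
  const end = byteLength - 8 // metadata stops where the 8-byte trailer begins
  const start = end - metadataLength
  if (start < 0) throw new Error('metadata length exceeds file size')
  return { start, end, metadataLength }
}

// Build a fake 8-byte tail that claims 100 bytes of metadata (made-up numbers)
const tail = new ArrayBuffer(8)
const tailView = new DataView(tail)
tailView.setUint32(0, 100, true)
;[0x50, 0x41, 0x52, 0x31].forEach((byte, i) => tailView.setUint8(4 + i, byte)) // "PAR1"

console.log(footerRange(tail, 1000)) // { start: 892, end: 992, metadataLength: 100 }
```

An array buffer covering at least that byte range through the end of the file is the kind of input the synchronous function needs.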

AsyncBuffer

Hyparquet requires a file argument of type AsyncBuffer. An AsyncBuffer is similar to a JavaScript ArrayBuffer, except that its slice method may return a Promise<ArrayBuffer> instead of an ArrayBuffer:

type Awaitable<T> = T | Promise<T>
interface AsyncBuffer {
  byteLength: number
  slice(start: number, end?: number): Awaitable<ArrayBuffer>
}

In most cases, you should probably use asyncBufferFromUrl or asyncBufferFromFile to create an AsyncBuffer for hyparquet.

asyncBufferFromUrl

If you want to read a parquet file remotely over HTTP, use asyncBufferFromUrl to wrap an HTTP URL as an AsyncBuffer using HTTP range requests.

Pass the requestInit option to provide additional fetch headers for authentication (optional).
Pass byteLength if you know the file size, to save a round-trip HEAD request (optional).

import { asyncBufferFromUrl, parquetReadObjects } from 'hyparquet'

const url = 'https://s3.hyperparam.app/wiki_en.parquet'
const requestInit = { headers: { Authorization: 'Bearer my_token' } } // auth header
const byteLength = 415958713 // optional
const file: AsyncBuffer = await asyncBufferFromUrl({ url, requestInit, byteLength })
const data = await parquetReadObjects({ file })
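
To make the range-request idea concrete, here is a minimal sketch of an AsyncBuffer whose slice method issues one HTTP Range request per call. It is an illustration only: the function name rangeAsyncBuffer, the fetchFn parameter, the example.com URL, and the fake server are all hypothetical, and hyparquet's own asyncBufferFromUrl may work differently.

```javascript
// Illustrative only: hyparquet's asyncBufferFromUrl may be implemented differently.
// fetchFn is injectable so the sketch can be exercised without a network.
function rangeAsyncBuffer({ url, byteLength, requestInit = {}, fetchFn = fetch }) {
  return {
    byteLength,
    async slice(start, end = byteLength) {
      // HTTP Range ends are inclusive, while AsyncBuffer.slice ends are exclusive
      const headers = { ...requestInit.headers, Range: `bytes=${start}-${end - 1}` }
      const res = await fetchFn(url, { ...requestInit, headers })
      if (!res.ok) throw new Error(`range request failed: ${res.status}`)
      return res.arrayBuffer()
    },
  }
}

// Stand-in for a server: answers Range requests from an in-memory byte array
const bytes = Uint8Array.from({ length: 100 }, (_, i) => i)
async function fakeFetch(url, init) {
  const [, from, to] = /bytes=(\d+)-(\d+)/.exec(init.headers.Range)
  const body = bytes.slice(Number(from), Number(to) + 1)
  return { ok: true, status: 206, arrayBuffer: async () => body.buffer }
}

// Hypothetical URL; the fake fetch never contacts it
const file = rangeAsyncBuffer({ url: 'https://example.com/data.parquet', byteLength: bytes.length, fetchFn: fakeFetch })
file.slice(10, 14).then(buf => console.log(new Uint8Array(buf))) // the four bytes 10, 11, 12, 13
```

Supplying byteLength up front is what lets a buffer like this skip a size lookup before the first read.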

asyncBufferFromFile

If you are in a Node.js environment, use asyncBufferFromFile to wrap a local file as an AsyncBuffer:

import { asyncBufferFromFile, parquetReadObjects } from 'hyparquet'

const file: AsyncBuffer = await asyncBufferFromFile('example.parquet')
const data = await parquetReadObjects({ file })

ArrayBuffer

You can provide an ArrayBuffer anywhere that an AsyncBuffer is expected. This is useful if you already have the entire parquet file in memory.

Custom AsyncBuffer

You can implement your own AsyncBuffer to create a virtual file that can be read asynchronously by hyparquet.
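
As a rough sketch of what a custom AsyncBuffer can look like, the wrapper below satisfies the interface described above (a byteLength plus a slice method) and records every byte range that is requested. The name trackedAsyncBuffer and the extra reads property are hypothetical additions for illustration, not part of hyparquet.

```javascript
// Hypothetical wrapper: records every byte range requested from an underlying buffer.
function trackedAsyncBuffer(buffer) {
  const reads = []
  return {
    reads, // extra property for inspection, not required by the AsyncBuffer interface
    byteLength: buffer.byteLength,
    // slice may return a Promise<ArrayBuffer>, so asynchronous sources fit the same shape
    async slice(start, end) {
      reads.push({ start, end: end ?? buffer.byteLength })
      return buffer.slice(start, end)
    },
  }
}

// Wrap an in-memory ArrayBuffer and read its second half
const source = new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8]).buffer
const file = trackedAsyncBuffer(source)
file.slice(4).then(tail => {
  console.log(tail.byteLength) // 4
  console.log(file.reads) // [ { start: 4, end: 8 } ]
})
```

A wrapper of this kind can help when checking how many bytes a given read actually touches.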

parquetRead vs parquetReadObjects

parquetReadObjects

parquetReadObjects is a convenience wrapper around parquetRead that returns the complete rows as Promise<Record<string, any>[]>. This is the simplest way to read parquet files.

parquetReadObjects({ file }): Promise<Record<string, any>[]>

parquetRead

parquetRead is the "base" function for reading parquet files. It returns a Promise<void> that resolves when the file has been read, or rejects if an error occurs. Data is delivered through the onComplete, onChunk, or onPage callbacks passed as arguments.

The reason for this design is that parquet is a column-oriented format, and returning data in row-oriented format requires transposing the column data. This is an expensive operation in javascript. If you don't pass in an onComplete argument to parquetRead, hyparquet will skip this transpose step and save memory.
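
To illustrate what the transpose step involves, the sketch below converts column-oriented data into row objects. It is a simplified illustration under the assumption that all columns have equal length, not hyparquet's actual implementation; the function name columnsToRows and the sample values are made up.

```javascript
// Illustration of a transpose step (not hyparquet's actual implementation):
// turn { columnName: values[] } into an array of row objects.
function columnsToRows(columns) {
  const names = Object.keys(columns)
  const numRows = names.length ? columns[names[0]].length : 0
  const rows = new Array(numRows)
  for (let i = 0; i < numRows; i++) {
    const row = {}
    for (const name of names) row[name] = columns[name][i] // one write per cell
    rows[i] = row
  }
  return rows
}

// Made-up sample data with two columns and two rows
const rows = columnsToRows({
  breed: ['Angora', 'Lop'],
  lifespan: [8, 10],
})
console.log(rows) // two row objects, one per index in the input columns
```

The work grows with rows times columns and allocates one object per row, which is why skipping it can matter for large files.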

Chunk Streaming

The onChunk callback receives column-oriented data as it becomes ready. onChunk always receives top-level columns, with structs assembled into a single column. This may require waiting for multiple sub-columns to load before assembly can occur.

The onPage callback receives column-oriented page data as it becomes ready. onPage does NOT assemble struct columns and always receives individual sub-column data. Note that onPage does assemble nested lists.

In some cases, onPage can deliver data sooner than onChunk.

interface ColumnData {
  columnName: string
  columnData: ArrayLike<any>
  rowStart: number
  rowEnd: number
}
await parquetRead({
  file,
  onChunk(chunk: ColumnData) {
    console.log('chunk', chunk)
  },
  onPage(chunk: ColumnData) {
    console.log('page', chunk)
  },
})
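
One possible way to consume these callbacks is to accumulate chunks into one array per column. The sketch below does that with simulated chunks shaped like ColumnData. It assumes rowStart is the absolute index of the chunk's first row, which the text above does not state explicitly; createChunkCollector is a hypothetical helper, not part of hyparquet.

```javascript
// Hypothetical collector: gathers ColumnData chunks into one array per column.
// Assumes rowStart is the absolute index of the chunk's first row.
function createChunkCollector() {
  const columns = new Map()
  return {
    columns,
    onChunk({ columnName, columnData, rowStart }) {
      if (!columns.has(columnName)) columns.set(columnName, [])
      const target = columns.get(columnName)
      // Place values by row index so chunk arrival order does not matter
      for (let i = 0; i < columnData.length; i++) target[rowStart + i] = columnData[i]
    },
  }
}

// Simulated chunks arriving out of order; a real run would pass
// collector.onChunk as the onChunk option instead of calling it by hand
const collector = createChunkCollector()
collector.onChunk({ columnName: 'id', columnData: [3, 4], rowStart: 2, rowEnd: 4 })
collector.onChunk({ columnName: 'id', columnData: [1, 2], rowStart: 0, rowEnd: 2 })
console.log(collector.columns.get('id')) // [ 1, 2, 3, 4 ]
```

Because the handler closes over its own state rather than using this, it can be handed over as a bare function reference.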

Returned row format

By default, the onComplete function receives each row as an array of values: [value]. If you would prefer each row to be an object: { columnName: value }, set the option rowFormat to 'object'.

import { parquetRead } from 'hyparquet'

await parquetRead({
  file,
  rowFormat: 'object',
  onComplete: data => console.log(data),
})

The parquetReadObjects function defaults to rowFormat: 'object'.

Supported Parquet Files

Parquet is a sprawling format with many options for compression schemes, encoding types, and data structures. Hyparquet supports all parquet encodings: plain, dictionary, rle, bit packed, delta, etc.

Hyparquet is the most compliant parquet parser on earth: it can open more files than pyarrow, rust, and duckdb.

Compression

By default, hyparquet supports uncompressed and snappy-compressed parquet files. To support the full range of parquet compression codecs (gzip, brotli, zstd, etc), use the hyparquet-compressors package.

Codec	hyparquet	with hyparquet-compressors
Uncompressed	✅	✅
Snappy	✅	✅
GZip	❌	✅
LZO	❌	✅
Brotli	❌	✅
LZ4	❌	✅
ZSTD	❌	✅
LZ4_RAW	❌	✅

hysnappy

For faster snappy decompression, try hysnappy, which uses WASM for a 40% speed boost on large parquet files.

hyparquet-compressors

You can include support for ALL parquet compressors plus hysnappy using the hyparquet-compressors package.

import { parquetReadObjects } from 'hyparquet'
import { compressors } from 'hyparquet-compressors'

const data = await parquetReadObjects({ file, compressors })

Contributions

Contributions are welcome! If you have suggestions, bug reports, or feature requests, please open an issue or submit a pull request.

Hyparquet development is supported by an open-source grant from Hugging Face 🤗

No vulnerabilities found.