npm install @huggingface/inference
Languages: TypeScript (85.86%), JavaScript (10.57%), Python (1.54%), Jinja (0.85%), Rust (0.79%), Shell (0.35%), Svelte (0.03%), HTML (0.01%)
MIT License · 2,172 Stars · 1,700 Commits · 459 Forks · 50 Watchers · 88 Branches · 334 Contributors · Updated on Jul 13, 2025
Latest Version: 4.5.1
Package Id: @huggingface/inference@4.5.1
Unpacked Size: 1.01 MB
Size: 142.52 kB
File Count: 596
NPM Version: 10.8.2
Node Version: 20.19.3
Published on: Jul 11, 2025
A TypeScript-powered wrapper that provides a unified interface to run inference across multiple services for models hosted on the Hugging Face Hub.
```bash
npm install @huggingface/inference

pnpm add @huggingface/inference

yarn add @huggingface/inference
```
```ts
// esm.sh
import { InferenceClient } from "https://esm.sh/@huggingface/inference";
// or npm:
import { InferenceClient } from "npm:@huggingface/inference";
```
```ts
import { InferenceClient } from '@huggingface/inference';

const hf = new InferenceClient('your access token');
```
❗Important note: Always pass an access token. Join Hugging Face and then visit access tokens to generate your access token for free.
Your access token should be kept private. If you need to protect it in front-end applications, we suggest setting up a proxy server that stores the access token.
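A minimal sketch of such a proxy, assuming an Express server and a hypothetical `/api/chat` route (the route name, model, and provider below are illustrative, not part of this package):

```ts
import express from "express";
import { InferenceClient } from "@huggingface/inference";

const app = express();
app.use(express.json());

// The access token only ever lives on the server; the browser talks to /api/chat.
const client = new InferenceClient(process.env.HF_TOKEN);

app.post("/api/chat", async (req, res) => {
  try {
    const out = await client.chatCompletion({
      model: "Qwen/Qwen3-32B", // illustrative model
      provider: "cerebras",    // illustrative provider
      messages: req.body.messages,
    });
    res.json(out.choices[0].message);
  } catch (err) {
    res.status(500).json({ error: String(err) });
  }
});

app.listen(3000);
```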
You can send inference requests to third-party providers with the inference client.
Currently, we support the following providers:
To send requests to a third-party provider, you have to pass the `provider` parameter to the inference function. The default value of the `provider` parameter is "auto", which will select the first of the providers available for the model, sorted by your preferred order in https://hf.co/settings/inference-providers.
```ts
const accessToken = "hf_..."; // Either a HF access token, or an API key from the third-party provider (Replicate in this example)

const client = new InferenceClient(accessToken);
await client.textToImage({
  provider: "replicate",
  model: "black-forest-labs/FLUX.1-dev",
  inputs: "A black forest cake"
})
```
You also have to make sure your request is authenticated with an access token. When authenticated with a Hugging Face access token, the request is routed through https://huggingface.co. When authenticated with a third-party provider key, the request is made directly against that provider's inference API.
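For illustration (both keys below are placeholders, and the `r8_` prefix for a Replicate key is only an assumption), the same call can be authenticated either way:

```ts
import { InferenceClient } from "@huggingface/inference";

// Hugging Face access token: the call is routed through https://huggingface.co
const routed = new InferenceClient("hf_xxxxxxxx");

// Provider-issued key (e.g. a Replicate API key): the call goes straight to the provider
const direct = new InferenceClient("r8_xxxxxxxx");

await routed.textToImage({
  provider: "replicate",
  model: "black-forest-labs/FLUX.1-dev",
  inputs: "A black forest cake",
});
```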
Only a subset of models are supported when requesting third-party providers. You can check the list of supported models per pipeline tasks here:
❗Important note: To be compatible, the third-party API must adhere to the "standard" shape API we expect on HF model pages for each pipeline task type. This is not an issue for LLMs as everyone converged on the OpenAI API anyways, but can be more tricky for other tasks like "text-to-image" or "automatic-speech-recognition" where there exists no standard API. Let us know if any help is needed or if we can make things easier for you!
👋Want to add another provider? Get in touch if you'd like to add support for another Inference provider, and/or request it on https://huggingface.co/spaces/huggingface/HuggingDiscussions/discussions/49
You can import the functions you need directly from the module instead of using the `InferenceClient` class.
```ts
import { textGeneration } from "@huggingface/inference";

await textGeneration({
  accessToken: "hf_...",
  model: "model_or_endpoint",
  inputs: ...,
  parameters: ...
})
```
This will enable tree-shaking by your bundler.
The inference package provides specific error types to help you handle different error scenarios effectively.
The package defines several error types that extend the base `Error` class:
- `InferenceClientError`: Base error class for all Hugging Face Inference errors
- `InferenceClientInputError`: Thrown when there are issues with input parameters
- `InferenceClientProviderApiError`: Thrown when there are API-level errors from providers
- `InferenceClientHubApiError`: Thrown when there are API-level errors from the Hugging Face Hub
- `InferenceClientProviderOutputError`: Thrown when there are issues with providers' API response format

```ts
import { InferenceClient } from "@huggingface/inference";
import {
  InferenceClientError,
  InferenceClientProviderApiError,
  InferenceClientProviderOutputError,
  InferenceClientHubApiError,
} from "@huggingface/inference";

const client = new InferenceClient();

try {
  const result = await client.textGeneration({
    model: "gpt2",
    inputs: "Hello, I'm a language model",
  });
} catch (error) {
  if (error instanceof InferenceClientProviderApiError) {
    // Handle API errors from providers (e.g., rate limits, authentication issues)
    console.error("Provider API Error:", error.message);
    console.error("HTTP Request details:", error.request);
    console.error("HTTP Response details:", error.response);
  } else if (error instanceof InferenceClientHubApiError) {
    // Handle API errors from the Hugging Face Hub
    console.error("Hub API Error:", error.message);
    console.error("HTTP Request details:", error.request);
    console.error("HTTP Response details:", error.response);
  } else if (error instanceof InferenceClientProviderOutputError) {
    // Handle malformed responses from providers
    console.error("Provider Output Error:", error.message);
  } else if (error instanceof InferenceClientInputError) {
    // Handle invalid input parameters
    console.error("Input Error:", error.message);
  } else {
    // Handle unexpected errors
    console.error("Unexpected error:", error);
  }
}

/// Catch all errors from @huggingface/inference
try {
  const result = await client.textGeneration({
    model: "gpt2",
    inputs: "Hello, I'm a language model",
  });
} catch (error) {
  if (error instanceof InferenceClientError) {
    // Handle errors from @huggingface/inference
    console.error("Error from InferenceClient:", error);
  } else {
    // Handle unexpected errors
    console.error("Unexpected error:", error);
  }
}
```
`InferenceClientProviderApiError` occurs when there are issues with the API request when performing inference at the selected provider. It has several properties:

- `message`: A descriptive error message
- `request`: Details about the failed request (URL, method, headers)
- `response`: Response details including status code and body

`InferenceClientHubApiError` occurs when there are issues with the API request when requesting the Hugging Face Hub API. It has several properties:

- `message`: A descriptive error message
- `request`: Details about the failed request (URL, method, headers)
- `response`: Response details including status code and body

`InferenceClientProviderOutputError` occurs when a provider returns a response in an unexpected format.

`InferenceClientInputError` occurs when input parameters are invalid or missing. The error message describes what's wrong with the input.
Generates text from an input prompt.
```ts
await hf.textGeneration({
  model: 'mistralai/Mixtral-8x7B-v0.1',
  provider: "together",
  inputs: 'The answer to the universe is'
})

for await (const output of hf.textGenerationStream({
  model: "mistralai/Mixtral-8x7B-v0.1",
  provider: "together",
  inputs: 'repeat "one two three four"',
  parameters: { max_new_tokens: 250 }
})) {
  console.log(output.token.text, output.generated_text);
}
```
Generate a model response from a list of messages comprising a conversation.
```ts
// Non-streaming API
const out = await hf.chatCompletion({
  model: "Qwen/Qwen3-32B",
  provider: "cerebras",
  messages: [{ role: "user", content: "Hello, nice to meet you!" }],
  max_tokens: 512,
  temperature: 0.1,
});

// Streaming API
let streamedOut = "";
for await (const chunk of hf.chatCompletionStream({
  model: "Qwen/Qwen3-32B",
  provider: "cerebras",
  messages: [
    { role: "user", content: "Can you help me solve an equation?" },
  ],
  max_tokens: 512,
  temperature: 0.1,
})) {
  if (chunk.choices && chunk.choices.length > 0) {
    streamedOut += chunk.choices[0].delta.content;
  }
}
```
This task reads some text and outputs raw float values, that are usually consumed as part of a semantic database/semantic search.
```ts
await hf.featureExtraction({
  model: "sentence-transformers/distilbert-base-nli-mean-tokens",
  inputs: "That is a happy person",
});
```
Tries to fill in a hole with a missing word (token to be precise).
```ts
await hf.fillMask({
  model: 'bert-base-uncased',
  inputs: '[MASK] world!'
})
```
Summarizes longer text into shorter text. Be careful, some models have a maximum length of input.
```ts
await hf.summarization({
  model: 'facebook/bart-large-cnn',
  inputs:
    'The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930.',
  parameters: {
    max_length: 100
  }
})
```
Answers questions based on the context you provide.
```ts
await hf.questionAnswering({
  model: 'deepset/roberta-base-squad2',
  inputs: {
    question: 'What is the capital of France?',
    context: 'The capital of France is Paris.'
  }
})
```
Answers a natural-language question about data provided in a table.

```ts
await hf.tableQuestionAnswering({
  model: 'google/tapas-base-finetuned-wtq',
  inputs: {
    query: 'How many stars does the transformers repository have?',
    table: {
      Repository: ['Transformers', 'Datasets', 'Tokenizers'],
      Stars: ['36542', '4512', '3934'],
      Contributors: ['651', '77', '34'],
      'Programming language': ['Python', 'Python', 'Rust, Python and NodeJS']
    }
  }
})
```
Often used for sentiment analysis, this method will assign labels to the given text along with a probability score of that label.
```ts
await hf.textClassification({
  model: 'distilbert-base-uncased-finetuned-sst-2-english',
  inputs: 'I like you. I love you.'
})
```
Used for sentence parsing, either grammatical, or Named Entity Recognition (NER) to understand keywords contained within text.
```ts
await hf.tokenClassification({
  model: 'dbmdz/bert-large-cased-finetuned-conll03-english',
  inputs: 'My name is Sarah Jessica Parker but you can call me Jessica'
})
```
Converts text from one language to another.
```ts
await hf.translation({
  model: 't5-base',
  inputs: 'My name is Wolfgang and I live in Berlin'
})

const textToTranslate = 'My name is Wolfgang and I live in Berlin';
await hf.translation({
  model: 'facebook/mbart-large-50-many-to-many-mmt',
  inputs: textToTranslate,
  parameters: {
    "src_lang": "en_XX",
    "tgt_lang": "fr_XX"
  }
})
```
Checks how well an input text fits into a set of labels you provide.
```ts
await hf.zeroShotClassification({
  model: 'facebook/bart-large-mnli',
  inputs: [
    'Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!'
  ],
  parameters: { candidate_labels: ['refund', 'legal', 'faq'] }
})
```
Calculates the semantic similarity between one text and a list of other sentences.
```ts
await hf.sentenceSimilarity({
  model: 'sentence-transformers/paraphrase-xlm-r-multilingual-v1',
  inputs: {
    source_sentence: 'That is a happy person',
    sentences: [
      'That is a happy dog',
      'That is a very happy person',
      'Today is a sunny day'
    ]
  }
})
```
Transcribes speech from an audio file.
```ts
import { readFileSync } from 'node:fs';

await hf.automaticSpeechRecognition({
  model: 'facebook/wav2vec2-large-960h-lv60-self',
  data: readFileSync('test/sample1.flac')
})
```
Assigns labels to the given audio along with a probability score of that label.
```ts
await hf.audioClassification({
  model: 'superb/hubert-large-superb-er',
  data: readFileSync('test/sample1.flac')
})
```
Generates natural-sounding speech from text input.
```ts
await hf.textToSpeech({
  model: 'espnet/kan-bayashi_ljspeech_vits',
  inputs: 'Hello world!'
})
```
Outputs one or multiple generated audios from an input audio, commonly used for speech enhancement and source separation.
```ts
await hf.audioToAudio({
  model: 'speechbrain/sepformer-wham',
  data: readFileSync('test/sample1.flac')
})
```
Assigns labels to a given image along with a probability score of that label.
```ts
await hf.imageClassification({
  data: readFileSync('test/cheetah.png'),
  model: 'google/vit-base-patch16-224'
})
```
Detects objects within an image and returns labels with corresponding bounding boxes and probability scores.
```ts
await hf.objectDetection({
  data: readFileSync('test/cats.png'),
  model: 'facebook/detr-resnet-50'
})
```
Detects segments within an image and returns labels with corresponding masks and probability scores.
```ts
await hf.imageSegmentation({
  data: readFileSync('test/cats.png'),
  model: 'facebook/detr-resnet-50-panoptic'
})
```
Outputs text from a given image, commonly used for captioning or optical character recognition.
```ts
await hf.imageToText({
  data: readFileSync('test/cats.png'),
  model: 'nlpconnect/vit-gpt2-image-captioning'
})
```
Creates an image from a text prompt.
```ts
await hf.textToImage({
  model: 'black-forest-labs/FLUX.1-dev',
  inputs: 'a picture of a green bird'
})
```
Image-to-image is the task of transforming a source image to match the characteristics of a target image or a target image domain.
```ts
await hf.imageToImage({
  inputs: new Blob([readFileSync("test/stormtrooper_depth.png")]),
  parameters: {
    prompt: "elmo's lecture",
  },
  model: "lllyasviel/sd-controlnet-depth",
});
```
Checks how well an input image fits into a set of labels you provide.
```ts
await hf.zeroShotImageClassification({
  model: 'openai/clip-vit-large-patch14-336',
  inputs: {
    image: await (await fetch('https://placekitten.com/300/300')).blob()
  },
  parameters: {
    candidate_labels: ['cat', 'dog']
  }
})
```
Visual Question Answering is the task of answering open-ended questions based on an image. These models output natural language responses to natural language questions about the image.
```ts
await hf.visualQuestionAnswering({
  model: 'dandelin/vilt-b32-finetuned-vqa',
  inputs: {
    question: 'How many cats are lying down?',
    image: await (await fetch('https://placekitten.com/300/300')).blob()
  }
})
```
Document question answering models take a (document, question) pair as input and return an answer in natural language.
```ts
await hf.documentQuestionAnswering({
  model: 'impira/layoutlm-document-qa',
  inputs: {
    question: 'Invoice number?',
    image: await (await fetch('https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png')).blob(),
  }
})
```
Tabular regression is the task of predicting a numerical value given a set of attributes.
```ts
await hf.tabularRegression({
  model: "scikit-learn/Fish-Weight",
  inputs: {
    data: {
      "Height": ["11.52", "12.48", "12.3778"],
      "Length1": ["23.2", "24", "23.9"],
      "Length2": ["25.4", "26.3", "26.5"],
      "Length3": ["30", "31.2", "31.1"],
      "Species": ["Bream", "Bream", "Bream"],
      "Width": ["4.02", "4.3056", "4.6961"]
    },
  },
})
```
Tabular classification is the task of classifying a target category (a group) based on a set of attributes.
```ts
await hf.tabularClassification({
  model: "vvmnnnkv/wine-quality",
  inputs: {
    data: {
      "fixed_acidity": ["7.4", "7.8", "10.3"],
      "volatile_acidity": ["0.7", "0.88", "0.32"],
      "citric_acid": ["0", "0", "0.45"],
      "residual_sugar": ["1.9", "2.6", "6.4"],
      "chlorides": ["0.076", "0.098", "0.073"],
      "free_sulfur_dioxide": ["11", "25", "5"],
      "total_sulfur_dioxide": ["34", "67", "13"],
      "density": ["0.9978", "0.9968", "0.9976"],
      "pH": ["3.51", "3.2", "3.23"],
      "sulphates": ["0.56", "0.68", "0.82"],
      "alcohol": ["9.4", "9.8", "12.6"]
    },
  },
})
```
You can use any Chat Completion API-compatible provider with the `chatCompletion` method.
```ts
// Chat Completion Example
const MISTRAL_KEY = process.env.MISTRAL_KEY;
const hf = new InferenceClient(MISTRAL_KEY, {
  endpointUrl: "https://api.mistral.ai",
});
const stream = hf.chatCompletionStream({
  model: "mistral-tiny",
  messages: [{ role: "user", content: "Complete the equation one + one = , just the answer" }],
});
let out = "";
for await (const chunk of stream) {
  if (chunk.choices && chunk.choices.length > 0) {
    out += chunk.choices[0].delta.content;
    console.log(out);
  }
}
```
The examples above use inference providers, which are very useful for prototyping and testing things quickly. Once you're ready to deploy your model to production, you'll need dedicated infrastructure. That's where Inference Endpoints comes into play. It allows you to deploy any model and expose it as a private API. Once deployed, you'll get a URL that you can connect to:
```ts
import { InferenceClient } from '@huggingface/inference';

const hf = new InferenceClient("hf_xxxxxxxxxxxxxx", {
  endpointUrl: "https://j3z5luu0ooo76jnl.us-east-1.aws.endpoints.huggingface.cloud/v1/",
});

const response = await hf.chatCompletion({
  messages: [
    {
      role: "user",
      content: "What is the capital of France?",
    },
  ],
});

console.log(response.choices[0].message.content);
```
By default, all calls to the inference endpoint will wait until the model is loaded. When scaling to 0 is enabled on the endpoint, this can result in non-trivial waiting time. If you'd rather disable this behavior and handle the endpoint's returned 500 HTTP errors yourself, you can do so as follows:
```ts
const hf = new InferenceClient("hf_xxxxxxxxxxxxxx", {
  endpointUrl: "https://j3z5luu0ooo76jnl.us-east-1.aws.endpoints.huggingface.cloud/v1/",
});

const response = await hf.chatCompletion(
  {
    messages: [
      {
        role: "user",
        content: "What is the capital of France?",
      },
    ],
  },
  {
    retry_on_error: false,
  }
);
```
You can use `InferenceClient` to run chat completion with local inference servers (llama.cpp, vllm, litellm server, TGI, mlx, etc.) running on your own machine. The API should be OpenAI API-compatible.
```ts
import { InferenceClient } from '@huggingface/inference';

const hf = new InferenceClient(undefined, {
  endpointUrl: "http://localhost:8080",
});

const response = await hf.chatCompletion({
  messages: [
    {
      role: "user",
      content: "What is the capital of France?",
    },
  ],
});

console.log(response.choices[0].message.content);
```
Similarly to the OpenAI JS client, `InferenceClient` can be used to run Chat Completion inference with any OpenAI REST API-compatible endpoint.
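As a sketch (the endpoint URL, API key, and model name below are placeholders for whatever your OpenAI-compatible server expects):

```ts
import { InferenceClient } from "@huggingface/inference";

// Any server exposing an OpenAI-style /v1/chat/completions route should work here.
const client = new InferenceClient("your-api-key", {
  endpointUrl: "https://api.example.com/v1",
});

const response = await client.chatCompletion({
  model: "your-model-name",
  messages: [{ role: "user", content: "Say hello in French." }],
});

console.log(response.choices[0].message.content);
```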
To run the test suite:

```bash
HF_TOKEN="your access token" pnpm run test
```
We have an informative documentation project called Tasks to list available models for each task and explain how each task works in detail.
It also contains demos, example outputs, and other resources should you want to dig deeper into the ML side of things.
Dependencies: `@huggingface/tasks` (typings only).