Gathering detailed insights and metrics for @instructor-ai/instructor
npm install @instructor-ai/instructor
590 Stars
222 Commits
55 Forks
14 Watching
43 Branches
37 Contributors
Updated on 28 Nov 2024
TypeScript (85.93%)
HTML (12.31%)
JavaScript (1.56%)
Shell (0.2%)
Total Downloads
Last day: 2,007 (-21.4% compared to previous day)
Last week: 11,206 (-3% compared to previous week)
Last month: 52,260 (-14.2% compared to previous month)
Last year: 414,510 (0% compared to previous year)
Structured extraction in TypeScript, powered by LLMs, designed for simplicity, transparency, and control.
Dive into the world of TypeScript-based structured extraction, powered by OpenAI's function calling API and Zod, a TypeScript-first schema validation library with static type inference. Instructor stands out for its simplicity, transparency, and user-centric design. Whether you're a seasoned developer or just starting out, you'll find Instructor's approach intuitive and steerable.
bun add @instructor-ai/instructor zod openai
npm i @instructor-ai/instructor zod openai
pnpm add @instructor-ai/instructor zod openai
For all the tips and tricks on prompting and extracting data, check out the documentation.
```typescript
import Instructor from "@instructor-ai/instructor";
import OpenAI from "openai"
import { z } from "zod"

const oai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY ?? undefined,
  organization: process.env.OPENAI_ORG_ID ?? undefined
})

const client = Instructor({
  client: oai,
  mode: "TOOLS"
})

const UserSchema = z.object({
  // Description will be used in the prompt
  age: z.number().describe("The age of the user"),
  name: z.string()
})

// User will be of type z.infer<typeof UserSchema>
const user = await client.chat.completions.create({
  messages: [{ role: "user", content: "Jason Liu is 30 years old" }],
  model: "gpt-3.5-turbo",
  response_model: {
    schema: UserSchema,
    name: "User"
  }
})

console.log(user)
// { age: 30, name: "Jason Liu" }
```
The main class for creating an Instructor client.
createInstructor
```typescript
function createInstructor<C extends GenericClient | OpenAI>(args: {
  client: OpenAILikeClient<C>;
  mode: Mode;
  debug?: boolean;
}): InstructorClient<C>
```
Creates an instance of the Instructor class.
Returns the extended OpenAI-Like client.
chat.completions.create
```typescript
chat.completions.create<
  T extends z.AnyZodObject,
  P extends T extends z.AnyZodObject ? ChatCompletionCreateParamsWithModel<T>
  : ClientTypeChatCompletionParams<OpenAILikeClient<C>> & { response_model: never }
>(
  params: P
): Promise<ReturnTypeBasedOnParams<typeof this.client, P>>
```
When response_model is present in the params, this creates a chat completion with structured extraction based on the provided schema. Otherwise, the call is proxied to the provided client.
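To illustrate the routing rule, here is a minimal sketch of how a wrapper could decide between the two paths. The types and the `route` helper are hypothetical simplifications written for this example; they are not Instructor's actual internals.

```typescript
// Hypothetical, simplified shapes for illustration only; not Instructor's real types.
type Message = { role: "system" | "user" | "assistant"; content: string }
type BaseParams = { model: string; messages: Message[] }
type StructuredParams = BaseParams & { response_model: { name: string } }

// Type guard: a request counts as "structured" when it carries a response_model.
function hasResponseModel(params: BaseParams | StructuredParams): params is StructuredParams {
  return "response_model" in params && params.response_model !== undefined
}

// Decide which path a request takes. A real wrapper would run the extraction
// logic or call the underlying client here instead of returning a label.
function route(params: BaseParams | StructuredParams): "structured-extraction" | "proxy-to-client" {
  return hasResponseModel(params) ? "structured-extraction" : "proxy-to-client"
}

const plainRoute = route({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Hi" }]
})

const structuredRoute = route({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Jason Liu is 30 years old" }],
  response_model: { name: "User" }
})

console.log(plainRoute, structuredRoute)
```

Because the decision depends only on whether response_model is present, existing calls that don't use structured extraction should behave as they would on the unwrapped client.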
Instructor supports different modes for defining the structure and format of the response from the language model. These modes are defined in the zod-stream package and are as follows:
- FUNCTIONS (DEPRECATED): Generates a response using OpenAI's function calling API. It maps to the necessary parameters for the function calling API, including the function_call and functions properties.
- TOOLS: Generates a response using OpenAI's tool specification. It constructs the required parameters for the tool specification, including the tool_choice and tools properties.
- JSON: It sets the response_format to json_object and includes the JSON schema in the system message to guide the response generation. (Together & Anyscale)
- MD_JSON: Generates a response in JSON format embedded within a Markdown code block. It includes the JSON schema in the system message and expects the response to be a valid JSON object wrapped in a Markdown code block.
- JSON_SCHEMA: Generates a response using "JSON mode" that conforms to a provided JSON schema. It sets the response_format to json_object with the provided schema and includes the schema description in the system message.
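As a rough picture of what the modes above translate to, the sketch below maps each mode to the request properties its description mentions. `paramsForMode` and the `systemMessage` field are hypothetical names invented for this example; zod-stream's real implementation is not shown in this README and may differ.

```typescript
// Illustrative only: a hypothetical helper, not zod-stream's actual implementation.
type Mode = "FUNCTIONS" | "TOOLS" | "JSON" | "MD_JSON" | "JSON_SCHEMA"
type JsonSchema = Record<string, unknown>

function paramsForMode(mode: Mode, name: string, schema: JsonSchema): Record<string, unknown> {
  // `systemMessage` is a stand-in for "include the schema in the system message".
  const schemaPrompt = `Respond with JSON matching this schema:\n${JSON.stringify(schema)}`

  switch (mode) {
    case "FUNCTIONS":
      // Deprecated function calling API: function_call + functions
      return { function_call: { name }, functions: [{ name, parameters: schema }] }
    case "TOOLS":
      // Tool specification: tool_choice + tools
      return {
        tool_choice: { type: "function", function: { name } },
        tools: [{ type: "function", function: { name, parameters: schema } }]
      }
    case "JSON":
      return { response_format: { type: "json_object" }, systemMessage: schemaPrompt }
    case "MD_JSON":
      return { systemMessage: `${schemaPrompt}\nWrap the JSON in a Markdown code block.` }
    case "JSON_SCHEMA":
      return { response_format: { type: "json_object", schema }, systemMessage: schemaPrompt }
  }
}

const userJsonSchema: JsonSchema = {
  type: "object",
  properties: { age: { type: "number" }, name: { type: "string" } },
  required: ["age", "name"]
}

console.log(paramsForMode("TOOLS", "User", userJsonSchema))
```

In practice you only pick the mode when creating the client; the point of the sketch is that the same schema can be delivered to the model in several different ways.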
Instructor supports partial streaming completions, allowing you to receive extracted data in real-time as the model generates its response. This can be useful for providing a more interactive user experience or processing large amounts of data incrementally.
```typescript
import Instructor from "@instructor-ai/instructor"
import OpenAI from "openai"
import { z } from "zod"

const textBlock = `
  In our recent online meeting, participants from various backgrounds joined to discuss the upcoming tech conference.
  The names and contact details of the participants were as follows:

  - Name: John Doe, Email: johndoe@email.com, Twitter: @TechGuru44
  - Name: Jane Smith, Email: janesmith@email.com, Twitter: @DigitalDiva88
  - Name: Alex Johnson, Email: alexj@email.com, Twitter: @CodeMaster2023

  During the meeting, we agreed on several key points. The conference will be held on March 15th, 2024, at the Grand Tech Arena located at 4521 Innovation Drive. Dr. Emily Johnson, a renowned AI researcher, will be our keynote speaker. The budget for the event is set at $50,000, covering venue costs, speaker fees, and promotional activities.

  Each participant is expected to contribute an article to the conference blog by February 20th. A follow-up meeting is scheduled for January 25th at 3 PM GMT to finalize the agenda and confirm the list of speakers.
`

async function extractData() {
  const ExtractionSchema = z.object({
    users: z.array(
      z.object({
        name: z.string(),
        handle: z.string(),
        twitter: z.string()
      })
    ).min(3),
    location: z.string(),
    budget: z.number()
  })

  const oai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY ?? undefined,
    organization: process.env.OPENAI_ORG_ID ?? undefined
  })

  const client = Instructor({
    client: oai,
    mode: "TOOLS"
  })

  const extractionStream = await client.chat.completions.create({
    messages: [{ role: "user", content: textBlock }],
    model: "gpt-3.5-turbo",
    response_model: {
      schema: ExtractionSchema,
      name: "Extraction"
    },
    max_retries: 3,
    stream: true
  })

  let extractedData = {}
  for await (const result of extractionStream) {
    extractedData = result
    console.log("Partial extraction:", result)
  }

  console.log("Final extraction:", extractedData)
}

extractData()
```
In this example, we define an ExtractionSchema using Zod to specify the structure of the data we want to extract. We then create an Instructor client, pass the schema to the response_model parameter, and enable streaming by setting stream: true on the request.
The extractionStream variable holds an async generator that yields partial extraction results as they become available. We iterate over the stream using a for await...of loop, updating the extractedData object with each partial result and logging it to the console.
Finally, we log the complete extracted data once the stream is exhausted.
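If you want to try the consumption pattern without calling a model, the sketch below swaps the real stream for a hand-written async generator. `fakeExtractionStream` and its snapshots are invented for illustration; what a real stream yields depends on the model's output.

```typescript
// Hypothetical stand-in for an extraction stream: each yield is a fuller snapshot.
type PartialExtraction = {
  users?: { name?: string }[]
  location?: string
  budget?: number
}

async function* fakeExtractionStream(): AsyncGenerator<PartialExtraction> {
  yield { users: [{ name: "John Doe" }] }
  yield { users: [{ name: "John Doe" }, { name: "Jane Smith" }], location: "Grand Tech Arena" }
  yield {
    users: [{ name: "John Doe" }, { name: "Jane Smith" }, { name: "Alex Johnson" }],
    location: "Grand Tech Arena",
    budget: 50000
  }
}

// Same loop shape as the example above: keep the latest snapshot, return the last one.
async function consume(stream: AsyncIterable<PartialExtraction>): Promise<PartialExtraction> {
  let latest: PartialExtraction = {}
  for await (const partial of stream) {
    latest = partial // each snapshot replaces the previous one
    console.log("Partial extraction:", partial)
  }
  return latest
}

consume(fakeExtractionStream()).then(final => console.log("Final extraction:", final))
```

Every field is optional in the partial type because early snapshots may not contain everything yet; only the final value can be expected to satisfy the full schema.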
Instructor supports various providers that adhere to the OpenAI API specification. You can easily switch between providers by configuring the appropriate client and specifying the desired model and mode.
Anyscale
```typescript
import Instructor from "@instructor-ai/instructor"
import OpenAI from "openai"
import { z } from "zod"

const UserSchema = z.object({
  age: z.number(),
  name: z.string().refine(name => name.includes(" "), {
    message: "Name must contain a space"
  })
})

async function extractUser() {
  const client = new OpenAI({
    baseURL: "https://api.endpoints.anyscale.com/v1",
    apiKey: process.env.ANYSCALE_API_KEY
  })

  const instructor = Instructor({
    client: client,
    mode: "TOOLS"
  })

  const user = await instructor.chat.completions.create({
    messages: [{ role: "user", content: "Jason Liu is 30 years old" }],
    model: "mistralai/Mixtral-8x7B-Instruct-v0.1",
    response_model: {
      schema: UserSchema,
      name: "User"
    },
    max_retries: 4
  })

  return user
}

const anyscaleUser = await extractUser()
console.log("Anyscale user:", anyscaleUser)
```
Together
```typescript
import Instructor from "@instructor-ai/instructor"
import OpenAI from "openai"
import { z } from "zod"

const UserSchema = z.object({
  age: z.number(),
  name: z.string().refine(name => name.includes(" "), {
    message: "Name must contain a space"
  })
})

async function extractUser() {
  const client = new OpenAI({
    baseURL: "https://api.together.xyz/v1",
    apiKey: process.env.TOGETHER_API_KEY
  })

  const instructor = Instructor({
    client: client,
    mode: "TOOLS"
  })

  const user = await instructor.chat.completions.create({
    messages: [{ role: "user", content: "Jason Liu is 30 years old" }],
    model: "mistralai/Mixtral-8x7B-Instruct-v0.1",
    response_model: {
      schema: UserSchema,
      name: "User"
    },
    max_retries: 4
  })

  return user
}

const togetherUser = await extractUser()
console.log("Together user:", togetherUser)
```
In these examples, we configure the OpenAI client with the base URL and API key for Anyscale and Together, respectively.
The extractUser function creates an OpenAI client pointed at the provider's endpoint and initializes an Instructor instance with the specified mode.
We then call instructor.chat.completions.create with the desired model, response schema, and other parameters to extract the user information.
By changing the base URL, API key, model, and mode, you can easily switch between different providers and configurations.
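If you would rather have a single entry point that takes the provider as an argument, one option is a small lookup table that resolves the client options. The `PROVIDERS` table and `clientOptionsFor` helper below are hypothetical names for this sketch; the base URLs and environment variable names are the ones used in the two examples above.

```typescript
// Hypothetical provider table for illustration; not part of Instructor.
type ProviderName = "anyscale" | "together"

interface ProviderConfig {
  baseURL: string
  apiKeyEnvVar: string
}

const PROVIDERS: Record<ProviderName, ProviderConfig> = {
  anyscale: { baseURL: "https://api.endpoints.anyscale.com/v1", apiKeyEnvVar: "ANYSCALE_API_KEY" },
  together: { baseURL: "https://api.together.xyz/v1", apiKeyEnvVar: "TOGETHER_API_KEY" }
}

// Build the options object you would hand to `new OpenAI(...)`.
// `env` is passed in explicitly so the helper is easy to test.
function clientOptionsFor(
  provider: ProviderName,
  env: Record<string, string | undefined>
): { baseURL: string; apiKey: string } {
  const { baseURL, apiKeyEnvVar } = PROVIDERS[provider]
  const apiKey = env[apiKeyEnvVar]
  if (!apiKey) {
    throw new Error(`Missing environment variable ${apiKeyEnvVar}`)
  }
  return { baseURL, apiKey }
}

// Demo with a placeholder key instead of a real environment.
const togetherOptions = clientOptionsFor("together", { TOGETHER_API_KEY: "example-key" })
console.log(togetherOptions.baseURL)
```

In an application you would call something like clientOptionsFor("together", process.env) and pass the result to the OpenAI constructor before wrapping the client with Instructor. Failing fast on a missing key keeps configuration errors out of the request path.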
Instructor supports integration with providers that don't adhere to the OpenAI SDK, such as Anthropic, Azure, and Cohere, through the llm-polyglot library maintained by @dimitrikennedy. This library provides a unified interface for interacting with various language models across different providers.
```typescript
import { createLLMClient } from "llm-polyglot"
import Instructor from "@instructor-ai/instructor"
import { z } from "zod"

const anthropicClient = createLLMClient({
  provider: "anthropic",
  apiKey: process.env.ANTHROPIC_API_KEY
})

const UserSchema = z.object({
  age: z.number(),
  name: z.string()
})

const instructor = Instructor<typeof anthropicClient>({
  client: anthropicClient,
  mode: "TOOLS"
})

async function extractUser() {
  const user = await instructor.chat.completions.create({
    model: "claude-3-opus-20240229",
    max_tokens: 1000,
    messages: [
      {
        role: "user",
        content: "My name is Dimitri Kennedy."
      }
    ],
    response_model: {
      name: "extract_name",
      schema: UserSchema
    }
  })

  return user
}

// Example usage
const extractedUser = await extractUser()
console.log("Extracted user:", extractedUser)
```
In this example, we use the createLLMClient function from the llm-polyglot library to create a client for the Anthropic provider. We pass the provider name ("anthropic") and the corresponding API key to the function.
Next, we define a UserSchema using Zod to specify the structure of the user data we want to extract.
We create an Instructor instance by passing the Anthropic client and the desired mode to the Instructor function. Note that we use Instructor<typeof anthropicClient> to specify the client type explicitly.
The extractUser function demonstrates how to use the Instructor instance to extract user information from a given input. We call instructor.chat.completions.create with the appropriate model ("claude-3-opus-20240229" in this case), parameters, and the response_model that includes our UserSchema.
Finally, we log the extracted user information.
By leveraging the llm-polyglot library, Instructor enables seamless integration with a wide range of providers beyond those that follow the OpenAI SDK. This allows you to take advantage of the unique capabilities and models offered by different providers while still benefiting from the structured extraction and validation features of Instructor.
For additional support and information on using other providers with llm-polyglot, please refer to the library's documentation and examples.
If you'd like to see more, check out our cookbook.
Installing Instructor is a breeze.
Instructor is built on top of several powerful packages from the Island AI toolkit, developed and maintained by Dimitri Kennedy. These packages provide essential functionality for structured data handling and streaming with Large Language Models.
zod-stream is a client module that interfaces directly with LLM streams. It utilizes Schema-Stream for efficient parsing and is equipped with tools for processing raw responses from OpenAI, categorizing them by mode (function, tools, JSON, etc.), and ensuring proper error handling and stream conversion. It's ideal for API integration delivering structured LLM response streams.
schema-stream is a JSON streaming parser that incrementally constructs and updates response models based on Zod schemas. It's designed for real-time data processing and incremental model hydration.
llm-polyglot is a library that provides a unified interface for interacting with various language models across different providers, such as OpenAI, Anthropic, Azure, and Cohere. It simplifies the process of working with multiple LLM providers and enables seamless integration with Instructor.
Instructor leverages the power of these Island AI packages to deliver a seamless and efficient experience for structured data extraction and streaming with LLMs. The collaboration between Dimitri Kennedy, the creator of Island AI, and Jason Liu, the author of the original Instructor Python package, has led to the development of the TypeScript version of Instructor, which introduces the concept of partial JSON streaming from LLMs.
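To make the idea of partial JSON streaming more concrete, here is a deliberately naive sketch: it closes any unterminated string, object, or array in the text received so far, tries to parse the result, and keeps the previous snapshot when parsing fails. This is not how schema-stream is implemented; `closeOpenJson` and `parsePartial` are hypothetical helpers written only for this illustration.

```typescript
// Naive illustration only; schema-stream's real parser is schema-aware and more capable.

// Close any unterminated string, object, or array so a JSON prefix may become parseable.
function closeOpenJson(prefix: string): string {
  const closers: string[] = []
  let inString = false
  let escaped = false

  for (let i = 0; i < prefix.length; i++) {
    const ch = prefix.charAt(i)
    if (inString) {
      if (escaped) escaped = false
      else if (ch === "\\") escaped = true
      else if (ch === '"') inString = false
      continue
    }
    if (ch === '"') inString = true
    else if (ch === "{") closers.push("}")
    else if (ch === "[") closers.push("]")
    else if (ch === "}" || ch === "]") closers.pop()
  }

  return prefix + (inString ? '"' : "") + closers.reverse().join("")
}

// Return the best snapshot so far: parse the repaired prefix, or keep the previous value.
function parsePartial<T>(prefix: string, previous: T): T {
  try {
    return JSON.parse(closeOpenJson(prefix)) as T
  } catch {
    return previous // e.g. the prefix ends on a dangling key or a trailing comma
  }
}

// Simulated token chunks arriving from a model.
const chunks = ['{"name": "Jas', 'on Liu", ', '"age": 3', "0}"]
let buffer = ""
let snapshot: Record<string, unknown> = {}

for (const chunk of chunks) {
  buffer += chunk
  snapshot = parsePartial(buffer, snapshot)
  console.log(snapshot)
}
```

Note that intermediate snapshots can hold values that are still incomplete (a number or string that has only partly arrived), which is one reason validating against the full schema is best left to the final result.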
For more information about Island AI and its packages, please refer to the Island AI repository.
The question of using Instructor is fundamentally a question of why to use Zod.
Works with the OpenAI SDK: Instructor follows OpenAI's API. This means you can use the same API for both prompting and extraction across multiple providers that support the OpenAI API.
Customizable: Zod is highly customizable. You can define your own validators, custom error messages, and more.
Ecosystem: Zod is the most widely used data validation library for TypeScript.
Battle Tested: Zod is downloaded over 24M times per month, and supported by a large community of contributors.
If you want to help out, check out some of the issues marked as good-first-issue or help-wanted, found here. They could be anything from code improvements to a guest blog post or a new cookbook.
Check out the contribution guide for details on how to set things up, testing, changesets, and guidelines.
ℹ️ Tip: Support in other languages
Check out ports to other languages below:
- [Python](https://www.github.com/jxnl/instructor)
- [Elixir](https://github.com/thmsmlr/instructor_ex/)
If you want to port Instructor to another language, please reach out to us on [Twitter](https://twitter.com/jxnlco). We'd love to help you get started!
This project is licensed under the terms of the MIT License.
No vulnerabilities found.