@tscircuit/prompt-benchmarks
Benchmarks for tscircuit system prompts across different coding tasks
npm install @tscircuit/prompt-benchmarks
Languages: TypeScript (100%)
License: MIT
Repository: 2 stars · 6 forks · 1 watcher · 8 branches · 168 commits · 7 contributors
Updated on Feb 28, 2025

Latest Version: 0.0.44
Package Id: @tscircuit/prompt-benchmarks@0.0.44
Unpacked Size: 626.38 kB
Size: 166.27 kB
File Count: 59
NPM Version: 10.8.2
Node Version: 20.18.3
Published on: Feb 28, 2025
Total Downloads: 0
Docs · Website · Twitter · Discord · Quickstart · Online Playground
This repository contains benchmarks for evaluating and improving the quality of system prompts used to generate tscircuit code. It includes components for:
- Code Runner (`lib/code-runner`): Safely transpiles, evaluates, and renders TSX code for circuit generation.
- AI (`lib/ai`): Interfaces with OpenAI's models for prompt completions and error correction.
- Utilities (`lib/utils`): Provide logging, snapshot management, and type-checking of generated circuits.
- Prompt Templates (`lib/prompt-templates`): Define various prompt structures for generating different circuit types.
- Scorers (`benchmarks/scorers`): Run multiple tests to ensure circuit validity and quality.

You can install this package from npm using Bun:
```
bun add @tscircuit/prompt-benchmarks
```
Below is the TscircuitCoder interface:
```typescript
export interface TscircuitCoder {
  onStreamedChunk: (chunk: string) => void
  onVfsChanged: () => void
  vfs: { [filepath: string]: string }
  availableOptions: { name: string; options: string[] }[]
  submitPrompt: (
    prompt: string,
    options?: { selectedMicrocontroller?: string },
  ) => Promise<void>
}
```
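For illustration, here is a minimal in-memory implementation of this interface — a hypothetical mock, not the library's real coder — showing how the streaming callback, the VFS, and the change notification fit together:

```typescript
// Shape copied from the TscircuitCoder interface above
interface TscircuitCoder {
  onStreamedChunk: (chunk: string) => void
  onVfsChanged: () => void
  vfs: { [filepath: string]: string }
  availableOptions: { name: string; options: string[] }[]
  submitPrompt: (
    prompt: string,
    options?: { selectedMicrocontroller?: string },
  ) => Promise<void>
}

// A mock coder that "streams" a canned response and writes it to the VFS
function createMockCoder(
  onStreamedChunk: (chunk: string) => void,
  onVfsChanged: () => void,
): TscircuitCoder {
  const coder: TscircuitCoder = {
    onStreamedChunk,
    onVfsChanged,
    vfs: {},
    availableOptions: [{ name: "microcontroller", options: ["pico", "esp32"] }],
    async submitPrompt(prompt) {
      const code = `// circuit for: ${prompt}`
      // Stream the response one "chunk" at a time
      for (const chunk of code.split(" ")) coder.onStreamedChunk(chunk + " ")
      // Write the finished file and notify listeners
      coder.vfs["index.tsx"] = code
      coder.onVfsChanged()
    },
  }
  return coder
}
```

The real coder calls an AI model instead of producing a canned string, but the contract is the same: chunks arrive through `onStreamedChunk`, and `onVfsChanged` fires once the generated file lands in `vfs`.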
*Note: The `createTscircuitCoder` function now accepts an optional `openaiClient` parameter to override the default OpenAI client. This allows you to provide a custom client.*
The AI Coder supports streaming AI responses and notifying you when the virtual file system (VFS) is updated. To achieve this, pass two callback functions when creating a TscircuitCoder instance:
Example Usage:
```typescript
import { createTscircuitCoder } from "@tscircuit/prompt-benchmarks/lib/ai/tscircuitCoder"

// Define a callback for handling streamed chunks
const handleStream = (chunk: string) => {
  console.log("Streaming update:", chunk)
}

// Define a callback for when the VFS is updated
const handleVfsUpdate = () => {
  console.log("The virtual file system has been updated.")
}

// Create an instance of TscircuitCoder with your callbacks
const tscircuitCoder = createTscircuitCoder(handleStream, handleVfsUpdate)

// Submit a prompt to generate a circuit.
// The onStream callback logs streaming updates and onVfsChanged notifies when a new file is added to the VFS.
tscircuitCoder.submitPrompt("create a circuit that blinks an LED")
```
To run the benchmarks using evalite, use:
```
bun start
```
Each prompt is processed multiple times during evaluation.

After modifying prompts or system components, evalite reruns automatically; skip any benchmarks you don't want to run.
This project uses TOML files to define problem sets for circuit generation. Each problem is defined using a TOML array of tables with the following format:
```toml
[[problems]]
prompt = """
Your circuit prompt description goes here.
"""
title = "Sample Problem Title"
questions = [
  { text = "Question text", answer = true },
  { text = "Another question text", answer = false }
]
```
In each problem:

- The `prompt` field must contain the circuit description that instructs the AI.
- The `title` gives a short title for the problem.
- The `questions` array contains objects with a `text` property (the question) and an `answer` property (a boolean) used to validate the generated circuit.

To add a new problem set, create a new TOML file in the `problem-sets` directory following this format. Each new file can contain one or more problems defined with the `[[problems]]` header.
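As a sketch (the types and helper below are our own, not part of the package), a problem loaded from one of these TOML files can be validated against the expected shape before it is used:

```typescript
// Mirrors the [[problems]] table format described above
interface ProblemQuestion { text: string; answer: boolean }
interface Problem { prompt: string; title: string; questions: ProblemQuestion[] }

// Type guard: check that a value parsed from a problem-set file has the expected shape
function isProblem(value: unknown): value is Problem {
  if (typeof value !== "object" || value === null) return false
  const p = value as Record<string, unknown>
  return (
    typeof p.prompt === "string" &&
    typeof p.title === "string" &&
    Array.isArray(p.questions) &&
    p.questions.every(
      (q) =>
        typeof q === "object" && q !== null &&
        typeof (q as { text?: unknown }).text === "string" &&
        typeof (q as { answer?: unknown }).answer === "boolean",
    )
  )
}
```

A guard like this catches malformed problem sets (a missing `questions` array, a non-boolean `answer`) at load time rather than mid-benchmark.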
```
bun run build
bun run test
bun start
```
The benchmarks directory contains various files to help evaluate and score circuit-generating prompts:
• benchmarks/prompt-logs/
These are text files (e.g., prompt-2025-02-05T14-07-18-242Z.txt, prompt-2025-02-05T14-10-53-144Z.txt, etc.) that log each prompt attempt and its output. They serve as a history of interactions.
• benchmarks/benchmark-local-circuit-error-correction.eval.ts
Runs local circuit evaluation with an error correction workflow. It repeatedly calls the AI (up to a set maximum) until the circuit output meets expectations, logging each attempt.
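The error-correction loop can be sketched roughly as follows (a simplification with hypothetical names; the real eval file drives an actual AI call and circuit evaluation):

```typescript
// Retry a generation step until it succeeds or a maximum number of attempts is reached,
// feeding the previous error back in so the model can correct it.
async function runWithErrorCorrection(
  generate: (previousError?: string) => Promise<{ ok: boolean; error?: string }>,
  maxAttempts = 3,
): Promise<{ ok: boolean; attempts: number }> {
  let lastError: string | undefined
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await generate(lastError)
    if (result.ok) return { ok: true, attempts: attempt }
    // Remember the failure so the next attempt can include it in the prompt
    lastError = result.error
  }
  return { ok: false, attempts: maxAttempts }
}
```

Logging each attempt (as the eval file does) makes it easy to see how many correction rounds a given prompt typically needs.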
• benchmarks/benchmark-local-circuit.eval.ts
Evaluates a local circuit by running a specific user prompt and checking that the generated circuit compiles and meets expected behaviors.
• benchmarks/benchmark-local-circuit-random.eval.ts
Generates random prompts using an AI-powered prompt generator and evaluates their corresponding circuit outputs. This file is useful for stress-testing and assessing the robustness of circuit generation.
• benchmarks/scorers/ai-circuit-scorer.ts
Uses an AI model to assign a score (from 0 to 1) based on correctness, appropriate use of components, circuit complexity, and code quality.
• benchmarks/scorers/circuit-scorer.ts
A basic scorer that checks each generated circuit against predefined questions and answers from problem sets.
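Conceptually (the helper name here is illustrative, not the scorer's actual API), this style of scorer reduces to comparing the expected boolean answers from the problem set against what was observed for the generated circuit:

```typescript
// Fraction of problem-set questions whose expected boolean answer matches
// what was observed for the generated circuit (0 to 1)
function scoreAgainstQuestions(expected: boolean[], observed: boolean[]): number {
  if (expected.length === 0) return 0
  const correct = expected.filter((answer, i) => answer === observed[i]).length
  return correct / expected.length
}
```

A circuit that satisfies every question scores 1; one that satisfies half of them scores 0.5.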
MIT License
No vulnerabilities found.