
OpenAI Assistants API

Threads, File Search (RAG without your own infrastructure), Code Interpreter (Python sandbox), Function Calling and SSE streaming in TypeScript.

File Search: built-in RAG
Code Interpreter: Python sandbox
Functions: custom tools
Threads: stateful chat

6 Assistants API building blocks compared

File Search, Code Interpreter, Function Calling, Streaming, Vector Store and Thread: type, cost and when to use each.

Component	Type	Cost	When to use
File Search	Built-in RAG	$0.10/GB/day of Vector Store storage	Documents, FAQ, knowledge base
Code Interpreter	Python sandbox	$0.03/session	Data analysis, charts, file conversion
Function Calling	Custom tools	Tokens only	Your own APIs, databases, external actions
Streaming SSE	Real-time output	No extra cost	UX: immediate response
Vector Store	Embedding storage	$0.10/GB/day	Persistent knowledge base for the Assistant
Thread	Persistent history	None (tokens are billed per Run)	Multi-turn conversations, user sessions
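The storage and session rates in the table can be turned into a quick budget check. This is a sketch: the constants are copied from the table, `freeGb` is a hypothetical parameter for a free storage allowance (OpenAI's pricing page has listed the first GB of vector storage as free; verify the current terms), and model tokens, the main cost of every Run, are deliberately not modelled.

```typescript
// Rates from the comparison table, in USD. Check OpenAI's pricing page before relying on them.
const VECTOR_STORE_USD_PER_GB_DAY = 0.1;
const CODE_INTERPRETER_USD_PER_SESSION = 0.03;

interface ToolUsage {
  vectorStoreGb: number; // average Vector Store size over the period
  days: number; // length of the period in days
  codeInterpreterSessions: number;
  freeGb?: number; // hypothetical free storage allowance, defaults to 0
}

// Tool-specific cost only; tokens consumed by each Run are billed separately and not included.
function estimateToolCostUsd(u: ToolUsage): number {
  const billableGb = Math.max(0, u.vectorStoreGb - (u.freeGb ?? 0));
  const storage = billableGb * VECTOR_STORE_USD_PER_GB_DAY * u.days;
  const sessions = u.codeInterpreterSessions * CODE_INTERPRETER_USD_PER_SESSION;
  // Round to whole cents to hide floating-point noise.
  return Math.round((storage + sessions) * 100) / 100;
}

// 2 GB stored for 30 days plus 100 Code Interpreter sessions.
console.log(estimateToolCostUsd({ vectorStoreGb: 2, days: 30, codeInterpreterSessions: 100 })); // → 9
```

Keeping the rates as named constants makes it easy to update them when pricing changes, without touching the arithmetic.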

Frequently asked questions

What is the OpenAI Assistants API and how does it work?

The Assistants API is for building AI assistants with persistent state, without managing conversation history by hand. Key concepts: Assistant, a configured agent (model, instructions, tools); Thread, a conversation with persistent history; Message, a single message in a thread (user or assistant); Run, one execution of an Assistant on a Thread; Run Step, the individual steps taken within a Run.

Creating an Assistant: const assistant = await openai.beta.assistants.create({name: 'Code Helper', instructions: 'You are an expert TypeScript developer...', model: 'gpt-4o', tools: [{type: 'code_interpreter'}, {type: 'file_search'}]}). Creating a Thread: const thread = await openai.beta.threads.create(). Adding a message: await openai.beta.threads.messages.create(thread.id, {role: 'user', content: 'Fix this TypeScript error...'}). Starting a Run: const run = await openai.beta.threads.runs.create(thread.id, {assistant_id: assistant.id}). Polling for the result: await openai.beta.threads.runs.poll(thread.id, run.id). Reading the reply: const messages = await openai.beta.threads.messages.list(thread.id) (newest message first by default).

Assistants v2 (2024) supports Vector Stores, File Search (built-in RAG), SSE streaming and parallel tool calls. Because Threads persist, a returning user finds their history intact.
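The Thread, Message, Run, poll, read sequence can be sketched as one function. The `AssistantsClient` interface below is a hypothetical, simplified subset of the openai-node v4 client, declared structurally so the example compiles and can be tested without the SDK or network access. A real `OpenAI` instance should fit it structurally, but that is an assumption, not something verified against every SDK version.

```typescript
// Hypothetical minimal client shape: only the openai-node v4 calls used in the flow.
interface ContentBlock { type: string; text?: { value: string } }
interface ThreadMessage { role: "user" | "assistant"; content: ContentBlock[] }
interface Run { id: string; status: string }

interface AssistantsClient {
  beta: {
    threads: {
      create(): Promise<{ id: string }>;
      messages: {
        create(threadId: string, body: { role: "user"; content: string }): Promise<unknown>;
        list(threadId: string): Promise<{ data: ThreadMessage[] }>;
      };
      runs: {
        create(threadId: string, body: { assistant_id: string }): Promise<Run>;
        poll(threadId: string, runId: string): Promise<Run>;
      };
    };
  };
}

// messages.list returns the newest message first by default,
// so the first assistant message is the latest reply.
function latestAssistantText(messages: ThreadMessage[]): string {
  const reply = messages.find((m) => m.role === "assistant");
  if (!reply) return "";
  return reply.content
    .filter((block) => block.type === "text" && block.text !== undefined)
    .map((block) => block.text?.value ?? "")
    .join("\n");
}

// Creates a thread, posts the question, runs the assistant and returns its text reply.
async function ask(client: AssistantsClient, assistantId: string, question: string): Promise<string> {
  const thread = await client.beta.threads.create();
  await client.beta.threads.messages.create(thread.id, { role: "user", content: question });
  const run = await client.beta.threads.runs.create(thread.id, { assistant_id: assistantId });
  const finished = await client.beta.threads.runs.poll(thread.id, run.id);
  if (finished.status !== "completed") {
    // 'requires_action' (function calling), 'failed', 'expired' etc. need their own handling.
    throw new Error(`Run ${run.id} ended with status ${finished.status}`);
  }
  const page = await client.beta.threads.messages.list(thread.id);
  return latestAssistantText(page.data);
}
```

Taking the client as a parameter keeps the flow testable with a fake object and keeps one Thread per call; a chat app would instead reuse a stored thread ID.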

File Search and Vector Stores: RAG without your own infrastructure?

File Search is RAG built into the Assistants API: you need no vector database of your own, documents are chunked and embedded automatically, and the model runs semantic search over the files. Creating a Vector Store: const vectorStore = await openai.beta.vectorStores.create({name: 'Product Docs', expires_after: {anchor: 'last_active_at', days: 7}}). Uploading a file: const file = await openai.files.create({file: fs.createReadStream('docs.pdf'), purpose: 'assistants'}). Adding it to the Vector Store and waiting for processing: await openai.beta.vectorStores.files.createAndPoll(vectorStore.id, {file_id: file.id}). For many files, create a batch with const batch = await openai.beta.vectorStores.fileBatches.create(vectorStore.id, {file_ids: fileIds}) and wait with await openai.beta.vectorStores.fileBatches.poll(vectorStore.id, batch.id). Attaching the store to the Assistant: await openai.beta.assistants.update(assistant.id, {tool_resources: {file_search: {vector_store_ids: [vectorStore.id]}}}).

Default chunking strategy: max_chunk_size_tokens: 800, chunk_overlap_tokens: 400. Supported formats include PDF, docx, pptx, txt, md, json, html and source code; csv and xlsx are not on the File Search list, so spreadsheets go to Code Interpreter. Max 512 MB per file. At query time the model decides on its own when to search. Answers cite the files: the annotations in message.content point to the source documents. At most one Vector Store can be attached to an Assistant and one to a Thread; a thread-level Vector Store suits temporary, per-conversation files. Limits: 10,000 files per Vector Store, 100 files per upload batch. An expiration policy (expires_after) keeps storage costs under control.
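The chunking parameters can be validated locally before any upload. The sketch below builds a static chunking strategy object and estimates how many chunks a document yields. The bounds it enforces (100 to 4096 tokens per chunk, overlap at most half the chunk size) are taken from OpenAI's File Search documentation at the time of writing and should be treated as assumptions; the chunk count models the chunker as a simple sliding window, which the real service may not match exactly.

```typescript
// Static chunking strategy as sent in the `chunking_strategy` field.
interface StaticChunkingStrategy {
  type: "static";
  static: { max_chunk_size_tokens: number; chunk_overlap_tokens: number };
}

// Builds a static strategy (defaults 800 / 400) and rejects values outside the assumed bounds.
function staticChunking(maxChunkTokens = 800, overlapTokens = 400): StaticChunkingStrategy {
  if (!Number.isInteger(maxChunkTokens) || maxChunkTokens < 100 || maxChunkTokens > 4096) {
    throw new RangeError(`max_chunk_size_tokens must be an integer in 100..4096, got ${maxChunkTokens}`);
  }
  if (!Number.isInteger(overlapTokens) || overlapTokens < 0 || overlapTokens > maxChunkTokens / 2) {
    throw new RangeError(`chunk_overlap_tokens must be between 0 and ${maxChunkTokens / 2}, got ${overlapTokens}`);
  }
  return {
    type: "static",
    static: { max_chunk_size_tokens: maxChunkTokens, chunk_overlap_tokens: overlapTokens },
  };
}

// Approximate chunk count: each new chunk starts (size - overlap) tokens after the previous one.
function estimateChunkCount(documentTokens: number, s: StaticChunkingStrategy): number {
  const { max_chunk_size_tokens: size, chunk_overlap_tokens: overlap } = s.static;
  if (documentTokens <= 0) return 0;
  if (documentTokens <= size) return 1;
  return 1 + Math.ceil((documentTokens - size) / (size - overlap));
}

const strategy = staticChunking();
console.log(estimateChunkCount(2000, strategy)); // → 4
```

The resulting object would be passed as `chunking_strategy`, for example `openai.beta.vectorStores.files.create(vectorStore.id, {file_id: file.id, chunking_strategy: staticChunking(600, 200)})`, in SDK versions that accept that parameter. A larger overlap means more chunks, and therefore more stored embeddings, for the same document.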

Code Interpreter: Python in a sandbox for data analysis?

Code Interpreter runs Python in a sandbox for data analysis, charts and calculations; the model writes the code itself and fixes it when it fails. Libraries such as matplotlib, numpy, pandas and scipy are preinstalled, and it can create files (CSV, PDF, XLSX). Uploading a file for analysis: const file = await openai.files.create({file: fs.createReadStream('data.csv'), purpose: 'assistants'}). Attaching it to a message: await openai.beta.threads.messages.create(thread.id, {role: 'user', content: 'Analyze this sales data and create a chart', attachments: [{file_id: file.id, tools: [{type: 'code_interpreter'}]}]}). A reply with a chart: message.content contains image_file and/or text blocks, and image_file.file_id is the ID of the generated image. Downloading the image: const imageData = await openai.files.content(file_id).

Use cases: analyzing CSV/Excel files uploaded by users, generating PDF reports, statistical calculations, file format conversion, and debugging, since the model corrects its own code after errors. Limits: 120 s timeout per code execution, max 50 files per Thread, 512 MB per file. Cost: $0.03 per session plus tokens; a session is tied to a thread and stays active for one hour by default, so two threads using Code Interpreter at the same time create two sessions. Streaming output: run step events carry tool_calls with the generated code. Code Interpreter and File Search can be enabled together to combine RAG with analysis.
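Handling a Code Interpreter reply mostly means separating text from generated images. The sketch below builds the attachment message body used above and splits a reply's `message.content` into text and image file IDs. `ReplyBlock` is a simplified, hypothetical view of the SDK's content blocks; other block types are ignored.

```typescript
// Simplified content block from message.content; unknown block types are skipped.
interface ReplyBlock {
  type: string;
  text?: { value: string };
  image_file?: { file_id: string };
}

// Message body that attaches an uploaded file to Code Interpreter, mirroring the call above.
function analysisMessage(fileId: string, prompt: string) {
  return {
    role: "user" as const,
    content: prompt,
    attachments: [{ file_id: fileId, tools: [{ type: "code_interpreter" as const }] }],
  };
}

// Splits an assistant reply into its text and the file IDs of generated images.
function splitReply(content: ReplyBlock[]): { text: string; imageFileIds: string[] } {
  const texts: string[] = [];
  const imageFileIds: string[] = [];
  for (const block of content) {
    if (block.type === "text" && block.text) {
      texts.push(block.text.value);
    } else if (block.type === "image_file" && block.image_file) {
      imageFileIds.push(block.image_file.file_id);
    }
  }
  return { text: texts.join("\n"), imageFileIds };
}
```

Each collected ID can then be downloaded with `openai.files.content(id)`; in openai-node v4 this resolves to a fetch `Response`, so `Buffer.from(await response.arrayBuffer())` gives the bytes to write to disk.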

Function Calling in the Assistants API: your own tools?

Function Calling lets the Assistant call your own functions. It works like tool calling in Chat Completions, but with persistent state. Define the function when creating the Assistant: tools: [{type: 'function', function: {name: 'get_weather', description: 'Get current weather for a city', parameters: {type: 'object', properties: {city: {type: 'string'}, unit: {type: 'string', enum: ['celsius', 'fahrenheit']}}, required: ['city']}}}].

Run flow with function calling: the Run status becomes 'requires_action', run.required_action.type === 'submit_tool_outputs', and the requested calls are in run.required_action.submit_tool_outputs.tool_calls. Executing a call: const toolCall = run.required_action.submit_tool_outputs.tool_calls[0]; const args = JSON.parse(toolCall.function.arguments); const result = await getWeather(args.city, args.unit). Submitting the result: await openai.beta.threads.runs.submitToolOutputs(thread.id, run.id, {tool_outputs: [{tool_call_id: toolCall.id, output: JSON.stringify(result)}]}).

Parallel function calls: a Run can request several tool_calls at once; execute them concurrently with Promise.all and submit all outputs together in a single call. Polling helper: openai.beta.threads.runs.poll() returns once the Run reaches 'requires_action' or a terminal status such as 'completed'. Streaming with function calling: openai.beta.threads.runs.stream() emits a 'thread.run.requires_action' event, answered with submitToolOutputsStream(). Security: validate the arguments and never allow destructive operations without confirmation. Rate limits are enforced per organization/project, so all Assistants and Threads share the same budget.
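The parallel case can be sketched as a small dispatcher: it runs every requested call concurrently with `Promise.all`, validates arguments, and returns one output per `tool_call_id`, ready for a single `submitToolOutputs` call. The `handlers` registry and its `get_weather` implementation are hypothetical and return stub data instead of calling a weather service; errors are reported back to the model as JSON rather than aborting the Run, which is one design choice among several.

```typescript
// Simplified shape of the entries in run.required_action.submit_tool_outputs.tool_calls.
interface FunctionToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}
interface ToolOutput { tool_call_id: string; output: string }
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

// Hypothetical local tool implementations; get_weather returns stub data.
const handlers: Record<string, ToolHandler> = {
  async get_weather(args) {
    // Validate model-supplied arguments before using them.
    const city = args.city;
    if (typeof city !== "string" || city.trim() === "") throw new Error("city is required");
    const unit = args.unit === "fahrenheit" ? "fahrenheit" : "celsius";
    return { city, unit, conditions: "stub" };
  },
};

// Executes all calls concurrently; output order matches the input order.
async function runToolCalls(calls: FunctionToolCall[], registry: Record<string, ToolHandler>): Promise<ToolOutput[]> {
  return Promise.all(
    calls.map(async (call): Promise<ToolOutput> => {
      const handler = registry[call.function.name];
      if (!handler) {
        return { tool_call_id: call.id, output: JSON.stringify({ error: `unknown tool: ${call.function.name}` }) };
      }
      try {
        const args = JSON.parse(call.function.arguments) as Record<string, unknown>;
        return { tool_call_id: call.id, output: JSON.stringify(await handler(args)) };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        return { tool_call_id: call.id, output: JSON.stringify({ error: message }) };
      }
    }),
  );
}
```

With the SDK, the result would be submitted as `await openai.beta.threads.runs.submitToolOutputs(thread.id, run.id, {tool_outputs: await runToolCalls(run.required_action.submit_tool_outputs.tool_calls, handlers)})`.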

Streaming and production patterns for the Assistants API?

SSE streaming: const stream = openai.beta.threads.runs.stream(thread.id, {assistant_id: assistant.id}); for await (const event of stream) { if (event.event === 'thread.message.delta') { const block = event.data.delta.content?.[0]; if (block?.type === 'text') process.stdout.write(block.text?.value ?? ''); } }. The optional chaining matters: a delta may carry no content, or a non-text block. Events include thread.created, thread.run.created, thread.run.queued, thread.run.in_progress, thread.message.created, thread.message.delta, thread.run.completed and thread.run.failed. The AssistantStream helper (v4 SDK): const run = openai.beta.threads.runs.stream(threadId, {...}); run.on('textDelta', (delta) => {...}); run.on('toolCallDelta', (delta) => {...}); await run.finalMessages(). Next.js Server-Sent Events: return a Response wrapping a ReadableStream, encode each chunk with TextEncoder, and consume it with useChat or your own hook.

Thread management: one thread per user session; store thread_id in your database and delete old threads. Costs: threads and messages themselves are free; you pay for the tokens in each Run (input + output), plus $0.03/session for Code Interpreter and $0.10/GB per day of Vector Store storage for File Search. Assistant management: list assistants, update instructions, and version by deleting and re-creating. Limits: API rate limits (enforced per organization/project); max_prompt_tokens and max_completion_tokens per Run; truncation_strategy for long threads. Monitoring: read run.usage.total_tokens and log usage per run for billing. Timeout handling: a Run moves to status 'expired' after 10 minutes, so check for it and retry.
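For the Next.js case, the event stream can be converted into SSE frames before it is written to the response. This sketch assumes the events look like `{event, data}` objects as yielded by `openai.beta.threads.runs.stream()`; `StreamEvent` and `MessageDelta` are simplified, hypothetical types. Text is JSON-encoded in each frame so that newlines inside a delta cannot break the `data: ...` framing, and the `done`/`error` event names are illustrative choices for the client, not part of the OpenAI API.

```typescript
// Simplified view of the events yielded by runs.stream().
interface StreamEvent { event: string; data: unknown }
interface DeltaBlock { index: number; type: string; text?: { value?: string } }
interface MessageDelta { delta: { content?: DeltaBlock[] } }

// Extracts text from a 'thread.message.delta' event; content may be missing,
// blocks may be non-text, and text.value is optional.
function textFromEvent(e: StreamEvent): string {
  if (e.event !== "thread.message.delta") return "";
  const blocks = (e.data as MessageDelta).delta.content ?? [];
  return blocks.map((b) => (b.type === "text" ? b.text?.value ?? "" : "")).join("");
}

// Converts the event stream into SSE frames for the browser.
async function* sseFrames(events: AsyncIterable<StreamEvent>): AsyncGenerator<string> {
  for await (const e of events) {
    const text = textFromEvent(e);
    if (text) yield `data: ${JSON.stringify(text)}\n\n`;
    if (e.event === "thread.run.failed") yield `event: error\ndata: ${JSON.stringify("run failed")}\n\n`;
  }
  yield "event: done\ndata: {}\n\n";
}
```

In a route handler, each frame would be encoded with `TextEncoder` and enqueued into a `ReadableStream` returned as a `Response` with `Content-Type: text/event-stream`; the browser then `JSON.parse`s each `data` line to get the text back.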
