Together AI
Cost-effective open-source model inference — Llama, DeepSeek, Qwen, Gemma and more
Together AI is a high-performance inference platform for open-source models. It offers fast, scalable serving for Llama, DeepSeek, Qwen, Gemma, Mistral and many others through an OpenAI-compatible API.
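Because the API is OpenAI-compatible, any HTTP client that speaks the standard chat-completions format works against Together's endpoint, even without an SDK. A minimal sketch with plain `fetch` (the endpoint path is Together's public chat-completions URL; the helper names and error handling are illustrative, not part of any SDK):

```typescript
// A standard OpenAI-style chat message.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Build an OpenAI-compatible chat-completions request body.
function buildChatRequest(model: string, messages: ChatMessage[]) {
  return { model, messages, stream: false };
}

// Send it to Together AI's OpenAI-compatible endpoint.
async function chat(apiKey: string, model: string, messages: ChatMessage[]) {
  const res = await fetch('https://api.together.xyz/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(buildChatRequest(model, messages)),
  });
  const json = await res.json();
  // OpenAI-compatible responses put the reply at choices[0].message.content.
  return json.choices[0].message.content as string;
}
```

The same payload shape works with the official `openai` package by pointing its `baseURL` at `https://api.together.xyz/v1`.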
Setup
1. Install packages
```bash
npm install @yourgpt/copilot-sdk @yourgpt/llm-sdk openai
```

Together AI uses an OpenAI-compatible API, so the openai package is the only peer dependency needed.
2. Get API key
Sign up and get your API key at api.together.xyz/settings/api-keys.
3. Add environment variable
```bash
TOGETHER_API_KEY=your-key-here
```

4. Create runtime API route
```ts
import { createRuntime } from '@yourgpt/llm-sdk';
import { createTogetherAI } from '@yourgpt/llm-sdk/togetherai';

const together = createTogetherAI({
  apiKey: process.env.TOGETHER_API_KEY,
});

const runtime = createRuntime({
  provider: together,
  model: 'meta-llama/Llama-3.3-70B-Instruct-Turbo',
  systemPrompt: 'You are a helpful assistant.',
});

export async function POST(request: Request) {
  return runtime.handleRequest(request);
}
```

5. Connect Copilot UI
```tsx
'use client';

import { CopilotProvider } from '@yourgpt/copilot-sdk/react';
import { CopilotChat } from '@yourgpt/copilot-sdk/ui';

export default function Page() {
  return (
    <CopilotProvider runtimeUrl="/api/chat">
      <CopilotChat />
    </CopilotProvider>
  );
}
```

Modern Pattern (Direct)
For simpler use cases without the runtime, use togetherai() directly with generateText or streamText:
```ts
import { generateText } from '@yourgpt/llm-sdk';
import { togetherai } from '@yourgpt/llm-sdk/togetherai';

const result = await generateText({
  model: togetherai('deepseek-ai/DeepSeek-V3'),
  prompt: 'Explain quantum entanglement simply.',
});

console.log(result.text);
```

Or stream the response:

```ts
import { streamText } from '@yourgpt/llm-sdk';
import { togetherai } from '@yourgpt/llm-sdk/togetherai';

const result = await streamText({
  model: togetherai('meta-llama/Llama-3.3-70B-Instruct-Turbo'),
  system: 'You are a helpful assistant.',
  messages,
});

return result.toTextStreamResponse();
```

Available Models
```ts
// DeepSeek
togetherai('deepseek-ai/DeepSeek-V3')    // 128K ctx, tools
togetherai('deepseek-ai/DeepSeek-V3.1')  // 128K ctx, tools
togetherai('deepseek-ai/DeepSeek-R1')    // reasoning model

// Llama
togetherai('meta-llama/Llama-3.3-70B-Instruct-Turbo') // 131K ctx, fast

// Qwen
togetherai('Qwen/Qwen3.5-397B-A17B')     // 262K ctx
togetherai('Qwen/Qwen3.5-9B')

// Gemma
togetherai('google/gemma-4-31B-it')

// Other
togetherai('openai/gpt-oss-120b')
togetherai('moonshotai/Kimi-K2.5')       // 262K ctx
togetherai('MiniMaxAI/MiniMax-M2.5')
```

Any model ID listed on together.ai/models works.
Configuration
```ts
import { createTogetherAI } from '@yourgpt/llm-sdk/togetherai';

// With explicit API key
const together = createTogetherAI({
  apiKey: 'your-key',
});

// Custom base URL (e.g. self-hosted or proxy)
const togetherViaProxy = createTogetherAI({
  apiKey: 'your-key',
  baseUrl: 'https://my-proxy.example.com/v1',
});
```

Or with the modern pattern:

```ts
import { togetherai } from '@yourgpt/llm-sdk/togetherai';

const model = togetherai('meta-llama/Llama-3.3-70B-Instruct-Turbo', {
  apiKey: 'your-key',
  baseURL: 'https://my-proxy.example.com/v1',
});
```

Fallback Chain
Automatically fail over to backup models when the primary is unavailable or rate-limited:
```ts
import { createRuntime } from '@yourgpt/llm-sdk';
import { createFallbackChain } from '@yourgpt/llm-sdk/fallback';
import { createTogetherAI } from '@yourgpt/llm-sdk/togetherai';

const together = createTogetherAI({
  apiKey: process.env.TOGETHER_API_KEY,
});

const chain = createFallbackChain({
  models: [
    together.languageModel('meta-llama/Llama-3.3-70B-Instruct-Turbo'),
    together.languageModel('deepseek-ai/DeepSeek-V3'),
    together.languageModel('Qwen/Qwen3.5-9B'),
    together.languageModel('google/gemma-4-31B-it'),
  ],
  strategy: 'priority',
  retries: 1,
  retryDelay: 500,
  retryBackoff: 'exponential',
  onFallback: ({ attemptedModel, nextModel, error }) => {
    console.warn(`[fallback] ${attemptedModel} → ${nextModel} | ${error.message}`);
  },
});

const runtime = createRuntime({
  adapter: chain,
  systemPrompt: 'You are a helpful assistant.',
});

export async function POST(request: Request) {
  return runtime.handleRequest(request);
}
```

With `strategy: 'priority'`, the first model handles all traffic until it fails. Use `strategy: 'round-robin'` to distribute load evenly across the listed models.
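Under `retryBackoff: 'exponential'`, the delay before each retry doubles, starting from `retryDelay`. The spacing can be sketched as a small pure function (this helper is illustrative only, not part of the SDK):

```typescript
// Illustrative only: compute the wait (in ms) before each retry attempt.
// With 'exponential' backoff the base delay doubles per attempt;
// with 'fixed' it stays constant.
function retryDelays(
  retries: number,
  baseMs: number,
  backoff: 'fixed' | 'exponential',
): number[] {
  return Array.from({ length: retries }, (_, attempt) =>
    backoff === 'exponential' ? baseMs * 2 ** attempt : baseMs,
  );
}

// retryDelays(3, 500, 'exponential') → [500, 1000, 2000]
// retryDelays(3, 500, 'fixed')       → [500, 500, 500]
```

So with `retries: 1` and `retryDelay: 500` as configured above, each model gets one retry after a 500 ms pause before the chain falls through to the next model.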
Tool Calling
Many Together AI models support tool calling:
```ts
import { generateText, tool } from '@yourgpt/llm-sdk';
import { togetherai } from '@yourgpt/llm-sdk/togetherai';
import { z } from 'zod';

const result = await generateText({
  model: togetherai('meta-llama/Llama-3.3-70B-Instruct-Turbo'),
  prompt: 'What is the weather in Miami?',
  tools: {
    getWeather: tool({
      description: 'Get weather for a city',
      parameters: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ temperature: 82, condition: 'sunny' }),
    }),
  },
  maxSteps: 5,
});
```

`deepseek-ai/DeepSeek-R1` is a reasoning model and does not support tool calling. Use DeepSeek-V3 or a Llama model for tool use.
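If an app needs both reasoning and tool use, one option is to pick the model per request based on whether tools are attached. This routing helper is hypothetical (not an SDK feature); the model IDs come from the list above:

```typescript
// Hypothetical per-request router: tool-using requests go to a
// tool-capable model, pure reasoning requests go to DeepSeek-R1.
const TOOL_MODEL = 'deepseek-ai/DeepSeek-V3';      // supports tool calling
const REASONING_MODEL = 'deepseek-ai/DeepSeek-R1'; // no tool calling

function pickModel(needsTools: boolean): string {
  return needsTools ? TOOL_MODEL : REASONING_MODEL;
}

// Usage sketch: model: togetherai(pickModel(Boolean(tools)))
```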
Next Steps
- Fireworks - Another fast open-source model platform
- OpenRouter - Access 500+ models with one API key
- Fallback Chain - Automatic failover between providers
- generateText() - Full LLM SDK reference