Basic Chat Completion
from openai import OpenAI

# Prime Inference exposes an OpenAI-compatible API; configure the SDK with
# the base URL and API key from your account setup (see the Quickstart).
client = OpenAI(
    base_url="https://api.pinference.ai/api/v1",  # Prime Inference endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
    max_tokens=500,
    temperature=0.7,
)
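The generated text is on the first choice of the response, following the standard OpenAI response shape:

# Read the assistant's reply from the first choice
print(response.choices[0].message.content)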
Streaming Responses
For real-time applications, use streaming to receive responses as they’re generated:
stream = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[
        {"role": "user", "content": "Write a short story about a robot."}
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
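If you also need the complete text once streaming finishes, collect the deltas as they arrive; a minimal variant of the loop above:

# Same streaming call, but also accumulate the full response text
stream = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Write a short story about a robot."}],
    stream=True,
)

parts = []
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="")
        parts.append(delta)

full_text = "".join(parts)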
Include "usage": {"include": true} to get token counts and cost:
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"usage": {"include": True}},
)
Response includes:

{
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 25,
    "total_tokens": 35,
    "input_tokens": 10,
    "output_tokens": 25,
    "cost": 0.000123
  }
}
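In the Python SDK the standard token counts are typed attributes on response.usage. The cost field is a Prime-specific extension of the OpenAI schema, so the sketch below falls back to the raw payload in case the typed object does not expose it as an attribute (an assumption about how the SDK handles extra fields):

# Standard OpenAI usage fields are typed attributes
usage = response.usage
print(usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)

# "cost" is not in the standard OpenAI schema; fall back to the raw payload
cost = getattr(usage, "cost", None)
if cost is None:
    cost = response.model_dump().get("usage", {}).get("cost")
print("cost:", cost)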
Advanced Parameters
Prime Inference supports all standard OpenAI API parameters:
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Your prompt here"}],
    # Generation parameters
    max_tokens=1000,
    temperature=0.8,
    top_p=0.9,
    frequency_penalty=0.1,
    presence_penalty=0.1,
    # Advanced options
    stream=False,
    stop=["END", "\n\n"],
    logprobs=True,
    top_logprobs=3,
)
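With logprobs enabled, each choice carries per-token log probabilities. Assuming Prime Inference mirrors the OpenAI response schema here (a reasonable but unverified assumption), they can be read like this:

# Walk per-token logprobs and the top-3 alternatives requested above
for entry in response.choices[0].logprobs.content:
    alternatives = {alt.token: round(alt.logprob, 3) for alt in entry.top_logprobs}
    print(f"{entry.token!r} -> {alternatives}")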
Next Steps
Team Accounts: Using inference with team accounts
API Reference: Complete API documentation