Space status: Runtime error

MiniMax Agent committed c126015 (parent: 9604400)

Add OpenAI API compatible endpoints for OpenELM models

Files changed:
- README.md +218 -33
- app.py +475 -11
- examples/curl_examples.sh +139 -27
- examples/openai_sdk_example.py +148 -0
README.md CHANGED

@@ -1,5 +1,5 @@
 ---
-title: OpenELM
+title: OpenELM OpenAI API
 emoji: 🤖
 colorFrom: blue
 colorTo: purple
@@ -7,21 +7,23 @@ sdk: docker
 pinned: false
 ---

-# OpenELM Anthropic API Compatible Wrapper
+# OpenELM OpenAI & Anthropic API Compatible Wrapper

-A FastAPI-based service that provides
+A FastAPI-based service that provides both OpenAI and Anthropic-compatible APIs for Apple's OpenELM models, allowing you to use the OpenAI SDK or Anthropic SDK with OpenELM for text generation tasks.

 ## Overview

-This project creates a REST API that mimics the Anthropic Messages API
+This project creates a REST API that mimics both the OpenAI Chat Completions API and Anthropic Messages API formats, enabling developers to use OpenELM models with existing SDK code with minimal modifications. The API supports both streaming and non-streaming responses, multi-turn conversations, system prompts, and various generation parameters. This dual compatibility means you can use the same underlying OpenELM model whether your codebase is built for OpenAI or Anthropic APIs.

-The OpenELM (Open Efficient Language Model) family from Apple uses a layer-wise scaling strategy to efficiently allocate parameters within each transformer layer, resulting in enhanced accuracy while maintaining computational efficiency. This wrapper makes these powerful models accessible through
+The OpenELM (Open Efficient Language Model) family from Apple uses a layer-wise scaling strategy to efficiently allocate parameters within each transformer layer, resulting in enhanced accuracy while maintaining computational efficiency. This wrapper makes these powerful models accessible through familiar API interfaces, bridging the gap between Apple's innovative architecture and the widely-adopted API standards used in the industry.

 ## Features

-The API provides comprehensive support for Anthropic-style
+The API provides comprehensive support for both OpenAI and Anthropic-style generation with several key capabilities. First, it offers full dual API compatibility, including endpoints that match both the OpenAI Chat Completions API structure and the Anthropic Messages API, making it easy to integrate with existing codebases regardless of which provider you currently use. Second, it supports streaming responses through Server-Sent Events (SSE), enabling real-time output display as tokens are generated in both API formats.

-Additionally, the wrapper properly handles system prompts by prepending them to the conversation context, which is essential for defining assistant behavior. The API also provides flexible generation parameters, allowing control over temperature, top-p sampling, maximum tokens, and other generation settings
+Third, the API handles multi-turn conversations by maintaining conversation history and formatting prompts appropriately for OpenELM models, regardless of which API format you choose. Additionally, the wrapper properly handles system prompts by prepending them to the conversation context, which is essential for defining assistant behavior. The API also provides flexible generation parameters, allowing control over temperature, top-p sampling, maximum tokens, and other generation settings that work across both API styles.
+
+Finally, comprehensive token usage statistics are included in responses, matching both the OpenAI and Anthropic response formats exactly, ensuring compatibility with tools and dashboards that expect standard usage reporting.

 ## Quick Start

@@ -29,8 +31,8 @@ Additionally, the wrapper properly handles system prompts by prepending them to

 ```bash
 # Build and run with Docker
-docker build -t openelm-
-docker run -p 8000:8000 openelm-
+docker build -t openelm-api .
+docker run -p 8000:8000 openelm-api
 ```

 ### Local Development
@@ -43,7 +45,20 @@ pip install -r requirements.txt
 python -m uvicorn app:app --host 0.0.0.0 --port 8000
 ```

-### Test the API
+### Test the API (OpenAI Format)
+
+```bash
+# Basic chat completion
+curl -X POST http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "openelm-450m-instruct",
+    "messages": [{"role": "user", "content": "Say hello!"}],
+    "max_tokens": 100
+  }'
+```
+
+### Test the API (Anthropic Format)

 ```bash
 # Basic message generation
@@ -58,17 +73,69 @@ curl -X POST http://localhost:8000/v1/messages \

 ## API Reference

-### Endpoints
+### OpenAI API Endpoints
+
+The OpenAI-compatible endpoints follow the standard Chat Completions API format used by OpenAI's GPT models. These endpoints accept message arrays with roles and content, and return completion responses in the standard OpenAI format.
+
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| GET | /v1/models | List available models (OpenAI format) |
+| POST | /v1/chat/completions | Create chat completion (non-streaming) |
+| POST | /v1/chat/completions (with stream=true) | Create chat completion (streaming) |
+
+#### OpenAI Request Format
+
+```json
+{
+  "model": "openelm-450m-instruct",
+  "messages": [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "Your prompt here"}
+  ],
+  "temperature": 0.7,
+  "top_p": 0.9,
+  "max_tokens": 1024,
+  "stream": false
+}
+```
+
+#### OpenAI Response Format
+
+```json
+{
+  "id": "chatcmpl-abc123",
+  "object": "chat.completion",
+  "created": 1677858242,
+  "model": "openelm-450m-instruct",
+  "choices": [
+    {
+      "index": 0,
+      "message": {
+        "role": "assistant",
+        "content": "Generated response"
+      },
+      "finish_reason": "stop"
+    }
+  ],
+  "usage": {
+    "prompt_tokens": 13,
+    "completion_tokens": 25,
+    "total_tokens": 38
+  }
+}
+```
+
+### Anthropic API Endpoints
+
+The Anthropic-compatible endpoints follow the Messages API format used by Claude. These endpoints accept message arrays with roles and content, and support both streaming and non-streaming responses.

 | Method | Endpoint | Description |
 |--------|----------|-------------|
-| GET | / |
-| GET | /health | Health check |
-| GET | /v1/models | List available models |
+| GET | /v1/models | List available models (Anthropic format) |
 | POST | /v1/messages | Create message (non-streaming) |
 | POST | /v1/messages/stream | Create message (streaming) |

-### Request Format
+#### Anthropic Request Format

 ```json
 {
@@ -79,19 +146,20 @@ curl -X POST http://localhost:8000/v1/messages \
   "system": "Optional system prompt",
   "max_tokens": 1024,
   "temperature": 0.7,
-  "top_p": 0.9,
   "stream": false
 }
 ```

-### Response Format
+#### Anthropic Response Format

 ```json
 {
   "id": "msg_abc123",
   "type": "message",
   "role": "assistant",
-  "content": [
+  "content": [
+    {"type": "text", "text": "Generated response"}
+  ],
   "model": "openelm-450m-instruct",
   "stop_reason": "end_turn",
   "usage": {
@@ -101,25 +169,90 @@ curl -X POST http://localhost:8000/v1/messages \
 }
 ```

+## Using with OpenAI SDK
+
+```python
+from openai import OpenAI
+
+# Point to your local API
+client = OpenAI(
+    base_url="http://localhost:8000/v1",
+    api_key="dummy" # Any string works
+)
+
+# Use the same API you use with GPT!
+response = client.chat.completions.create(
+    model="openelm-450m-instruct",
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Hello!"}
+    ],
+    max_tokens=100
+)
+
+print(response.choices[0].message.content)
+```
+
+### Streaming with OpenAI SDK
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    base_url="http://localhost:8000/v1",
+    api_key="dummy"
+)
+
+stream = client.chat.completions.create(
+    model="openelm-450m-instruct",
+    messages=[{"role": "user", "content": "Tell me a story."}],
+    max_tokens=100,
+    stream=True
+)
+
+for chunk in stream:
+    if chunk.choices[0].delta.content:
+        print(chunk.choices[0].delta.content, end="", flush=True)
+```
+
 ## Using with Anthropic SDK

 ```python
-
+import anthropic

 # Point to your local API
-client = Anthropic(
+client = anthropic.Anthropic(
     base_url="http://localhost:8000/v1",
     api_key="dummy" # Any string works
 )

 # Use the same API you use with Claude!
-
+message = client.messages.create(
     model="openelm-450m-instruct",
     messages=[{"role": "user", "content": "Hello!"}],
     max_tokens=100
 )

-print(
+print(message.content[0].text)
+```
+
+### Streaming with Anthropic SDK
+
+```python
+import anthropic
+
+client = anthropic.Anthropic(
+    base_url="http://localhost:8000/v1",
+    api_key="dummy"
+)
+
+with client.messages.stream(
+    model="openelm-450m-instruct",
+    messages=[{"role": "user", "content": "Tell me a story."}],
+    max_tokens=100
+) as stream:
+    for text in stream.text_stream:
+        print(text, end="", flush=True)
 ```

 ## Model Information
@@ -129,14 +262,16 @@ print(response.content[0].text)
 - **Context Window**: 2048 tokens
 - **Weight Format**: Safetensors (secure and efficient)
 - **Quantization**: FP16 for optimal performance
+- **Layer-wise Scaling**: Efficient parameter allocation within transformer layers

 ## Architecture

-- **Framework**: FastAPI with async support
-- **ML Backend**: PyTorch + HuggingFace Transformers
-- **Model Loading**: Lazy loading on startup with caching
-- **Streaming**: Server-Sent Events (SSE)
-- **
+- **Framework**: FastAPI with async support for high concurrency
+- **ML Backend**: PyTorch + HuggingFace Transformers for model inference
+- **Model Loading**: Lazy loading on startup with caching for fast restarts
+- **Streaming**: Server-Sent Events (SSE) for real-time token delivery
+- **Dual Compatibility**: Full OpenAI and Anthropic API format support
+- **Prompt Engineering**: Custom formatting for OpenELM's text completion interface

 ## Configuration

@@ -147,20 +282,68 @@ Environment variables can be used to customize the deployment:
 | PORT | 8000 | API server port |
 | HF_HOME | ~/.cache/huggingface | Model cache directory |
 | TRANSFORMERS_CACHE | ~/.cache/transformers | Transformers cache |
+| CUDA_VISIBLE_DEVICES | all | GPU device selection |

 ## Examples

 See the `examples/` directory for complete usage examples:

-- `
-- `
+- `openai_sdk_example.py` - OpenAI SDK usage with streaming support
+- `anthropic_sdk_example.py` - Anthropic SDK usage with streaming support
+- `curl_examples.sh` - Command-line examples for both APIs
+
+## Streaming Response Format
+
+### OpenAI Streaming (SSE)
+
+```
+data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1677858242,"model":"openelm-450m-instruct","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
+
+data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1677858242,"model":"openelm-450m-instruct","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
+
+data: [DONE]
+```
+
+### Anthropic Streaming (SSE)
+
+```
+event: message_start
+data: {"id":"msg_abc123","type":"message","role":"assistant","content":[],"model":"openelm-450m-instruct"}
+
+event: content_block_start
+data: {"type":"text","text":""}
+
+event: content_block_delta
+data: {"type":"text_delta","text":"Hello"}
+
+event: content_block_stop
+data: {}
+
+event: message_delta
+data: {"delta":{"stop_reason":"end_turn"},"usage":{"input_tokens":10,"output_tokens":5}}
+
+event: message_stop
+data: {}
+```

 ## Troubleshooting

-- **Model not loading**: Check internet connection for HuggingFace download
-- **Out of memory**: Reduce max_tokens or
-- **Slow responses**: First request downloads model (subsequent requests are faster)
-- **Port conflicts**: Change PORT environment variable
+- **Model not loading**: Check internet connection for HuggingFace download, ensure sufficient disk space for model cache
+- **Out of memory**: Reduce max_tokens, use smaller context windows, or switch to CPU inference by removing GPU-specific settings
+- **Slow responses**: First request downloads model from HuggingFace (subsequent requests use cached model and are much faster)
+- **Port conflicts**: Change PORT environment variable to use a different port
+- **Streaming not working**: Ensure you're using the correct endpoint (with stream=true for OpenAI) and proper SSE parsing
+- **Format errors**: Verify your request matches the expected format for the API you're using (OpenAI vs Anthropic have different schemas)
+
+## Migration Guide
+
+### Migrating from OpenAI to OpenELM
+
+If you're currently using OpenAI's API and want to switch to OpenELM, the migration is straightforward. Simply change the base_url to point to your local OpenELM API server and update the model name. All other parameters and response handling remain the same, making it easy to toggle between providers for testing or A/B comparisons.
+
+### Migrating from Anthropic to OpenELM
+
+Similarly, if you're using Anthropic's API, you can migrate by updating the base_url and model name. The message format is similar, though you may need to adjust how you handle system prompts since OpenAI uses inline system messages while Anthropic uses a separate system parameter.

 ## License

@@ -169,6 +352,8 @@ This project is provided for educational and research purposes. The OpenELM mode
 ## Resources

 - [OpenELM Model Card](https://huggingface.co/apple/OpenELM-450M-Instruct)
+- [OpenAI API Documentation](https://platform.openai.com/docs/api-reference)
 - [Anthropic API Documentation](https://docs.anthropic.com)
 - [FastAPI Documentation](https://fastapi.tiangolo.com)
 - [HuggingFace Transformers](https://huggingface.co/docs/transformers)
+- [Apple OpenELM Research Paper](https://machinelearning.apple.com/research/openelm)
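The README's Streaming Response Format section documents the OpenAI stream as `data:` lines carrying `chat.completion.chunk` objects, terminated by `data: [DONE]`. For clients that do not use the OpenAI SDK, a minimal stdlib-only consumer might look like the sketch below. It assumes each event is a single `data:` line holding one JSON object, as in the README sample; it ignores all non-`data:` lines and does not reassemble multi-line `data:` fields. `iter_openai_deltas` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_openai_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines until `data: [DONE]`."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The first chunk only carries {"role": "assistant"}; skip it.
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


if __name__ == "__main__":
    # The two chunks shown in the README's OpenAI streaming sample.
    sample = [
        'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1677858242,"model":"openelm-450m-instruct","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
        "",
        'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1677858242,"model":"openelm-450m-instruct","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
        "",
        "data: [DONE]",
    ]
    print("".join(iter_openai_deltas(sample)))  # Hello
```

Against a running server, the same generator can consume the decoded lines of a streaming HTTP response, for example `(line.decode() for line in urllib.request.urlopen(req))`, where `req` is a hypothetical `urllib.request.Request` that POSTs a body with `"stream": true` to `/v1/chat/completions`.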
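The Anthropic stream documented in the same section pairs each `event:` line with a `data:` line. The sketch below pairs them and collects the generated text and the stop reason, assuming one `data:` line per event as in the README sample. The README's `content_block_delta` payload puts `text` at the top level, whereas the official Messages API nests it under `delta`, so the sketch accepts both shapes. Both function names are hypothetical.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Pair each `event:` line with the `data:` line that follows it."""
    event: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event is not None:
            yield event, json.loads(line[len("data:"):].strip())
            event = None  # assumes exactly one data line per event


def collect_anthropic_text(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Return (text, stop_reason) gathered from an Anthropic-style stream."""
    parts: List[str] = []
    stop_reason: Optional[str] = None
    for event, data in iter_sse_events(lines):
        if event == "content_block_delta":
            # README sample: {"type": "text_delta", "text": "..."}
            # Official API:  {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "..."}}
            delta = data.get("delta", data)
            parts.append(delta.get("text", ""))
        elif event == "message_delta":
            stop_reason = data.get("delta", {}).get("stop_reason", stop_reason)
        elif event == "message_stop":
            break
    return "".join(parts), stop_reason


if __name__ == "__main__":
    sample = """event: message_start
data: {"id":"msg_abc123","type":"message","role":"assistant","content":[],"model":"openelm-450m-instruct"}

event: content_block_delta
data: {"type":"text_delta","text":"Hello"}

event: message_delta
data: {"delta":{"stop_reason":"end_turn"},"usage":{"input_tokens":10,"output_tokens":5}}

event: message_stop
data: {}
""".splitlines()
    print(collect_anthropic_text(sample))  # ('Hello', 'end_turn')
```

If the SDK snippets are used instead, note that the Anthropic Python SDK appends `/v1/messages` to `base_url` itself, so `base_url="http://localhost:8000/v1"` as written in the README would likely request `/v1/v1/messages`; `base_url="http://localhost:8000"` is probably what those snippets need.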
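The Migration Guide notes that OpenAI carries the system prompt as an inline `system` message while Anthropic takes a separate `system` parameter. A minimal sketch of that translation for request bodies follows, assuming plain-string message contents (no content-block arrays). Joining every system message into the top-level `system` field is a choice of this sketch; the server's own `create_chat_completion` handler keeps only the first system message and passes later ones through as ordinary turns. The function name and the 1024 fallback for `max_tokens` (taken from the README request examples, since the Messages API requires the field) are assumptions.

```python
from typing import Any, Dict, List


def openai_to_anthropic_request(body: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an OpenAI Chat Completions body into an Anthropic Messages body."""
    system_parts: List[str] = []
    messages: List[Dict[str, str]] = []
    for msg in body["messages"]:
        if msg["role"] == "system":
            system_parts.append(msg["content"])  # lifted out of the turn list
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})

    converted: Dict[str, Any] = {
        "model": body.get("model", "openelm-450m-instruct"),
        "messages": messages,
        # Required by the Messages API; fallback mirrors the README examples.
        "max_tokens": body.get("max_tokens") or 1024,
    }
    if system_parts:
        converted["system"] = "\n\n".join(system_parts)
    # Copy only parameters both APIs understand.
    for key in ("temperature", "top_p", "stream"):
        if key in body:
            converted[key] = body[key]
    return converted


if __name__ == "__main__":
    print(openai_to_anthropic_request({
        "model": "openelm-450m-instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"},
        ],
        "max_tokens": 100,
    }))
```

OpenAI-only parameters such as `n`, `presence_penalty`, `frequency_penalty` and `logit_bias` are dropped rather than translated.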
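The `create_chat_completion` handler added to app.py in this commit lifts the first `system` message out of `messages`, defaults `max_tokens` to 1024, and reserves `max_context_tokens = 2048 - max_tokens` for the prompt. Because `ChatCompletionRequest` accepts `max_tokens` up to 4096, that subtraction can go negative. Separately, `list_openai_models` sets `created=int(uuid.uuid1().time)`, which counts 100-nanosecond intervals since 15 October 1582 rather than the Unix seconds OpenAI clients expect. The sketch below mirrors the system-message split, clamps the token budget, and builds a response in the README's OpenAI Response Format with a Unix `created` timestamp. It is illustrative only: the helper names, the 256-token prompt floor and the id suffix scheme are assumptions, not the project's code.

```python
import time
import uuid
from typing import Dict, List, Optional, Tuple

CONTEXT_WINDOW = 2048      # OpenELM context window, per the README
DEFAULT_MAX_TOKENS = 1024  # handler default when max_tokens is omitted


def split_system(messages: List[Dict[str, str]]) -> Tuple[Optional[str], List[Dict[str, str]]]:
    """Take the first system message out; keep every other turn in order."""
    system: Optional[str] = None
    turns: List[Dict[str, str]] = []
    for msg in messages:
        if msg["role"] == "system" and system is None:
            system = msg["content"]
        else:
            turns.append(msg)
    return system, turns


def token_budget(max_tokens: Optional[int], window: int = CONTEXT_WINDOW,
                 min_prompt: int = 256) -> Tuple[int, int]:
    """Return (generation_tokens, prompt_tokens) that both fit in the window.

    Generation is clamped so at least `min_prompt` tokens (an arbitrary
    floor chosen for this sketch) remain for the prompt.
    """
    generation = min(max_tokens or DEFAULT_MAX_TOKENS, window - min_prompt)
    return generation, window - generation


def build_chat_completion(model: str, text: str, prompt_tokens: int,
                          completion_tokens: int, finish_reason: str = "stop") -> Dict:
    """Assemble a dict shaped like the README's OpenAI Response Format."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:24]}",
        "object": "chat.completion",
        "created": int(time.time()),  # Unix seconds
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": finish_reason,
        }],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }


if __name__ == "__main__":
    system, turns = split_system([
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hello!"},
    ])
    print(system, len(turns))  # Be brief. 1
    print(token_budget(4096))  # (1792, 256)
```

Under these assumptions `token_budget(4096)` returns `(1792, 256)`, whereas the unclamped handler arithmetic would leave `2048 - 4096 = -2048` tokens for the prompt.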
app.py CHANGED

@@ -1,8 +1,13 @@
 """
-OpenELM Anthropic API Compatible Wrapper

-This FastAPI application provides
-allowing users to call OpenELM models using
 """

 import asyncio
@@ -80,9 +85,9 @@ async def lifespan(app: FastAPI) -> AsyncIterator:

 # Create FastAPI app
 app = FastAPI(
-    title="OpenELM
-    description="Anthropic API compatible wrapper for OpenELM models",
-    version="1.
     lifespan=lifespan
 )

@@ -115,6 +120,7 @@ class Usage(BaseModel):
     """Token usage statistics."""
     input_tokens: int = 0
     output_tokens: int = 0


 class ContentBlock(BaseModel):
@@ -162,6 +168,88 @@ class ModelListResponse(BaseModel):
     data: List[ModelInfo]


 # ==================== Helper Functions ====================

 def format_prompt_for_openelm(
@@ -283,12 +371,14 @@ def map_anthropic_params_to_transformers(
 async def root():
     """Root endpoint with API information."""
     return {
-        "name": "OpenELM
-        "version": "1.
-        "description": "Anthropic API compatible wrapper for OpenELM models",
         "endpoints": {
-            "
-            "
             "health": "GET /health"
         }
     }
@@ -643,6 +733,380 @@ class MessageResource:
         return response.json()


 # ==================== Main Entry Point ====================

 if __name__ == "__main__":
|
|
|
|
| 1 |
"""
|
| 2 |
+
OpenELM OpenAI & Anthropic API Compatible Wrapper
|
| 3 |
|
| 4 |
+
This FastAPI application provides both OpenAI and Anthropic-compatible APIs for the OpenELM model,
|
| 5 |
+
allowing users to call OpenELM models using either SDK with minimal code changes.
|
| 6 |
+
|
| 7 |
+
Supported APIs:
|
| 8 |
+
- OpenAI Chat Completions API (v1/chat/completions)
|
| 9 |
+
- Anthropic Messages API (v1/messages)
|
| 10 |
+
- Both support streaming and non-streaming responses
|
| 11 |
"""
|
| 12 |
|
| 13 |
import asyncio
|
|
|
|
| 85 |
|
| 86 |
# Create FastAPI app
|
| 87 |
app = FastAPI(
|
| 88 |
+
title="OpenELM OpenAI API",
|
| 89 |
+
description="OpenAI and Anthropic API compatible wrapper for OpenELM models",
|
| 90 |
+
version="1.1.0",
|
| 91 |
lifespan=lifespan
|
| 92 |
)
|
| 93 |
|
|
|
|
| 120 |
"""Token usage statistics."""
|
| 121 |
input_tokens: int = 0
|
| 122 |
output_tokens: int = 0
|
| 123 |
+
total_tokens: int = 0
|
| 124 |
|
| 125 |
|
| 126 |
class ContentBlock(BaseModel):
|
|
|
|
| 168 |
data: List[ModelInfo]
|
| 169 |
|
| 170 |
|
| 171 |
+
# ==================== OpenAI API Models ====================
|
| 172 |
+
|
| 173 |
+
class ChatMessage(BaseModel):
|
| 174 |
+
"""A chat message (OpenAI format)."""
|
| 175 |
+
role: str
|
| 176 |
+
content: str
|
| 177 |
+
name: Optional[str] = None
|
| 178 |
+
|
| 179 |
+
|
| 180 |
+
class ChatCompletionRequest(BaseModel):
|
| 181 |
+
"""Chat completion request (OpenAI API compatible)."""
|
| 182 |
+
model: str = "openelm-450m-instruct"
|
| 183 |
+
messages: List[ChatMessage]
|
| 184 |
+
temperature: Optional[float] = Field(default=None, ge=0.0, le=2.0)
|
| 185 |
+
top_p: Optional[float] = Field(default=None, ge=0.0, le=1.0)
|
| 186 |
+
n: Optional[int] = Field(default=1, ge=1)
|
| 187 |
+
max_tokens: Optional[int] = Field(default=None, ge=1, le=4096)
|
| 188 |
+
stream: Optional[bool] = False
|
| 189 |
+
presence_penalty: Optional[float] = Field(default=None, ge=-2.0, le=2.0)
|
| 190 |
+
frequency_penalty: Optional[float] = Field(default=None, ge=-2.0, le=2.0)
|
| 191 |
+
logit_bias: Optional[Dict[str, float]] = None
|
| 192 |
+
user: Optional[str] = None
|
| 193 |
+
|
| 194 |
+
|
| 195 |
+
class ChatCompletionChoice(BaseModel):
|
| 196 |
+
"""Choice in a chat completion response."""
|
| 197 |
+
index: int
|
| 198 |
+
message: ChatMessage
|
| 199 |
+
finish_reason: Optional[str] = None
|
| 200 |
+
logprobs: Optional[Any] = None
|
| 201 |
+
|
| 202 |
+
|
| 203 |
+
class ChatCompletionUsage(BaseModel):
|
| 204 |
+
"""Token usage in chat completion."""
|
| 205 |
+
prompt_tokens: int
|
| 206 |
+
completion_tokens: int
|
| 207 |
+
total_tokens: int
|
| 208 |
+
|
| 209 |
+
|
| 210 |
+
class ChatCompletionResponse(BaseModel):
|
| 211 |
+
"""Chat completion response (OpenAI API compatible)."""
|
| 212 |
+
id: str
|
| 213 |
+
object: str = "chat.completion"
|
| 214 |
+
created: int
|
| 215 |
+
model: str
|
| 216 |
+
choices: List[ChatCompletionChoice]
|
| 217 |
+
usage: ChatCompletionUsage
|
| 218 |
+
system_fingerprint: Optional[str] = None
|
| 219 |
+
|
| 220 |
+
|
| 221 |
+
class ChatCompletionChunkChoice(BaseModel):
|
| 222 |
+
"""Choice in a streaming chunk."""
|
| 223 |
+
index: int
|
| 224 |
+
delta: Dict[str, Any]
|
| 225 |
+
finish_reason: Optional[str] = None
|
| 226 |
+
logprobs: Optional[Any] = None
|
| 227 |
+
|
| 228 |
+
|
| 229 |
+
class ChatCompletionChunk(BaseModel):
|
| 230 |
+
"""Streaming chunk (OpenAI API compatible)."""
|
| 231 |
+
id: str
|
| 232 |
+
object: str = "chat.completion.chunk"
|
| 233 |
+
created: int
|
| 234 |
+
model: str
|
| 235 |
+
choices: List[ChatCompletionChunkChoice]
|
| 236 |
+
|
| 237 |
+
|
| 238 |
+
class OpenAIModelInfo(BaseModel):
|
| 239 |
+
"""Model information (OpenAI format)."""
|
| 240 |
+
id: str
|
| 241 |
+
object: str = "model"
|
| 242 |
+
created: int = 0
|
| 243 |
+
owned_by: str = "openelm"
|
| 244 |
+
permission: List[Any] = []
|
| 245 |
+
|
| 246 |
+
|
| 247 |
+
class OpenAIModelListResponse(BaseModel):
|
| 248 |
+
"""Model list response (OpenAI format)."""
|
| 249 |
+
object: str = "list"
|
| 250 |
+
data: List[OpenAIModelInfo]
|
| 251 |
+
|
| 252 |
+
|
| 253 |
# ==================== Helper Functions ====================
|
| 254 |
|
| 255 |
def format_prompt_for_openelm(
|
|
|
|
| 371 |
async def root():
|
| 372 |
"""Root endpoint with API information."""
|
| 373 |
return {
|
| 374 |
+
"name": "OpenELM OpenAI API",
|
| 375 |
+
"version": "1.1.0",
|
| 376 |
+
"description": "OpenAI and Anthropic API compatible wrapper for OpenELM models",
|
| 377 |
"endpoints": {
|
| 378 |
+
"openai_chat": "POST /v1/chat/completions",
|
| 379 |
+
"openai_models": "GET /v1/models",
|
| 380 |
+
"anthropic_messages": "POST /v1/messages",
|
| 381 |
+
"anthropic_models": "GET /v1/models",
|
| 382 |
"health": "GET /health"
|
| 383 |
}
|
| 384 |
}
|
|
|
|
| 733 |
return response.json()
|
| 734 |
|
| 735 |
|
| 736 |
+
# ==================== OpenAI API Endpoints ====================
|
| 737 |
+
|
| 738 |
+
@app.get("/v1/models", response_model=OpenAIModelListResponse, tags=["OpenAI"])
|
| 739 |
+
async def list_openai_models():
|
| 740 |
+
"""List available models (OpenAI API format)."""
|
| 741 |
+
return OpenAIModelListResponse(
|
| 742 |
+
data=[
|
| 743 |
+
OpenAIModelInfo(
|
| 744 |
+
id="openelm-450m-instruct",
|
| 745 |
+
owned_by="apple",
|
| 746 |
+
created=int(uuid.uuid1().time)
|
| 747 |
+
)
|
| 748 |
+
]
|
| 749 |
+
)
+
+
+@app.post("/v1/chat/completions", tags=["OpenAI"])
+async def create_chat_completion(
+    request: ChatCompletionRequest,
+    raw_request: Request = None
+):
+    """
+    Create chat completion (OpenAI API compatible).
+
+    This endpoint accepts OpenAI-style chat completion requests and returns
+    responses in the same format, allowing existing code to work with OpenELM.
+    """
+    # Check if model is loaded
+    if model is None or tokenizer is None:
+        raise HTTPException(
+            status_code=503,
+            detail="Model not loaded. Please wait for model to initialize."
+        )
+
+    # Handle streaming
+    if request.stream:
+        return await create_chat_completion_stream(request)
+
+    try:
+        # The first system message becomes the system prompt; all other messages keep their order
+        system_message = None
+        formatted_messages = []
+
+        for msg in request.messages:
+            if msg.role == "system" and system_message is None:
+                system_message = msg.content
+            else:
+                formatted_messages.append(Message(
+                    role=msg.role,
+                    content=msg.content
+                ))
+
+        # Format prompt for OpenELM
+        prompt = format_prompt_for_openelm(formatted_messages, system_message)
+
+        # Reserve max_tokens of the 2048-token budget for the reply, the rest for the prompt
+        max_tokens = request.max_tokens or 1024
+        max_context_tokens = 2048 - max_tokens
+        prompt = truncate_prompt(prompt, max_context_tokens, system_message)
+
+        # Tokenize input
+        inputs = tokenizer(prompt, return_tensors="pt")
+        input_tokens = len(inputs.input_ids[0])
+
+        # Move to same device as model
+        if hasattr(model, 'device'):
+            inputs = {k: v.to(model.device) for k, v in inputs.items()}
+
+        # Map parameters
+        gen_params = map_anthropic_params_to_transformers(
+            request.temperature,
+            request.top_p,
+            None,
+            max_tokens
+        )
+
+        # Generate
+        with torch.no_grad():
+            outputs = model.generate(
+                **inputs,
+                **gen_params,
+                pad_token_id=tokenizer.eos_token_id,
+                eos_token_id=tokenizer.eos_token_id,
+            )
+
+        # Decode output (prompt + completion)
+        generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+        # Extract the assistant's response
+        response_text = extract_assistant_response(generated_text)
+        output_tokens = count_tokens(response_text)
+
+        # Build response matching OpenAI format; "created" is Unix seconds
+        # (requires `import time` at module top)
+        response_id = f"chatcmpl-{uuid.uuid4().hex[:12]}"
+        timestamp = int(time.time())
+
+        return ChatCompletionResponse(
+            id=response_id,
+            created=timestamp,
+            model="openelm-450m-instruct",
+            choices=[
+                ChatCompletionChoice(
+                    index=0,
+                    message=ChatMessage(role="assistant", content=response_text),
+                    finish_reason="stop"
+                )
+            ],
+            usage=ChatCompletionUsage(
+                prompt_tokens=input_tokens,
+                completion_tokens=output_tokens,
+                total_tokens=input_tokens + output_tokens
+            )
+        )
+
+    except Exception as e:
+        raise HTTPException(
+            status_code=500,
+            detail=f"Generation failed: {str(e)}"
+        )
+
+
+async def create_chat_completion_stream(request: ChatCompletionRequest):
+    """Create streaming chat completion (OpenAI API compatible)."""
+
+    def sse(payload: dict) -> str:
+        # Serialize as JSON (not the Python dict repr) so clients can parse each event;
+        # requires `import json` at module top
+        return f"data: {json.dumps(payload)}\n\n"
+
+    async def generate_stream():
+        """Generate streaming response in OpenAI format."""
+        try:
+            # The first system message becomes the system prompt; all other messages keep their order
+            system_message = None
+            formatted_messages = []
+
+            for msg in request.messages:
+                if msg.role == "system" and system_message is None:
+                    system_message = msg.content
+                else:
+                    formatted_messages.append(Message(
+                        role=msg.role,
+                        content=msg.content
+                    ))
+
+            # Format prompt for OpenELM
+            prompt = format_prompt_for_openelm(formatted_messages, system_message)
+
+            # Reserve max_tokens of the 2048-token budget for the reply, the rest for the prompt
+            max_tokens = request.max_tokens or 1024
+            max_context_tokens = 2048 - max_tokens
+            prompt = truncate_prompt(prompt, max_context_tokens, system_message)
+
+            # Tokenize
+            inputs = tokenizer(prompt, return_tensors="pt")
+            input_tokens = len(inputs.input_ids[0])
+
+            # Move to same device as model
+            if hasattr(model, 'device'):
+                inputs = {k: v.to(model.device) for k, v in inputs.items()}
+
+            # Map parameters
+            gen_params = map_anthropic_params_to_transformers(
+                request.temperature,
+                request.top_p,
+                None,
+                max_tokens
+            )
+
+            # TextIteratorStreamer yields decoded text as generate() produces tokens
+            streamer = TextIteratorStreamer(
+                tokenizer,
+                skip_prompt=True,
+                skip_special_tokens=True
+            )
+            gen_params["streamer"] = streamer
+
+            # Run generation in a separate thread so this generator can consume the streamer
+            def generate():
+                with torch.no_grad():
+                    model.generate(
+                        **inputs,
+                        **gen_params,
+                        pad_token_id=tokenizer.eos_token_id,
+                        eos_token_id=tokenizer.eos_token_id,
+                    )
+
+            thread = Thread(target=generate)
+            thread.start()
+
+            # Every chunk of one completion shares the same id and Unix timestamp
+            # (requires `import time` at module top)
+            chunk_id = f"chatcmpl-{uuid.uuid4().hex[:12]}"
+            timestamp = int(time.time())
+
+            def chunk(delta: dict, finish_reason=None) -> dict:
+                return {
+                    "id": chunk_id,
+                    "object": "chat.completion.chunk",
+                    "created": timestamp,
+                    "model": "openelm-450m-instruct",
+                    "choices": [
+                        {"index": 0, "delta": delta, "finish_reason": finish_reason}
+                    ],
+                }
+
+            # Send role first
+            yield sse(chunk({"role": "assistant"}))
+
+            # Stream the generated text
+            full_text = ""
+            for text in streamer:
+                full_text += text
+                yield sse(chunk({"content": text}))
+
+            # Send stop chunk
+            yield sse(chunk({}, finish_reason="stop"))
+
+            # Send usage as a final chunk with an empty choices list (OpenAI format)
+            output_tokens = count_tokens(full_text)
+            yield sse({
+                "id": chunk_id,
+                "object": "chat.completion.chunk",
+                "created": timestamp,
+                "model": "openelm-450m-instruct",
+                "choices": [],
+                "usage": {
+                    "prompt_tokens": input_tokens,
+                    "completion_tokens": output_tokens,
+                    "total_tokens": input_tokens + output_tokens
+                }
+            })
+
+            # Signal end of stream
+            yield "data: [DONE]\n\n"
+
+            thread.join()
+
+        except Exception as e:
+            # json.dumps escapes quotes and newlines in the error message
+            yield sse({"error": {"message": str(e), "type": "server_error"}})
+
+    return StreamingResponse(
+        generate_stream(),
+        media_type="text/event-stream",
+        headers={
+            "Cache-Control": "no-cache",
+            "Connection": "keep-alive",
+            "X-Accel-Buffering": "no",
+        }
+    )
+
+
+def extract_assistant_response(generated_text: str) -> str:
+    """Extract the assistant's reply from the decoded prompt + completion."""
+    # Keep the text after the last "Assistant:" marker (all of it if there is no marker)
+    response_text = generated_text.split("Assistant:")[-1]
+
+    # Cut off any new turn the model starts writing on its own
+    for marker in ("\nUser:", "\nSystem:"):
+        response_text = response_text.split(marker)[0]
+
+    return response_text.strip()
+
+
+# ==================== OpenAI SDK Compatibility ====================
+
+class OpenAIClient:
+    """
+    Simple OpenAI SDK compatible client for testing.
+
+    Usage:
+        client = OpenAIClient(base_url="http://localhost:8000", api_key="dummy")
+        response = client.chat.completions.create(
+            model="openelm-450m-instruct",
+            messages=[{"role": "user", "content": "Hello!"}],
+            max_tokens=100
+        )
+    """
+
+    def __init__(self, base_url: str = "http://localhost:8000", api_key: str = "dummy"):
+        # base_url is the server root; "/v1/..." is appended per request
+        self.base_url = base_url.rstrip("/")
+        self.api_key = api_key
+        self.session = None
+
+    def _get_session(self):
+        """Get or create a requests session."""
+        import requests
+        if self.session is None:
+            self.session = requests.Session()
+            self.session.headers.update({
+                "Authorization": f"Bearer {self.api_key}",
+                "Content-Type": "application/json"
+            })
+        return self.session
+
+    @property
+    def chat(self) -> "ChatResource":
+        """Access chat operations."""
+        return ChatResource(self)
+
+
+class ChatResource:
+    """Resource for chat completion operations."""
+
+    def __init__(self, client: OpenAIClient):
+        self.client = client
+
+    @property
+    def completions(self) -> "ChatResource":
+        """Support the SDK-style call path client.chat.completions.create(...)."""
+        return self
+
+    def create(
+        self,
+        model: str,
+        messages: List[Dict[str, str]],
+        temperature: Optional[float] = None,
+        top_p: Optional[float] = None,
+        max_tokens: Optional[int] = None,
+        stream: bool = False,
+        **kwargs
+    ) -> Any:
+        """Create chat completion; with stream=True, return an iterator of chunk dicts."""
+        url = f"{self.client.base_url}/v1/chat/completions"
+
+        payload = {
+            "model": model,
+            "messages": messages,
+        }
+
+        if temperature is not None:
+            payload["temperature"] = temperature
+        if top_p is not None:
+            payload["top_p"] = top_p
+        if max_tokens is not None:
+            payload["max_tokens"] = max_tokens
+        if stream:
+            payload["stream"] = True
+
+        # Add any extra kwargs
+        payload.update(kwargs)
+
+        # stream=True makes requests read the response body incrementally
+        response = self.client._get_session().post(url, json=payload, stream=stream)
+
+        if response.status_code != 200:
+            raise Exception(f"API request failed: {response.text}")
+
+        if stream:
+            return self._iter_sse(response)
+        return response.json()
+
+    @staticmethod
+    def _iter_sse(response) -> Any:
+        """Yield each JSON event of a server-sent event stream until [DONE]."""
+        import json
+
+        for raw in response.iter_lines():
+            line = raw.decode("utf-8")
+            if not line.startswith("data: "):
+                continue  # skip blank separator lines
+            data = line[len("data: "):]
+            if data == "[DONE]":
+                break
+            yield json.loads(data)
+
+
 # ==================== Main Entry Point ====================

 if __name__ == "__main__":
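`create_chat_completion_stream` splits the work between a producer and a consumer: `model.generate` runs in a worker `Thread` and feeds a `TextIteratorStreamer`, while the response generator iterates over that streamer. The stdlib-only sketch below reproduces the pattern; `QueueStreamer` and `fake_generate` are hypothetical stand-ins for the streamer and the model, not transformers APIs.

```python
import queue
from threading import Thread
from typing import Iterator, List


class QueueStreamer:
    """Hypothetical stand-in for TextIteratorStreamer: a worker puts text, a consumer iterates."""

    _END = object()  # sentinel that marks the end of generation

    def __init__(self) -> None:
        self._queue = queue.Queue()

    def put(self, text: str) -> None:
        self._queue.put(text)

    def end(self) -> None:
        self._queue.put(self._END)

    def __iter__(self) -> Iterator[str]:
        while True:
            item = self._queue.get()  # blocks until the worker produces something
            if item is self._END:
                return
            yield item


def fake_generate(tokens: List[str], streamer: QueueStreamer) -> None:
    """Hypothetical model call: emit each piece of text, then signal completion."""
    for tok in tokens:
        streamer.put(tok)
    streamer.end()


streamer = QueueStreamer()
worker = Thread(target=fake_generate, args=(["1\n", "2\n", "3"], streamer))
worker.start()
pieces = list(streamer)  # consumed while the worker may still be running
worker.join()
print("".join(pieces))
```

Because the consumer blocks on the queue, iterating such a streamer directly inside an `async` generator holds up the event loop while it waits; this sketch does not address that.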
examples/curl_examples.sh
CHANGED
@@ -1,8 +1,8 @@
 #!/bin/bash
-# OpenELM Anthropic API - Curl Examples
+# OpenELM OpenAI & Anthropic API - Curl Examples
 #
-# This script demonstrates how to call the OpenELM
-#
+# This script demonstrates how to call the OpenELM API using both
+# OpenAI and Anthropic compatible endpoints with curl commands.
 #
 # Usage:
 # chmod +x examples/curl_examples.sh
@@ -13,7 +13,7 @@ API_URL="${OPENELM_API_URL:-http://localhost:8000}"
 API_URL="${API_URL%/}" # Remove trailing slash

 echo "=============================================="
-echo "OpenELM Anthropic API - Curl Examples"
+echo "OpenELM OpenAI & Anthropic API - Curl Examples"
 echo "=============================================="
 echo "API URL: $API_URL"
 echo ""
@@ -24,16 +24,25 @@ echo "------------------------"
 curl -s "$API_URL/health" | python3 -m json.tool
 echo ""

-#
-
-
+# ============================================
+# OpenAI API Examples
+# ============================================
+
+echo "##########################################"
+echo "# OpenAI API Examples #"
+echo "##########################################"
+echo ""
+
+# Example 2: OpenAI - List Available Models
+echo "Example 2: OpenAI - List Available Models"
+echo "-------------------------------------------"
 curl -s "$API_URL/v1/models" | python3 -m json.tool
 echo ""

-# Example 3: Basic
-echo "Example 3: Basic
-echo "------------------------------------"
-curl -s -X POST "$API_URL/v1/
+# Example 3: OpenAI - Basic Chat Completion
+echo "Example 3: OpenAI - Basic Chat Completion"
+echo "--------------------------------------------"
+curl -s -X POST "$API_URL/v1/chat/completions" \
   -H "Content-Type: application/json" \
   -d '{
     "model": "openelm-450m-instruct",
@@ -48,10 +57,10 @@ curl -s -X POST "$API_URL/v1/messages" \
   }' | python3 -m json.tool
 echo ""

-# Example 4: Multi-turn Conversation
-echo "Example 4: Multi-turn Conversation"
-echo "-----------------------------------"
-curl -s -X POST "$API_URL/v1/
+# Example 4: OpenAI - Multi-turn Conversation
+echo "Example 4: OpenAI - Multi-turn Conversation"
+echo "--------------------------------------------"
+curl -s -X POST "$API_URL/v1/chat/completions" \
   -H "Content-Type: application/json" \
   -d '{
     "model": "openelm-450m-instruct",
@@ -62,7 +71,7 @@ curl -s -X POST "$API_URL/v1/messages" \
       },
       {
         "role": "assistant",
-        "content": "Python is a high-level programming language
+        "content": "Python is a high-level programming language."
       },
       {
         "role": "user",
@@ -74,36 +83,39 @@ curl -s -X POST "$API_URL/v1/messages" \
   }' | python3 -m json.tool
 echo ""

-# Example 5: Using System
-echo "Example 5: Using System
-echo "-------------------------------"
-curl -s -X POST "$API_URL/v1/
+# Example 5: OpenAI - Using System Message
+echo "Example 5: OpenAI - Using System Message"
+echo "------------------------------------------"
+curl -s -X POST "$API_URL/v1/chat/completions" \
   -H "Content-Type: application/json" \
   -d '{
     "model": "openelm-450m-instruct",
     "messages": [
+      {
+        "role": "system",
+        "content": "You are a helpful coding assistant."
+      },
       {
         "role": "user",
-        "content": "
+        "content": "What is a decorator?"
       }
     ],
-    "system": "You are a helpful tutor who explains things simply.",
     "max_tokens": 200,
     "temperature": 0.8
   }' | python3 -m json.tool
 echo ""

-# Example 6: Deterministic Generation
-echo "Example 6: Deterministic Generation"
-echo "------------------------------------"
-curl -s -X POST "$API_URL/v1/
+# Example 6: OpenAI - Deterministic Generation
+echo "Example 6: OpenAI - Deterministic Generation"
+echo "----------------------------------------------"
+curl -s -X POST "$API_URL/v1/chat/completions" \
   -H "Content-Type: application/json" \
   -d '{
     "model": "openelm-450m-instruct",
     "messages": [
       {
         "role": "user",
-        "content": "What is
+        "content": "What is 2 + 2?"
       }
     ],
     "max_tokens": 50,
@@ -111,6 +123,106 @@ curl -s -X POST "$API_URL/v1/messages" \
   }' | python3 -m json.tool
 echo ""

+# Example 7: OpenAI - Streaming Response
+echo "Example 7: OpenAI - Streaming Response"
+echo "----------------------------------------"
+echo "Streaming output:"
+curl -s -N -X POST "$API_URL/v1/chat/completions" \
+  -H "Content-Type: application/json" \
+  -H "Accept: text/event-stream" \
+  -d '{
+    "model": "openelm-450m-instruct",
+    "messages": [
+      {
+        "role": "user",
+        "content": "Count to 3, one per line."
+      }
+    ],
+    "max_tokens": 100,
+    "temperature": 0.7,
+    "stream": true
+  }' | head -20
+echo ""
+echo ""
+
+# ============================================
+# Anthropic API Examples
+# ============================================
+
+echo "##########################################"
+echo "# Anthropic API Examples #"
+echo "##########################################"
+echo ""
+
+# Example 8: Anthropic - List Available Models
+echo "Example 8: Anthropic - List Available Models"
+echo "----------------------------------------------"
+curl -s "$API_URL/v1/models" | python3 -m json.tool
+echo ""
+
+# Example 9: Anthropic - Basic Message Generation
+echo "Example 9: Anthropic - Basic Message Generation"
+echo "-------------------------------------------------"
+curl -s -X POST "$API_URL/v1/messages" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "openelm-450m-instruct",
+    "messages": [
+      {
+        "role": "user",
+        "content": "Say hello in a friendly way!"
+      }
+    ],
+    "max_tokens": 100,
+    "temperature": 0.7
+  }' | python3 -m json.tool
+echo ""
+
+# Example 10: Anthropic - Multi-turn Conversation
+echo "Example 10: Anthropic - Multi-turn Conversation"
+echo "-------------------------------------------------"
+curl -s -X POST "$API_URL/v1/messages" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "openelm-450m-instruct",
+    "messages": [
+      {
+        "role": "user",
+        "content": "What is AI?"
+      },
+      {
+        "role": "assistant",
+        "content": "AI stands for Artificial Intelligence."
+      },
+      {
+        "role": "user",
+        "content": "Tell me more."
+      }
+    ],
+    "max_tokens": 150,
+    "temperature": 0.5
+  }' | python3 -m json.tool
+echo ""
+
+# Example 11: Anthropic - Using System Prompt
+echo "Example 11: Anthropic - Using System Prompt"
+echo "----------------------------------------------"
+curl -s -X POST "$API_URL/v1/messages" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "openelm-450m-instruct",
+    "messages": [
+      {
+        "role": "user",
+        "content": "Explain quantum computing."
+      }
+    ],
+    "system": "You are a science educator who explains complex topics simply.",
+    "max_tokens": 200,
+    "temperature": 0.8
+  }' | python3 -m json.tool
+echo ""
+
 echo "=============================================="
 echo "All curl examples completed!"
 echo "=============================================="
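Examples 5 and 11 differ mainly in where the system prompt lives: the OpenAI format sends it as a `system` message inside `messages`, while the Anthropic format uses a top-level `system` field. A sketch of a converter between the two request bodies, using only the standard library; `openai_to_anthropic` is a hypothetical helper, not part of this repository, and joining several system messages with newlines is this sketch's own choice:

```python
import json
from typing import Any, Dict


def openai_to_anthropic(body: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical converter: move system messages to a top-level 'system' field."""
    system_parts = [m["content"] for m in body["messages"] if m["role"] == "system"]
    # Copy every other key unchanged (model, max_tokens, temperature, ...)
    converted = {k: v for k, v in body.items() if k != "messages"}
    converted["messages"] = [m for m in body["messages"] if m["role"] != "system"]
    if system_parts:
        converted["system"] = "\n".join(system_parts)
    return converted


openai_body = {
    "model": "openelm-450m-instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "What is a decorator?"},
    ],
    "max_tokens": 200,
    "temperature": 0.8,
}
print(json.dumps(openai_to_anthropic(openai_body), indent=2))
```

The printed body has the same shape as the `-d` payload of Example 11, so it could be sent to `/v1/messages` unchanged.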
examples/openai_sdk_example.py
ADDED
@@ -0,0 +1,148 @@
+"""
+Example: Using OpenAI SDK with OpenELM API
+
+This example demonstrates how to use the OpenAI SDK (or compatible client)
+to call OpenELM models through our OpenAI API compatible wrapper.
+
+Note: The official openai Python package requires an api_key, which these
+endpoints do not validate. For testing, use the included OpenAIClient helper.
+
+Usage:
+    python examples/openai_sdk_example.py
+"""
+
+import sys
+import os
+
+# Add parent directory to path for imports
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+from app import OpenAIClient
+
+
+def main():
+    """Example usage of the OpenAI-compatible OpenELM API."""
+
+    # Create client pointing to our local API
+    base_url = os.environ.get("OPENELM_API_URL", "http://localhost:8000")
+    client = OpenAIClient(base_url=base_url, api_key="dummy-key")
+
+    print("=" * 60)
+    print("OpenELM OpenAI API - Usage Example")
+    print("=" * 60)
+    print(f"API URL: {base_url}")
+    print()
+
+    # Example 1: Basic chat completion
+    print("Example 1: Basic Chat Completion")
+    print("-" * 40)
+
+    response = client.chat.completions.create(
+        model="openelm-450m-instruct",
+        messages=[
+            {"role": "user", "content": "Say hello in a friendly way!"}
+        ],
+        max_tokens=100,
+        temperature=0.7
+    )
+
+    print(f"Response ID: {response['id']}")
+    print(f"Model: {response['model']}")
+    print(f"Content: {response['choices'][0]['message']['content']}")
+    print(f"Usage: {response['usage']}")
+    print()
+
+    # Example 2: Multi-turn conversation
+    print("Example 2: Multi-turn Conversation")
+    print("-" * 40)
+
+    response = client.chat.completions.create(
+        model="openelm-450m-instruct",
+        messages=[
+            {"role": "user", "content": "What is artificial intelligence?"},
+            {"role": "assistant", "content": "Artificial intelligence (AI) refers to systems that can perform tasks that typically require human intelligence."},
+            {"role": "user", "content": "What are some examples?"}
+        ],
+        max_tokens=150,
+        temperature=0.5
+    )
+
+    print(f"Content: {response['choices'][0]['message']['content']}")
+    print(f"Usage: {response['usage']}")
+    print()
+
+    # Example 3: Using system message
+    print("Example 3: Using System Message")
+    print("-" * 40)
+
+    response = client.chat.completions.create(
+        model="openelm-450m-instruct",
+        messages=[
+            {"role": "system", "content": "You are a helpful coding assistant."},
+            {"role": "user", "content": "What is a Python decorator?"}
+        ],
+        max_tokens=200,
+        temperature=0.8
+    )
+
+    print(f"Content: {response['choices'][0]['message']['content']}")
+    print(f"Usage: {response['usage']}")
+    print()
+
+    # Example 4: Deterministic generation (temperature=0)
+    print("Example 4: Deterministic Generation (temperature=0)")
+    print("-" * 40)
+
+    response = client.chat.completions.create(
+        model="openelm-450m-instruct",
+        messages=[
+            {"role": "user", "content": "What is 2 + 2?"}
+        ],
+        max_tokens=50,
+        temperature=0.0  # Deterministic output
+    )
+
+    print(f"Content: {response['choices'][0]['message']['content']}")
+    print(f"Usage: {response['usage']}")
+    print()
+
+    # Example 5: Streaming response
+    print("Example 5: Streaming Response")
+    print("-" * 40)
+    print("Streaming response:")
+
+    response = client.chat.completions.create(
+        model="openelm-450m-instruct",
+        messages=[
+            {"role": "user", "content": "Count to 5, one number per line."}
+        ],
+        max_tokens=100,
+        temperature=0.7,
+        stream=True
+    )
+
+    # For streaming, response is a generator
+    chunk_count = 0
+    for chunk in response:
+        if 'choices' in chunk and chunk['choices']:
+            delta = chunk['choices'][0].get('delta', {})
+            if 'content' in delta:
+                content = delta['content']
+                if content:
+                    print(content, end="", flush=True)
+                    chunk_count += 1
+        elif 'error' in chunk:
+            print(f"Error: {chunk['error']}")
+            break
+
+    print("\n")
+    print(f"Received {chunk_count} chunks")
+    print()
+
+    print("=" * 60)
+    print("All examples completed successfully!")
+    print("=" * 60)
+
+
+if __name__ == "__main__":
+    main()
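Example 5 iterates over the streamed response and reads `choices[0]["delta"]["content"]` from each chunk. In the OpenAI streaming format, each event arrives as a `data: ` line carrying one JSON chunk, events are separated by blank lines, and the stream ends with `data: [DONE]`. A self-contained sketch of turning such lines into chunk dicts and reassembling the text; `parse_sse_lines` and `collect_content` are hypothetical helper names, and the byte lines stand in for what `requests`' `Response.iter_lines()` would yield:

```python
import json
from typing import Any, Dict, Iterable, Iterator


def parse_sse_lines(lines: Iterable[bytes]) -> Iterator[Dict[str, Any]]:
    """Yield each JSON payload from 'data: ...' lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        yield json.loads(data)


def collect_content(chunks: Iterable[Dict[str, Any]]) -> str:
    """Concatenate delta.content across chunks; usage-only chunks have no choices."""
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content", ""))
    return "".join(parts)


sample = [
    b'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    b"",
    b'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    b"",
    b'data: {"choices": [{"index": 0, "delta": {"content": " there"}}]}',
    b'data: {"choices": [], "usage": {"total_tokens": 7}}',
    b"data: [DONE]",
]
print(collect_content(parse_sse_lines(sample)))  # Hello there
```

Parsing only works when every `data:` payload is valid JSON, which is why a Python `dict` repr (single quotes, `None`) cannot be used as the event body.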