MiniMax Agent committed on
Commit
c126015
·
1 Parent(s): 9604400

Add OpenAI API compatible endpoints for OpenELM models

Files changed (4)
  1. README.md +218 -33
  2. app.py +475 -11
  3. examples/curl_examples.sh +139 -27
  4. examples/openai_sdk_example.py +148 -0
README.md CHANGED
@@ -1,5 +1,5 @@
1
  ---
2
- title: OpenELM Anthropic API
3
  emoji: 🤖
4
  colorFrom: blue
5
  colorTo: purple
@@ -7,21 +7,23 @@ sdk: docker
7
  pinned: false
8
  ---
9
 
10
- # OpenELM Anthropic API Compatible Wrapper
11
 
12
- A FastAPI-based service that provides an Anthropic-compatible API for Apple's OpenELM models, allowing you to use the Anthropic SDK with OpenELM for text generation tasks.
13
 
14
  ## Overview
15
 
16
- This project creates a REST API that mimics the Anthropic Messages API format, enabling developers to use OpenELM models with existing Anthropic SDK code with minimal modifications. The API supports both streaming and non-streaming responses, multi-turn conversations, system prompts, and various generation parameters.
17
 
18
- The OpenELM (Open Efficient Language Model) family from Apple uses a layer-wise scaling strategy to efficiently allocate parameters within each transformer layer, resulting in enhanced accuracy while maintaining computational efficiency. This wrapper makes these powerful models accessible through a familiar API interface.
19
 
20
  ## Features
21
 
22
- The API provides comprehensive support for Anthropic-style message generation with several key capabilities. First, it offers full Anthropic API compatibility, including endpoints that match the Anthropic Messages API structure, making it easy to integrate with existing codebases. Second, it supports streaming responses through Server-Sent Events (SSE), enabling real-time output display as tokens are generated. Third, the API handles multi-turn conversations by maintaining conversation history and formatting prompts appropriately for OpenELM models.
23
 
24
- Additionally, the wrapper properly handles system prompts by prepending them to the conversation context, which is essential for defining assistant behavior. The API also provides flexible generation parameters, allowing control over temperature, top-p sampling, maximum tokens, and other generation settings. Finally, comprehensive token usage statistics are included in responses, matching the Anthropic response format exactly.
 
 
25
 
26
  ## Quick Start
27
 
@@ -29,8 +31,8 @@ Additionally, the wrapper properly handles system prompts by prepending them to
29
 
30
  ```bash
31
  # Build and run with Docker
32
- docker build -t openelm-anthropic-api .
33
- docker run -p 8000:8000 openelm-anthropic-api
34
  ```
35
 
36
  ### Local Development
@@ -43,7 +45,20 @@ pip install -r requirements.txt
43
  python -m uvicorn app:app --host 0.0.0.0 --port 8000
44
  ```
45
 
46
- ### Test the API
47
 
48
  ```bash
49
  # Basic message generation
@@ -58,17 +73,69 @@ curl -X POST http://localhost:8000/v1/messages \
58
 
59
  ## API Reference
60
 
61
- ### Endpoints
62
 
63
  | Method | Endpoint | Description |
64
  |--------|----------|-------------|
65
- | GET | / | API information |
66
- | GET | /health | Health check |
67
- | GET | /v1/models | List available models |
68
  | POST | /v1/messages | Create message (non-streaming) |
69
  | POST | /v1/messages/stream | Create message (streaming) |
70
 
71
- ### Request Format
72
 
73
  ```json
74
  {
@@ -79,19 +146,20 @@ curl -X POST http://localhost:8000/v1/messages \
79
  "system": "Optional system prompt",
80
  "max_tokens": 1024,
81
  "temperature": 0.7,
82
- "top_p": 0.9,
83
  "stream": false
84
  }
85
  ```
86
 
87
- ### Response Format
88
 
89
  ```json
90
  {
91
  "id": "msg_abc123",
92
  "type": "message",
93
  "role": "assistant",
94
- "content": [{"type": "text", "text": "Generated response"}],
 
 
95
  "model": "openelm-450m-instruct",
96
  "stop_reason": "end_turn",
97
  "usage": {
@@ -101,25 +169,90 @@ curl -X POST http://localhost:8000/v1/messages \
101
  }
102
  ```
103
 
104
  ## Using with Anthropic SDK
105
 
106
  ```python
107
- from anthropic import Anthropic
108
 
109
  # Point to your local API
110
- client = Anthropic(
111
  base_url="http://localhost:8000/v1",
112
  api_key="dummy" # Any string works
113
  )
114
 
115
  # Use the same API you use with Claude!
116
- response = client.messages.create(
117
  model="openelm-450m-instruct",
118
  messages=[{"role": "user", "content": "Hello!"}],
119
  max_tokens=100
120
  )
121
 
122
- print(response.content[0].text)
123
  ```
124
 
125
  ## Model Information
@@ -129,14 +262,16 @@ print(response.content[0].text)
129
  - **Context Window**: 2048 tokens
130
  - **Weight Format**: Safetensors (secure and efficient)
131
  - **Quantization**: FP16 for optimal performance
 
132
 
133
  ## Architecture
134
 
135
- - **Framework**: FastAPI with async support
136
- - **ML Backend**: PyTorch + HuggingFace Transformers
137
- - **Model Loading**: Lazy loading on startup with caching
138
- - **Streaming**: Server-Sent Events (SSE)
139
- - **Response Format**: 100% Anthropic API compatible
 
140
 
141
  ## Configuration
142
 
@@ -147,20 +282,68 @@ Environment variables can be used to customize the deployment:
147
  | PORT | 8000 | API server port |
148
  | HF_HOME | ~/.cache/huggingface | Model cache directory |
149
  | TRANSFORMERS_CACHE | ~/.cache/transformers | Transformers cache |
 
150
 
151
  ## Examples
152
 
153
  See the `examples/` directory for complete usage examples:
154
 
155
- - `anthropic_sdk_example.py` - Python SDK usage
156
- - `curl_examples.sh` - Command-line examples
157
 
158
  ## Troubleshooting
159
 
160
- - **Model not loading**: Check internet connection for HuggingFace download
161
- - **Out of memory**: Reduce max_tokens or use CPU inference
162
- - **Slow responses**: First request downloads model (subsequent requests are faster)
163
- - **Port conflicts**: Change PORT environment variable
164
 
165
  ## License
166
 
@@ -169,6 +352,8 @@ This project is provided for educational and research purposes. The OpenELM mode
169
  ## Resources
170
 
171
  - [OpenELM Model Card](https://huggingface.co/apple/OpenELM-450M-Instruct)
 
172
  - [Anthropic API Documentation](https://docs.anthropic.com)
173
  - [FastAPI Documentation](https://fastapi.tiangolo.com)
174
  - [HuggingFace Transformers](https://huggingface.co/docs/transformers)
 
 
1
  ---
2
+ title: OpenELM OpenAI API
3
  emoji: 🤖
4
  colorFrom: blue
5
  colorTo: purple
 
7
  pinned: false
8
  ---
9
 
10
+ # OpenELM OpenAI & Anthropic API Compatible Wrapper
11
 
12
+ A FastAPI-based service that provides both OpenAI and Anthropic-compatible APIs for Apple's OpenELM models, allowing you to use the OpenAI SDK or Anthropic SDK with OpenELM for text generation tasks.
13
 
14
  ## Overview
15
 
16
+ This project creates a REST API that mimics both the OpenAI Chat Completions API and Anthropic Messages API formats, enabling developers to use OpenELM models with existing SDK code with minimal modifications. The API supports both streaming and non-streaming responses, multi-turn conversations, system prompts, and various generation parameters. This dual compatibility means you can use the same underlying OpenELM model whether your codebase is built for OpenAI or Anthropic APIs.
17
 
18
+ The OpenELM (Open Efficient Language Model) family from Apple uses a layer-wise scaling strategy to allocate parameters efficiently within each transformer layer, which improves accuracy while keeping compute costs down. This wrapper exposes these models through two familiar, widely adopted API formats, so existing OpenAI or Anthropic client code can talk to OpenELM.
19
 
20
  ## Features
21
 
22
+ The API provides comprehensive support for both OpenAI and Anthropic-style generation with several key capabilities. First, it offers full dual API compatibility, including endpoints that match both the OpenAI Chat Completions API structure and the Anthropic Messages API, making it easy to integrate with existing codebases regardless of which provider you currently use. Second, it supports streaming responses through Server-Sent Events (SSE), enabling real-time output display as tokens are generated in both API formats.
23
 
24
+ Third, the API handles multi-turn conversations by maintaining conversation history and formatting prompts appropriately for OpenELM models, regardless of which API format you choose. Additionally, the wrapper properly handles system prompts by prepending them to the conversation context, which is essential for defining assistant behavior. The API also provides flexible generation parameters, allowing control over temperature, top-p sampling, maximum tokens, and other generation settings that work across both API styles.
25
+
26
+ Finally, comprehensive token usage statistics are included in responses, matching both the OpenAI and Anthropic response formats exactly, ensuring compatibility with tools and dashboards that expect standard usage reporting.
27
 
28
  ## Quick Start
29
 
 
31
 
32
  ```bash
33
  # Build and run with Docker
34
+ docker build -t openelm-api .
35
+ docker run -p 8000:8000 openelm-api
36
  ```
37
 
38
  ### Local Development
 
45
  python -m uvicorn app:app --host 0.0.0.0 --port 8000
46
  ```
47
 
48
+ ### Test the API (OpenAI Format)
49
+
50
+ ```bash
51
+ # Basic chat completion
52
+ curl -X POST http://localhost:8000/v1/chat/completions \
53
+ -H "Content-Type: application/json" \
54
+ -d '{
55
+ "model": "openelm-450m-instruct",
56
+ "messages": [{"role": "user", "content": "Say hello!"}],
57
+ "max_tokens": 100
58
+ }'
59
+ ```
60
+
61
+ ### Test the API (Anthropic Format)
62
 
63
  ```bash
64
  # Basic message generation
 
73
 
74
  ## API Reference
75
 
76
+ ### OpenAI API Endpoints
77
+
78
+ The OpenAI-compatible endpoints follow the standard Chat Completions API format used by OpenAI's GPT models. These endpoints accept message arrays with roles and content, and return completion responses in the standard OpenAI format.
79
+
80
+ | Method | Endpoint | Description |
81
+ |--------|----------|-------------|
82
+ | GET | /v1/models | List available models (OpenAI format) |
83
+ | POST | /v1/chat/completions | Create chat completion (non-streaming) |
84
+ | POST | /v1/chat/completions (with stream=true) | Create chat completion (streaming) |
85
+
86
+ #### OpenAI Request Format
87
+
88
+ ```json
89
+ {
90
+ "model": "openelm-450m-instruct",
91
+ "messages": [
92
+ {"role": "system", "content": "You are a helpful assistant."},
93
+ {"role": "user", "content": "Your prompt here"}
94
+ ],
95
+ "temperature": 0.7,
96
+ "top_p": 0.9,
97
+ "max_tokens": 1024,
98
+ "stream": false
99
+ }
100
+ ```
101
+
102
+ #### OpenAI Response Format
103
+
104
+ ```json
105
+ {
106
+ "id": "chatcmpl-abc123",
107
+ "object": "chat.completion",
108
+ "created": 1677858242,
109
+ "model": "openelm-450m-instruct",
110
+ "choices": [
111
+ {
112
+ "index": 0,
113
+ "message": {
114
+ "role": "assistant",
115
+ "content": "Generated response"
116
+ },
117
+ "finish_reason": "stop"
118
+ }
119
+ ],
120
+ "usage": {
121
+ "prompt_tokens": 13,
122
+ "completion_tokens": 25,
123
+ "total_tokens": 38
124
+ }
125
+ }
126
+ ```
127
+
128
+ ### Anthropic API Endpoints
129
+
130
+ The Anthropic-compatible endpoints follow the Messages API format used by Claude. These endpoints accept message arrays with roles and content, and support both streaming and non-streaming responses.
131
 
132
  | Method | Endpoint | Description |
133
  |--------|----------|-------------|
134
+ | GET | /v1/models | List available models (Anthropic format) |
 
 
135
  | POST | /v1/messages | Create message (non-streaming) |
136
  | POST | /v1/messages/stream | Create message (streaming) |
137
 
138
+ #### Anthropic Request Format
139
 
140
  ```json
141
  {
 
146
  "system": "Optional system prompt",
147
  "max_tokens": 1024,
148
  "temperature": 0.7,
 
149
  "stream": false
150
  }
151
  ```
152
 
153
+ #### Anthropic Response Format
154
 
155
  ```json
156
  {
157
  "id": "msg_abc123",
158
  "type": "message",
159
  "role": "assistant",
160
+ "content": [
161
+ {"type": "text", "text": "Generated response"}
162
+ ],
163
  "model": "openelm-450m-instruct",
164
  "stop_reason": "end_turn",
165
  "usage": {
 
169
  }
170
  ```
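The two response formats report the same token counts under different field names (`input_tokens`/`output_tokens` for Anthropic, `prompt_tokens`/`completion_tokens`/`total_tokens` for OpenAI). A minimal sketch of the mapping, assuming plain dictionaries; the helper name `anthropic_usage_to_openai` is hypothetical and not part of this commit:

```python
def anthropic_usage_to_openai(usage: dict) -> dict:
    """Map Anthropic-style usage counters onto OpenAI-style field names."""
    prompt = int(usage.get("input_tokens", 0))
    completion = int(usage.get("output_tokens", 0))
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        # OpenAI reports the sum explicitly; Anthropic leaves it to the client.
        "total_tokens": prompt + completion,
    }


print(anthropic_usage_to_openai({"input_tokens": 13, "output_tokens": 25}))
```

A helper like this can be useful when a dashboard expects one usage schema but the client code talks to the other endpoint.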
171
 
172
+ ## Using with OpenAI SDK
173
+
174
+ ```python
175
+ from openai import OpenAI
176
+
177
+ # Point to your local API
178
+ client = OpenAI(
179
+ base_url="http://localhost:8000/v1",
180
+ api_key="dummy" # Any string works
181
+ )
182
+
183
+ # Use the same API you use with GPT!
184
+ response = client.chat.completions.create(
185
+ model="openelm-450m-instruct",
186
+ messages=[
187
+ {"role": "system", "content": "You are a helpful assistant."},
188
+ {"role": "user", "content": "Hello!"}
189
+ ],
190
+ max_tokens=100
191
+ )
192
+
193
+ print(response.choices[0].message.content)
194
+ ```
195
+
196
+ ### Streaming with OpenAI SDK
197
+
198
+ ```python
199
+ from openai import OpenAI
200
+
201
+ client = OpenAI(
202
+ base_url="http://localhost:8000/v1",
203
+ api_key="dummy"
204
+ )
205
+
206
+ stream = client.chat.completions.create(
207
+ model="openelm-450m-instruct",
208
+ messages=[{"role": "user", "content": "Tell me a story."}],
209
+ max_tokens=100,
210
+ stream=True
211
+ )
212
+
213
+ for chunk in stream:
214
+ if chunk.choices[0].delta.content:
215
+ print(chunk.choices[0].delta.content, end="", flush=True)
216
+ ```
217
+
218
  ## Using with Anthropic SDK
219
 
220
  ```python
221
+ import anthropic
222
 
223
  # Point to your local API
224
+ client = anthropic.Anthropic(
225
  base_url="http://localhost:8000/v1",
226
  api_key="dummy" # Any string works
227
  )
228
 
229
  # Use the same API you use with Claude!
230
+ message = client.messages.create(
231
  model="openelm-450m-instruct",
232
  messages=[{"role": "user", "content": "Hello!"}],
233
  max_tokens=100
234
  )
235
 
236
+ print(message.content[0].text)
237
+ ```
238
+
239
+ ### Streaming with Anthropic SDK
240
+
241
+ ```python
242
+ import anthropic
243
+
244
+ client = anthropic.Anthropic(
245
+ base_url="http://localhost:8000/v1",
246
+ api_key="dummy"
247
+ )
248
+
249
+ with client.messages.stream(
250
+ model="openelm-450m-instruct",
251
+ messages=[{"role": "user", "content": "Tell me a story."}],
252
+ max_tokens=100
253
+ ) as stream:
254
+ for text in stream.text_stream:
255
+ print(text, end="", flush=True)
256
  ```
257
 
258
  ## Model Information
 
262
  - **Context Window**: 2048 tokens
263
  - **Weight Format**: Safetensors (secure and efficient)
264
  - **Quantization**: FP16 for optimal performance
265
+ - **Layer-wise Scaling**: Efficient parameter allocation within transformer layers
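Because the context window is 2048 tokens, the prompt and the generated output share one budget: whatever is reserved for `max_tokens` is no longer available for the prompt. The sketch below illustrates that arithmetic under the assumption that a request should be rejected when nothing is left for the prompt; the function name `prompt_budget` and the rejection policy are illustrative choices, not the behaviour of `app.py`:

```python
CONTEXT_WINDOW = 2048  # tokens, taken from the Model Information list


def prompt_budget(max_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many prompt tokens fit next to `max_tokens` of output."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    budget = context_window - max_tokens
    if budget < 1:
        # Without this check the budget would be zero or negative and the
        # prompt would have to be truncated to nothing.
        raise ValueError(
            f"max_tokens={max_tokens} leaves no room for a prompt "
            f"in a {context_window}-token window"
        )
    return budget


print(prompt_budget(1024))  # half the window stays available for the prompt
```

A request with `max_tokens` equal to or larger than the window raises `ValueError` here; clamping `max_tokens` instead would be an equally reasonable policy.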
266
 
267
  ## Architecture
268
 
269
+ - **Framework**: FastAPI with async support for high concurrency
270
+ - **ML Backend**: PyTorch + HuggingFace Transformers for model inference
271
+ - **Model Loading**: Lazy loading on startup with caching for fast restarts
272
+ - **Streaming**: Server-Sent Events (SSE) for real-time token delivery
273
+ - **Dual Compatibility**: Full OpenAI and Anthropic API format support
274
+ - **Prompt Engineering**: Custom formatting for OpenELM's text completion interface
275
 
276
  ## Configuration
277
 
 
282
  | PORT | 8000 | API server port |
283
  | HF_HOME | ~/.cache/huggingface | Model cache directory |
284
  | TRANSFORMERS_CACHE | ~/.cache/transformers | Transformers cache |
285
+ | CUDA_VISIBLE_DEVICES | all | GPU device selection |
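The variables in the table can be read with the standard library alone. The following is a sketch under stated assumptions: the `Settings` dataclass and `load_settings` function are hypothetical names, and how `app.py` actually consumes these variables is not shown in this diff.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    port: int
    hf_home: str
    transformers_cache: str
    cuda_visible_devices: Optional[str]  # None means "all GPUs visible"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read deployment settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        port=int(env.get("PORT", "8000")),
        hf_home=os.path.expanduser(env.get("HF_HOME", "~/.cache/huggingface")),
        transformers_cache=os.path.expanduser(
            env.get("TRANSFORMERS_CACHE", "~/.cache/transformers")
        ),
        cuda_visible_devices=env.get("CUDA_VISIBLE_DEVICES"),
    )


print(load_settings({"PORT": "9000"}).port)
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test.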
286
 
287
  ## Examples
288
 
289
  See the `examples/` directory for complete usage examples:
290
 
291
+ - `openai_sdk_example.py` - OpenAI SDK usage with streaming support
292
+ - `anthropic_sdk_example.py` - Anthropic SDK usage with streaming support
293
+ - `curl_examples.sh` - Command-line examples for both APIs
294
+
295
+ ## Streaming Response Format
296
+
297
+ ### OpenAI Streaming (SSE)
298
+
299
+ ```
300
+ data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1677858242,"model":"openelm-450m-instruct","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
301
+
302
+ data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1677858242,"model":"openelm-450m-instruct","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
303
+
304
+ data: [DONE]
305
+ ```
306
+
307
+ ### Anthropic Streaming (SSE)
308
+
309
+ ```
310
+ event: message_start
311
+ data: {"id":"msg_abc123","type":"message","role":"assistant","content":[],"model":"openelm-450m-instruct"}
312
+
313
+ event: content_block_start
314
+ data: {"type":"text","text":""}
315
+
316
+ event: content_block_delta
317
+ data: {"type":"text_delta","text":"Hello"}
318
+
319
+ event: content_block_stop
320
+ data: {}
321
+
322
+ event: message_delta
323
+ data: {"delta":{"stop_reason":"end_turn"},"usage":{"input_tokens":10,"output_tokens":5}}
324
+
325
+ event: message_stop
326
+ data: {}
327
+ ```
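For clients that do not use an SDK, the OpenAI-style stream shown above can be consumed by reading `data:` lines until the `[DONE]` sentinel and concatenating each `delta.content`. This is a minimal sketch that works on an iterable of already-decoded lines; the function name `iter_openai_deltas` is hypothetical, and a real client would feed it lines from an HTTP response instead of a list:

```python
import json
from typing import Iterable, Iterator


def iter_openai_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE `data:` lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


sample = [
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk",'
    '"created":1677858242,"model":"openelm-450m-instruct",'
    '"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk",'
    '"created":1677858242,"model":"openelm-450m-instruct",'
    '"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_openai_deltas(sample)))
```

Chunks without a `delta.content` field, such as the initial role chunk, are skipped rather than treated as errors.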
328
 
329
  ## Troubleshooting
330
 
331
+ - **Model not loading**: Check internet connection for HuggingFace download, ensure sufficient disk space for model cache
332
+ - **Out of memory**: Reduce max_tokens, use smaller context windows, or switch to CPU inference by removing GPU-specific settings
333
+ - **Slow responses**: First request downloads model from HuggingFace (subsequent requests use cached model and are much faster)
334
+ - **Port conflicts**: Change PORT environment variable to use a different port
335
+ - **Streaming not working**: Ensure you're using the correct endpoint (with stream=true for OpenAI) and proper SSE parsing
336
+ - **Format errors**: Verify your request matches the expected format for the API you're using (OpenAI vs Anthropic have different schemas)
337
+
338
+ ## Migration Guide
339
+
340
+ ### Migrating from OpenAI to OpenELM
341
+
342
+ If you're currently using OpenAI's API and want to switch to OpenELM, the migration is straightforward. Simply change the base_url to point to your local OpenELM API server and update the model name. All other parameters and response handling remain the same, making it easy to toggle between providers for testing or A/B comparisons.
343
+
344
+ ### Migrating from Anthropic to OpenELM
345
+
346
+ Similarly, if you're using Anthropic's API, you can migrate by updating the base_url and model name. The request format stays the same, including the separate `system` parameter. You only need to change how system prompts are passed if you also switch API formats: the OpenAI format expects an inline message with the `system` role, while the Anthropic format uses the top-level `system` parameter.
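The difference in system-prompt handling between the two formats can be bridged with a small conversion. The sketch below assumes a simple Anthropic-style request body in which `system` is a string and each message `content` is a string; the function name `anthropic_to_openai_request` is hypothetical and not part of this project:

```python
def anthropic_to_openai_request(body: dict) -> dict:
    """Move a top-level `system` prompt into an inline system message."""
    messages = []
    system = body.get("system")
    if system:
        messages.append({"role": "system", "content": system})
    messages.extend(body.get("messages", []))

    # Shared parameters (model, max_tokens, temperature, stream, ...) carry over.
    converted = {k: v for k, v in body.items() if k not in ("system", "messages")}
    converted["messages"] = messages
    return converted


request = {
    "model": "openelm-450m-instruct",
    "system": "You are a helpful assistant.",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100,
}
print(anthropic_to_openai_request(request)["messages"][0]["role"])
```

The original dictionary is left untouched, so the same request can be sent to either endpoint for comparison.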
347
 
348
  ## License
349
 
 
352
  ## Resources
353
 
354
  - [OpenELM Model Card](https://huggingface.co/apple/OpenELM-450M-Instruct)
355
+ - [OpenAI API Documentation](https://platform.openai.com/docs/api-reference)
356
  - [Anthropic API Documentation](https://docs.anthropic.com)
357
  - [FastAPI Documentation](https://fastapi.tiangolo.com)
358
  - [HuggingFace Transformers](https://huggingface.co/docs/transformers)
359
+ - [Apple OpenELM Research Paper](https://machinelearning.apple.com/research/openelm)
app.py CHANGED
@@ -1,8 +1,13 @@
1
  """
2
- OpenELM Anthropic API Compatible Wrapper
3
 
4
- This FastAPI application provides an Anthropic-compatible API for the OpenELM model,
5
- allowing users to call OpenELM models using the Anthropic SDK with minimal code changes.
 
 
 
 
 
6
  """
7
 
8
  import asyncio
@@ -80,9 +85,9 @@ async def lifespan(app: FastAPI) -> AsyncIterator:
80
 
81
  # Create FastAPI app
82
  app = FastAPI(
83
- title="OpenELM Anthropic API",
84
- description="Anthropic API compatible wrapper for OpenELM models",
85
- version="1.0.0",
86
  lifespan=lifespan
87
  )
88
 
@@ -115,6 +120,7 @@ class Usage(BaseModel):
115
  """Token usage statistics."""
116
  input_tokens: int = 0
117
  output_tokens: int = 0
 
118
 
119
 
120
  class ContentBlock(BaseModel):
@@ -162,6 +168,88 @@ class ModelListResponse(BaseModel):
162
  data: List[ModelInfo]
163
 
164
 
165
  # ==================== Helper Functions ====================
166
 
167
  def format_prompt_for_openelm(
@@ -283,12 +371,14 @@ def map_anthropic_params_to_transformers(
283
  async def root():
284
  """Root endpoint with API information."""
285
  return {
286
- "name": "OpenELM Anthropic API",
287
- "version": "1.0.0",
288
- "description": "Anthropic API compatible wrapper for OpenELM models",
289
  "endpoints": {
290
- "messages": "POST /v1/messages",
291
- "models": "GET /v1/models",
 
 
292
  "health": "GET /health"
293
  }
294
  }
@@ -643,6 +733,380 @@ class MessageResource:
643
  return response.json()
644
 
645
 
646
  # ==================== Main Entry Point ====================
647
 
648
  if __name__ == "__main__":
 
1
  """
2
+ OpenELM OpenAI & Anthropic API Compatible Wrapper
3
 
4
+ This FastAPI application provides both OpenAI and Anthropic-compatible APIs for the OpenELM model,
5
+ allowing users to call OpenELM models using either SDK with minimal code changes.
6
+
7
+ Supported APIs:
8
+ - OpenAI Chat Completions API (v1/chat/completions)
9
+ - Anthropic Messages API (v1/messages)
10
+ - Both support streaming and non-streaming responses
11
  """
12
 
13
  import asyncio
 
85
 
86
  # Create FastAPI app
87
  app = FastAPI(
88
+ title="OpenELM OpenAI API",
89
+ description="OpenAI and Anthropic API compatible wrapper for OpenELM models",
90
+ version="1.1.0",
91
  lifespan=lifespan
92
  )
93
 
 
120
  """Token usage statistics."""
121
  input_tokens: int = 0
122
  output_tokens: int = 0
123
+ total_tokens: int = 0
124
 
125
 
126
  class ContentBlock(BaseModel):
 
168
  data: List[ModelInfo]
169
 
170
 
171
+ # ==================== OpenAI API Models ====================
172
+
173
+ class ChatMessage(BaseModel):
174
+ """A chat message (OpenAI format)."""
175
+ role: str
176
+ content: str
177
+ name: Optional[str] = None
178
+
179
+
180
+ class ChatCompletionRequest(BaseModel):
181
+ """Chat completion request (OpenAI API compatible)."""
182
+ model: str = "openelm-450m-instruct"
183
+ messages: List[ChatMessage]
184
+ temperature: Optional[float] = Field(default=None, ge=0.0, le=2.0)
185
+ top_p: Optional[float] = Field(default=None, ge=0.0, le=1.0)
186
+ n: Optional[int] = Field(default=1, ge=1)
187
+ max_tokens: Optional[int] = Field(default=None, ge=1, le=4096)
188
+ stream: Optional[bool] = False
189
+ presence_penalty: Optional[float] = Field(default=None, ge=-2.0, le=2.0)
190
+ frequency_penalty: Optional[float] = Field(default=None, ge=-2.0, le=2.0)
191
+ logit_bias: Optional[Dict[str, float]] = None
192
+ user: Optional[str] = None
193
+
194
+
195
+ class ChatCompletionChoice(BaseModel):
196
+ """Choice in a chat completion response."""
197
+ index: int
198
+ message: ChatMessage
199
+ finish_reason: Optional[str] = None
200
+ logprobs: Optional[Any] = None
201
+
202
+
203
+ class ChatCompletionUsage(BaseModel):
204
+ """Token usage in chat completion."""
205
+ prompt_tokens: int
206
+ completion_tokens: int
207
+ total_tokens: int
208
+
209
+
210
+ class ChatCompletionResponse(BaseModel):
211
+ """Chat completion response (OpenAI API compatible)."""
212
+ id: str
213
+ object: str = "chat.completion"
214
+ created: int
215
+ model: str
216
+ choices: List[ChatCompletionChoice]
217
+ usage: ChatCompletionUsage
218
+ system_fingerprint: Optional[str] = None
219
+
220
+
221
+ class ChatCompletionChunkChoice(BaseModel):
222
+ """Choice in a streaming chunk."""
223
+ index: int
224
+ delta: Dict[str, Any]
225
+ finish_reason: Optional[str] = None
226
+ logprobs: Optional[Any] = None
227
+
228
+
229
+ class ChatCompletionChunk(BaseModel):
230
+ """Streaming chunk (OpenAI API compatible)."""
231
+ id: str
232
+ object: str = "chat.completion.chunk"
233
+ created: int
234
+ model: str
235
+ choices: List[ChatCompletionChunkChoice]
236
+
237
+
238
+ class OpenAIModelInfo(BaseModel):
239
+ """Model information (OpenAI format)."""
240
+ id: str
241
+ object: str = "model"
242
+ created: int = 0
243
+ owned_by: str = "openelm"
244
+ permission: List[Any] = []
245
+
246
+
247
+ class OpenAIModelListResponse(BaseModel):
248
+ """Model list response (OpenAI format)."""
249
+ object: str = "list"
250
+ data: List[OpenAIModelInfo]
251
+
252
+
253
  # ==================== Helper Functions ====================
254
 
255
  def format_prompt_for_openelm(
 
371
  async def root():
372
  """Root endpoint with API information."""
373
  return {
374
+ "name": "OpenELM OpenAI API",
375
+ "version": "1.1.0",
376
+ "description": "OpenAI and Anthropic API compatible wrapper for OpenELM models",
377
  "endpoints": {
378
+ "openai_chat": "POST /v1/chat/completions",
379
+ "openai_models": "GET /v1/models",
380
+ "anthropic_messages": "POST /v1/messages",
381
+ "anthropic_models": "GET /v1/models",
382
  "health": "GET /health"
383
  }
384
  }
 
733
  return response.json()
734
 
735
 
736
+ # ==================== OpenAI API Endpoints ====================
737
+
738
+ @app.get("/v1/models", response_model=OpenAIModelListResponse, tags=["OpenAI"])
739
+ async def list_openai_models():
740
+ """List available models (OpenAI API format)."""
741
+ return OpenAIModelListResponse(
742
+ data=[
743
+ OpenAIModelInfo(
744
+ id="openelm-450m-instruct",
745
+ owned_by="apple",
746
+ created=int(time.time())  # Unix seconds; uuid1().time is not a Unix timestamp. Assumes `import time` at module top
747
+ )
748
+ ]
749
+ )
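The `created` field in OpenAI-format objects is expressed in Unix seconds. `uuid.uuid1().time` measures something else: a count of 100-nanosecond intervals since 15 October 1582, so it is many orders of magnitude larger than a Unix timestamp. A short sketch of the difference, using only the standard library (the helper name `uuid1_time_to_unix` is illustrative):

```python
import time
import uuid

# 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def uuid1_time_to_unix(ts: int) -> int:
    """Convert a UUIDv1 timestamp to whole Unix seconds."""
    return (ts - UUID_EPOCH_OFFSET) // 10_000_000


raw = uuid.uuid1().time          # 100 ns ticks since 1582
created = int(time.time())       # Unix seconds, the unit `created` expects

print(raw > created)             # the raw UUID value is far larger
print(abs(uuid1_time_to_unix(raw) - created) <= 2)
```

Calling `int(time.time())` directly is the simpler way to fill `created`; the conversion is shown only to make the unit mismatch concrete.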
750
+
751
+
752
+ @app.post("/v1/chat/completions", tags=["OpenAI"])
753
+ async def create_chat_completion(
754
+ request: ChatCompletionRequest,
755
+ raw_request: Request = None
756
+ ):
757
+ """
758
+ Create chat completion (OpenAI API compatible).
759
+
760
+ This endpoint accepts OpenAI-style chat completion requests and returns
761
+ responses in the same format, allowing existing code to work with OpenELM.
762
+ """
763
+ # Check if model is loaded
764
+ if model is None or tokenizer is None:
765
+ raise HTTPException(
766
+ status_code=503,
767
+ detail="Model not loaded. Please wait for model to initialize."
768
+ )
769
+
770
+ # Handle streaming
771
+ if request.stream:
772
+ return await create_chat_completion_stream(request)
773
+
774
+ try:
775
+ # Extract system message if present
776
+ system_message = None
777
+ formatted_messages = []
778
+
779
+ for msg in request.messages:
780
+ if msg.role == "system" and system_message is None:
781
+ system_message = msg.content
782
+ else:
783
+ formatted_messages.append(Message(
784
+ role=msg.role,
785
+ content=msg.content
786
+ ))
787
+
788
+ # Format prompt for OpenELM
789
+ prompt = format_prompt_for_openelm(formatted_messages, system_message)
790
+
791
+ # Calculate max_tokens
792
+ max_tokens = request.max_tokens or 1024
793
+ max_context_tokens = 2048 - max_tokens
794
+ prompt = truncate_prompt(prompt, max_context_tokens, system_message)
795
+
796
+ # Tokenize input
797
+ inputs = tokenizer(prompt, return_tensors="pt")
798
+ input_tokens = len(inputs.input_ids[0])
799
+
800
+ # Move to same device as model
801
+ if hasattr(model, 'device'):
802
+ inputs = {k: v.to(model.device) for k, v in inputs.items()}
803
+
804
+ # Map parameters
805
+ gen_params = map_anthropic_params_to_transformers(
806
+ request.temperature,
807
+ request.top_p,
808
+ None,
809
+ max_tokens
810
+ )
811
+
812
+ # Generate
813
+ with torch.no_grad():
814
+ outputs = model.generate(
815
+ **inputs,
816
+ **gen_params,
817
+ pad_token_id=tokenizer.eos_token_id,
818
+ eos_token_id=tokenizer.eos_token_id,
819
+ )
820
+
821
+ # Decode output
822
+ generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
823
+
824
+ # Extract the assistant's response
825
+ response_text = extract_assistant_response(generated_text)
826
+ output_tokens = count_tokens(response_text)
827
+
828
+ # Build response matching OpenAI format
829
+ response_id = f"chatcmpl-{uuid.uuid4().hex[:12]}"
830
+ timestamp = int(time.time())  # Unix seconds; uuid1().time is not a Unix timestamp. Assumes `import time` at module top
831
+
832
+ return ChatCompletionResponse(
833
+ id=response_id,
834
+ created=timestamp,
835
+ model="openelm-450m-instruct",
836
+ choices=[
837
+ ChatCompletionChoice(
838
+ index=0,
839
+ message=ChatMessage(role="assistant", content=response_text),
840
+ finish_reason="stop"
841
+ )
842
+ ],
843
+ usage=ChatCompletionUsage(
844
+ prompt_tokens=input_tokens,
845
+ completion_tokens=output_tokens,
846
+ total_tokens=input_tokens + output_tokens
847
+ )
848
+ )
849
+
850
+ except Exception as e:
851
+ raise HTTPException(
852
+ status_code=500,
853
+ detail=f"Generation failed: {str(e)}"
854
+ )
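The system-message handling in the endpoint above can be isolated into a small pure function, which makes its edge cases easier to see. This sketch mirrors the loop on plain dictionaries instead of the `ChatMessage` and `Message` models; the name `split_system_message` is hypothetical:

```python
from typing import List, Optional, Tuple


def split_system_message(messages: List[dict]) -> Tuple[Optional[str], List[dict]]:
    """Separate the first system message from the rest of the conversation."""
    system: Optional[str] = None
    rest: List[dict] = []
    for msg in messages:
        if msg["role"] == "system" and system is None:
            system = msg["content"]
        else:
            # Any later system message stays in the list with its role intact.
            rest.append(msg)
    return system, rest


system, rest = split_system_message([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "system", "content": "Be brief."},
])
print(system)
print([m["role"] for m in rest])
```

Only the first system message is lifted out; whether a second one should be merged, rejected, or passed through is a design decision the prompt formatter has to handle.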
855
+
856
+
857
+ async def create_chat_completion_stream(request: ChatCompletionRequest):
858
+ """Create streaming chat completion (OpenAI API compatible)."""
859
+
860
+ async def generate_stream():
861
+ """Generate streaming response in OpenAI format."""
862
+ try:
863
+ # Extract system message if present
864
+ system_message = None
865
+ formatted_messages = []
866
+
867
+ for msg in request.messages:
868
+ if msg.role == "system" and system_message is None:
869
+ system_message = msg.content
870
+ else:
871
+ formatted_messages.append(Message(
872
+ role=msg.role,
873
+ content=msg.content
874
+ ))
875
+
876
+ # Format prompt for OpenELM
877
+ prompt = format_prompt_for_openelm(formatted_messages, system_message)
878
+
879
+ # Calculate max_tokens
880
+ max_tokens = request.max_tokens or 1024
881
+ max_context_tokens = 2048 - max_tokens
882
+ prompt = truncate_prompt(prompt, max_context_tokens, system_message)
883
+
884
+ # Tokenize
885
+ inputs = tokenizer(prompt, return_tensors="pt")
886
+ input_tokens = len(inputs.input_ids[0])
887
+
888
+ # Move to same device as model
889
+ if hasattr(model, 'device'):
890
+ inputs = {k: v.to(model.device) for k, v in inputs.items()}
891
+
892
+ # Map parameters
893
+ gen_params = map_anthropic_params_to_transformers(
894
+ request.temperature,
895
+ request.top_p,
896
+ None,
897
+ max_tokens
898
+ )
899
+
900
+ # Set up streaming
901
+ gen_params["stopping_criteria"] = []
902
+
903
+ # Use TextIteratorStreamer for streaming
904
+ streamer = TextIteratorStreamer(
905
+ tokenizer,
906
+ skip_prompt=True,
907
+ skip_special_tokens=True
908
+ )
909
+
910
+ gen_params["streamer"] = streamer
911
+
912
+ # Run generation in a separate thread
913
+ def generate():
914
+ with torch.no_grad():
915
+ model.generate(**inputs, **gen_params)
916
+
917
+ thread = Thread(target=generate)
918
+ thread.start()
919
+
920
+ # Send streaming chunks in OpenAI format
921
+ chunk_id = f"chatcmpl-{uuid.uuid4().hex[:12]}"
922
+ timestamp = int(time.time())  # Unix seconds; uuid1().time is not a Unix timestamp. Assumes `import time` at module top
923
+
924
+ # Send role first
925
+ yield f"data: {{\"id\":\"{chunk_id}\",\"object\":\"chat.completion.chunk\",\"created\":{timestamp},\"model\":\"openelm-450m-instruct\",\"choices\":[{{\"index\":0,\"delta\":{{\"role\":\"assistant\"}},\"finish_reason\":null}}]}}\n\n"
926
+
927
+ # Stream the generated text
928
+ full_text = ""
929
+ for text in streamer:
930
+ full_text += text
931
+ chunk_data = {
932
+ "id": chunk_id,
933
+ "object": "chat.completion.chunk",
934
+ "created": timestamp,
935
+ "model": "openelm-450m-instruct",
936
+ "choices": [
937
+ {
938
+ "index": 0,
939
+ "delta": {"content": text},
940
+ "finish_reason": None
941
+ }
942
+ ]
943
+ }
944
+ yield f"data: {json.dumps(chunk_data)}\n\n"  # valid JSON rather than a Python dict repr; assumes json is imported at module level
945
+
946
+ # Send stop chunk
947
+ output_tokens = count_tokens(full_text)
948
+ stop_chunk = {
949
+ "id": chunk_id,
950
+ "object": "chat.completion.chunk",
951
+ "created": timestamp,
952
+ "model": "openelm-450m-instruct",
953
+ "choices": [
954
+ {
955
+ "index": 0,
956
+ "delta": {},
957
+ "finish_reason": "stop"
958
+ }
959
+ ]
960
+ }
961
+ yield f"data: {json.dumps(stop_chunk)}\n\n"  # valid JSON rather than a Python dict repr; assumes json is imported at module level
962
+
963
+ # Send usage data as a final event (non-standard: the OpenAI API streams usage in a chat.completion.chunk with empty choices)
964
+ usage_data = {
965
+ "id": chunk_id,
966
+ "object": "chat.completion",
967
+ "created": timestamp,
968
+ "model": "openelm-450m-instruct",
969
+ "choices": [
970
+ {
971
+ "index": 0,
972
+ "message": {"role": "assistant", "content": full_text},
973
+ "finish_reason": "stop"
974
+ }
975
+ ],
976
+ "usage": {
977
+ "prompt_tokens": input_tokens,
978
+ "completion_tokens": output_tokens,
979
+ "total_tokens": input_tokens + output_tokens
980
+ }
981
+ }
982
+ yield f"data: {json.dumps(usage_data)}\n\n"  # valid JSON rather than a Python dict repr; assumes json is imported at module level
983
+
984
+ # Signal end of stream
985
+ yield "data: [DONE]\n\n"
986
+
987
+ thread.join()
988
+
989
+ except Exception as e:
990
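+ error_payload = {"error": {"message": str(e), "type": "server_error"}}
+ yield f"data: {json.dumps(error_payload)}\n\n"  # json.dumps escapes quotes and newlines in the message; assumes json is imported at module level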
991
+
992
+ return StreamingResponse(
993
+ generate_stream(),
994
+ media_type="text/event-stream",
995
+ headers={
996
+ "Cache-Control": "no-cache",
997
+ "Connection": "keep-alive",
998
+ "X-Accel-Buffering": "no",
999
+ }
1000
+ )
1001
+
1002
+
1003
+ def extract_assistant_response(generated_text: str) -> str:
1004
+ """Extract assistant response from generated text."""
1005
+ response_text = generated_text
1006
+
1007
+ if "Assistant:" in generated_text:
1008
+ response_text = generated_text.split("Assistant:")[-1].strip()
1009
+ elif ":" in generated_text:
1010
+ # Find the last role and extract content after it
1011
+ lines = generated_text.split("\n")
1012
+ in_assistant = False
1013
+ response_parts = []
1014
+ for line in lines:
1015
+ if line.startswith("Assistant:"):
1016
+ in_assistant = True
1017
+ response_parts.append(line.replace("Assistant:", "").strip())
1018
+ elif in_assistant and not line.startswith("User:") and not line.startswith("System:"):
1019
+ response_parts.append(line)
1020
+ elif line.startswith("User:") or line.startswith("System:"):
1021
+ in_assistant = False
1022
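+ response_text = "\n".join(response_parts).strip() or generated_text.strip()  # this branch never sees "Assistant:", so fall back to the raw text instead of returning ""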
1023
+
1024
+ return response_text
1025
+
1026
+
1027
+ # ==================== OpenAI SDK Compatibility ====================
1028
+
1029
+ class OpenAIClient:
1030
+ """
1031
+ Simple OpenAI SDK compatible client for testing.
1032
+
1033
+ Usage:
1034
+ client = OpenAIClient(base_url="http://localhost:8000", api_key="dummy")  # create() appends /v1/chat/completions itself
1035
+ response = client.chat.completions.create(
1036
+ model="openelm-450m-instruct",
1037
+ messages=[{"role": "user", "content": "Hello!"}],
1038
+ max_tokens=100
1039
+ )
1040
+ """
1041
+
1042
+ def __init__(self, base_url: str = "http://localhost:8000", api_key: str = "dummy"):
1043
+ self.base_url = base_url.rstrip("/")
1044
+ self.api_key = api_key
1045
+ self.session = None
1046
+
1047
+ def _get_session(self):
1048
+ """Get or create a requests session."""
1049
+ import requests
1050
+ if self.session is None:
1051
+ self.session = requests.Session()
1052
+ self.session.headers.update({
1053
+ "Authorization": f"Bearer {self.api_key}",
1054
+ "Content-Type": "application/json"
1055
+ })
1056
+ return self.session
1057
+
1058
+ @property
1059
+ def chat(self) -> "ChatResource":
1060
+ """Access chat operations."""
1061
+ return ChatResource(self)
1062
+
1063
+
1064
+ class ChatResource:
1065
+ """Resource for chat completion operations."""
1066
+
1067
+ def __init__(self, client: OpenAIClient):
1068
+ self.client = client
+
+ @property
+ def completions(self) -> "ChatResource":
+ """Alias so that client.chat.completions.create(...) resolves, as in the usage shown in the docstring."""
+ return self
1069
+
1070
+ def create(
1071
+ self,
1072
+ model: str,
1073
+ messages: List[Dict[str, str]],
1074
+ temperature: Optional[float] = None,
1075
+ top_p: Optional[float] = None,
1076
+ max_tokens: Optional[int] = None,
1077
+ stream: bool = False,
1078
+ **kwargs
1079
+ ) -> Dict[str, Any]:
1080
+ """Create chat completion."""
1081
+ import requests
1082
+
1083
+ url = f"{self.client.base_url}/v1/chat/completions"
1084
+
1085
+ payload = {
1086
+ "model": model,
1087
+ "messages": messages,
1088
+ }
1089
+
1090
+ if temperature is not None:
1091
+ payload["temperature"] = temperature
1092
+ if top_p is not None:
1093
+ payload["top_p"] = top_p
1094
+ if max_tokens is not None:
1095
+ payload["max_tokens"] = max_tokens
1096
+ if stream:
1097
+ payload["stream"] = True
1098
+
1099
+ # Add any extra kwargs
1100
+ payload.update({k: v for k, v in kwargs.items() if k not in ['stream']})
1101
+
1102
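+ response = self.client._get_session().post(url, json=payload, stream=stream)
+
+ if response.status_code != 200:
+ raise Exception(f"API request failed: {response.text}")
+
+ if stream:
+ import json  # local import, as with requests above
+ # Generator of parsed SSE chunks; assumes every "data:" line carries valid JSON
+ lines = (raw.decode("utf-8") for raw in response.iter_lines() if raw)
+ return (json.loads(line[6:]) for line in lines if line.startswith("data: ") and line != "data: [DONE]")
+
+ return response.json()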
1108
+
1109
+
1110
  # ==================== Main Entry Point ====================
1111
 
1112
  if __name__ == "__main__":
examples/curl_examples.sh CHANGED
@@ -1,8 +1,8 @@
1
  #!/bin/bash
2
- # OpenELM Anthropic API - Curl Examples
3
  #
4
- # This script demonstrates how to call the OpenELM Anthropic API
5
- # using curl commands directly.
6
  #
7
  # Usage:
8
  # chmod +x examples/curl_examples.sh
@@ -13,7 +13,7 @@ API_URL="${OPENELM_API_URL:-http://localhost:8000}"
13
  API_URL="${API_URL%/}" # Remove trailing slash
14
 
15
  echo "=============================================="
16
- echo "OpenELM Anthropic API - Curl Examples"
17
  echo "=============================================="
18
  echo "API URL: $API_URL"
19
  echo ""
@@ -24,16 +24,25 @@ echo "------------------------"
24
  curl -s "$API_URL/health" | python3 -m json.tool
25
  echo ""
26
 
27
- # Example 2: List Available Models
28
- echo "Example 2: List Available Models"
29
- echo "---------------------------------"
30
  curl -s "$API_URL/v1/models" | python3 -m json.tool
31
  echo ""
32
 
33
- # Example 3: Basic Message Generation
34
- echo "Example 3: Basic Message Generation"
35
- echo "------------------------------------"
36
- curl -s -X POST "$API_URL/v1/messages" \
37
  -H "Content-Type: application/json" \
38
  -d '{
39
  "model": "openelm-450m-instruct",
@@ -48,10 +57,10 @@ curl -s -X POST "$API_URL/v1/messages" \
48
  }' | python3 -m json.tool
49
  echo ""
50
 
51
- # Example 4: Multi-turn Conversation
52
- echo "Example 4: Multi-turn Conversation"
53
- echo "-----------------------------------"
54
- curl -s -X POST "$API_URL/v1/messages" \
55
  -H "Content-Type: application/json" \
56
  -d '{
57
  "model": "openelm-450m-instruct",
@@ -62,7 +71,7 @@ curl -s -X POST "$API_URL/v1/messages" \
62
  },
63
  {
64
  "role": "assistant",
65
- "content": "Python is a high-level programming language known for its simplicity and readability."
66
  },
67
  {
68
  "role": "user",
@@ -74,36 +83,39 @@ curl -s -X POST "$API_URL/v1/messages" \
74
  }' | python3 -m json.tool
75
  echo ""
76
 
77
- # Example 5: Using System Prompt
78
- echo "Example 5: Using System Prompt"
79
- echo "-------------------------------"
80
- curl -s -X POST "$API_URL/v1/messages" \
81
  -H "Content-Type: application/json" \
82
  -d '{
83
  "model": "openelm-450m-instruct",
84
  "messages": [
85
  {
86
  "role": "user",
87
- "content": "Explain the concept simply."
88
  }
89
  ],
90
- "system": "You are a helpful tutor who explains things simply.",
91
  "max_tokens": 200,
92
  "temperature": 0.8
93
  }' | python3 -m json.tool
94
  echo ""
95
 
96
- # Example 6: Deterministic Generation (temperature=0)
97
- echo "Example 6: Deterministic Generation"
98
- echo "------------------------------------"
99
- curl -s -X POST "$API_URL/v1/messages" \
100
  -H "Content-Type: application/json" \
101
  -d '{
102
  "model": "openelm-450m-instruct",
103
  "messages": [
104
  {
105
  "role": "user",
106
- "content": "What is the capital of France?"
107
  }
108
  ],
109
  "max_tokens": 50,
@@ -111,6 +123,106 @@ curl -s -X POST "$API_URL/v1/messages" \
111
  }' | python3 -m json.tool
112
  echo ""
113
 
114
  echo "=============================================="
115
  echo "All curl examples completed!"
116
  echo "=============================================="
 
1
  #!/bin/bash
2
+ # OpenELM OpenAI & Anthropic API - Curl Examples
3
  #
4
+ # This script demonstrates how to call the OpenELM API using both
5
+ # OpenAI and Anthropic compatible endpoints with curl commands.
6
  #
7
  # Usage:
8
  # chmod +x examples/curl_examples.sh
 
13
  API_URL="${API_URL%/}" # Remove trailing slash
14
 
15
  echo "=============================================="
16
+ echo "OpenELM OpenAI & Anthropic API - Curl Examples"
17
  echo "=============================================="
18
  echo "API URL: $API_URL"
19
  echo ""
 
24
  curl -s "$API_URL/health" | python3 -m json.tool
25
  echo ""
26
 
27
+ # ============================================
28
+ # OpenAI API Examples
29
+ # ============================================
30
+
31
+ echo "##########################################"
32
+ echo "# OpenAI API Examples #"
33
+ echo "##########################################"
34
+ echo ""
35
+
36
+ # Example 2: OpenAI - List Available Models
37
+ echo "Example 2: OpenAI - List Available Models"
38
+ echo "-------------------------------------------"
39
  curl -s "$API_URL/v1/models" | python3 -m json.tool
40
  echo ""
41
 
42
+ # Example 3: OpenAI - Basic Chat Completion
43
+ echo "Example 3: OpenAI - Basic Chat Completion"
44
+ echo "--------------------------------------------"
45
+ curl -s -X POST "$API_URL/v1/chat/completions" \
46
  -H "Content-Type: application/json" \
47
  -d '{
48
  "model": "openelm-450m-instruct",
 
57
  }' | python3 -m json.tool
58
  echo ""
59
 
60
+ # Example 4: OpenAI - Multi-turn Conversation
61
+ echo "Example 4: OpenAI - Multi-turn Conversation"
62
+ echo "--------------------------------------------"
63
+ curl -s -X POST "$API_URL/v1/chat/completions" \
64
  -H "Content-Type: application/json" \
65
  -d '{
66
  "model": "openelm-450m-instruct",
 
71
  },
72
  {
73
  "role": "assistant",
74
+ "content": "Python is a high-level programming language."
75
  },
76
  {
77
  "role": "user",
 
83
  }' | python3 -m json.tool
84
  echo ""
85
 
86
+ # Example 5: OpenAI - Using System Message
87
+ echo "Example 5: OpenAI - Using System Message"
88
+ echo "------------------------------------------"
89
+ curl -s -X POST "$API_URL/v1/chat/completions" \
90
  -H "Content-Type: application/json" \
91
  -d '{
92
  "model": "openelm-450m-instruct",
93
  "messages": [
94
+ {
95
+ "role": "system",
96
+ "content": "You are a helpful coding assistant."
97
+ },
98
  {
99
  "role": "user",
100
+ "content": "What is a decorator?"
101
  }
102
  ],
 
103
  "max_tokens": 200,
104
  "temperature": 0.8
105
  }' | python3 -m json.tool
106
  echo ""
107
 
108
+ # Example 6: OpenAI - Deterministic Generation
109
+ echo "Example 6: OpenAI - Deterministic Generation"
110
+ echo "----------------------------------------------"
111
+ curl -s -X POST "$API_URL/v1/chat/completions" \
112
  -H "Content-Type: application/json" \
113
  -d '{
114
  "model": "openelm-450m-instruct",
115
  "messages": [
116
  {
117
  "role": "user",
118
+ "content": "What is 2 + 2?"
119
  }
120
  ],
121
  "max_tokens": 50,
 
123
  }' | python3 -m json.tool
124
  echo ""
125
 
126
+ # Example 7: OpenAI - Streaming Response
127
+ echo "Example 7: OpenAI - Streaming Response"
128
+ echo "----------------------------------------"
129
+ echo "Streaming output:"
130
+ curl -s -X POST "$API_URL/v1/chat/completions" \
131
+ -H "Content-Type: application/json" \
132
+ -H "Accept: text/event-stream" \
133
+ -d '{
134
+ "model": "openelm-450m-instruct",
135
+ "messages": [
136
+ {
137
+ "role": "user",
138
+ "content": "Count to 3, one per line."
139
+ }
140
+ ],
141
+ "max_tokens": 100,
142
+ "temperature": 0.7,
143
+ "stream": true
144
+ }' | head -20
145
+ echo ""
146
+ echo ""
147
+
148
+ # ============================================
149
+ # Anthropic API Examples
150
+ # ============================================
151
+
152
+ echo "##########################################"
153
+ echo "# Anthropic API Examples #"
154
+ echo "##########################################"
155
+ echo ""
156
+
157
+ # Example 8: Anthropic - List Available Models
158
+ echo "Example 8: Anthropic - List Available Models"
159
+ echo "----------------------------------------------"
160
+ curl -s "$API_URL/v1/models" | python3 -m json.tool
161
+ echo ""
162
+
163
+ # Example 9: Anthropic - Basic Message Generation
164
+ echo "Example 9: Anthropic - Basic Message Generation"
165
+ echo "-------------------------------------------------"
166
+ curl -s -X POST "$API_URL/v1/messages" \
167
+ -H "Content-Type: application/json" \
168
+ -d '{
169
+ "model": "openelm-450m-instruct",
170
+ "messages": [
171
+ {
172
+ "role": "user",
173
+ "content": "Say hello in a friendly way!"
174
+ }
175
+ ],
176
+ "max_tokens": 100,
177
+ "temperature": 0.7
178
+ }' | python3 -m json.tool
179
+ echo ""
180
+
181
+ # Example 10: Anthropic - Multi-turn Conversation
182
+ echo "Example 10: Anthropic - Multi-turn Conversation"
183
+ echo "-------------------------------------------------"
184
+ curl -s -X POST "$API_URL/v1/messages" \
185
+ -H "Content-Type: application/json" \
186
+ -d '{
187
+ "model": "openelm-450m-instruct",
188
+ "messages": [
189
+ {
190
+ "role": "user",
191
+ "content": "What is AI?"
192
+ },
193
+ {
194
+ "role": "assistant",
195
+ "content": "AI stands for Artificial Intelligence."
196
+ },
197
+ {
198
+ "role": "user",
199
+ "content": "Tell me more."
200
+ }
201
+ ],
202
+ "max_tokens": 150,
203
+ "temperature": 0.5
204
+ }' | python3 -m json.tool
205
+ echo ""
206
+
207
+ # Example 11: Anthropic - Using System Prompt
208
+ echo "Example 11: Anthropic - Using System Prompt"
209
+ echo "----------------------------------------------"
210
+ curl -s -X POST "$API_URL/v1/messages" \
211
+ -H "Content-Type: application/json" \
212
+ -d '{
213
+ "model": "openelm-450m-instruct",
214
+ "messages": [
215
+ {
216
+ "role": "user",
217
+ "content": "Explain quantum computing."
218
+ }
219
+ ],
220
+ "system": "You are a science educator who explains complex topics simply.",
221
+ "max_tokens": 200,
222
+ "temperature": 0.8
223
+ }' | python3 -m json.tool
224
+ echo ""
225
+
226
  echo "=============================================="
227
  echo "All curl examples completed!"
228
  echo "=============================================="
examples/openai_sdk_example.py ADDED
@@ -0,0 +1,148 @@
1
+ """
2
+ Example: Using OpenAI SDK with OpenELM API
3
+
4
+ This example demonstrates how to use the OpenAI SDK (or compatible client)
5
+ to call OpenELM models through our OpenAI API compatible wrapper.
6
+
7
+ Note: The official openai Python package needs a non-empty api_key on the
8
+ client side, even a placeholder. For quick tests, use the included OpenAIClient helper.
9
+
10
+ Usage:
11
+ python examples/openai_sdk_example.py
12
+ """
13
+
14
+ import sys
15
+ import os
16
+
17
+ # Add parent directory to path for imports
18
+ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
19
+
20
+ from app import OpenAIClient
21
+
22
+
23
+ def main():
24
+ """Example usage of the OpenAI-compatible OpenELM API."""
25
+
26
+ # Create client pointing to our local API
27
+ base_url = os.environ.get("OPENELM_API_URL", "http://localhost:8000")
28
+ client = OpenAIClient(base_url=base_url, api_key="dummy-key")
29
+
30
+ print("=" * 60)
31
+ print("OpenELM OpenAI API - Usage Example")
32
+ print("=" * 60)
33
+ print(f"API URL: {base_url}")
34
+ print()
35
+
36
+ # Example 1: Basic chat completion
37
+ print("Example 1: Basic Chat Completion")
38
+ print("-" * 40)
39
+
40
+ response = client.chat.completions.create(
41
+ model="openelm-450m-instruct",
42
+ messages=[
43
+ {"role": "user", "content": "Say hello in a friendly way!"}
44
+ ],
45
+ max_tokens=100,
46
+ temperature=0.7
47
+ )
48
+
49
+ print(f"Response ID: {response['id']}")
50
+ print(f"Model: {response['model']}")
51
+ print(f"Content: {response['choices'][0]['message']['content']}")
52
+ print(f"Usage: {response['usage']}")
53
+ print()
54
+
55
+ # Example 2: Multi-turn conversation
56
+ print("Example 2: Multi-turn Conversation")
57
+ print("-" * 40)
58
+
59
+ response = client.chat.completions.create(
60
+ model="openelm-450m-instruct",
61
+ messages=[
62
+ {"role": "user", "content": "What is artificial intelligence?"},
63
+ {"role": "assistant", "content": "Artificial intelligence (AI) refers to systems that can perform tasks that typically require human intelligence."},
64
+ {"role": "user", "content": "What are some examples?"}
65
+ ],
66
+ max_tokens=150,
67
+ temperature=0.5
68
+ )
69
+
70
+ print(f"Content: {response['choices'][0]['message']['content']}")
71
+ print(f"Usage: {response['usage']}")
72
+ print()
73
+
74
+ # Example 3: Using system message
75
+ print("Example 3: Using System Message")
76
+ print("-" * 40)
77
+
78
+ response = client.chat.completions.create(
79
+ model="openelm-450m-instruct",
80
+ messages=[
81
+ {"role": "system", "content": "You are a helpful coding assistant."},
82
+ {"role": "user", "content": "What is a Python decorator?"}
83
+ ],
84
+ max_tokens=200,
85
+ temperature=0.8
86
+ )
87
+
88
+ print(f"Content: {response['choices'][0]['message']['content']}")
89
+ print(f"Usage: {response['usage']}")
90
+ print()
91
+
92
+ # Example 4: Deterministic generation (temperature=0)
93
+ print("Example 4: Deterministic Generation (temperature=0)")
94
+ print("-" * 40)
95
+
96
+ response = client.chat.completions.create(
97
+ model="openelm-450m-instruct",
98
+ messages=[
99
+ {"role": "user", "content": "What is 2 + 2?"}
100
+ ],
101
+ max_tokens=50,
102
+ temperature=0.0 # Deterministic output
103
+ )
104
+
105
+ print(f"Content: {response['choices'][0]['message']['content']}")
106
+ print(f"Usage: {response['usage']}")
107
+ print()
108
+
109
+ # Example 5: Streaming response
110
+ print("Example 5: Streaming Response")
111
+ print("-" * 40)
112
+ print("Streaming response:")
113
+
114
+ response = client.chat.completions.create(
115
+ model="openelm-450m-instruct",
116
+ messages=[
117
+ {"role": "user", "content": "Count to 5, one number per line."}
118
+ ],
119
+ max_tokens=100,
120
+ temperature=0.7,
121
+ stream=True
122
+ )
123
+
124
+ # For streaming, response is a generator
125
+ chunk_count = 0
126
+ for chunk in response:
127
+ if 'choices' in chunk and chunk['choices']:
128
+ delta = chunk['choices'][0].get('delta', {})
129
+ if 'content' in delta:
130
+ content = delta['content']
131
+ if content:
132
+ print(content, end="", flush=True)
133
+ chunk_count += 1
134
+ elif 'error' in chunk:
135
+ print(f"Error: {chunk['error']}")
136
+ break
137
+
138
+ print("\n")
139
+ print(f"Received {chunk_count} chunks")
140
+ print()
141
+
142
+ print("=" * 60)
143
+ print("All examples completed successfully!")
144
+ print("=" * 60)
145
+
146
+
147
+ if __name__ == "__main__":
148
+ main()
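
The streaming endpoint and the example client in this commit both depend on each `data:` line being parseable JSON. The following is a minimal, self-contained sketch of that round trip, assuming chunks are serialized with `json.dumps`. The helper names `sse_event`, `make_chunk`, and `parse_sse` are hypothetical and do not exist in `app.py`; the chunk fields mirror the ones used in the diff above.

```python
import json
import time
import uuid


def sse_event(payload: dict) -> str:
    """Frame one JSON payload as a Server-Sent Events message."""
    return f"data: {json.dumps(payload)}\n\n"


def make_chunk(chunk_id: str, created: int, delta: dict, finish_reason=None) -> dict:
    """Build an OpenAI-style chat.completion.chunk dict (illustrative fields only)."""
    return {
        "id": chunk_id,
        "object": "chat.completion.chunk",
        "created": created,
        "model": "openelm-450m-instruct",
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }


def parse_sse(stream_text: str):
    """Yield parsed chunks from an SSE body, stopping at the [DONE] sentinel."""
    for line in stream_text.splitlines():
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        yield json.loads(data)


chunk_id = f"chatcmpl-{uuid.uuid4().hex[:12]}"
created = int(time.time())  # Unix epoch seconds, as the "created" field expects

# Simulate the server side: role chunk, one content chunk, stop chunk, sentinel.
body = "".join([
    sse_event(make_chunk(chunk_id, created, {"role": "assistant"})),
    sse_event(make_chunk(chunk_id, created, {"content": 'He said "hi"\n'})),
    sse_event(make_chunk(chunk_id, created, {}, finish_reason="stop")),
    "data: [DONE]\n\n",
])

# Simulate the client side: parse every event and reassemble the text.
chunks = list(parse_sse(body))
text = "".join(c["choices"][0]["delta"].get("content", "") for c in chunks)
```

Because `json.dumps` escapes quotes and newlines inside the content, each event stays on a single `data:` line, which is what lets a line-based client parser work. Interpolating a dict directly into an f-string would produce single-quoted Python repr output that `json.loads` rejects.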