Update README.md
README.md CHANGED
@@ -292,10 +292,16 @@ Our INT4 model is only optimized for batch size 1, so expect some slowdown with
 ## Results (A100 machine)
 | Benchmark (Latency) | | |
 |----------------------------------|----------------|----------------------------|
-| | Phi-4 mini-Ins | phi4-mini-INT4
+| | Phi-4 mini-Ins | phi4-mini-INT4 |
 | latency (batch_size=1) | 2.46s | 2.2s (1.12x speedup) |
 | serving (num_prompts=1) | 0.87 req/s | 1.05 req/s (1.20x speedup) |
 
+## Results (H100 machine)
+| Benchmark (Latency) | | |
+|----------------------------------|----------------|----------------------------|
+| | Phi-4 mini-Ins | phi4-mini-INT4 |
+| latency (batch_size=1) | 1.61s | 1.08s (1.49x speedup) |
+
 Note the result of latency (benchmark_latency) is in seconds, and serving (benchmark_serving) is in number of requests per second.
 Int4 weight only is optimized for batch size 1 and short input and output token length, please stay tuned for models optimized for larger batch sizes or longer token length.
 <details>
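The latency and serving rows come from the benchmark_latency and benchmark_serving scripts named in the note (presumably vLLM's benchmark scripts of those names). As a rough cross-check rather than the actual benchmark, a batch_size=1 generation can be timed directly with vLLM's Python API; the model id, prompt, and token count below are placeholders, not values taken from this README:

```python
import time

from vllm import LLM, SamplingParams

# Placeholder model id: swap in the INT4 checkpoint to compare against the baseline.
llm = LLM(model="microsoft/Phi-4-mini-instruct")
params = SamplingParams(max_tokens=128, ignore_eos=True)

# Warm up once so engine startup and compilation do not skew the timing.
llm.generate(["warmup"], params)

# Time a single batch-size-1 generation, analogous to the latency rows above.
start = time.perf_counter()
llm.generate(["Summarize INT4 weight-only quantization in one sentence."], params)
print(f"latency (batch_size=1): {time.perf_counter() - start:.2f}s")
```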