jerryzh168 committed 0b668ec (verified) · 1 parent: 1443e19

Update README.md

Files changed (1)
  1. README.md +7 -1
README.md CHANGED
@@ -292,10 +292,16 @@ Our INT4 model is only optimized for batch size 1, so expect some slowdown with
 ## Results (A100 machine)
 | Benchmark (Latency) | | |
 |----------------------------------|----------------|----------------------------|
-| | Phi-4 mini-Ins | phi4-mini-INT4 |
+| | Phi-4 mini-Ins | phi4-mini-INT4 |
 | latency (batch_size=1) | 2.46s | 2.2s (1.12x speedup) |
 | serving (num_prompts=1) | 0.87 req/s | 1.05 req/s (1.20x speedup) |
 
+## Results (H100 machine)
+| Benchmark (Latency) | | |
+|----------------------------------|----------------|----------------------------|
+| | Phi-4 mini-Ins | phi4-mini-INT4 |
+| latency (batch_size=1) | 1.61s | 1.08s (1.49x speedup) |
+
 Note the result of latency (benchmark_latency) is in seconds, and serving (benchmark_serving) is in number of requests per second.
 Int4 weight only is optimized for batch size 1 and short input and output token length, please stay tuned for models optimized for larger batch sizes or longer token length.
 <details>
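Because latency is reported in seconds (lower is better) and serving in requests per second (higher is better), the speedup ratio is taken in opposite directions for the two benchmarks. The sketch below recomputes the speedup column from the values shown in this diff. The helper names `latency_speedup` and `throughput_speedup` are hypothetical and are not part of the README or of any benchmark tool.

```python
# Minimal sketch: recompute the speedup column from the README tables.
# Helper names are hypothetical; the numbers are copied from the diff.


def latency_speedup(baseline_s: float, quantized_s: float) -> float:
    """Latency is lower-is-better, so speedup = baseline time / quantized time."""
    return baseline_s / quantized_s


def throughput_speedup(baseline_rps: float, quantized_rps: float) -> float:
    """Throughput (req/s) is higher-is-better, so speedup = quantized rate / baseline rate."""
    return quantized_rps / baseline_rps


results = {
    "A100 latency (batch_size=1)": latency_speedup(2.46, 2.2),
    "A100 serving (num_prompts=1)": throughput_speedup(0.87, 1.05),
    "H100 latency (batch_size=1)": latency_speedup(1.61, 1.08),
}

for name, speedup in results.items():
    print(f"{name}: {speedup:.2f}x")
```

Recomputing from the displayed values reproduces 1.12x and 1.49x. For A100 serving, it gives about 1.21x rather than the reported 1.20x. The req/s figures in the table are already rounded, so the README ratio was presumably computed from unrounded measurements, and a last-digit difference is expected.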