Commit 55bcb26
Parent(s): 5cc19e4
Update Space (evaluate main: 18932858)
Files changed:
- mauve.py +11 -5
- requirements.txt +1 -1
mauve.py
CHANGED
@@ -27,20 +27,26 @@ import evaluate
 
 _CITATION = """\
 @inproceedings{pillutla-etal:mauve:neurips2021,
-title={MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers},
+title={{MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers}},
 author={Pillutla, Krishna and Swayamdipta, Swabha and Zellers, Rowan and Thickstun, John and Welleck, Sean and Choi, Yejin and Harchaoui, Zaid},
 booktitle = {NeurIPS},
 year = {2021}
 }
 
+@article{pillutla-etal:mauve:arxiv2022,
+title={{MAUVE Scores for Generative Models: Theory and Practice}},
+author={Pillutla, Krishna and Liu, Lang and Thickstun, John and Welleck, Sean and Swayamdipta, Swabha and Zellers, Rowan and Oh, Sewoong and Choi, Yejin and Harchaoui, Zaid},
+journal={arXiv Preprint},
+year={2022}
+}
 """
 
 _DESCRIPTION = """\
-MAUVE is a
-
-MAUVE summarizes both Type I and Type II errors measured softly using Kullback–Leibler (KL) divergences.
+MAUVE is a measure of the statistical gap between two text distributions, e.g., how far the text written by a model is the distribution of human text, using samples from both distributions.
 
-
+MAUVE is obtained by computing Kullback–Leibler (KL) divergences between the two distributions in a quantized embedding space of a large language model.
+It can quantify differences in the quality of generated text based on the size of the model, the decoding algorithm, and the length of the generated text.
+MAUVE was found to correlate the strongest with human evaluations over baseline metrics for open-ended text generation.
 
 This metrics is a wrapper around the official implementation of MAUVE:
 https://github.com/krishnap25/mauve
requirements.txt
CHANGED
@@ -1,4 +1,4 @@
-git+https://github.com/huggingface/evaluate@
+git+https://github.com/huggingface/evaluate@18932858570b9fa97ac478e1e6e709438e4d093b
 faiss-cpu
 scikit-learn
 mauve-text
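
Note on the updated description: its mention of KL divergences in a quantized embedding space can be illustrated with a small sketch. This is not the official implementation. It assumes the two sets of texts have already been embedded and quantized into histograms `p` and `q` over the same clusters, and the names `kl_divergence`, `divergence_frontier_area`, `num_weights`, and `scale` are hypothetical, chosen only for this illustration. The sketch traces a divergence curve over mixtures of the two histograms and returns the area under it, so identical histograms score close to 1 and disjoint ones close to 0.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; entries with p == 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def divergence_frontier_area(p, q, num_weights=25, scale=5.0):
    """Area under a divergence curve between two cluster histograms.

    Assumptions (not taken from the commit): `p` and `q` are non-negative
    counts or probabilities over the same clusters, and `scale` is an
    arbitrary positive constant that squashes KL values into (0, 1].
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()

    # Interior mixture weights in descending order, so x grows along the curve.
    weights = np.linspace(0.0, 1.0, num_weights + 2)[1:-1][::-1]

    xs, ys = [0.0], [1.0]  # endpoint for weight -> 1
    for w in weights:
        r = w * p + (1.0 - w) * q  # mixture of the two histograms
        xs.append(np.exp(-scale * kl_divergence(q, r)))
        ys.append(np.exp(-scale * kl_divergence(p, r)))
    xs.append(1.0)  # endpoint for weight -> 0
    ys.append(0.0)

    xs = np.asarray(xs)
    ys = np.asarray(ys)
    # Trapezoidal rule over the ordered curve points.
    return float(np.sum(0.5 * (ys[1:] + ys[:-1]) * (xs[1:] - xs[:-1])))


if __name__ == "__main__":
    same = divergence_frontier_area([1, 2, 3], [1, 2, 3])
    overlap = divergence_frontier_area([1, 1, 0], [0, 1, 1])
    disjoint = divergence_frontier_area([1, 0], [0, 1])
    print(same, overlap, disjoint)
```

The histograms here stand in for the quantized embeddings; in the wrapped library the embedding and clustering steps come first, which is why `faiss-cpu` and `scikit-learn` appear in requirements.txt alongside `mauve-text`.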