Epicure-Chem
A 300-dimensional skip-gram ingredient embedding over a 1,790-ingredient canonical vocabulary, trained exclusively on typed FlavorDB ingredient-compound metapath walks. Chem is the chemistry extreme of the three siblings: ii_repeat=0, so the skip-gram objective never sees a direct ingredient-ingredient walk. All context is compound-mediated through three families of walks: within-type H-C[x]-H, via-compound N-H-C[x]-H-N, and cross-type C[x]-H-N-H-C[y].
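To make the walk notation concrete, here is a minimal sketch of type-constrained walk sampling on a toy graph. It assumes H is a hub ingredient, N a non-hub ingredient, and C a compound, and it ignores the compound subtype written as [x] and [y] above. The graph, the node names, and the `sample_walk` helper are all hypothetical and are not part of the `epicure` package or the paper's training code.

```python
import random

# Toy typed graph; every node and edge here is made up for illustration.
# H = hub ingredient, N = non-hub ingredient, C = compound.
NODE_TYPE = {
    "basil": "H", "tarragon": "H",
    "pesto": "N", "bearnaise": "N",
    "estragole": "C", "linalool": "C",
}
EDGES = {
    "basil": ["estragole", "linalool", "pesto"],
    "tarragon": ["estragole", "bearnaise"],
    "pesto": ["basil"],
    "bearnaise": ["tarragon"],
    "estragole": ["basil", "tarragon"],
    "linalool": ["basil"],
}

def sample_walk(start, pattern, rng):
    """Sample one walk whose node types follow `pattern`, e.g. ["H", "C", "H"].

    Returns None when the start node has the wrong type or the walk
    reaches a node with no neighbour of the required type.
    """
    if NODE_TYPE[start] != pattern[0]:
        return None
    walk = [start]
    for wanted in pattern[1:]:
        options = [n for n in EDGES[walk[-1]] if NODE_TYPE[n] == wanted]
        if not options:
            return None
        walk.append(rng.choice(options))
    return walk

rng = random.Random(0)
within = sample_walk("basil", ["H", "C", "H"], rng)              # H-C-H
via = sample_walk("pesto", ["N", "H", "C", "H", "N"], rng)       # N-H-C-H-N
cross = sample_walk("linalool", ["C", "H", "N", "H", "C"], rng)  # C-H-N-H-C
```

In this toy setting no pattern ever steps directly from one ingredient to another without passing through the typed graph, which mirrors the idea that every skip-gram context pair in Chem is mediated by the chemistry graph rather than by recipe co-occurrence.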
Companions in the family: epicure-cooc (recipe-context only) and epicure-core (blended).
Paper: Epicure: Navigating the Emergent Geometry of Food Ingredient Embeddings
Quick start
from epicure import Epicure
m = Epicure.from_pretrained("Kaikaku/epicure-chem")
m.neighbors("chicken", k=5)
# -> [('beef', 0.41), ('pork', 0.34), ('cream_of_chicken_soup', 0.31),
# ('buffalo_wing_sauce', 0.29), ('peanut', 0.28)]
m.slerp("corn", "cuisine:Latin_American", theta_deg=30, k=5)
# -> [('poblano_pepper', 0.53), ('corn_tortilla', 0.51), ('salsa', 0.50),
# ('queso_fresco', 0.49), ('chipotle_pepper', 0.49)]
m.closest_mode("miso", kind="factor", k=3)
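For intuition about the `slerp` call above, the sketch below rotates a unit source vector toward a target vector by a fixed angle within the plane the two span. It assumes that `theta_deg` is the rotation angle measured from the source; the package's actual implementation may differ, and `rotate_toward` is a hypothetical helper, not an `epicure` function.

```python
import math

def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def _unit(u):
    norm = math.sqrt(_dot(u, u))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in u]

def rotate_toward(source, target, theta_deg):
    """Rotate the unit source vector by theta_deg toward target (spherical interpolation)."""
    a = _unit(source)
    b = _unit(target)
    # Part of the target that is orthogonal to the source; it fixes the rotation plane.
    overlap = _dot(a, b)
    residual = [bi - overlap * ai for ai, bi in zip(a, b)]
    if math.sqrt(_dot(residual, residual)) < 1e-12:
        raise ValueError("source and target are parallel; the rotation plane is undefined")
    ortho = _unit(residual)
    theta = math.radians(theta_deg)
    # The result stays on the unit sphere at angle theta from the source.
    return [math.cos(theta) * ai + math.sin(theta) * oi for ai, oi in zip(a, ortho)]

query = rotate_toward([2.0, 0.0, 0.0], [0.0, 3.0, 0.0], theta_deg=30)
```

Under this reading, the rotated vector would then serve as a query, with vocabulary entries ranked by cosine similarity to it.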
What is in this repo
Same structure as the Cooc sibling. Chem-specific:
- modes.json: 200 modes across 43 properties.
- factor_poles.npy: shape (87, 300).
- supervised_poles.json: 120 entries.
Reported numbers (this sibling)
From the paper:
- Isotropy: participation ratio PR = 183.1, average pairwise cosine in the 0.10-0.12 band. Most isotropic of the three siblings.
- Direction quality (5-fold CV Spearman rho): baked-in CF 0.46; held-out basic-taste CF 0.47; USDA macros 0.49. Cuisine Cohen's d mean 3.07 (highest of the three; leads on 8 of 8 macro-regions).
- Across all 27 continuous probes Chem beats Core on 26 and Cooc on 27. The chemistry-mediated walk schema sharpens linear directions most.
- Emergent modes: 200 modes / 43 properties. Mean within-mode coherence 0.703 against random-pair baseline 0.115 (margin 0.588).
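The isotropy figures above can be illustrated with a short sketch. One common definition of the participation ratio is PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues) of the embedding covariance matrix; whether the paper uses exactly this form, and whether it centers the vectors first, is an assumption here. The code uses the trace identities sum(lambda) = tr(C) and sum(lambda^2) = tr(C^2) so that no eigendecomposition is needed. Both function names are hypothetical.

```python
import math

def participation_ratio(rows):
    """PR = tr(C)^2 / tr(C^2) for the covariance C of mean-centered rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    trace = sum(cov[i][i] for i in range(d))
    # For a symmetric matrix, tr(C^2) equals the sum of its squared entries.
    trace_of_square = sum(cov[i][j] ** 2 for i in range(d) for j in range(d))
    return trace ** 2 / trace_of_square

def mean_pairwise_cosine(rows):
    """Average cosine similarity over all unordered pairs of distinct rows."""
    unit = []
    for r in rows:
        norm = math.sqrt(sum(x * x for x in r))
        unit.append([x / norm for x in r])
    total, pairs = 0.0, 0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            total += sum(x * y for x, y in zip(unit[i], unit[j]))
            pairs += 1
    return total / pairs

# Variance spread evenly over two axes gives PR = 2, the maximum in 2-D.
spread = participation_ratio([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
```

With this definition PR ranges from 1 (all variance on one axis) to the embedding dimension (variance spread evenly), so a value of 183.1 in a 300-dimensional space would indicate variance distributed over a large share of the available directions.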
When to pick Chem: you want the strongest supervised-direction recovery and the cleanest flavour-profile clustering. Chem's nearest-neighbour for chicken is beef (chemistry peer), and queries like basil retrieve tarragon, oregano, rosemary, pasta, fennel -- the Italian-herb chemistry cluster -- rather than the Cooc recipe-companion variant.
Operator semantics
Same as Cooc. See epicure-cooc for the full operator reference.
Honesty about cuisine pole reconstruction
See the epicure-cooc model card for the full discussion. Chem's cuisine SLERP results match paper-genre tightly because its chemistry-mediated walks cluster ingredients by aroma profile, which correlates strongly with regional cuisine.
Limitations and citation
See Section 5.3 of the paper. Note in particular the hub coverage limit for Chem: of the 1,790 ingredients, only 523 are chemistry hubs with direct typed I-C edges. The remaining 1,267 non-hubs reach compound context only via the N-H-C[x]-H-N metapath, so their chemistry signal is one walk-hop further removed than the hubs'. Broader compound coverage (FooDB, USDA Food Patterns Equivalents) would shorten that chain.
@article{radzikowski2026epicure,
title = {Epicure: Navigating the Emergent Geometry of Food Ingredient Embeddings},
author = {Radzikowski, Jakub and Chen, Josef},
journal = {arXiv preprint arXiv:2605.22391},
year = {2026}
}
License: CC BY 4.0.