Epicure-Chem

A 300-dimensional skip-gram ingredient embedding over a 1,790-ingredient canonical vocabulary, trained exclusively on typed FlavorDB ingredient-compound metapath walks. Chem is the chemistry extreme of the three siblings: with ii_repeat=0, the skip-gram objective never sees a direct ingredient-ingredient walk. All context is compound-mediated through three families of walks, where H is a hub ingredient, N is a non-hub ingredient, and C[x] is a compound of type x: within-type H-C[x]-H, via-compound N-H-C[x]-H-N, and cross-type C[x]-H-N-H-C[y].
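The three walk families can be sketched on a toy graph. This is a minimal illustration, not the training code: it reads H as a hub ingredient, N as a non-hub ingredient, and C[x] as a compound of type x, and every node, edge, and function name below is invented for the example. In particular, how non-hubs attach to hubs is an assumption, since this card does not describe that relation.

```python
import random

# Toy typed graph. All nodes and edges are invented for illustration only.
# Typed hub-compound edges: (hub, compound_type) -> compounds of that type.
HC = {
    ("basil", "x"): ["cx1", "cx2"],
    ("tarragon", "x"): ["cx1"],
    ("tarragon", "y"): ["cy1"],
    ("fennel", "y"): ["cy1"],
}
# Non-hub -> hub edges (assumed relation; not specified on this card).
NH = {"pesto": ["basil", "tarragon"], "bearnaise": ["tarragon"]}


def hubs_of(compound):
    """Hubs that have a typed edge to this compound."""
    return sorted(h for (h, _t), cs in HC.items() if compound in cs)


def nonhubs_of(hub):
    """Non-hubs attached to this hub."""
    return sorted(n for n, hs in NH.items() if hub in hs)


def within_type_walk(hub, ctype, rng):
    """H-C[x]-H: hub -> compound of type x -> hub."""
    c = rng.choice(HC[(hub, ctype)])
    return [hub, c, rng.choice(hubs_of(c))]


def via_compound_walk(nonhub, ctype, rng):
    """N-H-C[x]-H-N: a non-hub reaches compound context through a hub."""
    starts = [h for h in NH[nonhub] if (h, ctype) in HC]
    if not starts:
        return None
    h1 = rng.choice(starts)
    c = rng.choice(HC[(h1, ctype)])
    ends = [h for h in hubs_of(c) if nonhubs_of(h)]
    if not ends:
        return None
    h2 = rng.choice(ends)
    return [nonhub, h1, c, h2, rng.choice(nonhubs_of(h2))]


def cross_type_walk(compound, target_type, rng):
    """C[x]-H-N-H-C[y]: link compounds of two types through ingredients."""
    h1s = [h for h in hubs_of(compound) if nonhubs_of(h)]
    if not h1s:
        return None
    h1 = rng.choice(h1s)
    n = rng.choice(nonhubs_of(h1))
    h2s = [h for h in NH[n] if (h, target_type) in HC]
    if not h2s:
        return None
    h2 = rng.choice(h2s)
    return [compound, h1, n, h2, rng.choice(HC[(h2, target_type)])]


rng = random.Random(0)
walks = [
    within_type_walk("basil", "x", rng),
    via_compound_walk("pesto", "x", rng),
    cross_type_walk("cx1", "y", rng),
]
for w in walks:
    print("-".join(w))
```

In a pipeline of this kind, each walk would be treated as one "sentence" for the skip-gram trainer, so an ingredient's context window fills with compounds and with the ingredients reachable through them.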

Companions in the family: epicure-cooc (recipe-context only) and epicure-core (blended).

Paper: Epicure: Navigating the Emergent Geometry of Food Ingredient Embeddings

Quick start

from epicure import Epicure

m = Epicure.from_pretrained("Kaikaku/epicure-chem")

m.neighbors("chicken", k=5)
# -> [('beef', 0.41), ('pork', 0.34), ('cream_of_chicken_soup', 0.31),
#     ('buffalo_wing_sauce', 0.29), ('peanut', 0.28)]

m.slerp("corn", "cuisine:Latin_American", theta_deg=30, k=5)
# -> [('poblano_pepper', 0.53), ('corn_tortilla', 0.51), ('salsa', 0.50),
#     ('queso_fresco', 0.49), ('chipotle_pepper', 0.49)]

m.closest_mode("miso", kind="factor", k=3)
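The internals of the epicure package are not shown on this card. As a rough mental model only, the sketch below shows what a cosine-ranked neighbour query and a fixed-angle SLERP could look like on plain Python lists. The function names `cosine_neighbors` and `rotate_toward` are hypothetical, and reading `theta_deg` as "rotate the query vector toward the target by that many degrees" is an assumption rather than documented behaviour.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def cosine_neighbors(vectors, query, k=5):
    """Top-k other items ranked by cosine similarity to vectors[query]."""
    q = vectors[query]
    scored = [(name, cosine(q, v)) for name, v in vectors.items() if name != query]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]


def rotate_toward(a, b, theta_deg):
    """Rotate unit vector a toward b by theta_deg along their great circle."""
    a, b = normalize(a), normalize(b)
    dot = sum(x * y for x, y in zip(a, b))
    # Component of b orthogonal to a; it spans the rotation plane together with a.
    ortho = [y - dot * x for x, y in zip(a, b)]
    norm = math.sqrt(sum(x * x for x in ortho))
    if norm < 1e-12:
        # a and b are (anti)parallel, so the rotation plane is undefined.
        return a
    ortho = [x / norm for x in ortho]
    t = math.radians(theta_deg)
    return [math.cos(t) * x + math.sin(t) * o for x, o in zip(a, ortho)]


# Toy 2-D vectors, invented for illustration.
toy = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(cosine_neighbors(toy, "a", k=2))
print(rotate_toward(toy["a"], toy["c"], theta_deg=30))
```

Under this reading, a SLERP query would rotate the ingredient vector toward the pole and then rank the vocabulary by cosine similarity to the rotated vector. Note that the sketch does not clamp the angle, so a `theta_deg` larger than the angle between the two vectors rotates past the target.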

What is in this repo

Same structure as the Cooc sibling. Chem-specific:

  • modes.json: 200 modes across 43 properties.
  • factor_poles.npy shape: (87, 300).
  • supervised_poles.json: 120 entries.

Reported numbers (this sibling)

From the paper:

  • Isotropy: participation ratio PR = 183.1, average pairwise cosine in the 0.10-0.12 band. Most isotropic of the three siblings.
  • Direction quality (5-fold CV Spearman rho): baked-in CF 0.46; held-out basic-taste CF 0.47; USDA macros 0.49. Cuisine Cohen's d mean 3.07 (highest of the three; leads on 8 of 8 macro-regions).
  • Across all 27 continuous probes, Chem beats Core on 26 and Cooc on all 27. Of the three walk schemas, the chemistry-mediated one sharpens linear directions the most.
  • Emergent modes: 200 modes / 43 properties. Mean within-mode coherence 0.703 against random-pair baseline 0.115 (margin 0.588).
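For readers unfamiliar with the statistics above, here is a minimal sketch of two of them using their standard textbook definitions. The card does not state the exact formulas used in the paper, so treat these as assumptions: participation ratio as (sum of covariance eigenvalues)^2 / (sum of squared eigenvalues), and Cohen's d with a pooled standard deviation. The function names are illustrative.

```python
import math
import statistics


def participation_ratio(rows):
    """PR = (sum_i lambda_i)^2 / sum_i lambda_i^2 over covariance eigenvalues.

    Computed as tr(C)^2 / tr(C^2), which equals the eigenvalue form for a
    symmetric covariance matrix C and avoids an eigendecomposition.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    trace = sum(cov[i][i] for i in range(d))
    # For symmetric C, tr(C^2) is the sum of squared entries.
    trace_sq = sum(cov[i][j] ** 2 for i in range(d) for j in range(d))
    return trace ** 2 / trace_sq


def cohens_d(xs, ys):
    """Standardized mean difference with a pooled standard deviation."""
    nx, ny = len(xs), len(ys)
    pooled_var = (
        (nx - 1) * statistics.variance(xs) + (ny - 1) * statistics.variance(ys)
    ) / (nx + ny - 2)
    return (statistics.mean(xs) - statistics.mean(ys)) / math.sqrt(pooled_var)


# Variance spread evenly over 2 axes gives PR = 2; variance on one line gives PR = 1.
print(participation_ratio([[1, 0], [-1, 0], [0, 1], [0, -1]]))   # 2.0
print(participation_ratio([[1, 1], [-1, -1], [2, 2], [-2, -2]]))  # 1.0
print(cohens_d([2, 3, 4], [0, 1, 2]))                             # 2.0

# Coherence margin reported above: within-mode mean minus random-pair baseline.
print(round(0.703 - 0.115, 3))                                    # 0.588
```

Under this definition PR ranges from 1 to the embedding dimension, so a value of 183.1 in a 300-dimensional space would mean the variance is spread over a large share of the available directions.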

When to pick Chem: you want the strongest supervised-direction recovery and the cleanest flavour-profile clustering. Chem's nearest neighbour for chicken is beef (a chemistry peer), and a query like basil retrieves tarragon, oregano, rosemary, pasta, and fennel (the Italian-herb chemistry cluster) rather than the recipe companions that Cooc returns.

Operator semantics

Same as Cooc. See epicure-cooc for the full operator reference.

Honesty about cuisine pole reconstruction

See the epicure-cooc model card for the full discussion. Chem's cuisine SLERP results closely match the kind of results reported in the paper, because its chemistry-mediated walks cluster ingredients by aroma profile, which correlates strongly with regional cuisine.

Limitations and citation

See Section 5.3 of the paper. Note in particular the hub coverage limit for Chem: of the 1,790 ingredients, only 523 are chemistry hubs with direct typed I-C (ingredient-compound) edges. The remaining 1,267 non-hubs reach compound context only via the N-H-C[x]-H-N metapath, so their chemistry signal is one walk-hop further removed than the hubs'. Broader compound coverage (FooDB, USDA Food Patterns Equivalents) would shorten that chain.
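The coverage figures above can be checked with a few lines of arithmetic. The hop counts are read directly off the metapath notation (H-C for hubs, N-H-C for non-hubs); the variable names are illustrative.

```python
VOCAB = 1790
HUBS = 523
NON_HUBS = 1267

# The two groups partition the vocabulary.
assert HUBS + NON_HUBS == VOCAB

hub_share = HUBS / VOCAB
non_hub_share = NON_HUBS / VOCAB
print(f"hubs: {hub_share:.1%}, non-hubs: {non_hub_share:.1%}")
# hubs: 29.2%, non-hubs: 70.8%

# Edges between an ingredient and its nearest compound along a walk.
hops_to_compound = {"hub (H-C)": 1, "non-hub (N-H-C)": 2}
print(hops_to_compound)
```

So roughly seven in ten ingredients receive their chemistry signal only indirectly, through a hub.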

@article{radzikowski2026epicure,
  title   = {Epicure: Navigating the Emergent Geometry of Food Ingredient Embeddings},
  author  = {Radzikowski, Jakub and Chen, Josef},
  journal = {arXiv preprint arXiv:2605.22391},
  year    = {2026}
}

License: CC BY 4.0.
