Eric Xu committed on
Commit
76bda55
·
unverified ·
1 Parent(s): a4bb654

Add CoBRA-inspired bias auditing and calibration for expert panel fidelity

Integrates research from CoBRA (arXiv:2509.13588, CHI'26 Best Paper) to
measure and calibrate cognitive biases in SGO's LLM evaluator pipeline,
closing the gap between simulated panels and real expert panels.

- Add bias_audit.py: runs framing, authority, and order-effect probes
through the same LLM+persona pipeline, comparing results to human
baselines (Tversky & Kahneman, Milgram)
- Add --bias-calibration flag to evaluate.py: appends bias-aware
instructions to reduce framing/authority/order artifacts
- Add research analysis doc connecting CoBRA methodology to SGO

docs/research/cobra_social_bias_analysis.md ADDED
@@ -0,0 +1,118 @@
+ # CoBRA × SGO: Using Social Bias Research to Close the Expert Panel Gap
+
+ **Paper**: [CoBRA: Programming Cognitive Bias in Social Agents Using Classic Social Science Experiments](https://arxiv.org/abs/2509.13588)
+ **Authors**: Xuan Liu, Haoyang Shang, Haojian Jin (CHI'26 Best Paper)
+ **Relevance**: High. It directly addresses SGO's core challenge: making LLM-simulated panels behave like real human expert panels.
+
+ ---
+
+ ## 1. The Problem CoBRA Solves (and Why SGO Needs It)
+
+ SGO uses LLM agents role-playing census-grounded personas to evaluate entities.
+ The North Star is: **these simulated panels should behave like real expert panels.**
+
+ But LLM evaluators are not guaranteed to exhibit human cognitive biases at human-realistic levels.
+ They may be too rational (under-biased) or exhibit biases in the wrong patterns (mis-biased).
+
+ CoBRA provides:
+ 1. **Cognitive Bias Index (CBI)**: a quantitative measurement of bias in LLM agents using validated social science experiments
+ 2. **Behavioral Regulation Engine**: closed-loop calibration that sets bias to target levels
+
+ This is what SGO needs to validate and calibrate its evaluator panel.
+
+ ---
+
+ ## 2. Key Biases Relevant to SGO Evaluations
+
+ | Bias | CoBRA Support | SGO Impact | Example |
+ |------|--------------|------------|---------|
+ | **Framing Effect** | ✅ Asian Disease, Investment/Insurance | How the entity is *written* (gain vs. loss framing) shifts scores beyond what the content warrants | "Save 30% on ops costs" vs. "Stop wasting 30% on ops costs": same product, different scores |
+ | **Authority Bias** | ✅ Milgram, Stanford Prison | LLM evaluators may over- or under-weight credibility signals | SOC2 badge, Y Combinator logo, "trusted by 10k teams": do LLM personas react like real buyers? |
+ | **Bandwagon Effect** | ✅ Asch's Line, Hotel Towel | SGO uses independent evaluators, but real panels have social influence | Real focus groups exhibit herding; SGO's independence may be a feature *or* a fidelity gap |
+ | **Confirmation Bias** | ✅ Wason Selection | Once the LLM forms an initial impression from the entity intro, does it seek confirming evidence? | An evaluator who sees "AI-powered" first may score differently than one who sees pricing first |
+ | **Anchoring** | Planned | Score anchoring from entity structure; the first number seen (price, user count) biases everything after it | "$99/mo" appearing early may anchor all subsequent value judgments |
+
+ ---
+
+ ## 3. Concrete Integration Plan
+
+ ### Phase 1: Bias Audit (measure current state)
+
+ Run CoBRA-style experiments on SGO's evaluator personas to measure what biases they actually exhibit. This tells us *where SGO deviates from human panels*.
+
+ **Implementation**: `scripts/bias_audit.py` runs classic social science experiments through the same LLM + persona pipeline SGO uses for evaluation.
+
+ Key experiments:
+ - **Framing probe**: Present the same entity with gain-framed vs. loss-framed language to the same persona. Measure the score delta. Compare to the human framing baseline; this plan assumes a ~30% shift as a working figure (Tversky & Kahneman's original Asian Disease problem showed a larger swing, 72% vs. 22% choosing the sure option).
+ - **Authority probe**: Add/remove authority signals (certifications, endorsements, logos). Measure score sensitivity. Compare to human authority bias baselines.
+ - **Anchoring probe**: Vary the order of information in the entity (price first vs. last, high anchor vs. low anchor). Measure score shifts.
+ - **Order effect probe**: Present the same entity to the same persona but with sections reordered. Scores should be invariant; deviation = order bias.
+
+ ### Phase 2: Bias Calibration (align to human baselines)
+
+ Use CoBRA's Behavioral Regulation Engine approach to calibrate SGO evaluators.
+
+ Two strategies:
+
+ **A. Prompt-level calibration** (simplest, model-agnostic):
+ Add bias-aware instructions to the evaluation system prompt. Example:
+ ```
+ "Be aware that the framing of this entity may influence your assessment.
+ Evaluate the substance, not the presentation style. Your bias calibration
+ level for framing sensitivity: {calibrated_level}%."
+ ```
+
+ **B. Measurement-then-correct** (CoBRA's closed loop):
+ 1. Run bias audit on a cohort
+ 2. Identify which personas/demographics over-express or under-express specific biases
+ 3. Inject per-persona calibration coefficients into the evaluation prompt
+ 4. Re-run and verify convergence toward human baselines
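
The measure-then-correct loop in strategy B can be sketched as a simple proportional update. This is only an illustration under stated assumptions, not CoBRA's Behavioral Regulation Engine: `update_coefficient`, `calibrate`, the `gain` value, and the `measure` callable (a stand-in for re-running the bias audit with a coefficient injected into the prompt) are all hypothetical names introduced here.

```python
from typing import Callable, Tuple


def update_coefficient(coef: float, measured_pct: float, target_pct: float,
                       gain: float = 0.01, lo: float = 0.0, hi: float = 2.0) -> float:
    """One proportional step: raise the coefficient when the measured bias is
    below target, lower it when above, and clamp the result to [lo, hi]."""
    return max(lo, min(hi, coef + gain * (target_pct - measured_pct)))


def calibrate(measure: Callable[[float], float], target_pct: float, coef: float = 1.0,
              tolerance_pct: float = 5.0, max_rounds: int = 10) -> Tuple[float, float]:
    """Repeat audit -> adjust until the measured shift is within tolerance.

    `measure(coef)` is a placeholder for re-running the audit with the
    coefficient in the prompt; it returns the percentage of shifted evaluators.
    """
    measured = measure(coef)
    for _ in range(max_rounds):
        if abs(measured - target_pct) <= tolerance_pct:
            break
        coef = update_coefficient(coef, measured, target_pct)
        measured = measure(coef)
    return coef, measured


# Toy stand-in for the audit: pretend the shift scales linearly with the coefficient.
coef, measured = calibrate(lambda c: 50.0 * c, target_pct=30.0)
print(f"coefficient={coef:.2f}, measured shift={measured:.1f}%")
```

In a real run each call to `measure` costs a full audit pass, so a small `max_rounds` and a tolerance no tighter than the audit's own sampling noise would probably be sensible.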
+
+ ### Phase 3: Validation Against Real Panels
+
+ The ultimate test: compare SGO+calibration results against real expert panel data.
+
+ 1. Find domains where real panel data exists (product reviews, hiring decisions, VC evaluations)
+ 2. Run SGO on the same entities with the same demographics
+ 3. Compare bias patterns (not just average scores): does the *shape* of the distribution match?
+ 4. Iterate calibration coefficients until SGO's bias profile matches human panels
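
One way to compare the *shape* of two score distributions in step 3, rather than their means, is the two-sample Kolmogorov-Smirnov statistic. The sketch below is an assumption about how such a check could look, not something the commit implements; `ks_statistic` and both score lists are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest vertical gap between the two empirical CDFs.

    0.0 means the samples have identical distributions; 1.0 means they do not
    overlap at all.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in set(sa) | set(sb)
    )


# Hypothetical 1-10 scores: a narrow simulated panel vs. a wider human panel.
sgo_scores = [6, 6, 7, 7, 7, 7, 8, 8]
human_scores = [2, 4, 5, 7, 7, 8, 9, 10]
print(f"KS statistic: {ks_statistic(sgo_scores, human_scores):.3f}")
```

Two panels with the same average can still produce a large statistic when one is much narrower than the other, which is the failure mode the step above is meant to catch.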
+
+ ---
+
+ ## 4. What This Means for Expert Panel Fidelity
+
+ The gap between SGO and real expert panels has three components:
+
+ ```
+ Expert Panel Gap = Knowledge Gap + Preference Gap + Bias Gap
+ ```
+
+ - **Knowledge Gap**: Does the LLM know what an expert knows? (Addressed by persona enrichment)
+ - **Preference Gap**: Does it weight factors correctly? (Addressed by stratification + prompt design)
+ - **Bias Gap**: Does it exhibit human-realistic cognitive biases? (← CoBRA addresses THIS)
+
+ Most SGO work so far addresses the first two gaps. CoBRA-style bias calibration is the missing piece for the third.
+
+ Crucially, the goal is NOT to eliminate bias, because real experts are biased. The goal is to match the *type and magnitude* of biases that real expert panels exhibit.
+
+ ---
+
+ ## 5. Practical Value
+
+ | Metric | Without Bias Calibration | With Bias Calibration |
+ |--------|-------------------------|----------------------|
+ | Framing sensitivity | Unknown, likely non-human | Measured, calibrated toward the ~30% working baseline (attributed to Tversky & Kahneman) |
+ | Authority weight | LLM default (likely over-weighted) | Calibrated per-persona based on domain expertise |
+ | Score distribution shape | Narrow, symmetric (LLM tendency) | Wider, with realistic skew patterns |
+ | Cross-model consistency | Varies by model | Normalized via CBI measurement |
+ | Expert panel correlation | Unvalidated | Expected to be measurably closer to human baselines (to be confirmed in Phase 3) |
+
+ ---
+
+ ## 6. References
+
+ - Liu, X., Shang, H., & Jin, H. (2025). CoBRA: Programming Cognitive Bias in Social Agents Using Classic Social Science Experiments. *CHI'26 Best Paper*. [arXiv:2509.13588](https://arxiv.org/abs/2509.13588)
+ - [CoBRA GitHub](https://github.com/AISmithLab/CoBRA)
+ - Tversky, A., & Kahneman, D. (1981). The framing of decisions and the psychology of choice. *Science*, 211(4481), 453-458.
+ - Milgram, S. (1963). Behavioral Study of Obedience. *JASP*, 67(4), 371-378.
+ - Asch, S. E. (1951). Effects of group pressure upon the modification and distortion of judgments. In H. Guetzkow (Ed.), *Groups, Leadership and Men* (pp. 177-190). Carnegie Press.
scripts/bias_audit.py ADDED
@@ -0,0 +1,500 @@
+ """
+ Bias audit: measures cognitive biases in SGO's LLM evaluator pipeline.
+
+ Inspired by CoBRA (Liu et al., CHI'26 Best Paper, arXiv:2509.13588), this script
+ runs validated social science experiments through SGO's evaluation pipeline to
+ quantify how much bias the LLM evaluators exhibit.
+
+ This is the first step toward expert panel fidelity: you can't calibrate what
+ you can't measure.
+
+ Supported probes:
+   - framing: same entity, gain vs. loss framing → measures framing effect
+   - authority: entity with/without authority signals → measures authority bias
+   - order: same entity, sections reordered → measures anchoring/order effects
+
+ Usage:
+     uv run python scripts/bias_audit.py \
+         --entity entities/my_product.md \
+         --cohort data/cohort.json \
+         --probes framing authority order \
+         --sample 10 \
+         --parallel 5
+
+ Output: results/bias_audit/report.md + raw data
+ """
+
+ import json
+ import os
+ import re
+ import time
+ import argparse
+ import concurrent.futures
+ from collections import defaultdict
+ from datetime import datetime
+ from pathlib import Path
+
+ from dotenv import load_dotenv
+
+ PROJECT_ROOT = Path(__file__).resolve().parent.parent
+ load_dotenv(PROJECT_ROOT / ".env")
+
+ from openai import OpenAI
+
+
+ # ── Evaluation core (reused from evaluate.py) ────────────────────────────
+
+ SYSTEM_PROMPT = """You are an evaluation simulator. You will be given:
+ 1. A detailed persona — a person with specific values, needs, context, and perspective
+ 2. An entity to evaluate (a product, profile, proposal, pitch, resume, etc.)
+
+ Your job: fully inhabit this persona's perspective and evaluate the entity AS THEY WOULD.
+
+ Be honest and realistic. Not everything is a match. Consider:
+ - Their specific needs, budget, constraints, and priorities
+ - Whether this entity solves a real problem for them
+ - Trust signals and red flags from their perspective
+ - Practical fit with their situation
+ - What they'd compare this against
+
+ You MUST respond with valid JSON only."""
+
+ EVAL_PROMPT = """## Evaluator Persona
+
+ Name: {name}
+ Age: {age}
+ Location: {city}, {state}
+ Education: {education_level}
+ Occupation: {occupation}
+ Status: {marital_status}
+
+ {persona}
+
+ ---
+
+ ## Entity to Evaluate
+
+ {entity}
+
+ ---
+
+ ## Task
+
+ Inhabit {name}'s perspective completely. Evaluate this entity as they would.
+
+ Return JSON:
+ {{
+   "score": <1-10, where 1=strong reject, 5=ambivalent, 10=enthusiastic yes>,
+   "action": "<positive | neutral | negative>",
+   "attractions": ["<what works for them, max 3>"],
+   "concerns": ["<what gives them pause, max 3>"],
+   "dealbreakers": ["<hard no's if any, empty list if none>"],
+   "summary": "<1-2 sentences — how they'd describe this to a peer>",
+   "reasoning": "<2-3 sentence internal monologue>"
+ }}"""
+
+
+ def evaluate_one(client, model, evaluator, entity_text):
+     """Score one entity as one persona; returns parsed JSON or an {"error": ...} dict."""
+     prompt = EVAL_PROMPT.format(
+         name=evaluator["name"],
+         age=evaluator.get("age", ""),
+         city=evaluator.get("city", ""),
+         state=evaluator.get("state", ""),
+         education_level=evaluator.get("education_level", ""),
+         occupation=evaluator.get("occupation", ""),
+         marital_status=evaluator.get("marital_status", ""),
+         persona=evaluator.get("persona", ""),
+         entity=entity_text,
+     )
+     try:
+         resp = client.chat.completions.create(
+             model=model,
+             messages=[
+                 {"role": "system", "content": SYSTEM_PROMPT},
+                 {"role": "user", "content": prompt},
+             ],
+             response_format={"type": "json_object"},
+             max_tokens=16384,
+             temperature=0.7,
+         )
+         content = resp.choices[0].message.content
+         if not content:
+             return {"error": "Empty response"}
+         # Drop any <think>...</think> block some models emit before the JSON
+         content = re.sub(r'<think>[\s\S]*?</think>', '', content).strip()
+         result = json.loads(content)
+         result["_evaluator"] = {
+             "name": evaluator["name"],
+             "age": evaluator.get("age"),
+             "occupation": evaluator.get("occupation"),
+         }
+         return result
+     except Exception as e:
+         return {"error": str(e), "_evaluator": {"name": evaluator.get("name", "?")}}
+
+
+ # ── Bias probes ──────────────────────────────────────────────────────────
+
+ REFRAME_PROMPT = """You are a text transformation tool. Rewrite the following entity description
+ using {frame_type} framing. Keep ALL factual content identical — same features, same pricing,
+ same capabilities. Only change the rhetorical framing.
+
+ {frame_instruction}
+
+ Return the rewritten text only, no commentary.
+
+ ---
+
+ {entity}"""
+
+ FRAME_INSTRUCTIONS = {
+     "gain": "Emphasize what the user GAINS: benefits, improvements, positive outcomes. "
+             'Use phrases like "save", "gain", "achieve", "unlock", "improve".',
+     "loss": "Emphasize what the user LOSES without this: risks, costs of inaction, missed opportunities. "
+             'Use phrases like "avoid losing", "stop wasting", "don\'t miss", "risk of", "falling behind".',
+ }
+
+
+ def reframe_entity(client, model, entity_text, frame_type):
+     """Rewrite entity with gain or loss framing, preserving factual content."""
+     resp = client.chat.completions.create(
+         model=model,
+         messages=[{"role": "user", "content": REFRAME_PROMPT.format(
+             frame_type=frame_type,
+             frame_instruction=FRAME_INSTRUCTIONS[frame_type],
+             entity=entity_text,
+         )}],
+         max_tokens=16384,
+         temperature=0.3,
+     )
+     return resp.choices[0].message.content.strip()
+
+
+ AUTHORITY_SIGNALS = [
+     "Trusted by 10,000+ teams worldwide.",
+     "SOC 2 Type II certified. GDPR compliant.",
+     "Featured in TechCrunch, Wired, and Forbes.",
+     "Backed by Sequoia Capital and Y Combinator.",
+     "Winner of Product Hunt's Product of the Year.",
+ ]
+
+
+ def add_authority_signals(entity_text):
+     """Add authority/credibility signals to an entity."""
+     signals = "\n".join(f"- {s}" for s in AUTHORITY_SIGNALS)
+     return f"{entity_text}\n\n---\n\n### Trust & Recognition\n\n{signals}\n"
+
+
+ def reorder_entity(entity_text):
+     """Reverse the order of sections in the entity document."""
+     # Split before lines that start with "# " or "## " (deeper headings stay with their parent)
+     sections = re.split(r'\n(?=##?\s)', entity_text)
+     if len(sections) <= 1:
+         # Try splitting on blank lines if no headers
+         sections = re.split(r'\n\n+', entity_text)
+
+     if len(sections) <= 1:
+         return entity_text  # Can't reorder a single section
+
+     # Keep first section (title/intro), reverse the rest
+     return sections[0] + "\n\n" + "\n\n".join(reversed(sections[1:]))
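
A quick worked example may make the splitting behaviour of `reorder_entity` easier to see. The function body is copied from the diff so the snippet runs on its own; the `sample` document is invented for illustration. Note that the `##?\s` pattern only matches `#` and `##` headings, so a `###` subsection travels with its parent section.

```python
import re


def reorder_entity(entity_text):
    """Copy of the function from the diff, repeated so this example is self-contained."""
    sections = re.split(r'\n(?=##?\s)', entity_text)
    if len(sections) <= 1:
        sections = re.split(r'\n\n+', entity_text)
    if len(sections) <= 1:
        return entity_text
    return sections[0] + "\n\n" + "\n\n".join(reversed(sections[1:]))


# Hypothetical entity: a title, then two sections, one with a nested subsection.
sample = (
    "# Acme\nIntro line\n"
    "## Pricing\n$99/mo\n### Tiers\nPro, Team\n"
    "## Features\nFast sync"
)
reordered = reorder_entity(sample)
print(reordered)
```

The title block stays first, "Features" moves ahead of "Pricing", and "Tiers" stays attached to "Pricing".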
+
+
+ # ── Probe runners ────────────────────────────────────────────────────────
+
+ def run_paired_evaluation(client, model, evaluators, entity_a, entity_b, label_a, label_b, parallel):
+     """Run the same cohort against two entity variants and compute deltas."""
+     results = []
+
+     def worker(ev):
+         r_a = evaluate_one(client, model, ev, entity_a)
+         r_b = evaluate_one(client, model, ev, entity_b)
+         return {
+             "evaluator": ev["name"],
+             "age": ev.get("age"),
+             "occupation": ev.get("occupation"),
+             f"score_{label_a}": r_a.get("score"),
+             f"score_{label_b}": r_b.get("score"),
+             # A missing score counts as 0 here; such rows carry "error" and are dropped in analysis
+             "delta": (r_b.get("score", 0) or 0) - (r_a.get("score", 0) or 0),
+             f"reasoning_{label_a}": r_a.get("reasoning", ""),
+             f"reasoning_{label_b}": r_b.get("reasoning", ""),
+             "error": r_a.get("error") or r_b.get("error"),
+         }
+
+     with concurrent.futures.ThreadPoolExecutor(max_workers=parallel) as pool:
+         futs = {pool.submit(worker, ev): ev for ev in evaluators}
+         for done, fut in enumerate(concurrent.futures.as_completed(futs), 1):
+             result = fut.result()
+             results.append(result)
+             if result.get("error"):
+                 print(f"  [{done}/{len(evaluators)}] {result['evaluator']}: ERROR")
+             else:
+                 # ":+g" handles int and float deltas; ":+d" raises if the model returns 7.5
+                 print(f"  [{done}/{len(evaluators)}] {result['evaluator']}: "
+                       f"{label_a}={result[f'score_{label_a}']} "
+                       f"{label_b}={result[f'score_{label_b}']} "
+                       f"Δ={result['delta']:+g}")
+
+     return results
+
+
+ def run_framing_probe(client, model, evaluators, entity_text, parallel):
+     """Framing Effect probe: gain-framed vs. loss-framed entity."""
+     print("\n── Framing Effect Probe ──")
+     print("Generating gain-framed and loss-framed variants...")
+
+     gain_entity = reframe_entity(client, model, entity_text, "gain")
+     loss_entity = reframe_entity(client, model, entity_text, "loss")
+
+     return run_paired_evaluation(
+         client, model, evaluators, gain_entity, loss_entity,
+         "gain", "loss", parallel,
+     ), {"gain_entity": gain_entity, "loss_entity": loss_entity}
+
+
+ def run_authority_probe(client, model, evaluators, entity_text, parallel):
+     """Authority Bias probe: entity with vs. without authority signals."""
+     print("\n── Authority Bias Probe ──")
+
+     entity_with_authority = add_authority_signals(entity_text)
+
+     return run_paired_evaluation(
+         client, model, evaluators, entity_text, entity_with_authority,
+         "baseline", "authority", parallel,
+     ), {"entity_with_authority": entity_with_authority}
+
+
+ def run_order_probe(client, model, evaluators, entity_text, parallel):
+     """Order Effect probe: original vs. reordered entity."""
+     print("\n── Order Effect Probe ──")
+
+     reordered = reorder_entity(entity_text)
+
+     return run_paired_evaluation(
+         client, model, evaluators, entity_text, reordered,
+         "original", "reordered", parallel,
+     ), {"reordered_entity": reordered}
+
+
+ # ── Analysis ─────────────────────────────────────────────────────────────
+
+ HUMAN_BASELINES = {
+     "framing": {
+         # Working assumption for this audit. The original Asian Disease problem showed
+         # a larger swing (72% vs. 22% choosing the sure option across frames).
+         "description": "Tversky & Kahneman (1981): ~30% of subjects shift preference based on framing",
+         "expected_shift_pct": 30,
+     },
+     "authority": {
+         "description": "Milgram (1963): 65% obedience rate under authority pressure",
+         "expected_shift_pct": 20,  # Conservative estimate for evaluation context
+     },
+     "order": {
+         "description": "Primacy/recency effects: ideally 0% shift (order shouldn't matter)",
+         "expected_shift_pct": 0,
+     },
+ }
+
+
+ def analyze_probe(results, probe_name, label_a, label_b):
+     """Analyze a probe's results and compare to human baselines."""
+     valid = [r for r in results if not r.get("error")]
+     if not valid:
+         return {"probe": probe_name, "error": "No valid results"}
+
+     deltas = [r["delta"] for r in valid]
+     abs_deltas = [abs(d) for d in deltas]
+     shifted = [r for r in valid if r["delta"] != 0]
+     positive_shift = [r for r in valid if r["delta"] > 0]
+     negative_shift = [r for r in valid if r["delta"] < 0]
+
+     n = len(valid)
+     avg_delta = sum(deltas) / n
+     avg_abs_delta = sum(abs_deltas) / n
+     shift_pct = 100 * len(shifted) / n
+     baseline = HUMAN_BASELINES.get(probe_name, {})
+
+     return {
+         "probe": probe_name,
+         "n": n,
+         "avg_delta": round(avg_delta, 2),
+         "avg_abs_delta": round(avg_abs_delta, 2),
+         "max_delta": max(deltas),
+         "min_delta": min(deltas),
+         "shifted_pct": round(shift_pct, 1),
+         "positive_shifts": len(positive_shift),
+         "negative_shifts": len(negative_shift),
+         "no_change": n - len(shifted),
+         "human_baseline": baseline,
+         "comparison": label_a + " vs " + label_b,
+     }
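
With the default `--sample 10`, `shifted_pct` rests on very few evaluators, so the ±10pp band used later in the report can be narrower than the sampling noise. A Wilson score interval is one standard way to show that uncertainty. The helper below is a sketch that is not part of the commit; `wilson_interval` is a hypothetical name, and it takes the count of shifted evaluators and `n` as computed in `analyze_probe`.

```python
import math


def wilson_interval(shifted, n, z=1.96):
    """Wilson score interval (in percent) for the share of evaluators who shifted.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = shifted / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return 100 * (center - half), 100 * (center + half)


# 3 of 10 evaluators shifted: the point estimate is 30%, but the interval is wide.
low, high = wilson_interval(3, 10)
print(f"shifted: 30.0% (95% interval roughly {low:.0f}% to {high:.0f}%)")
```

Reporting the interval next to `shifted_pct` would make it clearer when an "over-biased" or "under-biased" label is supported by the data and when the sample is simply too small to tell.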
+
+
+ def generate_report(all_analyses, model):
+     """Generate the bias audit report."""
+     lines = [
+         "# SGO Bias Audit Report",
+         f"\n**Date**: {datetime.now().isoformat()}",
+         f"**Model**: {model}",
+         "**Method**: CoBRA-inspired social science experiments (arXiv:2509.13588)",
+         "",
+         "---",
+         "",
+         "## Summary",
+         "",
+         f"{'Probe':<12} {'N':>4} {'Avg Δ':>7} {'|Δ|':>5} {'Shifted%':>9} {'Human Baseline':>15} Gap",
+         "-" * 75,
+     ]
+
+     for a in all_analyses:
+         if "error" in a:
+             lines.append(f"{a['probe']:<12} ERROR: {a['error']}")
+             continue
+         baseline_pct = a["human_baseline"].get("expected_shift_pct", "?")
+         is_numeric = isinstance(baseline_pct, (int, float))
+         gap = f"{a['shifted_pct'] - baseline_pct:+.1f}pp" if is_numeric else ""
+         baseline_str = f"{baseline_pct}%" if is_numeric else str(baseline_pct)
+         shifted_str = f"{a['shifted_pct']:.1f}%"
+         # Column widths match the header row above
+         lines.append(
+             f"{a['probe']:<12} {a['n']:>4} {a['avg_delta']:>+7.2f} {a['avg_abs_delta']:>5.2f}"
+             f" {shifted_str:>9} {baseline_str:>15} {gap}"
+         )
+
+     lines.extend(["", "---", "", "## Interpretation", ""])
+
+     for a in all_analyses:
+         if "error" in a:
+             continue
+
+         lines.append(f"### {a['probe'].title()} Effect ({a['comparison']})")
+         lines.append("")
+
+         baseline = a["human_baseline"]
+         if baseline:
+             lines.append(f"**Human baseline**: {baseline.get('description', 'N/A')}")
+
+         lines.append(f"**LLM result**: {a['shifted_pct']:.1f}% of evaluators shifted scores "
+                      f"(avg |Δ| = {a['avg_abs_delta']:.2f} points)")
+
+         expected = baseline.get("expected_shift_pct")
+         if isinstance(expected, (int, float)):
+             if a["shifted_pct"] > expected + 10:
+                 lines.append(f"**Assessment**: OVER-BIASED — LLM evaluators show more {a['probe']} "
+                              f"sensitivity than humans. Consider adding de-biasing instructions.")
+             elif a["shifted_pct"] < expected - 10:
+                 lines.append(f"**Assessment**: UNDER-BIASED — LLM evaluators show less {a['probe']} "
+                              f"sensitivity than humans. The panel may be too rational.")
+             else:
+                 lines.append("**Assessment**: WELL-CALIBRATED — within ±10pp of human baseline.")
+         lines.append("")
+
+     lines.extend([
+         "---",
+         "",
+         "## Next Steps",
+         "",
+         "1. **If over-biased**: Add bias-awareness instructions to the evaluation prompt",
+         "2. **If under-biased**: Consider if this is acceptable (more rational) or needs calibration",
+         "3. **For order effects**: Any non-zero shift indicates entity structure matters — "
+         "standardize entity format or average across orderings",
+         "4. **Re-run after calibration**: Use this script to verify improvements",
+         "",
+         "## References",
+         "",
+         "- Liu, X., Shang, H., & Jin, H. (2025). CoBRA. arXiv:2509.13588 (CHI'26 Best Paper)",
+         "- Tversky, A. & Kahneman, D. (1981). The framing of decisions. Science, 211(4481).",
+         "- Milgram, S. (1963). Behavioral Study of Obedience. JASP, 67(4).",
+     ])
+
+     return "\n".join(lines)
+
+
+ # ── Main ─────────────────────────────────────────────────────────────────
+
+ def main():
+     parser = argparse.ArgumentParser(description="Bias audit for SGO evaluator pipeline")
+     parser.add_argument("--entity", required=True, help="Path to entity document")
+     parser.add_argument("--cohort", default="data/cohort.json")
+     parser.add_argument("--probes", nargs="+", default=["framing", "authority", "order"],
+                         choices=["framing", "authority", "order"])
+     parser.add_argument("--sample", type=int, default=10,
+                         help="Number of evaluators to sample for audit (smaller = faster)")
+     parser.add_argument("--parallel", type=int, default=5)
+     args = parser.parse_args()
+
+     entity_text = Path(args.entity).read_text(encoding="utf-8")
+
+     client = OpenAI(api_key=os.getenv("LLM_API_KEY"), base_url=os.getenv("LLM_BASE_URL"))
+     model = os.getenv("LLM_MODEL_NAME")
+
+     with open(args.cohort, encoding="utf-8") as f:
+         cohort = json.load(f)
+
+     # Sample a subset for the audit (bias audit is 2x cost per evaluator per probe)
+     import random
+     random.seed(42)  # fixed seed so repeated audits use the same evaluators
+     if args.sample and args.sample < len(cohort):
+         evaluators = random.sample(cohort, args.sample)
+     else:
+         evaluators = cohort
+
+     print(f"Bias Audit | {len(evaluators)} evaluators | Model: {model}")
+     print(f"Probes: {', '.join(args.probes)}")
+
+     probe_runners = {
+         "framing": lambda: run_framing_probe(client, model, evaluators, entity_text, args.parallel),
+         "authority": lambda: run_authority_probe(client, model, evaluators, entity_text, args.parallel),
+         "order": lambda: run_order_probe(client, model, evaluators, entity_text, args.parallel),
+     }
+
+     all_results = {}
+     all_analyses = []
+
+     for probe_name in args.probes:
+         t0 = time.time()
+         results, metadata = probe_runners[probe_name]()
+         elapsed = time.time() - t0
+
+         label_a, label_b = {
+             "framing": ("gain", "loss"),
+             "authority": ("baseline", "authority"),
+             "order": ("original", "reordered"),
+         }[probe_name]
+
+         analysis = analyze_probe(results, probe_name, label_a, label_b)
+         analysis["elapsed_s"] = round(elapsed, 1)
+         all_analyses.append(analysis)
+
+         all_results[probe_name] = {
+             "results": results,
+             "metadata": metadata,
+             "analysis": analysis,
+         }
+
+         # analyze_probe returns no averages when every evaluation failed
+         if "error" in analysis:
+             print(f"\n  {probe_name}: {analysis['error']} (time={elapsed:.1f}s)")
+         else:
+             print(f"\n  {probe_name}: avg Δ={analysis['avg_delta']:+.2f}, "
+                   f"shifted={analysis['shifted_pct']}%, "
+                   f"time={elapsed:.1f}s")
+
+     # Save
+     out_dir = PROJECT_ROOT / "results" / "bias_audit"
+     out_dir.mkdir(parents=True, exist_ok=True)
+
+     # Raw data
+     serializable = {}
+     for k, v in all_results.items():
+         serializable[k] = {
+             "results": v["results"],
+             "analysis": v["analysis"],
+         }
+     with open(out_dir / "raw_data.json", "w", encoding="utf-8") as f:
+         json.dump(serializable, f, ensure_ascii=False, indent=2)
+
+     # Report
+     report = generate_report(all_analyses, model)
+     with open(out_dir / "report.md", "w", encoding="utf-8") as f:
+         f.write(report)
+
+     print(f"\nReport: {out_dir / 'report.md'}")
+     print(f"Data: {out_dir / 'raw_data.json'}")
+     print(f"\n{report}")
+
+
+ if __name__ == "__main__":
+     main()
scripts/evaluate.py CHANGED
@@ -45,6 +45,21 @@ Be honest and realistic. Not everything is a match. Consider:

  You MUST respond with valid JSON only."""

+ # Optional bias-aware addendum, appended to SYSTEM_PROMPT when --bias-calibration is used.
+ # Inspired by CoBRA (Liu et al., CHI'26, arXiv:2509.13588).
+ BIAS_CALIBRATION_ADDENDUM = """
+
+ Important evaluation guidelines for realistic assessment:
+ - Evaluate the SUBSTANCE of the entity, not its rhetorical framing. A gain-framed
+   description ("save 30%") and a loss-framed description ("stop wasting 30%") should
+   receive similar scores if the underlying value is the same.
+ - Weight authority signals (certifications, press mentions, investor logos) proportionally
+   to how much this persona's real-world counterpart would actually verify and value them.
+ - The ORDER in which information appears should not affect your score. Evaluate the
+   complete picture, not just first impressions.
+ - Real people have genuine cognitive biases — you should too. But calibrate to realistic
+   human levels, not LLM defaults. A credential matters, but it's not everything."""
+
  EVAL_PROMPT = """## Evaluator Persona

  Name: {name}
@@ -80,7 +95,7 @@ Return JSON:
  }}"""


- def evaluate_one(client, model, evaluator, entity_text):
+ def evaluate_one(client, model, evaluator, entity_text, system_prompt=None):
      prompt = EVAL_PROMPT.format(
          name=evaluator["name"],
          age=evaluator.get("age", ""),
@@ -96,7 +111,7 @@ def evaluate_one(client, model, evaluator, entity_text):
          resp = client.chat.completions.create(
              model=model,
              messages=[
-                 {"role": "system", "content": SYSTEM_PROMPT},
+                 {"role": "system", "content": system_prompt or SYSTEM_PROMPT},
                  {"role": "user", "content": prompt},
              ],
              response_format={"type": "json_object"},
@@ -178,6 +193,8 @@ def main():
      parser.add_argument("--tag", default=None)
      parser.add_argument("--limit", type=int, default=None)
      parser.add_argument("--parallel", type=int, default=5)
+     parser.add_argument("--bias-calibration", action="store_true",
+                         help="Add CoBRA-inspired bias calibration instructions (arXiv:2509.13588)")
      args = parser.parse_args()

      entity_text = Path(args.entity).read_text()
@@ -190,6 +207,11 @@ def main():
      if args.limit:
          cohort = cohort[:args.limit]

+     sys_prompt = SYSTEM_PROMPT
+     if args.bias_calibration:
+         sys_prompt += BIAS_CALIBRATION_ADDENDUM
+         print("Bias calibration: ON (CoBRA-inspired, arXiv:2509.13588)")
+
      print(f"Evaluating {len(cohort)} evaluators | Model: {model} | Workers: {args.parallel}")

      results = [None] * len(cohort)
@@ -197,7 +219,7 @@ def main():
      t0 = time.time()

      def worker(idx, ev):
-         return idx, evaluate_one(client, model, ev, entity_text)
+         return idx, evaluate_one(client, model, ev, entity_text, system_prompt=sys_prompt)

      with concurrent.futures.ThreadPoolExecutor(max_workers=args.parallel) as pool:
          futs = {pool.submit(worker, i, e): i for i, e in enumerate(cohort)}