{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Custom Dataset Maker for OpenAI GPT-OSS-Safeguard-20B\n", "\n", "This notebook creates custom datasets for fine-tuning the safeguard model by:\n", "- Loading HuggingFaceH4/Multilingual-Thinking base dataset\n", "- Applying custom safety policies to multilingual content\n", "- Scraping and analyzing Fandom wiki content\n", "- Combining and formatting data for model training\n", "\n", "**Base Dataset**: HuggingFaceH4/Multilingual-Thinking (1,000 samples in 5 languages)\n", "**Target Model**: openai/gpt-oss-safeguard-20b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Setup and Dependencies" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install required packages\n", "!pip install -q requests beautifulsoup4 lxml pandas tqdm datasets transformers huggingface_hub" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests\n", "from bs4 import BeautifulSoup\n", "import json\n", "import pandas as pd\n", "from typing import List, Dict, Optional\n", "import re\n", "from tqdm.auto import tqdm\n", "import time\n", "from urllib.parse import urljoin, urlparse\n", "from datetime import datetime\n", "from datasets import load_dataset\n", "import random" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Custom Safety Policy Configuration\n", "\n", "Define custom content moderation categories and severity levels." 
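, "\n\n", "The `threshold` value below acts as a hard cutoff: any text whose highest category score reaches it is labeled unsafe, and everything under it is treated as safe. A minimal sketch of that rule (the 0.7 mirrors the policy definition):\n", "\n", "```python\n", "threshold = 0.7\n", "\n", "def label(score: float) -> str:\n", "    # Scores at or above the threshold are unsafe; everything else is safe\n", "    return 'unsafe' if score >= threshold else 'safe'\n", "\n", "label(0.4)  # 'safe'\n", "label(0.9)  # 'unsafe'\n", "```\n"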
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Custom Safety Policy Definition\n", "SAFETY_POLICY = {\n", " \"version\": \"1.0\",\n", " \"categories\": {\n", " \"violence\": {\n", " \"description\": \"Content depicting violence, gore, or physical harm\",\n", " \"severity_levels\": [\"low\", \"medium\", \"high\", \"extreme\"],\n", " \"keywords\": [\"kill\", \"death\", \"blood\", \"gore\", \"violence\", \"murder\", \"torture\"],\n", " \"enabled\": True\n", " },\n", " \"hate_speech\": {\n", " \"description\": \"Content containing hate speech, discrimination, or harassment\",\n", " \"severity_levels\": [\"low\", \"medium\", \"high\", \"extreme\"],\n", " \"keywords\": [\"hate\", \"racist\", \"discrimination\", \"slur\"],\n", " \"enabled\": True\n", " },\n", " \"sexual_content\": {\n", " \"description\": \"Sexual or adult content\",\n", " \"severity_levels\": [\"suggestive\", \"explicit\", \"extreme\"],\n", " \"keywords\": [\"sexual\", \"explicit\", \"nsfw\", \"adult\"],\n", " \"enabled\": True\n", " },\n", " \"self_harm\": {\n", " \"description\": \"Content related to self-harm or suicide\",\n", " \"severity_levels\": [\"low\", \"medium\", \"high\"],\n", " \"keywords\": [\"suicide\", \"self-harm\", \"cutting\"],\n", " \"enabled\": True\n", " },\n", " \"illegal_activity\": {\n", " \"description\": \"Content describing illegal activities\",\n", " \"severity_levels\": [\"low\", \"medium\", \"high\"],\n", " \"keywords\": [\"illegal\", \"drug\", \"crime\", \"theft\"],\n", " \"enabled\": True\n", " },\n", " \"misinformation\": {\n", " \"description\": \"False or misleading information\",\n", " \"severity_levels\": [\"low\", \"medium\", \"high\"],\n", " \"keywords\": [\"false\", \"fake\", \"hoax\"],\n", " \"enabled\": True\n", " }\n", " },\n", " \"default_action\": \"flag\", # flag, block, warn\n", " \"threshold\": 0.7 # Confidence threshold for classification\n", "}\n", "\n", "print(\"Safety Policy Loaded:\")\n", 
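"# Quick sanity check of the policy dict defined above: count categories whose 'enabled' flag is set\n", "enabled_categories = [name for name, cat in SAFETY_POLICY['categories'].items() if cat['enabled']]\n", "print(f\"Enabled categories: {len(enabled_categories)} of {len(SAFETY_POLICY['categories'])}\")\n",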
"print(f\"Categories: {list(SAFETY_POLICY['categories'].keys())}\")\n", "print(f\"Default Action: {SAFETY_POLICY['default_action']}\")\n", "print(f\"Threshold: {SAFETY_POLICY['threshold']}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Load HuggingFaceH4/Multilingual-Thinking Base Dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load and analyze the base dataset for training gpt-oss-safeguard-20b models." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load the Multilingual-Thinking dataset\n", "print(\"Loading HuggingFaceH4/Multilingual-Thinking dataset...\")\n", "try:\n", " multilingual_dataset = load_dataset(\"HuggingFaceH4/Multilingual-Thinking\", split=\"train\")\n", " print(f\"✅ Dataset loaded successfully!\")\n", " print(f\"Total samples: {len(multilingual_dataset)}\")\n", " print(f\"Available languages: {multilingual_dataset.unique('reasoning_language')}\")\n", " print(f\"Dataset features: {list(multilingual_dataset.features.keys())}\")\n", "except Exception as e:\n", " print(f\"❌ Error loading dataset: {e}\")\n", " print(\"Creating sample data for demonstration...\")\n", " multilingual_dataset = None" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def extract_text_from_messages(messages: List[Dict]) -> str:\n", " \"\"\"Extract and combine text from message structure\"\"\"\n", " texts = []\n", " for message in messages:\n", " role = message.get('role', '')\n", " content = message.get('content', '')\n", " \n", " if role == 'system':\n", " texts.append(f\"System: {content}\")\n", " elif role == 'user':\n", " texts.append(f\"User: {content}\")\n", " elif role == 'assistant':\n", " texts.append(f\"Assistant: {content}\")\n", " \n", " return \" | \".join(texts)\n", "\n", "def prepare_base_dataset_for_analysis(dataset) -> List[Dict]:\n", " \"\"\"Prepare base dataset for safety analysis\"\"\"\n", " if not dataset:\n", " 
return []\n", " \n", " prepared_data = []\n", " \n", " for i, sample in enumerate(dataset):\n", " # Extract text from the message structure\n", " full_text = extract_text_from_messages(sample['messages'])\n", " \n", " # Also get individual components\n", " developer = sample.get('developer', '')\n", " user = sample.get('user', '')\n", " analysis = sample.get('analysis', '')\n", " final = sample.get('final', '')\n", " \n", " entry = {\n", " 'id': f\"base_{i}\",\n", " 'language': sample.get('reasoning_language', 'unknown'),\n", " 'full_conversation': full_text,\n", " 'system_prompt': developer,\n", " 'user_input': user,\n", " 'reasoning': analysis,\n", " 'final_response': final,\n", " 'source': 'multilingual_thinking',\n", " 'has_thinking': bool(analysis),\n", " 'dataset_type': 'base'\n", " }\n", " \n", " prepared_data.append(entry)\n", " \n", " return prepared_data\n", "\n", "# Prepare the base dataset\n", "if multilingual_dataset:\n", " base_dataset = prepare_base_dataset_for_analysis(multilingual_dataset)\n", " print(f\"✅ Prepared {len(base_dataset)} base samples\")\n", " \n", " # Show language distribution\n", " lang_counts = {}\n", " for sample in base_dataset:\n", " lang = sample['language']\n", " lang_counts[lang] = lang_counts.get(lang, 0) + 1\n", " \n", " print(f\"Language distribution:\")\n", " for lang, count in lang_counts.items():\n", " print(f\" {lang}: {count} samples\")\n", " \n", " # Show sample\n", " print(f\"\\nSample base dataset entry:\")\n", " print(f\"Language: {base_dataset[0]['language']}\")\n", " print(f\"System: {base_dataset[0]['system_prompt'][:100]}...\")\n", " print(f\"User: {base_dataset[0]['user_input'][:100]}...\")\n", " print(f\"Has thinking: {base_dataset[0]['has_thinking']}\")\n", "else:\n", " base_dataset = []\n", " print(\"⚠️ Using empty base dataset\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. 
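Preview: Flattened Conversations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A quick look at the flattened format produced by `extract_text_from_messages` above, on a made-up two-turn conversation (hypothetical inputs, not taken from the dataset):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Hypothetical conversation, just to show the flattened 'Role: content | ...' format\n", "demo_messages = [\n", "    {'role': 'user', 'content': 'Bonjour !'},\n", "    {'role': 'assistant', 'content': 'Hello!'}\n", "]\n", "print(extract_text_from_messages(demo_messages))\n", "# -> User: Bonjour ! | Assistant: Hello!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. 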
Enhanced Safety Policy Analyzer for Multilingual Content" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Extended analyzer that can handle multilingual content and the message structure from Multilingual-Thinking dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class MultilingualSafetyAnalyzer:\n", " \"\"\"Enhanced safety analyzer for multilingual content and complex message structures\"\"\"\n", " \n", " def __init__(self, policy: Dict):\n", " self.policy = policy\n", " self.categories = policy['categories']\n", " \n", " def analyze_text_multilingual(self, text: str, language: str = 'en') -> Dict:\n", " \"\"\"Analyze text with language awareness\"\"\"\n", " if not text:\n", " return {'safe': True, 'categories': {}, 'score': 0, 'language': language}\n", " \n", " # Language-specific keyword mappings (simplified examples)\n", " lang_mappings = {\n", " 'english': ['kill', 'death', 'violence', 'murder', 'torture'],\n", " 'french': ['tuer', 'mort', 'violence', 'meurtre', 'torture'],\n", " 'german': ['töten', 'tod', 'gewalt', 'mord', 'folter'],\n", " 'spanish': ['matar', 'muerte', 'violencia', 'asesinato', 'tortura'],\n", " 'italian': ['uccidere', 'morte', 'violenza', 'omicidio', 'tortura']\n", " }\n", " \n", " # Use language-specific keywords if available, otherwise fall back to English\n", " text_lower = text.lower()\n", " results = {\n", " 'safe': True,\n", " 'categories': {},\n", " 'score': 0,\n", " 'language': language,\n", " 'flagged_keywords': []\n", " }\n", " \n", " # Enhanced keyword matching with language support\n", " for category_name, category_info in self.categories.items():\n", " if not category_info['enabled']:\n", " continue\n", " \n", " # Combine default keywords with language-specific ones\n", " keywords = category_info['keywords'].copy()\n", " \n", " # Add language-specific mappings\n", " if language.lower() in lang_mappings:\n", " category_key = list(category_info['keywords'])[0] 
# Use first keyword as category\n", " if category_key in ['kill', 'death', 'blood', 'gore', 'violence', 'murder', 'torture']:\n", " keywords.extend(lang_mappings[language.lower()])\n", " \n", " # Count matches\n", " matches = []\n", " for keyword in keywords:\n", " if keyword.lower() in text_lower:\n", " matches.append(keyword)\n", " \n", " if matches:\n", " severity_score = min(1.0, len(matches) / max(1, len(keywords)))\n", " levels = category_info['severity_levels']\n", " level_index = min(len(levels) - 1, int(severity_score * len(levels)))\n", " severity_level = levels[level_index]\n", " \n", " results['categories'][category_name] = {\n", " 'detected': True,\n", " 'severity': severity_level,\n", " 'score': severity_score,\n", " 'matches': matches\n", " }\n", " \n", " results['flagged_keywords'].extend(matches)\n", " results['score'] = max(results['score'], severity_score)\n", " \n", " results['safe'] = results['score'] < self.policy['threshold']\n", " \n", " return results\n", " \n", " def analyze_message_structure(self, sample: Dict) -> Dict:\n", " \"\"\"Analyze a complete message structure from Multilingual-Thinking dataset\"\"\"\n", " language = sample.get('language', 'unknown')\n", " \n", " # Analyze different components\n", " analyses = {}\n", " \n", " # System prompt analysis\n", " if sample.get('system_prompt'):\n", " analyses['system_prompt'] = self.analyze_text_multilingual(\n", " sample['system_prompt'], language\n", " )\n", " \n", " # User input analysis\n", " if sample.get('user_input'):\n", " analyses['user_input'] = self.analyze_text_multilingual(\n", " sample['user_input'], language\n", " )\n", " \n", " # Reasoning analysis\n", " if sample.get('reasoning'):\n", " analyses['reasoning'] = self.analyze_text_multilingual(\n", " sample['reasoning'], language\n", " )\n", " \n", " # Final response analysis\n", " if sample.get('final_response'):\n", " analyses['final_response'] = self.analyze_text_multilingual(\n", " sample['final_response'], language\n", 
" )\n", " \n", " # Combine all analyses\n", " all_unsafe = any(not analysis['safe'] for analysis in analyses.values())\n", " max_score = max(analysis['score'] for analysis in analyses.values()) if analyses else 0\n", " \n", " return {\n", " 'overall_safe': not all_unsafe,\n", " 'overall_score': max_score,\n", " 'component_analyses': analyses,\n", " 'language': language\n", " }\n", " \n", " def create_training_example(self, text: str, analysis: Dict) -> Dict:\n", " \"\"\"Create a flat training example from a single-text analysis (used for wiki-scraped content)\"\"\"\n", " return {\n", " 'text': text,\n", " 'label': 'unsafe' if not analysis['safe'] else 'safe',\n", " 'score': analysis['score'],\n", " 'categories': analysis['categories'],\n", " 'flagged_keywords': analysis['flagged_keywords'],\n", " 'language': analysis.get('language', 'en')\n", " }\n", " \n", " def create_training_example_from_message(self, sample: Dict, analysis: Dict) -> Dict:\n", " \"\"\"Create training example from a message structure\"\"\"\n", " return {\n", " 'id': sample.get('id'),\n", " 'text': sample.get('full_conversation', ''),\n", " 'label': 'unsafe' if not analysis['overall_safe'] else 'safe',\n", " 'score': analysis['overall_score'],\n", " 'language': analysis['language'],\n", " 'has_thinking': sample.get('has_thinking', False),\n", " 'source': sample.get('source'),\n", " 'component_labels': {\n", " component: 'unsafe' if not comp_analysis['safe'] else 'safe'\n", " for component, comp_analysis in analysis['component_analyses'].items()\n", " },\n", " 'reasoning_available': bool(sample.get('reasoning'))\n", " }\n", "\n", "print(\"Enhanced MultilingualSafetyAnalyzer defined successfully!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. 
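Analyzer Smoke Test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A quick, hypothetical smoke test of the keyword matcher before any scraping. The inputs are invented; whether a text comes out labeled safe depends on the 0.7 threshold, so a text with only a few matches can still be `safe=True` while its keywords are flagged:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Invented sample texts for a smoke test of MultilingualSafetyAnalyzer\n", "smoke_analyzer = MultilingualSafetyAnalyzer(SAFETY_POLICY)\n", "for sample_text in ['A scene full of murder, torture and gore', 'A friendly chat about the weather']:\n", "    result = smoke_analyzer.analyze_text_multilingual(sample_text, 'english')\n", "    print(f\"{sample_text!r} -> safe={result['safe']}, score={result['score']:.2f}, flagged={result['flagged_keywords']}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. 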
Fandom Wiki Scraper" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class FandomWikiScraper:\n", " \"\"\"Scraper for extracting content from Fandom wikis\"\"\"\n", " \n", " def __init__(self, wiki_url: str, rate_limit: float = 1.0):\n", " \"\"\"\n", " Initialize the scraper\n", " \n", " Args:\n", " wiki_url: Base URL of the Fandom wiki\n", " rate_limit: Seconds to wait between requests\n", " \"\"\"\n", " self.wiki_url = wiki_url.rstrip('/')\n", " self.rate_limit = rate_limit\n", " self.session = requests.Session()\n", " self.session.headers.update({\n", " 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'\n", " })\n", " \n", " def get_page_content(self, page_url: str) -> Optional[BeautifulSoup]:\n", " \"\"\"Fetch and parse a wiki page\"\"\"\n", " try:\n", " time.sleep(self.rate_limit)\n", " response = self.session.get(page_url, timeout=10)\n", " response.raise_for_status()\n", " return BeautifulSoup(response.content, 'lxml')\n", " except Exception as e:\n", " print(f\"Error fetching {page_url}: {e}\")\n", " return None\n", " \n", " def extract_character_info(self, soup: BeautifulSoup) -> Dict:\n", " \"\"\"Extract character descriptions from a wiki page\"\"\"\n", " character_data = {\n", " 'name': '',\n", " 'description': '',\n", " 'appearance': '',\n", " 'personality': '',\n", " 'background': ''\n", " }\n", " \n", " # Extract title\n", " title_elem = soup.find('h1', class_='page-header__title')\n", " if title_elem:\n", " character_data['name'] = title_elem.get_text(strip=True)\n", " \n", " # Extract main content\n", " content_div = soup.find('div', class_='mw-parser-output')\n", " if content_div:\n", " # Get first few paragraphs as description\n", " paragraphs = content_div.find_all('p', recursive=False)\n", " character_data['description'] = ' '.join(\n", " [p.get_text(strip=True) for p in paragraphs[:3] if p.get_text(strip=True)]\n", " )\n", " \n", " # Look for specific sections\n", " for 
heading in content_div.find_all(['h2', 'h3']):\n", " heading_text = heading.get_text(strip=True).lower()\n", " next_elem = heading.find_next_sibling()\n", " \n", " if next_elem and next_elem.name == 'p':\n", " text = next_elem.get_text(strip=True)\n", " \n", " if 'appearance' in heading_text:\n", " character_data['appearance'] = text\n", " elif 'personality' in heading_text:\n", " character_data['personality'] = text\n", " elif 'background' in heading_text or 'history' in heading_text:\n", " character_data['background'] = text\n", " \n", " return character_data\n", " \n", " def extract_plot_summary(self, soup: BeautifulSoup) -> str:\n", " \"\"\"Extract plot summary from a wiki page\"\"\"\n", " plot_text = \"\"\n", " content_div = soup.find('div', class_='mw-parser-output')\n", " \n", " if content_div:\n", " # Look for plot/story sections\n", " for heading in content_div.find_all(['h2', 'h3']):\n", " heading_text = heading.get_text(strip=True).lower()\n", " \n", " if any(keyword in heading_text for keyword in ['plot', 'story', 'synopsis', 'overview']):\n", " # Get all paragraphs until next heading\n", " paragraphs = []\n", " for sibling in heading.find_next_siblings():\n", " if sibling.name in ['h2', 'h3']:\n", " break\n", " if sibling.name == 'p':\n", " paragraphs.append(sibling.get_text(strip=True))\n", " \n", " plot_text = ' '.join(paragraphs)\n", " break\n", " \n", " return plot_text\n", " \n", " def extract_dialogue(self, soup: BeautifulSoup) -> List[str]:\n", " \"\"\"Extract dialogue/quotes from a wiki page\"\"\"\n", " dialogues = []\n", " \n", " # Look for quote boxes\n", " quote_boxes = soup.find_all(['blockquote', 'div'], class_=re.compile('quote|dialogue'))\n", " for quote in quote_boxes:\n", " text = quote.get_text(strip=True)\n", " if text:\n", " dialogues.append(text)\n", " \n", " # Look for italic text (often used for dialogue)\n", " content_div = soup.find('div', class_='mw-parser-output')\n", " if content_div:\n", " italic_texts = 
content_div.find_all('i')\n", " for italic in italic_texts:\n", " text = italic.get_text(strip=True)\n", " # Filter for dialogue-like text (contains quotes or is long enough)\n", " if len(text) > 20 and ('\"' in text or len(text.split()) > 5):\n", " dialogues.append(text)\n", " \n", " return list(set(dialogues))[:10] # Return unique dialogues, limit to 10\n", " \n", " def get_all_pages(self, category: str = None) -> List[str]:\n", " \"\"\"Get all page URLs from a wiki category or all pages\"\"\"\n", " pages = []\n", " \n", " # Try to get all pages list\n", " all_pages_url = f\"{self.wiki_url}/wiki/Special:AllPages\"\n", " soup = self.get_page_content(all_pages_url)\n", " \n", " if soup:\n", " links = soup.find_all('a', href=re.compile(r'^/wiki/[^:]+$'))\n", " for link in links:\n", " href = link.get('href')\n", " if href and not any(x in href.lower() for x in ['special:', 'file:', 'category:', 'template:']):\n", " full_url = urljoin(self.wiki_url, href)\n", " if full_url not in pages:\n", " pages.append(full_url)\n", " \n", " return pages[:50] # Limit to 50 pages for safety\n", " \n", " def scrape_wiki(self, max_pages: int = 20) -> List[Dict]:\n", " \"\"\"Scrape the entire wiki and extract all relevant content\"\"\"\n", " all_data = []\n", " \n", " print(f\"Discovering pages from {self.wiki_url}...\")\n", " page_urls = self.get_all_pages()\n", " print(f\"Found {len(page_urls)} pages. 
Processing up to {max_pages}...\")\n", " \n", " for page_url in tqdm(page_urls[:max_pages]):\n", " soup = self.get_page_content(page_url)\n", " if not soup:\n", " continue\n", " \n", " # Extract all types of content\n", " character_info = self.extract_character_info(soup)\n", " plot_summary = self.extract_plot_summary(soup)\n", " dialogues = self.extract_dialogue(soup)\n", " \n", " page_data = {\n", " 'url': page_url,\n", " 'title': character_info['name'],\n", " 'character_description': character_info['description'],\n", " 'appearance': character_info['appearance'],\n", " 'personality': character_info['personality'],\n", " 'background': character_info['background'],\n", " 'plot_summary': plot_summary,\n", " 'dialogues': dialogues,\n", " 'scraped_at': datetime.now().isoformat()\n", " }\n", " \n", " all_data.append(page_data)\n", " \n", " return all_data\n", "\n", "print(\"FandomWikiScraper class defined successfully!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Baseline Safety Policy Analyzer" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class SafetyPolicyAnalyzer:\n", " \"\"\"Analyze content against custom safety policies\"\"\"\n", " \n", " def __init__(self, policy: Dict):\n", " self.policy = policy\n", " self.categories = policy['categories']\n", " \n", " def analyze_text(self, text: str) -> Dict:\n", " \"\"\"\n", " Analyze text for policy violations\n", " \n", " Returns:\n", " Dict with category flags and severity scores\n", " \"\"\"\n", " if not text:\n", " return {'safe': True, 'categories': {}, 'score': 0}\n", " \n", " text_lower = text.lower()\n", " results = {\n", " 'safe': True,\n", " 'categories': {},\n", " 'score': 0,\n", " 'flagged_keywords': []\n", " }\n", " \n", " max_score = 0\n", " \n", " for category_name, category_info in self.categories.items():\n", " if not category_info['enabled']:\n", " continue\n", " \n", " # Count keyword matches\n", " matches = []\n", " for keyword in category_info['keywords']:\n", " if keyword.lower() in text_lower:\n", " matches.append(keyword)\n", " \n", " if matches:\n", " # Calculate severity (0-1 score based on matches)\n", " severity_score = min(1.0, len(matches) / len(category_info['keywords']))\n", " \n", " # Determine severity level\n", " levels = category_info['severity_levels']\n", " level_index = min(len(levels) - 1, int(severity_score * len(levels)))\n", " severity_level = levels[level_index]\n", " \n", " results['categories'][category_name] = {\n", " 'detected': True,\n", " 'severity': severity_level,\n", " 'score': severity_score,\n", " 'matches': matches\n", " }\n", " \n", " results['flagged_keywords'].extend(matches)\n", " max_score = max(max_score, severity_score)\n", " \n", " results['score'] = max_score\n", " results['safe'] = max_score < self.policy['threshold']\n", " \n", " return results\n", " \n", " def create_training_example(self, text: str, analysis: Dict) -> Dict:\n", " \"\"\"\n", " Create a training example for the safeguard model\n", " \n", " Format: {\"text\": str, \"label\": str, \"categories\": dict}\n", " \"\"\"\n", " label = \"unsafe\" if not analysis['safe'] else \"safe\"\n", " \n", " return {\n", " \"text\": text,\n", " \"label\": label,\n", " \"score\": analysis['score'],\n", " \"categories\": analysis['categories'],\n", " \"flagged_keywords\": analysis['flagged_keywords']\n", " }\n", "\n", "print(\"SafetyPolicyAnalyzer class defined successfully!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. 
Enhanced Dataset Builder" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class DatasetBuilder:\n", " \"\"\"Build formatted datasets for safeguard model fine-tuning\"\"\"\n", " \n", " def __init__(self, policy_analyzer: SafetyPolicyAnalyzer):\n", " self.analyzer = policy_analyzer\n", " self.dataset = []\n", " \n", " def process_wiki_data(self, wiki_data: List[Dict]):\n", " \"\"\"Process scraped wiki data into training examples\"\"\"\n", " print(\"Processing wiki data...\")\n", " \n", " for page in tqdm(wiki_data):\n", " # Process character description\n", " if page['character_description']:\n", " analysis = self.analyzer.analyze_text(page['character_description'])\n", " example = self.analyzer.create_training_example(\n", " page['character_description'], analysis\n", " )\n", " example['source'] = 'character_description'\n", " example['source_url'] = page['url']\n", " self.dataset.append(example)\n", " \n", " # Process plot summary\n", " if page['plot_summary']:\n", " analysis = self.analyzer.analyze_text(page['plot_summary'])\n", " example = self.analyzer.create_training_example(\n", " page['plot_summary'], analysis\n", " )\n", " example['source'] = 'plot_summary'\n", " example['source_url'] = page['url']\n", " self.dataset.append(example)\n", " \n", " # Process dialogues\n", " for dialogue in page['dialogues']:\n", " if dialogue:\n", " analysis = self.analyzer.analyze_text(dialogue)\n", " example = self.analyzer.create_training_example(dialogue, analysis)\n", " example['source'] = 'dialogue'\n", " example['source_url'] = page['url']\n", " self.dataset.append(example)\n", " \n", " print(f\"Created {len(self.dataset)} training examples\")\n", " \n", " def export_jsonl(self, output_path: str):\n", " \"\"\"Export dataset as JSONL for fine-tuning\"\"\"\n", " with open(output_path, 'w', encoding='utf-8') as f:\n", " for example in self.dataset:\n", " f.write(json.dumps(example, ensure_ascii=False) + '\\n')\n", " 
print(f\"Dataset exported to {output_path}\")\n", " \n", " def export_csv(self, output_path: str):\n", " \"\"\"Export dataset as CSV\"\"\"\n", " df = pd.DataFrame(self.dataset)\n", " df.to_csv(output_path, index=False)\n", " print(f\"Dataset exported to {output_path}\")\n", " \n", " def get_statistics(self) -> Dict:\n", " \"\"\"Get dataset statistics\"\"\"\n", " stats = {\n", " 'total_examples': len(self.dataset),\n", " 'safe_examples': sum(1 for ex in self.dataset if ex['label'] == 'safe'),\n", " 'unsafe_examples': sum(1 for ex in self.dataset if ex['label'] == 'unsafe'),\n", " 'by_source': {},\n", " 'category_distribution': {}\n", " }\n", " \n", " # Count by source\n", " for example in self.dataset:\n", " source = example.get('source', 'unknown')\n", " stats['by_source'][source] = stats['by_source'].get(source, 0) + 1\n", " \n", " # Count category flags\n", " for category in example.get('categories', {}):\n", " stats['category_distribution'][category] = \\\n", " stats['category_distribution'].get(category, 0) + 1\n", " \n", " return stats\n", "\n", "print(\"DatasetBuilder class defined successfully!\")\n", "\n", "\n", "# Enhanced Dataset Builder for Multilingual + Wiki Data\n", "class EnhancedDatasetBuilder:\n", " \"\"\"Enhanced builder that combines multilingual base data with wiki scraped data\"\"\"\n", " \n", " def __init__(self, policy_analyzer: MultilingualSafetyAnalyzer):\n", " self.analyzer = policy_analyzer\n", " self.dataset = []\n", " \n", " def process_base_dataset(self, base_data: List[Dict]):\n", " \"\"\"Process multilingual base dataset\"\"\"\n", " print(\"Processing multilingual base dataset...\")\n", " \n", " for sample in tqdm(base_data):\n", " # Analyze the message structure\n", " analysis = self.analyzer.analyze_message_structure(sample)\n", " example = self.analyzer.create_training_example_from_message(sample, analysis)\n", " \n", " # Add additional metadata\n", " example['dataset_type'] = 'base'\n", " example['has_thinking'] = 
sample.get('has_thinking', False)\n", " \n", " self.dataset.append(example)\n", " \n", " print(f\"Processed {len(base_data)} base samples\")\n", " \n", " def process_wiki_data(self, wiki_data: List[Dict]):\n", " \"\"\"Process scraped wiki data into training examples\"\"\"\n", " print(\"Processing wiki data...\")\n", " \n", " for page in tqdm(wiki_data):\n", " # Process character description\n", " if page['character_description']:\n", " analysis = self.analyzer.analyze_text_multilingual(page['character_description'])\n", " example = self.analyzer.create_training_example(\n", " page['character_description'], analysis\n", " )\n", " example['source'] = 'character_description'\n", " example['source_url'] = page['url']\n", " example['dataset_type'] = 'wiki'\n", " example['language'] = 'en' # Default to English for wiki content\n", " self.dataset.append(example)\n", " \n", " # Process plot summary\n", " if page['plot_summary']:\n", " analysis = self.analyzer.analyze_text_multilingual(page['plot_summary'])\n", " example = self.analyzer.create_training_example(\n", " page['plot_summary'], analysis\n", " )\n", " example['source'] = 'plot_summary'\n", " example['source_url'] = page['url']\n", " example['dataset_type'] = 'wiki'\n", " example['language'] = 'en'\n", " self.dataset.append(example)\n", " \n", " # Process dialogues\n", " for dialogue in page['dialogues']:\n", " if dialogue:\n", " analysis = self.analyzer.analyze_text_multilingual(dialogue)\n", " example = self.analyzer.create_training_example(dialogue, analysis)\n", " example['source'] = 'dialogue'\n", " example['source_url'] = page['url']\n", " example['dataset_type'] = 'wiki'\n", " example['language'] = 'en'\n", " self.dataset.append(example)\n", " \n", " print(f\"Processed {len(wiki_data)} wiki pages\")\n", " \n", " def combine_datasets(self, base_data: List[Dict], wiki_data: List[Dict]):\n", " \"\"\"Combine both base and wiki datasets\"\"\"\n", " print(\"Combining datasets...\")\n", " \n", " # Process base 
dataset\n", " self.process_base_dataset(base_data)\n", " \n", " # Process wiki dataset\n", " self.process_wiki_data(wiki_data)\n", " \n", " print(f\"Combined dataset contains {len(self.dataset)} total examples\")\n", " \n", " def get_enhanced_statistics(self) -> Dict:\n", " \"\"\"Get comprehensive dataset statistics\"\"\"\n", " stats = {\n", " 'total_examples': len(self.dataset),\n", " 'safe_examples': sum(1 for ex in self.dataset if ex['label'] == 'safe'),\n", " 'unsafe_examples': sum(1 for ex in self.dataset if ex['label'] == 'unsafe'),\n", " 'by_dataset_type': {},\n", " 'by_language': {},\n", " 'by_source': {},\n", " 'category_distribution': {},\n", " 'thinking_distribution': {}\n", " }\n", " \n", " for example in self.dataset:\n", " # Dataset type distribution\n", " dataset_type = example.get('dataset_type', 'unknown')\n", " stats['by_dataset_type'][dataset_type] = stats['by_dataset_type'].get(dataset_type, 0) + 1\n", " \n", " # Language distribution\n", " language = example.get('language', 'unknown')\n", " stats['by_language'][language] = stats['by_language'].get(language, 0) + 1\n", " \n", " # Source distribution\n", " source = example.get('source', 'unknown')\n", " stats['by_source'][source] = stats['by_source'].get(source, 0) + 1\n", " \n", " # Thinking distribution (for base dataset)\n", " if 'has_thinking' in example:\n", " has_thinking = 'with_thinking' if example['has_thinking'] else 'without_thinking'\n", " stats['thinking_distribution'][has_thinking] = \\\n", " stats['thinking_distribution'].get(has_thinking, 0) + 1\n", " \n", " # Category distribution\n", " for category in example.get('categories', {}):\n", " stats['category_distribution'][category] = \\\n", " stats['category_distribution'].get(category, 0) + 1\n", " \n", " return stats\n", " \n", " def export_jsonl(self, output_path: str):\n", " \"\"\"Export dataset as JSONL for fine-tuning\"\"\"\n", " with open(output_path, 'w', encoding='utf-8') as f:\n", " for example in self.dataset:\n", " 
f.write(json.dumps(example, ensure_ascii=False) + '\\n')\n", " print(f\"Dataset exported to {output_path}\")\n", " \n", " def export_csv(self, output_path: str):\n", " \"\"\"Export dataset as CSV\"\"\"\n", " df = pd.DataFrame(self.dataset)\n", " df.to_csv(output_path, index=False)\n", " print(f\"Dataset exported to {output_path}\")\n", "\n", "print(\"EnhancedDatasetBuilder class defined successfully!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Execute Dataset Creation and Analysis" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Configuration\n", "WIKI_URLS = [ # Currently unused; BASE_WIKI below drives the scrape\n", " \"https://fridaynightfunkin.fandom.com/wiki/Broken_Night_Dimensions\",\n", " # Add more wiki URLs here\n", "]\n", "\n", "# For demonstration, we'll scrape the main wiki\n", "BASE_WIKI = \"https://fridaynightfunkin.fandom.com\"\n", "MAX_PAGES = 20 # Adjust as needed\n", "\n", "print(\"Starting dataset creation process...\")\n", "print(f\"Target Wiki: {BASE_WIKI}\")\n", "print(f\"Max Pages: {MAX_PAGES}\")\n", "print(\"Base Dataset: HuggingFaceH4/Multilingual-Thinking\")\n", "print(\"=\"*60)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize enhanced policy analyzer\n", "print(\"\\n1. Initializing Multilingual Safety Analyzer...\")\n", "analyzer = MultilingualSafetyAnalyzer(SAFETY_POLICY)\n", "print(f\"✅ Analyzer initialized with {len(SAFETY_POLICY['categories'])} categories\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize enhanced dataset builder\n", "print(\"\\n2. Initializing Enhanced Dataset Builder...\")\n", "builder = EnhancedDatasetBuilder(analyzer)\n", "print(\"✅ Enhanced builder initialized\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Process base dataset\n", "print(\"\\n3. 
Processing Multilingual Base Dataset...\")\n", "if base_dataset:\n", " builder.process_base_dataset(base_dataset)\n", " print(f\"✅ Base dataset processed: {len(base_dataset)} samples\")\n", "else:\n", " print(\"⚠️ No base dataset available - creating sample data\")\n", " # Create sample base data for demonstration\n", " sample_base_data = [\n", " {\n", " 'id': 'sample_1',\n", " 'language': 'english',\n", " 'full_conversation': 'User: How do I make a bomb? Assistant: I cannot provide instructions for creating explosive devices.',\n", " 'system_prompt': 'You are a helpful assistant.',\n", " 'user_input': 'How do I make a bomb?',\n", " 'reasoning': 'The user is asking for dangerous information that could be used to harm others.',\n", " 'final_response': 'I cannot provide instructions for creating explosive devices.',\n", " 'source': 'multilingual_thinking',\n", " 'has_thinking': True,\n", " 'dataset_type': 'base'\n", " },\n", " {\n", " 'id': 'sample_2',\n", " 'language': 'english',\n", " 'full_conversation': 'User: What is the weather like today? Assistant: I do not have access to real-time weather data.',\n", " 'system_prompt': 'You are a helpful assistant.',\n", " 'user_input': 'What is the weather like today?',\n", " 'reasoning': 'The user is asking for current weather information.',\n", " 'final_response': 'I do not have access to real-time weather data.',\n", " 'source': 'multilingual_thinking',\n", " 'has_thinking': True,\n", " 'dataset_type': 'base'\n", " }\n", " ]\n", " builder.process_base_dataset(sample_base_data)\n", " print(f\"✅ Sample base dataset processed: {len(sample_base_data)} samples\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Scrape wiki (optional - can skip if not needed)\n", "print(\"\\n4. 
Scraping Fandom Wiki Data...\")\n", "wiki_data = []\n", "try:\n", " scraper = FandomWikiScraper(BASE_WIKI, rate_limit=1.5)\n", " wiki_data = scraper.scrape_wiki(max_pages=MAX_PAGES)\n", " print(f\"✅ Wiki data scraped: {len(wiki_data)} pages\")\n", " \n", " # Process wiki data\n", " if wiki_data:\n", " builder.process_wiki_data(wiki_data)\n", " print(f\"✅ Wiki data processed\")\n", " else:\n", " print(\"⚠️ No wiki data to process\")\n", " \nexcept Exception as e:\n", " print(f\"❌ Error scraping wiki: {e}\")\n", " print(\"Continuing with base dataset only...\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get comprehensive statistics\n", "print(\"\\n\" + \"=\"*60)\n", "print(\"COMBINED DATASET STATISTICS\")\n", "print(\"=\"*60)\n", "\n", "stats = builder.get_enhanced_statistics()\n", "\n", "print(f\"📊 Total Examples: {stats['total_examples']}\")\n", "print(f\"✅ Safe Examples: {stats['safe_examples']}\")\n", "print(f\"🚫 Unsafe Examples: {stats['unsafe_examples']}\")\n", "print(f\"📊 Safety Rate: {(stats['safe_examples']/max(1,stats['total_examples']))*100:.1f}%\")\n", "\n", "print(f\"\\n📈 By Dataset Type:\")\n", "for dataset_type, count in stats['by_dataset_type'].items():\n", " print(f\" {dataset_type}: {count}\")\n", "\n", "if stats['by_language']:\n", " print(f\"\\n🌍 By Language:\")\n", " for language, count in stats['by_language'].items():\n", " print(f\" {language}: {count}\")\n", "\n", "if stats['thinking_distribution']:\n", " print(f\"\\n🧠 Thinking Distribution:\")\n", " for thinking_type, count in stats['thinking_distribution'].items():\n", " print(f\" {thinking_type}: {count}\")\n", "\n", "if stats['by_source']:\n", " print(f\"\\n📝 By Source:\")\n", " for source, count in stats['by_source'].items():\n", " print(f\" {source}: {count}\")\n", "\n", "if stats['category_distribution']:\n", " print(f\"\\n⚠️ Category Distribution:\")\n", " for category, count in stats['category_distribution'].items():\n", " 
print(f\" {category}: {count}\")\n", "\n", "print(\"=\"*60)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize scraper\n", "scraper = FandomWikiScraper(BASE_WIKI, rate_limit=1.5)\n", "\n", "# Scrape wiki\n", "wiki_data = scraper.scrape_wiki(max_pages=MAX_PAGES)\n", "\n", "print(f\"\\nScraped {len(wiki_data)} pages successfully!\")\n", "print(f\"Sample data from first page:\")\n", "if wiki_data:\n", " print(json.dumps(wiki_data[0], indent=2)[:500] + \"...\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Initialize policy analyzer\n", "analyzer = SafetyPolicyAnalyzer(SAFETY_POLICY)\n", "\n", "# Build dataset\n", "builder = DatasetBuilder(analyzer)\n", "builder.process_wiki_data(wiki_data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get and display statistics\n", "stats = builder.get_statistics()\n", "\n", "print(\"\\n\" + \"=\"*50)\n", "print(\"DATASET STATISTICS\")\n", "print(\"=\"*50)\n", "print(f\"Total Examples: {stats['total_examples']}\")\n", "print(f\"Safe Examples: {stats['safe_examples']}\")\n", "print(f\"Unsafe Examples: {stats['unsafe_examples']}\")\n", "print(f\"\\nBy Source:\")\n", "for source, count in stats['by_source'].items():\n", " print(f\" {source}: {count}\")\n", "print(f\"\\nCategory Distribution:\")\n", "for category, count in stats['category_distribution'].items():\n", " print(f\" {category}: {count}\")\n", "print(\"=\"*50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. 
Export Combined Dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Export as JSONL (recommended for fine-tuning)\n", "builder.export_jsonl('safeguard_dataset.jsonl')\n", "\n", "# Export as CSV (for analysis)\n", "builder.export_csv('safeguard_dataset.csv')\n", "\n", "print(\"\\nDataset files created:\")\n", "print(\" - safeguard_dataset.jsonl (for fine-tuning)\")\n", "print(\" - safeguard_dataset.csv (for analysis)\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9. Sample Dataset Preview" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Display sample examples\n", "print(\"Sample Training Examples:\\n\")\n", "for i, example in enumerate(builder.dataset[:5], 1):\n", " print(f\"Example {i}:\")\n", " print(f\" Text: {example['text'][:100]}...\")\n", " print(f\" Label: {example['label']}\")\n", " print(f\" Score: {example['score']:.3f}\")\n", " print(f\" Source: {example['source']}\")\n", " if example['categories']:\n", " print(f\" Flagged Categories: {list(example['categories'].keys())}\")\n", " print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 10. 
Additional Utilities" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Function to add custom examples\n", "def add_custom_example(text: str, manual_label: Optional[str] = None):\n", " \"\"\"\n", " Add a custom text example to the dataset\n", " \n", " Args:\n", " text: The text content\n", " manual_label: Optional manual label override (\"safe\" or \"unsafe\")\n", " \"\"\"\n", " analysis = analyzer.analyze_text(text)\n", " example = analyzer.create_training_example(text, analysis)\n", " \n", " if manual_label:\n", " example['label'] = manual_label\n", " example['manual_override'] = True\n", " \n", " example['source'] = 'custom'\n", " builder.dataset.append(example)\n", " print(f\"Added custom example with label: {example['label']}\")\n", "\n", "# Example usage:\n", "# add_custom_example(\"This is a safe example text.\", manual_label=\"safe\")\n", "print(\"Custom example function ready!\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Filter dataset by label\n", "def filter_by_label(label: str) -> List[Dict]:\n", " \"\"\"Filter dataset by label (safe/unsafe)\"\"\"\n", " return [ex for ex in builder.dataset if ex['label'] == label]\n", "\n", "# Filter by category\n", "def filter_by_category(category: str) -> List[Dict]:\n", " \"\"\"Filter dataset by safety category\"\"\"\n", " return [ex for ex in builder.dataset if category in ex.get('categories', {})]\n", "\n", "print(\"Filter functions ready!\")\n", "print(f\" - filter_by_label('safe') -> {len(filter_by_label('safe'))} examples\")\n", "print(f\" - filter_by_label('unsafe') -> {len(filter_by_label('unsafe'))} examples\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 11. 
Fine-tuning Format Preparation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Prepare dataset in the format expected by safeguard models\n", "def prepare_for_finetuning(output_path: str = 'safeguard_finetuning.jsonl'):\n", " \"\"\"\n", " Prepare dataset in the format for fine-tuning GPT-OSS-Safeguard-20B\n", " \n", " Expected format:\n", " {\n", " \"prompt\": \"Classify the following text for safety according to the policy: <text>\",\n", " \"completion\": \"safe\" or \"unsafe\",\n", " \"metadata\": {...}\n", " }\n", " \"\"\"\n", " finetuning_data = []\n", " \n", " for example in builder.dataset:\n", " formatted_example = {\n", " \"prompt\": f\"Classify the following text for safety according to the policy: {example['text']}\",\n", " \"completion\": example['label'],\n", " \"metadata\": {\n", " \"score\": example['score'],\n", " \"categories\": list(example.get('categories', {}).keys()),\n", " \"source\": example.get('source', 'unknown')\n", " }\n", " }\n", " finetuning_data.append(formatted_example)\n", " \n", " # Export\n", " with open(output_path, 'w', encoding='utf-8') as f:\n", " for item in finetuning_data:\n", " f.write(json.dumps(item, ensure_ascii=False) + '\\n')\n", " \n", " print(f\"Fine-tuning dataset saved to {output_path}\")\n", " print(f\"Total examples: {len(finetuning_data)}\")\n", " return finetuning_data\n", "\n", "# Prepare the dataset\n", "finetuning_dataset = prepare_for_finetuning()\n", "\n", "# Display sample\n", "print(\"\\nSample fine-tuning example:\")\n", "print(json.dumps(finetuning_dataset[0], indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 12. 
Download Files to Local Machine" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Trigger browser downloads of the exported files (Google Colab only;\n", "# skip this cell elsewhere - the files are already on disk)\n", "from google.colab import files\n", "\n", "print(\"Download dataset files:\")\n", "print(\"\\nDownloading safeguard_dataset.jsonl...\")\n", "files.download('safeguard_dataset.jsonl')\n", "\n", "print(\"\\nDownloading safeguard_dataset.csv...\")\n", "files.download('safeguard_dataset.csv')\n", "\n", "print(\"\\nDownloading safeguard_finetuning.jsonl...\")\n", "files.download('safeguard_finetuning.jsonl')\n", "\n", "print(\"\\nAll files ready for download!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 13. Summary and Usage Guide" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 🎯 What We Built\n", "\n", "This enhanced dataset maker creates custom training data for the **openai/gpt-oss-safeguard-20b** model by:\n", "\n", "1. **Base Dataset Integration**: Uses **HuggingFaceH4/Multilingual-Thinking** (1,000 samples in 5 languages) as the foundation\n", "2. **Multilingual Safety Analysis**: Applies custom safety policies to multilingual content (English, French, German, Spanish, Italian)\n", "3. **Wiki Data Enhancement**: Scrapes and analyzes Fandom wiki content for additional training examples\n", "4. 
**Advanced Message Processing**: Handles complex conversation structures with reasoning (chain-of-thought) and final responses\n", "\n", "### 🔧 Key Features\n", "\n", "- **Multilingual Support**: Automatic keyword detection in 5 languages\n", "- **Message Structure Analysis**: Processes system prompts, user input, reasoning, and final responses\n", "- **Enhanced Safety Categories**: 6 configurable safety categories with severity levels\n", "- **Combined Datasets**: Merges base and wiki data for comprehensive training\n", "- **Multiple Export Formats**: JSONL for training, CSV for analysis, and a prompt/completion format for gpt-oss models\n", "\n", "### 📊 Dataset Output\n", "\n", "Generated training examples include:\n", "- `label`: \"safe\" or \"unsafe\"\n", "- `score`: Safety confidence score (0.0-1.0)\n", "- `language`: Content language\n", "- `dataset_type`: \"base\" or \"wiki\"\n", "- `has_thinking`: Whether reasoning process is available\n", "- `categories`: Detected safety violations with severity\n", "\n", "### 🚀 Ready for Fine-tuning\n", "\n", "The final `safeguard_finetuning.jsonl` contains examples in the prompt/completion format this notebook prepares for fine-tuning the gpt-oss-safeguard-20b model (the `or` alternatives below are illustrative, not literal JSON):\n", "```json\n", "{\n", " \"prompt\": \"Classify the following text for safety according to the policy: <text>\",\n", " \"completion\": \"safe\" or \"unsafe\",\n", " \"metadata\": {\n", " \"score\": 0.0-1.0,\n", " \"categories\": [\"category1\", \"category2\"],\n", " \"source\": \"multilingual_thinking\" or \"wiki\"\n", " }\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 14. 
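Validating the Fine-tuning JSONL" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before fine-tuning, it is worth sanity-checking the exported records and holding out a small evaluation split. The cell below is a minimal sketch: `validate_records` and `train_eval_split` are hypothetical helpers (not part of the builder classes above), demonstrated on inline sample records. In practice, read one JSON object per line from `safeguard_finetuning.jsonl` and pass that list instead." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Hypothetical sanity checks for the exported fine-tuning records\n", "import random\n", "\n", "REQUIRED_KEYS = {'prompt', 'completion', 'metadata'}\n", "VALID_LABELS = {'safe', 'unsafe'}\n", "\n", "def validate_records(records):\n", "    \"\"\"Split records into (valid, errors) by checking keys and labels.\"\"\"\n", "    valid, errors = [], []\n", "    for i, rec in enumerate(records):\n", "        missing = REQUIRED_KEYS - rec.keys()\n", "        if missing:\n", "            errors.append(f'record {i}: missing keys {sorted(missing)}')\n", "        elif rec['completion'] not in VALID_LABELS:\n", "            errors.append(f'record {i}: unexpected label {rec[\"completion\"]}')\n", "        else:\n", "            valid.append(rec)\n", "    return valid, errors\n", "\n", "def train_eval_split(records, eval_fraction=0.1, seed=42):\n", "    \"\"\"Deterministically shuffle records and split into train/eval lists.\"\"\"\n", "    shuffled = records[:]\n", "    random.Random(seed).shuffle(shuffled)\n", "    n_eval = max(1, int(len(shuffled) * eval_fraction))\n", "    return shuffled[n_eval:], shuffled[:n_eval]\n", "\n", "# Demonstration on inline samples; replace with records loaded from\n", "# safeguard_finetuning.jsonl when running the full pipeline\n", "sample = [\n", "    {'prompt': 'Classify: hello', 'completion': 'safe', 'metadata': {}},\n", "    {'prompt': 'Classify: threat', 'completion': 'unsafe', 'metadata': {}},\n", "    {'prompt': 'Classify: ???', 'completion': 'maybe', 'metadata': {}},\n", "]\n", "valid, errors = validate_records(sample)\n", "train, eval_set = train_eval_split(valid, eval_fraction=0.5)\n", "print(f'{len(valid)} valid, {len(errors)} flagged; train={len(train)}, eval={len(eval_set)}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 15. 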
Configuration Options" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### To scrape additional wikis:\n", "```python\n", "additional_wikis = [\n", " \"https://another-wiki.fandom.com\",\n", " \"https://yet-another-wiki.fandom.com\"\n", "]\n", "\n", "for wiki_url in additional_wikis:\n", " scraper = FandomWikiScraper(wiki_url)\n", " data = scraper.scrape_wiki(max_pages=20)\n", " builder.process_wiki_data(data)\n", "```\n", "\n", "### To customize the safety policy:\n", "Modify the `SAFETY_POLICY` dictionary in Section 2 to add/remove categories or adjust thresholds.\n", "\n", "### To add custom training examples:\n", "```python\n", "add_custom_example(\"Your custom text here\", manual_label=\"safe\")\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.0" } }, "nbformat": 4, "nbformat_minor": 4 }