Add hyper3 checkpoints (ckpt100-350) with eval results

Files changed (14)

README.md +71 -0
hyper3/ckpt100/adapter_config.json +39 -0
hyper3/ckpt100/adapter_model.safetensors +3 -0
hyper3/ckpt150/adapter_config.json +39 -0
hyper3/ckpt150/adapter_model.safetensors +3 -0
hyper3/ckpt200/adapter_config.json +39 -0
hyper3/ckpt200/adapter_model.safetensors +3 -0
hyper3/ckpt200/training_state.pt +3 -0
hyper3/ckpt250/adapter_config.json +39 -0
hyper3/ckpt250/adapter_model.safetensors +3 -0
hyper3/ckpt300/adapter_config.json +39 -0
hyper3/ckpt300/adapter_model.safetensors +3 -0
hyper3/ckpt350/adapter_config.json +39 -0
hyper3/ckpt350/adapter_model.safetensors +3 -0

README.md ADDED

	@@ -0,0 +1,71 @@

+---
+license: apache-2.0
+base_model: Qwen/Qwen3-VL-Embedding-2B
+tags:
+  - lora
+  - embedding
+  - retrieval
+  - screenshot
+  - wikipedia
+---
+# Wiki Screenshot Embedding LoRA Checkpoints
+LoRA adapter checkpoints for **Qwen3-VL-Embedding-2B**, fine-tuned on Wikipedia screenshot tiles for visual document retrieval.
+## Training Configs
+### hyper3 (v8_i_warmup50_lr7e6_hardswitch_350)
+- **Base model**: `Qwen/Qwen3-VL-Embedding-2B`
+- **Data**: natural_filtered_v2 (104k pairs, 2 hard negatives)
+- **LoRA**: rank 32, alpha 32, dropout 0.05, applied to `q_proj`, `k_proj`, `v_proj`, `o_proj`
+- **LR**: 7e-6, cosine schedule, warmup 50 steps
+- **Batch size**: 256 effective
+- **Max steps**: 350 (hard switch)
+- **Visual tokens**: 4096
+#### Eval Results
+| Step | v6 QA (200q) | v6 vs base | v8 QA (400q) | v8 vs base |
+|------|-------------|------------|-------------|------------|
+| base | 0.645 | — | 0.708 | — |
+| [100](tree/main/hyper3/ckpt100) | 0.715 | +0.070 | 0.745 | +0.038 |
+| [150](tree/main/hyper3/ckpt150) | 0.710 | +0.065 | 0.748 | +0.040 |
+| [200](tree/main/hyper3/ckpt200) | 0.725 | +0.080 | 0.753 | +0.045 |
+| [**250**](tree/main/hyper3/ckpt250) | 0.715 | +0.070 | **0.770** | **+0.063** |
+| [300](tree/main/hyper3/ckpt300) | 0.715 | +0.070 | 0.763 | +0.055 |
+| [350](tree/main/hyper3/ckpt350) | 0.715 | +0.070 | 0.763 | +0.055 |
+**Best: ckpt250** (v8 QA = 0.770, +0.063 absolute over base, or 6.3 percentage points; v6 peaks at ckpt200 with 0.725)
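
Review note: the "vs base" columns are absolute differences in QA score, so the headline gain reads as percentage points rather than a relative improvement. A minimal sketch of both computations, using the rounded values from the table above (the helper name `gains` is illustrative and not part of this repo):

```python
# Rounded v8 scores copied from the eval table in the README diff.
BASE_V8 = 0.708
V8_SCORES = {100: 0.745, 150: 0.748, 200: 0.753, 250: 0.770, 300: 0.763, 350: 0.763}


def gains(score: float, base: float) -> tuple[float, float]:
    """Return (absolute gain, relative gain) of score over base."""
    absolute = score - base
    return absolute, absolute / base


# Pick the step with the highest v8 score and report both kinds of gain.
best_step = max(V8_SCORES, key=V8_SCORES.get)
abs_gain, rel_gain = gains(V8_SCORES[best_step], BASE_V8)
print(f"best step: {best_step}")
print(f"absolute gain: {abs_gain:+.3f}")
print(f"relative gain: {rel_gain:+.1%}")
```

Because the inputs here are already rounded to three decimals, the absolute gain can differ from the table's value by 0.001; the table was presumably computed from unrounded scores.
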
+#### Checkpoints
+| Name | Path | Size |
+|------|------|------|
+| ckpt100 | [hyper3/ckpt100](tree/main/hyper3/ckpt100) | ~50 MB |
+| ckpt150 | [hyper3/ckpt150](tree/main/hyper3/ckpt150) | ~50 MB |
+| ckpt200 | [hyper3/ckpt200](tree/main/hyper3/ckpt200) | ~50 MB adapter, plus ~4.4 GB `training_state.pt` |
+| ckpt250 | [hyper3/ckpt250](tree/main/hyper3/ckpt250) | ~50 MB |
+| ckpt300 | [hyper3/ckpt300](tree/main/hyper3/ckpt300) | ~50 MB |
+| ckpt350 | [hyper3/ckpt350](tree/main/hyper3/ckpt350) | ~50 MB |
+## Usage
+```python
+from peft import PeftModel
+from transformers import AutoModel
+# Load the base embedding model, then attach the LoRA adapter from the chosen checkpoint folder.
+base = AutoModel.from_pretrained("Qwen/Qwen3-VL-Embedding-2B")
+model = PeftModel.from_pretrained(
+    base,
+    "Chrisyichuan/wiki-screenshot-embedding-lora",
+    subfolder="hyper3/ckpt250",  # best checkpoint on the v8 benchmark
+)
+```
+## Eval Benchmarks
+- **v6**: 200 queries, 5291 tiles (hard-mini-v6)
+- **v8**: 400 queries, 7426 tiles (hard-mini-v8, preferred benchmark)
+- **QA score**: retrieval top-3 → VQA with Qwen3-VL-4B → GPT-4.1 grading
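
Review note: the first stage of the QA score is a top-3 retrieval over tile embeddings. The sketch below illustrates that step only, on toy vectors. It assumes cosine similarity as the ranking function, which the README does not state, and the names `cosine` and `top_k` are hypothetical. The VQA and grading stages are not modelled.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query: list[float], tiles: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the ids of the k tiles most similar to the query, best first."""
    ranked = sorted(tiles, key=lambda tid: cosine(query, tiles[tid]), reverse=True)
    return ranked[:k]


# Toy 2-d embeddings standing in for real model outputs (assumption).
tiles = {
    "tile_a": [1.0, 0.0],
    "tile_b": [0.8, 0.6],
    "tile_c": [0.0, 1.0],
    "tile_d": [-1.0, 0.0],
}
retrieved = top_k([0.9, 0.1], tiles, k=3)
print(retrieved)
```

In the real pipeline the three retrieved tiles would then be passed to the VQA model, and its answer graded.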

hyper3/ckpt100/adapter_config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "/home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen3-VL-Embedding-2B/snapshots/2a50926d213628c727f38025982a76f655673f54",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "q_proj",
+    "v_proj",
+    "o_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "FEATURE_EXTRACTION",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
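
Review note: with `r` and `lora_alpha` both 32 and `use_rslora` false, the low-rank update is applied with a scaling factor of 1. A small sketch of how that factor follows from the config fields, assuming the usual LoRA convention (`lora_alpha / r`, or `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled). The function name `lora_scaling` is illustrative.

```python
import json
import math

# Subset of the fields from the adapter_config.json shown above.
config = json.loads('{"r": 32, "lora_alpha": 32, "use_rslora": false}')


def lora_scaling(cfg: dict) -> float:
    """Scaling factor applied to the low-rank update B @ A."""
    if cfg["use_rslora"]:
        # Rank-stabilized variant divides by sqrt(r) instead of r.
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    return cfg["lora_alpha"] / cfg["r"]


print(lora_scaling(config))
```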

hyper3/ckpt100/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:940f9e8f1378c70931f13ffb2d0f17992576c149346662f1ce3c4c93b3982d7c
+size 51412344
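
Review note: the pointer size can be checked for plausibility against the LoRA config. The sketch below counts the adapter parameters for rank 32 on the four attention projections. The layer dimensions are assumptions about the base model's language tower and are not stated anywhere in this commit, as are float32 storage and the adapter touching only those language-model layers.

```python
# ASSUMED dimensions of the base model's language tower (not stated in this commit).
HIDDEN = 2048     # assumed hidden size
Q_OUT = 16 * 128  # assumed query heads x head dim
KV_OUT = 8 * 128  # assumed key/value heads x head dim
LAYERS = 28       # assumed number of decoder layers

RANK = 32             # "r" in adapter_config.json
BYTES_PER_PARAM = 4   # assumes float32 storage
FILE_SIZE = 51412344  # "size" in the LFS pointer above

# (in_features, out_features) for each targeted projection.
shapes = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
}

# Each adapted projection adds A (r x in) and B (out x r).
per_layer = sum(RANK * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
total_params = per_layer * LAYERS
tensor_bytes = total_params * BYTES_PER_PARAM

print(f"LoRA parameters: {total_params:,}")
print(f"tensor bytes:    {tensor_bytes:,}")
print(f"left for header: {FILE_SIZE - tensor_bytes:,}")
```

Under these assumptions the tensors account for all but a few tens of kilobytes of the file, which is consistent with a safetensors header. Treat this as a plausibility check, not a verification of the file contents.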

hyper3/ckpt150/adapter_config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "/home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen3-VL-Embedding-2B/snapshots/2a50926d213628c727f38025982a76f655673f54",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "q_proj",
+    "v_proj",
+    "o_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "FEATURE_EXTRACTION",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

hyper3/ckpt150/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bf78fd9a7eb24c9309f5d9e3f47ed24564e2de40ed55fefe7c98432a48dd8c43
+size 51412344

hyper3/ckpt200/adapter_config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "/home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen3-VL-Embedding-2B/snapshots/2a50926d213628c727f38025982a76f655673f54",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "q_proj",
+    "v_proj",
+    "o_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "FEATURE_EXTRACTION",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

hyper3/ckpt200/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4c5269da771377fe97ab7baa38032b18187023f7723ef1bc935dc26ae7c2c7bf
+size 51412344

hyper3/ckpt200/training_state.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4a637b5b9399395c5201e340f7582320fe3649a95c3d0a31f1a897067aeb7b8c
+size 4409879731
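
Review note: at roughly 4.4 GB this is by far the largest file in the commit, so it may be worth checking a download against the pointer. A minimal sketch that parses a Git LFS pointer and compares a local file's size and SHA-256 with it. The function names are hypothetical, and the demo uses a small temporary file so it runs without fetching anything.

```python
import hashlib
import os
import tempfile


def parse_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer: dict, chunk: int = 1 << 20) -> bool:
    """Check a local file's size and hash against a parsed pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Hash in chunks so multi-gigabyte files are not read into memory at once.
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest() == pointer["digest"]


# Demo on a small temporary file with a pointer built to match it.
payload = b"demo bytes"
demo_pointer = parse_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(payload).hexdigest()}\n"
    f"size {len(payload)}\n"
)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
ok = matches_pointer(tmp.name, demo_pointer)
os.remove(tmp.name)
print(ok)
```

For the real file, the pointer text above would be passed to `parse_pointer` and the downloaded `training_state.pt` path to `matches_pointer`.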

hyper3/ckpt250/adapter_config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "/home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen3-VL-Embedding-2B/snapshots/2a50926d213628c727f38025982a76f655673f54",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "q_proj",
+    "v_proj",
+    "o_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "FEATURE_EXTRACTION",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

hyper3/ckpt250/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8bdf12b2253bac97b783b1b6cb3e38d99e97f13f6201ffeacff854a118f6e0da
+size 51412344

hyper3/ckpt300/adapter_config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "/home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen3-VL-Embedding-2B/snapshots/2a50926d213628c727f38025982a76f655673f54",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "q_proj",
+    "v_proj",
+    "o_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "FEATURE_EXTRACTION",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

hyper3/ckpt300/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ffedf72f8a28f1181e10280aea2eb900afad4ea857b29b13d2463256a156f6fa
+size 51412344

hyper3/ckpt350/adapter_config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "/home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen3-VL-Embedding-2B/snapshots/2a50926d213628c727f38025982a76f655673f54",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "q_proj",
+    "v_proj",
+    "o_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "FEATURE_EXTRACTION",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

hyper3/ckpt350/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3e9336025da3927bdc440ce8b33052ab15a7323aab18010398ef634527e87a13
+size 51412344