Chrisyichuan committed on
Commit 40b6b92 · verified · 1 Parent(s): c2d6cd7

Add hyper3 checkpoints (ckpt100-350) with eval results

README.md ADDED
@@ -0,0 +1,71 @@
+ ---
+ license: apache-2.0
+ base_model: Qwen/Qwen3-VL-Embedding-2B
+ tags:
+ - lora
+ - embedding
+ - retrieval
+ - screenshot
+ - wikipedia
+ ---
+
+ # Wiki Screenshot Embedding LoRA Checkpoints
+
+ LoRA adapter checkpoints for **Qwen3-VL-Embedding-2B**, fine-tuned on Wikipedia screenshot tiles for visual document retrieval.
+
+ ## Training Configs
+
+ ### hyper3 (v8_i_warmup50_lr7e6_hardswitch_350)
+
+ - **Base model**: `Qwen/Qwen3-VL-Embedding-2B`
+ - **Data**: natural_filtered_v2 (104k pairs, 2 hard negatives)
+ - **LoRA**: rank 32, alpha 32, dropout 0.05 on `q_proj`, `k_proj`, `v_proj`, `o_proj`
+ - **LR**: 7e-6, cosine schedule, warmup 50 steps
+ - **Batch size**: 256 effective
+ - **Max steps**: 350 (hard switch)
+ - **Visual tokens**: 4096
+
+ #### Eval Results
+
+ | Step | v6 QA (200q) | v6 vs base | v8 QA (400q) | v8 vs base |
+ |------|-------------|------------|-------------|------------|
+ | base | 0.645 | n/a | 0.708 | n/a |
+ | [100](tree/main/hyper3/ckpt100) | 0.715 | +0.070 | 0.745 | +0.038 |
+ | [150](tree/main/hyper3/ckpt150) | 0.710 | +0.065 | 0.748 | +0.040 |
+ | [200](tree/main/hyper3/ckpt200) | 0.725 | +0.080 | 0.753 | +0.045 |
+ | [**250**](tree/main/hyper3/ckpt250) | 0.715 | +0.070 | **0.770** | **+0.063** |
+ | [300](tree/main/hyper3/ckpt300) | 0.715 | +0.070 | 0.763 | +0.055 |
+ | [350](tree/main/hyper3/ckpt350) | 0.715 | +0.070 | 0.763 | +0.055 |
+
+ **Best: ckpt250** (v8 QA = 0.770, +0.063 absolute over base)
+
+ #### Checkpoints
+
+ | Name | Path | Size |
+ |------|------|------|
+ | ckpt100 | [hyper3/ckpt100](tree/main/hyper3/ckpt100) | ~50 MB |
+ | ckpt150 | [hyper3/ckpt150](tree/main/hyper3/ckpt150) | ~50 MB |
+ | ckpt200 | [hyper3/ckpt200](tree/main/hyper3/ckpt200) | ~50 MB (plus a ~4.4 GB `training_state.pt`) |
+ | ckpt250 | [hyper3/ckpt250](tree/main/hyper3/ckpt250) | ~50 MB |
+ | ckpt300 | [hyper3/ckpt300](tree/main/hyper3/ckpt300) | ~50 MB |
+ | ckpt350 | [hyper3/ckpt350](tree/main/hyper3/ckpt350) | ~50 MB |
+
+ ## Usage
+
+ ```python
+ from peft import PeftModel
+ from transformers import AutoModel
+
+ base = AutoModel.from_pretrained("Qwen/Qwen3-VL-Embedding-2B")  # frozen base model
+ model = PeftModel.from_pretrained(
+     base,
+     "Chrisyichuan/wiki-screenshot-embedding-lora",
+     subfolder="hyper3/ckpt250",  # best checkpoint on the v8 benchmark
+ )
+ ```
+
+ ## Eval Benchmarks
+
+ - **v6**: 200 queries, 5291 tiles (hard-mini-v6)
+ - **v8**: 400 queries, 7426 tiles (hard-mini-v8, preferred benchmark)
+ - **QA score**: retrieve the top-3 tiles, answer each query from them with Qwen3-VL-4B (VQA), then grade the answer with GPT-4.1
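The "vs base" columns in the README's eval table are absolute score differences, which is easy to confuse with a relative improvement. The sketch below is not part of the commit; it recomputes both quantities from the rounded v8 scores in the table. The helper names `best_checkpoint` and `gains` are made up for illustration.

```python
# Scores copied from the README's eval table (v8 benchmark, 400 queries).
V8_BASE = 0.708
V8_SCORES = {100: 0.745, 150: 0.748, 200: 0.753, 250: 0.770, 300: 0.763, 350: 0.763}


def best_checkpoint(scores):
    """Return the step with the highest score (the earliest step wins ties)."""
    return max(sorted(scores), key=lambda step: scores[step])


def gains(score, base):
    """Return (absolute gain, relative gain) of `score` over `base`."""
    absolute = score - base
    return absolute, absolute / base


best = best_checkpoint(V8_SCORES)
abs_gain, rel_gain = gains(V8_SCORES[best], V8_BASE)
print(f"best step: {best}")
print(f"absolute gain: {abs_gain:+.3f}")
print(f"relative gain: {rel_gain:+.1%}")
```

Computed from the rounded table values, the absolute gain comes out at 0.062 rather than the listed +0.063. A plausible explanation, not confirmed by the source, is that the table's deltas were taken from unrounded scores.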
hyper3/ckpt100/adapter_config.json ADDED
@@ -0,0 +1,39 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "/home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen3-VL-Embedding-2B/snapshots/2a50926d213628c727f38025982a76f655673f54",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "qalora_group_size": 16,
+   "r": 32,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "k_proj",
+     "q_proj",
+     "v_proj",
+     "o_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "FEATURE_EXTRACTION",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
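For readers unfamiliar with these fields, here is a minimal sketch of how `r`, `lora_alpha`, and `use_rslora` combine into the scale applied to the low-rank update. It uses only a subset of the config above. The helper names and the 2048-wide layer in the usage line are illustrative assumptions, not values taken from this repository.

```python
import json
import math

# Subset of the adapter_config.json fields shown above.
ADAPTER_CONFIG = json.loads("""
{
  "r": 32,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "use_rslora": false,
  "target_modules": ["k_proj", "q_proj", "v_proj", "o_proj"]
}
""")


def lora_scaling(config):
    """Scale applied to the low-rank update B @ A."""
    r = config["r"]
    alpha = config["lora_alpha"]
    if config.get("use_rslora", False):
        return alpha / math.sqrt(r)  # rank-stabilized LoRA
    return alpha / r  # standard LoRA


def lora_params_per_layer(r, in_features, out_features):
    """Trainable parameters LoRA adds to one linear layer: A is r x in, B is out x r."""
    return r * (in_features + out_features)


print("scaling:", lora_scaling(ADAPTER_CONFIG))
# 2048 is a placeholder width for illustration, not read from the model config.
print("params for a 2048x2048 layer:", lora_params_per_layer(ADAPTER_CONFIG["r"], 2048, 2048))
```

With rank and alpha both at 32 and `use_rslora` off, the scale is 1.0, so the adapter's update is added to the frozen projection weights without further rescaling.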
hyper3/ckpt100/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:940f9e8f1378c70931f13ffb2d0f17992576c149346662f1ce3c4c93b3982d7c
+ size 51412344
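The three lines above are a Git LFS pointer: the repository stores only the SHA-256 digest and byte size of the real weights file. As a rough sketch, a downloaded `adapter_model.safetensors` could be checked against such a pointer as follows. The function names are hypothetical and the code assumes the `sha256` oid type used in this commit.

```python
import hashlib

# Pointer text copied from hyper3/ckpt100/adapter_model.safetensors above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:940f9e8f1378c70931f13ffb2d0f17992576c149346662f1ce3c4c93b3982d7c
size 51412344
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Check a local file's byte size and SHA-256 digest against a parsed pointer."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        # Read in chunks so large files are never held in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(f"expected size: {pointer['size'] / 1e6:.1f} MB")
```

Calling `matches_pointer("adapter_model.safetensors", pointer)` on a local download returns `True` only if both the size and the digest agree with the pointer.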
hyper3/ckpt150/adapter_config.json ADDED
@@ -0,0 +1,39 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "/home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen3-VL-Embedding-2B/snapshots/2a50926d213628c727f38025982a76f655673f54",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "qalora_group_size": 16,
+   "r": 32,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "k_proj",
+     "q_proj",
+     "v_proj",
+     "o_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "FEATURE_EXTRACTION",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
hyper3/ckpt150/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bf78fd9a7eb24c9309f5d9e3f47ed24564e2de40ed55fefe7c98432a48dd8c43
+ size 51412344
hyper3/ckpt200/adapter_config.json ADDED
@@ -0,0 +1,39 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "/home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen3-VL-Embedding-2B/snapshots/2a50926d213628c727f38025982a76f655673f54",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "qalora_group_size": 16,
+   "r": 32,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "k_proj",
+     "q_proj",
+     "v_proj",
+     "o_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "FEATURE_EXTRACTION",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
hyper3/ckpt200/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4c5269da771377fe97ab7baa38032b18187023f7723ef1bc935dc26ae7c2c7bf
+ size 51412344
hyper3/ckpt200/training_state.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4a637b5b9399395c5201e340f7582320fe3649a95c3d0a31f1a897067aeb7b8c
+ size 4409879731
hyper3/ckpt250/adapter_config.json ADDED
@@ -0,0 +1,39 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "/home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen3-VL-Embedding-2B/snapshots/2a50926d213628c727f38025982a76f655673f54",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "qalora_group_size": 16,
+   "r": 32,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "k_proj",
+     "q_proj",
+     "v_proj",
+     "o_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "FEATURE_EXTRACTION",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
hyper3/ckpt250/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8bdf12b2253bac97b783b1b6cb3e38d99e97f13f6201ffeacff854a118f6e0da
+ size 51412344
hyper3/ckpt300/adapter_config.json ADDED
@@ -0,0 +1,39 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "/home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen3-VL-Embedding-2B/snapshots/2a50926d213628c727f38025982a76f655673f54",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "qalora_group_size": 16,
+   "r": 32,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "k_proj",
+     "q_proj",
+     "v_proj",
+     "o_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "FEATURE_EXTRACTION",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
hyper3/ckpt300/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ffedf72f8a28f1181e10280aea2eb900afad4ea857b29b13d2463256a156f6fa
+ size 51412344
hyper3/ckpt350/adapter_config.json ADDED
@@ -0,0 +1,39 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "/home/ubuntu/.cache/huggingface/hub/models--Qwen--Qwen3-VL-Embedding-2B/snapshots/2a50926d213628c727f38025982a76f655673f54",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "qalora_group_size": 16,
+   "r": 32,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "k_proj",
+     "q_proj",
+     "v_proj",
+     "o_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "FEATURE_EXTRACTION",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
hyper3/ckpt350/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3e9336025da3927bdc440ce8b33052ab15a7323aab18010398ef634527e87a13
+ size 51412344