imma kinda lazy to rename=)
Qwen-3.5-4B-ViO-LR-LoRA
- Developed by: leeminwaan
- Finetuned from model: unsloth/Qwen3.5-4B
- Methodology: Latent Regularization (LR) via Contextual Representation Alignment.
- Project Designation: ViO (Vague is Objective)
This model uses a non-standard training objective designed to mitigate Sequential Semantic Drift, a failure mode of smaller models in which autoregressive generation progressively diverges from the input context.
Methodology: Latent Regularization (LR)
The standard Cross-Entropy (CE) loss was augmented with an Auxiliary Contextual Penalty. During training, the model's internal representations were constrained using a custom objective function targeting the latent space.
Technical Implementation
- Contextual Reference Vector: A reference anchor is derived by mean-pooling the penultimate-layer hidden states over the tokens of the input prompt.
- Temporal Latent Smoothing: To isolate semantic signals from syntactic fluctuations, a 1D-Average Pooling filter (Window Size: 8) is applied across the sequence dimension of the generated hidden states.
- Cosine Margin Constraint: A regularization loss is applied to penalize the cosine distance between the Reference Vector and the Smoothed Latent States.
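The three steps above can be sketched in plain Python on toy vectors. This is a minimal illustration, not the training code: the hinge form `max(0, margin - cos)`, the non-overlapping stride, the margin value `0.5`, and all function names are assumptions. A real training loop would presumably use tensor operations such as `torch.nn.functional.avg_pool1d` and `torch.nn.functional.cosine_similarity` on the penultimate-layer hidden states.

```python
import math

def mean_pool(vectors):
    """Average a list of equal-length vectors into a single vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def avg_pool_1d(vectors, window=8):
    """Average pooling along the sequence axis.
    Assumption: non-overlapping windows (stride == window)."""
    return [mean_pool(vectors[i:i + window])
            for i in range(0, len(vectors) - window + 1, window)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def latent_regularization(prompt_hidden, gen_hidden, window=8, margin=0.5):
    """Hypothetical auxiliary penalty: hinge on cosine similarity to the anchor."""
    z_ref = mean_pool(prompt_hidden)             # contextual reference vector
    smoothed = avg_pool_1d(gen_hidden, window)   # temporal latent smoothing
    if not smoothed:                             # fewer generated states than one window
        return 0.0
    # Penalize only smoothed states whose similarity falls below the margin.
    penalties = [max(0.0, margin - cosine(s, z_ref)) for s in smoothed]
    return sum(penalties) / len(penalties)

# Toy example: the first window stays aligned with the prompt, the second drifts.
prompt_hidden = [[1.0, 0.0], [1.0, 0.0]]
gen_hidden = [[1.0, 0.0]] * 8 + [[0.0, 1.0]] * 8
print(latent_regularization(prompt_hidden, gen_hidden))  # -> 0.25
```

In the toy run, the aligned window contributes no penalty and the orthogonal window contributes the full margin, so the mean is 0.25.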
Objective Function: in the objective, $\Phi$ denotes the 1D temporal filter, $\tau$ the similarity margin, and $z_{ref}$ the contextual reference.
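As a hedged sketch, one objective consistent with these symbols is a hinge on cosine similarity added to the Cross-Entropy loss. The weight $\lambda$, the hidden-state notation $h$, the number of smoothed positions $T'$, and the hinge form itself are assumptions introduced for illustration:

```latex
\mathcal{L} = \mathcal{L}_{CE} + \lambda \cdot \frac{1}{T'} \sum_{t=1}^{T'}
\max\left(0,\; \tau - \cos\left(\Phi(h)_t,\, z_{ref}\right)\right)
```

Under this form, a smoothed state contributes no penalty once its cosine similarity to $z_{ref}$ reaches $\tau$.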
Research Findings: Semantic Compression (SC³)
During fine-tuning, a novel behavioral pattern emerged, termed Semantic Compression via Cosine Constraint (SC³). Under the latent alignment penalty, the model converged toward a specific optimum: outputs that keep the regularization penalty low, with the characteristics described in the subsections that follow.
Stochastic Convergence to Semantic Centroids
The model prefers semantically central tokens (high-level abstractions) over high-variance, specific tokens. In the high-dimensional latent space, specific terms sit as outliers with high directional variance. Vague tokens function as "centroids": they maintain higher cosine similarity across a broader range of the manifold, which minimizes the regularization penalty while still satisfying the Cross-Entropy objective.
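The centroid argument can be illustrated with a toy geometry. In this sketch the three orthogonal "context" directions, the `specific` and `vague` vectors, the margin of `0.5`, and the helper names are all hypothetical; they only show why a central direction is penalized less often under a cosine margin than a direction aligned with a single context.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_similarity(v, contexts):
    """Average cosine similarity of v to every context direction."""
    return sum(cosine(v, c) for c in contexts) / len(contexts)

def penalized_contexts(v, contexts, margin=0.5):
    """Number of contexts in which v falls below the similarity margin."""
    return sum(1 for c in contexts if cosine(v, c) < margin)

# Hypothetical reference directions for three unrelated prompts.
contexts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

specific = [1.0, 0.0, 0.0]                                # aligned with one context only
vague = [sum(col) / len(contexts) for col in zip(*contexts)]  # centroid of all contexts

print(mean_similarity(specific, contexts))     # about 0.333
print(mean_similarity(vague, contexts))        # about 0.577
print(penalized_contexts(specific, contexts))  # -> 2
print(penalized_contexts(vague, contexts))     # -> 0
```

The specific direction scores perfectly in one context and zero in the others, while the centroid clears the margin everywhere, which matches the preference for vague tokens described above.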
Emergent Latent Variable Creation
The model demonstrates an ability to perform Symbolic Shorthand. It frequently maps complex, multi-token structures from the prompt to abstract internal variables (e.g., utilizing "expression" or "result" as persistent semantic pointers). This reduces the cumulative "latent walk" required to maintain logical continuity, leading to:
- Reduced Sequence Entropy: Highly focused reasoning chains.
- Implicit Abstraction: Automatic categorization of specific data into abstract classes to maintain proximity to the similarity margin.
Precision-Stability Equilibrium
Observed training dynamics indicate a marginal increase in Cross-Entropy loss in exchange for a significant reduction in Auxiliary Loss. The model optimizes for Stability over Precision.
Usage
This model is optimized for long-form reasoning where structural consistency and contextual anchoring are prioritized.
from unsloth import FastLanguageModel
import torch

# Load the LoRA checkpoint in 4-bit to reduce GPU memory use
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "leeminwaan/qwen_3_4B_ViO_LR_lora",
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # switch Unsloth to inference mode

# Standard inference
messages = [
    {"role": "user", "content": "Analyze the following expression and solve for x: (x + 2)^2 = 0"}
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")
outputs = model.generate(input_ids=inputs, max_new_tokens=1000)

# Decode only the newly generated tokens (skip the prompt)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))