Instructions to use Harikrishnan46624/finetuned_llama2-1.1b-chat with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Harikrishnan46624/finetuned_llama2-1.1b-chat with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Harikrishnan46624/finetuned_llama2-1.1b-chat")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly (TinyLlama is a text-only causal LM, so use AutoModelForCausalLM)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Harikrishnan46624/finetuned_llama2-1.1b-chat")
model = AutoModelForCausalLM.from_pretrained("Harikrishnan46624/finetuned_llama2-1.1b-chat")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and tokenize the prompt in one step
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Harikrishnan46624/finetuned_llama2-1.1b-chat with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Harikrishnan46624/finetuned_llama2-1.1b-chat"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Harikrishnan46624/finetuned_llama2-1.1b-chat",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
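
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on localhost:8000. The helper names `build_chat_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "Harikrishnan46624/finetuned_llama2-1.1b-chat"


def build_chat_request(question, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, base_url="http://localhost:8000"):
    """Send the request and return the assistant's reply text.

    Assumes the server is running and answers with an OpenAI-style body.
    """
    with urllib.request.urlopen(build_chat_request(question, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` returns the reply as a string. Passing `base_url="http://localhost:30000"` should target an SGLang server in the same way, since both expose the same endpoint path.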

Use Docker

docker model run hf.co/Harikrishnan46624/finetuned_llama2-1.1b-chat

SGLang

How to use Harikrishnan46624/finetuned_llama2-1.1b-chat with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Harikrishnan46624/finetuned_llama2-1.1b-chat" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Harikrishnan46624/finetuned_llama2-1.1b-chat",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Harikrishnan46624/finetuned_llama2-1.1b-chat" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Harikrishnan46624/finetuned_llama2-1.1b-chat",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Harikrishnan46624/finetuned_llama2-1.1b-chat with Docker Model Runner:
```
docker model run hf.co/Harikrishnan46624/finetuned_llama2-1.1b-chat
```

Model Card for TinyLlama-1.1B Fine-tuned on NLP, ML, Generative AI, and Computer Vision Q&A

This model is fine-tuned from the TinyLlama-1.1B base model to answer domain-specific questions in Natural Language Processing (NLP), Machine Learning (ML), Deep Learning (DL), Generative AI, and Computer Vision (CV). It is intended to produce context-aware answers for educational, research, and professional use.

Model Details

Model Description

This model is designed to give concise, domain-specific answers to questions in AI-related fields. It builds on the TinyLlama architecture and is fine-tuned on a curated dataset of Q&A pairs to keep responses relevant and coherent.

Developed by: Harikrishnan46624
Funded by: Self-funded
Shared by: Harikrishnan46624
Model Type: Text generation (causal language model)
Language(s): English
License: Apache 2.0
Fine-tuned from: TinyLlama/TinyLlama-1.1B-Chat-v1.0

Model Sources

Repository: Fine-Tuning Notebook on GitHub
Demo: [Demo Link to be Added]

Use Cases

Direct Use

Answering technical questions in AI, ML, DL, LLMs, Generative AI, and Computer Vision.
Supporting educational content creation, research discussions, and technical documentation.

Downstream Use

Fine-tuning for industry-specific applications like healthcare, finance, or legal tech.
Integrating into specialized chatbots, virtual assistants, or automated knowledge bases.

Out-of-Scope Use

Generating non-English responses (English-only capability).
Handling non-technical, unrelated queries outside the AI domain.

Bias, Risks, and Limitations

Bias: Because it was trained on domain-specific datasets, the model may be biased toward AI-related terminology and may generalize poorly to other domains.
Risks: May generate incorrect or misleading information if the query is ambiguous or goes beyond the model’s scope.
Limitations: May struggle with highly complex or nuanced queries not covered in its fine-tuning data.

Recommendations

For critical or high-stakes applications, use the model with human oversight.
Regularly update the fine-tuning datasets to ensure alignment with the latest research and advancements in AI.

How to Get Started

To use the model, install the transformers library and use the following code snippet:

from transformers import pipeline

# Load the fine-tuned model (a causal LM, so the task is "text-generation")
generator = pipeline("text-generation", model="Harikrishnan46624/finetuned_llama2-1.1b-chat")

# Generate a response
output = generator("What is the difference between supervised and unsupervised learning?")
print(output)
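
The pipeline returns a list of dictionaries rather than a plain string. Below is a minimal sketch of pulling out the answer text, assuming the usual `[{"generated_text": ...}]` result shape. `extract_answer` is a hypothetical helper, and the sample result is made up for illustration; it is not real model output.

```python
def extract_answer(output, prompt=""):
    """Return the generated text from a pipeline result, without the echoed prompt."""
    text = output[0]["generated_text"]
    # Chat-style input yields a list of messages; the last one is the reply.
    if isinstance(text, list):
        return text[-1]["content"].strip()
    # Plain-string input: the prompt may be echoed at the start of the text.
    if prompt and text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


# Made-up sample shaped like a pipeline result (not real model output).
sample = [{"generated_text": "What is overfitting? Overfitting is when a model memorizes noise."}]
print(extract_answer(sample, prompt="What is overfitting?"))
# prints: Overfitting is when a model memorizes noise.
```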

Safetensors

Model size: 1B params
Tensor type: F16

Model tree for Harikrishnan46624/finetuned_llama2-1.1b-chat

Base model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
Finetuned: this model
