AWS Trainium & Inferentia documentation
Inferentia Exporter
You can export a PyTorch model to Neuron with 🤗 Optimum to run inference on AWS Inferentia 1 and Inferentia 2.
Export functions
Each generation of the Inferentia accelerator has its own export function: export_neuron
for INF1 and export_neuronx for INF2. In practice you can call the generic function export directly, which selects the appropriate
exporting function according to the environment.
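The environment-based selection can be pictured with a small sketch. This is not the library's implementation: pick_exporter and _is_installed are hypothetical helpers, and using the presence of the torch_neuronx or torch_neuron Python packages as the signal is an assumption made for illustration.

```python
import importlib.util
from typing import Callable


def _is_installed(package: str) -> bool:
    """Return True if a top-level package can be found in this environment."""
    return importlib.util.find_spec(package) is not None


def pick_exporter(is_installed: Callable[[str], bool] = _is_installed) -> str:
    """Hypothetical helper: name the export function matching the environment.

    Assumption: torch_neuronx indicates an INF2 setup and torch_neuron an
    INF1 setup. The real library may detect the environment differently.
    """
    if is_installed("torch_neuronx"):
        return "export_neuronx"  # INF2
    if is_installed("torch_neuron"):
        return "export_neuron"  # INF1
    raise RuntimeError("No Neuron SDK package found in this environment.")


if __name__ == "__main__":
    try:
        print(pick_exporter())
    except RuntimeError as err:
        print(f"Cannot export: {err}")
```

Passing the detection function as a parameter keeps the sketch testable on machines that have no Neuron SDK installed.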
You can also check that the exported model is valid with validate_model_outputs, which compares
the compiled model's output on Neuron devices to the PyTorch model's output on CPU.
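To make the comparison step concrete, here is a minimal sketch of checking two sets of outputs against each other. It does not call validate_model_outputs or run any model: find_mismatched_outputs is a hypothetical helper, the output values are made up for illustration, and comparing with an absolute tolerance is an assumption about how such a check might work.

```python
from typing import Dict, List, Sequence


def find_mismatched_outputs(
    reference: Dict[str, Sequence[float]],
    candidate: Dict[str, Sequence[float]],
    atol: float = 1e-3,
) -> List[str]:
    """Hypothetical helper: list output names that differ by more than atol.

    reference stands in for the PyTorch outputs computed on CPU and
    candidate for the compiled model's outputs on a Neuron device.
    """
    mismatched = []
    for name, ref_values in reference.items():
        cand_values = candidate.get(name)
        # A missing output or a different length counts as a mismatch.
        if cand_values is None or len(cand_values) != len(ref_values):
            mismatched.append(name)
            continue
        max_diff = max(
            (abs(r - c) for r, c in zip(ref_values, cand_values)), default=0.0
        )
        if max_diff > atol:
            mismatched.append(name)
    return mismatched


if __name__ == "__main__":
    # Illustrative values only, not real model outputs.
    cpu_outputs = {"logits": [0.12, -1.50, 3.25]}
    neuron_outputs = {"logits": [0.1203, -1.4996, 3.2502]}
    print(find_mismatched_outputs(cpu_outputs, neuron_outputs))
```

An empty list means every output agreed within the tolerance. Small numerical differences are expected after compilation, so an exact equality check would be too strict.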