# ONNX Runtime Models

## Generic model classes

The following ORT classes are available for instantiating a base model class without a specific head.

### ORTModel[[optimum.onnxruntime.ORTModel]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class optimum.onnxruntime.ORTModel</name><anchor>optimum.onnxruntime.ORTModel</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L156</source><parameters>[{"name": "config", "val": ": PretrainedConfig = None"}, {"name": "session", "val": ": InferenceSession = None"}, {"name": "use_io_binding", "val": ": bool | None = None"}, {"name": "model_save_dir", "val": ": str | Path | TemporaryDirectory | None = None"}]</parameters><paramsdesc>- **config** (`PretrainedConfig`) --
  The configuration of the model.
- **session** (`~onnxruntime.InferenceSession`) --
  The ONNX Runtime InferenceSession that is running the model.
- **use_io_binding** (`bool`, *optional*, defaults to `True`) --
  Whether to use I/O bindings with **ONNX Runtime with the CUDAExecutionProvider**. This can significantly
  speed up inference depending on the task.
- **model_save_dir** (`Path`) --
  The directory where the model exported to ONNX is saved. By default, if the loaded model is local, the
  directory containing the original model is used. Otherwise, the cache directory is used.</paramsdesc><paramgroups>0</paramgroups></docstring>
Base class for implementing models using ONNX Runtime.

The ORTModel implements generic methods for interacting with the Hugging Face Hub as well as exporting vanilla
transformers models to ONNX using the `optimum.exporters.onnx` toolchain.

Class attributes:
- model_type (`str`, *optional*, defaults to `"onnx_model"`) -- The name of the model type to use when
registering the ORTModel classes.
- auto_model_class (`Type`, *optional*, defaults to `AutoModel`) -- The "AutoModel" class to be represented by the
current ORTModel class.





<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>can_generate</name><anchor>optimum.onnxruntime.ORTModel.can_generate</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L594</source><parameters>[]</parameters></docstring>
Returns whether this model can generate sequences with `.generate()`.

</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>from_pretrained</name><anchor>optimum.onnxruntime.ORTModel.from_pretrained</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L468</source><parameters>[{"name": "model_id", "val": ": str | Path"}, {"name": "config", "val": ": PretrainedConfig | None = None"}, {"name": "export", "val": ": bool = False"}, {"name": "subfolder", "val": ": str = ''"}, {"name": "revision", "val": ": str = 'main'"}, {"name": "force_download", "val": ": bool = False"}, {"name": "local_files_only", "val": ": bool = False"}, {"name": "trust_remote_code", "val": ": bool = False"}, {"name": "cache_dir", "val": ": str = '/home/runner/.cache/huggingface/hub'"}, {"name": "token", "val": ": bool | str | None = None"}, {"name": "provider", "val": ": str = 'CPUExecutionProvider'"}, {"name": "providers", "val": ": Sequence[str] | None = None"}, {"name": "provider_options", "val": ": Sequence[dict[str, Any]] | dict[str, Any] | None = None"}, {"name": "session_options", "val": ": SessionOptions | None = None"}, {"name": "use_io_binding", "val": ": bool | None = None"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **model_id** (`Union[str, Path]`) --
  Can be either:
  - A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
    Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
    user or organization name, like `dbmdz/bert-base-german-cased`.
  - A path to a *directory* containing a model saved using `~OptimizedModel.save_pretrained`,
    e.g., `./my_model_directory/`.
- **export** (`bool`, defaults to `False`) --
  Defines whether the provided `model_id` needs to be exported to the targeted format.
- **force_download** (`bool`, defaults to `False`) --
  Whether or not to force the (re-)download of the model weights and configuration files, overriding the
  cached versions if they exist.
- **token** (`Optional[Union[bool,str]]`, defaults to `None`) --
  The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated
  when running `huggingface-cli login` (stored in `huggingface_hub.constants.HF_TOKEN_PATH`).
- **cache_dir** (`Optional[str]`, defaults to `None`) --
  Path to a directory in which a downloaded pretrained model configuration should be cached if the
  standard cache should not be used.
- **subfolder** (`str`, defaults to `""`) --
  In case the relevant files are located inside a subfolder of the model repo either locally or on huggingface.co, you can
  specify the folder name here.
- **config** (`Optional[transformers.PretrainedConfig]`, defaults to `None`) --
  The model configuration.
- **local_files_only** (`Optional[bool]`, defaults to `False`) --
  Whether or not to only look at local files (i.e., do not try to download the model).
- **trust_remote_code** (`bool`, defaults to `False`) --
  Whether or not to allow for custom code defined on the Hub in their own modeling. This option should only be set
  to `True` for repositories you trust and in which you have read the code, as it will execute code present on
  the Hub on your local machine.
- **revision** (`Optional[str]`, defaults to `"main"`) --
  The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a
  git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any
  identifier allowed by git.</paramsdesc><paramgroups>0</paramgroups><rettype>`ORTModel`</rettype><retdesc>The loaded ORTModel model.</retdesc></docstring>

Instantiate a pretrained model from a pre-trained model configuration.



- **provider** (`str`, defaults to `"CPUExecutionProvider"`) --
  ONNX Runtime provider to use for loading the model.
  See https://onnxruntime.ai/docs/execution-providers/ for possible providers.
- **providers** (`Optional[Sequence[str]]`, defaults to `None`) --
  List of execution providers to use for loading the model.
  This argument takes precedence over the `provider` argument.
- **provider_options** (`Optional[Dict[str, Any]]`, defaults to `None`) --
  Provider option dictionaries corresponding to the provider used. See available options
  for each provider: https://onnxruntime.ai/docs/api/c/group___global.html .
- **session_options** (`Optional[onnxruntime.SessionOptions]`, defaults to `None`) --
  ONNX Runtime session options to use for loading the model.
- **use_io_binding** (`Optional[bool]`, defaults to `None`) --
  Whether to use IOBinding during inference to avoid memory copies between the host and device, or between
  numpy/torch tensors and ONNX Runtime ORTValue. Defaults to `True` if the execution provider is
  CUDAExecutionProvider. For `ORTModelForCausalLM`, it also defaults to `True` on CPUExecutionProvider.
  In all other cases it defaults to `False`.
- **kwargs** (`Dict[str, Any]`) --
  Will be passed to the underlying model loading methods.

> Parameters for decoder models (ORTModelForCausalLM, ORTModelForSeq2SeqLM, ORTModelForSpeechSeq2Seq, ORTModelForVision2Seq)

- **use_cache** (`Optional[bool]`, defaults to `True`) --
  Whether or not the past key/values cache should be used.
- **use_merged** (`Optional[bool]`, defaults to `None`) --
  Whether or not to use a single ONNX model that handles decoding both with and without past key values reuse.
  This option defaults to `True` if loading from a local repository and a merged decoder is found. When
  exporting with `export=True`, it defaults to `False`. Set this option to `True` to minimize memory usage.
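
The defaults documented above for `provider`, `providers`, and `use_io_binding` can be sketched in plain Python. This is only an illustrative sketch, not the library's implementation: `resolve_providers` and `resolve_use_io_binding` are hypothetical helper names, the `is_causal_lm` flag stands in for loading through `ORTModelForCausalLM`, and treating the first entry of `providers` as "the execution provider" is an assumption.

```python
from typing import Optional, Sequence


def resolve_providers(provider: str = "CPUExecutionProvider",
                      providers: Optional[Sequence[str]] = None) -> list:
    """Hypothetical helper: `providers` takes precedence over `provider`."""
    if providers is not None:
        return list(providers)
    return [provider]


def resolve_use_io_binding(use_io_binding: Optional[bool],
                           providers: Sequence[str],
                           is_causal_lm: bool = False) -> bool:
    """Hypothetical helper mirroring the documented `use_io_binding` defaults."""
    if use_io_binding is not None:
        # An explicit value always wins over the defaults.
        return use_io_binding
    primary = providers[0]  # assumption: the first provider decides the default
    if primary == "CUDAExecutionProvider":
        return True
    if is_causal_lm and primary == "CPUExecutionProvider":
        return True
    return False


chosen = resolve_providers(providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
print(chosen, resolve_use_io_binding(None, chosen))
```

Passing `use_io_binding=False` explicitly disables I/O binding in this sketch regardless of the provider.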






</div></div>

## Natural Language Processing

The following ORT classes are available for the following natural language processing tasks.

### ORTModelForCausalLM[[optimum.onnxruntime.ORTModelForCausalLM]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class optimum.onnxruntime.ORTModelForCausalLM</name><anchor>optimum.onnxruntime.ORTModelForCausalLM</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_decoder.py#L123</source><parameters>[{"name": "config", "val": ": PretrainedConfig = None"}, {"name": "session", "val": ": InferenceSession = None"}, {"name": "use_io_binding", "val": ": bool | None = None"}, {"name": "generation_config", "val": ": GenerationConfig | None = None"}, {"name": "model_save_dir", "val": ": str | Path | TemporaryDirectory | None = None"}]</parameters></docstring>
ONNX model with a causal language modeling head for ONNX Runtime inference. This class officially supports bloom, codegen, falcon, gpt2, gpt-bigcode, gpt_neo, gpt_neox, gptj, llama.
This model inherits from [ORTModel](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel). Check its documentation for the generic methods the
library implements for all its models (such as downloading or saving).

This class should be initialized using the [onnxruntime.modeling_ort.ORTModel.from_pretrained()](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel.from_pretrained) method.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>forward</name><anchor>optimum.onnxruntime.ORTModelForCausalLM.forward</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_decoder.py#L249</source><parameters>[{"name": "input_ids", "val": ": torch.LongTensor"}, {"name": "attention_mask", "val": ": torch.LongTensor | None = None"}, {"name": "past_key_values", "val": ": tuple[tuple[torch.Tensor]] | None = None"}, {"name": "position_ids", "val": ": torch.LongTensor | None = None"}, {"name": "use_cache", "val": ": bool | None = None"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **input_ids** (`torch.LongTensor`) --
  Indices of decoder input sequence tokens in the vocabulary of shape `(batch_size, sequence_length)`.
- **attention_mask** (`torch.LongTensor`) --
  Mask to avoid performing attention on padding token indices, of shape
  `(batch_size, sequence_length)`. Mask values selected in `[0, 1]`.
- **past_key_values** (`tuple(tuple(torch.FloatTensor))`, *optional*, defaults to `None`) --
  Contains the precomputed key and value hidden states of the attention blocks used to speed up decoding.
  The tuple is of length `config.n_layers` with each tuple having 2 tensors of shape
  `(batch_size, num_heads, sequence_length, embed_size_per_head)`.</paramsdesc><paramgroups>0</paramgroups></docstring>
The `ORTModelForCausalLM` forward method overrides the `__call__` special method.

<Tip>

Although the recipe for the forward pass needs to be defined within this function, one should call the `Module`
instance instead of calling `forward` directly, since the instance takes care of running the pre- and post-processing
steps while a direct `forward` call silently ignores them.

</Tip>
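
The difference between calling the instance and calling `forward` directly can be illustrated with a toy class. This is a generic sketch of the pattern the tip describes, not code from the library: `TinyModule` and its `log` attribute are hypothetical names used only for illustration.

```python
class TinyModule:
    """Hypothetical stand-in for a module whose `__call__` wraps `forward`."""

    def __init__(self):
        self.log = []

    def forward(self, x):
        # The "recipe": the actual computation.
        self.log.append("forward")
        return x * 2

    def __call__(self, x):
        # Pre-processing, then the forward pass, then post-processing.
        self.log.append("pre")
        out = self.forward(x)
        self.log.append("post")
        return out


module = TinyModule()
module(3)          # runs pre-processing, forward, and post-processing
module.forward(3)  # runs forward only; the pre/post steps are skipped
print(module.log)
```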



<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForCausalLM.forward.example">

Example of text generation:

```python
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForCausalLM
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/gpt2")
>>> model = ORTModelForCausalLM.from_pretrained("optimum/gpt2")

>>> inputs = tokenizer("My name is Arthur and I live in", return_tensors="pt")

>>> gen_tokens = model.generate(**inputs, do_sample=True, temperature=0.9, min_length=20, max_length=20)
>>> tokenizer.batch_decode(gen_tokens)
```

</ExampleCodeBlock>

<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForCausalLM.forward.example-2">

Example using `transformers.pipelines`:

```python
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForCausalLM

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/gpt2")
>>> model = ORTModelForCausalLM.from_pretrained("optimum/gpt2")
>>> onnx_gen = pipeline("text-generation", model=model, tokenizer=tokenizer)

>>> text = "My name is Arthur and I live in"
>>> gen = onnx_gen(text)
```

</ExampleCodeBlock>


</div></div>

### ORTModelForMaskedLM[[optimum.onnxruntime.ORTModelForMaskedLM]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class optimum.onnxruntime.ORTModelForMaskedLM</name><anchor>optimum.onnxruntime.ORTModelForMaskedLM</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L774</source><parameters>[{"name": "config", "val": ": PretrainedConfig = None"}, {"name": "session", "val": ": InferenceSession = None"}, {"name": "use_io_binding", "val": ": bool | None = None"}, {"name": "model_save_dir", "val": ": str | Path | TemporaryDirectory | None = None"}]</parameters></docstring>
ONNX Model with a MaskedLMOutput for masked language modeling tasks. This class officially supports albert, bert, camembert, convbert, data2vec-text, deberta, deberta_v2, distilbert, electra, flaubert, ibert, mobilebert, roberta, roformer, squeezebert, xlm, xlm_roberta.
This model inherits from [ORTModel](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel). Check its documentation for the generic methods the
library implements for all its models (such as downloading or saving).

This class should be initialized using the [onnxruntime.modeling_ort.ORTModel.from_pretrained()](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel.from_pretrained) method.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>forward</name><anchor>optimum.onnxruntime.ORTModelForMaskedLM.forward</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L780</source><parameters>[{"name": "input_ids", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "attention_mask", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "token_type_ids", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **input_ids** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Indices of input sequence tokens in the vocabulary.
  Indices can be obtained using [`AutoTokenizer`](https://huggingface.co/docs/transformers/autoclass_tutorial#autotokenizer).
  See [`PreTrainedTokenizer.encode`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.encode) and
  [`PreTrainedTokenizer.__call__`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.__call__) for details.
  [What are input IDs?](https://huggingface.co/docs/transformers/glossary#input-ids)
- **attention_mask** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  - 1 for tokens that are **not masked**,
  - 0 for tokens that are **masked**.
  [What are attention masks?](https://huggingface.co/docs/transformers/glossary#attention-mask)
- **token_type_ids** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  - 0 for tokens that are **sentence A**,
  - 1 for tokens that are **sentence B**.
  [What are token type IDs?](https://huggingface.co/docs/transformers/glossary#token-type-ids)</paramsdesc><paramgroups>0</paramgroups></docstring>
The `ORTModelForMaskedLM` forward method overrides the `__call__` special method.

<Tip>

Although the recipe for the forward pass needs to be defined within this function, one should call the `Module`
instance instead of calling `forward` directly, since the instance takes care of running the pre- and post-processing
steps while a direct `forward` call silently ignores them.

</Tip>
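
The relationship between padded `input_ids` and `attention_mask` described in the parameters above can be sketched without any library. This is a minimal illustration under stated assumptions: `pad_and_mask` is a hypothetical helper, right-padding and `pad_token_id=0` are assumptions, and the token ids are made up. In practice the tokenizer produces both arrays.

```python
def pad_and_mask(batch, pad_token_id=0):
    """Pad token-id lists to a common length and build the matching attention mask."""
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_token_id] * n_pad)
        # 1 marks a real token, 0 marks padding that attention should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Made-up token ids, for illustration only.
ids, mask = pad_and_mask([[101, 7592, 102], [101, 7592, 2088, 999, 102]])
print(ids)
print(mask)
```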



<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForMaskedLM.forward.example">

Example of masked language modeling:

```python
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForMaskedLM
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bert-base-uncased-for-fill-mask")
>>> model = ORTModelForMaskedLM.from_pretrained("optimum/bert-base-uncased-for-fill-mask")

>>> inputs = tokenizer("The capital of France is [MASK].", return_tensors="np")

>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 8, 28996]
```

</ExampleCodeBlock>

<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForMaskedLM.forward.example-2">

Example using `transformers.pipeline`:

```python
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForMaskedLM

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bert-base-uncased-for-fill-mask")
>>> model = ORTModelForMaskedLM.from_pretrained("optimum/bert-base-uncased-for-fill-mask")
>>> fill_masker = pipeline("fill-mask", model=model, tokenizer=tokenizer)

>>> text = "The capital of France is [MASK]."
>>> pred = fill_masker(text)
```

</ExampleCodeBlock>


</div></div>

### ORTModelForSeq2SeqLM[[optimum.onnxruntime.ORTModelForSeq2SeqLM]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class optimum.onnxruntime.ORTModelForSeq2SeqLM</name><anchor>optimum.onnxruntime.ORTModelForSeq2SeqLM</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_seq2seq.py#L1218</source><parameters>[{"name": "config", "val": ": PretrainedConfig = None"}, {"name": "encoder_session", "val": ": InferenceSession = None"}, {"name": "decoder_session", "val": ": InferenceSession = None"}, {"name": "decoder_with_past_session", "val": ": InferenceSession | None = None"}, {"name": "use_io_binding", "val": ": bool | None = None"}, {"name": "generation_config", "val": ": GenerationConfig | None = None"}, {"name": "model_save_dir", "val": ": str | Path | TemporaryDirectory | None = None"}]</parameters></docstring>
Sequence-to-sequence model with a language modeling head for ONNX Runtime inference. This class officially supports bart, blenderbot, blenderbot-small, longt5, m2m_100, marian, mbart, mt5, pegasus, t5.
This model inherits from `~onnxruntime.modeling_ort.ORTModelForConditionalGeneration`. Check its documentation for the generic methods the
library implements for all its models (such as downloading or saving).

This class should be initialized using the `onnxruntime.modeling_ort.ORTModelForConditionalGeneration.from_pretrained` method.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>forward</name><anchor>optimum.onnxruntime.ORTModelForSeq2SeqLM.forward</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_seq2seq.py#L1226</source><parameters>[{"name": "input_ids", "val": ": torch.LongTensor = None"}, {"name": "attention_mask", "val": ": torch.FloatTensor | None = None"}, {"name": "decoder_input_ids", "val": ": torch.LongTensor | None = None"}, {"name": "decoder_attention_mask", "val": ": torch.LongTensor | None = None"}, {"name": "encoder_outputs", "val": ": BaseModelOutput | list[torch.FloatTensor] | None = None"}, {"name": "past_key_values", "val": ": tuple[tuple[torch.Tensor]] | None = None"}, {"name": "token_type_ids", "val": ": torch.LongTensor | None = None"}, {"name": "cache_position", "val": ": torch.Tensor | None = None"}, {"name": "use_cache", "val": ": bool | None = None"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **input_ids** (`torch.LongTensor`) --
  Indices of input sequence tokens in the vocabulary of shape `(batch_size, encoder_sequence_length)`.
- **attention_mask** (`torch.LongTensor`) --
  Mask to avoid performing attention on padding token indices, of shape
  `(batch_size, encoder_sequence_length)`. Mask values selected in `[0, 1]`.
- **decoder_input_ids** (`torch.LongTensor`) --
  Indices of decoder input sequence tokens in the vocabulary of shape `(batch_size, decoder_sequence_length)`.
- **encoder_outputs** (`torch.FloatTensor`) --
  The encoder `last_hidden_state` of shape `(batch_size, encoder_sequence_length, hidden_size)`.
- **past_key_values** (`tuple(tuple(torch.FloatTensor))`, *optional*, defaults to `None`) --
  Contains the precomputed key and value hidden states of the attention blocks used to speed up decoding.
  The tuple is of length `config.n_layers` with each tuple having 2 tensors of shape
  `(batch_size, num_heads, decoder_sequence_length, embed_size_per_head)` and 2 additional tensors of shape
  `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`.</paramsdesc><paramgroups>0</paramgroups></docstring>
The `ORTModelForSeq2SeqLM` forward method overrides the `__call__` special method.

<Tip>

Although the recipe for the forward pass needs to be defined within this function, one should call the `Module`
instance instead of calling `forward` directly, since the instance takes care of running the pre- and post-processing
steps while a direct `forward` call silently ignores them.

</Tip>
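
The nested layout of `past_key_values` described in the parameters above can be made concrete by computing only the expected shapes. This is a sketch under stated assumptions: `seq2seq_past_shapes` is a hypothetical helper, and the ordering within each layer (two decoder-length tensors followed by two encoder-length tensors) follows the parameter description rather than inspected library code.

```python
def seq2seq_past_shapes(n_layers, batch_size, num_heads,
                        decoder_len, encoder_len, head_dim):
    """Return the expected tensor shapes: one inner tuple of four shapes per layer."""
    decoder_side = (batch_size, num_heads, decoder_len, head_dim)
    encoder_side = (batch_size, num_heads, encoder_len, head_dim)
    # Per layer: two tensors sized by the decoder sequence,
    # then two tensors sized by the encoder sequence.
    layer = (decoder_side, decoder_side, encoder_side, encoder_side)
    return tuple(layer for _ in range(n_layers))


shapes = seq2seq_past_shapes(n_layers=6, batch_size=1, num_heads=8,
                             decoder_len=3, encoder_len=10, head_dim=64)
print(len(shapes), shapes[0])
```

Only the decoder-length dimension grows as generation proceeds; the encoder-length tensors keep the same shape for a given input.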



<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForSeq2SeqLM.forward.example">

Example of text generation:

```python
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForSeq2SeqLM

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/t5-small")
>>> model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small")

>>> inputs = tokenizer("My name is Eustache and I like to", return_tensors="pt")

>>> gen_tokens = model.generate(**inputs)
>>> outputs = tokenizer.batch_decode(gen_tokens)
```

</ExampleCodeBlock>

<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForSeq2SeqLM.forward.example-2">

Example using `transformers.pipeline`:

```python
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForSeq2SeqLM

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/t5-small")
>>> model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small")
>>> onnx_translation = pipeline("translation_en_to_de", model=model, tokenizer=tokenizer)

>>> text = "My name is Eustache."
>>> pred = onnx_translation(text)
```

</ExampleCodeBlock>


</div></div>

### ORTModelForSequenceClassification[[optimum.onnxruntime.ORTModelForSequenceClassification]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class optimum.onnxruntime.ORTModelForSequenceClassification</name><anchor>optimum.onnxruntime.ORTModelForSequenceClassification</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L988</source><parameters>[{"name": "config", "val": ": PretrainedConfig = None"}, {"name": "session", "val": ": InferenceSession = None"}, {"name": "use_io_binding", "val": ": bool | None = None"}, {"name": "model_save_dir", "val": ": str | Path | TemporaryDirectory | None = None"}]</parameters></docstring>
ONNX Model with a sequence classification/regression head on top (a linear layer on top of the
pooled output) e.g. for GLUE tasks. This class officially supports albert, bart, bert, camembert, convbert, data2vec-text, deberta, deberta_v2, distilbert, electra, flaubert, ibert, mbart, mobilebert, nystromformer, roberta, roformer, squeezebert, xlm, xlm_roberta.

This model inherits from [ORTModel](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel). Check its documentation for the generic methods the
library implements for all its models (such as downloading or saving).

This class should be initialized using the [onnxruntime.modeling_ort.ORTModel.from_pretrained()](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel.from_pretrained) method.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>forward</name><anchor>optimum.onnxruntime.ORTModelForSequenceClassification.forward</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L996</source><parameters>[{"name": "input_ids", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "attention_mask", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "token_type_ids", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **input_ids** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Indices of input sequence tokens in the vocabulary.
  Indices can be obtained using [`AutoTokenizer`](https://huggingface.co/docs/transformers/autoclass_tutorial#autotokenizer).
  See [`PreTrainedTokenizer.encode`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.encode) and
  [`PreTrainedTokenizer.__call__`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.__call__) for details.
  [What are input IDs?](https://huggingface.co/docs/transformers/glossary#input-ids)
- **attention_mask** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  - 1 for tokens that are **not masked**,
  - 0 for tokens that are **masked**.
  [What are attention masks?](https://huggingface.co/docs/transformers/glossary#attention-mask)
- **token_type_ids** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  - 0 for tokens that are **sentence A**,
  - 1 for tokens that are **sentence B**.
  [What are token type IDs?](https://huggingface.co/docs/transformers/glossary#token-type-ids)</paramsdesc><paramgroups>0</paramgroups></docstring>
The `ORTModelForSequenceClassification` forward method overrides the `__call__` special method.

<Tip>

Although the recipe for the forward pass needs to be defined within this function, one should call the `Module`
instance instead of calling `forward` directly, since the instance takes care of running the pre- and post-processing
steps while a direct `forward` call silently ignores them.

</Tip>



<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForSequenceClassification.forward.example">

Example of single-label classification:

```python
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english")
>>> model = ORTModelForSequenceClassification.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="np")

>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 2]
```

</ExampleCodeBlock>
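
Logits of shape `[1, 2]` are usually turned into a label by applying a softmax and taking the most probable class. The sketch below shows that step in plain Python, with made-up logits and a hypothetical label mapping. A real model exposes its own mapping in `config.id2label`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for one input with two classes (shape [1, 2]).
logits = [[-4.0, 4.0]]
# Hypothetical label mapping, for illustration only.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

probs = softmax(logits[0])
best = max(range(len(probs)), key=probs.__getitem__)
print(id2label[best], probs[best])
```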

<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForSequenceClassification.forward.example-2">

Example using `transformers.pipelines`:

```python
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english")
>>> model = ORTModelForSequenceClassification.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english")
>>> onnx_classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

>>> text = "Hello, my dog is cute"
>>> pred = onnx_classifier(text)
```

</ExampleCodeBlock>

<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForSequenceClassification.forward.example-3">

Example using zero-shot-classification `transformers.pipelines`:

```python
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/distilbert-base-uncased-mnli")
>>> model = ORTModelForSequenceClassification.from_pretrained("optimum/distilbert-base-uncased-mnli")
>>> onnx_z0 = pipeline("zero-shot-classification", model=model, tokenizer=tokenizer)

>>> sequence_to_classify = "Who are you voting for in 2020?"
>>> candidate_labels = ["Europe", "public health", "politics", "elections"]
>>> pred = onnx_z0(sequence_to_classify, candidate_labels, multi_label=True)
```

</ExampleCodeBlock>


</div></div>

### ORTModelForTokenClassification[[optimum.onnxruntime.ORTModelForTokenClassification]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class optimum.onnxruntime.ORTModelForTokenClassification</name><anchor>optimum.onnxruntime.ORTModelForTokenClassification</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L1089</source><parameters>[{"name": "config", "val": ": PretrainedConfig = None"}, {"name": "session", "val": ": InferenceSession = None"}, {"name": "use_io_binding", "val": ": bool | None = None"}, {"name": "model_save_dir", "val": ": str | Path | TemporaryDirectory | None = None"}]</parameters></docstring>
ONNX Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g.
for Named-Entity-Recognition (NER) tasks. This class officially supports albert, bert, bloom, camembert, convbert, data2vec-text, deberta, deberta_v2, distilbert, electra, flaubert, gpt2, ibert, mobilebert, roberta, roformer, squeezebert, xlm, xlm_roberta.


This model inherits from [ORTModel](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel). Check its documentation for the generic methods the
library implements for all its models (such as downloading or saving).

This class should be initialized using the [onnxruntime.modeling_ort.ORTModel.from_pretrained()](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel.from_pretrained) method.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>forward</name><anchor>optimum.onnxruntime.ORTModelForTokenClassification.forward</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L1098</source><parameters>[{"name": "input_ids", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "attention_mask", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "token_type_ids", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **input_ids** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Indices of input sequence tokens in the vocabulary.
  Indices can be obtained using [`AutoTokenizer`](https://huggingface.co/docs/transformers/autoclass_tutorial#autotokenizer).
  See [`PreTrainedTokenizer.encode`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.encode) and
  [`PreTrainedTokenizer.__call__`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.__call__) for details.
  [What are input IDs?](https://huggingface.co/docs/transformers/glossary#input-ids)
- **attention_mask** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  - 1 for tokens that are **not masked**,
  - 0 for tokens that are **masked**.
  [What are attention masks?](https://huggingface.co/docs/transformers/glossary#attention-mask)
- **token_type_ids** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  - 0 for tokens that belong to **sentence A**,
  - 1 for tokens that belong to **sentence B**.
  [What are token type IDs?](https://huggingface.co/docs/transformers/glossary#token-type-ids)</paramsdesc><paramgroups>0</paramgroups></docstring>
The `ORTModelForTokenClassification` forward method overrides the `__call__` special method.

<Tip>

Although the forward pass is defined in this method, call the model instance itself rather than `forward()`
directly: calling the instance runs the pre- and post-processing steps, whereas calling `forward()` silently
skips them.

</Tip>



<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForTokenClassification.forward.example">

Example of token classification:

```python
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForTokenClassification

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bert-base-NER")
>>> model = ORTModelForTokenClassification.from_pretrained("optimum/bert-base-NER")

>>> inputs = tokenizer("My name is Philipp and I live in Germany.", return_tensors="np")

>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 12, 9]
```

</ExampleCodeBlock>
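
The logits returned above hold one score per label for every token. The following is a minimal sketch of turning them into labels, assuming a per-token argmax followed by a lookup in an `id2label` mapping such as `model.config.id2label`. The tokens, scores and three-label map are made up so that the snippet runs without downloading a model, and `decode_token_labels` is a hypothetical helper, not part of Optimum.

```python
# Toy stand-ins: real values would come from `outputs.logits[0]` and `model.config.id2label`.
id2label = {0: "O", 1: "B-PER", 2: "B-LOC"}  # hypothetical label map
tokens = ["[CLS]", "philipp", "lives", "in", "germany", "[SEP]"]
logits = [  # shape (sequence_length, num_labels) for a single sequence
    [4.0, 0.1, 0.2],
    [0.3, 5.1, 0.4],
    [3.9, 0.2, 0.1],
    [4.2, 0.1, 0.3],
    [0.2, 0.5, 4.8],
    [4.1, 0.0, 0.2],
]


def decode_token_labels(logits, id2label):
    """Return the highest-scoring label for every token position."""
    labels = []
    for scores in logits:
        best = max(range(len(scores)), key=scores.__getitem__)  # argmax over labels
        labels.append(id2label[best])
    return labels


labels = decode_token_labels(logits, id2label)
# Keep only the tokens that were tagged as part of an entity.
entities = [(tok, lab) for tok, lab in zip(tokens, labels) if lab != "O"]
print(entities)
```

The token-classification pipeline additionally merges sub-word pieces and groups neighbouring tags, which this sketch leaves out.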

<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForTokenClassification.forward.example-2">

Example using `transformers.pipeline`:

```python
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForTokenClassification

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bert-base-NER")
>>> model = ORTModelForTokenClassification.from_pretrained("optimum/bert-base-NER")
>>> onnx_ner = pipeline("token-classification", model=model, tokenizer=tokenizer)

>>> text = "My name is Philipp and I live in Germany."
>>> pred = onnx_ner(text)
```

</ExampleCodeBlock>


</div></div>

### ORTModelForMultipleChoice[[optimum.onnxruntime.ORTModelForMultipleChoice]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class optimum.onnxruntime.ORTModelForMultipleChoice</name><anchor>optimum.onnxruntime.ORTModelForMultipleChoice</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L1185</source><parameters>[{"name": "config", "val": ": PretrainedConfig = None"}, {"name": "session", "val": ": InferenceSession = None"}, {"name": "use_io_binding", "val": ": bool | None = None"}, {"name": "model_save_dir", "val": ": str | Path | TemporaryDirectory | None = None"}]</parameters></docstring>
ONNX Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a
softmax) e.g. for RocStories/SWAG tasks. This class officially supports albert, bert, camembert, convbert, data2vec-text, deberta_v2, distilbert, electra, flaubert, ibert, mobilebert, nystromformer, roberta, roformer, squeezebert, xlm, xlm_roberta.

This model inherits from [ORTModel](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel), check its documentation for the generic methods the
library implements for all its models (such as downloading or saving).

This class should be initialized using the [onnxruntime.modeling_ort.ORTModel.from_pretrained()](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel.from_pretrained) method.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>forward</name><anchor>optimum.onnxruntime.ORTModelForMultipleChoice.forward</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L1193</source><parameters>[{"name": "input_ids", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "attention_mask", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "token_type_ids", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **input_ids** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Indices of input sequence tokens in the vocabulary.
  Indices can be obtained using [`AutoTokenizer`](https://huggingface.co/docs/transformers/autoclass_tutorial#autotokenizer).
  See [`PreTrainedTokenizer.encode`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.encode) and
  [`PreTrainedTokenizer.__call__`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.__call__) for details.
  [What are input IDs?](https://huggingface.co/docs/transformers/glossary#input-ids)
- **attention_mask** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  - 1 for tokens that are **not masked**,
  - 0 for tokens that are **masked**.
  [What are attention masks?](https://huggingface.co/docs/transformers/glossary#attention-mask)
- **token_type_ids** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  - 0 for tokens that belong to **sentence A**,
  - 1 for tokens that belong to **sentence B**.
  [What are token type IDs?](https://huggingface.co/docs/transformers/glossary#token-type-ids)</paramsdesc><paramgroups>0</paramgroups></docstring>
The `ORTModelForMultipleChoice` forward method overrides the `__call__` special method.

<Tip>

Although the forward pass is defined in this method, call the model instance itself rather than `forward()`
directly: calling the instance runs the pre- and post-processing steps, whereas calling `forward()` silently
skips them.

</Tip>



<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForMultipleChoice.forward.example">

Example of multiple choice:

```python
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForMultipleChoice

>>> tokenizer = AutoTokenizer.from_pretrained("ehdwns1516/bert-base-uncased_SWAG")
>>> model = ORTModelForMultipleChoice.from_pretrained("ehdwns1516/bert-base-uncased_SWAG", export=True)

>>> num_choices = 4
>>> first_sentence = ["Members of the procession walk down the street holding small horn brass instruments."] * num_choices
>>> second_sentence = [
...     "A drum line passes by walking down the street playing their instruments.",
...     "A drum line has heard approaching them.",
...     "A drum line arrives and they're outside dancing and asleep.",
...     "A drum line turns the lead singer watches the performance."
... ]
>>> inputs = tokenizer(first_sentence, second_sentence, truncation=True, padding=True)

>>> # Unflatten the input values, expanding them to the shape [batch_size, num_choices, seq_length]
>>> for k, v in inputs.items():
...     inputs[k] = [v[i: i + num_choices] for i in range(0, len(v), num_choices)]
>>> inputs = dict(inputs.convert_to_tensors(tensor_type="pt"))
>>> outputs = model(**inputs)
>>> logits = outputs.logits
```

</ExampleCodeBlock>
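
The unflattening step and the final choice selection can be sketched on toy data. This assumes the logits have shape `[batch_size, num_choices]` and that the predicted answer is the highest-scoring choice. The token ids and scores are made up, and `unflatten` is a hypothetical helper that mirrors the list comprehension in the example above.

```python
def unflatten(values, num_choices):
    """Group a flat list of per-choice encodings into rows of `num_choices`."""
    return [values[i : i + num_choices] for i in range(0, len(values), num_choices)]


# Made-up, already padded token ids for 1 question x 4 candidate endings.
flat_input_ids = [[101, 7, 102], [101, 8, 102], [101, 9, 102], [101, 10, 102]]
batched = unflatten(flat_input_ids, num_choices=4)
# `batched` now has shape [batch_size=1, num_choices=4, seq_length=3].

# Made-up logits of shape [batch_size, num_choices]; real ones come from `outputs.logits`.
logits = [[2.1, -0.3, 0.4, -1.2]]
# The predicted ending is the index of the largest score in each row.
predicted = [max(range(len(row)), key=row.__getitem__) for row in logits]
print(predicted)
```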


</div></div>

### ORTModelForQuestionAnswering[[optimum.onnxruntime.ORTModelForQuestionAnswering]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class optimum.onnxruntime.ORTModelForQuestionAnswering</name><anchor>optimum.onnxruntime.ORTModelForQuestionAnswering</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L873</source><parameters>[{"name": "config", "val": ": PretrainedConfig = None"}, {"name": "session", "val": ": InferenceSession = None"}, {"name": "use_io_binding", "val": ": bool | None = None"}, {"name": "model_save_dir", "val": ": str | Path | TemporaryDirectory | None = None"}]</parameters></docstring>
ONNX Model with a QuestionAnsweringModelOutput for extractive question-answering tasks like SQuAD. This class officially supports albert, bart, bert, camembert, convbert, data2vec-text, deberta, deberta_v2, distilbert, electra, flaubert, gptj, ibert, mbart, mobilebert, nystromformer, roberta, roformer, squeezebert, xlm, xlm_roberta.
This model inherits from [ORTModel](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel), check its documentation for the generic methods the
library implements for all its models (such as downloading or saving).

This class should be initialized using the [onnxruntime.modeling_ort.ORTModel.from_pretrained()](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel.from_pretrained) method.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>forward</name><anchor>optimum.onnxruntime.ORTModelForQuestionAnswering.forward</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L879</source><parameters>[{"name": "input_ids", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "attention_mask", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "token_type_ids", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **input_ids** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Indices of input sequence tokens in the vocabulary.
  Indices can be obtained using [`AutoTokenizer`](https://huggingface.co/docs/transformers/autoclass_tutorial#autotokenizer).
  See [`PreTrainedTokenizer.encode`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.encode) and
  [`PreTrainedTokenizer.__call__`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.__call__) for details.
  [What are input IDs?](https://huggingface.co/docs/transformers/glossary#input-ids)
- **attention_mask** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  - 1 for tokens that are **not masked**,
  - 0 for tokens that are **masked**.
  [What are attention masks?](https://huggingface.co/docs/transformers/glossary#attention-mask)
- **token_type_ids** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  - 0 for tokens that belong to **sentence A**,
  - 1 for tokens that belong to **sentence B**.
  [What are token type IDs?](https://huggingface.co/docs/transformers/glossary#token-type-ids)</paramsdesc><paramgroups>0</paramgroups></docstring>
The `ORTModelForQuestionAnswering` forward method overrides the `__call__` special method.

<Tip>

Although the forward pass is defined in this method, call the model instance itself rather than `forward()`
directly: calling the instance runs the pre- and post-processing steps, whereas calling `forward()` silently
skips them.

</Tip>



<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForQuestionAnswering.forward.example">

Example of question answering:

```python
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/roberta-base-squad2")
>>> model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2")

>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
>>> inputs = tokenizer(question, text, return_tensors="np")

>>> # Inference only needs the tokenized inputs.
>>> outputs = model(**inputs)
>>> start_scores = outputs.start_logits
>>> end_scores = outputs.end_logits
```

</ExampleCodeBlock>

<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForQuestionAnswering.forward.example-2">

Example using `transformers.pipeline`:

```python
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForQuestionAnswering

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/roberta-base-squad2")
>>> model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2")
>>> onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer)

>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
>>> pred = onnx_qa(question, text)
```

</ExampleCodeBlock>
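
The start and end logits hold one score per token, so an answer is a pair of positions rather than a single class. The following is a minimal sketch of extracting it, assuming the answer is the span that maximizes the sum of its start and end scores with the start not after the end. The tokens and scores are made up (they are not real RoBERTa tokenizer output), and `best_span` is a hypothetical helper.

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Return the (start, end) pair with the highest start + end score, start <= end."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


# Made-up tokens and scores standing in for `outputs.start_logits[0]` / `outputs.end_logits[0]`.
tokens = ["<s>", "Who", "was", "Jim", "?", "</s>", "Jim", "was", "a", "nice", "puppet", "</s>"]
start_logits = [0.1, 0.0, 0.0, 0.2, 0.0, 0.0, 0.3, 0.1, 4.0, 1.0, 0.5, 0.0]
end_logits = [0.1, 0.0, 0.0, 0.1, 0.0, 0.0, 0.2, 0.1, 0.3, 1.2, 4.5, 0.0]

start, end = best_span(start_logits, end_logits)
answer = " ".join(tokens[start : end + 1])
print(answer)
```

The question-answering pipeline also restricts candidate spans to the context tokens and maps them back to character offsets, which this sketch leaves out.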


</div></div>

## Computer vision

The following ORT classes are available for computer vision tasks.

### ORTModelForImageClassification[[optimum.onnxruntime.ORTModelForImageClassification]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class optimum.onnxruntime.ORTModelForImageClassification</name><anchor>optimum.onnxruntime.ORTModelForImageClassification</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L1290</source><parameters>[{"name": "config", "val": ": PretrainedConfig = None"}, {"name": "session", "val": ": InferenceSession = None"}, {"name": "use_io_binding", "val": ": bool | None = None"}, {"name": "model_save_dir", "val": ": str | Path | TemporaryDirectory | None = None"}]</parameters></docstring>
ONNX Model for image-classification tasks. This class officially supports beit, convnext, convnextv2, data2vec-vision, deit, dinov2, levit, mobilenet_v1, mobilenet_v2, mobilevit, poolformer, resnet, segformer, swin, swinv2, vit.
This model inherits from [ORTModel](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel), check its documentation for the generic methods the
library implements for all its models (such as downloading or saving).

This class should be initialized using the [onnxruntime.modeling_ort.ORTModel.from_pretrained()](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel.from_pretrained) method.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>forward</name><anchor>optimum.onnxruntime.ORTModelForImageClassification.forward</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L1295</source><parameters>[{"name": "pixel_values", "val": ": torch.Tensor | np.ndarray"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **pixel_values** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, num_channels, height, width)`, defaults to `None`) --
  Pixel values corresponding to the images in the current batch.
  Pixel values can be obtained from encoded images using [`AutoFeatureExtractor`](https://huggingface.co/docs/transformers/autoclass_tutorial#autofeatureextractor).</paramsdesc><paramgroups>0</paramgroups></docstring>
The `ORTModelForImageClassification` forward method overrides the `__call__` special method.

<Tip>

Although the forward pass is defined in this method, call the model instance itself rather than `forward()`
directly: calling the instance runs the pre- and post-processing steps, whereas calling `forward()` silently
skips them.

</Tip>



<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForImageClassification.forward.example">

Example of image classification:

```python
>>> import requests
>>> from PIL import Image
>>> from optimum.onnxruntime import ORTModelForImageClassification
>>> from transformers import AutoFeatureExtractor

>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)

>>> preprocessor = AutoFeatureExtractor.from_pretrained("optimum/vit-base-patch16-224")
>>> model = ORTModelForImageClassification.from_pretrained("optimum/vit-base-patch16-224")

>>> inputs = preprocessor(images=image, return_tensors="np")

>>> outputs = model(**inputs)
>>> logits = outputs.logits
```

</ExampleCodeBlock>
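
The logits above contain one unnormalized score per class. The following is a minimal sketch of ranking them, assuming a softmax over the class dimension followed by a top-k selection. The four-class label map and scores are made up, and `top_k` is a hypothetical helper rather than an Optimum function.

```python
import math


def top_k(logits, id2label, k=3):
    """Return the k best (label, probability) pairs for one image."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    # Softmax is monotonic, so ranking by logit equals ranking by probability.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return [(id2label[i], exps[i] / total) for i in ranked]


# Made-up stand-ins for `outputs.logits[0]` and `model.config.id2label`.
id2label = {0: "tabby cat", 1: "remote control", 2: "sofa", 3: "lamp"}
logits = [1.0, 3.0, 2.0, 0.5]

result = top_k(logits, id2label, k=2)
for label, prob in result:
    print(f"{label}: {prob:.3f}")
```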

<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForImageClassification.forward.example-2">

Example using `transformers.pipeline`:

```python
>>> import requests
>>> from PIL import Image
>>> from transformers import AutoFeatureExtractor, pipeline
>>> from optimum.onnxruntime import ORTModelForImageClassification

>>> preprocessor = AutoFeatureExtractor.from_pretrained("optimum/vit-base-patch16-224")
>>> model = ORTModelForImageClassification.from_pretrained("optimum/vit-base-patch16-224")
>>> onnx_image_classifier = pipeline("image-classification", model=model, feature_extractor=preprocessor)

>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> pred = onnx_image_classifier(url)
```

</ExampleCodeBlock>


</div></div>

### ORTModelForZeroShotImageClassification[[optimum.onnxruntime.ORTModelForZeroShotImageClassification]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class optimum.onnxruntime.ORTModelForZeroShotImageClassification</name><anchor>optimum.onnxruntime.ORTModelForZeroShotImageClassification</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L1346</source><parameters>[{"name": "config", "val": ": PretrainedConfig = None"}, {"name": "session", "val": ": InferenceSession = None"}, {"name": "use_io_binding", "val": ": bool | None = None"}, {"name": "model_save_dir", "val": ": str | Path | TemporaryDirectory | None = None"}]</parameters></docstring>
ONNX Model for zero-shot-image-classification tasks. This class officially supports clip, metaclip-2.
This model inherits from [ORTModel](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel), check its documentation for the generic methods the
library implements for all its models (such as downloading or saving).

This class should be initialized using the [onnxruntime.modeling_ort.ORTModel.from_pretrained()](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel.from_pretrained) method.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>forward</name><anchor>optimum.onnxruntime.ORTModelForZeroShotImageClassification.forward</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L1351</source><parameters>[{"name": "input_ids", "val": ": torch.Tensor | np.ndarray"}, {"name": "pixel_values", "val": ": torch.Tensor | np.ndarray"}, {"name": "attention_mask", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **input_ids** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Indices of input sequence tokens in the vocabulary.
  Indices can be obtained using [`AutoTokenizer`](https://huggingface.co/docs/transformers/autoclass_tutorial#autotokenizer).
  See [`PreTrainedTokenizer.encode`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.encode) and
  [`PreTrainedTokenizer.__call__`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.__call__) for details.
  [What are input IDs?](https://huggingface.co/docs/transformers/glossary#input-ids)
- **attention_mask** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  - 1 for tokens that are **not masked**,
  - 0 for tokens that are **masked**.
  [What are attention masks?](https://huggingface.co/docs/transformers/glossary#attention-mask)
- **token_type_ids** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  - 0 for tokens that belong to **sentence A**,
  - 1 for tokens that belong to **sentence B**.
  [What are token type IDs?](https://huggingface.co/docs/transformers/glossary#token-type-ids)
- **pixel_values** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, num_channels, height, width)`, defaults to `None`) --
  Pixel values corresponding to the images in the current batch.
  Pixel values can be obtained from encoded images using [`AutoFeatureExtractor`](https://huggingface.co/docs/transformers/autoclass_tutorial#autofeatureextractor).</paramsdesc><paramgroups>0</paramgroups></docstring>
The `ORTModelForZeroShotImageClassification` forward method overrides the `__call__` special method.

<Tip>

Although the forward pass is defined in this method, call the model instance itself rather than `forward()`
directly: calling the instance runs the pre- and post-processing steps, whereas calling `forward()` silently
skips them.

</Tip>
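
This class takes both `input_ids` and `pixel_values` because CLIP-style models classify an image by comparing it with candidate text prompts. The following is a rough sketch of that comparison, assuming the image and each prompt have already been mapped to embedding vectors and that the best label is the prompt with the highest cosine similarity. The three-dimensional embeddings and prompts are made up, and `cosine` is a hypothetical helper. A real model produces the scores itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


candidate_labels = ["a photo of a cat", "a photo of a dog"]
image_embed = [0.9, 0.1, 0.2]  # made-up image embedding
text_embeds = [  # made-up text embeddings, one per candidate label
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.3],
]

# One similarity score per candidate label; the largest one wins.
scores = [cosine(image_embed, text) for text in text_embeds]
best_label = candidate_labels[scores.index(max(scores))]
print(best_label)
```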



</div></div>

### ORTModelForSemanticSegmentation[[optimum.onnxruntime.ORTModelForSemanticSegmentation]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class optimum.onnxruntime.ORTModelForSemanticSegmentation</name><anchor>optimum.onnxruntime.ORTModelForSemanticSegmentation</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L1446</source><parameters>[{"name": "*args", "val": ""}, {"name": "**kwargs", "val": ""}]</parameters></docstring>
ONNX Model for semantic-segmentation, with an all-MLP decode head on top e.g. for ADE20k, CityScapes. This class officially supports maskformer, segformer.
This model inherits from [ORTModel](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel), check its documentation for the generic methods the
library implements for all its models (such as downloading or saving).

This class should be initialized using the [onnxruntime.modeling_ort.ORTModel.from_pretrained()](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel.from_pretrained) method.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>forward</name><anchor>optimum.onnxruntime.ORTModelForSemanticSegmentation.forward</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L1460</source><parameters>[{"name": "pixel_values", "val": ": torch.Tensor | np.ndarray"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **pixel_values** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, num_channels, height, width)`, defaults to `None`) --
  Pixel values corresponding to the images in the current batch.
  Pixel values can be obtained from encoded images using [`AutoFeatureExtractor`](https://huggingface.co/docs/transformers/autoclass_tutorial#autofeatureextractor).</paramsdesc><paramgroups>0</paramgroups></docstring>
The `ORTModelForSemanticSegmentation` forward method overrides the `__call__` special method.

<Tip>

Although the forward pass is defined in this method, call the model instance itself rather than `forward()`
directly: calling the instance runs the pre- and post-processing steps, whereas calling `forward()` silently
skips them.

</Tip>



<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForSemanticSegmentation.forward.example">

Example of semantic segmentation:

```python
>>> import requests
>>> from PIL import Image
>>> from optimum.onnxruntime import ORTModelForSemanticSegmentation
>>> from transformers import AutoFeatureExtractor

>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)

>>> preprocessor = AutoFeatureExtractor.from_pretrained("optimum/segformer-b0-finetuned-ade-512-512")
>>> model = ORTModelForSemanticSegmentation.from_pretrained("optimum/segformer-b0-finetuned-ade-512-512")

>>> inputs = preprocessor(images=image, return_tensors="np")

>>> outputs = model(**inputs)
>>> logits = outputs.logits
```

</ExampleCodeBlock>
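
For segmentation the logits carry one score map per class, so the prediction is a per-pixel argmax over the class dimension. The following is a minimal sketch, assuming logits of shape `(num_labels, height, width)` for a single image. The 2x2 scores are made up, and `segmentation_map` is a hypothetical helper.

```python
def segmentation_map(logits):
    """Collapse (num_labels, height, width) logits into a (height, width) map of class ids."""
    num_labels = len(logits)
    height, width = len(logits[0]), len(logits[0][0])
    return [
        [max(range(num_labels), key=lambda c: logits[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Made-up scores standing in for `outputs.logits[0]` with 3 classes on a 2x2 grid.
logits = [
    [[2.0, 0.1], [0.3, 0.2]],  # class 0 scores
    [[0.5, 1.5], [0.1, 0.9]],  # class 1 scores
    [[0.1, 0.2], [1.1, 0.4]],  # class 2 scores
]

seg = segmentation_map(logits)
print(seg)
```

SegFormer-style models emit logits at a lower resolution than the input image, so in practice the score maps are typically upsampled to the original size before taking the argmax.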

<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForSemanticSegmentation.forward.example-2">

Example using `transformers.pipeline`:

```python
>>> import requests
>>> from PIL import Image
>>> from transformers import AutoFeatureExtractor, pipeline
>>> from optimum.onnxruntime import ORTModelForSemanticSegmentation

>>> preprocessor = AutoFeatureExtractor.from_pretrained("optimum/segformer-b0-finetuned-ade-512-512")
>>> model = ORTModelForSemanticSegmentation.from_pretrained("optimum/segformer-b0-finetuned-ade-512-512")
>>> onnx_image_segmenter = pipeline("image-segmentation", model=model, feature_extractor=preprocessor)

>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> pred = onnx_image_segmenter(url)
```

</ExampleCodeBlock>


</div></div>

## Audio

The following ORT classes are available for audio tasks.

### ORTModelForAudioClassification[[optimum.onnxruntime.ORTModelForAudioClassification]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class optimum.onnxruntime.ORTModelForAudioClassification</name><anchor>optimum.onnxruntime.ORTModelForAudioClassification</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L1565</source><parameters>[{"name": "config", "val": ": PretrainedConfig = None"}, {"name": "session", "val": ": InferenceSession = None"}, {"name": "use_io_binding", "val": ": bool | None = None"}, {"name": "model_save_dir", "val": ": str | Path | TemporaryDirectory | None = None"}]</parameters></docstring>
ONNX Model for audio-classification, with a sequence classification head on top (a linear layer over the pooled output) for tasks like
SUPERB Keyword Spotting. This class officially supports audio_spectrogram_transformer, data2vec-audio, hubert, sew, sew-d, unispeech, unispeech_sat, wavlm, wav2vec2, wav2vec2-conformer.

This model inherits from [ORTModel](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel), check its documentation for the generic methods the
library implements for all its models (such as downloading or saving).

This class should be initialized using the [onnxruntime.modeling_ort.ORTModel.from_pretrained()](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel.from_pretrained) method.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>forward</name><anchor>optimum.onnxruntime.ORTModelForAudioClassification.forward</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L1573</source><parameters>[{"name": "input_values", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "attention_mask", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "input_features", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **input_values** (`torch.Tensor` of shape `(batch_size, sequence_length)`) --
  Float values of the input raw speech waveform.
  Input values can be obtained from an audio file loaded into an array using [`AutoFeatureExtractor`](https://huggingface.co/docs/transformers/autoclass_tutorial#autofeatureextractor).</paramsdesc><paramgroups>0</paramgroups></docstring>
The `ORTModelForAudioClassification` forward method overrides the `__call__` special method.

<Tip>

Although the forward pass is defined in this method, call the model instance itself rather than `forward()`
directly: calling the instance runs the pre- and post-processing steps, whereas calling `forward()` silently
skips them.

</Tip>



<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForAudioClassification.forward.example">

Example of audio classification:

```python
>>> from transformers import AutoFeatureExtractor
>>> from optimum.onnxruntime import ORTModelForAudioClassification
>>> from datasets import load_dataset
>>> import torch

>>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> dataset = dataset.sort("id")
>>> sampling_rate = dataset.features["audio"].sampling_rate

>>> feature_extractor = AutoFeatureExtractor.from_pretrained("optimum/hubert-base-superb-ks")
>>> model = ORTModelForAudioClassification.from_pretrained("optimum/hubert-base-superb-ks")

>>> # audio file is decoded on the fly
>>> inputs = feature_extractor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")

>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> predicted_class_ids = torch.argmax(logits, dim=-1).item()
>>> predicted_label = model.config.id2label[predicted_class_ids]
```

</ExampleCodeBlock>

<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForAudioClassification.forward.example-2">

Example using `transformers.pipeline`:

```python
>>> from transformers import AutoFeatureExtractor, pipeline
>>> from optimum.onnxruntime import ORTModelForAudioClassification
>>> from datasets import load_dataset

>>> feature_extractor = AutoFeatureExtractor.from_pretrained("optimum/hubert-base-superb-ks")
>>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> dataset = dataset.sort("id")

>>> model = ORTModelForAudioClassification.from_pretrained("optimum/hubert-base-superb-ks")
>>> onnx_ac = pipeline("audio-classification", model=model, feature_extractor=feature_extractor)

>>> pred = onnx_ac(dataset[0]["audio"]["array"])
```

</ExampleCodeBlock>


</div></div>

### ORTModelForAudioFrameClassification[[optimum.onnxruntime.ORTModelForAudioFrameClassification]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class optimum.onnxruntime.ORTModelForAudioFrameClassification</name><anchor>optimum.onnxruntime.ORTModelForAudioFrameClassification</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L1854</source><parameters>[{"name": "config", "val": ": PretrainedConfig = None"}, {"name": "session", "val": ": InferenceSession = None"}, {"name": "use_io_binding", "val": ": bool | None = None"}, {"name": "model_save_dir", "val": ": str | Path | TemporaryDirectory | None = None"}]</parameters></docstring>
ONNX Model with a frame classification head on top for tasks like Speaker Diarization. This class officially supports data2vec-audio, unispeech_sat, wavlm, wav2vec2, wav2vec2-conformer.
This model inherits from [ORTModel](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel). Check its documentation for the generic methods the
library implements for all its models (such as downloading or saving).

This class should be initialized using the [onnxruntime.modeling_ort.ORTModel.from_pretrained()](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel.from_pretrained) method.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>forward</name><anchor>optimum.onnxruntime.ORTModelForAudioFrameClassification.forward</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L1860</source><parameters>[{"name": "input_values", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **input_values** (`torch.Tensor` of shape `(batch_size, sequence_length)`) --
  Float values of the input raw speech waveform.
  Input values can be obtained from an audio file loaded into an array using [`AutoFeatureExtractor`](https://huggingface.co/docs/transformers/autoclass_tutorial#autofeatureextractor).</paramsdesc><paramgroups>0</paramgroups></docstring>
The `ORTModelForAudioFrameClassification` forward method overrides the `__call__` special method.

<Tip>

Although the recipe for the forward pass needs to be defined within this function, one should call the `Module`
instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while
the latter silently ignores them.

</Tip>



<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForAudioFrameClassification.forward.example">

Example of audio frame classification:

```python
>>> from transformers import AutoFeatureExtractor
>>> from optimum.onnxruntime import ORTModelForAudioFrameClassification
>>> from datasets import load_dataset
>>> import torch

>>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> dataset = dataset.sort("id")
>>> sampling_rate = dataset.features["audio"].sampling_rate

>>> feature_extractor = AutoFeatureExtractor.from_pretrained("optimum/wav2vec2-base-superb-sd")
>>> model = ORTModelForAudioFrameClassification.from_pretrained("optimum/wav2vec2-base-superb-sd")

>>> inputs = feature_extractor(dataset[0]["audio"]["array"], return_tensors="pt", sampling_rate=sampling_rate)
>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> probabilities = torch.sigmoid(logits[0])
>>> labels = (probabilities > 0.5).long()
>>> labels[0].tolist()
```

</ExampleCodeBlock>
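
For diarization-style tasks, the per-frame 0/1 labels computed above usually need to be merged into time segments. The following is a minimal sketch in plain Python, not part of Optimum: `frames_to_segments` is a hypothetical helper, and the default `frame_duration` of 0.02 seconds is an assumption about the model's frame stride that should be checked against the actual model configuration.

```python
def frames_to_segments(frame_labels, frame_duration=0.02):
    """Merge runs of active frames (label 1) into (start, end) times in seconds."""
    segments = []
    start = None
    for index, label in enumerate(frame_labels):
        if label == 1 and start is None:
            start = index  # a new active run begins here
        elif label != 1 and start is not None:
            segments.append((start * frame_duration, index * frame_duration))
            start = None
    if start is not None:  # close a run that lasts until the final frame
        segments.append((start * frame_duration, len(frame_labels) * frame_duration))
    return segments


# Hypothetical labels for a single speaker, one value per frame.
speaker_frames = [0, 1, 1, 1, 0, 0, 1, 1]
# A frame duration of 0.5 s is used here only to keep the numbers easy to read.
segments = frames_to_segments(speaker_frames, frame_duration=0.5)
```

With the `labels` tensor from the example above, one column per speaker could be converted to a list with `.tolist()` and passed to this helper.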


</div></div>

### ORTModelForCTC[[optimum.onnxruntime.ORTModelForCTC]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class optimum.onnxruntime.ORTModelForCTC</name><anchor>optimum.onnxruntime.ORTModelForCTC</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L1667</source><parameters>[{"name": "config", "val": ": PretrainedConfig = None"}, {"name": "session", "val": ": InferenceSession = None"}, {"name": "use_io_binding", "val": ": bool | None = None"}, {"name": "model_save_dir", "val": ": str | Path | TemporaryDirectory | None = None"}]</parameters></docstring>
ONNX Model with a language modeling head on top for Connectionist Temporal Classification (CTC). This class officially supports data2vec-audio, hubert, sew, sew-d, unispeech, unispeech_sat, wavlm, wav2vec2, wav2vec2-conformer.
This model inherits from [ORTModel](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel). Check its documentation for the generic methods the
library implements for all its models (such as downloading or saving).

This class should be initialized using the [onnxruntime.modeling_ort.ORTModel.from_pretrained()](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel.from_pretrained) method.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>forward</name><anchor>optimum.onnxruntime.ORTModelForCTC.forward</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L1673</source><parameters>[{"name": "input_values", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "input_features", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **input_values** (`torch.Tensor` of shape `(batch_size, sequence_length)`) --
  Float values of the input raw speech waveform.
  Input values can be obtained from an audio file loaded into an array using [`AutoFeatureExtractor`](https://huggingface.co/docs/transformers/autoclass_tutorial#autofeatureextractor).</paramsdesc><paramgroups>0</paramgroups></docstring>
The `ORTModelForCTC` forward method overrides the `__call__` special method.

<Tip>

Although the recipe for the forward pass needs to be defined within this function, one should call the `Module`
instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while
the latter silently ignores them.

</Tip>



<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForCTC.forward.example">

Example of CTC:

```python
>>> from transformers import AutoProcessor
>>> from optimum.onnxruntime import ORTModelForCTC
>>> from datasets import load_dataset
>>> import torch

>>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> dataset = dataset.sort("id")
>>> sampling_rate = dataset.features["audio"].sampling_rate

>>> processor = AutoProcessor.from_pretrained("optimum/hubert-large-ls960-ft")
>>> model = ORTModelForCTC.from_pretrained("optimum/hubert-large-ls960-ft")

>>> # audio file is decoded on the fly
>>> inputs = processor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")
>>> with torch.no_grad():
...     logits = model(**inputs).logits
>>> predicted_ids = torch.argmax(logits, dim=-1)

>>> transcription = processor.batch_decode(predicted_ids)
```

</ExampleCodeBlock>
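
In the example above, `processor.batch_decode` turns per-frame token ids into text. The core of greedy CTC decoding (collapse consecutive repeats, then drop the blank token) can be sketched in plain Python as follows. The function name, the toy vocabulary, and the choice of `0` as the blank id are assumptions for illustration; a real tokenizer defines its own blank or padding id and handles word delimiters as well.

```python
def ctc_greedy_collapse(token_ids, blank_id=0):
    """Collapse consecutive repeated ids, then drop blanks (greedy CTC rule)."""
    collapsed = []
    previous = None
    for token_id in token_ids:
        # Keep an id only when it differs from the previous frame and is not blank.
        if token_id != previous and token_id != blank_id:
            collapsed.append(token_id)
        previous = token_id
    return collapsed


# Hypothetical vocabulary and per-frame argmax ids, for illustration only.
vocab = {0: "<pad>", 1: "H", 2: "I"}
frame_ids = [0, 1, 1, 0, 2, 2, 2, 0, 2]
decoded_ids = ctc_greedy_collapse(frame_ids, blank_id=0)
text = "".join(vocab[i] for i in decoded_ids)
```

Note how the blank between the two runs of `2` keeps them as separate characters, which is how CTC represents genuinely repeated letters.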


</div></div>

### ORTModelForSpeechSeq2Seq[[optimum.onnxruntime.ORTModelForSpeechSeq2Seq]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class optimum.onnxruntime.ORTModelForSpeechSeq2Seq</name><anchor>optimum.onnxruntime.ORTModelForSpeechSeq2Seq</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_seq2seq.py#L1283</source><parameters>[{"name": "*args", "val": ""}, {"name": "**kwargs", "val": ""}]</parameters></docstring>
Speech sequence-to-sequence model with a language modeling head for ONNX Runtime inference. This class officially supports whisper, speech_to_text.
This model inherits from `~onnxruntime.modeling_ort.ORTModelForConditionalGeneration`. Check its documentation for the generic methods the
library implements for all its models (such as downloading or saving).

This class should be initialized using the `onnxruntime.modeling_ort.ORTModelForConditionalGeneration.from_pretrained` method.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>forward</name><anchor>optimum.onnxruntime.ORTModelForSpeechSeq2Seq.forward</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_seq2seq.py#L1308</source><parameters>[{"name": "input_features", "val": ": torch.FloatTensor | None = None"}, {"name": "attention_mask", "val": ": torch.LongTensor | None = None"}, {"name": "decoder_input_ids", "val": ": torch.LongTensor | None = None"}, {"name": "decoder_attention_mask", "val": ": torch.LongTensor | None = None"}, {"name": "encoder_outputs", "val": ": tuple[tuple[torch.Tensor]] | None = None"}, {"name": "past_key_values", "val": ": tuple[tuple[torch.Tensor]] | None = None"}, {"name": "cache_position", "val": ": torch.Tensor | None = None"}, {"name": "use_cache", "val": ": bool | None = None"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **input_features** (`torch.FloatTensor`) --
  Mel features extracted from the raw speech waveform, of shape
  `(batch_size, feature_size, encoder_sequence_length)`.
- **decoder_input_ids** (`torch.LongTensor`) --
  Indices of decoder input sequence tokens in the vocabulary of shape `(batch_size, decoder_sequence_length)`.
- **encoder_outputs** (`torch.FloatTensor`) --
  The encoder `last_hidden_state` of shape `(batch_size, encoder_sequence_length, hidden_size)`.
- **past_key_values** (`tuple(tuple(torch.FloatTensor))`, *optional*, defaults to `None`) --
  Contains the precomputed key and value hidden states of the attention blocks used to speed up decoding.
  The tuple is of length `config.n_layers` with each tuple having 2 tensors of shape
  `(batch_size, num_heads, decoder_sequence_length, embed_size_per_head)` and 2 additional tensors of shape
  `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`.</paramsdesc><paramgroups>0</paramgroups></docstring>
The `ORTModelForSpeechSeq2Seq` forward method overrides the `__call__` special method.

<Tip>

Although the recipe for the forward pass needs to be defined within this function, one should call the `Module`
instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while
the latter silently ignores them.

</Tip>



<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForSpeechSeq2Seq.forward.example">

Example of speech-to-text generation:

```python
>>> from transformers import AutoProcessor
>>> from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
>>> from datasets import load_dataset

>>> processor = AutoProcessor.from_pretrained("optimum/whisper-tiny.en")
>>> model = ORTModelForSpeechSeq2Seq.from_pretrained("optimum/whisper-tiny.en")

>>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
>>> inputs = processor.feature_extractor(ds[0]["audio"]["array"], return_tensors="pt")

>>> gen_tokens = model.generate(inputs=inputs.input_features)
>>> outputs = processor.tokenizer.batch_decode(gen_tokens)
```

</ExampleCodeBlock>

<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForSpeechSeq2Seq.forward.example-2">

Example using `transformers.pipeline`:

```python
>>> from transformers import AutoProcessor, pipeline
>>> from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
>>> from datasets import load_dataset

>>> processor = AutoProcessor.from_pretrained("optimum/whisper-tiny.en")
>>> model = ORTModelForSpeechSeq2Seq.from_pretrained("optimum/whisper-tiny.en")
>>> speech_recognition = pipeline("automatic-speech-recognition", model=model, tokenizer=processor.tokenizer, feature_extractor=processor.feature_extractor)

>>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
>>> pred = speech_recognition(ds[0]["audio"]["array"])
```

</ExampleCodeBlock>
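
The `past_key_values` layout described for `forward` (per layer, two self-attention tensors followed by two cross-attention tensors) can be made concrete with a small shape calculation. This is a sketch in plain Python; `expected_cache_shapes` is a hypothetical helper, and the numbers passed to it are illustrative values rather than values read from any model configuration.

```python
def expected_cache_shapes(num_layers, batch_size, num_heads, decoder_len, encoder_len, head_dim):
    """Return the tensor shapes of a cache laid out as described in `past_key_values`."""
    # Self-attention key/value states grow with the decoded sequence.
    self_attn = (batch_size, num_heads, decoder_len, head_dim)
    # Cross-attention key/value states depend only on the encoder output length.
    cross_attn = (batch_size, num_heads, encoder_len, head_dim)
    per_layer = (self_attn, self_attn, cross_attn, cross_attn)
    return tuple(per_layer for _ in range(num_layers))


# Illustrative values only.
cache_shapes = expected_cache_shapes(
    num_layers=4, batch_size=1, num_heads=6, decoder_len=3, encoder_len=1500, head_dim=64
)
```

Only the self-attention entries change from one decoding step to the next, which is why reusing the cache speeds up generation.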


</div></div>

### ORTModelForAudioXVector[[optimum.onnxruntime.ORTModelForAudioXVector]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class optimum.onnxruntime.ORTModelForAudioXVector</name><anchor>optimum.onnxruntime.ORTModelForAudioXVector</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L1768</source><parameters>[{"name": "config", "val": ": PretrainedConfig = None"}, {"name": "session", "val": ": InferenceSession = None"}, {"name": "use_io_binding", "val": ": bool | None = None"}, {"name": "model_save_dir", "val": ": str | Path | TemporaryDirectory | None = None"}]</parameters></docstring>
ONNX Model with an XVector feature extraction head on top for tasks like Speaker Verification. This class officially supports data2vec-audio, unispeech_sat, wavlm, wav2vec2, wav2vec2-conformer.
This model inherits from [ORTModel](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel). Check its documentation for the generic methods the
library implements for all its models (such as downloading or saving).

This class should be initialized using the [onnxruntime.modeling_ort.ORTModel.from_pretrained()](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel.from_pretrained) method.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>forward</name><anchor>optimum.onnxruntime.ORTModelForAudioXVector.forward</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L1774</source><parameters>[{"name": "input_values", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **input_values** (`torch.Tensor` of shape `(batch_size, sequence_length)`) --
  Float values of the input raw speech waveform.
  Input values can be obtained from an audio file loaded into an array using [`AutoFeatureExtractor`](https://huggingface.co/docs/transformers/autoclass_tutorial#autofeatureextractor).</paramsdesc><paramgroups>0</paramgroups></docstring>
The `ORTModelForAudioXVector` forward method overrides the `__call__` special method.

<Tip>

Although the recipe for the forward pass needs to be defined within this function, one should call the `Module`
instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while
the latter silently ignores them.

</Tip>



<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForAudioXVector.forward.example">

Example of Audio XVector:

```python
>>> from transformers import AutoFeatureExtractor
>>> from optimum.onnxruntime import ORTModelForAudioXVector
>>> from datasets import load_dataset
>>> import torch

>>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> dataset = dataset.sort("id")
>>> sampling_rate = dataset.features["audio"].sampling_rate

>>> feature_extractor = AutoFeatureExtractor.from_pretrained("optimum/wav2vec2-base-superb-sv")
>>> model = ORTModelForAudioXVector.from_pretrained("optimum/wav2vec2-base-superb-sv")

>>> # audio file is decoded on the fly
>>> inputs = feature_extractor(
...     [d["array"] for d in dataset[:2]["audio"]], sampling_rate=sampling_rate, return_tensors="pt", padding=True
... )
>>> with torch.no_grad():
...     embeddings = model(**inputs).embeddings

>>> embeddings = torch.nn.functional.normalize(embeddings, dim=-1).cpu()

>>> cosine_sim = torch.nn.CosineSimilarity(dim=-1)
>>> similarity = cosine_sim(embeddings[0], embeddings[1])
>>> threshold = 0.7
>>> if similarity < threshold:
...     print("Speakers are not the same!")
>>> round(similarity.item(), 2)
```

</ExampleCodeBlock>
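
The verification step in the example above compares two embeddings by cosine similarity against a threshold. A dependency-free sketch of that comparison is shown below; the helper names and the toy two-dimensional vectors are assumptions for illustration, and the `0.7` threshold simply mirrors the value used in the example rather than a recommended setting.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_speaker(a, b, threshold=0.7):
    """Treat two embeddings as the same speaker when similarity reaches the threshold."""
    return cosine_similarity(a, b) >= threshold


# Toy embeddings, for illustration only.
identical = cosine_similarity([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

In practice the threshold is usually tuned on held-out speaker pairs, since the right value depends on the model and the recording conditions.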


</div></div>

## Multimodal

The following ORT classes are available for multimodal tasks.

### ORTModelForVision2Seq[[optimum.onnxruntime.ORTModelForVision2Seq]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class optimum.onnxruntime.ORTModelForVision2Seq</name><anchor>optimum.onnxruntime.ORTModelForVision2Seq</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_seq2seq.py#L1455</source><parameters>[{"name": "config", "val": ": PretrainedConfig = None"}, {"name": "encoder_session", "val": ": InferenceSession = None"}, {"name": "decoder_session", "val": ": InferenceSession = None"}, {"name": "decoder_with_past_session", "val": ": InferenceSession | None = None"}, {"name": "use_io_binding", "val": ": bool | None = None"}, {"name": "generation_config", "val": ": GenerationConfig | None = None"}, {"name": "model_save_dir", "val": ": str | Path | TemporaryDirectory | None = None"}]</parameters></docstring>
Vision sequence-to-sequence model with a language modeling head for ONNX Runtime inference. This class officially supports vision encoder-decoder and pix2struct.
This model inherits from `~onnxruntime.modeling_ort.ORTModelForConditionalGeneration`. Check its documentation for the generic methods the
library implements for all its models (such as downloading or saving).

This class should be initialized using the `onnxruntime.modeling_ort.ORTModelForConditionalGeneration.from_pretrained` method.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>forward</name><anchor>optimum.onnxruntime.ORTModelForVision2Seq.forward</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_seq2seq.py#L1470</source><parameters>[{"name": "pixel_values", "val": ": torch.FloatTensor | None = None"}, {"name": "attention_mask", "val": ": torch.LongTensor | None = None"}, {"name": "decoder_input_ids", "val": ": torch.LongTensor | None = None"}, {"name": "decoder_attention_mask", "val": ": torch.BoolTensor | None = None"}, {"name": "encoder_outputs", "val": ": BaseModelOutput | list[torch.FloatTensor] | None = None"}, {"name": "past_key_values", "val": ": tuple[tuple[torch.Tensor]] | None = None"}, {"name": "cache_position", "val": ": torch.Tensor | None = None"}, {"name": "use_cache", "val": ": bool | None = None"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **pixel_values** (`torch.FloatTensor`) --
  Features extracted from an image. This tensor should be of shape
  `(batch_size, num_channels, height, width)`.
- **decoder_input_ids** (`torch.LongTensor`) --
  Indices of decoder input sequence tokens in the vocabulary of shape `(batch_size, decoder_sequence_length)`.
- **encoder_outputs** (`torch.FloatTensor`) --
  The encoder `last_hidden_state` of shape `(batch_size, encoder_sequence_length, hidden_size)`.
- **past_key_values** (`tuple(tuple(torch.FloatTensor))`, *optional*, defaults to `None`) --
  Contains the precomputed key and value hidden states of the attention blocks used to speed up decoding.
  The tuple is of length `config.n_layers` with each tuple having 2 tensors of shape
  `(batch_size, num_heads, decoder_sequence_length, embed_size_per_head)` and 2 additional tensors of shape
  `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`.</paramsdesc><paramgroups>0</paramgroups></docstring>
The `ORTModelForVision2Seq` forward method overrides the `__call__` special method.

<Tip>

Although the recipe for the forward pass needs to be defined within this function, one should call the `Module`
instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while
the latter silently ignores them.

</Tip>



<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForVision2Seq.forward.example">

Example of text generation:

```python
>>> from transformers import AutoImageProcessor, AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForVision2Seq
>>> from PIL import Image
>>> import requests


>>> processor = AutoImageProcessor.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
>>> tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
>>> model = ORTModelForVision2Seq.from_pretrained("nlpconnect/vit-gpt2-image-captioning", export=True)

>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> inputs = processor(image, return_tensors="pt")

>>> gen_tokens = model.generate(**inputs)
>>> outputs = tokenizer.batch_decode(gen_tokens, skip_special_tokens=True)

```

</ExampleCodeBlock>

<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForVision2Seq.forward.example-2">

Example using `transformers.pipeline`:

```python
>>> from transformers import AutoImageProcessor, AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForVision2Seq
>>> from PIL import Image
>>> import requests


>>> processor = AutoImageProcessor.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
>>> tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
>>> model = ORTModelForVision2Seq.from_pretrained("nlpconnect/vit-gpt2-image-captioning", export=True)

>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)

>>> image_to_text = pipeline("image-to-text", model=model, tokenizer=tokenizer, feature_extractor=processor, image_processor=processor)
>>> pred = image_to_text(image)
```

</ExampleCodeBlock>


</div></div>

### ORTModelForPix2Struct[[optimum.onnxruntime.ORTModelForPix2Struct]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class optimum.onnxruntime.ORTModelForPix2Struct</name><anchor>optimum.onnxruntime.ORTModelForPix2Struct</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_seq2seq.py#L1527</source><parameters>[{"name": "config", "val": ": PretrainedConfig = None"}, {"name": "encoder_session", "val": ": InferenceSession = None"}, {"name": "decoder_session", "val": ": InferenceSession = None"}, {"name": "decoder_with_past_session", "val": ": InferenceSession | None = None"}, {"name": "use_io_binding", "val": ": bool | None = None"}, {"name": "generation_config", "val": ": GenerationConfig | None = None"}, {"name": "model_save_dir", "val": ": str | Path | TemporaryDirectory | None = None"}]</parameters></docstring>
Pix2Struct model with a language modeling head for ONNX Runtime inference. This class officially supports pix2struct.
This model inherits from `~onnxruntime.modeling_ort.ORTModelForConditionalGeneration`. Check its documentation for the generic methods the
library implements for all its models (such as downloading or saving).

This class should be initialized using the `onnxruntime.modeling_ort.ORTModelForConditionalGeneration.from_pretrained` method.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>forward</name><anchor>optimum.onnxruntime.ORTModelForPix2Struct.forward</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_seq2seq.py#L1540</source><parameters>[{"name": "flattened_patches", "val": ": torch.FloatTensor | None = None"}, {"name": "attention_mask", "val": ": torch.LongTensor | None = None"}, {"name": "decoder_input_ids", "val": ": torch.LongTensor | None = None"}, {"name": "decoder_attention_mask", "val": ": torch.BoolTensor | None = None"}, {"name": "encoder_outputs", "val": ": BaseModelOutput | list[torch.FloatTensor] | None = None"}, {"name": "past_key_values", "val": ": tuple[tuple[torch.Tensor]] | None = None"}, {"name": "use_cache", "val": ": bool | None = None"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **flattened_patches** (`torch.FloatTensor` of shape `(batch_size, seq_length, hidden_size)`) --
  Flattened pixel patches. The `hidden_size` is obtained by the following formula: `hidden_size` =
  `num_channels` * `patch_size` * `patch_size`.
  The process of flattening the pixel patches is done by `Pix2StructProcessor`.
- **attention_mask** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) --
  Mask to avoid performing attention on padding token indices.
- **decoder_input_ids** (`torch.LongTensor` of shape `(batch_size, target_sequence_length)`, *optional*) --
  Indices of decoder input sequence tokens in the vocabulary.
  Pix2StructText uses the `pad_token_id` as the starting token for `decoder_input_ids` generation. If
  `past_key_values` is used, optionally only the last `decoder_input_ids` have to be input (see
  `past_key_values`).
- **decoder_attention_mask** (`torch.BoolTensor` of shape `(batch_size, target_sequence_length)`, *optional*) --
  Default behavior: generate a tensor that ignores pad tokens in `decoder_input_ids`. Causal mask will also
  be used by default.
- **encoder_outputs** (`tuple(tuple(torch.FloatTensor))`, *optional*) --
  Tuple consists of (`last_hidden_state`, `optional`: *hidden_states*, `optional`: *attentions*)
  `last_hidden_state` of shape `(batch_size, sequence_length, hidden_size)` is a sequence of hidden states at
  the output of the last layer of the encoder. Used in the cross-attention of the decoder.
- **past_key_values** (`tuple(tuple(torch.FloatTensor))`, *optional*, defaults to `None`) --
  Contains the precomputed key and value hidden states of the attention blocks used to speed up decoding.
  The tuple is of length `config.n_layers` with each tuple having 2 tensors of shape
  `(batch_size, num_heads, decoder_sequence_length, embed_size_per_head)` and 2 additional tensors of shape
  `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`.</paramsdesc><paramgroups>0</paramgroups></docstring>
The `ORTModelForPix2Struct` forward method overrides the `__call__` special method.

<Tip>

Although the recipe for the forward pass needs to be defined within this function, one should call the `Module`
instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while
the latter silently ignores them.

</Tip>



<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForPix2Struct.forward.example">

Example of pix2struct:

```python
>>> from transformers import AutoProcessor
>>> from optimum.onnxruntime import ORTModelForPix2Struct
>>> from PIL import Image
>>> import requests

>>> processor = AutoProcessor.from_pretrained("google/pix2struct-ai2d-base")
>>> model = ORTModelForPix2Struct.from_pretrained("google/pix2struct-ai2d-base", export=True, use_io_binding=True)

>>> url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> question = "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"
>>> inputs = processor(images=image, text=question, return_tensors="pt")

>>> gen_tokens = model.generate(**inputs)
>>> outputs = processor.batch_decode(gen_tokens, skip_special_tokens=True)
```

</ExampleCodeBlock>
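
As a worked instance of the `hidden_size` formula given for `flattened_patches`, the calculation below uses an RGB image and a 16-pixel patch. Both values are assumptions chosen for illustration, and `patch_hidden_size` is a hypothetical helper; the tensor actually returned by the processor for a given checkpoint may differ, so check its shape directly.

```python
def patch_hidden_size(num_channels, patch_size):
    """Number of values in one flattened square patch, per the documented formula."""
    return num_channels * patch_size * patch_size


# Assumed values: 3 colour channels and 16x16 patches.
hidden_size = patch_hidden_size(num_channels=3, patch_size=16)  # 3 * 16 * 16
```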


</div></div>

## Custom Tasks

The following ORT classes are available for custom tasks.

### ORTModelForCustomTasks[[optimum.onnxruntime.ORTModelForCustomTasks]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class optimum.onnxruntime.ORTModelForCustomTasks</name><anchor>optimum.onnxruntime.ORTModelForCustomTasks</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L2019</source><parameters>[{"name": "config", "val": ": PretrainedConfig = None"}, {"name": "session", "val": ": InferenceSession = None"}, {"name": "use_io_binding", "val": ": bool | None = None"}, {"name": "model_save_dir", "val": ": str | Path | TemporaryDirectory | None = None"}]</parameters></docstring>
ONNX Model for any custom task. It can be used to leverage inference acceleration for any single-file ONNX model that may use custom inputs and outputs.
This model inherits from [ORTModel](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel). Check its documentation for the generic methods the
library implements for all its models (such as downloading or saving).

This class should be initialized using the [onnxruntime.modeling_ort.ORTModel.from_pretrained()](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel.from_pretrained) method.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>forward</name><anchor>optimum.onnxruntime.ORTModelForCustomTasks.forward</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L2022</source><parameters>[{"name": "**model_inputs", "val": ": torch.Tensor | np.ndarray"}]</parameters></docstring>
The `ORTModelForCustomTasks` forward method overrides the `__call__` special method.

<Tip>

Although the recipe for the forward pass needs to be defined within this function, one should call the `Module`
instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while
the latter silently ignores them.

</Tip>

<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForCustomTasks.forward.example">

Example of a custom task (e.g. a sentence transformers model that returns `pooler_output` as an output):

```python
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForCustomTasks

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/sbert-all-MiniLM-L6-with-pooler")
>>> model = ORTModelForCustomTasks.from_pretrained("optimum/sbert-all-MiniLM-L6-with-pooler")

>>> inputs = tokenizer("I love burritos!", return_tensors="np")

>>> outputs = model(**inputs)
>>> last_hidden_state = outputs.last_hidden_state
>>> pooler_output = outputs.pooler_output
```

</ExampleCodeBlock>

<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForCustomTasks.forward.example-2">

Example using `transformers.pipeline` (only if the task is supported):

```python
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForCustomTasks

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/sbert-all-MiniLM-L6-with-pooler")
>>> model = ORTModelForCustomTasks.from_pretrained("optimum/sbert-all-MiniLM-L6-with-pooler")
>>> onnx_extractor = pipeline("feature-extraction", model=model, tokenizer=tokenizer)

>>> text = "I love burritos!"
>>> pred = onnx_extractor(text)
```

</ExampleCodeBlock>


</div></div>

### ORTModelForFeatureExtraction[[optimum.onnxruntime.ORTModelForFeatureExtraction]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class optimum.onnxruntime.ORTModelForFeatureExtraction</name><anchor>optimum.onnxruntime.ORTModelForFeatureExtraction</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L648</source><parameters>[{"name": "config", "val": ": PretrainedConfig = None"}, {"name": "session", "val": ": InferenceSession = None"}, {"name": "use_io_binding", "val": ": bool | None = None"}, {"name": "model_save_dir", "val": ": str | Path | TemporaryDirectory | None = None"}]</parameters></docstring>
ONNX Model for the feature-extraction task.
This model inherits from [ORTModel](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel). Check its documentation for the generic methods the
library implements for all its models (such as downloading or saving).

This class should be initialized using the [onnxruntime.modeling_ort.ORTModel.from_pretrained()](/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel.from_pretrained) method.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>forward</name><anchor>optimum.onnxruntime.ORTModelForFeatureExtraction.forward</anchor><source>https://github.com/huggingface/optimum-onnx/blob/main/optimum/onnxruntime/modeling_ort.py#L654</source><parameters>[{"name": "input_ids", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "attention_mask", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "token_type_ids", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "position_ids", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "pixel_values", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "visual_embeds", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "visual_attention_mask", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "visual_token_type_ids", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "input_features", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "input_values", "val": ": torch.Tensor | np.ndarray | None = None"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **input_ids** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Indices of input sequence tokens in the vocabulary.
  Indices can be obtained using [`AutoTokenizer`](https://huggingface.co/docs/transformers/autoclass_tutorial#autotokenizer).
  See [`PreTrainedTokenizer.encode`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.encode) and
  [`PreTrainedTokenizer.__call__`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.__call__) for details.
  [What are input IDs?](https://huggingface.co/docs/transformers/glossary#input-ids)
- **attention_mask** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  - 1 for tokens that are **not masked**,
  - 0 for tokens that are **masked**.
  [What are attention masks?](https://huggingface.co/docs/transformers/glossary#attention-mask)
- **token_type_ids** (`Union[torch.Tensor, np.ndarray, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) --
  Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  - 0 for tokens that belong to **sentence A**,
  - 1 for tokens that belong to **sentence B**.
  [What are token type IDs?](https://huggingface.co/docs/transformers/glossary#token-type-ids)</paramsdesc><paramgroups>0</paramgroups></docstring>
The `ORTModelForFeatureExtraction` forward method overrides the `__call__` special method.

<Tip>

Although the recipe for the forward pass needs to be defined within this function, one should call the `Module`
instance afterwards rather than calling `forward` directly, since the former takes care of running the pre- and
post-processing steps while the latter silently ignores them.

</Tip>



<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForFeatureExtraction.forward.example">

Example of feature extraction:

```python
>>> from transformers import AutoTokenizer
>>> from optimum.onnxruntime import ORTModelForFeatureExtraction

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/all-MiniLM-L6-v2")
>>> model = ORTModelForFeatureExtraction.from_pretrained("optimum/all-MiniLM-L6-v2")

>>> inputs = tokenizer("My name is Philipp and I live in Germany.", return_tensors="np")

>>> outputs = model(**inputs)
>>> last_hidden_state = outputs.last_hidden_state
>>> list(last_hidden_state.shape)
[1, 12, 384]
```

</ExampleCodeBlock>
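
The `last_hidden_state` returned by the model holds one vector per token. One common way to reduce it to a single vector per input, though not the only one, is mean pooling weighted by the attention mask, so that padded positions do not contribute. The sketch below assumes NumPy arrays (as produced with `return_tensors="np"`). `mean_pool` is a hypothetical helper written for illustration and is not part of `optimum`, and the arrays are synthetic stand-ins for `outputs.last_hidden_state` and `inputs["attention_mask"]`.

```python
import numpy as np


def mean_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring positions where the mask is 0."""
    # Expand the mask to (batch_size, sequence_length, 1) so it broadcasts over the hidden size.
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)  # (batch_size, hidden_size)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # guard against division by zero
    return summed / counts


# Synthetic stand-ins for `outputs.last_hidden_state` and `inputs["attention_mask"]`.
hidden = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])  # shape (1, 3, 2)
mask = np.array([[1, 1, 0]])                                   # last position is padding

embedding = mean_pool(hidden, mask)
print(embedding.shape)
```

With real model outputs, the same helper could be called as `mean_pool(outputs.last_hidden_state, inputs["attention_mask"])`. Whether mean pooling is the appropriate reduction depends on how the checkpoint was trained.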

<ExampleCodeBlock anchor="optimum.onnxruntime.ORTModelForFeatureExtraction.forward.example-2">

Example using `transformers.pipeline`:

```python
>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForFeatureExtraction

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/all-MiniLM-L6-v2")
>>> model = ORTModelForFeatureExtraction.from_pretrained("optimum/all-MiniLM-L6-v2")
>>> onnx_extractor = pipeline("feature-extraction", model=model, tokenizer=tokenizer)

>>> text = "My name is Philipp and I live in Germany."
>>> pred = onnx_extractor(text)
```

</ExampleCodeBlock>


</div></div>