15:08

Update Documentation for MAMBA2 and MISTRAL Models

What is the Issue?

The input documentation for the MAMBA2 and MISTRAL models did not match the forward pass of the respective models: arguments accepted by the forward methods were missing from the docstrings. This mismatch made it harder to subclass Mistral and use the models correctly, as the sketch below illustrates.
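To make the pain point concrete, here is a minimal sketch (not part of the PR) of a subclass that passes the documented arguments through to MistralModel; the class name is hypothetical, and it assumes a transformers version in which MistralModel.forward accepts cache_position.

from typing import Optional

import torch
from transformers import MistralModel


class MyMistralModel(MistralModel):
    # Hypothetical subclass: it simply forwards the documented arguments to the
    # parent implementation, so its signature is only useful if the docstring
    # matches the real forward pass.
    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        cache_position: Optional[torch.LongTensor] = None,
        **kwargs,
    ):
        return super().forward(
            input_ids=input_ids,
            attention_mask=attention_mask,
            cache_position=cache_position,
            **kwargs,
        )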

What does the PR do?

This PR updates the input documentation for the MAMBA2 and MISTRAL models to describe the cache_position and attention_mask arguments, so that the docstrings accurately reflect the models' forward implementations.

Why is it Important?

Accurate documentation is crucial for developers to use the models correctly. By keeping the documentation consistent with the implementation, this PR makes it easier to work with the MAMBA2 and MISTRAL models.

Code Snippet

Here is a code snippet that shows the changes made in the PR:

# Before: the docstring did not mention cache_position or attention_mask

# After: the docstring documents both arguments
cache_position (`torch.Tensor`, *optional*): The cache position tensor.
attention_mask (`torch.Tensor`, *optional*): The attention mask tensor.
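For a sense of how these arguments show up at call time, here is a small usage sketch; it is illustrative rather than taken from the PR, the checkpoint name is only an example, and it assumes a transformers version whose Mistral forward pass accepts cache_position.

import torch
from transformers import AutoTokenizer, MistralForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = MistralForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

inputs = tokenizer("Hello, world", return_tensors="pt")
# attention_mask marks which tokens should be attended to (1) versus padding (0);
# cache_position holds the absolute positions of the current tokens, which matters
# once a key/value cache from previous steps is in use.
cache_position = torch.arange(inputs["input_ids"].shape[1])
outputs = model(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    cache_position=cache_position,
)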

You can view the full PR here.


Links : Transformers

Tags :

Date : 16th March, Sunday, 2025 (Wikilinks: 16th March, March 25, March, 2025, Sunday)

Category : Others