Problem Description
I'm trying to fine-tune the nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0 model using the official NeMo example script speech_to_text_hybrid_rnnt_ctc_bpe.py, but I'm getting a RuntimeError with size mismatches when loading the pretrained model's state_dict.
RuntimeError: Error(s) in loading state_dict for EncDecHybridRNNTCTCBPEModel:
size mismatch for encoder.pre_encode.out.weight: copying a param with shape torch.Size([512, 2560]) from checkpoint, the shape in current model is torch.Size([512, 10240]).
size mismatch for encoder.pre_encode.conv.0.weight: copying a param with shape torch.Size([256, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1, 3, 3]).
size mismatch for encoder.pre_encode.conv.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.pre_encode.conv.2.weight: copying a param with shape torch.Size([256, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for encoder.pre_encode.conv.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layers.0.conv.depthwise_conv.weight: copying a param with shape torch.Size([512, 1, 9]) from checkpoint, the shape in current model is torch.Size([512, 1, 31]).
... (multiple similar errors for layers 1-16)
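Comparing the shapes, my guess is that the script picked up a plain Conformer YAML rather than a FastConformer one: the checkpoint has depthwise-separable subsampling with 256 channels and a depthwise kernel of 9, while the model being built uses regular striding subsampling with 512 channels and kernel 31. For reference, a sketch of the encoder settings that would match the checkpoint shapes (these mirror NeMo's stock FastConformer defaults, not values read from this .nemo file):

```yaml
model:
  encoder:
    d_model: 512
    subsampling: dw_striding        # conv.2 weight [256, 1, 3, 3] is a depthwise conv
    subsampling_factor: 8           # 80 mel bins / 8 = 10; 256 ch * 10 = 2560 -> pre_encode.out
    subsampling_conv_channels: 256  # vs. 512 in the model being built
    conv_kernel_size: 9             # vs. 31 (the classic Conformer default)
```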
Environment Information
NeMo Version: 2.5.3
Installation Method: pip install nemo_toolkit['all']
Fine-tuning Script: Cloned from GitHub (NeMo/examples/asr/asr_hybrid_transducer_ctc/speech_to_text_hybrid_rnnt_ctc_bpe.py)
Platform: Kaggle
GPU/Accelerator: GPU (single device)
Command Used
Tokenizer Setup
I extracted the tokenizer files from the pretrained model's .nemo checkpoint.
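A .nemo checkpoint is a plain tar archive, so the tokenizer files can be pulled out with the standard library. Roughly what I ran (the checkpoint filename is a placeholder, and the member-name patterns are assumptions about how NeMo names these files):

```python
import tarfile


def extract_tokenizer(nemo_path, out_dir):
    """Pull tokenizer artifacts out of a .nemo checkpoint (a plain tar archive)."""
    extracted = []
    with tarfile.open(nemo_path, "r:*") as tar:
        for member in tar.getmembers():
            # NeMo typically stores these under names like
            # "<hash>_tokenizer.model" / "<hash>_vocab.txt"
            if "tokenizer" in member.name or "vocab" in member.name:
                tar.extract(member, path=out_dir)
                extracted.append(member.name)
    return extracted


# placeholder path -- point this at the downloaded checkpoint
# extract_tokenizer("stt_ar_fastconformer_hybrid_large_pcd_v1.0.nemo", "tokenizer_dir")
```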
Error Message
The full RuntimeError traceback is shown above under Problem Description.
Additional Context
Any guidance on the correct way to fine-tune this specific Arabic model would be greatly appreciated!
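What I plan to try next is pointing the script at the FastConformer hybrid config that ships alongside it and letting NeMo load the weights via init_from_pretrained_model. This is only a sketch: the config path and override names are taken from my reading of the NeMo examples tree, so please correct me if they are wrong:

```shell
python speech_to_text_hybrid_rnnt_ctc_bpe.py \
    --config-path=../conf/fastconformer/hybrid_transducer_ctc \
    --config-name=fastconformer_hybrid_transducer_ctc_bpe \
    +init_from_pretrained_model="nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0" \
    model.tokenizer.dir=tokenizer_dir \
    model.tokenizer.type=bpe \
    model.train_ds.manifest_filepath=train_manifest.json \
    model.validation_ds.manifest_filepath=val_manifest.json
```

Since the tokenizer is the one extracted from the same checkpoint, the decoder/joint vocabulary sizes should line up and the state_dict should load without shape mismatches, assuming the encoder config matches.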