Problem Description
I'm trying to fine-tune the nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0 model using the official NeMo example script speech_to_text_hybrid_rnnt_ctc_bpe.py, but I'm getting a RuntimeError with size mismatches when loading the pretrained model's state_dict.
RuntimeError: Error(s) in loading state_dict for EncDecHybridRNNTCTCBPEModel:
size mismatch for encoder.pre_encode.out.weight: copying a param with shape torch.Size([512, 2560]) from checkpoint, the shape in current model is torch.Size([512, 10240]).
size mismatch for encoder.pre_encode.conv.0.weight: copying a param with shape torch.Size([256, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1, 3, 3]).
size mismatch for encoder.pre_encode.conv.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.pre_encode.conv.2.weight: copying a param with shape torch.Size([256, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
size mismatch for encoder.pre_encode.conv.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
size mismatch for encoder.layers.0.conv.depthwise_conv.weight: copying a param with shape torch.Size([512, 1, 9]) from checkpoint, the shape in current model is torch.Size([512, 1, 31]).
... (multiple similar errors for layers 1-16)
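Comparing the shapes, my guess is that the script picked up a plain Conformer YAML rather than a FastConformer one: the checkpoint has depthwise-separable subsampling with 256 channels and a depthwise kernel of 9, while the model being built uses regular striding subsampling with 512 channels and kernel 31. For reference, a sketch of the encoder settings that would match the checkpoint shapes (these mirror NeMo's stock FastConformer defaults, not values read from this .nemo file):

```yaml
model:
  encoder:
    d_model: 512
    subsampling: dw_striding        # conv.2 weight [256, 1, 3, 3] is a depthwise conv
    subsampling_factor: 8           # 80 mel bins / 8 = 10; 256 ch * 10 = 2560 -> pre_encode.out
    subsampling_conv_channels: 256  # vs. 512 in the model being built
    conv_kernel_size: 9             # vs. 31 (the classic Conformer default)
```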
Environment Information
NeMo Version: 2.5.3
Installation Method: pip install nemo_toolkit['all']
Fine-tuning Script: Cloned from GitHub (NeMo/examples/asr/asr_hybrid_transducer_ctc/speech_to_text_hybrid_rnnt_ctc_bpe.py)
Platform: Kaggle
GPU/Accelerator: GPU (single device)
Command Used
Tokenizer Setup
I extracted the tokenizer files from the pretrained model's .nemo checkpoint.
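A .nemo checkpoint is a plain tar archive, so the tokenizer files can be pulled out with the standard library. Roughly what I ran (the checkpoint filename is a placeholder, and the member-name patterns are assumptions about how NeMo names these files):

```python
import tarfile


def extract_tokenizer(nemo_path, out_dir):
    """Pull tokenizer artifacts out of a .nemo checkpoint (a plain tar archive)."""
    extracted = []
    with tarfile.open(nemo_path, "r:*") as tar:
        for member in tar.getmembers():
            # NeMo typically stores these under names like
            # "<hash>_tokenizer.model" / "<hash>_vocab.txt"
            if "tokenizer" in member.name or "vocab" in member.name:
                tar.extract(member, path=out_dir)
                extracted.append(member.name)
    return extracted


# placeholder path -- point this at the downloaded checkpoint
# extract_tokenizer("stt_ar_fastconformer_hybrid_large_pcd_v1.0.nemo", "tokenizer_dir")
```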
Error Message
The full RuntimeError traceback is shown above under Problem Description.
Additional Context
Any guidance on the correct way to fine-tune this specific Arabic model would be greatly appreciated!
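What I plan to try next is pointing the script at the FastConformer hybrid config that ships alongside it and letting NeMo load the weights via init_from_pretrained_model. This is only a sketch: the config path and override names are taken from my reading of the NeMo examples tree, so please correct me if they are wrong:

```shell
python speech_to_text_hybrid_rnnt_ctc_bpe.py \
    --config-path=../conf/fastconformer/hybrid_transducer_ctc \
    --config-name=fastconformer_hybrid_transducer_ctc_bpe \
    +init_from_pretrained_model="nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0" \
    model.tokenizer.dir=tokenizer_dir \
    model.tokenizer.type=bpe \
    model.train_ds.manifest_filepath=train_manifest.json \
    model.validation_ds.manifest_filepath=val_manifest.json
```

Since the tokenizer is the one extracted from the same checkpoint, the decoder/joint vocabulary sizes should line up and the state_dict should load without shape mismatches, assuming the encoder config matches.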