
Conversation

@zahrayousefijamarani

causal param should be True to generate correct answers during inference.
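For context, a minimal sketch of what the causal flag controls, using PyTorch's scaled_dot_product_attention as a stand-in for the flash attention call (this is not the backend's actual call site):

```python
# Illustrative only: shows the effect of the causal flag, not the project's code.
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 16, 64)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)

# causal off: every query position also attends to future key positions,
# which is wrong for autoregressive decoding.
out_non_causal = F.scaled_dot_product_attention(q, k, v, is_causal=False)

# causal on: position i only attends to positions <= i.
out_causal = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(torch.allclose(out_non_causal, out_causal))  # False: the mask changes the outputs
```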
@matthewygf
Collaborator

@zahrayousefijamarani Thanks for the contribution.

While you are correct that the causal param should be true, the related issue seems to indicate that without the sparse attention impl, causal = true will lead to perf degradation. I am a bit hesitant to turn this on without the sparse attn impl. WDYT?

cc @egretwAlker @YaoJiayi

@egretwAlker
Contributor

Hi! LMCFlashAttnBackend is not called in our implementation; it is left there as a reference to LMCache. We use LMCAttnBackend, which is a PyTorch implementation using the correct causal mask. So in the flash attention impl, turning causal on or off degrades accuracy either way, because it doesn't take the selected token positions into account.
But if you find that using the flash attention causal mask doesn't degrade accuracy, let us know!
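To make the point about selected token positions concrete, here is a rough sketch of a position-aware causal mask (query_positions and key_positions are made-up values; this is not the LMCAttnBackend code):

```python
# Illustrative only: a causal mask built from the original token positions,
# rather than from row/column indices as a plain causal flag would use.
import torch

query_positions = torch.tensor([2, 5, 8])  # hypothetical selected token positions
key_positions = torch.arange(10)           # positions of all cached keys

# A query at position p may only attend to keys at positions <= p.
mask = key_positions.unsqueeze(0) <= query_positions.unsqueeze(1)  # (3, 10) bool

print(mask.long())
# tensor([[1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
#         [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
#         [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]])
```

A plain causal flag only sees the row/column indices (0..2 vs 0..9), not the real positions 2, 5, 8, so it cannot reproduce this mask whichever way it is set.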
