
Fuse moe lora #3801

Open

cjw-d wants to merge 9 commits into PaddlePaddle:develop from cjw-d:fuse_moe_lora

Conversation

@cjw-d
Contributor

@cjw-d cjw-d commented Feb 3, 2026

PR types

Others

PR changes

Others

Description

fuse moe lora

@paddle-bot

paddle-bot bot commented Feb 3, 2026

Thanks for your contribution!

@cjw-d
Contributor Author

cjw-d commented Feb 4, 2026

/re-run all-failed

@codecov-commenter

codecov-commenter commented Feb 4, 2026

Codecov Report

❌ Patch coverage is 59.30233% with 105 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@1d6500a). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| paddleformers/peft/lora/lora_layers.py | 48.99% | 76 Missing ⚠️ |
| paddleformers/peft/lora/lora_model.py | 26.31% | 14 Missing ⚠️ |
| paddleformers/transformers/auto/modeling.py | 23.07% | 10 Missing ⚠️ |
| paddleformers/transformers/glm4_moe/modeling.py | 66.66% | 2 Missing ⚠️ |
| paddleformers/nn/experts.py | 97.22% | 1 Missing ⚠️ |
| paddleformers/transformers/deepseek_v3/modeling.py | 80.00% | 1 Missing ⚠️ |
| paddleformers/transformers/qwen2_moe/modeling.py | 83.33% | 1 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (59.30%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #3801   +/-   ##
==========================================
  Coverage           ?   32.22%           
==========================================
  Files              ?      433           
  Lines              ?    82298           
  Branches           ?        0           
==========================================
  Hits               ?    26524           
  Misses             ?    55774           
  Partials           ?        0           

☔ View full report in Codecov by Sentry.

lora_module = RowParallelQuantizationLoRALinear(module, lora_config)
# LoRA row parallel will split the lora_A matrix
self.add_lora_split_mapping(module_name + ".lora_A", is_column=False)
elif attribute_chain[-1] == "experts":
Collaborator

1. Is this matching rule general enough, or could it also replace modules in other existing models and cause problems?
2. If a model implements its experts in an unusual way, an interface should be kept so a custom LoRA expert can be plugged in.
3. Can it also match paddlefleet's experts?

Contributor Author

The matching rule has been revised, and an interface is kept for adapting custom LoRA experts.
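For illustration only, the snippet below is a minimal sketch of what such an adaptation hook could look like, assuming a registry keyed by expert class. Every name here (`register_lora_expert`, `wrap_experts`, `_CUSTOM_LORA_EXPERT_FACTORIES`) is hypothetical and not the actual PaddleFormers API.

```python
# Hypothetical registry-style hook for custom LoRA experts (sketch, not the real API).
_CUSTOM_LORA_EXPERT_FACTORIES = {}

def register_lora_expert(expert_cls):
    """Associate an expert layer class with a factory that builds its LoRA wrapper."""
    def decorator(factory):
        _CUSTOM_LORA_EXPERT_FACTORIES[expert_cls] = factory
        return factory
    return decorator

def wrap_experts(module, lora_config, default_factory):
    """Use a registered custom factory for this module type, otherwise the default wrapper."""
    factory = _CUSTOM_LORA_EXPERT_FACTORIES.get(type(module), default_factory)
    return factory(module, lora_config)
```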

@@ -1055,8 +1017,7 @@ def get_lora_model(self, model: Union[PretrainedModel, nn.Layer], lora_config: L
return model
if isinstance(lora_config.target_modules, str):
lora_config.target_modules = [lora_config.target_modules]
Collaborator

Related unit tests need to be added.

Contributor Author

Related unit tests have been added.

 lora_config.target_modules = [lora_config.target_modules]
-for i in model.named_sublayers():
-    module_name = i[0]
+for module_name, module in model.named_sublayers():
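As context for this hunk, the sketch below shows how target-module matching over `named_sublayers()` typically works. The helper name and the regex full-match are assumptions for illustration, not a quote of the PaddleFormers implementation.

```python
import re

def iter_target_modules(model, target_modules):
    """Yield (name, sublayer) pairs whose names match a target-module pattern (sketch only)."""
    if isinstance(target_modules, str):
        target_modules = [target_modules]
    for module_name, module in model.named_sublayers():
        if any(re.fullmatch(pattern, module_name) for pattern in target_modules):
            yield module_name, module
```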
Collaborator

LoRA merge support also needs to be considered.

Contributor Author

merge_model has been adapted.
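For readers unfamiliar with LoRA merging, the sketch below shows the standard merge formula `W' = W + A·B·(alpha/r)` with Paddle tensors. The function name and the assumption that `lora_A`/`lora_B` use the `(in_features, r)`/`(r, out_features)` layout are illustrative, not taken from merge_model itself.

```python
import paddle

def merge_lora_weight(weight, lora_A, lora_B, lora_alpha, r):
    """Return weight + lora_A @ lora_B * (lora_alpha / r)."""
    return weight + paddle.matmul(lora_A, lora_B) * (lora_alpha / r)

# Toy usage: a 4x8 linear with rank-2 LoRA adapters.
w = paddle.randn([4, 8])
a = paddle.randn([4, 2])   # lora_A: (in_features, r)
b = paddle.randn([2, 8])   # lora_B: (r, out_features)
merged = merge_lora_weight(w, a, b, lora_alpha=4, r=2)
```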

@@ -1055,8 +1017,7 @@ def get_lora_model(self, model: Union[PretrainedModel, nn.Layer], lora_config: L
return model
if isinstance(lora_config.target_modules, str):
lora_config.target_modules = [lora_config.target_modules]
Collaborator

The get_merge_state_dict function also needs to be adapted.

Contributor Author

Adapted.

@cjw-d
Contributor Author

cjw-d commented Feb 4, 2026

/re-run all-failed

Collaborator

@lugimzzz left a comment

lgtm

import paddle
import paddle.nn as nn

from .activation import ACT2FN
Collaborator

Replace the MoE in the other models with this as well.

from ...nn.attention.interface import ALL_ATTENTION_FUNCTIONS
from ...nn.criterion.interface import CriterionLayer
from ...nn.embedding import Embedding as GeneralEmbedding
from ...nn.experts import MoeExperts as Qwen3VLMoeTextExperts
Collaborator

Verify correctness with an experiment.
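One way to do such a check, purely as a sketch with hypothetical names and assuming both modules take only the hidden states as input, is to feed identical random activations through the fused experts and a reference per-expert implementation and compare outputs:

```python
import paddle

def check_experts_equivalence(fused_experts, reference_experts, hidden_size,
                              num_tokens=16, atol=1e-5):
    """Feed the same random hidden states through both modules and compare outputs."""
    x = paddle.randn([num_tokens, hidden_size])
    out_fused = fused_experts(x)
    out_ref = reference_experts(x)
    assert paddle.allclose(out_fused, out_ref, atol=atol).item(), \
        "fused MoE LoRA output diverges from the reference implementation"
```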

@cjw-d
Contributor Author

cjw-d commented Feb 5, 2026

/re-run all-failed
