Enhance vectorizer config handling in project.py#735
Enhance vectorizer config handling in project.py#735Kaedeser wants to merge 4 commits intoOpenSPG:masterfrom
Conversation
解决issue719问题。原因应该是向量化器配置在校验/导入时没有拿到 type (或拿到的是空配置),于是框架退回去尝试按基类构造,触发 pyhocon 的缺键异常,最终包装成 invalid vectorizer config: 'No configuration setting found for key name' 。我已经做出修改同时兼容 vectorizer 和 vectorize_model 两种写法,并把维度写回两边,避免后续流程再读空:
whfcarter
left a comment
There was a problem hiding this comment.
If both keys exist with different values, the code prioritizes vectorizer but doesn't warn users about config conflicts
knext/command/sub_command/project.py
Outdated
| vectorize_model_config_checker = VectorizeModelConfigChecker() | ||
| llm_config = config.get("chat_llm", {}) | ||
| vectorize_model_config = config.get("vectorizer", {}) | ||
| vectorize_model_config = config.get("vectorizer", {}) or config.get( |
There was a problem hiding this comment.
Empty dict triggers the second part of 'or'
knext/command/sub_command/project.py
Outdated
| "vectorize_model", {} | ||
| ) | ||
| if "vectorizer" not in config and "vectorize_model" in config: | ||
| config["vectorizer"] = config.get("vectorize_model", {}) |
There was a problem hiding this comment.
is unnecessary since the key existence was already checked.
Suggest changing to
config["vectorizer"] = config["vectorize_model"]
Kaedeser
left a comment
There was a problem hiding this comment.
I hav fixed according to the suggestions and resubmit.
whfcarter
left a comment
There was a problem hiding this comment.
整体写的还是有点冗余,可以参考。
统一的配置获取和规范化逻辑
vectorizer_config = config.get("vectorizer", {})
vectorize_model_config = config.get("vectorize_model", {})
选择非空配置,如果都存在则优先使用 vectorizer
if vectorizer_config:
vectorize_model_config = vectorizer_config
elif vectorize_model_config:
vectorizer_config = vectorize_model_config
else:
vectorizer_config = vectorize_model_config = {}
统一写回(如果都为空就不需要写回)
if vectorizer_config or vectorize_model_config:
config["vectorizer"] = vectorizer_config
config["vectorize_model"] = vectorize_model_config
try:
llm_config_checker.check(json.dumps(llm_config))
dim = vectorize_model_config_checker.check(json.dumps(vectorizer_config))
# 只有在配置存在且为字典时才写入维度
if isinstance(config.get("vectorizer"), dict):
config["vectorizer"]["vector_dimensions"] = dim
if isinstance(config.get("vectorize_model"), dict):
config["vectorize_model"]["vector_dimensions"] = dim
解决issue719问题。原因应该是向量化器配置在校验/导入时没有拿到 type (或拿到的是空配置),于是框架退回去尝试按基类构造,触发 pyhocon 的缺键异常,最终包装成 invalid vectorizer config: 'No configuration setting found for key name' 。我已经做出修改同时兼容 vectorizer 和 vectorize_model 两种写法,并把维度写回两边,避免后续流程再读空