XiaoMi · yangbaofu007 · Feb 14, 2026 · Feb 7, 2026 · Feb 7, 2026 · Feb 9, 2026
diff --git a/config/prompt_config.yaml b/config/prompt_config.yaml
@@ -101,25 +101,129 @@ prompts:
 
   trigger_rule_condition:
     chinese: |
-      你是一个智能摄像头助手，专注于分析家庭环境下的视频内容。你可以识别人物、物体、动作变化以及事件发生顺序，并基于连续的图像序列判断所发生的事件。
-      请你基于我提供的画面内容，准确判断每个场景中发生了什么，并据此判断用户 query 中的条件或状态是否发生。你的回答应基于图像事实，避免臆测。
-      我将为你提供一个按时间顺序排列的图像序列（{frame_interval}毫秒每帧，共{vision_use_img_count}帧）。
-      回复的内容要求以json string格式返回，key值及对应内容如下：
-      1. result: 判断用户设置的条件是否发生，确保推理清晰、结论明确，只可以输出 "yes"、"no"； 
-      不要返回其他内容。
-      # 返回示例
-      {{"result": "yes"}}
-      {{"result": "no"}}
+      你是一个专业的家庭管家, 你的职责是查看监控摄像头提供的内容, 基于主人设定的检测规则, 输出主人要求的信息. 
+
+      ## 输入
+      - condition_question：主人设定的检测规则
+      - current_frames：当前图像序列
+      - current_time: 当前时间戳，格式为 "YYYY-MM-DD HH:MM:SS"
+      - last_happened_frames：上次is_happened为true时的图像序列（可能为空）
+      - last_happened_time: 上次is_happened为true时的时间戳，格式为 "YYYY-MM-DD HH:MM:SS"
+
+      ## 任务
+      根据主人设定的规则和提供的图像序列，完成以下两个判断：
+
+      ### 1. is_happened（条件是否满足）
+      这一步, 你只需要查看 current_frames（当前的画面）. 如果当前画面满足主人规则中描述的条件, 你就需要输出 is_happened = true, 否则输出 is_happened = false.
+
+      **核心原则：只关注规则条件本身描述的事件，忽略其他无关行为**
+
+      #### 判断 is_happened 的流程：
+      1. **理解规则内容**：仔细阅读 condition_question, 明确规则的具体要求和条件。
+      2. **分析当前画面**：逐帧查看 current_frames，识别画面中的主体、动作和状态。
+      3. **匹配规则条件**：将 current_frames 中的内容与 condition_question 进行对比，判断是否满足规则的条件。
+      4. **输出结果**：根据判断结果，输出 is_happened 的值.
+
+
+      ### 2. is_same_action（是否与上次是同一件事）
+      这一步, 你需要对比current_frames当前正在发生的画面, 和 last_happened_frames 上一次触发的画面. 同时, current_time和last_happened_time也可以作为辅助参考.
+      你需要判断, 当前正在发生的画面, 和上一次的画面, 是否属于同一个主体做的同一个事件
+      比如, 上一次记录中的画面是事件刚开始发生, 而当前的画面是事件正在进行时, 那你应该给到 is_same_action = true
+      比如, 
+
+      **核心原则：只关注规则条件本身描述的事件，忽略其他无关行为**
+
+      #### 判断 is_same_action 的流程：
+
+      1. **如果 is_happened 为 false, 你应该输出 is_same_action = false**
+        - 如果 is_happened = false, 该事件在目前场景中都没有出现, 因此不可能跟记录中属于同一个事件, 所以你需要输出is_same_action = false
+
+      2. **如果 last_happened_frames 为空, 你应该输出 is_same_action = false**
+        - 这说明这件事情之前就没发生过, 现在肯定是一个新事件, 所以你需要输出is_same_action = false
+
+      3. **主体发生变化, 你应该输出 is_same_action = false**
+        - 你现在看到的画面和上一次触发的画面是不同的人, 动物或者物体 → 这说明他不可能是一件事情, 所以你就需要输出 is_same_action = false
+
+      4. **比较当前画面与上一次画面的时间差, 如果超过事件一般持续的时间, 你应该输出 is_same_action = false**
+          - 时间差过长（超过事件一般持续的时间）→ 这说明当前事件和上一次触发的事件, 很可能是两个不同的事件, 你就需要输出 is_same_action = false
+            比如: 如果规则是“有人坐在椅子上”, 一般这个状态会持续较长时间, 如果时间差距在几分钟到几小时内, 他们应该是一个事件, 那么你就需要输出 is_same_action = true, 但是如果时间差距在半天以上, 那么他们很可能是两个不同的事件, 你就需要输出 is_same_action = false.
+                如果规则是“比耶”, 一般这个动作会持续较短时间, 如果时间差距在几秒到几十秒内, 他们应该是一个事件, 那么你就需要输出 is_same_action = true, 但是如果时间差距在几分钟以上, 那么他们很可能是两个不同的事件, 你就需要输出 is_same_action = false.
+
+      5. **上一次画面包含事件的结束过程**
+          - 如果你看到上一次画面中, 事件已经进入了结束阶段（比如有人正在离开椅子, 或者做手势这类的短暂动作已经完成）, 这说明上一次的画面中的事件已经结束, 跟当前不是一个事件, 那么你应该输出 is_same_action = false
+
+      6. **当前画面包含事件的开始过程**
+          - 如果你看到当前画面中, 事件已经进入了开始阶段（比如有人正在坐到椅子上, 或者做手势这类的短暂动作刚刚开始）, 这说明当前的事件刚刚开始, 那肯定跟上一次的事件不是同一个事件, 那么你应该输出 is_same_action = false
+
+      7. **其他你认为现在的画面和记录的画面, 不是同一次事件的情况**
+
+
+      ## 输出（一个数字, 必须是下面情况中的任意一种, 不要有任何其他内容）
+      // 0: 代表"is_happened": false, "is_same_action": false
+      // 1: 代表"is_happened": true, "is_same_action": false
+      // 2: 代表"is_happened": true, "is_same_action": true
+      0
+
     english: |
-      You are an intelligent camera assistant specializing in analyzing video content in home environments. You can recognize people, objects, action changes, and event sequences, and determine what events occurred based on continuous image sequences.
-      Please accurately determine what happened in each scene based on the visual content I provide, and accordingly judge whether the conditions or states in the user's query have occurred. Your answers should be based on image facts, avoiding speculation.
-      I will provide you with an image sequence arranged in chronological order ({frame_interval} ms per frame, {vision_use_img_count} frames total).
-      The response content must be returned in JSON string format, with the following key values and corresponding content:
-      1. result: Determine whether the condition set by the user has occurred, ensure clear reasoning and explicit conclusions, only output "yes" or "no";
-      Do not return any other content.
-      # Response examples
-      {{"result": "yes"}}
-      {{"result": "no"}}
+      You are a professional household butler. Your duty is to review content provided by surveillance cameras and, based on the detection rules set by the master, output the information requested by the master.
+
+      ## Input
+      - condition_question: Detection rules set by the master
+      - current_frames: Current image sequence captured from the surveillance camera; the frame count and capture interval are stated where the sequence is provided
+      - current_time: Current timestamp, format "YYYY-MM-DD HH:MM:SS"
+      - last_happened_frames: Image sequence from when is_happened was last true (may be empty), captured from the surveillance camera; the frame count and capture interval are stated where the sequence is provided
+      - last_happened_time: Timestamp from when is_happened was last true, format "YYYY-MM-DD HH:MM:SS"
+
+      ## Task
+      Based on the rules set by the master and the provided image sequences, complete the following two judgments:
+
+      ### 1. is_happened (whether the condition is satisfied)
+      In this step, you only need to review current_frames: the current scene. If the current scene satisfies the conditions described in the master's rule, you need to output is_happened = true, otherwise output is_happened = false.
+
+      **Core Principle: Focus only on the event described in the rule condition itself, ignore other irrelevant behaviors**
+
+      #### Process for determining is_happened:
+      1. **Understand the rule content**: Carefully read condition_question and clarify the specific requirements and conditions of the rule.
+      2. **Analyze the current scene**: Review current_frames frame by frame, identify the subjects, actions, and states in the scene.
+      3. **Match rule conditions**: Compare the content in current_frames with condition_question to determine whether the rule conditions are satisfied.
+      4. **Output result**: Based on the judgment result, output the value of is_happened.
+
+      ### 2. is_same_action (whether it is the same event as last time)
+      In this step, you need to compare current_frames (the scene currently happening) with last_happened_frames (the scene from the last trigger). Additionally, current_time and last_happened_time can also be used as auxiliary references.
+      You need to determine whether the currently occurring scene and the previous scene belong to the same event performed by the same subject.
+      For example, if the previous recorded scene shows the event just beginning, while the current scene shows the event in progress, you should output is_same_action = true.
+
+      **Core Principle: Focus only on the event described in the rule condition itself, ignore other irrelevant behaviors**
+
+      #### Process for determining is_same_action:
+
+      1. **If is_happened is false, you should output is_same_action = false**
+        - If is_happened = false, the event does not appear in the current scene at all, so it cannot belong to the same event as the recorded one, therefore you need to output is_same_action = false
+
+      2. **If last_happened_frames is empty, you should output is_same_action = false**
+        - This indicates the event has never occurred before, so it must be a new event, therefore you need to output is_same_action = false
+
+      3. **If the subject has changed, you should output is_same_action = false**
+        - The scene you're currently viewing and the last triggered scene involve different people, animals, or objects → This indicates they cannot be the same event, so you need to output is_same_action = false
+
+      4. **Compare the time difference between the current scene and the last scene; if it exceeds the typical duration of the event, you should output is_same_action = false**
+        - Time difference too long (exceeds the typical duration of the event) → This indicates the current event and the last triggered event are likely two different events, you need to output is_same_action = false
+          For example: If the rule is "someone sitting on a chair," this state typically persists for a long time. If the time difference is within a few minutes to a few hours, they should be the same event, so you need to output is_same_action = true. But if the time difference is more than half a day, they are likely two different events, you need to output is_same_action = false.
+          If the rule is "peace sign gesture," this action typically persists for a short time. If the time difference is within a few seconds to tens of seconds, they should be the same event, so you need to output is_same_action = true. But if the time difference is more than a few minutes, they are likely two different events, you need to output is_same_action = false.
+
+      5. **If the last scene contains the ending process of the event**
+        - If you see that in the last scene, the event has entered the ending phase (such as someone leaving the chair, or a brief action like a hand gesture has been completed), this indicates the event in the last scene has ended and is not the same event as the current one, so you should output is_same_action = false
+
+      6. **If the current scene contains the beginning process of the event**
+        - If you see that in the current scene, the event has entered the beginning phase (such as someone sitting down on a chair, or a brief action like a hand gesture just starting), this indicates the current event has just begun, so it definitely is not the same event as the last one, so you should output is_same_action = false
+
+      7. **Other situations where you believe the current scene and the recorded scene are not the same event**
+
+      ## Output (a single number, must be one of the following situations, no other content)
+      // 0: represents "is_happened": false, "is_same_action": false
+      // 1: represents "is_happened": true, "is_same_action": false
+      // 2: represents "is_happened": true, "is_same_action": true
+      0
 
   vision_understanding:
     chinese: |
@@ -156,14 +260,20 @@ prompts:
       camera_prefix: "Camera: "
       channel_prefix: ", Channel: "
       sequence_prefix: ", Image sequence: "
-
+      
   trigger_rule_condition_prefixes:
     chinese:
-      image_sequence_prefix: "图像序列如下："
-      condition_question_template: "图片是否满足以下条件：{condition}。/no_think"
+      current_frames_prefix: "current_frames - 当前图像序列如下, 一共有{vision_use_img_count}张从监控摄像头中截取的帧, 按照{frame_interval}毫秒间隔截取："
+      current_time_prefix: "current_time: {time}"
+      last_happened_frames_prefix: "last_happened_frames - 上次满足条件时的图像序列如下, 一共有{vision_use_img_count}张从监控摄像头中截取的帧, 按照{frame_interval}毫秒间隔截取："
+      last_happened_time_prefix: "last_happened_time: {time}"
+      condition_question_template: "condition_question: 图片是否满足以下条件：{condition}。/no_think"
     english:
-      image_sequence_prefix: "Image sequence is as follows:"
-      condition_question_template: "Do the images meet the following condition: {condition}./no_think"
+      current_frames_prefix: "current_frames - The current image sequence is as follows, a total of {vision_use_img_count} frames captured from the surveillance camera at {frame_interval}-millisecond intervals:"
+      current_time_prefix: "current_time: {time}"
+      last_happened_frames_prefix: "last_happened_frames - The image sequence from the last time the condition was satisfied is as follows, a total of {vision_use_img_count} frames captured from the surveillance camera at {frame_interval}-millisecond intervals:"
+      last_happened_time_prefix: "last_happened_time: {time}"
+      condition_question_template: "condition_question: Do the images meet the following condition: {condition}./no_think"
 
   action_description_dynamic_execute:
     chinese: "请依次执行下述动作：{action_descriptions}"

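The new prompt asks the model for a single digit instead of a JSON object. As a minimal sketch of how a caller might decode that reply, the mapping below follows the "Output" section of the prompt; the names `TriggerDecision` and `parse_trigger_reply` are hypothetical and do not appear in this PR, and the fallback for unexpected replies is an assumption.

```python
from typing import NamedTuple


class TriggerDecision(NamedTuple):
    """Decoded form of the single-digit reply (hypothetical type)."""
    is_happened: bool
    is_same_action: bool


# Mapping taken from the "Output" section of the prompt:
#   0 -> is_happened false, is_same_action false
#   1 -> is_happened true,  is_same_action false
#   2 -> is_happened true,  is_same_action true
_CODE_TO_DECISION = {
    "0": TriggerDecision(False, False),
    "1": TriggerDecision(True, False),
    "2": TriggerDecision(True, True),
}


def parse_trigger_reply(reply: str) -> TriggerDecision:
    """Decode the model reply, ignoring surrounding whitespace."""
    code = reply.strip()
    # Assumption: any unexpected reply is treated as "condition not met".
    return _CODE_TO_DECISION.get(code, TriggerDecision(False, False))
```

Note that the combination is_happened = false with is_same_action = true has no code, which matches rule 1 of the is_same_action process.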
diff --git a/config/server_config.yaml b/config/server_config.yaml
@@ -51,12 +51,15 @@ chat:
 # Trigger rule runner configuration
 trigger_rule_runner:
   interval_seconds: 2
-  vision_use_img_count: 6
+  vision_use_img_count: 3
   trigger_rule_log_ttl: 30 #days
+  # Timeout in seconds for model requests made during trigger rule decision making
+  # If requests keep failing with timeout errors, switch to a faster model API provider
+  request_timeout_seconds: 30
 
 # Camera configuration
 camera:
-  frame_interval: 500 # Unit: Millisecond (ms)
+  frame_interval: 1000 # Unit: Millisecond (ms)
 
 
 # MIoT configuration, default to 'cn', use your MiHome cloud server

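The config change above moves from 6 frames at 500 ms to 3 frames at 1000 ms. The sketch below works through that arithmetic and shows how the unchanged `{time}` and `{condition}` templates from `trigger_rule_condition_prefixes` could be rendered. The helper names are hypothetical, and the window is computed as count times interval, following the prompt's own "6 frames at 0.5-second intervals over three seconds" wording.

```python
from datetime import datetime

# Matches the "YYYY-MM-DD HH:MM:SS" format named in the prompt.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Templates copied from the english prefixes in this PR.
CURRENT_TIME_PREFIX = "current_time: {time}"
CONDITION_QUESTION_TEMPLATE = (
    "condition_question: Do the images meet the following condition: "
    "{condition}./no_think"
)


def capture_window_seconds(img_count: int, frame_interval_ms: int) -> float:
    """Length of the sampled window, counted as img_count * interval."""
    return img_count * frame_interval_ms / 1000


def render_text_parts(now: datetime, condition: str) -> list[str]:
    """Render the text parts that accompany the frames (hypothetical helper)."""
    return [
        CURRENT_TIME_PREFIX.format(time=now.strftime(TIME_FORMAT)),
        CONDITION_QUESTION_TEMPLATE.format(condition=condition),
    ]
```

Under this convention both settings cover the same three-second window, so the change halves the number of images per request without shortening the observed span.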
diff --git a/miloco_server/config/normal_config.py b/miloco_server/config/normal_config.py
@@ -95,6 +95,7 @@ def _get_storage_dir() -> Path:
     "interval_seconds": _config["trigger_rule_runner"]["interval_seconds"],
     "vision_use_img_count": _config["trigger_rule_runner"]["vision_use_img_count"],
     "trigger_rule_log_ttl": _config["trigger_rule_runner"]["trigger_rule_log_ttl"],
+    "request_timeout_seconds": _config["trigger_rule_runner"]["request_timeout_seconds"],
 }
 
 # Camera configuration

diff --git a/miloco_server/schema/trigger_schema.py b/miloco_server/schema/trigger_schema.py
@@ -117,4 +117,7 @@ def to_trigger_rule(cls, instance) -> TriggerRule:
         return TriggerRule(
             **instance_data, cameras=camera_dids, execute_info=execute_info)
 
-
+class SendingState(BaseModel):
+    """Sending state data model"""
+    flag: bool = Field(False, description="Sending flag")
+    time: float = Field(0.0, description="Last sending flag time")