`docs/en/notes/guide/basicinfo/framework.md` (20 additions, 41 deletions)
@@ -15,31 +15,31 @@ DataFlex is an advanced dynamic training framework built on [LlamaFactory](https
The core design philosophy of DataFlex is: **Data-centric intelligent training scheduling**. Traditional training methods typically use fixed data order and ratios, while DataFlex allows models to dynamically adjust data usage strategies based on their current state during training, achieving more efficient learning. It is designed to seamlessly integrate with LlamaFactory, providing researchers and developers with more flexible and powerful training control capabilities.
+During the data selection process, it is often necessary to perform operations such as embedding, inference, and gradient computation on data samples. DataFlex is designed to provide a unified management framework for embedding, large-model inference, and gradient computation.
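To make the idea concrete, here is a minimal sketch of what such a unified compute interface could look like. The class and method names are illustrative assumptions, not DataFlex's actual API:

```python
# Hypothetical sketch: one manager exposes the three operations that data
# selection strategies typically need (embedding, inference, gradients),
# so each strategy does not re-implement them. Names are illustrative,
# not DataFlex's real classes.
from typing import Protocol, Sequence
import torch

class ComputeManager(Protocol):
    def embed(self, samples: Sequence[str]) -> torch.Tensor:
        """Return one embedding vector per sample, shape (N, d)."""
        ...

    def infer(self, samples: Sequence[str]) -> Sequence[str]:
        """Run large-model inference and return one generation per sample."""
        ...

    def per_sample_grads(self, samples: Sequence[str]) -> torch.Tensor:
        """Return (possibly projected) per-sample gradients, shape (N, p)."""
        ...
```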
@@ -60,24 +60,3 @@ DataFlex provides three core trainers that can seamlessly integrate into LlamaFa
-**Select Trainer (Dynamic Selection Trainer)**: During training, dynamically selects a subset of samples from the dataset based on predefined strategies (Selector) for subsequent training, e.g., prioritizing "difficult" samples that the model finds challenging.
-**Mix Trainer (Dynamic Ratio Trainer)**: Supports dynamic adjustment of mixing ratios for data from different sources or domains during training.
-**Weight Trainer (Dynamic Weighting Trainer)**: Supports dynamic adjustment of sample weights during backpropagation, increasing learning intensity for model-preferred data.
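As an illustration of the Select Trainer's idea (a sketch in plain PyTorch, not DataFlex's actual code), a selector that prioritizes "difficult" samples can rank a batch by per-sample loss:

```python
# Illustrative sketch: pick the k samples the model currently finds hardest,
# measured by per-sample cross-entropy loss. Not DataFlex's actual API.
import torch
import torch.nn.functional as F

def select_hardest(model, inputs, labels, k):
    """Return the indices of the k highest-loss samples in the batch."""
    with torch.no_grad():
        logits = model(inputs)                                      # (N, C)
        losses = F.cross_entropy(logits, labels, reduction="none")  # (N,)
    return torch.topk(losses, k).indices                            # hardest k
```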
-## Usage Example
-The training command is very similar to LlamaFactory's. Below is an example using LESS (see the paper for details: [https://arxiv.org/abs/2402.04333](https://arxiv.org/abs/2402.04333)):
-**Note**: Unlike standard LlamaFactory, your `.yaml` configuration file must include DataFlex-specific parameters in addition to LlamaFactory's standard training parameters.
-## Integration with LlamaFactory
-DataFlex is fully compatible with LlamaFactory's configuration and usage:
-1. **Configuration Compatibility**: Add DataFlex parameters on top of LlamaFactory's configuration
-2. **Consistent Commands**: Use `dataflex-cli` instead of `llamafactory-cli`
-3. **Feature Preservation**: Supports all original LlamaFactory functionality
-4. **Seamless Switching**: Fall back to the original training mode with `train_type: static`
-This design ensures users can progressively adopt DataFlex functionality without major modifications to existing workflows.
`docs/en/notes/guide/basicinfo/install.md` (21 additions, 0 deletions)
@@ -14,3 +14,24 @@ cd DataFlex
pip install -e .
pip install llamafactory
```
+## Usage Example
+The training command is very similar to LlamaFactory's. Below is an example using LESS (see the paper for details: [https://arxiv.org/abs/2402.04333](https://arxiv.org/abs/2402.04333)):
+**Note**: Unlike standard LlamaFactory, your `.yaml` configuration file must include DataFlex-specific parameters in addition to LlamaFactory's standard training parameters.
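The concrete LESS example is elided in this diff. As a rough sketch of what such a run might look like: the LlamaFactory fields below are standard, but the DataFlex-specific keys (`train_type: dynamic_select`, `selector: less`) are illustrative assumptions; only `dataflex-cli` and `train_type: static` are confirmed elsewhere on this page.

```yaml
### standard LlamaFactory training fields
model_name_or_path: meta-llama/Llama-2-7b-hf
stage: sft
do_train: true
dataset: alpaca_en_demo
template: llama2
output_dir: saves/llama2-7b/less
### DataFlex-specific fields (key names are assumptions, not confirmed)
train_type: dynamic_select   # hypothetical dynamic-selection mode
selector: less               # hypothetical name for the LESS strategy
```

The run would then be launched with the same pattern as LlamaFactory, assuming `dataflex-cli` mirrors `llamafactory-cli`'s `train` subcommand:

```bash
dataflex-cli train less_sft.yaml
```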
+## Integration with LlamaFactory
+DataFlex is fully compatible with LlamaFactory's configuration and usage:
+1. **Configuration Compatibility**: Add DataFlex parameters on top of LlamaFactory's configuration
+2. **Consistent Commands**: Use `dataflex-cli` instead of `llamafactory-cli`
+3. **Feature Preservation**: Supports all original LlamaFactory functionality
+4. **Seamless Switching**: Fall back to the original training mode with `train_type: static`
+This design ensures users can progressively adopt DataFlex functionality without major modifications to existing workflows.
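For point 4 above, reverting to vanilla LlamaFactory behavior should amount to a one-line configuration change (a sketch; `train_type: static` is the only value confirmed on this page):

```yaml
train_type: static   # disable dynamic scheduling; train as plain LlamaFactory
```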
-In recent years, the development of large models has largely depended on large-scale, high-quality training data. First, the preparation of high-quality datasets is crucial, a process completed by our other project [DataFlow](https://github.com/OpenDCAI/DataFlow/tree/main). Building on this foundation, the interaction between data and models during the training phase is equally critical, such as: data selection, ratio adjustment, and weight allocation for different data during training. Although academia has proposed several data selection strategies based on influence and other methods, there has always been a lack of a unified, easy-to-use, and extensible training framework.
+In recent years, the development of large models has largely depended on large-scale, high-quality training data. First, the preparation of high-quality datasets is crucial, a process handled by our other project [DataFlow](https://github.com/OpenDCAI/DataFlow/tree/main). Building on this foundation, the interaction between data and models during training is equally important, including data selection, mixing, and weighting throughout the training process. Although several influence-based methods have been proposed in academia (such as those based on the distributional distance between training and test data, as well as strategies like TracIn, Influence Function, and PMP), the field still lacks a unified, user-friendly, and extensible training framework.
To address this problem, we built [DataFlex](https://github.com/OpenDCAI/DataFlex/tree/main) based on [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), a data-centric system focused on optimizing data-model interactions during training, combining both **ease of use** and **training effectiveness**.