42 changes: 42 additions & 0 deletions docs/docs.cn/Lazy.md
@@ -274,6 +274,48 @@ void func(int x, Executor *e) {

In summary, when we need to assign a scheduler for a task chain composed of multiple Lazy tasks, we only need to specify it at the head of the chain.

## Memory Allocation

### User-Defined Allocator

async_simple allows users to specify a memory allocator for each Lazy function. The interface is as follows: the first parameter of the Lazy function is `std::allocator_arg_t`, and the second parameter is an allocator that provides `void *allocate(unsigned)` and `void deallocate(void*, unsigned)` member functions, for example `std::pmr::polymorphic_allocator<>`.

For concrete usage, see `demo_example/pmr_lazy.cpp`.
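
As a minimal sketch of the interface described above (the function name `compute` and the buffer handling are illustrative placeholders, not taken from `demo_example/pmr_lazy.cpp`):

```
#include <memory>
#include <memory_resource>

#include "async_simple/coro/Lazy.h"
#include "async_simple/coro/SyncAwait.h"

using namespace async_simple::coro;

// The leading std::allocator_arg_t / allocator pair tells Lazy which allocator
// to use for the coroutine frame; the remaining parameters are ordinary arguments.
Lazy<int> compute(std::allocator_arg_t, std::pmr::polymorphic_allocator<> /*alloc*/,
                  int x) {
    co_return x * 2;
}

int main() {
    // Back the polymorphic allocator with a monotonic buffer so the coroutine
    // frame is carved out of this local buffer instead of the global heap.
    char buffer[4096];
    std::pmr::monotonic_buffer_resource resource(buffer, sizeof(buffer));
    std::pmr::polymorphic_allocator<> alloc(&resource);

    int v = syncAwait(compute(std::allocator_arg, alloc, 21));
    return v == 42 ? 0 : 1;
}
```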

### Compiler-Merged Memory Allocation

async_simple supports clang's `[[clang::coro_await_elidable]]` attribute. As long as async_simple is built with a compiler that supports `[[clang::coro_await_elidable]]`, the memory needed by a Lazy invoked directly after `co_await` is automatically folded into the current coroutine's frame. For example:

```
Lazy<int> foo() { ... }
Lazy<int> bar() {
auto f = co_await foo();
...
}
```

In this example, when the `bar()` coroutine calls `foo()`, no separate allocation is triggered for `foo()`'s coroutine frame; instead, `bar()` allocates a larger frame of its own and hands part of it to `foo()`. The lifetime of `bar()`'s own frame is in turn managed by `bar()`'s caller: if that caller also invokes `bar()` directly after `co_await`, `bar()`'s frame is likewise not allocated separately but reuses part of the caller's frame. This process is recursive.
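
To make the recursion concrete, here is an illustrative chain (hypothetical functions, not part of the library):

```
Lazy<int> leaf()   { co_return 1; }
// leaf()'s frame is placed inside middle()'s frame.
Lazy<int> middle() { co_return co_await leaf() + 1; }
// middle()'s (and hence leaf()'s) frame is placed inside root()'s frame.
Lazy<int> root()   { co_return co_await middle() + 1; }
// If root() itself is invoked directly after co_await, its frame is in turn
// carved out of its caller's frame, and so on up the chain.
```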

Note that this strategy is not always beneficial. Consider the following case:

```
Lazy<int> foo() { ... }
Lazy<int> bar(bool cond) {
if (cond) {
co_await foo();
...
}
...
}
```

Here, once the `[[clang::coro_await_elidable]]` optimization is enabled, `bar()`'s coroutine frame is always enlarged to contain `foo()`'s frame. However, if `cond` is always `false` at runtime, this is necessarily a pessimization.

To mitigate this, our internal compiler applies a smarter heuristic: it uses hot/cold information about the surrounding context to decide whether to transform a given call site, avoiding this kind of pessimization.

# LazyLocals

LazyLocals is similar to `thread_local` in a threaded environment. Users can define their own LazyLocals by deriving from LazyLocals and implementing the static function `T::classof(const LazyLocalBase*)`.
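
A minimal sketch of such a type is shown below. It assumes that `LazyLocalBase` records a type-tag pointer passed to its constructor and exposes it via a `getTypeTag()`-style accessor; these names are assumptions and should be checked against the actual header.

```
// Assumed header location; verify against the async_simple source tree.
// #include "async_simple/coro/LazyLocalBase.h"
#include <string>

struct MyLocal : public LazyLocalBase {
    inline static char tag;  // unique address used as the type tag
    explicit MyLocal(std::string v) : LazyLocalBase(&tag), value(std::move(v)) {}
    // classof enables checked down-casting from LazyLocalBase to MyLocal.
    static bool classof(const LazyLocalBase* base) {
        return base->getTypeTag() == &tag;  // getTypeTag() is an assumed accessor
    }
    std::string value;
};
```
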
42 changes: 42 additions & 0 deletions docs/docs.en/Lazy.md
@@ -272,6 +272,48 @@ In the above example, `task1...task4` represents a task chain consisting of Lazy tasks.

So we can simply assign the executor at the root of the task chain.

## Memory Allocation

### User-Defined Allocator

async_simple supports user-defined memory allocators for each Lazy function. The interface requires the first parameter of the Lazy function to be `std::allocator_arg_t`, and the second parameter to be an allocator that provides `void *allocate(unsigned)` and `void deallocate(void*, unsigned)` member functions, for example `std::pmr::polymorphic_allocator<>`.

For specific usage, please refer to `demo_example/pmr_lazy.cpp`.
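
Below is a minimal sketch of this interface (the function name `compute` and the buffer handling are illustrative placeholders, not taken from `demo_example/pmr_lazy.cpp`):

```
#include <memory>
#include <memory_resource>

#include "async_simple/coro/Lazy.h"
#include "async_simple/coro/SyncAwait.h"

using namespace async_simple::coro;

// The leading std::allocator_arg_t / allocator pair tells Lazy which allocator
// to use for the coroutine frame; the remaining parameters are ordinary arguments.
Lazy<int> compute(std::allocator_arg_t, std::pmr::polymorphic_allocator<> /*alloc*/,
                  int x) {
    co_return x * 2;
}

int main() {
    // Back the polymorphic allocator with a monotonic buffer so the coroutine
    // frame is carved out of this local buffer instead of the global heap.
    char buffer[4096];
    std::pmr::monotonic_buffer_resource resource(buffer, sizeof(buffer));
    std::pmr::polymorphic_allocator<> alloc(&resource);

    int v = syncAwait(compute(std::allocator_arg, alloc, 21));
    return v == 42 ? 0 : 1;
}
```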

### Compiler-Integrated Memory Allocation

async_simple supports clang's `[[clang::coro_await_elidable]]` attribute. Simply compile async_simple with a compiler that supports `[[clang::coro_await_elidable]]`, and the memory required for a Lazy invoked directly after `co_await` will be automatically merged into the current coroutine's frame. For example:

```
Lazy<int> foo() { ... }
Lazy<int> bar() {
auto f = co_await foo();
...
}
```

In this example, when the `bar()` coroutine calls `foo()`, no separate allocation is triggered for `foo()`'s coroutine frame. Instead, `bar()` allocates a larger frame of its own and hands a portion of it to `foo()`. The lifetime of `bar()`'s own frame is in turn managed by `bar()`'s caller: if that caller also invokes `bar()` directly after `co_await`, `bar()`'s frame is likewise not allocated separately but reuses a portion of the caller's frame. This process is recursive.
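
To make the recursion concrete, here is an illustrative chain (hypothetical functions, not part of the library):

```
Lazy<int> leaf()   { co_return 1; }
// leaf()'s frame is placed inside middle()'s frame.
Lazy<int> middle() { co_return co_await leaf() + 1; }
// middle()'s (and hence leaf()'s) frame is placed inside root()'s frame.
Lazy<int> root()   { co_return co_await middle() + 1; }
// If root() itself is invoked directly after co_await, its frame is in turn
// carved out of its caller's frame, and so on up the chain.
```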

Note that this strategy may not always be beneficial. Consider the following scenario:

```
Lazy<int> foo() { ... }
Lazy<int> bar(bool cond) {
if (cond) {
co_await foo();
...
}
...
}
```

In this case, once the `[[clang::coro_await_elidable]]` optimization is enabled, `bar()`'s coroutine frame is always enlarged to include `foo()`'s coroutine frame. However, if `cond` is always `false` at runtime, this is necessarily a pessimization.

To mitigate this, we have implemented a smarter optimization in our internal compiler: it uses hot/cold information about the surrounding context to decide whether to transform a given call site, avoiding this kind of pessimization.

# LazyLocals

LazyLocals is similar to `thread_local` in a threaded environment. Users can define their own LazyLocals by deriving from LazyLocals and implementing the static function `T::classof(const LazyLocalBase*)`.
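
A minimal sketch of such a type is shown below. It assumes that `LazyLocalBase` records a type-tag pointer passed to its constructor and exposes it via a `getTypeTag()`-style accessor; these names are assumptions and should be checked against the actual header.

```
// Assumed header location; verify against the async_simple source tree.
// #include "async_simple/coro/LazyLocalBase.h"
#include <string>

struct MyLocal : public LazyLocalBase {
    inline static char tag;  // unique address used as the type tag
    explicit MyLocal(std::string v) : LazyLocalBase(&tag), value(std::move(v)) {}
    // classof enables checked down-casting from LazyLocalBase to MyLocal.
    static bool classof(const LazyLocalBase* base) {
        return base->getTypeTag() == &tag;  // getTypeTag() is an assumed accessor
    }
    std::string value;
};
```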