Current TaskEstimator API challenges #296

@gabotechs

The TaskEstimator API was introduced to allow finer control over task assignment than static config numbers alone. The approach this API follows is:

  • Nodes can decide how many tasks are appropriate for them
  • The number of tasks appropriate for each individual node is calculated bottom-up, taking into account how many tasks each node decided on
  • Upon reaching a network boundary, all the nodes in each stage reach a consensus on what the appropriate final task count is
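
The three steps above can be sketched roughly as follows. This is only an illustration: `PlanNode`, `consensus` and `stage_task_counts` are hypothetical names, not the real TaskEstimator API, and the consensus rule (take the largest requested count, default to 1) is an assumption made to keep the example concrete.

```rust
/// Hypothetical plan node used only for this sketch (not a type from the project).
pub struct PlanNode {
    pub name: &'static str,
    /// Task count this node considers appropriate, or None if it has no opinion.
    pub desired_tasks: Option<usize>,
    /// True if this node is a network boundary, i.e. its children form a separate stage.
    pub is_network_boundary: bool,
    pub children: Vec<PlanNode>,
}

/// ASSUMPTION: the stage-wide consensus is the largest requested count,
/// falling back to 1 when no node in the stage expressed a preference.
pub fn consensus(desires: &[usize]) -> usize {
    desires.iter().copied().max().unwrap_or(1)
}

/// Walks the plan bottom-up. Preferences of nodes in the current stage are
/// appended to `stage`; every stage found below a network boundary is
/// resolved and recorded in `resolved` under the boundary's name.
fn collect(
    node: &PlanNode,
    stage: &mut Vec<usize>,
    resolved: &mut Vec<(&'static str, usize)>,
) {
    if node.is_network_boundary {
        // The children of a boundary belong to their own stage.
        let mut below = Vec::new();
        for child in &node.children {
            collect(child, &mut below, resolved);
        }
        resolved.push((node.name, consensus(&below)));
        return;
    }
    // Children first, so preferences are gathered from the bottom up.
    for child in &node.children {
        collect(child, stage, resolved);
    }
    if let Some(n) = node.desired_tasks {
        stage.push(n);
    }
}

/// Returns one (label, task count) pair per stage: stages below a boundary
/// are labelled with the boundary's name, and the top stage comes last as "root".
pub fn stage_task_counts(root: &PlanNode) -> Vec<(&'static str, usize)> {
    let mut resolved = Vec::new();
    let mut top = Vec::new();
    collect(root, &mut top, &mut resolved);
    resolved.push(("root", consensus(&top)));
    resolved
}
```

For a plan shaped like projection → boundary "shuffle" → aggregate (wants 2 tasks) → scan (wants 4 tasks), this sketch resolves the stage below the boundary to 4 tasks and the top stage to 1.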

The current API has a big limitation:

The current TaskEstimator API lets nodes decide how many tasks are appropriate for them based on how much data they are going to return, but it does not take into account how much compute will happen during the query.

For example:

Plan that currently gets distributed but shouldn't

                           ┌───────────────────┐                           
                           │  ProjectionExec   │                           
                           └─────────▲─────────┘                           
                                     │                                     
                           ┌─────────┴─────────┐                           
                           │     UnionExec     │                           
                           └─────▲─▲───▲─▲─────┘                           
         ┌──────────────────┬────┴─┘   └─┴─────┬──────────────────┐        
         │                  │                  │                  │        
┌────────────────┐ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐
│   DataSource   │ │   DataSource   │ │   DataSource   │ │   DataSource   │
└────────────────┘ └────────────────┘ └────────────────┘ └────────────────┘

This plan can pull up a lot of data, but it is computationally very inexpensive, as it's almost a pure IO query. It should not be distributed; running it on a single node is fine.

Another example:

Plan that currently does not get distributed but should

      ┌───────────────────┐      
      │  ProjectionExec   │      
      └─────────▲─────────┘      
                │                
┌───────────────┴───────────────┐
│ ExtremelyExpensiveComputeExec │
└───────────────▲───────────────┘
                │                
   ┌────────────┴───────────┐    
   │    SmallDataSource     │    
   └────────────────────────┘    

The current TaskEstimator API might decide that, since little data flows through SmallDataSource, the plan does not need to be distributed. That ignores the fact that ExtremelyExpensiveComputeExec might be doing something computationally very expensive that would be worth distributing.


Right now the number of tasks is determined by looking at the data size alone. It should instead take into account the product "data size" * "compute", so that pure IO plans are not distributed unnecessarily and compute-heavy plans are not left undistributed by mistake.
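
As a rough illustration of the "data size" * "compute" idea, here is a minimal sketch. Everything in it is hypothetical: `NodeEstimate`, `compute_per_byte`, `work_per_task` and the weights used below are invented for the example, and how a real node would report its compute cost is exactly the open question.

```rust
/// Hypothetical per-node estimate (not part of the current TaskEstimator API).
pub struct NodeEstimate {
    pub name: &'static str,
    /// Bytes the node is expected to process.
    pub bytes: u64,
    /// ASSUMPTION: relative compute cost per byte, where 0 means the node
    /// only forwards data and 1 means a cheap pass over it.
    pub compute_per_byte: u64,
}

/// Derives a task count from the total work in the plan, where the work of
/// a node is `bytes * compute_per_byte`. The result is clamped to
/// `1..=max_tasks`.
pub fn task_count(nodes: &[NodeEstimate], work_per_task: u64, max_tasks: usize) -> usize {
    // Saturating arithmetic so a huge estimate cannot overflow.
    let work = nodes
        .iter()
        .map(|n| n.bytes.saturating_mul(n.compute_per_byte))
        .fold(0u64, u64::saturating_add);

    // Ceiling division, guarding against a zero budget per task.
    let per_task = work_per_task.max(1);
    let tasks = work / per_task + u64::from(work % per_task != 0);

    usize::try_from(tasks)
        .unwrap_or(usize::MAX)
        .clamp(1, max_tasks.max(1))
}
```

With made-up weights, the two plans above come out as intended. Four 1 GiB DataSource nodes at weight 1, a UnionExec at weight 0 and a ProjectionExec at weight 1 over 4 GiB add up to 8 GiB of work, which stays at 1 task under a 16 GiB per-task budget. A 64 MiB SmallDataSource at weight 1, an ExtremelyExpensiveComputeExec at weight 1000 and a ProjectionExec at weight 1 add up to 64128 MiB of work, which yields 4 tasks under the same budget.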
