Dear author, I noticed that kernels launched on the CUDA default stream are submitted directly, instead of being buffered in the XQueue like kernels from other streams.
Could you please explain the rationale behind this special handling of the default stream? Is there a technical or design reason why preemption is not supported for the CUDA default stream in XSched?