[Performance] Performance regression in Cast operator for float32 to double conversion between v1.18.0 and v1.19.0

### Describe the issue

## Description

We observed a performance regression in the **Cast** operator when converting **float32 to double (float64)** between ONNXRuntime v1.18.0 and v1.19.0.

## Affected Operator

### Cast
- **Opset Version**: 21
- **Source Type**: float32
- **Target Type**: double (float64)
- **Attributes**: to=11 (DOUBLE), saturate=1
- **Regression**: **+10.4% kernel slowdown**

## Test Case Details

### Test Case: `cast_cast_21_cast_float32_to_double`

**Inputs:**
- **input** tensor:
  - Data type: **float32** (type=1)
  - Shape: [4, 64, 100] (25,600 elements)

**Attributes:**
- **to**: 11 (DOUBLE)
- **saturate**: 1

**Output:**
- Data type: double (float64)
- Shape: [4, 64, 100]

**Performance:**
- v1.18.0: 0.0055 ms (kernel time)
- v1.19.0: 0.0060 ms (kernel time)
- **Kernel regression: +10.4% slowdown**
- **Confirmation: regression reproduced in 5 of 10 validation runs**
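As a sanity check on the headline number, the rounded kernel times give (0.0060 - 0.0055) / 0.0055 ≈ 9.1%, not 10.4%. Assuming the times above are rounded to four decimal places and the percentage was computed from the unrounded measurements (an assumption, since the raw timings are not shown here), the sketch below computes the range of slowdowns that the rounded values allow. The function names are illustrative only.

```python
def slowdown_pct(baseline_ms: float, candidate_ms: float) -> float:
    """Relative change of candidate vs. baseline in percent (positive = slower)."""
    return (candidate_ms / baseline_ms - 1.0) * 100.0


def slowdown_range(baseline_ms: float, candidate_ms: float, half_step: float = 0.00005):
    """Smallest and largest slowdown consistent with times rounded to 4 decimal places.

    Each displayed time may be off by up to half a rounding step (0.00005 ms).
    """
    lowest = slowdown_pct(baseline_ms + half_step, candidate_ms - half_step)
    highest = slowdown_pct(baseline_ms - half_step, candidate_ms + half_step)
    return lowest, highest


# Reported kernel times: v1.18.0 = 0.0055 ms, v1.19.0 = 0.0060 ms
print(f"from rounded times: {slowdown_pct(0.0055, 0.0060):.1f}%")
low, high = slowdown_range(0.0055, 0.0060)
print(f"consistent range: {low:.1f}% to {high:.1f}%")
```

This yields roughly 7.2% to 11.0%, so the reported +10.4% is consistent with the rounded times.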

## Regression Characteristics

### Affected Configuration
- **Source type**: float32
- **Target type**: double (float64)
- **Tensor size**: Medium (25,600 elements)
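For a sense of scale, each call reads 25,600 float32 values (102,400 bytes) and writes 25,600 doubles (204,800 bytes). The sketch below turns the reported kernel times into an effective throughput, assuming the kernel time covers only this input read and output write (it ignores caching effects and any extra copies). The helper name is hypothetical.

```python
# Shape from the test case: [4, 64, 100]
ELEMENTS = 4 * 64 * 100          # 25,600 elements
BYTES_READ = ELEMENTS * 4        # float32 input: 4 bytes per element
BYTES_WRITTEN = ELEMENTS * 8     # double output: 8 bytes per element
BYTES_MOVED = BYTES_READ + BYTES_WRITTEN


def effective_gb_per_s(kernel_ms: float) -> float:
    """Bytes moved per second in decimal GB/s (1 GB = 1e9 bytes)."""
    return BYTES_MOVED / (kernel_ms * 1e-3) / 1e9


for version, ms in (("1.18.0", 0.0055), ("1.19.0", 0.0060)):
    print(f"{version}: {effective_gb_per_s(ms):.1f} GB/s")
```

With the reported times this gives about 55.9 GB/s (v1.18.0) and 51.2 GB/s (v1.19.0); the absolute difference is roughly 0.5 µs per call.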

### Key Characteristics
- **Type conversion specific**: float32 → double
- **Opset version**: 21
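- **Saturate attribute**: saturate=1 (the default; per the ONNX spec it only affects conversions to float8 types, so it is not expected to matter for float32 → double)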
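Every float32 value, including ±inf, NaN, and subnormals, is exactly representable as a double, so this cast never produces out-of-range values and saturation has nothing to clamp. A quick NumPy check of that exactness property (NumPy's `astype` is used here as a stand-in for the ONNX Cast semantics; it is not ORT's kernel):

```python
import numpy as np

rng = np.random.default_rng(0)
f32 = np.finfo(np.float32)

# Edge cases plus a random tensor with the test case's shape, all float32.
edge_cases = np.array(
    [0.0, -0.0, f32.max, -f32.max, f32.tiny,
     np.nextafter(np.float32(0), np.float32(1)),  # smallest positive subnormal
     np.inf, -np.inf, np.nan],
    dtype=np.float32,
)
random_tensor = rng.standard_normal((4, 64, 100), dtype=np.float32)

for name, x in (("edge cases", edge_cases), ("random [4, 64, 100]", random_tensor)):
    widened = x.astype(np.float64)          # float32 -> double
    roundtrip = widened.astype(np.float32)  # and back again
    exact = np.array_equal(roundtrip, x, equal_nan=True)
    print(f"{name}: exact round trip = {exact}")
```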


### To reproduce

```bash
python script_profiling.py cast_cast_21_cast_float32_to_double 1.18.0 1.19.0
```

[Archive.zip](https://github.com/user-attachments/files/24907619/Archive.zip)

### Urgency

_No response_

### Platform

Linux

### OS Version

Ubuntu 24.04.3 LTS

### ONNX Runtime Installation

Released Package

### ONNX Runtime Version or Commit ID

1.19.0

### ONNX Runtime API

Python

### Architecture

X64

### Execution Provider

Default CPU

### Execution Provider Library Version

_No response_

### Model File

_No response_

### Is this a quantized model?

Yes
