[OpenCL] Set half-precision Div and Sqrt accuracy #179621

wenju-he · 2026-02-04T07:35:20Z

The OpenCL spec relaxed half-precision divide to 1 ULP and sqrt to 1.5 ULP in KhronosGroup/OpenCL-Docs#1293 and KhronosGroup/OpenCL-Docs#1386.
This can allow targets to use a hardware rcp (reciprocal) instruction for half precision.
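
For readers less familiar with ULP bounds, here is a minimal sketch of how an error could be measured in half-precision ULPs. It is not part of the patch. The helper names `halfUlp` and `halfUlpError` are hypothetical, and the ULP definition used is a common simplified one (spacing of binary16 values at the magnitude of the exact result), which may differ in edge cases from the definition in the OpenCL spec.

```cpp
#include <algorithm>
#include <cmath>

// Size of one ULP at magnitude |x| for IEEE 754 binary16, which has an
// 11-bit significand and a smallest subnormal of 2^-24.
// Simplified: assumes x is finite and within the half-precision range.
double halfUlp(double x) {
  if (x == 0.0)
    return std::ldexp(1.0, -24);
  int exp = 0;
  std::frexp(std::fabs(x), &exp); // |x| = m * 2^exp with m in [0.5, 1)
  // Normal values: ULP = 2^(exp - 11). Subnormals are clamped to 2^-24.
  return std::ldexp(1.0, std::max(exp - 11, -24));
}

// Error of `approx` against the exact result, in half-precision ULPs.
double halfUlpError(double approx, double exact) {
  return std::fabs(approx - exact) / halfUlp(exact);
}
```

Under this definition, a division result would meet the relaxed bound when `halfUlpError(result, exact) <= 1.0`, and a sqrt result when the value is at most 1.5.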

Copilot

Pull request overview

This PR updates the OpenCL code generation to reflect the relaxed accuracy requirements for half-precision floating-point operations introduced in OpenCL 3.0. The changes set half-precision division to 1 ULP (Unit in the Last Place) and sqrt to 1.5 ULP, aligning with the updated OpenCL specification.

Changes:

- Modified the SetSqrtFPAccuracy and SetDivFPAccuracy functions to support half-precision types and apply the appropriate accuracy values (1.5 ULP for sqrt, 1 ULP for division)
- Updated test expectations to verify that half-precision operations now have fpmath metadata attached
- Added new test cases for half-precision division and sqrt operations

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File	Description
clang/lib/CodeGen/CGExpr.cpp	Implements half-precision support in sqrt and division accuracy functions with type-specific ULP values
clang/test/CodeGenOpenCL/sqrt-fpmath.cl	Updates test expectations to verify fpmath metadata is now applied to half-precision sqrt calls
clang/test/CodeGenOpenCL/fpmath.cl	Adds comprehensive test coverage for half-precision division and sqrt operations with fpmath metadata verification
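
The selection logic described above can be modeled without LLVM as follows. This is a simplified sketch, not the Clang code: `ScalarKind`, `Options`, `relaxedAccuracyAllowed`, `sqrtAccuracy`, and `divAccuracy` are hypothetical stand-ins for the `llvm::Type` checks and the language/codegen options that `SetSqrtFPAccuracy` and `SetDivFPAccuracy` consult in the diff. An empty result means no fpmath metadata would be attached.

```cpp
#include <optional>

// Hypothetical stand-in for the scalar element type of the LLVM value.
enum class ScalarKind { Half, Float, Double };

// Hypothetical stand-in for the options checked in the patch.
struct Options {
  bool OpenCL = false;                 // getLangOpts().OpenCL
  bool HIPDevice = false;              // HIP && CUDAIsDevice
  bool OpenCLCorrectlyRounded = false; // OpenCLCorrectlyRoundedDivSqrt
  bool HIPCorrectlyRounded = false;    // HIPCorrectlyRoundedDivSqrt
};

// Mirrors the condition guarding SetFPAccuracy in both functions.
static bool relaxedAccuracyAllowed(const Options &O) {
  return (O.OpenCL && !O.OpenCLCorrectlyRounded) ||
         (O.HIPDevice && !O.HIPCorrectlyRounded);
}

// Accuracy in ULP for sqrt, or nullopt when no metadata is attached.
std::optional<float> sqrtAccuracy(ScalarKind K, const Options &O) {
  if (K != ScalarKind::Float && K != ScalarKind::Half)
    return std::nullopt;
  if (!relaxedAccuracyAllowed(O))
    return std::nullopt;
  return K == ScalarKind::Float ? 3.0f : 1.5f;
}

// Accuracy in ULP for division, or nullopt when no metadata is attached.
std::optional<float> divAccuracy(ScalarKind K, const Options &O) {
  if (K != ScalarKind::Float && K != ScalarKind::Half)
    return std::nullopt;
  if (!relaxedAccuracyAllowed(O))
    return std::nullopt;
  return K == ScalarKind::Float ? 2.5f : 1.0f;
}
```

Two things fall out of this model and match the diff: the half-precision values share the same guard as single precision, so they also apply to HIP device code, and the correctly-rounded option suppresses the metadata for half as well as float (the DIVOPT checks in fpmath.cl expect no fpmath on the half operations).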

llvmbot · 2026-02-04T07:35:53Z

@llvm/pr-subscribers-clang

Author: Wenju He (wenju-he)

Changes

OpenCL spec relaxed half-precision divide to 1 ULP and sqrt to 1.5 ULP in KhronosGroup/OpenCL-Docs#1293 KhronosGroup/OpenCL-Docs#1386

Full diff: https://github.com/llvm/llvm-project/pull/179621.diff

3 Files Affected:

(modified) clang/lib/CodeGen/CGExpr.cpp (+8-6)
(modified) clang/test/CodeGenOpenCL/fpmath.cl (+41-3)
(modified) clang/test/CodeGenOpenCL/sqrt-fpmath.cl (+13-7)

diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 339314ecff9cd..71a14d65c1bfe 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -6979,14 +6979,15 @@ void CodeGenFunction::SetFPAccuracy(llvm::Value *Val, float Accuracy) {
 
 void CodeGenFunction::SetSqrtFPAccuracy(llvm::Value *Val) {
   llvm::Type *EltTy = Val->getType()->getScalarType();
-  if (!EltTy->isFloatTy())
+  if (!EltTy->isFloatTy() && !EltTy->isHalfTy())
     return;
 
   if ((getLangOpts().OpenCL &&
        !CGM.getCodeGenOpts().OpenCLCorrectlyRoundedDivSqrt) ||
       (getLangOpts().HIP && getLangOpts().CUDAIsDevice &&
        !CGM.getCodeGenOpts().HIPCorrectlyRoundedDivSqrt)) {
-    // OpenCL v1.1 s7.4: minimum accuracy of single precision / is 3ulp
+    // OpenCL v1.1 s7.4: minimum accuracy of single precision sqrt is 3 ulp.
+    // OpenCL v3.0 s7.4: minimum accuracy of half precision sqrt is 1.5 ulp.
     //
     // OpenCL v1.2 s5.6.4.2: The -cl-fp32-correctly-rounded-divide-sqrt
     // build option allows an application to specify that single precision
@@ -6994,20 +6995,21 @@ void CodeGenFunction::SetSqrtFPAccuracy(llvm::Value *Val) {
     // source are correctly rounded.
     //
     // TODO: CUDA has a prec-sqrt flag
-    SetFPAccuracy(Val, 3.0f);
+    SetFPAccuracy(Val, EltTy->isFloatTy() ? 3.0f : 1.5f);
   }
 }
 
 void CodeGenFunction::SetDivFPAccuracy(llvm::Value *Val) {
   llvm::Type *EltTy = Val->getType()->getScalarType();
-  if (!EltTy->isFloatTy())
+  if (!EltTy->isFloatTy() && !EltTy->isHalfTy())
     return;
 
   if ((getLangOpts().OpenCL &&
        !CGM.getCodeGenOpts().OpenCLCorrectlyRoundedDivSqrt) ||
       (getLangOpts().HIP && getLangOpts().CUDAIsDevice &&
        !CGM.getCodeGenOpts().HIPCorrectlyRoundedDivSqrt)) {
-    // OpenCL v1.1 s7.4: minimum accuracy of single precision / is 2.5ulp
+    // OpenCL v1.1 s7.4: minimum accuracy of single precision / is 2.5 ulp.
+    // OpenCL v3.0 s7.4: minimum accuracy of half precision / is 1 ulp.
     //
     // OpenCL v1.2 s5.6.4.2: The -cl-fp32-correctly-rounded-divide-sqrt
     // build option allows an application to specify that single precision
@@ -7015,7 +7017,7 @@ void CodeGenFunction::SetDivFPAccuracy(llvm::Value *Val) {
     // source are correctly rounded.
     //
     // TODO: CUDA has a prec-div flag
-    SetFPAccuracy(Val, 2.5f);
+    SetFPAccuracy(Val, EltTy->isFloatTy() ? 2.5f : 1.f);
   }
 }
 
diff --git a/clang/test/CodeGenOpenCL/fpmath.cl b/clang/test/CodeGenOpenCL/fpmath.cl
index f3649d52e0091..5915496b3963d 100644
--- a/clang/test/CodeGenOpenCL/fpmath.cl
+++ b/clang/test/CodeGenOpenCL/fpmath.cl
@@ -1,8 +1,44 @@
 // RUN: %clang_cc1 %s -emit-llvm -o - -triple spir-unknown-unknown | FileCheck --check-prefix=CHECK --check-prefix=NODIVOPT %s
 // RUN: %clang_cc1 %s -emit-llvm -o - -triple spir-unknown-unknown -cl-fp32-correctly-rounded-divide-sqrt | FileCheck --check-prefix=CHECK --check-prefix=DIVOPT %s
-// RUN: %clang_cc1 %s -emit-llvm -o - -DNOFP64 -cl-std=CL1.2 -triple r600-unknown-unknown -target-cpu r600 -pedantic | FileCheck --check-prefix=CHECK-FLT %s
+// RUN: %clang_cc1 %s -emit-llvm -o - -DNOFP16 -DNOFP64 -cl-std=CL1.2 -triple r600-unknown-unknown -target-cpu r600 -pedantic | FileCheck --check-prefix=CHECK-FLT %s
 // RUN: %clang_cc1 %s -emit-llvm -o - -DFP64 -cl-std=CL1.2 -triple spir-unknown-unknown -pedantic | FileCheck --check-prefix=CHECK-DBL %s
 
+#ifndef NOFP16
+#pragma OPENCL EXTENSION cl_khr_fp16 : enable
+typedef __attribute__(( ext_vector_type(4) )) half half4;
+
+half hpscalardiv(half a, half b) {
+  // CHECK: @hpscalardiv
+  // CHECK: fdiv{{.*}},
+  // NODIVOPT: !fpmath ![[MD_HFDIV:[0-9]+]]
+  // DIVOPT-NOT: !fpmath !{{[0-9]+}}
+  return a / b;
+}
+
+half4 hpvectordiv(half4 a, half4 b) {
+  // CHECK: @hpvectordiv
+  // CHECK: fdiv{{.*}},
+  // NODIVOPT: !fpmath ![[MD_HFDIV]]
+  // DIVOPT-NOT: !fpmath !{{[0-9]+}}
+  return a / b;
+}
+
+half elementwise_sqrt_f16(half a) {
+  // CHECK-LABEL: @elementwise_sqrt_f16
+  // NODIVOPT: call half @llvm.sqrt.f16(half %{{.+}}), !fpmath ![[MD_HSQRT:[0-9]+]]
+  // DIVOPT: call half @llvm.sqrt.f16(half %{{.+}}){{$}}
+  return __builtin_elementwise_sqrt(a);
+}
+
+half4 elementwise_sqrt_v4f16(half4 a) {
+  // CHECK-LABEL: @elementwise_sqrt_v4f16
+  // NODIVOPT: call <4 x half> @llvm.sqrt.v4f16(<4 x half> %{{.+}}), !fpmath ![[MD_HSQRT]]
+  // DIVOPT: call <4 x half> @llvm.sqrt.v4f16(<4 x half> %{{.+}}){{$}}
+  return __builtin_elementwise_sqrt(a);
+}
+
+#endif // NOFP16
+
 typedef __attribute__(( ext_vector_type(4) )) float float4;
 
 float spscalardiv(float a, float b) {
@@ -30,14 +66,14 @@ float spscalarsqrt(float a) {
 
 float elementwise_sqrt_f32(float a) {
   // CHECK-LABEL: @elementwise_sqrt_f32
-  // NODIVOPT: call float @llvm.sqrt.f32(float %{{.+}}), !fpmath ![[MD_SQRT:[0-9]+]]
+  // NODIVOPT: call float @llvm.sqrt.f32(float %{{.+}}), !fpmath ![[MD_SQRT]]
   // DIVOPT: call float @llvm.sqrt.f32(float %{{.+}}){{$}}
   return __builtin_elementwise_sqrt(a);
 }
 
 float4 elementwise_sqrt_v4f32(float4 a) {
   // CHECK-LABEL: @elementwise_sqrt_v4f32
-  // NODIVOPT: call <4 x float> @llvm.sqrt.v4f32(<4 x float> %{{.+}}), !fpmath ![[MD_SQRT:[0-9]+]]
+  // NODIVOPT: call <4 x float> @llvm.sqrt.v4f32(<4 x float> %{{.+}}), !fpmath ![[MD_SQRT]]
   // DIVOPT: call <4 x float> @llvm.sqrt.v4f32(<4 x float> %{{.+}}){{$}}
   return __builtin_elementwise_sqrt(a);
 }
@@ -90,5 +126,7 @@ double4 elementwise_sqrt_v4f64(double4 a) {
 
 #endif
 
+// NODIVOPT: ![[MD_HFDIV]] = !{float 1.000000e+00}
+// NODIVOPT: ![[MD_HSQRT]] = !{float 1.500000e+00}
 // NODIVOPT: ![[MD_FDIV]] = !{float 2.500000e+00}
 // NODIVOPT: ![[MD_SQRT]] = !{float 3.000000e+00}
diff --git a/clang/test/CodeGenOpenCL/sqrt-fpmath.cl b/clang/test/CodeGenOpenCL/sqrt-fpmath.cl
index d0637283a7ec1..6f4adf56930ff 100644
--- a/clang/test/CodeGenOpenCL/sqrt-fpmath.cl
+++ b/clang/test/CodeGenOpenCL/sqrt-fpmath.cl
@@ -134,46 +134,52 @@ double16 call_sqrt_v16f64(double16 x) {
 }
 
 
-// Not for f16
 // CHECK-LABEL: define {{.*}} half @call_sqrt_f16(
-// CHECK: call {{.*}} half @_Z4sqrtDh(half noundef %{{.+}}) #{{[0-9]+$}}{{$}}
+// DEFAULT: call {{.*}} half @_Z4sqrtDh(half noundef %{{.+}}) #{{[0-9]+}}, !fpmath [[HFPMATH:\![0-9]+]]{{$}}
+// CORRECTLYROUNDED: call {{.*}} half @_Z4sqrtDh(half noundef %{{.+}}) #{{[0-9]+$}}{{$}}
 half call_sqrt_f16(half x) {
   return sqrt(x);
 }
 
 
 // CHECK-LABEL: define {{.*}} <2 x half> @call_sqrt_v2f16(
-// CHECK: call {{.*}} <2 x half> @_Z4sqrtDv2_Dh(<2 x half> noundef %{{.+}}) #{{[0-9]+$}}{{$}}
+// DEFAULT: call {{.*}} <2 x half> @_Z4sqrtDv2_Dh(<2 x half> noundef %{{.+}}) #{{[0-9]+}}, !fpmath [[HFPMATH]]{{$}}
+// CORRECTLYROUNDED: call {{.*}} <2 x half> @_Z4sqrtDv2_Dh(<2 x half> noundef %{{.+}}) #{{[0-9]+$}}{{$}}
 half2 call_sqrt_v2f16(half2 x) {
   return sqrt(x);
 }
 
 
 // CHECK-LABEL: define {{.*}} <3 x half> @call_sqrt_v3f16(
-// CHECK: call {{.*}} <3 x half> @_Z4sqrtDv3_Dh(<3 x half> noundef %{{.+}}) #{{[0-9]+$}}{{$}}
+// DEFAULT: call {{.*}} <3 x half> @_Z4sqrtDv3_Dh(<3 x half> noundef %{{.+}}) #{{[0-9]+}}, !fpmath [[HFPMATH]]{{$}}
+// CORRECTLYROUNDED: call {{.*}} <3 x half> @_Z4sqrtDv3_Dh(<3 x half> noundef %{{.+}}) #{{[0-9]+$}}{{$}}
 half3 call_sqrt_v3f16(half3 x) {
   return sqrt(x);
 }
 
 
 // CHECK-LABEL: define {{.*}} <4 x half> @call_sqrt_v4f16(
-// CHECK: call {{.*}} <4 x half> @_Z4sqrtDv4_Dh(<4 x half> noundef %{{.+}}) #{{[0-9]+$}}{{$}}
+// DEFAULT: call {{.*}} <4 x half> @_Z4sqrtDv4_Dh(<4 x half> noundef %{{.+}}) #{{[0-9]+}}, !fpmath [[HFPMATH]]{{$}}
+// CORRECTLYROUNDED: call {{.*}} <4 x half> @_Z4sqrtDv4_Dh(<4 x half> noundef %{{.+}}) #{{[0-9]+$}}{{$}}
 half4 call_sqrt_v4f16(half4 x) {
   return sqrt(x);
 }
 
 
 // CHECK-LABEL: define {{.*}} <8 x half> @call_sqrt_v8f16(
-// CHECK: call {{.*}} <8 x half> @_Z4sqrtDv8_Dh(<8 x half> noundef %{{.+}}) #{{[0-9]+$}}{{$}}
+// DEFAULT: call {{.*}} <8 x half> @_Z4sqrtDv8_Dh(<8 x half> noundef %{{.+}}) #{{[0-9]+}}, !fpmath [[HFPMATH]]{{$}}
+// CORRECTLYROUNDED: call {{.*}} <8 x half> @_Z4sqrtDv8_Dh(<8 x half> noundef %{{.+}}) #{{[0-9]+$}}{{$}}
 half8 call_sqrt_v8f16(half8 x) {
   return sqrt(x);
 }
 
 
 // CHECK-LABEL: define {{.*}} <16 x half> @call_sqrt_v16f16(
-// CHECK: call {{.*}} <16 x half> @_Z4sqrtDv16_Dh(<16 x half> noundef %{{.+}}) #{{[0-9]+$}}{{$}}
+// DEFAULT: call {{.*}} <16 x half> @_Z4sqrtDv16_Dh(<16 x half> noundef %{{.+}}) #{{[0-9]+}}, !fpmath [[HFPMATH]]{{$}}
+// CORRECTLYROUNDED: call {{.*}} <16 x half> @_Z4sqrtDv16_Dh(<16 x half> noundef %{{.+}}) #{{[0-9]+$}}{{$}}
 half16 call_sqrt_v16f16(half16 x) {
   return sqrt(x);
 }
 
 // DEFAULT: [[FPMATH]] = !{float 3.000000e+00}
+// DEFAULT: [[HFPMATH]] = !{float 1.500000e+00}

llvmbot · 2026-02-04T07:35:53Z

@llvm/pr-subscribers-clang-codegen

Author: Wenju He (wenju-he)


[OpenCL] Set half-precision Div and Sqrt accuracy

b81729a

OpenCL spec relaxed half-precision divide to 1 ULP and sqrt to 1.5 ULP in KhronosGroup/OpenCL-Docs#1293 KhronosGroup/OpenCL-Docs#1386

wenju-he requested review from arsenm and Copilot February 4, 2026 07:35

llvmbot added the clang and clang:codegen labels Feb 4, 2026

wenju-he requested a review from svenvh February 4, 2026 07:35

Copilot AI reviewed Feb 4, 2026

arsenm approved these changes Feb 4, 2026

svenvh approved these changes Feb 4, 2026
