Skip to content

Conversation

@ycombinator
Copy link
Contributor

@ycombinator ycombinator commented Sep 25, 2025

This PR bumps up the Golang version to 1.25.4. It also:

  • removes the ms_tls13kdf Golang build tag when building in FIPS mode because this tag was only needed with Golang versions 1.24.x.
  • sets ths GODEBUG=tlsmlkem=0 environment variable when running FIPS140-only unit tests. This prevents errors like so: Failed to connect: crypto/ecdh: use of X25519 is not allowed in FIPS 140-only mode.
  • replaces any fmt.Sprintf("%s:%d", ip, port) code fragments with net.JoinHostPort(ip, strconv.Itoa(int(port))) to work with the new hostport go vet analyzer.

@ycombinator ycombinator requested review from a team as code owners September 25, 2025 17:17
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Sep 25, 2025
@github-actions
Copy link
Contributor

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@mergify
Copy link
Contributor

mergify bot commented Sep 25, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @ycombinator? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@ycombinator ycombinator added Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team backport-active-all Automated backport with mergify to all the active branches labels Sep 25, 2025
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Sep 25, 2025
@ycombinator ycombinator requested a review from a team as a code owner September 25, 2025 17:28
Copy link
Contributor

@joecompute joecompute left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@belimawr
Copy link
Contributor

There is a new Go Vet rule: https://go.dev/doc/go1.25#vet

So many builds are breaking with:

helper/server/udp/udp_test.go:81:26: address format "%s:%d" does not work with IPv6 (passed to net.Dial at L82)

https://buildkite.com/elastic/beats-metricbeat/builds/24655/steps/canvas?jid=01998236-e28d-40a4-a759-d06b75055f5a#01998236-e28d-40a4-a759-d06b75055f5a/132-385

@ycombinator
Copy link
Contributor Author

ycombinator commented Sep 25, 2025

The fips140=only unit tests are failing like so:

crypto/ecdh: use of X25519 is not allowed in FIPS 140-only mode

These appear to be golang/go#75148, which should be fixable when golang/go#74630 is implemented. However, in order to upgrade to Go 1.25.1 now, we'll need to find a workaround.

@ycombinator
Copy link
Contributor Author

ycombinator commented Sep 26, 2025

The fips140=only unit tests are failing like so:

crypto/ecdh: use of X25519 is not allowed in FIPS 140-only mode

These appear to be golang/go#75148, which should be fixable when golang/go#74630 is implemented. However, in order to upgrade to Go 1.25.1 now, we'll need to find a workaround.

These errors are coming from Go downloading dependencies before executing the tests. The errors can be simulated like so:

GODEBUG=fips140=only go mod download -x
# get https://proxy.golang.org/github.com/opencontainers/image-spec/@v/v1.1.1.info
# get https://proxy.golang.org/github.com/opencontainers/image-spec/@v/v1.1.1.info: Get "https://proxy.golang.org/github.com/opencontainers/image-spec/@v/v1.1.1.info": crypto/ecdh: use of X25519 is not allowed in FIPS 140-only mode
...

I ran into the same problem in elastic/elastic-agent#10156 and I had success with explicitly downloading the dependencies before executing the tests. I'm running into a different problem on that PR now; once it's sorted out, I will apply the same approach on this PR here. Moving this PR into draft until then.

@ycombinator ycombinator marked this pull request as draft September 26, 2025 05:27
@ycombinator
Copy link
Contributor Author

MacOS packaging steps are failing in CI like so (this example is from trying to package Agentbeat):

/usr/local/go/pkg/tool/linux_amd64/link: running o64-clang failed: exit status 1
--
  | /usr/local/osxcross/bin/o64-clang -arch x86_64 -m64 -Wl,-flat_namespace -Wl,-bind_at_load -Wl,-S -Wl,-x -o $WORK/b001/exe/a.out -Qunused-arguments /tmp/go-link-387501915/go.o /tmp/go-link-387501915/000000.o /tmp/go-link-387501915/000001.o /tmp/go-link-387501915/000002.o /tmp/go-link-387501915/000003.o /tmp/go-link-387501915/000004.o /tmp/go-link-387501915/000005.o /tmp/go-link-387501915/000006.o /tmp/go-link-387501915/000007.o /tmp/go-link-387501915/000008.o /tmp/go-link-387501915/000009.o /tmp/go-link-387501915/000010.o /tmp/go-link-387501915/000011.o /tmp/go-link-387501915/000012.o /tmp/go-link-387501915/000013.o /tmp/go-link-387501915/000014.o /tmp/go-link-387501915/000015.o /tmp/go-link-387501915/000016.o /tmp/go-link-387501915/000017.o /tmp/go-link-387501915/000018.o /tmp/go-link-387501915/000019.o /tmp/go-link-387501915/000020.o /tmp/go-link-387501915/000021.o /tmp/go-link-387501915/000022.o /tmp/go-link-387501915/000023.o /tmp/go-link-387501915/000024.o /tmp/go-link-387501915/000025.o /tmp/go-link-387501915/000026.o /tmp/go-link-387501915/000027.o /tmp/go-link-387501915/000028.o /tmp/go-link-387501915/000029.o /tmp/go-link-387501915/000030.o /tmp/go-link-387501915/000031.o /tmp/go-link-387501915/000032.o /tmp/go-link-387501915/000033.o /tmp/go-link-387501915/000034.o /tmp/go-link-387501915/000035.o /tmp/go-link-387501915/000036.o /tmp/go-link-387501915/000037.o /tmp/go-link-387501915/000038.o /tmp/go-link-387501915/000039.o /tmp/go-link-387501915/000040.o /tmp/go-link-387501915/000041.o /tmp/go-link-387501915/000042.o /tmp/go-link-387501915/000043.o /tmp/go-link-387501915/000044.o /tmp/go-link-387501915/000045.o /tmp/go-link-387501915/000046.o -lresolv -lpcap -lpcap -lpcap -lpcap -lproc -framework CoreFoundation -framework Security -lpcap -framework CoreServices -lpcap -lpthread -lpcap -lpcap -lpcap -lpcap -lpcap -lpcap
  | Undefined symbols for architecture x86_64:
  | "_SecTrustCopyCertificateChain", referenced from:
  | _crypto/x509/internal/macos.x509_SecTrustCopyCertificateChain_trampoline.abi0 in go.o
  | ld: symbol(s) not found for architecture x86_64
  | clang: error: linker command failed with exit code 1 (use -v to see invocation)
  |  
  | Error: running "go build -o build/golang-crossbuild/agentbeat-darwin-amd64 -buildmode pie -trimpath -tags=agentbeat -ldflags -s -X github.com/elastic/beats/v7/libbeat/version.buildTime=2025-09-26T23:08:11Z -X github.com/elastic/beats/v7/libbeat/version.commit=cf79447dfa4d2753c13a26766f8b25995a2062fe" failed with exit code 1
  | Error: failed building for darwin/amd64: exit status 1
  | failed building for darwin/amd64: exit status 1

From https://go.dev/doc/go1.25#darwin:

As announced in the Go 1.24 release notes, Go 1.25 requires macOS 12 Monterey or later. Support for previous versions has been discontinued.

And it looks like we're still using either the 10.11 or 11.3 MacOS SDK in golang-crossbuild images.

@cmacknz
Copy link
Member

cmacknz commented Dec 15, 2025

I think what is curious here is the failing auditbeat test is exactly the case the RemoveAll bug fix in Go 1.25.4 should correct - it explicitly creates a read only file and then automatically deleted it in the test Cleanup call fails.

func TestFileInfoPermissions(t *testing.T) {
f, err := os.CreateTemp(t.TempDir(), "metadata")
if err != nil {
t.Fatal(err)
}
defer f.Close()
name := f.Name()
makeFileNonReadable(t, f.Name())
info, err := os.Stat(name)
if err != nil {
t.Fatal(err)
}
meta, err := NewMetadata(name, info)
if !assert.NoError(t, err) {
t.Fatal(err)
}
t.Log(meta.Owner)
assert.NotEqual(t, "", meta.Owner)
}

This test case is pretty simple as a reproducer if we wanted to ask upstream about it.

@cmacknz
Copy link
Member

cmacknz commented Dec 15, 2025

The other option is we aren't actually building with Go 1.25.4 even though it looks like we should be.

@axw
Copy link
Member

axw commented Dec 15, 2025

The other option is we aren't actually building with Go 1.25.4 even though it looks like we should be.

The auditbeat\build\TEST-go-unit.out.json artifact in Buildkite claims it's using 1.25.4.

@mergify
Copy link
Contributor

mergify bot commented Dec 18, 2025

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b bump-golang-1.25.1 upstream/bump-golang-1.25.1
git merge upstream/main
git push upstream bump-golang-1.25.1

@cmacknz
Copy link
Member

cmacknz commented Jan 26, 2026

I can confirm that on a Windows Server 2022 21H2 VM the auditbeat test failing here failed on the first try with Go 1.25.6 so there is nothing special going on in CI for this one.


PS C:\Users\craig_mackenzie\Documents\beats\auditbeat\module\file_integrity> go version
go version go1.25.6 windows/amd64
PS C:\Users\craig_mackenzie\Documents\beats\auditbeat\module\file_integrity> go test ./... -run TestFileInfoPermissions
--- FAIL: TestFileInfoPermissions (1.53s)
    fileinfo_windows_test.go:53: BUILTIN\Administrators
    testing.go:1369: TempDir RemoveAll cleanup: unlinkat C:\Users\CRAIG_~1\AppData\Local\Temp\TestFileInfoPermissions4093013615\001\metadata4128307354: Access is denied.
FAIL
FAIL    github.com/elastic/beats/v7/auditbeat/module/file_integrity     3.382s
ok      github.com/elastic/beats/v7/auditbeat/module/file_integrity/monitor     1.010s [no tests to run]
?       github.com/elastic/beats/v7/auditbeat/module/file_integrity/schema      [no test files]
FAIL

func TestFileInfoPermissions(t *testing.T) {
f, err := os.CreateTemp(t.TempDir(), "metadata")
if err != nil {
t.Fatal(err)
}
defer f.Close()
name := f.Name()
makeFileNonReadable(t, f.Name())
info, err := os.Stat(name)
if err != nil {
t.Fatal(err)
}
meta, err := NewMetadata(name, info)
if !assert.NoError(t, err) {
t.Fatal(err)
}
t.Log(meta.Owner)
assert.NotEqual(t, "", meta.Owner)
}

@cmacknz
Copy link
Member

cmacknz commented Jan 26, 2026

I have a theory based on the reference to POSIX semantics in golang/go#75922 (comment) that this is based on use of Windows ACLs to mark the file readonly instead of going through os.Chmod.

The patch below causes the test to pass to confirm this:

diff --git a/auditbeat/module/file_integrity/fileinfo_windows_test.go b/auditbeat/module/file_integrity/fileinfo_windows_test.go
index a73f4642bc..209e4780fc 100644
--- a/auditbeat/module/file_integrity/fileinfo_windows_test.go
+++ b/auditbeat/module/file_integrity/fileinfo_windows_test.go
@@ -22,7 +22,6 @@ import (
        "path/filepath"
        "testing"

-       "github.com/hectane/go-acl"
        "github.com/stretchr/testify/assert"
        "golang.org/x/sys/windows"
 )
@@ -55,7 +54,7 @@ func TestFileInfoPermissions(t *testing.T) {
 }

 func makeFileNonReadable(t testing.TB, path string) {
-       if err := acl.Chmod(path, 0); err != nil {
+       if err := os.Chmod(path, 0); err != nil {
                t.Fatal(err)
        }
 }

This test happens to use go-acl, which we cloned most of into https://github.com/elastic/elastic-agent/blob/cd886995e9b6a09602f5927f47bc83c2bc80b12c/internal/pkg/acl/acl_windows.go#L143-L146 in Elastic Agent instead of forking upstream. That is probably the source of the test failures in agent CI.

I don't know why, it is probably that the fix upstream is incomplete somehow, but we have a lead on a way past this.

@ycombinator
Copy link
Contributor Author

Thanks @cmacknz.

IIUC, the intent of the calling code here, makeFileNonReadable, is to make the file non-readable. That works by passing a fileMode of 0 to acl.Chmod as it removes all access permissions on the file, due to the bitwise &-ing going on in https://github.com/hectane/go-acl/blob/ca0b05cb1adbf3d91585d8acab7bc804ee0b8583/chmod.go#L34-L36. However, if we were to pass a fileMode of 0 to os.Chmod instead, I believe the file would actually be granted the read-only attribute, as per https://github.com/golang/go/blob/f2cd93aa0505465c1d30201c806b6d4d3481c5fa/src/syscall/syscall_windows.go#L753. And that would not be what we intended here?

@cmacknz
Copy link
Member

cmacknz commented Jan 27, 2026

Yes, I think we need to keep the ACL manipulation and add a call to os.Chmod in the test to get it to pass.

I am still not sure of the exact mechanism in Go 1.25 that broke this, but using Chmod to set the read only attribute works around it and seems safe to do here.

For the similar Elastic Agent failures we still need to investigate more since in that case we are definitely relying on the ACLs there.

@michalpristas
Copy link
Contributor

michalpristas commented Jan 28, 2026

i think adding os.Chmod wont do a trick here, i'd focus more on getting a handle for a file you are owner but have no rights on (hence acl.Chmod(0))
auditbeat can get metadata because they are using stat, this is different syscall performing different access checks.

on delete during test cleanup, we have good access on test temp dir, in this dir we created metadata if parent has attribute specifying we can delete child, it will allow us to create a handle using DELETE. adding FILE_READ_PERMISSIONS in NtOpenFile in a RemoveAll (when cleaning test) now adds ACL check that was not necessary and here is where we probably fail

@cmacknz
Copy link
Member

cmacknz commented Jan 29, 2026

Thanks Michal I see it now, in golang/go@6d41809#diff-4620d8a807ea6bc818910000f177a72022dba33b44111ad14b2f3a959250dee0 a change in build tags in removeall_at.go and removeall_noat.go switched from calling os.Remove first in removeall_noat.go which does not open the file to the more complex deleteAt implementation that does open the file which is probably where the Access Is Denied comes from (would have to get out the debugger to confirm exactly).

Testing this, the following change to call os.Remove on the metadata file with no ACLs before os.RemoveAll gets to it in the cleanup call from the stdlib gets the test to pass. This is better than sticking os.Chmod in here, it just needs some comments as explanation.

diff --git a/auditbeat/module/file_integrity/fileinfo_windows_test.go b/auditbeat/module/file_integrity/fileinfo_windows_test.go
index a73f4642bc..a79877988d 100644
--- a/auditbeat/module/file_integrity/fileinfo_windows_test.go
+++ b/auditbeat/module/file_integrity/fileinfo_windows_test.go
@@ -40,6 +40,9 @@ func TestFileInfoPermissions(t *testing.T) {
        defer f.Close()

        name := f.Name()
+       t.Cleanup(func() {
+               err := os.Remove(name)
+       })

The Elastic Agent case is going to be similar, we must be trying to delete a file we don't have permission to open, but that is odd because we have to be running as admin and I would be surprised if we strip all ACLs like we do here (maybe we remove read or just enough to cause open to fail or something).

@michalpristas
Copy link
Contributor

michalpristas commented Jan 30, 2026

just a small update, i was playing with this and building go from source with some custom changes, and after quite some head-banging that my changes are not behaving in a way i want them i figured out something else.

this is indeed regression in 1.2, not 1.25.4 but 1.25.0, well partially.
changes in 1.25.4 would break it but what breaks it in addition is this
golang/go@a39046f

same logic applies as described above, when we have DELETE CHILD on parent we can open a handle on a file we don't have permissions for in this case metadata file. By adding SYNCHRONIZE or FILE_READ_ATTRIBUTE we break this behavior because we're asking for something extra that OS cannot give us.

we have 2 options at this point

  • stay on 1.24 for some time
  • go with 1.25.4 and disable this test for a while

this is touching really an edgy edge scenario of file manipulation.

in a meantime i have a very ugly custom go build (that i wont' share for the very same reason) with everything working as expected, new and old behavior, i just need to polish it and raise it upstream. so maybe it will get in, hopefully. in this case we could unskip test in one of future releases

@cmacknz
Copy link
Member

cmacknz commented Jan 30, 2026

Awesome thanks, for Beats the os.Remove work around in this test is fine with me as a placeholder to get us upgraded, I'm not sure that'll work in Elastic Agent though but I haven't done as thorough an investigation.

Basically the only thing that seems to work is bypassing the deletaat implementation, if we had to we could reproduce the previous implementation of os.RemoveAll and just call that in the places where this matters.

How to proceed here I think depends on how fast an upstream fix can happen, Go 1.24 will go out of support in February when 1.26 releases.

@michalpristas
Copy link
Contributor

just for visibility,
issue for this regression created upstream: golang/go#77402
with a fix already in place: golang/go#77403

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-active-all Automated backport with mergify to all the active branches Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants