Skip to content

Conversation

@jimklimov
Copy link

@jimklimov jimklimov commented Nov 27, 2025

See https://issues.jenkins.io/browse/JENKINS-76294, #336 and jenkinsci/lockable-resources-plugin#825 (for example of similar problem and fix).

This PR adds @GuardedBy and synchronized protection for Map SafeArchiveServingAction.fileChecksums which is implicated in job crashes:

java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
...
Caused: java.lang.RuntimeException: Failed to serialize net.masterthought.jenkins.SafeArchiveServingAction#fileChecksums
  for class net.masterthought.jenkins.SafeArchiveServingRunAction
...

and those crashes are likely due to the plugin sometimes changing the map while another thread in the Jenkins controller says it should be serialized (e.g. to save workflow state). To address the latter, a writeReplace() method is introduced (which is found by XStreams serialization via reflection, along with the properties it wants to save), and constructs a synchronized copy of the current object to be actually saved in its stead.

Testing done

OBSOLETED: So far none. There are no existing unit tests in this plugin source code, and I am not sure where to start from scratch. An adaptation of test added for Lockable Resources (see above), trying to run many parallel builds and stages in them, while saving the workflows and server state internally and externally, would be a likely start.

UPDATED: See also PR #537 which adds the test to original code base and exposes the bug. Same commit is now added to this PR branch to check that the issue is no longer reproducible (passed on my system).

Note that previously this repository had no tests, so relevant pom.xml bits were copied from Lockable Resource plugin (along with the convoluted test to force the race condition and crash the best we can). The pom.xml changes in particular may be a bit of overkill for this one test case, pulling in components that are not really referenced (I did not investigate deeper). They might be useful if someone adds other tests later though.

Submitter checklist

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests that demonstrate the feature works or the issue is fixed

…ecksums [JENKINS-76294]

Signed-off-by: Jim Klimov <jimklimov+jenkinsci@gmail.com>
… recreate it anyway [JENKINS-76294]

Signed-off-by: Jim Klimov <jimklimov+jenkinsci@gmail.com>
…place() for XStreams serialization [JENKINS-76294]

Signed-off-by: Jim Klimov <jimklimov+jenkinsci@gmail.com>
…tion [JENKINS-76294]

Signed-off-by: Jim Klimov <jimklimov+jenkinsci@gmail.com>
@jimklimov
Copy link
Author

Unit test survived, proposed fix is working.

@jimklimov
Copy link
Author

jimklimov commented Nov 27, 2025

Note for reviewers: I did not check if de-serialization works correctly (and not sure how to do so), but with a full-object replica with no prop-renaming trickery, there is little reason why it would not, right? :)

@damianszczepanik
Copy link
Member

Class which you have been updated for this problem was taken from https://github.com/jenkinsci/htmlpublisher-plugin - did you check if and how this problem was fixed there?

@jimklimov
Copy link
Author

@damianszczepanik : can't quickly say which of those sources was the origin of yours :) but...

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

* @throws Exception If test failed
*/
@Test
@Timeout(900)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the test result if this timeout is reached?

@Test
@Timeout(900)
@Issue("JENKINS-76294")
void noCmeWhileSavingXStreamVsSafeArchiveServingRunAction(JenkinsRule j) throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to throw an Exception here, suggest to put something concrete if possible

void noCmeWhileSavingXStreamVsSafeArchiveServingRunAction(JenkinsRule j) throws Exception {
// How many parallel stages would we use before saving WorkflowRun
// state inside the pipeline run, and overall?
int preflood = 25, maxflood = 75;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this approach of merging two definition into single line

LOGGER.info("create extra build agents done");

LOGGER.info("define " + maxRuns + " test workflows");
String pipeCode = "import java.lang.Math;\n" + "import java.util.Random;\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put this java code into separate java file and load as a resource into the string? it help reading such code as putting this into java file enables syntax and coloring

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants