-
-
Notifications
You must be signed in to change notification settings - Fork 230
Address ConcurrentModificationException which occasionally happens during XStream saving of running workflows [JENKINS-76294] #536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…ecksums [JENKINS-76294] Signed-off-by: Jim Klimov <jimklimov+jenkinsci@gmail.com>
… recreate it anyway [JENKINS-76294] Signed-off-by: Jim Klimov <jimklimov+jenkinsci@gmail.com>
…place() for XStreams serialization [JENKINS-76294] Signed-off-by: Jim Klimov <jimklimov+jenkinsci@gmail.com>
…tion [JENKINS-76294] Signed-off-by: Jim Klimov <jimklimov+jenkinsci@gmail.com>
|
Unit test survived, proposed fix is working. |
|
Note for reviewers: I did not check if de-serialization works correctly (and not sure how to do so), but with a full-object replica with no prop-renaming trickery, there is little reason why it would not, right? :) |
|
Class which you have been updated for this problem was taken from https://github.com/jenkinsci/htmlpublisher-plugin - did you check if and how this problem was fixed there? |
|
@damianszczepanik : can't quickly say which of those sources was the origin of yours :) but...
|
| * @throws Exception If test failed | ||
| */ | ||
| @Test | ||
| @Timeout(900) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the test result if this timeout is reached?
| @Test | ||
| @Timeout(900) | ||
| @Issue("JENKINS-76294") | ||
| void noCmeWhileSavingXStreamVsSafeArchiveServingRunAction(JenkinsRule j) throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to throw an Exception here, suggest to put something concrete if possible
| void noCmeWhileSavingXStreamVsSafeArchiveServingRunAction(JenkinsRule j) throws Exception { | ||
| // How many parallel stages would we use before saving WorkflowRun | ||
| // state inside the pipeline run, and overall? | ||
| int preflood = 25, maxflood = 75; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this approach of merging two definition into single line
| LOGGER.info("create extra build agents done"); | ||
|
|
||
| LOGGER.info("define " + maxRuns + " test workflows"); | ||
| String pipeCode = "import java.lang.Math;\n" + "import java.util.Random;\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put this java code into separate java file and load as a resource into the string? it help reading such code as putting this into java file enables syntax and coloring
See https://issues.jenkins.io/browse/JENKINS-76294, #336 and jenkinsci/lockable-resources-plugin#825 (for example of similar problem and fix).
This PR adds
@GuardedByandsynchronizedprotection forMap SafeArchiveServingAction.fileChecksumswhich is implicated in job crashes:and those crashes are likely due to the plugin sometimes changing the map while another thread in the Jenkins controller says it should be serialized (e.g. to save workflow state). To address the latter, a
writeReplace()method is introduced (which is found by XStreams serialization via reflection, along with the properties it wants to save), and constructs asynchronizedcopy of the current object to be actually saved in its stead.Testing done
OBSOLETED: So far none. There are no existing unit tests in this plugin source code, and I am not sure where to start from scratch. An adaptation of test added for Lockable Resources (see above), trying to run many parallel builds and stages in them, while saving the workflows and server state internally and externally, would be a likely start.
UPDATED: See also PR #537 which adds the test to original code base and exposes the bug. Same commit is now added to this PR branch to check that the issue is no longer reproducible (passed on my system).
Note that previously this repository had no tests, so relevant
pom.xmlbits were copied from Lockable Resource plugin (along with the convoluted test to force the race condition and crash the best we can). Thepom.xmlchanges in particular may be a bit of overkill for this one test case, pulling in components that are not really referenced (I did not investigate deeper). They might be useful if someone adds other tests later though.Submitter checklist