Skip to content

fix: handle null states in EVCacheBulkGetFuture and refactor#179

Merged
shy-1234 merged 5 commits intomasterfrom
dev/sh/fixbulkget
Feb 5, 2026
Merged

fix: handle null states in EVCacheBulkGetFuture and refactor#179
shy-1234 merged 5 commits intomasterfrom
dev/sh/fixbulkget

Conversation

@shy-1234
Copy link
Contributor

This PR cherry-picked the commit from PR to fix the NullPointerException in bulk get operations. More details regarding the bug can be found in the original PR. On top of the change, refactoring some of the bulk get logic and adding some comments to make it more clear.

Prudhviraj Karumanchi and others added 2 commits January 23, 2026 09:06
  Fixes NullPointerException in bulk get operations when some operations
  never signal completion through callbacks. The issue was introduced in
  commit e4a2a9b which optimized operation state checking by collecting
  state during callbacks, but didn't handle cases where callbacks don't fire.

  This fix adds a fallback mechanism that preserves the original behavior:
  - Use pre-collected state when available (maintains performance optimization)
  - Fall back to direct operation state checking when state is null
  - Ensures all operations are processed, matching pre-e4a2a9bd behavior

  The root cause was that operationStates[i] could remain null if
  signalSingleOpComplete() was never called for operations that timeout,
  fail, or are cancelled before their completion callbacks fire.
assert operationStates != null;

// Note: The latch here is counterintuitive. Based on the implementation in EVCacheMemcachedClient.asyncGetBulk(),
// the latch count is set to 1 no matter the chunk size and only decrement when pendingChunks counts down to 0.
Copy link
Collaborator

@Sunjeet Sunjeet Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the latch count is set to 1 no matter the chunk size

isn't it also 0 when chunks.isEmpty() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I actually mean I was expecting the latch count equal to the chunk size (no matter it's empty or with some chunks).

super(m, getOps, l, service);
rvMap = m;
ops = getOps;
opsArray = ops.toArray(new Operation[0]);
Copy link
Collaborator

@Sunjeet Sunjeet Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ops is an empty list that gets populated later.. See EVCacheMemcachedClient's

final Collection<Operation> ops = new ArrayList<Operation>(chunks.size());
final EVCacheBulkGetFuture<T> rv = new EVCacheBulkGetFuture<T>(m, ops, latch, executorService, client); 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch.. agree, then we should convert to Array whenever we need them (was implemented that way in the original fix, but I tweak it the wrong way)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

}

if (!allCompleted && !hasZF && hadTimedoutOp) statusString = EVCacheMetricsFactory.TIMEOUT;
if (hadTimedoutOp && !hasZF) statusString = EVCacheMetricsFactory.TIMEOUT;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional to omit && hadTimedoutOp in the conditional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you mean omit !allCompleted? This is based on the fact that "!allCompleted" will always lead to hadTimedoutOp.


if (state == null) {
Operation op = opsArray[i];
op.timeOut();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to signal timeout here? Iiuc we're handling that one of the ops in the bulk had an exception, shouldn't we signal a cancellation instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. signal a cancellation to the op seems better if we enter here while state is still null

Copy link
Contributor Author

@shy-1234 shy-1234 Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm after second look at this method, I wonder what's the intention of this method. The caller caught an exception, but we are not doing anything with that "ex". Instead we throw based on whichever ops we found "first" that is either Cancelled or TimedOut or Errored...
I feel like the correct implementation should be something like

    public void handleBulkException(Throwable ex) {
        Operation[] opsArray = ops.toArray(new Operation[0]);
        for (int i = 0; i < operationStates.length(); i++) {
            SingleOperationState state = operationStates.get(i);

            if (state == null) {
                Operation op = opsArray[i];
                op.cancel();
                MemcachedConnection.opSucceeded(op);
            } else {
                // Use pre-collected state
                if (state.timedOut) {
                    MemcachedConnection.opTimedOut(state.op);
                } else {
                    MemcachedConnection.opSucceeded(state.op);
                }
            }
        }

        throw new RuntimeException(ex);
    }

@shy-1234 shy-1234 marked this pull request as ready for review February 3, 2026 20:20
}
}

throw new RuntimeException(t);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it now possible to throw new RuntimeException(null)? may need to handle that

Copy link
Contributor Author

@shy-1234 shy-1234 Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting down the note here: As we discussed offline, the state==null condition is more like the scenario in the below else block, so we go with marking it as timedout. (here is the original code before state was introduced for reference)

@shy-1234 shy-1234 merged commit 27ab576 into master Feb 5, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants