Description
Before Creating the Bug Report
- I found a bug, not just asking a question, which should be created in GitHub Discussions.
- I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.
- I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.
Runtime platform environment
UOS, 32 GB memory, 16 CPU cores
RocketMQ version
5.3.2
JDK Version
OpenJDK 1.8.0_342
Describe the Bug
A RocketMQ 5.3.1 master broker (SYNC_MASTER) running in a 32 GB container with a 26 GB JVM heap throws a Java heap space OOM in PopBufferMergeService. It uses the official default runbroker.sh; the only modification is -Xms26g -Xmx26g.
The business workload handles large messages (attachments and video files) as well as batches of GPS data packets. There is no backlog in the business, retry, or revive logs, and the master and slave configurations are identical (fileReservedTime=24, i.e. 24 hours). The slave node works fine, but GC logs on the master show continuous Full GCs that reclaim no memory (old generation 100% full). The OOM went away after tuning G1GC (G1HeapRegionSize=32m, among other settings).
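A minimal sketch of how the "Full GC reclaims nothing" pattern could be checked programmatically instead of by reading GC logs. It uses the standard java.lang.management API, where MemoryPoolMXBean.getCollectionUsage() reports a pool's usage right after its most recent collection. The class name, the stillFullAfterGc helper, and the 0.95 threshold are hypothetical choices for illustration, and the program inspects the JVM it runs in (the same MXBeans are also reachable over JMX for a running broker). On HotSpot with G1, the old-generation pool is normally named "G1 Old Gen".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class OldGenAfterGcCheck {

    // Hypothetical helper: true when the pool's usage right after the last GC
    // is at or above `threshold` of its maximum (e.g. 0.95 = 95% full).
    static boolean stillFullAfterGc(long usedAfterGc, long max, double threshold) {
        if (max <= 0) {
            return false; // getMax() returns -1 when the maximum is undefined
        }
        return (double) usedAfterGc / max >= threshold;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // only heap pools matter for "Java heap space" OOMs
            }
            // Usage measured after the most recent GC of this pool; null if unsupported.
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc == null) {
                continue;
            }
            boolean full = stillFullAfterGc(afterGc.getUsed(), afterGc.getMax(), 0.95);
            System.out.printf("%-20s usedAfterGc=%dM max=%dM stillFull=%b%n",
                    pool.getName(), afterGc.getUsed() >> 20, afterGc.getMax() >> 20, full);
        }
    }
}
```

With the figures from this report (26309M used after GC against a 26G heap), stillFullAfterGc would return true, matching the old generation staying full across Full GCs.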
Steps to Reproduce
- Deploy a RocketMQ 5.3.1 master-slave cluster in a 32 GB container.
- Use the official default runbroker.sh, changing only -Xms26g -Xmx26g on the master.
- Set fileReservedTime=24 on both master and slave (consistent).
- Start the NameServer, master, and slave with the default configuration.
- Run moderate, normal production/consumption traffic (1-2 KB messages, no failures).
- After several hours, the master throws a Java heap space OOM in PopBufferMergeService.
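Because only -Xms26g -Xmx26g were changed, it can help to confirm which G1 settings the broker JVM actually ended up with, and whether each value came from the command line or from JVM ergonomics. The following is a minimal sketch using HotSpotDiagnosticMXBean (HotSpot-specific, available in JDK 8); the class and helper names are hypothetical. It reports on the JVM it runs in; for an already running broker, jinfo -flag G1HeapRegionSize <pid> gives the same value from the command line.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

public class EffectiveGcFlags {

    // HotSpot option names relevant to this report.
    static final String[] FLAGS = {
        "UseG1GC", "MaxHeapSize", "G1HeapRegionSize", "InitiatingHeapOccupancyPercent"
    };

    // Returns "value (origin)" for a VM option, or null if this JVM does not know it.
    // The origin shows e.g. VM_CREATION (set on the command line) or ERGONOMIC (chosen by the JVM).
    static String read(HotSpotDiagnosticMXBean diag, String name) {
        try {
            VMOption opt = diag.getVMOption(name);
            return opt.getValue() + " (" + opt.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            return null; // thrown when no option with this name exists
        }
    }

    public static void main(String[] args) {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        for (String flag : FLAGS) {
            System.out.println(flag + " = " + read(diag, flag));
        }
    }
}
```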
What Did You Expect to See?
The master node runs stably with a 26 GB heap under moderate traffic, with no OOM and normal GC behavior.
What Did You See Instead?
The master node (32 GB container, 26 GB heap) throws:
java.lang.OutOfMemoryError: Java heap space
at org.apache.rocketmq.broker.pop.PopBufferMergeService.merge(...)
GC logs show continuous Full GCs that reclaim no memory (old generation full: 26309M->26309M).
No business/retry/revive log backlog. The slave node works fine.
The OOM was resolved by tuning G1GC (G1HeapRegionSize=32m, InitiatingHeapOccupancyPercent=40).
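One possible reason, not confirmed by this report, why a larger G1HeapRegionSize could matter for a workload with large attachment/video payloads: G1 allocates any object of at least half a region as a "humongous" object in dedicated contiguous regions, and failing to find enough contiguous free regions can lead to Full GCs. The sketch below only computes that half-region threshold for a few region sizes; the class name and payload sizes are hypothetical, and the region size actually in use by a broker can be checked with jinfo -flag G1HeapRegionSize <pid>.

```java
public class HumongousCheck {

    // G1 treats an allocation of at least half a region as humongous.
    static long humongousThresholdBytes(long regionSizeBytes) {
        return regionSizeBytes / 2;
    }

    // allocationBytes should be the whole object size; for a byte[] this is the
    // payload plus a small array header, so values right at the threshold are approximate.
    static boolean isHumongous(long allocationBytes, long regionSizeBytes) {
        return allocationBytes >= humongousThresholdBytes(regionSizeBytes);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] regionSizes = {8 * mb, 16 * mb, 32 * mb};
        long[] payloads = {2 * 1024L, 4 * mb, 10 * mb}; // hypothetical message body sizes
        for (long region : regionSizes) {
            for (long payload : payloads) {
                System.out.printf("region=%2dM payload=%8dB humongous=%b%n",
                        region / mb, payload, isHumongous(payload, region));
            }
        }
    }
}
```

Under this rule, a 10 MB allocation is humongous with 16 MB regions (threshold 8 MB) but not with 32 MB regions (threshold 16 MB).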
Additional Context
