[Bug] Default runbroker.sh causes OOM in PopBufferMergeService (26G heap, no backlog) #10052

@wangshuai67

Description

Before Creating the Bug Report

  • I found a bug, not just asking a question, which should be created in GitHub Discussions.

  • I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.

  • I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.

Runtime platform environment

UOS, 32G memory, 16 CPU cores

RocketMQ version

5.3.2

JDK Version

OpenJDK 1.8.0_342

Describe the Bug

A RocketMQ 5.3.1 master (SYNC_MASTER) running in a 32G container with a 26G JVM heap throws a Java heap space OOM in PopBufferMergeService. The broker uses the official default runbroker.sh; the only change is -Xms26g -Xmx26g.

The workload carries large messages (attachments, video files) and batched GPS data packets. There is no business/retry/revive log backlog, and the master and slave configs are fully consistent (fileReservedTime of 24h). The slave node works fine. On the master, GC logs show continuous Full GC that reclaims no memory (old gen 100% full). The OOM went away after G1GC tuning (G1HeapRegionSize=32m etc.).
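
The G1 settings mentioned above could be expressed roughly as follows in a shell startup script. This is a sketch, not the reporter's actual change: the `JAVA_OPT` variable name and the `-XX:+UseG1GC` flag are assumptions about how the script passes options, and only `G1HeapRegionSize=32m` and `InitiatingHeapOccupancyPercent=40` come from this report. The arithmetic illustrates one possible reason region size could matter for large messages: G1 allocates any object larger than half a region as a humongous object.

```shell
#!/bin/sh
# Sketch only: JAVA_OPT is an assumed variable name for the JVM options.
HEAP_MB=$((26 * 1024))   # 26G heap from the report
REGION_MB=32             # G1HeapRegionSize=32m from the report

# Flags named in the report, appended to whatever options are already set.
JAVA_OPT="${JAVA_OPT:-} -XX:+UseG1GC -XX:G1HeapRegionSize=${REGION_MB}m -XX:InitiatingHeapOccupancyPercent=40"

# G1 allocates objects larger than half a region as humongous objects.
REGION_COUNT=$((HEAP_MB / REGION_MB))
HUMONGOUS_THRESHOLD_MB=$((REGION_MB / 2))

echo "regions=${REGION_COUNT} humongous_threshold_mb=${HUMONGOUS_THRESHOLD_MB}"
```

With a 26G heap and 32m regions this gives 832 regions and a 16 MB humongous threshold. Whether humongous allocations are actually behind this OOM is not established by the report.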

[Two attached images]

Steps to Reproduce

  1. Deploy a RocketMQ 5.3.1 master-slave cluster in a 32G container.
  2. Use the official default runbroker.sh, changing only -Xms26g -Xmx26g for the master.
  3. Set fileReservedTime=24 on both master and slave (consistent).
  4. Start the NameServer, master, and slave with the default config.
  5. Run moderate, normal production/consumption traffic (1-2KB messages, no failures).
  6. After several hours, the master throws a Java heap space OOM in PopBufferMergeService.
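
Step 2 might be carried out along these lines. This sketch assumes the startup script sets the heap through `-Xms`/`-Xmx` tokens with a `g` suffix; the sample line below is a stand-in, not a quote from the real runbroker.sh, and the commands work on a temporary file so nothing real is modified.

```shell
#!/bin/sh
# Stand-in for the heap line of a broker startup script (contents assumed).
SCRIPT="$(mktemp)"
echo 'JAVA_OPT="${JAVA_OPT} -server -Xms8g -Xmx8g"' > "$SCRIPT"

# Rewrite both heap bounds to 26g, as described in step 2.
sed -e 's/-Xms[0-9]*g/-Xms26g/' -e 's/-Xmx[0-9]*g/-Xmx26g/' "$SCRIPT" > "$SCRIPT.new"

UPDATED_LINE="$(cat "$SCRIPT.new")"
echo "$UPDATED_LINE"

# Clean up the temporary files.
rm -f "$SCRIPT" "$SCRIPT.new"
```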

What Did You Expect to See?

Master node runs stably with 26G heap, no OOM, normal GC, even under moderate traffic.

What Did You See Instead?

The master node (32G container, 26G heap) throws:
java.lang.OutOfMemoryError: Java heap space
    at org.apache.rocketmq.broker.pop.PopBufferMergeService.merge(...)

GC logs show continuous Full GC that reclaims no memory (old gen full: 26309M->26309M).
There is no business/retry/revive log backlog, and the slave node works fine.
The OOM was fixed after tuning G1GC (G1HeapRegionSize=32m, InitiatingHeapOccupancyPercent=40).
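
One way to spot this pattern in a GC log is to look for Full GC lines whose before and after heap sizes are identical. The sketch below assumes each such line contains `Full GC` and a `<before>M-><after>M` token, as in the `26309M->26309M` figure above. The sample log lines are made up for illustration, and real logs may be formatted differently depending on the JVM version and logging flags.

```shell
#!/bin/sh
# Print Full GC lines where heap usage before and after is identical.
# Assumes each line carries a "<before>M-><after>M" token (format assumed).
find_stuck_full_gc() {
  grep 'Full GC' | awk '
    match($0, /[0-9]+M->[0-9]+M/) {
      split(substr($0, RSTART, RLENGTH), parts, "->")
      if (parts[1] == parts[2]) print
    }'
}

# Made-up sample lines, for illustration only.
SAMPLE_LOG='[Full GC (Allocation Failure) 26309M->26309M(26624M), 41.2 secs]
[Full GC (Allocation Failure) 26309M->12000M(26624M), 12.5 secs]
[GC pause (G1 Evacuation Pause) (young) 900M->400M(26624M), 0.05 secs]'

STUCK="$(printf '%s\n' "$SAMPLE_LOG" | find_stuck_full_gc)"
echo "$STUCK"
```

Against a real log file, the function would be fed from the file instead, for example `find_stuck_full_gc < gc.log`, where `gc.log` is a placeholder path.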

Additional Context

[Two attached images]
