[Questions] Quorum Queues disk alarms block consumption #15453
RabbitMQ version used: 4.2.3
Erlang version used: 28.3.x
Operating system (distribution) used: Oracle Linux
How is RabbitMQ deployed? RPM package
Steps to deploy RabbitMQ cluster: rpm install rabbitmq-server
Steps to reproduce the behavior in question: Publish some messages, use all disk space, try to consume messages

What problem are you trying to solve?

For quorum queues, when a disk alarm goes off, not only publishing but also consuming messages is blocked. This makes recovering from failures harder: you might have fixed the failed dependency that caused messages to accumulate, but once a disk alarm has gone off you cannot consume or ACK messages until the alarm clears (and if your data volume is now completely full, you are out of luck).

I thought of putting segment files and WAL files on different volumes, but it looks like this is not possible via configuration. The idea was that ACKs could still be written to the WAL even if the segments' volume is full, which could result in segment truncation and release space. I am not sure whether that assumption is correct at all.

Are there any recommendations for how to recover from disk alarms for QQs, possibly in an automated way?
Replies: 1 comment 1 reply
@kamilzzz you haven't provided any executable evidence to back your claim.

An alarm can trivially be simulated by using very low (for the memory watermark) or very high (for the free disk space limit) values with `rabbitmqctl set_vm_memory_high_watermark` and `rabbitmqctl set_disk_free_limit`.

Any connection on which RabbitMQ observes one of the publishing or content frames (specifically a `basic.publish` frame, or content header or content body frames) will be blocked, even if it is primarily or almost exclusively used by consumers. Connections that never publish won't be.

You have two options:

Both recommendations have been around for years.
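
To make the blocking rule described in this reply concrete, here is a toy model in Python. It is only a sketch of the rule as stated above, not RabbitMQ's implementation: the `Connection` class, the `observe` and `is_blocked` methods, and the frame label strings are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Frame kinds the reply names as publishing-related. The string labels are
# illustrative only and are not identifiers taken from RabbitMQ.
PUBLISHING_FRAMES = frozenset({"basic.publish", "content-header", "content-body"})


@dataclass
class Connection:
    """Hypothetical stand-in for a client connection as seen by the broker."""

    name: str
    seen_publishing_frame: bool = False

    def observe(self, frame: str) -> None:
        """Record a frame sent by the client on this connection."""
        if frame in PUBLISHING_FRAMES:
            self.seen_publishing_frame = True

    def is_blocked(self, alarm_in_effect: bool) -> bool:
        """Blocked only while an alarm is in effect and the connection has published."""
        return alarm_in_effect and self.seen_publishing_frame


# A connection that only publishes.
publisher = Connection("publisher")
publisher.observe("basic.publish")

# A connection that only consumes and acknowledges.
consumer = Connection("consumer-only")
consumer.observe("basic.consume")
consumer.observe("basic.ack")

# A connection used almost exclusively for consuming, with one publish.
mixed = Connection("mostly-consumer")
mixed.observe("basic.consume")
mixed.observe("basic.publish")

for conn in (publisher, consumer, mixed):
    print(conn.name, "blocked:", conn.is_blocked(alarm_in_effect=True))
```

In this model the consumer-only connection stays unblocked during an alarm, while the mostly-consuming connection is blocked because of its single publish. Whether a real application matches the first or the last case depends on how its connections are used, which is why a runnable reproduction helps when reporting behavior like this.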