This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at Red Hat Issue Tracker.
Bug 1866848 - [RFE] Limit concurrent Block Storage service backup/restore operations to control memory usage
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
Version: 16.2 (Train)
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Gorka Eguileor
QA Contact: Yosi Ben Shimon
Docs Contact: Ian Frangs
URL:
Whiteboard:
Depends On:
Blocks: 1866853
Reported: 2020-08-06 14:53 UTC by Gorka Eguileor
Modified: 2025-01-17 16:13 UTC
CC List: 14 users

Fixed In Version: openstack-cinder-18.2.1-0.20220526042308.5532645.el8ost
Doc Type: Enhancement
Doc Text:
With the new `backup_max_operations` parameter, you can now tune Block Storage service backups to operate more reliably within your hardware environment and usage patterns.
An unlimited number of concurrent backup and restore operations can lead to excessive memory consumption, which can cause the cinder-backup service to be killed and result in service disruptions.
You can adjust the value of `backup_max_operations` to prevent these service disruptions.
Clone Of: 1806975
Clones: 1866853
Environment:
Last Closed: 2025-01-17 16:13:06 UTC
Target Upstream Version:
Embargoed:
tshefi: automate_bug?




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 710297 0 None MERGED Backup: Limit number of concurent operations 2021-04-30 15:15:49 UTC
Red Hat Issue Tracker OSP-2203 0 None None None 2021-12-06 11:00:09 UTC
Red Hat Issue Tracker OSP-32526 0 None None None 2024-07-18 08:27:55 UTC
Red Hat Issue Tracker OSP-33486 0 None None None 2025-01-17 16:13:38 UTC
Red Hat Issue Tracker OSPRH-13131 0 None None None 2025-01-17 16:13:05 UTC
Red Hat Issue Tracker RHOSPDOC-1543 0 None None None 2023-11-20 11:25:01 UTC

Description Gorka Eguileor 2020-08-06 14:53:52 UTC
This feature makes it possible to limit the number of concurrent backup/restore operations run by each cinder-backup service, thereby limiting the maximum amount of memory the service uses.
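The mechanism described above can be sketched with a counting semaphore: each backup or restore must acquire a slot before doing its memory-heavy work, so at most `max_operations` run at once and the rest wait. This is a minimal, hypothetical illustration, not cinder's actual code. `OperationLimiter`, `run_concurrently` and `fake_restore` are invented names, the sketch assumes a single process using OS threads (a service with several worker processes would need a cross-process limit), and treating a limit of 0 as "unlimited" is an assumption about how a setting like `backup_max_operations` would be interpreted.

```python
import threading
import time
from contextlib import nullcontext


class OperationLimiter:
    """Hypothetical helper that caps concurrent memory-heavy operations.

    max_operations <= 0 is treated as "unlimited" (an assumption).
    """

    def __init__(self, max_operations: int) -> None:
        self._sem = (threading.BoundedSemaphore(max_operations)
                     if max_operations > 0 else None)

    def slot(self):
        # The semaphore blocks in __enter__ until a slot is free;
        # when unlimited, return a do-nothing context manager instead.
        return self._sem if self._sem is not None else nullcontext()


def run_concurrently(limiter: OperationLimiter, n_ops: int,
                     work_seconds: float = 0.05) -> int:
    """Start n_ops threads that each 'restore' under the limiter.

    Returns the peak number of operations observed running at once.
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def fake_restore() -> None:
        nonlocal active, peak
        with limiter.slot():
            with lock:
                active += 1
                peak = max(peak, active)
            time.sleep(work_seconds)  # stand-in for reading/decompressing chunks
            with lock:
                active -= 1

    threads = [threading.Thread(target=fake_restore) for _ in range(n_ops)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


if __name__ == "__main__":
    # With a limit of 2, the reported peak never exceeds 2.
    print(run_concurrently(OperationLimiter(2), n_ops=8))
```

Requests beyond the limit are not rejected; they simply block until a running operation finishes, which trades latency for a bounded memory footprint.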


+++ This bug was initially created as a clone of Bug #1806975 +++
+++ Original summary was: cinder backup restore: decompression uses lots of memory +++

Description of problem:

Unable to restore Cinder volumes created after a fast-forward upgrade (FFU) from OSP10 to OSP13.

We noticed that nova_api_wsgi and nova-conductor are currently the processes with high memory usage.

It seems that cinder-backup was using about 155 GiB of resident memory (anon-rss: 162185040 kB in the kernel log below) when it was OOM killed.

~~~
Feb 24 14:28:18 controller3 kernel: Out of memory: Kill process 2501135 (cinder-backup) score 797 or sacrifice child
Feb 24 14:28:18 controller3 kernel: Killed process 2501135 (cinder-backup), UID 0, total-vm:195150272kB, anon-rss:162185040kB, file-rss:536kB, shmem-rss:0kB
Feb 24 14:28:18 controller3 kernel: cinder-backup: page allocation failure: order:0, mode:0x280da
Feb 24 14:28:18 controller3 kernel: CPU: 13 PID: 2501135 Comm: cinder-backup Kdump: loaded Tainted: G               ------------ T 3.10.0-1062.12.1.el7.x86_64 #1
~~~
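The kernel reports these sizes in KiB, so anon-rss:162185040kB is roughly 155 GiB (about 166 GB in decimal units) of resident anonymous memory. A small helper that converts the sizes in such an OOM-killer line (hypothetical, not part of any existing tool):

```python
import re

KIB_PER_GIB = 1024 ** 2

OOM_LINE = ("Killed process 2501135 (cinder-backup), UID 0, total-vm:195150272kB, "
            "anon-rss:162185040kB, file-rss:536kB, shmem-rss:0kB")


def parse_oom_sizes(line: str) -> dict[str, float]:
    """Map each 'name:<n>kB' field of an OOM-killer line to GiB."""
    return {name: int(kib) / KIB_PER_GIB
            for name, kib in re.findall(r"([\w-]+):(\d+)kB", line)}


if __name__ == "__main__":
    for name, gib in parse_oom_sizes(OOM_LINE).items():
        print(f"{name}: {gib:.2f} GiB")
    # total-vm: 186.11 GiB
    # anon-rss: 154.67 GiB
    # file-rss: 0.00 GiB
    # shmem-rss: 0.00 GiB
```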

We also noticed high resource utilization by snmpd on the same controller.


Version-Release number of selected component (if applicable):

openstack-cinder-12.0.8-3.el7ost.noarch                     Fri Feb  7 12:53:05 2020
puppet-cinder-12.4.1-5.el7ost.noarch                        Fri Feb  7 12:52:15 2020
python2-cinderclient-3.5.0-1.el7ost.noarch                  Fri Feb  7 12:50:55 2020
python-cinder-12.0.8-3.el7ost.noarch                        Fri Feb  7 12:53:00 2020

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
2822235 root      20   0   76.5g  76.3g   3296 R 100.0 40.5 867:53.35 snmpd

# rpm -qf  /usr/sbin/snmpd
net-snmp-5.7.2-43.el7_7.3.x86_64

We tried downgrading net-snmp, but the results were the same.


How reproducible:


Steps to Reproduce:
1. Create a backup of an OpenStack volume that contains a large amount of data.
2. Try to restore multiple backups at the same time.
3. The cinder-backup service runs out of memory and is killed by the OOM killer.
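Why restoring several backups at once exhausts memory can be put as simple arithmetic: if each in-flight operation holds roughly a fixed amount of memory, peak usage grows linearly with the number of concurrent operations, and a concurrency cap turns that into a fixed ceiling. `worst_case_memory` is a hypothetical helper, and the baseline and per-operation figures are invented for illustration, not measurements from this report.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def worst_case_memory(baseline_bytes: int, per_operation_bytes: int,
                      requested_ops: int, max_operations: int) -> int:
    """Rough upper bound on a backup service's memory use.

    Hypothetical model: every in-flight backup/restore holds at most
    per_operation_bytes, and max_operations <= 0 means "no limit".
    """
    if max_operations <= 0:
        in_flight = requested_ops  # every request runs at once
    else:
        in_flight = min(requested_ops, max_operations)  # the rest wait
    return baseline_bytes + in_flight * per_operation_bytes


if __name__ == "__main__":
    # Illustrative numbers only: 200 MiB baseline, 1 GiB per operation, 20 restores.
    print(worst_case_memory(200 * MIB, 1 * GIB, 20, max_operations=0) / GIB)  # 20.1953125
    print(worst_case_memory(200 * MIB, 1 * GIB, 20, max_operations=4) / GIB)  # 4.1953125
```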

Actual results:

cinder-backup is killed by the OOM killer.

Expected results:

Multiple Cinder volume backups can be restored at the same time.

Currently we can restore a single volume, but not multiple volumes at the same time.

Comment 18 Brian Rosmaita 2023-08-04 13:21:34 UTC
@astillma I added the suggested doc text. There is more extensive documentation upstream that could be added somewhere if appropriate: https://review.opendev.org/c/openstack/cinder/+/710297/9/doc/source/admin/blockstorage-volume-backups.rst

