Bug 1866853 - [RFE] Add parameter to configure Cinder's max backup/restore operations
Keywords:
Status: ON_QA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.2 (Train)
Hardware: Unspecified
OS: Linux
Priority: high
Severity: medium
Target Milestone: z2
Target Release: 17.1
Assignee: Alan Bishop
QA Contact: David Rosenfeld
URL:
Whiteboard:
Depends On: 1866848
Blocks:
Reported: 2020-08-06 15:23 UTC by Gorka Eguileor
Modified: 2023-08-14 16:29 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-14.3.1-0.20220519001841.3965818.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of: 1866848
Environment:
Last Closed:
Target Upstream Version:
Embargoed:
ifrangs: needinfo-


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 783741 | 0 | None | MERGED | Add CinderBackupWorkers and CinderBackupMaxOperations | 2023-06-05 13:51:43 UTC
Red Hat Issue Tracker | OSP-1518 | 0 | None | None | None | 2021-12-14 15:21:39 UTC

Description Gorka Eguileor 2020-08-06 15:23:02 UTC
Add a THT parameter to configure the new Cinder "backup_max_operations" option in the "[DEFAULT]" section, which limits the number of concurrent backup/restore operations.

+++ This bug was initially created as a clone of Bug #1866848 +++

This feature limits the number of concurrent backup/restore operations run by each cinder-backup service, thereby capping the maximum amount of memory the service uses.
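
To illustrate why a cap on concurrent operations bounds memory use, here is a minimal Python sketch of the general technique (a counting semaphore around memory-heavy work). It is not taken from the Cinder source; `run_limited` and the fake restore callables are hypothetical names used only for this example.

```python
import threading
import time


def run_limited(operations, max_operations):
    """Run each callable in its own thread, allowing at most
    max_operations to execute at once. Returns the peak concurrency seen."""
    slots = threading.BoundedSemaphore(max_operations)
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def worker(op):
        # Block here until one of the max_operations slots is free.
        with slots:
            with lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            try:
                op()
            finally:
                with lock:
                    state["active"] -= 1

    threads = [threading.Thread(target=worker, args=(op,)) for op in operations]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


if __name__ == "__main__":
    # Ten stand-in "restores"; only three may hold memory at the same time.
    fake_restores = [lambda: time.sleep(0.05) for _ in range(10)]
    print("peak concurrent operations:", run_limited(fake_restores, 3))
```

If each operation needs roughly M bytes while it runs, the peak for the whole service is then about max_operations times M, rather than growing with the number of requests queued.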


+++ This bug was initially created as a clone of Bug #1806975 +++
+++ Original summary was: cinder backup restore: decompression uses lots of memory +++

Description of problem:

Unable to restore Cinder volumes created after an FFU upgrade from OSP10 to OSP13.

We noticed that nova_api_wsgi and nova-conductor are currently the processes using the most memory.

It seems that cinder-backup was consuming 162GB of RAM when it was OOM-killed.

~~~
Feb 24 14:28:18 controller3 kernel: Out of memory: Kill process 2501135 (cinder-backup) score 797 or sacrifice child
Feb 24 14:28:18 controller3 kernel: Killed process 2501135 (cinder-backup), UID 0, total-vm:195150272kB, anon-rss:162185040kB, file-rss:536kB, shmem-rss:0kB
Feb 24 14:28:18 controller3 kernel: cinder-backup: page allocation failure: order:0, mode:0x280da
Feb 24 14:28:18 controller3 kernel: CPU: 13 PID: 2501135 Comm: cinder-backup Kdump: loaded Tainted: G               ------------ T 3.10.0-1062.12.1.el7.x86_64 #1
~~~
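
For readers who want to pull the memory figures out of a kernel log like the one above, a small Python sketch follows. The `parse_oom_kill` helper is a hypothetical name for this example, and the regular expressions assume the "Killed process" line format shown in the excerpt; other kernel versions may format the line differently.

```python
import re

# The "Killed process" line from the log excerpt, reproduced verbatim.
OOM_LINE = (
    "Feb 24 14:28:18 controller3 kernel: Killed process 2501135 "
    "(cinder-backup), UID 0, total-vm:195150272kB, "
    "anon-rss:162185040kB, file-rss:536kB, shmem-rss:0kB"
)


def parse_oom_kill(line):
    """Return pid, process name, and the kB counters from an OOM kill line,
    or None if the line is not a 'Killed process' message."""
    match = re.search(r"Killed process (\d+) \(([^)]+)\)", line)
    if match is None:
        return None
    # Collect every "<name>:<number>kB" pair, e.g. anon-rss:162185040kB.
    counters = {key: int(val) for key, val in re.findall(r"([\w-]+):(\d+)kB", line)}
    return {"pid": int(match.group(1)), "name": match.group(2), **counters}


if __name__ == "__main__":
    info = parse_oom_kill(OOM_LINE)
    # Dividing the kB value by 10**6 reproduces the 162GB figure quoted above.
    print(info["name"], round(info["anon-rss"] / 10**6, 1), "GB anonymous RSS")
```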

We also noticed high resource utilization by snmpd on the same controller.


Version-Release number of selected component (if applicable):

openstack-cinder-12.0.8-3.el7ost.noarch                     Fri Feb  7 12:53:05 2020
puppet-cinder-12.4.1-5.el7ost.noarch                        Fri Feb  7 12:52:15 2020
python2-cinderclient-3.5.0-1.el7ost.noarch                  Fri Feb  7 12:50:55 2020
python-cinder-12.0.8-3.el7ost.noarch                        Fri Feb  7 12:53:00 2020

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
2822235 root      20   0   76.5g  76.3g   3296 R 100.0 40.5 867:53.35 snmpd

# rpm -qf  /usr/sbin/snmpd
net-snmp-5.7.2-43.el7_7.3.x86_64

Tried downgrading the net-snmp version but still got the same results.


How reproducible:


Steps to Reproduce:
1. Create a backup of an OpenStack volume that contains a large amount of data.
2. Try to restore multiple backups at the same time.
3. Observe that cinder-backup is OOM-killed.

Actual results:

cinder-backup is OOM-killed.

Expected results:

It should be possible to restore multiple Cinder volume backups at the same time.

At this moment we are able to restore single volumes, but not multiple volumes at the same time.

Comment 5 Tzach Shefi 2022-06-07 14:18:44 UTC
Verified on:
openstack-tripleo-heat-templates-14.3.1-0.20220513220827.4edb55d.el9ost.noarch

Deployed an overcloud with the default settings,
then updated the overcloud using this YAML:
(undercloud) [stack@undercloud-0 ~]$ cat virt/custombackup.yaml 
parameter_defaults:
    CinderBackupWorkers: 2   
    CinderBackupMaxOperations: 3


A diff between the original cinder.conf and the updated cinder.conf confirms that the changes were applied successfully:

[root@controller-2 ~]# diff cinder.conf.preupdate /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf
303c303
< backup_workers=1
---
> backup_workers=2
309c309
< backup_max_operations=15
---
> backup_max_operations=3
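
The same check could be scripted rather than done with diff. The following Python sketch reads both options with the standard-library configparser; `read_backup_limits` is a hypothetical helper name, and it assumes both options sit in the "[DEFAULT]" section (the description states this for backup_max_operations; for backup_workers it is an assumption here).

```python
import configparser

# Minimal stand-in for the relevant part of the updated cinder.conf.
SAMPLE_CONF = """\
[DEFAULT]
backup_workers=2
backup_max_operations=3
"""


def read_backup_limits(conf_text):
    """Parse INI-style text and return the two backup tuning options as ints."""
    # interpolation=None: treat '%' characters in values literally.
    # strict=False: tolerate duplicate keys instead of raising an error.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return {
        "backup_workers": parser.getint("DEFAULT", "backup_workers"),
        "backup_max_operations": parser.getint("DEFAULT", "backup_max_operations"),
    }


if __name__ == "__main__":
    print(read_backup_limits(SAMPLE_CONF))
```

To check a real deployment, the text of the generated cinder.conf shown in the diff command would be passed in place of SAMPLE_CONF.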

