Bug 1806975
Summary: | cinder backup restore: decompression uses lots of memory |
---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | rohit londhe <rlondhe> | |
Component: | openstack-cinder | Assignee: | Gorka Eguileor <geguileo> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Tzach Shefi <tshefi> | |
Severity: | urgent | Docs Contact: | Chuck Copello <ccopello> | |
Priority: | urgent | |||
Version: | 13.0 (Queens) | CC: | dhill, eharney, geguileo, jvisser, kiyyappa, ltoscano, pcaruana, pratik.bandarkar, senrique, tquinlan, tshefi | |
Target Milestone: | z13 | Keywords: | TestOnly, Triaged, ZStream | |
Target Release: | 13.0 (Queens) | Flags: | tshefi: automate_bug- |
Hardware: | Unspecified | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | openstack-cinder-9.1.4-53.el7ost | Doc Type: | Bug Fix | |
Doc Text: |
Before this update, the backup service could fail when several restores were performed concurrently because the system ran out of memory.
With this update, Python frees memory faster during backup restore operations: references to the data are dropped sooner, so the data can be garbage collected as soon as it has been decompressed rather than only when the restoration completes. This resolves the issue and allows the backup service to handle multiple restorations concurrently.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1810627 1810629 1866848 (view as bug list) | Environment: | ||
Last Closed: | 2021-04-01 13:27:52 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1810627, 1810629 |
Description
rohit londhe
2020-02-25 11:52:25 UTC
The amount of memory that cinder backup may require when restoring 41 compressed volumes simultaneously can be huge, and I believe it is even worse for RBD volumes (many customers use RBD for both volumes and backups, as that is more efficient).

Here is the explanation of the peak memory we need for 1 restore:

- First we use as much memory as the size of the chunk we have stored, which in our case is compressed. If we assume a 50% compression ratio, that is 50% of the original chunk size, which can be 1999994880 bytes. So here we use 0.93GB when the ChunkedBackupDriver reads the object [1]:

    with self._get_object_reader(
            container, object_name,
            extra_metadata=extra_metadata) as reader:
        body = reader.read()

- Then when we decompress the data [2] we need an additional 1.86GB, which is the full original chunk size:

    decompressed = decompressor.decompress(body)

- Then the ChunkedBackupDriver writes the data [3]:

    volume_file.write(decompressed)

  What happens behind that call, because this is an RBD volume, is that we use the os-brick connector, which uses the librbd Image object to do the writing [4]:

    def write(self, data):
        self._rbd_volume.image.write(data, self._offset)

  The write method in librbd calls the rbd_write2 method [5]:

    ret = rbd_write2(self.image, _offset, length, _data, _fadvise_flags)

  And that method calls create_write_raw with nullptr as the aio_completion parameter [6]:

    bl.push_back(create_write_raw(ictx, buf, len, nullptr));

  Because of this, create_write_raw copies the data into a different buffer [7]:

    if (ictx->disable_zero_copy || aio_completion == nullptr) {
      // must copy the buffer if writeback/writearound cache is in-use (or using
      // non-AIO)
      return buffer::copy(buf, len);

  so we end up using another 1.86GB of RAM.

So a single restore operation needs 0.93GB + 1.86GB + 1.86GB at its peak. If we run 41 simultaneous operations, we end up needing 190.65GB, more than the machine has, hence the OOM kill.

I see 2 improvements that can be done in the Cinder code:

- Help Python free memory faster by setting the body variable to None as soon as we have decompressed it, and setting the decompressed variable to None as soon as we have written it (a rough sketch of this pattern follows this comment).
- Introduce a maximum number of concurrent backup & restore operations and queue any operations that exceed it (also sketched below).

To mitigate the problem in this deployment they can do any of the following:

- Reduce the number of concurrent restore operations
- Disable compression
- Reduce the size of the chunks with the backup_file_size option

[1]: https://opendev.org/openstack/cinder/src/commit/a154a1360be62eed0e2bf20937503b55659f4701/cinder/backup/chunkeddriver.py#L712
[2]: https://opendev.org/openstack/cinder/src/commit/a154a1360be62eed0e2bf20937503b55659f4701/cinder/backup/chunkeddriver.py#L719
[3]: https://opendev.org/openstack/cinder/src/commit/a154a1360be62eed0e2bf20937503b55659f4701/cinder/backup/chunkeddriver.py#L720
[4]: https://opendev.org/openstack/os-brick/src/commit/49d5616f86d637c846d54cd48c5ed4e17bd6695e/os_brick/initiator/linuxrbd.py#L195
[5]: https://github.com/ceph/ceph/blob/53febd478dfc7282f0948853c117061d96cda9b1/src/pybind/rbd/rbd.pyx#L4321
[6]: https://github.com/ceph/ceph/blob/b2e825debc4d47cede8df86b96af94893241ddf7/src/librbd/librbd.cc#L5826
[7]: https://github.com/ceph/ceph/blob/b2e825debc4d47cede8df86b96af94893241ddf7/src/librbd/librbd.cc#L91-L94

According to our records, this should be resolved by openstack-cinder-12.0.10-2.el7ost. This build is available now.
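To make the first improvement concrete, here is a minimal illustrative sketch of the reference-release pattern. It is not the actual cinder patch: reader and volume_file are stand-ins for the object-store reader and the os-brick volume handle used by ChunkedBackupDriver, and zlib stands in for whichever decompressor created the backup.

    import zlib

    def restore_chunk(reader, volume_file):
        """Restore one compressed backup chunk, dropping references as early as
        possible so Python can reclaim each buffer before the next chunk."""
        body = reader.read()                  # compressed chunk, e.g. ~0.93GB at a 50% ratio
        decompressed = zlib.decompress(body)  # full chunk size, e.g. ~1.86GB
        del body                              # release the compressed buffer before writing
        volume_file.write(decompressed)       # librbd copies this buffer again internally
        del decompressed                      # release the decompressed buffer right after the write

Without the two del statements all three buffers stay referenced until the function returns, which is the ~4.65GB-per-restore peak described above; lowering backup_file_size shrinks that peak proportionally because each chunk is smaller.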
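The second improvement (capping concurrent operations) is tracked separately as an RFE. Purely as an illustration of the idea, and not what cinder actually implements, such a cap could be as simple as a bounded semaphore around the memory-heavy path:

    import threading

    # Illustrative only: the limit value and the mechanism are assumptions,
    # not the implementation of the cinder RFE.
    MAX_CONCURRENT_RESTORES = 4
    _restore_slots = threading.BoundedSemaphore(MAX_CONCURRENT_RESTORES)

    def run_restore(restore_fn, *args, **kwargs):
        # Block until a restore slot is free, then run the memory-heavy operation.
        with _restore_slots:
            return restore_fn(*args, **kwargs)

With such a cap, restores beyond the limit wait for a slot instead of all allocating their buffers at once, bounding peak memory to roughly the limit times the per-restore peak.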
This BZ is for the mitigation fix that speeds up the freeing of memory during backup restore operations. There is an additional feature that we are working on to limit the number of concurrent "memory heavy" operations, but that one will only be backported to OSP16. That RFE is being tracked by https://bugzilla.redhat.com/show_bug.cgi?id=1866848

*** Bug 1810629 has been marked as a duplicate of this bug. ***

Verified on: openstack-cinder-12.0.10-11.el7ost.noarch

Notice Eric's comment #17: the fixed-in landed on openstack-cinder-12.0.10-9.el7.

I deployed two separate OSP13 systems on two identical servers (CPU/RAM/disk/network):
Titan92 (openstack-cinder-12.0.10-2.el7ost.noarch)
Titan93 (openstack-cinder-12.0.10-11.el7ost.noarch)

On both systems c-vol was backed by Ceph and cinder-backup was NFS backed. I created a 3G volume filled with random data, uploaded the volume to Glance, and cloned the same image onto both systems. From that image I created 5 volumes on each system and backed up all 5 volumes on each system. I opened top on both systems to monitor cinder-backup's memory consumption, then restored all 5 backups on each system simultaneously.

The Titan92 (pre-fix) system consistently consumed more RAM than the Titan93 (post-fix) system; on average Titan92 consumed about 2.5-3 times more RAM than Titan93. I repeated the restore procedure twice and the RAM-consumption trend was re-confirmed.

Unfortunately my resources are not production grade, so I can't simulate 10+ large volumes in the 100G+ range, which explains why my pre-fix system didn't exhibit the reported OOM state. However, as mentioned above, the reduction in RAM consumption was clearly visible when comparing both systems. Good to verify.

TestOnly bug, shipped in 13z13, needs to be manually closed.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.