Bug 1806975
Summary: | cinder backup restore: decompression uses lots of memory |
---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | rohit londhe <rlondhe> | |
Component: | openstack-cinder | Assignee: | Gorka Eguileor <geguileo> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Tzach Shefi <tshefi> | |
Severity: | urgent | Docs Contact: | Chuck Copello <ccopello> | |
Priority: | urgent | |||
Version: | 13.0 (Queens) | CC: | dhill, eharney, geguileo, jvisser, kiyyappa, ltoscano, pcaruana, pratik.bandarkar, senrique, tquinlan, tshefi | |
Target Milestone: | z13 | Keywords: | TestOnly, Triaged, ZStream | |
Target Release: | 13.0 (Queens) | Flags: | tshefi: automate_bug- |
Hardware: | Unspecified | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | openstack-cinder-9.1.4-53.el7ost | Doc Type: | Bug Fix | |
Doc Text: |
Before this update, the backup service could fail when several restores were performed concurrently because the system ran out of memory.
With this update, Python frees memory faster during backup restore operations: references to the data are dropped sooner, so the data can be garbage collected as soon as it has been decompressed rather than only when the restoration completes. This resolves the issue and allows the backup service to handle multiple restorations concurrently.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1810627 1810629 1866848 (view as bug list) | Environment: | ||
Last Closed: | 2021-04-01 13:27:52 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1810627, 1810629 |
Description
rohit londhe
2020-02-25 11:52:25 UTC
The amount of memory that cinder backup may require when restoring 41 compressed volumes simultaneously can be huge, and I believe it is even worse for RBD volumes (many customers use RBD for both volumes and backups, as that is more efficient).

Here is the explanation of the peak memory we need for 1 restore:

- First we use as much memory as the size of the chunk we have stored, which in our case is compressed. If we assume a 50% compression ratio, that is 50% of the original chunk size, which can be 1999994880 bytes. So here we use 0.93GB when the ChunkedBackupDriver reads the object [1]:

    with self._get_object_reader(
            container, object_name,
            extra_metadata=extra_metadata) as reader:
        body = reader.read()

- Then when we decompress the data [2] we need an additional 1.86GB, which is the full original chunk size:

    decompressed = decompressor.decompress(body)

- Then the ChunkedBackupDriver writes the data [3]:

    volume_file.write(decompressed)

  What happens behind that call, because this is an RBD volume, is that we use the os-brick connector, which uses the librbd Image object to do the writing [4]:

    def write(self, data):
        self._rbd_volume.image.write(data, self._offset)

  The write method in librbd calls the rbd_write2 method [5]:

    ret = rbd_write2(self.image, _offset, length, _data, _fadvise_flags)

  And that method calls create_write_raw with nullptr as the aio_completion parameter [6]:

    bl.push_back(create_write_raw(ictx, buf, len, nullptr));

  Because of this, create_write_raw copies the data into a different buffer [7]:

    if (ictx->disable_zero_copy || aio_completion == nullptr) {
      // must copy the buffer if writeback/writearound cache is in-use (or using
      // non-AIO)
      return buffer::copy(buf, len);

  so we end up using another 1.86GB of RAM.

So a single restore operation needs 0.93GB + 1.86GB + 1.86GB at its peak. If we run 41 simultaneous operations, we end up needing 190.65GB, more than the machine has, hence the OOM kill.

I see 2 improvements that can be done in the Cinder code:

- Help Python free memory faster by setting the body variable to None as soon as we have decompressed it, and setting the decompressed variable to None as soon as we have written it (a rough sketch of this pattern follows this comment).
- Introduce a maximum number of concurrent backup & restore operations and queue any operations that exceed it (also sketched below).

To mitigate the problem in this deployment they can do any of the following:

- Reduce the number of concurrent restore operations
- Disable compression
- Reduce the size of the chunks with the backup_file_size option

[1]: https://opendev.org/openstack/cinder/src/commit/a154a1360be62eed0e2bf20937503b55659f4701/cinder/backup/chunkeddriver.py#L712
[2]: https://opendev.org/openstack/cinder/src/commit/a154a1360be62eed0e2bf20937503b55659f4701/cinder/backup/chunkeddriver.py#L719
[3]: https://opendev.org/openstack/cinder/src/commit/a154a1360be62eed0e2bf20937503b55659f4701/cinder/backup/chunkeddriver.py#L720
[4]: https://opendev.org/openstack/os-brick/src/commit/49d5616f86d637c846d54cd48c5ed4e17bd6695e/os_brick/initiator/linuxrbd.py#L195
[5]: https://github.com/ceph/ceph/blob/53febd478dfc7282f0948853c117061d96cda9b1/src/pybind/rbd/rbd.pyx#L4321
[6]: https://github.com/ceph/ceph/blob/b2e825debc4d47cede8df86b96af94893241ddf7/src/librbd/librbd.cc#L5826
[7]: https://github.com/ceph/ceph/blob/b2e825debc4d47cede8df86b96af94893241ddf7/src/librbd/librbd.cc#L91-L94

According to our records, this should be resolved by openstack-cinder-12.0.10-2.el7ost. This build is available now.
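To make the first improvement concrete, here is a minimal illustrative sketch of the reference-release pattern. It is not the actual cinder patch: reader and volume_file are stand-ins for the object-store reader and the os-brick volume handle used by ChunkedBackupDriver, and zlib stands in for whichever decompressor created the backup.

    import zlib

    def restore_chunk(reader, volume_file):
        """Restore one compressed backup chunk, dropping references as early as
        possible so Python can reclaim each buffer before the next chunk."""
        body = reader.read()                  # compressed chunk, e.g. ~0.93GB at a 50% ratio
        decompressed = zlib.decompress(body)  # full chunk size, e.g. ~1.86GB
        del body                              # release the compressed buffer before writing
        volume_file.write(decompressed)       # librbd copies this buffer again internally
        del decompressed                      # release the decompressed buffer right after the write

Without the two del statements all three buffers stay referenced until the function returns, which is the ~4.65GB-per-restore peak described above; lowering backup_file_size shrinks that peak proportionally because each chunk is smaller.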
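The second improvement (capping concurrent operations) is tracked separately as an RFE. Purely as an illustration of the idea, and not what cinder actually implements, such a cap could be as simple as a bounded semaphore around the memory-heavy path:

    import threading

    # Illustrative only: the limit value and the mechanism are assumptions,
    # not the implementation of the cinder RFE.
    MAX_CONCURRENT_RESTORES = 4
    _restore_slots = threading.BoundedSemaphore(MAX_CONCURRENT_RESTORES)

    def run_restore(restore_fn, *args, **kwargs):
        # Block until a restore slot is free, then run the memory-heavy operation.
        with _restore_slots:
            return restore_fn(*args, **kwargs)

With such a cap, restores beyond the limit wait for a slot instead of all allocating their buffers at once, bounding peak memory to roughly the limit times the per-restore peak.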
This BZ is for the mitigation fix that speeds up the freeing of memory during backup restore operations. There is an additional feature that we are working on to limit the number of concurrent "memory heavy" operations, but that one will only be backported to OSP16. That RFE is being tracked by https://bugzilla.redhat.com/show_bug.cgi?id=1866848

*** Bug 1810629 has been marked as a duplicate of this bug. ***

Verified on: openstack-cinder-12.0.10-11.el7ost.noarch

Notice Eric's comment #17: the fixed-in landed on openstack-cinder-12.0.10-9.el7.

I deployed two separate OSP13 systems on two identical servers (CPU/RAM/disk/network):
Titan92 (openstack-cinder-12.0.10-2.el7ost.noarch)
Titan93 (openstack-cinder-12.0.10-11.el7ost.noarch)

On both systems c-vol was backed by Ceph and cinder-backup was NFS backed. I created a 3G volume filled with random data, uploaded the volume to Glance, and cloned the same image onto both systems. From that image I created 5 volumes on each system and backed up all 5 volumes on each system. I opened top on both systems to monitor cinder-backup's memory consumption, then restored all 5 backups on each system simultaneously.

The Titan92 (pre-fix) system consistently consumed more RAM than the Titan93 (post-fix) system; on average Titan92 consumed about 2.5-3 times more RAM than Titan93. I repeated the restore procedure twice and the RAM-consumption trend was re-confirmed.

Unfortunately my resources are not production grade, so I can't simulate 10+ large volumes in the 100G+ range, which explains why my pre-fix system didn't exhibit the reported OOM state. However, as mentioned above, the reduction in RAM consumption was clearly visible when comparing both systems. Good to verify.

TestOnly bug, shipped in 13z13, needs to be manually closed.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.