Bug 1504661 - Problem creating several cinder backups at the same time
Summary: Problem creating several cinder backups at the same time
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z7
Target Release: 10.0 (Newton)
Assignee: Gorka Eguileor
QA Contact: Tzach Shefi
URL:
Whiteboard:
Depends On: 1504670 1504671
Blocks: 1464146 1542607
 
Reported: 2017-10-20 11:12 UTC by Gorka Eguileor
Modified: 2022-08-16 11:49 UTC
CC List: 18 users

Fixed In Version: openstack-cinder-9.1.4-23.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, certain method calls for backup/restore operations would block the eventlet's thread switching. Consequently, operations were slower and connection errors were observed in the database and RabbitMQ logs. With this update, proxy blocking method calls were changed into native threads to prevent blocking. As a result, restore/backup operations are faster and the connection issues are resolved.
Clone Of: 1464146
Clones: 1504670 1542607
Environment:
Last Closed: 2018-02-27 16:39:47 UTC
Target Upstream Version:
Embargoed:
tshefi: automate_bug+


Attachments


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1719580 0 None None None 2017-10-20 11:12:31 UTC
OpenStack gerrit 507510 0 None MERGED Run backup compression on native thread 2019-11-13 02:31:20 UTC
OpenStack gerrit 518316 0 None MERGED Run backup-restore operations on native thread 2019-11-13 02:31:19 UTC
Red Hat Issue Tracker OSP-4731 0 None None None 2022-08-16 11:49:31 UTC
Red Hat Product Errata RHBA-2018:0360 0 normal SHIPPED_LIVE openstack-cinder bug fix advisory 2018-02-27 21:35:04 UTC

Description Gorka Eguileor 2017-10-20 11:12:31 UTC
+++ This bug was initially created as a clone of Bug #1464146 +++

>> Description of problem:

Creating a cinder backup manually works fine:

cinder --os-tenant-name sandbox create --display_name volume_kris 1
cinder --os-tenant-name sandbox backup-create --display-name back_kris --force 5a06927c-1892-45b8-b8a3-3981dafea875


When we run the scripts below, we hit problems:

Creating 10 volumes:

#!/bin/sh
for var in {0..9}
do
  cinder --os-tenant-name sandbox create --display_name volume_kris_$var 1
done

Creating 10 backups of the volumes:
#!/bin/sh
i=0
for var in $(cinder --os-tenant-name sandbox list | grep volume_kris_ | awk '{print $2}')
do
  cinder --os-tenant-name sandbox backup-create --display-name back_kris_$i --force $var
  i=$((i+1))
done


>> Version-Release number of selected component (if applicable):
OpenStack 9
openstack-cinder-8.1.1-4.el7ost.noarch                      Sat Mar 18 03:48:42 2017
python-cinder-8.1.1-4.el7ost.noarch                         Sat Mar 18 03:42:38 2017
python-cinderclient-1.6.0-1.el7ost.noarch                   Sat Mar 18 03:37:14 2017


>> How reproducible:
Re-run the scripts above.
Note: not every cinder backup creation fails.


>> Actual results:
A few cinder backups are not created and end up in the "error" or "creating" state.

Expected results:
After the scripts complete, all the cinder backups are created.

Additional info:
After we modified the HAProxy timeouts for the MySQL listener as shown below, we got better results.
  listen mysql
  timeout client 180m
  timeout server 180m

This seems to be caused by Cinder's data compression (a CPU-intensive operation) during backups running directly in the greenthread, which prevents switching to other greenthreads.

With enough greenthreads doing compression, they end up running mostly serially and prevent other greenthreads from running.

The solution would be to run the compression on a native thread so it does not interfere with greenthread switching, as sketched below.
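
A minimal sketch of that approach (illustrative only, not the merged cinder change), assuming eventlet's tpool helpers, which run calls on native OS threads so the hub can keep scheduling the other greenthreads:

# Illustrative sketch only -- not the merged cinder patch. Assumes
# eventlet's tpool module (tpool.execute / tpool.Proxy), which runs
# calls on native OS threads so the eventlet hub keeps scheduling
# other greenthreads (DB heartbeats, RabbitMQ connections, RPC).
import zlib

from eventlet import tpool

compressor = zlib.compressobj()
chunk = b"\x00" * (8 * 1024 * 1024)  # stand-in for one backup chunk

# Run a single CPU-bound call on a native thread: compress() never
# yields to the hub, so executing it via tpool avoids starving the
# other greenthreads.
compressed = tpool.execute(compressor.compress, chunk)
compressed += tpool.execute(compressor.flush)

# The same idea for whole objects: tpool.Proxy wraps an object so every
# blocking method call is proxied to a native thread, matching the
# "proxy blocking method calls" wording in the doc text above.
decompressor = tpool.Proxy(zlib.decompressobj())
restored = decompressor.decompress(compressed)
assert restored == chunk

The actual upstream changes are linked above in the gerrit entries ("Run backup compression on native thread" and "Run backup-restore operations on native thread").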

--- Additional comment from Gorka Eguileor on 2017-09-26 08:39:45 EDT ---

Seems to be the same issue as in bz #1403948

Comment 11 Tzach Shefi 2018-02-08 15:32:16 UTC
Verified on:
openstack-cinder-9.1.4-24.el7ost.noarch

Ran Gorka's scripts to generate 10 volumes (NFS-backed in my case), then backed them up to the Swift backup backend.
All volumes and backups are available.


[stack@undercloud-0 ~]$ cinder backup-list
+--------------------------------------+--------------------------------------+-----------+-------------+------+--------------+---------------+                                                                                                                                
| ID                                   | Volume ID                            | Status    | Name        | Size | Object Count | Container     |                                                                                                                                
+--------------------------------------+--------------------------------------+-----------+-------------+------+--------------+---------------+                                                                                                                                
| 2181a5a8-0fef-4448-9079-77f15e65b81b | 3f7fb4e6-c007-4911-bd42-091c796fc74c | available | back_kris_2 | 1    | 22           | volumebackups |                                                                                                                                
| 4165e80d-39b1-482d-a2d2-bc02e4fa8801 | f6ac45bc-5d82-4102-a1a9-5cae0be1678a | available | back_kris_8 | 1    | 22           | volumebackups |
| 51958304-2472-4a38-80f7-c007e73f4cb2 | 88957b5d-d452-46d3-8a25-32d90fa3b36b | available | back_kris_5 | 1    | 22           | volumebackups |
| 5b561ae2-97d2-4c17-87f1-436f33c142f8 | 154e2445-e2a7-479b-805d-bd6e0a2cf3bc | available | back_kris_0 | 1    | 22           | volumebackups |
| 857a0aa6-1d59-4111-b204-5bcfc1dd50c2 | d32d58de-f324-4b1d-88bd-98805802d10a | available | back_kris_7 | 1    | 22           | volumebackups |
| 9127a127-2504-4db2-93ac-3726964bab25 | c6e3e0d0-41d6-4d01-9329-30010aebcd55 | available | back_kris_6 | 1    | 22           | volumebackups |
| b18ffd27-31ca-4c09-9e01-f49b382bea5c | 7a5e3cc2-420b-4f64-ae43-2fd58018e622 | available | back_kris_4 | 1    | 22           | volumebackups |
| bfe72142-f51f-4c56-bd74-b5e0d3b15fad | 753d375e-2bd6-4747-9a6a-d95d686e413c | available | back_kris_3 | 1    | 22           | volumebackups |
| c3e903e5-4d20-4538-a0bf-116a160dd6c0 | 3d5afdfd-e45a-4d6a-b64d-c7421a90e98f | available | back_kris_1 | 1    | 22           | volumebackups |
| cabaca92-1b60-43f6-93ca-7c31a5cbf36c | 03d10afe-667c-466c-8dd2-7cdfc0ddae13 | available | -           | 1    | 22           | volumebackups |
+--------------------------------------+--------------------------------------+-----------+-------------+------+--------------+---------------+


Reran the same scripts; this time I filled 1 volume with random data,
created an image from that filled volume, and created 10 new volumes from that image.
Again all backups were created successfully, though as expected they took a bit longer to complete.

Comment 13 errata-xmlrpc 2018-02-27 16:39:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0360

