Bug 1504661

Summary: Problem in creating several cinder backups at same time
Product: Red Hat OpenStack
Reporter: Gorka Eguileor <geguileo>
Component: openstack-cinder
Assignee: Gorka Eguileor <geguileo>
Status: CLOSED ERRATA
QA Contact: Tzach Shefi <tshefi>
Severity: medium
Docs Contact:
Priority: medium
Version: 10.0 (Newton)
CC: aavraham, cpaquin, cschwede, cswanson, dciabrin, dhill, dvd, ebeaudoi, geguileo, jthomas, lkuchlan, marjones, mbayer, mburns, pgrist, scohen, srevivo, tshefi
Target Milestone: z7
Keywords: Triaged, ZStream
Target Release: 10.0 (Newton)
Flags: tshefi: automate_bug+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-cinder-9.1.4-23.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, certain method calls in backup and restore operations blocked eventlet's thread switching. Consequently, operations were slower and connection errors appeared in the database and RabbitMQ logs. With this update, the blocking method calls are proxied to native threads so that they no longer block. As a result, backup and restore operations are faster and the connection issues are resolved.
Story Points: ---
Clone Of: 1464146
: 1504670 1542607
Environment:
Last Closed: 2018-02-27 16:39:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1504670, 1504671
Bug Blocks: 1464146, 1542607

Description Gorka Eguileor 2017-10-20 11:12:31 UTC
+++ This bug was initially created as a clone of Bug #1464146 +++

>> Description of problem:

Creating a Cinder backup manually works without problems:

cinder --os-tenant-name sandbox create --display_name volume_kris 1
cinder --os-tenant-name sandbox backup-create --display-name back_kris --force 5a06927c-1892-45b8-b8a3-3981dafea875


Running the scripts below causes problems:

Creating 10 volumes:

#!/bin/bash
# Brace expansion ({0..9}) is a bash feature, hence the bash shebang.
for var in {0..9}
do
  cinder --os-tenant-name sandbox create --display_name "volume_kris_$var" 1
done

Creating 10 backups, one per volume:

#!/bin/bash
# Back up every volume_kris_* volume; backups are named back_kris_0, back_kris_1, ...
i=0
for var in $(cinder --os-tenant-name sandbox list | grep volume_kris_ | awk '{print $2}')
do
  cinder --os-tenant-name sandbox backup-create --display-name "back_kris_$i" --force "$var"
  i=$((i+1))
done


>> Version-Release number of selected component (if applicable):
OpenStack 9
openstack-cinder-8.1.1-4.el7ost.noarch                      Sat Mar 18 03:48:42 2017
python-cinder-8.1.1-4.el7ost.noarch                         Sat Mar 18 03:42:38 2017
python-cinderclient-1.6.0-1.el7ost.noarch                   Sat Mar 18 03:37:14 2017


>> How reproducible:
Rerun the scripts above.
Note: Not every backup creation fails.


>> Actual results:
A few Cinder backups are not created; they end up in the "error" or "creating" state.

>> Expected results:
After the scripts finish, all Cinder backups are created.

>> Additional info:
After we modified the timeouts as below, we got better results:
  listen mysql
  timeout client 180m
  timeout server 180m

This seems to be caused by Cinder's data compression (a CPU-intensive operation) during backups being done directly in the greenthread, which prevents switching to other greenthreads while it runs.

Given enough greenthreads doing compression, they end up running mostly serially and prevent other greenthreads from running.

The solution would be to run the compression in a native thread so that it does not interfere with greenthread switching.
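
The starvation described above can be illustrated outside Cinder. The sketch below is not Cinder's code and not the actual fix: it uses the standard library's asyncio as a stand-in for eventlet's cooperative scheduler (an assumption made so the example runs without eventlet), and the names heartbeat, compress_inline, compress_offloaded and measure_worst_gap are hypothetical. It measures the longest pause seen by a periodic "heartbeat" task while compression runs inline versus in a native thread. With eventlet itself, the corresponding offload call would be eventlet.tpool.execute.

```python
import asyncio
import os
import time
import zlib

# Incompressible payload so zlib has real CPU work to do (size is arbitrary).
PAYLOAD = os.urandom(4 * 1024 * 1024)


async def heartbeat(gaps, stop):
    """Cooperative task standing in for DB/RabbitMQ keepalive traffic."""
    last = time.monotonic()
    while not stop.is_set():
        await asyncio.sleep(0.01)
        now = time.monotonic()
        gaps.append(now - last)  # time since this task last got to run
        last = now


async def compress_inline(data):
    # CPU-bound call made directly in the cooperative task: no other task
    # is scheduled until it returns.
    return zlib.compress(data, 9)


async def compress_offloaded(data):
    # Same call run in a native thread from the default executor. CPython's
    # zlib releases the GIL while compressing, so the loop keeps running.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, zlib.compress, data, 9)


async def measure_worst_gap(compress, jobs=4):
    """Run `jobs` compressions concurrently; return the longest heartbeat gap."""
    gaps, stop = [], asyncio.Event()
    beat = asyncio.create_task(heartbeat(gaps, stop))
    await asyncio.sleep(0.05)  # let the heartbeat tick a few times first
    await asyncio.gather(*(compress(PAYLOAD) for _ in range(jobs)))
    stop.set()
    await beat
    return max(gaps)


if __name__ == "__main__":
    inline = asyncio.run(measure_worst_gap(compress_inline))
    offloaded = asyncio.run(measure_worst_gap(compress_offloaded))
    print(f"worst heartbeat gap, inline:    {inline:.3f}s")
    print(f"worst heartbeat gap, offloaded: {offloaded:.3f}s")
```

In the inline case the heartbeat is expected to stall for roughly the combined duration of all compressions, which is the kind of pause that can let database or RabbitMQ connections time out. Exact numbers depend on the machine.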

--- Additional comment from Gorka Eguileor on 2017-09-26 08:39:45 EDT ---

Seems to be the same issue as in bz #1403948

Comment 11 Tzach Shefi 2018-02-08 15:32:16 UTC
Verified on:
openstack-cinder-9.1.4-24.el7ost.noarch

Ran Gorka's scripts to create 10 volumes (NFS-backed in my case) and then back them up to the Swift Cinder backup backend.
All volumes and backups are available.


[stack@undercloud-0 ~]$ cinder backup-list
+--------------------------------------+--------------------------------------+-----------+-------------+------+--------------+---------------+
| ID                                   | Volume ID                            | Status    | Name        | Size | Object Count | Container     |
+--------------------------------------+--------------------------------------+-----------+-------------+------+--------------+---------------+
| 2181a5a8-0fef-4448-9079-77f15e65b81b | 3f7fb4e6-c007-4911-bd42-091c796fc74c | available | back_kris_2 | 1    | 22           | volumebackups |
| 4165e80d-39b1-482d-a2d2-bc02e4fa8801 | f6ac45bc-5d82-4102-a1a9-5cae0be1678a | available | back_kris_8 | 1    | 22           | volumebackups |
| 51958304-2472-4a38-80f7-c007e73f4cb2 | 88957b5d-d452-46d3-8a25-32d90fa3b36b | available | back_kris_5 | 1    | 22           | volumebackups |
| 5b561ae2-97d2-4c17-87f1-436f33c142f8 | 154e2445-e2a7-479b-805d-bd6e0a2cf3bc | available | back_kris_0 | 1    | 22           | volumebackups |
| 857a0aa6-1d59-4111-b204-5bcfc1dd50c2 | d32d58de-f324-4b1d-88bd-98805802d10a | available | back_kris_7 | 1    | 22           | volumebackups |
| 9127a127-2504-4db2-93ac-3726964bab25 | c6e3e0d0-41d6-4d01-9329-30010aebcd55 | available | back_kris_6 | 1    | 22           | volumebackups |
| b18ffd27-31ca-4c09-9e01-f49b382bea5c | 7a5e3cc2-420b-4f64-ae43-2fd58018e622 | available | back_kris_4 | 1    | 22           | volumebackups |
| bfe72142-f51f-4c56-bd74-b5e0d3b15fad | 753d375e-2bd6-4747-9a6a-d95d686e413c | available | back_kris_3 | 1    | 22           | volumebackups |
| c3e903e5-4d20-4538-a0bf-116a160dd6c0 | 3d5afdfd-e45a-4d6a-b64d-c7421a90e98f | available | back_kris_1 | 1    | 22           | volumebackups |
| cabaca92-1b60-43f6-93ca-7c31a5cbf36c | 03d10afe-667c-466c-8dd2-7cdfc0ddae13 | available | -           | 1    | 22           | volumebackups |
+--------------------------------------+--------------------------------------+-----------+-------------+------+--------------+---------------+
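
A check like the one above can be scripted rather than read by eye. The following is a minimal sketch and was not part of the original verification: tally_backup_statuses is a hypothetical helper that parses the ASCII table printed by cinder backup-list (assuming the column layout shown above, with a "Status" column) and counts backups per status. The SAMPLE table uses made-up placeholder IDs.

```python
from collections import Counter


def tally_backup_statuses(table_text):
    """Count backups per status in `cinder backup-list` table output."""
    header = None
    counts = Counter()
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and anything that is not a row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first row holds the column names
            continue
        row = dict(zip(header, cells))
        counts[row["Status"]] += 1
    return counts


# Placeholder data for illustration only; not taken from a real deployment.
SAMPLE = """\
+------+-----------+-----------+-------------+
| ID   | Volume ID | Status    | Name        |
+------+-----------+-----------+-------------+
| aaaa | 1111      | available | back_kris_0 |
| bbbb | 2222      | error     | back_kris_1 |
| cccc | 3333      | creating  | back_kris_2 |
| dddd | 4444      | available | back_kris_3 |
+------+-----------+-----------+-------------+
"""

if __name__ == "__main__":
    print(dict(tally_backup_statuses(SAMPLE)))
```

Against a live deployment, the table text could be obtained with subprocess.run(["cinder", "backup-list"], capture_output=True, text=True).stdout; any count under "error" or "creating" after the backups should have finished would indicate the failure described in this bug.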


Reran the same scripts, this time after filling 1 volume with random data,
creating an image from that filled volume, and creating 10 new volumes from that image.
Again all backups were created successfully, though as expected they took a bit longer to complete.

Comment 13 errata-xmlrpc 2018-02-27 16:39:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0360