Bug 1811587

Summary: fail to create volume from existed volume when nums are larger than rbd_max_clone_depth
Product: Red Hat OpenStack
Reporter: Alex Stupnikov <astupnik>
Component: openstack-cinder
Assignee: Jon Bernard <jobernar>
Status: CLOSED ERRATA
QA Contact: Tzach Shefi <tshefi>
Severity: high
Docs Contact: Chuck Copello <ccopello>
Priority: high
Version: 13.0 (Queens)
CC: abishop, dmaley, eharney, gfidente, jobernar, jvisser, sbaldwin
Target Milestone: z12
Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: All
OS: All
Fixed In Version: openstack-cinder-12.0.10-4.el7ost
Doc Type: If docs needed, set a value
Last Closed: 2020-06-24 11:51:47 UTC
Type: Bug

Description Alex Stupnikov 2020-03-09 10:01:16 UTC
Description of problem:

A customer reported a case in which cinder failed to flatten an RBD volume, causing a segfault in librbd [1], [2]. I assumed this was a bug already known to the Ceph developers, but Steve Baldwin told me he is not aware of any related Ceph bugs. Steve did, however, find an upstream cinder bug: https://bugs.launchpad.net/cinder/+bug/1794956

Its upstream fix, https://review.opendev.org/#/c/606038/, was blocked by a core developer who thinks the issue should be addressed in the RBD library. I believe the best way to move this issue forward is to ask the cinder developers to double-check everything and to report a Ceph bug if that is the best way to address the issue.

This is a production, customer-facing environment, so our highest priority is a workaround.
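The summary and the log in [1] both revolve around a "clone depth": how many copy-on-write parents sit above a volume before a flatten is forced. The sketch below is a hypothetical in-memory model (the `Image` class, `clone_depth`, and `flatten` are illustrative names, not cinder or librbd APIs) that counts depth by walking parent links. The exact counting rule cinder applies (for example, whether a Glance image counts as a level) is not confirmed here; this model simply counts parent links until it reaches an image with no parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Image:
    """Hypothetical stand-in for an RBD image; parent is None for a non-clone."""
    name: str
    parent: Optional["Image"] = None


def clone_depth(image: Image) -> int:
    """Count parent links from `image` up to the first image without a parent.

    Assumption: an image that was never cloned (or has been flattened) has depth 0.
    """
    depth = 0
    current = image
    while current.parent is not None:
        depth += 1
        current = current.parent
    return depth


def flatten(image: Image) -> None:
    """Model of a flatten: the image no longer depends on its parent."""
    image.parent = None


if __name__ == "__main__":
    base = Image("base")
    a = Image("a", parent=base)
    b = Image("b", parent=a)
    print(clone_depth(b))  # 2
    flatten(b)
    print(clone_depth(b))  # 0
```

Flattening resets the depth of the flattened image only; its former ancestors are unaffected.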


[1]
2020-03-04 11:32:42.401 34 INFO cinder.volume.drivers.rbd [req-5c33fbbe-6b74-41f1-9bad-34513abc14bb ccc1366cf96a4eea841b6abc76eff8cb a66b85c4cb754fb685e5b2ae2d122422 - default 5dc670a1c5504b2585471a079d480742] maximum clone depth (5) has been reached - flattening dest volume
2020-03-04 11:32:55.260 8 INFO oslo_service.service [req-37eb96f7-efee-40e2-b623-6c6d121f921d - - - - -] Child 34 killed by signal 11
2020-03-04 11:32:55.266 4156 INFO cinder.service [-] Starting cinder-volume node (version 12.0.8)
2020-03-04 11:32:55.277 4156 INFO cinder.volume.manager [req-eb04731f-389d-4795-9f9e-1bf79ce5c5d8 - - - - -] Starting volume driver RBDDriver (1.2.0)
2020-03-04 11:32:55.525 4156 INFO cinder.keymgr.migration [req-3b7439fa-c371-4480-b7e2-9b4e229286db - - - - -] Not migrating encryption keys because the ConfKeyManager is still in use.
2020-03-04 11:32:55.532 4156 INFO cinder.volume.manager [req-eb04731f-389d-4795-9f9e-1bf79ce5c5d8 - - - - -] Driver initialization completed successfully.
2020-03-04 11:32:55.535 4156 INFO cinder.manager [req-eb04731f-389d-4795-9f9e-1bf79ce5c5d8 - - - - -] Initiating service 17 cleanup
2020-03-04 11:32:55.538 4156 INFO cinder.manager [req-eb04731f-389d-4795-9f9e-1bf79ce5c5d8 - - - - -] Service 17 cleanup completed.
2020-03-04 11:32:55.612 4156 INFO cinder.volume.manager [req-eb04731f-389d-4795-9f9e-1bf79ce5c5d8 - - - - -] Initializing RPC dependent components of volume driver RBDDriver (1.2.0)
2020-03-04 11:32:55.677 4156 INFO cinder.volume.manager [req-eb04731f-389d-4795-9f9e-1bf79ce5c5d8 - - - - -] Driver post RPC initialization completed successfully.

2020-03-04 13:32:03.394 34 INFO cinder.volume.drivers.rbd [req-e9d7b018-5506-45cf-84c7-5df5985a5b39 ccc1366cf96a4eea841b6abc76eff8cb a66b85c4cb754fb685e5b2ae2d122422 - default 5dc670a1c5504b2585471a079d480742] maximum clone depth (5) has been reached - flattening dest volume
2020-03-04 13:32:16.284 8 INFO oslo_service.service [req-cda6a667-b106-47cb-bbfc-12e972783356 - - - - -] Child 34 killed by signal 11
2020-03-04 13:32:16.290 23925 INFO cinder.service [-] Starting cinder-volume node (version 12.0.8)
2020-03-04 13:32:16.301 23925 INFO cinder.volume.manager [req-32fcd257-9ce0-443d-921e-5221d9672012 - - - - -] Starting volume driver RBDDriver (1.2.0)
2020-03-04 13:32:16.544 23925 INFO cinder.keymgr.migration [req-1cff1e13-bea0-4835-bba5-f6c3e5aaedf9 - - - - -] Not migrating encryption keys because the ConfKeyManager is still in use.
2020-03-04 13:32:16.552 23925 INFO cinder.volume.manager [req-32fcd257-9ce0-443d-921e-5221d9672012 - - - - -] Driver initialization completed successfully.
2020-03-04 13:32:16.556 23925 INFO cinder.manager [req-32fcd257-9ce0-443d-921e-5221d9672012 - - - - -] Initiating service 17 cleanup
2020-03-04 13:32:16.559 23925 INFO cinder.manager [req-32fcd257-9ce0-443d-921e-5221d9672012 - - - - -] Service 17 cleanup completed.
2020-03-04 13:32:16.629 23925 INFO cinder.volume.manager [req-32fcd257-9ce0-443d-921e-5221d9672012 - - - - -] Initializing RPC dependent components of volume driver RBDDriver (1.2.0)
2020-03-04 13:32:16.695 23925 INFO cinder.volume.manager [req-32fcd257-9ce0-443d-921e-5221d9672012 - - - - -] Driver post RPC initialization completed successfully.
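Both incidents in this log follow the same pattern: worker process 34 logs "maximum clone depth (5) has been reached - flattening dest volume", and about 13 seconds later the parent process (8) reports "Child 34 killed by signal 11", i.e. SIGSEGV. The sketch below is a hypothetical triage helper (not a cinder tool) that pairs those two messages by PID; its regular expressions assume exactly the line layout shown above. Note that the PIDs here (34) differ from those in the kernel messages in [2] (664544, 1030845), presumably because cinder-volume runs in a container with its own PID namespace, so the two sources can only be matched by timestamp.

```python
import re
import signal
from datetime import datetime
from typing import Dict, List, Tuple

# "<date> <time> <pid> <LEVEL> <logger> <message>", as in the log above.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (?P<pid>\d+) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+) (?P<msg>.*)$"
)
FLATTEN_RE = re.compile(r"maximum clone depth \((\d+)\) has been reached")
KILLED_RE = re.compile(r"Child (\d+) killed by signal (\d+)")


def correlate(lines: List[str]) -> List[Tuple[int, float, str]]:
    """Return (child_pid, seconds_from_flatten_to_death, signal_name) tuples."""
    flatten_at: Dict[int, datetime] = {}  # pid -> time of its last flatten message
    results = []
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip wrapped or foreign lines
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        if FLATTEN_RE.search(m["msg"]):
            flatten_at[int(m["pid"])] = ts
            continue
        k = KILLED_RE.search(m["msg"])
        if k and int(k.group(1)) in flatten_at:
            child = int(k.group(1))
            started = flatten_at.pop(child)
            sig_name = signal.Signals(int(k.group(2))).name
            results.append((child, (ts - started).total_seconds(), sig_name))
    return results
```

For the first incident above this yields child 34, SIGSEGV, roughly 12.9 seconds after the flatten started.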

[2]
/var/log/messages:
  Mar  4 11:32:54 njss1-ospdq-inf-ctrl0 kernel: cinder-volume[664544]: segfault at 1a0 ip 00007f922a1858e5 sp 00007ffffafc42f0 error 4 in librbd.so.1.12.0[7f922a0ba000+30f000]
  Mar  4 13:32:15 njss1-ospdq-inf-ctrl0 kernel: cinder-volume[1030845]: segfault at d98 ip 00007f21d3f268e0 sp 00007fff2c68f620 error 4 in librbd.so.1.12.0[7f21d3e5b000+30f000]
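The kernel lines above can be decoded a bit further. Subtracting the mapping start from the instruction pointer gives offsets 0xcb8e5 and 0xcb8e0 into librbd.so.1.12.0, so both crashes land five bytes apart in the same region of the library. On x86, "error 4" is a page-fault error code meaning a user-mode read of a not-present page; together with the small fault addresses (0x1a0, 0xd98) this is consistent with dereferencing a NULL or near-NULL pointer plus a field offset. The helper names below are hypothetical, and mapping the offset to a symbol (e.g. with `addr2line -f -e` against matching debuginfo) additionally assumes the reported mapping starts at file offset 0.

```python
import re
from typing import Dict, List

# Layout of the kernel's user-space segfault report, as in the lines above.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>\S+)\[(?P<start>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_x86_pf_error(code: int) -> List[str]:
    """Decode the low bits of an x86 page-fault error code."""
    flags = [
        "protection violation" if code & 0x1 else "page not present",
        "write access" if code & 0x2 else "read access",
        "user mode" if code & 0x4 else "kernel mode",
    ]
    if code & 0x10:
        flags.append("instruction fetch")
    return flags


def analyze_segfault(line: str) -> Dict[str, object]:
    """Extract the faulting library, offset into its mapping, and error flags."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    start = int(m["start"], 16)
    return {
        "comm": m["comm"],
        "lib": m["lib"],
        "fault_addr": int(m["addr"], 16),
        # Offset of the instruction pointer from the start of the mapping.
        "offset": ip - start,
        "in_mapping": 0 <= ip - start < int(m["size"], 16),
        # The kernel prints the error code in hex.
        "error": decode_x86_pf_error(int(m["err"], 16)),
    }
```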


Version-Release number of selected component (if applicable):

https://access.redhat.com/containers/?architecture=AMD64&tab=package-list#/registry.access.redhat.com/rhosp13/openstack-cinder-volume/images/13.0-99

Comment 1 Alex Stupnikov 2020-03-09 11:15:21 UTC
What do you think about significantly increasing "rbd_max_clone_depth" and then decreasing it once the fix is available?

Comment 2 Luigi Toscano 2020-03-11 13:05:58 UTC
Jon (and Eric, and Brian), do you think https://review.opendev.org/#/c/606038/ (when cleaned up) is the real solution to this issue, at least on the cinder side (i.e., don't crash when librbd acts weirdly)?

Comment 3 Alex Stupnikov 2020-03-11 14:03:56 UTC
I would also like to repeat my request for a workaround: do you think the customer will face any issues when decreasing "rbd_max_clone_depth" again once this issue is fixed?

Comment 4 Jon Bernard 2020-03-11 19:32:04 UTC
(In reply to Luigi Toscano from comment #2)
> Jon (and Eric, and Brian), do you think
> https://review.opendev.org/#/c/606038/ (when cleaned up) is the real
> solution to this issue, at least on the cinder side (i.e. don't crash when
> librbd acts weirdly)?

As far as I can tell, yes - these look to be the same issue.

Comment 5 Jon Bernard 2020-03-11 21:22:54 UTC
(In reply to Alex Stupnikov from comment #3)
> I would also like to repeat my request for a workaround: do you think the
> customer will face any issues when decreasing "rbd_max_clone_depth" again
> once this issue is fixed?

Setting rbd_max_clone_depth to 0 causes the clone function to issue a copy() instead of a COW clone, which skips the faulty logic later in the function. That should be a sufficient workaround in the short term. I'm going to post an update to the upstream patch as soon as I can.
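The workaround can be read as a three-way decision. The function below is a hypothetical sketch of that decision as described in this bug: copy when rbd_max_clone_depth is 0, otherwise do a COW clone, and flatten the new volume once the maximum depth is reached. It is not cinder's RBD driver code; the names are illustrative and the `>=` comparison against the source volume's depth is an assumption.

```python
from enum import Enum


class CloneAction(Enum):
    COPY = "full copy, no COW clone"
    CLONE = "COW clone"
    CLONE_AND_FLATTEN = "COW clone, then flatten the destination"


def choose_action(src_depth: int, rbd_max_clone_depth: int) -> CloneAction:
    """Hypothetical model of the clone decision discussed in this comment."""
    if rbd_max_clone_depth <= 0:
        # Workaround path: a plain copy never reaches the flatten logic.
        # (Negative values are treated like 0 here purely defensively.)
        return CloneAction.COPY
    if src_depth >= rbd_max_clone_depth:
        # The branch that logs "maximum clone depth (N) has been reached".
        return CloneAction.CLONE_AND_FLATTEN
    return CloneAction.CLONE
```

Under this model, raising rbd_max_clone_depth (as suggested in comment 1) only postpones the flatten branch, whereas 0 bypasses it entirely at the cost of full copies.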

Comment 6 Alex Stupnikov 2020-03-12 10:02:10 UTC
Thank you, Jon. Do you think it would make sense to report a bug to the Ceph developers so they can work on a fix in parallel? If so, can you file it, or would you like me to?

Kind Regards, Alex.

Comment 7 Jon Bernard 2020-03-12 14:38:34 UTC
(In reply to Alex Stupnikov from comment #6)
> Thank you, Jon. Do you think it would make sense to report a bug to the Ceph
> developers so they can work on a fix in parallel? If so, can you file it, or
> would you like me to?
> 
> Kind Regards, Alex.

IMO, it's never acceptable for a program to segfault, so I do think this is a fault that should be fixed in librbd. However, Cinder's logic is wrong, and fixing it there will hide the librbd issue. One could therefore argue that the librbd issue is not severe, but it's a bug nonetheless and should be reported. Let me know if you file it; otherwise I will when I free up. Thanks, Alex. BTW, the upstream patch is shaping up and I hope it will merge soon, so a backport should follow shortly.

Comment 8 Alex Stupnikov 2020-03-13 17:31:29 UTC
I have reported bug #1813402 for the Ceph developers.

Comment 24 Tzach Shefi 2020-06-08 12:59:56 UTC
Verified on:
openstack-cinder-12.0.10-10.el7ost.noarch

Followed the Cinder Launchpad bug for how to reproduce/test:
https://bugs.launchpad.net/cinder/+bug/1794956/comments/5

Set rbd_max_clone_depth = 2 in cinder.conf and restarted the c-vol (cinder-volume) container.
The default value, BTW, is 5.


Uploaded an image:
(overcloud) [stack@undercloud-0 ~]$ glance image-create --name cirros --disk-format raw --container-format bare --file cirros-0.4.0-x86_64-disk.raw 
+------------------+----------------------------------------------------------------------------------+
| Property         | Value                                                                            |
+------------------+----------------------------------------------------------------------------------+
| checksum         | ba3cd24377dde5dfdd58728894004abb                                                 |
| container_format | bare                                                                             |
| created_at       | 2020-06-08T12:45:31Z                                                             |
| direct_url       | rbd://9dc3113a-a968-11ea-a2dc-525400046f39/images/51afc4cf-                      |
|                  | 16e3-4248-9991-84c2da173cdf/snap                                                 |
| disk_format      | raw                                                                              |
| id               | 51afc4cf-16e3-4248-9991-84c2da173cdf                                             |
| locations        | [{"url": "rbd://9dc3113a-a968-11ea-a2dc-525400046f39/images/51afc4cf-            |
|                  | 16e3-4248-9991-84c2da173cdf/snap", "metadata": {}}]                              |
| min_disk         | 0                                                                                |
| min_ram          | 0                                                                                |
| name             | cirros                                                                           |
| owner            | acb89238386848d5bb4502fc035f9d44                                                 |
| protected        | False                                                                            |
| size             | 46137344                                                                         |
| status           | active                                                                           |
| tags             | []                                                                               |
| updated_at       | 2020-06-08T12:45:34Z                                                             |
| virtual_size     | None                                                                             |
| visibility       | shared                                                                           |
+------------------+----------------------------------------------------------------------------------+
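The `direct_url` in the table above (wrapped across two rows by the CLI) is a Glance RBD location of the form `rbd://<fsid>/<pool>/<image>/<snapshot>`. When cinder can use such a location, it can create a volume as a copy-on-write clone of the image snapshot instead of a full copy, so vol1 below presumably already has a parent. The sketch splits such a URL into its four parts; `parse_rbd_url` and `RbdLocation` are illustrative names, not cinder APIs.

```python
from typing import NamedTuple
from urllib.parse import unquote


class RbdLocation(NamedTuple):
    fsid: str      # Ceph cluster FSID
    pool: str      # pool holding the image (here: images)
    image: str     # image name (here: the Glance image ID)
    snapshot: str  # snapshot that clones are made from


def parse_rbd_url(url: str) -> RbdLocation:
    """Split an rbd://<fsid>/<pool>/<image>/<snapshot> URL into its parts."""
    prefix = "rbd://"
    if not url.startswith(prefix):
        raise ValueError(f"not an rbd:// URL: {url!r}")
    pieces = [unquote(p) for p in url[len(prefix):].split("/")]
    if len(pieces) != 4 or not all(pieces):
        raise ValueError(f"expected 4 non-empty components in {url!r}")
    return RbdLocation(*pieces)
```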


Create vol1 from the image:
(overcloud) [stack@undercloud-0 ~]$ cinder create 5 --image cirros --name vol1
+--------------------------------+--------------------------------------+
| Property                       | Value                                |
+--------------------------------+--------------------------------------+
| attachments                    | []                                   |
| availability_zone              | nova                                 |
| bootable                       | false                                |
| consistencygroup_id            | None                                 |
| created_at                     | 2020-06-08T12:47:30.000000           |
| description                    | None                                 |
| encrypted                      | False                                |
| id                             | c026e66e-68f0-43ed-95ee-d34db7aca3ec |
| metadata                       | {}                                   |
| migration_status               | None                                 |
| multiattach                    | False                                |
| name                           | vol1                                 |
| os-vol-host-attr:host          | hostgroup@tripleo_ceph#tripleo_ceph  |
| os-vol-mig-status-attr:migstat | None                                 |
| os-vol-mig-status-attr:name_id | None                                 |
| os-vol-tenant-attr:tenant_id   | acb89238386848d5bb4502fc035f9d44     |
| replication_status             | None                                 |
| size                           | 5                                    |
| snapshot_id                    | None                                 |
| source_volid                   | None                                 |
| status                         | creating                             |
| updated_at                     | 2020-06-08T12:47:30.000000           |
| user_id                        | 20fd2d35a42c422f94edd8bf03893198     |
| volume_type                    | tripleo                              |
+--------------------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| c026e66e-68f0-43ed-95ee-d34db7aca3ec | available | vol1 | 5    | tripleo     | true     |             |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+


Now create vol2 from vol1:
(overcloud) [stack@undercloud-0 ~]$ cinder create 5 --source-volid c026e66e-68f0-43ed-95ee-d34db7aca3ec --name vol2
+--------------------------------+--------------------------------------+
| Property                       | Value                                |
+--------------------------------+--------------------------------------+
| attachments                    | []                                   |
| availability_zone              | nova                                 |
| bootable                       | true                                 |
| consistencygroup_id            | None                                 |
| created_at                     | 2020-06-08T12:49:13.000000           |
| description                    | None                                 |
| encrypted                      | False                                |
| id                             | ab67da11-40ca-4bc4-b987-ad28afed023e |
| metadata                       | {}                                   |
| migration_status               | None                                 |
| multiattach                    | False                                |
| name                           | vol2                                 |
| os-vol-host-attr:host          | hostgroup@tripleo_ceph#tripleo_ceph  |
| os-vol-mig-status-attr:migstat | None                                 |
| os-vol-mig-status-attr:name_id | None                                 |
| os-vol-tenant-attr:tenant_id   | acb89238386848d5bb4502fc035f9d44     |
| replication_status             | None                                 |
| size                           | 5                                    |
| snapshot_id                    | None                                 |
| source_volid                   | c026e66e-68f0-43ed-95ee-d34db7aca3ec |
| status                         | creating                             |
| updated_at                     | 2020-06-08T12:49:13.000000           |
| user_id                        | 20fd2d35a42c422f94edd8bf03893198     |
| volume_type                    | tripleo                              |
+--------------------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| ab67da11-40ca-4bc4-b987-ad28afed023e | available | vol2 | 5    | tripleo     | true     |             |
| c026e66e-68f0-43ed-95ee-d34db7aca3ec | available | vol1 | 5    | tripleo     | true     |             |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+

Create vol3 from vol2:
(overcloud) [stack@undercloud-0 ~]$ cinder create 5 --source-volid ab67da11-40ca-4bc4-b987-ad28afed023e --name vol3
+--------------------------------+--------------------------------------+
| Property                       | Value                                |
+--------------------------------+--------------------------------------+
| attachments                    | []                                   |
| availability_zone              | nova                                 |
| bootable                       | true                                 |
| consistencygroup_id            | None                                 |
| created_at                     | 2020-06-08T12:50:24.000000           |
| description                    | None                                 |
| encrypted                      | False                                |
| id                             | d30c3708-bf6a-4ccc-beb7-cbd31a4cc010 |
| metadata                       | {}                                   |
| migration_status               | None                                 |
| multiattach                    | False                                |
| name                           | vol3                                 |
| os-vol-host-attr:host          | hostgroup@tripleo_ceph#tripleo_ceph  |
| os-vol-mig-status-attr:migstat | None                                 |
| os-vol-mig-status-attr:name_id | None                                 |
| os-vol-tenant-attr:tenant_id   | acb89238386848d5bb4502fc035f9d44     |
| replication_status             | None                                 |
| size                           | 5                                    |
| snapshot_id                    | None                                 |
| source_volid                   | ab67da11-40ca-4bc4-b987-ad28afed023e |
| status                         | creating                             |
| updated_at                     | 2020-06-08T12:50:25.000000           |
| user_id                        | 20fd2d35a42c422f94edd8bf03893198     |
| volume_type                    | tripleo                              |
+--------------------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| ab67da11-40ca-4bc4-b987-ad28afed023e | available | vol2 | 5    | tripleo     | true     |             |
| c026e66e-68f0-43ed-95ee-d34db7aca3ec | available | vol1 | 5    | tripleo     | true     |             |
| d30c3708-bf6a-4ccc-beb7-cbd31a4cc010 | available | vol3 | 5    | tripleo     | true     |             |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+

The clone depth of vol3 is 2, which equals rbd_max_clone_depth = 2 in cinder.conf.

Now let's create vol4 from vol3.
Before the fix, this would fail and crash the c-vol service.

(overcloud) [stack@undercloud-0 ~]$ cinder create 5 --source-volid d30c3708-bf6a-4ccc-beb7-cbd31a4cc010 --name vol4
+--------------------------------+--------------------------------------+
| Property                       | Value                                |
+--------------------------------+--------------------------------------+
| attachments                    | []                                   |
| availability_zone              | nova                                 |
| bootable                       | true                                 |
| consistencygroup_id            | None                                 |
| created_at                     | 2020-06-08T12:52:24.000000           |
| description                    | None                                 |
| encrypted                      | False                                |
| id                             | 770126f0-b4fb-4df6-bfd8-2382b96e1e4e |
| metadata                       | {}                                   |
| migration_status               | None                                 |
| multiattach                    | False                                |
| name                           | vol4                                 |
| os-vol-host-attr:host          | hostgroup@tripleo_ceph#tripleo_ceph  |
| os-vol-mig-status-attr:migstat | None                                 |
| os-vol-mig-status-attr:name_id | None                                 |
| os-vol-tenant-attr:tenant_id   | acb89238386848d5bb4502fc035f9d44     |
| replication_status             | None                                 |
| size                           | 5                                    |
| snapshot_id                    | None                                 |
| source_volid                   | d30c3708-bf6a-4ccc-beb7-cbd31a4cc010 |
| status                         | creating                             |
| updated_at                     | 2020-06-08T12:52:24.000000           |
| user_id                        | 20fd2d35a42c422f94edd8bf03893198     |
| volume_type                    | tripleo                              |
+--------------------------------+--------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| 770126f0-b4fb-4df6-bfd8-2382b96e1e4e | available | vol4 | 5    | tripleo     | true     |             |
| ab67da11-40ca-4bc4-b987-ad28afed023e | available | vol2 | 5    | tripleo     | true     |             |
| c026e66e-68f0-43ed-95ee-d34db7aca3ec | available | vol1 | 5    | tripleo     | true     |             |
| d30c3708-bf6a-4ccc-beb7-cbd31a4cc010 | available | vol3 | 5    | tripleo     | true     |             |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+

Yay, verified as working:
no c-vol crash, and we have 4 available volumes, each created from its predecessor.
Since I'm already here, let's also try vol5.

(overcloud) [stack@undercloud-0 ~]$ cinder create 5 --source-volid 770126f0-b4fb-4df6-bfd8-2382b96e1e4e --name vol5
+--------------------------------+--------------------------------------+
| Property                       | Value                                |
+--------------------------------+--------------------------------------+
| attachments                    | []                                   |
| availability_zone              | nova                                 |
| bootable                       | true                                 |
| consistencygroup_id            | None                                 |
| created_at                     | 2020-06-08T12:53:23.000000           |
| description                    | None                                 |
| encrypted                      | False                                |
| id                             | 71bee8af-ba5c-45d2-8ac2-eaa062e6bc61 |
| metadata                       | {}                                   |
| migration_status               | None                                 |
| multiattach                    | False                                |
| name                           | vol5                                 |
| os-vol-host-attr:host          | hostgroup@tripleo_ceph#tripleo_ceph  |
| os-vol-mig-status-attr:migstat | None                                 |
| os-vol-mig-status-attr:name_id | None                                 |
| os-vol-tenant-attr:tenant_id   | acb89238386848d5bb4502fc035f9d44     |
| replication_status             | None                                 |
| size                           | 5                                    |
| snapshot_id                    | None                                 |
| source_volid                   | 770126f0-b4fb-4df6-bfd8-2382b96e1e4e |
| status                         | creating                             |
| updated_at                     | 2020-06-08T12:53:23.000000           |
| user_id                        | 20fd2d35a42c422f94edd8bf03893198     |
| volume_type                    | tripleo                              |
+--------------------------------+--------------------------------------+

vol5, created from vol4, is also available:
(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| 71bee8af-ba5c-45d2-8ac2-eaa062e6bc61 | available | vol5 | 5    | tripleo     | true     |             |
| 770126f0-b4fb-4df6-bfd8-2382b96e1e4e | available | vol4 | 5    | tripleo     | true     |             |
| ab67da11-40ca-4bc4-b987-ad28afed023e | available | vol2 | 5    | tripleo     | true     |             |
| c026e66e-68f0-43ed-95ee-d34db7aca3ec | available | vol1 | 5    | tripleo     | true     |             |
| d30c3708-bf6a-4ccc-beb7-cbd31a4cc010 | available | vol3 | 5    | tripleo     | true     |             |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
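The manual chain above (vol1 to vol5) can be scripted. The following is a hypothetical helper, not part of python-cinderclient: it runs the same `cinder create <size> --source-volid <id> --name <name>` command used above, reads the new volume's `id` from the property table, and polls `cinder show` until the volume is `available` before cloning it again. The helper names, the timeout, and the polling interval are assumptions, and it does not detect a cinder-volume crash by itself; the service logs still need checking.

```python
import subprocess
import time
from typing import Callable, Dict, List, Optional

Runner = Callable[[List[str]], str]


def parse_property_table(text: str) -> Dict[str, str]:
    """Parse a '| Property | Value |' CLI table into a dict.

    Continuation rows (empty Property cell, as in the glance output above)
    are naively appended to the previous value.
    """
    props: Dict[str, str] = {}
    last_key: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and shell prompts
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Property":
            continue  # not a two-column row, or the header row
        key, value = cells
        if key:
            props[key] = value
            last_key = key
        elif last_key is not None:
            props[last_key] += value
    return props


def run_cli(argv: List[str]) -> str:
    """Run a CLI command and return its stdout (raises on non-zero exit)."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def wait_available(volume_id: str, runner: Runner,
                   timeout: float = 300.0, interval: float = 5.0) -> None:
    """Poll `cinder show` until the volume is available or fails."""
    deadline = time.monotonic() + timeout
    while True:
        status = parse_property_table(runner(["cinder", "show", volume_id]))["status"]
        if status == "available":
            return
        if status.startswith("error") or time.monotonic() > deadline:
            raise RuntimeError(f"volume {volume_id} is in status {status!r}")
        time.sleep(interval)


def build_chain(first_id: str, names: List[str], size_gb: int = 5,
                runner: Runner = run_cli, interval: float = 5.0) -> List[str]:
    """Create each volume in `names` from the previous one; return the new IDs."""
    ids: List[str] = []
    source = first_id
    for name in names:
        argv = ["cinder", "create", str(size_gb), "--source-volid", source, "--name", name]
        new_id = parse_property_table(runner(argv))["id"]
        wait_available(new_id, runner, interval=interval)
        ids.append(new_id)
        source = new_id
    return ids
```

For example, `build_chain("c026e66e-68f0-43ed-95ee-d34db7aca3ec", ["vol2", "vol3", "vol4", "vol5"])` would repeat the steps above starting from vol1.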

Comment 27 errata-xmlrpc 2020-06-24 11:51:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2722