Bug 1894303 - When flattening RBD images, cinder-volume losses connectivity to rabbitmq due to client heartbeat timeout
Summary: When flattening RBD images, cinder-volume losses connectivity to rabbitmq due...
Keywords:
Status: CLOSED DUPLICATE of bug 1888011
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
high
high
Target Milestone: z14
: 13.0 (Queens)
Assignee: Cinder Bugs List
QA Contact: Tzach Shefi
Chuck Copello
URL:
Whiteboard:
Depends On: 1885734 1888011
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-11-03 23:40 UTC by Takashi Kajinami
Modified: 2020-11-04 11:51 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1885734
Environment:
Last Closed: 2020-11-03 23:41:46 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Takashi Kajinami 2020-11-03 23:40:23 UTC
+++ This bug was initially created as a clone of Bug #1885734 +++

Description of problem:

When flattening RBD images, cinder-volume losses connectivity to rabbitmq due to client heartbeat timeout and subsequent volume creation appears to be failing:


2020-10-06 15:15:50.746 55 DEBUG cinder.volume.drivers.rbd [req-2bb6029d-be03-4aab-9a0e-95cec31c4dd8 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] connecting to ceph (timeout=-1). _do_conn /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:337
2020-10-06 15:15:50.846 55 DEBUG cinder.volume.drivers.rbd [req-2bb6029d-be03-4aab-9a0e-95cec31c4dd8 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] creating snapshot='volume-de8d188a-ebcb-422d-b76d-3fad2e0d3b72.clone_snap' create_cloned_volume /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:588
2020-10-06 15:15:51.585 55 DEBUG cinder.volume.drivers.rbd [req-2bb6029d-be03-4aab-9a0e-95cec31c4dd8 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] cloning 'volume-d5b6f370-11b9-4561-b8e6-ec40bf049cca' to 'volume-de8d188a-ebcb-422d-b76d-3fad2e0d3b72' create_cloned_volume /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:597
2020-10-06 15:15:52.003 55 DEBUG cinder.volume.drivers.rbd [req-2bb6029d-be03-4aab-9a0e-95cec31c4dd8 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Volume volume-eb0c640e-f22e-4b49-8675-cff8bc39ea26.deleted is not a clone. _get_clone_info /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:972
2020-10-06 15:15:52.005 55 INFO cinder.volume.drivers.rbd [req-2bb6029d-be03-4aab-9a0e-95cec31c4dd8 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] maximum clone depth (5) has been reached - flattening dest volume
2020-10-06 15:15:52.091 55 DEBUG cinder.volume.drivers.rbd [req-2bb6029d-be03-4aab-9a0e-95cec31c4dd8 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] flattening dest volume volume-de8d188a-ebcb-422d-b76d-3fad2e0d3b72 create_cloned_volume /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:625
2020-10-06 15:15:55.586 70 DEBUG cinder.volume.drivers.rbd [req-06296211-f754-401c-8cdf-c6b65fe61d25 - - - - -] connecting to ceph (timeout=-1). _do_conn /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:337
2020-10-06 15:15:55.619 70 DEBUG cinder.volume.drivers.rbd [req-06296211-f754-401c-8cdf-c6b65fe61d25 - - - - -] connecting to ceph (timeout=-1). _do_conn /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:337
2020-10-06 15:15:55.656 70 DEBUG cinder.manager [req-06296211-f754-401c-8cdf-c6b65fe61d25 - - - - -] Notifying Schedulers of capabilities ... _publish_service_capabilities /usr/lib/python2.7/site-packages/cinder/manager.py:194
2020-10-06 15:16:41.979 110 DEBUG cinder.volume.drivers.rbd [req-2e9c0e29-42cd-4d18-b0a3-f8e0517fa367 - - - - -] connecting to ceph (timeout=-1). _do_conn /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:337
2020-10-06 15:16:42.017 110 DEBUG cinder.volume.drivers.rbd [req-2e9c0e29-42cd-4d18-b0a3-f8e0517fa367 - - - - -] connecting to ceph (timeout=-1). _do_conn /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:337
2020-10-06 15:16:42.052 110 DEBUG cinder.manager [req-2e9c0e29-42cd-4d18-b0a3-f8e0517fa367 - - - - -] Notifying Schedulers of capabilities ... _publish_service_capabilities /usr/lib/python2.7/site-packages/cinder/manager.py:194
2020-10-06 15:16:55.591 70 DEBUG cinder.volume.drivers.rbd [req-06296211-f754-401c-8cdf-c6b65fe61d25 - - - - -] connecting to ceph (timeout=-1). _do_conn /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:337
2020-10-06 15:16:55.623 70 DEBUG cinder.volume.drivers.rbd [req-06296211-f754-401c-8cdf-c6b65fe61d25 - - - - -] connecting to ceph (timeout=-1). _do_conn /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:337
2020-10-06 15:16:55.661 70 DEBUG cinder.manager [req-06296211-f754-401c-8cdf-c6b65fe61d25 - - - - -] Notifying Schedulers of capabilities ... _publish_service_capabilities /usr/lib/python2.7/site-packages/cinder/manager.py:194
2020-10-06 15:17:41.976 110 DEBUG cinder.volume.drivers.rbd [req-2e9c0e29-42cd-4d18-b0a3-f8e0517fa367 - - - - -] connecting to ceph (timeout=-1). _do_conn /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:337
2020-10-06 15:17:42.012 110 DEBUG cinder.volume.drivers.rbd [req-2e9c0e29-42cd-4d18-b0a3-f8e0517fa367 - - - - -] connecting to ceph (timeout=-1). _do_conn /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:337
2020-10-06 15:17:42.047 110 DEBUG cinder.manager [req-2e9c0e29-42cd-4d18-b0a3-f8e0517fa367 - - - - -] Notifying Schedulers of capabilities ... _publish_service_capabilities /usr/lib/python2.7/site-packages/cinder/manager.py:194
2020-10-06 15:17:55.586 70 DEBUG cinder.volume.drivers.rbd [req-06296211-f754-401c-8cdf-c6b65fe61d25 - - - - -] connecting to ceph (timeout=-1). _do_conn /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:337
2020-10-06 15:17:55.623 70 DEBUG cinder.volume.drivers.rbd [req-06296211-f754-401c-8cdf-c6b65fe61d25 - - - - -] connecting to ceph (timeout=-1). _do_conn /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:337
2020-10-06 15:17:55.666 70 DEBUG cinder.manager [req-06296211-f754-401c-8cdf-c6b65fe61d25 - - - - -] Notifying Schedulers of capabilities ... _publish_service_capabilities /usr/lib/python2.7/site-packages/cinder/manager.py:194
2020-10-06 15:18:41.981 110 DEBUG cinder.volume.drivers.rbd [req-2e9c0e29-42cd-4d18-b0a3-f8e0517fa367 - - - - -] connecting to ceph (timeout=-1). _do_conn /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:337
2020-10-06 15:18:42.016 110 DEBUG cinder.volume.drivers.rbd [req-2e9c0e29-42cd-4d18-b0a3-f8e0517fa367 - - - - -] connecting to ceph (timeout=-1). _do_conn /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:337
2020-10-06 15:18:42.049 110 DEBUG cinder.manager [req-2e9c0e29-42cd-4d18-b0a3-f8e0517fa367 - - - - -] Notifying Schedulers of capabilities ... _publish_service_capabilities /usr/lib/python2.7/site-packages/cinder/manager.py:194
2020-10-06 15:18:55.591 70 DEBUG cinder.volume.drivers.rbd [req-06296211-f754-401c-8cdf-c6b65fe61d25 - - - - -] connecting to ceph (timeout=-1). _do_conn /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:337
2020-10-06 15:18:55.624 70 DEBUG cinder.volume.drivers.rbd [req-06296211-f754-401c-8cdf-c6b65fe61d25 - - - - -] connecting to ceph (timeout=-1). _do_conn /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:337
2020-10-06 15:18:55.657 70 DEBUG cinder.manager [req-06296211-f754-401c-8cdf-c6b65fe61d25 - - - - -] Notifying Schedulers of capabilities ... _publish_service_capabilities /usr/lib/python2.7/site-packages/cinder/manager.py:194
2020-10-06 15:19:05.575 55 DEBUG cinder.volume.drivers.rbd [req-2bb6029d-be03-4aab-9a0e-95cec31c4dd8 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] remove temporary snap volume-de8d188a-ebcb-422d-b76d-3fad2e0d3b72.clone_snap create_cloned_volume /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:640
2020-10-06 15:19:05.688 55 DEBUG cinder.volume.drivers.rbd [req-2bb6029d-be03-4aab-9a0e-95cec31c4dd8 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] clone created successfully create_cloned_volume /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:666
2020-10-06 15:19:05.690 55 DEBUG cinder.volume.manager [req-2bb6029d-be03-4aab-9a0e-95cec31c4dd8 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Task 'cinder.volume.flows.manager.create_volume.CreateVolumeFromSpecTask;volume:create' (5b07b48e-e622-4603-b45b-81ac5099ea67) transitioned into state 'SUCCESS' from state 'RUNNING' with result '{'status': u'creating', 'volume_size': 200, 'volume_name': u'volume-de8d188a-ebcb-422d-b76d-3fad2e0d3b72', 'source_volid': u'd5b6f370-11b9-4561-b8e6-ec40bf049cca', 'source_volstatus': u'in-use'}' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:183
2020-10-06 15:19:05.691 55 DEBUG cinder.volume.manager [req-2bb6029d-be03-4aab-9a0e-95cec31c4dd8 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Task 'cinder.volume.flows.manager.create_volume.CreateVolumeOnFinishTask;volume:create, create.end' (3addaf14-ccd7-4889-8dc0-1b2b4d880911) transitioned into state 'RUNNING' from state 'PENDING' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:194
2020-10-06 15:19:05.694 55 DEBUG oslo.messaging._drivers.impl_rabbit [-] [a2e721c1-7e9d-42d4-af9c-df2f9c2c7322] Received recoverable error from kombu: on_error /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:765
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 494, in _ensured
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit     return fun(*args, **kwargs)
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 570, in __call__
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit     return fun(*args, channel=channels[0], **kwargs), channels[0]
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 816, in execute_method
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit     method()
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1103, in _consume
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit     self._heartbeat_check()
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 969, in _heartbeat_check
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit     self.connection.heartbeat_check(rate=self.heartbeat_rate)
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 290, in heartbeat_check
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit     return self.transport.heartbeat_check(self.connection, rate=rate)
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 149, in heartbeat_check
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit     return connection.heartbeat_tick(rate=rate)
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 718, in heartbeat_tick
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit     self.send_heartbeat()
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 669, in send_heartbeat
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit     self.frame_writer(8, 0, None, None, None)
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 166, in write_frame
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit     write(view[:offset])
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 275, in write
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit     self._write(s)
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 390, in sendall
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit     tail = self.send(data, flags)
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 384, in send
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit     return self._send_loop(self.fd.send, data, flags)
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 371, in _send_loop
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit     return send_method(data, *args)
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit error: [Errno 104] Connection reset by peer
2020-10-06 15:19:05.694 55 ERROR oslo.messaging._drivers.impl_rabbit 
2020-10-06 15:19:05.696 55 DEBUG oslo.messaging._drivers.impl_rabbit [-] [a2e721c1-7e9d-42d4-af9c-df2f9c2c7322] Received recoverable error from kombu: on_error /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:765
2020-10-06 15:19:05.696 55 ERROR oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2020-10-06 15:19:05.696 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 494, in _ensured
2020-10-06 15:19:05.696 55 ERROR oslo.messaging._drivers.impl_rabbit     return fun(*args, **kwargs)
2020-10-06 15:19:05.696 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 570, in __call__
2020-10-06 15:19:05.696 55 ERROR oslo.messaging._drivers.impl_rabbit     return fun(*args, channel=channels[0], **kwargs), channels[0]
2020-10-06 15:19:05.696 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 816, in execute_method
2020-10-06 15:19:05.696 55 ERROR oslo.messaging._drivers.impl_rabbit     method()
2020-10-06 15:19:05.696 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1088, in _consume
2020-10-06 15:19:05.696 55 ERROR oslo.messaging._drivers.impl_rabbit     raise self.connection.recoverable_connection_errors[0]
2020-10-06 15:19:05.696 55 ERROR oslo.messaging._drivers.impl_rabbit RecoverableConnectionError: <AMQPError: unknown error>
2020-10-06 15:19:05.696 55 ERROR oslo.messaging._drivers.impl_rabbit 
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit [-] [a2e721c1-7e9d-42d4-af9c-df2f9c2c7322] AMQP server on overcloud-controller-1:5672 is unreachable: <AMQPError: unknown error>. Trying again in 1 seconds.: RecoverableConnectionError: <AMQPError: unknown error>
2020-10-06 15:19:05.697 55 DEBUG oslo.messaging._drivers.impl_rabbit [-] [80610839-4914-4df7-b1ab-d5d0efcc2fae] Received recoverable error from kombu: on_error /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:765
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 494, in _ensured
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit     return fun(*args, **kwargs)
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 570, in __call__
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit     return fun(*args, channel=channels[0], **kwargs), channels[0]
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 816, in execute_method
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit     method()
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1103, in _consume
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit     self._heartbeat_check()
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 969, in _heartbeat_check
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit     self.connection.heartbeat_check(rate=self.heartbeat_rate)
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 290, in heartbeat_check
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit     return self.transport.heartbeat_check(self.connection, rate=rate)
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 149, in heartbeat_check
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit     return connection.heartbeat_tick(rate=rate)
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 718, in heartbeat_tick
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit     self.send_heartbeat()
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 669, in send_heartbeat
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit     self.frame_writer(8, 0, None, None, None)
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 166, in write_frame
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit     write(view[:offset])
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/amqp/transport.py", line 275, in write
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit     self._write(s)
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 390, in sendall
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit     tail = self.send(data, flags)
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 384, in send
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit     return self._send_loop(self.fd.send, data, flags)
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 371, in _send_loop
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit     return send_method(data, *args)
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit error: [Errno 104] Connection reset by peer
2020-10-06 15:19:05.697 55 ERROR oslo.messaging._drivers.impl_rabbit 
2020-10-06 15:19:05.698 55 ERROR oslo.messaging._drivers.impl_rabbit [-] [80610839-4914-4df7-b1ab-d5d0efcc2fae] AMQP server on overcloud-controller-0:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: error: [Errno 104] Connection reset by peer



rabbitmq logs:
~~~
These are in the rabbitmq logs

=ERROR REPORT==== 6-Oct-2020::16:55:41 ===
closing AMQP connection <0.20256.0> (10.10.10.10:39952 -> 10.10.10.11:5672 - cinder-volume:55:b8e9ab46-ab18-45e6-835d-78aa1756bd2e):
missed heartbeats from client, timeout: 60s
~~~

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from David Hill on 2020-10-06 20:14:53 UTC ---

Red Hat OpenStack Platform release 13.0.12 (Queens)

--- Additional comment from Takashi Kajinami on 2020-10-07 01:06:30 UTC ---

One suspicious thing I found during code investigation is that there are no tpool.Proxy used with the flatten call in create_cloned_volume.
Do we need to change the call so that it use tpool.Proxy like the following example ?
 https://review.opendev.org/#/c/756416/1/cinder/volume/drivers/rbd.py

I found that the flatten method called by create_volume_from_snapshot DOES use that tpool.Proxy, so they might be able to workaround the issue
by
 - enable rbd_flatten_volume_from_snapshot in cinder.conf (*)
 - Create a snapshot from the source volume
 - Create a new volume from the snapshot

(*) I don't know whether this is actually required, but I'll enable this to avoid any dependency problems in the future,
    and also see whether proxied flatten causes the problem or not.

--- Additional comment from David Hill on 2020-10-07 13:33:58 UTC ---

Customer says that your patch [1] helped !

[1] https://review.opendev.org/#/c/756416

--- Additional comment from David Hill on 2020-10-07 14:13:55 UTC ---

Nevermind, customer didn't test it after all.

--- Additional comment from Takashi Kajinami on 2020-10-07 17:26:35 UTC ---

According to the comments on the case, it seems that volume clone using snapshot, which I wrote in comment:2 worked as a workaround.

The newly attached cinder-volume.log shows that the volume with id 7765c477-2c9a-4389-8276-740863b6f7c7 was created from a snapshot and its rbd image was flatten, because rbd_flatten_volume_from_spapshot is enabled.
~~~
2020-10-07 12:23:21.883 55 DEBUG cinder.volume.flows.manager.create_volume [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Volume reschedule parameters: True retry: {u'num_attempts': 1, u'backends': [u'hostgroup@tripleo_ceph#tripleo_ceph'], u'hosts': [u'hostgroup@tripleo_ceph#tripleo_ceph']} get_flow /usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py:1168
2020-10-07 12:23:21.886 55 WARNING stevedore.named [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Could not load cinder.keymgr.conf_key_mgr.ConfKeyManager
2020-10-07 12:23:21.887 55 WARNING castellan.key_manager [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Deprecation Warning : cinder.keymgr.conf_key_mgr.ConfKeyManager is not a stevedore based driver, trying to load it as a class: NoMatches: No 'castellan.drivers' driver found, looking for 'cinder.keymgr.conf_key_mgr.ConfKeyManager'
2020-10-07 12:23:21.893 55 DEBUG cinder.volume.manager [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Flow 'volume_create_manager' (363a3610-a836-4e5c-a4b6-89b56052ee19) transitioned into state 'RUNNING' from state 'PENDING' _flow_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:145
2020-10-07 12:23:21.895 55 DEBUG cinder.volume.manager [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Task 'cinder.volume.flows.manager.create_volume.ExtractVolumeRefTask;volume:create' (10fb5c21-7a29-4426-9dc4-07850c030590) transitioned into state 'RUNNING' from state 'PENDING' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:194
2020-10-07 12:23:21.921 55 DEBUG cinder.volume.manager [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Task 'cinder.volume.flows.manager.create_volume.ExtractVolumeRefTask;volume:create' (10fb5c21-7a29-4426-9dc4-07850c030590) transitioned into state 'SUCCESS' from state 'RUNNING' with result 'Volume(_name_id=None,admin_metadata={},attach_status='detached',availability_zone='nova',bootable=False,cluster=<?>,cluster_name=None,consistencygroup=<?>,consistencygroup_id=None,created_at=2020-10-07T12:23:21Z,deleted=False,deleted_at=None,display_description=None,display_name='test-volume',ec2_id=None,encryption_key_id=None,glance_metadata=<?>,group=<?>,group_id=None,host='hostgroup@tripleo_ceph#tripleo_ceph',id=7765c477-2c9a-4389-8276-740863b6f7c7,launched_at=None,metadata={},migration_status=None,multiattach=False,previous_status=None,project_id='8ecf6e04b50e4954832784c3c2794d2a',provider_auth=None,provider_geometry=None,provider_id=None,provider_location=None,replication_driver_data=None,replication_extended_status=None,replication_status=None,scheduled_at=2020-10-07T12:23:21Z,service_uuid=None,shared_targets=True,size=200,snapshot_id=b12ed9cb-3134-4c44-ae73-c537e17adc19,snapshots=<?>,source_volid=None,status='creating',terminated_at=None,updated_at=2020-10-07T12:23:21Z,user_id='e54fa6be51954cf9b0b2863023d69b27',volume_attachment=<?>,volume_type=VolumeType(3817c59b-9537-4317-94dc-ed81f51a6c6e),volume_type_id=3817c59b-9537-4317-94dc-ed81f51a6c6e)' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:183
2020-10-07 12:23:21.923 55 DEBUG cinder.volume.manager [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Task 'cinder.volume.flows.manager.create_volume.OnFailureRescheduleTask;volume:create' (52672f6e-ceba-4c28-a3ad-cfd4ae69494f) transitioned into state 'RUNNING' from state 'PENDING' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:194
2020-10-07 12:23:21.924 55 DEBUG cinder.volume.manager [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Task 'cinder.volume.flows.manager.create_volume.OnFailureRescheduleTask;volume:create' (52672f6e-ceba-4c28-a3ad-cfd4ae69494f) transitioned into state 'SUCCESS' from state 'RUNNING' with result 'None' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:183
2020-10-07 12:23:21.925 55 DEBUG cinder.volume.manager [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Task 'cinder.volume.flows.manager.create_volume.ExtractVolumeSpecTask;volume:create' (6c6acda2-877c-44ae-a48c-92ab8760be94) transitioned into state 'RUNNING' from state 'PENDING' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:194
2020-10-07 12:23:21.926 55 DEBUG cinder.volume.manager [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Task 'cinder.volume.flows.manager.create_volume.ExtractVolumeSpecTask;volume:create' (6c6acda2-877c-44ae-a48c-92ab8760be94) transitioned into state 'SUCCESS' from state 'RUNNING' with result '{'status': u'creating', 'volume_size': 200, 'snapshot_id': u'b12ed9cb-3134-4c44-ae73-c537e17adc19', 'volume_name': u'volume-7765c477-2c9a-4389-8276-740863b6f7c7', 'volume_id': u'7765c477-2c9a-4389-8276-740863b6f7c7', 'type': 'snap'}' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:183
2020-10-07 12:23:21.928 55 DEBUG cinder.volume.manager [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Task 'cinder.volume.flows.manager.create_volume.NotifyVolumeActionTask;volume:create, create.start' (9f15df6c-5642-444c-ae89-62db0640c7b4) transitioned into state 'RUNNING' from state 'PENDING' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:194
2020-10-07 12:23:21.929 55 DEBUG cinder.volume.manager [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Task 'cinder.volume.flows.manager.create_volume.NotifyVolumeActionTask;volume:create, create.start' (9f15df6c-5642-444c-ae89-62db0640c7b4) transitioned into state 'SUCCESS' from state 'RUNNING' with result 'None' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:183
2020-10-07 12:23:21.930 55 DEBUG cinder.volume.manager [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Task 'cinder.volume.flows.manager.create_volume.CreateVolumeFromSpecTask;volume:create' (dc9c5eb0-0721-4d0d-bb5c-9bbbd177a7f2) transitioned into state 'RUNNING' from state 'PENDING' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:194
2020-10-07 12:23:21.930 55 INFO cinder.volume.flows.manager.create_volume [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Volume 7765c477-2c9a-4389-8276-740863b6f7c7: being created as snap with specification: {'status': u'creating', 'volume_size': 200, 'snapshot_id': u'b12ed9cb-3134-4c44-ae73-c537e17adc19', 'volume_name': u'volume-7765c477-2c9a-4389-8276-740863b6f7c7'}
2020-10-07 12:23:21.968 55 DEBUG cinder.volume.drivers.rbd [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] cloning volumes/volume-d5b6f370-11b9-4561-b8e6-ec40bf049cca@snapshot-b12ed9cb-3134-4c44-ae73-c537e17adc19 to volume-7765c477-2c9a-4389-8276-740863b6f7c7 _clone /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:900
2020-10-07 12:23:21.968 55 DEBUG cinder.volume.drivers.rbd [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] connecting to ceph (timeout=-1). _do_conn /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:337
2020-10-07 12:23:21.999 55 DEBUG cinder.volume.drivers.rbd [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] connecting to ceph (timeout=-1). _do_conn /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:337
2020-10-07 12:23:22.192 55 DEBUG cinder.volume.drivers.rbd [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] flattening volumes/volume-7765c477-2c9a-4389-8276-740863b6f7c7 _flatten /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:893
2020-10-07 12:23:22.192 55 DEBUG cinder.volume.drivers.rbd [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] connecting to ceph (timeout=-1). _do_conn /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:337
2020-10-07 12:26:36.200 55 DEBUG cinder.volume.drivers.rbd [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] connecting to ceph (timeout=-1). _do_conn /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:337
2020-10-07 12:26:36.276 55 DEBUG cinder.volume.manager [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Task 'cinder.volume.flows.manager.create_volume.CreateVolumeFromSpecTask;volume:create' (dc9c5eb0-0721-4d0d-bb5c-9bbbd177a7f2) transitioned into state 'SUCCESS' from state 'RUNNING' with result '{'status': u'creating', 'volume_size': 200, 'snapshot_id': u'b12ed9cb-3134-4c44-ae73-c537e17adc19', 'volume_name': u'volume-7765c477-2c9a-4389-8276-740863b6f7c7'}' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:183
2020-10-07 12:26:36.277 55 DEBUG cinder.volume.manager [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Task 'cinder.volume.flows.manager.create_volume.CreateVolumeOnFinishTask;volume:create, create.end' (99a9f7fe-fbea-481c-8aaf-7f4a3e4be239) transitioned into state 'RUNNING' from state 'PENDING' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:194
2020-10-07 12:26:36.281 55 INFO cinder.volume.flows.manager.create_volume [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Volume volume-7765c477-2c9a-4389-8276-740863b6f7c7 (7765c477-2c9a-4389-8276-740863b6f7c7): created successfully
2020-10-07 12:26:36.282 55 DEBUG cinder.volume.manager [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Task 'cinder.volume.flows.manager.create_volume.CreateVolumeOnFinishTask;volume:create, create.end' (99a9f7fe-fbea-481c-8aaf-7f4a3e4be239) transitioned into state 'SUCCESS' from state 'RUNNING' with result 'None' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:183
2020-10-07 12:26:36.283 55 DEBUG cinder.volume.manager [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Flow 'volume_create_manager' (363a3610-a836-4e5c-a4b6-89b56052ee19) transitioned into state 'SUCCESS' from state 'RUNNING' _flow_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:145
2020-10-07 12:26:36.287 55 INFO cinder.volume.manager [req-69b31e34-bde7-4717-900c-db88614c8f86 e54fa6be51954cf9b0b2863023d69b27 8ecf6e04b50e4954832784c3c2794d2a - default default] Created volume successfully.
~~~

However that flatten call is invoked via tpool.Proxy this time, and I think that would be the reason why they didn't observe heartbeat loss this time.
 https://github.com/openstack/cinder/blob/stable/queens/cinder/volume/drivers/rbd.py#L842
 https://github.com/openstack/cinder/blob/stable/queens/cinder/volume/drivers/rbd.py#L800

When we clone volume with rbd backend, cinder internally creates a snapshot and then create a cloned volume from the snapshot, and flatten the volume if needed[1].
So the volume created from snapshot should look almost the same as the one created by clone, from resource perspective.
 [1] https://github.com/openstack/cinder/blob/stable/queens/cinder/volume/drivers/rbd.py#L612

--- Additional comment from Takashi Kajinami on 2020-10-07 17:43:18 UTC ---

The same issue with volume-from-snapshot has been reported in another bug[1] and fixed by the same strategy (using tpool.Proxy)[2] a while ago, and this shows that we need the same fix for the flatten call in create_cloned_volume... 

 [1] https://bugs.launchpad.net/cinder/+bug/1658037
 [2] https://review.opendev.org/#/c/423184/

--- Additional comment from RHEL Program Management on 2020-10-13 19:57:16 UTC ---

This bugzilla has been removed from the release since it  does not have an acked release flag. For details, see https://mojo.redhat.com/docs/DOC-1144661#jive_content_id_OSP_Release_Planning.'

--- Additional comment from David Hill on 2020-10-14 19:10:56 UTC ---

May we expedite the process for this patch and have a hotfix as fast as possible ?  Customer would like to have a hotfix *now* ... I explained him how it worked but he doesn't accept this explanation.

--- Additional comment from David Hill on 2020-10-15 14:32:41 UTC ---

https://review.opendev.org/#/c/758003/ => victoria

--- Additional comment from David Hill on 2020-10-15 14:34:49 UTC ---

https://review.opendev.org/#/c/758440/ => ussuri

--- Additional comment from Freddy Wissing on 2020-10-19 19:56:45 UTC ---

Hi guys, 

Need an additional look here at what's going on.

The customer installed the packages according to:

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/transitioning_to_containerized_services/obtaining-container-images#modifying-container-images-temp

Verification:


[1]
[root@njss1-ospdq-inf-ctrl2 ~]# docker exec -it openstack-cinder-volume-docker-0 /bin/bash
()[root@njss1-ospdq-inf-ctrl2 /]# rpm -q openstack-cinder
openstack-cinder-12.0.10-20.el7ost.noarch
()[root@njss1-ospdq-inf-ctrl2 /]# rpm -q python-cinder
python-cinder-12.0.10-20.el7ost.noarch

Test:

[2]
openstack volume create --source d5b6f370-11b9-4561-b8e6-ec40bf049cca --type volumes test

Result:

[3]
(overcloud) [stack@njss1-ospdq01a-inf ~]$ openstack volume list
+--------------------------------------+------+----------+------+-------------+
| ID                                   | Name | Status   | Size | Attached to |
+--------------------------------------+------+----------+------+-------------+
| a0f1aea2-7ae0-4066-820f-b9ac8a6c0c39 | test | creating |  200 |             |
+--------------------------------------+------+----------+------+-------------+
(overcloud) [stack@njss1-ospdq01a-inf ~]$ openstack volume service list 
+------------------+--------------------------------------+------+---------+-------+----------------------------+
| Binary           | Host                                 | Zone | Status  | State | Updated At                 |
+------------------+--------------------------------------+------+---------+-------+----------------------------+
| cinder-scheduler | njss1-ospdq-inf-ctrl1.inf.fndlsb.net | nova | enabled | up    | 2020-10-19T18:24:34.000000 |
| cinder-scheduler | njss1-ospdq-inf-ctrl2.inf.fndlsb.net | nova | enabled | up    | 2020-10-19T18:24:40.000000 |
| cinder-scheduler | njss1-ospdq-inf-ctrl0.inf.fndlsb.net | nova | enabled | up    | 2020-10-19T18:24:42.000000 |
| cinder-volume    | hostgroup@tripleo_ceph               | nova | enabled | down  | 2020-10-19T18:15:01.000000 |
| cinder-volume    | hostgroup@tripleo_ceph_vms           | nova | enabled | up    | 2020-10-19T18:24:41.000000 |
| cinder-volume    | hostgroup@tripleo_ceph_vmsroot	  | nova | enabled | up    | 2020-10-19T18:24:42.000000 |
+------------------+--------------------------------------+------+---------+-------+----------------------------+

Thank you everyone,

/Freddy

--- Additional comment from Robert Sunog on 2020-10-20 11:55:59 UTC ---

Hello Team,

I'm Robert, Senior Escalation Manager, involved here after today's escalation.

I marked "Customer escalation" flag here as Customer claims hot-fix made the situation worse and needs urgent assistance.
Support team will further review sosreports and post here as needed.

In the meantime any help prioritizing this bug is appreciated as Customer needs to understand whether they should roll-back the fix and how we can provide them with permanent solution. They have no workaround available.



Regards,

Robert Sunog
Senior Escalation Manager
Customer Experience & Engagement
Red Hat
rsunog
irc: rsunog 
Mobile: +420-702-222-497
Office: +420-532-294-241

--- Additional comment from Eric Harney on 2020-10-20 12:03:21 UTC ---

Please attach the relevant log files to the BZ showing the failures after the hotfix was applied.

--- Additional comment from  on 2020-10-20 12:06:20 UTC ---



--- Additional comment from  on 2020-10-20 12:07:09 UTC ---

@Eric I have uploaded the cinder-volume files I just got from the client.

--- Additional comment from  on 2020-10-20 12:08:36 UTC ---

0060-sosreport-njss1-ospdq-inf-ctrl2-2020-10-19-njehicr.tar.xz/sosreport-njss1-ospdq-inf-ctrl2-2020-10-19-njehicr/etc/cinder/cinder.conf

# Maximum number of nested volume clones that are taken before a flatten
# occurs. Set to 0 to disable cloning. (integer value)
#rbd_max_clone_depth = 5

I don't understand why we have a default value of 1:
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager Traceback (most recent call last):
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager     result = task.execute(**arguments)
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 1034, in execute
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager     context, volume, **volume_spec)
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 492, in _create_from_source_volume
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager     model_update = self.driver.create_cloned_volume(volume, srcvol_ref)
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py", line 614, in create_cloned_volume
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager     depth = self._get_clone_depth(client, src_name)
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py", line 548, in _get_clone_depth
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager     return self._get_clone_depth(client, parent, depth + 1)
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py", line 548, in _get_clone_depth
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager     return self._get_clone_depth(client, parent, depth + 1)
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py", line 546, in _get_clone_depth
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager     (self.configuration.rbd_max_clone_depth))
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager Exception: clone depth exceeds limit of 1

--- Additional comment from Takashi Kajinami on 2020-10-20 12:09:28 UTC ---

Is the latest failure really related to the patch ?

I see that customer raised another case 02780999 and they mentioned that they have changed rbd_max_clone_depth to 1.
The traceback pasted in the case indicates that this rbd_max_clone_depth=1 caused that error.

IMO we should know which value was described in cinder.conf when they observed that error.
~~~
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager Traceback (most recent call last):
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager     result = task.execute(**arguments)
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 1034, in execute
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager     context, volume, **volume_spec)
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/flows/manager/create_volume.py", line 492, in _create_from_source_volume
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager     model_update = self.driver.create_cloned_volume(volume, srcvol_ref)
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py", line 614, in create_cloned_volume
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager     depth = self._get_clone_depth(client, src_name)
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py", line 548, in _get_clone_depth
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager     return self._get_clone_depth(client, parent, depth + 1)
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py", line 548, in _get_clone_depth
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager     return self._get_clone_depth(client, parent, depth + 1)
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager   File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py", line 546, in _get_clone_depth
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager     (self.configuration.rbd_max_clone_depth))
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager Exception: clone depth exceeds limit of 1
2020-10-20 11:22:33.012 55 ERROR cinder.volume.manager 
~~~

--- Additional comment from  on 2020-10-20 12:11:59 UTC ---

Sorry I looked at the wrong place:

0060-sosreport-njss1-ospdq-inf-ctrl2-2020-10-19-njehicr.tar.xz/sosreport-njss1-ospdq-inf-ctrl2-2020-10-19-njehicr/var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf:rbd_max_clone_depth=1

So they indeed put 1 here.

--- Additional comment from Freddy Wissing on 2020-10-21 14:29:48 UTC ---

Thank you guys! 

Should the customer be instructed to put the clone depth back to five in this case?  

And I want to confirm an understanding of this operation.

The default here is 5.  

The customer set it to 1, believing that the clones here are consuming a large amount of disk space.

This of course breaks the ability of it to flatten volumes created with a higher depth as noted.

But would doing so actually INCREASE the amount of space consumption in the ceph environment?  

It seems that it ultimately would, but would like to confirm.

--- Additional comment from errata-xmlrpc on 2020-10-29 15:35:05 UTC ---

This bug has been added to advisory RHBA-2020:65304 by Miguel Garcia Cruces (mgarciac)

--- Additional comment from errata-xmlrpc on 2020-10-29 15:35:06 UTC ---

Advisory RHBA-2020:65304 expected publication date changed from 2018-04-01 to 2020-10-21

--- Additional comment from errata-xmlrpc on 2020-10-30 13:51:29 UTC ---

Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHBA-2020:65304-01
https://errata.devel.redhat.com/advisory/65304

--- Additional comment from Miguel Garcia on 2020-10-30 13:56:24 UTC ---

Moving to MODIFIED as not all fixed-in-versions are present in 2020-10-06.2

Comment 2 Takashi Kajinami 2020-11-03 23:41:46 UTC
Sorry, we already have a cloned bug for 16.1...

*** This bug has been marked as a duplicate of bug 1888011 ***


Note You need to log in before you can comment on or make changes to this bug.