Bug 1523358

Summary: cinder-volume is not initializing the RBD driver properly
Product: Red Hat OpenStack Reporter: David Hill <dhill>
Component: openstack-cinderAssignee: Gorka Eguileor <geguileo>
Status: CLOSED ERRATA QA Contact: Avi Avraham <aavraham>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 11.0 (Ocata)CC: eharney, geguileo, jobernar, mschuppe, pgrist, pmorey, skinjo, srevivo, ssigwald, tshefi
Target Milestone: ---Keywords: Triaged, ZStream
Target Release: 11.0 (Ocata)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-cinder-10.0.6-2.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-02-13 16:29:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description David Hill 2017-12-07 19:51:39 UTC
Description of problem:
cinder-volume is not initializing the RBD driver properly and fails to report itself as UP.


2017-12-07 19:56:03.315 266295 DEBUG cinder.volume.drivers.rbd [req-f2384250-37a3-4551-9d7a-67563c0b7286 - - - - -] connecting to ceph (timeout=-1). _connect_to_rados /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:300
2017-12-07 19:56:03.387 266295 DEBUG cinder.volume.drivers.rbd [req-f2384250-37a3-4551-9d7a-67563c0b7286 - - - - -] connecting to ceph (timeout=-1). _connect_to_rados /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:300
2017-12-07 19:56:17.760 266295 DEBUG cinder.volume.drivers.rbd [req-f2384250-37a3-4551-9d7a-67563c0b7286 - - - - -] connecting to ceph (timeout=-1). _connect_to_rados /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:300
2017-12-07 19:56:21.015 266295 DEBUG cinder.volume.drivers.rbd [req-f2384250-37a3-4551-9d7a-67563c0b7286 - - - - -] connecting to ceph (timeout=-1). _connect_to_rados /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:300
2017-12-07 19:56:33.169 266295 DEBUG cinder.volume.drivers.rbd [req-f2384250-37a3-4551-9d7a-67563c0b7286 - - - - -] connecting to ceph (timeout=-1). _connect_to_rados /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:300





  File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 722, in run_service
    service.start()
  File "/usr/lib/python2.7/site-packages/cinder/service.py", line 241, in start
    service_id=Service.service_id)
  File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 442, in init_host
    self.driver.init_capabilities()
  File "/usr/lib/python2.7/site-packages/cinder/volume/driver.py", line 739, in init_capabilities
    stats = self.get_volume_stats(True)
  File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py", line 427, in get_volume_stats
    self._update_volume_stats()
  File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py", line 413, in _update_volume_stats
    self._get_usage_info()
  File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py", line 368, in _get_usage_info
    with RBDVolumeProxy(self, t, read_only=True) as v:
  File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py", line 111, in __init__
    client, ioctx = driver._connect_to_rados(pool, remote, timeout)
  File "/usr/lib/python2.7/site-packages/cinder/utils.py", line 796, in _wrapper
    return r.call(f, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py", line 313, in _connect_to_rados
    traceback.print_stack()





Every 2.0s: cinder-manage service list                                                                                                                                    Thu Dec  7 21:48:25 2017

Option "logdir" from group "DEFAULT" is deprecated. Use option "log-dir" from group "DEFAULT".
Binary           Host                                 Zone             Status     State Updated At           RPC Version  Object Version  Cluster
cinder-scheduler hostgroup                            nova             enabled    :-)   2017-12-07 19:48:19  3.5          1.21
cinder-scheduler hostgroup                            nova             enabled    XXX   2017-11-16 11:40:37  3.5          1.21
cinder-volume    hostgroup@ceph                       nova             enabled    XXX   2017-12-07 19:47:18  3.10         1.21

Version-Release number of selected component (if applicable):


How reproducible:
This deployment

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 David Hill 2017-12-07 20:00:44 UTC
We might be hitting this bug : https://bugs.launchpad.net/cinder/+bug/1649956

Comment 2 David Hill 2017-12-07 20:12:09 UTC
[root@server cinder]# time rados ls --pool 'volumes'  | wc -l
848404

real    0m31.634s
user    0m0.978s
sys     0m1.683s

Comment 3 David Hill 2017-12-07 20:32:59 UTC
I've added some debugging information in rbd.py and it looks like it's iterating the volumes really really really slowly:

2017-12-07 22:30:06.638 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] connecting to ceph (timeout=-1). _connect_to_rados /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:300
2017-12-07 22:30:06.725 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] ASDF volume-029d8bad-aa48-4aef-8a66-e880418f5eb3 _get_usage_info /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:365
2017-12-07 22:30:06.725 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] connecting to ceph (timeout=-1). _connect_to_rados /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:300
2017-12-07 22:30:21.421 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] ASDF 27917287424 _get_usage_info /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:371
2017-12-07 22:30:21.453 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] ASDF volume-060ae761-e35e-4f40-8553-042dcbb30239 _get_usage_info /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:365
2017-12-07 22:30:21.454 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] connecting to ceph (timeout=-1). _connect_to_rados /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:300
2017-12-07 22:30:23.865 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] ASDF 5368709120 _get_usage_info /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:371
2017-12-07 22:30:23.893 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] ASDF volume-0659b956-d3a7-44c5-be8c-855d3e5bdc6c _get_usage_info /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:365
2017-12-07 22:30:23.894 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] connecting to ceph (timeout=-1). _connect_to_rados /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:300
2017-12-07 22:30:33.468 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] ASDF 27917287424 _get_usage_info /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:371
2017-12-07 22:30:33.497 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] ASDF volume-0960c137-5129-4806-be5e-a7e64598efad _get_usage_info /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:365
2017-12-07 22:30:33.498 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] connecting to ceph (timeout=-1). _connect_to_rados /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:300
2017-12-07 22:30:48.295 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] ASDF 45097156608 _get_usage_info /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:371
2017-12-07 22:30:48.322 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] ASDF volume-0a425dec-7c34-4219-89f9-dd543bab2e32 _get_usage_info /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:365
2017-12-07 22:30:48.322 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] connecting to ceph (timeout=-1). _connect_to_rados /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:300
2017-12-07 22:31:01.564 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] ASDF 34359738368 _get_usage_info /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:371
2017-12-07 22:31:01.596 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] ASDF volume-0ad5a30c-821d-45f5-be66-b142f4602124 _get_usage_info /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:365
2017-12-07 22:31:01.597 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] connecting to ceph (timeout=-1). _connect_to_rados /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:300
2017-12-07 22:31:06.305 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] ASDF 10737418240 _get_usage_info /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:371
2017-12-07 22:31:06.338 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] ASDF volume-0b3a54c6-5681-402a-a044-50a4f5bb9c97 _get_usage_info /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:365
2017-12-07 22:31:06.338 343617 DEBUG cinder.volume.drivers.rbd [req-16d80259-2e52-452c-ac5e-6fd448c3545b - - - - -] connecting to ceph (timeout=-1). _connect_to_rados /usr/lib/python2.7/site-packages/cinder/volume/drivers/rbd.py:300

Comment 5 Shinobu KINJO 2017-12-08 00:19:36 UTC
If there is no networking issue or no configuration issue, "rbd" python module should be able to connect RADOS cluster via "rados" python module.

Comment 7 Martin Schuppert 2017-12-08 10:56:12 UTC
The slow cinder-volume start is https://bugs.launchpad.net/cinder/+bug/1649956. 

The patch from
 https://review.openstack.org/#/c/508455/1/cinder/volume/drivers/rbd.py has fixed the slow cinder-volume start.
Before it took depending on the number of volumes between 10 and 30 minutes for the service to come up. Now it is just seconds. 

Also volume create + delete work.

I see https://review.openstack.org/#/c/508455 got abandoned , not sure why. Is it in favor of the patch mentioned in https://bugs.launchpad.net/cinder/+bug/1649956/comments/9 ?

Comment 15 Tzach Shefi 2018-01-29 08:43:15 UTC
Verified on 
openstack-cinder-10.0.6-9.el7ost.noarch

Created two Cinder ceph backed volumes 10G 20G, filled them up with random data. 
The external ceph cluster used also had additional volumes, not related to this deployment. 

Restarted controller twice, the only thing I noticed was that by the time I ran 
cinder-manage service list
cinder-scheduler hostgroup     nova  enabled    XXX 

Scheduler would still be down, but once I reran cinder-manage service list again it too would be up :-) as well. 


cinder get-pools --detail shows backend stats are updated/correct.

Comment 18 errata-xmlrpc 2018-02-13 16:29:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0306