Bug 1273194 - Cinder cannot create volumes after Ceph packages are updated
Summary: Cinder cannot create volumes after Ceph packages are updated
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
Version: 5.0 (RHEL 6)
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 5.0 (RHEL 6)
Assignee: Jon Bernard
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-10-19 22:02 UTC by nalmond
Modified: 2019-09-12 09:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-07 13:31:57 UTC
Target Upstream Version:
Embargoed:


Description nalmond 2015-10-19 22:02:54 UTC
After updating these Ceph packages:

ceph-common
librados2
librbd1
python-rados
python-rbd

from version 1:0.80.8-5.el6cp to 1:0.80.8.15.el6cp, Cinder can no longer create volumes and writes a traceback to volumes.log with this error message:

OSError: /usr/lib64/librbd.so.1: undefined symbol: _ZNK14SimpleThrottle13pending_error

A yum downgrade/rollback fixes this issue.

Comment 2 Jon Bernard 2015-11-05 16:23:15 UTC
An unresolved symbol in any dynamically linked library suggests a packaging bug.  If librbd cannot be loaded as a result, any user (cinder in this case) will fail.  We need to look closer at librbd packaging for that particular version.
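One way to test the packaging hypothesis is to load the library in a fresh process, where no older copy of a dependency can already be in memory. The following is a minimal sketch in Python 3 using ctypes; the helper name try_load is hypothetical, and passing RTLD_NOW is an assumption made here so that all symbols are resolved at load time rather than lazily.

```python
import ctypes
import os


def try_load(path):
    """Try to dlopen() a shared library and report the outcome.

    Returns (True, None) on success, or (False, message) if the
    dynamic loader rejects the library, for example because of an
    undefined symbol or a missing file.
    """
    # RTLD_NOW asks the loader to resolve every symbol immediately.
    # Fall back to the ctypes default where os.RTLD_NOW is unavailable.
    mode = getattr(os, "RTLD_NOW", ctypes.DEFAULT_MODE)
    try:
        ctypes.CDLL(path, mode=mode)
    except OSError as exc:
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    # Path taken from the error message in the description.
    ok, err = try_load("/usr/lib64/librbd.so.1")
    print("loads cleanly" if ok else "load failed: %s" % err)
```

If the library loads cleanly in a new process but fails inside a long-running service, the installed packages are probably consistent with each other and the problem lies in what that service already has loaded.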

Comment 3 Sergey Gotliv 2015-11-09 10:46:08 UTC
Josh,

I guess we have to reassign it to Ceph.

Comment 4 Josh Durgin 2015-11-10 02:20:23 UTC
There are internals in librbd and librados like this that are accidentally exposed in firefly. These internal ABIs are not stable, so this kind of problem occurs when mismatched versions are loaded.

Since cinder may have the old version of librados in memory and then try to load the new version of librbd, this sort of error can happen.

These internal symbols are not exported in hammer (downstream 1.3.0), and for upgrades like this of older versions we may need to document a workaround, i.e. restart cinder-volume (and nova-compute if using rbd for ephemeral disks) after upgrading librbd.

Other librbd users like qemu are much less likely to be affected since they only open librbd/librados once, at start up. The python bindings are effectively using dlopen(), so there are larger windows during which a conflict can arise as packages are installed, and cinder-volume or nova-compute re-load new versions of the libraries.
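To see whether a running service is still holding an old copy of these libraries, one option is to inspect /proc/<pid>/maps. On Linux, a mapped file whose on-disk copy has been unlinked (as typically happens when a package update replaces it) is listed with a " (deleted)" suffix. The following is a minimal sketch in Python 3; the function names and the default library name filter are illustrative assumptions, not part of cinder, nova, or Ceph.

```python
DELETED_SUFFIX = " (deleted)"


def stale_libraries(maps_text, names=("librbd", "librados")):
    """Return sorted paths of mapped libraries replaced on disk.

    maps_text is the content of /proc/<pid>/maps. Only mappings whose
    file name contains one of `names` are reported.
    """
    stale = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        if not path.endswith(DELETED_SUFFIX):
            continue
        real = path[: -len(DELETED_SUFFIX)]
        basename = real.rsplit("/", 1)[-1]
        if any(name in basename for name in names):
            stale.add(real)
    return sorted(stale)


def stale_libraries_for_pid(pid):
    """Read /proc/<pid>/maps and report stale Ceph client libraries."""
    with open("/proc/%d/maps" % pid) as handle:
        return stale_libraries(handle.read())
```

A non-empty result for the cinder-volume or nova-compute process after a package update would suggest that the service needs the restart described above.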

Comment 5 nalmond 2015-12-01 16:19:43 UTC
Is there a documented workaround for this, or will it be addressed in a later version?

Comment 6 Josh Durgin 2015-12-11 00:50:02 UTC
Since there are no further releases of RHCS 1.2, where the bug is present, it does not make sense to document workarounds for the issue.

We should be pushing for customers to upgrade to RHCS 1.3, which will not have this problem.

Comment 7 Sergey Gotliv 2015-12-15 09:54:24 UTC
Nick,

Please recommend that your customer upgrade to RHCS 1.3 or restart the relevant services as described in comment #4.

Comment 8 Sofer Athlan-Guyot 2018-09-04 10:03:28 UTC
Hi,

Re-opening this bug because we've got a new instance of it with[1]

  - librbd1-10.2.10-28.el7cp.x86_64
  - openstack-nova-compute-14.1.0-26.el7ost.noarch

on a compute node of a director installation during the upgrade from OSP9 to OSP10.

This basically happens after running yum upgrade on all nodes.

The exact error is:

Build of instance cc5b7484-e201-496c-af5b-75297a7f8870 aborted: /lib64/librbd.so.1: undefined symbol: _ZN8librados5Rados15aio_watch_flushEPNS_13AioCompletionE', u'code': 500, u'details': u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1787, 

A workaround is to restart nova-compute.

But I wonder if there could be a more "permanent" fix.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1625166

