Bug 1395324 - Ceph issues on OSP10 integrated with an existing older Ceph Cluster
Summary: Ceph issues on OSP10 integrated with an existing older Ceph Cluster
Keywords:
Status: CLOSED DUPLICATE of bug 1393581
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: ceph
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: John Fulton
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-15 16:45 UTC by Kiran Thyagaraja
Modified: 2017-02-06 19:34 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-16 12:36:54 UTC
Target Upstream Version:



Description Kiran Thyagaraja 2016-11-15 16:45:35 UTC
Description of problem:
After successfully deploying an OSP10 cluster integrated with an external Ceph cluster, I tried to launch an instance and it failed. The logs suggest an incompatibility between the Ceph client installed by OSP10 and the external Ceph cluster. The core error is: "rbd: import failed: (38) Function not implemented". This looks like a functional incompatibility between the Ceph version shipped with OSP10 and the older Ceph version running on the external cluster.

Version-Release number of selected component (if applicable):
OSP10 from http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/OpenStack/10.0-RHEL-7/passed_phase1/

External Ceph cluster version: ceph-0.94.1-16.el7cp.x86_64
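
For completeness, the client-side Ceph version on the overcloud nodes can be compared against the external cluster. A minimal sketch, assuming a typical TripleO layout (the heat-admin user and the overcloud-compute-0 host name are assumptions, not values from this deployment):

# From the undercloud: check the Ceph client packages installed on a compute node.
# (heat-admin and overcloud-compute-0 are placeholders for a typical TripleO deployment.)
ssh heat-admin@overcloud-compute-0 'rpm -q ceph-common librbd1'

# On a monitor of the external cluster: report the installed Ceph version
# (0.94.x is Hammer / RHCS 1.3, 10.2.x is Jewel / RHCS 2).
ceph --version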

How reproducible:
Always.

Steps to Reproduce:
1. Deploy OSP10 integrated with an external Ceph cluster running an older version of Ceph.
2. Try launching an instance. The launch fails with a "No valid host was found" error.
3. Check nova-compute.log on a compute node and observe Ceph client errors.

Actual results:
Launching an instance fails due to an incompatibility between the Ceph clients installed by OSP10 and the external Ceph cluster running an older version of Ceph.

Expected results:
Instances launch without any issues.


Additional info:
Ceph packages installed on the external Ceph cluster:
ceph-0.94.1-16.el7cp.x86_64
ceph-common-0.94.1-16.el7cp.x86_64
ceph-mon-0.94.1-16.el7cp.x86_64

Error witnessed in nova-compute.log

2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [req-5c35003a-834a-493e-b8d0-aa8c23b05716 f7de11624f134a549c284b6e185f727c dc8f76a6b56d417a90d23ddc5cff40bf - - -] [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a] Instance failed to spawn
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a] Traceback (most recent call last):
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2078, in _build_resources
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     yield resources
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1920, in _build_and_run_instance
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     block_device_info=block_device_info)
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2584, in spawn
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     admin_pass=admin_password)
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2988, in _create_image
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     fallback_from_host)
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3088, in _create_and_inject_local_root
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     instance, size, fallback_from_host)
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6571, in _try_fetch_image_cache
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     size=size)
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 218, in cache
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     *args, **kwargs)
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 853, in create_image
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     self.driver.import_image(base, self.rbd_name)
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/storage/rbd_utils.py", line 327, in import_image
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     utils.execute('rbd', 'import', *args)
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/nova/utils.py", line 296, in execute
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     return processutils.execute(*cmd, **kwargs)
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 389, in execute
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     cmd=sanitized_cmd)
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a] ProcessExecutionError: Unexpected error while running command.
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a] Command: rbd import --pool kt-ospd2_vms /var/lib/nova/instances/_base/6b2852052e8b1b9c6ca68d379837ff2ec029343e dbf600f4-de7e-4e45-a7b9-cdb653f8486a_disk --image-format=2 --id kt-ospd2 --conf /etc/ceph/ceph.conf
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a] Exit code: 38
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a] Stdout: u''
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a] Stderr: u'rbd: --pool is deprecated for import, use --dest-pool\n2016-11-14 02:22:58.751568 7f2a362fcd80 -1 librbd: error writing header: (38) Function not implemented\nrbd: image creation failed\n\rImporting image: 0% complete...failed.\nrbd: import failed: (38) Function not implemented\n'
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]
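
For reference, the failure can be reproduced outside of nova by re-running the import by hand. A hedged sketch using the pool, base image path, and cephx ID from the command in the traceback above; the --image-feature layering restriction is an assumption about what a Hammer/RHCS 1.3 cluster supports, and test_disk is just a throwaway image name:

# With the Jewel client defaults this is expected to fail against the Hammer
# cluster with "(38) Function not implemented", exactly as in the traceback.
rbd import --dest-pool kt-ospd2_vms \
    /var/lib/nova/instances/_base/6b2852052e8b1b9c6ca68d379837ff2ec029343e \
    test_disk --image-format 2 --id kt-ospd2 --conf /etc/ceph/ceph.conf

# Restricting the image to the layering feature only (the feature set Hammer
# implements) is expected to succeed, pointing at a feature-bit mismatch
# rather than a connectivity or authentication problem.
rbd import --dest-pool kt-ospd2_vms \
    /var/lib/nova/instances/_base/6b2852052e8b1b9c6ca68d379837ff2ec029343e \
    test_disk --image-format 2 --image-feature layering \
    --id kt-ospd2 --conf /etc/ceph/ceph.conf

# Remove the throwaway image afterwards.
rbd rm kt-ospd2_vms/test_disk --id kt-ospd2 --conf /etc/ceph/ceph.conf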

Comment 1 John Fulton 2016-11-15 22:46:26 UTC
Hi Kiran,

In order to make the Ceph 2 clients shipped in the OSP10 images work with a Ceph 1.3 server, you need to enable a flag in OSPd for backwards compatibility. To do this, please add the additional Heat configuration to your templates as described in:

 https://bugzilla.redhat.com/show_bug.cgi?id=1393581#c8

As noted in the BZ linked above, the OSP10 documentation should be updated in time to include the above step.
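
For reference, a minimal sketch of the kind of Heat override that comment describes, assuming the backwards-compatibility flag is rbd_default_features set to "1" (layering only, which Hammer/RHCS 1.3 understands); treat BZ 1393581#c8 as the authoritative syntax, and the file name below is just an example:

# ASSUMPTION: the flag is rbd_default_features=1; see BZ 1393581#c8 for the
# authoritative parameter name and syntax.
cat > ~/ceph-hammer-compat.yaml <<'EOF'
parameter_defaults:
  ExtraConfig:
    ceph::conf::args:
      client/rbd_default_features:
        value: "1"
EOF

# Re-run the overcloud deploy with the extra environment file appended. Keep
# every -e argument from the original deployment; only the standard external
# Ceph environment file is shown here for brevity.
openstack overcloud deploy --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-ceph-external.yaml \
  -e ~/ceph-hammer-compat.yaml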

Also relevant to OSP10 Ceph clients and external Ceph servers is the following:

 https://bugzilla.redhat.com/show_bug.cgi?id=1394587#c21

The above isn't a documentation issue about including the flag, but a change that will land in OSP10 itself. You can test as if the fix had already shipped by using the following for your ceph-external.yaml in /usr/share/openstack-tripleo-heat-templates/puppet/services:

 https://review.openstack.org/#/c/397819/1/puppet/services/ceph-external.yaml
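
If you want to try it, a short sketch of swapping in the patched template, assuming the file from the review has already been saved on the undercloud as ~/ceph-external.yaml (the .orig backup name is just an example):

# Back up the shipped service template, replace it with the patched copy, and
# then re-run the same overcloud deploy command so the change takes effect.
sudo cp /usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-external.yaml \
        /usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-external.yaml.orig
sudo cp ~/ceph-external.yaml \
        /usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-external.yaml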

Please let me know how that works.

Comment 2 Kiran Thyagaraja 2016-11-16 01:51:17 UTC
Update: The Ceph backwards-compatibility solution works, so this (not a) bug may be closed. Many thanks.

Comment 3 John Fulton 2016-11-16 12:36:54 UTC
Kiran, thanks for the update. Since this has the same root cause and fix as BZ 1393581, I am going to mark it as a duplicate.

*** This bug has been marked as a duplicate of bug 1393581 ***

