Bug 1395324

Summary: Ceph issues on OSP10 integrated with an existing older Ceph Cluster
Product: Red Hat OpenStack
Component: ceph
Version: 10.0 (Newton)
Status: CLOSED DUPLICATE
Severity: unspecified
Priority: unspecified
Reporter: Kiran Thyagaraja <kiran>
Assignee: John Fulton <johfulto>
QA Contact: Yogev Rabl <yrabl>
CC: jdurgin, johfulto, jomurphy, lhh, nlevine, srevivo
Target Milestone: ---
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2016-11-16 12:36:54 UTC

Description Kiran Thyagaraja 2016-11-15 16:45:35 UTC
Description of problem:
After successfully deploying an OSP10 cluster integrated with an external Ceph cluster, I tried to launch an instance and it failed. Peeking into the logs reveals what looks like an incompatibility between the Ceph client installed by OSP10 and the external Ceph cluster. The core error seen is: "rbd: import failed: (38) Function not implemented". This is likely a functional incompatibility between the version of Ceph available in OSP10 and the older version of Ceph running on the external cluster.

Version-Release number of selected component (if applicable):
OSP10 from http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/OpenStack/10.0-RHEL-7/passed_phase1/

External Ceph cluster version: ceph-0.94.1-16.el7cp.x86_64
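
For reference, one quick way to confirm the client/server version skew is to compare the Ceph packages on an overcloud compute node (the client side shipped with OSP10) with those on a monitor node of the external cluster. A simple sketch, run on each host:

# On an overcloud compute node: the Ceph client shipped with OSP10
rpm -q ceph-common
ceph --version

# On a monitor node of the external cluster: the server-side packages
rpm -q ceph ceph-mon
ceph --version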

How reproducible:
Always.

Steps to Reproduce:
1. Deploy OSP10 
2. Try launching an instance. The launch fails with a "no valid hosts found" error.
3. Peek into nova-compute.log on a compute node and witness the Ceph client errors.

Actual results:
Launching an instance fails due to an incompatibility between the Ceph clients installed by OSP10 and the external Ceph cluster running an older version of Ceph.

Expected results:
Launch an instance without any issues.


Additional info:
The version of Ceph running on the external ceph cluster:
ceph-0.94.1-16.el7cp.x86_64
ceph-common-0.94.1-16.el7cp.x86_64
ceph-mon-0.94.1-16.el7cp.x86_64

Error witnessed in nova-compute.log

2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [req-5c35003a-834a-493e-b8d0-aa8c23b05716 f7de11624f134a549c284b6e185f727c dc8f76a6b56d417a90d23ddc5cff40bf - - -] [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a] Instance failed to spawn
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a] Traceback (most recent call last):
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2078, in _build_resources
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     yield resources
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1920, in _build_and_run_instance
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     block_device_info=block_device_info)
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2584, in spawn
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     admin_pass=admin_password)
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2988, in _create_image
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     fallback_from_host)
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3088, in _create_and_inject_local_root
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     instance, size, fallback_from_host)
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6571, in _try_fetch_image_cache
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     size=size)
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 218, in cache
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     *args, **kwargs)
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 853, in create_image
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     self.driver.import_image(base, self.rbd_name)
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/storage/rbd_utils.py", line 327, in import_image
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     utils.execute('rbd', 'import', *args)
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/nova/utils.py", line 296, in execute
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     return processutils.execute(*cmd, **kwargs)
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]   File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 389, in execute
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]     cmd=sanitized_cmd)
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a] ProcessExecutionError: Unexpected error while running command.
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a] Command: rbd import --pool kt-ospd2_vms /var/lib/nova/instances/_base/6b2852052e8b1b9c6ca68d379837ff2ec029343e dbf600f4-de7e-4e45-a7b9-cdb653f8486a_disk --image-format=2 --id kt-ospd2 --conf /etc/ceph/ceph.conf
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a] Exit code: 38
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a] Stdout: u''
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a] Stderr: u'rbd: --pool is deprecated for import, use --dest-pool\n2016-11-14 02:22:58.751568 7f2a362fcd80 -1 librbd: error writing header: (38) Function not implemented\nrbd: image creation failed\n\rImporting image: 0% complete...failed.\nrbd: import failed: (38) Function not implemented\n'
2016-11-14 02:22:58.765 43152 ERROR nova.compute.manager [instance: dbf600f4-de7e-4e45-a7b9-cdb653f8486a]
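
For what it's worth, the same failure can be reproduced outside of Nova by re-running the rbd import command from the log above by hand on a compute node. A hedged sketch of that check, using the pool, id, and paths from the log (the second command assumes the Jewel rbd client's --image-feature option, and that restricting the image to the 'layering' feature, which Hammer understands, avoids the error):

# Reproduces the failure seen by nova-compute
rbd import --dest-pool kt-ospd2_vms \
    /var/lib/nova/instances/_base/6b2852052e8b1b9c6ca68d379837ff2ec029343e \
    test_import --image-format=2 --id kt-ospd2 --conf /etc/ceph/ceph.conf
# => rbd: import failed: (38) Function not implemented

# Assumption: limiting the image to the 'layering' feature should let the
# import against the 0.94 (Hammer) cluster succeed
rbd import --dest-pool kt-ospd2_vms \
    /var/lib/nova/instances/_base/6b2852052e8b1b9c6ca68d379837ff2ec029343e \
    test_import --image-format=2 --image-feature layering \
    --id kt-ospd2 --conf /etc/ceph/ceph.conf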

Comment 1 John Fulton 2016-11-15 22:46:26 UTC
Hi Kiran,

In order to make the Ceph 2 clients shipped in the OSP10 images work with a Ceph 1.3 server, you need to enable a flag in OSPd for backwards compatibility. To do this, please add the additional Heat configuration to your templates as described in:

 https://bugzilla.redhat.com/show_bug.cgi?id=1393581#c8

As linked in the BZ above, the OSP10 documentation should be updated in time to include the above step.
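
For anyone hitting the same thing before the docs are updated, here is a minimal sketch of such an environment file, assuming the backwards-compatibility flag from BZ 1393581#c8 is the RBD default-features setting (limiting new images to 'layering', which a 0.94/Hammer cluster understands); the file name and exact hiera path are illustrative only:

# Hypothetical environment file; the authoritative snippet is in BZ 1393581#c8
cat > ~/ceph-client-compat.yaml <<'EOF'
parameter_defaults:
  ExtraConfig:
    ceph::conf::args:
      client/rbd_default_features:
        value: "1"   # layering only, understood by Ceph 0.94 (Hammer)
EOF

# Then pass it to the overcloud deployment alongside the existing environments
openstack overcloud deploy --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-ceph-external.yaml \
  -e ~/ceph-client-compat.yaml
# (plus whatever other -e environment files the original deployment used)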

Also, relevant to OSP10 Ceph clients and external Ceph servers is the following: 

 https://bugzilla.redhat.com/show_bug.cgi?id=1394587#c21

The above isn't a documentation issue about including the flag, but a change that will land in OSP10. You can test as if the fix had already shipped by using the following for your ceph-external.yaml in /usr/share/openstack-tripleo-heat-templates/puppet/services:

 https://review.openstack.org/#/c/397819/1/puppet/services/ceph-external.yaml
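
If you want to try that before the change lands, one approach (a rough sketch; the actual file content comes from the review linked above) is to back up the shipped service template on the undercloud and drop the reviewed version in its place before re-running the deploy:

# Back up the template shipped with openstack-tripleo-heat-templates
sudo cp /usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-external.yaml \
        /usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-external.yaml.orig

# Replace it with the ceph-external.yaml from the review above (content not
# reproduced here), then re-run the same 'openstack overcloud deploy' command.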

Please let me know how that works.

Comment 2 Kiran Thyagaraja 2016-11-16 01:51:17 UTC
Update: the Ceph backwards-compatibility solution works, so this (not a) bug may be closed. Many thanks.

Comment 3 John Fulton 2016-11-16 12:36:54 UTC
Kiran, thanks for the update. Since this has the same root cause and fix as BZ 1393581, I am going to mark it as a duplicate.

*** This bug has been marked as a duplicate of bug 1393581 ***