Bug 1020979
| Field | Value |
| --- | --- |
| Summary | After configuring cinder for libgfapi, volumes create but do not attach |
| Product | Red Hat OpenStack |
| Component | openstack-nova |
| Version | 4.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Status | CLOSED ERRATA |
| Severity | unspecified |
| Priority | high |
| Reporter | Steve Reichard <sreichar> |
| Assignee | Eric Harney <eharney> |
| QA Contact | Dafna Ron <dron> |
| Docs Contact | |
| CC | abaron, bengland, dallan, ddomingo, dron, eblake, eharney, ewarszaw, grajaiya, jlabocki, ndipanov, perfbz, sclewis, scohen, shyu, smanjara, sreichar, vbellur, xqueralt, yeylon, yrabl |
| Target Milestone | z2 |
| Target Release | 4.0 |
| Keywords | Triaged, ZStream |
| Whiteboard | storage |
| Fixed In Version | openstack-nova-2013.2-7.el6ost |
| Doc Type | Bug Fix |
| Doc Text | Cause: the GlusterFS volume connector in Nova did not pass a port to libvirt in the GlusterFS disk spec. Consequence: attach failed with a libvirt error indicating the port field was missing. Fix: always provide the default Gluster port in the Nova code. Result: libgfapi attach works correctly with RHEL libvirt. |
| Story Points | --- |
| Clone Of | |
| Cloned To | 1038260 (view as bug list) |
| Environment | |
| Last Closed | 2014-03-04 19:03:39 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 956919, 978503, 1040649, 1045047, 1045196 |
| Attachments | logs (attachment 835380) |
Description (Steve Reichard, 2013-10-18 16:04:54 UTC)
RHEL's libvirt may require additional arguments compared to Fedora's; for example, see below. This error message does not exist on newer versions of libvirt.

http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/conf/domain_conf.c;h=86640137;hb=refs/heads/v0.10.2-maint#l3584
http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/conf/domain_conf.c;h=562d98b03;hb=HEAD#l4906

(Though the reporter mentions trying this, it is unclear why it did not fix the issue.)

I see the same issue: after enabling libgfapi, a volume cannot be attached to an instance.

Version: Havana
openstack-cinder-2013.2-1.el6.noarch

```
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.5 Beta (Santiago)
```

Deployed RHOS 4.0 from RDO using packstack, then followed the same steps as above.

On the cinder server:

```
openstack-config --set /etc/cinder/cinder.conf DEFAULT enabled_backends GLUSTER
openstack-config --set /etc/cinder/cinder.conf GLUSTER volume_backend_name GLUSTER
openstack-config --set /etc/cinder/cinder.conf GLUSTER volume_driver cinder.volume.drivers.glusterfs.GlusterfsDriver
openstack-config --set /etc/cinder/cinder.conf GLUSTER glusterfs_shares_config /etc/cinder/shares.conf
```

On the nova computes:

```
openstack-config --set /etc/nova/nova.conf DEFAULT qemu_allowed_storage_drivers gluster
sed -i "s/conf.source_ports = \[None\]/conf.source_ports = \[\'24007\'\]/" /usr/lib/python2.6/site-packages/nova/virt/libvirt/volume.py
```

Restarted libvirtd on the hypervisor, deleted all instances and volumes, then launched a new instance and created a new volume. The same setup works when I disable "qemu_allowed_storage_drivers = gluster".

Current mounts:

```
rhs-vm:cinder-vol on /var/lib/cinder/mnt/5029605b62dea1dcb02421f4ba422fcc type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
rhs-vm1:/glance-vol on /mnt/gluster type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
rhs-vm1:cinder-vol on /var/lib/nova/mnt/5029605b62dea1dcb02421f4ba422fcc type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
```

Saw these in cinder/scheduler.log:

```
11:49:08.604 6348 WARNING cinder.scheduler.host_manager [req-497673ed-646c-4ee2-ad13-dcb51ea06826 f87800a82e5e4277a36fa273b4db2fa9 6e5577905d77445ea93e363a898d2551] volume service is down or disabled. (host: rhs-client.lab.eng.blr.redhat.com)
2013-10-28 12:01:53.480 6348 CRITICAL cinder [-] need more than 0 values to unpack
2013-10-28 12:09:27.984 11026 WARNING cinder.scheduler.host_manager [req-f18b2dde-fb1c-47c2-9879-18720850375f f87800a82e5e4277a36fa273b4db2fa9 6e5577905d77445ea93e363a898d2551] volume service is down or disabled. (host: rhs-client.lab.eng.blr.redhat.com)
2013-10-28 12:22:40.804 11026 CRITICAL cinder [-] need more than 0 values to unpack
2013-10-28 12:23:19.862 17012 WARNING cinder.scheduler.host_manager [req-8f6556de-2135-46a8-9903-7170a20a5e56 f87800a82e5e4277a36fa273b4db2fa9 6e5577905d77445ea93e363a898d2551] volume service is down or disabled. (host: rhs-client.lab.eng.blr.redhat.com)
2013-10-28 12:39:41.591 17012 CRITICAL cinder [-] need more than 0 values to unpack
2013-10-28 12:44:47.941 22490 WARNING cinder.scheduler.host_manager [req-71b5e6d7-25fa-4f69-a44e-d78d730cf05c f87800a82e5e4277a36fa273b4db2fa9 6e5577905d77445ea93e363a898d2551] volume service is down or disabled.
```

These show the volume service as down, yet volume creation still works. While attaching the volume, the following messages appear in cinder/api.log; it seems the connection to the RHS server is timing out.
```
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/webob/dec.py", line 144, in __call__
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault return resp(environ, start_response)
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/webob/dec.py", line 130, in __call__
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault resp = self.call_func(req, *args, **self.kwargs)
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/webob/dec.py", line 195, in call_func
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault return self.func(req, *args, **kwargs)
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/api/openstack/wsgi.py", line 824, in __call__
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault content_type, body, accept)
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/api/openstack/wsgi.py", line 872, in _process_stack
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault action_result = self.dispatch(meth, request, action_args)
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/api/openstack/wsgi.py", line 948, in dispatch
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault return method(req=request, **action_args)
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/api/contrib/volume_actions.py", line 162, in _initialize_connection
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault connector)
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/volume/api.py", line 75, in wrapped
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault return func(self, context, target_obj, *args, **kwargs)
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/volume/api.py", line 442, in initialize_connection
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault connector)
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/volume/rpcapi.py", line 133, in initialize_connection
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault volume['host']))
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/openstack/common/rpc/proxy.py", line 89, in call
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault exc.info, real_topic, msg.get('method'))
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault Timeout: Timeout while waiting on RPC response - topic: "cinder-volume:rhs-client8.lab.eng.blr.redhat.com@GLUSTER", RPC method: "initialize_connection" info: "<unknown>"
2013-10-28 12:56:02.395 23814 TRACE cinder.api.middleware.fault
2013-10-28 12:57:06.492 23814 ERROR cinder.api.middleware.fault [req-a1998b71-a17e-4d2f-8844-276688b7ca5a f87800a82e5e4277a36fa273b4db2fa9 6e5577905d77445ea93e363a898d2551] Caught error: Timeout while waiting on RPC response - topic: "cinder-volume:rhs-client8.lab.eng.blr.redhat.com@GLUSTER", RPC method: "initialize_connection" info: "<unknown>"
```

This is blocking any progress for our performance work, as we run into it constantly.
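Before chasing the libgfapi side, the scheduler warnings and the RPC timeout above suggest that the cinder-volume service backing the GLUSTER host was not responding at all. A minimal health check, as a sketch assuming the RHEL 6 service names shipped with these packages and the log paths already referenced in this bug:

```bash
# On the host running the GLUSTER backend (rhs-client8 in the RPC topic above):
service openstack-cinder-volume status
tail -n 50 /var/log/cinder/volume.log

# The backend must also be able to mount every entry in the shares file;
# a share that fails to mount keeps the host reported as "down or disabled".
cat /etc/cinder/shares.conf
mount | grep glusterfs
```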
My testing on RHEL 6.5 + RHOS 4 indicates that the error:

```
2013-10-18 15:40:02.079+0000: 14766: error : virDomainDiskDefParseXML:3719 : XML error: missing port for host
```

can be resolved by:

```
--- a/nova/virt/libvirt/volume.py
+++ b/nova/virt/libvirt/volume.py
@@ -717,7 +717,7 @@ class LibvirtGlusterfsVolumeDriver(LibvirtBaseVolumeDriver):
         vol_name = data['export'].split('/')[1]
         source_host = data['export'].split('/')[0][:-1]

-        conf.source_ports = [None]
+        conf.source_ports = ['']
         conf.source_type = 'network'
         conf.source_protocol = 'gluster'
         conf.source_hosts = [source_host]
```

Steve: I'd be curious whether this helps in your configuration. Note that the Nova compute service must be restarted when changing this code (not libvirtd).

Eric, I tried the configuration from c#6 and restarted the nova compute service. I still cannot attach the volume. From nova/virt/libvirt/volume.py:

```
        vol_name = data['export'].split('/')[1]
        source_host = data['export'].split('/')[0][:-1]

        conf.source_ports = ['']
        conf.source_type = 'network'
        conf.source_protocol = 'gluster'
        conf.source_hosts = [source_host]
```

```
# nova volume-attach 997a27ad-d244-42f1-b41e-ba9740a93c5f 617597a9-8b42-4c3f-bc50-c29234a60380 auto
+----------+--------------------------------------+
| Property | Value                                |
+----------+--------------------------------------+
| device   | /dev/vdb                             |
| serverId | 997a27ad-d244-42f1-b41e-ba9740a93c5f |
| id       | 617597a9-8b42-4c3f-bc50-c29234a60380 |
| volumeId | 617597a9-8b42-4c3f-bc50-c29234a60380 |
+----------+--------------------------------------+

# cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| ID                                   | Status    | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| 617597a9-8b42-4c3f-bc50-c29234a60380 | available | myvol        | 10   | None        | false    |             |
| f0a2a186-604e-4839-a093-bf11a3848f13 | available | vol1         | 2    | None        | false    |             |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
```

(In reply to shilpa from comment #7)
> Eric, I tried the configuration from c#6 and restarted nova compute service.
> I still cannot attach the volume.

Were you seeing the "missing port for host" error? There could be a different issue -- have you set the required allow-insecure option, etc., on the server side?

(In reply to c#8) Yes, I do see "error : virDomainDiskDefParseXML:3719 : XML error: missing port for host", and I have set the allow-insecure option and the virt options on the gluster volume for cinder:

```
Options Reconfigured:
server.allow-insecure: on
storage.owner-uid: 165
storage.owner-gid: 165
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
```

Adding to c#9: "error : virDomainDiskDefParseXML:3719 : XML error: missing port for host" was seen during my previous tests reported in comment #3. I reinstalled RHOS and retested it yesterday and today with source_ports set to ['24007'], [''], or [None]; it did *not* produce the "missing port for host" error again.

I should have mentioned this in my earlier comment #10. What I meant to say was that I no longer see "missing port for host", but attaching the cinder volume still fails. Any clue on the error that prevents the volume from being attached after the port problem is fixed?
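One way to see whether the source_ports change actually reaches libvirt is to inspect the disk XML Nova generates for the gluster disk. This is a hedged sketch, not taken from the thread; the instance name is hypothetical and the fragment shown is the standard libvirt network-disk layout:

```bash
# INSTANCE is the libvirt domain name of the guest being attached to (see "virsh list").
INSTANCE=instance-00000001   # hypothetical name
virsh dumpxml "$INSTANCE" | grep -A 3 "protocol='gluster'"

# With the port fix in place, the gluster disk source should look roughly like:
#   <source protocol='gluster' name='cinder-vol/volume-<uuid>'>
#     <host name='rhs-vm1' port='24007'/>
#   </source>
```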
Is there anything relevant in compute's or libvirt's logs?

Yes, with the port set to ['24007'], I see relevant errors in the compute logs:

```
[instance: 7e5e0171-940e-4d86-97cc-01a061390f50] Attaching volume 503fec29-e6d3-4c66-b0d0-9afe3bbde02c to /dev/vdb
2013-12-04 14:53:38.185 14535 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): 10.70.36.32
2013-12-04 14:53:38.571 14535 ERROR nova.compute.manager [req-a474cb47-fe55-481c-b12f-5db24cebf803 3a64469ca7064f2fa5222470e912af73 c2cdad15f08d40f9a3a12d16ed2d2121] [instance: 7e5e0171-940e-4d86-97cc-01a061390f50] Failed to attach volume 503fec29-e6d3-4c66-b0d0-9afe3bbde02c at /dev/vdb
2013-12-04 14:53:38.571 14535 TRACE nova.compute.manager [instance: 7e5e0171-940e-4d86-97cc-01a061390f50] Traceback (most recent call last):
2013-12-04 14:53:38.571 14535 TRACE nova.compute.manager [instance: 7e5e0171-940e-4d86-97cc-01a061390f50] File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 3669, in _attach_volume
2013-12-04 14:53:38.571 14535 TRACE nova.compute.manager [instance: 7e5e0171-940e-4d86-97cc-01a061390f50] encryption=encryption)
2013-12-04 14:53:38.571 14535 TRACE nova.compute.manager [instance: 7e5e0171-940e-4d86-97cc-01a061390f50] File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 1071, in attach_volume
2013-12-04 14:53:38.571 14535 TRACE nova.compute.manager [instance: 7e5e0171-940e-4d86-97cc-01a061390f50] disk_info)
2013-12-04 14:53:38.571 14535 TRACE nova.compute.manager [instance: 7e5e0171-940e-4d86-97cc-01a061390f50] File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 1030, in volume_driver_method
2013-12-04 14:53:38.571 14535 TRACE nova.compute.manager [instance: 7e5e0171-940e-4d86-97cc-01a061390f50] return method(connection_info, *args, **kwargs)
2013-12-04 14:53:38.571 14535 TRACE nova.compute.manager [instance: 7e5e0171-940e-4d86-97cc-01a061390f50] File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/volume.py", line 829, in connect_volume
2013-12-04 14:53:38.571 14535 TRACE nova.compute.manager [instance: 7e5e0171-940e-4d86-97cc-01a061390f50] vol_name = data['export'].split('/')[1]
2013-12-04 14:53:38.571 14535 TRACE nova.compute.manager [instance: 7e5e0171-940e-4d86-97cc-01a061390f50] IndexError: list index out of range
```

In libvirt.log I see a few errors, but with a different timestamp, so I am guessing they are not relevant:

```
2013-12-04 09:17:14.362+0000: 12197: error : virNetSocketReadWire:1194 : End of file while reading data: Input/output error
2013-12-04 09:17:14.363+0000: 12197: error : virNetSocketReadWire:1194 : End of file while reading data: Input/output error
2013-12-04 09:21:20.260+0000: 12197: error : virNetSocketReadWire:1194 : End of file while reading data: Input/output error
```

That error makes it look like there is something unexpected in the connection_info object -- like it doesn't have a "/" in the share configured in the Cinder gluster_shares file. Can you post your Cinder gluster_shares file contents?

Moving shilpa's issue over to bug 1038260.

Eric, would attaching a volume after installing with packstack plus a gluster share be enough to test this, or is there anything else we need to check here?

(In reply to Dafna Ron from comment #19)
> eric, would attach of volume after installing with packstack+gluster share be
> enough to test this or is there anything else we need to check here?

You have to ensure that libgfapi is used, by setting qemu_allowed_storage_drivers in Nova.
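For reference, the IndexError above comes straight from the split in connect_volume: vol_name = data['export'].split('/')[1] assumes the Cinder share is written as host:/volume. A hedged sketch of a well-formed shares file, using a hypothetical hostname; the path matches the glusterfs_shares_config value set earlier in this bug:

```bash
# Each line is <gluster-server>:/<volume-name>. An entry without the "/"
# (e.g. "rhs-vm1:cinder-vol") leaves nothing at index 1 when Nova splits
# the export string on "/", producing the IndexError in the compute log above.
echo "rhs-vm1.example.com:/cinder-vol" > /etc/cinder/shares.conf

# Restart the Cinder backend so the share list is re-read and re-mounted.
service openstack-cinder-volume restart
```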
For testing: be sure to set the Gluster volume to accept connections from insecure ports:

```
gluster volume set <volume_name> allow-insecure True
```

I tried the configuration above, but attach fails with a libvirt error. Will attach the logs.

```
openstack-nova-compute-2013.2-9.el6ost.noarch
openstack-cinder-2013.2-8.el6ost.noarch
libvirt-0.10.2-29.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.3.x86_64
qemu-img-rhev-0.12.1.2-2.415.el6_5.3.x86_64
```

```
2013-12-11 18:53:48.910 14903 ERROR nova.openstack.common.rpc.amqp [req-ba311b30-21ff-4209-8321-543544956e54 674776fc1eea47718301aeacbab072b3 5cea8d9e58c841dfb03b1cda755b539d] Exception during message handling
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line 461, in _process_data
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp **args)
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/exception.py", line 90, in wrapped
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp payload)
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/exception.py", line 73, in wrapped
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp return f(self, context, *args, **kw)
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 243, in decorated_function
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp pass
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 229, in decorated_function
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 271, in decorated_function
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp e, sys.exc_info())
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 258, in decorated_function
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 3646, in attach_volume
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp context, instance, mountpoint)
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 3641, in attach_volume
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp mountpoint, instance)
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 3688, in _attach_volume
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp connector)
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 3678, in _attach_volume
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp encryption=encryption)
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 1105, in attach_volume
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp disk_dev)
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 1092, in attach_volume
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp virt_dom.attachDeviceFlags(conf.to_xml(), flags)
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/eventlet/tpool.py", line 187, in doit
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp result = proxy_call(self._autowrap, f, *args, **kwargs)
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/eventlet/tpool.py", line 147, in proxy_call
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp rv = execute(f,*args,**kwargs)
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/eventlet/tpool.py", line 76, in tworker
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp rv = meth(*args,**kwargs)
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp File "/usr/lib64/python2.6/site-packages/libvirt.py", line 419, in attachDeviceFlags
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp libvirtError: internal error unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-virtio-disk1' could not be initialized
2013-12-11 18:53:48.910 14903 TRACE nova.openstack.common.rpc.amqp
2013-12-11 18:53:48.914 14903 DEBUG qpid.messaging.io.raw [-] SENT[2d54710]: '\x0f\x00\x00:\x00\x00\x00\x00\x00\x00\x00\x00\x02\x01\x01\x00\x00(cf301c00-bc69-4a4a-bba7-e727311abd46:720\x0f\x00\x00\x1c\x00\x00\x00\x00\x00\x00\x00\x00\x02\x07\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' writeable /usr/lib/python2.6/site-packages/qpid/messaging/driver.py:480
2013-12-11 18:53:48.914 14903 DEBUG qpid.messaging.io.raw [-] READ[2d54710]: '\x0f\x00\x00:\x00\x00\x00\x00\x00\x00\x00\x00\x02\x02\x01\x00\x00(cf301c00-bc69-4a4a-bba7-e727311abd46:720\x0f\x00\x00\x1c\x00\x00\x00\x00\x00\x00\x00\x00\x02\x07\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' readable /usr/lib/python2.6/site-packages/qpid/messaging/driver.py:416
2013-12-11 18:53:48.915 14903 DEBUG qpid.messaging.io.ops [-] RCVD[2d54710]: SessionAttached(name='cf301c00-bc69-4a4a-bba7-e727311abd46:720') write /usr/lib/python2.6/site-pa
```

Created attachment 835380 [details]
logs
The fix made for the original "error : virDomainDiskDefParseXML:3719 : XML error: missing port for host" problem looks good; Dafna's latest report here seems to be describing a different issue.

Eric Blake,

Can you provide some insight on what to look for to troubleshoot the error seen in comment #22 when a libgfapi attach fails?

```
raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
libvirtError: internal error unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-virtio-disk1' could not be initialized
```

This seems to be a general attach failure, but it's not clear why it may happen.

In reply to comment #27: could it perhaps be an SELinux issue? I reproduced the issue of not being able to attach a volume to an instance, with the error "raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self) libvirtError: internal error unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-virtio-disk1' could not be initialized":

1. FUSE mount the volume.
2. Set SELinux to Enforcing.
3. Attach a cinder volume to an instance; it fails with the error below.

```
2013-12-19 02:08:35.696 30393 AUDIT nova.compute.manager [req-c0bc96c1-14ae-4830-8ea2-fca07af4e80e 575f0769ede540bc8bb279f568336f1e ccb8126a56b9408c9a1895c71cdf3ea9] [instance: 400f3581-b24e-4be0-90f1-0c4a4bb5f217] Attaching volume fe61fb62-4043-468a-b55f-d3c6a2b3d74a to /dev/vdd
2013-12-19 02:08:36.094 30393 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): 10.70.36.32
2013-12-19 02:08:48.573 30393 ERROR nova.compute.manager [req-c0bc96c1-14ae-4830-8ea2-fca07af4e80e 575f0769ede540bc8bb279f568336f1e ccb8126a56b9408c9a1895c71cdf3ea9] [instance: 400f3581-b24e-4be0-90f1-0c4a4bb5f217] Failed to attach volume fe61fb62-4043-468a-b55f-d3c6a2b3d74a at /dev/vdd
2013-12-19 02:08:48.573 30393 TRACE nova.compute.manager [instance: 400f3581-b24e-4be0-90f1-0c4a4bb5f217] Traceback (most recent call last):
2013-12-19 02:08:48.573 30393 TRACE nova.compute.manager [instance: 400f3581-b24e-4be0-90f1-0c4a4bb5f217] File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 3670, in _attach_volume
2013-12-19 02:08:48.573 30393 TRACE nova.compute.manager [instance: 400f3581-b24e-4be0-90f1-0c4a4bb5f217] encryption=encryption)
2013-12-19 02:08:48.573 30393 TRACE nova.compute.manager [instance: 400f3581-b24e-4be0-90f1-0c4a4bb5f217] File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 1105, in attach_volume
2013-12-19 02:08:48.573 30393 TRACE nova.compute.manager [instance: 400f3581-b24e-4be0-90f1-0c4a4bb5f217] disk_dev)
2013-12-19 02:08:48.573 30393 TRACE nova.compute.manager [instance: 400f3581-b24e-4be0-90f1-0c4a4bb5f217] File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 1092, in attach_volume
2013-12-19 02:08:48.573 30393 TRACE nova.compute.manager [instance: 400f3581-b24e-4be0-90f1-0c4a4bb5f217] virt_dom.attachDeviceFlags(conf.to_xml(), flags)
2013-12-19 02:08:48.573 30393 TRACE nova.compute.manager [instance: 400f3581-b24e-4be0-90f1-0c4a4bb5f217] File "/usr/lib/python2.6/site-packages/eventlet/tpool.py", line 187, in doit
2013-12-19 02:08:48.573 30393 TRACE nova.compute.manager [instance: 400f3581-b24e-4be0-90f1-0c4a4bb5f217] result = proxy_call(self._autowrap, f, *args, **kwargs)
2013-12-19 02:08:48.573 30393 TRACE nova.compute.manager [instance: 400f3581-b24e-4be0-90f1-0c4a4bb5f217] File "/usr/lib/python2.6/site-packages/eventlet/tpool.py", line 147, in proxy_call
2013-12-19 02:08:48.573 30393 TRACE nova.compute.manager [instance: 400f3581-b24e-4be0-90f1-0c4a4bb5f217] rv = execute(f,*args,**kwargs)
2013-12-19 02:08:48.573 30393 TRACE nova.compute.manager [instance: 400f3581-b24e-4be0-90f1-0c4a4bb5f217] File "/usr/lib/python2.6/site-packages/eventlet/tpool.py", line 76, in tworker
2013-12-19 02:08:48.573 30393 TRACE nova.compute.manager [instance: 400f3581-b24e-4be0-90f1-0c4a4bb5f217] rv = meth(*args,**kwargs)
2013-12-19 02:08:48.573 30393 TRACE nova.compute.manager [instance: 400f3581-b24e-4be0-90f1-0c4a4bb5f217] File "/usr/lib64/python2.6/site-packages/libvirt.py", line 419, in attachDeviceFlags
2013-12-19 02:08:48.573 30393 TRACE nova.compute.manager [instance: 400f3581-b24e-4be0-90f1-0c4a4bb5f217] if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
2013-12-19 02:08:48.573 30393 TRACE nova.compute.manager [instance: 400f3581-b24e-4be0-90f1-0c4a4bb5f217] libvirtError: internal error unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-virtio-disk3' could not be initialized
2013-12-19 02:08:48.573 30393 TRACE nova.compute.manager [instance: 400f3581-b24e-4be0-90f1-0c4a4bb5f217]
```

4. Set SELinux back to Permissive, and the volume attach works.

```
2013-12-19 02:14:25.503 30393 AUDIT nova.compute.manager [req-021cf494-448b-40e5-b017-5e9b379a0e7c 575f0769ede540bc8bb279f568336f1e ccb8126a56b9408c9a1895c71cdf3ea9] [instance: 400f3581-b24e-4be0-90f1-0c4a4bb5f217] Attaching volume fe61fb62-4043-468a-b55f-d3c6a2b3d74a to /dev/vdd
2013-12-19 02:14:25.913 30393 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): 10.70.36.32
```

So it does not look like a problem with libgfapi.

(In reply to shilpa from comment #28)
> In reply to comment #27, It could perhaps be an SELinux issue?

I think it can be (at least I've heard others say that setting SELinux to permissive makes it go away), but I believe we tried this on Dafna's machine and it didn't help. My guess is that the same particular error message can come up for a variety of underlying reasons. Did you see any logged AVCs in audit.log that appeared related?

Eric, OK.
Seeing if I can reproduce it on a libgfapi setup. As for AVCs, I saw these in the logs:

```
type=AVC msg=audit(1387399466.564:6600760): avc: denied { open } for pid=20013 comm="qemu-kvm" name="volume-fe61fb62-4043-468a-b55f-d3c6a2b3d74a" dev=fuse ino=12247896744984864522 scontext=unconfined_u:system_r:svirt_t:s0:c556,c643 tcontext=system_u:object_r:fusefs_t:s0 tclass=file
type=SYSCALL msg=audit(1387399466.564:6600760): arch=c000003e syscall=2 success=yes exit=32 a0=7fa6e6f61da0 a1=800 a2=0 a3=1 items=0 ppid=1 pid=20013 auid=0 uid=107 gid=107 euid=107 suid=107 fsuid=107 egid=107 sgid=107 fsgid=107 tty=(none) ses=243 comm="qemu-kvm" exe="/usr/libexec/qemu-kvm" subj=unconfined_u:system_r:svirt_t:s0:c556,c643 key=(null)
type=AVC msg=audit(1387399466.568:6600761): avc: denied { getattr } for pid=20013 comm="qemu-kvm" name="/" dev=fuse ino=1 scontext=unconfined_u:system_r:svirt_t:s0:c556,c643 tcontext=system_u:object_r:fusefs_t:s0 tclass=filesystem
type=SYSCALL msg=audit(1387399466.568:6600761): arch=c000003e syscall=138 success=yes exit=0 a0=20 a1=7fff0644b800 a2=3 a3=40 items=0 ppid=1 pid=20013 auid=0 uid=107 gid=107 euid=107 suid=107 fsuid=107 egid=107 sgid=107 fsgid=107 tty=(none) ses=243 comm="qemu-kvm" exe="/usr/libexec/qemu-kvm" subj=unconfined_u:system_r:svirt_t:s0:c556,c643 key=(null)
```

Eric, on my libgfapi setup, even with SELinux enforcing, I can still attach a volume to an instance. As you said, it could be something else.

Setting needinfo on eblake for the question in comment #27.

(In reply to Eric Harney from comment #27)
> Eric Blake,
>
> Can you provide some insight on what to look for to troubleshoot the error
> seen in comment #22 when a libgfapi attach fails?
>
> raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
> libvirtError: internal error unable to execute QEMU command
> '__com.redhat_drive_add': Device 'drive-virtio-disk1' could not be
> initialized
>
> This seems to be a general attach failure but it's not clear why it may
> happen.

Can you access the drive natively via qemu-img? That is, does 'qemu-img info gluster://host/volume/path' work where just libvirt is failing? The error message you are quoting comes from qemu (libvirt is just the messenger in this case), so I'm not sure where else to look; my guess is SELinux labeling. If going permissive makes it work, then those AVCs are relevant (and libvirt should be relabeling things as needed). If FUSE mounting is failing but libgfapi is succeeding, then that does indeed sound like SELinux labeling (FUSE mounts are local file systems and need labels, whereas libgfapi bypasses the file system and is not impacted by labels).

Yogev, can you please test this with SELinux permissive?

We ran into this same issue [1] on ewarszaw's setup, and it seems to be because allow-insecure was set for the Gluster volume but rpc-auth-allow-insecure was not also set for the Gluster server. I'm not sure the former has any effect without the latter. (Both are noted in doc bug 1052336.)

[1] libvirtError: internal error unable to execute QEMU command '__com.redhat_drive_add': Device 'drive-virtio-disk1' could not be initialized

I haven't found any reason for this to occur at this point outside of misconfiguration of the Gluster server. (Security options are required both on the server globally and on the individual Gluster volume.)

If someone still manages to hit this, please let me know.
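Pulling the server-side pieces from this thread together: the volume-level allow-insecure option, the daemon-level rpc-auth-allow-insecure option, and Eric Blake's qemu-img check. This is a sketch with placeholder volume, host, and UUID names; the glusterd.vol path is the usual default rather than something confirmed in this bug:

```bash
# Per-volume: let the bricks accept connections from non-privileged ports
# ("allow-insecure True" earlier in this bug sets the same option).
gluster volume set cinder-vol server.allow-insecure on

# Daemon-wide: glusterd must also accept insecure ports. Add this line to the
# "volume management" block of /etc/glusterfs/glusterd.vol, then restart glusterd:
#     option rpc-auth-allow-insecure on
service glusterd restart

# Independent of libvirt/Nova: if qemu-img can read the image over gluster://,
# the server-side security settings are probably not the problem.
# (Replace UUID with the Cinder volume ID.)
qemu-img info gluster://rhs-vm1.example.com/cinder-vol/volume-UUID
```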
(In reply to Eric Harney from comment #36)
> I haven't found any reasons for this to occur at this point outside of
> misconfiguration of the Gluster server. (Security options required both on
> the server globally and the individual Gluster volume.)
>
> If someone still manages to hit this, please let me know.

Eric, please post the configuration required so that we can test the gluster server we have. Yogev, please work with Eric on this.

(In reply to Dafna Ron from comment #37)
Nothing special for this bug. Just the steps from bug 1052336 again. Bug 1061963 is also out there to make sure they get written down in all the right places.

Re-adding the needinfo I erased...

(In reply to Eric Harney from comment #38)
> (In reply to Dafna Ron from comment #37)
> Nothing special for this bug. Just the steps from bug 1052336 again. Bug
> 1061963 is also out there to make sure they get written down in all the
> right places.

Eric, you said that this bug comes from misconfiguration, but we installed the systems based on the configuration given in all these bugs, and it was still reproduced by both Yogev and myself. I don't mind testing this again and working on it with you; however, you need to be clear and write down the exact configuration so that none of us waste our time. If you think that this bug is actually misconfiguration, please provide exact instructions on what we need to configure on both the gluster and cinder sides before running this test again.

(In reply to Dafna Ron from comment #40)
> (In reply to Eric Harney from comment #38)
> > (In reply to Dafna Ron from comment #37)
> > Nothing special for this bug. Just the steps from bug 1052336 again. Bug
> > 1061963 is also out there to make sure they get written down in all the
> > right places.
>
> Eric,
> you said that this bug comes from misconfiguration, but we installed the
> systems based on the configuration given in all these bugs, and it was
> still reproduced by both Yogev and myself.
> I don't mind testing this again and working on it with you; however, you
> need to be clear and write down the exact configuration so that none of us
> waste our time.
> If you think that this bug is actually misconfiguration, please provide
> exact instructions on what we need to configure on both the gluster and
> cinder sides before running this test again.

This bug originated from an incompatibility between Nova and the version of libvirt used on RHEL. We patched that here, spawned off another bug, 1038260, for an issue also found in this bug, and patched that one too. The bugs above cover the GlusterFS setup requirements from the documentation side. The other failure you hit in comment 22 matches the one I looked at on another machine in comment 35, which is being handled by these documentation updates. (The initial guess was that it was SELinux related, but that does not seem to be so.)

I think that covers all of the issues we've hit here -- basically, now we just need to continue testing the initial scenario of attaching volumes with libgfapi enabled. I don't know how to make that fail at this point. If it does, I would like to look at the machine where it happens.

I configured everything as stated above.
I was able to create the volume, but the instance is stuck in BUILD when we try to boot it from the volume, and when we try to attach we get stuck in the attaching state:

```
[root@puma31 ~(keystone_admin)]# nova list
+--------------------------------------+---------+--------+------------+-------------+--------------------------+
| ID                                   | Name    | Status | Task State | Power State | Networks                 |
+--------------------------------------+---------+--------+------------+-------------+--------------------------+
| 94de0702-6756-447a-b4b7-875c5e1f5fee | gluster | BUILD  | spawning   | NOSTATE     | novanetwork=192.168.32.2 |
| 0ae38934-895c-493d-a7a8-b649c89146ee | guster2 | ACTIVE | None       | Running     | novanetwork=192.168.32.3 |
+--------------------------------------+---------+--------+------------+-------------+--------------------------+

[root@puma31 ~(keystone_admin)]# cinder list
+--------------------------------------+-----------+----------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status    | Display Name   | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+-----------+----------------+------+-------------+----------+--------------------------------------+
| b606fa38-f4a1-491b-ae9e-04ffe75ed370 | in-use    | dafna-gluster  | 10   | None        | false    | 94de0702-6756-447a-b4b7-875c5e1f5fee |
| c2b9dfb5-18df-4cb7-8c17-8f98b2c75479 | attaching | dafna-gluster1 | 10   | None        | false    |                                      |
+--------------------------------------+-----------+----------------+------+-------------+----------+--------------------------------------+
```

Moving this back to dev.

> nova.conf on my hosts was changed to:
>
> # Currently supported protocols: [gluster] (list value)
> qemu_allowed_storage_drivers= gluster
>
> I added the configuration listed here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1020979#c43
>
> This is my volume configuration in gluster:
>
> Volume Name: Dafna-openstack
> Type: Replicate
> Volume ID: e1f315dc-ced1-41f9-b721-de141cb35ce8
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 10.35.XX.XX:/var/lib/share/Dafna
> Brick2: tiger.qa.lab.tlv.redhat.com:/var/lib/share/Dafna
> Options Reconfigured:
> storage.owner-uid: 165
> storage.owner-gid: 165
> server.root-squash: off
> auth.allow: *
> user.cifs: enable
> nfs.disable: off
> server.allow-insecure: on
>
> It seems that when I boot an instance from the volume the instance is in
> spawning a long time and in shut off state in virsh:
>
> [root@puma32 ~(keystone_admin)]# nova list
> +--------------------------------------+---------+--------+------------+-------------+--------------------------+
> | ID                                   | Name    | Status | Task State | Power State | Networks                 |
> +--------------------------------------+---------+--------+------------+-------------+--------------------------+
> | 94de0702-6756-447a-b4b7-875c5e1f5fee | gluster | BUILD  | spawning   | NOSTATE     | novanetwork=192.168.32.2 |
> +--------------------------------------+---------+--------+------------+-------------+--------------------------+
>
> [root@puma32 ~(keystone_admin)]# vim /etc/nova/nova.conf
> [root@puma32 ~(keystone_admin)]# nova list
> +--------------------------------------+---------+--------+------------+-------------+--------------------------+
> | ID                                   | Name    | Status | Task State | Power State | Networks                 |
> +--------------------------------------+---------+--------+------------+-------------+--------------------------+
> | 94de0702-6756-447a-b4b7-875c5e1f5fee | gluster | BUILD  | spawning   | NOSTATE     | novanetwork=192.168.32.2 |
> +--------------------------------------+---------+--------+------------+-------------+--------------------------+
>
> [root@puma32 ~(keystone_admin)]# virsh -r list
>  Id    Name                           State
> ----------------------------------------------------
>  1     instance-00000007              shut off
>
> also, mount command shows we are still mounting using fuse:
>
> 10.35.XX.XX:/Dafna-openstack on /var/lib/nova/mnt/59ed8cb64fc8948968a29181234051a2 type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
>

The share will be mounted whether libgfapi is being used or not. The way to confirm that libgfapi is being used is to get the VM id from "virsh list" and then run "virsh domblklist <vm id>". The fuse case will show a mounted file path (/var/lib/...), and the libgfapi case will show <gluster_volume_name>/volume-<id>.

> attach of a volume is also stuck:
>
> [root@puma31 ~(keystone_admin)]# cinder list
> +--------------------------------------+-----------+----------------+------+-------------+----------+--------------------------------------+
> | ID                                   | Status    | Display Name   | Size | Volume Type | Bootable | Attached to                          |
> +--------------------------------------+-----------+----------------+------+-------------+----------+--------------------------------------+
> | b606fa38-f4a1-491b-ae9e-04ffe75ed370 | in-use    | dafna-gluster  | 10   | None        | false    | 94de0702-6756-447a-b4b7-875c5e1f5fee |
> | c2b9dfb5-18df-4cb7-8c17-8f98b2c75479 | attaching | dafna-gluster1 | 10   | None        | false    |                                      |
> +--------------------------------------+-----------+----------------+------+-------------+----------+--------------------------------------+

(In reply to Dafna Ron from comment #45)
> > nova.conf on my hosts was changed to:
> >
> > # Currently supported protocols: [gluster] (list value)
> > qemu_allowed_storage_drivers= gluster

I created a new Cinder volume and attempted an attach to the same VM -- this fails with:

```
2014-02-20 17:25:38.506 18122 TRACE nova.openstack.common.rpc.amqp if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
2014-02-20 17:25:38.506 18122 TRACE nova.openstack.common.rpc.amqp libvirtError: Timed out during operation: cannot acquire state change lock
```

I launched a new VM and attempted an attach; that one hangs in "attaching" and also shows that error in Nova's compute.log. I think we'll need to get some input from the Nova/virt folks about what's happening here.

Verified on:

```
openstack-nova-cert-2013.2.2-2.el6ost.noarch
python-novaclient-2.15.0-2.el6ost.noarch
openstack-nova-common-2013.2.2-2.el6ost.noarch
openstack-nova-api-2013.2.2-2.el6ost.noarch
openstack-nova-compute-2013.2.2-2.el6ost.noarch
openstack-nova-conductor-2013.2.2-2.el6ost.noarch
openstack-nova-novncproxy-2013.2.2-2.el6ost.noarch
openstack-nova-scheduler-2013.2.2-2.el6ost.noarch
python-nova-2013.2.2-2.el6ost.noarch
openstack-nova-console-2013.2.2-2.el6ost.noarch
openstack-nova-network-2013.2.2-2.el6ost.noarch
python-cinderclient-1.0.7-2.el6ost.noarch
python-cinder-2013.2.2-1.el6ost.noarch
openstack-cinder-2013.2.2-1.el6ost.noarch
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0231.html
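For reference, the libgfapi-versus-FUSE check Eric describes above, as a runnable sketch; the domain name instance-00000007 is taken from the virsh output earlier in this bug and may differ on other setups:

```bash
# List running domains, then inspect the block devices of the guest under test.
virsh list
virsh domblklist instance-00000007

# FUSE attach     -> Source is a mounted path such as /var/lib/nova/mnt/<hash>/volume-<uuid>
# libgfapi attach -> Source is <gluster_volume_name>/volume-<uuid>
```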