Description of problem:

There is a Nova multipath issue related to the cleanup of multipath devices in a Fibre Channel environment. The root cause appears to be the way LibvirtFibreChannelVolumeDriver detects multipath IDs in connect_volume: it runs "multipath -l <dev>" once and parses the response to get a multipath ID [1]. If the environment is slow, it takes time to create the multipath device, so Nova gets back an empty response and assumes the volume is connected via a single path. Later, in disconnect_volume, it reports that something went wrong with the multipath tools [2] and deletes a single path, so the multipath device remains stale. We examined the connection info of 300 volumes in Nova's block_device_mapping table and found that some of them are missing a multipath ID.

[1] https://github.com/openstack/nova/blob/icehouse-eol/nova/virt/libvirt/volume.py#L1006
[2] https://github.com/openstack/nova/blob/icehouse-eol/nova/virt/libvirt/volume.py#L1043

Version-Release number of selected component (if applicable):
openstack-nova-common-2014.1.4-4.el6ost.noarch
openstack-nova-compute-2014.1.4-4.el6ost.noarch

How reproducible:
Intermittent

Steps to Reproduce:
Spawn 4 concurrent threads of the following loop to create and delete 128 VMs in total (in pseudocode; a runnable sketch follows at the end of this report):

    for x in range(0, 32):
        vol = cinder_create()
        vm = nova_boot(vol)
        nova_delete(vm)
        cinder_delete(vol)

Actual results:
After all instances are deleted, there are faulty paths left on the controller:

    # multipath -ll
    30000000000000000 dm-11 3PARdata,VV
    size=38G features='1 queue_if_no_path' hwhandler='0' wp=rw
    `-+- policy='round-robin 0' prio=0 status=enabled
      |- 8:0:2:1 sdj 8:144 failed faulty running
      |- 7:0:2:1 sdh 8:112 failed faulty running
      `- 8:0:2:4 sdp 8:240 failed faulty running
    #

Expected results:
`multipath -ll` returns no paths in a failed faulty running state.

Additional info:
A similar problem occurs on the controller node during the same stress test, as documented in externally linked Red Hat BZ 1255523.
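For convenience, here is a minimal runnable sketch of the stress test above. It is a sketch under assumptions, not the exact harness we used: the AUTH credentials, the IMAGE and FLAVOR names, the client versions, and the boot-from-volume block_device_mapping format are placeholders to be adapted to the environment under test (Icehouse-era python-novaclient and python-cinderclient are assumed).

    # Hypothetical reproduction harness; adapt AUTH, IMAGE, and FLAVOR
    # to the environment under test.
    import threading
    import time

    from cinderclient import client as cinder_client
    from novaclient import client as nova_client

    AUTH = dict(username='admin', api_key='secret', project_id='admin',
                auth_url='http://controller:5000/v2.0')  # placeholder credentials
    IMAGE = 'cirros'    # assumed image name
    FLAVOR = 'm1.tiny'  # assumed flavor name

    def wait_for(check, timeout=300, interval=5):
        """Poll check() until it returns True or the timeout expires."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            if check():
                return
            time.sleep(interval)
        raise RuntimeError('timed out waiting for resource state')

    def worker(thread_id):
        cinder = cinder_client.Client('1', **AUTH)
        nova = nova_client.Client('2', **AUTH)
        image = nova.images.find(name=IMAGE)
        flavor = nova.flavors.find(name=FLAVOR)
        for x in range(32):
            name = 'stress-%d-%d' % (thread_id, x)
            # Create a 1 GB volume and wait until it is available.
            vol = cinder.volumes.create(1, display_name=name)
            wait_for(lambda: cinder.volumes.get(vol.id).status == 'available')
            # Boot from the volume; "<id>:::0" means do not delete on terminate.
            vm = nova.servers.create(name, image, flavor,
                                     block_device_mapping={'vda': '%s:::0' % vol.id})
            wait_for(lambda: nova.servers.get(vm.id).status == 'ACTIVE')
            # Tear down: delete the VM, wait for the volume to detach,
            # then delete the volume.
            nova.servers.delete(vm)
            wait_for(lambda: cinder.volumes.get(vol.id).status == 'available')
            cinder.volumes.delete(vol)

    # Four concurrent create/boot/delete loops, 128 VMs in total.
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()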
It looks very similar to BZ#1115375. I think this attachment [1] from Cinder's BZ#1093416 is what you need here, at least for 5.0. Unfortunately, the more complete solution that was recently introduced in the os-brick library [2] is not backportable to 5.0 and would have to be properly tested.

[1] https://bugzilla.redhat.com/attachment.cgi?id=892577
[2] https://review.openstack.org/#/c/213389/
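For context, the common shape of both fixes is to retry multipath discovery rather than trusting a single "multipath -l" call before falling back to the single-path assumption. The sketch below is illustrative only, not the attached patch or the actual os-brick code; find_multipath_device and its retries/interval parameters are made up for the example.

    # Illustrative retry pattern only -- not the attached patch or the
    # actual os-brick implementation.
    import subprocess
    import time

    def find_multipath_device(device, retries=5, interval=2):
        """Return the multipath ID (WWID) for `device`, or None.

        On a slow FC environment the multipath device may not exist yet
        right after the volume is attached, so a single "multipath -l"
        lookup is not enough; retry before concluding single path.
        """
        for _ in range(retries):
            out = subprocess.check_output(['multipath', '-l', device])
            lines = out.decode().splitlines()
            if lines and lines[0].strip():
                # The first token of the first output line is the WWID,
                # e.g. "30000000000000000 dm-11 3PARdata,VV".
                return lines[0].split()[0]
            time.sleep(interval)
        return None

connect_volume would then store the returned ID in the connection info so that disconnect_volume can flush the whole multipath device instead of removing a single path.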
Automation passed: https://rhos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/RHOS/view/RHOS5/job/rhos-jenkins-rhos-5.0-puddle-rhel-6.7-multi-node-packstack-neutron-ml2-gre-rabbitmq-tempest-git-all/lastCompletedBuild/testReport/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2075.html