Bug 1370598 - multipathd segfault during volume attach
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 7.0 (Kilo)
Hardware: All
OS: Linux
Priority: high
Severity: urgent
Target Milestone: async
Target Release: 7.0 (Kilo)
Assigned To: Lee Yarwood
QA Contact: Prasanth Anbalagan
Keywords: ZStream
Depends On: 1367850
Blocks:
Reported: 2016-08-26 13:54 EDT by Jack Waterworth
Modified: 2017-12-06 14:40 EST
Fixed In Version: openstack-nova-2015.1.4-18.el7ost
Last Closed: 2017-02-15 17:56:32 EST
Type: Bug

External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2017:0282
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: openstack-cinder, openstack-glance, and openstack-nova security update
Last Updated: 2017-02-15 22:52:44 EST

Description Jack Waterworth 2016-08-26 13:54:10 EDT
Description of problem:
Nova fails during volume attach. Upon further inspection, it appears that multipathd has segfaulted, and nova then fails when it attempts to read the multipath output.

Version-Release number of selected component (if applicable):
openstack-nova-compute-2015.1.4-1.el7ost.noarch

How reproducible:
sometimes

Steps to Reproduce:
1. Attach volume to instance

Actual results:
volume attach fails because multipathd is not running

Expected results:
multipathd should not be in a stopped state due to segfault

Additional info:

I have a bugzilla opened with the device-mapper team for this issue.

https://bugzilla.redhat.com/show_bug.cgi?id=1367850

The team has suggested adding further checks to device-mapper-multipath to avoid the segfault, but states that LUN shuffling on the SAN side is what gets multipathing into this bad state.

I suspect this is either caused by nova not correctly cleaning up paths on attach/detach, or by cinder when devices are created and deleted.
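
For the failure described under "Actual results" (the attach fails because multipathd is not running), a caller could check the daemon before trying to read multipath output. The following is a minimal sketch, not nova's actual code: the function names are hypothetical, and it assumes a systemd host where the daemon runs as the `multipathd` unit.

```python
import subprocess


def multipathd_is_active(run=subprocess.call):
    """Return True if systemd reports the multipathd unit as active.

    `run` is injectable so the check can be exercised without systemd.
    The unit name "multipathd" is an assumption about the host.
    """
    try:
        # `systemctl is-active --quiet UNIT` exits 0 only when the unit is active.
        return run(["systemctl", "is-active", "--quiet", "multipathd"]) == 0
    except OSError:
        # systemctl is missing or not executable on this host.
        return False


def ensure_multipathd(run=subprocess.call):
    """Fail early with a clear message instead of failing later on empty output."""
    if not multipathd_is_active(run):
        raise RuntimeError("multipathd is not running; refusing to attach volume")
```

This only makes the failure easier to diagnose. It does not address the segfault itself, which is tracked in bug 1367850.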
Comment 5 Paul Grist 2016-08-30 11:20:14 EDT
Looking to confirm, but I think the next step here is to get the updates to the 2 customers discussed in: 

https://bugzilla.redhat.com/show_bug.cgi?id=1367850#c13
Comment 6 Ben Marzinski 2016-08-30 13:25:10 EDT
I'm also going to try to trigger some LUN reassignments on my machines to see if I can recreate this, both with and without the latest multipath code.
Comment 8 Jack Waterworth 2016-09-06 15:29:56 EDT
The customer has updated the rpms from the other bz. They haven't had the issue occur again, but they are seeing some messages from mpath:

# multipath -ll 36005076802810b39780000000000012f
Sep 02 15:12:40 | 65:80: path wwid appears to have changed. Using old wwid.
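
To track how often this warning shows up, the message can be picked out of captured `multipath -ll` output. A minimal sketch, assuming the line format is exactly the one shown above; the function name is hypothetical, and the `65:80` field is only presumed to be a major:minor device number.

```python
import re

# Matches lines such as:
#   Sep 02 15:12:40 | 65:80: path wwid appears to have changed. Using old wwid.
WWID_CHANGED = re.compile(
    r"^(?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2}) \| "
    r"(?P<dev>\d+:\d+): path wwid appears to have changed"
)


def find_wwid_changes(output):
    """Return (timestamp, device) pairs for every WWID-change warning in output."""
    hits = []
    for line in output.splitlines():
        match = WWID_CHANGED.match(line.strip())
        if match:
            hits.append((match.group("stamp"), match.group("dev")))
    return hits


sample = "Sep 02 15:12:40 | 65:80: path wwid appears to have changed. Using old wwid."
print(find_wwid_changes(sample))
```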
Comment 9 Ben Marzinski 2016-09-06 16:00:19 EDT
This is what multipathd prints when it catches the issue and keeps itself from crashing. However, I wrote that fix to deal with a bug where the LUN itself wasn't changing, just its WWID (because of user error). In the current case, the LUN is changing. Probably the best thing for multipathd to do is to disable and then remove any path when we detect that its WWID has changed (and possibly re-add the path, so multipath can continue to use it with the new information). That way multipath will do the best that it can to save users from themselves (like I said, we still do not support remapping LUNs while they are in use, and currently, there is no way to ).

The ideal solution would be to not remap in-use LUNs, since nothing supports this.
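
The remove-and-re-add behaviour proposed above can be modelled roughly as follows. This is an illustrative sketch only, not multipathd code (multipathd is written in C): the `paths` dictionary, the function name, and the returned action strings are all hypothetical.

```python
def reconcile_path(paths, dev, observed_wwid):
    """Update `paths` (device -> recorded WWID) and return the action taken.

    Models the proposal: when a path's WWID no longer matches the recorded
    one, drop the stale path and re-add it under the new WWID.
    """
    recorded = paths.get(dev)
    if recorded is None:
        # First time this device is seen: just record it.
        paths[dev] = observed_wwid
        return "added"
    if recorded == observed_wwid:
        return "unchanged"
    # WWID changed underneath us: remove the stale entry, then re-add it
    # so the path is used with the new information.
    del paths[dev]
    paths[dev] = observed_wwid
    return "readded"


paths = {"65:80": "36005076802810b39780000000000012f"}
print(reconcile_path(paths, "65:80", "example-new-wwid"))
```

In this model the old WWID is never reused for the changed device, which is the opposite of the current "Using old wwid" behaviour.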
Comment 24 errata-xmlrpc 2017-02-15 17:56:32 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0282.html
