Bug 1370598 - multipathd segfault during volume attach
Summary: multipathd segfault during volume attach
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 7.0 (Kilo)
Hardware: All
OS: Linux
Priority: high
Severity: urgent
Target Milestone: async
Target Release: 7.0 (Kilo)
Assignee: Lee Yarwood
QA Contact: Prasanth Anbalagan
URL:
Whiteboard:
Depends On: 1367850
Blocks:
Reported: 2016-08-26 17:54 UTC by Jack Waterworth
Modified: 2022-07-09 09:44 UTC
CC List: 22 users

Fixed In Version: openstack-nova-2015.1.4-18.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-15 22:56:32 UTC
Target Upstream Version:
Embargoed:



Links
Red Hat Issue Tracker OSP-16919 (last updated 2022-07-09 09:44:01 UTC)
Red Hat Product Errata RHSA-2017:0282 (priority: normal; status: SHIPPED_LIVE): "Moderate: openstack-cinder, openstack-glance, and openstack-nova security update" (last updated 2017-02-16 03:52:44 UTC)

Description Jack Waterworth 2016-08-26 17:54:10 UTC
Description of problem:
Nova fails during volume attach. On further inspection, it appears that multipathd has segfaulted, and Nova fails when it tries to read the multipath output.

Version-Release number of selected component (if applicable):
openstack-nova-compute-2015.1.4-1.el7ost.noarch

How reproducible:
Sometimes

Steps to Reproduce:
1. Attach volume to instance

Actual results:
Volume attach fails because multipathd is not running.

Expected results:
multipathd should not segfault and be left in a stopped state.

Additional info:

I have opened a Bugzilla with the device-mapper team for this issue:

https://bugzilla.redhat.com/show_bug.cgi?id=1367850

The team has suggested adding further checks to device-mapper-multipath to avoid the segfault, but states that LUN shuffling on the SAN side is what puts multipath into this bad state.

I suspect this is caused either by Nova not correctly cleaning up paths on attach/detach, or by Cinder when devices are created and deleted.

Comment 5 Paul Grist 2016-08-30 15:20:14 UTC
Looking to confirm, but I think the next step here is to get the updates to the two customers discussed in:

https://bugzilla.redhat.com/show_bug.cgi?id=1367850#c13

Comment 6 Ben Marzinski 2016-08-30 17:25:10 UTC
I'm also going to try to trigger some LUN reassignments on my machines to see if I can recreate this, both with and without the latest multipath code.

Comment 8 Jack Waterworth 2016-09-06 19:29:56 UTC
The customer has updated the RPMs from the other BZ. They haven't seen the issue occur again, but they are seeing some messages from multipath:

# multipath -ll 36005076802810b39780000000000012f
Sep 02 15:12:40 | 65:80: path wwid appears to have changed. Using old wwid.
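When this warning shows up across many hosts or log files, it can help to pull the affected paths out mechanically. The following is a minimal Python sketch that scans text lines for the warning quoted above and returns each timestamp together with the path's major:minor device number. The regular expression is an assumption modeled only on the single message in this report; other device-mapper-multipath versions may word or format it differently. `find_wwid_changes` and `WwidChangeEvent` are hypothetical names, not part of any multipath tool.

```python
import re
from typing import List, NamedTuple


class WwidChangeEvent(NamedTuple):
    """One 'path wwid appears to have changed' warning."""
    timestamp: str  # e.g. "Sep 02 15:12:40" (the message carries no year)
    devt: str       # path's block device major:minor, e.g. "65:80"


# Assumed format, modeled on the one message quoted in this report.
_WWID_CHANGED = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) \| "
    r"(?P<devt>\d+:\d+): path wwid appears to have changed"
)


def find_wwid_changes(lines: List[str]) -> List[WwidChangeEvent]:
    """Return every WWID-change warning found in the given output lines."""
    events = []
    for line in lines:
        m = _WWID_CHANGED.match(line.strip())
        if m:
            events.append(WwidChangeEvent(m.group("ts"), m.group("devt")))
    return events


if __name__ == "__main__":
    sample = [
        "# multipath -ll 36005076802810b39780000000000012f",
        "Sep 02 15:12:40 | 65:80: path wwid appears to have changed. Using old wwid.",
    ]
    for ev in find_wwid_changes(sample):
        print(ev.timestamp, ev.devt)
```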

Comment 9 Ben Marzinski 2016-09-06 20:00:19 UTC
This is what multipathd prints when it catches the issue and keeps itself from crashing. However, I wrote that fix to deal with a bug where the LUN itself wasn't changing, just its WWID (because of user error). In the current case, the LUN is changing. Probably the best thing for multipathd to do is to disable and then remove any path when we detect that its WWID has changed (and possibly re-add the path, so multipath can continue to use it with the new information). That way multipath will do the best that it can to save users from themselves (like I said, we still do not support remapping LUNs while they are in use, and currently, there is no way to ).

The ideal solution would be to not remap in-use LUNs, since nothing supports this.
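The path handling proposed in Comment 9 can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions: `PathTable` and `on_wwid_check` are hypothetical names, the model ignores I/O, queueing, and udev events entirely, and it is not how multipathd is implemented. It only shows the policy: when a path reports a WWID that differs from the map it belongs to, take it out of that map, and optionally re-add it under the new WWID.

```python
from typing import Dict, Set


class PathTable:
    """Hypothetical toy model, not multipathd's real data structures:
    each multipath map is keyed by WWID and holds its path devices."""

    def __init__(self) -> None:
        self.maps: Dict[str, Set[str]] = {}
        self.orphans: Set[str] = set()  # removed paths that were not re-added

    def add_path(self, devt: str, wwid: str) -> None:
        """Place a path (major:minor) in the map for the given WWID."""
        self.maps.setdefault(wwid, set()).add(devt)

    def on_wwid_check(self, devt: str, old_wwid: str, new_wwid: str,
                      readd: bool = True) -> None:
        """Apply the proposed policy when a path reports a different WWID."""
        if old_wwid == new_wwid:
            return  # same LUN behind the path; leave it alone
        # Disable and remove: take the path out of its old map so it no
        # longer carries I/O for the LUN that map represents.
        members = self.maps.get(old_wwid)
        if members is not None:
            members.discard(devt)
            if not members:
                del self.maps[old_wwid]  # drop maps with no paths left
        # Optionally re-add the path under the WWID it now reports.
        if readd:
            self.add_path(devt, new_wwid)
        else:
            self.orphans.add(devt)


if __name__ == "__main__":
    # Placeholder WWIDs and a second placeholder path, for illustration only.
    table = PathTable()
    table.add_path("65:80", "wwid-A")
    table.add_path("65:96", "wwid-A")
    table.on_wwid_check("65:80", "wwid-A", "wwid-B")
    print(table.maps)
```

Removing the path from its old map before any re-add is the core of the proposal: once the LUN behind a path has been remapped, leaving the path in the old map would presumably route I/O meant for one LUN to a different one.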

Comment 24 errata-xmlrpc 2017-02-15 22:56:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0282.html

