Bug 1672761 - segfault during boot of netinstall image
Summary: segfault during boot of netinstall image
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: device-mapper-multipath   
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Ben Marzinski
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Keywords:
Depends On:
Blocks: BetaBlocker, F30BetaBlocker
 
Reported: 2019-02-05 19:45 UTC by Chris Murphy
Modified: 2019-03-05 03:49 UTC

Fixed In Version: device-mapper-multipath-0.7.9-5.git2df6110
Last Closed: 2019-03-05 03:49:18 UTC
Type: Bug

Attachments
journal (865.35 KB, text/plain)
2019-02-05 19:46 UTC, Chris Murphy
coredumpctl info (3.05 KB, text/plain)
2019-02-05 19:46 UTC, Chris Murphy

Description Chris Murphy 2019-02-05 19:45:04 UTC
Description of problem:


Version-Release number of selected component (if applicable):
Fedora-Workstation-netinst-x86_64-Rawhide-20190205.n.0.iso
device-mapper-multipath-0.7.9-4.git2df6110.fc30.x86_64


How reproducible:
Always

Steps to Reproduce:
1. Boot media

Actual results:

[  101.820263] localhost multipathd[1401]: zram0: unusable path - checker failed
[  101.820896] localhost kernel: multipathd[1408]: segfault at 1d0 ip 000055f07d6d7ff9 sp 00007f48d4e56a18 error 4 in multipathd[55f07d6d0000+10000]
[  101.821407] localhost kernel: Code: c1 fb ff ff 4c 89 e7 e8 45 9f ff ff 48 8b 95 40 05 00 00 48 83 ec 08 49 89 e8 8b 3d 81 22 01 00 49 89 c1 be 02 00 00 00 31 c0 <48> 8b 8a d0 01 00 00 53 48 8d 15 39 85 00 00 e8 e3 91 ff ff 41 58


Expected results:

Shouldn't crash.


Additional info:

Comment 1 Chris Murphy 2019-02-05 19:46 UTC
Created attachment 1527270
journal

Comment 2 Chris Murphy 2019-02-05 19:46 UTC
Created attachment 1527271
coredumpctl info

Comment 3 Chris Murphy 2019-02-05 19:48:20 UTC
The crash happens in a guest VM. Host is Fedora 29, using qemu-kvm.

Comment 4 Fedora Blocker Bugs Application 2019-02-05 19:56:14 UTC
Proposed as a Blocker for 30-beta by Fedora user chrismurphy using the blocker tracking app because:

Anaconda makes use of dm-multipath in some use cases. But if multipathd flat out crashes during boot, I'm gonna go with:

"Bug hinders execution of required Beta test plans or dramatically reduces test coverage"

I think we need it executing successfully for the beta release, in order to discover more specific bugs during beta GA, which then can be fixed up for final.

Comment 5 Ben Marzinski 2019-02-05 22:41:12 UTC
In check_path() at multipathd/main.c:3308, multipathd is calling LOG_MSG() which dereferences pp->mpp, which is not set in this case.  I'll have this fixed shortly.
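
For what it's worth, the kernel log in the description is consistent with a NULL pointer being dereferenced. The faulting instruction bytes (<48> 8b 8a d0 01 00 00) decode to a 64-bit load from [rdx + 0x1d0], and the reported fault address is 0x1d0, which is what you get when the base register holds NULL. Mapping rdx to pp->mpp is an inference from the source, not something the log states. A minimal Python sketch of that decoding follows; the helper name decode_mov_disp32 is made up for illustration, and it only handles this one instruction form.

```python
import struct

# Bytes of the faulting instruction, copied from the kernel "Code:" line
# (the byte in <angle brackets> marks where the instruction pointer was).
FAULTING_INSN = bytes.fromhex("488b8ad0010000")
FAULT_ADDRESS = 0x1D0  # from "segfault at 1d0" in the same log line

REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_disp32(insn: bytes):
    """Decode 'REX.W mov r64, [base + disp32]' (opcode 0x8b); illustrative only."""
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("only handles REX.W + 8B /r")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("expects the disp32 form without a SIB byte")
    # The 32-bit displacement is stored little-endian after the ModRM byte.
    (disp,) = struct.unpack_from("<i", insn, 3)
    return REGS[reg], REGS[rm], disp


dest, base, disp = decode_mov_disp32(FAULTING_INSN)
print(f"mov {dest}, [{base} + {disp:#x}]")
# With a NULL base pointer, the accessed address equals the displacement.
print("consistent with NULL base:", 0 + disp == FAULT_ADDRESS)
```

The instruction just before it in the same Code: line (48 8b 95 40 05 00 00) has the same form, so the same helper can decode it.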

Comment 6 Ben Marzinski 2019-02-07 00:23:06 UTC
This should be fixed now.  However, this shouldn't have happened if you used the default /etc/multipath.conf created by mpathconf. That file should include this section:

blacklist_exceptions {
        property "(SCSI_IDENT_|ID_WWN)"
}

That should keep multipath from ever trying to multipath zram devices. Removing this section is sometimes necessary if you would like to do something weird, like run multipath on a USB memory stick, in order to use its queue_if_no_path ability. However, in general, this section keeps multipathd from trying to create multipath devices on top of a number of devices that it has no business multipathing.

I have added zram devices to the builtin blacklist, as well as fixing the possibility of a NULL pointer dereference, but in general, unless you have a good reason, you shouldn't remove the section above from the default /etc/multipath.conf file.

Comment 7 Chris Murphy 2019-02-07 00:35:06 UTC
I'm just booting the netinstaller as-is. I'm not modifying or removing anything.

[anaconda root@localhost ~]# ls -l /etc/multipath
multipath/          multipath.conf      multipath.conf.old  
[anaconda root@localhost ~]# cat /etc/multipath.conf
defaults {
        find_multipaths yes
        user_friendly_names yes
}


blacklist {
}
[anaconda root@localhost ~]# cat /etc/multipath.conf.old 
defaults {
        find_multipaths yes
        user_friendly_names yes
}

[anaconda root@localhost ~]#

Comment 8 Chris Murphy 2019-02-07 00:41:01 UTC
a) mpathconf is clearly not setting the multipath.conf file as you're saying it should; my cat of the file below comes after the mpathconf command shown is called.
b) mpathconf is called by anaconda after multipathd crashes so any change to the conf file couldn't have prevented the crash.

Feb 07 00:29:31 localhost systemd[1]: multipathd.service: Main process exited, code=killed, status=11/SEGV                                                                  
Feb 07 00:29:31 localhost systemd[1]: multipathd.service: Failed with result 'signal'.                                                                                      
Feb 07 00:29:31 localhost audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:kernel_t:s0 msg='unit=multipathd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Feb 07 00:29:32 localhost systemd-coredump[1867]: Process 1431 (multipathd) of user 0 dumped core.                                                                          
                                                  #0  0x0000561e12938ff9 n/a (multipathd)                                                                                   
                                                  #1  0x0000561e129397e3 n/a (multipathd)                                                                                   
                                                  #1  0x00007f407362ae33 uevent_listen (libmultipath.so.0)                                                                  
                                                  #2  0x0000561e12935205 n/a (multipathd)                                                                                   
                                                  #1  0x0000561e12933d45 n/a (multipathd)                                                                                   
                                                  #3  0x0000561e1293463e n/a (multipathd)                                                                                   
                                                  #1  0x0000561e1294005f n/a (multipathd)                                                                                   
                                                  #2  0x0000561e12940a0a n/a (multipathd)                                                                                   
                                                  #1  0x0000561e1293a6d9 n/a (multipathd)                                                                                   
                                                  #2  0x0000561e129357d4 n/a (multipathd)                                                                                   
                                                  #1  0x00007f407362a03c uevent_dispatch (libmultipath.so.0)                                                                
                                                  #2  0x0000561e1293525c n/a (multipathd)                                                                                   
                                          installonlypkgs = [kernel, kernel-PAE, installonlypkg(kernel), installonlypkg(kernel-module), installonlypkg(vm), multiversion(kernel)]
                                          multilib_policy = best
Feb 07 00:29:34 localhost anaconda[1657]: program: Running [3] mpathconf --find_multipaths y --user_friendly_names y --with_multipathd y ...

Comment 9 Chris Murphy 2019-02-07 00:44:10 UTC
s/my cat of the file below comes after/my cat of multipath.conf above (comment 7) comes after

Comment 10 Ben Marzinski 2019-02-07 18:10:00 UTC
So, would you mind deleting /etc/multipath.conf and running

# mpathconf --enable --find_multipaths y --user_friendly_names y --with_multipathd y

This should create an /etc/multipath.conf file that looks like this:

**************

# device-mapper-multipath configuration file

# For a complete list of the default configuration values, run either:
# # multipath -t
# or
# # multipathd show config

# For a list of configuration options with descriptions, see the
# multipath.conf man page.

defaults {
        user_friendly_names yes
        find_multipaths yes
}

blacklist_exceptions {
        property "(SCSI_IDENT_|ID_WWN)"
}

blacklist {
}

**************

That is the default mpathconf file.  This is what it uses if there is no existing file.  If multipath.conf already exists, it uses the existing file and just changes it to match the options requested.

This means that something (anaconda) most likely wrote that /etc/multipath.conf file first, and then called mpathconf afterwards.
With the fixes I put in, this shouldn't crash anymore, but multipath should really be using the default config file.
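
A quick way to tell the two files apart is to look for a blacklist_exceptions section containing a property line. The sketch below is a rough Python check, not part of multipath-tools. The function name and the regex-based parsing are assumptions, and it does not handle nested subsections or a '#' inside a quoted value. The sample strings are the default file above (comments omitted) and the netinstall file from comment 7.

```python
import re

# Default file generated by mpathconf (from this comment, comments omitted).
DEFAULT_CONF = """
defaults {
        user_friendly_names yes
        find_multipaths yes
}

blacklist_exceptions {
        property "(SCSI_IDENT_|ID_WWN)"
}

blacklist {
}
"""

# File found on the booted netinstall image (comment 7).
NETINSTALL_CONF = """
defaults {
        find_multipaths yes
        user_friendly_names yes
}


blacklist {
}
"""


def has_property_exception(conf_text: str) -> bool:
    """Return True if a blacklist_exceptions section contains a property line."""
    # Drop '#' comments so a commented-out section does not count.
    stripped = re.sub(r"#.*", "", conf_text)
    # Simplification: assumes the section body has no nested '{ }' blocks.
    for match in re.finditer(r"\bblacklist_exceptions\s*\{([^}]*)\}", stripped):
        if re.search(r"^\s*property\s+\S", match.group(1), re.MULTILINE):
            return True
    return False


for name, text in (("default", DEFAULT_CONF), ("netinstall", NETINSTALL_CONF)):
    print(name, has_property_exception(text))
```

To check a real system, the same function could be fed the contents of /etc/multipath.conf.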

Comment 11 Chris Murphy 2019-02-07 21:55:10 UTC
(In reply to Ben Marzinski from comment #10)
> So, would you mind deleting /etc/multipath.conf and running
> 
> # mpathconf --enable --find_multipaths y --user_friendly_names y
> --with_multipathd y
> 
> This should create an /etc/multipath.conf file that looks like this:

Confirmed.


> This means that something (anaconda) most likely wrote that
> /etc/multipath.conf file first, and then called mpathconf afterwards.

a) there is already an /etc/multipath.conf on the netinstall (determined by loop mounting it, rather than booting it)

$ cat /mnt/loop2/etc/multipath.conf 
defaults {
        find_multipaths yes
        user_friendly_names yes
}
$

Note that this is not identical to the one found when booting this same image, which also has:

blacklist {
}


b) after installing a system with the above netinstall, the resulting system has no /etc/multipath.conf

c)
https://github.com/rhinstaller/anaconda/blob/master/docs/multipath.rst
https://github.com/rhinstaller/anaconda/blob/master/pyanaconda/storage/fsset.py

I can't really parse this but I'm gonna guess this is what's appending the blacklist {} lines to the already included multipath.conf, and hence why there's also a multipath.conf.old. I also don't know what's creating the /etc/multipath.conf found on the netinstall. And I'm not sure how the Live images differ.

Comment 12 Chris Murphy 2019-02-12 02:59:35 UTC
This problem doesn't happen on Live media. The media itself doesn't have /etc/multipath.conf and it's also not created during boot or startup. The first instance is when anaconda is launched, and it appears correctly formed per comment 10.

Comment 13 František Zatloukal 2019-02-12 14:12:01 UTC
Discussed during the 2018-10-22 blocker review meeting: [1]

The decision to classify this bug as an AcceptedBlocker was made:

"we believe this crash will occur commonly enough to accept it as a Beta blocker per "All release-blocking images must boot in their supported configurations". It is believed fixed, but not yet fully confirmed, so we will accept it in case any issues remain"

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2018-10-22/f29-blocker-review.2018-10-22-16.00.log.txt

Comment 14 František Zatloukal 2019-02-12 14:12:40 UTC
Whoops, wrong link to the meeting log, the correct one is:

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2019-02-11/f30-blocker-review.2019-02-11-17.13.log.txt

Comment 15 Ben Marzinski 2019-02-14 17:54:22 UTC
(In reply to Chris Murphy from comment #11)
> (In reply to Ben Marzinski from comment #10)

> Note that this is not identical to the one found when booting this same
> image, which also has:
> 
> blacklist {
> }
> 

> 
> I can't really parse this but I'm gonna guess this is what's appending the
> blacklist {} lines to the already included multipath.conf, and hence why
> there's also a multipath.conf.old. I also don't know what's creating the
> /etc/multipath.conf found on the netinstall. And I'm not sure how lives
> differ.

Actually,
mpathconf --enable
will create that empty section when it runs.

While this multipath fix should keep this issue from happening with the netinstalls, the multipath.conf file that is included with the netinstall should get changed to match the mpathconf generated default file, unless there is some good reason that I don't understand for it to be different.

Comment 16 Ben Marzinski 2019-02-26 02:45:11 UTC
It looks like the latest netinstall ISOs have the updated device-mapper-multipath packages.  Would you be able to retest, and see if the updated packages have fixed the issue? Otherwise, can anyone point me at an older, broken netinstall ISO, so that I can verify that I can see it failing on my test system?

Comment 17 Chris Murphy 2019-02-26 04:49:19 UTC
No crash with Fedora-Server-netinst-x86_64-30-20190224.n.0.iso, although once booted the /etc/multipath.conf file still looks like this:


defaults {
	find_multipaths yes
	user_friendly_names yes
}


blacklist {
}

Comment 18 Adam Williamson 2019-02-26 18:59:52 UTC
I'd say any questions about what the config file contains / should contain can be handled separately; this bug was for the crash. Any objection to closing it?

Comment 19 Ben Marzinski 2019-03-05 03:49:18 UTC
The multipath fix was added before f30 was ever branched from rawhide, so there should be no affected f30 releases.

Bug #1685363 is for changing the config file in the netinstall image.

