Summary: | segfault during boot of netinstall image | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Chris Murphy <bugzilla>
Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | rawhide | CC: | agk, awilliam, bmarzins, bugzilla, cfeist, fzatlouk, heinzm, kzak, lvm-team, mcsontos, msnitzer, prajnoha, prockai, robatino
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | AcceptedBlocker | |
Fixed In Version: | device-mapper-multipath-0.7.9-5.git2df6110 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2019-03-05 03:49:18 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Bug Depends On: | | |
Bug Blocks: | 1574713 | |
Attachments: | | |

Description
Chris Murphy
2019-02-05 19:45:04 UTC
Created attachment 1527270 [details]
journal
Created attachment 1527271 [details]
coredumpctl info
The crash happens in a guest VM. Host is Fedora 29, using qemu-kvm.

Proposed as a Blocker for 30-beta by Fedora user chrismurphy using the blocker tracking app because:

Anaconda makes use of dm-multipath in some use cases. But if dm-multipathd flat out crashes during boot, I'm gonna go with: "Bug hinders execution of required Beta test plans or dramatically reduces test coverage". I think we need it executing successfully for the beta release, in order to discover more specific bugs during beta GA, which then can be fixed up for final.

In check_path() at multipathd/main.c:3308, multipathd is calling LOG_MSG(), which dereferences pp->mpp, which is not set in this case. I'll have this fixed shortly.

This should be fixed now. However, this shouldn't have happened if you used the default /etc/multipath.conf created by mpathconf. That file should include this section:

    blacklist_exceptions {
        property "(SCSI_IDENT_|ID_WWN)"
    }

That should keep multipath from ever trying to multipath zram devices. Removing this section is sometimes necessary if you would like to do something weird, like run multipath on a USB memory stick in order to use its queue_if_no_path ability. However, in general, this section keeps multipathd from trying to create multipaths on a number of devices that it has no business multipathing. I have added zram devices to the builtin blacklist, as well as fixing the possible NULL pointer dereference, but in general, unless you have a good reason, you shouldn't remove the above section from the default /etc/multipath.conf file.

I'm just booting the netinstaller as-is. I'm not modifying or removing anything.
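To make the effect of that blacklist_exceptions section concrete, here is a minimal Python sketch of the idea as described in the comment above: a device is only considered if at least one of its udev property names matches the pattern. This is an illustration, not multipath's actual code; the function name `has_whitelisted_property` and the two example property sets are hypothetical, and it assumes the pattern is applied as an unanchored search against property names.

```python
import re

# Pattern copied from the blacklist_exceptions section quoted in the thread.
PROPERTY_PATTERN = re.compile(r"(SCSI_IDENT_|ID_WWN)")


def has_whitelisted_property(udev_properties):
    """Return True if any udev property *name* matches the pattern.

    Assumption: matching is an unanchored search on property names,
    which is how this sketch models the "property" keyword.
    """
    return any(PROPERTY_PATTERN.search(name) for name in udev_properties)


# Hypothetical property sets, for illustration only.
scsi_disk = {"DEVNAME": "/dev/sda", "ID_WWN": "0x5000c500a1b2c3d4"}
zram_dev = {"DEVNAME": "/dev/zram0", "DEVTYPE": "disk"}

print(has_whitelisted_property(scsi_disk))  # True
print(has_whitelisted_property(zram_dev))   # False
```

Under these assumptions, a zram device carries no SCSI_IDENT_* or ID_WWN property, so it would be skipped before multipathd ever tried to check it as a path.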
    [anaconda root@localhost ~]# ls -l /etc/multipath
    multipath/ multipath.conf multipath.conf.old
    [anaconda root@localhost ~]# cat /etc/multipath.conf
    defaults {
        find_multipaths yes
        user_friendly_names yes
    }
    blacklist {
    }
    [anaconda root@localhost ~]# cat /etc/multipath.conf.old
    defaults {
        find_multipaths yes
        user_friendly_names yes
    }
    [anaconda root@localhost ~]#

a) mpathconf is clearly not setting the multipath.conf file as you're saying it should; my cat of the file below comes after the mpathconf command shown is called.

b) mpathconf is called by anaconda after multipathd crashes so any change to the conf file couldn't have prevented the crash.

    Feb 07 00:29:31 localhost systemd[1]: multipathd.service: Main process exited, code=killed, status=11/SEGV
    Feb 07 00:29:31 localhost systemd[1]: multipathd.service: Failed with result 'signal'.
    Feb 07 00:29:31 localhost audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:kernel_t:s0 msg='unit=multipathd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
    Feb 07 00:29:32 localhost systemd-coredump[1867]: Process 1431 (multipathd) of user 0 dumped core.
    #0 0x0000561e12938ff9 n/a (multipathd)
    #1 0x0000561e129397e3 n/a (multipathd)
    #1 0x00007f407362ae33 uevent_listen (libmultipath.so.0)
    #2 0x0000561e12935205 n/a (multipathd)
    #1 0x0000561e12933d45 n/a (multipathd)
    #3 0x0000561e1293463e n/a (multipathd)
    #1 0x0000561e1294005f n/a (multipathd)
    #2 0x0000561e12940a0a n/a (multipathd)
    #1 0x0000561e1293a6d9 n/a (multipathd)
    #2 0x0000561e129357d4 n/a (multipathd)
    #1 0x00007f407362a03c uevent_dispatch (libmultipath.so.0)
    #2 0x0000561e1293525c n/a (multipathd)
    installonlypkgs = [kernel, kernel-PAE, installonlypkg(kernel), installonlypkg(kernel-module), installonlypkg(vm), multiversion(kernel)]
    multilib_policy = best
    Feb 07 00:29:34 localhost anaconda[1657]: program: Running [3] mpathconf --find_multipaths y --user_friendly_names y --with_multipathd y ...

s/my cat of the file below comes after/my cat of multipath.conf above (comment 7) comes after

So, would you mind deleting /etc/multipath.conf and running

    # mpathconf --enable --find_multipaths y --user_friendly_names y --with_multipathd y

This should create an /etc/multipath.conf file that looks like this:

    **************
    # device-mapper-multipath configuration file

    # For a complete list of the default configuration values, run either:
    # # multipath -t
    # or
    # # multipathd show config

    # For a list of configuration options with descriptions, see the
    # multipath.conf man page.

    defaults {
        user_friendly_names yes
        find_multipaths yes
    }

    blacklist_exceptions {
        property "(SCSI_IDENT_|ID_WWN)"
    }

    blacklist {
    }
    **************

That is the default mpathconf file. This is what it uses if there is no existing file. If multipath.conf already exists, it uses the existing file and just changes it to match the options requested. This means that something (anaconda) most likely wrote that /etc/multipath.conf file first, and then called mpathconf afterwards. With the fixes I put in, this shouldn't crash anymore, but multipath should really be using the default config file.

(In reply to Ben Marzinski from comment #10)
> So, would you mind deleting /etc/multipath.conf and running
>
> # mpathconf --enable --find_multipaths y --user_friendly_names y --with_multipathd y
>
> This should create an /etc/multipath.conf file that looks like this:

Confirmed.

> This means that something (anaconda) most likely wrote that
> /etc/multipath.conf file first, and then called mpathconf afterwards.

a) there is already an /etc/multipath.conf on the netinstall (determined by loop mounting it, rather than booting it)

    $ cat /mnt/loop2/etc/multipath.conf
    defaults {
        find_multipaths yes
        user_friendly_names yes
    }
    $

Note that this is not identical to the one found when booting this same image, which also has:

    blacklist {
    }

b) after installing a system with the above netinstall, the resulting system has no /etc/multipath.conf

c)
https://github.com/rhinstaller/anaconda/blob/master/docs/multipath.rst
https://github.com/rhinstaller/anaconda/blob/master/pyanaconda/storage/fsset.py

I can't really parse this but I'm gonna guess this is what's appending the blacklist {} lines to the already included multipath.conf, and hence why there's also a multipath.conf.old. I also don't know what's creating the /etc/multipath.conf found on the netinstall. And I'm not sure how Live images differ.

This problem doesn't happen on Live media. The media itself doesn't have /etc/multipath.conf and it's also not created during boot or startup. The first instance is when anaconda is launched, and it appears correctly formed per comment 10.

Discussed during the 2018-10-22 blocker review meeting: [1]

The decision to classify this bug as an AcceptedBlocker was made: "we believe this crash will occur commonly enough to accept it as a Beta blocker per "All release-blocking images must boot in their supported configurations".
It is believed fixed, but not yet fully confirmed, so we will accept it in case any issues remain"

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2018-10-22/f29-blocker-review.2018-10-22-16.00.log.txt

Whoops, wrong link to the meeting log, the correct one is:

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2019-02-11/f30-blocker-review.2019-02-11-17.13.log.txt

(In reply to Chris Murphy from comment #11)
> (In reply to Ben Marzinski from comment #10)
> Note that this is not identical to the one found when booting this same
> image, which also has:
>
> blacklist {
> }
>
>
> I can't really parse this but I'm gonna guess this is what's appending the
> blacklist {} lines to the already included multipath.conf, and hence why
> there's also a multipath.conf.old. I also don't know what's creating the
> /etc/multipath.conf found on the netinstall. And I'm not sure how lives
> differ.

Actually, mpathconf --enable will create that empty section when it runs. While this multipath fix should keep this issue from happening with the netinstalls, the multipath.conf file that is included with the netinstall should get changed to match the mpathconf generated default file, unless there is some good reason that I don't understand for it to be different.

It looks like the latest netinstall isos have the updated device-mapper-multipath packages. Would you be able to retest, and see if the updated packages have fixed the issue? Otherwise, can anyone point me at an older, broken netinstall iso, so that I can verify that I can see it failing on my test system?

No crash with Fedora-Server-netinst-x86_64-30-20190224.n.0.iso, although once booted the /etc/multipath.conf file still looks like this:

    defaults {
        find_multipaths yes
        user_friendly_names yes
    }
    blacklist {
    }

I'd say any questions about what the config file contains / should contain can be handled separately; this bug was for the crash. Any objection to closing it?

The multipath fix was added before f30 was ever branched from rawhide, so there should be no affected f30 releases.

Bug #1685363 is for changing the config file in the netinstall image.
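
For anyone following up on the config file question, a rough way to tell the two files in this thread apart is to check for a blacklist_exceptions section containing a property line. The Python sketch below does that on the two configs quoted above. It is a simplified text check, not a real multipath.conf parser; the function name `has_property_exception` is hypothetical, and the comment stripping assumes no `#` appears inside a quoted value.

```python
import re

# Config found on the booted netinstall image, as quoted in the thread.
NETINSTALL_CONF = """\
defaults {
    find_multipaths yes
    user_friendly_names yes
}
blacklist {
}
"""

# Default file generated by mpathconf, as quoted in the thread (comments omitted).
MPATHCONF_DEFAULT = """\
defaults {
    user_friendly_names yes
    find_multipaths yes
}
blacklist_exceptions {
    property "(SCSI_IDENT_|ID_WWN)"
}
blacklist {
}
"""


def has_property_exception(conf_text):
    """Return True if a blacklist_exceptions section contains a property line."""
    # Drop comments first (simplification: assumes no '#' inside quoted values).
    text = re.sub(r"#.*", "", conf_text)
    section = re.search(r"\bblacklist_exceptions\s*\{([^}]*)\}", text)
    if section is None:
        return False
    return re.search(r"^\s*property\s+\S", section.group(1), re.MULTILINE) is not None


print(has_property_exception(MPATHCONF_DEFAULT))  # True
print(has_property_exception(NETINSTALL_CONF))    # False
```

The netinstall file fails the check because it only has an empty blacklist section, which matches the difference described in the comments above.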