Created attachment 922160
Description of problem:
Systems configured to boot from SAN via iSCSI fail to boot after vdsm changes the multipath.conf originally created by anaconda. The failure shows up after the next kernel update, which embeds the faulty multipath.conf into the new initrd. At boot, multipath is then unable to enumerate the multipath disks.
Version-Release number of selected component (if applicable):
Linux test-host.eample.com 2.6.32-431.20.5.el6.x86_64 #1 SMP Fri Jul 25 08:34:44 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Steps to Reproduce:
1. Install system with root on SAN (iSCSI multipath)
2. Install vdsm (vdsm overwrites multipath.conf created by anaconda)
3. Update kernel (faulty multipath.conf is embedded into initrd)
Actual results:
Boot fails, multipath cannot enumerate the disks
Expected results:
Boot succeeds, multipath disks can be enumerated
It all boils down to the default value set for option getuid_callout
in multipath.conf. Vdsm sets it to the following:
getuid_callout "/sbin/scsi_id --whitelisted --replace-whitespace
According to the man page and other documentation, the default value on
EL6 is "/lib/udev/scsi_id --whitelisted --device=/dev/%n". Although
/sbin/scsi_id is a valid symlink to /lib/udev/scsi_id, the link itself
(/sbin/scsi_id) is _not_ included in the generated initrd. The binary
/lib/udev/scsi_id is included, and changing the config to use
/lib/udev/scsi_id instead makes it all work again. With that change
(and after regenerating the initrd, since multipath.conf is embedded in
it), iSCSI boot succeeds and the previously logged device-mapper errors
are gone, too.
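The path change described above can be sketched in Python. This is only an illustration of the idea, not the patch attached to this bug: the function names are hypothetical, and the sample line in the usage below is made up for the example by combining the /sbin path with the documented default arguments.

```python
import re

OLD_PATH = "/sbin/scsi_id"      # symlink, reported missing from the initrd
NEW_PATH = "/lib/udev/scsi_id"  # real binary, reported present in the initrd

# First token inside the quoted value of a getuid_callout line.
_BINARY_RE = re.compile(r'^\s*getuid_callout\s+"([^\s"]+)', re.MULTILINE)

# A getuid_callout line whose value starts with OLD_PATH as a whole token.
_OLD_CALLOUT_RE = re.compile(
    r'^(\s*getuid_callout\s+")' + re.escape(OLD_PATH) + r'(?=[\s"])',
    re.MULTILINE,
)


def callout_binary(conf_text):
    """Return the binary named by getuid_callout, or None if not set."""
    match = _BINARY_RE.search(conf_text)
    return match.group(1) if match else None


def fix_getuid_callout(conf_text):
    """Return conf_text with getuid_callout pointing at NEW_PATH."""
    return _OLD_CALLOUT_RE.sub(lambda m: m.group(1) + NEW_PATH, conf_text)


if __name__ == "__main__":
    # Illustrative input only, not a verbatim copy of any shipped file.
    sample = (
        "defaults {\n"
        '    getuid_callout "/sbin/scsi_id --whitelisted --device=/dev/%n"\n'
        "}\n"
    )
    print(callout_binary(sample))
    print(fix_getuid_callout(sample))
```

The arguments after the binary path are left untouched, so only the location of scsi_id changes. The initrd still has to be regenerated afterwards, because it carries its own copy of multipath.conf.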
Patrick - thanks for the patch!
Nir, Patrick - who's submitting this to gerrit?
I will post Patrick's patch to gerrit.
We need more than this patch: we also need to upgrade existing multipath.conf files created by vdsm versions without this fix.
So the final fix will have to be:
1. Fix the path to scsi_id
2. Bump multipath configuration version, so multipath.conf will be
upgraded with the correct path when upgrading vdsm.
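A minimal Python sketch of the version-bump idea in step 2, under stated assumptions: the thread does not show how vdsm marks the files it writes, so the tag strings and function names below are hypothetical placeholders, not vdsm's actual code.

```python
# Hypothetical revision tags, assumed to sit on the first line of a
# multipath.conf written by the tool. These are not vdsm's real tags.
CURRENT_TAG = "# EXAMPLE REVISION 2"
OLD_TAGS = ("# EXAMPLE REVISION 1",)


def needs_upgrade(conf_text):
    """True if the file was written under a known older revision."""
    lines = conf_text.splitlines()
    first_line = lines[0].strip() if lines else ""
    return first_line in OLD_TAGS


def upgrade_conf(conf_text, new_body):
    """Return the new tagged configuration if an upgrade is needed.

    Files that carry the current tag, or no known tag at all, are
    returned unchanged.
    """
    if not needs_upgrade(conf_text):
        return conf_text
    return CURRENT_TAG + "\n" + new_body
```

Bumping the tag lets an upgraded package recognize files written before the fix and replace them with the corrected path. In this sketch, files without a known tag are left alone; whether that is the right policy is a separate design decision.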
Related to bug 1108711
Patrick, please check your patch in gerrit:
Sorry for the delay. The patch looks fine and it is already merged.
Thanks Nir for committing
Allon, any reason that this should not go into 3.5.1?
It's already been delayed THREE times (https://www.ovirt.org/OVirt_3.5.z_Release_Management) - at this point we want to decrease uncertainty, not increase it.
Is there a super-pressing need for this fix in 3.5.1?
(In reply to Allon Mureinik from comment #8)
> It's already been delayed THREE times
> (https://www.ovirt.org/OVirt_3.5.z_Release_Management) - at this point we
> want to decrease uncertainty, not increase it.
> Is there a super-pressing need for this fix in 3.5.1?
I don't see any urgency.
This is an automated message:
This bug should be fixed in oVirt 3.5.1 RC1, moving to QA
oVirt 3.5.1 has been released. If problems still persist, please make note of it in this bug report.