| Summary: | RHV-H installation fails when using multipath (on FCoE) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Kumar Mashalkar <kmashalk> |
| Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Lin Li <lilin> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.2 | CC: | agk, ankit, bmarzins, cshao, dfediuck, dguo, fdeutsch, gklein, gveitmic, heinzm, kmashalk, lilin, lsurette, lvm-team, mkalinin, msnitzer, prajnoha, pstehlik, rbarry, srevivo, ycui, ykaul, ylavi |
| Target Milestone: | rc | | |
| Target Release: | 7.3 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-10-04 19:50:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | (see comments below) | | |
16:35:09,480 INFO program: mke2fs 1.42.9 (28-Dec-2013)
16:35:09,480 INFO program: /dev/sda2 is apparently in use by the system; will not make a filesystem here!
16:35:09,481 DEBUG program: Return code: 1

Can you please grab an sosreport? A screenshot of storage would also be helpful.

Hello Ryan, the customer is using FCoE. The customer is also able to install RHEL 7.2 without any issue. Reassigning to Anaconda.

Just reading the bug, I am wondering whether something was missed during setup and the instructions were not followed when using custom partitioning: https://bugzilla.redhat.com/show_bug.cgi?id=1359181#c0

(In reply to Marina from comment #13)
> Just reading the bug, I am wondering whether something was missed during
> setup and the instructions were not followed when using custom partitioning:
> https://bugzilla.redhat.com/show_bug.cgi?id=1359181#c0

It's possible. However, the problem seems more fundamental. I reassigned to Anaconda because this error is lower than anything RHV-specific (bug #1359181 would result in failures during %post from imgbased -- this failure occurs before we even hit the RHV installclass).

No matter what partitions we create, manual or automatic, it gives the error: "No valid boot loader target device found. See below for details. You must include at least one MBR- or GPT-formatted disk as an install target."

Also, when the same server with the same storage was booted with the RHEL ISO, it was able to do the partitioning and proceed with the installation.

During the RHV-H installation, we observed that the FCoE storage was showing 4 local disks in addition to the multipath one. The storage space was calculated as 5 x 40 GB = 200 GB, whereas the storage is only 40 GB. During the RHEL installation, we could see only one disk (the FCoE one), showing the correct 40 GB.

With the above observation, we can perhaps rule out the concern that an unsupported partitioning scheme or storage disk was used.

Kumar -
Unfortunately, I don't have an FC test environment.
However, it would be helpful if you could grab the anaconda logs from RHEL. RHV-H ships with the same base anaconda version as 7.2, but it looks like there may be a race or confusion.
storage.log appropriately shows that sd[a-d] are being grabbed as multipath:
23:31:10,622 DEBUG blivet: DeviceTree.addUdevMultiPathDevice: name: 3600a09803830357a782b472f6a43476d ;
23:31:10,624 DEBUG blivet: DeviceTree.getDeviceByName: hidden: False ; name: sda ; incomplete: False ;
23:31:10,626 DEBUG blivet: DeviceTree.getDeviceByName returned sda
23:31:10,628 DEBUG blivet: DeviceTree.getDeviceByName: hidden: False ; name: sdb ; incomplete: False ;
23:31:10,629 DEBUG blivet: DeviceTree.getDeviceByName returned sdb
23:31:10,632 DEBUG blivet: DeviceTree.getDeviceByName: hidden: False ; name: sdc ; incomplete: False ;
23:31:10,633 DEBUG blivet: DeviceTree.getDeviceByName returned sdc
23:31:10,635 DEBUG blivet: DeviceTree.getDeviceByName: hidden: False ; name: sdd ; incomplete: False ;
23:31:10,637 DEBUG blivet: DeviceTree.getDeviceByName returned sdd
23:31:10,640 DEBUG blivet: DiskDevice.addChild: kids: 0 ; name: sda ;
23:31:10,642 DEBUG blivet: DiskDevice.addChild: kids: 0 ; name: sdb ;
23:31:10,644 DEBUG blivet: DiskDevice.addChild: kids: 0 ; name: sdc ;
23:31:10,646 DEBUG blivet: DiskDevice.addChild: kids: 0 ; name: sdd ;
23:31:10,647 DEBUG blivet: getFormat('None') returning DeviceFormat instance with object id 67
But they are later shown as available and selected:
16:35:08,615 DEBUG blivet: action: [148] create format lvmpv on partition sda3 (id 145)
16:35:08,615 DEBUG blivet: action: [153] create device partition sdb2 (id 151)
16:35:08,615 DEBUG blivet: action: [154] create format lvmpv on partition sdb2 (id 151)
16:35:08,615 DEBUG blivet: action: [159] create device partition sdc2 (id 157)
16:35:08,615 DEBUG blivet: action: [160] create format lvmpv on partition sdc2 (id 157)
16:35:08,616 DEBUG blivet: action: [165] create device partition sdd2 (id 163)
16:35:08,616 DEBUG blivet: action: [166] create format lvmpv on partition sdd2 (id 163)
16:35:08,616 DEBUG blivet: action: [171] create device partition sda2 (id 169)
16:35:08,616 DEBUG blivet: action: [172] create format ext4 filesystem mounted at /boot on partition sda2 (id 169)
16:35:08,616 DEBUG blivet: action: [177] create device lvmvg rhvh (id 174)
16:35:08,617 DEBUG blivet: action: [181] create device lvmthinpool rhvh-pool00 (id 178)
16:35:08,617 DEBUG blivet: action: [185] create device lvmthinlv rhvh-root (id 183)
16:35:08,617 DEBUG blivet: action: [186] create format xfs filesystem mounted at / on lvmthinlv rhvh-root (id 183)
16:35:08,617 DEBUG blivet: action: [191] create device lvmthinlv rhvh-var (id 189)
16:35:08,617 DEBUG blivet: action: [192] create format xfs filesystem mounted at /var on lvmthinlv rhvh-var (id 189)
16:35:08,618 DEBUG blivet: action: [197] create device lvmlv rhvh-swap (id 195)
16:35:08,618 DEBUG blivet: action: [198] create format swap on lvmlv rhvh-swap (id 195)
The screenshot attached earlier shows a multipathed device at 40GB, with 4 other LUNs available (which are not part of the same multipath group, judging from the logs).
The storage information in the screenshots shows a different set of LUNs (available as dm-0).
It would be *extremely* helpful to get logs and screenshots from the same installation, including a screenshot of the selected partitioning, so that the logs, output, and screenshots are easier to connect.
Can you please try the workaround in bug #1370414, comment #28?
Also, let's make sure that the hook for FCoE for RHV-H is set up correctly: https://bugzilla.redhat.com/show_bug.cgi?id=1370030

(In reply to Marina from comment #18)
> Also, let's make sure that the hook for FCoE for RHV-H is set up correctly:
> https://bugzilla.redhat.com/show_bug.cgi?id=1370030

FYI -- FCoE (or any vdsm hooks) won't apply during anaconda. We follow platform here. No RHV code runs in the installer, other than the installclass (which sets autopartitioning defaults) and the %post script which initializes imgbased.

Kumar - Can you provide the logs requested (and try the workaround) in comment #16?

Created attachment 1200987 [details]
Logs requested
(In reply to Kumar from comment #24)
> Created attachment 1200987 [details]
> Logs requested

I'll let the anaconda team grab this, but this appears to be a different error entirely.

Rather than a stage1 failure because sda2 (a multipath member) was being addressed individually, the failure here is in creating LVM objects on sdg (which appears to be a LUN, but not one which is multipathed).

Can you try unselecting the bare LUNs (sd[d-g])?

This:
> 23:31:08,633 INFO program: Running... multipath -c /dev/sda
> 23:31:08,646 INFO program: /dev/sda is not a valid multipath device path
> 23:31:08,647 DEBUG program: Return code: 1
means that multipath did not recognize a block device as a multipath path. There are two reasons why this could be true:
1. The device is blacklisted in /etc/multipath.conf
2. The device wwid is not in /etc/multipath/wwids
Most likely #2 is the issue here. Multipathd should add the wwid to /etc/multipath/wwids when it creates the multipath device. If a multipath device has already been created with the path, and "multipath -c" says that it's not a valid path, then there seems to be a problem in multipath. Looking at:
23:31:08,491 INFO blivet: devices to scan: [u'sda', u'sdb', u'sdc', u'sdd', u'sr0', u'sde', u'sdf', u'sr1', u'sdg', u'loop0', u'loop1', u'loop2', u'3600a09803830357a782b472f6a43476d', u'3600a09803830357a782b472f6a43476d1', u'live-rw', u'live-base']
It certainly seems that the multipath device has already been created when that "multipath -c" call is run. If that's the case, then probably the best answer would be for me to provide patched multipath packages that print more information when "multipath -c" fails, to check whether the wwid really isn't in the wwids file at the time of the check. What's the easiest way for you to integrate a new package into your anaconda run? Do you want me to give you an updates.img, or are just the new rpms fine?
Could you also verify that after you see this error, /etc/multipath/wwids exists and contains the line
/3600a09803830357a782b472f6a43476d/
which is the wwid listed for the device. Also, the images show a device named 3600a09803830357a782b472f6a43476e, ending in "e" rather than "d". However, these are from different dates than the logs, so I assume there isn't a weird name mismatch. At any rate, the wwid in /etc/multipath/wwids should match the wwid of the existing multipath device (which is also its name when user_friendly_names is disabled).
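As a minimal sketch of the checks described above, run from a shell on the affected system after the failure (the device and wwid below are the ones from this report; exact tool paths may differ on the installer image):

```
# Re-run the check that anaconda performs:
multipath -c /dev/sda; echo "return code: $?"

# Reason 1: is the device blacklisted? Dump the effective blacklist section:
multipathd show config | grep -A20 'blacklist {'

# Reason 2: is the device's wwid recorded in the wwids file?
/usr/lib/udev/scsi_id --whitelisted --device=/dev/sda
grep 3600a09803830357a782b472f6a43476d /etc/multipath/wwids

# The existing multipath map; with user_friendly_names disabled its name
# should match the wwids entry:
multipath -ll
```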
(In reply to Ryan Barry from comment #25)
> (In reply to Kumar from comment #24)
> > Created attachment 1200987 [details]
> > Logs requested
>
> I'll let the anaconda team grab this, but this appears to be a different
> error entirely.
>
> Rather than a stage1 failure because sda2 (a multipath member) was being
> addressed individually, the failure here is in creating LVM objects on sdg
> (which appears to be a LUN, but not one which is multipathed).
>
> Can you try unselecting the bare LUNs (sd[d-g])?

Hello Ryan, we have tried deselecting the bare LUNs, but the installer says the multipath disk cannot be used partially, as it is part of the bare LUNs too. So to do any partitioning we have to select all 5 disks shown in the screenshot.

Ben, an updates.img would be helpful here.

Kumar, can you please take a look at the question in comment 26 and let us know if you can retry in the customer environment with an updates.img package?

Created attachment 1205767 [details]
anaconda updates image with debugging multipath code
This should update the multipath binary used during installation to print more messages when it finds an invalid path, so that we can determine why the path was declared invalid. This will hopefully give some clues as to what is going wrong here.
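For reference (a general sketch, not specific to this build): anaconda normally picks up an updates image via the inst.updates= boot option; the URL and label below are placeholders, and the exact source syntax may vary by RHEL version.

```
# At the installer boot menu, edit the kernel command line and append, e.g.:
inst.updates=http://host.example.com/path/to/updates.img
# or, if the image has been copied to a local disk or USB stick:
inst.updates=hd:LABEL=UPDATES:/updates.img
```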
Hello Ben/Fabian, which files do you need for analysis after booting with the anaconda updates.img? Anything other than /etc/multipath/wwids?

The updates.img isn't necessary for checking that /etc/multipath/wwids exists and has the correct wwid in it after the failure; that can be done on any system that can reproduce this. The information from the updates.img binaries should show up as messages in program.log, but it would probably be most useful just to collect all the anaconda logs when this reproduces with the updates.img included, as in comment 24.

Hello, good news. Using the latest ISO, RHVH-4.0-20160919.1-RHVH-x86_64-dvd1.iso, the customer was able to overcome the issue. The customer confirmed that we can close the case, so we might consider closing this Bugzilla too.
Created attachment 1199285 [details]
Logs during installation.

Description of problem:
The problem occurs when installing RHEV-H 4 on a Cisco UCS blade server with SAN-attached storage. The storage is presented as 4 paths in the installer, and the installer seems to be correctly combining them with multipath into one device. But when we attempt to configure storage during the install, we get "No valid boot loader target device found. See below for details. You must include at least one MBR- or GPT-formatted disk as an install target."

Version-Release number of selected component (if applicable):
RHVH-4

How reproducible:
Every time in the customer environment.

Steps to Reproduce:
1. Start the installation with the RHVH 4.0 installation media.
2. Create manual partitions.
3. Click Done.

Actual results:
Error displayed: "No valid boot loader target device found. See below for details. You must include at least one MBR- or GPT-formatted disk as an install target."

Expected results:
Installation should proceed.

Additional info:
Errors observed in the anaconda logs:
+++
raise FormatCreateError("format failed: %s" % ret, self.device)
FormatCreateError: ('format failed: 1', '/dev/sda2')
+++
Attaching log files of the installation.