Bug 1593917 - iSCSI broken in installer due to module missing from initramfs(?)
Summary: iSCSI broken in installer due to module missing from initramfs(?)
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: lorax
Version: rawhide
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Brian Lane
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: F29FinalBlocker
Reported: 2018-06-21 20:10 UTC by Adam Williamson
Modified: 2018-06-25 21:30 UTC
CC List: 21 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-25 19:46:39 UTC
Type: Bug
Embargoed:



Description Adam Williamson 2018-06-21 20:10:00 UTC
Since the Fedora-Rawhide-20180616.n.0 compose, iSCSI install tests are failing. In the first three composes from that one onwards, anaconda crashed while trying to add an iSCSI target, with this traceback:

10:20:32,435 CRT exception: Traceback (most recent call last):

  File "/usr/lib64/python3.6/site-packages/pyanaconda/threading.py", line 286, in run
    threading.Thread.run(self)

  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)

  File "/usr/lib64/python3.6/site-packages/pyanaconda/ui/gui/spokes/advstorage/iscsi.py", line 214, in _discover
    r_password=credentials.rPassword)

  File "/usr/lib/python3.6/site-packages/blivet/iscsi.py", line 373, in discover
    raise IOError(_("iSCSI not available"))

OSError: iSCSI not available

In the most recent compose, the button was not shown in the installer UI at all. That case is rather odd, as AFAICS that will only happen if blivet.iscsi.available is false-y, and blivet.iscsi.available appears to be a property set to *always* return True.

So I've been concentrating more on the 'iSCSI not available' error, and I think I've found the proximate cause of that. blivet raises that error when its `has_iscsi()` function returns False, and in our case I believe that's happening here:

    if not os.access("/sys/module/iscsi_tcp", os.X_OK):
        return False

because if you boot an installer image from one of the affected composes and then switch to a console and look for /sys/module/iscsi_tcp...it's not there.

That path seems to exist if the module is loaded: if you run 'modprobe iscsi_tcp', it shows up. And indeed, if you boot a 20180615.n.0 installer image - the compose right before the bug - and check, iscsi_tcp *is* loaded on boot, and /sys/module/iscsi_tcp *does* exist.
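For anyone checking other images, the blivet test quoted above can be reproduced outside the installer. This is a minimal Python sketch that generalizes it to any module name. It assumes, as observed here, that a loaded module shows up as a directory under /sys/module. The helper name `module_loaded` and its `sysfs_root` parameter are hypothetical, added only so the check can be pointed at a test directory.

```python
import os


def module_loaded(name: str, sysfs_root: str = "/sys/module") -> bool:
    """Return True if a directory for kernel module `name` is searchable under sysfs_root.

    Mirrors the os.access(..., os.X_OK) test quoted from blivet's has_iscsi():
    X_OK on a directory asks whether it can be searched, and the call returns
    False when the directory does not exist at all.
    """
    return os.access(os.path.join(sysfs_root, name), os.X_OK)
```

On an affected installer image, `module_loaded("iscsi_tcp")` would be expected to return False until `modprobe iscsi_tcp` has been run.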

Digging a bit deeper...the module exists *in the installer environment* in both composes. However, I figured out that it is *not* present *in the installer initramfs* of both composes: it's in the 20180615.n.0 installer initramfs, but not the 20180616.n.0 one. In fact, several other modules are missing too; the full list is:

be2iscsi
bnx2i
cxgb3i
cxgb4i
iscsi_tcp
libiscsi_tcp
qedi

All of those are in the 20180615.n.0 installer initramfs but not the 20180616.n.0 one.
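A list like the one above can be produced by dumping the file lists of the two initramfs images (for instance with `lsinitrd`) and diffing the kernel-module names. A minimal Python sketch, assuming each listing line ends with a file path and that modules are named `<name>.ko`, optionally followed by a compression suffix such as `.xz`. The function names are hypothetical.

```python
import os
import re

# Matches "<name>.ko" optionally followed by a compression suffix (.xz, .gz, .zst).
_KO_RE = re.compile(r"^(?P<name>[^/]+)\.ko(?:\.(?:xz|gz|zst))?$")


def module_names(listing_lines):
    """Collect kernel-module names from lines whose last whitespace-separated field is a path."""
    names = set()
    for line in listing_lines:
        fields = line.split()
        if not fields:
            continue
        match = _KO_RE.match(os.path.basename(fields[-1]))
        if match:
            names.add(match.group("name"))
    return names


def missing_modules(old_listing, new_listing):
    """Return sorted module names present in the old image's listing but not the new one's."""
    return sorted(module_names(old_listing) - module_names(new_listing))
```

Taking only the last field lets the same code accept either bare paths or `ls -l`-style lines with permissions and sizes in front.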

There's an error in the journal on boot of 20180616.n.0+ that is not there in 20180615.n.0:

dracut-pre-udev[466]: modprobe: FATAL: Module iscsi_tcp not found in directory /lib/modules/4.18.0-0.rc0.git10.1.fc29.x86_64

which does rather suggest this is the basic problem here - dracut tries to load the module, but it can't because it's not there.
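To check further composes for the same symptom, that journal message can be picked out mechanically. A minimal Python sketch, assuming the message format is exactly the one quoted above. The function name is hypothetical.

```python
import re

# Pattern for the modprobe failure message quoted above.
_MODPROBE_FATAL = re.compile(
    r"modprobe: FATAL: Module (?P<module>\S+) not found in directory (?P<dir>\S+)"
)


def missing_module_errors(journal_lines):
    """Yield (module, module_dir) for each 'Module ... not found' modprobe failure."""
    for line in journal_lines:
        match = _MODPROBE_FATAL.search(line)
        if match:
            yield match.group("module"), match.group("dir")
```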

Now what I'm trying to figure out is...*why* these modules disappeared from the installer initramfs. The odd thing is that none of the most obvious suspects - dracut, anaconda itself, lorax - changed between 20180615.n.0 and 20180616.n.0. The only package that *did* change that looks at all relevant, that I can see, is the kernel itself, which went from kernel-4.18.0-0.rc0.git9.1.fc29 to kernel-4.18.0-0.rc0.git10.1.fc29. But I can't immediately see how that could cause this. The module doesn't seem to have moved from core to extras, or anything. Still, given that it appears to be the only relevant change, I'm tentatively assigning this to kernel while I try to dig into it further...just trying to remember how the installer initramfs is actually generated now.

Proposing as an F29 Final blocker: this violates the network-attached storage criterion (https://fedoraproject.org/wiki/Fedora_29_Final_Release_Criteria#network-attached-storage): "The installer must be able to detect (if possible) and install to supported network-attached storage devices."

Comment 1 Adam Williamson 2018-06-21 21:11:11 UTC
AHA. I think I've nailed this one down.

I believe what's going on is the dracut 95iscsi module is not really being run at all when lorax regenerates the initramfs during buildinstall - and I've got a plausible theory as to why.

95iscsi depends on the 'hostname' command:

    # called by dracut
    check() {
        local _rootdev
        # If our prerequisites are not met, fail anyways.
        require_binaries iscsistart hostname iscsi-iname || return 1
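The effect of that `check()` line can be modelled in Python: the module is rejected if any one of the listed binaries cannot be found. This is an approximation of dracut's `require_binaries`, not its actual implementation. It assumes a plain PATH-style lookup via `shutil.which`, and the `path` parameter is a hypothetical way to point the lookup at the bin directories of the buildinstall root.

```python
import shutil


def missing_binaries(binaries, path=None):
    """Return the binaries that cannot be found on `path` (default: $PATH)."""
    return [b for b in binaries if shutil.which(b, path=path) is None]


def iscsi_check(path=None):
    """Approximate 95iscsi's check(): False if any prerequisite binary is missing."""
    missing = missing_binaries(["iscsistart", "hostname", "iscsi-iname"], path=path)
    if missing:
        # Per the analysis in this comment, dracut then leaves 95iscsi out, so
        # the kernel modules it would add (iscsi_tcp, be2iscsi, ...) never
        # reach the initramfs.
        print("95iscsi check failed, missing:", ", ".join(missing))
        return False
    return True
```

Note that a single missing helper like `hostname` is enough to fail the whole check, even when the iSCSI tools themselves are installed.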

Now, let's take a look at the 20180615.n.0 and 20180616.n.0 buildinstall logs:

https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20180615.n.0/logs/x86_64/buildinstall-Everything.x86_64.log
https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20180616.n.0/logs/x86_64/buildinstall-Everything.x86_64.log

Note the count of packages installed into the environment we regenerate the initramfs images in: on 20180615.n.0 it's 772, on 20180616.n.0 it's 771. Something went missing! I hacked around with the logs a bit and found the missing package: it's...hostname.
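Finding the missing package amounts to a set difference over package names. A minimal Python sketch, assuming each log has already been reduced to one `name-version-release.arch` string per installed package (the real buildinstall log format is not reproduced here). Comparing names rather than full strings keeps ordinary version bumps, such as the initscripts update, from showing up as differences. The function names are hypothetical.

```python
def package_name(nvra: str) -> str:
    """Strip '-version-release.arch' from an N-V-R.A string, keeping only the name."""
    # RPM versions and releases cannot contain '-', so the name is
    # everything before the second-to-last dash.
    return nvra.rsplit("-", 2)[0]


def dropped_packages(old_nvras, new_nvras):
    """Return sorted package names installed in the old environment but not the new one."""
    old_names = {package_name(p) for p in old_nvras}
    new_names = {package_name(p) for p in new_nvras}
    return sorted(old_names - new_names)
```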

*Why* is hostname missing? Turns out, it's down to a change in initscripts. That was another package that changed in the 20180616.n.0 compose, but at first I didn't think it could be relevant. Turns out it is, though. 9.83-1.fc29 - the version that landed in 20180616.n.0 - splits out the legacy 'network' service and its supporting bits into a subpackage, 'network-scripts'. With that change, initscripts no longer depends on hostname - the dependency is moved from initscripts itself to network-scripts.

Turns out nothing else in our buildinstall environment depends on hostname either, so it just...doesn't get pulled in. And I believe that, because of that, the dracut 95iscsi module fails, and that's the cause of the missing kernel modules in the initramfs.
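Why moving a dependency can silently drop a package is easy to see with a toy resolver: the install set is the transitive closure of the requested packages, so a package stays only while something in that closure still requires it. A minimal Python sketch; the graphs and the requested set below are a hypothetical, heavily simplified model of the two composes, not real repository metadata.

```python
from collections import deque


def install_closure(requested, requires):
    """Return every package reachable from `requested` through the `requires` graph."""
    seen = set(requested)
    queue = deque(requested)
    while queue:
        pkg = queue.popleft()
        for dep in requires.get(pkg, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical, simplified graphs containing only the edges relevant here.
BEFORE = {"initscripts": ["hostname"]}
AFTER = {"initscripts": [], "network-scripts": ["hostname"]}
REQUESTED = ["dracut", "initscripts"]

print("hostname" in install_closure(REQUESTED, BEFORE))  # → True
print("hostname" in install_closure(REQUESTED, AFTER))   # → False
```

Explicitly requesting `hostname`, one of the options considered here, puts it back in the closure no matter which package ends up carrying the dependency.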

I guess we either need to add a dependency somewhere, or just patch lorax to explicitly install hostname when building installers. Let's re-assign to lorax for now.

Comment 2 Adam Williamson 2018-06-21 21:26:37 UTC
https://github.com/weldr/lorax/pull/382

Comment 3 Adam Williamson 2018-06-25 19:43:27 UTC
So, it turns out the change from "crash" to "button doesn't show up" was real, and is a different bug: we have two bugs here. I'm pretty sure this one is fixed now, since 'hostname' is showing up in composes again. But iSCSI install tests are still failing because of the *other* bug, which I've filed separately:

https://bugzilla.redhat.com/show_bug.cgi?id=1594946

Comment 4 Adam Williamson 2018-06-25 19:46:39 UTC
The fixed lorax has been sent out, so closing this.

