Bug 1666920 - iSCSI installs fail to boot since systemd-240 (Fedora-Rawhide-20181222.n.1)
Summary: iSCSI installs fail to boot since systemd-240 (Fedora-Rawhide-20181222.n.1)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: rawhide
Hardware: x86_64
OS: All
unspecified
high
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: openqa AcceptedBlocker
Depends On:
Blocks: F30FinalBlocker
TreeView+ depends on / blocked
 
Reported: 2019-01-16 22:55 UTC by Adam Williamson
Modified: 2019-04-14 00:01 UTC (History)
12 users (show)

Fixed In Version: systemd-241-6.gita2eaa1c.fc30
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-14 00:01:56 UTC


Attachments (Terms of Use)
systemd debug logs (85.92 KB, text/plain)
2019-01-17 04:17 UTC, Adam Williamson
no flags Details
systemd debug + journal (94.97 KB, text/plain)
2019-01-17 04:53 UTC, Adam Williamson
no flags Details
equivalent log from successful f29 test (1.60 MB, text/plain)
2019-04-11 23:15 UTC, Adam Williamson
no flags Details

Description Adam Williamson 2019-01-16 22:55:19 UTC
Since the Fedora-Rawhide-20181222.n.1 , the openQA iSCSI install test has been failing. This test involves two VMs; one acts as a server and sets up an iSCSI target, the other runs an install using anaconda's iSCSI support to place the / filesystem on that iSCSI target.

The install proceeds just fine, but the installed system does not boot. By default it gets stuck on the splash screen. Between 20181219.n.0 (the previous compose, where the test passed) and 20181222.n.0 (the first compose where it failed), systemd is the most obvious suspect in the list of changed packages. iscsi-initiator-utils did not change, nor did the kernel, or anaconda, or dracut, or anything else that would be an obvious suspect. systemd went from 239-10.git3bf819c to 240-2.fc30 in that compose.

I'm currently running a test with a tweak to disable the splash screen; I'll see if that provides any useful indication of the bug.

Comment 1 Adam Williamson 2019-01-16 22:56:10 UTC
Proposing as a Final blocker: "The installer must be able to detect (if possible) and install to supported network-attached storage devices...Supported network-attached storage types include iSCSI, Fibre Channel and Fibre Channel over Ethernet (FCoE)." - https://fedoraproject.org/wiki/Fedora_30_Final_Release_Criteria#Network_attached_storage

Comment 2 Adam Williamson 2019-01-17 04:17:18 UTC
Created attachment 1521184 [details]
systemd debug logs

I tweaked the test to do systemd debug logging to a serial console, and got the logs out that way; here they are.

Comment 3 Adam Williamson 2019-01-17 04:53:33 UTC
Created attachment 1521186 [details]
systemd debug + journal

Updated logs with journal messages also...

Comment 4 František Zatloukal 2019-02-12 14:28:01 UTC
Discussed during the 2019-02-11 blocker review meeting: [1]

The decision to classify this bug as an AcceptedBlocker was made:

"accepted as a violation of "The installer must be able to detect (if possible) and install to supported network-attached storage devices...Supported network-attached storage types include iSCSI, Fibre Channel and Fibre Channel over Ethernet (FCoE).""

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2019-02-11/f30-blocker-review.2019-02-11-17.13.log.txt

Comment 5 Adam Williamson 2019-03-27 18:22:16 UTC
Ping? systemd folks, any news here? This is still broken.

Comment 6 Yu Watanabe 2019-04-09 18:38:22 UTC
Could you test systemd-241-5.git3d835d0 ?

Comment 7 Adam Williamson 2019-04-10 17:46:31 UTC
Unfortunately -5 doesn't seem to help. Fails the same way. Do you need updated logs?

Comment 8 Zbigniew Jędrzejewski-Szmek 2019-04-10 22:08:56 UTC
iscsid.service: Installed new job iscsid.service/start as 35
iscsiuio.service: Installed new job iscsiuio.service/start as 34
iscsiuio.socket: Enqueued job iscsiuio.socket/start as 39
...
and nothing interesting seems to happen after that. I don't see any failure from
systemd or other indication that systemd is at fault. I'll reassign this to
iscsi, maybe they can diagnose what is wrong. Please reassign back if it turns
out systemd is doing something wrong.

Comment 9 Adam Williamson 2019-04-11 22:08:00 UTC
I explained why this is assigned to systemd when I filed it:

"Between 20181219.n.0 (the previous compose, where the test passed) and 20181222.n.1 (the first compose where it failed), systemd is the most obvious suspect in the list of changed packages. iscsi-initiator-utils did not change, nor did the kernel, or anaconda, or dracut, or anything else that would be an obvious suspect. systemd went from 239-10.git3bf819c to 240-2.fc30 in that compose."

You can see the compose report mail for 20181222.n.1 here:

https://lists.fedoraproject.org/archives/list/test@lists.fedoraproject.org/thread/V6NPWEYRREYQN3A6HDZ2Z7EXYPYRPB25/

as you can see, iscsi-initiator-utils did not change at all.

Comment 10 Adam Williamson 2019-04-11 23:15:15 UTC
Created attachment 1554664 [details]
equivalent log from successful f29 test

I re-ran the same test on F29 Final to compare the log output. Here is the log, logged the same way with systemd debugging enabled. It looks quite different...I don't see any log that these units are 'enqueued' at all in this case, but they clearly actually *run*, unlike what happens now.

Note there should be no significant difference between the iscsi-initiator-utils builds in F29 Final and current F30 / Rawhide. This is all that's happened to the package in that time:

* Tue Feb 12 2019 Igor Gnatenko <ignatenkobrain@fedoraproject.org> - 6.2.0.876-8.gitf3c8e90
- Remove obsolete scriptlets 
* Fri Feb 01 2019 Fedora Release Engineering <releng@fedoraproject.org> - 6.2.0.876-7.gitf3c8e90
- Rebuilt for https://fedoraproject.org/wiki/Fedora_30_Mass_Rebuild
* Thu Jan 10 2019 Miro Hrončok <mhroncok@redhat.com> - 6.2.0.876-6.gitf3c8e90
- Disable python2 build

The "obsolete scriptlets" removed were just calls to /sbin/ldconfig in %post, nothing to do with this bug.

Comment 11 Adam Williamson 2019-04-12 06:48:26 UTC
Hmm, so I've been digging into this some more. I'm not completely done yet and will have to finish up tomorrow, but it seems it may indeed not be systemd after all. I tested with various systemd git snapshots between v239 and v240, rebuilt with an epoch bump, and they *all* seem to fail the same way. Even just v239 itself (with a couple of necessary patches to make the build succeed) fails in what looks like the same way.

Right now I'm trying a rebuild of the actual 239-10 with the epoch bump, and if that fails, I'll see if I can do a test with a downgrade to the actual original 239-10 (though that may not be possible due to deps). If all of these still don't make it work again, we may have to look again at other stuff that changed in the 20181222.n.1 compose...

Comment 12 Zbigniew Jędrzejewski-Szmek 2019-04-12 15:06:22 UTC
I think this is fixed by
https://github.com/systemd/systemd/commit/30fdb8962a.
This patch is already in rawhide, but not in F30.

Comment 13 Adam Williamson 2019-04-12 16:30:03 UTC
So, turns out I was missing something important in my blunt force bisecting: the bug occurs in initramfs stage. I was changing the systemd package in the installed system root, but not running dracut after doing that, so the initramfs still had the systemd from the initial install. So basically all my bisect attempts really just used the current stable systemd (241-4). Sorry about that :/

This also affected my attempt to test 241-5, that test was invalid. I will redo that.

But I did just *correctly* test a build of 241-5 with the commit from #c12 applied, and that works! So we can say that *either* the changes in -5 *or* that commit fixes this, at least.

Comment 14 Adam Williamson 2019-04-12 18:13:37 UTC
OK, and I just tested correctly and -5 does not work. So it is definitely the commit Zbigniew mentioned in #c12 that fixes this. Thanks!

Comment 15 Fedora Update System 2019-04-12 19:11:31 UTC
systemd-241-6.gita2eaa1c.fc30 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-754128629a

Comment 16 Fedora Update System 2019-04-13 02:00:20 UTC
systemd-241-6.gita2eaa1c.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-754128629a

Comment 17 Fedora Update System 2019-04-14 00:01:56 UTC
systemd-241-6.gita2eaa1c.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.