Bug 2017787

Summary: iSCSI installation stuck on boot: iscsid.service depends on a volume that's itself on iSCSI
Product: Red Hat Enterprise Linux 9 Reporter: Lubomir Rintel <lrintel>
Component: dracutAssignee: Lukáš Nykrýn <lnykryn>
Status: CLOSED ERRATA QA Contact: Vojtech Eichler <veichler>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 9.0CC: dracut-maint-list, drusek, dtardon, pvalena, veichler
Target Milestone: rcKeywords: Regression, Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: dracut-055-30.git20220216.el9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-05-17 16:04:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Lubomir Rintel 2021-10-27 12:33:22 UTC
Description of problem:

A default installation of RHEL-9 to iSCSI will succeed, but the installed system will deadlock on boot.

I suppose this happens because iscsid.service has a dependency on hibernation image resume from swap volume, which itself is on iSCSI.

Version-Release number of selected component (if applicable):

dracut-055-10.git20210824.el9.x86_64
iscsi-initiator-utils-6.2.1.4-1.git2a8f9d8.el9.x86_64
systemd-249-4.el9.x86_64

How reproducible:

Always.

Steps to Reproduce:
1. Do a default installation to iSCSI disk
2. Reboot

Actual results:

> [    0.000000] Linux version 5.14.0-1.7.1.el9.x86_64 (mockbuild.eng.bos.redhat.com) (gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-2), GNU ld version 2.35.2-9.el9) #1 SMP Mon Oct 25 15:21:23 EDT 2021
> [    0.000000] Command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.14.0-1.7.1.el9.x86_64 root=/dev/mapper/rhel-root ro resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.iscsi.firmware rd.lvm.lv=rhel/swap rd.break=initqueue console=ttyS0
> ...
> [    0.016204] Kernel command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.14.0-1.7.1.el9.x86_64 root=/dev/mapper/rhel-root ro resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.iscsi.firmware rd.lvm.lv=rhel/swap rd.break=initqueue console=ttyS0

Note the resume=/dev/mapper/rhel-swap.
It was set up by Anaconda.

> ...
> [    1.573510] Run /init as init process
> [    1.579894] systemd[1]: systemd 249-7.el9_b running in system mode (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS -FIDO2 +IDN2 -IDN -IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
> [    1.582867] systemd[1]: Detected virtualization kvm.
> [    1.583424] systemd[1]: Detected architecture x86-64.
> [    1.583880] systemd[1]: Running in initial RAM disk.
> 
> Welcome to Red Hat Enterprise Linux 9.0 Beta (Plow) dracut-055-10.git20210824.el9 (Initramfs)!
> 
> [    1.585269] systemd[1]: No hostname configured, using default hostname.
> [    1.586232] systemd[1]: Hostname set to <localhost>.
> [    1.639537] systemd[1]: Queued start job for default target Initrd Default Target.
> [    1.641174] systemd[1]: Created slice Slice /system/systemd-hibernate-resume.
> [  OK  ] Created slice Slice /system/systemd-hibernate-resume.
> [    1.643228] systemd[1]: Reached target Initrd /usr File System.
> [  OK  ] Reached target Initrd /usr File System.
> [    1.644728] systemd[1]: Reached target Slice Units.
> [  OK  ] Reached target Slice Units.
> [    1.645965] systemd[1]: Reached target Swaps.
> [  OK  ] Reached target Swaps.
> [    1.647080] systemd[1]: Reached target Timer Units.
> [  OK  ] Reached target Timer Units.
> [    1.648474] systemd[1]: Listening on D-Bus System Message Bus Socket.
> [  OK  ] Listening on D-Bus System Message Bus Socket.
> [    1.650106] systemd[1]: Listening on Open-iSCSI iscsid Socket.
> [  OK  ] Listening on Open-iSCSI iscsid Socket.

iscsid.socket is active, but won't respond since iscsi.service startup is stuck.

> [    1.651588] systemd[1]: Listening on Open-iSCSI iscsiuio Socket.
> [  OK  ] Listening on Open-iSCSI iscsiuio Socket.
> ...
>          Starting Dracut Emergency Shell...
> [    3.237647] fbcon: qxl (fb0) is primary device
> [    3.240607] Console: switching to colour frame buffer device 128x48
> [    3.248954] qxl 0000:00:01.0: [drm] fb0: qxl frame buffer device
> Warning: /dev/mapper/rhel-root does not exist
> Warning: /dev/rhel/root does not exist
> Warning: /dev/rhel/swap does not exist
> 
> Generating "/run/initramfs/rdsosreport.txt"
> 
> 
> Entering emergency mode. Exit the shell to continue.
> Type "journalctl" to view system logs.
> You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
> after mounting them and attach it to a bug report.
> 
> 
> Press Enter for maintenance
> (or press Control-D to continue): [    3.479017] ata3: SATA link down (SStatus 0 SControl 300)
> [    3.479793] ata6: SATA link down (SStatus 0 SControl 300)
> [    3.480620] ata2: SATA link down (SStatus 0 SControl 300)
> [    3.481367] ata4: SATA link down (SStatus 0 SControl 300)
> [    3.482222] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [    3.483258] ata5: SATA link down (SStatus 0 SControl 300)
> [    3.483998] ata1.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
> [    3.484615] ata1.00: applying bridge limits
> [    3.485196] ata1.00: configured for UDMA/100
> [    3.485975] scsi 1:0:0:0: CD-ROM            QEMU     QEMU DVD-ROM     2.5+ PQ: 0 ANSI: 5
> [    3.493604] scsi 1:0:0:0: Attached scsi generic sg0 type 5
> [    3.500216] sr 1:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
> [    3.501120] cdrom: Uniform CD-ROM driver Revision: 3.20
> [    4.208557] IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0: link becomes ready
> <info>  [1635331346.0184] policy: set-hostname: set hostname to 'localhost.localdomain' (no hostname found)
> 
> sh-5.1# 
> sh-5.1# iscsiadm -m fw -l
> Logging in to [iface: default, target: iqn.2003-01.org.linux-iscsi.f.x8664:sn.8ed7544cb95d, portal: 172.31.46.1,3260]

It just hangs now, attempting to talk to iscsid.socket...

> 
> iscsiadm: caught SIGINT, exiting...

But why:

> sh-5.1# systemctl list-dependencies --all iscsid.service
> ○ ├─iscsi-init.service
> ● │ ├─system.slice
> ● │ │ └─-.slice
> ○ │ └─sysinit.target
> ● │   ├─kmod-static-nodes.service
> ● │   │ └─system.slice
> ● │   │   └─-.slice
> ● │   ├─plymouth-start.service
> ● │   │ ├─system.slice
> ● │   │ │ └─-.slice
> ● │   │ ├─systemd-ask-password-plymouth.path
> ● │   │ └─systemd-vconsole-setup.service
> ● │   │   └─system.slice
> ● │   │     └─-.slice
> ● │   ├─systemd-ask-password-console.path
> ○ │   ├─systemd-hibernate-resume@dev-mapper-rhel\x2dswap.service
> ○ │   │ ├─dev-mapper-rhel\x2dswap.device

This is no good -- /dev/mapper/rhel-swap would be accessible via iSCSI.

> ● │   │ ├─system-systemd\x2dhibernate\x2dresume.slice
> ● │   │ │ └─system.slice
> ● │   │ │   └─-.slice
> ○ │   │ └─local-fs-pre.target
> ● │   ├─systemd-journald.service
> ● │   │ ├─system.slice
> ● │   │ │ └─-.slice
> ● │   │ ├─systemd-journald-audit.socket
> ● │   │ │ └─system.slice
> ● │   │ │   └─-.slice
> ● │   │ ├─systemd-journald-dev-log.socket
> ● │   │ │ └─system.slice
> ● │   │ │   └─-.slice
> ● │   │ └─systemd-journald.socket
> ● │   │   └─system.slice
> ● │   │     └─-.slice
> ● │   ├─systemd-modules-load.service
> ● │   │ └─system.slice
> ● │   │   └─-.slice
> ● │   ├─systemd-sysctl.service
> ● │   │ └─system.slice
> ● │   │   └─-.slice
> ○ │   ├─systemd-sysusers.service
> ● │   │ └─system.slice
> ● │   │   └─-.slice
> ● │   ├─systemd-tmpfiles-setup-dev.service
> ● │   │ └─system.slice
> ● │   │   └─-.slice
> ● │   ├─systemd-tmpfiles-setup.service
> ● │   │ └─system.slice
> ● │   │   └─-.slice
> ● │   ├─systemd-udev-trigger.service
> ● │   │ ├─system.slice
> ● │   │ │ └─-.slice
> ● │   │ └─systemd-udevd.service
> ● │   │   ├─system.slice
> ● │   │   │ └─-.slice
> ● │   │   ├─systemd-udevd-control.socket
> ● │   │   │ └─system.slice
> ● │   │   │   └─-.slice
> ● │   │   └─systemd-udevd-kernel.socket
> ● │   │     └─system.slice
> ● │   │       └─-.slice
> ● │   ├─systemd-udevd.service
> ● │   │ ├─system.slice
> ● │   │ │ └─-.slice
> ● │   │ ├─systemd-udevd-control.socket
> ● │   │ │ └─system.slice
> ● │   │ │   └─-.slice
> ● │   │ └─systemd-udevd-kernel.socket
> ● │   │   └─system.slice
> ● │   │     └─-.slice
> ○ │   ├─local-fs.target
> ● │   └─swap.target
> ○ ├─iscsi-shutdown.service
> ● │ └─system.slice
> ● │   └─-.slice
> ● └─system.slice
> ●   └─-.slice

Expected results:

Successful boot. If we spawn iscsid manually, the boot would proceed:

> sh-5.1# systemctl stop iscsid.socket
> sh-5.1# iscsid
> sh-5.1# iscsiadm -m fw -l
> Logging in to [iface: default, target: iqn.2003-01.org.linux-iscsi.f.x8664:sn.8ed7544cb95d, portal: 172.31.46.1,3260]
> [  230.438301] scsi host7: iSCSI Initiator over TCP/IP
> [  230.452174] scsi 7:0:0:0: Direct-Access     LIO-ORG  vdb              4.0  PQ: 0 ANSI: 6
> [  230.455030] scsi 7:0:0:0: alua: supports implicit and explicit TPGS
> [  230.456289] scsi 7:0:0:0: alua: device naa.60014052472e9d39f594df69d8ab703e port group 0 rel port 1
> [  230.458129] scsi 7:0:0:0: Attached scsi generic sg1 type 0
> Login to [iface: default, target: iqn.2003-01.org.linux-iscsi.f.x8664:sn.8ed7544cb95d, portal: 172.31.46.1,3260] successful.
> sh-5.1# [  230.463870] scsi 7:0:0:0: alua: transition timeout set to 60 seconds
> [  230.464852] scsi 7:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA
> [  230.467700] sd 7:0:0:0: [sda] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
> [  230.468864] sd 7:0:0:0: [sda] Write Protect is off
> [  230.469890] sd 7:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
> [  230.471351] sd 7:0:0:0: [sda] Optimal transfer size 4294967288 logical blocks > dev_max (65535 logical blocks)
> [  230.485509]  sda: sda1 sda2
> [  230.497163] sd 7:0:0:0: [sda] Attached SCSI disk
> 
> sh-5.1# 
> sh-5.1# exit
> [FAILED] Failed to start Open-iSCSI.
> [  232.431683] SGI XFS with ACLs, security attributes, scrub, quota, no debug enabled
> [  232.435300] XFS (dm-0): Mounting V5 Filesystem
> [FAILED] Failed to start Open-iSCSI.
> [  232.639918] XFS (dm-0): Starting recovery (logdev: internal)
> [  232.651469] XFS (dm-0): Ending recovery (logdev: internal)
> [FAILED] Failed to start Open-iSCSI.
> [FAILED] Failed to start Open-iSCSI.
> <info>  [1635331571.5930] caught SIGTERM, shutting down normally.
> <info>  [1635331571.5936] dhcp4 (enp1s0): canceled DHCP transaction
> <info>  [1635331571.5937] dhcp4 (enp1s0): state changed extended -> terminated
> <info>  [1635331571.5941] exiting (success)
> [  232.837751] systemd-journald[269]: Received SIGTERM from PID 1 (systemd).
> [  232.890702] audit: type=1404 audit(1635331571.693:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295 enabled=1 old-enabled=1 lsm=selinux res=1
> [  232.924534] SELinux:  policy capability network_peer_controls=1
> [  232.925155] SELinux:  policy capability open_perms=1
> [  232.925701] SELinux:  policy capability extended_socket_class=1
> [  232.926312] SELinux:  policy capability always_check_network=0
> [  232.926928] SELinux:  policy capability cgroup_seclabel=1
> [  232.927466] SELinux:  policy capability nnp_nosuid_transition=1
> [  232.928070] SELinux:  policy capability genfs_seclabel_symlinks=0
> [  232.962468] audit: type=1403 audit(1635331571.764:3): auid=4294967295 ses=4294967295 lsm=selinux res=1
> [  232.965627] systemd[1]: Successfully loaded SELinux policy in 75.356ms.
> [  233.003923] systemd[1]: Relabelled /dev, /dev/shm, /run, /sys/fs/cgroup in 22.294ms.
> [  233.008363] systemd[1]: systemd 249-7.el9_b running in system mode (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS -FIDO2 +IDN2 -IDN -IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
> [  233.011555] systemd[1]: Detected virtualization kvm.
> [  233.012055] systemd[1]: Detected architecture x86-64.
> 
> Welcome to Red Hat Enterprise Linux 9.0 Beta (Plow)!
> 
> [  233.224355] systemd[1]: initrd-switch-root.service: Deactivated successfully.
> [  233.225235] systemd[1]: Stopped Switch Root.
> ...
> Red Hat Enterprise Linux 9.0 Beta (Plow)
> Kernel 5.14.0-1.7.1.el9.x86_64 on an x86_64
> 
> Activate the web console with: systemctl enable --now cockpit.socket
> 
> localhost login:

Comment 1 David Tardon 2021-11-05 09:20:16 UTC
Bah. And we foolishly thought that always including resume module is safe...

@lnykryn: We have to revert 058ffa90669a4dbd2af16e025f22312408ed4ba9 and only drop the /sys/power/resume check as a fix for bug 1926544.

Comment 2 Pavel Valena 2022-02-15 19:11:59 UTC
Reviewed and pre-verified in the PR: https://github.com/redhat-plumbers/dracut-rhel9/pull/29

Comment 7 errata-xmlrpc 2022-05-17 16:04:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: dracut), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:4070