Bug 2203004 - Server edition QEMU image first-boot menu intermittently (>50%) fails for aarch64
Summary: Server edition QEMU image first-boot menu intermittently (>50%) fails for aar...
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: dracut
Version: 38
Hardware: aarch64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: dracut-maint-list
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-05-11 03:41 UTC by Tom Sobczynski
Modified: 2023-07-27 16:32 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: ---
Embargoed:


Attachments (Terms of Use)
Shell script reproducer with comments (4.00 KB, application/x-shellscript)
2023-05-11 03:43 UTC, Tom Sobczynski
no flags Details
Log from run with extra kernel params "inst.text rd.debug rd.shell" (468.86 KB, text/plain)
2023-07-20 20:52 UTC, Tom Sobczynski
no flags Details

Description Tom Sobczynski 2023-05-11 03:41:25 UTC
When the Server edition QEMU image is booted for the first time, the boot process leads to a menu allowing account creation, root password setting, etc. This process works every time on x64, but fails with >50% frequency on aarch64. The attached shell script is a reproducer and contains detailed comments with sample output. I have seen this when running the reproducer in a F38 Workstation VM in Parallels on macOS as well as running Qemu directly on macOS with M2 processor.

Reproducible: Sometimes

Steps to Reproduce:
See attached shell script reproducer with detailed comments.
Actual Results:  
Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.

Expected Results:  
1) [x] Language settings                 2) [x] Time settings
       (English (United States))                (Etc/UTC timezone)
3) [x] Network configuration             4) [x] Root password
       (Connected: ens3)                        (Root account is disabled)
5) [ ] User creation
       (No user will be created)


Please make a selection from the above ['c' to continue, 'q' to quit, 'r' to
refresh]:

Although my testing with F37 was somewhat limited, I have NOT seen this with F37 Server QEMU aarch64.

Comment 1 Tom Sobczynski 2023-05-11 03:43:51 UTC
Created attachment 1963913 [details]
Shell script reproducer with comments

Comment 2 Vladimír Slávik 2023-05-31 13:50:06 UTC
Hello Tom, thank you for the report. I'm sorry - the way you reported this with the reproducer is pretty great - but there's nothing about anaconda in there, I'm afraid.

I ran the script, and got the predicted result. There isn't much I could do, though. There seems to be no way to get access to the running system so I can't investigate the logs or run programs. (No console...) However, it's clear that anaconda does not even start, the last messages are from systemd.

Here's a pile of ideas that are cheaper to type here than trying them myself:
- install f37 and upgrade to f38 to see if it gets stuck the same way
- try attaching a virtual serial console to see if the thing maybe continues elsewhere
- there's also a virtio port to open
- try inst.notmux and inst.noshell
- reassign to systemd, they might have some ideas

Comment 3 Vladimír Slávik 2023-07-11 13:55:50 UTC
After discussing with team members, dracut might be a more likely cause?

Comment 4 Pavel Valena 2023-07-19 08:33:53 UTC
Hello,

on dracut side, there's nothing useful in the log.

Please start the install with `rd.debug rd.shell` (additionally to `inst.text console=ttyS0`) and post the full log (and `rdsosreport.txt` if available).

More on how to debug: https://fedoraproject.org/wiki/How_to_debug_Dracut_problems

If there's a dracut crash or similar (failure/error), feel free to report it upstream (for faster resolution).

Comment 5 Tom Sobczynski 2023-07-20 20:50:23 UTC
I interrupted GRUB to add those parameters to the kernel arguments. In the log I am about to attach (2023-07-20_log.txt), I see the parameters Pavel requested, but then a complaint about "Unknown..." that I don't know how to interpret. But my guess is that only the one "BOOT_IMAGE..." quoted is unknown and passed through unheeded?

[    0.000000] Kernel command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.2.9-300.fc38.aarch64 root=/dev/mapper/sysvg-root ro console=tty1 console=ttyS0,115200n8 rd.lvm.lv=sysvg/root inst.text rd.debug rd.shell
[    0.000000] Unknown kernel command line parameters "BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.2.9-300.fc38.aarch64", will be passed to user space.

At the end of the boot process, I see the familiar error instead of the expected menu: "Cannot open access to console, the root account is locked."

Comment 6 Tom Sobczynski 2023-07-20 20:52:12 UTC
Created attachment 1976820 [details]
Log from run with extra kernel params "inst.text rd.debug rd.shell"

Comment 7 Tom Sobczynski 2023-07-20 21:01:34 UTC
I also see no difference in behavior with "inst.notmux" and "inst.noshell".

Comment 8 Tom Sobczynski 2023-07-27 16:32:02 UTC
I take back my previous comment. I happened to leave the Qemu VM reproducer running in the background for a long time with "inst.notmux" and "inst.noshell", and after a long delay, the menu appeared on the console but I could not interact with it, because input is instead directed to the login prompt. You can see below where I typed "r" to refresh the menu but it was instead read as username "r" and followed by the "Password: " prompt.


Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.

Press Enter to continue.
[  307.813570] audit: type=1130 audit(1689886799.830:35): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-hwdb-update comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[  307.927246] audit: type=1334 audit(1689886799.940:36): prog-id=51 op=LOAD
[  307.935616] audit: type=1334 audit(1689886799.950:37): prog-id=52 op=LOAD
[  307.940033] audit: type=1334 audit(1689886799.950:38): prog-id=22 op=UNLOAD
[  307.942639] audit: type=1334 audit(1689886799.960:39): prog-id=23 op=UNLOAD
[  312.030713] audit: type=1130 audit(1689886804.050:40): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  313.438067] audit: type=1334 audit(1689886805.450:41): prog-id=53 op=LOAD
[  313.460262] audit: type=1334 audit(1689886805.470:42): prog-id=54 op=LOAD
[  313.474193] audit: type=1334 audit(1689886805.490:43): prog-id=55 op=LOAD
[  314.245566] audit: type=1334 audit(1689886806.260:44): prog-id=56 op=LOAD
[  314.898490] RPC: Registered named UNIX socket transport module.
[  314.899674] RPC: Registered udp transport module.
[  314.900519] RPC: Registered tcp transport module.
[  314.901385] RPC: Registered tcp NFSv4.1 backchannel transport module.
[  315.694049] audit: type=1305 audit(1689886807.710:45): op=set audit_enabled=1 old=1 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[  315.710735] audit: type=1300 audit(1689886807.710:45): arch=c00000b7 syscall=206 success=yes exit=60 a0=3 a1=fffff0827670 a2=3c a3=0 items=0 ppid=594 pid=601 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="auditd" exe="/usr/sbin/auditd" subj=system_u:system_r:auditd_t:s0 key=(null)

[  389.019906] zram0: detected capacity change from 0 to 6053888
[  393.093333] EXT4-fs (vda2): mounted filesystem a5d1694d-01d5-4b53-b1ff-e9e7fe2aedb4 with ordered data mode. Quota mode: none.
[  397.041189] Adding 3026940k swap on /dev/zram0.  Priority:100 extents:1 across:3026940k SSDscFS

Fedora Linux 38 (Server Edition)
Kernel 6.2.9-300.fc38.aarch64 on an aarch64 (ttyAMA0)

Web console: https://localhost:9090/

localhost login: [  444.406279] NET: Registered PF_QIPCRTR protocol family
================================================================================
================================================================================

1) [x] Language settings                 2) [x] Time settings
       (English (United States))                (Etc/UTC timezone)
3) [x] Network configuration             4) [x] Root password
       (Connected: enp0s1)                      (Root account is disabled)
5) [ ] User creation
       (No user will be created)


Please make a selection from the above ['c' to continue, 'q' to quit, 'r' to
refresh]:
31415999[188156.604039] hrtimer: interrupt took 30180878 ns
r
Password: login: timed out after 60 seconds
Fedora Linux 38 (Server Edition)
Kernel 6.2.9-300.fc38.aarch64 on an aarch64 (ttyAMA0)

Web console: https://localhost:9090/

localhost login:Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.

Press Enter to continue.
[  307.813570] audit: type=1130 audit(1689886799.830:35): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-hwdb-update comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[  307.927246] audit: type=1334 audit(1689886799.940:36): prog-id=51 op=LOAD
[  307.935616] audit: type=1334 audit(1689886799.950:37): prog-id=52 op=LOAD
[  307.940033] audit: type=1334 audit(1689886799.950:38): prog-id=22 op=UNLOAD
[  307.942639] audit: type=1334 audit(1689886799.960:39): prog-id=23 op=UNLOAD
[  312.030713] audit: type=1130 audit(1689886804.050:40): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  313.438067] audit: type=1334 audit(1689886805.450:41): prog-id=53 op=LOAD
[  313.460262] audit: type=1334 audit(1689886805.470:42): prog-id=54 op=LOAD
[  313.474193] audit: type=1334 audit(1689886805.490:43): prog-id=55 op=LOAD
[  314.245566] audit: type=1334 audit(1689886806.260:44): prog-id=56 op=LOAD
[  314.898490] RPC: Registered named UNIX socket transport module.
[  314.899674] RPC: Registered udp transport module.
[  314.900519] RPC: Registered tcp transport module.
[  314.901385] RPC: Registered tcp NFSv4.1 backchannel transport module.
[  315.694049] audit: type=1305 audit(1689886807.710:45): op=set audit_enabled=1 old=1 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[  315.710735] audit: type=1300 audit(1689886807.710:45): arch=c00000b7 syscall=206 success=yes exit=60 a0=3 a1=fffff0827670 a2=3c a3=0 items=0 ppid=594 pid=601 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="auditd" exe="/usr/sbin/auditd" subj=system_u:system_r:auditd_t:s0 key=(null)

[  389.019906] zram0: detected capacity change from 0 to 6053888
[  393.093333] EXT4-fs (vda2): mounted filesystem a5d1694d-01d5-4b53-b1ff-e9e7fe2aedb4 with ordered data mode. Quota mode: none.
[  397.041189] Adding 3026940k swap on /dev/zram0.  Priority:100 extents:1 across:3026940k SSDscFS

Fedora Linux 38 (Server Edition)
Kernel 6.2.9-300.fc38.aarch64 on an aarch64 (ttyAMA0)

Web console: https://localhost:9090/

localhost login: [  444.406279] NET: Registered PF_QIPCRTR protocol family
================================================================================
================================================================================

1) [x] Language settings                 2) [x] Time settings
       (English (United States))                (Etc/UTC timezone)
3) [x] Network configuration             4) [x] Root password
       (Connected: enp0s1)                      (Root account is disabled)
5) [ ] User creation
       (No user will be created)


Please make a selection from the above ['c' to continue, 'q' to quit, 'r' to
refresh]:
31415999[188156.604039] hrtimer: interrupt took 30180878 ns
r
Password: login: timed out after 60 seconds
Fedora Linux 38 (Server Edition)
Kernel 6.2.9-300.fc38.aarch64 on an aarch64 (ttyAMA0)

Web console: https://localhost:9090/

localhost login:


Note You need to log in before you can comment on or make changes to this bug.