Bug 2203004 - Server edition QEMU image first-boot menu intermittently (>50%) fails for aarch64
Summary: Server edition QEMU image first-boot menu intermittently (>50%) fails for aar...
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: dracut
Version: 38
Hardware: aarch64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Pavel Valena
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-05-11 03:41 UTC by Tom Sobczynski
Modified: 2024-05-22 10:55 UTC (History)
9 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2024-05-22 10:55:44 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
Shell script reproducer with comments (4.00 KB, application/x-shellscript)
2023-05-11 03:43 UTC, Tom Sobczynski
no flags Details
Log from run with extra kernel params "inst.text rd.debug rd.shell" (468.86 KB, text/plain)
2023-07-20 20:52 UTC, Tom Sobczynski
no flags Details

Description Tom Sobczynski 2023-05-11 03:41:25 UTC
When the Server edition QEMU image is booted for the first time, the boot process leads to a menu allowing account creation, root password setting, etc. This process works every time on x64, but fails with >50% frequency on aarch64. The attached shell script is a reproducer and contains detailed comments with sample output. I have seen this when running the reproducer in a F38 Workstation VM in Parallels on macOS as well as running Qemu directly on macOS with M2 processor.

Reproducible: Sometimes

Steps to Reproduce:
See attached shell script reproducer with detailed comments.
Actual Results:  
Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.

Expected Results:  
1) [x] Language settings                 2) [x] Time settings
       (English (United States))                (Etc/UTC timezone)
3) [x] Network configuration             4) [x] Root password
       (Connected: ens3)                        (Root account is disabled)
5) [ ] User creation
       (No user will be created)


Please make a selection from the above ['c' to continue, 'q' to quit, 'r' to
refresh]:

Although my testing with F37 was somewhat limited, I have NOT seen this with F37 Server QEMU aarch64.

Comment 1 Tom Sobczynski 2023-05-11 03:43:51 UTC
Created attachment 1963913 [details]
Shell script reproducer with comments

Comment 2 Vladimír Slávik 2023-05-31 13:50:06 UTC
Hello Tom, thank you for the report. I'm sorry - the way you reported this with the reproducer is pretty great - but there's nothing about anaconda in there, I'm afraid.

I ran the script, and got the predicted result. There isn't much I could do, though. There seems to be no way to get access to the running system so I can't investigate the logs or run programs. (No console...) However, it's clear that anaconda does not even start, the last messages are from systemd.

Here's a pile of ideas that are cheaper to type here than trying them myself:
- install f37 and upgrade to f38 to see if it gets stuck the same way
- try attaching a virtual serial console to see if the thing maybe continues elsewhere
- there's also a virtio port to open
- try inst.notmux and inst.noshell
- reassign to systemd, they might have some ideas

Comment 3 Vladimír Slávik 2023-07-11 13:55:50 UTC
After discussing with team members, dracut might be a more likely cause?

Comment 4 Pavel Valena 2023-07-19 08:33:53 UTC
Hello,

on dracut side, there's nothing useful in the log.

Please start the install with `rd.debug rd.shell` (additionally to `inst.text console=ttyS0`) and post the full log (and `rdsosreport.txt` if available).

More on how to debug: https://fedoraproject.org/wiki/How_to_debug_Dracut_problems

If there's a dracut crash or similar (failure/error), feel free to report it upstream (for faster resolution).

Comment 5 Tom Sobczynski 2023-07-20 20:50:23 UTC
I interrupted GRUB to add those parameters to the kernel arguments. In the log I am about to attach (2023-07-20_log.txt), I see the parameters Pavel requested, but then a complaint about "Unknown..." that I don't know how to interpret. But my guess is that only the one "BOOT_IMAGE..." quoted is unknown and passed through unheeded?

[    0.000000] Kernel command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.2.9-300.fc38.aarch64 root=/dev/mapper/sysvg-root ro console=tty1 console=ttyS0,115200n8 rd.lvm.lv=sysvg/root inst.text rd.debug rd.shell
[    0.000000] Unknown kernel command line parameters "BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.2.9-300.fc38.aarch64", will be passed to user space.

At the end of the boot process, I see the familiar error instead of the expected menu: "Cannot open access to console, the root account is locked."

Comment 6 Tom Sobczynski 2023-07-20 20:52:12 UTC
Created attachment 1976820 [details]
Log from run with extra kernel params "inst.text rd.debug rd.shell"

Comment 7 Tom Sobczynski 2023-07-20 21:01:34 UTC
I also see no difference in behavior with "inst.notmux" and "inst.noshell".

Comment 8 Tom Sobczynski 2023-07-27 16:32:02 UTC
I take back my previous comment. I happened to leave the Qemu VM reproducer running in the background for a long time with "inst.notmux" and "inst.noshell", and after a long delay, the menu appeared on the console but I could not interact with it, because input is instead directed to the login prompt. You can see below where I typed "r" to refresh the menu but it was instead read as username "r" and followed by the "Password: " prompt.


Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.

Press Enter to continue.
[  307.813570] audit: type=1130 audit(1689886799.830:35): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-hwdb-update comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[  307.927246] audit: type=1334 audit(1689886799.940:36): prog-id=51 op=LOAD
[  307.935616] audit: type=1334 audit(1689886799.950:37): prog-id=52 op=LOAD
[  307.940033] audit: type=1334 audit(1689886799.950:38): prog-id=22 op=UNLOAD
[  307.942639] audit: type=1334 audit(1689886799.960:39): prog-id=23 op=UNLOAD
[  312.030713] audit: type=1130 audit(1689886804.050:40): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  313.438067] audit: type=1334 audit(1689886805.450:41): prog-id=53 op=LOAD
[  313.460262] audit: type=1334 audit(1689886805.470:42): prog-id=54 op=LOAD
[  313.474193] audit: type=1334 audit(1689886805.490:43): prog-id=55 op=LOAD
[  314.245566] audit: type=1334 audit(1689886806.260:44): prog-id=56 op=LOAD
[  314.898490] RPC: Registered named UNIX socket transport module.
[  314.899674] RPC: Registered udp transport module.
[  314.900519] RPC: Registered tcp transport module.
[  314.901385] RPC: Registered tcp NFSv4.1 backchannel transport module.
[  315.694049] audit: type=1305 audit(1689886807.710:45): op=set audit_enabled=1 old=1 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[  315.710735] audit: type=1300 audit(1689886807.710:45): arch=c00000b7 syscall=206 success=yes exit=60 a0=3 a1=fffff0827670 a2=3c a3=0 items=0 ppid=594 pid=601 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="auditd" exe="/usr/sbin/auditd" subj=system_u:system_r:auditd_t:s0 key=(null)

[  389.019906] zram0: detected capacity change from 0 to 6053888
[  393.093333] EXT4-fs (vda2): mounted filesystem a5d1694d-01d5-4b53-b1ff-e9e7fe2aedb4 with ordered data mode. Quota mode: none.
[  397.041189] Adding 3026940k swap on /dev/zram0.  Priority:100 extents:1 across:3026940k SSDscFS

Fedora Linux 38 (Server Edition)
Kernel 6.2.9-300.fc38.aarch64 on an aarch64 (ttyAMA0)

Web console: https://localhost:9090/

localhost login: [  444.406279] NET: Registered PF_QIPCRTR protocol family
================================================================================
================================================================================

1) [x] Language settings                 2) [x] Time settings
       (English (United States))                (Etc/UTC timezone)
3) [x] Network configuration             4) [x] Root password
       (Connected: enp0s1)                      (Root account is disabled)
5) [ ] User creation
       (No user will be created)


Please make a selection from the above ['c' to continue, 'q' to quit, 'r' to
refresh]:
31415999[188156.604039] hrtimer: interrupt took 30180878 ns
r
Password: login: timed out after 60 seconds
Fedora Linux 38 (Server Edition)
Kernel 6.2.9-300.fc38.aarch64 on an aarch64 (ttyAMA0)

Web console: https://localhost:9090/

localhost login:Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.

Press Enter to continue.
[  307.813570] audit: type=1130 audit(1689886799.830:35): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-hwdb-update comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[  307.927246] audit: type=1334 audit(1689886799.940:36): prog-id=51 op=LOAD
[  307.935616] audit: type=1334 audit(1689886799.950:37): prog-id=52 op=LOAD
[  307.940033] audit: type=1334 audit(1689886799.950:38): prog-id=22 op=UNLOAD
[  307.942639] audit: type=1334 audit(1689886799.960:39): prog-id=23 op=UNLOAD
[  312.030713] audit: type=1130 audit(1689886804.050:40): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  313.438067] audit: type=1334 audit(1689886805.450:41): prog-id=53 op=LOAD
[  313.460262] audit: type=1334 audit(1689886805.470:42): prog-id=54 op=LOAD
[  313.474193] audit: type=1334 audit(1689886805.490:43): prog-id=55 op=LOAD
[  314.245566] audit: type=1334 audit(1689886806.260:44): prog-id=56 op=LOAD
[  314.898490] RPC: Registered named UNIX socket transport module.
[  314.899674] RPC: Registered udp transport module.
[  314.900519] RPC: Registered tcp transport module.
[  314.901385] RPC: Registered tcp NFSv4.1 backchannel transport module.
[  315.694049] audit: type=1305 audit(1689886807.710:45): op=set audit_enabled=1 old=1 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[  315.710735] audit: type=1300 audit(1689886807.710:45): arch=c00000b7 syscall=206 success=yes exit=60 a0=3 a1=fffff0827670 a2=3c a3=0 items=0 ppid=594 pid=601 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="auditd" exe="/usr/sbin/auditd" subj=system_u:system_r:auditd_t:s0 key=(null)

[  389.019906] zram0: detected capacity change from 0 to 6053888
[  393.093333] EXT4-fs (vda2): mounted filesystem a5d1694d-01d5-4b53-b1ff-e9e7fe2aedb4 with ordered data mode. Quota mode: none.
[  397.041189] Adding 3026940k swap on /dev/zram0.  Priority:100 extents:1 across:3026940k SSDscFS

Fedora Linux 38 (Server Edition)
Kernel 6.2.9-300.fc38.aarch64 on an aarch64 (ttyAMA0)

Web console: https://localhost:9090/

localhost login: [  444.406279] NET: Registered PF_QIPCRTR protocol family
================================================================================
================================================================================

1) [x] Language settings                 2) [x] Time settings
       (English (United States))                (Etc/UTC timezone)
3) [x] Network configuration             4) [x] Root password
       (Connected: enp0s1)                      (Root account is disabled)
5) [ ] User creation
       (No user will be created)


Please make a selection from the above ['c' to continue, 'q' to quit, 'r' to
refresh]:
31415999[188156.604039] hrtimer: interrupt took 30180878 ns
r
Password: login: timed out after 60 seconds
Fedora Linux 38 (Server Edition)
Kernel 6.2.9-300.fc38.aarch64 on an aarch64 (ttyAMA0)

Web console: https://localhost:9090/

localhost login:

Comment 9 Pavel Valena 2023-09-13 13:52:15 UTC
I see, the shell locks after some time of inactivity. Let me check where this is invoked.

Comment 10 Aoife Moloney 2024-05-22 10:55:44 UTC
Fedora Linux 38 entered end-of-life (EOL) status on 2024-05-21.

Fedora Linux 38 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.