Created attachment 1852425
The log of libguestfs

Description of problem:
As subject: the libguestfs appliance kernel panics during boot, so guestfs_launch fails.

Version-Release number of selected component (if applicable):
libguestfs-1.47.2-2.fc36.x86_64
qemu-6.2.0-2.fc36.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Download the image:
➜ ~ wget https://dl.fedoraproject.org/pub/fedora/linux/releases/35/Cloud/x86_64/images/Fedora-Cloud-Base-35-1.2.x86_64.qcow2 -O /var/lib/libvirt/images/fedora.qcow2
2. Run virt-customize:
➜ ~ LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1 virt-customize -a /var/lib/libvirt/images/fedora.qcow2 --uninstall cloud-init --install qemu-guest-agent --install NetworkManager --install libselinux-python3 --network 2>&1 |tee /tmp/log

Actual results:
[ 2.155549] Code: 10 00 00 00 4c 89 44 24 08 4c 89 4c 24 10 e8 05 bd 00 00 48 83 fb ff 75 07 48 8b 1d 57 12 cd 03 e8 c4 bc 00 00 44 8b 7c 24 18 <4c> 8b 73 18 45 31 e4 c7 04 24 00 00 00 00 bd 01 00 00 00 41 83 e7
[ 2.155549] RSP: 0000:ffffb62f8000bd40 EFLAGS: 00010293
[ 2.155549] RAX: ffffffffac4ec9c7 RBX: 0000000000000000 RCX: 0000000000000010
[ 2.155549] RDX: ffffffffac4ec9c7 RSI: ffffffffac4ec9b0 RDI: 00000000000000a6
[ 2.155549] RBP: 0000000000000000 R08: ffffffffab8782de R09: 0000000000000000
[ 2.155549] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 2.155549] R13: ffffffffac8fd657 R14: 0000000000000000 R15: 0000000000000001
[ 2.155549] FS: 0000000000000000(0000) GS:ffff9b764e200000(0000) knlGS:0000000000000000
[ 2.155549] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.155549] CR2: 0000000000000018 CR3: 000000004b028001 CR4: 0000000000770ef0
[ 2.155549] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2.155549] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2.155549] PKRU: 55555554
[ 2.155549] Call Trace:
[ 2.155549] <TASK>
[ 2.155549] ? acpi_walk_namespace+0x13e/0x13e
[ 2.155549] acpi_get_devices+0xd3/0x110
[ 2.155549] ? drm_core_init+0xd4/0xd4
[ 2.155549] ? drm_kms_helper_init+0xa/0xa
[ 2.155549] detect_thinkpad_privacy_screen+0x51/0x8d
[ 2.155549] drm_privacy_screen_lookup_init+0xa/0x43
[ 2.155549] drm_core_init+0xac/0xd4
[ 2.155549] do_one_initcall+0x67/0x350
[ 2.155549] ? kernel_init_freeable+0x273/0x2cf
[ 2.155549] kernel_init_freeable+0x283/0x2cf
[ 2.155549] ? rest_init+0x260/0x260
[ 2.155549] kernel_init+0x16/0x130
[ 2.155549] ret_from_fork+0x22/0x30
[ 2.155549] </TASK>
[ 2.155549] Modules linked in:
[ 2.155549] CR2: 0000000000000018
[ 2.155549] ---[ end trace 02928d238129499d ]---
[ 2.155549] RIP: 0010:acpi_ns_walk_namespace+0x60/0x27f
[ 2.155549] Code: 10 00 00 00 4c 89 44 24 08 4c 89 4c 24 10 e8 05 bd 00 00 48 83 fb ff 75 07 48 8b 1d 57 12 cd 03 e8 c4 bc 00 00 44 8b 7c 24 18 <4c> 8b 73 18 45 31 e4 c7 04 24 00 00 00 00 bd 01 00 00 00 41 83 e7
libguestfs: child_cleanup: 0x556254c5c9d0: child process died
libguestfs: trace: launch = -1 (error)
virt-customize: error: libguestfs error: guestfs_launch failed, see earlier error messages
If reporting bugs, run virt-customize with debugging enabled and include the complete output:

Expected results:
No kernel panic.

Additional info:
See the full log in the attachment.
This is likely a Rawhide kernel bug. You might want to try upgrading to one of the newer kernels listed at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 or using a non-Rawhide kernel. But basically it's a kernel bug.
This bug also stopped a package rebuilding in the Fedora 36 mass rebuild yesterday: https://koji.fedoraproject.org/koji/taskinfo?taskID=81519983
And it breaks libguestfs builds on x86-64: https://koji.fedoraproject.org/koji/taskinfo?taskID=81593116
Still happening with kernel-5.17.0-0.rc0.20220112gitdaadb3bd0e8d.63.fc36.x86_64
I'm not really surprised that it hasn't been fixed in the kernel the error was originally reported on. I haven't been able to build a kernel since then due to gcc 12. No, I do not know when I will have kernels building again; every time I get past one error, it exposes another.
I bisected this to:

f809891ee51b706e1a2a42998d8766c120660796 is the first bad commit
commit f809891ee51b706e1a2a42998d8766c120660796
Author: Hans de Goede <hdegoede>
Date:   Tue Oct 5 22:23:20 2021 +0200

    platform/x86: thinkpad_acpi: Register a privacy-screen device

    Register a privacy-screen device on laptops with a privacy-screen,
    this exports the PrivacyGuard features to user-space using a
    standardized vendor-agnostic sysfs interface. Note the sysfs interface
    is read-only.

    Registering a privacy-screen device with the new privacy-screen class
    code will also allow the GPU driver to get a handle to it and export
    the privacy-screen setting as a property on the DRM connector object
    for the LCD panel. This DRM connector property is a new standardized
    interface which all user-space code should use to query and control
    the privacy-screen.

    Reviewed-by: Emil Velikov <emil.l.velikov>
    Reviewed-by: Lyude Paul <lyude>
    Reviewed-by: Mark Pearson <markpearson>
    Signed-off-by: Hans de Goede <hdegoede>
    Link: https://patchwork.freedesktop.org/patch/msgid/20211005202322.700909-9-hdegoede@redhat.com

 drivers/platform/x86/Kconfig         |  2 +
 drivers/platform/x86/thinkpad_acpi.c | 97 ++++++++++++++++++++++++++----------
 2 files changed, 74 insertions(+), 25 deletions(-)
(In reply to Justin M. Forbes from comment #5)
> Not really surprised that it hasn't been fixed in the original kernel that
> the error was reported on. I haven't been able to build a kernel since then
> due to gcc 12. No, I do not know when I will have kernels building again,
> every time I get passed one error, it exposes another.

I'm also unable to compile the kernel with GCC 12. To bisect this bug I reverted my Rawhide machine back to GCC 11.
Confirmed that reverting f809891ee51b70 (on top of current kernel head) fixes the problem. I don't really understand why though.
So the issue here is that acpi_get_devices() crashes when it is called on systems where ACPI has not been initialized, and the qemu model being used by guestfs is so old that ACPI fails to initialize:

[ 0.013339] ACPI BIOS Error (bug): A valid RSDP was not found (20211217/tbxfroot-210)

A kernel fix for this has already landed in 5.17-rc2:
https://cgit.freedesktop.org/drm-misc/commit/?h=drm-misc-fixes&id=7fde14d705985dd933a3d916d39daa72b1668098
And a pull-req has been submitted to Linus:
https://lore.kernel.org/dri-devel/CAPM=9tweQ-RgLm5oewCYqVzRuiQ6cSQrb2yzVYP_537V67pdDQ@mail.gmail.com/

Note that the fix talks about booting with acpi=off on the kernel command line, but the acpi_disabled bool it adds a check for also gets set on systems where parsing the ACPI tables fails, so the patch should also fix this bug.
Ugh, I somehow ended up submitting my comment while I was still editing it. I meant to drop this part:

> And a pull-req has been submitted to Linus:
> https://lore.kernel.org/dri-devel/CAPM=9tweQ-RgLm5oewCYqVzRuiQ6cSQrb2yzVYP_537V67pdDQ@mail.gmail.com/

since, as mentioned above, I noticed that Linus has already pulled the fix:
https://cgit.freedesktop.org/drm-misc/commit/?h=drm-misc-fixes&id=7fde14d705985dd933a3d916d39daa72b1668098
into 5.17-rc2. So this should be fixed as soon as we are able to build kernels in Rawhide again.

Richard, can you confirm that 5.17-rc2 fixes this by testing a 5.17-rc2 build with GCC 11?

Note that a possible (temporary) workaround might be to use a newer machine model in qemu which does actually support ACPI.
(In reply to Hans de Goede from comment #9)
> So the issue here is that acpi_walk_devices does not like to be called on
> systems where ACPI has not been initialized and the qemu model being used by
> guestfs is so old that ACPI fails to initialize:

I don't think we set any model? Does libvirt / qemu pick some default model?

The libvirt XML was:

<?xml version="1.0"?>
<domain type="qemu" xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0">
  <name>guestfs-2slh17gcjz8370pb</name>
  <memory unit="MiB">1280</memory>
  <currentMemory unit="MiB">1280</currentMemory>
  <cpu mode="maximum"/>
  <vcpu>1</vcpu>
  <clock offset="utc">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
  </clock>
  <os>
    <type>hvm</type>
    <kernel>/builddir/build/BUILD/guestfs-tools-1.47.3/tmp/.guestfs-1000/appliance.d/kernel</kernel>
    <initrd>/builddir/build/BUILD/guestfs-tools-1.47.3/tmp/.guestfs-1000/appliance.d/initrd</initrd>
    <cmdline>panic=1 console=ttyS0 edd=off udevtimeout=6000 udev.event-timeout=6000 no_timer_check printk.time=1 cgroup_disable=memory usbcore.nousb cryptomgr.notests tsc=reliable 8250.nr_uarts=1 root=UUID=d51c4ea7-05ba-48c9-a149-708edb152d66 selinux=0 guestfs_verbose=1 TERM=vt100</cmdline>
    <bios useserial="yes"/>
  </os>
  <seclabel type="none"/>
  <on_reboot>destroy</on_reboot>
  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <rng model="virtio">
      <backend model="random">/dev/urandom</backend>
    </rng>
    <controller type="scsi" index="0" model="virtio-scsi"/>
    <disk device="disk" type="file">
      <source file="/builddir/build/BUILD/guestfs-tools-1.47.3/tmp/libguestfslYmzSB/devnull1.img"/>
      <target dev="sda" bus="scsi"/>
      <driver name="qemu" type="raw" cache="writeback"/>
      <address type="drive" controller="0" bus="0" target="0" unit="0"/>
    </disk>
    <disk type="file" device="disk">
      <source file="/builddir/build/BUILD/guestfs-tools-1.47.3/tmp/libguestfslYmzSB/overlay2.qcow2"/>
      <target dev="sdb" bus="scsi"/>
      <driver name="qemu" type="qcow2" cache="unsafe"/>
      <address type="drive" controller="0" bus="0" target="1" unit="0"/>
    </disk>
    <serial type="unix">
      <source mode="connect" path="/tmp/libguestfsidUYqU/console.sock"/>
      <target port="0"/>
    </serial>
    <channel type="unix">
      <source mode="connect" path="/tmp/libguestfsidUYqU/guestfsd.sock"/>
      <target type="virtio" name="org.libguestfs.channel.0"/>
    </channel>
    <controller type="usb" model="none"/>
    <memballoon model="none"/>
  </devices>
  <qemu:commandline>
    <qemu:env name="TMPDIR" value="/builddir/build/BUILD/guestfs-tools-1.47.3/tmp"/>
  </qemu:commandline>
</domain>
A better workaround might actually be to add modprobe.blacklist=<module-name> to the kernel command line to stop the GPU driver from loading (the GPU driver pulls in drm.ko as a dependency, and drm.ko is where the bug is).

To figure out the <module-name>, run "lsmod | grep drm" on a VM using the above config, and then see which module(s) depend on drm. There might be several, but the others which depend on drm are likely only being loaded because the driver for the emulated graphics card also depends on some helper libraries. Just stopping the graphics card driver itself from loading should be enough.

I expect the graphics card driver to be one of "cirrus", "qxl", "vmwgfx" or "virtio-gpu". Since no specific card is specified, I guess it will be "cirrus".
> Richard, can you confirm that 5.17-rc2 fixes this by testing a 5.17-rc2 build with gcc11 ?

Yes, I built 5.17-rc2 from git (using GCC 11) and can confirm that the bug has been fixed. I'll leave this bug open until we get it into Fedora.
This is now fixed in Fedora.