1. Please describe the problem:
The system is being randomly shut down. It happens anywhere from 10 minutes to as long as two hours after boot.

2. What is the Version-Release number of the kernel:
kernel-6.15.0-0.rc0.20250327git1a9239bb4253.5.fc43.x86_64
kernel-6.15.0-0.rc0.20250401git08733088b566.8.fc43.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
First noticed on kernel-6.15.0-0.rc0.20250327git1a9239bb4253.5.fc43.x86_64. The previous kernel, kernel-6.14.0-0.rc7.20250321gitb3ee1e460951.60.fc43.x86_64, works as expected.

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:
100% reproducible: simply boot any currently available 6.15 kernel.

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:
As of this writing, yes.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No, not on this box.

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.
The system "believes" the power button is being pressed. The important log entries from 2 episodes follow:

Apr 01 12:01:00 zorac CROND[1494]: (root) CMD (run-parts /etc/cron.hourly)
Apr 01 12:01:00 zorac run-parts[1497]: (/etc/cron.hourly) starting 0anacron
Apr 01 12:01:00 zorac run-parts[1503]: (/etc/cron.hourly) finished 0anacron
Apr 01 12:01:00 zorac CROND[1493]: (root) CMDEND (run-parts /etc/cron.hourly)
Apr 01 12:22:28 zorac systemd-logind[821]: Power key pressed short.
Apr 01 12:22:28 zorac systemd-logind[821]: Powering off...
Apr 01 12:22:28 zorac systemd-logind[821]: System is powering down.
****
Apr 01 15:30:21 zorac audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-clean comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Apr 01 15:30:21 zorac audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-clean comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Apr 01 15:31:37 zorac systemd-logind[847]: Power key pressed short.
Apr 01 15:31:37 zorac systemd-logind[847]: Powering off...
Apr 01 15:31:37 zorac systemd-logind[847]: System is powering down.

I installed 'evtest' as advised on the test mailing list and determined that my power button was event2. However, the events shutting down the system are arriving as event1, as shown when trapping event1:

zorac$ sudo evtest --grab /dev/input/event1
[sudo] password for admin:
Input driver version is 1.0.1
Input device ID: bus 0x19 vendor 0x0 product 0x1 version 0x0
Input device name: "Power Button"
Supported events:
  Event type 0 (EV_SYN)
  Event type 1 (EV_KEY)
    Event code 116 (KEY_POWER)
    Event code 143 (KEY_WAKEUP)
Properties:
Testing ... (interrupt to exit)
Event: time 1743497614.264130, type 1 (EV_KEY), code 116 (KEY_POWER), value 1
Event: time 1743497614.264130, -------------- SYN_REPORT ------------
Event: time 1743497614.264135, type 1 (EV_KEY), code 116 (KEY_POWER), value 0
Event: time 1743497614.264135, -------------- SYN_REPORT ------------
Event: time 1743500523.593170, type 1 (EV_KEY), code 116 (KEY_POWER), value 1
Event: time 1743500523.593170, -------------- SYN_REPORT ------------
Event: time 1743500523.593175, type 1 (EV_KEY), code 116 (KEY_POWER), value 0
Event: time 1743500523.593175, -------------- SYN_REPORT ------------
Event: time 1743502114.807090, type 1 (EV_KEY), code 116 (KEY_POWER), value 1
Event: time 1743502114.807090, -------------- SYN_REPORT ------------
Event: time 1743502114.807095, type 1 (EV_KEY), code 116 (KEY_POWER), value 0
Event: time 1743502114.807095, -------------- SYN_REPORT ------------
Event: time 1743507242.211034, type 1 (EV_KEY), code 116 (KEY_POWER), value 1
Event: time 1743507242.211034, -------------- SYN_REPORT ------------
Event: time 1743507242.211039, type 1 (EV_KEY), code 116 (KEY_POWER), value 0
Event: time 1743507242.211039, -------------- SYN_REPORT ------------
Event: time 1743540057.620123, type 1 (EV_KEY), code 116 (KEY_POWER), value 1
Event: time 1743540057.620123, -------------- SYN_REPORT ------------
Event: time 1743540057.620128, type 1 (EV_KEY), code 116 (KEY_POWER), value 0
Event: time 1743540057.620128, -------------- SYN_REPORT ------------
Event: time 1743541608.688139, type 1 (EV_KEY), code 116 (KEY_POWER), value 1
Event: time 1743541608.688139, -------------- SYN_REPORT ------------
Event: time 1743541608.688144, type 1 (EV_KEY), code 116 (KEY_POWER), value 0
Event: time 1743541608.688144, -------------- SYN_REPORT ------------

My event list looks as follows:

Available devices:
/dev/input/event0: Sleep Button
/dev/input/event1: Power Button
/dev/input/event10: HDA Intel PCH Headphone Mic
/dev/input/event11: HDA Intel PCH Front Line Out
/dev/input/event12: HDA Intel PCH HDMI/DP,pcm=3
/dev/input/event13: HDA Intel PCH HDMI/DP,pcm=7
/dev/input/event14: HDA Intel PCH HDMI/DP,pcm=8
/dev/input/event2: Power Button
/dev/input/event3: PixArt Dell MS116 USB Optical Mouse
/dev/input/event4: Dell KB216 Wired Keyboard
/dev/input/event5: Dell KB216 Wired Keyboard System Control
/dev/input/event6: Dell KB216 Wired Keyboard Consumer Control
/dev/input/event7: Video Bus
/dev/input/event8: PC Speaker
/dev/input/event9: Dell WMI hotkeys

Please note the real power button is event2.

As a final test I booted the latest available 6.14 kernel, trapping event1, and it has now run for more than 20 hours (and is still running) without any event1 events being reported. I therefore believe this is a 6.15 issue.

My Hardware:
Dell Optiplex 3040
1 x 6th Gen Intel(R) Core(TM) i3-6100T CPU @ 3.20GHz
Intel Corporation HD Graphics 530 (rev 06)
16G RAM

For reference, the associated thread on the Fedora Test List:
https://lists.fedoraproject.org/archives/list/test@lists.fedoraproject.org/thread/SYNFSBCLQ7VUSGWIULVWUDXJM5JHYNH3/

Reproducible: Always
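The spacing between the bogus presses in the evtest capture can be measured directly from the log. The following is a minimal sketch, assuming evtest output has been saved with one event per line in the format shown in the report; `power_press_intervals` is a hypothetical helper name, not part of evtest.

```shell
#!/bin/sh
# Hypothetical helper (not part of evtest): read evtest output on stdin and
# print the seconds (rounded) elapsed between consecutive KEY_POWER presses
# (value 1). Assumes one event per line, in the format shown in the report.
power_press_intervals() {
    awk '
        /\(KEY_POWER\), value 1[ \t]*$/ {
            t = $3              # e.g. "1743497614.264130,"
            sub(/,$/, "", t)    # drop the trailing comma
            if (seen) printf "%.0f\n", t - prev
            prev = t; seen = 1
        }'
}

# Example: power_press_intervals < evtest.log
```

Applied to the six presses captured in the report, the gaps come out at roughly 2909, 1591, 5127, 32815 and 1551 seconds, that is, from about 26 minutes to about 9 hours.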
As mentioned in https://lore.kernel.org/linux-acpi/CAJZ5v0hbA6bqxHupTh4NZR-GVSb9M5RL7JSb2yQgvYYJg+z2aQ@mail.gmail.com/T/#t I'd like to understand how this is actually happening to decide what we should do about it.

1) Could you please add an acpidump to the bug report?
2) Can you please use ACPICA tracing to determine what is happening when this notify event comes in? The basic way to do it:

echo 0x00000004 | sudo tee /sys/module/acpi/parameters/trace_debug_layer
echo 0x00000004 | sudo tee /sys/module/acpi/parameters/trace_debug_level
echo enable | sudo tee /sys/module/acpi/parameters/trace_state

This should then save the associated event info to the journal.
Is there something I need to do before activating acpica tracing? I'm getting the following error:

zorac$ echo 0x00000004 | sudo tee /sys/module/acpi/parameters/trace_debug_layer
tee: /sys/module/acpi/parameters/trace_debug_layer: Permission denied
0x00000004

I get this even if I 'su' to root.
Created attachment 2083357
acpidump

This is after booting into: kernel-6.15.0-0.rc0.20250401git08733088b566.8.fc43.x86_64
I cannot write into /sys/module/acpi/parameters/ as root, even if I change the permissions on /sys to 755.
Is your kernel built with CONFIG_ACPI_DEBUG? If not, you might need to build with that for it to work. Here is more information on it:
https://www.kernel.org/doc/html/v6.14-rc7/firmware-guide/acpi/method-tracing.html

From your acpidump, am I right that your ACPI power button is \_SB_.PWRB? You can confirm it with this:

# cat /sys/bus/acpi/drivers/button/PNP0C0C:00/path

I see a few ways that this is notified.

* Level triggered GPE 6D:
    Notify (\_SB.PWRB, 0x02) // Device Wake
* PME for the root port an XHCI controller is attached to (when PME is enabled for that root port):
    Notify (PWRB, 0x02) // Device Wake
    Notify (XHC, 0x02) // Device Wake
* System _WAK which notifies Super IO (PNP0C02) via \_SB.PCI0.LPCB.SIO1.SIOW:
    If ((PMS1 & 0x08))
    {
        Notify (PS2K, 0x02) // Device Wake
        Notify (PWRB, 0x02) // Device Wake
    }
    If ((PMS1 & 0x10))
    {
        Notify (PS2M, 0x02) // Device Wake
        Notify (PWRB, 0x02) // Device Wake
    }
* System _WAK which notifies RWAK

Some other questions for you that might help me understand how this is happening.

1) What was your system doing when this happened? Did you by chance plug something into your USB controller? Or remove something? Did you suspend/resume around then?
2) Is it possible for you to capture /sys/firmware/acpi/interrupts/gpe6D both at bootup and, if it doesn't normally increment, right after the issue happens? You might need to configure logind to ignore power button events for now to make sure your system doesn't turn off when it happens.
3) Would it be possible for you to try reverting the suspected patch to see if this issue goes away?
One more thing. Assuming that the root cause is this patch, can you test if this patch helps?

diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index 90b09840536dd..740c80cb17033 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -444,10 +444,14 @@ static void acpi_button_notify(acpi_handle handle, u32 event, void *data)
 	struct input_dev *input;
 	int keycode;
 
+	button = acpi_driver_data(device);
+
 	switch (event) {
 	case ACPI_BUTTON_NOTIFY_STATUS:
 		break;
 	case ACPI_BUTTON_NOTIFY_WAKE:
+		if (!button->suspended)
+			return;
 		break;
 	default:
 		acpi_handle_debug(device->handle, "Unsupported event [0x%x]\n",
@@ -457,7 +461,6 @@ static void acpi_button_notify(acpi_handle handle, u32 event, void *data)
 
 	acpi_pm_wakeup_event(&device->dev);
 
-	button = acpi_driver_data(device);
 	if (button->suspended)
 		return;
 

@Yijun: Can you check if the original reason for https://git.kernel.org/torvalds/c/a7e23ec17feec still works with that patch?
In config-6.15.0-0.rc0.20250401git08733088b566.8.fc43.x86_64 I can see this: # CONFIG_ACPI_DEBUG is not set
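Checking a Kconfig symbol by eye is easy to get wrong when similarly named options exist (for example CONFIG_ACPI_DEBUG versus CONFIG_ACPI_DEBUGGER). The following is a small sketch of a check, assuming the kernel config is available as a plain text file such as /boot/config-$(uname -r); `kconfig_state` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: report how a Kconfig symbol is set in a kernel config
# file. Prints the value ("y", "m", ...), "not set" for the commented-out
# form, or "absent" if the symbol is not mentioned at all.
kconfig_state() {
    config_file=$1
    symbol=$2
    if line=$(grep -E "^${symbol}=" "$config_file"); then
        # Matching on "SYMBOL=" avoids hitting longer names with the same prefix.
        printf '%s\n' "${line#*=}"
    elif grep -qx "# ${symbol} is not set" "$config_file"; then
        echo "not set"
    else
        echo "absent"
    fi
}

# Example: kconfig_state "/boot/config-$(uname -r)" CONFIG_ACPI_DEBUG
```

For the kernel above, this would be expected to print "not set", matching the line found in the config file.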
zorac$ cat /sys/bus/acpi/drivers/button/PNP0C0C:00/path
\_SB_.PWRB
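Since two input devices on this machine are both named "Power Button", it can help to map each event node to its physical path. The following is a sketch only, assuming the usual record layout of /proc/bus/input/devices (N:, P: and H: lines, records separated by blank lines); `power_buttons` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: list every input device named "Power Button" as
# "<event node> <phys path>", reading a file in /proc/bus/input/devices
# format (defaults to the live file when no argument is given).
power_buttons() {
    awk '
        /^N: Name=/     { is_pwr = ($0 == "N: Name=\"Power Button\"") }
        /^P: Phys=/     { phys = substr($0, 9) }   # text after "P: Phys="
        /^H: Handlers=/ {
            if (is_pwr)
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^(Handlers=)?event[0-9]+$/) {
                        node = $i
                        sub(/^Handlers=/, "", node)
                        print node, phys
                    }
        }
        /^$/ { is_pwr = 0; phys = "" }             # end of one device record
    ' "${1:-/proc/bus/input/devices}"
}

# Example: power_buttons
```

A physical path beginning with PNP0C0C would point at the ACPI device whose path is printed above; the other entry would then be a different power button source.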
> 1) What was your system doing when this happened? Did you by chance plug something into your USB controller? Or remove something? Did you do suspend/resume near then?

No to all. Suspend/Resume is disabled. The system was just sitting there doing nothing. Normally this system is headless; I ssh/Xrdp into it as needed. I now have console access, which I set up while attempting the steps in comment #1 (I thought being remote might have been the issue).

> 2) Is it possible for you to capture /sys/firmware/acpi/interrupts/gpe6D both at bootup and if it normally doesn't increment right after the issue happens? You might need to configure logind to ignore power button events for now to make sure your system doesn't turn off when it happens.

I am willing to do anything, but I have no clue how to do this.

> 3) Would it be possible for to you try to revert the suspected patch to see if this issue goes away?

I don't think I have the space on this box to compile a kernel, and honestly this is uncharted territory for me. I need local Fedora help to test this.
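For request 2, one possible approach is sketched below. It assumes systemd-logind reads drop-ins from /etc/systemd/logind.conf.d and that the ACPI interrupt counters are exposed under /sys/firmware/acpi/interrupts/ with the count as the first field. The function names and the drop-in file name are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; the function names and drop-in file name are
# illustrative only.

# Write a logind drop-in that makes systemd-logind ignore the power key.
# Pass /etc/systemd/logind.conf.d as the directory (requires root).
write_ignore_power_key() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" &&
    printf '[Login]\nHandlePowerKey=ignore\n' \
        > "$dropin_dir/90-ignore-power-key.conf"
}

# Print the event count (first field) of an ACPI interrupt counter file,
# e.g. /sys/firmware/acpi/interrupts/gpe6D.
gpe_count() {
    awk '{ print $1; exit }' "$1"
}

# Example (as root):
#   write_ignore_power_key /etc/systemd/logind.conf.d   # then reboot
#   gpe_count /sys/firmware/acpi/interrupts/gpe6D       # once right after boot
#   gpe_count /sys/firmware/acpi/interrupts/gpe6D       # again after a bogus event
```

Rebooting after writing the drop-in is the conservative way to make logind pick it up. Comparing the two counter readings would then show whether GPE 6D fired when the bogus event arrived.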
Here's a patch that I think should help your issue and still work for Yijun. https://lore.kernel.org/linux-acpi/20250404145034.2608574-1-superm1@kernel.org/T/#u Hopefully some Fedora guys can make you a test kernel. I'll needinfo Hans, maybe he can.
Or maybe Justin. Whoever does; please clear the needinfos for other Fedora guys when you post it.
This could be a "red herring" because we're dealing with an element of randomness, but is there any chance that having acpica-tools installed (or not) could influence this problem? I was running a 6.15 kernel overnight for maybe 6 hours and I didn't see any shutdown events. This morning I uninstalled acpica-tools and across about 3 hours I've seen 3 shutdown events. It could easily be a coincidence but it seems suspicious to me.
I don't see any reason to believe those two are linked. That package doesn't install any daemons, the tools inside it are launched on demand.
(In reply to Mario Limonciello from comment #10)
> Here's a patch that I think should help your issue and still work for Yijun.
>
> https://lore.kernel.org/linux-acpi/20250404145034.2608574-1-superm1@kernel.org/T/#u
>
> Hopefully some Fedora guys can make you a test kernel. I'll needinfo Hans, maybe he can.

https://koji.fedoraproject.org/koji/taskinfo?taskID=131147672 should be done soon for testing.
> https://koji.fedoraproject.org/koji/taskinfo?taskID=131147672 should be done soon for testing.

Thanks Justin, running it now and trapping event type 1.
More than 6 hours later still no bogus events. The acid test is overnight though. But it's looking really good so far.
Still no bogus events after 24 hours.
Sounds like the correct root cause. If you wouldn't mind, please leave a Tested-by tag [1] on the v2 patch submission. [1] https://www.kernel.org/doc/html/latest/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes
(In reply to Mario Limonciello from comment #18)
> Sounds like the correct root cause. If you wouldn't mind, please leave a Tested-by tag [1] on the v2 patch submission.

Hopefully I did that correctly.
I ran updates on my Rawhide box and allowed the kernel to update to kernel-6.15.0-0.rc0.20250404gite48e99b6edf4.11.fc43.x86_64. This build (from 2025-04-05) presumably does not yet include the patch that fixes the issue, and I got my first bogus event1 in under 30 minutes.
If I'm not mistaken rc2 upstream has the fix for this, and rc2 is available in Fedora now. I suspect the patch made it into at least one of the later Fedora rc1 kernels as well because: kernel-6.15.0-0.rc1.20250413git7cdabafc0012.21.fc43 tested OK. I'll close this as fixed.