2264606 – Fedora only allows up to C3 PKG C-states. All other distros on same hardware allow C10

Status:	CLOSED EOL
Product:	Fedora
Classification:	Fedora
Component:	gnome-settings-daemon
Version:	39
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Assignee:	GNOME SIG Unassigned
QA Contact:	Fedora Extras Quality Assurance
URL:	https://discussion.fedoraproject.org/...

Reported:	2024-02-16 20:57 UTC by otheos
Modified:	2024-11-27 22:58 UTC
CC List:	27 users
Last Closed:	2024-11-27 22:58:21 UTC

Attachments
dmesg output (107.46 KB, text/plain) 2024-02-16 20:58 UTC, otheos
turbostat output (4.89 KB, text/plain) 2024-02-16 20:59 UTC, otheos
lspci -vvv output (38.30 KB, text/plain) 2024-02-16 21:00 UTC, otheos
cpupower idle-info output (549 bytes, text/plain) 2024-02-16 21:01 UTC, otheos
USB runtime_status, lsusb and lsusb -t output (1.42 KB, text/plain) 2024-02-17 13:50 UTC, otheos
Ouput of Si0x script (1.81 KB, text/plain) 2024-02-17 20:06 UTC, otheos
1.b.out (1.18 KB, text/plain) 2024-02-17 23:18 UTC, otheos
1.c.out (1.37 KB, text/plain) 2024-02-17 23:21 UTC, otheos
2.b.out (1.46 MB, image/jpeg) 2024-02-17 23:23 UTC, otheos
2.c.out (1.37 KB, text/plain) 2024-02-17 23:24 UTC, otheos
3.b.out (4.88 KB, text/plain) 2024-02-17 23:25 UTC, otheos
3.c.out (5.23 KB, text/plain) 2024-02-17 23:27 UTC, otheos
4.b.out (1.41 MB, image/jpeg) 2024-02-17 23:28 UTC, otheos
4.c.out (5.20 KB, text/plain) 2024-02-17 23:29 UTC, otheos
lspci_v_fedora_6.5.6.out (12.93 KB, text/plain) 2024-02-17 23:31 UTC, otheos
lspci_v_fedora_6.7.5.out (13.10 KB, text/plain) 2024-02-17 23:32 UTC, otheos
lspci_v_fedora_6.6.10.out (8.43 KB, text/plain) 2024-02-17 23:33 UTC, otheos
lspci_v_pop_6.6.10.out (13.13 KB, text/plain) 2024-02-17 23:34 UTC, otheos
log.txt (36.31 KB, text/plain) 2024-02-20 23:20 UTC, otheos

Links
System	ID	Private	Priority	Status	Summary	Last Updated
GNOME Gitlab	GNOME gnome-settings-daemon merge_requests 354	0	None	opened	Draft: smartcard: Fetch GckModule list on our own	2024-02-19 16:14:56 UTC

Description otheos 2024-02-16 20:57:06 UTC

1. Please describe the problem:

Follow the link for screenshots (if they help).

https://discussion.fedoraproject.org/t/fedora-only-allows-up-to-c3-pkg-c-states-all-other-distros-on-same-hardware-allow-c10/105270/1

I have been trying to figure this out for some time and I’m posting again in the hope that someone can help.

This is on a ThinkPad X13 Yoga Gen 3, i7-1255u.

We have 10 of these laptops, all set up with F39, and we are now facing this issue: Fedora won’t let the CPU package (PKG) reach deeper C-states, with a massive effect on battery life. On Ubuntu we get 8 hrs, on Fedora only 5 with a similar load.

In my testing I have done clean installs of F38, F39 and Rawhide, all with the same issue: The CPU (pkg) goes up to C3. That’s it.

F38 (kernel 6.2.9) and Rawhide (F41, kernel 6.8.0) are the same, only C3.

I have tried Ubuntu 23.10, EndeavourOS (latest ISO), OpenSuse TW, Debian Testing, they all go to C10. All clean installs.

Also tested PopOS, Arch, Manjaro and Garuda, all reach C10. Nobara (Fedora based) is also limited to C3. No screenshots for these.

2. What is the Version-Release number of the kernel:
6.7.4-200.fc39.x86_64 (Every kernel from 6.5 in F38 to 6.8 in F41 does the same)

3. Did it work previously in Fedora? If so, what kernel version did the issue
*first* appear? Old kernels are available for download at
https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

I went back to 6.5 in F38, the problem is there too.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
the issue below:

Install F39, install powertop, check PKG C-states.

5. Does this problem occur with the latest Rawhide kernel? To install the
Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
``sudo dnf update --enablerepo=rawhide kernel``:

The problem remains with kernel 6.8 in F41 (rawhide full install).

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

This occurs from installation. So, fresh install, before and after updates. No other modules.

7. Please attach the kernel logs. You can get the complete kernel log
for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
issue occurred on a previous boot, use the journalctl ``-b`` flag.

Reproducible: Always

Comment 1 otheos 2024-02-16 20:58:13 UTC

Created attachment 2017248 [details]
dmesg output

Dmesg.

Comment 2 otheos 2024-02-16 20:59:51 UTC

Created attachment 2017249 [details]
turbostat output

Comment 3 otheos 2024-02-16 21:00:31 UTC

Created attachment 2017250 [details]
lspci -vvv output

lspci -vvv output

Comment 4 otheos 2024-02-16 21:01:37 UTC

Created attachment 2017251 [details]
cpupower idle-info output

cpupower idle-info output

Comment 5 Hans de Goede 2024-02-16 22:49:05 UTC

Thank you for reporting this.

I have just tried this on my own Lenovo ThinkPad X1 Yoga Gen 8, which is Raptor Lake based and should be very similar, and there C10 is reached without issues.

So it seems there is some X13 specific issue in play.

Can you look at the tunables tab in powertop and see if there are any tunables marked as bad there (there typically are) and then run powertop --auto-tune and see if that helps?

Comment 6 otheos 2024-02-17 00:00:49 UTC

Thank you for the quick response. 

I have indeed tried that. The same tunables are marked as bad in all distros. ```--auto-tune``` will turn them to good, with no change in the pkg C-state.

The maths dept (we're physx) took delivery of a few T14 gen 4 on Monday, they're Windows, but I grabbed one to test and indeed it too goes to C10. It's an i7-1365u. Our older T14 gen1 (10th gen intel) and my wife's T14s gen 2 (11th gen), all go to C10. 

One other user on reddit where I've posted this also reported the same issue with the X13Y Gen3 with an i5. 

I have compared as much as I can think of between Fedora and other distros and everything appears the same. 
CPU governor, no_turbo, hwp_dynamic_boost, number of p_states and c-states, epp, aspm policy, i915 rc6. I have tested Fedora with nmi_watchdog to 0 and even workqueue power_efficient to Y (the only two things that I could find to be different from other distros out of the box), no change. Ditto for platform profile and lapmode. No difference. 

By the way these are all gnome distributions, I haven't tested Fedora KDE.

Many thanks again.

Comment 7 otheos 2024-02-17 00:45:41 UTC

Some more info on things I've tried:

Disabled everything (wifi/camera/Bt/TB4/etc) in the BIOS. Changed S3/S0ix. Changed HTT on/off. Toggled Optimised Settings on/off. Power management (when off PKG only at C0, as expected). 

Stopped gdm and tested from tty, docked on USB-C dock and undocked (lid up/down), on power/on battery.

On TB3 dock other distros are limiting PKG to C3, Fedora to C2. That's the only possibly interesting finding.

Comment 8 Hans de Goede 2024-02-17 10:05:47 UTC

Ok, some further ideas to explore:

1. This feels like it might be related to some USB device never going to sleep.

Try doing:

cat /sys/bus/usb/devices/*/power/runtime_status

On a laptop without external USB devices connected this should output
a bunch of suspended lines and no active ones. If an USB device is
active then that is the likely culprit and you need to do the
cat one by one to find the culprit device(s).

If you find one or more culprit devices this way please provide "lsusb" and "lsusb -t" output.

2. Maybe some service is causing this issue.
2a For starters I would try disabling thermald
2b Maybe boot to text mode (append 3 to the kernel commandline) and then try stopping as many services as possible and see if PC10 can be reached then

3. Give: https://github.com/intel/S0ixSelftestTool a try and read the linked (archived) blog post from there.

4. Try comparing "lspci -v" output between distros and see if there are devices
which have a driver in one distro and not in the other. Or if somehow the working
distro uses a different driver than Fedora. Specifically compare the
"Kernel driver in use" value for each PCI device.

5. Try grabbing a recent kernel from one of the other distros.

Grab both the /boot/vmlinuz-xxxx files and the /lib/modules/xxxx directory and copy these to the Fedora install.

Then, let's say xxxx=6.7.4-1-generic, run the following command:

kernel-install add 6.7.4-1-generic /boot/vmlinuz-6.7.4-1-generic

where the second argument ("6.7.4-1-generic") is the name of the directory under /lib/modules and the third argument is the actual vmlinuz file under /boot, and then boot into this other distro's kernel. I have never tried this before, but I expect things to still mostly work (with some complaints about selinux not being available).

If you do choose a kernel with selinux support I suggest adding enforcing=0 to the kernel commandline in case there are some differences in the selinux kernel config.

Note, in case it is not clear, I'm mostly just using a scattershot approach with (somewhat educated) guesses here to try and find a culprit. Or, in the case of 5., to at least determine whether this is caused by something in the kernel rather than something in userspace. Thank you for your patience in debugging this.
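
For idea 1 above, the one-by-one check can be scripted. What follows is only a minimal sketch and not part of the original suggestion: the function name `usb_active_devices` is made up for illustration, and it assumes that each entry under /sys/bus/usb/devices exposes a `power/runtime_status` file and, for real devices, an optional `product` file.

```shell
# Print "name<TAB>status<TAB>product" for every USB entry that is NOT suspended.
# usb_active_devices [DIR]   (DIR defaults to /sys/bus/usb/devices)
usb_active_devices() {
    local base="${1:-/sys/bus/usb/devices}"
    local dev status product
    for dev in "$base"/*; do
        # Skip entries that do not expose a runtime PM status.
        [ -r "$dev/power/runtime_status" ] || continue
        status=$(cat "$dev/power/runtime_status")
        # Suspended devices are the expected case; only report the others.
        [ "$status" = "suspended" ] && continue
        product="unknown"
        [ -r "$dev/product" ] && product=$(cat "$dev/product")
        printf '%s\t%s\t%s\n' "${dev##*/}" "$status" "$product"
    done
}
```

Running `usb_active_devices` with no argument on an idle laptop should print nothing if every device is suspended; any line it does print names a candidate for the "lsusb" and "lsusb -t" follow-up.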

Comment 9 otheos 2024-02-17 13:50:25 UTC

Created attachment 2017349 [details]
USB runtime_status, lsusb and lsusb -t output

USB runtime_status, lsusb and lsusb -t output

Comment 10 otheos 2024-02-17 14:02:55 UTC

Wow, thanks. 

I have submitted the first, USB related output. I can see there are two active. I assume one is the Card reader, the other the IR camera.

I will go through the rest tomorrow or later today. I will need to do a clean Fedora install with grub (we use systemd-boot) so that I can follow your kernel swap instructions. 

I appreciate without the laptop on hand it's hard to work in finding the culprit. That's why for all troubleshooting I install fresh and add nothing to it. So I'll swap the SSD out (as I do currently work on this system) and start fresh to try all your suggestions and report back.

And please consider my patience a given. We've been with Fedora since Core 1 and before that since Redhat 5.2, ever since we moved from Sun/Solaris to x86/Linux. Anyway, many, many thanks for this.

Comment 11 otheos 2024-02-17 20:06:31 UTC

Created attachment 2017395 [details]
Ouput of Si0x script

./s0ix-selftest-tool.sh -s output from https://github.com/intel/S0ixSelftestTool?tab=readme-ov-file



C10 is reached when in S0ix.

Comment 12 otheos 2024-02-17 20:10:29 UTC

                                         
OK, some progress made.

2. Disabling thermald did not help. 
However booting to 3 did. ***If gdm is not loaded, I can reach C8.*** Not quite C10 but it's huge progress.

I have yet to disable any more services to see if C10 can be approached.

3. The system supports S0ix (set in the bios to Windows and Linux, rather than Linux S3). This is verified by running the script at the top of: 
https://web.archive.org/web/20230614200816/https://01.org/blogs/qwang59/2018/how-achieve-s0ix-states-linux

The little script returns (among other things):

Low Power S0 Idle is 1
The system supports S0ix!


I have uploaded the output of the github script. The system goes to suspend, C10 reached.
The system will not go to C10 with screen on (Xorg and Wayland), nor with screen off (Xorg). I could not test in Wayland as the script uses xset. But I'd expect the same.

I have yet to test kernels from other distributions, which is what I am about to do now. 

Many thanks again.

Comment 13 otheos 2024-02-17 20:28:24 UTC

Some more updates.

Disabling gdm ```systemctl disable gdm``` as expected gives the same result as booting to text mode, that is, C8 is reached.

Running ```./s0ix-selftest-tool.sh -r on``` from txt mode (booting to it), reaches C10.

After ```startx``` (as instructed) and running from gnome ```./s0ix-selftest-tool.sh -s off``` it fails to reach C10.

Exiting graphical mode and back to tty, checking again, only C3 can be reached. 

Many thanks again.

Comment 14 otheos 2024-02-17 20:54:55 UTC

5. No change!

I have used PopOS 6.6.10-generic. PopOS with this kernel goes to C10 (I dual boot Fedora with Pop).

The kernel boots in Fedora without a hiccup, but once in the graphical environment, C3 is the deepest it can get, again.

So it's not the kernel.

Comment 15 otheos 2024-02-17 23:16:42 UTC

Here's a summary of all findings:

Condition

1. Fedora 6.7.4 with GDM on
2. Fedora 6.7.4 with GDM off
3. PopOS 6.6.10 with GDM on
4. PopOS 6.6.10 with GDM off

Test
a. Powertop deeper PKG C-state
b. s0ix-selftest-tool.sh -r off
c. s0ix-selftest-tool.sh -r on (needs startx when GDM off, so, effectively GDM on)


I will use the numbers 1-4 and letters a-c for the results in the format Condition.Test:

So,

1.a: C3
2.a: C10
3.a: C3
4.a: C10

1.b: C10 not reached  
2.b: C10 reached
3.b: C10 not reached, but some different output to 1.b, please see uploaded files.
4.b: C10 reached, but again, some differences to 2.b, please see uploaded files.

Below all run in Xorg, not Wayland (graphics mode) as xset is used in the script.
1.c: C10 not reached
2.c: C10 not reached
3.c: C10 not reached, with differences to 1.c/2.c
4.c: C10 not reached, with differences to 1.c/2.c

Conclusion:
GDM is clearly the issue. Once the graphical interface loads, both kernels only go up to C3.
With GDM disabled, both kernels go to C10. Some differences in the output of Intel's script.

I also tested with Fedora's 6.5.6 kernel (original in F39 installation), and there is a difference:
This kernel behaves as 6.7.4, except it only goes to C8, not C10.

On inspecting ```lspci -v```, as suggested, 6.7.4 and 6.6.10 (Pop) produce very similar output. All drivers are the same.
However, the output of ```lspci -v``` from kernel 6.5.6 differs from both 6.6.10 (Pop) and 6.7.4 like so:

6.5.6:
~~~
00:1f.5 Serial bus controller: Intel Corporation Alder Lake-P PCH SPI Controller (rev 01)
	Subsystem: Lenovo Device 50b0
	Flags: fast devsel, IOMMU group 14
	Memory at a0800000 (32-bit, non-prefetchable) [size=4K]
~~~

Both 6.6.10 (Pop) and 6.7.4 have:
~~~
00:1f.5 Serial bus controller: Intel Corporation Alder Lake-P PCH SPI Controller (rev 01)
	Subsystem: Lenovo Device 50b0
	Flags: fast devsel, IOMMU group 14
	Memory at a0800000 (32-bit, non-prefetchable) [size=4K]
	Kernel driver in use: intel-spi
	Kernel modules: spi_intel_pci
~~~

The difference is in the last two lines. 
Is it possible that this difference causes 6.5.6 to only go up to C8 while both others go to C10 (when GDM is off)?
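
The per-device driver comparison described above can be automated rather than done by eye. The sketch below is an illustration only: `pci_driver_map` is a hypothetical helper name, and it assumes the usual `lspci -v` layout, where each device block starts with an unindented slot address and the driver, if any, appears on an indented "Kernel driver in use:" line.

```shell
# Read `lspci -v` output on stdin and print "slot<TAB>driver" per device.
# Devices without a "Kernel driver in use" line are reported as "(none)".
pci_driver_map() {
    awk '
        # An unindented line starting with a hex digit opens a new device block.
        /^[0-9a-f]/ {
            if (slot != "") print slot "\t" driver
            slot = $1; driver = "(none)"
        }
        /Kernel driver in use:/ {
            sub(/.*Kernel driver in use: */, "")
            driver = $0
        }
        END { if (slot != "") print slot "\t" driver }
    '
}
```

For example, `pci_driver_map < lspci_v_fedora_6.5.6.out > a.txt` and `pci_driver_map < lspci_v_pop_6.6.10.out > b.txt`, followed by `diff -u a.txt b.txt`, would reduce the comparison to one line per device; for the 00:1f.5 blocks quoted above it would show "(none)" against "intel-spi".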

I upload all files named after the above nomenclature. When on TTY I just took photos, sorry.

Many thanks for all this.

Comment 16 otheos 2024-02-17 23:18:47 UTC

Created attachment 2017423 [details]
1.b.out

s0ix-selftest-tool.sh -r on with 6.7.4 and GDM on.

Comment 17 otheos 2024-02-17 23:20:31 UTC

Comment on attachment 2017423 [details]
1.b.out

s0ix-selftest-tool.sh -r on with 6.7.4 and GDM on.

Comment 18 otheos 2024-02-17 23:21:59 UTC

Created attachment 2017424 [details]
1.c.out

s0ix-selftest-tool.sh -r off for 6.7.4 with GDM on

Comment 19 otheos 2024-02-17 23:23:12 UTC

Created attachment 2017425 [details]
2.b.out

s0ix-selftest-tool.sh -r on for 6.7.4 with GDM off.

Comment 20 otheos 2024-02-17 23:24:33 UTC

Created attachment 2017426 [details]
2.c.out

s0ix-selftest-tool.sh -r off for 6.7.4 GDM off (after startx)

Comment 21 otheos 2024-02-17 23:25:50 UTC

Created attachment 2017427 [details]
3.b.out

s0ix-selftest-tool.sh -r on for 6.6.10 (Pop) with GDM on.

Comment 22 otheos 2024-02-17 23:27:10 UTC

Created attachment 2017428 [details]
3.c.out

s0ix-selftest-tool.sh -r off on 6.6.10 (Pop) GDM on.

Comment 23 otheos 2024-02-17 23:28:30 UTC

Created attachment 2017429 [details]
4.b.out

s0ix-selftest-tool.sh -r off for 6.6.10 (Pop) with GDM off

Comment 24 otheos 2024-02-17 23:29:16 UTC

Created attachment 2017430 [details]
4.c.out

s0ix-selftest-tool.sh -r on with 6.6.10 (Pop) with GDM off (but startx)

Comment 25 otheos 2024-02-17 23:31:16 UTC

Created attachment 2017431 [details]
lspci_v_fedora_6.5.6.out

lspci -v output for Fedora with kernel 6.5.6

Comment 26 otheos 2024-02-17 23:32:01 UTC

Created attachment 2017432 [details]
lspci_v_fedora_6.7.5.out

lspci -v output of Fedora with 6.7.4

Comment 27 otheos 2024-02-17 23:33:46 UTC

Created attachment 2017433 [details]
lspci_v_fedora_6.6.10.out

lspci -v output from Fedora with kernel 6.6.10 (Pop)

Comment 28 otheos 2024-02-17 23:34:41 UTC

Created attachment 2017434 [details]
lspci_v_pop_6.6.10.out

lspci -v output from PopOS with kernel 6.6.10

Comment 29 otheos 2024-02-18 01:13:40 UTC

I got it. It's the touchscreen. 

After it became clear that the graphics environment causes it, I installed Fedora KDE, went to the settings, disabled the touchscreen, rebooted, and powertop went straight to C10.

I'm back to gnome to find a way to disable the touchscreen to confirm.

Comment 30 otheos 2024-02-18 09:45:19 UTC

This was a very long Saturday. 

In the end it's not the touchscreen. It's gnome. I installed F39 KDE, and C10 is reached just fine. I added KDE to the first (gnome) installation, switched to sddm and started KDE, still only C3.

Comment 31 Hans de Goede 2024-02-18 10:56:47 UTC

Thank you for all your hard work on debugging this.

If I understand things correctly, you now have 2 F39 installs:

1. Workstation install with KDE installed in parallel, this only gets C3 even when using KDE
2. KDE install, this gets C10 just fine

Both with the touchscreen enabled/working, correct ?

What happens if you do:

systemctl disable gdm.service
systemctl enable sddm.service

on the mixed install and then reboot so that it boots into sddm directly and then start kde, without ever having run gdm ?

Comment 32 Hans de Goede 2024-02-18 10:59:58 UTC

About your lsusb output. Here is something which I don't think will fix the PC10 issue, but it should help save some power regardless:

1. sudo cp /lib/udev/hwdb.d/60-autosuspend-fingerprint-reader.hwdb /etc/udev/hwdb.d/60-autosuspend-fingerprint-reader.hwdb

2. sudo edit /etc/udev/hwdb.d/60-autosuspend-fingerprint-reader.hwdb

And then find:

usb:v06CBp00E9*

and below it add:

usb:v06CBp00F9*

and save.

3. sudo udevadm hwdb --update

And then after rebooting "cat /sys/bus/usb/devices/*/power/runtime_status" should now show all USB devices suspended. If you can confirm that this works and has no negative side-effects then I can submit this change upstream.

Comment 33 otheos 2024-02-18 12:05:03 UTC

Thank you for your Sunday reply. Much appreciated.

So currently:

1. Gnome + KDE. Only C3 when either GDM or SDDM is enabled. 
2. KDE only, C10 achieved.

The touchscreen was not responsible. I just got tired and failed to realise that once you use the computer (open settings, fire up Thunderbird), C10 is not very likely, and hence it was gone. So no, the touchscreen doesn't make a difference.


I will install gnome in the KDE install and see if enabling GDM and disabling SDDM will keep C10 possible in gnome. 

I will check on USB note (much appreciated taking the time for something like this) a bit later. 

Thanks again, and sorry for the bombardment of replies, I wish there was a delete/edit button.

Comment 34 otheos 2024-02-18 12:26:22 UTC

I stand corrected.

1. Gnome+KDE, C3 in Gnome/GDM, C10 in KDE/SDDM.
2. KDE+Gnome, C10 in KDE/SDDM, C3 in Gnome/GDM. 

It appears as soon as gnome loads, C3 is the deepest it goes. So the results are symmetric. KDE allows C10, gnome doesn't. Regardless of which was installed first.

Comment 35 Hans de Goede 2024-02-18 13:01:25 UTC

> Thanks again, and sorry for the bombardment of replies, I wish there was a delete/edit button.

No problem and thank you for your perseverance to get to the bottom of this.

What happens if you start with gdm and then from a text-console or over ssh do:

systemctl stop gdm
systemctl start sddm

Does KDE in that case also only go to C3 ?

If yes, can you then compare the output of "ps aux" of a C10 KDE session and a C3 KDE session ?
(filter the output to only get the last column, then sort, then run diff -u on the 2 ?)
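
The filter, sort and diff steps just described could be sketched as below. This is an assumption-laden illustration, not part of the original request: `ps_commands` is a hypothetical name, the COMMAND column is taken to start at the 11th whitespace-separated field of `ps aux`, and `sort -u` is used instead of a plain sort so that repeated identical processes do not add noise to the diff.

```shell
# Read `ps aux` output on stdin and print the sorted, de-duplicated
# command lines (field 11 onward), dropping the header row.
ps_commands() {
    awk 'NR > 1 {
        cmd = $11
        for (i = 12; i <= NF; i++) cmd = cmd " " $i
        print cmd
    }' | sort -u
}
```

Usage would be `ps aux | ps_commands > kde-c10.txt` in the session that reaches C10, `ps aux | ps_commands > kde-c3.txt` in the one stuck at C3, then `diff -u kde-c10.txt kde-c3.txt` (the file names are placeholders).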

I suspect that GNOME dbus activates some daemon which then ends up keeping some hw open which is causing this.

###

If that does not help then try disabling various services in the
gnome-session, e.g.

systemctl --user disable --now org.gnome.SettingsDaemon.Wacom.target

And then:

systemctl --user status org.gnome.SettingsDaemon.Wacom.target

(some may require masking to really be stopped)

And then boot into sddm and start a gnome-session from sddm (the gdm session itself will still have the services enabled,
the systemctl --user disable --now only disables them in the gnome-session of the logged in user).

Possible culprits are:

org.gnome.SettingsDaemon.Color.target
org.gnome.SettingsDaemon.Smartcard.target
org.gnome.SettingsDaemon.UsbProtection.target
org.gnome.SettingsDaemon.Wacom.target

See "systemctl --user" output for a full list.

Comment 36 otheos 2024-02-18 15:03:40 UTC

> Does KDE in that case also only go to C3 ?

Steps:
1. Start with GDM
2. Don't log in
3. Go to tty
4. Disable gdm, enable sddm
5. Log in to KDE, goes to C10
6. Go to tty
7. Disable sddm, enable gdm
8. Log in to Gnome, goes to C3
9. Go to tty
10. Disable gdm, enable sddm
11. Log in to KDE, goes to C10


I masked the named services (.target) you listed. No difference.

Steps:
1. Start to SDDM
2. Log in to Gnome, goes to C3
3. Logout, log in to KDE, goes to C10


I then did the opposite: Boot to GDM, start KDE. Goes to C8 (not C10, but I only tested once).

I will now disable all org.gnome.Settings* targets and report back.

Many thanks.

Comment 37 otheos 2024-02-18 15:05:14 UTC

I will test again your USB suggestions, but I think I need a clean install to test.

I have tested but it was inconclusive. I will report back. Thanks again.

Comment 38 otheos 2024-02-18 15:21:10 UTC

I have tested all org.gnome.SettingsDaemon.*.target.

I masked them, then boot to SDDM, log in to gnome, still C3.

I'm out of ideas at this point. There are hundreds of other targets to mask, but my understanding of gnome internals is minimal. 

Any ideas welcome. And as always, many thanks.

Comment 39 Hans de Goede 2024-02-18 15:35:47 UTC

Hmm, interesting. So if I understand things correctly then exiting GNOME will bring things back to C10 again, right ?

IOW the C3 problem only happens when GNOME is actually *running*, right ?

Can you look at the wakeups (initial / main screen) in powertop and see if anything stands out there in gnome vs KDE ?

And what about "cat /proc/interrupts", is there a device where the number of interrupts increases significantly more per, say, 10 seconds under GNOME than under KDE ?
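
One way to make the /proc/interrupts comparison less of an eyeball exercise is to save two snapshots and subtract them. The helper below is a sketch under stated assumptions: `interrupt_deltas` is a made-up name, the first line of each snapshot is taken to be the per-CPU header, and each following line is assumed to be an "IRQ:" label, then per-CPU counters, then a free-form description.

```shell
# interrupt_deltas BEFORE AFTER
# Print "delta<TAB>irq<TAB>description" for every interrupt line whose total
# count (summed over all CPUs) grew between two saved /proc/interrupts copies.
interrupt_deltas() {
    awk '
        FNR == 1 { ncpu = NF; next }    # header row: one column per CPU
        {
            irq = $1; sub(/:$/, "", irq)
            total = 0
            # Sum the per-CPU counters; stop at the first non-numeric field.
            for (i = 2; i <= ncpu + 1 && $i ~ /^[0-9]+$/; i++) total += $i
            desc = ""
            while (i <= NF) { desc = (desc == "" ? $i : desc " " $i); i++ }
            if (NR == FNR) before[irq] = total
            else if (total > before[irq])
                printf "%d\t%s\t%s\n", total - before[irq], irq, desc
        }
    ' "$1" "$2" | sort -rn
}
```

A run would look like `cat /proc/interrupts > before; sleep 10; cat /proc/interrupts > after; interrupt_deltas before after | head`, once under GNOME and once under KDE, so the busiest interrupt sources of each session can be compared directly.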

About the hundreds of other targets, you can ignore all the sys-devices-*.device targets as well as the *.mount ones...

Comment 40 Hans de Goede 2024-02-18 15:46:35 UTC

One more idea, try running:

sudo lsof -F n | grep -E '/dev/|/sys/' | sort | uniq

Under both GNOME and KDE and take a look to see if any differences stand out and/or attach the output of both commands then I can take a look.
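
If the two listings are saved to files, the "differences that stand out" can be pulled out mechanically. The following is a sketch only: `open_paths` and `only_in_first` are hypothetical helper names, and the input files are assumed to contain `lsof -F n` output, where each file name line carries a leading "n" field tag.

```shell
# Keep only /dev and /sys entries from an `lsof -F n` listing, drop the
# leading "n" field tag, then sort and de-duplicate.
open_paths() {
    grep -E '/dev/|/sys/' "$1" | sed 's/^n//' | sort -u
}

# only_in_first A B: print the paths open in listing A but not in listing B.
only_in_first() {
    local a b
    a=$(mktemp) && b=$(mktemp) || return 1
    open_paths "$1" > "$a"
    open_paths "$2" > "$b"
    comm -23 "$a" "$b"    # lines unique to the first (sorted) file
    rm -f "$a" "$b"
}
```

With the GNOME listing as the first argument and the KDE listing as the second, the output is the set of device nodes that only the GNOME session holds open, which is where a device kept awake by a desktop service would be expected to show up.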

Comment 41 otheos 2024-02-18 17:05:07 UTC

I did a watch -n 1 and -n 10 for ```/proc/interrupts```, not much difference. Looking at the main page of powertop also, nothing stands out, but to be honest I am not sure what to look at.

The two ```lsof``` commands have the following difference (diff gnome.out kde.out):

2d1
< n/dev/bus/usb/003/002
31a31
> n/dev/pts/4


I can attach the two outputs if needed.

But yes, I can log in to KDE from GDM or SDDM and get C10 but doing the same for gnome gets me only C3. Exit gnome, enter KDE, back to C10. 

I will proceed to disable more targets and see what happens. Thanks!

Comment 42 otheos 2024-02-18 17:38:13 UTC

This is driving me mad:

1. Boot to SDDM, wait at log in screen
2. Connect to the X13y from a T480s via ssh
3. Fire up powertop on the T480s. C10 achieved.
4. On X13y log in to gnome.
5. On T480s see in powertop the system to only reach C3.
6. Log out of gnome, back to SDDM log in screen in X13y.
7. On T480s see in powertop the system to go back to C10.

Same happens from GDM.

Now, while connected through sshd, I can see the main thing changing (a lot) is the SmartCard Reader in the device stats from 100% when gdm/gnome is running, to 0% when KDE is running.

I need to find a way to disable it and see if it makes a difference. I'll look also to just pull the wire if possible.

How can one disable a usb device from within the OS? There is no setting in the bios.

Comment 43 otheos 2024-02-18 18:02:08 UTC

I GOT IT!!!!!

It was the card reader!!

I'm a hardware guy, so I just pulled the tab and disconnected it altogether.

Enter Gnome: C10 right away!


So the question then is, how can this be done without pulling the tab, at first.

And then, how can we stop gnome from pegging that Smartcard reader to 100%.

The interesting part is that I did, as you suggested, remove the smartcard target, but it made no difference. I don't know how this can be fixed, but I can provide any info required.

Do we need the smartcard readers? Sadly we do! But I prefer the battery life while this is being addressed. Where do I file a bug for this though?

Comment 44 Michael Catanzaro 2024-02-18 21:54:11 UTC

(In reply to otheos from comment #43)
> Where do I file a bug for this though?

We need to figure out what exactly is using the smartcard reader. That might not be easy. Maybe start by killing nonessential services? You can't kill gnome-session without taking down the desktop, but most other things should be possible to kill. I see I have scdaemon running. That's a little suspicious. Less likely, but worth a try: p11-kit-remote/p11-kit-server, gpg-agent, gssproxy, ssh-agent (all seem like things that might plausibly want to use smartcard). Nuclear option: kill everything you can that begins with letters A-M, and if that doesn't work, kill everything that begins with letters N-Z; hope that it's caused by one half or the other.

Comment 45 Hans de Goede 2024-02-18 21:58:00 UTC

> We need to figure out what exactly is using the smartcard reader.

One starting point here would be the output of:

sudo lsof | grep /dev/bus/usb/003/002

Since that showed up in the open /dev files list difference between KDE and GNOME.

I expect that if you kill the process which the above command lists, that that will give you PC10, assuming the process does not get restarted immediately again ...

Comment 46 otheos 2024-02-18 22:14:38 UTC

sudo lsof | grep /dev/bus/usb/003/002

There it is: pcscd

It does indeed restart as soon as I kill it, but I renamed ```/usr/sbin/pcscd``` and couldn't start, as soon as it was killed, I got C8 (from C3).  Let me restart the session.

Comment 47 otheos 2024-02-18 22:18:17 UTC

Yes, I can confirm, with pcscd out of the way (renamed, not loading/running) after a clean restart, I get C10.

Comment 48 otheos 2024-02-18 22:24:52 UTC

I quickly installed OpenSUSE TW and EndeavourOS (I already have PopOS on this disc), neither runs pcscd when gnome loads. I don't know if this helps, but it certainly explains the initial question: Why only on Fedora, not other distributions.

Comment 49 otheos 2024-02-18 23:14:34 UTC

Final update for tonight:

I installed pcscd (and pcsc_scan) and started it in PopOS to see how it behaves. When the service starts, it takes over the USB device, and the PKG goes to C3 (from C10). It then releases it, and the PKG goes back to C10. I test when it takes over/releases with ```sudo lsof | grep /dev/bus/usb/003/002```. Takes over = pcscd listed, powertop shows 100% usage , releases = pcscd not listed, powertop shows 0% usage.

Now, once the system is in C10 again, I start pcsc_scan, the PKG goes back to C3 (pcscd takes over again). A few seconds later it releases the usb device, PKG goes to C10 again when pcscd releases the usb device. 

In Fedora the process is the same, that is, pcscd takes over and releases the usb device, however on powertop the usb device is pegged at 100% and the PKG is at C3 at best.

So maybe this is a problem with the pcscd package?

Comment 50 otheos 2024-02-19 09:13:51 UTC

(In reply to otheos from comment #48)
Please ignore comment 48, it's not correct. I will test later. Thanks.

Comment 52 Ray Strode [halfline] 2024-02-19 13:29:25 UTC

Ludovic Rousseau can you provide any insight here ? Under what circumstances can a card reader go into a low power state when pcscd is running? 

If GNOME is, e.g. querying slot state will that power the card reader up, or will pcscd notice when the reader is powered down and return the last known value instead of powering it up ? 

Will inserting a card into a reader in a low power state wake it up?

Here's my stab in the dark guess what's going on:

1. the pkcs11 module associated with these card readers doesn't support blocking while waiting for card insertion/removal events
2. this is making gsd-smartcard fall back to polling with CKF_DONT_BLOCK once a second
3. each time it polls, pcscd restarts some internal, maybe multi-second timer to prevent itself from powering down the card reader.

it's a little confusing that blocking the gnome smartcard service didn't stop the problem though.  What about taking the 

Wants=org.gnome.SettingsDaemon.Smartcard.target

out of /lib/systemd/user/gnome-session.d/gnome.session.conf ?

Comment 53 Ray Strode [halfline] 2024-02-19 16:14:57 UTC

So current theory:

1. Calling C_WaitForSlotEvent(CKF_DONT_BLOCK) powers up a suspended card reader (rather than say returning CKR_NO_EVENT)
2. gnome-settings-daemon since being ported to p11-kit always calls C_WaitForSlotEvent(CKF_DONT_BLOCK) in a loop, because the p11-kit proxy pkcs11 module doesn't support blocking mode

I've sketched out a (yet untested) draft that may help here: https://gitlab.gnome.org/GNOME/gnome-settings-daemon/-/merge_requests/354

It changes gnome-settings-daemon to forgo the p11-kit proxy module and just use the underlying PKCS11 modules directly (and they presumably support blocking mode).

It also makes the gnome-settings-daemon smartcard process not call into pcscd at all unless smartcard authentication is enabled. We currently only use the list of smartcards in gnome-settings-daemon for authentication purposes anyway (lock screen on removal, start smartcard pam service when locked and a smartcard is inserted).

Comment 54 otheos 2024-02-19 16:35:04 UTC

I appreciate this is now being discussed at a higher level than my understanding, and I'm not sure if my input helps anymore, but since I've done this little testing, here are my results.

I tried Ubuntu 23.10, Opensuse TW and EndeavourOS.

OpenSuse and Endeavour OS do have pcscd running after install once gnome starts, but the hardware is not available. Running pcsc_scan cannot find the reader. A different problem altogether, but to my eyes, there is "no problem" with c-states because clearly the card reader stays unused.

Ubuntu doesn't even install pcscd out of the box, so again, a clean install on the X13Y Gen3 will make it seem there is no C-states issue.

What is interesting however is that once pcscd is manually installed in Ubuntu, it works well. That is, once an attempt to read a card is made with pcsc_scan (also installed by me), the card is read fine. You can see that once pcsc_scan starts, the use of the reader goes to 100% (in powertop), and C-states go up to C3. Once pcsc_scan ends, pcscd keeps the card reader active for exactly 60 seconds, then releases it. The use drops to 0%, C-states return to C10.

I hope this helps. I'll stay quiet now.

Again, many thanks for your attention, help and support. It means a lot.

Comment 55 Ludovic Rousseau 2024-02-19 20:06:38 UTC

> Ludovic Rousseau can you provide any insight here ? Under what circumstances can a card reader go into a low power state when pcscd is running? 

I don't think a smart card reader can go into a low power state while pcscd is running, because the USB device is still opened and used by the CCID driver used by pcscd.

What happens is:
- pcscd is started on demand when an application wants to use a smart card. See "pcscd auto start using systemd" https://blog.apdu.fr/posts/2011/11/pcscd-auto-start-using-systemd/
- pcscd will open and use any connected USB CCID reader (using the CCID driver)
- a smart card is powered on if an application is using the card. See "Card auto power on and off" https://blog.apdu.fr/posts/2010/10/card-auto-power-on-and-off/

- pcscd will exit automatically after 60 seconds if no application is using it.
- when pcscd exits, the USB reader(s) will be released and can go into a low power state. See https://github.com/LudovicRousseau/CCID/blob/master/src/92_pcscd_ccid.rules#L25-L26

> If GNOME is, e.g. querying slot state will that power the card reader up, or will pcscd notice when the reader is powered down and return the last known value instead of powering it up ? 

As long as pcscd is running the USB smart card reader should be active.
And pcscd will run as long as an application is using it (pcsc_scan or another one)

> Will inserting a card into a reader in a low power state wake it up?

Wake up the smart card reader, or wake up the laptop?

Now the question is: what prevents the CPU from going into the C10 state instead of stopping at C3?
pcscd and the CCID driver are not doing active polling. They wait for events: libusb USB interrupt for card notification and udev event for USB reader insertion/removal.

I will try to see what behaviour I get on my Lenovo Thinkpad P17.

Comment 56 Ludovic Rousseau 2024-02-20 17:34:38 UTC

I am not a powertop expert.

± sudo ./s0ix-selftest-tool.sh -s

---Check S2idle path S0ix Residency---:

The system OS Kernel version is:
Linux lenovo 6.6.15-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.6.15-2 (2024-02-04) x86_64 GNU/Linux

---Check whether your system supports S0ix or not---:

Low Power S0 Idle is:0
Your system does not support low power S0 idle capability.     
Isolation suggestion:     
Please check BIOS low power S0 idle capability setting.


Maybe I have to change something in the BIOS.

Without pcscd running, powertop gives me:
 *  *  *   Processor Idle State Report   *  *  *

Package;0
C2 (pc2); 26,0%
C3 (pc3);  0,9%
C6 (pc6); 10,4%
C7 (pc7);  7,9%
C8 (pc8); 44,1%
C9 (pc9);  0,0%
C10 (pc10);  0,0%

With pcscd running, I get very similar results:
 *  *  *   Processor Idle State Report   *  *  *

Package;0
C2 (pc2); 22,7%
C3 (pc3);  0,6%
C6 (pc6); 12,1%
C7 (pc7);  5,4%
C8 (pc8); 51,4%
C9 (pc9);  0,0%
C10 (pc10);  0,0%


At least my system goes into C8 state.

What exactly should I do to check if C10 state is used?
Should I also suspend the laptop or just do nothing with it so it is idle?
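One way to check a report like the two above for C10 is to look for the deepest package state with non-zero residency. This is a small Python sketch, assuming the `Cn (pcn); value%` line format and the decimal comma shown in this comment. The function names are illustrative only.

```python
import re

# Matches lines such as "C8 (pc8); 44,1%".
LINE_RE = re.compile(r"^C\d+ \(pc(\d+)\);\s*([\d.,]+)%\s*$")


def parse_package_residency(report):
    """Return {package C-state number: residency percent} from report text."""
    residency = {}
    for line in report.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            # The report above uses a decimal comma ("44,1%").
            value = float(match.group(2).replace(",", "."))
            residency[int(match.group(1))] = value
    return residency


def deepest_state_reached(residency):
    """Return the highest-numbered state with non-zero residency, or None."""
    reached = [state for state, pct in residency.items() if pct > 0.0]
    return max(reached) if reached else None


REPORT = """\
Package;0
C2 (pc2); 26,0%
C3 (pc3);  0,9%
C6 (pc6); 10,4%
C7 (pc7);  7,9%
C8 (pc8); 44,1%
C9 (pc9);  0,0%
C10 (pc10);  0,0%
"""
```

For the first report above, `deepest_state_reached(parse_package_residency(REPORT))` gives 8, which matches the observation that this system reaches C8 but not C10.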

Comment 57 otheos 2024-02-20 19:24:58 UTC

I think this problem may be unique to the X13 Yoga Gen 3. Our T14s Gen4 (AMD) don't suffer from it, and neither do our X1 Carbon Gen 11 or the older T480s.

Comment 58 Mark Pearson 2024-02-20 19:54:03 UTC

Interesting thread and debug - thanks all.

Picking up on the last comment, that this may be X13 Yoga specific: Is this something I should be following up with the Lenovo FW team for?

I'm assuming there is something on the X13 G3 that is keeping pcscd 'busy' so it doesn't release the card reader? Are there any signals we should check for that might be causing it to run/stay running?

Will try and reproduce in the lab. Tracked internally with LO-2874.

Mark

Comment 59 Ludovic Rousseau 2024-02-20 20:09:45 UTC

@bugzilla what you can do is to generate a pcscd log trace as documented at https://pcsclite.apdu.fr/#support

Comment 60 otheos 2024-02-20 22:27:48 UTC

Many thanks for all the support on this.

I resumed testing this evening (weekdays are difficult, teaching etc in the way). 

The problem has been simplified significantly: it only occurs after a fresh boot. If I restart pcscd.socket, the expected behaviour resumes. That is, the USB smartcard reader is released after 60 seconds and the package returns to C10.

I haven't got as much access to those X13y's during weekdays, but I will test Ubuntu to see whether this occurs there. As a reminder, Ubuntu 23.10 does not install pcscd by default. I've only tested it by installing and running it myself (it ran as expected), but I have not tested starting from a cold boot like in Fedora.

I am not sure if this helps, but this is a video of a cold boot and then restarting the socket: https://streamable.com/at644t

Comment 61 otheos 2024-02-20 23:17:57 UTC

I cannot replicate the issue on Ubuntu.

I don't know what to make of this, but there is a clear change in the output of "sudo lsof | grep /dev/bus/usb/003/002" before and after I restart the socket. Maybe this is relevant. My understanding is limited, sorry. 

The restart is at 4:55.

https://streamable.com/at644t
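The same check as the lsof pipeline can be sketched in Python by reading the `/proc/<pid>/fd` symlinks directly. This is only an illustration: `processes_holding` and its `proc_root` parameter are made-up names, and it needs root to see other users' processes.

```python
import os


def processes_holding(device_path, proc_root="/proc"):
    """Return sorted PIDs that have device_path open, via /proc/<pid>/fd links."""
    pids = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were looking
            if target == device_path:
                pids.add(int(entry))
    return sorted(pids)
```

Running `processes_holding("/dev/bus/usb/003/002")` before and after restarting the socket would show which process IDs hold the reader's device node in each case.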

Comment 62 otheos 2024-02-20 23:20:06 UTC

Created attachment 2017903 [details]
log.txt

pcscd debug output from: 

sudo LIBCCID_ifdLogLevel=0x000F pcscd --foreground --debug --apdu --color | tee -i log.txt

Comment 63 Ludovic Rousseau 2024-02-21 10:46:06 UTC

Your pcscd log does not show anything special.
No application that uses the smart card is running, so it is not surprising that we do not see anything special.

You can use this script to know what application is using pcscd
https://github.com/LudovicRousseau/PCSC-contrib/blob/master/list_pcsc_applications.sh

The problem is not that pcscd does not release the reader. pcscd will have the same behavior on other laptops where the problem is not present. Please do not blame pcscd :-)

If you have access to an external USB reader we can do another test: use the external reader, disable the internal Alcor Micro reader and see if you have the same problem.
See "Remove and/or customize PC/SC reader names" https://blog.apdu.fr/posts/2015/12/remove-andor-customize-pcsc-reader-names/
Edit /etc/default/pcscd and add the line:
PCSCLITE_FILTER_IGNORE_READER_NAMES="Alcor"
restart pcscd or reboot
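How such a filter value selects readers can be sketched as follows. This is a rough Python model, assuming pcsc-lite treats the variable as a colon-separated list of substrings matched against the reader name. The `is_ignored` helper and the reader names in the usage note are made up for illustration and are not pcsc-lite code.

```python
def is_ignored(reader_name, filter_value):
    """Return True if any non-empty pattern in filter_value occurs in reader_name.

    Assumption: patterns are separated by ':' and matched as plain substrings.
    """
    patterns = [p for p in filter_value.split(":") if p]
    return any(p in reader_name for p in patterns)
```

Under that assumption, `is_ignored("Alcor Micro AU9540 00 00", "Alcor")` is true, while a hypothetical external reader named `"Example USB Reader 00 00"` is not filtered out.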

Comment 64 otheos 2024-02-25 11:19:04 UTC

Sorry for not getting back to this all week, but those X13's are all now handed to their users after the half term break, since we're back to full teaching now.

For the time being I have masked pcscd.socket so it won't start at boot. This has clearly stopped the issue, and we log in with passwords (the smartcards were not set up yet anyway).

I would not blame pcscd. I know nothing about how it works and I couldn't make any assumptions as to what the problem is, but it is specific to the X13 Yoga Gen 3 model.

I have tested 4 different ThinkPads and it works fine. Ditto for a couple of (older) Dell Latitudes. It works as expected.

Sadly I don't have an external reader to try.

To summarise, the issue is at boot up. If I restart pcscd.socket it behaves as expected. Sadly my comparison with other distributions has fallen short as a controlled experiment. EndeavourOS and openSUSE TW install pcscd out of the box, but pcsc_scan cannot access the reader there, so I cannot test them.
Ubuntu doesn't install pcscd out of the box, and when I install it manually, it works fine. So that's as close as I can get to an experiment.

At this point, and with the massive help of everyone here, and again, thank you so much for your time and attention, I can work around this issue. Once the smartcards come from our IT dept I will just run a small startup script to restart pcscd at boot to work around the issue and still have access to the reader.
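The startup workaround could look roughly like the Python sketch below. It only wraps the same `systemctl restart pcscd.socket` step described in this thread. The function name and the injectable `run` parameter are illustrative, and how the script gets launched at boot is left open.

```python
import subprocess


def restart_pcscd_socket(run=subprocess.run):
    """Restart the pcscd socket unit, mirroring the manual workaround.

    `run` is injectable so the command can be inspected without executing it.
    """
    cmd = ["systemctl", "restart", "pcscd.socket"]
    run(cmd, check=True)  # raises CalledProcessError if systemctl fails
    return cmd
```

Calling `restart_pcscd_socket()` as root once after boot would reproduce the manual restart that made the reader get released after 60 seconds.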

What I haven't tested is exactly when pcscd starts during boot, to see whether that timing has an effect. I will have an X13 available sometime this week and will report back.

Again, many thanks to everyone.

Comment 65 Aoife Moloney 2024-11-13 12:04:51 UTC

This message is a reminder that Fedora Linux 39 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 39 on 2024-11-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '39'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 39 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 66 Aoife Moloney 2024-11-27 22:58:21 UTC

Fedora Linux 39 entered end-of-life (EOL) status on 2024-11-26.

Fedora Linux 39 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.
