Bug 2211024

Summary: systemd mistakes LidSwitchDocked event for LidSwitchExternalPower
Product: Red Hat Enterprise Linux 9
Reporter: Laszlo Ersek <lersek>
Component: systemd
Assignee: systemd maint <systemd-maint>
Status: NEW
QA Contact: Frantisek Sumsal <fsumsal>
Severity: high
Priority: unspecified
Version: 9.2
CC: dtardon, systemd-maint-list
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Type: Bug

Description Laszlo Ersek 2023-05-30 10:07:04 UTC
*** Description of problem:

My work laptop is a ThinkPad P1 Gen 3; at home I use it docked. The dock is the "ThinkPad Thunderbolt 4 Workstation Dock", i.e., the one with the "power passthrough" cable:

https://support.lenovo.com/hu/hu/solutions/pd500533-thinkpad-thunderbolt-4-workstation-dock-overview-and-service-parts

When the laptop is docked in my home office, I always keep its lid closed; I only use the main / standalone monitor.

When I *cold-boot* the laptop by pressing the power button on the dock, with the lid already closed at that moment, the laptop enters ACPI S3 suspend as soon as the boot process reaches the GDM login screen.

The default (commented out) "lid switch" settings in "/etc/systemd/logind.conf" are:

#HandleLidSwitch=suspend
#HandleLidSwitchExternalPower=suspend
#HandleLidSwitchDocked=ignore

After uncommenting "HandleLidSwitchExternalPower" and changing its value to "ignore", such as in:

HandleLidSwitchExternalPower=ignore

the symptom vanishes, and the GDM login screen is reached as expected.
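
As an alternative to editing "/etc/systemd/logind.conf" in place, the same override could live in a drop-in file under "/etc/systemd/logind.conf.d/". The following is only a sketch that renders such a fragment: the drop-in file name is hypothetical, and it assumes the setting sits in the "[Login]" section, as it does in the stock logind.conf.

```python
import configparser
import io

# Hypothetical file name; only the logind.conf.d directory itself is standard.
DROPIN_PATH = "/etc/systemd/logind.conf.d/10-lid-external-power.conf"


def render_logind_dropin(settings):
    """Return the text of a logind drop-in carrying the given [Login] settings."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the CamelCase key names intact
    parser["Login"] = settings
    buf = io.StringIO()
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    print(f"# would be written to {DROPIN_PATH}")
    print(render_logind_dropin({"HandleLidSwitchExternalPower": "ignore"}), end="")
```

The change would presumably take effect at the next boot, which the cold-boot reproducer involves anyway.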

This makes me think that systemd confuses the "LidSwitch while Docked" event with the "LidSwitch while on ExternalPower" event -- in other words, systemd thinks that I'm on "external power" when in fact I'm docked.

Put differently, the default settings in "/etc/systemd/logind.conf" would match what I want, except that "HandleLidSwitchExternalPower=suspend" is also applied while I'm docked (even though I never touch the already closed lid), and that prevents me from logging in at the GDM login screen.
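
The suspected mix-up can be viewed as a precedence problem. The sketch below is a simplified model of the order described in logind.conf(5) (docked first, then external power, then the plain lid-switch setting). It is not logind's actual code, and the names LidConfig and select_lid_action are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LidConfig:
    # Defaults as listed in the reporter's /etc/systemd/logind.conf.
    handle_lid_switch: str = "suspend"
    handle_lid_switch_external_power: str = "suspend"
    handle_lid_switch_docked: str = "ignore"


def select_lid_action(docked, on_external_power, cfg):
    """Pick the action for a lid-close event (simplified, assumed precedence)."""
    if docked:
        return cfg.handle_lid_switch_docked
    if on_external_power:
        return cfg.handle_lid_switch_external_power
    return cfg.handle_lid_switch


if __name__ == "__main__":
    cfg = LidConfig()
    # Dock detected: the lid close is ignored, as the reporter expects.
    print(select_lid_action(docked=True, on_external_power=True, cfg=cfg))
    # Dock not detected but AC present: falls through to the ExternalPower action.
    print(select_lid_action(docked=False, on_external_power=True, cfg=cfg))
```

Under this model, the observed suspend would be consistent with the dock not being recognized at the moment the lid state is evaluated, so that the external-power branch is taken instead.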

*** Version-Release number of selected component (if applicable):
systemd-252-13.el9_2.x86_64

*** How reproducible:
This is a somewhat tricky question, as reproducibility depends very much on the RHEL-9 kernel version, in my experience.

With the latest RHEL-9.1 kernel (5.14.0-162.18.1.el9_1.x86_64), the symptom is effectively invisible.

With the latest RHEL-9.2 kernel (5.14.0-284.11.1.el9_2.x86_64), the symptom always reproduces (again, you need to *cold-boot* the laptop).

*** Steps to Reproduce:
1. Power down the laptop.
2. Make sure it's docked.
3. Close the lid.
4. Power on the laptop using the power button on the dock.
5. Enter the LUKS password (if any).
6. Let the boot progress to the GDM login screen (graphical.target).

*** Actual results:
- Laptop immediately suspends.
- When the laptop is resumed, the GDM login screen is broken. No user list to pick a user from, and the various widgets at the top of the screen are broken -- they don't work when clicked, and there is some visual screen corruption too.

*** Expected results:
- Laptop should not suspend.
- GDM login screen should appear and be usable.

*** Additional info:
Even with RHEL-9.1 components (kernel + userland), I experienced a similar symptom whenever I *logged out* of my window manager session *back* to the GDM login screen. In that case, the laptop would suspend similarly. I'd not investigated or reported that problem, because I'd log out very infrequently -- that problem didn't interfere with work.

Comment 1 David Tardon 2023-06-01 05:54:09 UTC
(In reply to Laszlo Ersek from comment #0)
> With the latest RHEL-9.1 kernel (5.14.0-162.18.1.el9_1.x86_64), the symptom
> is effectively invisible.

What does this mean? That it doesn't happen at all, or that it does happen, but only rarely?
 
> With the latest RHEL-9.2 kernel (5.14.0-284.11.1.el9_2.x86_64), the symptom
> always reproduces (again, you need to *cold-boot* the laptop).
> 
> *** Steps to Reproduce:
> 1. Power down the laptop.
> 2. Make sure it's docked.
> 3. Close the lid.
> 4. Power on the laptop using the power button on the dock.
> 5. Enter the LUKS password (if any).
> 6. Let the boot progress to the GDM login screen (graphical.target).

Could you provide a log with systemd.log-level=debug?

> 
> *** Actual results:
> - Laptop immediately suspends.
> - When the laptop is resumed, the GDM login screen is broken. No user list
> to pick a user from, and the various widgets at the top of the screen are
> broken -- they don't work when clicked, and there is some visual screen
> corruption too.

I doubt the latter has anything to do with logind.

Comment 2 Laszlo Ersek 2023-06-01 06:25:12 UTC
(In reply to David Tardon from comment #1)
> (In reply to Laszlo Ersek from comment #0)
> > With the latest RHEL-9.1 kernel (5.14.0-162.18.1.el9_1.x86_64), the symptom
> > is effectively invisible.
> 
> What does this mean? That it doesn't happen at all, or that it does happen,
> but only rarely?

It happens *extremely* rarely.

I didn't mean to clutter the original report with details that I deemed irrelevant, but here's another bit: I actually "bisected" the kernel build range between 5.14.0-162.18.1.el9_1.x86_64 and 5.14.0-284.11.1.el9_2.x86_64, using the development kernel RPMs from Brew. -284 had always reproduced the issue, and -162.18.1 had never done so. So I was actually nearing completion of the bisection, which seemed to indicate that the problem had been introduced somewhere between -205 and -208 -- but then I cold-booted the laptop with the original -162.18.1 too, for some reason, and boom, the failure popped up with that one as well, totally unexpectedly.

That invalidated the entire bisection of course (I couldn't call the starting point -162.18.1.el9_1.x86_64 "good" any longer). It remains a fact that I've seen the failure when cold-booting with -162.18.1.el9_1.x86_64 only once, out of dozens or even hundreds of boots.

> > With the latest RHEL-9.2 kernel (5.14.0-284.11.1.el9_2.x86_64), the symptom
> > always reproduces (again, you need to *cold-boot* the laptop).
> > 
> > *** Steps to Reproduce:
> > 1. Power down the laptop.
> > 2. Make sure it's docked.
> > 3. Close the lid.
> > 4. Power on the laptop using the power button on the dock.
> > 5. Enter the LUKS password (if any).
> > 6. Let the boot progress to the GDM login screen (graphical.target).
> 
> Could you provide a log with systemd.log-level=debug?

Let me ask back first:

- Will this not itself render the system unbootable? (Sorry if this question sounds silly, but I vaguely recall booting an earlier RHEL major release like this, for a different investigation, and there were so many log messages and such a slowdown that I couldn't actually boot the system!)

- Where will the log be captured? Is it available with "journalctl" or in some other way? The laptop doesn't have a serial port, so I can't log directly to a different machine.
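
One possible way to keep the log volume down, assuming only logind's messages are of interest, would be to raise the log level for systemd-logind alone through a service drop-in instead of the kernel command line. The sketch below prints such a drop-in and a matching journalctl invocation; the drop-in file name "debug.conf" is hypothetical, and the messages are assumed to land in the journal.

```python
import shlex

# Hypothetical drop-in location for the logind unit.
DROPIN_PATH = "/etc/systemd/system/systemd-logind.service.d/debug.conf"

# SYSTEMD_LOG_LEVEL is the environment variable systemd components read.
DROPIN_TEXT = "[Service]\nEnvironment=SYSTEMD_LOG_LEVEL=debug\n"


def journalctl_command(unit="systemd-logind.service"):
    """Build a journalctl invocation for one unit's messages from the current boot."""
    return ["journalctl", "--boot", "--unit", unit, "--no-pager"]


if __name__ == "__main__":
    print(f"# {DROPIN_PATH}")
    print(DROPIN_TEXT, end="")
    print(shlex.join(journalctl_command()))
```

The output of that journalctl command could then be redirected to a file and attached to the bug, so no serial port would be needed.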

> > *** Actual results:
> > - Laptop immediately suspends.
> > - When the laptop is resumed, the GDM login screen is broken. No user list
> > to pick a user from, and the various widgets at the top of the screen are
> > broken -- they don't work when clicked, and there is some visual screen
> > corruption too.
> 
> I doubt the latter has anything to do with logind.

I'm unsure, but I can imagine it is related. It seems that, exactly when the GDM login screen is about to appear, something notices that the lid is closed (note: level triggered, not edge triggered -- the lid has not been touched at all!), and apparently synthesizes a "lid closed" *event*. The only difference between the two "vectors" is that in the first case, the GDM login screen comes up as part of a normal cold boot, while in the second case, it appears after logging out of a window manager session. As long as the invalid event is emitted in close connection with the GDM login screen appearing, both symptoms could share the same root cause.

(In that sense, the root cause may not even be that systemd mistakes LidSwitchDocked for LidSwitchExternalPower -- the primary issue may be that *any* LidSwitch event is emitted when the GDM screen appears, without me touching the lid at all!)
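
The level-versus-edge distinction can be illustrated with a toy model. This is only a sketch with hypothetical function names; it does not describe how logind or GDM actually process the lid switch.

```python
def edge_events(samples):
    """Report a lid event only when the state differs from the previous sample."""
    events = []
    for previous, current in zip(samples, samples[1:]):
        if current != previous:
            events.append(current)
    return events


def events_with_initial_sync(samples):
    """Like edge_events, but also report the first observed state as an event."""
    return samples[:1] + edge_events(samples)


if __name__ == "__main__":
    # The lid stays closed for the whole boot; nobody touches it.
    untouched = ["closed", "closed", "closed"]
    print(edge_events(untouched))
    print(events_with_initial_sync(untouched))
```

With an untouched, closed lid, the purely edge-triggered variant reports nothing, while the variant that turns the initial level into an event reports one "closed", which matches the kind of synthesized event suspected here.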

Thanks.