Bug 1914362

Summary: Tests sometimes flake with: "user@1000.service: Failed with result 'protocol'."
Product: OpenShift Container Platform
Reporter: Jonathan Lebon <jlebon>
Component: RHCOS
Assignee: Jonathan Lebon <jlebon>
Status: CLOSED DEFERRED
QA Contact: Michael Nguyen <mnguyen>
Severity: medium
Docs Contact:
Priority: low
Version: 4.7
CC: bbreard, imcleod, jligon, miabbott, nstielau, travier
Target Milestone: ---
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-01-11 21:10:34 UTC
Type: Bug

Description Jonathan Lebon 2021-01-08 16:35:09 UTC
See e.g. https://gitlab.cee.redhat.com/coreos/redhat-coreos/-/merge_requests/1199#note_1955603.

```
Dec 14 20:16:57.891063 systemd[1]: Starting User Manager for UID 1000...
Dec 14 20:16:57.906946 systemd[1897]: pam_unix(systemd-user:session): session opened for user core by (uid=0)
Dec 14 20:16:57.950027 systemd[1897]: Failed to fully start up daemon: Permission denied
Dec 14 20:16:57.951477 systemd[1899]: pam_unix(systemd-user:session): session closed for user core
Dec 14 20:16:57.957096 systemd[1]: user@1000.service: Failed with result 'protocol'.
Dec 14 20:16:57.959203 sshd[1893]: pam_systemd(sshd:session): Failed to create session: Start job for unit user@1000.service failed with 'failed'
Dec 14 20:16:57.957389 systemd[1]: Failed to start User Manager for UID 1000.
```

We've hit it again in the pipeline for `luks.sss.t2`.

Seems to happen mostly in `luks.sss.*` tests.
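
When triaging test journals, a small filter along these lines might help tell this flake apart from unrelated failures. It is only a minimal sketch: it is not part of kola or coreos-assembler, the names `UserManagerFailure` and `find_user_manager_failures` are hypothetical, and the pattern assumes the message format quoted in the bug summary (`user@1000.service: Failed with result 'protocol'.`).

```python
import re
from typing import Iterable, List, NamedTuple

# Assumption: the failure is logged by PID 1 in the form quoted in the bug
# summary, e.g. "user@1000.service: Failed with result 'protocol'."
USER_MANAGER_FAILURE = re.compile(
    r"systemd\[1\]: user@(?P<uid>\d+)\.service: "
    r"Failed with result '(?P<result>[^']+)'"
)


class UserManagerFailure(NamedTuple):
    """One matching journal line (hypothetical helper type)."""
    uid: int
    result: str
    line: str


def find_user_manager_failures(lines: Iterable[str]) -> List[UserManagerFailure]:
    """Return every journal line reporting a failed user@UID.service unit."""
    failures = []
    for line in lines:
        match = USER_MANAGER_FAILURE.search(line)
        if match:
            failures.append(
                UserManagerFailure(
                    uid=int(match["uid"]),
                    result=match["result"],
                    line=line.rstrip("\n"),
                )
            )
    return failures


if __name__ == "__main__":
    # Sample lines modelled on the journal excerpt in this report.
    sample = [
        "Dec 14 20:16:57.891063 systemd[1]: Starting User Manager for UID 1000...",
        "Dec 14 20:16:57.957096 systemd[1]: user@1000.service: "
        "Failed with result 'protocol'.",
    ]
    for failure in find_user_manager_failures(sample):
        print(f"uid={failure.uid} result={failure.result}")
```

Feeding it the lines of a saved journal file (for example, an open file object) would list each affected UID together with the reported result.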

Comment 1 Micah Abbott 2021-01-15 20:36:24 UTC
Higher-priority work has prevented this issue from being solved; adding the UpcomingSprint keyword.

Comment 2 Micah Abbott 2021-03-16 16:58:14 UTC
Most recently saw this in the `coreos.boot-mirror` test against RHCOS 4.8.

Comment 3 Jonathan Lebon 2021-04-30 19:08:40 UTC
I haven't really dug into this during this sprint, and it's a pretty low-occurrence flake AFAICT.

Comment 4 Micah Abbott 2021-06-28 15:57:47 UTC
Still haven't sorted this out; moving to 4.9.0

Comment 5 Micah Abbott 2021-06-28 17:59:20 UTC
This error might be resolved in RHEL 8.5; see https://bugzilla.redhat.com/show_bug.cgi?id=1946453

Comment 6 Timothée Ravier 2021-06-28 18:00:45 UTC
Workaround in https://github.com/coreos/coreos-assembler/pull/2261

Comment 7 Micah Abbott 2022-01-11 21:10:34 UTC
This feels more like a failure in how our tests are running, so I've copied the failure/error to https://github.com/openshift/os/issues/691

If we find something in the OS that we should fix, we can reopen this.