Bug 1914362 - Tests sometimes flake with: "user@1000.service: Failed with result 'protocol'."
Summary: Tests sometimes flake with: "user@1000.service: Failed with result 'protocol'."
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Jonathan Lebon
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-01-08 16:35 UTC by Jonathan Lebon
Modified: 2022-01-11 21:10 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-11 21:10:34 UTC
Target Upstream Version:
Embargoed:


Description Jonathan Lebon 2021-01-08 16:35:09 UTC
See e.g. https://gitlab.cee.redhat.com/coreos/redhat-coreos/-/merge_requests/1199#note_1955603.

```
Dec 14 20:16:57.891063 systemd[1]: Starting User Manager for UID 1000...
Dec 14 20:16:57.906946 systemd[1897]: pam_unix(systemd-user:session): session opened for user core by (uid=0)
Dec 14 20:16:57.950027 systemd[1897]: Failed to fully start up daemon: Permission denied
Dec 14 20:16:57.951477 systemd[1899]: pam_unix(systemd-user:session): session closed for user core
Dec 14 20:16:57.957096 systemd[1]: user@1000.service: Failed with result 'protocol'.
Dec 14 20:16:57.959203 sshd[1893]: pam_systemd(sshd:session): Failed to create session: Start job for unit user@1000.service failed with 'failed'
Dec 14 20:16:57.957389 systemd[1]: Failed to start User Manager for UID 1000.
```

We've hit it again in the pipeline for `luks.sss.t2`.

Seems to happen mostly in `luks.sss.*` tests.
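
For triaging which test runs hit this, a minimal sketch that scans saved journal text for the failure signature shown above. The function name, the regular expression, and the idea of feeding it a journal dump as a string are assumptions for illustration only; this is not part of any existing test tooling.

```python
import re

# Signature taken from the journal excerpt above. The unit-name pattern
# (anything starting with "user") is an assumption, so that both
# "user@1000.service" and a truncated "user" are matched.
FLAKE_RE = re.compile(
    r"systemd\[1\]: (?P<unit>user\S*): Failed with result 'protocol'\."
)


def find_user_manager_flakes(journal_text):
    """Return (line_number, unit) pairs for lines matching the signature."""
    hits = []
    for lineno, line in enumerate(journal_text.splitlines(), start=1):
        match = FLAKE_RE.search(line)
        if match:
            hits.append((lineno, match.group("unit")))
    return hits
```

Running it over the journal text captured from each failed run would give a rough count of how often the flake occurs per test.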

Comment 1 Micah Abbott 2021-01-15 20:36:24 UTC
Higher-priority work has prevented this issue from being solved; adding the UpcomingSprint keyword.

Comment 2 Micah Abbott 2021-03-16 16:58:14 UTC
Most recently saw this in the `coreos.boot-mirror` test against RHCOS 4.8.

Comment 3 Jonathan Lebon 2021-04-30 19:08:40 UTC
I haven't really dug into this during this sprint, and it's a pretty low-occurrence flake AFAICT.

Comment 4 Micah Abbott 2021-06-28 15:57:47 UTC
Still haven't sorted this out; moving to 4.9.0

Comment 5 Micah Abbott 2021-06-28 17:59:20 UTC
This error might be resolved come RHEL 8.5 - https://bugzilla.redhat.com/show_bug.cgi?id=1946453

Comment 6 Timothée Ravier 2021-06-28 18:00:45 UTC
Workaround in https://github.com/coreos/coreos-assembler/pull/2261

Comment 7 Micah Abbott 2022-01-11 21:10:34 UTC
This feels more like a failure in how our tests are running, so I've copied the failure/error to https://github.com/openshift/os/issues/691

If we find something in the OS that we should fix, we can reopen this.
