Bug 1335635

Summary: "Failed to get pwuid struct: user: unknown userid " log spam
Product: Red Hat Enterprise Linux 7 Reporter: Micah Abbott <miabbott>
Component: dockerAssignee: Antonio Murdaca <amurdaca>
Status: CLOSED ERRATA QA Contact: atomic-bugs <atomic-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.2CC: charles_sheridan, dlbewley, dwalsh, fcami, jeder, jkaur, liko, lsm5, lsu, mliyazud, mwysocki, pprakash, rhowe, rrajaram, sdodson, skuznets, striker
Target Milestone: rcKeywords: Extras
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: a bug in the way the code get the file descriptor number from an http response Consequence: lots of log spam about failing to get the correct file descriptor Fix: fix the way we get the file descriptor Result: no more log spam
Story Points: ---
Clone Of:
: 1340921 1341731 (view as bug list) Environment:
Last Closed: 2016-06-23 16:18:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1323934, 1340921, 1341731    

Description Micah Abbott 2016-05-12 17:45:50 UTC
This looks identical to BZ1275399

Using RHELAH 7.2.4:

docker-1.9.1-40.el7.x86_64


I originally stood up a kube master node on my system using the kube container images while my system was running RHELAH 7.2.3-1:

rhel7/kubernetes-apiserver:1.2.0-8
rhel7/kubernetes-controller-mgr:1.2.0-8
rhel7/kubernetes-scheduler:1.2.0-8
rhel7/pod-infrastructure:v3.1.1.6-8


After upgrading to 7.2.4, I saw the following in my journal:

May 12 17:30:01 rhel-atomic-7.2-test forward-journal[2070]: time="2016-05-12T17:30:01.528093328Z" level=error msg="Failed to get pwuid struct: user: unknown userid 4294967295"
May 12 17:30:01 rhel-atomic-7.2-test forward-journal[2070]: time="2016-05-12T17:30:01.528125042Z" level=info msg="{Action=remove, ID=b4d9d165a819733d4f4b100fb7ce660827a42b311d490f317b8fe6021af20f15, LoginUID=4294967295, PID=2154}"
May 12 17:30:01 rhel-atomic-7.2-test forward-journal[2070]: time="2016-05-12T17:30:01.555416168Z" level=error msg="Failed to get pwuid struct: user: unknown userid 4294967295"


Sure enough, PID 2154 corresponds to the kubelet.

Comment 2 Antonio Murdaca 2016-05-12 18:23:43 UTC
should we patch docker-1.9?

Comment 3 Antonio Murdaca 2016-05-12 18:33:09 UTC
Patch for Docker 1.9: https://github.com/projectatomic/docker/pull/156

However, Dan, I think the patches for logging daemon API call don't seem to be working well (anymore), output from latest docker (and probably 1.10.x version):

May 12 18:20:41 localhost.localdomain docker[10929]: time="2016-05-12T18:20:41.964145762+02:00" level=warning msg="tcpconn is not a struct"
May 12 18:20:41 localhost.localdomain docker[10929]: time="2016-05-12T18:20:41.964217101+02:00" level=error msg="Unable to read peer creds and server socket address: bad file descriptor"

I think they may be re-worked

Comment 4 Antonio Murdaca 2016-05-12 18:35:59 UTC
definitely fixed in docker-1.10 (if you could try it out, that would be awesome)

Comment 5 Daniel Walsh 2016-05-12 18:36:47 UTC
Fixed in docker-1.10.

Comment 6 Micah Abbott 2016-05-12 19:15:41 UTC
@runc0m - I'll try to reproduce this with 1.10


What's more troubling about this problem is that the last time it was reported in BZ1275399, it was reported as fixed + verified with 1.9.1-23:

https://bugzilla.redhat.com/show_bug.cgi?id=1275399#c28


So did the patch get lost somewhere along the way to 1.9.1-40?

Comment 7 Daniel Walsh 2016-05-12 19:44:32 UTC
Looks like it got accidently dropped.

Comment 8 Micah Abbott 2016-05-12 21:35:12 UTC
I lied.  I can't (easily) test my kube configuration with 1.10, because the our version of kube 1.2 in RHELAH 7.2.4 requires docker 1.9

I might be able to try in Fedora, but I've not crossed that bridge yet.

Comment 9 Marcel Wysocki 2016-05-13 20:25:26 UTC
Seeing this too in an OSE 3.2 installation on rhel-ah

Comment 15 Daniel Walsh 2016-05-26 17:13:47 UTC
If we are still seeing this on docker-1.10.3-25 then the bug is not fixed.

Comment 16 Steve Kuznetsov 2016-05-26 17:53:52 UTC
Seeing the logspam that Antonio mentioned in Comment 3[1] in the OpenShift CI AMIs:

May 26 05:48:36 ip-172-18-1-46.ec2.internal forward-journal[2108]: time="2016-05-26T05:48:36.316938115-04:00" level=warning msg="ResponseWriter does not contain a field named conn"
May 26 05:48:36 ip-172-18-1-46.ec2.internal forward-journal[2108]: time="2016-05-26T05:48:36.316987952-04:00" level=error msg="Unable to read peer creds and server socket address: bad file descriptor"

The OpenShift AMIs are built with:

$ docker version
Client:
 Version:         1.10.3
 API version:     1.22
 Package version: docker-common-1.10.3-25.el7.x86_64
 Go version:      go1.4.2
 Git commit:      86bbf84/1.10.3
 Built:           
 OS/Arch:         linux/amd64

Server:
 Version:         1.10.3
 API version:     1.22
 Package version: docker-common-1.10.3-25.el7.x86_64
 Go version:      go1.4.2
 Git commit:      86bbf84/1.10.3
 Built:           
 OS/Arch:         linux/amd64

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1335635#c3

Comment 18 Antonio Murdaca 2016-05-30 16:43:35 UTC
I'm taking care of the other bogus warnings though - I'll fix these in both RHEL and Fedora 1.10 but we should have another BZ tracking those other bogus warnings. I'll try to fixup rhel 1.9 but no promise.

Comment 19 Antonio Murdaca 2016-05-30 16:53:38 UTC
Fedora fix for 1.10.3: https://github.com/projectatomic/docker/commit/5e3f07110aeab2fc7674cf87c0a4669ae0215ee4
RHEL fix for 1.10.3: https://github.com/projectatomic/docker/commit/6246c4a567cfd6be390439f66965c80fec1b493a

now we can correctly see the correct log line:

May 30 18:52:46 localhost.localdomain docker[27852]: time="2016-05-30T18:52:46.113496163+02:00" level=debug msg="GET /v1.22/version"
May 30 18:52:46 localhost.localdomain docker[27852]: time="2016-05-30T18:52:46.118623888+02:00" level=debug msg="{Action=version, Username=amurdaca, LoginUID=27212, PID=27923}"

Comment 20 Antonio Murdaca 2016-05-30 17:11:04 UTC
To summarize:

$SUBJECT ("Failed to get pwuid struct: user: unknown userid " log spam) has been fixed in rhel7-1.9 and it's fixed in rhel7-1.10.3 (if you don't see the fix, we need to rebuild it probably)

The other, annoying, log spam like the ones in https://bugzilla.redhat.com/show_bug.cgi?id=1335635#c3 have been fixed with the above patches (see previous comment)

Comment 29 errata-xmlrpc 2016-06-23 16:18:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1274