Bug 1320053

Summary: Failed to run 'lsof' when router is using scc 'hostnetwork'
Product: OpenShift Container Platform Reporter: zhaozhanqi <zzhao>
Component: Networking Assignee: Ram Ranganathan <ramr>
Networking sub component: router QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: aos-bugs, tdawson
Version: 3.2.0   
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-05-12 16:33:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description zhaozhanqi 2016-03-22 08:51:35 UTC
Description of problem:
When the router uses the 'hostnetwork' scc, the router's UID is not root (0). Because 'lsof' needs the root user, the router prints the log message 'lsof: no pwd entry for UID 1000000000'.
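
The wording of the lsof message suggests that the lookup of the running UID in the passwd database fails. A minimal sketch for checking that inside a container follows; it assumes a plain passwd-format file is the only user database, and `uid_in_passwd` is a hypothetical helper, not something shipped in the router image.

```shell
#!/bin/sh
# Sketch: report whether a UID has an entry in a passwd-format file.
# uid_in_passwd is a hypothetical helper, not part of the router image.
# Assumption: users are defined only in the given file (default /etc/passwd).
uid_in_passwd() {
    uid="$1"
    passwd_file="${2:-/etc/passwd}"
    # Field 3 of each colon-separated passwd line is the numeric UID.
    awk -F: -v uid="$uid" '$3 == uid { found = 1 } END { exit !found }' "$passwd_file"
}

# Check the UID this shell is currently running as.
if uid_in_passwd "$(id -u)"; then
    echo "UID $(id -u) has a passwd entry"
else
    echo "UID $(id -u) has no passwd entry"
fi
```

Run inside the router pod, a "no passwd entry" result for the assigned UID would be consistent with the lsof message above.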

Version-Release number of selected component (if applicable):
oc v3.2.0.6
kubernetes v1.2.0-36-g4a3f9c5
router image:
openshift3/ose-haproxy-router   v3.2.0.6            18dd26854955        11 hours ago        491.9 MB
How reproducible:
always

Steps to Reproduce:
1. Create sa for router
   echo '{ "kind": "ServiceAccount", "apiVersion": "v1", "metadata": { "name": "router" } }' | oc create -f -

2. Add the scc for the router, as indicated by the step 2 error
 oadm policy add-scc-to-user hostnetwork -z router

3. Create router 
    oadm router first --credentials=/etc/origin/master/openshift-router.kubeconfig --service-account=router --images='brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-${component}:${version}'

4. Check router pod logs when it becomes running

Actual results:

step 4:

# oc logs first-1-1nhsd
I0322 00:15:02.512250       1 router.go:161] Router is including routes in all namespaces
I0322 00:15:15.913945       1 router.go:310] Router reloaded:
 - Checking if HAProxy is listening on port 1936 ...
lsof: no pwd entry for UID 1000000000
lsof: no pwd entry for UID 1000000000
lsof: no pwd entry for UID 1000000000
lsof: no pwd entry for UID 1000000000


Expected results:

The router reloads successfully.

Additional info:

This error does not occur when using the 'privileged' scc:
oadm policy add-scc-to-user privileged -z router

Comment 1 Ram Ranganathan 2016-03-22 17:36:28 UTC
Related github issue: https://github.com/openshift/origin/issues/8143

Working on a fix.

Comment 2 Ram Ranganathan 2016-03-24 22:47:15 UTC
@zhaozhanqi I'm not sure about the workflow for this bugz, since this should have been an origin-only issue. The PR has merged and this is now fixed.
I am setting this to modified to allow the OSE images to be built, but it is now ready for QE.  Thx

Comment 3 zhaozhanqi 2016-03-25 03:56:43 UTC
Thanks Ram

I just rebuilt the haproxy image using the latest images, and this issue could not be reproduced:

  [root@ip-172-18-133-7 haproxy]# oc logs router-1-xddsf
I0325 03:47:21.902548       1 router.go:161] Router is including routes in all namespaces
I0325 03:47:23.041504       1 router.go:310] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
I0325 03:47:26.754932       1 router.go:310] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
I0325 03:47:40.324838       1 router.go:310] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
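
The log above shows the reload check probing HAProxy's /healthz endpoint over HTTP rather than running lsof. A minimal sketch of a retrying probe in that spirit follows. It is not the actual router reload script: `retry_check` and `RETRY_DELAY` are hypothetical names, and the message format is only modelled on the log lines above.

```shell
#!/bin/sh
# retry_check is a hypothetical helper: run a command until it succeeds
# or the attempt budget is used up.
# Usage: retry_check MAX_ATTEMPTS COMMAND [ARGS...]
# RETRY_DELAY (assumed name) sets the pause in seconds between attempts.
retry_check() {
    max_attempts="$1"
    shift
    attempt=0
    while [ "$attempt" -lt "$max_attempts" ]; do
        if "$@"; then
            echo " - health check ok : $attempt retry attempt(s)."
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-1}"
    done
    echo " - health check failed after $max_attempts attempt(s)." >&2
    return 1
}

# Example (assumed URL; needs curl and a listening HAProxy):
#   retry_check 5 curl -s -f -o /dev/null http://localhost:1936/healthz
```

An HTTP probe like this does not need to inspect other processes' open sockets, so it does not depend on the UID the container runs as.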

I will verify this bug once it is merged to OSE

Comment 4 Troy Dawson 2016-03-28 18:44:07 UTC
This has been merged into OSE and is in release v3.2.0.8

Comment 5 zhaozhanqi 2016-03-29 02:57:56 UTC
Verified this issue on ose v3.2.0.8; it has been fixed.

 image id: 9ae42a3ebc0c

# oc logs second-1-qfore
I0328 22:45:04.228272       1 router.go:161] Router is including routes in all namespaces
I0328 22:45:16.245479       1 router.go:310] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
I0328 22:45:21.058184       1 router.go:310] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
I0328 22:45:37.386684       1 router.go:310] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).

Comment 7 errata-xmlrpc 2016-05-12 16:33:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064