Description of problem:
Customer saw the Heapster pod fail with VerifyNonRootError. After looking at the code [0][1], it looks like this error occurs when a pod tries to run as the root user.
[0] https://github.com/kubernetes/kubernetes/blob/8371a778f6228ca8f0db7374ed48722f4c26928c/pkg/kubelet/dockertools/docker_manager.go#L2333-L2338
[1] https://github.com/kubernetes/kubernetes/blob/8371a778f6228ca8f0db7374ed48722f4c26928c/pkg/kubelet/dockertools/docker_manager.go#L2445
Version-Release number of selected component (if applicable):
The RCs indicate that it is running the latest image (as of November 4th, 2016).
Additional info:
Heapster does not complete the deploy. Attaching events and logs from the metrics deployer pod shortly.
I cannot reproduce. A couple of things to try: Please make sure they are deploying metrics using --as=system:serviceaccount:openshift-infra:metrics-deployer (see https://docs.openshift.com/container-platform/3.3/install_config/cluster_metrics.html#deploying-the-metrics-components). This will make sure that it's not something weird with the permissions of the user performing the deployment. Please do not run with the 'latest' image. This is not supported and will only cause problems when a new docker image becomes available that is not designed for their version of OpenShift. Can you please have them attach the templates they are using? I just want to rule out that there have been any additional changes to them.
The admin in this case has changed the SCC's runAsUser from "MustRunAsRange" to "MustRunAsNonRoot". This means that instead of being assigned a user id from a range of user ids, the pod now runs as whatever user the docker image specifies. The Heapster docker image does not specify a user, so it defaults to the root user. This is why they are running into this issue. Since they are using the 'MustRunAsNonRoot' option, they will need to set the user id in the replication controller and restart the Heapster pods:
    oc patch rc heapster -p '{"spec":{"template":{"spec":{"containers":[{"name":"heapster","securityContext":{"runAsUser": 1000}}]}}}}'
    oc scale rc heapster --replicas=0; oc scale rc heapster --replicas=1
The docker image for Heapster should also be updated so that users don't run into this issue in the future.
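To make the reasoning above concrete, here is a rough sketch of the decision a MustRunAsNonRoot check has to make. It is a simplified model written for this comment, not the kubelet's actual code; the function name check_non_root and the messages it prints are made up for illustration.

```shell
#!/bin/sh
# Simplified model of a MustRunAsNonRoot check (illustrative only).
# $1: runAsUser from the container securityContext ("" if unset)
# $2: USER recorded in the docker image ("" if the image sets none)
# Prints "allow" or "deny: <reason>".
check_non_root() {
    run_as_user=$1
    image_user=${2%%:*}   # drop an optional ":gid" suffix

    # An explicit runAsUser takes precedence over whatever the image says.
    if [ -n "$run_as_user" ]; then
        if [ "$run_as_user" -eq 0 ]; then
            echo "deny: runAsUser is 0"
        else
            echo "allow"
        fi
        return
    fi

    # No runAsUser: fall back to the user baked into the image.
    case $image_user in
        "")       echo "deny: no runAsUser and image defaults to root" ;;
        *[!0-9]*) echo "deny: non-numeric image user" ;;
        *)
            if [ "$image_user" -eq 0 ]; then
                echo "deny: no runAsUser and image runs as root"
            else
                echo "allow"
            fi
            ;;
    esac
}

check_non_root "" ""     # Heapster image without a USER, no runAsUser set
check_non_root 1000 ""   # same image after the oc patch above
```

The first call corresponds to the failing deployment and the second to the patched replication controller, which is why setting runAsUser is enough to work around the problem until the image itself declares a non-root user.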
Fixed in 3.5
Verified with:
    metrics-heapster 3.5.0 03d0a94d4bd2 11 hours ago 318.3 MB
    # ps -aux | grep heapster
    1000020+ 3427 0.3 0.5 534664 43368 ? Ssl Feb09 0:50 heapster ...
Heapster is not running as the root user.
@penli setting this back to 'ON_QA'. From the output above, it looks like this is not being tested with the correct SCC. You need to set the SCC's runAsUser to "MustRunAsNonRoot" and then install metrics. In this case the user will not have a random UID (e.g. the '1000020+' value in your output above); I believe it should be '1000', which is the UID of the default user we have for Heapster.
@mwringe I followed the steps in Comment 7:
1. Modify the restricted SCC:
    # oc describe scc restricted | grep "Run As User"
    Run As User Strategy: MustRunAsNonRoot
2. Install metrics.
3. On the node, check the process. It is not 1000, and 1000 appears to already be used by another process:
    # ps -aux | grep heapster
    cloud-u+ 92375 0.3 0.3 198784 28484 ? Ssl 01:12 0:00 heapster ...
    root 1000 0.0 0.1 553164 14944 ? Ssl Feb12 0:08 /usr/bin/python -Es /usr/sbin/tuned -l -P
In https://bugzilla.redhat.com/show_bug.cgi?id=1393103#c6 the user id is '1000020+', which indicates it's not MustRunAsNonRoot but the default MustRunAsRange. In https://bugzilla.redhat.com/show_bug.cgi?id=1393103#c8 it now shows the user as 'cloud-u+'. Assuming that 'cloud-u+' has a UID of 1000, this does now appear to be correct.
Thanks for the info. I tested this on EC2. Apologies for mistaking the PID for the UID in Comment 8.
    # ps -aux | grep heapster
    ec2-user 40145 0.1 0.3 179764 26084 ? Rsl 21:16 0:00 heapster ...
    # id -u ec2-user
    1000
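For future verification, printing the numeric UID directly avoids both the truncated user names ('1000020+', 'cloud-u+') and the PID/UID mix-up seen earlier in this bug. A minimal sketch, assuming a procps-style ps on the node and that the process's command name is exactly "heapster"; the helper name filter_by_comm is hypothetical.

```shell
#!/bin/sh
# Read "UID PID COMMAND" rows on stdin and print the UID and PID of every
# row whose command name equals $1. The fields are labelled in the output
# so a PID of 1000 cannot be mistaken for a UID of 1000.
filter_by_comm() {
    awk -v name="$1" '$3 == name { print "uid=" $1, "pid=" $2 }'
}

# The trailing "=" on each column suppresses the header line.
ps -eo uid=,pid=,comm= | filter_by_comm heapster
```

With MustRunAsNonRoot in effect and the fixed image, the expectation from the comments above is a line starting with uid=1000.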
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0884