Bug 1393103 - Metrics Deployer succeeded but tried to deploy the Heapster pod as the root user
Summary: Metrics Deployer succeeded but tried to deploy the Heapster pod as the root user
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Matt Wringe
QA Contact: Peng Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-11-08 21:11 UTC by Eric Jones
Modified: 2020-01-17 16:08 UTC

Fixed In Version: 3.5.0
Doc Type: Bug Fix
Doc Text:
Cause: The Heapster image and pod did not specify a user to run as. Consequence: With no user specified, the container defaulted to the root user. On a cluster whose SCC uses "MustRunAsNonRoot", the pod failed because it is not allowed to run as root. Fix: Specify a default user for the Heapster image. Result: Users can run with the SCC "MustRunAsNonRoot" without issues.
Clone Of:
Environment:
Last Closed: 2017-04-12 19:16:18 UTC
Target Upstream Version:
Embargoed:



Links
System ID: Red Hat Product Errata RHBA-2017:0884
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Container Platform 3.5 RPM Release Advisory
Last Updated: 2017-04-12 22:50:07 UTC

Description Eric Jones 2016-11-08 21:11:19 UTC
Description of problem:
The customer saw the Heapster pod fail with VerifyNonRootError. After looking at some code [0][1], it looks like this error occurs when a pod tries to run as the root user.

[0] https://github.com/kubernetes/kubernetes/blob/8371a778f6228ca8f0db7374ed48722f4c26928c/pkg/kubelet/dockertools/docker_manager.go#L2333-L2338
[1] https://github.com/kubernetes/kubernetes/blob/8371a778f6228ca8f0db7374ed48722f4c26928c/pkg/kubelet/dockertools/docker_manager.go#L2445

Version-Release number of selected component (if applicable):
The RCs indicate that it is running the latest image (as of November 4th, 2016).


Additional info:
Heapster does not complete the deployment. Events and logs from the metrics deployer pod will be attached shortly.

Comment 2 Matt Wringe 2016-11-08 21:31:34 UTC
I cannot reproduce.

A couple of things to try: 

Please make sure they are deploying metrics using --as=system:serviceaccount:openshift-infra:metrics-deployer (see https://docs.openshift.com/container-platform/3.3/install_config/cluster_metrics.html#deploying-the-metrics-components). This will make sure that it's not something weird with the permissions of the user performing the deployment.

Please do not run with the 'latest' image. This is not supported and will only cause problems when a new docker image becomes available that is not designed for their version of OpenShift.

Can you please have them attach the templates they are using? I just want to rule out any additional changes to them.

Comment 4 Matt Wringe 2016-11-22 22:13:07 UTC
The admin in this case has changed the SCC's runAsUser from "MustRunAsRange" to "MustRunAsNonRoot". This means that instead of randomly assigning a user id from a range of user ids, the pod is now run as whatever user the docker image has been set as.

The Heapster docker image does not specify a user, and as such it defaults to the root user.

This is why they are running into this issue.

Since they are using the 'MustRunAsNonRoot' option, they will need to set the user id in the replication controller and restart the heapster pods:

oc patch rc heapster -p '{"spec":{"template":{"spec":{"containers":[{"name":"heapster","securityContext":{"runAsUser": 1000}}]}}}}'
oc scale rc heapster --replicas=0;oc scale rc heapster --replicas=1

The docker image for Heapster should also be updated so that users don't run into this issue in the future.
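
A minimal sketch of the workaround above as a small shell helper. The function name heapster_patch is hypothetical; it only builds the same JSON patch used in the oc patch command, with the UID as a parameter, and refuses 0 because "MustRunAsNonRoot" does not allow root. The value 1000 is simply the one used in this comment.

```shell
# Hypothetical helper: print the JSON patch that sets runAsUser on the
# "heapster" container of the replication controller's pod template.
heapster_patch() {
  uid="$1"
  # Accept only non-negative integers.
  case "$uid" in
    ''|*[!0-9]*) echo "uid must be a non-negative integer" >&2; return 1 ;;
  esac
  # UID 0 is root, which MustRunAsNonRoot rejects.
  if [ "$uid" -eq 0 ]; then
    echo "uid 0 is root and is not allowed under MustRunAsNonRoot" >&2
    return 1
  fi
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"heapster","securityContext":{"runAsUser":%s}}]}}}}\n' "$uid"
}

# Usage (assumes the same project and RC name as the commands above):
#   oc patch rc heapster -p "$(heapster_patch 1000)"
#   oc scale rc heapster --replicas=0; oc scale rc heapster --replicas=1
```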

Comment 5 Matt Wringe 2017-02-09 20:19:29 UTC
Fixed in 3.5

Comment 6 Peng Li 2017-02-10 08:09:14 UTC
verified with metrics-heapster                   3.5.0               03d0a94d4bd2        11 hours ago        318.3 MB


# ps -aux | grep heapster
1000020+  3427  0.3  0.5 534664 43368 ?        Ssl  Feb09   0:50 heapster ...

Heapster is not running as the root user.

Comment 7 Matt Wringe 2017-02-10 16:07:10 UTC
@penli Setting this back to 'ON_QA'. From the output above, it looks like this is not being properly tested with the correct SCC.

You need to set the SCC's runAsUser to "MustRunAsNonRoot" and then install metrics.

In this case the user will not have a random UID (e.g. like the '1000020+' value in your output above), but I believe it should be '1000', which is the UID of the default user we have for Heapster.

Comment 8 Peng Li 2017-02-14 06:20:27 UTC
@mwringe

I followed the steps in Comment 7.

1. modify restricted scc

# oc describe scc restricted | grep "Run As User"
  Run As User Strategy: MustRunAsNonRoot

2. install Metrics

3. On the node, check the process. It's not 1000, and 1000 appears to already be used by another process.

# ps -aux | grep heapster
cloud-u+  92375  0.3  0.3 198784 28484 ?        Ssl  01:12   0:00 heapster ...

root       1000  0.0  0.1 553164 14944 ?        Ssl  Feb12   0:08 /usr/bin/python -Es /usr/sbin/tuned -l -P

Comment 9 Matt Wringe 2017-02-14 17:02:09 UTC
In https://bugzilla.redhat.com/show_bug.cgi?id=1393103#c6 the user ID is '1000020+', which indicates it's not using MustRunAsNonRoot but the default MustRunAsRange instead.

In https://bugzilla.redhat.com/show_bug.cgi?id=1393103#c8 it now shows the user ID as 'cloud-u+'. Assuming that 'cloud-u+' has an ID of 1000, this does now appear to be correct.

Comment 10 Peng Li 2017-02-15 02:24:31 UTC
Thanks for the info. I tested this on EC2. Apologies for mistaking the PID for the UID in Comment 8.

# ps -aux | grep heapster
ec2-user  40145  0.1  0.3 179764 26084 ?        Rsl  21:16   0:00 heapster  ...
# id -u ec2-user
1000
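
Since ps abbreviates long user names and large numeric UIDs with a trailing '+' (as in '1000020+' and 'cloud-u+' in the earlier comments), it can be easier to ask ps for the numeric UID directly. A minimal sketch, assuming procps ps on the node; the helper name heapster_uids is hypothetical.

```shell
# Hypothetical helper: read "UID COMMAND ARGS..." lines on stdin and print
# the numeric UID of each process whose executable is named "heapster".
# Matching on the command field avoids picking up "grep heapster" itself.
heapster_uids() {
  awk '$2 ~ /(^|\/)heapster$/ { print $1 }'
}

# Usage on the node (empty column headers suppress the header line):
#   ps -e -o uid= -o args= | heapster_uids
```

A result of 0 would mean Heapster is still running as root; with the fixed image under "MustRunAsNonRoot" the expected value is 1000.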

Comment 12 errata-xmlrpc 2017-04-12 19:16:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884

