Bug 1581760 - prometheus-operator deployment fails to start
Summary: prometheus-operator deployment fails to start
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 3.10.0
Assignee: Dan Mace
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-05-23 15:03 UTC by Dan Mace
Modified: 2018-07-30 19:16 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-30 19:16:18 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:1816 None None None 2018-07-30 19:16:38 UTC
GitHub https://github.com/openshift/openshift-ansible/pull/8514 None None None 2018-05-24 11:56:58 UTC
GitHub https://github.com/openshift/openshift-ansible/pull/8531 None None None 2018-05-25 15:16:07 UTC

Description Dan Mace 2018-05-23 15:03:37 UTC
Description of problem:

The openshift-monitoring/prometheus-operator deployment (managed by the cluster-monitoring-operator) fails to roll out because of an SCC issue:

   message: container has runAsNonRoot and image will run as root

As a result, the monitoring stack as a whole fails to deploy.
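
For anyone triaging the same symptom, here is a minimal shell sketch for surfacing the message above. It assumes the failing pod exists in the openshift-monitoring namespace and that the message is reported in the container's waiting state; the function name show_waiting_messages is hypothetical and not part of any tool.

```shell
# Hypothetical helper: print each pod name followed by any container
# "waiting" message reported in its status.
# Assumes `oc` is installed and logged in to the affected cluster.
show_waiting_messages() {
  ns="${1:-openshift-monitoring}"
  oc get pods -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.waiting.message}{"\n"}{end}'
}
```

Calling show_waiting_messages with no argument inspects openshift-monitoring; pass another namespace name as the first argument to look elsewhere.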

Version-Release number of selected component (if applicable):


How reproducible:

Launch a cluster with monitoring enabled via inventory:

   openshift_monitoring_deploy: true


Actual results:

cluster-monitoring-operator deploys successfully, but prometheus-operator fails to scale up.

Expected results:

The full monitoring stack bootstraps in the openshift-monitoring namespace.

Additional info:

Comment 1 Dan Mace 2018-05-23 15:05:29 UTC
Already fixed in https://github.com/openshift/cluster-monitoring-operator/pull/20; I'm still working on getting a new image released. When I have a new release, I'll link to an openshift-ansible PR to represent the fix.

Comment 3 Dan Mace 2018-05-25 14:18:55 UTC
Fixing this problem revealed a related SCC issue, which needs another patch. Pulling this back to "ASSIGNED".

Comment 7 Junqi Zhao 2018-06-05 09:20:05 UTC
@Dan

Which playbook shall I use? I set openshift_monitoring_deploy: true in the inventory and ran playbooks/openshift-prometheus/config.yml, but there is no prometheus-operator deployment in any namespace.

Comment 8 Dan Mace 2018-06-05 12:56:46 UTC
(In reply to Junqi Zhao from comment #7)
> @Dan
> 
> Which playbook shall I use? I set openshift_monitoring_deploy: true in the
> inventory and ran playbooks/openshift-prometheus/config.yml, but there is
> no prometheus-operator deployment in any namespace.

Junqi,

Here is where the new monitoring playbooks are located:

https://github.com/openshift/openshift-ansible/tree/master/playbooks/openshift-monitoring

The "openshift-prometheus" playbook is being replaced by "openshift-monitoring".

Comment 9 Dan Mace 2018-06-05 13:00:32 UTC
Junqi,

One more thing: the monitoring infrastructure will be installed in the openshift-monitoring namespace.

Comment 10 Junqi Zhao 2018-06-06 02:51:04 UTC
Tested with openshift-ansible-3.10.0-0.60.0.git.0.bf95bf8.el7.noarch: prometheus-operator now scales up, and all pods are running normally.

Steps:
1. Set openshift_monitoring_deploy=true in the inventory file.
2. Run the playbooks/openshift-monitoring/config.yml playbook.
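
The two steps above can be sketched as a small shell wrapper. This is only an illustration: the function name deploy_monitoring and the inventory path are hypothetical, and it assumes the command is run from the root of an openshift-ansible checkout with an inventory that already sets openshift_monitoring_deploy=true.

```shell
# Hypothetical wrapper around the playbook named in step 2.
# $1 is the path to an inventory that sets openshift_monitoring_deploy=true.
# Run from the root of an openshift-ansible checkout so the relative
# playbook path resolves.
deploy_monitoring() {
  inventory="${1:?usage: deploy_monitoring <inventory>}"
  ansible-playbook -i "$inventory" playbooks/openshift-monitoring/config.yml
}
```

For example, deploy_monitoring /path/to/hosts, where /path/to/hosts stands in for the real inventory file.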

Comment 11 Junqi Zhao 2018-06-06 02:51:55 UTC
# oc get po -n openshift-monitoring
NAME                                           READY     STATUS    RESTARTS   AGE
alertmanager-main-0                            3/3       Running   0          53m
alertmanager-main-1                            3/3       Running   0          53m
cluster-monitoring-operator-7f6c68764b-f5qc4   1/1       Running   0          54m
kube-state-metrics-d6f855965-ztd4s             3/3       Running   0          52m
node-exporter-dx5zn                            2/2       Running   0          52m
node-exporter-g6dw5                            2/2       Running   0          52m
prometheus-k8s-0                               3/3       Running   1          54m
prometheus-k8s-1                               3/3       Running   1          54m
prometheus-operator-7878fffc55-hlls5           1/1       Running   0          7m
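
A check like the one above can be automated. The sketch below is a hypothetical filter (the name not_ready_pods is made up for illustration) that reads `oc get po` output in the column layout shown above and prints any pod that is not in the Running state or whose READY count is incomplete. It assumes the default column order NAME, READY, STATUS.

```shell
# Hypothetical filter: read `oc get po` output on stdin and print the name of
# every pod that is not Running or does not have all containers ready.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
not_ready_pods() {
  awk 'NR > 1 && NF >= 3 {
    split($2, ready, "/")            # READY looks like "3/3"
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}
```

Usage would be along the lines of `oc get po -n openshift-monitoring | not_ready_pods`; empty output means every listed pod is Running and fully ready.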

Comment 13 errata-xmlrpc 2018-07-30 19:16:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816

