Bug 1678475 - install/upgrade perf: cluster monitoring operator deploys operands serially rather than in parallel
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.1.0
Assignee: Sergiusz Urbaniak
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-02-18 21:02 UTC by Seth Jennings
Modified: 2019-06-04 10:44 UTC
5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:44:14 UTC
Target Upstream Version:


Attachments (Terms of Use)
cmo-kubechart.png (45.72 KB, image/png)
2019-02-18 21:02 UTC, Seth Jennings
kubechart-2.png (53.35 KB, image/png)
2019-02-25 22:23 UTC, Seth Jennings
kubechart -3 (150.85 KB, image/png)
2019-04-10 02:44 UTC, Junqi Zhao


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 None None None 2019-06-04 10:44:20 UTC

Description Seth Jennings 2019-02-18 21:02:41 UTC
Created attachment 1536117 [details]
cmo-kubechart.png

Currently, the CMO takes about 6-8m to roll out all the monitoring components.

However, it does so serially rather than in parallel (see attached kubechart).

Is there a reason for this? If not, let's parallelize this, as CMO is deployed late in install/upgrade and is on the critical path to its completion.  The telemeter-client is currently the last pod to start in the cluster.

Comment 4 Seth Jennings 2019-02-25 22:23:43 UTC
Created attachment 1538613 [details]
kubechart-2.png

I see the changes, but there now seem to be long stretches where nothing is happening (see new attachment).

Basically:
t-0 - CMO starts
+1m - prom operator starts (>1m image pull time)
+2m - prom operator running
+3m - everything except prom and prom-adapter starts
+6m - prom and prom-adapter start

By my observation, things are starting more in parallel, but because they sit idle for long stretches, the rollout still takes about the same amount of time :-/

Comment 6 Junqi Zhao 2019-02-27 07:46:35 UTC
As per Comment 6, changing the status back to MODIFIED.

Comment 10 Junqi Zhao 2019-04-10 02:44:04 UTC
From the chart, the monitoring components are deployed in parallel now, but it still took about 12 minutes to roll out all of them. Other components, such as openshift-kube-scheduler-operator and openshift-marketplace, also took about 12 minutes to roll out.

cluster-monitoring-operator-775cccc768-b7sj7   "2019-04-09T21:22:59.633855813-04:00", "2019-04-09T21:34:03.680993632-04:00"
node-exporter-qtj7g                            "2019-04-09T21:22:59.632973119-04:00", "2019-04-09T21:34:03.680989644-04:00"
node-exporter-7r4r4                            "2019-04-09T21:22:59.634100905-04:00", "2019-04-09T21:34:03.680997416-04:00"
node-exporter-fmgxk                            "2019-04-09T21:23:14.330596029-04:00", "2019-04-09T21:34:03.680992082-04:00"
node-exporter-r6xxk                            "2019-04-09T21:25:21.89682946-04:00",  "2019-04-09T21:34:03.680997923-04:00"
prometheus-operator-5ff75f95fc-k854z           "2019-04-09T21:25:22.795222761-04:00", "2019-04-09T21:34:03.680995725-04:00"
node-exporter-lvt8c                            "2019-04-09T21:25:32.744990996-04:00", "2019-04-09T21:34:03.680990378-04:00"
node-exporter-nnvpc                            "2019-04-09T21:26:09.737593613-04:00", "2019-04-09T21:34:03.680998403-04:00"
telemeter-client-8d885568b-9prt4               "2019-04-09T21:26:29.870418508-04:00", "2019-04-09T21:34:03.680993034-04:00"
kube-state-metrics-697cd6f695-wsvmf            "2019-04-09T21:26:43.857177907-04:00", "2019-04-09T21:34:03.680997163-04:00"
grafana-56879d5757-bbxvg                       "2019-04-09T21:27:18.222775024-04:00", "2019-04-09T21:34:03.680994595-04:00"
alertmanager-main-0                            "2019-04-09T21:27:25.66836269-04:00",  "2019-04-09T21:34:03.680994115-04:00"
alertmanager-main-1                            "2019-04-09T21:27:47.949750711-04:00", "2019-04-09T21:34:03.68099098-04:00"
alertmanager-main-2                            "2019-04-09T21:28:17.606583558-04:00", "2019-04-09T21:34:03.680991418-04:00"
prometheus-k8s-1                               "2019-04-09T21:28:51.757987889-04:00", "2019-04-09T21:34:03.681027516-04:00"
prometheus-k8s-0                               "2019-04-09T21:29:53.943373903-04:00", "2019-04-09T21:34:03.681027175-04:00"
prometheus-adapter-7cc8fbcbd-4ldtm             "2019-04-09T21:30:13.559570559-04:00", "2019-04-09T21:34:03.680995126-04:00"
prometheus-adapter-7cc8fbcbd-9ttsx             "2019-04-09T21:30:13.559570559-04:00", "2019-04-09T21:34:03.680995126-04:00"
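For reference, the gap between two timestamps in the list above can be computed with Go's time package. This is only a convenience sketch; rolloutSpan is a hypothetical helper, not part of any tooling used here.

```go
package main

import (
	"fmt"
	"time"
)

// rolloutSpan parses two RFC3339Nano timestamps of the kind logged
// above and returns the elapsed time, rounded to the nearest second.
func rolloutSpan(first, last string) (time.Duration, error) {
	t0, err := time.Parse(time.RFC3339Nano, first)
	if err != nil {
		return 0, err
	}
	t1, err := time.Parse(time.RFC3339Nano, last)
	if err != nil {
		return 0, err
	}
	return t1.Sub(t0).Round(time.Second), nil
}

func main() {
	// First pod start (cluster-monitoring-operator) vs. last pod start
	// (prometheus-adapter), taken from the list above.
	d, err := rolloutSpan(
		"2019-04-09T21:22:59.633855813-04:00",
		"2019-04-09T21:30:13.559570559-04:00",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println("first start to last start:", d) // prints 7m14s
}
```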

Comment 11 Junqi Zhao 2019-04-10 02:44:28 UTC
Created attachment 1554009 [details]
kubechart -3

Comment 12 Junqi Zhao 2019-04-10 02:45:18 UTC
payload: 4.0.0-0.nightly-2019-04-05-165550

Comment 14 errata-xmlrpc 2019-06-04 10:44:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

