Bug 1678475

Summary: install/upgrade perf: cluster monitoring operator deploys operands serially rather than in parallel
Product: OpenShift Container Platform Reporter: Seth Jennings <sjenning>
Component: Monitoring Assignee: Sergiusz Urbaniak <surbania>
Status: CLOSED ERRATA QA Contact: Junqi Zhao <juzhao>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.1.0 CC: ccoleman, fbranczy, mloibl, sponnaga, surbania
Target Milestone: ---   
Target Release: 4.1.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-06-04 10:44:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description          Flags
cmo-kubechart.png    none
kubechart-2.png      none
kubechart -3         none

Description Seth Jennings 2019-02-18 21:02:41 UTC
Created attachment 1536117 [details]
cmo-kubechart.png

Currently, the CMO takes about 6-8m to roll out all the monitoring components.

However, it does so serially rather than in parallel (see attached kubechart).

Is there a reason for this? If not, let's parallelize this, as CMO is deployed late in install/upgrade and is on the critical path to completion of install/upgrade. The telemeter-client is currently the last pod to start in the cluster.
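
To illustrate the difference between the two approaches, here is a minimal sketch. It is not CMO's actual code (CMO is written in Go); the task names and durations are hypothetical, and each rollout is simulated with a short sleep. Run serially, the total time is roughly the sum of the task durations; run in parallel, it is roughly the duration of the slowest task.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical operand rollouts and simulated durations in seconds.
# These values are illustrative only and are not taken from the bug report.
TASKS = {
    "prometheus-operator": 0.05,
    "node-exporter": 0.05,
    "grafana": 0.05,
    "telemeter-client": 0.05,
}


def roll_out(name, seconds):
    """Simulate rolling out one operand by sleeping for its duration."""
    time.sleep(seconds)
    return name


def run_serial(tasks):
    """Roll out operands one after another; return (names, elapsed seconds)."""
    start = time.monotonic()
    done = [roll_out(name, seconds) for name, seconds in tasks.items()]
    return done, time.monotonic() - start


def run_parallel(tasks):
    """Roll out all operands concurrently; return (names, elapsed seconds)."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(roll_out, n, s) for n, s in tasks.items()]
        # result() re-raises any exception from a failed rollout.
        done = [f.result() for f in futures]
    return done, time.monotonic() - start


if __name__ == "__main__":
    _, serial_time = run_serial(TASKS)
    _, parallel_time = run_parallel(TASKS)
    print(f"serial: {serial_time:.2f}s, parallel: {parallel_time:.2f}s")
```

This only helps when the tasks are independent. Any operand that must wait for another (for example, one that needs an operator to be running first) still has to be ordered after it.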

Comment 4 Seth Jennings 2019-02-25 22:23:43 UTC
Created attachment 1538613 [details]
kubechart-2.png

I see the changes, but there now seems to be a lot of time where nothing is happening (see new attachment).

Basically:
t-0 - CMO starts
+1m - prom operator starts (>1m image pull time)
+2m - prom operator running
+3m - everything except prom and prom-adapter starts
+6m - prom and prom-adapter start

By my observation, more things are starting in parallel, but because components sit around doing nothing in between, the rollout still takes the same amount of time :-/
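
For reference, the idle periods in the timeline above can be computed with a small sketch like this one. The minute offsets are copied from the list; the helper names are made up for illustration.

```python
# Milestones from the timeline above, in minutes after CMO start.
MILESTONES = [
    (0, "CMO starts"),
    (1, "prom operator starts"),
    (2, "prom operator running"),
    (3, "everything except prom and prom-adapter starts"),
    (6, "prom and prom-adapter start"),
]


def gaps(milestones):
    """Return (minutes_waited, from_event, to_event) for consecutive milestones."""
    return [
        (t1 - t0, e0, e1)
        for (t0, e0), (t1, e1) in zip(milestones, milestones[1:])
    ]


def longest_gap(milestones):
    """Return the consecutive pair of milestones with the longest wait."""
    return max(gaps(milestones), key=lambda g: g[0])


if __name__ == "__main__":
    minutes, before, after = longest_gap(MILESTONES)
    print(f"longest wait: {minutes}m between '{before}' and '{after}'")
```

On these numbers, the longest wait is the 3 minutes between the +3m and +6m milestones, which is half of the total rollout time.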

Comment 6 Junqi Zhao 2019-02-27 07:46:35 UTC
As per Comment 6, change back to MODIFIED

Comment 10 Junqi Zhao 2019-04-10 02:44:04 UTC
From the chart, the monitoring components are now deployed in parallel, but it took about 12 minutes to roll out all of them. Other products, such as openshift-kube-scheduler-operator and openshift-marketplace, also took about 12 minutes to roll out all of their components.

cluster-monitoring-operator-775cccc768-b7sj7 "2019-04-09T21:22:59.633855813-04:00", "2019-04-09T21:34:03.680993632-04:00"
node-exporter-qtj7g                          "2019-04-09T21:22:59.632973119-04:00", "2019-04-09T21:34:03.680989644-04:00"
node-exporter-7r4r4                          "2019-04-09T21:22:59.634100905-04:00", "2019-04-09T21:34:03.680997416-04:00"
node-exporter-fmgxk                          "2019-04-09T21:23:14.330596029-04:00", "2019-04-09T21:34:03.680992082-04:00"
node-exporter-r6xxk                          "2019-04-09T21:25:21.89682946-04:00", "2019-04-09T21:34:03.680997923-04:00"
prometheus-operator-5ff75f95fc-k854z         "2019-04-09T21:25:22.795222761-04:00", "2019-04-09T21:34:03.680995725-04:00"
node-exporter-lvt8c                          "2019-04-09T21:25:32.744990996-04:00", "2019-04-09T21:34:03.680990378-04:00"
node-exporter-nnvpc                          "2019-04-09T21:26:09.737593613-04:00", "2019-04-09T21:34:03.680998403-04:00"
telemeter-client-8d885568b-9prt4             "2019-04-09T21:26:29.870418508-04:00", "2019-04-09T21:34:03.680993034-04:00"
kube-state-metrics-697cd6f695-wsvmf          "2019-04-09T21:26:43.857177907-04:00", "2019-04-09T21:34:03.680997163-04:00"
grafana-56879d5757-bbxvg                     "2019-04-09T21:27:18.222775024-04:00", "2019-04-09T21:34:03.680994595-04:00"
alertmanager-main-0                          "2019-04-09T21:27:25.66836269-04:00", "2019-04-09T21:34:03.680994115-04:00"
alertmanager-main-1                          "2019-04-09T21:27:47.949750711-04:00", "2019-04-09T21:34:03.68099098-04:00"
alertmanager-main-2                          "2019-04-09T21:28:17.606583558-04:00", "2019-04-09T21:34:03.680991418-04:00"
prometheus-k8s-1                             "2019-04-09T21:28:51.757987889-04:00", "2019-04-09T21:34:03.681027516-04:00"
prometheus-k8s-0                             "2019-04-09T21:29:53.943373903-04:00", "2019-04-09T21:34:03.681027175-04:00"
prometheus-adapter-7cc8fbcbd-4ldtm           "2019-04-09T21:30:13.559570559-04:00", "2019-04-09T21:34:03.680995126-04:00"
prometheus-adapter-7cc8fbcbd-9ttsx           "2019-04-09T21:30:13.559570559-04:00", "2019-04-09T21:34:03.680995126-04:00"
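
A rough way to quantify the spread in the listing above is sketched below. The columns are not labeled in this comment, so treating the first timestamp as the time each pod started is an assumption. Only three rows are copied here for brevity, and the helper names are made up for illustration.

```python
import re
from datetime import datetime

# A few rows copied from the listing above: (pod name, first timestamp).
# Assumption: the first timestamp is the time the pod started.
ROWS = [
    ("node-exporter-qtj7g", "2019-04-09T21:22:59.632973119-04:00"),
    ("prometheus-operator-5ff75f95fc-k854z", "2019-04-09T21:25:22.795222761-04:00"),
    ("prometheus-adapter-7cc8fbcbd-4ldtm", "2019-04-09T21:30:13.559570559-04:00"),
]


def parse_ts(ts):
    """Parse a timestamp, truncating the fraction to microseconds.

    strptime's %f directive accepts at most 6 fractional digits, so any
    nanosecond digits beyond that are dropped before parsing.
    """
    trimmed = re.sub(r"\.(\d{1,6})\d*", r".\1", ts)
    return datetime.strptime(trimmed, "%Y-%m-%dT%H:%M:%S.%f%z")


def spread(rows):
    """Return the time between the earliest and latest first-column timestamps."""
    times = [parse_ts(ts) for _, ts in rows]
    return max(times) - min(times)


if __name__ == "__main__":
    print(f"spread: {spread(ROWS).total_seconds():.0f}s")
```

For these rows the spread is about 434 seconds (roughly 7m14s) between the earliest node-exporter timestamp and the prometheus-adapter timestamp.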

Comment 11 Junqi Zhao 2019-04-10 02:44:28 UTC
Created attachment 1554009 [details]
kubechart -3

Comment 12 Junqi Zhao 2019-04-10 02:45:18 UTC
payload: 4.0.0-0.nightly-2019-04-05-165550

Comment 14 errata-xmlrpc 2019-06-04 10:44:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758