Bug 1678475
| Field | Value |
| --- | --- |
| Summary | install/upgrade perf: cluster monitoring operator deploys operands serially rather than in parallel |
| Product | OpenShift Container Platform |
| Component | Monitoring |
| Version | 4.1.0 |
| Target Release | 4.1.0 |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | unspecified |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Seth Jennings <sjenning> |
| Assignee | Sergiusz Urbaniak <surbania> |
| QA Contact | Junqi Zhao <juzhao> |
| CC | ccoleman, fbranczy, mloibl, sponnaga, surbania |
| Type | Bug |
| Last Closed | 2019-06-04 10:44:14 UTC |
Created attachment 1538613 [details]
kubechart-2.png
I see the changes, but there now seems to be a lot of time where nothing is happening (see the new attachment).
Basically:
t-0 - CMO starts
+1m - prom operator starts (>1m image pull time)
+2m - prom operator running
+3m - everything except prom and prom-adapter starts
+6m - prom and prom-adapter start
By my observation, things are starting more in parallel, but because components sit idle waiting between steps, the overall rollout still takes about the same amount of time :-/
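Much of that dead time comes from the operator blocking on each component's rollout before starting the next. A minimal sketch of such a rollout wait, assuming client-go and the apimachinery wait package (a hypothetical helper for illustration, not the CMO's actual code): when waits like this run serially, each one adds its full duration to the critical path.

```go
package tasks

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForDeploymentRollout is a hypothetical helper (not the CMO's actual
// implementation) that blocks until a Deployment is fully rolled out.
func waitForDeploymentRollout(client kubernetes.Interface, ns, name string) error {
	return wait.Poll(5*time.Second, 5*time.Minute, func() (bool, error) {
		d, err := client.AppsV1().Deployments(ns).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		want := int32(1)
		if d.Spec.Replicas != nil {
			want = *d.Spec.Replicas
		}
		// Done once the controller has observed the latest spec and all
		// desired replicas are updated and available.
		done := d.Status.ObservedGeneration >= d.Generation &&
			d.Status.UpdatedReplicas == want &&
			d.Status.AvailableReplicas == want
		return done, nil
	})
}
```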
From the chart, monitoring components are deployed in parallel now, but it took about 12 minutes to roll out all the monitoring components. Other components such as openshift-kube-scheduler-operator and openshift-marketplace also took about 12 minutes to roll out all components:

cluster-monitoring-operator-775cccc768-b7sj7  2019-04-09T21:22:59.633855813-04:00  2019-04-09T21:34:03.680993632-04:00
node-exporter-qtj7g                           2019-04-09T21:22:59.632973119-04:00  2019-04-09T21:34:03.680989644-04:00
node-exporter-7r4r4                           2019-04-09T21:22:59.634100905-04:00  2019-04-09T21:34:03.680997416-04:00
node-exporter-fmgxk                           2019-04-09T21:23:14.330596029-04:00  2019-04-09T21:34:03.680992082-04:00
node-exporter-r6xxk                           2019-04-09T21:25:21.89682946-04:00   2019-04-09T21:34:03.680997923-04:00
prometheus-operator-5ff75f95fc-k854z          2019-04-09T21:25:22.795222761-04:00  2019-04-09T21:34:03.680995725-04:00
node-exporter-lvt8c                           2019-04-09T21:25:32.744990996-04:00  2019-04-09T21:34:03.680990378-04:00
node-exporter-nnvpc                           2019-04-09T21:26:09.737593613-04:00  2019-04-09T21:34:03.680998403-04:00
telemeter-client-8d885568b-9prt4              2019-04-09T21:26:29.870418508-04:00  2019-04-09T21:34:03.680993034-04:00
kube-state-metrics-697cd6f695-wsvmf           2019-04-09T21:26:43.857177907-04:00  2019-04-09T21:34:03.680997163-04:00
grafana-56879d5757-bbxvg                      2019-04-09T21:27:18.222775024-04:00  2019-04-09T21:34:03.680994595-04:00
alertmanager-main-0                           2019-04-09T21:27:25.66836269-04:00   2019-04-09T21:34:03.680994115-04:00
alertmanager-main-1                           2019-04-09T21:27:47.949750711-04:00  2019-04-09T21:34:03.68099098-04:00
alertmanager-main-2                           2019-04-09T21:28:17.606583558-04:00  2019-04-09T21:34:03.680991418-04:00
prometheus-k8s-1                              2019-04-09T21:28:51.757987889-04:00  2019-04-09T21:34:03.681027516-04:00
prometheus-k8s-0                              2019-04-09T21:29:53.943373903-04:00  2019-04-09T21:34:03.681027175-04:00
prometheus-adapter-7cc8fbcbd-4ldtm            2019-04-09T21:30:13.559570559-04:00  2019-04-09T21:34:03.680995126-04:00
prometheus-adapter-7cc8fbcbd-9ttsx            2019-04-09T21:30:13.559570559-04:00  2019-04-09T21:34:03.680995126-04:00

Created attachment 1554009 [details]
kubechart -3
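The per-pod creation timestamps above can be collected with a short client-go program. This is a sketch assuming an external kubeconfig in $KUBECONFIG; it is not necessarily how the kubechart attachments were produced:

```go
package main

import (
	"context"
	"fmt"
	"os"
	"sort"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes a kubeconfig path is set in $KUBECONFIG.
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	pods, err := client.CoreV1().Pods("openshift-monitoring").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	// Sort by creation time to reconstruct the rollout order.
	sort.Slice(pods.Items, func(i, j int) bool {
		return pods.Items[i].CreationTimestamp.Before(&pods.Items[j].CreationTimestamp)
	})
	for _, p := range pods.Items {
		fmt.Printf("%-45s %s\n", p.Name, p.CreationTimestamp.Format(time.RFC3339Nano))
	}
}
```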
payload: 4.0.0-0.nightly-2019-04-05-165550

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
Created attachment 1536117 [details]
cmo-kubechart.png

Currently, the CMO takes about 6-8m to roll out all the monitoring components. However, it does so serially rather than in parallel (see the attached kubechart). Is there a reason for this? If not, let's parallelize this: the CMO is deployed late in install/upgrade and is on the critical path to completing install/upgrade. The telemeter-client is currently the last pod to start in the cluster.
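For illustration, independent operand tasks could be fanned out with golang.org/x/sync/errgroup. The task names and functions below are hypothetical stand-ins, not the CMO's actual task list, and components with real ordering dependencies (anything needing the prometheus-operator's CRDs, for example) would still have to wait:

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/sync/errgroup"
)

// Hypothetical task functions standing in for per-component reconcile tasks.
func deployNodeExporter(ctx context.Context) error     { return nil }
func deployKubeStateMetrics(ctx context.Context) error { return nil }
func deployTelemeterClient(ctx context.Context) error  { return nil }

func main() {
	tasks := map[string]func(context.Context) error{
		"node-exporter":      deployNodeExporter,
		"kube-state-metrics": deployKubeStateMetrics,
		"telemeter-client":   deployTelemeterClient,
	}

	// Run every independent task concurrently; the first error cancels
	// the shared context and is returned from Wait.
	g, ctx := errgroup.WithContext(context.Background())
	for name, run := range tasks {
		name, run := name, run // capture loop variables
		g.Go(func() error {
			if err := run(ctx); err != nil {
				return fmt.Errorf("deploying %s: %w", name, err)
			}
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		fmt.Println("rollout failed:", err)
	}
}
```

With this shape, total rollout time approaches the slowest single task (plus any ordered prerequisites) instead of the sum of all tasks.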