Bug 2083886 - e2e-metal-ipi-ovn-ipv6 taking a lot longer to complete in 4.11
Summary: e2e-metal-ipi-ovn-ipv6 taking a lot longer to complete in 4.11
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.11
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
: ---
Assignee: Tomas Sedovic
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-05-10 22:43 UTC by Stephen Benjamin
Modified: 2023-03-09 01:19 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-03-09 01:19:08 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Stephen Benjamin 2022-05-10 22:43:51 UTC
It looks like starting around the beginning April, metal installs are taking about 40% longer -- sometimes even exceeding the 4h mark and failing.

See below to compare 4.10 and 4.11:

https://search.ci.openshift.org/graph/metrics?metric=job%3Aduration%3Atotal%3Aseconds&job=periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi-ovn-ipv6&job=periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-ovn-ipv6

Comment 1 W. Trevor King 2022-05-10 23:04:19 UTC
Picking a recent run from each job [1,2]:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi-ovn-ipv6/1524094298423824384/artifacts/e2e-metal-ipi-ovn-ipv6/baremetalds-devscripts-setup/artifacts/root/dev-scripts/logs/06_create_cluster-2022-05-10-192143.log | grep -A6 'Time elapsed per stage'
2022-05-10 20:10:24 level=debug msg=Time elapsed per stage:
2022-05-10 20:10:24 level=debug msg=         bootstrap: 2m12s
2022-05-10 20:10:24 level=debug msg=           masters: 14m32s
2022-05-10 20:10:24 level=debug msg=Bootstrap Complete: 12m40s
2022-05-10 20:10:24 level=debug msg= Bootstrap Destroy: 17s
2022-05-10 20:10:24 level=debug msg= Cluster Operators: 18m29s
2022-05-10 20:10:24 level=info msg=Time elapsed: 48m39s
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi-ovn-ipv6/1524094298423824384/build-log.txt | grep 'devscripts-setup succe
eded after'
INFO[2022-05-10T20:10:53Z] Step e2e-metal-ipi-ovn-ipv6-baremetalds-devscripts-setup succeeded after 1h0m50s.
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-ovn-ipv6/1524084786757373952/artifacts/e2e-metal-ipi-ovn-ipv6/baremetalds-devscripts-setup/artifacts/root/dev-scripts/logs/06_create_cluster-2022-05-10-182006.log | grep -A6 'Time elapsed per stage'
2022-05-10 19:07:22 level=debug msg=Time elapsed per stage:
2022-05-10 19:07:22 level=debug msg=         bootstrap: 10s
2022-05-10 19:07:22 level=debug msg=           masters: 12m59s
2022-05-10 19:07:22 level=debug msg=Bootstrap Complete: 14m45s
2022-05-10 19:07:22 level=debug msg= Bootstrap Destroy: 8s
2022-05-10 19:07:22 level=debug msg= Cluster Operators: 18m42s
2022-05-10 19:07:22 level=info msg=Time elapsed: 47m14s
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-ovn-ipv6/1524084786757373952/build-log.txt | grep 'devscripts-setup succe
eded after'
INFO[2022-05-10T19:07:41Z] Step e2e-metal-ipi-ovn-ipv6-baremetalds-devscripts-setup succeeded after 57m30s.

So... that's pretty close?  Maybe I'm interpreting "metal installs" too narrowly.

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi-ovn-ipv6/1524094298423824384/build-log.txt | grep 'succeeded after'INFO[2022-05-10T19:08:43Z] Step e2e-metal-ipi-ovn-ipv6-baremetalds-packet-setup succeeded after 9m40s. 
INFO[2022-05-10T19:08:53Z] Step e2e-metal-ipi-ovn-ipv6-ipi-install-rbac succeeded after 10s. 
INFO[2022-05-10T19:10:03Z] Step e2e-metal-ipi-ovn-ipv6-baremetalds-devscripts-proxy succeeded after 1m10s. 
INFO[2022-05-10T20:10:53Z] Step e2e-metal-ipi-ovn-ipv6-baremetalds-devscripts-setup succeeded after 1h0m50s. 
INFO[2022-05-10T20:10:53Z] Step phase pre succeeded after 1h11m50s.     
INFO[2022-05-10T21:30:33Z] Step e2e-metal-ipi-ovn-ipv6-baremetalds-e2e-test succeeded after 1h19m40s. 
INFO[2022-05-10T21:30:33Z] Step phase test succeeded after 1h19m40s.    
INFO[2022-05-10T21:31:33Z] Step e2e-metal-ipi-ovn-ipv6-baremetalds-devscripts-gather succeeded after 1m0s. 
INFO[2022-05-10T21:33:33Z] Step e2e-metal-ipi-ovn-ipv6-gather-must-gather succeeded after 2m0s. 
INFO[2022-05-10T21:36:43Z] Step e2e-metal-ipi-ovn-ipv6-gather-extra succeeded after 3m10s. 
INFO[2022-05-10T21:37:34Z] Step e2e-metal-ipi-ovn-ipv6-gather-audit-logs succeeded after 50s. 
INFO[2022-05-10T21:37:54Z] Step e2e-metal-ipi-ovn-ipv6-baremetalds-packet-teardown succeeded after 20s. 
INFO[2022-05-10T21:37:54Z] Step phase post succeeded after 7m20s.     
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-ovn-ipv6/1524084786757373952/build-log.txt | grep 'succeeded after'INFO[2022-05-10T18:09:01Z] Step e2e-metal-ipi-ovn-ipv6-baremetalds-packet-setup succeeded after 10s. 
INFO[2022-05-10T18:09:11Z] Step e2e-metal-ipi-ovn-ipv6-ipi-install-rbac succeeded after 10s. 
INFO[2022-05-10T18:10:11Z] Step e2e-metal-ipi-ovn-ipv6-baremetalds-devscripts-proxy succeeded after 1m0s. 
INFO[2022-05-10T19:07:41Z] Step e2e-metal-ipi-ovn-ipv6-baremetalds-devscripts-setup succeeded after 57m30s. 
INFO[2022-05-10T19:07:41Z] Step phase pre succeeded after 58m50s.       
INFO[2022-05-10T20:51:22Z] Step e2e-metal-ipi-ovn-ipv6-baremetalds-e2e-test succeeded after 1h43m40s. 
INFO[2022-05-10T20:51:22Z] Step phase test succeeded after 1h43m40s.    
INFO[2022-05-10T20:52:02Z] Step e2e-metal-ipi-ovn-ipv6-baremetalds-devscripts-gather succeeded after 40s. 
INFO[2022-05-10T20:54:02Z] Step e2e-metal-ipi-ovn-ipv6-gather-must-gather succeeded after 2m0s. 
INFO[2022-05-10T20:56:42Z] Step e2e-metal-ipi-ovn-ipv6-gather-extra succeeded after 2m40s. 
INFO[2022-05-10T20:57:22Z] Step e2e-metal-ipi-ovn-ipv6-gather-audit-logs succeeded after 40s. 
INFO[2022-05-10T20:57:32Z] Step e2e-metal-ipi-ovn-ipv6-baremetalds-packet-teardown succeeded after 10s. 
INFO[2022-05-10T20:57:32Z] Step phase post succeeded after 6m10s.  

That's packet-setup bumping from 10s to ~10m.  And e2e-test going from ~1h20m to ~1h44m.  Nothing else looks significantly different.

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi-ovn-ipv6/1524094298423824384
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-ovn-ipv6/1524084786757373952

Comment 2 Derek Higgins 2022-05-11 15:18:39 UTC
> That's packet-setup bumping from 10s to ~10m
This discrepancy can happen on any job now that we are using the resource pooler, 
if a host is available in the pool it will take ~10s else 10 to 20 minutes. Its
not release/job specific, to confirm this in the examples you referenced

you can see the 4.10 fails to get a host from the pool
> curl: (22) The requested URL returned error: 409 CONFLICT
and then continues to use the old on demand method



> And e2e-test going from ~1h20m to ~1h44m
This is definitely a thing we need to look into, I've compared 3 jobs from March 27/28th[1] to 
3 Jobs from Apr 21/22nd[2]

I couldn't see any single test taking up 20 minutes, the longest single tests were 10 minutes
(2 of them) but they existed on all 6 jobs, the one thing I noticed is the difference in log
file size, has the debug level been increased?

[derekh@laptop i]$ ls -lh
total 71M
-rw-rw-r--. 1 derekh derekh 9.3M May 11 14:59 1507979108498280448.log
-rw-rw-r--. 1 derekh derekh 9.3M May 11 14:59 1508084230968578048.log
-rw-rw-r--. 1 derekh derekh 9.1M May 11 14:59 1508302866471718912.log
-rw-rw-r--. 1 derekh derekh  15M May 11 14:59 1517223154093658112.log
-rw-rw-r--. 1 derekh derekh  14M May 11 15:00 1517299324424294400.log
-rw-rw-r--. 1 derekh derekh  15M May 11 15:00 1517389993293647872.log



[1]
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-ovn-ipv6/1508302866471718912/artifacts/e2e-metal-ipi-ovn-ipv6/baremetalds-e2e-test/build-log.txt
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-ovn-ipv6/1508084230968578048/artifacts/e2e-metal-ipi-ovn-ipv6/baremetalds-e2e-test/build-log.txt
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-ovn-ipv6/1507979108498280448/artifacts/e2e-metal-ipi-ovn-ipv6/baremetalds-e2e-test/build-log.txt

[2]
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-ovn-ipv6/1517223154093658112/artifacts/e2e-metal-ipi-ovn-ipv6/baremetalds-e2e-test/build-log.txt
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-ovn-ipv6/1517299324424294400/artifacts/e2e-metal-ipi-ovn-ipv6/baremetalds-e2e-test/build-log.txt
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-ovn-ipv6/1517389993293647872/artifacts/e2e-metal-ipi-ovn-ipv6/baremetalds-e2e-test/build-log.txt

Comment 5 Shiftzilla 2023-03-09 01:19:08 UTC
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9264


Note You need to log in before you can comment on or make changes to this bug.