Description of problem:
While builds are being created, some sdn pods and thanos pods fail. This increases the build time (build and push). Running must-gather shows:

clusteroperator/network is progressing: DaemonSet "openshift-sdn/sdn" is not available (awaiting 1 nodes)

The same test was performed with 4.8.0-0.nightly-2021-06-08-034312 and 4.8.0-0.nightly-2021-05-05-030749, and no issues were seen in these pods.

Version-Release number of selected component (if applicable):
4.8.0-rc.3

How reproducible:
100%

Steps to Reproduce:
1. Clone the https://github.com/openshift/svt repo.
2. cd to svt/openshift_performance/ci/scripts.
3. Make sure "python --version" returns Python 2 (for more info, see https://github.com/openshift/svt/blob/master/openshift_performance/ci/scripts/README.md).
4. Edit conc_builds.sh so that line 12 reads: app_array=("cakephp")
5. Run: ./conc_builds.sh

Actual results:
The sdn containers in multiple openshift-sdn pods fail, as do the thanos-query pods in openshift-monitoring.

Expected results:
All pods in all components keep running without failures, and build and push times do not increase.

Additional info:
Test case in Polarion: https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-9191
Similar test case with results from two 4.8 nightly runs: https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-30069

4.8.0-rc.3
2021-07-06 21:42:50,267 - build_test - MainThread - INFO - Average build time, all good builds: 1110
2021-07-06 21:42:50,268 - build_test - MainThread - INFO - Average push time, all good builds: 8.59205333333
2021-07-06 21:42:50,267 - build_test - MainThread - INFO - Good builds included in stats: 150

During the two nightly runs below, the same components were monitored and there were no failed pods in either namespace.

4.8.0-0.nightly-2021-06-08-034312
2021-06-10 16:03:46,892 - build_test - MainThread - INFO - Average build time, all good builds: 449
2021-06-10 16:03:46,893 - build_test - MainThread - INFO - Average push time, all good builds: 3.79988
2021-06-10 16:03:46,892 - build_test - MainThread - INFO - Good builds included in stats: 150

4.8.0-0.nightly-2021-05-05-030749
2021-05-03 16:02:02,873 - build_test - MainThread - INFO - Average build time, all good builds: 457
2021-05-03 16:02:02,873 - build_test - MainThread - INFO - Average push time, all good builds: 3.98708666667
2021-05-03 16:02:02,873 - build_test - MainThread - INFO - Good builds included in stats: 150
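The build_test summary lines are enough to put a number on the regression. Below is a minimal Python 3 sketch that assumes the log line format quoted above stays stable. The regex and the helper names (parse_summary, ratio) are illustrative only and are not part of the svt scripts. The sketch parses the rc.3 and 2021-06-08 nightly summaries and prints how many times larger the rc.3 averages are. With these numbers, the average build time is roughly 2.5 times the nightly's, and the average push time is about 2.3 times.

```python
import re

# Matches the build_test summary lines quoted in this report, for example:
#   "... - build_test - MainThread - INFO - Average build time, all good builds: 1110"
#   "... - build_test - MainThread - INFO - Good builds included in stats: 150"
METRIC_RE = re.compile(
    r"INFO - (Average build time|Average push time|Good builds included in stats)"
    r"(?:, all good builds)?: ([0-9.]+)"
)


def parse_summary(log_text):
    """Return {metric: value} for the summary lines in log_text.

    If a metric appears more than once, the last occurrence wins.
    """
    return {name: float(value) for name, value in METRIC_RE.findall(log_text)}


def ratio(new, baseline, metric):
    """How many times larger `metric` is in `new` than in `baseline`."""
    return new[metric] / baseline[metric]


# Summary lines copied from this report.
RC3_LOG = """
2021-07-06 21:42:50,267 - build_test - MainThread - INFO - Average build time, all good builds: 1110
2021-07-06 21:42:50,268 - build_test - MainThread - INFO - Average push time, all good builds: 8.59205333333
2021-07-06 21:42:50,267 - build_test - MainThread - INFO - Good builds included in stats: 150
"""
NIGHTLY_LOG = """
2021-06-10 16:03:46,892 - build_test - MainThread - INFO - Average build time, all good builds: 449
2021-06-10 16:03:46,893 - build_test - MainThread - INFO - Average push time, all good builds: 3.79988
2021-06-10 16:03:46,892 - build_test - MainThread - INFO - Good builds included in stats: 150
"""

rc3 = parse_summary(RC3_LOG)
nightly = parse_summary(NIGHTLY_LOG)
for metric in ("Average build time", "Average push time"):
    print(f"{metric}: rc.3 is {ratio(rc3, nightly, metric):.2f}x the 06-08 nightly")
```

Because the same parser works on any build_test log, it can also compare the 4.9 and 4.8.9 runs in later comments, provided each run's summary is parsed separately.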
Any update on this bug? I had seen a similar issue in 4.7, which was fixed by a newer 4.7 nightly. Linking
Reran this test on 4.9 and I am still seeing the same problem. During the run I watched all namespaces starting with "openshift-" and saw that the marketplace namespace was also failing/restarting pods.

# oc version
Client Version: 4.9.0-0.nightly-2021-07-14-083200
Server Version: 4.9.0-0.nightly-2021-07-14-083200

Average build time, all good builds: 1102
Average push time, all good builds: 12.9827466667
Good builds included in stats: 150

Will add a must-gather, a Prometheus dump, and a list of failed pods in a following comment.
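One way to watch every "openshift-" namespace is to save the output of `oc get pods --all-namespaces -o json` and filter it. The minimal Python 3 sketch below does that filtering. It assumes the standard Kubernetes PodList shape, meaning items[].metadata.namespace/name and items[].status.containerStatuses[].restartCount. The function name restarting_containers is hypothetical and is not part of the svt tooling.

```python
import json


def restarting_containers(pod_list_json, prefix="openshift-"):
    """Return sorted (namespace, pod, container, restartCount) tuples for
    containers that restarted at least once in namespaces starting with `prefix`.

    `pod_list_json` is the text printed by `oc get pods --all-namespaces -o json`.
    """
    rows = []
    for pod in json.loads(pod_list_json).get("items", []):
        meta = pod.get("metadata", {})
        namespace = meta.get("namespace", "")
        if not namespace.startswith(prefix):
            continue
        # containerStatuses can be missing (for example on Pending pods).
        for status in pod.get("status", {}).get("containerStatuses") or []:
            restarts = status.get("restartCount", 0)
            if restarts > 0:
                rows.append((namespace, meta.get("name", ""), status.get("name", ""), restarts))
    return sorted(rows)
```

To use it, save `oc get pods --all-namespaces -o json > pods.json` before and after a run, then call `restarting_containers(open("pods.json").read())` on each file. Comparing the two results shows which openshift-* pods (for example in openshift-sdn, openshift-monitoring, or the marketplace namespace) restarted during the build test.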
Paige, does this issue persist in 4.8.5+ or in 4.9 nightlies?
Will rerun with the newest 4.8 and 4.9 nightlies.
Reran on 4.8 and 4.9.0-0.nightly-2021-08-31-081832. The sdn issue happened less often on 4.8.9; attaching a must-gather and Prometheus dump for 4.8. Both versions still show an increase (regression) in build times compared with the previous release. More investigation is needed across other applications to find where the slowdown is.

Client Version: 4.9.0-0.nightly-2021-08-30-100210
Server Version: 4.9.0-0.nightly-2021-08-31-081832
39012
================ Average times for cakephp app =================
2021-08-31 19:13:46,761 - build_test - MainThread - INFO - Average build time, all good builds: 54
2021-08-31 19:17:22,850 - build_test - MainThread - INFO - Average build time, all good builds: 70
2021-08-31 19:22:26,150 - build_test - MainThread - INFO - Average build time, all good builds: 116
2021-08-31 19:31:49,524 - build_test - MainThread - INFO - Average build time, all good builds: 236
2021-08-31 19:48:11,882 - build_test - MainThread - INFO - Average build time, all good builds: 434
2021-08-31 20:16:36,113 - build_test - MainThread - INFO - Average build time, all good builds: 624
2021-08-31 20:52:37,505 - build_test - MainThread - INFO - Average build time, all good builds: 802
2021-08-31 19:13:46,761 - build_test - MainThread - INFO - Average push time, all good builds: 2.6435
2021-08-31 19:17:22,850 - build_test - MainThread - INFO - Average push time, all good builds: 3.23725
2021-08-31 19:22:26,150 - build_test - MainThread - INFO - Average push time, all good builds: 4.00146666667
2021-08-31 19:31:49,525 - build_test - MainThread - INFO - Average push time, all good builds: 4.88183333333
2021-08-31 19:48:11,882 - build_test - MainThread - INFO - Average push time, all good builds: 6.20964444444
2021-08-31 20:16:36,114 - build_test - MainThread - INFO - Average push time, all good builds: 10.1405172414
2021-08-31 20:52:37,505 - build_test - MainThread - INFO - Average push time, all good builds: 10.087877551
2021-08-31 19:13:46,760 - build_test - MainThread - INFO - Good builds included in stats: 2
2021-08-31 19:17:22,850 - build_test - MainThread - INFO - Good builds included in stats: 16
2021-08-31 19:22:26,150 - build_test - MainThread - INFO - Good builds included in stats: 30
2021-08-31 19:31:49,524 - build_test - MainThread - INFO - Good builds included in stats: 60
2021-08-31 19:48:11,882 - build_test - MainThread - INFO - Good builds included in stats: 90
2021-08-31 20:16:36,113 - build_test - MainThread - INFO - Good builds included in stats: 116
2021-08-31 20:52:37,505 - build_test - MainThread - INFO - Good builds included in stats: 147
==============================================================

Client Version: 4.8.9
Server Version: 4.8.9
================ Average times for cakephp app =================
2021-08-31 19:46:36,754 - build_test - MainThread - INFO - Average build time, all good builds: 52
2021-08-31 19:50:17,343 - build_test - MainThread - INFO - Average build time, all good builds: 73
2021-08-31 19:55:46,540 - build_test - MainThread - INFO - Average build time, all good builds: 125
2021-08-31 20:06:42,895 - build_test - MainThread - INFO - Average build time, all good builds: 267
2021-08-31 20:25:18,951 - build_test - MainThread - INFO - Average build time, all good builds: 488
2021-08-31 20:53:14,810 - build_test - MainThread - INFO - Average build time, all good builds: 748
2021-08-31 21:31:03,306 - build_test - MainThread - INFO - Average build time, all good builds: 1049
2021-08-31 19:46:36,754 - build_test - MainThread - INFO - Average push time, all good builds: 2.5655
2021-08-31 19:50:17,343 - build_test - MainThread - INFO - Average push time, all good builds: 3.1709375
2021-08-31 19:55:46,541 - build_test - MainThread - INFO - Average push time, all good builds: 3.8516
2021-08-31 20:06:42,896 - build_test - MainThread - INFO - Average push time, all good builds: 5.24065
2021-08-31 20:25:18,952 - build_test - MainThread - INFO - Average push time, all good builds: 6.22831111111
2021-08-31 20:53:14,810 - build_test - MainThread - INFO - Average push time, all good builds: 10.3179333333
2021-08-31 21:31:03,307 - build_test - MainThread - INFO - Average push time, all good builds: 9.20786666667
2021-08-31 19:46:36,754 - build_test - MainThread - INFO - Good builds included in stats: 2
2021-08-31 19:50:17,343 - build_test - MainThread - INFO - Good builds included in stats: 16
2021-08-31 19:55:46,539 - build_test - MainThread - INFO - Good builds included in stats: 30
2021-08-31 20:06:42,895 - build_test - MainThread - INFO - Good builds included in stats: 60
2021-08-31 20:25:18,951 - build_test - MainThread - INFO - Good builds included in stats: 90
2021-08-31 20:53:14,809 - build_test - MainThread - INFO - Good builds included in stats: 120
2021-08-31 21:31:03,306 - build_test - MainThread - INFO - Good builds included in stats: 150
==============================================================

https://drive.google.com/drive/folders/1y6FC1Cc_XWNF2BdYbrGzZhb-ouqKHWOC?usp=sharing

List of failing pods during the run:
{ "timestamp": "2021-08-31 19:42:22", "count": 1, "issue": "pod crash", "name": "sdn-7nqtf", "component": "sdn" },
{ "timestamp": "2021-08-31 19:42:22", "count": 1, "issue": "pod crash", "name": "sdn-phdmd", "component": "sdn" },
{ "timestamp": "2021-08-31 21:10:15", "count": 1, "issue": "pod crash", "name": "sdn-phdmd", "component": "sdn" },
{ "timestamp": "2021-09-01 00:00:15", "count": 1, "issue": "pod crash", "name": "image-pruner-27174240-k8vff", "component": "image-registry" },
{ "timestamp": "2021-09-01 00:00:41", "count": 1, "issue": "pod crash", "name": "image-pruner-27174240-k8vff", "component": "image-registry" }
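The failing-pod list is a series of comma-separated JSON objects, so a short script can tally crash events per component and per pod. The minimal Python 3 sketch below assumes that wrapping the list in [...] turns it into a valid JSON array, which holds for the five records quoted above. The name tally is hypothetical. For this run, the sdn component accounts for three crash events across two pods, and sdn-phdmd crashed twice (at 19:42:22 and 21:10:15).

```python
import json
from collections import Counter

# Records copied from the "List of failing pods during the run" output above.
FAILING_PODS = """
{ "timestamp": "2021-08-31 19:42:22", "count": 1, "issue": "pod crash", "name": "sdn-7nqtf", "component": "sdn" },
{ "timestamp": "2021-08-31 19:42:22", "count": 1, "issue": "pod crash", "name": "sdn-phdmd", "component": "sdn" },
{ "timestamp": "2021-08-31 21:10:15", "count": 1, "issue": "pod crash", "name": "sdn-phdmd", "component": "sdn" },
{ "timestamp": "2021-09-01 00:00:15", "count": 1, "issue": "pod crash", "name": "image-pruner-27174240-k8vff", "component": "image-registry" },
{ "timestamp": "2021-09-01 00:00:41", "count": 1, "issue": "pod crash", "name": "image-pruner-27174240-k8vff", "component": "image-registry" }
"""


def tally(records_text):
    """Return (crashes per component, crashes per pod) as Counters.

    The records are comma-separated JSON objects, so they are wrapped in
    brackets to parse them as a single JSON array.
    """
    records = json.loads("[" + records_text + "]")
    per_component, per_pod = Counter(), Counter()
    for rec in records:
        per_component[rec["component"]] += rec["count"]
        per_pod[rec["name"]] += rec["count"]
    return per_component, per_pod


per_component, per_pod = tally(FAILING_PODS)
print(per_component.most_common())
print(per_pod.most_common())
```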
Could you please upload SDN logs from one of these runs? We can't really tell why the SDN pods are crashing without that.
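For a crash, the useful SDN log is usually the one from the previous (crashed) container instance. `oc logs` can return it with `-n`, `-c`, and `--previous`. The minimal Python 3 sketch below builds one such command per affected pod and saves the output. It assumes the container is named "sdn", as in the failing "sdn containers" described above, and it assumes `oc` is on PATH and logged in to the affected cluster. The helper names and the sdn-logs output directory are hypothetical. `--previous` only works while the pod still exists and has restarted, so the logs are best collected soon after a crash.

```python
import subprocess
from pathlib import Path


def sdn_log_commands(pod_names, namespace="openshift-sdn", container="sdn"):
    """Build one `oc logs` command per pod that prints the log of the
    previous (crashed) instance of the sdn container."""
    return [
        ["oc", "logs", "-n", namespace, name, "-c", container, "--previous"]
        for name in sorted(set(pod_names))
    ]


def collect_sdn_logs(pod_names, out_dir="sdn-logs"):
    """Run each command and save stdout plus stderr to <out_dir>/<pod>-previous.log."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for cmd in sdn_log_commands(pod_names):
        pod = cmd[4]  # position of the pod name in the command list
        result = subprocess.run(cmd, capture_output=True, text=True)
        (out / f"{pod}-previous.log").write_text(result.stdout + result.stderr)
```

For the 4.9 run above, this would be `collect_sdn_logs(["sdn-7nqtf", "sdn-phdmd"])`. Stderr is saved alongside stdout so that errors such as a missing previous container are visible in the uploaded files.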
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days