Bug 1979999 - Sdn containers failing during builds causing increase in build and push times
Summary: Sdn containers failing during builds causing increase in build and push times
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Alexander Constantinescu
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-07-07 15:02 UTC by Paige Rubendall
Modified: 2023-09-15 01:11 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-19 21:18:09 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Bugzilla 1916931 (Private: 1, Priority: unspecified, Status: CLOSED, Last Updated: 2023-09-15 00:58:24 UTC): SDN failures causing builds to fail and increased build/push times

Description Paige Rubendall 2021-07-07 15:02:10 UTC
Description of problem:
While builds are being created, some SDN pods and Thanos pods fail. This increases both build and push times.
Running must-gather reports: clusteroperator/network is progressing: DaemonSet "openshift-sdn/sdn" is not available (awaiting 1 nodes)
The same test was run on 4.8.0-0.nightly-2021-06-08-034312 and 4.8.0-0.nightly-2021-05-05-030749, and no issues were seen in these pods.
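The must-gather message above means the network ClusterOperator is Progressing because the sdn DaemonSet is still waiting on a node. A minimal sketch of how such a message can be derived from a DaemonSet's `status` block (as returned by `oc get daemonset sdn -n openshift-sdn -o json`), assuming it is based on the standard Kubernetes `numberUnavailable` field. The function name `daemonset_progress_message` is hypothetical, and this is not the cluster-network-operator's actual code:

```python
import json
from typing import Optional


def daemonset_progress_message(namespace: str, name: str, status: dict) -> Optional[str]:
    """Return a progressing-style message if any scheduled DaemonSet pods are unavailable.

    Assumption: the message is based on the Kubernetes DaemonSet status field
    `numberUnavailable`, which is omitted from the status when it is zero.
    """
    unavailable = status.get("numberUnavailable", 0)
    if unavailable > 0:
        return f'DaemonSet "{namespace}/{name}" is not available (awaiting {unavailable} nodes)'
    return None


if __name__ == "__main__":
    # Illustrative status block shaped like `.status` from
    # `oc get daemonset sdn -n openshift-sdn -o json`; not data from this bug.
    sample = json.loads('{"desiredNumberScheduled": 3, "numberAvailable": 2, "numberUnavailable": 1}')
    print(daemonset_progress_message("openshift-sdn", "sdn", sample))
```

Checking `numberUnavailable` directly on the affected cluster shows whether the DaemonSet is still short of a node, without waiting for a full must-gather.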


Version-Release number of selected component (if applicable): 4.8.0-rc.3


How reproducible: 100%


Steps to Reproduce:
1. Clone https://github.com/openshift/svt repo
2. Change into the svt/openshift_performance/ci/scripts directory
3. Make sure "python --version" reports Python 2 (see https://github.com/openshift/svt/blob/master/openshift_performance/ci/scripts/README.md for more information)
4. Edit line 12 of conc_builds.sh to read:

app_array=("cakephp")

5. Run command: ./conc_builds.sh
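Steps 3 through 5 can be scripted. The sketch below is a hypothetical helper, not part of the svt repo: `python_major_version` parses `python --version` output (Python 2 prints it to stderr, so both streams are checked), and `set_app_array` rewrites the `app_array=(...)` assignment in conc_builds.sh. It assumes that assignment appears exactly once, at the start of a line, and that the script is run from svt/openshift_performance/ci/scripts:

```python
import re
import subprocess
from pathlib import Path

# Assumption: the assignment starts at the beginning of a line, as in step 4.
APP_ARRAY_RE = re.compile(r"^app_array=\(.*\)", re.MULTILINE)


def python_major_version(version_output: str) -> int:
    """Extract the major version from text such as 'Python 2.7.18'."""
    match = re.search(r"Python (\d+)\.", version_output)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_output!r}")
    return int(match.group(1))


def set_app_array(script_text: str, apps: list) -> str:
    """Replace the single app_array=(...) line with the given application names."""
    quoted = " ".join(f'"{app}"' for app in apps)
    replacement = f"app_array=({quoted})"
    # A function replacement avoids any escape processing in the new text.
    new_text, count = APP_ARRAY_RE.subn(lambda _match: replacement, script_text)
    if count != 1:
        raise ValueError(f"expected one app_array assignment, found {count}")
    return new_text


def main() -> None:
    # Python 2 writes `python --version` to stderr; Python 3 writes to stdout.
    result = subprocess.run(["python", "--version"], capture_output=True, text=True)
    if python_major_version(result.stdout + result.stderr) != 2:
        raise SystemExit("conc_builds.sh expects `python` to be Python 2")
    script = Path("conc_builds.sh")
    script.write_text(set_app_array(script.read_text(), ["cakephp"]))
    subprocess.run(["./conc_builds.sh"], check=True)


if __name__ == "__main__":
    main()
```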

Actual results:
The sdn containers in multiple openshift-sdn pods fail, as do the thanos-query pods in openshift-monitoring.

Expected results:
All pods in all components continue to run with no failures and no increase in build or push time 

Additional info:
Test case in Polarion: https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-9191
Similar test case, with results from two 4.8 nightly runs: https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-30069

4.8.0-rc.3
2021-07-06 21:42:50,267 - build_test - MainThread - INFO - Average build time, all good builds: 1110
2021-07-06 21:42:50,268 - build_test - MainThread - INFO - Average push time, all good builds: 8.59205333333
2021-07-06 21:42:50,267 - build_test - MainThread - INFO - Good builds included in stats: 150


During the following runs, the same components were monitored and no pods failed in either namespace:
4.8.0-0.nightly-2021-06-08-034312
2021-06-10 16:03:46,892 - build_test - MainThread - INFO - Average build time, all good builds: 449
2021-06-10 16:03:46,893 - build_test - MainThread - INFO - Average push time, all good builds: 3.79988
2021-06-10 16:03:46,892 - build_test - MainThread - INFO - Good builds included in stats: 150

4.8.0-0.nightly-2021-05-05-030749
2021-05-03 16:02:02,873 - build_test - MainThread - INFO - Average build time, all good builds: 457
2021-05-03 16:02:02,873 - build_test - MainThread - INFO - Average push time, all good builds: 3.98708666667
2021-05-03 16:02:02,873 - build_test - MainThread - INFO - Good builds included in stats: 150
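The regression is easier to see as a ratio against a nightly baseline. A minimal sketch that extracts the averages from `build_test` log lines like the ones above and compares rc.3 with 4.8.0-0.nightly-2021-06-08-034312. The regex assumes the exact `INFO - <metric>: <value>` layout shown here, and `parse_stats` is a hypothetical name:

```python
import re

# Matches the tail of a build_test line: "INFO - <metric>: <number>"
LINE_RE = re.compile(r"INFO - (?P<metric>[^:]+): (?P<value>[\d.]+)\s*$")


def parse_stats(log_text: str) -> dict:
    """Map each metric name to its value (the last occurrence wins)."""
    stats = {}
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            stats[match.group("metric")] = float(match.group("value"))
    return stats


# Lines copied from the results above.
RC3 = """\
2021-07-06 21:42:50,267 - build_test - MainThread - INFO - Average build time, all good builds: 1110
2021-07-06 21:42:50,268 - build_test - MainThread - INFO - Average push time, all good builds: 8.59205333333
2021-07-06 21:42:50,267 - build_test - MainThread - INFO - Good builds included in stats: 150
"""
NIGHTLY_0608 = """\
2021-06-10 16:03:46,892 - build_test - MainThread - INFO - Average build time, all good builds: 449
2021-06-10 16:03:46,893 - build_test - MainThread - INFO - Average push time, all good builds: 3.79988
2021-06-10 16:03:46,892 - build_test - MainThread - INFO - Good builds included in stats: 150
"""

if __name__ == "__main__":
    new, base = parse_stats(RC3), parse_stats(NIGHTLY_0608)
    for metric in ("Average build time, all good builds", "Average push time, all good builds"):
        print(f"{metric}: {new[metric] / base[metric]:.2f}x the baseline")
```

With these numbers, rc.3 averages about 2.47x the baseline build time and 2.26x the baseline push time, over the same 150 good builds.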

Comment 2 Paige Rubendall 2021-07-09 13:13:17 UTC
Any update on this bug?

I saw a similar issue in 4.7 that was fixed by a newer 4.7 nightly; linking that bug here.

Comment 3 Paige Rubendall 2021-07-16 19:14:51 UTC
I reran this test on 4.9 and still see the same problem. During the run I watched all namespaces starting with "openshift-" and saw that pods in the marketplace namespace were also failing and restarting.


# oc version
Client Version: 4.9.0-0.nightly-2021-07-14-083200
Server Version: 4.9.0-0.nightly-2021-07-14-083200


Average build time, all good builds: 1102
Average push time, all good builds: 12.9827466667
Good builds included in stats: 150


I will add a must-gather, a Prometheus dump, and a list of failed pods in the following comment.
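Watching every namespace that starts with "openshift-" can be done against the pod list from `oc get pods -A -o json`. A minimal sketch, assuming that JSON shape (`items[].metadata` and `items[].status.containerStatuses[].restartCount`); the helper name `restarted_pods` is hypothetical:

```python
import json
import subprocess


def restarted_pods(pod_list: dict, prefix: str = "openshift-") -> list:
    """Return (namespace, pod, container, restartCount) for restarted containers in matching namespaces."""
    rows = []
    for pod in pod_list.get("items", []):
        namespace = pod["metadata"]["namespace"]
        if not namespace.startswith(prefix):
            continue
        # Pending pods may have no containerStatuses yet.
        for status in pod.get("status", {}).get("containerStatuses", []):
            if status.get("restartCount", 0) > 0:
                rows.append((namespace, pod["metadata"]["name"], status["name"], status["restartCount"]))
    return rows


if __name__ == "__main__":
    raw = subprocess.run(["oc", "get", "pods", "-A", "-o", "json"],
                         capture_output=True, text=True, check=True).stdout
    for row in restarted_pods(json.loads(raw)):
        print(*row)
```

Running it periodically during the build test shows which namespaces, such as openshift-sdn or openshift-marketplace, accumulate restarts and when.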

Comment 5 Scott Dodson 2021-08-23 18:18:56 UTC
Paige,

Does this issue persist into 4.8.5+ or in 4.9 nightlies?

Comment 6 Paige Rubendall 2021-08-23 18:20:34 UTC
Will rerun with the newest 4.8 and 4.9 nightlies.

Comment 7 Paige Rubendall 2021-09-01 03:00:35 UTC
Reran on 4.8 versions and 4.9.0-0.nightly-2021-08-31-081832. The SDN issue happened less often on 4.8.9; attaching a must-gather and Prometheus dump for 4.8.

Both versions still show a regression in build times compared with the previous release. More investigation is needed across other applications to find where the slowdown is.


Client Version: 4.9.0-0.nightly-2021-08-30-100210
Server Version: 4.9.0-0.nightly-2021-08-31-081832
39012
================ Average times for cakephp app =================
2021-08-31 19:13:46,761 - build_test - MainThread - INFO - Average build time, all good builds: 54
2021-08-31 19:17:22,850 - build_test - MainThread - INFO - Average build time, all good builds: 70
2021-08-31 19:22:26,150 - build_test - MainThread - INFO - Average build time, all good builds: 116
2021-08-31 19:31:49,524 - build_test - MainThread - INFO - Average build time, all good builds: 236
2021-08-31 19:48:11,882 - build_test - MainThread - INFO - Average build time, all good builds: 434
2021-08-31 20:16:36,113 - build_test - MainThread - INFO - Average build time, all good builds: 624
2021-08-31 20:52:37,505 - build_test - MainThread - INFO - Average build time, all good builds: 802
2021-08-31 19:13:46,761 - build_test - MainThread - INFO - Average push time, all good builds: 2.6435
2021-08-31 19:17:22,850 - build_test - MainThread - INFO - Average push time, all good builds: 3.23725
2021-08-31 19:22:26,150 - build_test - MainThread - INFO - Average push time, all good builds: 4.00146666667
2021-08-31 19:31:49,525 - build_test - MainThread - INFO - Average push time, all good builds: 4.88183333333
2021-08-31 19:48:11,882 - build_test - MainThread - INFO - Average push time, all good builds: 6.20964444444
2021-08-31 20:16:36,114 - build_test - MainThread - INFO - Average push time, all good builds: 10.1405172414
2021-08-31 20:52:37,505 - build_test - MainThread - INFO - Average push time, all good builds: 10.087877551
2021-08-31 19:13:46,760 - build_test - MainThread - INFO - Good builds included in stats: 2
2021-08-31 19:17:22,850 - build_test - MainThread - INFO - Good builds included in stats: 16
2021-08-31 19:22:26,150 - build_test - MainThread - INFO - Good builds included in stats: 30
2021-08-31 19:31:49,524 - build_test - MainThread - INFO - Good builds included in stats: 60
2021-08-31 19:48:11,882 - build_test - MainThread - INFO - Good builds included in stats: 90
2021-08-31 20:16:36,113 - build_test - MainThread - INFO - Good builds included in stats: 116
2021-08-31 20:52:37,505 - build_test - MainThread - INFO - Good builds included in stats: 147
==============================================================





Client Version: 4.8.9
Server Version: 4.8.9
================ Average times for cakephp app =================
2021-08-31 19:46:36,754 - build_test - MainThread - INFO - Average build time, all good builds: 52
2021-08-31 19:50:17,343 - build_test - MainThread - INFO - Average build time, all good builds: 73
2021-08-31 19:55:46,540 - build_test - MainThread - INFO - Average build time, all good builds: 125
2021-08-31 20:06:42,895 - build_test - MainThread - INFO - Average build time, all good builds: 267
2021-08-31 20:25:18,951 - build_test - MainThread - INFO - Average build time, all good builds: 488
2021-08-31 20:53:14,810 - build_test - MainThread - INFO - Average build time, all good builds: 748
2021-08-31 21:31:03,306 - build_test - MainThread - INFO - Average build time, all good builds: 1049
2021-08-31 19:46:36,754 - build_test - MainThread - INFO - Average push time, all good builds: 2.5655
2021-08-31 19:50:17,343 - build_test - MainThread - INFO - Average push time, all good builds: 3.1709375
2021-08-31 19:55:46,541 - build_test - MainThread - INFO - Average push time, all good builds: 3.8516
2021-08-31 20:06:42,896 - build_test - MainThread - INFO - Average push time, all good builds: 5.24065
2021-08-31 20:25:18,952 - build_test - MainThread - INFO - Average push time, all good builds: 6.22831111111
2021-08-31 20:53:14,810 - build_test - MainThread - INFO - Average push time, all good builds: 10.3179333333
2021-08-31 21:31:03,307 - build_test - MainThread - INFO - Average push time, all good builds: 9.20786666667
2021-08-31 19:46:36,754 - build_test - MainThread - INFO - Good builds included in stats: 2
2021-08-31 19:50:17,343 - build_test - MainThread - INFO - Good builds included in stats: 16
2021-08-31 19:55:46,539 - build_test - MainThread - INFO - Good builds included in stats: 30
2021-08-31 20:06:42,895 - build_test - MainThread - INFO - Good builds included in stats: 60
2021-08-31 20:25:18,951 - build_test - MainThread - INFO - Good builds included in stats: 90
2021-08-31 20:53:14,809 - build_test - MainThread - INFO - Good builds included in stats: 120
2021-08-31 21:31:03,306 - build_test - MainThread - INFO - Good builds included in stats: 150
==============================================================
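In these per-stage outputs, the build-time, push-time and build-count lines for one stage are printed separately, and their timestamps can differ by a millisecond (for example 19:55:46,539 to 19:55:46,541). A minimal sketch that regroups them into one row per stage by truncating the timestamp to the second. This assumes a stage's lines never straddle a second boundary, which holds for the runs above; `group_stages` is a hypothetical name:

```python
import re
import sys
from collections import defaultdict

LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ .*INFO - (?P<metric>[^:]+): (?P<value>[\d.]+)\s*$"
)
SHORT = {
    "Good builds included in stats": "builds",
    "Average build time, all good builds": "build",
    "Average push time, all good builds": "push",
}


def group_stages(log_text: str) -> list:
    """Return one dict per stage, ordered by timestamp, keyed by builds/build/push."""
    stages = defaultdict(dict)
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("metric") in SHORT:
            # Second-resolution timestamp groups lines printed for the same stage.
            stages[match.group("ts")][SHORT[match.group("metric")]] = float(match.group("value"))
    return [stages[ts] for ts in sorted(stages)]


if __name__ == "__main__":
    for row in group_stages(sys.stdin.read()):
        print(f"{row.get('builds', 0):>5.0f} builds  build {row.get('build', 0):>6.0f}  push {row.get('push', 0):>7.2f}")
```

Lining the two runs up row by row also shows that their final stages did not include the same number of builds (147 on the 4.9 nightly versus 150 on 4.8.9), which is worth keeping in mind when comparing the last averages.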

https://drive.google.com/drive/folders/1y6FC1Cc_XWNF2BdYbrGzZhb-ouqKHWOC?usp=sharing

List of failing pods during run
[
  {
    "timestamp": "2021-08-31 19:42:22",
    "count": 1,
    "issue": "pod crash",
    "name": "sdn-7nqtf",
    "component": "sdn"
  },
  {
    "timestamp": "2021-08-31 19:42:22",
    "count": 1,
    "issue": "pod crash",
    "name": "sdn-phdmd",
    "component": "sdn"
  },
  {
    "timestamp": "2021-08-31 21:10:15",
    "count": 1,
    "issue": "pod crash",
    "name": "sdn-phdmd",
    "component": "sdn"
  },
  {
    "timestamp": "2021-09-01 00:00:15",
    "count": 1,
    "issue": "pod crash",
    "name": "image-pruner-27174240-k8vff",
    "component": "image-registry"
  },
  {
    "timestamp": "2021-09-01 00:00:41",
    "count": 1,
    "issue": "pod crash",
    "name": "image-pruner-27174240-k8vff",
    "component": "image-registry"
  }
]
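The failing-pod list can be summarized per component and per pod. A minimal sketch, assuming the list is saved as a JSON array (the file name `failed_pods.json` is hypothetical) and that each entry's `count` field is the number of crashes recorded at that timestamp:

```python
import json
from collections import Counter


def crash_totals(entries: list) -> tuple:
    """Sum the `count` field per component and per pod name."""
    by_component, by_pod = Counter(), Counter()
    for entry in entries:
        by_component[entry["component"]] += entry["count"]
        by_pod[entry["name"]] += entry["count"]
    return by_component, by_pod


if __name__ == "__main__":
    with open("failed_pods.json") as handle:
        components, pods = crash_totals(json.load(handle))
    print(components.most_common())
    print(pods.most_common())
```

For the list above this gives 3 sdn crashes (one on sdn-7nqtf, two on sdn-phdmd) and 2 image-registry crashes, both on image-pruner-27174240-k8vff.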

Comment 8 Alexander Constantinescu 2022-01-13 14:37:50 UTC
Could you please upload SDN logs from one of these runs? We can't really tell why the SDN pods are crashing without them.
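For a crashed container, the useful logs are those of its previous instance, not the restarted one. A minimal sketch that collects them with `oc logs --previous` for each sdn pod. It assumes the container inside the openshift-sdn pods is named `sdn` (as the "sdn containers" wording in this report suggests), and `previous_logs_cmd` / `save_previous_logs` are hypothetical helpers:

```python
import subprocess
from pathlib import Path


def previous_logs_cmd(pod: str, namespace: str = "openshift-sdn", container: str = "sdn") -> list:
    """Build the oc command that prints logs from the container's previous (crashed) instance."""
    return ["oc", "logs", "-n", namespace, pod, "-c", container, "--previous"]


def save_previous_logs(pods: list, out_dir: str = "sdn-logs") -> None:
    """Write each pod's previous-container logs to <out_dir>/<pod>.log."""
    Path(out_dir).mkdir(exist_ok=True)
    for pod in pods:
        result = subprocess.run(previous_logs_cmd(pod), capture_output=True, text=True)
        # oc fails if the container has not restarted; keep its stderr for context.
        Path(out_dir, f"{pod}.log").write_text(result.stdout or result.stderr)


if __name__ == "__main__":
    save_previous_logs(["sdn-7nqtf", "sdn-phdmd"])
```

The pod names here come from the failing-pod list for the 4.8.9 run and will differ on a new cluster. Only the most recent previous instance is kept, so the logs need to be collected soon after a crash, before the container restarts again.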

Comment 9 Red Hat Bugzilla 2023-09-15 01:11:05 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

