Bug 1886940 - CI tests on api.ci.openshift.com can timeout waiting for dns response
Keywords:
Status: CLOSED DUPLICATE of bug 1846875
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Multi-Arch
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Deep Mistry
QA Contact: Jeremy Poulin
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-10-09 19:01 UTC by Jeremy Poulin
Modified: 2021-02-20 20:42 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
[sig-builds][Feature:Builds] oc new-app should succeed with a --name of 58 characters
Last Closed: 2021-02-20 20:42:10 UTC
Target Upstream Version:
Embargoed:



Description Jeremy Poulin 2020-10-09 19:01:01 UTC
tests:
[sig-builds][Feature:Builds] oc new-app  should succeed with a --name of 58 characters
[sig-builds][Feature:Builds] oc new-app should fail with a --name longer than 58 characters
[sig-builds][Feature:Builds] oc new-app should succeed with an imagestream

are failing frequently in CI; see search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-builds%5C%5D%5C%5BFeature%3ABuilds%5C%5D+oc+new-app++should+succeed+with+a+--name+of+58+characters


FIXME: Replace this paragraph with a particular job URI from the search results to ground discussion.  A given test may fail for several reasons, and this bug should be scoped to one of those reasons.  Ideally you'd pick a job showing the most-common reason, but since that's hard to determine, you may also choose to pick a job at random.  Release-gating jobs (release-openshift-...) should be preferred over presubmits (pull-ci-...) because they are closer to the released product and less likely to have in-flight code changes that complicate analysis.

FIXME: Provide a snippet of the test failure or error from the job log

Comment 2 Dan Li 2020-10-19 20:22:46 UTC
Hi @Deep, will this bug be resolved before the end of this sprint (Oct. 24th)? If not, can we add the "UpcomingSprint" tag?

Comment 3 Dan Li 2020-11-12 00:57:18 UTC
Hi Deep, will this bug be resolved by the end of this sprint (Nov 14th)? If not, can we add the "UpcomingSprint" label?

Comment 4 Dan Li 2020-11-12 13:35:26 UTC
Adding UpcomingSprint per Deep

Comment 8 Dan Li 2020-12-02 18:44:32 UTC
Hi Deep, will this bug be resolved before the end of this sprint (Dec 5th)? If not, can we add "UpcomingSprint"?

Comment 9 Dan Li 2020-12-15 18:27:04 UTC
Hi Deep, I am doing this exercise one week early because most people are out next week. 

1. Do you think this bug will be resolved before the end of this sprint (December 26th)? If not, I'd like to add "UpcomingSprint".
2. Do you think this bug's Target Release is still 4.7.0? If it does not target 4.7, can we set it to the blank value "---"?

Comment 10 Jeremy Poulin 2021-01-05 17:17:14 UTC
Lots of updates for this issue today:

I was able to reproduce the issue in a live state. Debugging threads:
https://coreos.slack.com/archives/C017UFPQA4X/p1609863078302400
https://coreos.slack.com/archives/CBN38N3MW/p1609864629064900

Based on the second thread listed above, our best bet is to migrate to one of the 4.x clusters set up for CI, since there is no way to resolve the DNS issues in the 3.11 cluster we currently run in.
Since this is happening so close to the 4.7 release, I've notified CI stakeholders and plan to schedule the migration outage for after the 4.7 release.

In the meantime, I've instructed our teams to file patches to disable tests that rely on external networking in the 3.11 cluster.
https://coreos.slack.com/archives/C017UFPQA4X/p1609866503321700

Comment 11 Dan Li 2021-02-01 15:23:20 UTC
Hi Deep, do you think this bug will be resolved before the end of this sprint (Feb. 6th)? If not, can we set the "Reviewed-in-Sprint" flag to "+"?

Comment 12 Jeremy Poulin 2021-02-20 20:42:10 UTC
Closing this as a duplicate of the networking instability bug, since both are tracking the cluster migration to 4.x.

*** This bug has been marked as a duplicate of bug 1846875 ***

