Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2025967

Summary: azure: "read: connection timed out" failing image pulls
Product: OpenShift Container Platform
Reporter: David Eads <deads>
Component: Networking
Assignee: mcambria <mcambria>
Networking sub component: openshift-sdn
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED WONTFIX
Docs Contact:
Severity: high
Priority: high
CC: dgoodwin, jerzhang, mcambria, sdodson, sippy, trozet, wking
Version: 4.10
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment: [sig-network] pods should successfully create sandboxes by not timing out
Last Closed: 2022-03-28 17:32:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description David Eads 2021-11-23 14:23:04 UTC
Digging into clusteroperator/dns failures on Azure, we found that image pulls are failing with "read: connection timed out", almost exclusively on Azure. This CI search, https://search.ci.openshift.org/?search=read%3A+connection+timed+out&maxAge=48h&context=0&type=junit&name=4.10.*azure&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job , shows this backoff behavior in 35% of 4.10 Azure runs over the last 48 hours, compared with only 1% of non-Azure clusters.

This "read: connection timed out" error may affect more traffic than just image pulls; image pull errors are simply very well reported, which makes the affected runs easy to find. These failures cause installs and upgrades to fail.
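The CI search link in the description encodes all of its filters as query parameters. As a sketch using only the Python standard library, the query string can be decoded to see exactly what was searched. The function name `decode_query` is hypothetical, and this is not a search.ci.openshift.org API.

```python
from urllib.parse import parse_qs, urlsplit

# The CI search URL from the description, unchanged.
URL = (
    "https://search.ci.openshift.org/?search=read%3A+connection+timed+out"
    "&maxAge=48h&context=0&type=junit&name=4.10.*azure&excludeName="
    "&maxMatches=5&maxBytes=20971520&groupBy=job"
)


def decode_query(url: str) -> dict[str, str]:
    """Return each query parameter of `url` as a single decoded string.

    keep_blank_values=True keeps empty parameters such as excludeName,
    which parse_qs would otherwise drop.
    """
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return {key: values[0] for key, values in params.items()}


if __name__ == "__main__":
    for key, value in decode_query(URL).items():
        print(f"{key} = {value!r}")
```

Decoding shows the search text is `read: connection timed out` and the job-name filter is `4.10.*azure`. Judging only from the names and values, `maxAge=48h` appears to set the look-back window, and `maxBytes=20971520` works out to 20 MiB (20 × 1024 × 1024).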

Comment 6 Devan Goodwin 2021-11-30 12:56:32 UTC
*** Bug 2011939 has been marked as a duplicate of this bug. ***

Comment 32 Devan Goodwin 2022-02-01 13:16:53 UTC
The hit rate continues to drop: 9% over the last week and 5% over the last two days.

I am now working on gathering some "az monitor metrics" data in CI runs, but our reproducer is slowly disappearing.
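The hit rates in this comment are the share of CI runs within a time window that hit the error. Below is a minimal sketch of that calculation. It assumes each run is recorded as a hypothetical `Run` with a finish time and a flag for whether the error appeared. It illustrates the arithmetic only and is not how the CI search tool computes its numbers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Run:
    """Hypothetical record of one CI job run."""
    finished: datetime
    hit_error: bool  # True if "read: connection timed out" appeared in the run


def hit_rate(runs: list[Run], now: datetime, window: timedelta) -> float:
    """Fraction of runs finished in [now - window, now] that hit the error.

    Returns 0.0 when no runs fall inside the window.
    """
    recent = [r for r in runs if now - window <= r.finished <= now]
    if not recent:
        return 0.0
    # Booleans sum as 0/1, so this counts the runs that hit the error.
    return sum(r.hit_error for r in recent) / len(recent)
```

Calling `hit_rate(runs, now, timedelta(days=7))` and `hit_rate(runs, now, timedelta(days=2))` on the same run list gives the "last week" and "last two days" figures side by side.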

Comment 35 Red Hat Bugzilla 2023-09-15 01:17:28 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days