Bug 1825255
Summary: | Long timeout when connecting to services without endpoints
---|---
Product: | OpenShift Container Platform
Reporter: | Sergio G. <sgarciam>
Component: | Networking
Assignee: | Juan Luis de Sousa-Valadas <jdesousa>
Networking sub component: | openshift-sdn
QA Contact: | zhaozhanqi <zzhao>
Status: | CLOSED CURRENTRELEASE
Severity: | high
Priority: | unspecified
CC: | andcosta, bbennett, jdesousa, zzhao
Version: | 4.3.z
Target Release: | 4.7.0
Hardware: | Unspecified
OS: | Unspecified
Whiteboard: | SDN-CUST-IMPACT SDN-CI-IMPACT SDN-BP
Doc Type: | No Doc Update
Last Closed: | 2020-09-07 14:20:00 UTC
Type: | Bug
Bug Depends On: | 1781575, 1832332, 1834184
Description
Sergio G.
2020-04-17 13:45:34 UTC
Created attachment 1679669: reproducer for 4.2 including information about the cluster and iptables
Created attachment 1679670: reproducer for 4.3 including information about the cluster and iptables
Probably related to https://bugzilla.redhat.com/show_bug.cgi?id=1782857, but according to that bug this should already be fixed.

Hi Juan, this cluster will be destroyed by automation in 2 days. If you find it can no longer be used, you can ask weliang or anusaxen for help.

Andre, I think I found the issue: we're hitting the ICMP rate limit. Can you ask the customer to try the following to confirm that this is *exactly* the same issue they are seeing?

1- oc debug node <node name>
2- On that terminal, keep the output of: cat /proc/sys/net/ipv4/icmp_ratemask
3- Temporarily disable the ICMP rate limit: echo 0 > /proc/sys/net/ipv4/icmp_ratemask
4- On a different terminal, verify whether the issue still happens, or whether it at least happens only intermittently.
5- Unless this has a business impact for them, restore the original value on the terminal from step 1: echo <value from step 2> > /proc/sys/net/ipv4/icmp_ratemask

The reason to restore the rate limit is that we currently don't know why there are so many ICMP packets, or how many there are. Therefore, if the limit is removed permanently on every node, I cannot guarantee that it won't cause problems in the network; for a short period of time on just one node, it shouldn't be noticeable.

@Juan -- can we see if we can either change the rate limit, or change the rate mask to allow the ICMP message reporting no connection?

The case is closed, so this probably doesn't need a backport. Newer releases with newer kernels don't have this issue. Reopen this if a customer actually wants a backport.
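
The save, disable, and restore steps above are easy to get wrong by hand (for example, by forgetting to restore the mask), so they could be wrapped roughly as follows. This is only a sketch and not something from the bug itself: the function name `with_ratemask_disabled` and the `RATEMASK_FILE` override are hypothetical, and it assumes a root shell on the node, such as the one from `oc debug node`.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original bug report.
# RATEMASK_FILE can be overridden to point at a scratch file for a dry run.
RATEMASK_FILE="${RATEMASK_FILE:-/proc/sys/net/ipv4/icmp_ratemask}"

# Run the given command with the ICMP rate mask set to 0, then put the
# previous value back, whether or not the command succeeded.
with_ratemask_disabled() {
    # Step 2: remember the current mask.
    saved_mask=$(cat "$RATEMASK_FILE") || return 1
    # Step 3: clear the mask so no ICMP type is rate limited.
    echo 0 > "$RATEMASK_FILE" || return 1
    # Step 4: run whatever check was passed in.
    "$@"
    cmd_status=$?
    # Step 5: restore the saved mask.
    echo "$saved_mask" > "$RATEMASK_FILE"
    return "$cmd_status"
}
```

In the procedure above the check is run from a different terminal, so one way to use the sketch would be `with_ratemask_disabled sleep 120`, which holds the mask at 0 for two minutes while the connection test is repeated elsewhere, then restores it.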