Bug 2003558
| Summary: | network-metrics-daemon not available after install: timed out waiting for OVS port binding (ovn-installed) | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Qiujie Li <qili> |
| Component: | Networking | Assignee: | Martin Kennelly <mkennell> |
| Networking sub component: | ovn-kubernetes | QA Contact: | Anurag saxena <anusaxen> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | aconstan, anbhat, martin.kennelly, mifiedle, mkennell, vpickard |
| Version: | 4.9 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-09-22 10:06:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Qiujie Li, 2021-09-13 08:41:09 UTC)
Hi, could you please provide us with a must-gather or a cluster where this issue is reproduced? Thanks in advance, Alex

@mkennell @aconstan let me know if you can't see the private Comment 2 for the kubeconfig and must-gather. Bugzilla prompted "The assignee of this bug cannot see private comments!" when I marked it as private. I wonder if that is a false alert, since if you are members of the redhat group you should be able to see it.

Possibly related to https://bugzilla.redhat.com/show_bug.cgi?id=1996201

@qili I cannot access the kubeconfig. Permission denied. Can you correct this and give me access? Thank you. Like Mike said, it may be related to that issue. Can you check if there is a large number of veths on the node where this failed pod is scheduled? You may use oc debug node/${NODE_NAME} to get a terminal on the worker host. On that node run:

# See number of OVS-controlled interfaces
ovs-vsctl --columns=name --data=bare --format=table list interface | wc -l

# See number of interfaces actually on the node
ifconfig | grep MULTICAST | wc -l

This will give you an idea of the number of interfaces seen on the host and the number managed by OVS. If there is a large discrepancy, it may indicate that the veth "leak" Mike linked above is occurring. Note that this leak cleans itself up after a period of time, so you may not see the veths anymore; you may have to rerun your test suite and watch for a rising number of veths on the host. (A consolidated sketch of these checks appears after this comment log.)

@mkennell Sorry, the test cluster has already been destroyed. This does not reproduce every time in my previous tests; I will try the steps you suggest when I can reproduce it again. Thanks for the triage.

Thank you for the update. As stated by Qiujie Li, network-metrics-daemon fails to roll out during install on one node. This is not a blocker+ for OVN-K because it does not degrade provisioning of workloads following the failure seen. Tests of workloads were still carried out successfully. I am now trying to understand why network-metrics-daemon failed to provision and whether it is related to OVN-K. The must-gather logs were retrieved 7 days after install and after this error occurred, so the ovnkube-master logs unfortunately do not cover the period of time when this pod's LSP was or was not added. @qili: Can you reproduce and get me a must-gather?

Chatted with Riccardo. He is further ahead with his understanding of this bug. Thank you, Qiujie Li, for reproducing this issue. I passed the details of the cluster to Riccardo.

*** This bug has been marked as a duplicate of bug 1997205 ***

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days
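For convenience, below is a minimal sketch of the veth-leak check described in the triage comments above. It is not part of the original bug: the script itself, the node-name argument, and the use of `chroot /host` inside the debug pod are assumptions, and `ifconfig` may need to be swapped for `ip -o link` if it is not available on the host.

```sh
#!/bin/sh
# Hypothetical helper (not from the bug): compare the number of OVS-managed
# interfaces with the number of interfaces present on a node, to spot a
# possible veth leak as suggested in the triage comments above.
# Assumes `oc` is logged in to the cluster and $1 is a worker node name.
set -eu
NODE="${1:?usage: $0 <node-name>}"

# Interfaces known to OVS on the node (the count includes table header
# lines, so treat the numbers as approximate).
ovs_count=$(oc debug "node/${NODE}" -- chroot /host sh -c \
  'ovs-vsctl --columns=name --data=bare --format=table list interface | wc -l')

# Interfaces actually present on the node, as in the original suggestion.
# Swap ifconfig for `ip -o link` if ifconfig is not installed on the host.
host_count=$(oc debug "node/${NODE}" -- chroot /host sh -c \
  'ifconfig | grep MULTICAST | wc -l')

echo "OVS-managed interfaces on ${NODE}: ${ovs_count}"
echo "Interfaces present on ${NODE}:     ${host_count}"
echo "A large discrepancy may indicate leaked veths (see bug 1996201)."
```

The must-gather requested in the thread can be collected with the standard `oc adm must-gather` command and attached to the bug.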