Bug 1982448
Summary: | [OVN Kubernetes] ovnkube-node container withing ovnkube-node pods report readiness probe times out | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Kedar Kulkarni <kkulkarn> |
Component: | Networking | Assignee: | Mohamed Mahmoud <mmahmoud> |
Networking sub component: | ovn-kubernetes | QA Contact: | Kedar Kulkarni <kkulkarn> |
Status: | CLOSED DUPLICATE | Docs Contact: | |
Severity: | medium | ||
Priority: | unspecified | CC: | anbhat, astoycos, dblack, jlema, pehunt, smalleni |
Version: | 4.9 | ||
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | perfscale-ovn | ||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-07-23 13:23:36 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Kedar Kulkarni
2021-07-14 20:48:01 UTC
I would say this is a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1978268 *** This bug has been marked as a duplicate of bug 1978268 *** I am commenting here because I did not see any OVN related info in bug 1978268, and we are only observing readiness probe timeout in the ovn and openshift-kni-projects. We are getting the same errors during 120 node baremetal perf testing, both for ovnkube-master pods and ovnkube-node ones: $ oc get events -n openshift-ovn-kubernetes LAST SEEN TYPE REASON OBJECT MESSAGE 21s Warning Unhealthy pod/ovnkube-master-6qhlm Readiness probe failed: ++ /usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl --timeout=3 cluster/status OVN_Southbound ++ grep 'Leader: unknown' ++ true + leader_status= 4m51s Warning Unhealthy pod/ovnkube-master-6qhlm Readiness probe failed: ++ /usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=3 cluster/status OVN_Northbound ++ grep 'Leader: unknown' ++ true + leader_status= 137m Warning Unhealthy pod/ovnkube-master-qcwqc Readiness probe failed: ++ /usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=3 cluster/status OVN_Northbound ++ grep 'Leader: unknown' ++ true + leader_status= 137m Warning Unhealthy pod/ovnkube-master-qcwqc Readiness probe failed: ++ /usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl --timeout=3 cluster/status OVN_Southbound ++ grep 'Leader: unknown' ++ true + leader_status= 132m Warning Unhealthy pod/ovnkube-master-zhfr7 Readiness probe failed: ++ /usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=3 cluster/status OVN_Northbound ++ grep 'Leader: unknown' ++ true + leader_status= 3m39s Warning Unhealthy pod/ovnkube-master-zhfr7 Readiness probe failed: ++ /usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl --timeout=3 cluster/status OVN_Southbound ++ grep 'Leader: unknown' ++ true + leader_status= 109s Warning Unhealthy pod/ovnkube-node-4fz8h Readiness probe failed: command timed out 4s Warning Unhealthy pod/ovnkube-node-8wjjr Readiness probe failed: command timed out 32m Warning Unhealthy pod/ovnkube-node-jmr47 Readiness probe failed: command timed out 105s Warning Unhealthy pod/ovnkube-node-pn4v8 Readiness probe failed: command timed out 2m18s Warning Unhealthy pod/ovnkube-node-tf576 Readiness probe failed: command timed out Hi, I ran the tests for 120 worker cluster_density with 1000 iterations, and I couldn't reproduce the problem with the latest build of 4.9 nightly. I didn't see any " Warning Unhealthy 33s (x92 over 3h42m) kubelet Readiness probe failed: command timed out". It is ok for this bz to remain closed duplicate. Build version:4.9.0-0.nightly-2021-07-29-004741 @jlema Please open a new bug to track the errors mentioned in comment 4. Thanks, KK. (In reply to Jose Castillo Lema from comment #4) > I am commenting here because I did not see any OVN related info in bug > 1978268, and we are only observing readiness probe timeout in the ovn and > openshift-kni-projects. > > We are getting the same errors during 120 node baremetal perf testing, both > for ovnkube-master pods and ovnkube-node ones: > $ oc get events -n openshift-ovn-kubernetes > LAST SEEN TYPE REASON OBJECT MESSAGE > 21s Warning Unhealthy pod/ovnkube-master-6qhlm Readiness probe > failed: ++ /usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl --timeout=3 > cluster/status OVN_Southbound > ++ grep 'Leader: unknown' > ++ true > + leader_status= > 4m51s Warning Unhealthy pod/ovnkube-master-6qhlm Readiness probe > failed: ++ /usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=3 > cluster/status OVN_Northbound > ++ grep 'Leader: unknown' > ++ true > + leader_status= > 137m Warning Unhealthy pod/ovnkube-master-qcwqc Readiness probe > failed: ++ /usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=3 > cluster/status OVN_Northbound > ++ grep 'Leader: unknown' > ++ true > + leader_status= > 137m Warning Unhealthy pod/ovnkube-master-qcwqc Readiness probe > failed: ++ /usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl --timeout=3 > cluster/status OVN_Southbound > ++ grep 'Leader: unknown' > ++ true > + leader_status= > 132m Warning Unhealthy pod/ovnkube-master-zhfr7 Readiness probe > failed: ++ /usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=3 > cluster/status OVN_Northbound > ++ grep 'Leader: unknown' > ++ true > + leader_status= > 3m39s Warning Unhealthy pod/ovnkube-master-zhfr7 Readiness probe > failed: ++ /usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl --timeout=3 > cluster/status OVN_Southbound > ++ grep 'Leader: unknown' > ++ true > + leader_status= > 109s Warning Unhealthy pod/ovnkube-node-4fz8h Readiness probe > failed: command timed out > 4s Warning Unhealthy pod/ovnkube-node-8wjjr Readiness probe > failed: command timed out > 32m Warning Unhealthy pod/ovnkube-node-jmr47 Readiness probe > failed: command timed out > 105s Warning Unhealthy pod/ovnkube-node-pn4v8 Readiness probe > failed: command timed out > 2m18s Warning Unhealthy pod/ovnkube-node-tf576 Readiness probe > failed: command timed out What version are you seeing this on? I think this BZ tracks 4.9 Apologies, I haven't realized this is a BZ tracking 4.9. I have seen this events in a OCP 4.8.0-fc.9 (local gateway / ovn2.13-20.12.0-25.el8fdp.x86_64). |