Bug 1975155
Summary: | Kubernetes service IP cannot be accessed for rhel worker | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | zhaozhanqi <zzhao>
Component: | Networking | Assignee: | Andrew Stoycos <astoycos>
Networking sub component: | ovn-kubernetes | QA Contact: | Anurag saxena <anusaxen>
Status: | CLOSED ERRATA | Docs Contact: |
Severity: | urgent | |
Priority: | urgent | CC: | trozet, wewang
Version: | 4.8 | |
Target Milestone: | --- | |
Target Release: | 4.8.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2021-07-27 23:13:39 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description (zhaozhanqi, 2021-06-23 08:15:13 UTC)
The problem looks to be that the return syn/ack packet is getting dropped during an upcall to vswitchd. I set up a server on the master node and then curled from the pod on the rhel node. In the OVS logs we can see the packet being upcalled:

```
Jun 24 10:16:37 wewang-623-rwwrt-rhel-0 ovs-vswitchd[1526]: ovs|00196|dpif(handler16)|DBG|system@ovs-system: action upcall:
Jun 24 10:16:37 wewang-623-rwwrt-rhel-0 ovs-vswitchd[1526]: recirc_id(0x22),dp_hash(0),skb_priority(0),in_port(1),skb_mark(0),ct_state(0x2a),ct_zone(0xfa00),ct_mark(0),ct_label(0),ct_tuple4(src=172.31.249.173,dst=172.31.249.212,proto=6,tp_src=59310,tp_dst=1337),eth(src=00:50:56:ac:65:d9,dst=00:50:56:ac:e5:24),eth_type(0x0800),ipv4(src=172.31.249.212,dst=172.31.249.173,proto=6,tos=0,ttl=64,frag=no),tcp(src=1337,dst=59310),tcp_flags(syn|ack)
```

Then if we look at the dpctl flows for recirc_id 0x22:

```
recirc_id(0x22),in_port(1),ct_state(-new+est-rel+rpl-inv+trk),ct_label(0/0x3),eth(src=00:50:56:ac:65:d9,dst=00:50:56:ac:e5:24),eth_type(0x0800),ipv4(src=172.31.249.192/255.255.255.224,dst=172.31.249.173,proto=6,ttl=64,frag=no), packets:3789, bytes:280386, used:0.236s, flags:S., actions:userspace(pid=4294963116,slow_path(action))
recirc_id(0x22),in_port(1),ct_state(-new+est-rel+rpl-inv+trk),ct_label(0/0x3),eth(src=00:50:56:ac:67:86,dst=00:50:56:ac:e5:24),eth_type(0x0800),ipv4(src=172.31.249.0/255.255.255.128,dst=172.31.249.173,proto=6,ttl=64,frag=no), packets:3479, bytes:257446, used:0.734s, flags:S., actions:userspace(pid=4294963116,slow_path(action))
recirc_id(0x22),in_port(1),ct_state(-new+est-rel+rpl-inv+trk),ct_label(0/0x3),eth(src=00:50:56:ac:49:85,dst=00:50:56:ac:e5:24),eth_type(0x0800),ipv4(src=172.31.249.224/255.255.255.240,dst=172.31.249.173,proto=6,ttl=64,frag=no), packets:374, bytes:27676, used:1.488s, flags:S., actions:userspace(pid=4294963116,slow_path(action))
```

I'm wondering if this is due to the check packet length (check_pkt_len) action, and related to https://bugzilla.redhat.com/show_bug.cgi?id=1961506.

We lost the test cluster, so I was unable to try a workaround. Could you please retry with https://github.com/openshift/ovn-kubernetes/pull/584 ? Thanks.

Checked on cluster 4.8.0-0.nightly-2021-06-24-222938 with PR 584 merged; it works well.

```
$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.8.0-0.nightly-2021-06-24-222938   True        False         False      55m
baremetal                                  4.8.0-0.nightly-2021-06-24-222938   True        False         False      85m
cloud-credential                           4.8.0-0.nightly-2021-06-24-222938   True        False         False      94m
cluster-autoscaler                         4.8.0-0.nightly-2021-06-24-222938   True        False         False      89m
config-operator                            4.8.0-0.nightly-2021-06-24-222938   True        False         False      90m
console                                    4.8.0-0.nightly-2021-06-24-222938   True        False         False      36m
csi-snapshot-controller                    4.8.0-0.nightly-2021-06-24-222938   True        False         False      89m
dns                                        4.8.0-0.nightly-2021-06-24-222938   True        False         False      85m
etcd                                       4.8.0-0.nightly-2021-06-24-222938   True        False         False      89m
image-registry                             4.8.0-0.nightly-2021-06-24-222938   True        False         False      83m
ingress                                    4.8.0-0.nightly-2021-06-24-222938   True        False         False      80m
insights                                   4.8.0-0.nightly-2021-06-24-222938   True        False         False      84m
kube-apiserver                             4.8.0-0.nightly-2021-06-24-222938   True        False         False      86m
kube-controller-manager                    4.8.0-0.nightly-2021-06-24-222938   True        False         False      87m
kube-scheduler                             4.8.0-0.nightly-2021-06-24-222938   True        False         False      87m
kube-storage-version-migrator              4.8.0-0.nightly-2021-06-24-222938   True        False         False      90m
machine-api                                4.8.0-0.nightly-2021-06-24-222938   True        False         False      86m
machine-approver                           4.8.0-0.nightly-2021-06-24-222938   True        False         False      89m
machine-config                             4.8.0-0.nightly-2021-06-24-222938   True        False         False      89m
marketplace                                4.8.0-0.nightly-2021-06-24-222938   True        False         False      89m
monitoring                                 4.8.0-0.nightly-2021-06-24-222938   True        False         False      80m
network                                    4.8.0-0.nightly-2021-06-24-222938   True        False         False      90m
node-tuning                                4.8.0-0.nightly-2021-06-24-222938   True        False         False      90m
openshift-apiserver                        4.8.0-0.nightly-2021-06-24-222938   True        False         False      80m
openshift-controller-manager               4.8.0-0.nightly-2021-06-24-222938   True        False         False      89m
openshift-samples                          4.8.0-0.nightly-2021-06-24-222938   True        False         False      85m
operator-lifecycle-manager                 4.8.0-0.nightly-2021-06-24-222938   True        False         False      89m
operator-lifecycle-manager-catalog         4.8.0-0.nightly-2021-06-24-222938   True        False         False      89m
operator-lifecycle-manager-packageserver   4.8.0-0.nightly-2021-06-24-222938   True        False         False      86m
service-ca                                 4.8.0-0.nightly-2021-06-24-222938   True        False         False      90m
storage                                    4.8.0-0.nightly-2021-06-24-222938   True        False         False      90m

$ oc rsh hello-8blgk
/ # curl https://172.30.0.1:443
curl: (60) SSL certificate problem: self signed certificate in certificate chain
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
/ # curl https://172.30.0.1:443 -k
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {
  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {
  },
  "code": 403
}
```

Moving this bug to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438
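As an aside on reading the diagnosis above: the dpctl flows all match `ct_state(-new+est-rel+rpl-inv+trk)`, i.e. tracked, established, reply-direction traffic, which is exactly the returning syn/ack of a committed connection. The following small Python helper (illustrative only, not part of the original report) decodes that match syntax, where `+flag` means the flag must be set, `-flag` means it must be clear, and absent flags are wildcarded:

```python
import re

def decode_ct_state(match: str) -> dict:
    """Decode an OVS ct_state match string such as "-new+est-rel+rpl-inv+trk".

    Returns a dict mapping each conntrack flag to True (+, must be set) or
    False (-, must be clear); flags not mentioned are wildcarded and omitted.
    This is a hypothetical reading aid, not an OVS API.
    """
    return {flag: sign == "+" for sign, flag in re.findall(r"([+-])(\w+)", match)}

state = decode_ct_state("-new+est-rel+rpl-inv+trk")
# Established, tracked, reply-direction traffic -- consistent with the
# ct_state(0x2a) seen in the upcall (est 0x02 | rpl 0x08 | trk 0x20 under
# the OVS_CS_F_* bit assignments).
print(state)
```

This matches the report's conclusion: the dropped packet is reply traffic on an already-established connection, not a new flow being rejected.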