Description of problem:
When running the node-density-heavy workload using this script - https://github.com/cloud-bulldozer/e2e-benchmarking/tree/master/workloads/kube-burner#node-density-and-node-density-heavy-variables - pods get into CrashLoopBackOff due to readiness probe failures and 'dial tcp 172.30.173.143:5432: connect: connection refused'. The service (172.30.173.143) is up and reachable from a different test container.

Version-Release number of selected component (if applicable):
4.11.0-rc.1

How reproducible:
Always

Steps to Reproduce:
1. Deploy a healthy managed-service ROSA cluster using OVN-Kubernetes (or OSD on GCP), with at least 24 worker nodes
2. Run the node-density-heavy workload with 245 pods per node
3. Wait for the pods to come up; some of them get into CrashLoopBackOff state

Actual results:
Pods fail to connect to the service IP, so their readiness probes fail.

Expected results:
Pods should be able to connect within the expected time.

Additional info:
The script creates 245 pods per node, which totals ~2.5k server pods, ~2.5k client pods, and ~2.5k ClusterIP services on a 24-node cluster.

must-gather logs - http://perf1.perf.lab.eng.bos.redhat.com/pub/mukrishn/bz_2111619/
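(Not part of the original report - a hedged sketch for sanity-checking the scale and health of a run. The kube-burner-job label and the namespace naming are taken from the objects shown in the comments below; adjust the label/uuid to your own run.)

$ oc get ns --no-headers | grep -c node-density-heavy                                  # namespaces created by the job
$ oc get svc -A -l kube-burner-job=node-density-heavy --no-headers | wc -l             # expect ~2.5k ClusterIP services
$ oc get pods -A --no-headers | grep node-density-heavy | grep -c CrashLoopBackOff     # crashing client pods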
Picking a crashing pod:

$ oc get pod -n 32786f34-node-density-heavy-20220727 perfapp-1-967-6dc55874bf-98htj -owide
NAME                             READY   STATUS             RESTARTS          AGE   IP              NODE                                         NOMINATED NODE   READINESS GATES
perfapp-1-967-6dc55874bf-98htj   0/1     CrashLoopBackOff   385 (4m48s ago)   21h   10.129.10.109   ip-10-0-143-216.us-west-2.compute.internal   <none>           <none>

It's crashing because:

  kubelet  Liveness probe failed: Get "http://10.129.10.109:8080/health": dial tcp 10.129.10.109:8080: connect: connection refused

The health probe checks are failing because the perfapp container isn't coming up. Its logs indicate:

time="2022-07-28 13:13:20" level=info msg="Connecting with database postgres-967:5432"
time="2022-07-28 13:13:20" level=warning msg="dial tcp 172.30.12.40:5432: connect: connection refused"
time="2022-07-28 13:13:20" level=warning msg="Retrying connection with postgres-967:5432 in 5 seconds"

$ oc get svc -A -owide | grep 172.30.12.40
32786f34-node-density-heavy-20220727   postgres-967   ClusterIP   172.30.12.40   <none>   5432/TCP   21h   name=postgres-1-967

$ oc describe svc -n 32786f34-node-density-heavy-20220727 postgres-967
Name:              postgres-967
Namespace:         32786f34-node-density-heavy-20220727
Labels:            kube-burner-index=2
                   kube-burner-job=node-density-heavy
                   kube-burner-uuid=32786f34-node-density-heavy-20220727
Annotations:       <none>
Selector:          name=postgres-1-967
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                172.30.12.40
IPs:               172.30.12.40
Port:              <unset>  5432/TCP
TargetPort:        5432/TCP
Endpoints:         <none>
Session Affinity:  None
Events:            <none>

$ oc describe ep -n 32786f34-node-density-heavy-20220727 postgres-967
Name:         postgres-967
Namespace:    32786f34-node-density-heavy-20220727
Labels:       kube-burner-index=2
              kube-burner-job=node-density-heavy
              kube-burner-uuid=32786f34-node-density-heavy-20220727
Annotations:  <none>
Subsets:
Events:       <none>

The service doesn't have any endpoints:

$ oc get pods -owide -A | grep postgres-1-967
32786f34-node-density-heavy-20220727   postgres-1-967-ff7c86cbd-kksjq   0/1   Pending   0   20h   <none>   <none>   <none>   <none>

Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  57m (x158 over 14h)  default-scheduler  0/36 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/infra: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) had untolerated taint {role: workload}, 36 node(s) didn't match Pod's node affinity/selector, 7 Too many pods. preemption: 0/36 nodes are available: 36 Preemption is not helpful for scheduling.
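(A hedged helper, not from the original report, to check how widespread this state is: services with no endpoints because their backend pod never scheduled. Assumes jq is available; the namespace is the one used in the commands above.)

$ NS=32786f34-node-density-heavy-20220727
$ # services whose Endpoints object has no subsets at all
$ oc get endpoints -n "$NS" -o json | jq -r '.items[] | select((.subsets // []) | length == 0) | .metadata.name' | wc -l
$ # backend pods that never scheduled
$ oc get pods -n "$NS" --field-selector=status.phase=Pending --no-headers | wc -l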
After the pending pods were cleaned up, we didn't see this issue. Murali is doing another run to see if we can reproduce it when the endpoints are healthy.
We found a case where a pod isn't able to talk to a service:

$ oc get pods -owide -A | grep 2115
e68cc68f-node-density-heavy-20220728   perfapp-1-2115-6467b56bf5-xpzx6    0/1   CrashLoopBackOff   17 (83s ago)   73m   10.130.11.210   ip-10-0-178-75.us-west-2.compute.internal   <none>   <none>
e68cc68f-node-density-heavy-20220728   postgres-1-2115-6458cb9bfc-xjknp   1/1   Running            1              73m   10.130.11.209   ip-10-0-178-75.us-west-2.compute.internal   <none>   <none>

$ oc describe ep -n e68cc68f-node-density-heavy-20220728 postgres-2115
Name:         postgres-2115
Namespace:    e68cc68f-node-density-heavy-20220728
Labels:       kube-burner-index=2
              kube-burner-job=node-density-heavy
              kube-burner-uuid=e68cc68f-node-density-heavy-20220728
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2022-07-28T16:17:25Z
Subsets:
  Addresses:          10.130.11.209
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    <unset>  5432  TCP
Events:  <none>

$ oc describe svc -n e68cc68f-node-density-heavy-20220728 postgres-2115
Name:              postgres-2115
Namespace:         e68cc68f-node-density-heavy-20220728
Labels:            kube-burner-index=2
                   kube-burner-job=node-density-heavy
                   kube-burner-uuid=e68cc68f-node-density-heavy-20220728
Annotations:       <none>
Selector:          name=postgres-1-2115
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                172.30.240.216
IPs:               172.30.240.216
Port:              <unset>  5432/TCP
TargetPort:        5432/TCP
Endpoints:         10.130.11.209:5432
Session Affinity:  None
Events:            <none>

$ oc logs -n e68cc68f-node-density-heavy-20220728 perfapp-1-2115-6467b56bf5-xpzx6
time="2022-07-28 16:58:59" level=info msg="Connecting with database postgres-2115:5432"
time="2022-07-28 16:58:59" level=warning msg="dial tcp 172.30.240.216:5432: connect: connection refused"
time="2022-07-28 16:58:59" level=warning msg="Retrying connection with postgres-2115:5432 in 5 seconds"
time="2022-07-28 16:59:04" level=info msg="Connecting with database postgres-2115:5432"
time="2022-07-28 16:59:04" level=warning msg="dial tcp 172.30.240.216:5432: connect: connection refused"
time="2022-07-28 16:59:04" level=warning msg="Retrying connection with postgres-2115:5432 in 5 seconds"
time="2022-07-28 16:59:09" level=info msg="Connecting with database postgres-2115:5432"

The client pod and the endpoint pod are on the same node! ovn-trace looks fine:

sh-4.4# ovn-trace --ct new 'inport=="e68cc68f-node-density-heavy-20220728_perfapp-1-2115-6467b56bf5-xpzx6" && eth.src==0a:58:0a:82:0b:d2 && eth.dst==0a:58:0a:82:0a:01 && ip4.src==10.130.11.210 && ip4.dst==172.30.240.216 && ip.ttl==64 && tcp && tcp.src==80 && tcp.dst==5432'
# tcp,reg14=0xbd,vlan_tci=0x0000,dl_src=0a:58:0a:82:0b:d2,dl_dst=0a:58:0a:82:0a:01,nw_src=10.130.11.210,nw_dst=172.30.240.216,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=80,tp_dst=5432,tcp_flags=0
ingress(dp="ip-10-0-178-75.us-west-2.compute.internal", inport="e68cc68f-node-density-heavy-20220728_perfapp-1-2115-6467b56bf5-xpzx6") -------------------------------------------------------------------------------------------------------------------------------------- 0. ls_in_check_port_sec (northd.c:7623): 1, priority 50, uuid 67bb3e9b reg0[15] = check_in_port_sec(); next; 4. ls_in_pre_acl (northd.c:5593): ip, priority 100, uuid afb544c6 reg0[0] = 1; next; 5. ls_in_pre_lb (northd.c:5764): ip, priority 100, uuid b294b6c7 reg0[2] = 1; next; 6. ls_in_pre_stateful (northd.c:5791): reg0[2] == 1 && ip4 && tcp, priority 120, uuid 13f53fdd reg1 = ip4.dst; reg2[0..15] = tcp.dst; ct_lb_mark; ct_lb_mark ---------- 7.
ls_in_acl_hint (northd.c:5864): ct.new && !ct.est, priority 7, uuid e75feee6 reg0[7] = 1; reg0[9] = 1; next; 8. ls_in_acl (northd.c:6458): ip && !ct.est, priority 1, uuid 00fae6dd reg0[1] = 1; next; 11. ls_in_lb (northd.c:6765): ct.new && ip4.dst == 172.30.240.216 && tcp.dst == 5432, priority 120, uuid f9883682 reg0[1] = 0; reg1 = 172.30.240.216; reg2[0..15] = 5432; ct_lb_mark(backends=10.130.11.209:5432); ct_lb_mark /* default (use --ct to customize) */ ------------------------------------------------ 14. ls_in_pre_hairpin (northd.c:6832): ip && ct.trk, priority 100, uuid 519b4bb4 reg0[6] = chk_lb_hairpin(); reg0[12] = chk_lb_hairpin_reply(); *** chk_lb_hairpin_reply action not implemented next; 23. ls_in_l2_lkup (northd.c:8363): eth.dst == 0a:58:0a:82:0a:01, priority 50, uuid 2a6f3080 outport = "stor-ip-10-0-178-75.us-west-2.compute.internal"; output; egress(dp="ip-10-0-178-75.us-west-2.compute.internal", inport="e68cc68f-node-density-heavy-20220728_perfapp-1-2115-6467b56bf5-xpzx6", outport="stor-ip-10-0-178-75.us-west-2. compute.internal") ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------ 0. ls_out_pre_lb (northd.c:5483): ip && outport == "stor-ip-10-0-178-75.us-west-2.compute.internal", priority 110, uuid 5aebf16e next; 1. ls_out_pre_acl (northd.c:5483): ip && outport == "stor-ip-10-0-178-75.us-west-2.compute.internal", priority 110, uuid 1c159bb2 next; 3. ls_out_acl_hint (northd.c:5916): ct.est && ct_mark.blocked == 0, priority 1, uuid 13703a08 reg0[10] = 1; next; 8. ls_out_check_port_sec (northd.c:5449): 1, priority 0, uuid 5d6c21a8 reg0[15] = check_out_port_sec(); next; 9. ls_out_apply_port_sec (northd.c:5454): 1, priority 0, uuid c80920c0 output; /* output to "stor-ip-10-0-178-75.us-west-2.compute.internal", type "patch" */ ingress(dp="ovn_cluster_router", inport="rtos-ip-10-0-178-75.us-west-2.compute.internal") ----------------------------------------------------------------------------------------- 0. lr_in_admission (northd.c:10720): eth.dst == 0a:58:0a:82:0a:01 && inport == "rtos-ip-10-0-178-75.us-west-2.compute.internal" && is_chassis_resident("cr-rtos-ip-10-0-178- 75.us-west-2.compute.internal"), priority 50, uuid ac5f93c4 xreg0[0..47] = 0a:58:0a:82:0a:01; next; 1. lr_in_lookup_neighbor (northd.c:10863): 1, priority 0, uuid 3ff8a4a2 reg9[2] = 1; next; 2. lr_in_learn_neighbor (northd.c:10872): reg9[2] == 1, priority 100, uuid d45ed856 next; 10. lr_in_ip_routing_pre (northd.c:11095): 1, priority 0, uuid 06721c07 reg7 = 0; next; 11. lr_in_ip_routing (northd.c:9520): ip4.dst == 10.130.10.0/23, priority 71, uuid d4629e4c ip.ttl--; reg8[0..15] = 0; reg0 = ip4.dst; reg1 = 10.130.10.1; eth.src = 0a:58:0a:82:0a:01; outport = "rtos-ip-10-0-178-75.us-west-2.compute.internal"; flags.loopback = 1; next; 12. lr_in_ip_routing_ecmp (northd.c:11170): reg8[0..15] == 0, priority 150, uuid d9fc79f9 next; 13. lr_in_policy (northd.c:8741): ip4.src == 10.128.0.0/14 && ip4.dst == 10.128.0.0/14, priority 101, uuid 83c48946 reg8[0..15] = 0; next; 14. lr_in_policy_ecmp (northd.c:11338): reg8[0..15] == 0, priority 150, uuid dfdc5b37 next; 15. lr_in_arp_resolve (northd.c:11542): outport == "rtos-ip-10-0-178-75.us-west-2.compute.internal" && reg0 == 10.130.11.209, priority 100, uuid 727f7b3c eth.dst = 0a:58:0a:82:0b:d1; next; 18. 
lr_in_gw_redirect (northd.c:11939): outport == "rtos-ip-10-0-178-75.us-west-2.compute.internal", priority 50, uuid 475902c1 outport = "cr-rtos-ip-10-0-178-75.us-west-2.compute.internal"; next; 19. lr_in_arp_request (northd.c:12035): 1, priority 0, uuid de98330d output; /* Replacing type "chassisredirect" outport "cr-rtos-ip-10-0-178-75.us-west-2.compute.internal" with distributed port "rtos-ip-10-0-178-75.us-west-2.compute.internal". * / egress(dp="ovn_cluster_router", inport="rtos-ip-10-0-178-75.us-west-2.compute.internal", outport="rtos-ip-10-0-178-75.us-west-2.compute.internal") -------------------------------------------------------------------------------------------------------------------------------------------------- 0. lr_out_chk_dnat_local (northd.c:13300): 1, priority 0, uuid 10ea4083 reg9[4] = 0; next; 6. lr_out_delivery (northd.c:12083): outport == "rtos-ip-10-0-178-75.us-west-2.compute.internal", priority 100, uuid a6c539f0 output; /* output to "rtos-ip-10-0-178-75.us-west-2.compute.internal", type "patch" */ ingress(dp="ip-10-0-178-75.us-west-2.compute.internal", inport="stor-ip-10-0-178-75.us-west-2.compute.internal") ---------------------------------------------------------------------------------------------------------------- 0. ls_in_check_port_sec (northd.c:7623): 1, priority 50, uuid 67bb3e9b reg0[15] = check_in_port_sec(); next; 4. ls_in_pre_acl (northd.c:5480): ip && inport == "stor-ip-10-0-178-75.us-west-2.compute.internal", priority 110, uuid 7c57d21b next; 5. ls_in_pre_lb (northd.c:5480): ip && inport == "stor-ip-10-0-178-75.us-west-2.compute.internal", priority 110, uuid a5e01bf3 next; 7. ls_in_acl_hint (northd.c:5916): ct.est && ct_mark.blocked == 0, priority 1, uuid 93615c46 reg0[10] = 1; next; 14. ls_in_pre_hairpin (northd.c:6832): ip && ct.trk, priority 100, uuid 519b4bb4 reg0[6] = chk_lb_hairpin(); reg0[12] = chk_lb_hairpin_reply(); *** chk_lb_hairpin_reply action not implemented next; 23. ls_in_l2_lkup (northd.c:8292): eth.dst == 0a:58:0a:82:0b:d1, priority 50, uuid d235f420 outport = "e68cc68f-node-density-heavy-20220728_postgres-1-2115-6458cb9bfc-xjknp"; output; egress(dp="ip-10-0-178-75.us-west-2.compute.internal", inport="stor-ip-10-0-178-75.us-west-2.compute.internal", outport="e68cc68f-node-density-heavy-20220728_postgres-1-2115-6458cb9bfc-xjknp") ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 0. ls_out_pre_lb (northd.c:5766): ip, priority 100, uuid 62c6397e reg0[2] = 1; next; 1. ls_out_pre_acl (northd.c:5595): ip, priority 100, uuid 7b99ff66 reg0[0] = 1; next; 2. ls_out_pre_stateful (northd.c:5811): reg0[2] == 1, priority 110, uuid f94600af ct_lb_mark; ct_lb_mark /* default (use --ct to customize) */ ------------------------------------------------ 3. ls_out_acl_hint (northd.c:5916): ct.est && ct_mark.blocked == 0, priority 1, uuid 13703a08 reg0[10] = 1; next; 8. ls_out_check_port_sec (northd.c:5449): 1, priority 0, uuid 5d6c21a8 reg0[15] = check_out_port_sec(); next; 9. ls_out_apply_port_sec (northd.c:5454): 1, priority 0, uuid c80920c0 output; /* output to "e68cc68f-node-density-heavy-20220728_postgres-1-2115-6458cb9bfc-xjknp", type "" */
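(Hedged sketch, not from the original report: a quick way to confirm from inside the client pod that the backend pod IP answers while the ClusterIP does not. It assumes the perfapp image ships bash so the /dev/tcp redirection works; substitute nc or curl if it does not.)

$ oc rsh -n e68cc68f-node-density-heavy-20220728 perfapp-1-2115-6467b56bf5-xpzx6
sh-4.4$ timeout 2 bash -c '</dev/tcp/10.130.11.209/5432' && echo "backend pod IP reachable"
sh-4.4$ timeout 2 bash -c '</dev/tcp/172.30.240.216/5432' && echo "ClusterIP reachable"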
ovn-controller seems to be having issues installing OpenFlow flows:

2022-07-28T16:17:25.928Z|03812|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.5) (xid=0x48e43): OFPGMFC_GROUP_EXISTS
OFPT_BUNDLE_ADD_MESSAGE (OF1.5) (xid=0x48e43): bundle_id=0x6ea1 flags=atomic ordered
OFPT_GROUP_MOD (OF1.5) (xid=0x48e43):
 ADD group_id=4,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=10.130.11.209:5432),exec(set_field:0x2/0x2->ct_mark))
2022-07-28T16:17:25.928Z|03813|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.5) (xid=0x48e4a): OFPBFC_MSG_FAILED
OFPT_BUNDLE_CONTROL (OF1.5) (xid=0x48e4a): bundle_id=0x6ea1 type=COMMIT_REQUEST flags=atomic ordered

Digging in deeper, I don't see any group installed for the load-balancer backend we are interested in:

sh-4.4# ovs-ofctl dump-groups br-int | grep 10.130.11.209
sh-4.4#

NADA. According to the OVN team (https://coreos.slack.com/archives/C01G7T6SYSD/p1659029855612459) this might be an OVN bug.

BAD OVS trace: perfapp-1-2115-6467b56bf5-xpzx6 to postgres-1-2115-6458cb9bfc-xjknp

sh-4.4# ovs-vsctl --columns ofport find interface external_ids:iface-id=e68cc68f-node-density-heavy-20220728_perfapp-1-2115-6467b56bf5-xpzx6
ofport              : 457

sh-4.4# ovs-appctl ofproto/trace br-int in_port=457,dl_src=0a:58:0a:82:0b:d2,dl_dst=0a:58:0a:82:0a:01,tcp,nw_src=10.130.11.210,nw_dst=172.30.240.216,tp_dst=5432,tp_src=80,nw_ttl=64,dp_hash=1
Flow: dp_hash=0x1,tcp,in_port=457,vlan_tci=0x0000,dl_src=0a:58:0a:82:0b:d2,dl_dst=0a:58:0a:82:0a:01,nw_src=10.130.11.210,nw_dst=172.30.240.216,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=80,tp_dst=5432,tcp_flags=0

bridge("br-int")
----------------
 0. in_port=457, priority 100, cookie 0xf5ecb115
    set_field:0xc4->reg13
    set_field:0x5->reg11
    set_field:0xd->reg12
    set_field:0x2f->metadata
    set_field:0xbd->reg14
    resubmit(,8)
 8. metadata=0x2f, priority 50, cookie 0x67bb3e9b
    set_field:0/0x1000->reg10
    resubmit(,73)
73. ip,reg14=0xbd,metadata=0x2f,dl_src=0a:58:0a:82:0b:d2,nw_src=10.130.11.210, priority 90, cookie 0xf5ecb115
    set_field:0/0x1000->reg10
    move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111]
     -> NXM_NX_XXREG0[111] is now 0
    resubmit(,9)
 9. metadata=0x2f, priority 0, cookie 0x62dbc406
    resubmit(,10)
10. metadata=0x2f, priority 0, cookie 0xfc1e5e8
    resubmit(,11)
11. metadata=0x2f, priority 0, cookie 0x700f46a5
    resubmit(,12)
12. ip,metadata=0x2f, priority 100, cookie 0xafb544c6
    set_field:0x1000000000000000000000000/0x1000000000000000000000000->xxreg0
    resubmit(,13)
13. ip,metadata=0x2f, priority 100, cookie 0xb294b6c7
    set_field:0x4000000000000000000000000/0x4000000000000000000000000->xxreg0
    resubmit(,14)
14. tcp,reg0=0x4/0x4,metadata=0x2f, priority 120, cookie 0x13f53fdd
    move:NXM_OF_IP_DST[]->NXM_NX_XXREG0[64..95]
     -> NXM_NX_XXREG0[64..95] is now 0xac1ef0d8
    move:NXM_OF_TCP_DST[]->NXM_NX_XXREG0[32..47]
     -> NXM_NX_XXREG0[32..47] is now 0x1538
    ct(table=15,zone=NXM_NX_REG13[0..15],nat)
    nat
     -> A clone of the packet is forked to recirculate. The forked pipeline will be resumed at table 15.
     -> Sets the packet to an untracked state, and clears all the conntrack fields.
Final flow: dp_hash=0x1,tcp,reg0=0x5,reg1=0xac1ef0d8,reg2=0x1538,reg11=0x5,reg12=0xd,reg13=0xc4,reg14=0xbd,metadata=0x2f,in_port=457,vlan_tci=0x0000,dl_src=0a:58:0a:82:0b:d2,dl_dst=0a:58:0a:82:0a:01,nw_src=10.130.11.210,nw_dst=172.30.240.216,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=80,tp_dst=5432,tcp_flags=0
Megaflow: recirc_id=0,ct_state=-new-est-trk,ct_mark=0/0x2,eth,tcp,in_port=457,dl_src=0a:58:0a:82:0b:d2,dl_dst=0a:58:0a:82:0a:01,nw_src=10.130.11.210,nw_dst=172.30.240.216,nw_frag=no,tp_dst=5432
Datapath actions: ct(zone=196,nat),recirc(0x2945ab)

===============================================================================
recirc(0x2945ab) - resume conntrack with default ct_state=trk|new (use --ct-next to customize)
Replacing src/dst IP/ports to simulate NAT:
 Initial flow:
 Modified flow:
===============================================================================

Flow: recirc_id=0x2945ab,dp_hash=0x1,ct_state=new|trk,ct_zone=196,eth,tcp,reg0=0x5,reg1=0xac1ef0d8,reg2=0x1538,reg11=0x5,reg12=0xd,reg13=0xc4,reg14=0xbd,metadata=0x2f,in_port=457,vlan_tci=0x0000,dl_src=0a:58:0a:82:0b:d2,dl_dst=0a:58:0a:82:0a:01,nw_src=10.130.11.210,nw_dst=172.30.240.216,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=80,tp_dst=5432,tcp_flags=0

bridge("br-int")
----------------
    thaw
        Resuming from table 15
15. ct_state=+new-est+trk,metadata=0x2f, priority 7, cookie 0xe75feee6
    set_field:0x80000000000000000000000000/0x80000000000000000000000000->xxreg0
    set_field:0x200000000000000000000000000/0x200000000000000000000000000->xxreg0
    resubmit(,16)
16. ct_state=-est+trk,ip,metadata=0x2f, priority 1, cookie 0xfae6dd
    set_field:0x2000000000000000000000000/0x2000000000000000000000000->xxreg0
    resubmit(,17)
17. metadata=0x2f, priority 0, cookie 0x78a29936
    resubmit(,18)
18. metadata=0x2f, priority 0, cookie 0x349041a3
    resubmit(,19)
19. ct_state=+new+trk,tcp,metadata=0x2f,nw_dst=172.30.240.216,tp_dst=5432, priority 120, cookie 0x7df08bb3
    set_field:0/0xffffffff000000000000000000000000->xxreg0
    controller(userdata=00.00.00.16.00.00.00.00.ff.ff.00.18.00.00.23.20.00.1b.00.00.00.01.1c.04.00.20.00.00.00.00.00.00.ff.ff.00.18.00.00.23.20.00.1b.00.00.00.01.1e.04.00.20.00.00.00.00.00.00.ff.ff.00.18.00.00.23.20.00.1c.00.00.00.01.1c.04.00.20.00.00.00.00.00.00.ff.ff.00.18.00.00.23.20.00.1c.00.00.00.01.1e.04.00.20.00.00.00.00.00.00.ff.ff.00.10.00.00.23.20.00.0e.ff.f8.2d.00.00.00)

Final flow: recirc_id=0x2945ab,dp_hash=0x1,ct_state=new|trk,ct_zone=196,eth,tcp,reg1=0xac1ef0d8,reg2=0x1538,reg11=0x5,reg12=0xd,reg13=0xc4,reg14=0xbd,metadata=0x2f,in_port=457,vlan_tci=0x0000,dl_src=0a:58:0a:82:0b:d2,dl_dst=0a:58:0a:82:0a:01,nw_src=10.130.11.210,nw_dst=172.30.240.216,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=80,tp_dst=5432,tcp_flags=0
Megaflow: recirc_id=0x2945ab,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0/0x1,eth,tcp,in_port=457,dl_dst=0a:58:0a:82:0a:01,nw_dst=172.30.240.216,nw_frag=no,tp_dst=5432
Datapath actions: userspace(pid=4294967295,controller(reason=1,dont_send=1,continuation=0,recirc_id=2704812,rule_cookie=0x7df08bb3,controller_id=0,max_len=65535))

Note that in table 19 the packet hits a controller() action instead of a load-balancer select group (compare the good trace below, which hits group:764), so the DNAT to the backend never happens. This is an OVN bug: the group and its flows are not being installed correctly.
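(Hedged sketch, not from the original report: the same dump-groups check run across all nodes, to find every node whose br-int is missing the select group for a given backend. The namespace, label, and container name are assumptions based on a typical ovn-kubernetes deployment; adjust -l and -c to match the cluster.)

BACKEND=10.130.11.209:5432
for pod in $(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node -o name); do
  echo "== $pod =="
  oc exec -n openshift-ovn-kubernetes "$pod" -c ovn-controller -- \
    ovs-ofctl dump-groups br-int 2>/dev/null | grep "dst=$BACKEND" || echo "no group for $BACKEND"
done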
GOOD OVS TRACE: perfapp-1-1-77db8b4657-w9tkq to postgres-1-2115-6458cb9bfc-xjknp sh-4.4# ovs-ofctl dump-groups br-int | grep 10.130.11.209 group_id=763,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=15,zone=NXM_NX_REG11[0..15],nat(dst=10.130.11.209:5432),exec(load:0x1->NXM_NX_CT_MARK[1])) group_id=764,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=10.130.11.209:5432),exec(load:0x1->NXM_NX_CT_MARK[1])) sh-4.4# ovs-appctl ofproto/trace br-int in_port=279,dl_src=0a:58:0a:80:0b:0b,dl_dst=0a:58:0a:80:0a:01,tcp,nw_src=10.128.11.11,nw_dst=172.30.240.216,tp_dst=5432,tp_src=80,nw_ ttl=64,dp_hash=1 Flow: dp_hash=0x1,tcp,in_port=279,vlan_tci=0x0000,dl_src=0a:58:0a:80:0b:0b,dl_dst=0a:58:0a:80:0a:01,nw_src=10.128.11.11,nw_dst=172.30.240.216,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_ src=80,tp_dst=5432,tcp_flags=0 bridge("br-int") ---------------- 0. in_port=279, priority 100, cookie 0x6dba739b set_field:0xb->reg13 set_field:0x11->reg11 set_field:0x4->reg12 set_field:0x25->metadata set_field:0xc->reg14 resubmit(,8) 8. metadata=0x25, priority 50, cookie 0x67bb3e9b set_field:0/0x1000->reg10 resubmit(,73) 73. ip,reg14=0xc,metadata=0x25,dl_src=0a:58:0a:80:0b:0b,nw_src=10.128.11.11, priority 90, cookie 0x6dba739b set_field:0/0x1000->reg10 move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111] -> NXM_NX_XXREG0[111] is now 0 resubmit(,9) 9. metadata=0x25, priority 0, cookie 0x62dbc406 resubmit(,10) 10. metadata=0x25, priority 0, cookie 0xfc1e5e8 resubmit(,11) 11. metadata=0x25, priority 0, cookie 0x700f46a5 resubmit(,12) 12. ip,metadata=0x25, priority 100, cookie 0xafb544c6 set_field:0x1000000000000000000000000/0x1000000000000000000000000->xxreg0 resubmit(,13) 13. ip,metadata=0x25, priority 100, cookie 0xb294b6c7 set_field:0x4000000000000000000000000/0x4000000000000000000000000->xxreg0 resubmit(,14) 14. tcp,reg0=0x4/0x4,metadata=0x25, priority 120, cookie 0x13f53fdd move:NXM_OF_IP_DST[]->NXM_NX_XXREG0[64..95] -> NXM_NX_XXREG0[64..95] is now 0xac1ef0d8 move:NXM_OF_TCP_DST[]->NXM_NX_XXREG0[32..47] -> NXM_NX_XXREG0[32..47] is now 0x1538 ct(table=15,zone=NXM_NX_REG13[0..15],nat) nat -> A clone of the packet is forked to recirculate. The forked pipeline will be resumed at table 15. -> Sets the packet to an untracked state, and clears all the conntrack fields. 
Final flow: dp_hash=0x1,tcp,reg0=0x5,reg1=0xac1ef0d8,reg2=0x1538,reg11=0x11,reg12=0x4,reg13=0xb,reg14=0xc,metadata=0x25,in_port=279,vlan_tci=0x0000,dl_src=0a:58:0a:80:0b:0b, dl_dst=0a:58:0a:80:0a:01,nw_src=10.128.11.11,nw_dst=172.30.240.216,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=80,tp_dst=5432,tcp_flags=0 Megaflow: recirc_id=0,ct_state=-new-est-trk,ct_mark=0/0x2,eth,tcp,in_port=279,dl_src=0a:58:0a:80:0b:0b,dl_dst=0a:58:0a:80:0a:01,nw_src=10.128.11.11,nw_dst=172.30.240.216,nw_ frag=no,tp_dst=5432 Datapath actions: ct(zone=11,nat),recirc(0x29e664) =============================================================================== recirc(0x29e664) - resume conntrack with default ct_state=trk|new (use --ct-next to customize) Replacing src/dst IP/ports to simulate NAT: Initial flow: Modified flow: =============================================================================== Flow: recirc_id=0x29e664,dp_hash=0x1,ct_state=new|trk,ct_zone=11,eth,tcp,reg0=0x5,reg1=0xac1ef0d8,reg2=0x1538,reg11=0x11,reg12=0x4,reg13=0xb,reg14=0xc,metadata=0x25,in_port= 279,vlan_tci=0x0000,dl_src=0a:58:0a:80:0b:0b,dl_dst=0a:58:0a:80:0a:01,nw_src=10.128.11.11,nw_dst=172.30.240.216,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=80,tp_dst=5432,tcp_flags=0 bridge("br-int") ---------------- thaw Resuming from table 15 15. ct_state=+new-est+trk,metadata=0x25, priority 7, cookie 0xe75feee6 set_field:0x80000000000000000000000000/0x80000000000000000000000000->xxreg0 set_field:0x200000000000000000000000000/0x200000000000000000000000000->xxreg0 resubmit(,16) 16. ct_state=-est+trk,ip,metadata=0x25, priority 1, cookie 0xfae6dd set_field:0x2000000000000000000000000/0x2000000000000000000000000->xxreg0 resubmit(,17) 17. metadata=0x25, priority 0, cookie 0x78a29936 resubmit(,18) 18. metadata=0x25, priority 0, cookie 0x349041a3 resubmit(,19) 19. ct_state=+new+trk,tcp,metadata=0x25,nw_dst=172.30.240.216,tp_dst=5432, priority 120, cookie 0xf9883682 set_field:0/0x2000000000000000000000000->xxreg0 set_field:0xac1ef0d80000000000000000/0xffffffff0000000000000000->xxreg0 set_field:0x153800000000/0xffff00000000->xxreg0 group:764 -> using bucket 0 bucket 0 ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=10.130.11.209:5432),exec(set_field:0x2/0x2->ct_mark)) nat(dst=10.130.11.209:5432) set_field:0x2/0x2->ct_mark -> A clone of the packet is forked to recirculate. The forked pipeline will be resumed at table 20. -> Sets the packet to an untracked state, and clears all the conntrack fields. 
Final flow: recirc_id=0x29e664,dp_hash=0x1,ct_state=new|trk,ct_zone=11,eth,tcp,reg0=0x285,reg1=0xac1ef0d8,reg2=0x1538,reg11=0x11,reg12=0x4,reg13=0xb,reg14=0xc,metadata=0x25, in_port=279,vlan_tci=0x0000,dl_src=0a:58:0a:80:0b:0b,dl_dst=0a:58:0a:80:0a:01,nw_src=10.128.11.11,nw_dst=172.30.240.216,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=80,tp_dst=5432,tcp _flags=0 Megaflow: recirc_id=0x29e664,dp_hash=0x1/0xf,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0/0x1,eth,tcp,in_port=279,dl_dst=0a:58:0a:80:0a:01,nw_dst=172.30.240.216,nw_frag=no,tp _dst=5432 Datapath actions: ct(commit,zone=11,mark=0x2/0x2,nat(dst=10.130.11.209:5432)),recirc(0x29e665) =============================================================================== recirc(0x29e665) - resume conntrack with default ct_state=trk|new (use --ct-next to customize) Replacing src/dst IP/ports to simulate NAT: Initial flow: nw_src=10.128.11.11,tp_src=80,nw_dst=172.30.240.216,tp_dst=5432 Modified flow: nw_src=10.128.11.11,tp_src=80,nw_dst=10.130.11.209,tp_dst=5432 =============================================================================== Flow: recirc_id=0x29e665,dp_hash=0x1,ct_state=new|trk,ct_zone=11,ct_mark=0x2,eth,tcp,reg0=0x285,reg1=0xac1ef0d8,reg2=0x1538,reg11=0x11,reg12=0x4,reg13=0xb,reg14=0xc,metadata =0x25,in_port=279,vlan_tci=0x0000,dl_src=0a:58:0a:80:0b:0b,dl_dst=0a:58:0a:80:0a:01,nw_src=10.128.11.11,nw_dst=10.130.11.209,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=80,tp_dst=543 2,tcp_flags=0 bridge("br-int") ---------------- thaw Resuming from table 20 20. metadata=0x25, priority 0, cookie 0x9f09375 resubmit(,21) 21. metadata=0x25, priority 0, cookie 0x50b0b39c resubmit(,22) 22. ct_state=+trk,ip,metadata=0x25, priority 100, cookie 0x519b4bb4 set_field:0/0x80->reg10 resubmit(,68) 68. No match. drop move:NXM_NX_REG10[7]->NXM_NX_XXREG0[102] -> NXM_NX_XXREG0[102] is now 0 set_field:0/0x80->reg10 resubmit(,69) 69. No match. drop move:NXM_NX_REG10[7]->NXM_NX_XXREG0[108] -> NXM_NX_XXREG0[108] is now 0 resubmit(,23) 23. metadata=0x25, priority 0, cookie 0xd4bb90f1 resubmit(,24) 24. metadata=0x25, priority 0, cookie 0xd6e5d617 resubmit(,25) 25. metadata=0x25, priority 0, cookie 0xee89779a resubmit(,26) 26. metadata=0x25, priority 0, cookie 0x91e1d014 resubmit(,27) 27. metadata=0x25, priority 0, cookie 0x8356ed05 resubmit(,28) 28. metadata=0x25, priority 0, cookie 0x2eca4761 resubmit(,29) 29. metadata=0x25, priority 0, cookie 0x6f5c3072 resubmit(,30) 30. metadata=0x25, priority 0, cookie 0xcc630300 resubmit(,31) 31. metadata=0x25,dl_dst=0a:58:0a:80:0a:01, priority 50, cookie 0x91b51eb set_field:0x1->reg15 resubmit(,37) 37. priority 0 resubmit(,38) 38. reg15=0x1,metadata=0x25, priority 100, cookie 0x467246cc set_field:0x11->reg11 set_field:0x4->reg12 resubmit(,39) 39. priority 0 set_field:0->reg0 set_field:0->reg1 set_field:0->reg2 set_field:0->reg3 set_field:0->reg4 set_field:0->reg5 set_field:0->reg6 set_field:0->reg7 set_field:0->reg8 set_field:0->reg9 resubmit(,40) 40. ip,reg15=0x1,metadata=0x25, priority 110, cookie 0x906ce058 resubmit(,41) 42. metadata=0x25, priority 0, cookie 0xe663eb55 [85/49116] resubmit(,43) 43. ct_state=+new-est+trk,metadata=0x25, priority 7, cookie 0x60bcc9a set_field:0x80000000000000000000000000/0x80000000000000000000000000->xxreg0 set_field:0x200000000000000000000000000/0x200000000000000000000000000->xxreg0 resubmit(,44) 44. ct_state=-est+trk,ip,metadata=0x25, priority 1, cookie 0x552a3dfb set_field:0x2000000000000000000000000/0x2000000000000000000000000->xxreg0 resubmit(,45) 45. 
metadata=0x25, priority 0, cookie 0xfe4ec787 resubmit(,46) 46. metadata=0x25, priority 0, cookie 0x57f0e95d resubmit(,47) 47. ip,reg0=0x2/0x2002,metadata=0x25, priority 100, cookie 0x984dd2e0 ct(commit,zone=NXM_NX_REG13[0..15],nat(src),exec(set_field:0/0x1->ct_mark)) nat(src) set_field:0/0x1->ct_mark -> Sets the packet to an untracked state, and clears all the conntrack fields. resubmit(,48) 48. metadata=0x25, priority 0, cookie 0x5d6c21a8 set_field:0/0x1000->reg10 resubmit(,75) 75. No match. drop move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111] -> NXM_NX_XXREG0[111] is now 0 resubmit(,49) 49. metadata=0x25, priority 0, cookie 0xc80920c0 resubmit(,64) 64. priority 0 resubmit(,65) 65. reg15=0x1,metadata=0x25, priority 100, cookie 0x467246cc clone(ct_clear,set_field:0->reg11,set_field:0->reg12,set_field:0->reg13,set_field:0xe->reg11,set_field:0x8->reg12,set_field:0x9->metadata,set_field:0x1d->reg14,set_field :0->reg10,set_field:0->reg15,set_field:0->reg0,set_field:0->reg1,set_field:0->reg2,set_field:0->reg3,set_field:0->reg4,set_field:0->reg5,set_field:0->reg6,set_field:0->reg7, set_field:0->reg8,set_field:0->reg9,resubmit(,8)) ct_clear set_field:0->reg11 set_field:0->reg12 set_field:0->reg13 set_field:0xe->reg11 set_field:0x8->reg12 set_field:0x9->metadata set_field:0x1d->reg14 set_field:0->reg10 set_field:0->reg15 set_field:0->reg0 set_field:0->reg1 set_field:0->reg2 set_field:0->reg3 set_field:0->reg4 set_field:0->reg5 set_field:0->reg6 set_field:0->reg7 set_field:0->reg8 set_field:0->reg9 resubmit(,8) 8. reg14=0x1d,metadata=0x9,dl_dst=0a:58:0a:80:0a:01, priority 50, cookie 0x407091d7 set_field:0xa580a800a010000000000000000/0xffffffffffff0000000000000000->xxreg0 resubmit(,9) 9. metadata=0x9, priority 0, cookie 0x3ff8a4a2 set_field:0x4/0x4->xreg4 resubmit(,10) 10. reg9=0x4/0x4,metadata=0x9, priority 100, cookie 0xd45ed856 resubmit(,11) 11. metadata=0x9, priority 0, cookie 0x91bec345 resubmit(,12) 12. metadata=0x9, priority 0, cookie 0xd0af99b6 resubmit(,13) 13. metadata=0x9, priority 0, cookie 0xcd9dc371 resubmit(,14) 14. metadata=0x9, priority 0, cookie 0x78643e06 resubmit(,15) 15. metadata=0x9, priority 0, cookie 0x8dd81c49 resubmit(,16) 16. metadata=0x9, priority 0, cookie 0x118e1fee resubmit(,17) 17. metadata=0x9, priority 0, cookie 0x7ae5e55 resubmit(,18) 18. metadata=0x9, priority 0, cookie 0x6721c07 set_field:0/0xffffffff->xxreg1 resubmit(,19) 19. ip,metadata=0x9,nw_dst=10.130.10.0/23, priority 71, cookie 0xd4629e4c dec_ttl() set_field:0/0xffff00000000->xreg4 move:NXM_OF_IP_DST[]->NXM_NX_XXREG0[96..127] -> NXM_NX_XXREG0[96..127] is now 0xa820bd1 set_field:0xa820a010000000000000000/0xffffffff0000000000000000->xxreg0 set_field:0a:58:0a:82:0a:01->eth_src set_field:0x23->reg15 set_field:0x1/0x1->reg10 resubmit(,20) 20. reg8=0/0xffff,metadata=0x9, priority 150, cookie 0xd9fc79f9 resubmit(,21) 21. ip,metadata=0x9,nw_src=10.128.0.0/14,nw_dst=10.128.0.0/14, priority 101, cookie 0x83c48946 set_field:0/0xffff00000000->xreg4 resubmit(,22) 22. reg8=0/0xffff,metadata=0x9, priority 150, cookie 0xdfdc5b37 resubmit(,23) 23. reg0=0xa820bd1,reg15=0x23,metadata=0x9, priority 100, cookie 0x727f7b3c set_field:0a:58:0a:82:0b:d1->eth_dst resubmit(,24) 24. metadata=0x9, priority 0, cookie 0xf0cb01ae resubmit(,25) 25. metadata=0x9, priority 0, cookie 0x45400e5a resubmit(,26) 26. reg15=0x23,metadata=0x9, priority 50, cookie 0x475902c1 set_field:0x2d->reg15 resubmit(,27) 27. metadata=0x9, priority 0, cookie 0xde98330d resubmit(,37) 37. 
reg15=0x2d,metadata=0x9, priority 100, cookie 0x73268279 set_field:0x9/0xffffff->tun_id set_field:0x2d->tun_metadata0 move:NXM_NX_REG14[0..14]->NXM_NX_TUN_METADATA0[16..30] -> NXM_NX_TUN_METADATA0[16..30] is now 0x1d output:6 -> output to kernel tunnel Final flow: recirc_id=0x29e665,dp_hash=0x1,eth,tcp,reg0=0x282,reg11=0x11,reg12=0x4,reg13=0xb,reg14=0xc,reg15=0x1,metadata=0x25,in_port=279,vlan_tci=0x0000,dl_src=0a:58:0a:80:0b:0b,dl_dst=0a:58:0a:80:0a:01,nw_src=10.128.11.11,nw_dst=10.130.11.209,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=80,tp_dst=5432,tcp_flags=0 Megaflow: recirc_id=0x29e665,ct_state=+new-est-rel-rpl-inv+trk,ct_mark=0x2/0x3,eth,tcp,tun_id=0/0xffffff,tun_metadata0=NP,in_port=279,dl_src=0a:58:0a:80:0b:0b,dl_dst=0a:58:0a:80:0a:01,nw_src=10.128.11.8/30,nw_dst=10.130.11.209,nw_ecn=0,nw_ttl=64,nw_frag=no Datapath actions: ct(commit,zone=11,mark=0/0x1,nat(src)),set(tunnel(tun_id=0x9,dst=10.0.178.75,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x1d002d}),flags(df|csum|key))),set(eth(src=0a:58:0a:82:0a:01,dst=0a:58:0a:82:0b:d1)),set(ipv4(ttl=63)),1
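(Hedged helper, not from the original report: the recipe used for the two traces above, generalized so it can be repeated for any client pod / service VIP pair. The angle-bracket placeholders are to be filled in from `oc get pod -owide` and the service definition; run it from a shell that has access to the node's br-int, e.g. the ovnkube-node pod on that node.)

sh-4.4# IFACE_ID=<namespace>_<client-pod-name>
sh-4.4# OFPORT=$(ovs-vsctl --bare --columns=ofport find interface external_ids:iface-id="$IFACE_ID")
sh-4.4# ovs-appctl ofproto/trace br-int in_port="$OFPORT",dl_src=<client-pod-MAC>,dl_dst=<node-router-MAC>,tcp,nw_src=<client-pod-IP>,nw_dst=<service-VIP>,tp_dst=<service-port>,tp_src=80,nw_ttl=64,dp_hash=1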
Must-gather - http://perf1.perf.lab.eng.bos.redhat.com/pub/mukrishn/bz_2111619/
Seeing long poll intervals on the controller as well:

2022-07-27T17:08:50.979Z|01466|timeval|WARN|Unreasonably long 64520ms poll interval (24ms user, 1299ms system)
2022-07-27T21:11:15.415Z|00004|timeval(ovn_pinctrl0)|WARN|Unreasonably long 24356ms poll interval (0ms user, 231ms system)
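(Hedged sketch, not from the original report: the same warnings can be pulled out of the must-gather linked above. The grep simply walks the whole archive, since the exact directory layout depends on the must-gather image.)

$ grep -rn "Unreasonably long" must-gather*/ | grep -E "ovn|ovs" | sort | head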
Which 4.y.z to 4.y'.z' updates increase vulnerability? Which types of clusters?
> Any update to, or new cluster on, 4.11.0 using OVN as the SDN provider is vulnerable.

What is the impact? Is it serious enough to warrant removing update recommendations?
> Under heavy stress and system load, or after any restart of OVS that causes ovn-controller to disconnect from OVS, the node may be left in a state where pods can no longer talk to Kubernetes services. Restarting OVS via a node reboot does not cause this problem, and in a typical deployment OVS should not be restarted unless there is administrative intervention. For the stress-related case, the node must be under an extreme amount of stress, enough that OVS is blocked from sending its inactivity probe to ovn-controller for over 120 seconds.

How involved is remediation?
> As a workaround, the user can simply delete the ovnkube-node pod on the node that is having the problem. The affected node can be identified by looking at its ovn-controller logs for these error messages:
> 2022-07-29T15:38:07.237Z|01324|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.5) (xid=0x1ae50): OFPGMFC_GROUP_EXISTS
> 2022-07-29T15:38:07.237Z|01325|ofctrl|INFO|OpenFlow error: OFPT_ERROR (OF1.5) (xid=0x1ae57): OFPBFC_MSG_FAILED

Is this a regression?
> Yes, this is a regression introduced in 4.11.0. The problem does not exist in versions prior to 4.11.0.
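(Hedged sketch, not from the original report: one way to apply the workaround above across the cluster - scan each node's ovn-controller log for the error signature and bounce the corresponding ovnkube-node pod. The namespace, label, and container name are assumptions based on a typical ovn-kubernetes deployment; verify them before use.)

for pod in $(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node -o name); do
  if oc logs -n openshift-ovn-kubernetes "$pod" -c ovn-controller --since=24h 2>/dev/null \
       | grep -q "OFPGMFC_GROUP_EXISTS"; then
    echo "restarting $pod"
    oc delete -n openshift-ovn-kubernetes "$pod"
  fi
done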
(In reply to Tim Rozet from comment #10)
> Is this a regression?
> > Yes, this is a regression introduced in 4.11.0. The problem does not exist in versions prior to 4.11.0.

I'm adding the 'Regression' keyword based on this^
Tested on a perfscale environment with the OVS patch from https://github.com/openshift/ovn-kubernetes/pull/1223 on 4.11.0-rc.6. Runs with 200 pods per node as well as 245 pods per node on a 24-node cluster were successful; this patch fixes the problem.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.11.1 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6103