Assigning to the development branch to investigate. We will consider the backport when the issue is understood.
Hi, I am going to close this bug as INSUFFICIENT_DATA, because we don't know exactly what happened to trigger the bug. Our internal analysis of the problem is as follows.

We think we've understood the issue here. It seems that the reason nothing worked on node oapprown02.oap-011.oappro.jp is that no pod flows were configured in OVS. This is taken from the sosreport provided here: https://access.redhat.com/support/cases/#/case/02787947/discussion?attachmentId=a092K00002CpDibQAF

cat sosreport-oapprown02-2020-10-28-qjgjxtq/sos_commands/openvswitch/ovs-ofctl_-O_OpenFlow13_dump-flows_br0
OFPST_FLOW reply (OF1.3) (xid=0x2):
 cookie=0x4d6ac4d9, duration=11830.943s, table=10, n_packets=0, n_bytes=0, priority=100,tun_src=10.3.209.24 actions=goto_table:30
 cookie=0x21876511, duration=11830.905s, table=10, n_packets=0, n_bytes=0, priority=100,tun_src=10.3.209.25 actions=goto_table:30
 cookie=0x3e01f207, duration=11830.867s, table=10, n_packets=0, n_bytes=0, priority=100,tun_src=10.3.209.26 actions=goto_table:30
 cookie=0xffafcc08, duration=11830.829s, table=10, n_packets=0, n_bytes=0, priority=100,tun_src=10.3.209.34 actions=goto_table:30
 cookie=0x51badde4, duration=11830.791s, table=10, n_packets=0, n_bytes=0, priority=100,tun_src=10.3.209.36 actions=goto_table:30
 cookie=0x4d6ac4d9, duration=11830.943s, table=50, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.128.0.0/23 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.3.209.24->tun_dst,output:1
 cookie=0x21876511, duration=11830.905s, table=50, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.129.0.0/23 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.3.209.25->tun_dst,output:1
 cookie=0x3e01f207, duration=11830.867s, table=50, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.130.0.0/23 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.3.209.26->tun_dst,output:1
 cookie=0xffafcc08, duration=11830.829s, table=50, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.128.2.0/23 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.3.209.34->tun_dst,output:1
 cookie=0x51badde4, duration=11830.791s, table=50, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.129.2.0/23 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.3.209.36->tun_dst,output:1
 cookie=0x4d6ac4d9, duration=11830.943s, table=90, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.128.0.0/23 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.3.209.24->tun_dst,output:1
 cookie=0x21876511, duration=11830.905s, table=90, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.129.0.0/23 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.3.209.25->tun_dst,output:1
 cookie=0x3e01f207, duration=11830.867s, table=90, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.130.0.0/23 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.3.209.26->tun_dst,output:1
 cookie=0xffafcc08, duration=11830.829s, table=90, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.128.2.0/23 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.3.209.34->tun_dst,output:1
 cookie=0x51badde4, duration=11830.791s, table=90, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.129.2.0/23 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.3.209.36->tun_dst,output:1
 cookie=0x0, duration=11830.772s, table=111, n_packets=0, n_bytes=0, priority=100 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.3.209.24->tun_dst,output:1,set_field:10.3.209.25->tun_dst,output:1,set_field:10.3.209.26->tun_dst,output:1,set_field:10.3.209.34->tun_dst,output:1,set_field:10.3.209.36->tun_dst,output:1,goto_table:120
 cookie=0x0, duration=11831.211s, table=253, n_packets=0, n_bytes=0, actions=note:02.07.00.00.00.00

That shows that each node's subnet flows are configured as they should be on oapprown02.oap-011.oappro.jp; however, all pod flows for the existing pods running on this node are missing.

openshift-sdn, however, does not check that the pod flows exist when determining whether it needs to perform a new setup. In this case it determined that it did not need to, because according to the conditions we currently check, the bridge was already set up. We therefore did not proceed to set up the flows for the existing pods on that node.

I have no way of determining why the pod flows are missing while everything else is present, because neither the sosreport nor the must-gather contains any further SDN / OVS logs from the previous runs. Something might have ended up deleting the pod flows during the upgrade, but I have no conclusive indication of this.

Given that we don't know the root cause, we can't say for sure that the fix linked in this BZ will fix the problem completely. It will most likely not do that, but rather hide the real problem from ever happening again (next time maybe with more logs to investigate further). I am thus closing the bug and the PR, since the PR will not be merged.

Thanks in advance,
Alexander
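P.S. For illustration only (this is not the actual openshift-sdn code, and all names and IPs below are hypothetical): a minimal Go sketch of the stricter "already set up?" decision described above, which would also require the per-pod flows to be present before skipping a full OVS setup.

package main

import (
	"fmt"
	"strings"
)

// podFlowsPresent returns true only if every expected pod IP shows up in at
// least one flow of the given dump (e.g. the output of
// `ovs-ofctl -O OpenFlow13 dump-flows br0`).
func podFlowsPresent(flowDump string, podIPs []string) bool {
	for _, ip := range podIPs {
		if !strings.Contains(flowDump, ip) {
			return false
		}
	}
	return true
}

// needsResync sketches the stricter check: the bridge counts as healthy only
// if the base flows look sane *and* the local pods' flows are all present.
func needsResync(flowDump string, podIPs []string) bool {
	baseFlowsOK := strings.Contains(flowDump, "table=253") // note/version flow, as in the dump above
	return !baseFlowsOK || !podFlowsPresent(flowDump, podIPs)
}

func main() {
	dump := "cookie=0x0, table=253, actions=note:02.07.00.00.00.00"
	if needsResync(dump, []string{"10.128.2.15"}) {
		fmt.Println("pod flows missing: full OVS setup required")
	}
}

With a check along these lines, the missing pod flows on oapprown02 would have forced a full resync instead of being skipped.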