Description of problem: two instances in the same subnet are failing to have ping connectivity between each other. Version-Release number of selected component (if applicable): opendaylight-8.0.0-9.el7ost.noarch.rpm How reproducible: sporadically Steps to Reproduce: 1. run this job: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/DFG-opendaylight-odl-netvirt-13_director-rhel-virthost-3cont_2comp-ipv4-vxlan-ha-csit Actual results: pings should work between all three instances in that same subnet Expected results: ping fails Additional info: in this specific case, this is happening the L2 suite while in a clustered ODL setup. There is no HA activity prior to the test. In other words, there has not been any tests yet where an ODL node is blocked, or rebooted, etc which exercises HA. The artifacts (with the controller logs) are here: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/DFG-opendaylight-odl-netvirt-13_director-rhel-virthost-3cont_2comp-ipv4-vxlan-ha-csit/26/artifact/ the full robot test log is here: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/DFG-opendaylight-odl-netvirt-13_director-rhel-virthost-3cont_2comp-ipv4-vxlan-ha-csit/26/robot/report/log.html specifically, the ping failure is here: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/DFG-opendaylight-odl-netvirt-13_director-rhel-virthost-3cont_2comp-ipv4-vxlan-ha-csit/26/robot/report/log.html#s1-s1-t15-k2-k11-k2-k3-k1-k3 There are flow dumps and other debugs in the robot html log itself.
Please attachj the relevant logs to the BZ.
Created attachment 1435513 [details] nova, neutron, odl logs logs that should be included in the .xz $ tree -s . ├── [ 4096] compute-0 │ └── [ 3889583] nova.log ├── [ 4096] compute-1 │ └── [ 4614407] nova.log ├── [ 4096] controller-0 │ ├── [ 1577605] neutron-dhcp.log │ ├── [ 6972055] neutron.log │ └── [ 23984158] odl.log ├── [ 4096] controller-1 │ ├── [ 1538144] neutron-dhcp.log │ ├── [ 7109172] neutron.log │ └── [ 11252364] odl.log ├── [ 4096] controller-2 │ ├── [ 1550031] neutron-dhcp.log │ ├── [ 7395311] neutron.log │ └── [ 17801402] odl.log └── [ 2212092] logs.tar.xz
Per the following flows (the ones marked in *'s are the ones that would be hit) if one VM sends an arp for the other, the packet will never reach it. It seems the groups are programmed wrong. "mac-address":"fa:16:3e:68:76:0d", "uuid":"ba96abec-9da6-4c5d-9d5e-98976dcf85bd" src "mac-address":"fa:16:3e:de:98:13", "uuid":"9d9a457d-eef6-495c-b0f3-4437cd6abfa3" * table=0, n_packets=172, n_bytes=19023, priority=4,in_port="tap9d9a457d-ee",vlan_tci=0x0000/0x1fff actions=write_metadata:0x1d0000000000/0xffffff0000000001,goto_table:17 table=0, n_packets=381, n_bytes=42702, priority=4,in_port="br-ex-patch",vlan_tci=0x0000/0x1fff actions=write_metadata:0x1e0000000001/0xffffff0000000001,goto_table:17 table=0, n_packets=301, n_bytes=34610, priority=10,in_port="br-ex-patch",dl_vlan=1121 actions=pop_vlan,write_metadata:0x1f0000000001/0xffffff0000000001,goto_table:17 table=0, n_packets=121, n_bytes=12466, priority=4,in_port="tapba96abec-9d",vlan_tci=0x0000/0x1fff actions=write_metadata:0x240000000000/0xffffff0000000001,goto_table:17 table=0, n_packets=693, n_bytes=66080, priority=5,in_port=tunc279f9655c4 actions=write_metadata:0x40000000001/0xfffff0000000001,goto_table:36 table=0, n_packets=820, n_bytes=77796, priority=5,in_port=tun221c130bbdd actions=write_metadata:0x60000000001/0xfffff0000000001,goto_table:36 table=0, n_packets=662, n_bytes=62595, priority=5,in_port=tunca481192bbe actions=write_metadata:0x80000000001/0xfffff0000000001,goto_table:36 table=0, n_packets=660, n_bytes=62463, priority=5,in_port=tund59175be953 actions=write_metadata:0xd0000000001/0xfffff0000000001,goto_table:36 * table=17, n_packets=170, n_bytes=18855, priority=10,metadata=0x1d0000000000/0xffffff0000000000 actions=load:0x1d->NXM_NX_REG1[0..19],load:0x138a->NXM_NX_REG7[0..15],write_metadata:0xa0001d138a000000/0xfffffffffffffffe,goto_table:43 table=17, n_packets=300, n_bytes=34516, priority=10,metadata=0x1f0000000000/0xffffff0000000000 actions=load:0x1f->NXM_NX_REG1[0..19],load:0x138a->NXM_NX_REG7[0..15],write_metadata:0xa0001f138a000000/0xfffffffffffffffe,goto_table:43 table=17, n_packets=119, n_bytes=12298, priority=10,metadata=0x240000000000/0xffffff0000000000 actions=load:0x24->NXM_NX_REG1[0..19],load:0x138b->NXM_NX_REG7[0..15],write_metadata:0xa00024138b000000/0xfffffffffffffffe,goto_table:43 table=17, n_packets=0, n_bytes=0, priority=0,metadata=0x8000000000000000/0xf000000000000000 actions=write_metadata:0x9000000000000000/0xf000000000000000,goto_table:80 table=18, n_packets=0, n_bytes=0, priority=0 actions=goto_table:38 table=19, n_packets=0, n_bytes=0, priority=100,arp,arp_op=1 actions=resubmit(,17) table=19, n_packets=0, n_bytes=0, priority=100,arp,arp_op=2 actions=resubmit(,17) table=19, n_packets=156, n_bytes=17472, priority=0 actions=resubmit(,17) table=20, n_packets=0, n_bytes=0, priority=0 actions=goto_table:80 table=22, n_packets=0, n_bytes=0, priority=0 actions=CONTROLLER:65535 table=23, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,17) table=24, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,17) table=36, n_packets=197, n_bytes=17612, priority=5,tun_id=0x2c actions=write_metadata:0x138b000000/0xfffffffff000000,goto_table:51 table=38, n_packets=0, n_bytes=0, priority=5,tun_id=0x2c actions=write_metadata:0x138b000000/0xfffffffff000000,goto_table:51 * table=43, n_packets=50, n_bytes=2268, priority=100,arp,arp_op=1 actions=group:5500 [group_id=5500,type=all,bucket=actions=CONTROLLER:65535,bucket=actions=resubmit(,48),bucket=actions=resubmit(,81)] table=43, n_packets=15, n_bytes=642, priority=100,arp,arp_op=2 actions=CONTROLLER:65535,resubmit(,48) table=43, n_packets=680, n_bytes=80231, priority=0 actions=goto_table:48 table=45, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,17) * table=48, n_packets=745, n_bytes=83141, priority=0 actions=resubmit(,49),resubmit(,50) * table=50, n_packets=170, n_bytes=18855, priority=20,metadata=0x1d138a000000/0xfffffffff000000,dl_src=fa:16:3e:de:98:13 actions=goto_table:51 table=50, n_packets=119, n_bytes=12298, priority=20,metadata=0x24138b000000/0xfffffffff000000,dl_src=fa:16:3e:68:76:0d actions=goto_table:51 table=50, n_packets=139, n_bytes=18948, idle_timeout=300, send_flow_rem priority=20,metadata=0x1f138a000000/0xfffffffff000000,dl_src=fa:16:3e:1e:44:25 actions=goto_table:51 table=50, n_packets=32, n_bytes=2600, idle_timeout=300, send_flow_rem priority=20,metadata=0x1f138a000000/0xfffffffff000000,dl_src=fa:16:3e:bb:b5:60 actions=goto_table:51 table=50, n_packets=123, n_bytes=11726, idle_timeout=300, send_flow_rem priority=20,metadata=0x1f138a000000/0xfffffffff000000,dl_src=fa:16:3e:6e:7b:c5 actions=goto_table:51 table=50, n_packets=1, n_bytes=46, priority=10,reg4=0x1 actions=goto_table:51 table=50, n_packets=8, n_bytes=1550, priority=0 actions=CONTROLLER:65535,learn(table=49,hard_timeout=10,priority=0, cookie=0x8600000,NXM_OF_ETH_SRC[],NXM_NX_REG1[0..19],load:0x1->NXM_NX_REG4[0..7]),goto_table:51 table=51, n_packets=0, n_bytes=0, priority=20,metadata=0x1389000000/0xffff000000,dl_dst=f2:a9:ab:ed:db:72 actions=load:0x1800->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=0, n_bytes=0, priority=20,metadata=0x1389000000/0xffff000000,dl_dst=d2:10:06:09:9e:de actions=load:0x1800->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=0, n_bytes=0, priority=20,metadata=0x1389000000/0xffff000000,dl_dst=22:22:5c:7c:8d:bb actions=load:0x1800->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=0, n_bytes=0, priority=20,metadata=0x1389000000/0xffff000000,dl_dst=76:b9:e1:75:52:3a actions=load:0x1800->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=241, n_bytes=29570, priority=20,metadata=0x138a000000/0xffff000000,dl_dst=fa:16:3e:de:98:13 actions=load:0x1d00->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=0, n_bytes=0, priority=20,metadata=0x138a000000/0xffff000000,dl_dst=fa:16:3e:ed:7c:ca actions=drop table=51, n_packets=140, n_bytes=12938, priority=20,metadata=0x138b000000/0xffff000000,dl_dst=fa:16:3e:68:76:0d actions=load:0x2400->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=0, n_bytes=0, priority=20,metadata=0x138b000000/0xffff000000,dl_dst=fa:16:3e:87:11:4f actions=set_field:0x2c->tun_id,load:0xd00->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=0, n_bytes=0, priority=20,metadata=0x138b000000/0xffff000000,dl_dst=fa:16:3e:84:35:b8 actions=set_field:0x2c->tun_id,load:0x800->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=71, n_bytes=8437, priority=20,metadata=0x138a000000/0xffff000000,dl_dst=fa:16:3e:1e:44:25 actions=load:0x1f00->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=18, n_bytes=1798, priority=20,metadata=0x138a000000/0xffff000000,dl_dst=fa:16:3e:bb:b5:60 actions=load:0x1f00->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=79, n_bytes=8440, priority=20,metadata=0x138a000000/0xffff000000,dl_dst=fa:16:3e:6e:7b:c5 actions=load:0x1f00->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=18, n_bytes=1798, priority=20,metadata=0x138b000000/0xffff000000,dl_dst=fa:16:3e:c9:ce:f3 actions=set_field:0x2c->tun_id,load:0x600->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=18, n_bytes=1798, priority=20,metadata=0x138b000000/0xffff000000,dl_dst=fa:16:3e:5f:8a:e4 actions=set_field:0x2c->tun_id,load:0x400->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=81, n_bytes=8522, priority=20,metadata=0x138b000000/0xffff000000,dl_dst=fa:16:3e:89:b3:0f actions=set_field:0x2c->tun_id,load:0x600->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=0, n_bytes=0, priority=15,dl_dst=01:80:c2:00:00:00/ff:ff:ff:ff:ff:f0 actions=drop * table=51, n_packets=276, n_bytes=27452, priority=0 actions=goto_table:52 * table=52, n_packets=2, n_bytes=180, priority=5,metadata=0x138a000000/0xffff000001 actions=write_actions(group:210004) [group_id=210004,type=all,bucket=actions=group:210003,bucket=actions=load:0x1f00->NXM_NX_REG6[],resubmit(,220)] table=52, n_packets=59, n_bytes=4946, priority=5,metadata=0x138a000001/0xffff000001 actions=write_actions(group:210003) [group_id=210003,type=all,bucket=actions=set_field:0x1d->tun_id,resubmit(,55)] table=52, n_packets=57, n_bytes=4674, priority=5,metadata=0x138b000001/0xffff000001 actions=write_actions(group:210005) table=52, n_packets=2, n_bytes=180, priority=5,metadata=0x138b000000/0xffff000001 actions=write_actions(group:210006) table=55, n_packets=2, n_bytes=180, priority=10,tun_id=0x1d,metadata=0x1d0000000000/0xfffff0000000000 actions=drop table=55, n_packets=0, n_bytes=0, priority=10,tun_id=0x1f,metadata=0x1f0000000000/0xfffff0000000000 actions=drop table=55, n_packets=2, n_bytes=180, priority=10,tun_id=0x24,metadata=0x240000000000/0xfffff0000000000 actions=drop * table=55, n_packets=59, n_bytes=4946, priority=9,tun_id=0x1d actions=load:0x1d00->NXM_NX_REG6[],resubmit(,220) table=55, n_packets=0, n_bytes=0, priority=9,tun_id=0x1f actions=load:0x1f00->NXM_NX_REG6[],resubmit(,220) table=55, n_packets=57, n_bytes=4674, priority=9,tun_id=0x24 actions=load:0x2400->NXM_NX_REG6[],resubmit(,220) table=60, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,17) table=80, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,17) table=81, n_packets=50, n_bytes=2268, priority=0 actions=drop table=90, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,17) table=210, n_packets=0, n_bytes=0, priority=63009,arp actions=drop table=210, n_packets=0, n_bytes=0, priority=61009,ipv6 actions=drop table=210, n_packets=0, n_bytes=0, priority=61009,ip actions=drop table=210, n_packets=0, n_bytes=0, priority=0 actions=drop table=211, n_packets=0, n_bytes=0, priority=100,tcp actions=write_metadata:0/0x2,goto_table:212 table=211, n_packets=0, n_bytes=0, priority=100,udp actions=write_metadata:0/0x2,goto_table:212 table=211, n_packets=0, n_bytes=0, priority=100,icmp6 actions=write_metadata:0/0x2,goto_table:212 table=211, n_packets=0, n_bytes=0, priority=100,tcp6 actions=write_metadata:0/0x2,goto_table:212 table=211, n_packets=0, n_bytes=0, priority=100,udp6 actions=write_metadata:0/0x2,goto_table:212 table=211, n_packets=0, n_bytes=0, priority=100,icmp actions=write_metadata:0/0x2,goto_table:212 table=211, n_packets=0, n_bytes=0, priority=0 actions=write_metadata:0x2/0x2,goto_table:214 table=212, n_packets=0, n_bytes=0, priority=0 actions=drop table=213, n_packets=0, n_bytes=0, priority=0 actions=goto_table:214 table=214, n_packets=0, n_bytes=0, priority=62030,ct_state=-new+est-rel-inv+trk,ct_mark=0x1/0x1 actions=ct_clear,resubmit(,17) table=214, n_packets=0, n_bytes=0, priority=62030,ct_state=-new-est+rel-inv+trk,ct_mark=0x1/0x1 actions=ct_clear,resubmit(,17) table=214, n_packets=0, n_bytes=0, priority=62030,ct_state=-trk actions=ct_clear,resubmit(,242) table=214, n_packets=0, n_bytes=0, priority=0 actions=drop table=215, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,214) table=216, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,214) table=217, n_packets=0, n_bytes=0, priority=0 actions=drop table=220, n_packets=0, n_bytes=0, priority=10,reg6=0x1f00,metadata=0x1/0x1 actions=drop table=220, n_packets=0, n_bytes=0, priority=10,reg6=0x1e00,metadata=0x1/0x1 actions=drop table=220, n_packets=18, n_bytes=1798, priority=9,reg6=0x400 actions=output:tunc279f9655c4 table=220, n_packets=99, n_bytes=10320, priority=9,reg6=0x600 actions=output:tun221c130bbdd table=220, n_packets=1, n_bytes=90, priority=9,reg6=0x800 actions=output:tunca481192bbe table=220, n_packets=2, n_bytes=180, priority=9,reg6=0xd00 actions=output:tund59175be953 * table=220, n_packets=300, n_bytes=34516, priority=9,reg6=0x1d00 actions=output:"tap9d9a457d-ee" * table=220, n_packets=170, n_bytes=18855, priority=9,reg6=0x1f00 actions=push_vlan:0x8100,set_field:5217->vlan_vid,output:"br-ex-patch" table=220, n_packets=0, n_bytes=0, priority=9,reg6=0x1e00 actions=output:"br-ex-patch" table=220, n_packets=197, n_bytes=17612, priority=9,reg6=0x2400 actions=output:"tapba96abec-9d" table=239, n_packets=0, n_bytes=0, priority=100,ip actions=ct_clear,goto_table:240 table=239, n_packets=0, n_bytes=0, priority=100,ipv6 actions=ct_clear,goto_table:240 table=239, n_packets=0, n_bytes=0, priority=0 actions=goto_table:240 table=240, n_packets=0, n_bytes=0, priority=61010,ip,dl_dst=ff:ff:ff:ff:ff:ff,nw_dst=255.255.255.255 actions=goto_table:241 table=240, n_packets=0, n_bytes=0, priority=61005,dl_dst=ff:ff:ff:ff:ff:ff actions=resubmit(,220) table=240, n_packets=0, n_bytes=0, priority=0 actions=drop table=241, n_packets=0, n_bytes=0, priority=100,icmp6 actions=write_metadata:0/0x2,goto_table:242 table=241, n_packets=0, n_bytes=0, priority=100,tcp actions=write_metadata:0/0x2,goto_table:242 table=241, n_packets=0, n_bytes=0, priority=100,tcp6 actions=write_metadata:0/0x2,goto_table:242 table=241, n_packets=0, n_bytes=0, priority=100,udp actions=write_metadata:0/0x2,goto_table:242 table=241, n_packets=0, n_bytes=0, priority=100,udp6 actions=write_metadata:0/0x2,goto_table:242 table=241, n_packets=0, n_bytes=0, priority=100,icmp actions=write_metadata:0/0x2,goto_table:242 table=241, n_packets=0, n_bytes=0, priority=0 actions=write_metadata:0x2/0x2,goto_table:244 table=242, n_packets=0, n_bytes=0, priority=0 actions=drop table=243, n_packets=0, n_bytes=0, priority=0 actions=goto_table:244 table=244, n_packets=0, n_bytes=0, priority=62030,ct_state=-trk actions=ct_clear,resubmit(,242) table=244, n_packets=0, n_bytes=0, priority=62030,ct_state=-new+est-rel-inv+trk,ct_mark=0x1/0x1 actions=ct_clear,resubmit(,220) table=244, n_packets=0, n_bytes=0, priority=62030,ct_state=-new-est+rel-inv+trk,ct_mark=0x1/0x1 actions=ct_clear,resubmit(,220) table=244, n_packets=0, n_bytes=0, priority=0 actions=drop table=245, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,244) table=246, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,244) table=247, n_packets=0, n_bytes=0, priority=0 actions=drop [heat-admin@controller-0 ~]$
After multiple attempts to reproduce this in equivalent environments have failed, we are currently working under the assumption that this is some sort of weird timing thing that occurs sometimes in CSIT. I've reduced the severity and the urgency of the bug accordingly.
Trying to reproduce this bug with increased log levels. Will create a temporary job in downstream tomorrow if staging won't be stable enough.
Waldek, I really need this reproduced with DEBUG logs. It does not reproduce in my development environment. In general, CSIT *must* be able to produce debug logs.
Josh, A recent d/s CSIT CI job is at https://url.corp.redhat.com/b96f3ee. You can download any of controller*.tar.gz and go to /var/log/containers/opendaylight to find multiple karaf.log files (too big to attach here in bzilla). Make your pick and ping me if need a hand.
And since we've enabled ODL to mount /var/log/opendaylight from within the container to the host (so we can capture these logs in CI) we're getting the DEBUG log entries from now on as part of karaf.logs so debugging should be way easier from now on.
I believe we've just hit it in https://url.corp.redhat.com/fc854b9 in downstream OSP CI puddle: 2018-08-16.1, opendaylight-8.3.0-3 see robot/karaf logs attached with today's date
Created attachment 1477208 [details] robot log 1of2
Created attachment 1477209 [details] robot log 2of2
Created attachment 1477210 [details] controller-0 karaf logs
Created attachment 1477211 [details] controller-1 karaf logs
Created attachment 1477212 [details] controller-2 karaf logs
This issue is no longer occurring. Closing.