Bug 1574725
| Summary: | [Deployment][CI] instance to instance connectivity failing in HA setup (CSIT) | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | jamo luhrsen <jluhrsen> | ||||||||||||||
| Component: | opendaylight | Assignee: | Josh Hershberg <jhershbe> | ||||||||||||||
| Status: | CLOSED NOTABUG | QA Contact: | Itzik Brown <itbrown> | ||||||||||||||
| Severity: | medium | Docs Contact: | |||||||||||||||
| Priority: | medium | ||||||||||||||||
| Version: | 13.0 (Queens) | CC: | aadam, jluhrsen, mkolesni, nyechiel, oblaut, tjamrisk, wznoinsk | ||||||||||||||
| Target Milestone: | z5 | Keywords: | Triaged, ZStream | ||||||||||||||
| Target Release: | 13.0 (Queens) | ||||||||||||||||
| Hardware: | Unspecified | ||||||||||||||||
| OS: | Unspecified | ||||||||||||||||
| Whiteboard: | Deployment | ||||||||||||||||
| Fixed In Version: | Doc Type: | Known Issue | |||||||||||||||
| Doc Text: |
When multiple VMs in the same subnet of a VLAN provider network are scheduled to two different Compute nodes, ARP between the VMs fails sporadically.
Since ARP packets between those VMs fails, there is essentially no networking between the two VMs.
|
Story Points: | --- | ||||||||||||||
| Clone Of: | Environment: | ||||||||||||||||
| Last Closed: | 2018-10-10 05:47:35 UTC | Type: | Bug | ||||||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||||||
| Documentation: | --- | CRM: | |||||||||||||||
| Verified Versions: | Category: | --- | |||||||||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||||
| Embargoed: | |||||||||||||||||
| Attachments: |
|
||||||||||||||||
|
Description
jamo luhrsen
2018-05-03 21:41:46 UTC
Please attachj the relevant logs to the BZ. Created attachment 1435513 [details]
nova, neutron, odl logs
logs that should be included in the .xz
$ tree -s
.
├── [ 4096] compute-0
│ └── [ 3889583] nova.log
├── [ 4096] compute-1
│ └── [ 4614407] nova.log
├── [ 4096] controller-0
│ ├── [ 1577605] neutron-dhcp.log
│ ├── [ 6972055] neutron.log
│ └── [ 23984158] odl.log
├── [ 4096] controller-1
│ ├── [ 1538144] neutron-dhcp.log
│ ├── [ 7109172] neutron.log
│ └── [ 11252364] odl.log
├── [ 4096] controller-2
│ ├── [ 1550031] neutron-dhcp.log
│ ├── [ 7395311] neutron.log
│ └── [ 17801402] odl.log
└── [ 2212092] logs.tar.xz
Per the following flows (the ones marked in *'s are the ones that would be hit) if one VM sends an arp for the other, the packet will never reach it. It seems the groups are programmed wrong. "mac-address":"fa:16:3e:68:76:0d", "uuid":"ba96abec-9da6-4c5d-9d5e-98976dcf85bd" src "mac-address":"fa:16:3e:de:98:13", "uuid":"9d9a457d-eef6-495c-b0f3-4437cd6abfa3" * table=0, n_packets=172, n_bytes=19023, priority=4,in_port="tap9d9a457d-ee",vlan_tci=0x0000/0x1fff actions=write_metadata:0x1d0000000000/0xffffff0000000001,goto_table:17 table=0, n_packets=381, n_bytes=42702, priority=4,in_port="br-ex-patch",vlan_tci=0x0000/0x1fff actions=write_metadata:0x1e0000000001/0xffffff0000000001,goto_table:17 table=0, n_packets=301, n_bytes=34610, priority=10,in_port="br-ex-patch",dl_vlan=1121 actions=pop_vlan,write_metadata:0x1f0000000001/0xffffff0000000001,goto_table:17 table=0, n_packets=121, n_bytes=12466, priority=4,in_port="tapba96abec-9d",vlan_tci=0x0000/0x1fff actions=write_metadata:0x240000000000/0xffffff0000000001,goto_table:17 table=0, n_packets=693, n_bytes=66080, priority=5,in_port=tunc279f9655c4 actions=write_metadata:0x40000000001/0xfffff0000000001,goto_table:36 table=0, n_packets=820, n_bytes=77796, priority=5,in_port=tun221c130bbdd actions=write_metadata:0x60000000001/0xfffff0000000001,goto_table:36 table=0, n_packets=662, n_bytes=62595, priority=5,in_port=tunca481192bbe actions=write_metadata:0x80000000001/0xfffff0000000001,goto_table:36 table=0, n_packets=660, n_bytes=62463, priority=5,in_port=tund59175be953 actions=write_metadata:0xd0000000001/0xfffff0000000001,goto_table:36 * table=17, n_packets=170, n_bytes=18855, priority=10,metadata=0x1d0000000000/0xffffff0000000000 actions=load:0x1d->NXM_NX_REG1[0..19],load:0x138a->NXM_NX_REG7[0..15],write_metadata:0xa0001d138a000000/0xfffffffffffffffe,goto_table:43 table=17, n_packets=300, n_bytes=34516, priority=10,metadata=0x1f0000000000/0xffffff0000000000 actions=load:0x1f->NXM_NX_REG1[0..19],load:0x138a->NXM_NX_REG7[0..15],write_metadata:0xa0001f138a000000/0xfffffffffffffffe,goto_table:43 table=17, n_packets=119, n_bytes=12298, priority=10,metadata=0x240000000000/0xffffff0000000000 actions=load:0x24->NXM_NX_REG1[0..19],load:0x138b->NXM_NX_REG7[0..15],write_metadata:0xa00024138b000000/0xfffffffffffffffe,goto_table:43 table=17, n_packets=0, n_bytes=0, priority=0,metadata=0x8000000000000000/0xf000000000000000 actions=write_metadata:0x9000000000000000/0xf000000000000000,goto_table:80 table=18, n_packets=0, n_bytes=0, priority=0 actions=goto_table:38 table=19, n_packets=0, n_bytes=0, priority=100,arp,arp_op=1 actions=resubmit(,17) table=19, n_packets=0, n_bytes=0, priority=100,arp,arp_op=2 actions=resubmit(,17) table=19, n_packets=156, n_bytes=17472, priority=0 actions=resubmit(,17) table=20, n_packets=0, n_bytes=0, priority=0 actions=goto_table:80 table=22, n_packets=0, n_bytes=0, priority=0 actions=CONTROLLER:65535 table=23, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,17) table=24, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,17) table=36, n_packets=197, n_bytes=17612, priority=5,tun_id=0x2c actions=write_metadata:0x138b000000/0xfffffffff000000,goto_table:51 table=38, n_packets=0, n_bytes=0, priority=5,tun_id=0x2c actions=write_metadata:0x138b000000/0xfffffffff000000,goto_table:51 * table=43, n_packets=50, n_bytes=2268, priority=100,arp,arp_op=1 actions=group:5500 [group_id=5500,type=all,bucket=actions=CONTROLLER:65535,bucket=actions=resubmit(,48),bucket=actions=resubmit(,81)] table=43, n_packets=15, n_bytes=642, priority=100,arp,arp_op=2 actions=CONTROLLER:65535,resubmit(,48) table=43, n_packets=680, n_bytes=80231, priority=0 actions=goto_table:48 table=45, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,17) * table=48, n_packets=745, n_bytes=83141, priority=0 actions=resubmit(,49),resubmit(,50) * table=50, n_packets=170, n_bytes=18855, priority=20,metadata=0x1d138a000000/0xfffffffff000000,dl_src=fa:16:3e:de:98:13 actions=goto_table:51 table=50, n_packets=119, n_bytes=12298, priority=20,metadata=0x24138b000000/0xfffffffff000000,dl_src=fa:16:3e:68:76:0d actions=goto_table:51 table=50, n_packets=139, n_bytes=18948, idle_timeout=300, send_flow_rem priority=20,metadata=0x1f138a000000/0xfffffffff000000,dl_src=fa:16:3e:1e:44:25 actions=goto_table:51 table=50, n_packets=32, n_bytes=2600, idle_timeout=300, send_flow_rem priority=20,metadata=0x1f138a000000/0xfffffffff000000,dl_src=fa:16:3e:bb:b5:60 actions=goto_table:51 table=50, n_packets=123, n_bytes=11726, idle_timeout=300, send_flow_rem priority=20,metadata=0x1f138a000000/0xfffffffff000000,dl_src=fa:16:3e:6e:7b:c5 actions=goto_table:51 table=50, n_packets=1, n_bytes=46, priority=10,reg4=0x1 actions=goto_table:51 table=50, n_packets=8, n_bytes=1550, priority=0 actions=CONTROLLER:65535,learn(table=49,hard_timeout=10,priority=0, cookie=0x8600000,NXM_OF_ETH_SRC[],NXM_NX_REG1[0..19],load:0x1->NXM_NX_REG4[0..7]),goto_table:51 table=51, n_packets=0, n_bytes=0, priority=20,metadata=0x1389000000/0xffff000000,dl_dst=f2:a9:ab:ed:db:72 actions=load:0x1800->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=0, n_bytes=0, priority=20,metadata=0x1389000000/0xffff000000,dl_dst=d2:10:06:09:9e:de actions=load:0x1800->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=0, n_bytes=0, priority=20,metadata=0x1389000000/0xffff000000,dl_dst=22:22:5c:7c:8d:bb actions=load:0x1800->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=0, n_bytes=0, priority=20,metadata=0x1389000000/0xffff000000,dl_dst=76:b9:e1:75:52:3a actions=load:0x1800->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=241, n_bytes=29570, priority=20,metadata=0x138a000000/0xffff000000,dl_dst=fa:16:3e:de:98:13 actions=load:0x1d00->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=0, n_bytes=0, priority=20,metadata=0x138a000000/0xffff000000,dl_dst=fa:16:3e:ed:7c:ca actions=drop table=51, n_packets=140, n_bytes=12938, priority=20,metadata=0x138b000000/0xffff000000,dl_dst=fa:16:3e:68:76:0d actions=load:0x2400->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=0, n_bytes=0, priority=20,metadata=0x138b000000/0xffff000000,dl_dst=fa:16:3e:87:11:4f actions=set_field:0x2c->tun_id,load:0xd00->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=0, n_bytes=0, priority=20,metadata=0x138b000000/0xffff000000,dl_dst=fa:16:3e:84:35:b8 actions=set_field:0x2c->tun_id,load:0x800->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=71, n_bytes=8437, priority=20,metadata=0x138a000000/0xffff000000,dl_dst=fa:16:3e:1e:44:25 actions=load:0x1f00->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=18, n_bytes=1798, priority=20,metadata=0x138a000000/0xffff000000,dl_dst=fa:16:3e:bb:b5:60 actions=load:0x1f00->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=79, n_bytes=8440, priority=20,metadata=0x138a000000/0xffff000000,dl_dst=fa:16:3e:6e:7b:c5 actions=load:0x1f00->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=18, n_bytes=1798, priority=20,metadata=0x138b000000/0xffff000000,dl_dst=fa:16:3e:c9:ce:f3 actions=set_field:0x2c->tun_id,load:0x600->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=18, n_bytes=1798, priority=20,metadata=0x138b000000/0xffff000000,dl_dst=fa:16:3e:5f:8a:e4 actions=set_field:0x2c->tun_id,load:0x400->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=81, n_bytes=8522, priority=20,metadata=0x138b000000/0xffff000000,dl_dst=fa:16:3e:89:b3:0f actions=set_field:0x2c->tun_id,load:0x600->NXM_NX_REG6[],resubmit(,220) table=51, n_packets=0, n_bytes=0, priority=15,dl_dst=01:80:c2:00:00:00/ff:ff:ff:ff:ff:f0 actions=drop * table=51, n_packets=276, n_bytes=27452, priority=0 actions=goto_table:52 * table=52, n_packets=2, n_bytes=180, priority=5,metadata=0x138a000000/0xffff000001 actions=write_actions(group:210004) [group_id=210004,type=all,bucket=actions=group:210003,bucket=actions=load:0x1f00->NXM_NX_REG6[],resubmit(,220)] table=52, n_packets=59, n_bytes=4946, priority=5,metadata=0x138a000001/0xffff000001 actions=write_actions(group:210003) [group_id=210003,type=all,bucket=actions=set_field:0x1d->tun_id,resubmit(,55)] table=52, n_packets=57, n_bytes=4674, priority=5,metadata=0x138b000001/0xffff000001 actions=write_actions(group:210005) table=52, n_packets=2, n_bytes=180, priority=5,metadata=0x138b000000/0xffff000001 actions=write_actions(group:210006) table=55, n_packets=2, n_bytes=180, priority=10,tun_id=0x1d,metadata=0x1d0000000000/0xfffff0000000000 actions=drop table=55, n_packets=0, n_bytes=0, priority=10,tun_id=0x1f,metadata=0x1f0000000000/0xfffff0000000000 actions=drop table=55, n_packets=2, n_bytes=180, priority=10,tun_id=0x24,metadata=0x240000000000/0xfffff0000000000 actions=drop * table=55, n_packets=59, n_bytes=4946, priority=9,tun_id=0x1d actions=load:0x1d00->NXM_NX_REG6[],resubmit(,220) table=55, n_packets=0, n_bytes=0, priority=9,tun_id=0x1f actions=load:0x1f00->NXM_NX_REG6[],resubmit(,220) table=55, n_packets=57, n_bytes=4674, priority=9,tun_id=0x24 actions=load:0x2400->NXM_NX_REG6[],resubmit(,220) table=60, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,17) table=80, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,17) table=81, n_packets=50, n_bytes=2268, priority=0 actions=drop table=90, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,17) table=210, n_packets=0, n_bytes=0, priority=63009,arp actions=drop table=210, n_packets=0, n_bytes=0, priority=61009,ipv6 actions=drop table=210, n_packets=0, n_bytes=0, priority=61009,ip actions=drop table=210, n_packets=0, n_bytes=0, priority=0 actions=drop table=211, n_packets=0, n_bytes=0, priority=100,tcp actions=write_metadata:0/0x2,goto_table:212 table=211, n_packets=0, n_bytes=0, priority=100,udp actions=write_metadata:0/0x2,goto_table:212 table=211, n_packets=0, n_bytes=0, priority=100,icmp6 actions=write_metadata:0/0x2,goto_table:212 table=211, n_packets=0, n_bytes=0, priority=100,tcp6 actions=write_metadata:0/0x2,goto_table:212 table=211, n_packets=0, n_bytes=0, priority=100,udp6 actions=write_metadata:0/0x2,goto_table:212 table=211, n_packets=0, n_bytes=0, priority=100,icmp actions=write_metadata:0/0x2,goto_table:212 table=211, n_packets=0, n_bytes=0, priority=0 actions=write_metadata:0x2/0x2,goto_table:214 table=212, n_packets=0, n_bytes=0, priority=0 actions=drop table=213, n_packets=0, n_bytes=0, priority=0 actions=goto_table:214 table=214, n_packets=0, n_bytes=0, priority=62030,ct_state=-new+est-rel-inv+trk,ct_mark=0x1/0x1 actions=ct_clear,resubmit(,17) table=214, n_packets=0, n_bytes=0, priority=62030,ct_state=-new-est+rel-inv+trk,ct_mark=0x1/0x1 actions=ct_clear,resubmit(,17) table=214, n_packets=0, n_bytes=0, priority=62030,ct_state=-trk actions=ct_clear,resubmit(,242) table=214, n_packets=0, n_bytes=0, priority=0 actions=drop table=215, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,214) table=216, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,214) table=217, n_packets=0, n_bytes=0, priority=0 actions=drop table=220, n_packets=0, n_bytes=0, priority=10,reg6=0x1f00,metadata=0x1/0x1 actions=drop table=220, n_packets=0, n_bytes=0, priority=10,reg6=0x1e00,metadata=0x1/0x1 actions=drop table=220, n_packets=18, n_bytes=1798, priority=9,reg6=0x400 actions=output:tunc279f9655c4 table=220, n_packets=99, n_bytes=10320, priority=9,reg6=0x600 actions=output:tun221c130bbdd table=220, n_packets=1, n_bytes=90, priority=9,reg6=0x800 actions=output:tunca481192bbe table=220, n_packets=2, n_bytes=180, priority=9,reg6=0xd00 actions=output:tund59175be953 * table=220, n_packets=300, n_bytes=34516, priority=9,reg6=0x1d00 actions=output:"tap9d9a457d-ee" * table=220, n_packets=170, n_bytes=18855, priority=9,reg6=0x1f00 actions=push_vlan:0x8100,set_field:5217->vlan_vid,output:"br-ex-patch" table=220, n_packets=0, n_bytes=0, priority=9,reg6=0x1e00 actions=output:"br-ex-patch" table=220, n_packets=197, n_bytes=17612, priority=9,reg6=0x2400 actions=output:"tapba96abec-9d" table=239, n_packets=0, n_bytes=0, priority=100,ip actions=ct_clear,goto_table:240 table=239, n_packets=0, n_bytes=0, priority=100,ipv6 actions=ct_clear,goto_table:240 table=239, n_packets=0, n_bytes=0, priority=0 actions=goto_table:240 table=240, n_packets=0, n_bytes=0, priority=61010,ip,dl_dst=ff:ff:ff:ff:ff:ff,nw_dst=255.255.255.255 actions=goto_table:241 table=240, n_packets=0, n_bytes=0, priority=61005,dl_dst=ff:ff:ff:ff:ff:ff actions=resubmit(,220) table=240, n_packets=0, n_bytes=0, priority=0 actions=drop table=241, n_packets=0, n_bytes=0, priority=100,icmp6 actions=write_metadata:0/0x2,goto_table:242 table=241, n_packets=0, n_bytes=0, priority=100,tcp actions=write_metadata:0/0x2,goto_table:242 table=241, n_packets=0, n_bytes=0, priority=100,tcp6 actions=write_metadata:0/0x2,goto_table:242 table=241, n_packets=0, n_bytes=0, priority=100,udp actions=write_metadata:0/0x2,goto_table:242 table=241, n_packets=0, n_bytes=0, priority=100,udp6 actions=write_metadata:0/0x2,goto_table:242 table=241, n_packets=0, n_bytes=0, priority=100,icmp actions=write_metadata:0/0x2,goto_table:242 table=241, n_packets=0, n_bytes=0, priority=0 actions=write_metadata:0x2/0x2,goto_table:244 table=242, n_packets=0, n_bytes=0, priority=0 actions=drop table=243, n_packets=0, n_bytes=0, priority=0 actions=goto_table:244 table=244, n_packets=0, n_bytes=0, priority=62030,ct_state=-trk actions=ct_clear,resubmit(,242) table=244, n_packets=0, n_bytes=0, priority=62030,ct_state=-new+est-rel-inv+trk,ct_mark=0x1/0x1 actions=ct_clear,resubmit(,220) table=244, n_packets=0, n_bytes=0, priority=62030,ct_state=-new-est+rel-inv+trk,ct_mark=0x1/0x1 actions=ct_clear,resubmit(,220) table=244, n_packets=0, n_bytes=0, priority=0 actions=drop table=245, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,244) table=246, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,244) table=247, n_packets=0, n_bytes=0, priority=0 actions=drop [heat-admin@controller-0 ~]$ After multiple attempts to reproduce this in equivalent environments have failed, we are currently working under the assumption that this is some sort of weird timing thing that occurs sometimes in CSIT. I've reduced the severity and the urgency of the bug accordingly. Trying to reproduce this bug with increased log levels. Will create a temporary job in downstream tomorrow if staging won't be stable enough. Waldek, I really need this reproduced with DEBUG logs. It does not reproduce in my development environment. In general, CSIT *must* be able to produce debug logs. Josh, A recent d/s CSIT CI job is at https://url.corp.redhat.com/b96f3ee. You can download any of controller*.tar.gz and go to /var/log/containers/opendaylight to find multiple karaf.log files (too big to attach here in bzilla). Make your pick and ping me if need a hand. And since we've enabled ODL to mount /var/log/opendaylight from within the container to the host (so we can capture these logs in CI) we're getting the DEBUG log entries from now on as part of karaf.logs so debugging should be way easier from now on. I believe we've just hit it in https://url.corp.redhat.com/fc854b9 in downstream OSP CI puddle: 2018-08-16.1, opendaylight-8.3.0-3 see robot/karaf logs attached with today's date Created attachment 1477208 [details]
robot log 1of2
Created attachment 1477209 [details]
robot log 2of2
Created attachment 1477210 [details]
controller-0 karaf logs
Created attachment 1477211 [details]
controller-1 karaf logs
Created attachment 1477212 [details]
controller-2 karaf logs
This issue is no longer occurring. Closing. |