+++ This bug was initially created as a clone of Bug #1974061 +++

Description of problem:
An instance is not able to get metadata on creation, so SSH to the instance does not work.

Version-Release number of selected component (if applicable):
RHOS-16.2-RHEL-8-20210610.n.1

How reproducible:
Happens very often, mainly on SR-IOV environments.

Steps to Reproduce:
1. Deploy an SR-IOV environment; make sure that an external network exists.
2. Create a security group that allows ICMP and SSH, and a keypair.
3. Create a new network, create a router, and connect the internal network to the external one through the router.
4. Launch a VM connected to the internal network.
5. Create a FIP and attach it to the VM's port.
6. Try to ping the VM's FIP.
   Result: ping works - OK
7. Try to SSH to the VM.
   Result: access using SSH fails - NOK (BUG)

Actual results:
The metadata service is not accessible from the VM, so the SSH key cannot be obtained. It is not possible to connect to the VM using SSH.

Expected results:
The metadata service is accessible from the VM, and it is possible to connect to the VM using SSH.

Additional info:
Run `openstack console log show <VM UUID>`. It can be seen that the VM is not able to access metadata:

[   35.102236] cloud-init[797]: 2021-06-18 18:11:07,494 - util.py[WARNING]: No active metadata service found

Connect to the compute node where the VM is running and try to ping the VM from the metadata namespace:

  sudo ip net exec ovnmeta-<DATAPATH UUID> ping 192.168.2.225

Result: no replies.

Try to trace a packet from the VM to the metadata port:

[heat-admin@computesriov-1 ~]$ sudo ovs-appctl ofproto/trace br-int in_port=493,dl_src=fa:16:3e:35:7c:7d,dl_dst=fa:16:3e:52:37:e5
Flow: in_port=493,vlan_tci=0x0000,dl_src=fa:16:3e:35:7c:7d,dl_dst=fa:16:3e:52:37:e5,dl_type=0x0000

bridge("br-int")
----------------
 0. in_port=493, priority 100, cookie 0xcee76ab7
    set_field:0xd->reg13
    set_field:0xf->reg11
    set_field:0xe->reg12
    set_field:0x4->metadata
    set_field:0x3->reg14
    resubmit(,8)
 8. reg14=0x3,metadata=0x4,dl_src=fa:16:3e:35:7c:7d, priority 50, cookie 0x278e23bd
    resubmit(,9)
 9. metadata=0x4, priority 0, cookie 0x773414cc
    resubmit(,10)
10. metadata=0x4, priority 0, cookie 0x6c0ff7a9
    resubmit(,11)
11. metadata=0x4, priority 0, cookie 0x781aaddf
    resubmit(,12)
12. metadata=0x4, priority 0, cookie 0xf29f0a11
    resubmit(,13)
13. metadata=0x4, priority 0, cookie 0x29be1854
    resubmit(,14)
14. metadata=0x4, priority 0, cookie 0xe29979c2
    resubmit(,15)
15. metadata=0x4, priority 0, cookie 0x1ba260d6
    resubmit(,16)
16. ct_state=-trk,metadata=0x4, priority 5, cookie 0x3cd9fb46
    set_field:0x100000000000000000000000000/0x100000000000000000000000000->xxreg0
    set_field:0x200000000000000000000000000/0x200000000000000000000000000->xxreg0
    resubmit(,17)
17. metadata=0x4, priority 0, cookie 0xe50c1ff9
    resubmit(,18)
18. metadata=0x4, priority 0, cookie 0x32efad62
    resubmit(,19)
19. metadata=0x4, priority 0, cookie 0x12f09e82
    resubmit(,20)
20. metadata=0x4, priority 0, cookie 0x5c1d64dd
    resubmit(,21)
21. metadata=0x4, priority 0, cookie 0x7c74f24b
    resubmit(,22)
22. metadata=0x4, priority 0, cookie 0xbdcdc10b
    resubmit(,23)
23. metadata=0x4, priority 0, cookie 0x71ce481b
    resubmit(,24)
24. metadata=0x4, priority 0, cookie 0x9c0631be
    resubmit(,25)
25. metadata=0x4, priority 0, cookie 0x3337a67f
    resubmit(,26)
26. metadata=0x4, priority 0, cookie 0xe80589b0
    resubmit(,27)
27. metadata=0x4, priority 0, cookie 0xfceb4a8d
    resubmit(,28)
28. metadata=0x4, priority 0, cookie 0x3d2bd176
    resubmit(,29)
29. metadata=0x4, priority 0, cookie 0xa31797e7
    resubmit(,30)
30. metadata=0x4, priority 0, cookie 0xd33564d0
    resubmit(,31)
31. metadata=0x4,dl_dst=fa:16:3e:52:37:e5, priority 50, cookie 0xdbb8b7c4
    set_field:0x2->reg15
    resubmit(,37)
37. priority 0
    resubmit(,38)
38. reg15=0x2,metadata=0x4, priority 100, cookie 0xe2f53bb7
    set_field:0x1->reg15
    resubmit(,38)
38. reg15=0x1,metadata=0x4, priority 100
    set_field:0x10->reg13
    set_field:0xf->reg11
    set_field:0xe->reg12
    resubmit(,39)
39. priority 0
    set_field:0->reg0
    set_field:0->reg1
    set_field:0->reg2
    set_field:0->reg3
    set_field:0->reg4
    set_field:0->reg5
    set_field:0->reg6
    set_field:0->reg7
    set_field:0->reg8
    set_field:0->reg9
    resubmit(,40)
40. metadata=0x4, priority 0, cookie 0x2d2084a7
    resubmit(,41)
41. metadata=0x4, priority 0, cookie 0x9a0d473
    resubmit(,42)
42. metadata=0x4, priority 0, cookie 0xa37266fe
    resubmit(,43)
43. metadata=0x4, priority 0, cookie 0xbf5498f8
    resubmit(,44)
44. ct_state=-trk,metadata=0x4, priority 5, cookie 0x3af4b3cc
    set_field:0x100000000000000000000000000/0x100000000000000000000000000->xxreg0
    set_field:0x200000000000000000000000000/0x200000000000000000000000000->xxreg0
    resubmit(,45)
45. metadata=0x4, priority 0, cookie 0xa775f992
    resubmit(,46)
46. metadata=0x4, priority 0, cookie 0x76593a3
    resubmit(,47)
47. metadata=0x4, priority 0, cookie 0xb0395be2
    resubmit(,48)
48. metadata=0x4, priority 0, cookie 0x1ac6c088
    resubmit(,49)
49. metadata=0x4, priority 0, cookie 0x50392d97
    resubmit(,50)
50. reg15=0x1,metadata=0x4, priority 50, cookie 0x9de154a2
    resubmit(,64)
64. priority 0
    resubmit(,65)
65. reg15=0x1,metadata=0x4, priority 100, cookie 0x149767bd
    push_vlan:0x8100
    set_field:4418->vlan_vid
    output:487

    bridge("br-ext-int")
    --------------------
     0. priority 0
        NORMAL
         -> no learned MAC for destination, flooding

        bridge("br-int")
        ----------------
         0. No match.
            drop

        bridge("br-int")
        ----------------
         0. No match.
            drop
    pop_vlan

Final flow: reg0=0x300,reg11=0xf,reg12=0xe,reg13=0x10,reg14=0x3,reg15=0x1,metadata=0x4,in_port=493,vlan_tci=0x0000,dl_src=fa:16:3e:35:7c:7d,dl_dst=fa:16:3e:52:37:e5,dl_type=0x0000
Megaflow: recirc_id=0,ct_state=-new-est-rel-rpl-inv-trk,ct_label=0/0x1,eth,in_port=493,dl_src=fa:16:3e:35:7c:7d,dl_dst=fa:16:3e:52:37:e5,dl_type=0x0000
Datapath actions: push_vlan(vid=322,pcp=0),1,3

Note: if I launch a VM on the same network with '--config-drive True', i.e. like this:

openstack server create --flavor rhel-flavor --security-group overcloud_sg --image rhel-8 --nic net-id=internal_A vm2 --key-name test-key --config-drive True

it succeeds in getting metadata. After this, all new VMs on the same network are able to access the metadata service on this network.
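For quick triage of affected guests, the cloud-init warning quoted from the console log above is a usable failure signature. A minimal sketch, assuming a saved copy of the `openstack console log show` output; the helper name and file names here are hypothetical:

```shell
#!/bin/sh
# Sketch: detect the cloud-init metadata failure seen in this report.
# metadata_failed() returns success if the saved console log contains
# the warning cloud-init emits when no metadata service is reachable.
metadata_failed() {
    grep -q 'No active metadata service found' "$1"
}

# Demo against a sample log line taken from the report above:
printf '%s\n' \
  '2021-06-18 18:11:07,494 - util.py[WARNING]: No active metadata service found' \
  > sample-console.log

if metadata_failed sample-console.log; then
    echo "metadata unreachable"
fi
```

This only inspects a saved log; it does not replace checking reachability from inside the guest.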
Can you get this triaged so we can give the Blocker+ when approved?
(In reply to spower from comment #2)
> Can you get this triaged so we can give the Blocker+ when approved?

+NEEDINFO on owner.
Upstream backport for 21.06: https://patchwork.ozlabs.org/project/ovn/patch/20210715020704.2622538-1-ihrachys@redhat.com/
Tested with the following script:

systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:1.1.172.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=1.1.172.25
systemctl restart ovn-controller
ovs-vsctl add-br br-phys
ip link set br-phys up
ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
ovn-nbctl ls-add ls
ovn-nbctl --wait=sb ha-chassis-group-add hagrp
ovn-nbctl --wait=sb ha-chassis-group-add-chassis hagrp hv1 10
ovn-nbctl lsp-add ls lext
ovn-nbctl lsp-set-addresses lext "00:00:00:00:00:04 10.0.0.4 2001::4"
ovn-nbctl lsp-set-type lext external
hagrp_uuid=`ovn-nbctl --bare --columns _uuid find ha_chassis_group name=hagrp`
ovn-nbctl set logical_switch_port lext ha_chassis_group=$hagrp_uuid
ovn-nbctl lsp-add ls lp \
    -- lsp-set-type lp localport \
    -- lsp-set-addresses lp "00:00:00:00:00:01 10.0.0.1 2001::1" \
    -- lsp-add ls lsp \
    -- lsp-set-addresses lsp "00:00:00:00:00:02 10.0.0.2 2001::2"
ovn-nbctl lsp-add ls lext2
ovn-nbctl lsp-set-addresses lext2 "00:00:00:00:00:10 10.0.0.10 2001::10"
ovn-nbctl lsp-set-type lext2 external
ovn-nbctl set logical_switch_port lext2 ha_chassis_group=$hagrp_uuid
ovn-nbctl --wait=hv sync
ovn-nbctl lsp-add ls lext-deleted
ovn-nbctl lsp-set-addresses lext-deleted "00:00:00:00:00:03 10.0.0.3 2001::3"
ovn-nbctl lsp-set-type lext-deleted external
ovn-nbctl set logical_switch_port lext-deleted ha_chassis_group=$hagrp_uuid
ovn-nbctl --wait=hv sync
ovn-nbctl lsp-del lext-deleted
ovn-nbctl --wait=hv sync
ovs-vsctl add-port br-int lp -- set interface lp type=internal external_ids:iface-id=lp
ip netns add lp
ip link set lp netns lp
ip netns exec lp ip link set lp address 00:00:00:00:00:01
ip netns exec lp ip link set lp up
ip netns exec lp ip addr add 10.0.0.1/24 dev lp
ip netns exec lp ip addr add 2001::1/64 dev lp
ovn-nbctl --wait=hv sync
ovs-vsctl add-port br-int lsp -- set interface lsp type=internal external_ids:iface-id=lsp options:tx_pcap=lsp.pcap options:rxq_pcap=lsp-rx.pcap
ip netns add lsp
ip link set lsp netns lsp
ip netns exec lsp ip link set lsp address 00:00:00:00:00:02
ip netns exec lsp ip link set lsp up
ip netns exec lsp ip addr add 10.0.0.2/24 dev lsp
ip netns exec lsp ip addr add 2001::2/64 dev lsp
ip netns exec lsp tcpdump -i lsp -w lsp.pcap &
ovs-vsctl add-port br-phys ext1 -- set interface ext1 type=internal
ip netns add ext1
ip link set ext1 netns ext1
ip netns exec ext1 ip link set ext1 up
ip netns exec ext1 ip addr add 10.0.0.101/24 dev ext1
ip netns exec ext1 ip addr add 2001::101/64 dev ext1
ip netns exec ext1 tcpdump -i ext1 -w ext1.pcap &
sleep 2
ovn-nbctl lsp-add ls ln \
    -- lsp-set-type ln localnet \
    -- lsp-set-addresses ln unknown \
    -- lsp-set-options ln network_name=phys
ip netns exec lp ip neigh add 10.0.0.4 lladdr 00:00:00:00:00:04 dev lp
ip netns exec lp ip -6 neigh add 2001::4 lladdr 00:00:00:00:00:04 dev lp
ip netns exec lp ip neigh add 10.0.0.10 lladdr 00:00:00:00:00:10 dev lp
ip netns exec lp ip -6 neigh add 2001::10 lladdr 00:00:00:00:00:10 dev lp
ip netns exec lp ping 10.0.0.4 -c 1 -w 1 -W 1
ip netns exec lp ping 10.0.0.10 -c 1 -w 1 -W 1
ip netns exec lp ping6 2001::4 -c 1 -w 1 -W 1
ip netns exec lp ping6 2001::10 -c 1 -w 1 -W 1
sleep 1
pkill tcpdump
sleep 1
tcpdump -r ext1.pcap -nnle

Reproduced on ovn-2021-21.06.0-4:

[root@wsfd-advnetlab16 4]# tcpdump -r ext1.pcap -nnle host 10.0.0.4 or host 10.0.0.10 or host 2001::4 or host 2001::10
reading from file ext1.pcap, link-type EN10MB (Ethernet)
dropped privs to tcpdump
<=== no packets on localnet

Verified on ovn-2021-21.06.0-12:

[root@wsfd-advnetlab16 bz1974062]# tcpdump -r ext1.pcap -nnle host 10.0.0.4 or host 10.0.0.10 or host 2001::4 or host 2001::10
reading from file ext1.pcap, link-type EN10MB (Ethernet)
dropped privs to tcpdump
03:30:47.849584 00:00:00:00:00:01 > 00:00:00:00:00:04, ethertype IPv4 (0x0800), length 98: 10.0.0.1 > 10.0.0.4: ICMP echo request, id 25014, seq 1, length 64
03:30:48.870469 00:00:00:00:00:01 > 00:00:00:00:00:10, ethertype IPv4 (0x0800), length 98: 10.0.0.1 > 10.0.0.10: ICMP echo request, id 25015, seq 1, length 64
03:30:49.902329 00:00:00:00:00:01 > 00:00:00:00:00:04, ethertype IPv6 (0x86dd), length 118: 2001::1 > 2001::4: ICMP6, echo request, seq 1, length 64
03:30:50.930259 00:00:00:00:00:01 > 00:00:00:00:00:10, ethertype IPv6 (0x86dd), length 118: 2001::1 > 2001::10: ICMP6, echo request, seq 1, length 64
<=== packets sent on localnet
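The reproduce/verify decision used here boils down to whether any echo requests show up in the localnet capture. A small sketch of that criterion (the function and file names are hypothetical), operating on the text output of the tcpdump commands shown:

```shell
#!/bin/sh
# Sketch: classify a capture as "reproduced" (no echo requests reached
# the localnet port) or "fixed", given the text output of
# `tcpdump -r ext1.pcap -nnle host 10.0.0.4 or ...` saved to a file.
# Matches both IPv4 ("ICMP echo request") and IPv6 ("ICMP6, echo request").
classify_capture() {
    if grep -q 'echo request' "$1"; then
        echo "fixed: packets sent on localnet"
    else
        echo "reproduced: no packets on localnet"
    fi
}

# Demo with the two outcomes seen in the comments above:
printf 'reading from file ext1.pcap, link-type EN10MB (Ethernet)\n' > broken.txt
cp broken.txt fixed.txt
printf '10.0.0.1 > 10.0.0.4: ICMP echo request, id 25014, seq 1, length 64\n' >> fixed.txt
classify_capture broken.txt
classify_capture fixed.txt
```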
Verified on ovn2.13-20.12.0-149.el8:

[root@dell-per740-12 bz1974062]# tcpdump -r ext1.pcap -nnle host 10.0.0.4 or host 10.0.0.10 or host 2001::4 or host 2001::10
reading from file ext1.pcap, link-type EN10MB (Ethernet)
dropped privs to tcpdump
00:12:52.027244 00:00:00:00:00:01 > 00:00:00:00:00:04, ethertype IPv4 (0x0800), length 98: 10.0.0.1 > 10.0.0.4: ICMP echo request, id 25268, seq 1, length 64
00:12:53.055108 00:00:00:00:00:01 > 00:00:00:00:00:10, ethertype IPv4 (0x0800), length 98: 10.0.0.1 > 10.0.0.10: ICMP echo request, id 25269, seq 1, length 64
00:12:54.078155 00:00:00:00:00:01 > 00:00:00:00:00:04, ethertype IPv6 (0x86dd), length 118: 2001::1 > 2001::4: ICMP6, echo request, seq 1, length 64
00:12:55.105044 00:00:00:00:00:01 > 00:00:00:00:00:10, ethertype IPv6 (0x86dd), length 118: 2001::1 > 2001::10: ICMP6, echo request, seq 1, length 64
[root@dell-per740-12 bz1974062]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
ovn2.13-central-20.12.0-149.el8fdp.x86_64
ovn2.13-20.12.0-149.el8fdp.x86_64
openvswitch2.13-2.13.0-117.el8fdp.x86_64
ovn2.13-host-20.12.0-149.el8fdp.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2971