Description of problem: We are seeing this in the first composes of OSP-16.1 with RHEL 8.2. Introspection is failing because the Ironic inspector dnsmasq is not receiving DHCP packets. Running "tcpdump -i br-ctlplane" shows DHCP Requests (or DHCPv6 Solicits) but these aren't being received - we see no logging from the inspector dnsmasq nor packets in iptables (below). We do see dhcp traffic received for the neutron dnsmasq server and DHCP responses (below). In RHEL 8.2 only the dhcp server on tap89231cd5-8c see the incoming traffic, both DHCPv4 and DHCPv6 affected. Prior to RHEL 8.2 both dhcp servers would see the DHCP requests. I've tried the previous version of dnsmasq prior to the last set of changes - dnsmasq-2.79-6 and there was no difference. There hasn't been any recent inspector changes. Doesn't look like any dnsmasq errors in container. /var/log/messages Mar 17 15:05:37 hardprov-dl360-g9-01 dnsmasq-dhcp[68038]: DHCPADVERTISE(tap89231cd5-8c) fe32:dead:beef::23 00:04:44:45:4c:4c:54:00:10:47:80:4c:b3:c0:4f:58:4b:32 host-fe32-dead-beef--23 Mar 17 15:05:42 hardprov-dl360-g9-01 dnsmasq-dhcp[68038]: DHCPREQUEST(tap89231cd5-8c) 00:04:44:45:4c:4c:54:00:10:47:80:4c:b3:c0:4f:58:4b:32 Mar 17 15:05:42 hardprov-dl360-g9-01 dnsmasq-dhcp[68038]: DHCPREPLY(tap89231cd5-8c) fe32:dead:beef::23 00:04:44:45:4c:4c:54:00:10:47:80:4c:b3:c0:4f:58:4b:32 host-fe32-dead-beef--23 Mar 17 15:05:43 hardprov-dl360-g9-01 dnsmasq-dhcp[68038]: DHCPRELEASE(tap89231cd5-8c) 00:04:44:45:4c:4c:54:00:10:47:80:4c:b3:c0:4f:58:4b:32 Mar 17 15:06:05 hardprov-dl360-g9-01 dnsmasq-dhcp[68038]: DHCPSOLICIT(tap89231cd5-8c) 00:04:60:e3:41:67:0f:57:da:23:9d:4b:86:9e:44:43:6a:30 Mar 17 15:06:05 hardprov-dl360-g9-01 dnsmasq-dhcp[68038]: DHCPREPLY(tap89231cd5-8c) fe32:dead:beef::23 00:04:60:e3:41:67:0f:57:da:23:9d:4b:86:9e:44:43:6a:30 host-fe32-dead-beef--23 Mar 17 15:08:35 hardprov-dl360-g9-01 dnsmasq-dhcp[68038]: DHCPSOLICIT(tap89231cd5-8c) 00:04:fe:56:ed:68:f1:26:d6:21:0f:16:76:4a:eb:48:d3:7f There is a new version of OVS - 2.13.0 and its possible that this is causing problems.
Info from Harald: [root@hardprov-dl360-g9-01 ~]# ovs-appctl bridge/dump-flows br-ctlplane duration=56995s, n_packets=9544, n_bytes=1297865, priority=4,in_port=2,dl_vlan=1,actions=pop_vlan,NORMAL duration=57125s, n_packets=12, n_bytes=1244, priority=2,in_port=2,actions=drop duration=57125s, n_packets=511008, n_bytes=40663885, priority=0,actions=NORMAL table_id=254, duration=60370s, n_packets=0, n_bytes=0, priority=2,recirc_id=0,actions=drop table_id=254, duration=60370s, n_packets=0, n_bytes=0, priority=0,reg0=0x1,actions=controller(reason=) table_id=254, duration=60370s, n_packets=0, n_bytes=0, priority=0,reg0=0x2,actions=drop table_id=254, duration=60370s, n_packets=0, n_bytes=0, priority=0,reg0=0x3,actions=drop [root@hardprov-dl360-g9-01 ~]# ovs-appctl dpctl/dump-flows recirc_id(0),in_port(2),eth(src=a0:36:9f:f4:7f:c2,dst=ff:ff:ff:ff:ff:ff),eth_type(0x8100),vlan(vid=120),encap(eth_type(0x0800),ipv4(frag=no)), packets:4, bytes:1316, used:7.369s, actions:1 recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=120),encap(), packets:28602, bytes:1944936, used:0.481s, actions:drop recirc_id(0),in_port(5),eth(src=fa:16:3e:ca:51:e8,dst=33:33:00:00:00:01),eth_type(0x86dd),ipv6(frag=no), packets:0, bytes:0, used:never, actions:push_vlan(vid=1,pcp=0),3,pop_vlan,1,2,4 recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=112),encap(), packets:28602, bytes:1944936, used:1.206s, actions:drop recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=105),encap(), packets:28602, bytes:1944936, used:0.482s, actions:drop recirc_id(0),in_port(2),eth(src=a0:36:9f:f4:80:94,dst=33:33:00:00:00:16),eth_type(0x86dd),ipv6(frag=no), packets:1, bytes:90, used:6.230s, actions:1,push_vlan(vid=1,pcp=0),3,pop_vlan,4,5 ^^ 33:33:00:00:00:16 is multicast, i.e this would action on DHCPv6 recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=117),encap(), packets:28602, bytes:1944936, used:0.204s, actions:drop recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=109),encap(), packets:28602, bytes:1944936, used:0.479s, actions:drop recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=101),encap(), packets:28602, bytes:1944936, used:1.143s, actions:drop recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=116),encap(), packets:28601, bytes:1944868, used:1.484s, actions:drop recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=104),encap(), packets:28602, bytes:1944936, used:0.483s, actions:drop recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=113),encap(), packets:28602, bytes:1944936, used:0.480s, actions:drop recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0/0xffff), packets:28602, bytes:1830528, used:0.482s, actions:drop recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=121),encap(), packets:28601, bytes:1944868, used:1.483s, actions:drop recirc_id(0),in_port(2),eth(src=a0:36:9f:f4:80:94,dst=33:33:00:01:00:02),eth_type(0x86dd),ipv6(frag=no), packets:0, bytes:0, used:never, actions:1,push_vlan(vid=1,pcp=0),3,pop_vlan,4,5 recirc_id(0),in_port(1),eth(src=48:df:37:19:90:00,dst=33:33:00:00:00:12),eth_type(0x86dd),ipv6(frag=no), packets:114386, bytes:8922108, used:0.438s, actions:2,push_vlan(vid=1,pcp=0),3,pop_vlan,4,5 ^^ 33:33:00:00:00:16 is multicast, i.e this would action on DHCPv6 recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=108),encap(), packets:28602, bytes:1944936, used:0.483s, actions:drop ------ dhcpv6 debug script ---- import binascii import socket import struct INTERFACE = 'eno49' ALL_DHCP_RELAY_AGENTS_AND_SERVERS = 'ff02::1:2' ALL_DHCP_SERVERS = 'ff05::1:3' SERVER_UDP_PORT = 547 DHCP_MESSAGE_TYPES = { 1: 'SOLICIT', 2: 'ADVERTISE', 3: 'REQUEST', 4: 'CONFIRM', 5: 'RENEW', 6: 'REBIND', 7: 'REPLY', 8: 'RELEASE', 9: 'DECLINE', 10: 'RECONFIGURE', 11: 'INFORMATION-REQUEST', 12: 'RELAY-FORW', 13: 'RELAY-REPL', } RALAY_MESSAGE_TYPES = [ DHCP_MESSAGE_TYPES[12], DHCP_MESSAGE_TYPES[13], ] def decode_opts(raw_opts): opts = {} while len(raw_opts) > 0: opt = int(raw_opts[0:4], 16) opt_len = int(raw_opts[4:8], 16) * 2 opt_val = raw_opts[8:8 + opt_len] if opt not in opts: opts[opt] = [] opts[opt].append(opt_val) # Trim handled option before next iteration raw_opts = raw_opts[8 + opt_len:] return opts def decode_request(raw_bytes): transaction_id = raw_bytes[2:8] print("XID: {}".format(transaction_id)) opts = decode_opts(raw_bytes[8:]) print("OPTIONS: {}".format(opts)) return (transaction_id, opts) # Initialise socket for IPv6 datagrams sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, socket.IPPROTO_UDP) # Allows address to be reused sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) # Allow messages from this socket to loop back for development # sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_LOOP, True) # Set multicast interface ifi = socket.if_nametoindex(INTERFACE) ifis = struct.pack("I", ifi) sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_IF, ifis) # Binds to all interfaces on the given port sock.bind(('::', SERVER_UDP_PORT)) # Join ALL_DHCP_RELAY_AGENTS_AND_SERVERS multicast group mreq = socket.inet_pton( socket.AF_INET6, ALL_DHCP_RELAY_AGENTS_AND_SERVERS) + ifis sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP, mreq) # Join ALL_DHCP_SERVERS multicast group mreq = socket.inet_pton(socket.AF_INET6, ALL_DHCP_SERVERS) + ifis sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP, mreq) data, addr = sock.recvfrom(1024) raw_bytes = binascii.hexlify(data).decode() msg_type = int(raw_bytes[:2], 16) print("FROM: " + str(addr)) print("MESSAGE TYPE: {}".format(DHCP_MESSAGE_TYPES[msg_type])) transaction_id, options = decode_request(raw_bytes)
Adding my comments from the email thread as well: - It looks like the traffic is maybe being dropped by OVS, especially if it worked with RHEL 8.1. I'm not sure if it's possible to downgrade the kernel version to verify this. - Can you please try ovs-tcpdump in the br-ctlplane? Do you see the counters go up when the DHCP traffic comes in? - We're sure that iptables are not dropping the traffic before reaching OVS? As per Bob's comment it looks like it should be allowed. - IIUC you observe the same with DHCPv4 requests right? - I don't understand why you see the traffic with tcpdump but not with your debug script if that's using the physical interface. OVS is out of the picture here unless I'm missing something.
(In reply to Daniel Alvarez Sanchez from comment #2) > Adding my comments from the email thread as well: > > - It looks like the traffic is maybe being dropped by OVS, especially if it > worked with RHEL 8.1. I'm not sure if it's possible to downgrade the kernel > version to verify this. > Yes, this is probably the next step. > - Can you please try ovs-tcpdump in the br-ctlplane? Do you see the counters > go up when the DHCP traffic comes in? > TCPDUMP Args: -l -n -vv udp port 546 or udp port 547 tcpdump: listening on mibr-ctlplane, link-type EN10MB (Ethernet), capture size 262144 bytes 08:50:21.036767 IP6 (flowlabel 0xe47ca, hlim 1, next-header UDP (17) payload length: 82) fe80::a236:9fff:fef4:8094.dhcpv6-client > ff02::1:2.dhcpv6-server: [udp sum ok] dhcp6 solicit (xid=fb4962 (rapid-commit) (IA_NA IAID:1041239257 T1:0 T2:0) (Client-FQDN) (option-request DNS-server DNS-search-list NTP-server SNTP-servers) (client-ID type 4) (elapsed-time 1748)) 08:50:40.363790 IP6 (flowlabel 0xe47ca, hlim 1, next-header UDP (17) payload length: 82) fe80::a236:9fff:fef4:8094.dhcpv6-client > ff02::1:2.dhcpv6-server: [udp sum ok] dhcp6 solicit (xid=fb4962 (rapid-commit) (IA_NA IAID:1041239257 T1:0 T2:0) (Client-FQDN) (option-request DNS-server DNS-search-list NTP-server SNTP-servers) (client-ID type 4) (elapsed-time 3681)) 08:50:50.551561 IP6 (flowlabel 0xe47ca, hlim 1, next-header UDP (17) payload length: 82) fe80::a236:9fff:fef4:8094.dhcpv6-client > ff02::1:2.dhcpv6-server: [udp sum ok] dhcp6 solicit (xid=41cabf (rapid-commit) (IA_NA IAID:1041239257 T1:0 T2:0) (Client-FQDN) (option-request DNS-server DNS-search-list NTP-server SNTP-servers) (client-ID type 4) (elapsed-time 0)) 08:50:51.631641 IP6 (flowlabel 0xe47ca, hlim 1, next-header UDP (17) payload length: 82) fe80::a236:9fff:fef4:8094.dhcpv6-client > ff02::1:2.dhcpv6-server: [udp sum ok] dhcp6 solicit (xid=41cabf (rapid-commit) (IA_NA IAID:1041239257 T1:0 T2:0) (Client-FQDN) (option-request DNS-server DNS-search-list NTP-server SNTP-servers) (client-ID type 4) (elapsed-time 108)) 08:50:53.737796 IP6 (flowlabel 0xe47ca, hlim 1, next-header UDP (17) payload length: 82) fe80::a236:9fff:fef4:8094.dhcpv6-client > ff02::1:2.dhcpv6-server: [udp sum ok] dhcp6 solicit (xid=41cabf (rapid-commit) (IA_NA IAID:1041239257 T1:0 T2:0) (Client-FQDN) (option-request DNS-server DNS-search-list NTP-server SNTP-servers) (client-ID type 4) (elapsed-time 318)) 08:50:57.943818 IP6 (flowlabel 0xe47ca, hlim 1, next-header UDP (17) payload length: 82) fe80::a236:9fff:fef4:8094.dhcpv6-client > ff02::1:2.dhcpv6-server: [udp sum ok] dhcp6 solicit (xid=41cabf (rapid-commit) (IA_NA IAID:1041239257 T1:0 T2:0) (Client-FQDN) (option-request DNS-server DNS-search-list NTP-server SNTP-servers) (client-ID type 4) (elapsed-time 739)) 08:51:06.584717 IP6 (flowlabel 0xe47ca, hlim 1, next-header UDP (17) payload length: 82) fe80::a236:9fff:fef4:8094.dhcpv6-client > ff02::1:2.dhcpv6-server: [udp sum ok] dhcp6 solicit (xid=41cabf (rapid-commit) (IA_NA IAID:1041239257 T1:0 T2:0) (Client-FQDN) (option-request DNS-server DNS-search-list NTP-server SNTP-servers) (client-ID type 4) (elapsed-time 1603)) 08:51:23.407683 IP6 (flowlabel 0xe47ca, hlim 1, next-header UDP (17) payload length: 82) fe80::a236:9fff:fef4:8094.dhcpv6-client > ff02::1:2.dhcpv6-server: [udp sum ok] dhcp6 solicit (xid=41cabf (rapid-commit) (IA_NA IAID:1041239257 T1:0 T2:0) (Client-FQDN) (option-request DNS-server DNS-search-list NTP-server SNTP-servers) (client-ID type 4) (elapsed-time 3285)) The counter in this entry 'ovs-appctl dpctl/dump-flows' is going up. recirc_id(0),in_port(1),eth(src=48:df:37:19:90:00,dst=33:33:00:00:00:12),eth_type(0x86dd),ipv6(frag=no), packets:133964, bytes:10449192, used:0.687s, actions:6,2,push_vlan(vid=1,pcp=0),3,pop_vlan,4,5 > - We're sure that iptables are not dropping the traffic before reaching OVS? > As per Bob's comment it looks like it should be allowed. > I tried to stop ip6tables, 'systemctl stop ip6tables.service'. This should flush all tables and allow anything. The issue remains the same. > - IIUC you observe the same with DHCPv4 requests right? > Yes, this is not working for both DHCPv4 and DHCPv6. > - I don't understand why you see the traffic with tcpdump but not with your > debug script if that's using the physical interface. OVS is out of the > picture here unless I'm missing something. tcpdump is operating on a raw socket, bypassing a lot of the kernel network stack. My python debug code is using a UDP datagram socket, i.e not bypassing the kernel network stack. To be honest I tried that just to quickly see if listening on eno49 vs br-ctlplane made a difference. And also try without the podman container stuff.
(In reply to Harald Jensås from comment #3) > (In reply to Daniel Alvarez Sanchez from comment #2) > > Adding my comments from the email thread as well: > > > > - It looks like the traffic is maybe being dropped by OVS, especially if it > > worked with RHEL 8.1. I'm not sure if it's possible to downgrade the kernel > > version to verify this. > > > > Yes, this is probably the next step. > > > - Can you please try ovs-tcpdump in the br-ctlplane? Do you see the counters > > go up when the DHCP traffic comes in? > > > > TCPDUMP Args: -l -n -vv udp port 546 or udp port 547 > tcpdump: listening on mibr-ctlplane, link-type EN10MB (Ethernet), capture > size 262144 bytes > 08:50:21.036767 IP6 (flowlabel 0xe47ca, hlim 1, next-header UDP (17) payload > length: 82) fe80::a236:9fff:fef4:8094.dhcpv6-client > > ff02::1:2.dhcpv6-server: [udp sum ok] dhcp6 solicit (xid=fb4962 > (rapid-commit) (IA_NA IAID:1041239257 T1:0 T2:0) (Client-FQDN) > (option-request DNS-server DNS-search-list NTP-server SNTP-servers) > (client-ID type 4) (elapsed-time 1748)) > Looks like it's arriving to OVS right? Can you check also ovs-tcpdump in br-int? The goal is to determine whether the packet is dropped after the patch-port or before. Anyways seems like downgrading to 8.1 kernel might tell us more. At this point, I'd loop in the OVS team.
I will set up for OVS 2.11 and we can retest.
Marking this as a blocker as its affecting OSP-16.1 CI. Currently using $ sudo ovs-vsctl -V ovs-vsctl (Open vSwitch) 2.13.0 DB Schema 8.2.0 Will retry when compose with 2.11 is available.
We are waiting for test results with OVS 2.11 (to confirm if it is purely OVS issue or more generic), we should get some update here before EOW
Michele - which kernel and OSP version did you duplicate this in? Was it just ovs 2.11 with rhel 8.2/osp 16.1?
FWIW - I tried, but failed, to reproduce this without the full undercloud. Since this works, we may want to see about the firewall in the undercloud setup. RHOS_RELEASE_ARGS="16.1 -r 8.2 -p latest-RHOS_TRUNK-16-RHEL-8" dnf localinstall -y http://download.lab.bos.redhat.com/rcm-guest/puddles/OpenStack/rhos-release/rhos-release-latest.noarch.rpm dnf update -y rhos-release* rhos-release $RHOS_RELEASE_ARGS dnf update -y dnf install openvswitch vim-enhanced tcpdump tmux dnsmasq reboot [root@rhel8 ~]# rhos-release -L Installed repositories (rhel-8.2): 16.1 ceph-4 ceph-osd-4 rhel-8.2 [root@rhel8 ~]# uname -a Linux rhel8.lab.example.com 4.18.0-187.el8.x86_64 #1 SMP Sat Mar 7 03:42:33 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux [root@rhel8 ~]# ovs-vsctl --version ovs-vsctl (Open vSwitch) 2.11.0 DB Schema 7.16.1 systemct stop firewalld systemctl enable openvswitch systemctl start openvswitch ovs-vsctl add-br br-ctlplane ovs-vsctl add-port br-ctlplane enp2s0 ip addr add 172.20.0.1/24 dev br-ctlplane ip link set br-ctlplane up ovs-vsctl add-br br-int ovs-vsctl -- add-port br-ctlplane patch0 \ -- set interface patch0 type=patch options:peer=patch1 \ -- add-port br-int patch1 \ -- set interface patch1 type=patch options:peer=patch0 ovs-vsctl add-port br-int qdhcp -- set interface qdhcp type=internal ip netns add qdhcp ip link set qdhcp netns qdhcp ip netns exec qdhcp ip addr add 172.20.0.2/24 dev qdhcp ip netns exec qdhcp ip link set qdhcp up [root@rhel8 ~]# ovs-vsctl show 7b8be7ec-d12e-43eb-a9e7-5241c319ff25 Bridge br-int Port "patch1" Interface "patch1" type: patch options: {peer="patch0"} Port qdhcp Interface qdhcp type: internal Port br-int Interface br-int type: internal Bridge br-ctlplane Port br-ctlplane Interface br-ctlplane type: internal Port "enp2s0" Interface "enp2s0" Port "patch0" Interface "patch0" type: patch options: {peer="patch1"} ovs_version: "2.11.0" cat << EOF > dnsmasq.conf port=0 interface=br-ctlplane log-dhcp dhcp-range=172.20.0.100,172.20.0.110,255.255.255.0,10m EOF cat << EOF > qdhcp_dnsmasq.conf port=0 interface=qdhcp log-dhcp dhcp-range=172.20.0.0,static,255.255.255.0,10m EOF [root@rhel8 ~]# dnsmasq --no-daemon --conf-file=dnsmasq.conf | dnsmasq-dhcp: 2446226970 available DHCP range: 172.20.0.100 -- 172.20.0.110 | dnsmasq-dhcp: 2446226970 vendor class: PXEClient:Arch:00000:UNDI:002001 | dnsmasq-dhcp: 2446226970 user class: iPXE | dnsmasq-dhcp: 2446226970 DHCPREQUEST(br-ctlplane) 172.20.0.105 52:54:00:bc:c3:fd | dnsmasq-dhcp: 2446226970 tags: br-ctlplane | dnsmasq-dhcp: 2446226970 DHCPACK(br-ctlplane) 172.20.0.105 52:54:00:bc:c3:fd | dnsmasq-dhcp: 2446226970 requested options: 1:netmask, 3:router, 6:dns-server, 7:log-server, | dnsmasq-dhcp: 2446226970 requested options: 12:hostname, 15:domain-name, 17:root-path, | dnsmasq-dhcp: 2446226970 requested options: 26:mtu, 43:vendor-encap, 60:vendor-class, | dnsmasq-dhcp: 2446226970 requested options: 66:tftp-server, 67:bootfile-name, 119:domain-search, | dnsmasq-dhcp: 2446226970 requested options: 128, 129, 130, 131, 132, 133, 134, 135, 175, | dnsmasq-dhcp: 2446226970 requested options: 203 | dnsmasq-dhcp: 2446226970 next server: 172.20.0.1 | dnsmasq-dhcp: 2446226970 sent size: 1 option: 53 message-type 5 | dnsmasq-dhcp: 2446226970 sent size: 4 option: 54 server-identifier 172.20.0.1 | dnsmasq-dhcp: 2446226970 sent size: 4 option: 51 lease-time 10m | dnsmasq-dhcp: 2446226970 sent size: 4 option: 58 T1 5m | dnsmasq-dhcp: 2446226970 sent size: 4 option: 59 T2 8m45s | dnsmasq-dhcp: 2446226970 sent size: 4 option: 1 netmask 255.255.255.0 | dnsmasq-dhcp: 2446226970 sent size: 4 option: 28 broadcast 172.20.0.255 | dnsmasq-dhcp: 2446226970 sent size: 4 option: 3 router 172.20.0.1 [root@rhel8 ~]# ip netns exec qdhcp dnsmasq --no-daemon --conf-file=qdhcp_dnsmasq.conf | dnsmasq-dhcp: 2446226970 available DHCP subnet: 172.20.0.0/255.255.255.0 | dnsmasq-dhcp: 2446226970 vendor class: PXEClient:Arch:00000:UNDI:002001 | dnsmasq-dhcp: 2446226970 user class: iPXE | dnsmasq-dhcp: 2446226970 DHCPDISCOVER(qdhcp) 52:54:00:bc:c3:fd no address available cat << EOF > dnsmasq.conf port=0 interface=br-ctlplane log-dhcp dhcp-range=172.20.0.100,172.20.0.110,255.255.255.0,10m EOF cat << EOF > qdhcp_dnsmasq.conf port=0 interface=qdhcp log-dhcp dhcp-range=172.20.0.0,static,255.255.255.0,10m EOF [root@rhel8 ~]# dnsmasq --no-daemon --conf-file=dnsmasq.conf | dnsmasq-dhcp: 2446226970 available DHCP range: 172.20.0.100 -- 172.20.0.110 | dnsmasq-dhcp: 2446226970 vendor class: PXEClient:Arch:00000:UNDI:002001 | dnsmasq-dhcp: 2446226970 user class: iPXE | dnsmasq-dhcp: 2446226970 DHCPREQUEST(br-ctlplane) 172.20.0.105 52:54:00:bc:c3:fd | dnsmasq-dhcp: 2446226970 tags: br-ctlplane | dnsmasq-dhcp: 2446226970 DHCPACK(br-ctlplane) 172.20.0.105 52:54:00:bc:c3:fd | dnsmasq-dhcp: 2446226970 requested options: 1:netmask, 3:router, 6:dns-server, 7:log-server, | dnsmasq-dhcp: 2446226970 requested options: 12:hostname, 15:domain-name, 17:root-path, | dnsmasq-dhcp: 2446226970 requested options: 26:mtu, 43:vendor-encap, 60:vendor-class, | dnsmasq-dhcp: 2446226970 requested options: 66:tftp-server, 67:bootfile-name, 119:domain-search, | dnsmasq-dhcp: 2446226970 requested options: 128, 129, 130, 131, 132, 133, 134, 135, 175, | dnsmasq-dhcp: 2446226970 requested options: 203 | dnsmasq-dhcp: 2446226970 next server: 172.20.0.1 | dnsmasq-dhcp: 2446226970 sent size: 1 option: 53 message-type 5 | dnsmasq-dhcp: 2446226970 sent size: 4 option: 54 server-identifier 172.20.0.1 | dnsmasq-dhcp: 2446226970 sent size: 4 option: 51 lease-time 10m | dnsmasq-dhcp: 2446226970 sent size: 4 option: 58 T1 5m | dnsmasq-dhcp: 2446226970 sent size: 4 option: 59 T2 8m45s | dnsmasq-dhcp: 2446226970 sent size: 4 option: 1 netmask 255.255.255.0 | dnsmasq-dhcp: 2446226970 sent size: 4 option: 28 broadcast 172.20.0.255 | dnsmasq-dhcp: 2446226970 sent size: 4 option: 3 router 172.20.0.1 [root@rhel8 ~]# ip netns exec qdhcp dnsmasq --no-daemon --conf-file=qdhcp_dnsmasq.conf | dnsmasq-dhcp: 2446226970 available DHCP subnet: 172.20.0.0/255.255.255.0 | dnsmasq-dhcp: 2446226970 vendor class: PXEClient:Arch:00000:UNDI:002001 | dnsmasq-dhcp: 2446226970 user class: iPXE | dnsmasq-dhcp: 2446226970 DHCPDISCOVER(qdhcp) 52:54:00:bc:c3:fd no address available
I copied /etc/sysconfig/iptables from another 16 deployment onto my reproducer. The rule set: ---------------------- [root@site-undercloud-0 ~]# cat /etc/sysconfig/iptables # Generated by xtables-save v1.8.2 on Fri Mar 13 07:18:50 2020 *raw :PREROUTING ACCEPT [11296935:41611227599] :OUTPUT ACCEPT [11556386:72256765344] COMMIT # Completed on Fri Mar 13 07:18:50 2020 # Generated by xtables-save v1.8.2 on Fri Mar 13 07:18:50 2020 *filter :INPUT ACCEPT [371:19220] :FORWARD ACCEPT [0:0] :OUTPUT ACCEPT [11556366:72256758464] -A INPUT -m state --state RELATED,ESTABLISHED -m comment --comment "000 accept related established rules ipv4" -j ACCEPT -A INPUT -p icmp -m state --state NEW -m comment --comment "001 accept all icmp ipv4" -j ACCEPT -A INPUT -i lo -m state --state NEW -m comment --comment "002 accept all to lo interface ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 22 -m state --state NEW -m comment --comment "003 accept ssh from all ipv4" -j ACCEPT -A INPUT -s 192.168.24.0/24 -p tcp -m multiport --dports 22 -m state --state NEW -m comment --comment "003 accept ssh from ctlplane subnet 192.168.24.1/24 ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 9292 -m state --state NEW -m comment --comment "100 glance_api_haproxy ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 13292 -m state --state NEW -m comment --comment "100 glance_api_haproxy_ssl ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 8004 -m state --state NEW -m comment --comment "100 heat_api_haproxy ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 13004 -m state --state NEW -m comment --comment "100 heat_api_haproxy_ssl ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 5050 -m state --state NEW -m comment --comment "100 ironic-inspector_haproxy ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 13050 -m state --state NEW -m comment --comment "100 ironic-inspector_haproxy_ssl ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 6385 -m state --state NEW -m comment --comment "100 ironic_haproxy ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 13385 -m state --state NEW -m comment --comment "100 ironic_haproxy_ssl ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 5000 -m state --state NEW -m comment --comment "100 keystone_admin_haproxy ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 35357 -m state --state NEW -m comment --comment "100 keystone_admin_haproxy_frontend ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 5000 -m state --state NEW -m comment --comment "100 keystone_public_haproxy ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 13000 -m state --state NEW -m comment --comment "100 keystone_public_haproxy_ssl ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 8989 -m state --state NEW -m comment --comment "100 mistral_haproxy ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 13989 -m state --state NEW -m comment --comment "100 mistral_haproxy_ssl ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 3306 -m state --state NEW -m comment --comment "100 mysql_haproxy ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 9696 -m state --state NEW -m comment --comment "100 neutron_haproxy ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 13696 -m state --state NEW -m comment --comment "100 neutron_haproxy_ssl ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 8774 -m state --state NEW -m comment --comment "100 nova_osapi_haproxy ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 13774 -m state --state NEW -m comment --comment "100 nova_osapi_haproxy_ssl ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 8778 -m state --state NEW -m comment --comment "100 placement_haproxy ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 13778 -m state --state NEW -m comment --comment "100 placement_haproxy_ssl ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 8080 -m state --state NEW -m comment --comment "100 swift_proxy_server_haproxy ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 13808 -m state --state NEW -m comment --comment "100 swift_proxy_server_haproxy_ssl ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 8888 -m state --state NEW -m comment --comment "100 zaqar_api_haproxy ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 13888 -m state --state NEW -m comment --comment "100 zaqar_api_haproxy_ssl ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 9000 -m state --state NEW -m comment --comment "100 zaqar_ws_haproxy ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 9000 -m state --state NEW -m comment --comment "100 zaqar_ws_haproxy_ssl ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 873,3306,4444,4567,4568,9200 -m state --state NEW -m comment --comment "104 mysql galera ipv4" -j ACCEPT -A INPUT -p vrrp -m state --state NEW -m comment --comment "106 keepalived vrrp ipv4" -j ACCEPT -A INPUT -p vrrp -m state --state NEW -m comment --comment "106 neutron_l3 vrrp ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 1993 -m state --state NEW -m comment --comment "107 haproxy stats ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 4369,5672,25672 -m state --state NEW -m comment --comment "109 rabbitmq ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 5000,13000,35357 -m state --state NEW -m comment --comment "111 keystone ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 9292,13292 -m state --state NEW -m comment --comment "112 glance_api ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 8774,13774 -m state --state NEW -m comment --comment "113 nova_api ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 9000,8888,3000,13888 -m state --state NEW -m comment --comment "113 zaqar_api ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 9696,13696 -m state --state NEW -m comment --comment "114 neutron api ipv4" -j ACCEPT -A INPUT -p udp -m multiport --dports 67 -m state --state NEW -m comment --comment "115 neutron dhcp input ipv4" -j ACCEPT -A INPUT -p udp -m multiport --dports 4789 -m state --state NEW -m comment --comment "118 neutron vxlan networks ipv4" -j ACCEPT -A INPUT -s 192.168.24.0/24 -p tcp -m multiport --dports 11211 -m state --state NEW -m comment --comment "121 memcached 192.168.24.1/24 ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 8080,13808 -m state --state NEW -m comment --comment "122 swift proxy ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 873,6000,6001,6002 -m state --state NEW -m comment --comment "123 swift storage ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 8004,13004 -m state --state NEW -m comment --comment "125 heat_api ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 6385,13385 -m state --state NEW -m comment --comment "133 ironic api ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 8989,13989 -m state --state NEW -m comment --comment "133 mistral ipv4" -j ACCEPT -A INPUT -p udp -m multiport --dports 69 -m state --state NEW -m comment --comment "134 ironic conductor TFTP ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 8088 -m state --state NEW -m comment --comment "135 ironic conductor HTTP ipv4" -j ACCEPT -A INPUT -p gre -m comment --comment "136 neutron gre networks ipv4" -j ACCEPT -A INPUT -i br-ctlplane -p udp -m multiport --dports 67 -m state --state NEW -m comment --comment "137 ironic-inspector dhcp input ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 5050 -m state --state NEW -m comment --comment "137 ironic-inspector ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 8778,13778 -m state --state NEW -m comment --comment "138 placement ipv4" -j ACCEPT -A INPUT -p tcp -m multiport --dports 8787,13787 -m state --state NEW -m comment --comment "155 docker-registry ipv4" -j ACCEPT -A INPUT -m state --state NEW -m limit --limit 20/min --limit-burst 15 -m comment --comment "998 log all ipv4" -j LOG -A INPUT -m state --state NEW -m comment --comment "999 drop all ipv4" -j DROP -A OUTPUT -p udp -m multiport --dports 68 -m state --state NEW -m comment --comment "116 neutron dhcp output ipv4" -j ACCEPT -A OUTPUT -p udp -m multiport --dports 68 -m state --state NEW -m comment --comment "137 ironic-inspector dhcp output ipv4" -j ACCEPT COMMIT # Completed on Fri Mar 13 07:18:50 2020 ------------------------------------------------------------ Ran command: iptables-restore /etc/sysconfig/iptables And now I only see the requests in the "qdhcp" namespace, the dnsmasq instance listening on br-ctlplane does not see the DHCP requests.
iptables -F iptables -F -t nat iptables -F -t raw Nothing helps, packets are not recived in dnsmasq listening on br-ctlplane. Stopping firewald with 'systemctl stop firewalld' does help.
It doesn't seem that stopping firewalld has helped on a reproducer systems so not sure if this is related. Consistently we are seeing DHCP packets only received by the neutron dnsmasq but not by inspector dnsmasq. This occurs with RHEL 8.2 regardless of the OVS version, i.e. it happens with 2.11 and 2.13. Testing with 8.1 is still in progress (Michele ran into RHEL iptables issue https://bugzilla.redhat.com/show_bug.cgi?id=1807811).
Adding info from Michele on mail thread. I tweaked the ironic_inspector_dnsmasq container to start under strace So we see the following in the strace of dnsmasq: 18 10:34:12.123578 write(9</var/log/ironic-inspector/dnsmasq.log>, "Mar 24 10:34:12 dnsmasq[18]: inotify, new or changed file /var/lib/ironic-inspector/dhcp-hostsdir/52:54:00:fb:b6:ff\n", 116) = 116 18 10:34:12.123658 getpid() = 18 18 10:34:12.123719 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 18 10:34:12.123796 getpid() = 18 18 10:34:12.123877 write(9</var/log/ironic-inspector/dnsmasq.log>, "Mar 24 10:34:12 dnsmasq-dhcp[18]: read /var/lib/ironic-inspector/dhcp-hostsdir/52:54:00:fb:b6:ff\n", 97) = 97 18 10:34:12.123964 openat(AT_FDCWD, "/var/lib/ironic-inspector/dhcp-hostsdir/52:54:00:fb:b6:ff", O_RDONLY) = 10</var/lib/ironic-inspector/dhcp-hostsdir/52:54:00:fb:b6:ff> 18 10:34:12.124052 fstat(10</var/lib/ironic-inspector/dhcp-hostsdir/52:54:00:fb:b6:ff>, {st_mode=S_IFREG|0644, st_size=18, ...}) = 0 18 10:34:12.124126 read(10</var/lib/ironic-inspector/dhcp-hostsdir/52:54:00:fb:b6:ff>, "52:54:00:fb:b6:ff\n", 4096) = 18 18 10:34:12.124203 read(10</var/lib/ironic-inspector/dhcp-hostsdir/52:54:00:fb:b6:ff>, "", 4096) = 0 18 10:34:12.124274 close(10</var/lib/ironic-inspector/dhcp-hostsdir/52:54:00:fb:b6:ff>) = 0 18 10:34:12.124341 read(6<anon_inode:inotify>, 0x55a0d94f7ae0, 272) = -1 EAGAIN (Resource temporarily unavailable) The following line is where the dnsmasq process hangs (i.e. while polling for network data, the inotify fd on /var/lib/ironic-inspector/dhcp-hostsdir, etc): 18 10:34:12.124415 poll([{fd=4<socket:[1376796]>, events=POLLIN}, {fd=5<socket:[1376797]>, events=POLLIN}, {fd=6<anon_inode:inotify>, events=POLLIN}, {fd=7<pipe:[1376803]>, events=POLLIN}], 4, -1n I double checked and if I do create a new file under /var/lib/ironic-inspector/dhcp-hostsdir, the poll() call correctly signals it. So poll() has not gone nuts just yet ;) The problem is that the polling of the socket 1376796 which is the udp one listening on port 67 is simply not getting any data at all (as already mentioned only the ironic inspector seems to not get data, the neutron one listening to a tap interface gets the data just fine): [root@undercloud-0 ironic-inspector]# grep -e 1376796 -e 1376797 /proc/net/udp 15596: 00000000:0043 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 1376796 2 ffff89f90a0a6400 0 (port 0043 -> 67 dhcp) I also tried changing interface= to eth0 in /var/lib/config-data/puppet-generated/ironic_inspector/etc/ironic-inspector/dnsmasq.conf to no avail So to me the TLDR; currently is that dnsmasq is simply not receiving any data from the kernel for whatever reason (I also tried adding iptables -I INPUT 1 -j ACCEPT to be on the safe side but to no avail)
Updated nftables appears to work to get past this issue and is available. Also https://review.opendev.org/715130 has merged. Moving this to MODIFIED.
Issue has been resolved, we're no longer seeing introspection failures.
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text. If this bug does not require doc text, please set the 'requires_doc_text' flag to '-'.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3148