Bug 1814616 - ironic inspector dnsmasq is not receiving DHCP requests
Summary: ironic inspector dnsmasq is not receiving DHCP requests
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
urgent
high
Target Milestone: beta
: 16.1 (Train on RHEL 8.2)
Assignee: RHOS Maint
QA Contact: David Rosenfeld
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-03-18 11:49 UTC by Bob Fournier
Modified: 2020-07-29 07:51 UTC (History)
15 users (show)

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-0.20200413103800.cf3c03e.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-29 07:51:01 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1869166 0 None None None 2020-03-26 10:19:02 UTC
OpenStack gerrit 715130 0 None MERGED [train/backport] Prevent nftables to interfere with tripleo firewall 2021-02-19 05:02:33 UTC
Red Hat Product Errata RHBA-2020:3148 0 None None None 2020-07-29 07:51:16 UTC

Description Bob Fournier 2020-03-18 11:49:09 UTC
Description of problem:

We are seeing this in the first composes of OSP-16.1 with RHEL 8.2.  Introspection is failing because the Ironic inspector dnsmasq is not receiving DHCP packets.

Running "tcpdump -i br-ctlplane" shows DHCP Requests (or DHCPv6 Solicits) but these aren't being received - we see no logging from the inspector dnsmasq nor packets in iptables (below).

We do see dhcp traffic received for the neutron dnsmasq server and DHCP responses (below). In RHEL 8.2 only the dhcp server on tap89231cd5-8c see the incoming traffic, both DHCPv4 and DHCPv6 affected. Prior to RHEL 8.2 both dhcp
servers would see the DHCP requests.

I've tried the previous version of dnsmasq prior to the last set of changes - dnsmasq-2.79-6 and there was no difference.  There hasn't been any recent inspector changes.  Doesn't look like any dnsmasq errors in container.


/var/log/messages
Mar 17 15:05:37 hardprov-dl360-g9-01 dnsmasq-dhcp[68038]: DHCPADVERTISE(tap89231cd5-8c) fe32:dead:beef::23 00:04:44:45:4c:4c:54:00:10:47:80:4c:b3:c0:4f:58:4b:32 host-fe32-dead-beef--23
Mar 17 15:05:42 hardprov-dl360-g9-01 dnsmasq-dhcp[68038]: DHCPREQUEST(tap89231cd5-8c) 00:04:44:45:4c:4c:54:00:10:47:80:4c:b3:c0:4f:58:4b:32
Mar 17 15:05:42 hardprov-dl360-g9-01 dnsmasq-dhcp[68038]: DHCPREPLY(tap89231cd5-8c) fe32:dead:beef::23 00:04:44:45:4c:4c:54:00:10:47:80:4c:b3:c0:4f:58:4b:32 host-fe32-dead-beef--23
Mar 17 15:05:43 hardprov-dl360-g9-01 dnsmasq-dhcp[68038]: DHCPRELEASE(tap89231cd5-8c) 00:04:44:45:4c:4c:54:00:10:47:80:4c:b3:c0:4f:58:4b:32
Mar 17 15:06:05 hardprov-dl360-g9-01 dnsmasq-dhcp[68038]: DHCPSOLICIT(tap89231cd5-8c) 00:04:60:e3:41:67:0f:57:da:23:9d:4b:86:9e:44:43:6a:30
Mar 17 15:06:05 hardprov-dl360-g9-01 dnsmasq-dhcp[68038]: DHCPREPLY(tap89231cd5-8c) fe32:dead:beef::23 00:04:60:e3:41:67:0f:57:da:23:9d:4b:86:9e:44:43:6a:30 host-fe32-dead-beef--23
Mar 17 15:08:35 hardprov-dl360-g9-01 dnsmasq-dhcp[68038]: DHCPSOLICIT(tap89231cd5-8c) 00:04:fe:56:ed:68:f1:26:d6:21:0f:16:76:4a:eb:48:d3:7f


There is a new version of OVS - 2.13.0 and its possible that this is causing problems.

Comment 1 Bob Fournier 2020-03-18 11:50:51 UTC
Info from Harald:

[root@hardprov-dl360-g9-01 ~]# ovs-appctl bridge/dump-flows br-ctlplane
duration=56995s, n_packets=9544, n_bytes=1297865, priority=4,in_port=2,dl_vlan=1,actions=pop_vlan,NORMAL
duration=57125s, n_packets=12, n_bytes=1244, priority=2,in_port=2,actions=drop
duration=57125s, n_packets=511008, n_bytes=40663885, priority=0,actions=NORMAL
table_id=254, duration=60370s, n_packets=0, n_bytes=0, priority=2,recirc_id=0,actions=drop
table_id=254, duration=60370s, n_packets=0, n_bytes=0, priority=0,reg0=0x1,actions=controller(reason=)
table_id=254, duration=60370s, n_packets=0, n_bytes=0, priority=0,reg0=0x2,actions=drop
table_id=254, duration=60370s, n_packets=0, n_bytes=0, priority=0,reg0=0x3,actions=drop


[root@hardprov-dl360-g9-01 ~]# ovs-appctl dpctl/dump-flows
recirc_id(0),in_port(2),eth(src=a0:36:9f:f4:7f:c2,dst=ff:ff:ff:ff:ff:ff),eth_type(0x8100),vlan(vid=120),encap(eth_type(0x0800),ipv4(frag=no)), packets:4, bytes:1316, used:7.369s, actions:1
recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=120),encap(), packets:28602, bytes:1944936, used:0.481s, actions:drop
recirc_id(0),in_port(5),eth(src=fa:16:3e:ca:51:e8,dst=33:33:00:00:00:01),eth_type(0x86dd),ipv6(frag=no), packets:0, bytes:0, used:never, actions:push_vlan(vid=1,pcp=0),3,pop_vlan,1,2,4
recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=112),encap(), packets:28602, bytes:1944936, used:1.206s, actions:drop
recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=105),encap(), packets:28602, bytes:1944936, used:0.482s, actions:drop
recirc_id(0),in_port(2),eth(src=a0:36:9f:f4:80:94,dst=33:33:00:00:00:16),eth_type(0x86dd),ipv6(frag=no), packets:1, bytes:90, used:6.230s, actions:1,push_vlan(vid=1,pcp=0),3,pop_vlan,4,5

 ^^ 33:33:00:00:00:16 is multicast, i.e this would action on DHCPv6

recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=117),encap(), packets:28602, bytes:1944936, used:0.204s, actions:drop
recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=109),encap(), packets:28602, bytes:1944936, used:0.479s, actions:drop
recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=101),encap(), packets:28602, bytes:1944936, used:1.143s, actions:drop
recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=116),encap(), packets:28601, bytes:1944868, used:1.484s, actions:drop
recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=104),encap(), packets:28602, bytes:1944936, used:0.483s, actions:drop
recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=113),encap(), packets:28602, bytes:1944936, used:0.480s, actions:drop
recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0/0xffff), packets:28602, bytes:1830528, used:0.482s, actions:drop
recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=121),encap(), packets:28601, bytes:1944868, used:1.483s, actions:drop
recirc_id(0),in_port(2),eth(src=a0:36:9f:f4:80:94,dst=33:33:00:01:00:02),eth_type(0x86dd),ipv6(frag=no), packets:0, bytes:0, used:never, actions:1,push_vlan(vid=1,pcp=0),3,pop_vlan,4,5
recirc_id(0),in_port(1),eth(src=48:df:37:19:90:00,dst=33:33:00:00:00:12),eth_type(0x86dd),ipv6(frag=no), packets:114386, bytes:8922108, used:0.438s, actions:2,push_vlan(vid=1,pcp=0),3,pop_vlan,4,5

 ^^ 33:33:00:00:00:16 is multicast, i.e this would action on DHCPv6

recirc_id(0),in_port(2),eth(src=f8:b7:e2:09:ed:0a,dst=01:00:0c:cc:cc:cd),eth_type(0x8100),vlan(vid=108),encap(), packets:28602, bytes:1944936, used:0.483s, actions:drop





------ dhcpv6 debug script ----
import binascii
import socket
import struct

INTERFACE = 'eno49'
ALL_DHCP_RELAY_AGENTS_AND_SERVERS = 'ff02::1:2'
ALL_DHCP_SERVERS = 'ff05::1:3'
SERVER_UDP_PORT = 547
DHCP_MESSAGE_TYPES = {
    1: 'SOLICIT',
    2: 'ADVERTISE',
    3: 'REQUEST',
    4: 'CONFIRM',
    5: 'RENEW',
    6: 'REBIND',
    7: 'REPLY',
    8: 'RELEASE',
    9: 'DECLINE',
    10: 'RECONFIGURE',
    11: 'INFORMATION-REQUEST',
    12: 'RELAY-FORW',
    13: 'RELAY-REPL',
}
RALAY_MESSAGE_TYPES = [
    DHCP_MESSAGE_TYPES[12],
    DHCP_MESSAGE_TYPES[13],
]



def decode_opts(raw_opts):
    opts = {}
    while len(raw_opts) > 0:
       opt = int(raw_opts[0:4], 16)
       opt_len = int(raw_opts[4:8], 16) * 2
       opt_val = raw_opts[8:8 + opt_len]
       if opt not in opts:
           opts[opt] = []
       opts[opt].append(opt_val)
       # Trim handled option before next iteration
       raw_opts = raw_opts[8 + opt_len:]

    return opts


def decode_request(raw_bytes):
    transaction_id = raw_bytes[2:8]
    print("XID: {}".format(transaction_id))
    opts = decode_opts(raw_bytes[8:])
    print("OPTIONS: {}".format(opts))

    return (transaction_id, opts)


# Initialise socket for IPv6 datagrams
sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, socket.IPPROTO_UDP)

# Allows address to be reused
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

# Allow messages from this socket to loop back for development
# sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_LOOP, True)

# Set multicast interface
ifi = socket.if_nametoindex(INTERFACE)
ifis = struct.pack("I", ifi)
sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_IF, ifis)

# Binds to all interfaces on the given port
sock.bind(('::', SERVER_UDP_PORT))

# Join ALL_DHCP_RELAY_AGENTS_AND_SERVERS multicast group
mreq = socket.inet_pton(
           socket.AF_INET6, ALL_DHCP_RELAY_AGENTS_AND_SERVERS) + ifis
sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP, mreq)
# Join ALL_DHCP_SERVERS multicast group
mreq = socket.inet_pton(socket.AF_INET6, ALL_DHCP_SERVERS) + ifis
sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP, mreq)

data, addr = sock.recvfrom(1024)
raw_bytes = binascii.hexlify(data).decode()
msg_type = int(raw_bytes[:2], 16)
print("FROM: " + str(addr))
print("MESSAGE TYPE: {}".format(DHCP_MESSAGE_TYPES[msg_type]))
transaction_id, options = decode_request(raw_bytes)

Comment 2 Daniel Alvarez Sanchez 2020-03-18 12:19:37 UTC
Adding my comments from the email thread as well:

- It looks like the traffic is maybe being dropped by OVS, especially if it worked with RHEL 8.1. I'm not sure if it's possible to downgrade the kernel version to verify this.

- Can you please try ovs-tcpdump in the br-ctlplane? Do you see the counters go up when the DHCP traffic comes in?

- We're sure that iptables are not dropping the traffic before reaching OVS? As per Bob's comment it looks like it should be allowed.

- IIUC you observe the same with DHCPv4 requests right? 

- I don't understand why you see the traffic with tcpdump but not with your debug script if that's using the physical interface. OVS is out of the picture here unless I'm missing something.

Comment 3 Harald Jensås 2020-03-18 13:13:12 UTC
(In reply to Daniel Alvarez Sanchez from comment #2)
> Adding my comments from the email thread as well:
> 
> - It looks like the traffic is maybe being dropped by OVS, especially if it
> worked with RHEL 8.1. I'm not sure if it's possible to downgrade the kernel
> version to verify this.
> 

Yes, this is probably the next step.

> - Can you please try ovs-tcpdump in the br-ctlplane? Do you see the counters
> go up when the DHCP traffic comes in?
> 

TCPDUMP Args: -l -n -vv udp port 546 or udp port 547
tcpdump: listening on mibr-ctlplane, link-type EN10MB (Ethernet), capture size 262144 bytes
08:50:21.036767 IP6 (flowlabel 0xe47ca, hlim 1, next-header UDP (17) payload length: 82) fe80::a236:9fff:fef4:8094.dhcpv6-client > ff02::1:2.dhcpv6-server: [udp sum ok] dhcp6 solicit (xid=fb4962 (rapid-commit) (IA_NA IAID:1041239257 T1:0 T2:0) (Client-FQDN) (option-request DNS-server DNS-search-list NTP-server SNTP-servers) (client-ID type 4) (elapsed-time 1748))

08:50:40.363790 IP6 (flowlabel 0xe47ca, hlim 1, next-header UDP (17) payload length: 82) fe80::a236:9fff:fef4:8094.dhcpv6-client > ff02::1:2.dhcpv6-server: [udp sum ok] dhcp6 solicit (xid=fb4962 (rapid-commit) (IA_NA IAID:1041239257 T1:0 T2:0) (Client-FQDN) (option-request DNS-server DNS-search-list NTP-server SNTP-servers) (client-ID type 4) (elapsed-time 3681))
08:50:50.551561 IP6 (flowlabel 0xe47ca, hlim 1, next-header UDP (17) payload length: 82) fe80::a236:9fff:fef4:8094.dhcpv6-client > ff02::1:2.dhcpv6-server: [udp sum ok] dhcp6 solicit (xid=41cabf (rapid-commit) (IA_NA IAID:1041239257 T1:0 T2:0) (Client-FQDN) (option-request DNS-server DNS-search-list NTP-server SNTP-servers) (client-ID type 4) (elapsed-time 0))
08:50:51.631641 IP6 (flowlabel 0xe47ca, hlim 1, next-header UDP (17) payload length: 82) fe80::a236:9fff:fef4:8094.dhcpv6-client > ff02::1:2.dhcpv6-server: [udp sum ok] dhcp6 solicit (xid=41cabf (rapid-commit) (IA_NA IAID:1041239257 T1:0 T2:0) (Client-FQDN) (option-request DNS-server DNS-search-list NTP-server SNTP-servers) (client-ID type 4) (elapsed-time 108))
08:50:53.737796 IP6 (flowlabel 0xe47ca, hlim 1, next-header UDP (17) payload length: 82) fe80::a236:9fff:fef4:8094.dhcpv6-client > ff02::1:2.dhcpv6-server: [udp sum ok] dhcp6 solicit (xid=41cabf (rapid-commit) (IA_NA IAID:1041239257 T1:0 T2:0) (Client-FQDN) (option-request DNS-server DNS-search-list NTP-server SNTP-servers) (client-ID type 4) (elapsed-time 318))
08:50:57.943818 IP6 (flowlabel 0xe47ca, hlim 1, next-header UDP (17) payload length: 82) fe80::a236:9fff:fef4:8094.dhcpv6-client > ff02::1:2.dhcpv6-server: [udp sum ok] dhcp6 solicit (xid=41cabf (rapid-commit) (IA_NA IAID:1041239257 T1:0 T2:0) (Client-FQDN) (option-request DNS-server DNS-search-list NTP-server SNTP-servers) (client-ID type 4) (elapsed-time 739))
08:51:06.584717 IP6 (flowlabel 0xe47ca, hlim 1, next-header UDP (17) payload length: 82) fe80::a236:9fff:fef4:8094.dhcpv6-client > ff02::1:2.dhcpv6-server: [udp sum ok] dhcp6 solicit (xid=41cabf (rapid-commit) (IA_NA IAID:1041239257 T1:0 T2:0) (Client-FQDN) (option-request DNS-server DNS-search-list NTP-server SNTP-servers) (client-ID type 4) (elapsed-time 1603))
08:51:23.407683 IP6 (flowlabel 0xe47ca, hlim 1, next-header UDP (17) payload length: 82) fe80::a236:9fff:fef4:8094.dhcpv6-client > ff02::1:2.dhcpv6-server: [udp sum ok] dhcp6 solicit (xid=41cabf (rapid-commit) (IA_NA IAID:1041239257 T1:0 T2:0) (Client-FQDN) (option-request DNS-server DNS-search-list NTP-server SNTP-servers) (client-ID type 4) (elapsed-time 3285))


The counter in this entry 'ovs-appctl dpctl/dump-flows' is going up.
recirc_id(0),in_port(1),eth(src=48:df:37:19:90:00,dst=33:33:00:00:00:12),eth_type(0x86dd),ipv6(frag=no), packets:133964, bytes:10449192, used:0.687s, actions:6,2,push_vlan(vid=1,pcp=0),3,pop_vlan,4,5

> - We're sure that iptables are not dropping the traffic before reaching OVS?
> As per Bob's comment it looks like it should be allowed.
> 

I tried to stop ip6tables, 'systemctl stop ip6tables.service'. This should flush all tables and allow anything. The issue remains the same.

> - IIUC you observe the same with DHCPv4 requests right? 
> 

Yes, this is not working for both DHCPv4 and DHCPv6.

> - I don't understand why you see the traffic with tcpdump but not with your
> debug script if that's using the physical interface. OVS is out of the
> picture here unless I'm missing something.

tcpdump is operating on a raw socket, bypassing a lot of the kernel network stack. My python debug code is using a UDP datagram socket, i.e not bypassing the kernel network stack. To be honest I tried that just to quickly see if listening on eno49 vs br-ctlplane made a difference. And also try without the podman container stuff.

Comment 4 Daniel Alvarez Sanchez 2020-03-18 14:39:05 UTC
(In reply to Harald Jensås from comment #3)
> (In reply to Daniel Alvarez Sanchez from comment #2)
> > Adding my comments from the email thread as well:
> > 
> > - It looks like the traffic is maybe being dropped by OVS, especially if it
> > worked with RHEL 8.1. I'm not sure if it's possible to downgrade the kernel
> > version to verify this.
> > 
> 
> Yes, this is probably the next step.
> 
> > - Can you please try ovs-tcpdump in the br-ctlplane? Do you see the counters
> > go up when the DHCP traffic comes in?
> > 
> 
> TCPDUMP Args: -l -n -vv udp port 546 or udp port 547
> tcpdump: listening on mibr-ctlplane, link-type EN10MB (Ethernet), capture
> size 262144 bytes
> 08:50:21.036767 IP6 (flowlabel 0xe47ca, hlim 1, next-header UDP (17) payload
> length: 82) fe80::a236:9fff:fef4:8094.dhcpv6-client >
> ff02::1:2.dhcpv6-server: [udp sum ok] dhcp6 solicit (xid=fb4962
> (rapid-commit) (IA_NA IAID:1041239257 T1:0 T2:0) (Client-FQDN)
> (option-request DNS-server DNS-search-list NTP-server SNTP-servers)
> (client-ID type 4) (elapsed-time 1748))
> 

Looks like it's arriving to OVS right? Can you check also ovs-tcpdump in br-int? The goal is to determine whether the packet is dropped after the patch-port or before.
Anyways seems like downgrading to 8.1 kernel might tell us more.

At this point, I'd loop in the OVS team.

Comment 5 Lon Hohberger 2020-03-19 13:16:10 UTC
I will set up for OVS 2.11 and we can retest.

Comment 6 Bob Fournier 2020-03-19 13:20:27 UTC
Marking this as a blocker as its affecting OSP-16.1 CI.

Currently using
$ sudo ovs-vsctl -V
ovs-vsctl (Open vSwitch) 2.13.0
DB Schema 8.2.0

Will retry when compose with 2.11 is available.

Comment 7 Bernard Cafarelli 2020-03-19 14:10:10 UTC
We are waiting for test results with OVS 2.11 (to confirm if it is purely OVS issue or more generic), we should get some update here before EOW

Comment 10 Bob Fournier 2020-03-20 13:31:59 UTC
Michele - which kernel and OSP version did you duplicate this in?  Was it just ovs 2.11 with rhel 8.2/osp 16.1?

Comment 12 Harald Jensås 2020-03-20 16:07:52 UTC
FWIW - I tried, but failed, to reproduce this without the full undercloud.
Since this works, we may want to see about the firewall in the undercloud setup.



RHOS_RELEASE_ARGS="16.1 -r 8.2 -p latest-RHOS_TRUNK-16-RHEL-8"
dnf localinstall -y http://download.lab.bos.redhat.com/rcm-guest/puddles/OpenStack/rhos-release/rhos-release-latest.noarch.rpm
dnf update -y rhos-release*
rhos-release $RHOS_RELEASE_ARGS
dnf update -y 
dnf install openvswitch vim-enhanced tcpdump tmux dnsmasq
reboot

[root@rhel8 ~]# rhos-release -L
Installed repositories (rhel-8.2):
  16.1
  ceph-4
  ceph-osd-4
  rhel-8.2
[root@rhel8 ~]# uname -a
Linux rhel8.lab.example.com 4.18.0-187.el8.x86_64 #1 SMP Sat Mar 7 03:42:33 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@rhel8 ~]# ovs-vsctl --version
ovs-vsctl (Open vSwitch) 2.11.0
DB Schema 7.16.1

  

systemct stop firewalld

systemctl enable openvswitch
systemctl start openvswitch

ovs-vsctl add-br br-ctlplane
ovs-vsctl add-port br-ctlplane enp2s0
ip addr add 172.20.0.1/24 dev br-ctlplane
ip link set br-ctlplane up

ovs-vsctl add-br br-int
ovs-vsctl -- add-port br-ctlplane patch0 \
          -- set interface patch0 type=patch options:peer=patch1 \
          -- add-port br-int patch1 \
          -- set interface patch1 type=patch options:peer=patch0

ovs-vsctl add-port br-int qdhcp -- set interface qdhcp type=internal

ip netns add qdhcp
ip link set qdhcp netns qdhcp
ip netns exec qdhcp ip addr add 172.20.0.2/24 dev qdhcp
ip netns exec qdhcp ip link set qdhcp up


[root@rhel8 ~]# ovs-vsctl show
7b8be7ec-d12e-43eb-a9e7-5241c319ff25
    Bridge br-int
        Port "patch1"
            Interface "patch1"
                type: patch
                options: {peer="patch0"}
        Port qdhcp
            Interface qdhcp
                type: internal
        Port br-int
            Interface br-int
                type: internal
    Bridge br-ctlplane
        Port br-ctlplane
            Interface br-ctlplane
                type: internal
        Port "enp2s0"
            Interface "enp2s0"
        Port "patch0"
            Interface "patch0"
                type: patch
                options: {peer="patch1"}
    ovs_version: "2.11.0"


cat << EOF > dnsmasq.conf
port=0
interface=br-ctlplane
log-dhcp

dhcp-range=172.20.0.100,172.20.0.110,255.255.255.0,10m

EOF

cat << EOF > qdhcp_dnsmasq.conf
port=0
interface=qdhcp
log-dhcp

dhcp-range=172.20.0.0,static,255.255.255.0,10m

EOF


[root@rhel8 ~]# dnsmasq --no-daemon --conf-file=dnsmasq.conf
| dnsmasq-dhcp: 2446226970 available DHCP range: 172.20.0.100 -- 172.20.0.110
| dnsmasq-dhcp: 2446226970 vendor class: PXEClient:Arch:00000:UNDI:002001
| dnsmasq-dhcp: 2446226970 user class: iPXE
| dnsmasq-dhcp: 2446226970 DHCPREQUEST(br-ctlplane) 172.20.0.105 52:54:00:bc:c3:fd
| dnsmasq-dhcp: 2446226970 tags: br-ctlplane
| dnsmasq-dhcp: 2446226970 DHCPACK(br-ctlplane) 172.20.0.105 52:54:00:bc:c3:fd
| dnsmasq-dhcp: 2446226970 requested options: 1:netmask, 3:router, 6:dns-server, 7:log-server,
| dnsmasq-dhcp: 2446226970 requested options: 12:hostname, 15:domain-name, 17:root-path,
| dnsmasq-dhcp: 2446226970 requested options: 26:mtu, 43:vendor-encap, 60:vendor-class,
| dnsmasq-dhcp: 2446226970 requested options: 66:tftp-server, 67:bootfile-name, 119:domain-search,
| dnsmasq-dhcp: 2446226970 requested options: 128, 129, 130, 131, 132, 133, 134, 135, 175,
| dnsmasq-dhcp: 2446226970 requested options: 203
| dnsmasq-dhcp: 2446226970 next server: 172.20.0.1
| dnsmasq-dhcp: 2446226970 sent size:  1 option: 53 message-type  5
| dnsmasq-dhcp: 2446226970 sent size:  4 option: 54 server-identifier  172.20.0.1
| dnsmasq-dhcp: 2446226970 sent size:  4 option: 51 lease-time  10m
| dnsmasq-dhcp: 2446226970 sent size:  4 option: 58 T1  5m
| dnsmasq-dhcp: 2446226970 sent size:  4 option: 59 T2  8m45s
| dnsmasq-dhcp: 2446226970 sent size:  4 option:  1 netmask  255.255.255.0
| dnsmasq-dhcp: 2446226970 sent size:  4 option: 28 broadcast  172.20.0.255
| dnsmasq-dhcp: 2446226970 sent size:  4 option:  3 router  172.20.0.1



[root@rhel8 ~]# ip netns exec qdhcp dnsmasq --no-daemon --conf-file=qdhcp_dnsmasq.conf
| dnsmasq-dhcp: 2446226970 available DHCP subnet: 172.20.0.0/255.255.255.0
| dnsmasq-dhcp: 2446226970 vendor class: PXEClient:Arch:00000:UNDI:002001
| dnsmasq-dhcp: 2446226970 user class: iPXE
| dnsmasq-dhcp: 2446226970 DHCPDISCOVER(qdhcp) 52:54:00:bc:c3:fd no address available







cat << EOF > dnsmasq.conf
port=0
interface=br-ctlplane
log-dhcp

dhcp-range=172.20.0.100,172.20.0.110,255.255.255.0,10m

EOF

cat << EOF > qdhcp_dnsmasq.conf
port=0
interface=qdhcp
log-dhcp

dhcp-range=172.20.0.0,static,255.255.255.0,10m

EOF


[root@rhel8 ~]# dnsmasq --no-daemon --conf-file=dnsmasq.conf
| dnsmasq-dhcp: 2446226970 available DHCP range: 172.20.0.100 -- 172.20.0.110
| dnsmasq-dhcp: 2446226970 vendor class: PXEClient:Arch:00000:UNDI:002001
| dnsmasq-dhcp: 2446226970 user class: iPXE
| dnsmasq-dhcp: 2446226970 DHCPREQUEST(br-ctlplane) 172.20.0.105 52:54:00:bc:c3:fd
| dnsmasq-dhcp: 2446226970 tags: br-ctlplane
| dnsmasq-dhcp: 2446226970 DHCPACK(br-ctlplane) 172.20.0.105 52:54:00:bc:c3:fd
| dnsmasq-dhcp: 2446226970 requested options: 1:netmask, 3:router, 6:dns-server, 7:log-server,
| dnsmasq-dhcp: 2446226970 requested options: 12:hostname, 15:domain-name, 17:root-path,
| dnsmasq-dhcp: 2446226970 requested options: 26:mtu, 43:vendor-encap, 60:vendor-class,
| dnsmasq-dhcp: 2446226970 requested options: 66:tftp-server, 67:bootfile-name, 119:domain-search,
| dnsmasq-dhcp: 2446226970 requested options: 128, 129, 130, 131, 132, 133, 134, 135, 175,
| dnsmasq-dhcp: 2446226970 requested options: 203
| dnsmasq-dhcp: 2446226970 next server: 172.20.0.1
| dnsmasq-dhcp: 2446226970 sent size:  1 option: 53 message-type  5
| dnsmasq-dhcp: 2446226970 sent size:  4 option: 54 server-identifier  172.20.0.1
| dnsmasq-dhcp: 2446226970 sent size:  4 option: 51 lease-time  10m
| dnsmasq-dhcp: 2446226970 sent size:  4 option: 58 T1  5m
| dnsmasq-dhcp: 2446226970 sent size:  4 option: 59 T2  8m45s
| dnsmasq-dhcp: 2446226970 sent size:  4 option:  1 netmask  255.255.255.0
| dnsmasq-dhcp: 2446226970 sent size:  4 option: 28 broadcast  172.20.0.255
| dnsmasq-dhcp: 2446226970 sent size:  4 option:  3 router  172.20.0.1



[root@rhel8 ~]# ip netns exec qdhcp dnsmasq --no-daemon --conf-file=qdhcp_dnsmasq.conf
| dnsmasq-dhcp: 2446226970 available DHCP subnet: 172.20.0.0/255.255.255.0
| dnsmasq-dhcp: 2446226970 vendor class: PXEClient:Arch:00000:UNDI:002001
| dnsmasq-dhcp: 2446226970 user class: iPXE
| dnsmasq-dhcp: 2446226970 DHCPDISCOVER(qdhcp) 52:54:00:bc:c3:fd no address available

Comment 14 Harald Jensås 2020-03-20 17:02:17 UTC
I copied /etc/sysconfig/iptables from another 16 deployment onto my reproducer.


The rule set:
----------------------
[root@site-undercloud-0 ~]# cat /etc/sysconfig/iptables
# Generated by xtables-save v1.8.2 on Fri Mar 13 07:18:50 2020
*raw
:PREROUTING ACCEPT [11296935:41611227599]
:OUTPUT ACCEPT [11556386:72256765344]
COMMIT
# Completed on Fri Mar 13 07:18:50 2020
# Generated by xtables-save v1.8.2 on Fri Mar 13 07:18:50 2020
*filter
:INPUT ACCEPT [371:19220]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [11556366:72256758464]
-A INPUT -m state --state RELATED,ESTABLISHED -m comment --comment "000 accept related established rules ipv4" -j ACCEPT
-A INPUT -p icmp -m state --state NEW -m comment --comment "001 accept all icmp ipv4" -j ACCEPT
-A INPUT -i lo -m state --state NEW -m comment --comment "002 accept all to lo interface ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 22 -m state --state NEW -m comment --comment "003 accept ssh from all ipv4" -j ACCEPT
-A INPUT -s 192.168.24.0/24 -p tcp -m multiport --dports 22 -m state --state NEW -m comment --comment "003 accept ssh from ctlplane subnet 192.168.24.1/24 ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 9292 -m state --state NEW -m comment --comment "100 glance_api_haproxy ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 13292 -m state --state NEW -m comment --comment "100 glance_api_haproxy_ssl ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 8004 -m state --state NEW -m comment --comment "100 heat_api_haproxy ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 13004 -m state --state NEW -m comment --comment "100 heat_api_haproxy_ssl ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 5050 -m state --state NEW -m comment --comment "100 ironic-inspector_haproxy ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 13050 -m state --state NEW -m comment --comment "100 ironic-inspector_haproxy_ssl ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 6385 -m state --state NEW -m comment --comment "100 ironic_haproxy ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 13385 -m state --state NEW -m comment --comment "100 ironic_haproxy_ssl ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 5000 -m state --state NEW -m comment --comment "100 keystone_admin_haproxy ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 35357 -m state --state NEW -m comment --comment "100 keystone_admin_haproxy_frontend ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 5000 -m state --state NEW -m comment --comment "100 keystone_public_haproxy ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 13000 -m state --state NEW -m comment --comment "100 keystone_public_haproxy_ssl ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 8989 -m state --state NEW -m comment --comment "100 mistral_haproxy ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 13989 -m state --state NEW -m comment --comment "100 mistral_haproxy_ssl ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 3306 -m state --state NEW -m comment --comment "100 mysql_haproxy ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 9696 -m state --state NEW -m comment --comment "100 neutron_haproxy ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 13696 -m state --state NEW -m comment --comment "100 neutron_haproxy_ssl ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 8774 -m state --state NEW -m comment --comment "100 nova_osapi_haproxy ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 13774 -m state --state NEW -m comment --comment "100 nova_osapi_haproxy_ssl ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 8778 -m state --state NEW -m comment --comment "100 placement_haproxy ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 13778 -m state --state NEW -m comment --comment "100 placement_haproxy_ssl ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 8080 -m state --state NEW -m comment --comment "100 swift_proxy_server_haproxy ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 13808 -m state --state NEW -m comment --comment "100 swift_proxy_server_haproxy_ssl ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 8888 -m state --state NEW -m comment --comment "100 zaqar_api_haproxy ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 13888 -m state --state NEW -m comment --comment "100 zaqar_api_haproxy_ssl ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 9000 -m state --state NEW -m comment --comment "100 zaqar_ws_haproxy ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 9000 -m state --state NEW -m comment --comment "100 zaqar_ws_haproxy_ssl ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 873,3306,4444,4567,4568,9200 -m state --state NEW -m comment --comment "104 mysql galera ipv4" -j ACCEPT
-A INPUT -p vrrp -m state --state NEW -m comment --comment "106 keepalived vrrp ipv4" -j ACCEPT
-A INPUT -p vrrp -m state --state NEW -m comment --comment "106 neutron_l3 vrrp ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 1993 -m state --state NEW -m comment --comment "107 haproxy stats ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 4369,5672,25672 -m state --state NEW -m comment --comment "109 rabbitmq ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 5000,13000,35357 -m state --state NEW -m comment --comment "111 keystone ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 9292,13292 -m state --state NEW -m comment --comment "112 glance_api ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 8774,13774 -m state --state NEW -m comment --comment "113 nova_api ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 9000,8888,3000,13888 -m state --state NEW -m comment --comment "113 zaqar_api ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 9696,13696 -m state --state NEW -m comment --comment "114 neutron api ipv4" -j ACCEPT
-A INPUT -p udp -m multiport --dports 67 -m state --state NEW -m comment --comment "115 neutron dhcp input ipv4" -j ACCEPT
-A INPUT -p udp -m multiport --dports 4789 -m state --state NEW -m comment --comment "118 neutron vxlan networks ipv4" -j ACCEPT
-A INPUT -s 192.168.24.0/24 -p tcp -m multiport --dports 11211 -m state --state NEW -m comment --comment "121 memcached 192.168.24.1/24 ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 8080,13808 -m state --state NEW -m comment --comment "122 swift proxy ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 873,6000,6001,6002 -m state --state NEW -m comment --comment "123 swift storage ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 8004,13004 -m state --state NEW -m comment --comment "125 heat_api ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 6385,13385 -m state --state NEW -m comment --comment "133 ironic api ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 8989,13989 -m state --state NEW -m comment --comment "133 mistral ipv4" -j ACCEPT
-A INPUT -p udp -m multiport --dports 69 -m state --state NEW -m comment --comment "134 ironic conductor TFTP ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 8088 -m state --state NEW -m comment --comment "135 ironic conductor HTTP ipv4" -j ACCEPT
-A INPUT -p gre -m comment --comment "136 neutron gre networks ipv4" -j ACCEPT
-A INPUT -i br-ctlplane -p udp -m multiport --dports 67 -m state --state NEW -m comment --comment "137 ironic-inspector dhcp input ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 5050 -m state --state NEW -m comment --comment "137 ironic-inspector ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 8778,13778 -m state --state NEW -m comment --comment "138 placement ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 8787,13787 -m state --state NEW -m comment --comment "155 docker-registry ipv4" -j ACCEPT
-A INPUT -m state --state NEW -m limit --limit 20/min --limit-burst 15 -m comment --comment "998 log all ipv4" -j LOG
-A INPUT -m state --state NEW -m comment --comment "999 drop all ipv4" -j DROP
-A OUTPUT -p udp -m multiport --dports 68 -m state --state NEW -m comment --comment "116 neutron dhcp output ipv4" -j ACCEPT
-A OUTPUT -p udp -m multiport --dports 68 -m state --state NEW -m comment --comment "137 ironic-inspector dhcp output ipv4" -j ACCEPT
COMMIT
# Completed on Fri Mar 13 07:18:50 2020
------------------------------------------------------------

Ran command:

 iptables-restore /etc/sysconfig/iptables



And now I only see the requests in the "qdhcp" namespace, the dnsmasq instance listening on br-ctlplane does not see the DHCP requests.

Comment 15 Harald Jensås 2020-03-20 17:27:39 UTC
iptables -F 
iptables -F -t nat
iptables -F -t raw

Nothing helps, packets are not recived in dnsmasq listening on br-ctlplane.


Stopping firewald with 'systemctl stop firewalld' does help.

Comment 16 Bob Fournier 2020-03-23 15:28:29 UTC
It doesn't seem that stopping firewalld has helped on a reproducer systems so not sure if this is related.

Consistently we are seeing DHCP packets only received by the neutron dnsmasq but not by inspector dnsmasq.  This occurs with RHEL 8.2 regardless of the OVS version, i.e. it happens with 2.11 and 2.13.  Testing with 8.1 is still in progress (Michele ran into RHEL iptables issue https://bugzilla.redhat.com/show_bug.cgi?id=1807811).

Comment 18 Harald Jensås 2020-03-24 12:25:35 UTC
Adding info from Michele on mail thread.


I tweaked the ironic_inspector_dnsmasq container to start under strace
So we see the following in the strace of dnsmasq:
18    10:34:12.123578 write(9</var/log/ironic-inspector/dnsmasq.log>, "Mar 24 10:34:12 dnsmasq[18]: inotify, new or changed file /var/lib/ironic-inspector/dhcp-hostsdir/52:54:00:fb:b6:ff\n", 116) = 116
18    10:34:12.123658 getpid()          = 18
18    10:34:12.123719 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
18    10:34:12.123796 getpid()          = 18
18    10:34:12.123877 write(9</var/log/ironic-inspector/dnsmasq.log>, "Mar 24 10:34:12 dnsmasq-dhcp[18]: read /var/lib/ironic-inspector/dhcp-hostsdir/52:54:00:fb:b6:ff\n", 97) = 97
18    10:34:12.123964 openat(AT_FDCWD, "/var/lib/ironic-inspector/dhcp-hostsdir/52:54:00:fb:b6:ff", O_RDONLY) = 10</var/lib/ironic-inspector/dhcp-hostsdir/52:54:00:fb:b6:ff>
18    10:34:12.124052 fstat(10</var/lib/ironic-inspector/dhcp-hostsdir/52:54:00:fb:b6:ff>, {st_mode=S_IFREG|0644, st_size=18, ...}) = 0
18    10:34:12.124126 read(10</var/lib/ironic-inspector/dhcp-hostsdir/52:54:00:fb:b6:ff>, "52:54:00:fb:b6:ff\n", 4096) = 18
18    10:34:12.124203 read(10</var/lib/ironic-inspector/dhcp-hostsdir/52:54:00:fb:b6:ff>, "", 4096) = 0
18    10:34:12.124274 close(10</var/lib/ironic-inspector/dhcp-hostsdir/52:54:00:fb:b6:ff>) = 0
18    10:34:12.124341 read(6<anon_inode:inotify>, 0x55a0d94f7ae0, 272) = -1 EAGAIN (Resource temporarily unavailable)

The following line is where the dnsmasq process hangs (i.e. while polling for network data, the inotify fd on /var/lib/ironic-inspector/dhcp-hostsdir, etc):
18    10:34:12.124415 poll([{fd=4<socket:[1376796]>, events=POLLIN}, {fd=5<socket:[1376797]>, events=POLLIN}, {fd=6<anon_inode:inotify>, events=POLLIN}, {fd=7<pipe:[1376803]>, events=POLLIN}], 4, -1n

I double checked and if I do create a new file under
/var/lib/ironic-inspector/dhcp-hostsdir, the poll() call correctly signals it.
So poll() has not gone nuts just yet ;)

The problem is that the polling of the socket 1376796 which is the udp one
listening on port 67 is simply not getting any data at all (as already
mentioned only the ironic inspector seems to not get data, the neutron one
listening to a tap interface gets the data just fine):
[root@undercloud-0 ironic-inspector]# grep -e 1376796 -e 1376797 /proc/net/udp
15596: 00000000:0043 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 1376796 2 ffff89f90a0a6400 0

(port 0043 -> 67 dhcp)

I also tried changing interface= to eth0 in
/var/lib/config-data/puppet-generated/ironic_inspector/etc/ironic-inspector/dnsmasq.conf
to no avail

So to me the TLDR; currently is that dnsmasq is simply not receiving any data
from the kernel for whatever reason (I also tried adding iptables -I INPUT 1 -j
ACCEPT to be on the safe side but to no avail)

Comment 29 Bob Fournier 2020-04-02 12:34:59 UTC
Updated nftables appears to work to get past this issue and is available.  Also https://review.opendev.org/715130 has merged.  Moving this to MODIFIED.

Comment 31 Bob Fournier 2020-05-04 11:37:28 UTC
Issue has been resolved, we're no longer seeing introspection failures.

Comment 35 Alex McLeod 2020-06-16 12:28:34 UTC
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to '-'.

Comment 37 errata-xmlrpc 2020-07-29 07:51:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3148


Note You need to log in before you can comment on or make changes to this bug.