Bug 1623868

Summary: default libvirt/container networking (NAT based AFAICT) broken in F-29
Product: [Fedora] Fedora Reporter: Peter Robinson <pbrobinson>
Component: firewalld Assignee: Eric Garver <egarver>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 29 CC: agedosier, awilliam, berrange, clalancette, egarver, gmarr, herrold, itamar, jdulaney, jforbes, jpopelka, jsmith.fedora, laine, libvirt-maint, lucas.yamanishi, mail, ngompa13, pbrobinson, rbarlow, robatino, ttomecek, twoerner, veillard, zbyszek
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugzilla.suse.com/show_bug.cgi?id=1102761
Whiteboard: AcceptedBlocker
Fixed In Version: firewalld-0.6.1-2.fc29 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1633744 (view as bug list) Environment:
Last Closed: 2018-09-05 01:09:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1517011    

Description Peter Robinson 2018-08-30 11:00:05 UTC
After an upgrade from Fedora 28, using the default libvirt network config (NAT, IPv4 192.168.122.x), running guests no longer get an IP address.

libvirt-4.6.0-1.fc29
dnsmasq-2.79-7.fc29

Tested across x86_64 and aarch64

Comment 1 Peter Robinson 2018-08-30 11:03:30 UTC
Actually, this might be caused by firewalld's change to nftables

https://developers.redhat.com/blog/2018/08/10/firewalld-the-future-is-nftables/

firewalld-0.6.1-1.fc29.noarch
firewalld-filesystem-0.6.1-1.fc29.noarch

Interesting that such a major change isn't a Change

Comment 2 Peter Robinson 2018-08-30 11:09:31 UTC
Confirmed this is a bug in the nftables changes. If you change the firewalld backend to iptables, it starts working as expected.

https://firewalld.org/2018/07/nftables-backend

Comment 3 Fedora Blocker Bugs Application 2018-08-30 11:10:58 UTC
Proposed as a Blocker for 29-beta by Fedora user pbrobinson using the blocker tracking app because:

 Virtualisation (and probably a raft of other things, I suspect also containers) default networking is broken in Fedora 29

Comment 4 Peter Robinson 2018-08-30 12:08:19 UTC
The SUSE bug above confirms that this has also broken containers. According to that bug this should be fixed with the 4.18 kernel, but since all of the above was tested on 4.18.5 I don't believe that is actually the case. For reference:

"The firewalld TW package now uses iptables as default backend but this is only a temporary workaround. The real problem is that 'nat' table can't co-exist in nftables and iptables but that's fixed in 4.18

https://marc.info/?l=netfilter-devel&m=152633437413998&w=2

Leaving this bug open so we can revert the fix when 4.18 hits tumbleweed

Adding kernel@ in case they are interested in backporting these fixes for 4.17 stable kernels."

Comment 5 Peter Robinson 2018-08-30 12:09:49 UTC
For reference I've also referred this to FESCo as seemingly the firewalld team feel they can ignore Fedora procedures for Changes: https://pagure.io/fesco/issue/1978

Comment 6 Zbigniew Jędrzejewski-Szmek 2018-08-30 12:50:02 UTC
I seem to get the same result. I upgraded my main machine to F29 yesterday, and the libvirt VMs get no network, DHCP fails. Doing "systemctl stop firewalld" on the host makes the issue go away.

Comment 7 Peter Robinson 2018-08-30 12:54:46 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #6)
> I seem to get the same result. I upgraded my main machine to F29 yesterday,
> and the libvirt VMs get no network, DHCP fails. Doing "systemctl stop
> firewalld" on the host makes the issue go away.

Changing the firewalld backend from nftables to iptables in /etc/firewalld/firewalld.conf should also resolve the issue.
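For anyone scripting this workaround, here is a minimal sketch of the config change in Python. It assumes the setting is the FirewallBackend key in firewalld.conf; the helper name set_firewall_backend and the sample config text are hypothetical, and the function only transforms a string rather than touching the real file.

```python
import re


def set_firewall_backend(conf_text: str, backend: str) -> str:
    """Return conf_text with the FirewallBackend key set to `backend`.

    Assumption: the key is spelled FirewallBackend. If it is absent,
    it is appended at the end of the text.
    """
    if backend not in ("iptables", "nftables"):
        raise ValueError(f"unsupported backend: {backend!r}")
    line = f"FirewallBackend={backend}"
    # Match only an uncommented key at the start of a line.
    pattern = re.compile(r"^[ \t]*FirewallBackend[ \t]*=.*$", re.MULTILINE)
    if pattern.search(conf_text):
        return pattern.sub(line, conf_text)
    sep = "" if conf_text.endswith("\n") or not conf_text else "\n"
    return f"{conf_text}{sep}{line}\n"


# Hypothetical sample content, not the full shipped config file.
sample = "# firewalld config\nDefaultZone=public\nFirewallBackend=nftables\n"
updated = set_firewall_backend(sample, "iptables")
print(updated)
```

firewalld has to be restarted (for example with "systemctl restart firewalld") before a changed backend takes effect.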

Comment 8 Eric Garver 2018-08-30 13:19:09 UTC
Logs from firewalld would be helpful. In the meantime I'll try to reproduce on my end.

Comment 9 Peter Robinson 2018-08-30 13:23:26 UTC
(In reply to Eric Garver from comment #8)
> Logs from firewalld would be helpful. In the meantime I'll try to reproduce
> on my end.

Tell me what you need, but it's reproducible across 4 of my devices without exception, so I expect you'll be able to reproduce it yourself easily enough.

Comment 10 Daniel Berrangé 2018-08-30 13:59:21 UTC
To make DHCP/DNS work for guests, regardless of host firewall settings, libvirt has to insert some rules at the head of the INPUT chain:

  364 25130 ACCEPT     udp  --  virbr0 any     anywhere             anywhere             udp dpt:domain
    0     0 ACCEPT     tcp  --  virbr0 any     anywhere             anywhere             tcp dpt:domain
  169 57544 ACCEPT     udp  --  virbr0 any     anywhere             anywhere             udp dpt:bootps
    0     0 ACCEPT     tcp  --  virbr0 any     anywhere             anywhere             tcp dpt:bootps

Normally these end up immediately before firewalld's rules:

# iptables -L INPUT -v
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
  364 25130 ACCEPT     udp  --  virbr0 any     anywhere             anywhere             udp dpt:domain
    0     0 ACCEPT     tcp  --  virbr0 any     anywhere             anywhere             tcp dpt:domain
  169 57544 ACCEPT     udp  --  virbr0 any     anywhere             anywhere             udp dpt:bootps
    0     0 ACCEPT     tcp  --  virbr0 any     anywhere             anywhere             tcp dpt:bootps
5515K 9181M ACCEPT     all  --  any    any     anywhere             anywhere             ctstate RELATED,ESTABLISHED
 1919  140K ACCEPT     all  --  lo     any     anywhere             anywhere            
33574 2647K INPUT_direct  all  --  any    any     anywhere             anywhere            
33574 2647K INPUT_ZONES_SOURCE  all  --  any    any     anywhere             anywhere            
33574 2647K INPUT_ZONES  all  --  any    any     anywhere             anywhere            
   54  6429 DROP       all  --  any    any     anywhere             anywhere             ctstate INVALID
  405 67007 REJECT     all  --  any    any     anywhere             anywhere             reject-with icmp-host-prohibited


My guess would be that with firewalld switching to use nft for its own rules, the rules firewalld adds via nft get evaluated *before* the traditional rules added via iptables. That would mean libvirt's rules to allow DHCP/DNS never take effect.


There are other rules libvirt adds to allow traffic to/from  virbr0 that might get impacted in a similar way.

What libvirt does is described in this old mail https://www.redhat.com/archives/libvir-list/2010-June/msg00762.html

Comment 11 Eric Garver 2018-08-30 14:46:57 UTC
I have reproduced with docker.

The cause is that firewalld's default rules have a "drop all" at the end of its FORWARD filtering rules. Both firewall backends (iptables and nftables) have this rule. Docker inserts a rule in the FORWARD chain to accept all forwarded packets - this occurs _before_ firewalld's rules. The behavior difference is caused by how netfilter executes filter hooks.

With iptables backend:

  1) docker's FORWARD accept all rule matches
  2) Packet is accepted and execution for this hook (iptables) is stopped.
    - Since firewalld's rules are in iptables its rules are not considered.
  3) No other netfilter hooks exist. Packet accepted.

With nftables backend:

  1) docker's FORWARD accept all rule matches
  2) Packet is accepted and execution for this hook (iptables) is stopped.
  3) Other netfilter hooks are considered. This includes any nftables FORWARD hooks (i.e. firewalld).
  4) firewalld's "drop all" FORWARD rule is hit. Packet is dropped.

I have verified this by temporarily removing firewalld's "drop all" FORWARD rule. After which docker's networking works as expected.
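The two walk-throughs above can be sketched as a toy model in Python. This is a simplification under stated assumptions, not netfilter's actual implementation: the names Rule, BaseChain and run_hook are made up for illustration, chain priorities are reduced to list order, and firewalld's final rule is modelled as a plain drop. The one behaviour it tries to capture is that ACCEPT only ends evaluation of the chain it occurs in, while a drop in any chain on the hook is final.

```python
from dataclasses import dataclass
from typing import Callable, List

ACCEPT, DROP = "ACCEPT", "DROP"


@dataclass
class Rule:
    matches: Callable[[dict], bool]
    verdict: str


@dataclass
class BaseChain:
    """One chain registered on a netfilter hook such as FORWARD."""
    owner: str
    rules: List[Rule]
    policy: str = ACCEPT

    def evaluate(self, packet: dict) -> str:
        # The first matching rule decides the verdict for this chain only.
        for rule in self.rules:
            if rule.matches(packet):
                return rule.verdict
        return self.policy


def run_hook(chains: List[BaseChain], packet: dict) -> str:
    # Every chain on the hook is consulted; a single DROP is final.
    for chain in chains:
        if chain.evaluate(packet) == DROP:
            return DROP
    return ACCEPT


accept_all = Rule(lambda p: True, ACCEPT)  # docker's FORWARD rule
drop_all = Rule(lambda p: True, DROP)      # firewalld's last FORWARD rule
packet = {"iif": "docker0", "oif": "eth0"}

# iptables backend: both sets of rules live in the same chain.
iptables_backend = [BaseChain("iptables", [accept_all, drop_all])]

# nftables backend: firewalld has its own chain on the same hook.
nftables_backend = [
    BaseChain("iptables", [accept_all]),
    BaseChain("nftables", [drop_all]),
]

print(run_hook(iptables_backend, packet))  # ACCEPT
print(run_hook(nftables_backend, packet))  # DROP
```

In the model, removing drop_all from the nftables chain makes the second case return ACCEPT, which matches the verification described above.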

Comment 12 Daniel Berrangé 2018-08-30 15:10:05 UTC
By "docker" here, I'm assuming you meant to say "libvirt" :-)

(In reply to Eric Garver from comment #11)
> With nftables backend:
> 
>   1) docker's FORWARD accept all rule matches
>   2) Packet is accepted and execution for this hook (iptables) is stopped.
>   3) Other netfilter hooks are considered. This includes any nftables
> FORWARD hooks (i.e. firewalld).
>   4) firewalld's "drop all" FORWARD rule is hit. Packet is dropped.

Oh, so it wasn't a question of ordering of rules from libvirt vs firewalld, but rather that multiple independent rules *all* have to ACCEPT the packet for it to be allowed.

This is quite a major functional change/regression in behaviour/semantics for iptables rules - ACCEPT no longer guarantees acceptance as it did in the past.

Comment 13 Eric Garver 2018-08-30 15:27:05 UTC
(In reply to Daniel Berrange from comment #12)
> By "docker" here, I'm assuming you meant to say "libvirt" :-)

I used docker, but in this scenario behavior is similar.

> (In reply to Eric Garver from comment #11)
> > With nftables backend:
> > 
> >   1) docker's FORWARD accept all rule matches
> >   2) Packet is accepted and execution for this hook (iptables) is stopped.
> >   3) Other netfilter hooks are considered. This includes any nftables
> > FORWARD hooks (i.e. firewalld).
> >   4) firewalld's "drop all" FORWARD rule is hit. Packet is dropped.
> 
> Oh, so it wasn't a question of ordering of rules from libvirt vs firewalld,
> but rather that multiple independent rules *all* have to ACCEPT the packet
> for it to be allowed.
>
> This is quite a major functional change/regression in behaviour/semantics
> for iptables rules - ACCEPT no longer guarantees acceptance as it did in the
> past.

Right. Maybe it helps to think of iptables and nftables as two independent firewalls. The packet has to get through both.

Comment 14 Daniel Berrangé 2018-08-30 15:31:29 UTC
So how do we get the old behaviour back, such that "-j ACCEPT" really does mean accept no matter what firewalld has done?

Comment 15 Eric Garver 2018-08-30 15:55:53 UTC
A quick workaround is to add the docker/libvirt interfaces to the trusted zone.

  # firewall-cmd --add-interface=docker0 --zone=trusted

Or you can use the NAT'd address range

  # firewall-cmd --add-source=172.17.0.1/16 --zone=trusted
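As a quick check of what that source range covers, the sketch below uses Python's standard ipaddress module. The sample addresses are illustrative assumptions; 192.168.122.10 stands in for a guest on libvirt's default network mentioned in the description.

```python
import ipaddress

# "172.17.0.1/16" has host bits set; strict=False masks them off.
source = ipaddress.ip_network("172.17.0.1/16", strict=False)
print(source)  # 172.17.0.0/16

# A container address on docker0 falls inside the range.
print(ipaddress.ip_address("172.17.0.2") in source)  # True

# A guest on libvirt's default network does not.
print(ipaddress.ip_address("192.168.122.10") in source)  # False
```

So for libvirt the equivalent workaround would need the virbr0 interface or the 192.168.122.x range rather than the docker one.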

Comment 16 Daniel Berrangé 2018-08-30 16:01:08 UTC
The only traffic that should be allowed is DHCP/DNS, per the rules we illustrate above - we certainly don't want guests being able to access more than that, so the "trusted" zone seems inappropriate.

IMHO, this needs to be fixed in firewalld so that the existing rules created work as before.

Comment 17 Adam Williamson 2018-08-30 16:06:39 UTC
I'm +1 blocker on this per Beta criterion "The release must be able host virtual guest instances of the same release." (with its footnote, "This rather concise criterion means effectively means that both virtual host and virtual guest functionality must work - it's implied, if you think about it. It also means that there must be no showstopper bugs in the installer when installing to a virtual machine..."). Default VM networking being broken is serious enough to count as a violation of that, for me.

(For the record, I also was rather surprised to see the switch from iptables to nftables just suddenly appear with no process or even casual notification to test@ or devel@ AFAICS).

Comment 18 Peter Robinson 2018-08-30 16:14:35 UTC
> (For the record, I also was rather surprised to see the switch from iptables
> to nftables just suddenly appear with no process or even casual notification
> to test@ or devel@ AFAICS).

That is exactly why I opened the FESCo ticket.

Comment 19 Adam Williamson 2018-08-30 16:46:04 UTC
Yes, I'm just backing you up on that.

Comment 20 Jared Smith 2018-08-30 17:01:17 UTC
I'm also +1 to declaring this a blocker.  (I'll spare the details on how much hair I've lost this week on my Rawhide laptop trying to figure out why I was having networking problems on both libvirt and docker...)

Comment 21 Eric Garver 2018-08-31 14:53:11 UTC
I plan to push a change today to default to the iptables backend. This will go to rawhide and f29.

Comment 22 Fedora Update System 2018-08-31 15:48:22 UTC
firewalld-0.6.1-2.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2018-379c39d97c

Comment 23 Fedora Update System 2018-09-02 02:56:54 UTC
firewalld-0.6.1-2.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-379c39d97c

Comment 24 Geoffrey Marr 2018-09-04 19:59:28 UTC
Discussed during the 2018-09-04 blocker review meeting: [1]

The decision to classify this bug as an "AcceptedBlocker" was made as it violates the following criteria:

"The release must be able host virtual guest instances of the same release"

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2018-09-04/f29-blocker-review.2018-09-04-16.01.txt

Comment 25 Fedora Update System 2018-09-05 01:09:17 UTC
firewalld-0.6.1-2.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

Comment 26 Eric Garver 2018-09-20 17:55:56 UTC
*** Bug 1619835 has been marked as a duplicate of this bug. ***