Bug 642171 - nwfilter fails to filter incoming with kernels newer than 2.6.20
Summary: nwfilter fails to filter incoming with kernels newer than 2.6.20
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-10-12 09:19 UTC by Soren Hansen
Modified: 2016-03-22 22:44 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-22 22:44:45 UTC
Embargoed:



Description Soren Hansen 2010-10-12 09:19:39 UTC
Description of problem:


Version-Release number of selected component (if applicable):
Observed with 0.8.3.


How reproducible:
Always.


Steps to Reproduce:
1. Add a filter with the following rules:
  <rule action='accept' direction='out' priority='399'>
    <tcp/>
  </rule>
  <rule action='drop' direction='in' priority='400'>
    <tcp/>
  </rule>
2. Attempt to connect to the VM from another host(!).

  
Actual results:
You are allowed through.

Expected results:
Connection should be blocked.

Additional info:
The FO-vnet0 chain (or FO-vnet1 or whatever) gets hit, but the single rule inside it does not. It uses "-m physdev --physdev-out vnet0" for matching, but since 2.6.20, --physdev-out only works for traffic local to a bridge (i.e. traffic coming from a port on the same bridge as the VM).

From feature-removal.txt from days of yore:
What: Bridge netfilter deferred IPv4/IPv6 output hook calling
When: January 2007
Why: The deferred output hooks are a layering violation causing unusual
and broken behaviour on bridge devices. Examples of things they
break include QoS classifation using the MARK or CLASSIFY targets,
the IPsec policy match and connection tracking with VLANs on a
bridge. Their only use is to enable bridge output port filtering
within iptables with the physdev match, which can also be done by
combining iptables and ebtables using netfilter marks. Until it
will get removed the hook deferral is disabled by default and is
only enabled when needed.

Comment 1 Soren Hansen 2010-10-12 10:25:07 UTC
I'm having trouble wrapping my head around a fix for this.

Once ebtables has a chance to mark a packet as going out on a particular port, it's too late for iptables to filter it, at least as far as I can see.

Alternatively, iptables needs to apply filters for all bridge ports and mark the packet with a bitmask representing the bridge ports on which it's allowed and ebtables can filter on that, but that sounds dreadfully slow to me.

What other options do we have?

Comment 2 Soren Hansen 2010-10-13 06:36:22 UTC
I had a bit of a chat with Jan Engelhardt (netfilter dev) yesterday. As it stands, our best bet really is to apply the filters in iptables, mark the packets, and leave it to ebtables to actually drop the packets. There's work being done to fix this, but we still need to support the current state of affairs.
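
For illustration, a minimal sketch of that pattern for a single tap device (the bridge name, mark value and match here are made-up examples, not what libvirt would actually generate):

# decide in iptables, but only set a mark instead of dropping
iptables -A FORWARD -o br0 -p tcp --dport 25 -j MARK --or-mark 0x1
# drop in ebtables, where the destination bridge port is known
ebtables -t nat -A POSTROUTING -o vnet0 --mark 0x1/0x1 -j DROP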

Comment 3 Stefan Berger 2010-10-13 11:30:04 UTC
A VM with this filter translates into the following rules:

Chain FI-vnet0 (1 references)
 pkts bytes target     prot opt in     out     source               destination         
   42  3897 RETURN     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           state NEW,ESTABLISHED 
    0     0 DROP       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain FO-vnet0 (1 references)
 pkts bytes target     prot opt in     out     source               destination         
   31 12032 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           state ESTABLISHED 
    3   144 DROP       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain HI-vnet0 (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           
    0     0 DROP       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0       

If the VM had no previous TCP/IP connection then these rules work correctly. 
However, they do not cut off an existing incoming TCP/IP connection, i.e., an ssh connection towards the VM is not cut off when this filter is applied after a previous filter that allowed the ssh connection to come in. The rules do reject newly initiated incoming traffic, though. At least that's the behavior of the VM I tested this with.

Explanation: The first rule in FI-vnet0 controls outgoing TCP traffic and accepts the establishment of new (outgoing) connections. Return traffic then reaches the VM via the first rule in FO-vnet0. However, this first rule in FO-vnet0 also allows already-established connections into the VM (due to the state 'ESTABLISHED'), i.e., the ssh connection that was allowed prior to applying this filter. Subsequently, traffic from the VM is allowed to leave the VM through the first rule in FI-vnet0 thanks to the state 'ESTABLISHED'. Checking against the state 'ESTABLISHED' is necessary in both places to enable outgoing connections.
Since there is no rule checking for the state 'NEW' in the TCP rule in FO-vnet0, no new incoming connections are allowed while this filter is active.

The problem is not related to -m physdev or the other reasons/solutions mentioned above, but rather to how the connection tracking system works (or rather my omission of checking the connection's direction).

A solution (for the filter transitioning to work) would be an iptables match that allows detecting the direction of the connection. The first rule in FO-vnet0 would then also check for the 'outgoing' direction with a hypothetical -m direction 'out', thus no longer matching the existing incoming ssh connection, which would instead hit the subsequent drop rule. That said, I will try to find out whether -m conntrack with

       --ctdir {ORIGINAL|REPLY}
              Match packets that are flowing in the  specified  direction.  If
              this  flag  is  not  specified  at  all, matches packets in both
              directions.


will do the trick and report back to this channel.
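
For illustration, the first rule in FO-vnet0 above might then look roughly like this (just a sketch, assuming the --ctdir semantics from the man page excerpt; the exact rule libvirt would generate may differ):

iptables -A FO-vnet0 -p tcp -m state --state ESTABLISHED -m conntrack --ctdir REPLY -j ACCEPT

With --ctdir REPLY only reply packets of connections that the VM itself originated would be accepted, so an ssh connection originally initiated from the outside would no longer match and would fall through to the subsequent drop rule.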

Comment 4 Soren Hansen 2010-10-13 11:46:15 UTC
The problem you mention is accurate, but it is a completely separate one from the one I'm reporting.

No matter how many new connections I create, iptables -L libvirt-out -vn gives me this:
Chain libvirt-out (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 FO-vnet0   all  --  *      *       0.0.0.0/0            0.0.0.0/0           [goto] PHYSDEV match --physdev-out vnet0 

The rule is simply never hit. I explained it in the original bug report, but once more:

Before 2.6.20, --physdev-out worked by deferring some filtering until such time that the bridging code had determined which bridge port the given packet would be sent to. Since 2.6.20, this no longer works.

Try this:
On your test box, create a new bridge, e.g. br100. Do NOT add any physical interfaces to it.
Give it an IP address, e.g. 10.0.0.1.  Create a new VM, attached to this new bridge. Give it an IP in the same subnet as the bridge and tell it to use the bridge as its default gateway.

On a separate machine (must be a physically separate host!), add a route for the new network with the gateway set to your test box's LAN IP. Verify that connectivity works. Now, try adding a filter and observe that it simply does not work at all. --physdev-out ONLY works for locally generated traffic (i.e. traffic from the host to the guest) or traffic coming from another port on the same bridge (which is why you shouldn't add any physical interface to the bridge).
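
For reference, a rough sketch of that setup (names and addresses are just the examples from above, adjust as needed):

# on the test box: isolated bridge with an IP, no physical interface enslaved
brctl addbr br100
ip addr add 10.0.0.1/24 dev br100
ip link set br100 up
echo 1 > /proc/sys/net/ipv4/ip_forward

# on the physically separate machine: route the VM subnet via the test box
ip route add 10.0.0.0/24 via <LAN IP of the test box>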

Comment 5 Stefan Berger 2010-10-13 12:57:29 UTC
Thanks for explaining the network setup you are working with.

Well, I have never tested this setup that you mention where the physical device is not plugged into the bridge. If the physical device is plugged into the bridge, the filtering works as expected (besides what I am mentioning above). I do not know a work-around for this and I would declare it as a current limitation of the nwfilter. Besides that the user has the possibility to reconfigure the network setting at all times, thus plugging or unplugging the physical device from the bridge, giving the bridge an IP address, reconfiguring the VM's IP address etc. to get it into the configuration you mention. Now libvirt would need to live-follow these user actions in order to guarantee that the filtering works under all configurations... from the perspective of libvirt using a single --physdev-out would be 'ideal', but if that functionality has been removed and packets cannot be intercepted using that method, then a different solution would be required. Packet marking on the ebtables layer would probably work. I'll look into this.

If I follow your last sentence, the configurations that I am working with are not correct since I shouldn't add a 'physical interface to the bridge'. However, if I want to use the infrastructure's gateways (routers) from virtual machines on hosts in the same subnet, this seems a valid and useful configuration, no?

Comment 6 Stefan Berger 2010-10-13 15:13:47 UTC
I intend to post the following patch on the mailing list. It solves the problem of someone switching the filtering rules and, for example, dropping incoming tcp connections that were previously established, i.e., going from a filter like this one, which allows incoming ssh connections,

  <rule action='accept' direction='in' priority='401'>
    <tcp/>
  </rule>
  <rule action='accept' direction='out' priority='500'>
    <tcp/>
  </rule>

to a filter like this one, which now (with this patch) cuts the existing ssh connection right off:

  <rule action='drop' direction='in' priority='401'>
    <tcp/>
  </rule>
  <rule action='accept' direction='out' priority='500'>
    <tcp/>
  </rule>

!!! As a nice side-effect, this also improves(!!) the situation for the network configuration where the bridge does not have the physical interface attached: one can still see the SYN packets of disallowed incoming tcp connections arriving at the machine, but no connection can be established. Please try it. I tested this configuration using telnet to port 22, initiated both from within the VM and towards the VM, while modifying the filter above; the effect is that it is 'working'. Another test was pinging the VM at 10.0.0.2 and pinging another machine in the infrastructure from within the VM while live-modifying the rules related to ICMP traffic --> this also cuts off the traffic by filtering the outgoing side correctly (one still sees ICMP requests coming into the VM and ICMP responses going out, which then get dropped on the host when incoming ICMP is disallowed).


Playing around with packet marking didn't lead to any useful results. 


---
 src/nwfilter/nwfilter_ebiptables_driver.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Index: libvirt-acl/src/nwfilter/nwfilter_ebiptables_driver.c
===================================================================
--- libvirt-acl.orig/src/nwfilter/nwfilter_ebiptables_driver.c
+++ libvirt-acl/src/nwfilter/nwfilter_ebiptables_driver.c
@@ -1100,6 +1100,19 @@ err_exit:
     return 1;
 }
 
+
+static void
+iptablesEnforceDirection(int directionIn,
+                         virNWFilterRuleDefPtr rule,
+                         virBufferPtr buf)
+{
+    if (rule->tt != VIR_NWFILTER_RULE_DIRECTION_INOUT)
+        virBufferVSprintf(buf, " -m conntrack --ctdir %s",
+                          (directionIn) ? "Original"
+                                        : "Reply");
+}
+
+
 /*
  * _iptablesCreateRuleInstance:
  * @chainPrefix : The prefix to put in front of the name of the chain
@@ -1494,6 +1507,10 @@ _iptablesCreateRuleInstance(int directio
     if (match && !skipMatch)
         virBufferVSprintf(&buf, " %s", match);
 
+    if (defMatch && match != NULL)
+        iptablesEnforceDirection(directionIn,
+                                 rule,
+                                 &buf);
 
     virBufferVSprintf(&buf,
                       " -j %s" CMD_DEF_POST CMD_SEPARATOR

Comment 7 Soren Hansen 2010-10-13 15:40:53 UTC
(In reply to comment #5)
> Well, I have never tested this setup that you mention where the physical device
> is not plugged into the bridge.

Oh. It's what the "default" network in libvirt does (with NAT'ing and all
that). (I'm not using that, though, this is a bridge I configured myself.)

> If the physical device is plugged into the bridge, the filtering works as expected (besides what I am mentioning above).

Right.

> I do not know a work-around for this and I would declare it as a current
> limitation of the nwfilter.

Ok :(

> Besides that the user has the possibility to
> reconfigure the network setting at all times, thus plugging or unplugging the
> physical device from the bridge, giving the bridge an IP address, reconfiguring
> the VM's IP address etc. to get it into the configuration you mention. Now
> libvirt would need to live-follow these user actions in order to guarantee that
> the filtering works under all configurations...

Oh, the solution (or workaround, depending on your point of view) I'm proposing
would always work. You wouldn't only do it if the network was configured this
way, so there would be no need to follow the user's changes to network
configuration.

> from the perspective of libvirt using a single --physdev-out would be 'ideal', but if that functionality has
> been removed and packets cannot be intercepted using that method, then a
> different solution would be required. Packet marking on the ebtables layer
> would probably work. I'll look into this.

It's really dreadful, but I don't see another way (and Jan (netfilter
upstream)) agreed. We need to apply the /logic/ of the filtering in iptables
(not knowing which bridge port a packet is destined for), mark the packet
according to which bridge ports should drop it, and then finally drop it in
ebtables. This means that the *union* of all iptables filters for a given
bridge needs to be applied to all packets going through that bridge. The only
optimisation we can apply is "-o <bridge interface>" which works.

So, if I:
 * for vnet0 wanted to allow port 22 and deny port 25,
 * for vnet1 wanted to allow port 80,
 * for vnet2 wanted to allow port 80 and 443,

and all of vnet[012] were connected to br100, this could become:

iptables -A FORWARD -o br100 -m tcp --dport 22 -j MARK --or-mark 0x01
iptables -A FORWARD -o br100 -m tcp --dport 25 -j MARK --or-mark 0x02
iptables -A FORWARD -o br100 -m tcp --dport 80 -j MARK --or-mark 0x04
iptables -A FORWARD -o br100 -m tcp --dport 80 -j MARK --or-mark 0x08
iptables -A FORWARD -o br100 -m tcp --dport 443 -j MARK --or-mark 0x08
ebtables -t nat -A POSTROUTING -o vnet0 --mark 0x1/0x1 -j ACCEPT
ebtables -t nat -A POSTROUTING -o vnet0 --mark 0x2/0x2 -j DROP
ebtables -t nat -A POSTROUTING -o vnet1 --mark 0x4/0x4 -j ACCEPT
ebtables -t nat -A POSTROUTING -o vnet2 --mark 0x8/0x8 -j ACCEPT


So, each (ACCEPT/DROP, interface) tuple gets its own bit in the mark.
Marks are 32 bits, so we need to not waste them. :(

> If I follow your last sentence, the configurations that I am working with are
> not correct since I shouldn't add a 'physical interface to the bridge'.
> However, if I want to use the infrastructure's gateways (routers) from virtual
> machines on hosts in the same subnet, this seems a valid and useful
> configuration, no?

Oh, what you're doing is fine. I do it all the time, too. It's just that to see
this particular bug, you need to not have the physical interface attached to
the bridge.

Comment 8 Daniel Berrangé 2010-10-13 15:50:30 UTC
> (In reply to comment #5)
> > Well, I have never tested this setup that you mention where the physical device
> > is not plugged into the bridge.
>
> Oh. It's what the "default" network in libvirt does (with NAT'ing and all
> that). (I'm not using that, though, this is a bridge I configured myself.)

Yes, with the libvirt virtual networking, you get an isolated bridge with TAP devices enslaved. Traffic to the outside world is either blocked, routed, or routed with NAT. Even if the physdev matches don't work in this scenario, there are a few other ways we can hook into traffic to/from the virtual network. eg, PRE/POST-ROUTING chains, or FORWARD chain. This old email shows what chains are traversed in libvirt virtual network setups. I believe this is still accurate

http://www.redhat.com/archives/libvir-list/2007-April/msg00033.html

Comment 9 Stefan Berger 2010-10-13 16:34:25 UTC
(In reply to comment #7)
> (In reply to comment #5)
> > Well, I have never tested this setup that you mention where the physical device
> > is not plugged into the bridge.
> 
> Oh. It's what the "default" network in libvirt does (with NAT'ing and all
> that). (I'm not using that, though, this is a bridge I configured myself.)

I guess the problem is that I always plugged the VMs into a bridge I created rather than the one libvirt creates. That way I can reach the VMs from anywhere, so no need to set up SNAT and DNAT... and masquerading only works if traffic is initiated from VM to the outside, which is of limited use.

> 
> > If the physical device is plugged into the bridge, the filtering works as expected (besides what I am mentioning above).
> 
> Right.
> 
> > I do not know a work-around for this and I would declare it as a current
> > limitation of the nwfilter.
> 
> Ok :(

From what you are showing below, this is not a good solution...
> 
> > from the perspective of libvirt using a single --physdev-out would be 'ideal', but if that functionality has
> > been removed and packets cannot be intercepted using that method, then a
> > different solution would be required. Packet marking on the ebtables layer
> > would probably work. I'll look into this.
> 
> It's really dreadful, but I don't see another way (and Jan (netfilter
> upstream)) agreed. We need to apply the /logic/ of the filtering in iptables
> (not knowing which bridge port a packet is destined for), mark the packet
> according to which bridge ports should drop it, and then finally drop it in
> ebtables. This means that the *union* of all iptables filters for a given
> bridge needs to be applied to all packets going through that bridge. The only
> optimisation we can apply is "-o <bridge interface>" which works.
> 
> So, if I:
>  * for vnet0 wanted to allow port 22 and deny port 25,
>  * for vnet1 wanted to allow port 80,
>  * for vnet2 wanted to allow port 80 and 443,
> 
> and all of vnet[012] were connected to br100, this could become:
> 
> iptables -A FORWARD -o br100 -m tcp --dport 22 -j MARK --or-mark 0x01
> iptables -A FORWARD -o br100 -m tcp --dport 25 -j MARK --or-mark 0x02
> iptables -A FORWARD -o br100 -m tcp --dport 80 -j MARK --or-mark 0x04
> iptables -A FORWARD -o br100 -m tcp --dport 80 -j MARK --or-mark 0x08
> iptables -A FORWARD -o br100 -m tcp --dport 443 -j MARK --or-mark 0x08
> ebtables -t nat -A POSTROUTING -o vnet0 --mark 0x1/0x1 -j ACCEPT
> ebtables -t nat -A POSTROUTING -o vnet0 --mark 0x2/0x2 -j DROP
> ebtables -t nat -A POSTROUTING -o vnet1 --mark 0x4/0x4 -j ACCEPT
> ebtables -t nat -A POSTROUTING -o vnet2 --mark 0x8/0x8 -j ACCEPT
> 
> 

Yes, that would be 'dreadful'. We would only have the possibility to filter on 32 different 'types'... You can easily have that many filters per VM or on a big machine that many or more VMs with each having different rules.

> So, each (ACCEPT/DROP, interface) tuple gets its own bit in the mark.
> Marks are 32 bits, so we need to not waste them. :(

Right...

I think the patch I am showing above is a first step and necessary for other reasons. We'll need to find 'something else' later on to 'somehow' grab the packets coming in through the routing. I don't yet know what that is, but I'll look out.

Comment 10 Stefan Berger 2010-10-13 17:27:56 UTC
Following this diagram here

http://www.imagestream.com/~josh/PacketFlow.png

an idea would be to mark packets destined (-o <ifname>) for a certain VM's interface in ebtables' nat POSTROUTING chain (later on with the interface index number), then check the mark in iptables' nat POSTROUTING chain and hook the 'normal' nwfilter tables in there then through a jump.


ebtables -t nat -L POSTROUTING --Lc ; iptables -t nat -L POSTROUTING -v 
Bridge table: nat

Bridge chain: POSTROUTING, entries: 1, policy: ACCEPT
-o vnet0 -j mark --mark-set 0x457 --mark-target ACCEPT, pcnt = 39470 -- bcnt = 3043618
Chain POSTROUTING (policy ACCEPT 51 packets, 3684 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 MASQUERADE  tcp  --  any    any     192.168.122.0/24    !192.168.122.0/24    masq ports: 1024-65535 
    0     0 MASQUERADE  udp  --  any    any     192.168.122.0/24    !192.168.122.0/24    masq ports: 1024-65535 
    0     0 MASQUERADE  all  --  any    any     192.168.122.0/24    !192.168.122.0/24    
    2   152 ACCEPT     all  --  any    any     anywhere             anywhere            mark match 0x457 


The problem is that even after a long time the pkts/bytes counters in iptables' nat POSTROUTING chain haven't increased at all, while the ebtables counter seems to show a valid number. I wonder whether the diagram is correct. It shows ebtables POSTROUTING before iptables POSTROUTING...

Comment 11 Soren Hansen 2010-10-13 18:48:52 UTC
(In reply to comment #9)
> (In reply to comment #7)
> > (In reply to comment #5)
>>> Well, I have never tested this setup that you mention where the physical device
>>> is not plugged into the bridge.
>> Oh. It's what the "default" network in libvirt does (with NAT'ing and all
>> that). (I'm not using that, though, this is a bridge I configured myself.)
> I guess the problem is that I always plugged the VMs into a bridge I created
> rather than the one libvirt creates. That way I can reach the VMs from anywhere,
> so no need to set up SNAT and DNAT... and masquerading only works if traffic is
> initiated from VM to the outside, which is of limited use.

Yes, I usually do that, too.

>>> I do not know a work-around for this and I would declare it as a current
>>> limitation of the nwfilter.
>> 
>> Ok :(
> From what you are showing below, this is not a good solution...

It's not (I even pointed out that it was dreadful :-) ), but it's a solution. Without it, filters are simply ineffective.

>> It's really dreadful, but I don't see another way (and Jan (netfilter
>> upstream)) agreed. We need to apply the /logic/ of the filtering in iptables
>> (not knowing which bridge port a packet is destined for), mark the packet
>> according to which bridge ports should drop it, and then finally drop it in
>> ebtables. This means that the *union* of all iptables filters for a given
>> bridge needs to be applied to all packets going through that bridge. The only
>> optimisation we can apply is "-o <bridge interface>" which works.
>> 
>> So, if I:
>>  * for vnet0 wanted to allow port 22 and deny port 25,
>>  * for vnet1 wanted to allow port 80,
>>  * for vnet2 wanted to allow port 80 and 443,
>> 
>> and all of vnet[012] were connected to br100, this could become:
>> 
>> iptables -A FORWARD -o br100 -m tcp --dport 22 -j MARK --or-mark 0x01
>> iptables -A FORWARD -o br100 -m tcp --dport 25 -j MARK --or-mark 0x02
>> iptables -A FORWARD -o br100 -m tcp --dport 80 -j MARK --or-mark 0x04
>> iptables -A FORWARD -o br100 -m tcp --dport 80 -j MARK --or-mark 0x08
>> iptables -A FORWARD -o br100 -m tcp --dport 443 -j MARK --or-mark 0x08
>> ebtables -t nat -A POSTROUTING -o vnet0 --mark 0x1/0x1 -j ACCEPT
>> ebtables -t nat -A POSTROUTING -o vnet0 --mark 0x2/0x2 -j DROP
>> ebtables -t nat -A POSTROUTING -o vnet1 --mark 0x4/0x4 -j ACCEPT
>> ebtables -t nat -A POSTROUTING -o vnet2 --mark 0x8/0x8 -j ACCEPT
> Yes, that would be 'dreadful'. We would only have the possibility to filter on
> 32 different 'types'... You can easily have that many filters per VM or on a
> big machine that many or more VMs with each having different rules.

I honestly doubt it's very common. 16 interfaces attached to the same bridge
with both accept and deny filters? Anyway, without this, you simply get no
filtering at all with this network setup.

> I think the patch I am showing above is a first step and necessary for other
> reasons.

No doubt about its necessity.

> We'll need to find 'something else' later on to 'somehow' grab the
> packets coming in through the routing. I don't yet know what that is, but I'll
> look out.

I may be putting too much trust in individuals, but when netfilter upstream
tells me there's no other way, I tend to believe it's true.

Comment 12 Soren Hansen 2010-10-13 18:59:39 UTC
(In reply to comment #10)
> Following this diagram here
> 
> http://www.imagestream.com/~josh/PacketFlow.png

don't do that. As I said, this affects kernels newer than 2.6.20. That diagram is from 2003. 2.6.20 came out in 2007.

This is the current diagram:

http://jengelh.medozas.de/images/nf-packet-flow.png

> an idea would be to mark packets destined (-o <ifname>) for a certain VM's
> interface in ebtables' nat POSTROUTING chain (later on with the interface index
> number), then check the mark in iptables' nat POSTROUTING chain and hook the
> 'normal' nwfilter tables in there then through a jump.

Except it doesn't work that way anymore. Once the packet reaches the bridge layer and the kernel finds out which bridge port it is destined for, it never comes back to iptables land (except if it is locally generated or comes from another port on the same bridge). The new diagram shows this.

Comment 13 Soren Hansen 2010-10-13 19:03:39 UTC
(In reply to comment #8)
> Yes, with the libvirt virtual networking, you get an isolated bridge with TAP
> devices enslaved. Traffic to the outside world is either blocked, routed, or
> routed with NAT. Even if the physdev matches don't work in this scenario, there
> are a few other ways we can hook into traffic to/from the virtual network. eg,
> PRE/POST-ROUTING chains, or FORWARD chain. This old email shows what chains are
> traversed in libvirt virtual network setups. I believe this is still accurate
> 
> http://www.redhat.com/archives/libvir-list/2007-April/msg00033.html

Again, this changed in 2.6.20. 2.6.20 came out in February 2007. I don't know which kernel version you used for your tests. Do you have an educated guess?

Anyways, I'm not sure I see what you are suggesting. nwfilter lets you define filters that are specific to an interface of a particular guest. It identifies this based on the tap device. For incoming traffic (from the outside to the guest), I don't see how you can do that without --physdev-out matching?

Comment 14 Stefan Berger 2010-10-13 21:15:15 UTC
(In reply to comment #12)
> (In reply to comment #10)
> > Following this diagram here
> > 
> > http://www.imagestream.com/~josh/PacketFlow.png
> don't do that. As I said, this affects kernels newer than 2.6.20. That diagram
> is from 2003. 2.6.20 came out in 2007.
> This is the current diagram:
> http://jengelh.medozas.de/images/nf-packet-flow.png
> > an idea would be to mark packets destined (-o <ifname>) for a certain VM's
> > interface in ebtables' nat POSTROUTING chain (later on with the interface index
> > number), then check the mark in iptables' nat POSTROUTING chain and hook the
> > 'normal' nwfilter tables in there then through a jump.
> Except it doesn't work that way anymore. Once the packet reaches the bridge
> layer and the kernel finds out which bridge port it is destined for, it never
> comes back to iptables land (except if it is locally generated or comes from
> another port on the same bridge). The new diagram shows this.

Yes, I can see that.

Thanks for the link.

Comment 15 Stefan Berger 2012-04-18 16:34:29 UTC
I was just reminded of this thread...

With the changes that were added back then, I tried the following again today on a VM connected to virbr0 and accessing the Internet through masquerading.

I defined the following filter, disallowing the pinging of the public (Google) DNS server 8.8.4.4, while allowing pinging of the public (Google) DNS server 8.8.8.8.

<filter name='acl-filter' chain='root'>
  <uuid>aca4e9f0-c9ed-03db-d7e3-8e8210196c79</uuid>
  <rule action='drop' direction='out' priority='500'>
    <icmp dstipaddr='8.8.4.4'/>
  </rule>
</filter>

(Use 'virsh nwfilter-define <xml file>' to make this filter known to libvirt.)

The relevant part in the domain XML is:

    <interface type='bridge'>
      <mac address='52:54:00:9f:80:45'/>
      <source bridge='virbr0'/>
      <target dev='vnet1'/>
      <model type='virtio'/>
      <filterref filter='acl-filter'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </interface>

Now you can start two shells inside the VM and ping 8.8.8.8 and 8.8.4.4. The latter will not ping successfully.

On the host you can now do

virsh nwfilter-edit acl-filter

and live-change the IP address 8.8.4.4 to 8.8.8.8 and save the changes. The shell that was previously pinging 8.8.8.8 successfully will now stop getting responses, and pinging 8.8.4.4 will work.

Note: In this particular configuration, where the physical interface is not connected to the bridge, only one side of the traffic is filtered (using the above XML markup), i.e., the return path of the ICMP echo responses in the case of ping (or the SYN-ACK path in the case of TCP connections initiated from the VM, etc.).

Can this bug now be closed?


Regards,
    Stefan

Comment 16 Stefan Berger 2012-04-18 16:51:27 UTC
(In reply to comment #15)


I forgot to mention that the following needs to be done on the host before starting the VM:

echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
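
To make that setting survive a reboot, the usual approach is a sysctl entry, e.g. in /etc/sysctl.conf (assuming the bridge module that provides this sysctl is loaded at boot):

net.bridge.bridge-nf-call-iptables = 1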

Comment 17 Cole Robinson 2016-03-22 22:44:45 UTC
Closing per comment #14

