Bug 1031102
Summary: | Firewalld should use rmmod instead of modprobe -r | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Martin <mholec> |
Component: | firewalld | Assignee: | Thomas Woerner <twoerner> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Tomas Dolezal <todoleza> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.0 | CC: | acathrow, berrange, dallan, dyuan, eblake, gsun, honzhang, jbenc, jpopelka, jprokes, jscotka, jwboyer, laine, mzhan, qe-baseos-daemons, todoleza, tpelka, twoerner |
Target Milestone: | rc | ||
Target Release: | 7.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2014-06-13 09:17:29 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Martin
2013-11-15 15:46:57 UTC
(In reply to Martin Holec from comment #0)
> Description of problem:
> Stopping FirewallD removes libvirt's network interfaces

Well yeah, since libvirt depends on the firewall to set up its interfaces.

>
> Version-Release number of selected component (if applicable):
> firewalld-0.3.8-1.el7.noarch
> libvirt-daemon-1.1.1-12.el7.x86_64
> kernel-3.10.0-50.el7.x86_64
>
> How reproducible:
> always
>
> Steps to Reproduce:
> systemctl stop firewalld.service

Maybe this implies our systemd config file needs to set up a hard dependency, where firewalld cannot be stopped if libvirtd is still running?

(In reply to Eric Blake from comment #2)
> (In reply to Martin Holec from comment #0)
> > Description of problem:
> > Stopping FirewallD removes libvirt's network interfaces
>
> Well yeah, since libvirt depends on the firewall to set up its interfaces.

That doesn't make any sense. Firewalld is only concerned with firewall rules. Stopping it shouldn't cause the virbr1 bridge device to be deleted or offlined.

> >
> > Version-Release number of selected component (if applicable):
> > firewalld-0.3.8-1.el7.noarch
> > libvirt-daemon-1.1.1-12.el7.x86_64
> > kernel-3.10.0-50.el7.x86_64
> >
> > How reproducible:
> > always
> >
> > Steps to Reproduce:
> > systemctl stop firewalld.service
>
> Maybe this implies our systemd config file needs to set up a hard
> dependency, where firewalld cannot be stopped if libvirtd is still running?

I don't think so. Stopping firewalld should not affect running networking. It should merely mean that you can't make further changes to the firewall rules until it is started again.

I agree with Daniel. FirewallD shouldn't interfere with network interfaces created by other daemons.

Eric, if I am prevented from stopping FirewallD, or from running libvirt without FirewallD, I can't tell whether my scenario fails because of a firewall misconfiguration, the libvirt networking setup, or the service daemon configuration. Making FirewallD a hard dependency of *any* service is a very bad idea.

When I disable the firewall, I know what I'm doing and why, and it's only temporary. In principle it's the same as temporarily setting SELinux to "permissive" mode.

This is reproducible without libvirt in the equation, so I'm reassigning to firewalld:

    # brctl addbr xyzzy
    # brctl show
    bridge name     bridge id               STP enabled     interfaces
    xyzzy           8000.000000000000       no
    # systemctl stop firewalld.service
    # brctl show
    bridge name     bridge id               STP enabled     interfaces
    #

And of course the destroyed bridge doesn't return when you restart firewalld.

This is obviously *very bad* for libvirt (and any management based on libvirt), since it means that any time firewalld is restarted, all virtual guests will be disconnected from the network, and the only reasonable method of reconnecting them will be to reboot them all.

BTW, I witnessed the same behavior on Fedora 20 (firewalld-0.3.8-1.fc20.noarch).

Info from /var/log/messages at the request of twoerner:

    Nov 26 18:47:10 vlap NetworkManager[1148]: <info> (xyzzy): ignoring bridge not created by NetworkManager
    Nov 26 18:47:10 vlap NetworkManager[1148]: <info> (xyzzy): ignoring bridge not created by NetworkManager
    Nov 26 18:47:28 vlap systemd: Stopping firewalld - dynamic firewall daemon...
    Nov 26 18:47:29 vlap avahi-daemon[1002]: Withdrawing workstation service for xyzzy.
    Nov 26 18:47:29 vlap kernel: [191096.787315] Ebtables v2.0 unregistered
    Nov 26 18:47:29 vlap systemd: Stopped firewalld - dynamic firewall daemon.
    Nov 26 18:47:31 vlap kernel: [191099.288209] Bridge firewalling registered

This is starting to sound familiar - perhaps NM isn't *really* ignoring the bridge?

Reassigning to kernel:

Why is there an entry for ebtable_broute in the bridge module's "Used by" column? modprobe -r ebtable_broute unloads bridge also. rmmod ebtable_broute is working, though.
    ebtable_nat            12807  0
    ebtable_broute         12731  0
    ebtable_filter         12827  0
    ebtables               30758  3 ebtable_broute,ebtable_nat,ebtable_filter
    lsmod | grep bridge
    bridge                110617  1 ebtable_broute

Steps to reproduce the problem:
1) systemctl stop firewalld.service; systemctl stop libvirtd.service
   -> just to make sure that nothing is using bridges and the firewall
2) brctl addbr xyzzy
3) brctl show
4) lsmod | grep bridge
   -> bridge is not used by ebtable_broute
5) modprobe ebtable_broute
6) lsmod | grep bridge
   -> now bridge is used by ebtable_broute
7) modprobe -r ebtable_broute
   -> removes both ebtable_broute and bridge
8) brctl show
   -> the xyzzy bridge is gone
9) lsmod | grep bridge
   -> the bridge module is gone

This seems to affect all newer kernels, in Fedora as well.

This is a modprobe bug:

    [root@localhost ~]# strace -f modprobe -r ebtable_broute
    ...
    delete_module("bridge", O_RDONLY)       = 0
    ...

The man page says: "If the modules it depends on are also unused, modprobe will try to remove them too."

Josh, any idea why modprobe does this when rmmod doesn't?

From the modprobe logic I see, "unused" means there are no other modules using the module that's being removed, but it doesn't say anything about actual software using the module. modprobe won't remove the module if the kernel's refcnt for that module is non-zero. If there is software using the module then I would expect the refcnt for that module (bridge in this case) to be non-zero.

So what is the refcnt of the bridge module? You can get it directly from the kernel in /sys, so:

    [jwboyer@zod linux]$ lsmod | grep ebtable
    ebtable_nat            12807  0
    ebtable_broute         12731  0
    bridge                110624  1 ebtable_broute
    ebtable_filter         12827  0
    ebtables               30758  3 ebtable_broute,ebtable_nat,ebtable_filter
    [jwboyer@zod linux]$ cat /sys/module/ebtable_broute/refcnt
    0
    [jwboyer@zod linux]$ cat /sys/module/bridge/refcnt
    1
    [jwboyer@zod linux]$

is what I see on my f19 machine. If you remove ebtable_broute with rmmod, and the refcnt goes to 0, modprobe will remove that too. If that's the case, is any software directly using the bridge module?

(In reply to Josh Boyer from comment #15)
> modprobe won't remove the module if the kernel's refcnt for that module is
> non-zero. If there is software using the module then I would expect the
> refcnt for that module (bridge in this case) to be non-zero.

The refcount for the bridge module is only affected by other kernel module deps. Whether you have actual bridge devices in existence doesn't affect it. I.e. I have a bridge device:

    # brctl show
    bridge name     bridge id               STP enabled     interfaces
    virbr0          8000.52540054952a       yes             virbr0-nic
                                                            vnet0
                                                            vnet1

and the refcount is still zero:

    # lsmod | grep bridge
    bridge                110624  0

IMHO this is a firewalld bug - it should not be telling the kernel to recursively remove unused modules.

The problem is that recursive removal of unused modules is the default behaviour of the modprobe -r command. So the workaround is to use rmmod instead, but I don't think the modprobe behaviour is correct (or doing this based on refcnt is not correct).

(In reply to Josh Boyer from comment #15)
> modprobe won't remove the module if the kernel's refcnt for that module is
> non-zero. If there is software using the module then I would expect the
> refcnt for that module (bridge in this case) to be non-zero.

That's not correct, sorry. The refcnt is a counter of how many other modules would blow up if this module was unloaded. Thus, its meaning is solely whether the module can or cannot be removed at the particular time. It has nothing to do with the fact whether a feature provided by the module is used or not.

In other words, when refcnt is zero, the module is able to clean up after itself, releasing all acquired kernel resources. If it's not zero, it's not able to clean up and must not be removed.

As an example, you can unload the driver for a network card. That will destroy the network interfaces that belong to that card. The reference counter says nothing about whether there's traffic going through those interfaces.

> So what is the refcnt of the bridge module? You can get it directly from
> the kernel in /sys, so:

It's 0, as no other module needs it. That does not mean there are no bridges registered and in use.

> [jwboyer@zod linux]$ lsmod | grep ebtable
> ebtable_nat            12807  0
> ebtable_broute         12731  0
> bridge                110624  1 ebtable_broute
> ebtable_filter         12827  0
> ebtables               30758  3 ebtable_broute,ebtable_nat,ebtable_filter
> [jwboyer@zod linux]$ cat /sys/module/ebtable_broute/refcnt
> 0
> [jwboyer@zod linux]$ cat /sys/module/bridge/refcnt
> 1
> [jwboyer@zod linux]$
>
> is what I see on my f19 machine. If you remove ebtable_broute with rmmod,
> and the refcnt goes to 0, modprobe will remove that too. If that's the
> case, is any software directly using the bridge module?

Not sure what you mean by "software" but there surely can be bridges registered and in use.

(In reply to Daniel Berrange from comment #16)
> IMHO this is a firewalld bug - it should not be telling the kernel to
> recursively remove unused modules.

The kernel does not do that, it's done by modprobe. But you're probably right that firewalld should use rmmod instead. The behavior of modprobe -r does not make much sense, but I suppose it has been doing that for ages and cannot be easily changed, as there may be people relying on it.

Interesting what one learns, I always thought rmmod and modprobe -r were more or less equivalent.

(In reply to Václav Pavlín from comment #17)
> The problem is that recursive removal of unused modules is the default
> behaviour of the modprobe -r command. So the workaround is to use rmmod instead,
> but I don't think the modprobe behaviour is correct (or doing this based on
> refcnt is not correct).

Well the modprobe behaviour is correct within the confines of its defined semantics. It is just that those semantics aren't particularly useful!
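A toy Python model (Python because firewalld itself is written in it) makes the two removal semantics argued about above concrete. It assumes, as a simplification, that a module's refcnt counts only the loaded modules that use it; as noted in the thread, real refcnts can also be raised by other kernel code through module_get. The model has no notion of bridge devices at all, which is exactly why removing `bridge` looks safe to it. `Module`, `load_table`, `rmmod`, `modprobe_r` and `EBTABLES` are hypothetical names, and this is a sketch of the documented behaviour, not kmod's implementation.

```python
"""Toy model of rmmod vs. modprobe -r removal semantics (hypothetical, not kmod)."""
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Module:
    name: str
    deps: list[str] = field(default_factory=list)  # modules whose symbols it uses
    refcnt: int = 0  # simplification: counts only loaded modules that depend on it


def load_table(spec: dict[str, list[str]]) -> dict[str, Module]:
    """Build a table of loaded modules and derive refcnts from dependency edges."""
    table = {name: Module(name, list(deps)) for name, deps in spec.items()}
    for mod in table.values():
        for dep in mod.deps:
            table[dep].refcnt += 1
    return table


def rmmod(table: dict[str, Module], name: str) -> list[str]:
    """Remove exactly one module; refuse if its refcnt is non-zero."""
    mod = table[name]
    if mod.refcnt:
        raise RuntimeError(f"Module {name} is in use")
    for dep in mod.deps:
        table[dep].refcnt -= 1
    del table[name]
    return [name]


def modprobe_r(table: dict[str, Module], name: str) -> list[str]:
    """Remove a module, then recursively remove dependencies whose refcnt hit 0."""
    deps = list(table[name].deps)
    removed = rmmod(table, name)
    for dep in deps:
        # A shared dependency may already be gone via an earlier recursion.
        if dep in table and table[dep].refcnt == 0:
            removed += modprobe_r(table, dep)
    return removed


# Module graph from the lsmod output in this bug; dependencies not shown there
# are omitted. Bridge *devices* such as xyzzy are invisible to this model.
EBTABLES = {
    "ebtables": [],
    "bridge": [],
    "ebtable_nat": ["ebtables"],
    "ebtable_filter": ["ebtables"],
    "ebtable_broute": ["ebtables", "bridge"],
}

if __name__ == "__main__":
    print(rmmod(load_table(EBTABLES), "ebtable_broute"))       # ['ebtable_broute']
    print(modprobe_r(load_table(EBTABLES), "ebtable_broute"))  # ['ebtable_broute', 'bridge']
```

Applied to the ebtables graph from this bug, single-module removal leaves `bridge` loaded with a refcnt of 0, while the recursive variant also removes it, matching the `delete_module("bridge", O_RDONLY)` call in the strace output.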
I question why firewalld is trying to remove modules at all rather than just leaving them alone. The modprobe manpage itself says "There is usually no reason to remove modules, but some buggy modules require it. Your distribution kernel may not have been built to support removal of modules at all."

(In reply to Jiri Benc from comment #18)
> (In reply to Josh Boyer from comment #15)
> > modprobe won't remove the module if the kernel's refcnt for that module is
> > non-zero. If there is software using the module then I would expect the
> > refcnt for that module (bridge in this case) to be non-zero.
>
> That's not correct, sorry. The refcnt is a counter of how many other modules
> would blow up if this module was unloaded. Thus, its meaning is solely
> whether the module can or cannot be removed at the particular time. It has
> nothing to do with the fact whether a feature provided by the module is used
> or not.

Perhaps it depends on the module itself. I have e.g. fuse here loaded, with no other modules depending on it and it has a refcnt of 3. So something other than a module has increased that.

> In other words, when refcnt is zero, the module is able to clean up after
> itself, releasing all acquired kernel resources. If it's not zero, it's not
> able to clean up and must not be removed.

We agree on that part at least.

> As an example, you can unload the driver for a network card. That will destroy
> the network interfaces that belong to that card. The reference counter says
> nothing about whether there's traffic going through those interfaces.

Then maybe network drivers operate differently.

(In reply to Václav Pavlín from comment #17)
> The problem is that recursive removal of unused modules is the default
> behaviour of the modprobe -r command. So the workaround is to use rmmod instead,
> but I don't think the modprobe behaviour is correct (or doing this based on
> refcnt is not correct).

This has been the case for modprobe for basically forever. It was already present in module-init-tools long ago. I don't think changing the behavior of -r now is the correct thing to do. It also has nothing other than the refcnt to go on, so I don't think there's a suitable alternative.

(In reply to Daniel Berrange from comment #16)
> IMHO this is a firewalld bug - it should not be telling the kernel to
> recursively remove unused modules.

It could do that by using rmmod instead.

(In reply to Daniel Berrange from comment #20)
> Well the modprobe behaviour is correct within the confines of its defined
> semantics. It is just that those semantics aren't particularly useful!

:-)

> I question why firewalld is trying to remove modules at all rather than just
> leaving them alone.

You have a very good point, generally. However, in this particular case (ebtables), just leaving the modules loaded with empty tables has a small, yet measurable performance impact on bridge throughput. I 100% agree it's a kernel bug and empty tables should be equal to having the module unloaded. Hopefully, it will be fixed when we switch to nftables.

(In reply to Josh Boyer from comment #21)
> (In reply to Jiri Benc from comment #18)
> > That's not correct, sorry. The refcnt is a counter of how many other modules
> > would blow up if this module was unloaded. Thus, its meaning is solely
> > whether the module can or cannot be removed at the particular time. It has
> > nothing to do with the fact whether a feature provided by the module is used
> > or not.
>
> Perhaps it depends on the module itself. I have e.g. fuse here loaded, with
> no other modules depending on it and it has a refcnt of 3. So something
> other than a module has increased that.

I probably oversimplified. The point is, the module refcnt increases when some kernel code calls module_get and decreases when module_put is called, and that's basically everything that can be said wrt. refcnt semantics. In kernel, in general, references are not taken unless really needed. If something can be done safely without increasing the reference count, it's done that way.

As a special case (one that probably led to the implementation of the dubious modprobe -r feature), when one module uses symbols exported from another module, module_get is called by the kernel module loader.

(In reply to Jiri Benc from comment #23)
> (In reply to Josh Boyer from comment #21)
> > (In reply to Jiri Benc from comment #18)
> > > That's not correct, sorry. The refcnt is a counter of how many other modules
> > > would blow up if this module was unloaded. Thus, its meaning is solely
> > > whether the module can or cannot be removed at the particular time. It has
> > > nothing to do with the fact whether a feature provided by the module is used
> > > or not.
> >
> > Perhaps it depends on the module itself. I have e.g. fuse here loaded, with
> > no other modules depending on it and it has a refcnt of 3. So something
> > other than a module has increased that.
>
> I probably oversimplified. The point is, the module refcnt increases when
> some kernel code calls module_get and decreases when module_put is called,
> and that's basically everything that can be said wrt. refcnt semantics. In
> kernel, in general, references are not taken unless really needed. If
> something can be done safely without increasing the reference count, it's
> done that way.

OK, I knew that and I agree. I suppose I had forgotten that the kernel doesn't protect userspace from shooting itself in the foot :)

Ok, so the behavior of kmod is correct (although 'those semantics aren't particularly useful') and firewalld should use rmmod rather than modprobe -r (or shouldn't unload modules at all). Moving back to firewalld.

Thomas, please check https://git.fedorahosted.org/cgit/firewalld.git/commit/?id=3ba68446036f9fd850fc4be68a1bd938deb17b2e

    + # Unloading of modul with rmmod sometimes fails for no obvious reason
    + # (even if no other modul uses it). It was ok with modprobe -r.
That's strange and should not happen. Do you have more details? Anything in dmesg?

Nothing in dmesg or journal. rmmod only tells me, for example:

    rmmod: ERROR: Module nf_conntrack_ipv6 is in use

Does it help if you wait a short while between clearing the tables and removing the modules?

No, it's not a matter of time, it's a dependency thing. We sort the modules before unloading, to unload leaves first.

    # lsmod | grep nf_conntrack_ipv6
    nf_conntrack_ipv6      18738  1
    nf_defrag_ipv6         34595  1 nf_conntrack_ipv6
    nf_conntrack           86430  6 nf_nat,nf_nat_ipv4,nf_nat_ipv6,iptable_nat,nf_conntrack_ipv4,nf_conntrack_ipv6

shows that nf_conntrack_ipv6 is used by some other module but doesn't state by what. I've discovered that it's nf_nat_ipv6 - problem is that it's stated as a dependency of nf_conntrack and not nf_conntrack_ipv6. Previously 'modprobe -r' recursively removed nf_nat_ipv6 before trying to remove nf_conntrack_ipv6 so we had not seen this problem.

Just how much is the "small but measurable" performance impact of leaving ebtables loaded anyway? Has that been re-measured in recent times?

(In reply to Jiri Popelka from comment #30)
> shows that nf_conntrack_ipv6 is used by some other module but doesn't state
> by what. I've discovered that it's nf_nat_ipv6 - problem is that it's stated
> as a dependency of nf_conntrack and not nf_conntrack_ipv6. Previously
> 'modprobe -r' recursively removed nf_nat_ipv6 before trying to remove
> nf_conntrack_ipv6 so we had not seen this problem.

I see. Two passes indeed solve that, then. Thanks for the explanation!

*** Bug 1048928 has been marked as a duplicate of this bug. ***

This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
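The approach the last comments converge on can be sketched in Python: remove one module at a time (rmmod semantics) in leaves-first order, and retry failures in a second pass, because lsmod's "Used by" column can miss a holder, as it did for nf_nat_ipv6 holding nf_conntrack_ipv6. This is a minimal sketch under those assumptions, not the code from the firewalld commit linked above; `leaves_first`, `unload`, `fake_remover` and the trimmed `LSMOD_HOLDERS` table are hypothetical, and the hidden nf_nat_ipv6 reference is modeled by hand.

```python
"""Sketch: leaves-first, one-module-at-a-time unloading with retry passes."""
from __future__ import annotations

import subprocess
from typing import Callable


def leaves_first(holders: dict[str, set[str]]) -> list[str]:
    """Order modules so every module comes before the modules it uses.

    `holders` maps a module to the modules using it (lsmod's "Used by" column).
    Holders that are not keys of the mapping are ignored.
    """
    remaining = {m: set(h).intersection(holders) for m, h in holders.items()}
    order: list[str] = []
    while remaining:
        leaves = sorted(m for m, h in remaining.items() if not h)
        if not leaves:  # dependency cycle: append the rest unordered
            order += sorted(remaining)
            break
        for m in leaves:
            del remaining[m]
        for h in remaining.values():
            h.difference_update(leaves)
        order += leaves
    return order


def unload(modules: list[str], remove: Callable[[str], bool],
           max_passes: int = 2) -> list[str]:
    """Call `remove` on each module, retrying failures in later passes.

    A failure may be caused by a holder lsmod did not name that a later module
    in the same pass removes. Returns the modules that are still loaded.
    """
    pending = list(modules)
    for _ in range(max_passes):
        failed = [m for m in pending if not remove(m)]
        if not failed or len(failed) == len(pending):  # done, or no progress
            return failed
        pending = failed
    return pending


def rmmod(name: str) -> bool:
    """Remove exactly one module via rmmod (needs root); deps are left alone."""
    return subprocess.run(["rmmod", name], capture_output=True).returncode == 0


def fake_remover(true_holders: dict[str, set[str]],
                 loaded: set[str]) -> Callable[[str], bool]:
    """Simulated rmmod: fails while any true holder of the module is loaded."""
    def remove(name: str) -> bool:
        if true_holders.get(name, set()) & loaded:
            return False
        loaded.discard(name)
        return True
    return remove


# Trimmed holders as lsmod reports them in this bug: nf_conntrack_ipv6 shows a
# use count of 1 but names no holder (comment #30 found it is nf_nat_ipv6).
LSMOD_HOLDERS = {
    "nf_conntrack": {"nf_nat_ipv6", "nf_conntrack_ipv6"},
    "nf_defrag_ipv6": {"nf_conntrack_ipv6"},
    "nf_conntrack_ipv6": set(),
    "nf_nat_ipv6": set(),
}

if __name__ == "__main__":
    # Ground truth adds the holder that lsmod did not name.
    true_holders = {**LSMOD_HOLDERS, "nf_conntrack_ipv6": {"nf_nat_ipv6"}}
    order = leaves_first(LSMOD_HOLDERS)
    for passes in (1, 2):
        left = unload(order, fake_remover(true_holders, set(LSMOD_HOLDERS)), passes)
        print(passes, left)
```

With the hidden reference in place, one pass leaves nf_conntrack_ipv6, nf_conntrack and nf_defrag_ipv6 loaded; a second pass removes all of them. Passing `rmmod` as the `remove` callback would run the real command instead (root required); that path is not exercised here. Stopping as soon as a pass makes no progress keeps the loop from spinning on a module that is genuinely in use.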