Bug 1396032
| Summary: | libvirt fails applying an ebtables rule so a bridge goes down and so the VMs running on it | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Simone Tiraboschi <stirabos> |
| Component: | ebtables | Assignee: | Phil Sutter <psutter> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | qe-baseos-daemons |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 7.3 | CC: | aloughla, dyuan, egarver, jscotka, laine, rbalakri, stirabos, xuzhang, yalzhang |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-12-12 11:41:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
I'm fairly certain this isn't a libvirt issue - all that libvirt does is call ebtables with a rule that happens to be for the ebtables "nat" table. It's up to [something else. ebtables userland? ebtables kernel code? Some system config?] to autoload the ebtable_nat module when it is needed. I'm changing the component to ebtables for further triage.
Hi!
So I started looking into this issue - or rather issues, as in my opinion the EEXIST case is unrelated to the ENOTSUP one.
EEXIST first: the relevant part of the logs seems to be this snippet:
| 2016-11-16 08:29:00.910+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -F J-vnet0-mac'
| 2016-11-16 08:29:00.913+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -X J-vnet0-mac'
| 2016-11-16 08:29:00.916+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -N J-vnet0-mac'
| 2016-11-16 08:29:00.919+0000: 20933: error : virFirewallApplyRuleDirect:732 : internal error: Failed to apply firewall rules /usr/sbin/ebtables --concurrent -t nat -N J-vnet0-mac: Chain J-vnet0-mac already exists.
This raises a question: How is it possible that J-vnet0-mac already exists if
it was just removed (by the previous call with '-X')? I can imagine only three
options:
A) Previous chain removal has failed (maybe because the chain is still
referenced somewhere). Since the same code is used in both calls, that
error should be logged as well, though. So this is very unlikely, but could you
please check that a failure to remove a chain is logged in the respective code?
B) Both calls ('-X' and '-N') run in parallel, so this is a race condition.
Three milliseconds between both calls is really quick, but not impossible.
Could someone with insight into virFirewallApplyRule code base please check
that these ebtables calls are not backgrounded or otherwise run in
parallel?
C) There is another instance running in parallel which adds the rule. I could
imagine if these rules are saved and 'systemctl restart ebtables' runs in
parallel, we see strange errors like this. Can you confirm ebtables service
is not in use and libvirt maintains ebtables state manually?
Now for ENOTSUP: The attached dmesg log[1] in bz#1386293 drew my attention:
| [...]
| Oct 18 14:50:39 mac5254002a783e systemd: Starting Flexible Branding Service...
| Oct 18 14:50:39 mac5254002a783e systemd: Stopping IPv4 firewall with iptables...
| Oct 18 14:50:39 mac5254002a783e iptables.init: iptables: Setting chains to policy ACCEPT: filter [ OK ]
| Oct 18 14:50:39 mac5254002a783e iptables.init: iptables: Flushing firewall rules: [ OK ]
| Oct 18 14:50:39 mac5254002a783e journal: libvirt version: 2.0.0, package: 4.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2016-08-02-09:15:12, x86-034.build.eng.bos.redhat.com)
| Oct 18 14:50:39 mac5254002a783e journal: hostname: mac5254002a783e.example.com
| Oct 18 14:50:39 mac5254002a783e journal: internal error: Failed to apply firewall rules /usr/sbin/ebtables --concurrent -t nat -N libvirt-J-vnet0: The kernel doesn't support the ebtables 'nat' table.
| Oct 18 14:50:39 mac5254002a783e kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
| Oct 18 14:50:39 mac5254002a783e iptables.init: iptables: Unloading modules: [ OK ]
| Oct 18 14:50:39 mac5254002a783e systemd: Stopped IPv4 firewall with iptables.
| [...]
From looking at the (full) log, I can't tell which instance is issuing the
failing ebtables call here. I do see that 'iptables stop' runs in parallel
and is unloading modules, which is potentially fishy. However, I wasn't able
to provoke iptables.init into unloading any ebtables-required kernel modules,
so this might be a red herring, sadly.
The reason for picking on iptables.init specifically is that there
was a similar issue between iptables.init and ip6tables.init which eventually
required to force systemd to serialize restarting both services. I see that
ebtables service has a similar module unloading logic, can you please confirm
ebtables service is not involved here anywhere?
Thanks, Phil
[1] https://bugzilla.redhat.com/attachment.cgi?id=1211766
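One way to test the module-unloading theory on an affected host is to record which bridge-filtering modules are loaded right before and after the failing call. The sketch below is only an illustration under stated assumptions: the module names `ebtables` and `ebtable_nat` are assumed to be the ones backing the 'nat' table, the helper names are hypothetical, and modules built into the kernel would not show up in /proc/modules at all.

```python
def loaded_modules(proc_modules_text):
    """Parse the text of /proc/modules into a set of module names.

    Each non-empty line starts with the module name, followed by size,
    use count, dependents, state and load address.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def missing_modules(proc_modules_text, required=("ebtables", "ebtable_nat")):
    """Return the required modules (assumed names) that are not currently loaded."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]


if __name__ == "__main__":
    # On a live system, read the real module list.
    with open("/proc/modules") as fh:
        print("missing:", missing_modules(fh.read()))
```

Running this in a loop while 'iptables stop' or an ebtables service restart executes would show whether the nat table module disappears at the moment libvirt issues its call.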
> required to force systemd to serialize restarting both services. I see that
> ebtables service has a similar module unloading logic, can you please confirm
> ebtables service is not involved here anywhere?
AFAIK ebtables is involved as well.
Hi Simone,
Since I didn't get a reply for two months, I assume the problem was either solved on your side or is not relevant anymore. I'm therefore closing this ticket, feel free to reopen in case my assumption is wrong and you are able to provide me with instructions on how to reproduce the issue.
Thanks, Phil
Description of problem:
libvirt fails to apply an ebtables rule generated by the vdsm-no-mac-spoofing hook (maybe there is some issue loading the ebtables module), so it ends up rolling back the firewall configuration, which brings down a bridge and the VMs attached to that bridge.

2016-11-16 08:29:00.884+0000: 20933: info : virFirewallApplyGroup:895 : Starting transaction for firewall=0x7f2a6c001820 group=0x7f2a6c004050 flags=1
2016-11-16 08:29:00.884+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -D PREROUTING -i vnet0 -j libvirt-J-vnet0'
2016-11-16 08:29:00.888+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -D POSTROUTING -o vnet0 -j libvirt-P-vnet0'
2016-11-16 08:29:00.891+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -L libvirt-J-vnet0'
2016-11-16 08:29:00.893+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -L libvirt-P-vnet0'
2016-11-16 08:29:00.896+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -F libvirt-J-vnet0'
2016-11-16 08:29:00.899+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -X libvirt-J-vnet0'
2016-11-16 08:29:00.900+0000: 20840: info : virEventPollDispatchHandles:506 : EVENT_POLL_DISPATCH_HANDLE: watch=9 events=1
2016-11-16 08:29:00.900+0000: 20840: info : virEventPollRunOnce:640 : EVENT_POLL_RUN: nhandles=12 timeout=-1
2016-11-16 08:29:00.901+0000: 20840: info : virEventPollDispatchHandles:506 : EVENT_POLL_DISPATCH_HANDLE: watch=12 events=1
2016-11-16 08:29:00.901+0000: 20840: info : virEventPollRunOnce:640 : EVENT_POLL_RUN: nhandles=12 timeout=-1
2016-11-16 08:29:00.902+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -F libvirt-P-vnet0'
2016-11-16 08:29:00.905+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -X libvirt-P-vnet0'
2016-11-16 08:29:00.907+0000: 20933: info : virFirewallApplyGroup:895 : Starting transaction for firewall=0x7f2a6c001820 group=0x7f2a6c006a10 flags=0
2016-11-16 08:29:00.907+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -N libvirt-J-vnet0'
2016-11-16 08:29:00.910+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -F J-vnet0-mac'
2016-11-16 08:29:00.913+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -X J-vnet0-mac'
2016-11-16 08:29:00.916+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -N J-vnet0-mac'
2016-11-16 08:29:00.919+0000: 20933: error : virFirewallApplyRuleDirect:732 : internal error: Failed to apply firewall rules /usr/sbin/ebtables --concurrent -t nat -N J-vnet0-mac: Chain J-vnet0-mac already exists.
2016-11-16 08:29:00.919+0000: 20933: info : virFirewallRollbackGroup:915 : Starting rollback for group 0x7f2a6c006a10
2016-11-16 08:29:00.919+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -D PREROUTING -i vnet0 -j libvirt-J-vnet0'
2016-11-16 08:29:00.921+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -D POSTROUTING -o vnet0 -j libvirt-P-vnet0'
2016-11-16 08:29:00.924+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -L libvirt-J-vnet0'
2016-11-16 08:29:00.927+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -L libvirt-P-vnet0'
2016-11-16 08:29:00.930+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -F libvirt-J-vnet0'
2016-11-16 08:29:00.933+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -X libvirt-J-vnet0'
2016-11-16 08:29:00.936+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -F libvirt-P-vnet0'
2016-11-16 08:29:00.938+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -X libvirt-P-vnet0'

libvirt fails to apply an ebtables rule and so starts to roll back. vnet0 goes down, and with it ovirtmgmt and the engine VM.

Nov 16 10:29:00 puma23 systemd: Starting Virtual Desktop Server Manager network restoration...
Nov 16 10:29:00 puma23 dnsmasq[4773]: read /etc/hosts - 2 addresses
Nov 16 10:29:00 puma23 dnsmasq[4773]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Nov 16 10:29:00 puma23 dnsmasq-dhcp[4773]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Nov 16 10:29:00 puma23 kernel: type=1400 audit(1479284940.858:26): avc: denied { search } for pid=19894 comm="systemd-machine" name="19893" dev="proc" ino=155551324 scontext=system_u:system_r:systemd_machined_t:s0 tcontext=system_u:system_r:svirt_t:s0:c431,c860 tclass=dir
Nov 16 10:29:01 puma23 kernel: ovirtmgmt: port 2(vnet0) entered disabled state
Nov 16 10:29:01 puma23 kernel: device vnet0 left promiscuous mode
Nov 16 10:29:01 puma23 kernel: ovirtmgmt: port 2(vnet0) entered disabled state
Nov 16 10:29:01 puma23 kvm: 0 guests now active
Nov 16 10:29:01 puma23 systemd-machined: Machine qemu-1-HostedEngine terminated.
Nov 16 10:29:01 puma23 systemd: Started Virtual Desktop Server Manager network restoration.

All the relevant logs can be found on bug: rhbz#1386293

Version-Release number of selected component (if applicable):
# rpm -qa | grep vdsm
vdsm-hook-ethtool-options-4.18.15.3-1.el7ev.noarch
vdsm-xmlrpc-4.18.15.3-1.el7ev.noarch
vdsm-api-4.18.15.3-1.el7ev.noarch
vdsm-yajsonrpc-4.18.15.3-1.el7ev.noarch
vdsm-infra-4.18.15.3-1.el7ev.noarch
vdsm-cli-4.18.15.3-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.18.15.3-1.el7ev.noarch
vdsm-4.18.15.3-1.el7ev.x86_64
vdsm-python-4.18.15.3-1.el7ev.noarch
vdsm-jsonrpc-4.18.15.3-1.el7ev.noarch
# rpm -qa | grep ebtables
ebtables-2.0.10-15.el7.x86_64
# rpm -qa | grep libvirt
libvirt-daemon-config-nwfilter-2.0.0-10.el7.x86_64
libvirt-daemon-2.0.0-10.el7.x86_64
libvirt-daemon-driver-secret-2.0.0-10.el7.x86_64
libvirt-client-2.0.0-10.el7.x86_64
libvirt-daemon-driver-storage-2.0.0-10.el7.x86_64
libvirt-daemon-driver-lxc-2.0.0-10.el7.x86_64
libvirt-2.0.0-10.el7.x86_64
libvirt-python-2.0.0-2.el7.x86_64
libvirt-lock-sanlock-2.0.0-10.el7.x86_64
libvirt-daemon-driver-nwfilter-2.0.0-10.el7.x86_64
libvirt-daemon-config-network-2.0.0-10.el7.x86_64
libvirt-daemon-driver-nodedev-2.0.0-10.el7.x86_64
libvirt-daemon-kvm-2.0.0-10.el7.x86_64
libvirt-daemon-driver-network-2.0.0-10.el7.x86_64
libvirt-daemon-driver-interface-2.0.0-10.el7.x86_64
libvirt-daemon-driver-qemu-2.0.0-10.el7.x86_64
# rpm -qa | grep hosted
ovirt-hosted-engine-ha-2.0.4-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.3-2.el7ev.noarch

It seems that this wasn't reproducible (to be verified) after downgrading to ebtables-2.0.10-13.el7.x86_64.

How reproducible:
Not really systematic.

Steps to Reproduce:
We saw it when trying to deploy ovirt-hosted-engine using vdsm-no-mac-spoofing.
Still to be determined how to isolate it.

Actual results:
We saw two kinds of error messages on different reproductions:
Oct 18 14:50:39 mac5254002a783e journal: internal error: Failed to apply firewall rules /usr/sbin/ebtables --concurrent -t nat -N libvirt-J-vnet0: The kernel doesn't support the ebtables 'nat' table.
2016-11-16 08:29:00.919+0000: 20933: error : virFirewallApplyRuleDirect:732 : internal error: Failed to apply firewall rules /usr/sbin/ebtables --concurrent -t nat -N J-vnet0-mac: Chain J-vnet0-mac already exists.
but in the end the result is the same: the management bridge went down and so did the VM that was using it.

Expected results:
success

Additional info:
vdsm-no-mac-spoofing wasn't really needed by hosted-engine so, as a workaround, we removed it with patch https://gerrit.ovirt.org/#/c/66853/
All the relevant logs can be found on bug: rhbz#1386293