Bug 1396032 - libvirt fails applying an ebtables rule so a bridge goes down and so the VMs running on it
Summary: libvirt fails applying an ebtables rule so a bridge goes down and so the VMs ...
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: ebtables
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Phil Sutter
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-11-17 10:08 UTC by Simone Tiraboschi
Modified: 2020-05-28 09:40 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-12 11:41:09 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System            ID       Private  Priority  Status  Summary                                                              Last Updated
Red Hat Bugzilla  1386293  0        high      CLOSED  Host stuck in installing state during ovirt-hosted-engine-setup run  2021-02-22 00:41:40 UTC

Internal Links: 1386293

Description Simone Tiraboschi 2016-11-17 10:08:56 UTC
Description of problem:
libvirt fails to apply an ebtables rule (generated by the vdsm-no-mac-spoofing hook), possibly because of a problem loading the ebtables module. It then rolls back the firewall configuration, which brings down a bridge and the VMs attached to that bridge.

2016-11-16 08:29:00.884+0000: 20933: info : virFirewallApplyGroup:895 : Starting transaction for firewall=0x7f2a6c001820 group=0x7f2a6c004050 flags=1
2016-11-16 08:29:00.884+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -D PREROUTING -i vnet0 -j libvirt-J-vnet0'
2016-11-16 08:29:00.888+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -D POSTROUTING -o vnet0 -j libvirt-P-vnet0'
2016-11-16 08:29:00.891+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -L libvirt-J-vnet0'
2016-11-16 08:29:00.893+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -L libvirt-P-vnet0'
2016-11-16 08:29:00.896+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -F libvirt-J-vnet0'
2016-11-16 08:29:00.899+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -X libvirt-J-vnet0'
2016-11-16 08:29:00.900+0000: 20840: info : virEventPollDispatchHandles:506 : EVENT_POLL_DISPATCH_HANDLE: watch=9 events=1
2016-11-16 08:29:00.900+0000: 20840: info : virEventPollRunOnce:640 : EVENT_POLL_RUN: nhandles=12 timeout=-1
2016-11-16 08:29:00.901+0000: 20840: info : virEventPollDispatchHandles:506 : EVENT_POLL_DISPATCH_HANDLE: watch=12 events=1
2016-11-16 08:29:00.901+0000: 20840: info : virEventPollRunOnce:640 : EVENT_POLL_RUN: nhandles=12 timeout=-1
2016-11-16 08:29:00.902+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -F libvirt-P-vnet0'
2016-11-16 08:29:00.905+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -X libvirt-P-vnet0'
2016-11-16 08:29:00.907+0000: 20933: info : virFirewallApplyGroup:895 : Starting transaction for firewall=0x7f2a6c001820 group=0x7f2a6c006a10 flags=0
2016-11-16 08:29:00.907+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -N libvirt-J-vnet0'
2016-11-16 08:29:00.910+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -F J-vnet0-mac'
2016-11-16 08:29:00.913+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -X J-vnet0-mac'
2016-11-16 08:29:00.916+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -N J-vnet0-mac'
2016-11-16 08:29:00.919+0000: 20933: error : virFirewallApplyRuleDirect:732 : internal error: Failed to apply firewall rules /usr/sbin/ebtables --concurrent -t nat -N J-vnet0-mac: Chain J-vnet0-mac already exists.

2016-11-16 08:29:00.919+0000: 20933: info : virFirewallRollbackGroup:915 : Starting rollback for group 0x7f2a6c006a10
2016-11-16 08:29:00.919+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -D PREROUTING -i vnet0 -j libvirt-J-vnet0'
2016-11-16 08:29:00.921+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -D POSTROUTING -o vnet0 -j libvirt-P-vnet0'
2016-11-16 08:29:00.924+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -L libvirt-J-vnet0'
2016-11-16 08:29:00.927+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -L libvirt-P-vnet0'
2016-11-16 08:29:00.930+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -F libvirt-J-vnet0'
2016-11-16 08:29:00.933+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -X libvirt-J-vnet0'
2016-11-16 08:29:00.936+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -F libvirt-P-vnet0'
2016-11-16 08:29:00.938+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -X libvirt-P-vnet0'

libvirt fails to apply an ebtables rule and so starts a rollback.
vnet0 goes down, and with it ovirtmgmt and the engine VM.
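The log above is easier to follow once it is split into libvirt's firewall phases. A minimal Python sketch, assuming only the line format visible in this log (date and time, thread ID, level, function:line, message); the `Phase` type and `parse_firewall_log` helper are hypothetical and not part of libvirt. It files each "Applying rule" line under whichever transaction or rollback was open at the time and records the first error seen in that phase.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# One libvirt log line: "<date> <time>: <tid>: <level> : <func>:<line> : <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+): \d+: (?P<level>\w+) : (?P<func>\w+):\d+ : (?P<msg>.*)$"
)
# Extracts the ebtables command line from "Applying rule '...'"
RULE_RE = re.compile(r"Applying rule '(?P<rule>[^']*)'")
# Matches both "group=0x..." (transaction) and "group 0x..." (rollback)
GROUP_RE = re.compile(r"group[= ](?P<group>0x[0-9a-f]+)")


@dataclass
class Phase:
    kind: str                     # "transaction" or "rollback"
    group: str                    # group pointer as printed by libvirt
    rules: List[str] = field(default_factory=list)
    error: Optional[str] = None   # first error logged while this phase was open


def _group_of(msg: str) -> str:
    g = GROUP_RE.search(msg)
    return g.group("group") if g else "?"


def parse_firewall_log(lines) -> List[Phase]:
    """Split libvirt firewall log lines into transaction/rollback phases."""
    phases: List[Phase] = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        func, msg = m.group("func"), m.group("msg")
        if func == "virFirewallApplyGroup":
            phases.append(Phase("transaction", _group_of(msg)))
        elif func == "virFirewallRollbackGroup":
            phases.append(Phase("rollback", _group_of(msg)))
        elif func == "virFirewallApplyRule" and phases:
            r = RULE_RE.search(msg)
            if r:
                phases[-1].rules.append(r.group("rule"))
        elif m.group("level") == "error" and phases and phases[-1].error is None:
            phases[-1].error = msg
    return phases


# Abridged excerpt of the log quoted above (some rule and event-poll lines dropped).
SAMPLE = """\
2016-11-16 08:29:00.907+0000: 20933: info : virFirewallApplyGroup:895 : Starting transaction for firewall=0x7f2a6c001820 group=0x7f2a6c006a10 flags=0
2016-11-16 08:29:00.907+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -N libvirt-J-vnet0'
2016-11-16 08:29:00.913+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -X J-vnet0-mac'
2016-11-16 08:29:00.916+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -N J-vnet0-mac'
2016-11-16 08:29:00.919+0000: 20933: error : virFirewallApplyRuleDirect:732 : internal error: Failed to apply firewall rules /usr/sbin/ebtables --concurrent -t nat -N J-vnet0-mac: Chain J-vnet0-mac already exists.
2016-11-16 08:29:00.919+0000: 20933: info : virFirewallRollbackGroup:915 : Starting rollback for group 0x7f2a6c006a10
2016-11-16 08:29:00.919+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -D PREROUTING -i vnet0 -j libvirt-J-vnet0'
"""

if __name__ == "__main__":
    for p in parse_firewall_log(SAMPLE.splitlines()):
        print(p.kind, p.group, len(p.rules), "rules | error:", p.error)
```

On this excerpt it shows group 0x7f2a6c006a10 failing on `-N J-vnet0-mac`, followed by a rollback of the same group whose first step deletes the PREROUTING jump into libvirt-J-vnet0.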

Nov 16 10:29:00 puma23 systemd: Starting Virtual Desktop Server Manager network restoration...
Nov 16 10:29:00 puma23 dnsmasq[4773]: read /etc/hosts - 2 addresses
Nov 16 10:29:00 puma23 dnsmasq[4773]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Nov 16 10:29:00 puma23 dnsmasq-dhcp[4773]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Nov 16 10:29:00 puma23 kernel: type=1400 audit(1479284940.858:26): avc:  denied  { search } for  pid=19894 comm="systemd-machine" name="19893" dev="proc" ino=155551324 scontext=system_u:system_r:systemd_machined_t:s0 tcontext=system_u:system_r:svirt_t:s0:c431,c860 tclass=dir
Nov 16 10:29:01 puma23 kernel: ovirtmgmt: port 2(vnet0) entered disabled state
Nov 16 10:29:01 puma23 kernel: device vnet0 left promiscuous mode
Nov 16 10:29:01 puma23 kernel: ovirtmgmt: port 2(vnet0) entered disabled state
Nov 16 10:29:01 puma23 kvm: 0 guests now active
Nov 16 10:29:01 puma23 systemd-machined: Machine qemu-1-HostedEngine terminated.
Nov 16 10:29:01 puma23 systemd: Started Virtual Desktop Server Manager network restoration.

All the relevant logs can be found in bug rhbz#1386293.


Version-Release number of selected component (if applicable):
# rpm -qa | grep vdsm
vdsm-hook-ethtool-options-4.18.15.3-1.el7ev.noarch
vdsm-xmlrpc-4.18.15.3-1.el7ev.noarch
vdsm-api-4.18.15.3-1.el7ev.noarch
vdsm-yajsonrpc-4.18.15.3-1.el7ev.noarch
vdsm-infra-4.18.15.3-1.el7ev.noarch
vdsm-cli-4.18.15.3-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.18.15.3-1.el7ev.noarch
vdsm-4.18.15.3-1.el7ev.x86_64
vdsm-python-4.18.15.3-1.el7ev.noarch
vdsm-jsonrpc-4.18.15.3-1.el7ev.noarch

# rpm -qa | grep ebtables
ebtables-2.0.10-15.el7.x86_64

# rpm -qa | grep libvirt
libvirt-daemon-config-nwfilter-2.0.0-10.el7.x86_64
libvirt-daemon-2.0.0-10.el7.x86_64
libvirt-daemon-driver-secret-2.0.0-10.el7.x86_64
libvirt-client-2.0.0-10.el7.x86_64
libvirt-daemon-driver-storage-2.0.0-10.el7.x86_64
libvirt-daemon-driver-lxc-2.0.0-10.el7.x86_64
libvirt-2.0.0-10.el7.x86_64
libvirt-python-2.0.0-2.el7.x86_64
libvirt-lock-sanlock-2.0.0-10.el7.x86_64
libvirt-daemon-driver-nwfilter-2.0.0-10.el7.x86_64
libvirt-daemon-config-network-2.0.0-10.el7.x86_64
libvirt-daemon-driver-nodedev-2.0.0-10.el7.x86_64
libvirt-daemon-kvm-2.0.0-10.el7.x86_64
libvirt-daemon-driver-network-2.0.0-10.el7.x86_64
libvirt-daemon-driver-interface-2.0.0-10.el7.x86_64
libvirt-daemon-driver-qemu-2.0.0-10.el7.x86_64

# rpm -qa | grep hosted
ovirt-hosted-engine-ha-2.0.4-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.3-2.el7ev.noarch

It seems that this was not reproducible (still to be verified) after downgrading to ebtables-2.0.10-13.el7.x86_64.

How reproducible:
Not consistently; it happens intermittently.

Steps to Reproduce:
We saw this while trying to deploy ovirt-hosted-engine with the vdsm-no-mac-spoofing hook.
How to isolate it is still to be determined.


Actual results:
We saw two kinds of error messages in different reproductions:

Oct 18 14:50:39 mac5254002a783e journal: internal error: Failed to apply firewall rules /usr/sbin/ebtables --concurrent -t nat -N libvirt-J-vnet0: The kernel doesn't support the ebtables 'nat' table.

2016-11-16 08:29:00.919+0000: 20933: error : virFirewallApplyRuleDirect:732 : internal error: Failed to apply firewall rules /usr/sbin/ebtables --concurrent -t nat -N J-vnet0-mac: Chain J-vnet0-mac already exists.

but in the end the result is the same:
the management bridge went down, and with it the VM that was using it.


Expected results:
The ebtables rules are applied successfully, and the management bridge and the VMs attached to it stay up.

Additional info:
vdsm-no-mac-spoofing wasn't really needed by hosted-engine, so as a workaround we removed it with patch https://gerrit.ovirt.org/#/c/66853/

Comment 2 Laine Stump 2016-11-17 15:52:40 UTC
I'm fairly certain this isn't a libvirt issue: all that libvirt does is call ebtables with a rule that happens to be for the ebtables "nat" table. It's up to [something else. ebtables userland? ebtables kernel code? Some system config?] to autoload the ebtables_nat module when it is needed.

I'm changing the component to ebtables for further triage.
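Whether the nat table's module was actually loaded can be checked on an affected host. A minimal Python sketch, assuming the usual upstream module names: the core `ebtables` module plus `ebtable_nat` (no "s" after "ebtable"), which provides the "nat" table. The helper names are hypothetical. It only reads `/proc/modules`, whose first field on each line is the module name. Preloading the module with `modprobe ebtable_nat` before deployment would be one way to take autoloading out of the picture.

```python
from typing import Set


def loaded_modules(proc_modules_text: str) -> Set[str]:
    """Return module names from text in /proc/modules format (name is the first field)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def ebtables_nat_loaded(proc_modules_text: str) -> bool:
    # Assumption: the 'nat' table lives in ebtable_nat, which sits on top of ebtables.
    return {"ebtables", "ebtable_nat"} <= loaded_modules(proc_modules_text)


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc not mounted
    print("ebtable_nat loaded:", ebtables_nat_loaded(text))
```

The sample lines used to exercise it are made up; only the column layout follows `/proc/modules`.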

Comment 3 Phil Sutter 2017-10-05 16:06:04 UTC
Hi!

So I started looking into this issue, or rather issues, since in my opinion the EEXIST case is unrelated to the ENOTSUP one.

EEXIST first: the relevant part of the logs seems to be this snippet:

| 2016-11-16 08:29:00.910+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -F J-vnet0-mac'
| 2016-11-16 08:29:00.913+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -X J-vnet0-mac'
| 2016-11-16 08:29:00.916+0000: 20933: info : virFirewallApplyRule:838 : Applying rule '/usr/sbin/ebtables --concurrent -t nat -N J-vnet0-mac'
| 2016-11-16 08:29:00.919+0000: 20933: error : virFirewallApplyRuleDirect:732 : internal error: Failed to apply firewall rules /usr/sbin/ebtables --concurrent -t nat -N J-vnet0-mac: Chain J-vnet0-mac already exists.

This raises a question: How is it possible that J-vnet0-mac already exists if
it was just removed (by the previous call with '-X')? I can imagine only three
options:

A) The previous chain removal failed (maybe because the chain is still
   referenced somewhere). Since the same code is used for both calls, that
   error should have been logged as well, so this is very unlikely. Still,
   could you please check that a failure to remove a chain is logged by the
   respective code?

B) Both calls ('-X' and '-N') run in parallel, so this is a race condition.
   Three milliseconds between both calls is really quick, but not impossible.
   Could someone with insight into the virFirewallApplyRule code base please
   check that these ebtables calls are not backgrounded or otherwise run in
   parallel?

C) There is another instance running in parallel which creates the chain. I
   could imagine that if these rules are saved and 'systemctl restart
   ebtables' runs in parallel, we would see strange errors like this. Can you
   confirm that the ebtables service is not in use and that libvirt maintains
   ebtables state manually?
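Options B and C both come down to another actor touching the nat table between the `-X` and the `-N`. A toy Python model of that interleaving, assuming `--concurrent` only serializes individual ebtables invocations and not a sequence of them. `ToyNatTable`, `ChainExists` and `recreate_chain` are hypothetical stand-ins, not ebtables or libvirt code, and deleting a missing chain is silently ignored here for simplicity.

```python
import threading


class ChainExists(Exception):
    """Raised, like 'ebtables -N', when the chain is already present."""


class ToyNatTable:
    """Hypothetical in-memory stand-in for the user-defined chains of the 'nat' table."""

    def __init__(self):
        self._chains = set()
        # Each single call is atomic, roughly what a per-invocation lock gives;
        # nothing here makes a *sequence* of calls atomic.
        self._lock = threading.Lock()

    def new_chain(self, name):  # models: ebtables -t nat -N <name>
        with self._lock:
            if name in self._chains:
                raise ChainExists(f"Chain {name} already exists.")
            self._chains.add(name)

    def delete_chain(self, name):  # models: ebtables -t nat -X <name> (missing chain ignored)
        with self._lock:
            self._chains.discard(name)


def recreate_chain(table, name, interleaved=None):
    """The -X / -N pair from the log; 'interleaved' runs between the two calls."""
    table.delete_chain(name)
    if interleaved is not None:
        interleaved(table)  # options B/C: another actor modifies the table here
    table.new_chain(name)


if __name__ == "__main__":
    table = ToyNatTable()
    table.new_chain("J-vnet0-mac")
    recreate_chain(table, "J-vnet0-mac")  # strictly sequential: succeeds
    try:
        recreate_chain(table, "J-vnet0-mac",
                       interleaved=lambda t: t.new_chain("J-vnet0-mac"))
    except ChainExists as exc:
        print("EEXIST:", exc)
```

The point of the model is that per-call locking alone cannot rule out EEXIST here; only keeping other writers out for the whole `-F`/`-X`/`-N` sequence would.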


Now for ENOTSUP: the dmesg log[1] attached to bz#1386293 drew my attention:

| [...]
| Oct 18 14:50:39 mac5254002a783e systemd: Starting Flexible Branding Service...
| Oct 18 14:50:39 mac5254002a783e systemd: Stopping IPv4 firewall with iptables...
| Oct 18 14:50:39 mac5254002a783e iptables.init: iptables: Setting chains to policy ACCEPT: filter [  OK  ]
| Oct 18 14:50:39 mac5254002a783e iptables.init: iptables: Flushing firewall rules: [  OK  ]
| Oct 18 14:50:39 mac5254002a783e journal: libvirt version: 2.0.0, package: 4.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2016-08-02-09:15:12, x86-034.build.eng.bos.redhat.com)
| Oct 18 14:50:39 mac5254002a783e journal: hostname: mac5254002a783e.example.com
| Oct 18 14:50:39 mac5254002a783e journal: internal error: Failed to apply firewall rules /usr/sbin/ebtables --concurrent -t nat -N libvirt-J-vnet0: The kernel doesn't support the ebtables 'nat' table.
| Oct 18 14:50:39 mac5254002a783e kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
| Oct 18 14:50:39 mac5254002a783e iptables.init: iptables: Unloading modules: [  OK  ]
| Oct 18 14:50:39 mac5254002a783e systemd: Stopped IPv4 firewall with iptables.
| [...]

From the (full) log, I can't tell which instance issues the failing ebtables
call here. I do see, however, that 'iptables stop' runs in parallel and
unloads modules, which is potentially fishy. On the other hand, I wasn't able
to provoke iptables.init into unloading any kernel modules that ebtables
requires, so this might sadly be a red herring.

I'm picking on iptables.init specifically because there was a similar issue
between iptables.init and ip6tables.init which eventually
required to force systemd to serialize restarting both services. I see that
ebtables service has a similar module unloading logic, can you please confirm
ebtables service is not involved here anywhere?

Thanks, Phil

[1] https://bugzilla.redhat.com/attachment.cgi?id=1211766

Comment 4 Simone Tiraboschi 2017-10-09 14:54:47 UTC
> required to force systemd to serialize restarting both services. I see that
> ebtables service has a similar module unloading logic, can you please confirm
> ebtables service is not involved here anywhere?

AFAIK ebtables is involved as well.

Comment 8 Phil Sutter 2017-12-12 11:41:09 UTC
Hi Simone,

Since I haven't received a reply in two months, I assume the problem has either been solved on your side or is no longer relevant. I'm therefore closing this ticket; feel free to reopen it if my assumption is wrong and you can provide instructions on how to reproduce the issue.

Thanks, Phil

