Bug 1489066

Summary: iptables manager may fail to apply firewall rules if another iptables* process is being executed
Product: Red Hat OpenStack Reporter: Ihar Hrachyshka <ihrachys>
Component: openstack-neutron Assignee: Ihar Hrachyshka <ihrachys>
Status: CLOSED ERRATA QA Contact: Toni Freger <tfreger>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 12.0 (Pike) CC: abregman, amuller, chrisw, ebarrera, ihrachys, jlibosva, nyechiel, ragiman, srevivo, ssigwald
Target Milestone: rc Keywords: Reopened, Triaged
Target Release: 12.0 (Pike)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-neutron-11.0.1-0.20170923193224.5b0191f.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1489069 1489070 1489071 1489072 1489074 1489081 Environment:
Last Closed: 2018-04-05 12:50:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1489069, 1489070, 1489071, 1489072, 1489074, 1489081, 1504790, 1504791, 1505518, 1505520, 1505522, 1505524, 1505525, 1505526, 1505529    

Description Ihar Hrachyshka 2017-09-06 16:00:27 UTC
Description of problem: sometimes the iptables manager (used by the L2 agent, L3 agent, and other Neutron components) may fail to apply new firewall rules because another iptables process holding the xtables lock is running at the moment the manager calls the iptables-* CLI.

It happens because the newer iptables shipped with RHEL 7.4 carries xtables lock (xlock) backports from iptables master that make every call to the iptables-* CLI grab a file lock, and fail if the lock is already held by another process. To avoid that, one can pass the --wait [timeout] argument so the call waits for the lock to become free instead of failing right away.
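
For illustration, here is a minimal Python sketch (not the actual Neutron patch; the wrapper name, timeout value, and subprocess usage are assumptions) of prepending the -w/--wait flag to iptables-style invocations so they block on the xtables lock instead of exiting with code 4:

import subprocess

# Illustrative value only; the real fix chooses its own timeout.
XLOCK_WAIT_TIMEOUT = 10  # seconds to wait for the xtables lock


def run_iptables(args, timeout=XLOCK_WAIT_TIMEOUT):
    """Run an iptables/iptables-restore command with the wait flag added.

    Without -w, the xlock-enabled binaries exit with code 4 ("Another app
    is currently holding the xtables lock") when a concurrent iptables
    process holds the lock; with -w <seconds> they block until the lock
    is free or the timeout expires.
    """
    cmd = [args[0], '-w', str(timeout)] + list(args[1:])
    # check=True raises CalledProcessError on a non-zero exit code,
    # roughly mirroring neutron's ProcessExecutionError handling.
    return subprocess.run(cmd, capture_output=True, text=True, check=True)


if __name__ == '__main__':
    # Listing the filter table requires root to actually succeed.
    print(run_iptables(['iptables', '-L', '-n']).stdout)

Passing the wait flag at the point of invocation (rather than catching exit code 4 and retrying) lets the xtables lock itself serialize concurrent callers such as the L2 and L3 agents.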

Version-Release number of selected component (if applicable): any OSP release with the new iptables package.


How reproducible: easy to reproduce with the functional test suite, which exercises the manager heavily. Example of the failure:

neutron.tests.functional.agent.test_firewall.FirewallTestCase.test_established_connection_is_cut(IptablesFirewallDriver,without ipset)
--------------------------------------------------------------------------------------------------------------------------------------

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "neutron/tests/functional/agent/test_firewall.py", line 113, in setUp
        self.firewall.prepare_port_filter(self.src_port_desc)
      File "neutron/agent/linux/iptables_firewall.py", line 204, in prepare_port_filter
        return self.iptables.apply()
      File "neutron/agent/linux/iptables_manager.py", line 432, in apply
        return self._apply()
      File "neutron/agent/linux/iptables_manager.py", line 440, in _apply
        first = self._apply_synchronized()
      File "neutron/agent/linux/iptables_manager.py", line 539, in _apply_synchronized
        '\n'.join(log_lines))
      File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
        self.force_reraise()
      File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
        six.reraise(self.type_, self.value, self.tb)
      File "neutron/agent/linux/iptables_manager.py", line 518, in _apply_synchronized
        run_as_root=True)
      File "neutron/agent/linux/utils.py", line 156, in execute
        raise ProcessExecutionError(msg, returncode=returncode)
    neutron.agent.linux.utils.ProcessExecutionError: Exit code: 4; Stdin: # Generated by iptables_manager
    *filter
    :neutron-filter-top - [0:0]
    :run.py-FORWARD - [0:0]
    :run.py-INPUT - [0:0]
    :run.py-OUTPUT - [0:0]
    :run.py-it-veth0bc5 - [0:0]
    :run.py-local - [0:0]
    :run.py-ot-veth0bc5 - [0:0]
    :run.py-sg-chain - [0:0]
    :run.py-sg-fallback - [0:0]
    -I FORWARD 1 -j neutron-filter-top
    -I FORWARD 2 -j run.py-FORWARD
    -I INPUT 1 -j run.py-INPUT
    -I OUTPUT 1 -j neutron-filter-top
    -I OUTPUT 2 -j run.py-OUTPUT
    -I neutron-filter-top 1 -j run.py-local
    -I run.py-FORWARD 1 -m physdev --physdev-out test-veth0bc5b8 --physdev-is-bridged -m comment --comment "Direct traffic from the VM interface to the security group chain." -j run.py-sg-chain
    -I run.py-FORWARD 2 -m physdev --physdev-in test-veth0bc5b8 --physdev-is-bridged -m comment --comment "Direct traffic from the VM interface to the security group chain." -j run.py-sg-chain
    -I run.py-INPUT 1 -m physdev --physdev-in test-veth0bc5b8 --physdev-is-bridged -m comment --comment "Direct incoming traffic from VM to the security group chain." -j run.py-ot-veth0bc5
    -I run.py-it-veth0bc5 1 -p ipv6-icmp -m icmp6 --icmpv6-type 130 -j RETURN
    -I run.py-it-veth0bc5 2 -p ipv6-icmp -m icmp6 --icmpv6-type 134 -j RETURN
    -I run.py-it-veth0bc5 3 -p ipv6-icmp -m icmp6 --icmpv6-type 135 -j RETURN
    -I run.py-it-veth0bc5 4 -p ipv6-icmp -m icmp6 --icmpv6-type 136 -j RETURN
    -I run.py-it-veth0bc5 5 -m state --state RELATED,ESTABLISHED -m comment --comment "Direct packets associated with a known session to the RETURN chain." -j RETURN
    -I run.py-it-veth0bc5 6 -m state --state INVALID -m comment --comment "Drop packets that appear related to an existing connection (e.g. TCP ACK/FIN) but do not have an entry in conntrack." -j DROP
    -I run.py-it-veth0bc5 7 -m comment --comment "Send unmatched traffic to the fallback chain." -j run.py-sg-fallback
    -I run.py-ot-veth0bc5 1 -s ::/128 -d ff02::/16 -p ipv6-icmp -m icmp6 --icmpv6-type 131 -m comment --comment "Allow IPv6 ICMP traffic." -j RETURN
    -I run.py-ot-veth0bc5 2 -s ::/128 -d ff02::/16 -p ipv6-icmp -m icmp6 --icmpv6-type 135 -m comment --comment "Allow IPv6 ICMP traffic." -j RETURN
    -I run.py-ot-veth0bc5 3 -s ::/128 -d ff02::/16 -p ipv6-icmp -m icmp6 --icmpv6-type 143 -m comment --comment "Allow IPv6 ICMP traffic." -j RETURN
    -I run.py-ot-veth0bc5 4 -p ipv6-icmp -m icmp6 --icmpv6-type 134 -m comment --comment "Drop IPv6 Router Advts from VM Instance." -j DROP
    -I run.py-ot-veth0bc5 5 -p ipv6-icmp -m comment --comment "Allow IPv6 ICMP traffic." -j RETURN
    -I run.py-ot-veth0bc5 6 -p udp -m udp --sport 546 --dport 547 -m comment --comment "Allow DHCP client traffic." -j RETURN
    -I run.py-ot-veth0bc5 7 -p udp -m udp --sport 547 --dport 546 -m comment --comment "Prevent DHCP Spoofing by VM." -j DROP
    -I run.py-ot-veth0bc5 8 -m state --state RELATED,ESTABLISHED -m comment --comment "Direct packets associated with a known session to the RETURN chain." -j RETURN
    -I run.py-ot-veth0bc5 9 -m state --state INVALID -m comment --comment "Drop packets that appear related to an existing connection (e.g. TCP ACK/FIN) but do not have an entry in conntrack." -j DROP
    -I run.py-ot-veth0bc5 10 -m comment --comment "Send unmatched traffic to the fallback chain." -j run.py-sg-fallback
    -I run.py-sg-chain 1 -m physdev --physdev-out test-veth0bc5b8 --physdev-is-bridged -m comment --comment "Jump to the VM specific chain." -j run.py-it-veth0bc5
    -I run.py-sg-chain 2 -m physdev --physdev-in test-veth0bc5b8 --physdev-is-bridged -m comment --comment "Jump to the VM specific chain." -j run.py-ot-veth0bc5
    -I run.py-sg-chain 3 -j ACCEPT
    -I run.py-sg-fallback 1 -m comment --comment "Default drop rule for unmatched traffic." -j DROP
    COMMIT
    # Completed by iptables_manager
    # Generated by iptables_manager
    *raw
    :run.py-OUTPUT - [0:0]
    :run.py-PREROUTING - [0:0]
    -I OUTPUT 1 -j run.py-OUTPUT
    -I PREROUTING 1 -j run.py-PREROUTING
    -I run.py-PREROUTING 1 -m physdev --physdev-in brq7a7f000b-b8 -m comment --comment "Set zone for -veth0bc5b8" -j CT --zone 1
    -I run.py-PREROUTING 2 -i brq7a7f000b-b8 -m comment --comment "Set zone for -veth0bc5b8" -j CT --zone 1
    -I run.py-PREROUTING 3 -m physdev --physdev-in test-veth0bc5b8 -m comment --comment "Set zone for -veth0bc5b8" -j CT --zone 1
    COMMIT
    # Completed by iptables_manager
    ; Stdout: ; Stderr: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?


Steps to Reproduce:
Execute the full functional suite on the new RHEL with the fresh iptables package and see some test cases fail when applying new security group rules.

There is an iptables bug related to the xtables lock: https://bugzilla.redhat.com/show_bug.cgi?id=1481207 where the fix is to pass --wait to all calls to iptables and iptables-restore in the iptables startup scripts.

Comment 5 Eduard Barrera 2017-11-06 10:55:06 UTC
Hi all,

This issue is already fixed upstream for Newton, Ocata, and Pike.

Can we have a hotfix for Newton?

Comment 6 Ihar Hrachyshka 2017-11-06 18:34:01 UTC
Eduard, it's already fixed there too: https://bugzilla.redhat.com/show_bug.cgi?id=1489070 Please inspect the linked bugs in the 'Blocks' section to get all clones of this bug for the different releases.

Comment 7 Arie Bregman 2017-11-07 06:32:27 UTC
*** Bug 1490843 has been marked as a duplicate of this bug. ***

Comment 12 errata-xmlrpc 2017-12-13 22:05:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462