Bug 1305434 - Firewalld hangs with a NIS configuration
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: firewalld
Version: 7.2
Hardware: Unspecified OS: Linux
Priority: high Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Thomas Woerner
QA Contact: Tomas Dolezal
Depends On:
Blocks: 1376842
Reported: 2016-02-08 04:06 EST by Maurizio Schena
Modified: 2016-11-03 17:02 EDT (History)

See Also:
Fixed In Version: firewalld-0.4.2-1.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1376842 (view as bug list)
Environment:
Last Closed: 2016-11-03 17:02:31 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
strace of failing firewalld reload (3.47 MB, text/plain)
2016-03-31 07:58 EDT, Lutz P. Christoph


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2068503 None None None 2016-07-21 10:06 EDT

Description Maurizio Schena 2016-02-08 04:06:40 EST
Description of problem:
The firewalld code contains two bugs that cause spurious getprotobyname() and getservbyname() calls. If there is a NIS backend that serves the corresponding maps, the firewalld process will hang in a very long timeout during a reload.

The first bug is using a protocol name "icmpv6" in fw.py that is not present in /etc/protocols. The second bug causes getservbyname() lookups for port ranges. It is in functions.py.

In an environment running a NIS server that serves the maps:
protocols.byname protocols.bynumber services.byname services.byservicename
The *.byname maps seem to be causing the problem.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
In the environment described above, it is sufficient to run:
systemctl restart firewalld.service
systemctl reload firewalld.service
systemctl restart firewalld.service
The second restart hangs.

Versions:
firewalld-0.3.9-7.el7.noarch
setup-2.8.71-4.el7.noarch 

Additional info:

the code portion from functions.py that causes spurious getservbyname() calls for port ranges:

def getPortRange(ports):
    """ Get port range for port range string or single port id

    @param ports an integer or port string or port range string
    @return Array containing start and end port id for a valid range or -1 if port can not be found and -2 if port is too big for integer input or -1 for invalid ranges or None if the range is ambiguous.
    """

    if isinstance(ports, int):
        id = getPortID(ports)
        if id >= 0:
            return (id,)
        return id

    splits = ports.split("-")
    matched = [ ]
    for i in range(len(splits), 0, -1):
        # >>>> The next line leads to getservbyname()
        id1 = getPortID("-".join(splits[:i]))
        # <<<<
        port2 = "-".join(splits[i:])
        if len(port2) > 0:
            id2 = getPortID(port2)
            if id1 >= 0 and id2 >= 0:
                if id1 < id2:
                    matched.append((id1, id2))
                elif id1 > id2:
                    matched.append((id2, id1))
                else:
                    matched.append((id1, ))
        else:
            if id1 >= 0:
                matched.append((id1,))
                if i == len(splits):
                    # full match, stop here
                    break
    if len(matched) < 1:
        return -1
    elif len(matched) > 1:
        return None
    return matched[0]
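
A minimal sketch of how the spurious lookup could be avoided, assuming purely numeric input is recognized before any name resolution is attempted. The names parse_numeric_range and get_port_range and the injectable lookup parameter are hypothetical and are not taken from functions.py. Unlike getPortRange above, the sketch does not try to resolve ranges of service names.

```python
import socket

MAX_PORT = 65535


def parse_numeric_range(ports):
    """Return (start,) or (start, end) for numeric input, else None."""
    if isinstance(ports, int):
        parts = [str(ports)]
    else:
        parts = [p.strip() for p in ports.split("-")]
    # Anything that is not "N" or "N-M" with decimal digits is left
    # to the name-based path.
    if len(parts) not in (1, 2) or not all(p.isdecimal() for p in parts):
        return None
    ids = sorted(int(p) for p in parts)
    return (ids[0],) if ids[0] == ids[-1] else (ids[0], ids[-1])


def get_port_range(ports, lookup=socket.getservbyname):
    """Resolve a port or port range, calling `lookup` only for names."""
    numeric = parse_numeric_range(ports)
    if numeric is not None:
        # Fast path: no getservbyname() call, so no NIS request.
        return numeric if numeric[-1] <= MAX_PORT else -2
    if isinstance(ports, int):
        return -1  # e.g. a negative integer
    try:
        return (lookup(ports),)  # service name; may query NIS
    except OSError:
        return -1
```

With this ordering, a string such as "1024-2048" never reaches the service lookup, while a name such as "domain-tftp" still does.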
Comment 1 Maurizio Schena 2016-02-08 04:07:46 EST
The protocol is commonly called 'icmpv6', but both the official IANA protocol list (linked below) and the /etc/protocols file shipped in RHEL 7 use the name 'ipv6-icmp' instead.

http://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml
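
To illustrate why the name matters, here is a small sketch that parses text in /etc/protocols format and checks both spellings. The SAMPLE_PROTOCOLS excerpt is an assumption made for illustration (it is not copied from the setup package), and parse_protocols is a hypothetical helper. A name that is missing from the local file is what can fall through to the next configured source, such as NIS.

```python
# Assumed excerpt in /etc/protocols format: name, number, aliases.
SAMPLE_PROTOCOLS = """\
# protocol  number  aliases
icmp        1       ICMP
tcp         6       TCP
udp         17      UDP
ipv6-icmp   58      IPv6-ICMP
"""


def parse_protocols(text):
    """Map every protocol name and alias to its protocol number."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments
        if len(fields) < 2:
            continue  # blank or comment-only line
        name, number, *aliases = fields
        for key in (name, *aliases):
            table[key] = int(number)
    return table


table = parse_protocols(SAMPLE_PROTOCOLS)
print("ipv6-icmp" in table)  # resolvable from the local file
print("icmpv6" in table)     # not present locally
```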
Comment 5 Thomas Woerner 2016-03-31 07:40:08 EDT
The use of "icmpv6" in fw.py can be changed to "ipv6-icmp".

But the use of socket.getservbyname in getPortID is needed, and it is only called if a service name is used. In that case getservbyname returns the port id for the string. If a port number is given, getservbyname is not used.

Are port names instead of port numbers used in the user configuration? firewalld itself is not using port names.
Comment 6 Lutz P. Christoph 2016-03-31 07:58 EDT
Created attachment 1142187 [details]
strace of failing firewalld reload

I'm attaching the firewalld.strace from Red Hat CASE 01569103.

To see a failing ip6tables execution, grep the PID 8151 in this file.
Comment 7 Lutz P. Christoph 2016-03-31 08:02:17 EDT
> The use of "icmpv6" in fw.py can be changed to "ipv6-icmp".

I provided a trivial patch in Red Hat CASE 01569103.

> Are port names instead of port numbers used in the user configuration? firewalld itself is not using port names.

The problem occurs in a firewalld reload. No user supplied rules are involved, as we only use IPv4, but this command fails:
/sbin/ip6tables -t raw -I PREROUTING 1 -p icmpv6 --icmpv6-type=router-advertisement -j ACCEPT

This is the most relevant line in the strace of this execution:
8151  sendto(4, "\33\0226?\0\0\0\0\0\0\0\2\0\1\206\244\0\0\0\2\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\7bbaprod\0\0\0\0\20protocols.byname\0\0\0\6icmpv6\0\0", 84, 0, {sa_family=AF_INET, sin_port=htons(606), sin_addr=inet_addr("10.17.86.21")}, 16) = -1 EPERM (Operation not permitted)
Comment 8 Thomas Woerner 2016-03-31 09:23:38 EDT
Ok, I missed the issue in getPortRange.
Comment 9 Thomas Woerner 2016-03-31 09:25:00 EDT
Here are the two fixes:

Use ipv6-icmp instead of ipcmpv6 to prevent getprotobyname calls (RHBZ#1305434)

https://github.com/t-woerner/firewalld/commit/846f5e708cf0586d8d7c0387e97cf2787444864d

and

functions.getPortRange: No getservbyname for port id ranges (RHBZ#1305434)

https://github.com/t-woerner/firewalld/commit/8565b81cd24272f59f358336b3a0ba8921eed096
Comment 12 Lutz P. Christoph 2016-03-31 10:06:57 EDT
Thanks, Thomas!

I will see that I can test your fixes ASAP.
Comment 13 Lutz P. Christoph 2016-04-06 05:35:06 EDT
I didn't get around to applying the patches to my RHEL 7.0 system until today (your message may have had something to do with that...). I'm afraid we do not have a complete solution, as firewalld does not always call iptables with numeric protocol or service identifiers. The reload now stops at this iptables call, which tries to look up the protocol name "icmp":

/sbin/iptables -t filter -I INPUT 6 -p icmp -j ACCEPT

I traced the firewalld process and grepped all execve calls (only calls to iptables and ip6tables). When I filter them for the -p/--protocol option, the arguments for that are all names. I checked all port arguments ("--dport" and "--destination-port"), and they are all numeric.

So it looks like firewalld needs to memoize a protocol name to number mapping and use that when calling iptables/ip6tables to avoid the NIS problem.
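
A rough sketch of what such a memoized mapping could look like, assuming the names are resolved once while the name service is still reachable. make_cached_resolver and numeric_protocol_args are hypothetical helpers and not part of firewalld. The sketch relies on iptables accepting a numeric value for -p/--protocol.

```python
import socket


def make_cached_resolver(lookup=socket.getprotobyname):
    """Return a resolver that asks `lookup` at most once per name."""
    cache = {}

    def resolve(name):
        if name not in cache:
            cache[name] = lookup(name)  # may query NIS the first time
        return cache[name]

    return resolve


def numeric_protocol_args(args, resolve):
    """Replace the value after -p/--protocol with its protocol number."""
    out = list(args)
    for i, arg in enumerate(out[:-1]):
        if arg in ("-p", "--protocol") and not out[i + 1].isdecimal():
            out[i + 1] = str(resolve(out[i + 1]))
    return out
```

Calling numeric_protocol_args on the argument list of the iptables command above would replace "icmp" with its number, so the iptables process itself would have nothing left to look up.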

I have already posted this analysis in the support case 01569103.

Since firewalld on GitHub is at 0.4.0, I should note that I did not import that version to my RHEL 7.0 system (rpm version 0.3.9-7.el7) but manually patched the two files fw.py and functions.py with your new code. So I can't tell whether any of the newer firewalld versions use protocol numbers rather than names.
Comment 14 Thomas Woerner 2016-04-12 11:05:53 EDT
firewalld is not providing a mapping and also not caching mappings for protocol names to protocol ids. It is using the mechanism that the system provides to achieve this. I do not think that a duplication in firewalld is a good move.
Comment 15 Lutz P. Christoph 2016-04-12 11:26:39 EDT
I'm aware of this and I understand your position. It was an idea for how firewalld could work around the problem of an inaccessible NIS.

Another possibility would be not to radically set iptables to all deny (I don't remember what it is actually set to, sorry) but to statefully permit outgoing connections, as firewalld does not block them anyway.
Comment 16 Thomas Woerner 2016-04-13 05:37:40 EDT
(In reply to Lutz P. Christoph from comment #15)
> Another possibility would be not to radically set iptables to all deny (I
> don't remember what it is actually set to, sorry) but to statefully permit
> outgoing connections, as firewalld does not block them anyway.

Please explain, I do not understand this.
Comment 17 Lutz P. Christoph 2016-04-14 10:50:10 EDT
Currently, firewalld shuts down iptables completely when doing a reload, like this:

Chain INPUT (policy DROP)
target     prot opt source               destination         
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
INPUT_direct  all  --  0.0.0.0/0            0.0.0.0/0           
INPUT_ZONES_SOURCE  all  --  0.0.0.0/0            0.0.0.0/0           
INPUT_ZONES  all  --  0.0.0.0/0            0.0.0.0/0           

Chain FORWARD (policy DROP)
target     prot opt source               destination         

Chain OUTPUT (policy DROP)
target     prot opt source               destination         

Chain INPUT_ZONES (1 references)
target     prot opt source               destination         

Chain INPUT_ZONES_SOURCE (1 references)
target     prot opt source               destination         

Chain INPUT_direct (1 references)
target     prot opt source               destination         

You see that OUTPUT is set to DROP. When firewalld is running normally, it is set to ACCEPT, and the chain OUTPUT_direct is empty.
Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
OUTPUT_direct  all  --  0.0.0.0/0            0.0.0.0/0           
Chain OUTPUT_direct (1 references)
target     prot opt source               destination         

This allows any locally originating traffic to pass. What I propose as a possible solution is to leave those two chains unchanged, or to delete OUTPUT_direct and the corresponding rule in OUTPUT.

That would allow the NIS service to continue working.
Comment 18 Thomas Woerner 2016-04-14 12:57:37 EDT
An additional patch that limits the time during which the default chains have policy DROP while reloading has been added upstream: https://github.com/t-woerner/firewalld/commit/c91b2fa8f478653aa1ff4ee66bcd0d069554574c

With this patch applied, the config files are read before the old rules are flushed and the default policy is set to DROP. The calls to getPortID are also made before that point.

This means that the rule set of firewalld and the zones should no longer trigger this issue. The only remaining possible issue is the use of port or protocol names in direct rules saved in direct.xml.

The rules from libvirt for example are regenerated right after firewalld finished reload and therefore should not result in this issue.
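
The ordering described in this comment can be sketched as follows. This is only an illustration under stated assumptions: reload_firewall and its four callback parameters are hypothetical and do not correspond to functions in firewalld. The point is that everything that may need a name lookup happens before the policy is switched to DROP.

```python
def reload_firewall(read_config, set_policy, flush_rules, apply_rules):
    """Run a reload with the slow, lookup-heavy work done up front."""
    # Read and resolve the configuration first. Network traffic,
    # including NIS requests, is still allowed at this point.
    rules = read_config()

    # Only now enter the window in which the default policy is DROP.
    set_policy("DROP")
    flush_rules()
    apply_rules(rules)
    set_policy("ACCEPT")


# Usage with stand-in callbacks that only record the order of steps.
steps = []
reload_firewall(
    read_config=lambda: steps.append("read config") or ["rule-1", "rule-2"],
    set_policy=lambda policy: steps.append("policy " + policy),
    flush_rules=lambda: steps.append("flush"),
    apply_rules=lambda rules: steps.append("apply %d rules" % len(rules)),
)
print(steps)
```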
Comment 19 Thomas Woerner 2016-04-14 13:04:26 EDT
(In reply to Lutz P. Christoph from comment #17)     
> 
> This allows any locally originating traffic to pass. What I propose as a
> possible solution is to leave those two chain unchanged, or to delete
> OUTPUT_direct and the corresponding rule in OUTPUT.
> 
> That would allow the NIS service to continue working.

But the replies are blocked in INPUT. Is NIS really working then?
Comment 20 Thomas Woerner 2016-05-02 08:20:24 EDT
Here is a scratch build of the current version 0.4.1.2 for RHEL-7:

http://people.redhat.com/twoerner/firewalld/0.4.1.2-1.el7/

Please have a look at SELINUX.README. A change for selinux-policy is needed to allow the use of the iptables-restore commands with file inputs from /run/firewalld.

Is it fixing the issue for you?
Comment 21 Lutz P. Christoph 2016-05-02 08:55:58 EDT
Thanks for the build. I'm on vacation right now, so I'll assume this is essentially identical to the backport I did (massaging your patches into the RHEL 7.0 sources). I will give it a try next week.

This problem is completely unrelated to SELinux for two reasons:
1) SElinux is disabled on the machine in question.
2) Using strace shows the NIS request failing that normally goes through. I doubt firewalld twiddles the SELinux status when it reloads.

8151  bind(4, {sa_family=AF_INET, sin_port=htons(695), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
8151  setsockopt(4, SOL_IP, IP_RECVERR, [1], 4) = 0
8151  close(3)                          = 0
8151  sendto(4, "\33\0226?\0\0\0\0\0\0\0\2\0\1\206\244\0\0\0\2\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\7bbaprod\0\0\0\0\20protocols.byname\0\0\0\6icmpv6\0\0", 84, 0, {sa_family=AF_INET, sin_port=htons(606), sin_addr=inet_addr("10.17.86.21")}, 16) = -1 EPERM (Operation not permitted)
8151  write(2, "do_ypcall: clnt_call: RPC: Unable to send; errno = Operation not permitted\n", 75) = 75
8151  close(4 <unfinished ...>
Comment 23 Lutz P. Christoph 2016-06-01 04:35:44 EDT
Thomas, I see that you did a whole load of commits on Github. As I don't have the time to go through all of them - did you commit anything new related to this ticket?

I see I neglected to test 0.4.1.2 after I returned from vacation. Since you have a 0.4.2-1.el7 now, I will test that.

Thanks!
Comment 24 Thomas Woerner 2016-06-01 06:16:36 EDT
Yes, there is the transaction model. This makes it possible to remove the old firewall configuration and apply the new one in six calls, provided no direct rules or ipsets are used.

In the first three, the old rule set is cleaned up and the new rules for all zones are generated for ipv4, ipv6 and eb. This means that the possibility to run into this issue goes down dramatically.

Then there are three more calls to set the policies back to accept for ipv4, ipv6 and eb. In these calls there are no protocols or ports used, therefore there should not be any issue.

The direct rules are handled separately, because they could contain rules that cannot be applied and could therefore make the transaction fail in the first step.

If IndividualCalls is enabled in the firewalld configuration, then the transaction model is not used and the old behavior steps in.
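
As a hedged illustration of the batching idea, the sketch below collects individual rules and renders them as one text block in the iptables-restore input format (a "*table" line, the rules, then "COMMIT"). render_restore_input and the sample rule list are assumptions for illustration. How firewalld actually builds its transactions is not shown in this report.

```python
def render_restore_input(rules):
    """Render (table, args) pairs as a single iptables-restore input."""
    by_table = {}
    for table, args in rules:
        by_table.setdefault(table, []).append(" ".join(args))

    lines = []
    for table, entries in by_table.items():
        lines.append("*" + table)  # start of one table section
        lines.extend(entries)
        lines.append("COMMIT")     # apply this table in one step
    return "\n".join(lines) + "\n"


# Hypothetical rule set; one restore call would replace three commands.
sample_rules = [
    ("filter", ["-I", "INPUT", "6", "-p", "icmp", "-j", "ACCEPT"]),
    ("raw", ["-I", "PREROUTING", "1", "-p", "ipv6-icmp", "-j", "ACCEPT"]),
    ("filter", ["-A", "INPUT", "-p", "tcp", "--dport", "22", "-j", "ACCEPT"]),
]
print(render_restore_input(sample_rules), end="")
```

Batching this way means one process start per backend instead of one per rule, which is why the window for a failing lookup shrinks.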
Comment 25 Tomas Dolezal 2016-09-16 09:59:03 EDT
verified firewalld-0.4.3.2-8.el7

old: 
[root@vm-rhel7s ~]# firewall-cmd --complete-reload
ERROR:dbus.proxies:Introspect error on :1.472:/org/fedoraproject/FirewallD1: dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

when connection allowed:
304.150037208 192.168.122.246 -> 192.168.122.144 YPSERV 126 V2 MATCH Call nisdom/protocols.byname/icmpv6
304.150109754 192.168.122.144 -> 192.168.122.246 YPSERV 74 V2 MATCH Reply (Call In 144)


new:
no requests, immediate reload.
Comment 26 Tomas Dolezal 2016-09-16 10:37:03 EDT
"num-num/port" ranges works without proceeding to resolve the name (hence it does not hang on unavailable nis server).

However, name ranges like "firewall-cmd --add-port domain-tftp/udp" still try to find the service via a NIS call:
  YPSERV 142 V2 MATCH Call firewalld/services.byservicename/domain-tftp
Comment 28 errata-xmlrpc 2016-11-03 17:02:31 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2597.html
