Red Hat Bugzilla – Bug 1305434
Firewalld hangs with a NIS configuration
Last modified: 2016-11-03 17:02:31 EDT
Description of problem:
The firewalld code contains two bugs that cause spurious getprotobyname() and getservbyname() calls. If a NIS backend serves the corresponding maps, the firewalld process hangs in a very long timeout during a reload.

The first bug is the use of the protocol name "icmpv6" in fw.py, which is not present in /etc/protocols. The second bug, in functions.py, causes getservbyname() lookups for port ranges.

The environment runs a NIS server that serves these maps:
  protocols.byname
  protocols.bynumber
  services.byname
  services.byservicename
The *.byname maps seem to be causing the problem.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
In the environment described above, it is sufficient to run:
  systemctl restart firewalld.service
  systemctl reload firewalld.service
  systemctl restart firewalld.service
The second restart hangs.

Versions:
  firewalld-0.3.9-7.el7.noarch
  setup-2.8.71-4.el7.noarch

Additional info:
This is the code from functions.py that causes spurious getservbyname() calls for port ranges:

def getPortRange(ports):
    """ Get port range for port range string or single port id

    @param ports an integer or port string or port range string
    @return Array containing start and end port id for a valid range or -1
            if port can not be found and -2 if port is too big for integer
            input or -1 for invalid ranges or None if the range is ambiguous.
    """
    if isinstance(ports, int):
        id = getPortID(ports)
        if id >= 0:
            return (id,)
        return id

    splits = ports.split("-")
    matched = [ ]
    for i in range(len(splits), 0, -1):
        # >>>> The next line leads to getservbyname()
        id1 = getPortID("-".join(splits[:i]))
        # <<<<
        port2 = "-".join(splits[i:])
        if len(port2) > 0:
            id2 = getPortID(port2)
            if id1 >= 0 and id2 >= 0:
                if id1 < id2:
                    matched.append((id1, id2))
                elif id1 > id2:
                    matched.append((id2, id1))
                else:
                    matched.append((id1, ))
        else:
            if id1 >= 0:
                matched.append((id1,))
                if i == len(splits):
                    # full match, stop here
                    break
    if len(matched) < 1:
        return -1
    elif len(matched) > 1:
        return None
    return matched[0]
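To make the problem in the getPortRange() code above concrete, here is a small sketch (lookup_candidates and needs_name_lookup are hypothetical helpers, not part of firewalld) that lists the strings getPortRange() hands to getPortID(), in call order, assuming the full string does not resolve to a service so the loop does not break early. The non-numeric ones are the strings that end up in getservbyname(); for a plain numeric range like "1-10" that is the whole string "1-10".

```python
def lookup_candidates(ports):
    """Strings getPortRange() passes to getPortID() for a range string, in
    call order, assuming the full string does not resolve (no early break)."""
    splits = ports.split("-")
    candidates = []
    for i in range(len(splits), 0, -1):
        candidates.append("-".join(splits[:i]))   # becomes id1
        rest = "-".join(splits[i:])
        if rest:
            candidates.append(rest)               # becomes id2
    return candidates


def needs_name_lookup(candidates):
    """Non-numeric candidates are the ones getPortID() sends to getservbyname()."""
    return [c for c in candidates if not c.strip().isdigit()]


print(lookup_candidates("1-10"))                     # ['1-10', '1', '10']
print(needs_name_lookup(lookup_candidates("1-10")))  # ['1-10']
```

So even a purely numeric range costs one services lookup, which blocks when NIS is unreachable.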
The protocol is commonly called 'icmpv6', but both the official /etc/protocols file and the one in RHEL 7 use the name 'ipv6-icmp' instead: http://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml
The use of "icmpv6" in fw.py can be changed to "ipv6-icmp". But the call to socket.getservbyname in getPortID is needed; it is only made when a service name is given, in which case getservbyname returns the port id for that name. If a port number is given, getservbyname is not used. Are port names instead of port numbers used in the user configuration? firewalld itself does not use port names.
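A minimal sketch of the lookup rule just described, assuming getPortID() behaves as stated: numbers are parsed locally and only names go to socket.getservbyname(). The name get_port_id and its injectable lookup parameter are hypothetical, added so the rule can be exercised without a services database; the -1/-2 return codes follow the getPortRange() docstring.

```python
import socket


def get_port_id(port, lookup=socket.getservbyname):
    """Return a port number, -1 if unknown, or -2 if larger than 65535.

    Only non-numeric strings are passed to `lookup`, so plain numbers never
    touch the services database (files, NIS, ...).
    """
    if isinstance(port, int):
        value = port
    else:
        text = port.strip()
        if text.isdigit():
            value = int(text)          # numeric: resolved locally, no lookup
        else:
            try:
                value = lookup(text)   # service name: may be answered by NIS
            except OSError:            # getservbyname raises OSError if unknown
                return -1
    if value > 65535:
        return -2
    return value if value >= 0 else -1


print(get_port_id("22"))    # 22, no lookup performed
print(get_port_id(70000))   # -2
```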
Created attachment 1142187
strace of failing firewalld reload

I'm attaching the firewalld.strace from Red Hat CASE 01569103. To see a failing ip6tables execution, grep for PID 8151 in this file.
> The use of "icmpv6" in fw.py can be changed to "ipv6-icmp".

I provided a trivial patch in Red Hat CASE 01569103.

> Are port names instead of port numbers used in the user configuration? firewalld itself is not using port names.

The problem occurs during a firewalld reload. No user-supplied rules are involved, as we only use IPv4, but this command fails:

/sbin/ip6tables -t raw -I PREROUTING 1 -p icmpv6 --icmpv6-type=router-advertisement -j ACCEPT

This is the most relevant line in the strace of this execution:

8151 sendto(4, "\33\0226?\0\0\0\0\0\0\0\2\0\1\206\244\0\0\0\2\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\7bbaprod\0\0\0\0\20protocols.byname\0\0\0\6icmpv6\0\0", 84, 0, {sa_family=AF_INET, sin_port=htons(606), sin_addr=inet_addr("10.17.86.21")}, 16) = -1 EPERM (Operation not permitted)
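The sendto() payload above is an ONC RPC call to ypserv. A minimal decoding sketch, assuming a YPPROC_MATCH request with AUTH_NONE credentials (program 100004, version 2, procedure 3), shows which domain, map and key are being asked for. The helper names are hypothetical, and the sample is rebuilt with struct from the fields visible in the strace string rather than pasted byte for byte; its length matches the 84 bytes strace reports.

```python
import struct

YPPROG, YPVERS, YPPROC_MATCH = 100004, 2, 3   # ONC RPC numbers for ypserv


def _xdr_opaque(data):
    """XDR variable-length opaque/string: 4-byte big-endian length, data, zero pad to 4."""
    return struct.pack(">I", len(data)) + data + b"\0" * ((-len(data)) % 4)


def encode_yp_match(xid, domain, mapname, key):
    """Build an RPC CALL for YPPROC_MATCH with AUTH_NONE credential and verifier."""
    header = struct.pack(">6I", xid, 0, 2, YPPROG, YPVERS, YPPROC_MATCH)
    auth_none = struct.pack(">2I", 0, 0)              # flavor 0, empty body
    return (header + auth_none + auth_none
            + _xdr_opaque(domain) + _xdr_opaque(mapname) + _xdr_opaque(key))


def decode_yp_match(payload):
    """Return (prog, proc, domain, map, key) from a YPPROC_MATCH call payload."""
    _xid, _mtype, _rpcvers, prog, _vers, proc = struct.unpack_from(">6I", payload, 0)
    offset = 24
    for _ in range(2):                                 # skip credential and verifier
        _flavor, length = struct.unpack_from(">2I", payload, offset)
        offset += 8 + length + (-length) % 4
    fields = []
    for _ in range(3):                                 # domain, map, key
        (length,) = struct.unpack_from(">I", payload, offset)
        offset += 4
        fields.append(payload[offset:offset + length].decode())
        offset += length + (-length) % 4
    return (prog, proc, *fields)


payload = encode_yp_match(0x1B12363F, b"bbaprod", b"protocols.byname", b"icmpv6")
print(len(payload))              # 84, the length strace reports for the sendto()
print(decode_yp_match(payload))  # (100004, 3, 'bbaprod', 'protocols.byname', 'icmpv6')
```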
Ok, I missed the issue in getPortRange.
Here are the two fixes:

Use ipv6-icmp instead of ipcmpv6 to prevent getprotobyname calls (RHBZ#1305434)
https://github.com/t-woerner/firewalld/commit/846f5e708cf0586d8d7c0387e97cf2787444864d

functions.getPortRange: No getservbyname for port id ranges (RHBZ#1305434)
https://github.com/t-woerner/firewalld/commit/8565b81cd24272f59f358336b3a0ba8921eed096
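One way to get the behavior named in the second commit is to recognize purely numeric ports and ranges before any service-name lookup happens. The sketch below illustrates that idea under this assumption; it is not the code from the upstream commit, and numeric_port_range is a hypothetical name.

```python
import re

# "start-end" with optional whitespace around the dash; digits only.
_NUMERIC_RANGE = re.compile(r"(\d+)\s*-\s*(\d+)")


def numeric_port_range(ports):
    """Parse a numeric port or port range without any service-name lookup.

    Returns (port,) or (start, end) with start < end, -2 if a port is larger
    than 65535, or None if the input contains a name and still needs the
    getservbyname() path.
    """
    text = str(ports).strip()
    if text.isdigit():
        numbers = [int(text)]
    else:
        m = _NUMERIC_RANGE.fullmatch(text)
        if m is None:
            return None                      # a name: fall back to name resolution
        numbers = sorted({int(m.group(1)), int(m.group(2))})
    if max(numbers) > 65535:
        return -2
    return tuple(numbers)


print(numeric_port_range("10-1"))         # (1, 10)
print(numeric_port_range("domain-tftp"))  # None
```

Input that contains a name still returns None and would go through the service lookup, so a range such as "domain-tftp" is expected to keep querying the services map.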
Thanks, Thomas! I will try to test your fixes ASAP.
I didn't get around to applying the patches to my RHEL 7.0 system until today (your message may have had something to do with that...). I'm afraid we do not have a complete solution, as firewalld does not always call iptables with numeric protocol or service identifiers. The reload now stops at this iptables call, which tries to look up the protocol name "icmp":

/sbin/iptables -t filter -I INPUT 6 -p icmp -j ACCEPT

I traced the firewalld process and grepped all execve calls (only calls to iptables and ip6tables). When I filter them for the -p/--protocol option, the arguments are all names. I checked all port arguments ("--dport" and "--destination-port"), and they are all numeric. So it looks like firewalld needs to memoize a protocol-name-to-number mapping and use it when calling iptables/ip6tables to avoid the NIS problem. I have already posted this analysis in support case 01569103.

Since firewalld on GitHub is at 0.4.0, I should note that I did not import that version into my RHEL 7.0 system (rpm version 0.3.9-7.el7) but manually patched the two files fw.py and functions.py with your new code. So I can't tell whether any of the newer firewalld versions use protocol numbers rather than names.
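The execve check described above can be sketched as a small script, assuming the usual `strace -f` output format for execve() lines. It collects the values of -p/--protocol and --dport/--destination-port (both "opt value" and "opt=value" forms) and flags those that are names rather than numbers, since names are what iptables hands to the resolver. The function names are hypothetical.

```python
import re

# Assumed strace -f format, e.g.
# 8151 execve("/sbin/iptables", ["/sbin/iptables", "-p", "icmp", ...], [/* 25 vars */]) = 0
EXECVE = re.compile(r'execve\("([^"]+)", \[(.*?)\]')
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')
OPTIONS = ("-p", "--protocol", "--dport", "--destination-port")


def option_values(lines):
    """Yield (program, option, value) for the watched options in each execve()."""
    for line in lines:
        m = EXECVE.search(line)
        if not m:
            continue
        argv = QUOTED.findall(m.group(2))
        for i, arg in enumerate(argv):
            if "=" in arg and arg.split("=", 1)[0] in OPTIONS:
                opt, val = arg.split("=", 1)          # --dport=22 form
                yield m.group(1), opt, val
            elif arg in OPTIONS and i + 1 < len(argv):
                yield m.group(1), arg, argv[i + 1]    # -p icmp form


def is_numeric(value):
    """True for '58' or a numeric port range such as '1000:2000'."""
    return all(part.isdigit() for part in value.split(":"))


def symbolic_values(lines):
    """Options whose value is a name, i.e. needs a protocols/services lookup."""
    return [hit for hit in option_values(lines) if not is_numeric(hit[2])]

# Usage against the attached trace:
#   with open("firewalld.strace") as f:
#       for program, option, value in symbolic_values(f):
#           print(program, option, value)
```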
firewalld does not provide or cache a mapping of protocol names to protocol ids. It uses the mechanism the system provides for this. I do not think duplicating that in firewalld is a good move.
I'm aware of this and I understand your position. It was just an idea for how firewalld could work around the problem with an inaccessible NIS. Another possibility would be not to radically set iptables to deny everything (I don't remember what it is actually set to, sorry) but to statefully permit outgoing connections, as firewalld does not block them anyway.
(In reply to Lutz P. Christoph from comment #15)
> Another possibility would be not to radically set iptables to all deny (I
> don't remember what it is actually set to, sorry) but to statefully permit
> outgoing connections, as firewalld does not block them anyway.

Please explain, I do not understand this.
Currently, firewalld shuts down iptables completely when doing a reload, like this:

Chain INPUT (policy DROP)
target              prot opt source      destination
ACCEPT              all  --  0.0.0.0/0   0.0.0.0/0   ctstate RELATED,ESTABLISHED
ACCEPT              all  --  0.0.0.0/0   0.0.0.0/0
INPUT_direct        all  --  0.0.0.0/0   0.0.0.0/0
INPUT_ZONES_SOURCE  all  --  0.0.0.0/0   0.0.0.0/0
INPUT_ZONES         all  --  0.0.0.0/0   0.0.0.0/0

Chain FORWARD (policy DROP)
target              prot opt source      destination

Chain OUTPUT (policy DROP)
target              prot opt source      destination

Chain INPUT_ZONES (1 references)
target              prot opt source      destination

Chain INPUT_ZONES_SOURCE (1 references)
target              prot opt source      destination

Chain INPUT_direct (1 references)
target              prot opt source      destination

You see that OUTPUT is set to DROP. When firewalld is running normally, it is set to ACCEPT, and the chain OUTPUT_direct is empty:

Chain OUTPUT (policy ACCEPT)
target              prot opt source      destination
OUTPUT_direct       all  --  0.0.0.0/0   0.0.0.0/0

Chain OUTPUT_direct (1 references)
target              prot opt source      destination

This allows any locally originating traffic to pass. What I propose as a possible solution is to leave those two chains unchanged, or to delete OUTPUT_direct and the corresponding rule in OUTPUT.

That would allow the NIS service to continue working.
An additional patch that limits the time during which the default chains are on policy DROP while reloading has been added upstream:
https://github.com/t-woerner/firewalld/commit/c91b2fa8f478653aa1ff4ee66bcd0d069554574c

With this patch applied, the config files are read before the old rules are flushed and the default policy is set to DROP. The calls to getPortID are also done before that. This means that firewalld's own rule set and the zones should no longer trigger this issue. The only remaining possible issue is the use of port or protocol names in direct rules saved in direct.xml. The rules from libvirt, for example, are regenerated right after firewalld finishes the reload and therefore should not trigger this issue.
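A toy model of why this ordering matters, under the simplifying assumption (consistent with the EPERM in the strace) that NIS answers only while the OUTPUT policy is ACCEPT. ToyFirewall, reload_old and reload_new are hypothetical names; the model covers only firewalld's own getPortID()-style lookups, not the protocol names that iptables itself resolves.

```python
class ToyFirewall:
    """Toy model: the NIS services map is reachable only while OUTPUT is ACCEPT."""

    def __init__(self, nis_services):
        self.policy = "ACCEPT"
        self.nis_services = nis_services   # stands in for the services.byname map
        self.rules = []

    def getservbyname(self, name):
        if self.policy != "ACCEPT":
            # On the real system sendto() fails with EPERM and the lookup hangs.
            raise TimeoutError(f"NIS lookup of {name!r} blocked by policy DROP")
        return self.nis_services[name]

    def port_id(self, port):
        return int(port) if port.isdigit() else self.getservbyname(port)


def reload_old(fw, ports):
    """Old order: set DROP first, then resolve names inside the DROP window."""
    fw.policy = "DROP"
    fw.rules = [fw.port_id(p) for p in ports]
    fw.policy = "ACCEPT"


def reload_new(fw, ports):
    """New order: resolve names first, keep the DROP window free of lookups."""
    resolved = [fw.port_id(p) for p in ports]
    fw.policy = "DROP"
    fw.rules = resolved
    fw.policy = "ACCEPT"


fw = ToyFirewall({"ssh": 22})
reload_new(fw, ["ssh", "80"])
print(fw.rules)   # [22, 80]
try:
    reload_old(ToyFirewall({"ssh": 22}), ["ssh", "80"])
except TimeoutError as err:
    print(err)    # NIS lookup of 'ssh' blocked by policy DROP
```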
(In reply to Lutz P. Christoph from comment #17)
>
> This allows any locally originating traffic to pass. What I propose as a
> possible solution is to leave those two chain unchanged, or to delete
> OUTPUT_direct and the corresponding rule in OUTPUT.
>
> That would allow the NIS service to continue working.

But the replies are blocked in INPUT. Is NIS really working then?
Here is a scratch build of the current version 0.4.1.2 for RHEL-7:
http://people.redhat.com/twoerner/firewalld/0.4.1.2-1.el7/

Please have a look at SELINUX.README. A change to selinux-policy is needed to allow the iptables-restore commands to use input files from /run/firewalld.

Does it fix the issue for you?
Thanks for the build. I'm on vacation right now, so I'll assume it is essentially identical to the backport I did (massaging your patches into the RHEL 7.0 sources). I will give it a try next week.

This problem is completely unrelated to SELinux for two reasons:
1) SELinux is disabled on the machine in question.
2) strace shows that the NIS request that normally goes through is failing. I doubt firewalld twiddles the SELinux status when it reloads.

8151 bind(4, {sa_family=AF_INET, sin_port=htons(695), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
8151 setsockopt(4, SOL_IP, IP_RECVERR, [1], 4) = 0
8151 close(3) = 0
8151 sendto(4, "\33\0226?\0\0\0\0\0\0\0\2\0\1\206\244\0\0\0\2\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\7bbaprod\0\0\0\0\20protocols.byname\0\0\0\6icmpv6\0\0", 84, 0, {sa_family=AF_INET, sin_port=htons(606), sin_addr=inet_addr("10.17.86.21")}, 16) = -1 EPERM (Operation not permitted)
8151 write(2, "do_ypcall: clnt_call: RPC: Unable to send; errno = Operation not permitted\n", 75) = 75
8151 close(4 <unfinished ...>
Thomas, I see that you did a whole load of commits on Github. As I don't have the time to go through all of them - did you commit anything new related to this ticket? I see I neglected to test 0.4.1.2 after I returned from vacation. Since you have a 0.4.2-1.el7 now, I will test that. Thanks!
Yes, there is the transaction model. It makes it possible to remove the old firewall configuration and apply the new one in 6 calls, if no direct rules or ipsets are used. In the first three calls, the old rule set is cleaned up and the new rules for all zones are generated, one call each for ipv4, ipv6 and eb. This means that the chance of running into this issue goes down dramatically. Then three more calls set the policies back to ACCEPT for ipv4, ipv6 and eb. These calls use no protocols or ports, so there should not be any issue.

The direct rules are applied separately, because they could contain rules that cannot be applied and could therefore make the transaction fail in the first step.

If IndividualCalls is enabled in the firewalld configuration, the transaction model is not used and the old behavior applies.
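As an illustration of batching a whole rule set into one call per address family, here is a hypothetical helper that builds input for iptables-restore (or ip6tables-restore): each table becomes a "*table" section of rules in iptables argument syntax, terminated by COMMIT. This is a sketch of the general format, not firewalld's actual transaction code, which may generate its input differently.

```python
def restore_payload(tables):
    """Build iptables-restore input from {table: [rule, ...]}.

    Each table becomes one '*table' section terminated by COMMIT, so the rule
    set for one address family is handed over in a single process call.
    """
    lines = []
    for table, rules in tables.items():
        lines.append(f"*{table}")
        lines.extend(rules)          # e.g. "-A INPUT -p icmp -j ACCEPT"
        lines.append("COMMIT")
    return "\n".join(lines) + "\n"


ipv4 = restore_payload({
    "filter": ["-A INPUT -p icmp -j ACCEPT", "-A INPUT -p tcp --dport 22 -j ACCEPT"],
})
print(ipv4, end="")
# One call per family instead of one per rule, e.g.:
#   subprocess.run(["iptables-restore", "--noflush"], input=ipv4, text=True, check=True)
```

Protocol names such as "icmp" in the payload are still resolved by iptables-restore, but only within a single call rather than one call per rule.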
verified firewalld-0.4.3.2-8.el7

old:
[root@vm-rhel7s ~]# firewall-cmd --complete-reload
ERROR:dbus.proxies:Introspect error on :1.472:/org/fedoraproject/FirewallD1: dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

when connection allowed:
304.150037208 192.168.122.246 -> 192.168.122.144 YPSERV 126 V2 MATCH Call nisdom/protocols.byname/icmpv6
304.150109754 192.168.122.144 -> 192.168.122.246 YPSERV 74 V2 MATCH Reply (Call In 144)

new:
no requests, immediate reload.
"num-num/port" ranges work without trying to resolve the name (hence no hang on an unavailable NIS server). However, name ranges like "firewall-cmd --add-port domain-tftp/udp" still try to find the service via a NIS call:

YPSERV 142 V2 MATCH Call firewalld/services.byservicename/domain-tftp
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-2597.html