Bug 1098281
| Field | Value |
|---|---|
| Summary | docker interferes with firewall initialisation via firewalld |
| Product | Red Hat Enterprise Linux 7 |
| Reporter | Stephen Tweedie <sct> |
| Component | NetworkManager |
| Assignee | Thomas Haller <thaller> |
| Status | CLOSED ERRATA |
| QA Contact | Desktop QE <desktop-qa-list> |
| Severity | high |
| Docs Contact | |
| Priority | high |
| Version | 7.0 |
| CC | danw, dbrockus, dcbw, degts, dwalsh, fweimer, greartes, jeder, jklimes, jpoimboe, jpopelka, lfc30, linville, mattdm, michele, pdwyer, sauchter, sbonnevi, sghosh, tgummels, thaller, tjay, twoerner, vbenes, villapla |
| Target Milestone | rc |
| Keywords | Reopened |
| Target Release | --- |
| Hardware | x86_64 |
| OS | Linux |
| Whiteboard | |
| Fixed In Version | NetworkManager-0.9.11.0-6.git20141125.f32075d2.el7 |
| Doc Type | Bug Fix |
| Doc Text | |
| Story Points | --- |
| Clone Of | |
| | 1161745 |
| Environment | |
| Last Closed | 2015-03-05 13:51:01 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1113141, 1161745 |
| Attachments | |

Description
Stephen Tweedie
2014-05-15 15:52:01 UTC
BZ is now public

Any update on this issue? Senior Linux Admin here at Georgetown University trying to roadmap our move from RHEL6 -> 7 and one of the main items of interest is docker containerizing.

No sorry. Been working lots of other issues. Trying to get others to look into this.

Is there any update on this?

I (firewalld developer) have been recently asked to take a look at the problem. Firewalld support in docker seems like correct long term solution, but:
1) It'll take some time because my golang knowledge is zero
2) I'm not sure how willing the docker upstream will be to eventually integrate something like firewalld support

Meanwhile the work-around suggested in bug #1151067, comment #3 (needs patching iptables and firewalld) might very well help with this problem.

I am pretty sure they would take a patch to work well with firewalld. I don't think that would be a problem, as long as they can continue to support both models.

I agree that having docker interface with firewalld instead of iptables sounds like a good idea. However I think this failure is specifically due to NetworkManager detecting the new docker0 bridge device (which is created by the docker daemon), and then trying to set it up with firewalld, which conflicts with docker0's use of iptables with the device.

This issue seems to be fixed in F20:

    Oct 24 11:41:53 treble NetworkManager[1202]: <info> (docker0): ignoring bridge not created by NetworkManager

The fix was apparently made with NetworkManager upstream commit 17338069e332bb73e4d9e7332b67b7853fbe83b7:

    commit 17338069e332bb73e4d9e7332b67b7853fbe83b7
    Author: Dan Williams <dcbw>
    Date: Fri Feb 1 18:03:11 2013 -0600

    core: only manage those bridges created by NetworkManager (rh #905035)

But it still looks like a bug in RHEL 7:

    Oct 24 12:47:35 rhel7 docker: [c850ed30.init_networkdriver()] creating new bridge for docker0
    Oct 24 12:47:35 rhel7 avahi-daemon[484]: Joining mDNS multicast group on interface docker0.IPv4 with address 172.17.42.1.
    Oct 24 12:47:35 rhel7 avahi-daemon[484]: New relevant interface docker0.IPv4 for mDNS.
    Oct 24 12:47:35 rhel7 avahi-daemon[484]: Registering new address record for 172.17.42.1 on docker0.IPv4.
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> (docker0): carrier is OFF (but ignored)
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> (docker0): new Bridge device (driver: 'bridge' ifindex: 3)
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> (docker0): exported as /org/freedesktop/NetworkManager/Devices/2
    Oct 24 12:47:35 rhel7 NetworkManager[609]: ifcfg-rh: read connection 'docker0'
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> (docker0): device state change: unmanaged -> unavailable (reason 'connection-assumed') [10 20 41]
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> (docker0): device state change: unavailable -> disconnected (reason 'connection-assumed') [20 30 41]
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> Activation (docker0) starting connection 'docker0'
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> Activation (docker0) Stage 1 of 5 (Device Prepare) scheduled...
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> Activation (docker0) Stage 1 of 5 (Device Prepare) started...
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> (docker0): device state change: disconnected -> prepare (reason 'none') [30 40 0]
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> Activation (docker0) Stage 2 of 5 (Device Configure) scheduled...
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> Activation (docker0) Stage 1 of 5 (Device Prepare) complete.
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> Activation (docker0) Stage 2 of 5 (Device Configure) starting...
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> (docker0): device state change: prepare -> config (reason 'none') [40 50 0]
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> Activation (docker0) Stage 2 of 5 (Device Configure) successful.
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> Activation (docker0) Stage 2 of 5 (Device Configure) complete.
    Oct 24 12:47:35 rhel7 kernel: IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
    Oct 24 12:47:35 rhel7 firewalld: 2014-10-24 12:47:35 ERROR: '/sbin/ip6tables -I INPUT_ZONES 1 -t filter -i docker0 -g IN_public' failed: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
    Oct 24 12:47:35 rhel7 firewalld: 2014-10-24 12:47:35 ERROR: '/sbin/iptables -D INPUT_ZONES 1 -t filter -i docker0 -g IN_public' failed: iptables v1.4.21: Illegal option `-j' with this command
    Oct 24 12:47:35 rhel7 firewalld: 2014-10-24 12:47:35 ERROR: COMMAND_FAILED: '/sbin/ip6tables -I INPUT_ZONES 1 -t filter -i docker0 -g IN_public' failed: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <warn> (docker0) firewall zone add/change failed [2]: (32) COMMAND_FAILED: '/sbin/ip6tables -I INPUT_ZONES 1 -t filter -i docker0 -g IN_public' failed: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> Activation (docker0) Stage 3 of 5 (IP Configure Start) scheduled.
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> Activation (docker0) Stage 3 of 5 (IP Configure Start) started...
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> (docker0): device state change: config -> ip-config (reason 'none') [50 70 0]
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> (docker0): IPv4 config waiting until carrier is on
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> (docker0): IPv6 config waiting until carrier is on
    Oct 24 12:47:35 rhel7 NetworkManager[609]: <info> Activation (docker0) Stage 3 of 5 (IP Configure Start) complete.

So maybe it's just a matter of backporting NetworkManager bug 905035 to RHEL 7.

There is also another bug where docker sets up its own iptables rules.
man docker

    --icc=true|false
        Enable inter-container communication. Default is true.
    --iptables=true|false
        Disable Docker's addition of iptables rules. Default is true.

(In reply to Jiri Popelka from comment #7)
> pkg/iptables/iptables.go @@ -150 @@ func Raw(args ...string)
> + args = append(args, []string{"-w"}...)
> output, err := exec.Command(path, args...).CombinedOutput()

Looks like this has been added with https://github.com/docker/docker/commit/034babf1753741184c1155a7346ecec86fc51e2c

Hello, Why was this closed? This is still an issue with RHEL 7. Regards, David

Jiri do you believe this is fixed in current RHEL7 docker?

Actually the upstream commit is https://github.com/docker/docker/commit/b315c380f4acd65cc0428009702f99a266f96c59 which was released first with v0.12.0. You're the one to clarify this, but AFAICT it's *not* fixed in 0.7.6 we have in rhel-7.1, but it *is* fixed in 1.2.0 we have in extras-rhel-7.1.

Reassigning to firewalld (similar to bug #1151067) which should also use the -w/--wait flag when calling iptables to avoid this particular problem. Of course the long term solution still is firewalld support in docker which I'm working on.

Why wouldn't the real fix be to prevent NetworkManager from managing bridges it didn't create, as I suggested in comment #20? Otherwise NetworkManager could still overwrite settings made by dockerd, even if they both use '-i' and '-w'.

I agree with that fix. But I would also like to get dockerd to talk to firewalld when firewalld is running. But NetworkManager should not be overriding dockerd when dockerd sets up the bridge.

Dan: What do you think about cloning the bug for firewalld and assigning this one back to docker to have bugs for both components?

Well this bug is actually for networkmanager I believe, but I will do that.

(In reply to Josh Poimboeuf from comment #26)
> Why wouldn't the real fix be to prevent NetworkManager from managing bridges
> it didn't create, as I suggested in comment #20? Otherwise NetworkManager
> could still overwrite settings made by dockerd, even if they both use '-i'
> and '-w'.

The iptables lock is global, not interface-specific, so it would only make it less likely to trigger the race.

(In reply to Florian Weimer from comment #30)
> (In reply to Josh Poimboeuf from comment #26)
> > Why wouldn't the real fix be to prevent NetworkManager from managing bridges
> > it didn't create, as I suggested in comment #20? Otherwise NetworkManager
> > could still overwrite settings made by dockerd, even if they both use '-i'
> > and '-w'.
>
> The iptables lock is global, not interface-specific, so it would only make
> it less likely to trigger the race.

Thanks, didn't realize it was a global lock. Making all parties use '-w' does seem like the right short term fix. Regardless I think it's still a good idea to prevent NetworkManager from managing the docker bridge so that it doesn't overwrite its iptables settings, race or no race.

Pushed branch for review, for NetworkManager upstream: th/rh1098281_firewall_assumed_device

> firewall: always complete callbacks asynchronously

I'd just #define PENDING_CALL_DUMMY GUINT_TO_POINTER(1), less magic that way. I'd actually convert info->is_idly_scheduled to a 'guint32 idle_id' and store the idle callback id there. That way you don't need two variables 'canceled' and 'is_idly_scheduled'. You could just set idle_id to 0 and that indicates canceled in add_or_change_idle_cb(). Also I'd g_assert (priv->pending_calls == NULL) in dispose(). Since all the calls take a ref on the FirewallManager there shouldn't be any outstanding calls at dispose() time but doesn't hurt to enforce that.

> firewall: don't set firewall zone for assumed devices

I feel like it just adds complexity to do the FW call and simulate success. Why not just check nm_device_uses_assumed_connection() and just not call any firewall stuff at all? I'd rather do that...
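
For readers following the '-w' discussion above: the short-term fix amounts to every iptables caller passing the wait flag so that it blocks on the global xtables lock instead of failing. Below is a minimal Go sketch of that idea, loosely modelled on the `args = append(args, ...)` line quoted from docker's pkg/iptables earlier in this bug. The helper names `withWait` and `iptablesCommand` are hypothetical and are not taken from docker, firewalld, or NetworkManager; the sketch only builds the command and does not run it.

```go
package main

import (
	"fmt"
	"os/exec"
)

// withWait returns a copy of args with "-w" appended, unless the caller
// already passed "-w" or "--wait". (Hypothetical helper for illustration.)
func withWait(args []string) []string {
	out := make([]string, 0, len(args)+1)
	out = append(out, args...)
	for _, a := range args {
		if a == "-w" || a == "--wait" {
			return out
		}
	}
	return append(out, "-w")
}

// iptablesCommand builds, but does not execute, an iptables invocation
// that waits for the xtables lock. (Hypothetical helper for illustration.)
func iptablesCommand(path string, args ...string) *exec.Cmd {
	return exec.Command(path, withWait(args)...)
}

func main() {
	// Same rule that firewalld failed to insert in the log above.
	cmd := iptablesCommand("/sbin/ip6tables",
		"-I", "INPUT_ZONES", "1", "-t", "filter", "-i", "docker0", "-g", "IN_public")
	fmt.Println(cmd.Args)
}
```

As noted in comment #30, the lock is global, so this only removes the "Another app is currently holding the xtables lock" failure; it does not stop one daemon from overwriting rules another daemon installed.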

(In reply to Dan Williams from comment #33)

done and repushed

Why add fw_change_zone_idle_cb()? Why not just call activation_source_schedule() directly from nm_device_activate_schedule_stage3_ip_config_start() ?

The new "firewall: don't set firewall zone for assumed devices" commit schedules an idle handler that then schedules another idle handler that finally calls nm_device_activate_stage3_ip_config_start() which seems unnecessary...

(In reply to Dan Williams from comment #35)
> Why add fw_change_zone_idle_cb()? Why not just call
> activation_source_schedule() directly from
> nm_device_activate_schedule_stage3_ip_config_start() ?
>
> The new "firewall: don't set firewall zone for assumed devices" commit
> schedules an idle handler that then schedules another idle handler that
> finally calls nm_device_activate_stage3_ip_config_start() which seems
> unnecessary...

fixup pushed

Created attachment 957625 [details]
cleanup patch
How about this cleanup patch? The IDLY_SCHEDULED enum isn't really needed since the idle_id of add_or_change_idle_id() is sufficient to determine whether the idle should be canceled or not.

(In reply to Dan Williams from comment #37)
> Created attachment 957625 [details]
> cleanup patch
>
> How about this cleanup patch? The IDLY_SCHEDULED enum isn't really needed
> since the idle_id of add_or_change_idle_id() is sufficient to determine
> whether the idle should be canceled or not.

Good! Patch applied on branch and rebased.

Looks good to me now!

(In reply to Dan Williams from comment #39)
> Looks good to me now!

merged to upstream master: http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=d638ccdecab85b55a6e54f29f6aded3aa0460259

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0311.html