Bug 1435791

Summary: Upgrading polkit disrupts docker bridge networking
Product: Red Hat Enterprise Linux 7
Component: polkit
Version: 7.3
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
Target Milestone: rc
Reporter: Jake Hunsaker <jhunsaker>
Assignee: Polkit Maintainers <polkit-devel>
QA Contact: qe-baseos-daemons
CC: amurdaca, cww, dornelas, fsumsal, jamills, jhunsaker, jrybar, lsm5, reli, walters
Keywords: Extras
Last Closed: 2018-11-02 16:02:44 UTC
Type: Bug
Bug Blocks: 1186913, 1477664

Description Jake Hunsaker 2017-03-24 19:37:56 UTC
Description of problem:

CEE has a customer who has found that upgrading polkit disrupts docker networking.

Communication on docker0 works correctly (so containers can communicate with each other and the host), but tcpdump shows no packets are ever bridged over to the host interface (either inbound or outbound). While in this failed state, interfaces all look correct, as do iptables nat and filter rules.

Restarting individual containers doesn't address the issue; only restarting the docker daemon does. They tested both upgrading and downgrading polkit, and both cause the same issue (so it appears systemic).


Version-Release number of selected component (if applicable):
docker-1.10.3-59.el7.x86_64
polkit-0.112-11.el7_3.x86_64

How reproducible:
Always 

Steps to Reproduce:
1. Have containers running and communicating over the network
2. Upgrade polkit

Actual results:
Containers stop communicating on the network outside of the docker0 bridge

Expected results:
Containers should not lose communication

Additional info:

I am clarifying with the customer if this continues with docker-1.12. I believe it does, so I am filing this bug early, but I am just waiting for confirmation.

Comment 2 Daniel Walsh 2017-03-27 13:16:03 UTC
I would figure polkit must be mucking around with the iptables rules?

Comment 3 Colin Walters 2017-03-27 13:29:28 UTC
No, polkit has nothing to do with iptables.  My wild guess here is that firewalld isn't handling polkit being restarted; it uses python-slip, see: 

https://github.com/nphilipp/python-slip/blob/master/slip/dbus/polkit.py

Glancing at the code, I'm not seeing support for polkit restarting.

Comment 4 Miloslav Trmač 2017-03-27 14:10:25 UTC
No, polkit is definitely not modifying iptables on its own.

Do the containers stop communicating _immediately_ after the polkit upgrade, or only after a further action (perhaps starting/stopping another container?)

Is the system using firewalld or raw iptables?

(FWIW the Docker daemon, via https://github.com/docker/libnetwork/blob/master/iptables/firewalld.go , seems not to interact with polkit at all, only to rely on polkit’s hard-coded “root can do anything” policy.)
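The dispatch that firewalld.go implements can be sketched roughly as follows. This is a hypothetical Python rendering for illustration only (the real code is Go, in libnetwork/iptables/firewalld.go), and the function and parameter names here are mine, not libnetwork's:

```python
# Hypothetical sketch of libnetwork's firewall handling: if firewalld is
# running, each iptables rule is replayed through firewalld's D-Bus "direct"
# passthrough interface; otherwise the iptables binary is invoked directly.
# Neither path talks to polkit: the daemon runs as root, and polkit's
# hard-coded policy lets root through unchallenged.

def program_rule(args, firewalld_running, passthrough, run_iptables):
    """Apply one iptables rule via firewalld if it is running, else directly.

    'passthrough' and 'run_iptables' stand in for the D-Bus call and the
    iptables binary; they are injected so the sketch is self-contained.
    """
    if firewalld_running:
        # firewalld replays passthrough rules after a reload, so they are
        # not silently lost when firewalld restarts
        return passthrough("ipv4", args)
    return run_iptables(args)
```

The relevant point for this bug: in neither branch does a polkit restart enter the picture for the Docker daemon itself; only firewalld's own D-Bus clients would care.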



> https://github.com/nphilipp/python-slip/blob/master/slip/dbus/polkit.py
>
> Glancing at the code, I'm not seeing support for polkit restarting.

Isn’t that what PolKit._on_name_owner_changed is supposed to do?
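For reference, the general shape of such a handler can be sketched as follows. This is a simplified, hypothetical illustration of the pattern being discussed, not the python-slip code: when the bus broadcasts NameOwnerChanged (name, old_owner, new_owner) for org.freedesktop.PolicyKit1, the client drops or re-creates its cached proxy, so it survives a polkit restart.

```python
# Hypothetical sketch of D-Bus name-owner tracking for polkit (class and
# method names are illustrative, not python-slip's actual API). 'connect'
# stands in for creating a real D-Bus proxy object.

class PolkitClient:
    BUS_NAME = "org.freedesktop.PolicyKit1"

    def __init__(self, connect):
        self._connect = connect
        self._proxy = connect()

    def _on_name_owner_changed(self, name, old_owner, new_owner):
        # The bus fires this for every name; react only to polkit itself.
        if name != self.BUS_NAME:
            return
        if new_owner:
            # polkit came (back) up: re-create the proxy against the new owner
            self._proxy = self._connect()
        else:
            # polkit went away: invalidate the cached proxy
            self._proxy = None
```

A client wired up this way would keep working across a polkit package upgrade, which is presumably what PolKit._on_name_owner_changed is meant to achieve.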


> While in this failed state, interfaces all look correct, as do iptables nat and filter rules.

Are you saying that the iptables rules are exactly the same before and after polkit restart, but networking starts failing anyway?

Comment 5 Jake Hunsaker 2017-03-28 15:39:17 UTC
System is using firewalld, and the containers stop communicating immediately after the polkit upgrade/downgrade. 

Yes, all rules are exactly the same before and after.

Comment 7 James W. Mills 2018-02-28 19:55:45 UTC
All, I've tried this in a much more recent environment, and I'm unable to reproduce:

# rpm -q docker polkit firewalld
docker-1.12.6-71.git3e8e77d.el7.x86_64
polkit-0.112-11.el7_3.x86_64
firewalld-0.4.4.4-6.el7.noarch

# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2018-02-28 14:33:05 EST; 7min ago


# docker run -d busybox ping 8.8.8.8
# docker logs -f hungry_kilby
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=56 time=8.747 ms
64 bytes from 8.8.8.8: seq=1 ttl=56 time=8.508 ms
64 bytes from 8.8.8.8: seq=2 ttl=56 time=8.181 ms
64 bytes from 8.8.8.8: seq=3 ttl=56 time=8.234 ms
64 bytes from 8.8.8.8: seq=4 ttl=56 time=7.827 ms
64 bytes from 8.8.8.8: seq=5 ttl=56 time=7.739 ms
64 bytes from 8.8.8.8: seq=6 ttl=56 time=7.406 ms
64 bytes from 8.8.8.8: seq=7 ttl=56 time=6.941 ms
64 bytes from 8.8.8.8: seq=8 ttl=56 time=6.846 ms
...

# yum upgrade polkit
...
Updated:
  polkit.x86_64 0:0.112-12.el7_3


# docker logs -f hungry_kilby
...
64 bytes from 8.8.8.8: seq=226 ttl=56 time=7.420 ms
64 bytes from 8.8.8.8: seq=227 ttl=56 time=5.544 ms
64 bytes from 8.8.8.8: seq=228 ttl=56 time=6.840 ms
64 bytes from 8.8.8.8: seq=229 ttl=56 time=6.468 ms
64 bytes from 8.8.8.8: seq=230 ttl=56 time=8.573 ms
64 bytes from 8.8.8.8: seq=231 ttl=56 time=8.214 ms

In addition to verifying that the running container can still reach the outside world, I confirmed that a new container can reach both the outside world and the already-running container:
# docker run -d busybox ping 8.8.8.8
# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
fff994ff90a9        busybox             "ping 8.8.8.8"      6 seconds ago       Up 2 seconds                            admiring_jennings
ebcfe0277aaa        busybox             "ping 8.8.8.8"      7 minutes ago       Up 7 minutes                            hungry_kilby

# docker logs -f admiring_jennings
...
64 bytes from 8.8.8.8: seq=220 ttl=56 time=7.869 ms
64 bytes from 8.8.8.8: seq=221 ttl=56 time=7.657 ms
64 bytes from 8.8.8.8: seq=222 ttl=56 time=7.484 ms
...

# docker exec -it admiring_jennings ip a
...
    inet 172.17.0.3/16 scope global eth0
...

# docker exec -it hungry_kilby ping -c 2 172.17.0.3
PING 172.17.0.3 (172.17.0.3): 56 data bytes
64 bytes from 172.17.0.3: seq=0 ttl=64 time=0.229 ms
64 bytes from 172.17.0.3: seq=1 ttl=64 time=0.179 ms

# docker exec -it hungry_kilby wget -qO- getmyip.example.com
XX.XX.XXX.XX
# docker exec -it admiring_jennings wget -qO- getmyip.example.com
XX.XX.XXX.XX

Did I miss anything, or can we close this?

~james

Comment 8 Jake Hunsaker 2018-03-26 13:34:07 UTC
Sorry, I didn't see the needinfo. I'd say we're good to close; there have been no new reports of it.