Created attachment 1331874 [details]
inventory host file

Description of problem:
See the following details.

Version-Release number of the following components:
openshift-ansible-3.7.0-0.128.0.git.0.89dcad2.el7.noarch
ansible 2.3
docker-1.12.6-55.gitc4618fb.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Prepare an inventory to install a multi-master env that uses a containerized haproxy lb.
2. Trigger the installation.
3.

Actual results:
Failed at the following task:

TASK [openshift_loadbalancer : Enable and start haproxy] ***********************
Thursday 28 September 2017  03:09:20 +0000 (0:00:01.776)       0:14:39.316 ****
fatal: [jialiu-icuu-lb-1.0928-0zv.qe.rhcloud.com]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to start service haproxy: Job for haproxy.service failed because the control process exited with error code. See \"systemctl status haproxy.service\" and \"journalctl -xe\" for details.\n"}

On the failed host, check the haproxy service log:

# journalctl -f -u haproxy
-- Logs begin at Wed 2017-09-27 22:51:04 EDT. --
Sep 28 03:35:07 jialiu-icuu-lb-1 docker[25483]: Error response from daemon: No such container: openshift_loadbalancer
Sep 28 03:35:07 jialiu-icuu-lb-1 systemd[1]: haproxy.service: control process exited, code=exited status=1
Sep 28 03:35:07 jialiu-icuu-lb-1 systemd[1]: Failed to start haproxy.service.
Sep 28 03:35:07 jialiu-icuu-lb-1 systemd[1]: Unit haproxy.service entered failed state.
Sep 28 03:35:07 jialiu-icuu-lb-1 systemd[1]: haproxy.service failed.
Sep 28 03:35:12 jialiu-icuu-lb-1 systemd[1]: haproxy.service holdoff time over, scheduling restart.
Sep 28 03:35:12 jialiu-icuu-lb-1 systemd[1]: Starting haproxy.service...
Sep 28 03:35:12 jialiu-icuu-lb-1 docker[25501]: Error response from daemon: No such container: openshift_loadbalancer
Sep 28 03:35:14 jialiu-icuu-lb-1 docker[25505]: /usr/bin/docker-current: Error response from daemon: driver failed programming external connectivity on endpoint openshift_loadbalancer (9129fb5e1930650f8327e03bce6e138305261c043075509cdae8e9b3c8e6b4cf): iptables failed: iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 8443 -j DNAT --to-destination 172.17.0.2:8443 ! -i docker0: iptables: No chain/target/match by that name.

The DOCKER chain is missing from the nat table:

# iptables -L -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination

Expected results:
The haproxy container should start successfully and the installation should pass.

Additional info:
Restarting the docker service brings the DOCKER chain back, after which the haproxy service runs fine. This seems to be caused by the earlier tasks that disable firewalld and install the iptables services.
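For reference, a minimal sketch of the workaround mentioned above, assuming a standard docker/systemd setup (the commands are illustrative, not taken from the original run):

    # restart docker so it re-creates its nat chains (DOCKER etc.)
    systemctl restart docker

    # the DOCKER chain should be present again
    iptables -t nat -L DOCKER -n

    # let the containerized haproxy unit retry and confirm it stays up
    systemctl restart haproxy
    systemctl status haproxy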
Created attachment 1331875 [details] installation log
Isn't docker responsible for provisioning this iptables chain?
Scott, reading the installation log, it appears docker is not restarted after iptables is re-installed. I suspect that is where the rules were deleted and not recreated. Johnny's comment that restarting docker corrects the issue reinforces this scenario.
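A rough sketch of the suspected sequence on the lb host, assuming the firewall tasks behave as described above (the image name and published port below are placeholders):

    # install-time firewall tasks: firewalld is disabled, iptables-services installed and started
    systemctl stop firewalld
    systemctl disable firewalld
    yum install -y iptables-services
    systemctl start iptables          # applies /etc/sysconfig/iptables, which has no DOCKER chain

    # docker's nat chains are now gone, so publishing a port fails
    docker run -d --name openshift_loadbalancer -p 8443:8443 some/haproxy-image
    #   -> "iptables: No chain/target/match by that name"

    # restarting docker re-creates its chains, after which the container can start
    systemctl restart docker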
I encountered the same problem on Fedora 26: there is a general problem with Fedora using docker in conjunction with firewalld. It looks like there is a mismatch between the iptables rules emitted by docker and the iptables rules emitted by firewalld.

In /var/log/firewalld you can see all the rules docker emits (when you start a docker image with 'docker run' that does port mapping and/or anything else implemented as iptables rules). In most cases these rules lead to errors. For example, the error that stops the docker container,

> iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 8443 -j DNAT --to-destination 172.17.0.2:8443 ! -i docker0: iptables: No chain/target/match by that name.

fails because there is no DOCKER chain in the 'nat' table in a default firewalld configuration.

Some people try to tweak the firewalld configuration when they encounter the error:

* https://opsech.io/posts/2017/May/23/docker-dns-with-firewalld-on-fedora.html
* https://superuser.com/questions/1180870/fedora-firewalld-issues-with-docker
* https://github.com/moby/moby/issues/16137#issuecomment-271615192
* https://github.com/firewalld/firewalld/issues/195

I have tried that too (adding tables and chains with firewall-cmd), up to the point where there are no more error reports in /var/log/firewalld. HOWEVER, I'm not sure that simply adding tables and chains to the firewalld iptables structure results in anything sensible! IMHO, it is VERY unlikely that this kind of tinkering produces an iptables configuration that does what docker intends to do.
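For illustration only, the kind of firewalld direct-interface tweak those links describe might look like the following; the chain name and rule are assumptions mirroring docker's defaults, and per the caveat above this is not necessarily a sensible configuration:

    # pre-create the chain docker expects in the nat table
    firewall-cmd --permanent --direct --add-chain ipv4 nat DOCKER

    # send locally-destined traffic through it, as docker would normally arrange itself
    firewall-cmd --permanent --direct --add-rule ipv4 nat PREROUTING 0 -m addrtype --dst-type LOCAL -j DOCKER

    firewall-cmd --reload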
I think https://github.com/openshift/openshift-ansible/pull/5680 fixes this, and I don't think the issue is limited in scope to containerized installs.
I believe this bug is due to the fact that openshift_version used to run against all hosts. openshift_version has a meta-dependency on the docker role when openshift.common.is_containerized == True. openshift_version is now only run against masters, nodes, and etcd hosts.
PR Created: https://github.com/openshift/openshift-ansible/pull/5740
Verified this bug with openshift-ansible-3.7.0-0.153.0.git.0.d5028b3.el7.noarch, and it passed. The haproxy image is pulled and the container runs successfully.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188