Created attachment 1331874 [details]
inventory host file

Description of problem:
See the following details.

Version-Release number of the following components:
openshift-ansible-3.7.0-0.128.0.git.0.89dcad2.el7.noarch
ansible 2.3
docker-1.12.6-55.gitc4618fb.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Prepare an inventory to install a multi-master env that uses a containerized haproxy lb.
2. Trigger the installation.
3.

Actual results:
Failed at the following task:

TASK [openshift_loadbalancer : Enable and start haproxy] ***********************
Thursday 28 September 2017  03:09:20 +0000 (0:00:01.776)       0:14:39.316 ****
fatal: [jialiu-icuu-lb-1.0928-0zv.qe.rhcloud.com]: FAILED! => {"changed": false, "failed": true, "msg": "Unable to start service haproxy: Job for haproxy.service failed because the control process exited with error code. See \"systemctl status haproxy.service\" and \"journalctl -xe\" for details.\n"}

On the failed host, check the haproxy service log:

# journalctl -f -u haproxy
-- Logs begin at Wed 2017-09-27 22:51:04 EDT. --
Sep 28 03:35:07 jialiu-icuu-lb-1 docker[25483]: Error response from daemon: No such container: openshift_loadbalancer
Sep 28 03:35:07 jialiu-icuu-lb-1 systemd[1]: haproxy.service: control process exited, code=exited status=1
Sep 28 03:35:07 jialiu-icuu-lb-1 systemd[1]: Failed to start haproxy.service.
Sep 28 03:35:07 jialiu-icuu-lb-1 systemd[1]: Unit haproxy.service entered failed state.
Sep 28 03:35:07 jialiu-icuu-lb-1 systemd[1]: haproxy.service failed.
Sep 28 03:35:12 jialiu-icuu-lb-1 systemd[1]: haproxy.service holdoff time over, scheduling restart.
Sep 28 03:35:12 jialiu-icuu-lb-1 systemd[1]: Starting haproxy.service...
Sep 28 03:35:12 jialiu-icuu-lb-1 docker[25501]: Error response from daemon: No such container: openshift_loadbalancer
Sep 28 03:35:14 jialiu-icuu-lb-1 docker[25505]: /usr/bin/docker-current: Error response from daemon: driver failed programming external connectivity on endpoint openshift_loadbalancer (9129fb5e1930650f8327e03bce6e138305261c043075509cdae8e9b3c8e6b4cf): iptables failed: iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 8443 -j DNAT --to-destination 172.17.0.2:8443 ! -i docker0: iptables: No chain/target/match by that name.

The DOCKER chain is missing from the nat table:

# iptables -L -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination

Expected results:
The haproxy container should start successfully and the installation should pass.

Additional info:
Restarting the docker service brings the DOCKER chain back, after which the haproxy service runs fine. This seems to be caused by the earlier tasks that disable firewalld and install the iptables services.
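For reference, a minimal sketch of the workaround mentioned above, assuming a standard docker/systemd setup (the commands are illustrative, not taken from the original run):

    # restart docker so it re-creates its nat chains (DOCKER etc.)
    systemctl restart docker

    # the DOCKER chain should be present again
    iptables -t nat -L DOCKER -n

    # let the containerized haproxy unit retry and confirm it stays up
    systemctl restart haproxy
    systemctl status haproxy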
Created attachment 1331875 [details] installation log
Isn't docker responsible for provisioning this iptables chain?
Scott, reading the installation log, it appears docker is not restarted after iptables is re-installed. I suspect that is where the rules were deleted and not recreated. Johnny's comment that restarting docker corrects the issue reinforces this scenario.
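A rough sketch of the suspected sequence on the lb host, assuming the firewall tasks behave as described above (the image name and published port below are placeholders):

    # install-time firewall tasks: firewalld is disabled, iptables-services installed and started
    systemctl stop firewalld
    systemctl disable firewalld
    yum install -y iptables-services
    systemctl start iptables          # applies /etc/sysconfig/iptables, which has no DOCKER chain

    # docker's nat chains are now gone, so publishing a port fails
    docker run -d --name openshift_loadbalancer -p 8443:8443 some/haproxy-image
    #   -> "iptables: No chain/target/match by that name"

    # restarting docker re-creates its chains, after which the container can start
    systemctl restart docker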
I encountered the same problem on Fedora 26: there is a general problem with Fedora using docker in conjunction with firewalld. It looks like there is a mismatch between the iptables rules emitted by docker and the iptables rules emitted by firewalld.

In /var/log/firewalld you can see all the rules docker emits (when you start a docker image with 'docker run' that does port mapping and/or anything else implemented as iptables rules). In most cases these rules lead to errors. For example, the error that stops the docker container,

> iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 8443 -j DNAT --to-destination 172.17.0.2:8443 ! -i docker0: iptables: No chain/target/match by that name.

fails because there is no DOCKER chain in the 'nat' table in a default firewalld configuration.

Some people try to tweak the firewalld configuration when they encounter the error:

* https://opsech.io/posts/2017/May/23/docker-dns-with-firewalld-on-fedora.html
* https://superuser.com/questions/1180870/fedora-firewalld-issues-with-docker
* https://github.com/moby/moby/issues/16137#issuecomment-271615192
* https://github.com/firewalld/firewalld/issues/195

I have tried that too (adding tables and chains with firewall-cmd), up to the point where there are no more error reports in /var/log/firewalld. HOWEVER, I'm not sure that simply adding tables and chains to the firewalld iptables structure results in anything sensible! IMHO, it is VERY unlikely that this kind of tinkering produces an iptables configuration that does what docker intends to do.
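For illustration only, the kind of firewalld direct-interface tweak those links describe might look like the following; the chain name and rule are assumptions mirroring docker's defaults, and per the caveat above this is not necessarily a sensible configuration:

    # pre-create the chain docker expects in the nat table
    firewall-cmd --permanent --direct --add-chain ipv4 nat DOCKER

    # send locally-destined traffic through it, as docker would normally arrange itself
    firewall-cmd --permanent --direct --add-rule ipv4 nat PREROUTING 0 -m addrtype --dst-type LOCAL -j DOCKER

    firewall-cmd --reload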
I think https://github.com/openshift/openshift-ansible/pull/5680 fixes this, and I don't think the issue is limited in scope to containerized installs.
I believe this bug is due to the fact that openshift_version used to run against all hosts. openshift_version has a meta-dependency on the docker role when openshift.common.is_containerized == True. openshift_version is now only run against masters, nodes, and etcd hosts.
PR Created: https://github.com/openshift/openshift-ansible/pull/5740
Verified this bug with openshift-ansible-3.7.0-0.153.0.git.0.d5028b3.el7.noarch, and it passed. The haproxy image is pulled and the container runs successfully.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188