When deploying an overcloud, the step 2 Mistral execution fails on the collect_nodes_uuid task. Investigation of the Mistral executor logs [1] showed that the collect_nodes_uuid task timed out with Ansible's "Timeout (12s) waiting for privilege escalation prompt:".

[1] Mistral executor logs:
2018-02-08 11:19:19.795 1368 DEBUG oslo_concurrency.processutils [req-d19df95d-0da9-46c5-9356-cf0335ca9e82 77f8d00fa5864cac8bde976b0f20024f b42a556a02b1412f9bc2b8f88cddbd66 - default default] CMD "ansible-playbook /tmp/ansible-mistral-actionUDyTgf/playbook.yaml --user tripleo-admin --become --become-user root --inventory-file /tmp/ansible-mistral-actionUDyTgf/inventory.yaml --private-key /tmp/ansible-mistral-actionUDyTgf/ssh_private_key" returned: 2 in 374.893s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2018-02-08 11:19:19.796 1368 DEBUG oslo_concurrency.processutils [req-d19df95d-0da9-46c5-9356-cf0335ca9e82 77f8d00fa5864cac8bde976b0f20024f b42a556a02b1412f9bc2b8f88cddbd66 - default default] None command: u'ansible-playbook /tmp/ansible-mistral-actionUDyTgf/playbook.yaml --user tripleo-admin --become --become-user root --inventory-file /tmp/ansible-mistral-actionUDyTgf/inventory.yaml --private-key /tmp/ansible-mistral-actionUDyTgf/ssh_private_key' exit code: 2 stdout: u'{\n "plays": [\n {\n "play": {\n "id": "5254001c-2bc5-0d2d-226a-000000000008", \n "name": "overcloud"\n }, \n "tasks": [\n {\n "hosts": {\n "192.168.24.7": {\n "_ansible_no_log": false, \n "_ansible_parsed": true, \n "changed": true, \n "cmd": [\n "dmidecode", \n "-s", \n "system-uuid"\n ], \n "delta": "0:00:00.003973", \n "end": "2018-02-08 16:13:07.947261", \n "invocation": {\n "module_args": {\n "_raw_params": "dmidecode -s system-uuid", \n "_uses_shell": false, \n "chdir": null, \n "creates": null, \n "executable": null, \n "removes": null, \n "stdin": null, \n "warn": true\n }\n }, \n "rc": 0, \n "start": "2018-02-08 16:13:07.943288", \n "stderr": "", \n "stderr_lines": [], \n "stdout": 
"CD125C9B-C294-49CC-B039-F8652D6D8C63", \n "stdout_lines": [\n "CD125C9B-C294-49CC-B039-F8652D6D8C63"\n ]\n }, \n "192.168.24.8": {\n "_ansible_no_log": false, \n "_ansible_parsed": true, \n "changed": true, \n "cmd": [\n "dmidecode", \n "-s", \n "system-uuid"\n ], \n "delta": "0:00:00.003179", \n "end": "2018-02-08 16:13:07.257269", \n "invocation": {\n "module_args": {\n "_raw_params": "dmidecode -s system-uuid", \n "_uses_shell": false, \n "chdir": null, \n "creates": null, \n "executable": null, \n "removes": null, \n "stdin": null, \n "warn": true\n }\n }, \n "rc": 0, \n "start": "2018-02-08 16:13:07.254090", \n "stderr": "", \n "stderr_lines": [], \n "stdout": "B02BFE20-2E86-494B-82B6-DD1F040E5C58", \n "stdout_lines": [\n "B02BFE20-2E86-494B-82B6-DD1F040E5C58"\n ]\n }, \n "192.168.24.9": {\n "msg": "Timeout (12s) waiting for privilege escalation prompt: "\n }\n }, \n "task": {\n "id": "5254001c-2bc5-0d2d-226a-00000000000a", \n "name": "collect machine id"\n }\n }\n ]\n }\n ], \n "stats": {\n "192.168.24.7": {\n "changed": 1, \n "failures": 0, \n "ok": 1, \n "skipped": 0, \n "unreachable": 0\n }, \n "192.168.24.8": {\n "changed": 1, \n "failures": 0, \n "ok": 1, \n "skipped": 0, \n "unreachable": 0\n }, \n "192.168.24.9": {\n "changed": 0, \n "failures": 1, \n "ok": 0, \n "skipped": 0, \n "unreachable": 0\n }\n }\n}\n'
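The failing host is easy to miss in the stdout above. A minimal Python 3 sketch for pulling it out of the playbook's JSON report, assuming the layout visible in the log ("plays" -> "tasks" -> "hosts" for per-host results, "stats" for per-host counters); the function name summarize_failures is hypothetical and not part of Mistral or Ansible:

```python
import json
from typing import Dict, List


def summarize_failures(playbook_output: str) -> Dict[str, List[str]]:
    """Map each failed or unreachable host to the messages its tasks returned.

    Assumes the layout seen in the log: a top-level "stats" dict keyed by
    host, and "plays" -> "tasks" -> "hosts" holding per-host task results.
    """
    data = json.loads(playbook_output)
    # Hosts whose counters show at least one failure or unreachable result.
    bad = {
        host
        for host, counts in data.get("stats", {}).items()
        if counts.get("failures", 0) or counts.get("unreachable", 0)
    }
    report: Dict[str, List[str]] = {host: [] for host in sorted(bad)}
    for play in data.get("plays", []):
        for task in play.get("tasks", []):
            name = task.get("task", {}).get("name", "<unnamed>")
            for host, result in task.get("hosts", {}).items():
                # Only keep results that carry an explanatory message.
                if host in bad and "msg" in result:
                    report[host].append(f"{name}: {result['msg']}")
    return report


if __name__ == "__main__":
    # Trimmed-down version of the stdout shown above.
    sample = json.dumps({
        "plays": [{"tasks": [{
            "task": {"name": "collect machine id"},
            "hosts": {
                "192.168.24.8": {"rc": 0, "stdout": "B02BFE20-2E86-494B-82B6-DD1F040E5C58"},
                "192.168.24.9": {"msg": "Timeout (12s) waiting for privilege escalation prompt: "},
            },
        }]}],
        "stats": {
            "192.168.24.8": {"failures": 0, "unreachable": 0},
            "192.168.24.9": {"failures": 1, "unreachable": 0},
        },
    })
    print(summarize_failures(sample))
```

Fed the real stdout (after stripping the u'...' wrapper from the log line), it should point straight at 192.168.24.9 and its privilege escalation timeout.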
There seems to be a considerable delay in DNS replies, and therefore in the reverse DNS lookups performed by ssh, at least in my reproducer, specifically on compute-0.

compute-0 cannot resolve:
> [heat-admin@compute-0 ~]$ time ping -c1 google.com
> ping: google.com: Name or service not known
>
> real 0m10.013s
> user 0m0.001s
> sys 0m0.003s

undercloud-0 is OK:
> (undercloud) [stack@undercloud-0 ~]$ time ping -c1 google.com
> PING google.com (172.217.5.238) 56(84) bytes of data.
> 64 bytes from iad30s07-in-f14.1e100.net (172.217.5.238): icmp_seq=1 ttl=49 time=11.2 ms
>
> --- google.com ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 11.293/11.293/11.293/0.000 ms
>
> real 0m0.017s
> user 0m0.000s
> sys 0m0.004s

controller-0 is OK:
> [heat-admin@controller-0 ~]$ time ping -c1 google.com
> PING google.com (172.217.5.238) 56(84) bytes of data.
> 64 bytes from iad30s07-in-f238.1e100.net (172.217.5.238): icmp_seq=1 ttl=49 time=11.2 ms
>
> --- google.com ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 11.274/11.274/11.274/0.000 ms
>
> real 0m0.017s
> user 0m0.000s
> sys 0m0.003s

(All of these ping attempts can be repeated again and again with the same failure or output, give or take a few ms.)
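A rough way to reproduce this timing without ping is to time the system resolver directly. The sketch below (hypothetical helper time_lookup, Python 3) calls socket.getaddrinfo, which goes through the same resolver configuration (NSS and /etc/resolv.conf) that ping relies on, so an unreachable nameserver shows up as a long elapsed time even when the lookup ends in failure:

```python
import socket
import sys
import time
from typing import Tuple


def time_lookup(name: str) -> Tuple[bool, float]:
    """Resolve `name` via the system resolver; return (resolved, seconds taken)."""
    start = time.monotonic()
    try:
        socket.getaddrinfo(name, None)
        resolved = True
    except socket.gaierror:
        # e.g. "Name or service not known" once the resolver gives up
        resolved = False
    return resolved, time.monotonic() - start


if __name__ == "__main__":
    for host in sys.argv[1:] or ["localhost"]:
        ok, seconds = time_lookup(host)
        print(f"{host}: resolved={ok} in {seconds:.3f}s")
```

Run on each node with the names of interest as arguments (for example google.com); on compute-0 one would expect a delay in the same range as the ping timing above.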
As in all our cases, the main DNS server is the same, expected one for this deployment: 10.0.0.1.

[heat-admin@compute-0 ~]$ cat /etc/resolv.conf
> # Generated by NetworkManager
> nameserver 10.0.0.1

[heat-admin@controller-0 ~]$ cat /etc/resolv.conf
> # Generated by NetworkManager
> nameserver 10.0.0.1

(undercloud) [stack@undercloud-0 ~]$ cat /etc/resolv.conf
> # Generated by NetworkManager
> search redhat.local
> nameserver 172.16.0.1
> nameserver 2620:52:0:13b8::fe
> nameserver 10.0.0.1
FYI: it seems (not confirmed or revalidated at the moment) that this does not happen with a minimal 1cont1comp deployment WITHOUT Ceph (the deployment above was 1cont1comp1ceph, and only compute-0 was affected).
(In reply to Pavel Sedlák from comment #6)
> Fyi - seems, not confirmed/revalidated atm, that it does not happen if
> deployed just as minimal as 1cont1comp - WITHOUT CEPH (yes in above was
> 1cont1comp1ceph and affected was just compute-0).

When Ceph is not deployed, the affected Ansible task does not run, which might explain why you don't see the problem. But if the problem depends on the reverse lookup delays, I think it'd be best to address that. Alternatively, we could increase the default Ansible timeout further.
This issue happens only on RHEL 7.5, and it is caused by the default iptables FORWARD policy on undercloud-0 being changed to DROP.

In these setups, the non-controller nodes (compute, ceph, ...) do not have external connectivity, at least not directly: they are connected only to each other and to the undercloud, and none of these nodes provides DNS or anything else they could use. The undercloud does provide access to the outside via NAT/masquerade, and for this to work FORWARD has to be ACCEPT (or a more explicit forward rule from/to the masquerade_network range has to be added).

I confirmed that `iptables -P FORWARD ACCEPT` does indeed make `ssh heat-admin@compute-0` almost instant, as expected.

Is there a way to tell `openstack undercloud install` to do this, e.g. via undercloud.conf? Or is it a bug there, i.e. when adding the masquerade rules it should handle this itself?

cat /home/stack/undercloud.conf
> [DEFAULT]
> # Network interface on the Undercloud that will be handling the PXE
> # boots and DHCP for Overcloud instances. (string value)
> local_interface = eth0
> # 192.168.24.0 subnet is by default used since RHOS11
> local_ip = 192.168.24.1/24
> network_gateway = 192.168.24.1
> undercloud_public_vip = 192.168.24.2
> undercloud_admin_vip = 192.168.24.3
> network_cidr = 192.168.24.0/24
> masquerade_network = 192.168.24.0/24
> dhcp_start = 192.168.24.5
> dhcp_end = 192.168.24.24
> inspection_iprange = 192.168.24.100,192.168.24.120

iptables -S | grep FORW
> -P FORWARD DROP
> -N neutron-openvswi-FORWARD
> -A FORWARD -j neutron-filter-top
> -A FORWARD -j neutron-openvswi-FORWARD
> -A FORWARD -j DOCKER-ISOLATION
> -A FORWARD -o docker0 -j DOCKER
> -A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
> -A FORWARD -i docker0 ! -o docker0 -j ACCEPT
> -A FORWARD -i docker0 -o docker0 -j ACCEPT
> -A FORWARD -d 192.168.24.0/24 -p tcp -m state --state NEW -m comment --comment "140 network cidr nat ipv4" -j ACCEPT
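To make the diagnosis above repeatable, here is a minimal Python sketch (the function name check_forwarding is hypothetical, not part of any TripleO tooling) that reads `iptables -S` output and reports the FORWARD policy plus whether an ACCEPT rule in the FORWARD chain matches the masquerade network as destination (-d) and as source (-s). It deliberately ignores rule order, protocol/state matches and jumps into other chains. On the listing above it reports a DROP policy with only the destination-side rule, which suggests that new connections originating from 192.168.24.0/24 (such as the overcloud nodes' DNS queries) fall through to DROP unless one of the neutron chains accepts them.

```python
import shlex
from typing import Dict


def check_forwarding(iptables_s_output: str, cidr: str) -> Dict[str, bool]:
    """Summarize whether forwarded traffic for `cidr` is allowed.

    Reports the FORWARD policy and whether an "-A FORWARD ... -j ACCEPT"
    rule matches `cidr` as destination (-d) and as source (-s).
    """
    result = {"policy_accept": False, "dest_rule": False, "source_rule": False}
    for line in iptables_s_output.splitlines():
        args = shlex.split(line)  # keeps quoted --comment text as one token
        if args[:2] == ["-P", "FORWARD"]:
            result["policy_accept"] = args[2:3] == ["ACCEPT"]
        elif args[:2] == ["-A", "FORWARD"] and args[-2:] == ["-j", "ACCEPT"]:
            if "-d" in args and args[args.index("-d") + 1] == cidr:
                result["dest_rule"] = True
            if "-s" in args and args[args.index("-s") + 1] == cidr:
                result["source_rule"] = True
    return result


if __name__ == "__main__":
    # Subset of the `iptables -S | grep FORW` output above.
    listing = """\
-P FORWARD DROP
-N neutron-openvswi-FORWARD
-A FORWARD -j neutron-filter-top
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -d 192.168.24.0/24 -p tcp -m state --state NEW -m comment --comment "140 network cidr nat ipv4" -j ACCEPT
"""
    print(check_forwarding(listing, "192.168.24.0/24"))
```

The same check can be rerun after a fix: a healthy undercloud would be expected to show an ACCEPT policy, or at least both the source and destination rules.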
Manually running `openstack undercloud install` proved to be the source of this change: during its execution the policy changed to DROP. Before that, I verified that the RHEL 7.5 image used has FORWARD ACCEPT when cleanly booted, after yum update, after installing and starting iptables-services via systemctl, and after a system reboot.
This is the resolv.conf content in the overcloud-full.qcow2 image:

><fs> cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 192.168.122.1
><fs>

It looks like /etc/sysconfig/network-scripts/ifup-post changed in RHEL 7.5 compared to 7.4, and the DNS servers set in the ifcfg files are added on top of what already exists in resolv.conf. Snippet from ifup-post:

    # Keep the rest of the /etc/resolv.conf as it was:
    *)
    echo "${line}" >> "${tmp_file}"
    ;;
    esac

So the resulting resolv.conf always has 192.168.122.1, which comes pre-built in the overcloud image, as its first entry, and that makes name resolution sluggish.
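Since resolv.conf(5) documents that nameservers are queried in the order listed (and that at most three nameserver lines are used), a quick check for a stale image-provided entry sitting ahead of the intended server could look like this Python sketch. The helper names are hypothetical and the parsing is simplified to the "nameserver" keyword only:

```python
from typing import List

MAXNS = 3  # resolv.conf(5): up to three nameserver lines are used


def nameservers(resolv_conf: str) -> List[str]:
    """Return the nameservers from resolv.conf text, in the order they are queried."""
    servers = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        # Comment lines ('#' or ';') and other keywords are skipped.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers[:MAXNS]


def servers_ahead_of(resolv_conf: str, expected: str) -> List[str]:
    """Nameservers that are tried before `expected`.

    If `expected` is not among the usable servers, all of them are returned,
    since every one of them would be tried instead.
    """
    servers = nameservers(resolv_conf)
    if expected not in servers:
        return servers
    return servers[: servers.index(expected)]


if __name__ == "__main__":
    # Illustrative file: the image's entry followed by the deployment's DNS server.
    merged = "# Generated by NetworkManager\nnameserver 192.168.122.1\nnameserver 10.0.0.1\n"
    print(servers_ahead_of(merged, "10.0.0.1"))
```

An empty result means the intended server is queried first; anything else lists entries every lookup may have to wait on.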
This appears to be the related change: https://github.com/fedora-sysv/initscripts/commit/4da9dbaffba4af74eb632d1a5d10e5c366475516#diff-0d62530925abcdd252ce22c42fdeff8c
Apologies for my previous 2 comments; I commented without reading the initial report. I think that is a different issue from the one reported initially in this BZ.

Regarding the iptables issue, the policy for the FORWARD chain gets switched to DROP by the docker service. This can easily be reproduced by booting a clean RHEL 7.5 VM, then installing and starting the docker service:

[root@undercloud-0 ~]# iptables -nL
Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
[root@undercloud-0 ~]# yum install -y docker; systemctl start docker
Loaded plugins: search-disabled-repos Resolving Dependencies --> Running transaction check ---> Package docker.x86_64 2:1.13.1-54.rhel75.gitce62987.el7 will be installed --> Processing Dependency: docker-client = 2:1.13.1-54.rhel75.gitce62987.el7 for package: 2:docker-1.13.1-54.rhel75.gitce62987.el7.x86_64 --> Processing Dependency: docker-common = 2:1.13.1-54.rhel75.gitce62987.el7 for package: 2:docker-1.13.1-54.rhel75.gitce62987.el7.x86_64 --> Running transaction check ---> Package docker-client.x86_64 2:1.13.1-54.rhel75.gitce62987.el7 will be installed ---> Package docker-common.x86_64 2:1.13.1-54.rhel75.gitce62987.el7 will be installed --> Processing Dependency: docker-rhel-push-plugin = 2:1.13.1-54.rhel75.gitce62987.el7 for package: 2:docker-common-1.13.1-54.rhel75.gitce62987.el7.x86_64 --> Processing Dependency: container-storage-setup >= 0.9.0-1 for package: 2:docker-common-1.13.1-54.rhel75.gitce62987.el7.x86_64 --> Processing Dependency: lvm2 >= 2.02.112 for package: 2:docker-common-1.13.1-54.rhel75.gitce62987.el7.x86_64 --> Processing Dependency: oci-register-machine >= 1:0-5.13 for package: 2:docker-common-1.13.1-54.rhel75.gitce62987.el7.x86_64 --> Processing Dependency: oci-systemd-hook >= 1:0.1.4-9 for package: 2:docker-common-1.13.1-54.rhel75.gitce62987.el7.x86_64 --> Processing Dependency: 
oci-umount >= 2:2.0.0-1 for package: 2:docker-common-1.13.1-54.rhel75.gitce62987.el7.x86_64 --> Processing Dependency: atomic-registries for package: 2:docker-common-1.13.1-54.rhel75.gitce62987.el7.x86_64 --> Processing Dependency: subscription-manager-plugin-container for package: 2:docker-common-1.13.1-54.rhel75.gitce62987.el7.x86_64 --> Running transaction check ---> Package atomic-registries.x86_64 1:1.21.1-1.git1170769.el7 will be installed --> Processing Dependency: python-pytoml for package: 1:atomic-registries-1.21.1-1.git1170769.el7.x86_64 ---> Package container-storage-setup.noarch 0:0.9.0-1.rhel75.gite0997c3.el7 will be installed ---> Package docker-rhel-push-plugin.x86_64 2:1.13.1-54.rhel75.gitce62987.el7 will be installed ---> Package lvm2.x86_64 7:2.02.177-3.el7 will be installed --> Processing Dependency: lvm2-libs = 7:2.02.177-3.el7 for package: 7:lvm2-2.02.177-3.el7.x86_64 --> Processing Dependency: device-mapper-persistent-data >= 0.7.0-0.1.rc6 for package: 7:lvm2-2.02.177-3.el7.x86_64 --> Processing Dependency: libdevmapper-event.so.1.02(Base)(64bit) for package: 7:lvm2-2.02.177-3.el7.x86_64 --> Processing Dependency: liblvm2app.so.2.2(Base)(64bit) for package: 7:lvm2-2.02.177-3.el7.x86_64 --> Processing Dependency: libdevmapper-event.so.1.02()(64bit) for package: 7:lvm2-2.02.177-3.el7.x86_64 --> Processing Dependency: liblvm2app.so.2.2()(64bit) for package: 7:lvm2-2.02.177-3.el7.x86_64 ---> Package oci-register-machine.x86_64 1:0-6.git2b44233.el7 will be installed ---> Package oci-systemd-hook.x86_64 1:0.1.15-2.gitc04483d.el7 will be installed --> Processing Dependency: libyajl.so.2()(64bit) for package: 1:oci-systemd-hook-0.1.15-2.gitc04483d.el7.x86_64 ---> Package oci-umount.x86_64 2:2.3.3-3.gite3c9055.el7 will be installed ---> Package subscription-manager-plugin-container.x86_64 0:1.20.10-1.el7 will be installed --> Running transaction check ---> Package device-mapper-event-libs.x86_64 7:1.02.146-3.el7 will be installed ---> Package 
device-mapper-persistent-data.x86_64 0:0.7.3-3.el7 will be installed ---> Package lvm2-libs.x86_64 7:2.02.177-3.el7 will be installed --> Processing Dependency: device-mapper-event = 7:1.02.146-3.el7 for package: 7:lvm2-libs-2.02.177-3.el7.x86_64 ---> Package python-pytoml.noarch 0:0.1.14-1.git7dea353.el7 will be installed ---> Package yajl.x86_64 0:2.0.4-4.el7 will be installed --> Running transaction check ---> Package device-mapper-event.x86_64 7:1.02.146-3.el7 will be installed --> Finished Dependency Resolution Dependencies Resolved ============================================================================================================================================================================================================================================== Package Arch Version Repository Size ============================================================================================================================================================================================================================================== Installing: docker x86_64 2:1.13.1-54.rhel75.gitce62987.el7 rhelosp-rhel-7.5-extras 16 M Installing for dependencies: atomic-registries x86_64 1:1.21.1-1.git1170769.el7 rhelosp-rhel-7.5-extras 35 k container-storage-setup noarch 0.9.0-1.rhel75.gite0997c3.el7 rhelosp-rhel-7.5-extras 33 k device-mapper-event x86_64 7:1.02.146-3.el7 rhelosp-rhel-7.5-server 185 k device-mapper-event-libs x86_64 7:1.02.146-3.el7 rhelosp-rhel-7.5-server 184 k device-mapper-persistent-data x86_64 0.7.3-3.el7 rhelosp-rhel-7.5-server 405 k docker-client x86_64 2:1.13.1-54.rhel75.gitce62987.el7 rhelosp-rhel-7.5-extras 3.8 M docker-common x86_64 2:1.13.1-54.rhel75.gitce62987.el7 rhelosp-rhel-7.5-extras 86 k docker-rhel-push-plugin x86_64 2:1.13.1-54.rhel75.gitce62987.el7 rhelosp-rhel-7.5-extras 1.7 M lvm2 x86_64 7:2.02.177-3.el7 rhelosp-rhel-7.5-server 1.3 M lvm2-libs x86_64 7:2.02.177-3.el7 rhelosp-rhel-7.5-server 1.0 M oci-register-machine x86_64 
1:0-6.git2b44233.el7 rhelosp-rhel-7.5-extras 1.1 M oci-systemd-hook x86_64 1:0.1.15-2.gitc04483d.el7 rhelosp-rhel-7.5-extras 33 k oci-umount x86_64 2:2.3.3-3.gite3c9055.el7 rhelosp-rhel-7.5-extras 32 k python-pytoml noarch 0.1.14-1.git7dea353.el7 rhelosp-rhel-7.5-extras 18 k subscription-manager-plugin-container x86_64 1.20.10-1.el7 rhelosp-rhel-7.5-server 205 k yajl x86_64 2.0.4-4.el7 rhelosp-rhel-7.5-server 39 k Transaction Summary ============================================================================================================================================================================================================================================== Install 1 Package (+16 Dependent packages) Total download size: 26 M Installed size: 86 M Downloading packages: (1/17): device-mapper-event-1.02.146-3.el7.x86_64.rpm | 185 kB 00:00:00 (2/17): device-mapper-event-libs-1.02.146-3.el7.x86_64.rpm | 184 kB 00:00:00 (3/17): container-storage-setup-0.9.0-1.rhel75.gite0997c3.el7.noarch.rpm | 33 kB 00:00:00 (4/17): atomic-registries-1.21.1-1.git1170769.el7.x86_64.rpm | 35 kB 00:00:00 (5/17): device-mapper-persistent-data-0.7.3-3.el7.x86_64.rpm | 405 kB 00:00:00 (6/17): docker-client-1.13.1-54.rhel75.gitce62987.el7.x86_64.rpm | 3.8 MB 00:00:00 (7/17): docker-1.13.1-54.rhel75.gitce62987.el7.x86_64.rpm | 16 MB 00:00:00 (8/17): docker-common-1.13.1-54.rhel75.gitce62987.el7.x86_64.rpm | 86 kB 00:00:00 (9/17): docker-rhel-push-plugin-1.13.1-54.rhel75.gitce62987.el7.x86_64.rpm | 1.7 MB 00:00:00 (10/17): lvm2-libs-2.02.177-3.el7.x86_64.rpm | 1.0 MB 00:00:00 (11/17): lvm2-2.02.177-3.el7.x86_64.rpm | 1.3 MB 00:00:00 (12/17): oci-register-machine-0-6.git2b44233.el7.x86_64.rpm | 1.1 MB 00:00:00 (13/17): oci-systemd-hook-0.1.15-2.gitc04483d.el7.x86_64.rpm | 33 kB 00:00:00 (14/17): oci-umount-2.3.3-3.gite3c9055.el7.x86_64.rpm | 32 kB 00:00:00 (15/17): python-pytoml-0.1.14-1.git7dea353.el7.noarch.rpm | 18 kB 00:00:00 (16/17): yajl-2.0.4-4.el7.x86_64.rpm | 39 kB 00:00:00 
(17/17): subscription-manager-plugin-container-1.20.10-1.el7.x86_64.rpm | 205 kB 00:00:00 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Total 10 MB/s | 26 MB 00:00:02 Running transaction check Running transaction test Transaction test succeeded Running transaction Installing : 7:device-mapper-event-libs-1.02.146-3.el7.x86_64 1/17 Installing : yajl-2.0.4-4.el7.x86_64 2/17 Installing : 2:oci-umount-2.3.3-3.gite3c9055.el7.x86_64 3/17 Installing : 1:oci-systemd-hook-0.1.15-2.gitc04483d.el7.x86_64 4/17 Installing : 7:device-mapper-event-1.02.146-3.el7.x86_64 5/17 Installing : 7:lvm2-libs-2.02.177-3.el7.x86_64 6/17 Installing : 2:docker-rhel-push-plugin-1.13.1-54.rhel75.gitce62987.el7.x86_64 7/17 Installing : 1:oci-register-machine-0-6.git2b44233.el7.x86_64 8/17 Installing : subscription-manager-plugin-container-1.20.10-1.el7.x86_64 9/17 Installing : python-pytoml-0.1.14-1.git7dea353.el7.noarch 10/17 Installing : 1:atomic-registries-1.21.1-1.git1170769.el7.x86_64 11/17 Installing : device-mapper-persistent-data-0.7.3-3.el7.x86_64 12/17 Installing : 7:lvm2-2.02.177-3.el7.x86_64 13/17 Installing : container-storage-setup-0.9.0-1.rhel75.gite0997c3.el7.noarch 14/17 Installing : 2:docker-common-1.13.1-54.rhel75.gitce62987.el7.x86_64 15/17 Installing : 2:docker-client-1.13.1-54.rhel75.gitce62987.el7.x86_64 16/17 Installing : 2:docker-1.13.1-54.rhel75.gitce62987.el7.x86_64 17/17 Verifying : device-mapper-persistent-data-0.7.3-3.el7.x86_64 1/17 Verifying : python-pytoml-0.1.14-1.git7dea353.el7.noarch 2/17 Verifying : subscription-manager-plugin-container-1.20.10-1.el7.x86_64 3/17 Verifying : 7:lvm2-2.02.177-3.el7.x86_64 4/17 Verifying : 7:lvm2-libs-2.02.177-3.el7.x86_64 5/17 Verifying : yajl-2.0.4-4.el7.x86_64 6/17 Verifying : 7:device-mapper-event-libs-1.02.146-3.el7.x86_64 7/17 Verifying 
: container-storage-setup-0.9.0-1.rhel75.gite0997c3.el7.noarch 8/17 Verifying : 2:oci-umount-2.3.3-3.gite3c9055.el7.x86_64 9/17 Verifying : 2:docker-client-1.13.1-54.rhel75.gitce62987.el7.x86_64 10/17 Verifying : 1:oci-systemd-hook-0.1.15-2.gitc04483d.el7.x86_64 11/17 Verifying : 1:oci-register-machine-0-6.git2b44233.el7.x86_64 12/17 Verifying : 1:atomic-registries-1.21.1-1.git1170769.el7.x86_64 13/17 Verifying : 2:docker-rhel-push-plugin-1.13.1-54.rhel75.gitce62987.el7.x86_64 14/17 Verifying : 7:device-mapper-event-1.02.146-3.el7.x86_64 15/17 Verifying : 2:docker-common-1.13.1-54.rhel75.gitce62987.el7.x86_64 16/17 Verifying : 2:docker-1.13.1-54.rhel75.gitce62987.el7.x86_64 17/17 Installed: docker.x86_64 2:1.13.1-54.rhel75.gitce62987.el7 Dependency Installed: atomic-registries.x86_64 1:1.21.1-1.git1170769.el7 container-storage-setup.noarch 0:0.9.0-1.rhel75.gite0997c3.el7 device-mapper-event.x86_64 7:1.02.146-3.el7 device-mapper-event-libs.x86_64 7:1.02.146-3.el7 device-mapper-persistent-data.x86_64 0:0.7.3-3.el7 docker-client.x86_64 2:1.13.1-54.rhel75.gitce62987.el7 docker-common.x86_64 2:1.13.1-54.rhel75.gitce62987.el7 docker-rhel-push-plugin.x86_64 2:1.13.1-54.rhel75.gitce62987.el7 lvm2.x86_64 7:2.02.177-3.el7 lvm2-libs.x86_64 7:2.02.177-3.el7 oci-register-machine.x86_64 1:0-6.git2b44233.el7 oci-systemd-hook.x86_64 1:0.1.15-2.gitc04483d.el7 oci-umount.x86_64 2:2.3.3-3.gite3c9055.el7 python-pytoml.noarch 0:0.1.14-1.git7dea353.el7 subscription-manager-plugin-container.x86_64 0:1.20.10-1.el7 yajl.x86_64 0:2.0.4-4.el7 Complete! 
[root@undercloud-0 ~]# iptables -nL
Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain FORWARD (policy DROP)
target prot opt source destination
DOCKER-ISOLATION all -- 0.0.0.0/0 0.0.0.0/0
DOCKER all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

Chain DOCKER (1 references)
target prot opt source destination

Chain DOCKER-ISOLATION (1 references)
target prot opt source destination
RETURN all -- 0.0.0.0/0 0.0.0.0/0
[root@undercloud-0 ~]#
[root@undercloud-0 ~]# rpm -qa | grep docker
python-docker-pycreds-1.10.6-3.el7.noarch
docker-1.13.1-54.rhel75.gitce62987.el7.x86_64
python-docker-2.4.2-1.3.el7.noarch
docker-client-1.13.1-54.rhel75.gitce62987.el7.x86_64
docker-rhel-push-plugin-1.13.1-54.rhel75.gitce62987.el7.x86_64
docker-common-1.13.1-54.rhel75.gitce62987.el7.x86_64
While running `openstack undercloud install` I also ran `iptables -vnxL > "$(date).txt"`, and I see the policy on FORWARD change from ACCEPT to DROP right around the time we install docker:
---
2018-02-14 20:49:53,630 INFO: Notice: /Stage[main]/Main/Group[docker]/ensure: created
2018-02-14 20:49:53,649 INFO: Notice: /Stage[main]/Main/User[docker_user]/groups: groups changed '' to ['docker']
2018-02-14 20:50:01,078 INFO: Notice: /Stage[main]/Tripleo::Profile::Base::Docker/Package[docker]/ensure: created
2018-02-14 20:50:01,080 INFO: Notice: ipleo::Profile::Base::Docker/File[/etc/systemd/system/docker.service.d]/ensure: created
2018-02-14 20:50:01,084 INFO: Notice: ipleo::Profile::Base::Docker/File[/etc/systemd/system/docker.service.d/99-unset-mountflags.conf]/ensure: defined content as '{md5}b984426de0b5978853686a649b64e'
2018-02-14 20:50:01,136 INFO: Notice: /Stage[main]/Tripleo::Profile::Base::Docker/Exec[systemd daemon-reload]: Triggered 'refresh' from 1 events
---
The docker package is the same in extras-rhel-7.4 and 7.5:
$ brew latest-pkg extras-rhel-7.5 docker
Build Tag Built by
---------------------------------------- -------------------- ----------------
docker-1.12.6-71.git3e8e77d.el7 extras-rhel-7.4 fkluknav

I'm going to hack iptables to dump a `ps -axfwww` if it is called with DROP as any argument.
Oh, never mind: comment 17 confirms that it's the installation of docker that switches the firewall policy.
Actually, installing docker-1.12.6-71.git3e8e77d.el7 doesn't alter the firewall:
----
Installed:
  docker.x86_64 2:1.12.6-71.git3e8e77d.el7
  docker-client.x86_64 2:1.12.6-71.git3e8e77d.el7
  docker-common.x86_64 2:1.12.6-71.git3e8e77d.el7
  docker-debuginfo.x86_64 2:1.12.6-71.git3e8e77d.el7
  docker-logrotate.x86_64 2:1.12.6-71.git3e8e77d.el7
  docker-lvm-plugin.x86_64 2:1.12.6-71.git3e8e77d.el7
  docker-novolume-plugin.x86_64 2:1.12.6-71.git3e8e77d.el7
  docker-rhel-push-plugin.x86_64 2:1.12.6-71.git3e8e77d.el7
  docker-unit-test.x86_64 2:1.12.6-71.git3e8e77d.el7
  docker-v1.10-migrator.x86_64 2:1.12.6-71.git3e8e77d.el7

Dependency Installed:
  atomic-registries.x86_64 1:1.21.1-1.git1170769.el7
  container-storage-setup.noarch 0:0.9.0-1.rhel75.gite0997c3.el7
  device-mapper-event.x86_64 7:1.02.146-3.el7
  device-mapper-event-libs.x86_64 7:1.02.146-3.el7
  device-mapper-persistent-data.x86_64 0:0.7.3-3.el7
  lvm2.x86_64 7:2.02.177-3.el7
  lvm2-libs.x86_64 7:2.02.177-3.el7
  oci-register-machine.x86_64 1:0-6.git2b44233.el7
  oci-systemd-hook.x86_64 1:0.1.15-2.gitc04483d.el7
  oci-umount.x86_64 2:2.3.3-3.gite3c9055.el7
  python-pytoml.noarch 0:0.1.14-1.git7dea353.el7
  subscription-manager-plugin-container.x86_64 0:1.20.10-1.el7
  yajl.x86_64 0:2.0.4-4.el7

Complete!
[root@director x86_64]# iptables -vnxL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source de
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source de
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source de
[root@director x86_64]#
I bisected this down to:
---
$ bash test.sh docker/1.12.6/75.rhel75.git3e8e77d.el7/x86_64
good
$ bash test.sh docker/1.13.1/39.rhel75.gitddee18e.el7/x86_64
bad
---
The latter contains, amongst other things:
---
$ grep -Ernw FORWARD docker-ddee18eda36d8caba027cafb8d6f2c7906540a2c | tail -n1
docker-ddee18eda36d8caba027cafb8d6f2c7906540a2c/CHANGELOG.md:142:* Change the default `FORWARD` policy to `DROP` [#28257](https://github.com/docker/docker/pull/28257)
---
So I guess it's an intentional change that will hit upstream when we switch to CentOS 7.5 and get the newer docker.
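The good/bad search above is a plain bisection over an ordered list of builds. A generic Python sketch, where is_bad is a hypothetical stand-in for test.sh (for example: install the given docker build, start it, and report bad if the FORWARD policy became DROP); the real script's contents are not shown in this thread:

```python
from typing import Callable, Sequence


def first_bad(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search an ordered list of builds for the first bad one.

    Assumes builds[0] is known good, builds[-1] is known bad, and that every
    build after the first bad one is also bad (the usual bisect assumption).
    """
    if len(builds) < 2:
        raise ValueError("need at least one good and one bad build")
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] good, builds[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid
        else:
            lo = mid
    return builds[hi]


if __name__ == "__main__":
    # Hypothetical build labels and predicate, standing in for test.sh.
    builds = ["v1.12-a", "v1.12-b", "v1.13-a", "v1.13-b", "v1.13-c"]
    print(first_bad(builds, lambda b: b.startswith("v1.13")))
```

Each is_bad call here means a full install-and-check cycle, so the logarithmic number of probes is what makes this practical across many docker builds.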
I have created BZ1545842 to track the nameserver issue separately from the iptables problems
(In reply to Tony Breeds from comment #21)
> So I guess it's an intentional change that will hit upstream when we switch
> to CentOS 7.5 and get the newer docker.

It will hit us earlier: Docker is in the Extras repo, and a Docker 1.13 update is coming to 7.4 Extras. I'm pushing the pre-release to RDO deps since it should help fix the runc/systemd race https://bugs.launchpad.net/tripleo/+bug/1744954
Please chime in at https://review.rdoproject.org/r/12292 !
The workaround mentioned in https://github.com/docker/docker/pull/28257 is setting `iptables: false` in the docker daemon config. WDYT?
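A minimal sketch of applying that workaround programmatically, assuming the daemon configuration lives in /etc/docker/daemon.json (dockerd's "iptables" key, the file equivalent of the --iptables=false flag). The helper name is hypothetical. Caveats: on RHEL the docker package may pass daemon options through /etc/sysconfig/docker instead, and dockerd refuses to start when the same option is set both as a flag and in daemon.json, so check both places; docker also has to be restarted for the change to apply.

```python
import json
import os
import tempfile


def set_docker_iptables(path: str, enabled: bool) -> dict:
    """Set the "iptables" key in a dockerd daemon.json, keeping other keys.

    A missing or empty file is treated as {}. The new content is written to
    a temp file in the same directory and moved into place atomically.
    """
    config = {}
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path) as fh:
            config = json.load(fh)
    config["iptables"] = enabled
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".daemon.json.")
    with os.fdopen(fd, "w") as fh:
        json.dump(config, fh, indent=2)
        fh.write("\n")
    os.replace(tmp, path)
    return config


if __name__ == "__main__":
    # Demonstrate on a scratch file rather than the real /etc/docker/daemon.json.
    with tempfile.TemporaryDirectory() as scratch:
        demo = os.path.join(scratch, "daemon.json")
        with open(demo, "w") as fh:
            json.dump({"debug": True}, fh)
        print(set_docker_iptables(demo, False))
```

Note that, as the following comments point out, disabling Docker's iptables integration has side effects for containers that need NAT (e.g. a kolla build on the undercloud).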
We need to be careful if we apply this workaround; we had some issues in the past when iptables was set to false in the docker daemon: https://bugs.launchpad.net/tripleo/+bug/1709325
> https://bugs.launchpad.net/tripleo/+bug/1709325

So that's for kolla build on the undercloud; the OC is fine and defaults to iptables: false.
I'd like to understand why we are running kolla build on the UC.
(In reply to Alan Pevec from comment #26)
> > https://bugs.launchpad.net/tripleo/+bug/1709325
>
> So that's for kolla build on the undercloud, OC is fine and defaults to
> ipstables: false
> I'd like to understand why are we running kolla build in UC??

It is mainly done when you need to test stuff. Maybe it is not critical for operators, but for developers it is quite nice to be able to do that on the undercloud. IIRC other folks chimed in on that bug and may have additional use cases for it.
Could we have iptables=true only when running the kolla build? Where is the build procedure documented/scripted?
Maybe a bit off track, and I'm not sure how difficult such a change would be, but shouldn't the part responsible for setting up the OC nodes -> external masquerade rules also handle allowing/accepting those connections at the same time? I mean the code that generates the NAT rules from undercloud.conf, e.g. masquerade_network = 192.168.24.0/24.
FWIW, per https://github.com/moby/moby/pull/28257, Docker will set the FORWARD policy to DROP iff it enables the kernel's ip_forward setting itself. If it finds ip_forward already enabled, it leaves the FORWARD policy alone. So the easiest solution here might just be to set ip_forward=1 (or net.ipv4.conf.$FOO.forwarding=1)?
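The rule described above can be written down as a tiny model, which makes it easy to see why the order of operations during the undercloud install matters. This Python sketch is a simplification based only on the behaviour stated in this comment (it assumes Docker's iptables integration is enabled and ignores everything else dockerd does); the function names are hypothetical:

```python
from typing import Tuple

IP_FORWARD = "/proc/sys/net/ipv4/ip_forward"


def ip_forward_enabled(path: str = IP_FORWARD) -> bool:
    """Read the kernel's global IPv4 forwarding switch (Linux only)."""
    with open(path) as fh:
        return fh.read().strip() == "1"


def after_docker_start(ip_forward: bool, forward_policy: str) -> Tuple[bool, str]:
    """Simplified model of dockerd start-up as described above.

    Returns (ip_forward, FORWARD policy) after dockerd starts: if forwarding
    was off, dockerd turns it on and sets the FORWARD policy to DROP; if
    forwarding was already on, the existing policy is kept.
    """
    if ip_forward:
        return True, forward_policy
    return True, "DROP"


if __name__ == "__main__":
    print(after_docker_start(False, "ACCEPT"))  # docker started before ip_forward is set
    print(after_docker_start(True, "ACCEPT"))   # ip_forward set before docker starts
```

On a live node, ip_forward_enabled() shows which of the two cases applies before docker is started.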
I guess the problem is that we don't set ip_forward until the end of the undercloud installation. https://github.com/openstack/instack-undercloud/blob/9f53f0819beb5930540965b4807fc8b347a132d7/elements/undercloud-install/os-refresh-config/post-configure.d/98-undercloud-setup
The upstream fix has been modified to handle both issues. It sets ip_forward before launching docker and also fixes the network rules on the off chance ip_forward isn't set correctly (via update/upgrade/human error).
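As an illustration of the "fix the network rules" half, here is a hypothetical Python helper that builds the two iptables commands matching the "140 destination/source ctlplane-subnet cidr nat ipv4" rules visible in the verification comment that follows. It is not the actual instack-undercloud change (which is not quoted in this thread); it only shows the shape of the rules: accept NEW forwarded connections in both directions for the masquerade network (192.168.24.0/24 in this deployment).

```python
import ipaddress
import shlex
from typing import List


def forward_accept_rules(cidr: str, name: str = "ctlplane-subnet") -> List[List[str]]:
    """Build iptables argv lists accepting NEW forwarded connections to and from `cidr`."""
    net = ipaddress.ip_network(cidr)  # ValueError for malformed or host-bit CIDRs
    if net.version != 4:
        raise ValueError("this sketch only handles IPv4 (iptables, not ip6tables)")
    rules = []
    for flag, direction in (("-d", "destination"), ("-s", "source")):
        rules.append([
            "iptables", "-A", "FORWARD", flag, str(net),
            "-m", "state", "--state", "NEW",
            "-m", "comment", "--comment", f"140 {direction} {name} cidr nat ipv4",
            "-j", "ACCEPT",
        ])
    return rules


if __name__ == "__main__":
    # Print the commands instead of running them; applying them needs root,
    # e.g. subprocess.run(argv, check=True).
    for argv in forward_accept_rules("192.168.24.0/24"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the rules explicit means forwarding for the masquerade network keeps working even if something (Docker, an update, or a human) flips the FORWARD policy back to DROP.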
Tested on puddle: 2018-04-26.3

[stack@undercloud-0 ~]$ rpm -q instack-undercloud
instack-undercloud-8.4.1-2.el7ost.noarch

sudo iptables -S | grep FORW
-P FORWARD ACCEPT
-N neutron-openvswi-FORWARD
-A FORWARD -j DOCKER-ISOLATION
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -j neutron-filter-top
-A FORWARD -j neutron-openvswi-FORWARD
-A FORWARD -d 192.168.24.0/24 -m state --state NEW -m comment --comment "140 destination ctlplane-subnet cidr nat ipv4" -j ACCEPT
-A FORWARD -s 192.168.24.0/24 -m state --state NEW -m comment --comment "140 source ctlplane-subnet cidr nat ipv4" -j ACCEPT
-A neutron-openvswi-FORWARD -m physdev --physdev-out tap21c04dd3-55 --physdev-is-bridged -m comment --comment "Accept all packets when port is trusted." -j ACCEPT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086