Bug 1525357

Summary: [DOCS] Should run prerequisites.yml playbook before deploy_cluster.yml
Product: OpenShift Container Platform
Reporter: Gaoyun Pei <gpei>
Component: Documentation
Assignee: Alex Dellapenta <adellape>
Status: CLOSED CURRENTRELEASE
QA Contact: Gaoyun Pei <gpei>
Severity: urgent
Docs Contact: Vikram Goyal <vigoyal>
Priority: urgent
Version: 3.8.0
CC: aos-bugs, chaoyang, cshereme, ghuang, jhou, jialiu, jokerman, jpazdziora, mgugino, mlamouri, mmccomas, weshi, wjiang, wmeng, xtian
Target Milestone: ---
Target Release: 3.9.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-03-28 19:23:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1556744
Bug Blocks: 1523047, 1525317, 1547348, 1547688

Description Gaoyun Pei 2017-12-13 07:33:59 UTC
Description of problem:
The OCP 3.8 installation failed as shown below:

TASK [etcd : Set hostname and ip facts] *************************************************************************************************************************************
ok: [ec2-54-197-34-241.compute-1.amazonaws.com]

TASK [etcd : Add iptables allow rules] **************************************************************************************************************************************
failed: [ec2-54-197-34-241.compute-1.amazonaws.com] (item={u'port': u'2379/tcp', u'service': u'etcd'}) => {"changed": false, "failed": true, "item": {"port": "2379/tcp", "service": "etcd"}, "module_stderr": "Shared connection to ec2-54-197-34-241.compute-1.amazonaws.com closed.\r\n", "module_stdout": "iptables: Bad rule (does a matching rule exist in that chain?).\r\nTraceback (most recent call last):\r\n  File \"/tmp/ansible_xz5Mn7/ansible_module_os_firewall_manage_iptables.py\", line 283, in <module>\r\n    main()\r\n  File \"/tmp/ansible_xz5Mn7/ansible_module_os_firewall_manage_iptables.py\", line 266, in main\r\n    iptables_manager.add_rule(port, protocol)\r\n  File \"/tmp/ansible_xz5Mn7/ansible_module_os_firewall_manage_iptables.py\", line 98, in add_rule\r\n    self.save()\r\n  File \"/tmp/ansible_xz5Mn7/ansible_module_os_firewall_manage_iptables.py\", line 73, in save\r\n    self.output.append(subprocess.check_output(self.save_cmd, stderr=subprocess.STDOUT))\r\n  File \"/usr/lib64/python2.7/subprocess.py\", line 568, in check_output\r\n    process = Popen(stdout=PIPE, *popenargs, **kwargs)\r\n  File \"/usr/lib64/python2.7/subprocess.py\", line 711, in __init__\r\n    errread, errwrite)\r\n  File \"/usr/lib64/python2.7/subprocess.py\", line 1327, in _execute_child\r\n    raise child_exception\r\nOSError: [Errno 2] No such file or directory\r\nChain OS_FIREWALL_ALLOW (1 references)\r\ntarget     prot opt source               destination         \r\n", "msg": "MODULE FAILURE", "rc": 0}
failed: [ec2-54-197-34-241.compute-1.amazonaws.com] (item={u'port': u'2380/tcp', u'service': u'etcd peering'}) => {"changed": false, "failed": true, "item": {"port": "2380/tcp", "service": "etcd peering"}, "module_stderr": "Shared connection to ec2-54-197-34-241.compute-1.amazonaws.com closed.\r\n", "module_stdout": "iptables: Bad rule (does a matching rule exist in that chain?).\r\nTraceback (most recent call last):\r\n  File \"/tmp/ansible_v3R2cC/ansible_module_os_firewall_manage_iptables.py\", line 283, in <module>\r\n    main()\r\n  File \"/tmp/ansible_v3R2cC/ansible_module_os_firewall_manage_iptables.py\", line 266, in main\r\n    iptables_manager.add_rule(port, protocol)\r\n  File \"/tmp/ansible_v3R2cC/ansible_module_os_firewall_manage_iptables.py\", line 98, in add_rule\r\n    self.save()\r\n  File \"/tmp/ansible_v3R2cC/ansible_module_os_firewall_manage_iptables.py\", line 73, in save\r\n    self.output.append(subprocess.check_output(self.save_cmd, stderr=subprocess.STDOUT))\r\n  File \"/usr/lib64/python2.7/subprocess.py\", line 568, in check_output\r\n    process = Popen(stdout=PIPE, *popenargs, **kwargs)\r\n  File \"/usr/lib64/python2.7/subprocess.py\", line 711, in __init__\r\n    errread, errwrite)\r\n  File \"/usr/lib64/python2.7/subprocess.py\", line 1327, in _execute_child\r\n    raise child_exception\r\nOSError: [Errno 2] No such file or directory\r\nChain OS_FIREWALL_ALLOW (1 references)\r\ntarget     prot opt source               destination         \r\nACCEPT     tcp  --  anywhere             anywhere             state NEW tcp dpt:2379\r\n", "msg": "MODULE FAILURE", "rc": 0}

NO MORE HOSTS LEFT **********************************************************************************************************************************************************
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry




Version-Release number of the following components:
openshift-ansible-3.8.18-1.git.140.794367f.el7.noarch.rpm
ansible 2.4.1.0-1.el7

How reproducible:
100%

Steps to Reproduce:
1. Prepare the Ansible inventory file, then start the installation:
# ansible-playbook -i host /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml


Actual results:

Expected results:

Additional info:

Comment 2 Gaoyun Pei 2017-12-13 08:58:01 UTC
Adding the TestBlocker keyword because this is blocking 3.8 cluster setup.

Comment 3 Johnny Liu 2017-12-14 02:18:36 UTC
This bug also occurs on 3.9; hopefully we can get a quick fix for it.

Comment 4 weiwei jiang 2017-12-18 07:55:15 UTC
# python 
Python 2.7.5 (default, May  3 2017, 07:55:04) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-14)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess
>>> subprocess.check_output(["/usr/libexec/iptables/iptables.init","save"], stderr=subprocess.STDOUT)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/subprocess.py", line 568, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory


# yum whatprovides /usr/libexec/iptables/iptables.init
Loaded plugins: product-id, search-disabled-repos, subscription-manager
This system is not registered with an entitlement server. You can use subscription-manager to register.
aos38-devel-install/filelists                                                                                                                                                               | 457 kB  00:00:00     
fast-datapath/filelists_db                                                                                                                                                                  |  28 kB  00:00:00     
rhel7/filelists_db                                                                                                                                                                          |  30 MB  00:00:13     
rhel7-extra/filelists_db                                                                                                                                                                    | 431 kB  00:00:00     
iptables-services-1.4.21-13.el7.x86_64 : iptables and ip6tables services for iptables
Repo        : rhel7
Matched from:
Filename    : /usr/libexec/iptables/iptables.init



iptables-services-1.4.21-16.el7.x86_64 : iptables and ip6tables services for iptables
Repo        : rhel7
Matched from:
Filename    : /usr/libexec/iptables/iptables.init



iptables-services-1.4.21-17.el7.x86_64 : iptables and ip6tables services for iptables
Repo        : rhel7
Matched from:
Filename    : /usr/libexec/iptables/iptables.init



iptables-services-1.4.21-18.el7.x86_64 : iptables and ip6tables services for iptables
Repo        : rhel7
Matched from:
Filename    : /usr/libexec/iptables/iptables.init



iptables-services-1.4.21-18.2.el7_4.x86_64 : iptables and ip6tables services for iptables
Repo        : rhel7
Matched from:
Filename    : /usr/libexec/iptables/iptables.init
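
The yum output above shows that the save script is shipped by iptables-services, so one plausible reading of the traceback is that the script is simply absent on the host. A minimal sketch of a pre-check follows; the helper name missing_executables is hypothetical and is not part of openshift-ansible.

```python
import os


def missing_executables(paths):
    """Return the paths that are not executable regular files."""
    return [p for p in paths
            if not (os.path.isfile(p) and os.access(p, os.X_OK))]


# Path taken from the traceback above; the result depends on the host.
print(missing_executables(["/usr/libexec/iptables/iptables.init"]))
```

An empty list means the script is present and executable; otherwise the list names what is missing.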

Comment 5 Michael Gugino 2017-12-18 16:41:44 UTC
We have changed the installation process.

openshift-ansible/playbooks/prerequisites.yml must be run before installing a new cluster.  This playbook is not necessary for existing cluster operations or upgrades.

Comment 6 Mark Lamourine 2017-12-18 23:14:11 UTC
I'm getting the same error in the openstack playbooks. The iptables RPM is installed. The "file not found" is returned from iptables when trying to create a rule in a nonexistent table.

I can find in os_firewall_manage_iptables.py where there is code to create the OS_FIREWALL_ALLOW table, but I can't find where this would be invoked.

I suspect that the return code 2 is being interpreted as File Not Found when it is really "table not found".
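
One way to test that suspicion, sketched here under the assumption that the module calls subprocess.check_output as the traceback shows: Python reports a non-zero exit status as CalledProcessError, while OSError with errno 2 is raised only when the program cannot be started. The classify helper and the /nonexistent path are hypothetical and used only for illustration.

```python
import errno
import subprocess
import sys


def classify(cmd):
    """Run cmd and report how subprocess.check_output signals failure."""
    try:
        subprocess.check_output(cmd, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as exc:
        # The program started and exited with a non-zero status.
        return "exit status %d" % exc.returncode
    except OSError as exc:
        # The program could not be started at all.
        if exc.errno == errno.ENOENT:
            return "executable not found"
        raise
    return "ok"


# Hypothetical path, used only to trigger the missing-executable case.
missing = classify(["/nonexistent/iptables.init", "save"])
# A child process that starts normally and exits with status 2.
exit_two = classify([sys.executable, "-c", "import sys; sys.exit(2)"])
print(missing)
print(exit_two)
```

If this holds for the module as well, the "[Errno 2]" in the traceback would point at a missing save command rather than at an iptables exit status.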


-----
failed: [master-0.openshift.example.com] (item={u'port': u'2380/tcp', u'service': u'etcd peering'}) => {"changed": false, "failed": true, "item": {"port": "2380/tcp", "service": "etcd peering"}, "module_stderr": "iptables: Bad rule (does a matching rule exist in that chain?).\niptables: No chain/target/match by that name.\nTraceback (most recent call last):\n  File \"/tmp/ansible_IDrehZ/ansible_module_os_firewall_manage_iptables.py\", line 283, in <module>\n    main()\n  File \"/tmp/ansible_IDrehZ/ansible_module_os_firewall_manage_iptables.py\", line 266, in main\n    iptables_manager.add_rule(port, protocol)\n  File \"/tmp/ansible_IDrehZ/ansible_module_os_firewall_manage_iptables.py\", line 88, in add_rule\n    self.verify_chain()\n  File \"/tmp/ansible_IDrehZ/ansible_module_os_firewall_manage_iptables.py\", line 83, in verify_chain\n    self.create_jump()\n  File \"/tmp/ansible_IDrehZ/ansible_module_os_firewall_manage_iptables.py\", line 178, in create_jump\n    self.save()\n  File \"/tmp/ansible_IDrehZ/ansible_module_os_firewall_manage_iptables.py\", line 73, in save\n    self.output.append(subprocess.check_output(self.save_cmd, stderr=subprocess.STDOUT))\n  File \"/usr/lib64/python2.7/subprocess.py\", line 568, in check_output\n    process = Popen(stdout=PIPE, *popenargs, **kwargs)\n  File \"/usr/lib64/python2.7/subprocess.py\", line 711, in __init__\n    errread, errwrite)\n  File \"/usr/lib64/python2.7/subprocess.py\", line 1327, in _execute_child\n    raise child_exception\nOSError: [Errno 2] No such file or directory\n", "module_stdout": "Chain OS_FIREWALL_ALLOW (0 references)\ntarget     prot opt source               destination         \n", "msg": "MODULE FAILURE", "rc": 1}

item={u'port': u'2380/tcp', u'service': u'etcd peering'}

iptables: Bad rule (does a matching rule exist in that chain?).
iptables: No chain/target/match by that name.
Traceback (most recent call last):
  File "/tmp/ansible_IDrehZ/ansible_module_os_firewall_manage_iptables.py", line 283, in <module>
    main()
  File "/tmp/ansible_IDrehZ/ansible_module_os_firewall_manage_iptables.py", line 266, in main
    iptables_manager.add_rule(port, protocol)
  File "/tmp/ansible_IDrehZ/ansible_module_os_firewall_manage_iptables.py", line 88, in add_rule
    self.verify_chain()
  File "/tmp/ansible_IDrehZ/ansible_module_os_firewall_manage_iptables.py", line 83, in verify_chain
    self.create_jump()
  File "/tmp/ansible_IDrehZ/ansible_module_os_firewall_manage_iptables.py", line 178, in create_jump
    self.save()
  File "/tmp/ansible_IDrehZ/ansible_module_os_firewall_manage_iptables.py", line 73, in save
    self.output.append(subprocess.check_output(self.save_cmd, stderr=subprocess.STDOUT))
  File "/usr/lib64/python2.7/subprocess.py", line 568, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Chain OS_FIREWALL_ALLOW (0 references)
target     prot opt source               destination

Comment 7 Scott Dodson 2017-12-19 14:20:21 UTC
Mike,

Let's make sure we have a docs PR and a release note item that makes it clear one must run the prerequisites playbook first.

Comment 8 Scott Dodson 2017-12-19 14:21:04 UTC
https://github.com/openshift/openshift-docs/issues/6458 release notes tracker

Comment 9 Michael Gugino 2017-12-19 14:35:39 UTC
I have a docs PR open here: https://github.com/openshift/openshift-docs/pull/6668

Not sure if that is the appropriate place for that information.

Comment 10 Mark Lamourine 2017-12-19 14:55:10 UTC
A docs PR isn't going to work for cloud providers. The "prerequisite" has to be run after the instances are created, but before they are used.

Comment 11 Gaoyun Pei 2017-12-20 07:34:13 UTC
After running the prerequisites playbook first, this issue disappeared. I will mark this bug as verified once the related PR is merged.

Here's the installation sequence now:

Prepare the Ansible inventory file, then run the prerequisites.yml playbook:

# ansible-playbook -i host /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml

Then run the deploy_cluster.yml playbook:

# ansible-playbook -i host /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml
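
The same two-step sequence could be scripted so that the deploy step never runs if the prerequisites step fails. This is only a sketch: the function names install_commands and install are hypothetical, and the inventory name "host" and the playbook directory are taken from the commands above.

```python
import subprocess

# Playbook directory as used in the commands above.
PLAYBOOK_DIR = "/usr/share/ansible/openshift-ansible/playbooks"


def install_commands(inventory):
    """Return the two ansible-playbook invocations in the required order."""
    return [
        ["ansible-playbook", "-i", inventory, PLAYBOOK_DIR + "/" + name]
        for name in ("prerequisites.yml", "deploy_cluster.yml")
    ]


def install(inventory):
    """Run each playbook in turn, stopping at the first failure."""
    for cmd in install_commands(inventory):
        # check_call raises CalledProcessError on a non-zero exit status,
        # so deploy_cluster.yml is not started if prerequisites.yml fails.
        subprocess.check_call(cmd)
```

Calling install("host") would run both playbooks against the inventory file named host.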

Comment 12 Scott Dodson 2018-01-25 15:35:48 UTC
Relevant docs PR
https://github.com/openshift/openshift-docs/pull/6668

Comment 13 Scott Dodson 2018-01-26 13:22:46 UTC
The PR from comment #12 should address this via documentation.