Bug 1421032

Summary: Ansible Redeploy certificates changes iptables/firewalld configuration
Product: OpenShift Container Platform
Reporter: Ruben Romero Montes <rromerom>
Component: Installer
Assignee: Scott Dodson <sdodson>
Status: CLOSED CURRENTRELEASE
QA Contact: Johnny Liu <jialiu>
Severity: low
Priority: low
Version: 3.4.1
CC: aos-bugs, jokerman, mmccomas, pep, rromerom
Last Closed: 2017-06-12 11:49:33 UTC
Type: Bug

Description Ruben Romero Montes 2017-02-10 07:19:46 UTC
Description of problem:
After redeploying the certificates with Ansible, the cluster ends up with iptables enabled and firewalld masked.
The playbook execution doesn't finish because of a communication error between the node and the master.
The master becomes inaccessible on port 8443.

Version-Release number of selected component (if applicable):
openshift v3.4.1.2
kubernetes v1.4.0+776c994
openshift-ansible-3.4.56-1.git.0.7ba9968.el7.noarch

How reproducible: I tried twice


Steps to Reproduce:
ansible-playbook -i /etc/ansible/hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-certificates.yml --extra-vars "openshift_certificates_redeploy_ca=true"

Actual results:
iptables active
firewalld masked

TASK [restart node] ************************************************************
fatal: [node.example.com]: FAILED! => {
    "changed": false, 
    "failed": true
}

MSG:

Unable to restart service atomic-openshift-node: Job for atomic-openshift-node.service failed because the control process exited with error code. See "systemctl status atomic-openshift-node.service" and "journalctl -xe" for details.


Expected results:
iptables masked
firewalld active
# iptables -nvL | grep 8443
243 14580 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8443 ctstate NEW
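
The expected rule can also be checked from a script. The following is a minimal sketch, assuming the `iptables -nvL` column layout shown above; the function name `has_accept_rule` is hypothetical and not part of openshift-ansible.

```python
def has_accept_rule(iptables_output, port, proto="tcp"):
    """Return True if the given `iptables -nvL` output contains an ACCEPT rule for the port.

    Assumes the usual -nvL columns:
    pkts bytes target prot opt in out source destination [match options...]
    """
    needle = "dpt:%d" % port
    for line in iptables_output.splitlines():
        fields = line.split()
        # Skip chain headers, column headers and rules without match options.
        if len(fields) < 10:
            continue
        # Match options start after the destination column (index 8).
        if fields[2] == "ACCEPT" and fields[3] == proto and needle in fields[9:]:
            return True
    return False
```

The output of `iptables -nvL` would be passed in as a string; the match is on the whole `dpt:<port>` token, so port 844 does not match a rule for 8443.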


Additional info:
Ansible output related to iptables changes

2017-02-09 12:48:24,404 p=11054 u=root |  TASK [os_firewall : Install firewalld packages] ********************************
2017-02-09 12:48:24,437 p=11054 u=root |  skipping: [master.example.com]
2017-02-09 12:48:24,454 p=11054 u=root |  TASK [os_firewall : Ensure iptables services are not enabled] ******************
2017-02-09 12:48:24,492 p=11054 u=root |  skipping: [master.example.com] => (item=iptables) 
2017-02-09 12:48:24,514 p=11054 u=root |  skipping: [master.example.com] => (item=ip6tables) 
2017-02-09 12:48:24,532 p=11054 u=root |  TASK [os_firewall : Start and enable firewalld service] ************************
2017-02-09 12:48:24,567 p=11054 u=root |  skipping: [master.example.com]
2017-02-09 12:48:24,585 p=11054 u=root |  TASK [os_firewall : need to pause here, otherwise the firewalld service starting can sometimes cause ssh to fail] ***
2017-02-09 12:48:24,616 p=11054 u=root |  skipping: [master.example.com]
2017-02-09 12:48:24,639 p=11054 u=root |  TASK [os_firewall : Add firewalld allow rules] *********************************
2017-02-09 12:48:24,697 p=11054 u=root |  TASK [os_firewall : Remove firewalld allow rules] ******************************
2017-02-09 12:48:24,748 p=11054 u=root |  TASK [os_firewall : Ensure firewalld service is not enabled] *******************
2017-02-09 12:48:25,701 p=11054 u=root |  changed: [master.example.com]
2017-02-09 12:48:25,714 p=11054 u=root |  TASK [os_firewall : Install iptables packages] *********************************
2017-02-09 12:48:26,757 p=11054 u=root |  ok: [master.example.com] => (item=[u'iptables', u'iptables-services'])
2017-02-09 12:48:26,774 p=11054 u=root |  TASK [os_firewall : Start and enable iptables service] *************************
2017-02-09 12:48:27,127 p=11054 u=root |  changed: [master.example.com]
2017-02-09 12:48:27,144 p=11054 u=root |  TASK [os_firewall : need to pause here, otherwise the iptables service starting can sometimes cause ssh to fail] ***
2017-02-09 12:48:27,201 p=11054 u=root |  Pausing for 10 seconds
2017-02-09 12:48:27,201 p=11054 u=root |  (ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
2017-02-09 12:48:37,235 p=11054 u=root |  ok: [master.example.com]
2017-02-09 12:48:37,262 p=11054 u=root |  TASK [os_firewall : Add iptables allow rules] **********************************
2017-02-09 12:48:37,304 p=11054 u=root |  TASK [os_firewall : Remove iptables rules] *************************************
2017-02-09 12:48:37,343 p=11054 u=root |  TASK [docker : Get current installed Docker version] ***************************
2017-02-09 12:48:38,315 p=11054 u=root |  ok: [master.example.com]
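
To see at a glance which firewall tasks actually modified the host, the log can be filtered for `os_firewall` tasks that reported `changed`. This is a minimal sketch, assuming the log line format shown above (`<date> <time> p=<pid> u=<user> |  <message>`); the function name `changed_tasks` is hypothetical.

```python
import re

# Log line layout as in the excerpt above: "<date> <time> p=<pid> u=<user> |  <message>"
LINE_RE = re.compile(r"^\S+ \S+ p=\d+ u=\S+ \|\s+(?P<msg>.*)$")
TASK_RE = re.compile(r"^TASK \[(?P<name>[^\]]+)\]")
CHANGED_RE = re.compile(r"^changed: \[(?P<host>[^\]]+)\]")


def changed_tasks(log_text, role="os_firewall"):
    """Return (task_name, host) pairs for tasks of `role` that reported 'changed'."""
    results = []
    current_task = None
    for raw in log_text.splitlines():
        line = LINE_RE.match(raw)
        if not line:
            continue  # not a log line in the assumed format
        msg = line.group("msg")
        task = TASK_RE.match(msg)
        if task:
            # Remember the most recent task header; results below belong to it.
            current_task = task.group("name")
            continue
        changed = CHANGED_RE.match(msg)
        if changed and current_task and current_task.startswith(role + " :"):
            results.append((current_task, changed.group("host")))
    return results
```

Applied to the excerpt above, this would report the two tasks that changed the master: "Ensure firewalld service is not enabled" and "Start and enable iptables service".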

Comment 1 Johnny Liu 2017-02-10 10:40:40 UTC
As far as I know, the official 3.4 openshift-ansible installer only supports iptables [1]; firewalld will be supported in 3.5. A 3.4 install always enables iptables and masks firewalld, so this should be expected behavior.

How did you set up the environment using firewalld? Or, after setting up the environment, did you manually mask the iptables service, enable firewalld, and add the iptables rules via firewalld, so that redeploying the certificates enabled iptables and masked firewalld again? Is that your case? If so, are you opening this bug to ask that the installer not touch the user's customized settings (firewalld enabled, iptables masked) when redeploying certificates?

[1]: https://github.com/openshift/openshift-ansible/blob/openshift-ansible-3.4.60-1/roles/os_firewall/defaults/main.yml#L7

Comment 2 Ruben Romero Montes 2017-02-10 14:04:15 UTC
Hi Johnny,

Yes, that is my situation. I installed OpenShift using the community playbooks and then ran the OpenShift 3.4 playbooks on the master.

I now know why I started having the problems. But, as you suggested, I don't think the certificates playbook should try to align the iptables/firewalld configuration.

Is there any reason for that?

Thanks
Ruben

Comment 3 Scott Dodson 2017-06-09 04:12:03 UTC
This should no longer be the case, as we reverted the change that made firewalld the default. Can you please try with the latest code?

Comment 4 Ruben Romero Montes 2017-06-12 11:49:33 UTC
Hi Scott,

I will not be able to test it because of time constraints. But if you say this change has been reverted we can close it as "CURRENTRELEASE".

Thank you for your help.