Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1613066

Summary: Reboot of v3.10 master loses iptables rules/chains
Product: OpenShift Container Platform Reporter: Øystein Bedin <obedin>
Component: Installer    Assignee: Russell Teague <rteague>
Status: CLOSED NOTABUG QA Contact: Johnny Liu <jialiu>
Severity: high Docs Contact:
Priority: high    
Version: 3.10.0    CC: aos-bugs, jokerman, mmccomas, obedin
Target Milestone: ---   
Target Release: 3.10.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-08-08 20:40:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                       Flags
before reboot iptables output     none
after reboot iptables output      none

Description Øystein Bedin 2018-08-06 22:32:06 UTC
Description of problem:
After a fresh install of v3.10, the cluster is up and operational, with all nodes registered to the master and the web console available. However, after a reboot of the master, iptables rules are lost; as a result, nodes fail to re-register and the web console is unavailable.


Version-Release number of selected component (if applicable):
3.10.14


How reproducible:
Reboot the master of an installed v3.10 cluster


Steps to Reproduce:
1. Install a v3.10 cluster 
2. Reboot master
3. Observe that nodes are in a "NotReady" state and the web console is not available

Actual results:
See step 3 above

Expected results:
Nodes re-register after a master reboot and the web console is accessible

Additional info:

Comment 1 Øystein Bedin 2018-08-06 22:34:01 UTC
Created attachment 1473789 [details]
before reboot iptables output

Comment 2 Øystein Bedin 2018-08-06 22:34:39 UTC
Created attachment 1473790 [details]
after reboot iptables output

Comment 3 Øystein Bedin 2018-08-06 22:36:39 UTC
Note in the attached files how the 'OS_FIREWALL_ALLOW' chain is gone after the reboot, taking with it important rules such as those for ports 443, 2379, etc.

Unless I'm missing something, this seems broken to me. Given that the nodes aren't re-registering and the web console isn't accessible, I suspect this is a bug.
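A quick way to spot the missing chain is to grep the saved ruleset for its definition line. This is a hedged sketch: the inlined ruleset below is a made-up stand-in for post-reboot `iptables-save` output (capturing the real thing needs root on the master); it illustrates the broken state described above, where the chain definition is absent.

```shell
# Check whether a saved ruleset still defines the OS_FIREWALL_ALLOW chain.
# The sample below stands in for post-reboot `iptables-save` output; note
# it contains no ":OS_FIREWALL_ALLOW" chain definition line.
sample_rules=':INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]'
if printf '%s\n' "$sample_rules" | grep -q '^:OS_FIREWALL_ALLOW'; then
    echo "OS_FIREWALL_ALLOW chain present"
else
    echo "OS_FIREWALL_ALLOW chain missing"
fi
```

On a live master you would pipe `iptables-save` output through the same grep instead of the sample variable.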

Comment 4 Øystein Bedin 2018-08-07 00:31:31 UTC
Running iptables-restore fixes the issue, but I'm not sure whether it introduces other problems:

> iptables-restore < /etc/sysconfig/iptables

Comment 5 Scott Dodson 2018-08-07 14:43:45 UTC
What's the status of `iptables` service post reboot?

Comment 6 Øystein Bedin 2018-08-07 15:03:54 UTC
From what I recall, "firewalld" was not running post-install and "iptables" was active. Now, post-reboot, I see "firewalld" running and "iptables" inactive.

Comment 7 Scott Dodson 2018-08-07 17:08:21 UTC
Any chance something external is affecting that state? The service should be enabled.

https://github.com/openshift/openshift-ansible/blob/master/roles/os_firewall/tasks/iptables.yml#L30-L40

Comment 9 Øystein Bedin 2018-08-08 12:54:04 UTC
The task outlined in the link above (iptables.yml) does not run at all: I captured the output of a run, and the "Start and enable iptables service" task does not appear in the "openshift-ansible" run output.

Also, here's the state of "iptables" and "firewalld" after install, BEFORE reboot:

> systemctl status iptables
  iptables.service - IPv4 firewall with iptables
   Loaded: loaded (/usr/lib/systemd/system/iptables.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
> systemctl status firewalld
  firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: inactive (dead)

Comment 10 Scott Dodson 2018-08-08 14:07:36 UTC
Thanks, we'll look at it. If complete logs from ansible-playbook and an inventory are available, that will help explain why the tasks were skipped.

Comment 14 Russell Teague 2018-08-08 19:24:34 UTC
@Øystein,
The os_firewall role is run once as part of the prerequisites.yml playbook.  This is why you are not seeing the task mentioned above in your playbook run.  As part of an install, the prerequisites.yml playbook should be run before deploy_cluster.yml.

https://docs.openshift.com/container-platform/3.10/install/running_install.html#running-the-advanced-installation-rpm

Please run prerequisites.yml and see if this problem persists.
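For reference, the install order then looks like this. This is a sketch assuming the RPM-installed playbook paths described in the linked documentation; the inventory path is a placeholder to adjust for your environment.

```shell
# Run prerequisites first (this is where the os_firewall role configures
# and enables the iptables service), then the main deploy.
ansible-playbook -i /path/to/inventory \
    /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml
ansible-playbook -i /path/to/inventory \
    /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml
```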

Comment 15 Øystein Bedin 2018-08-08 19:41:54 UTC
@Russell - ahh, yes, that's probably it. I will give it another try with prerequisites before deployment. Sorry for missing that part.

Comment 16 Øystein Bedin 2018-08-08 20:30:10 UTC
@Russell + @Scott - thank you, that was it.