Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1613066

Summary: Reboot of v3.10 master loses iptables rules/chains
Product: OpenShift Container Platform Reporter: Øystein Bedin <obedin>
Component: Installer    Assignee: Russell Teague <rteague>
Status: CLOSED NOTABUG QA Contact: Johnny Liu <jialiu>
Severity: high Docs Contact:
Priority: high    
Version: 3.10.0    CC: aos-bugs, jokerman, mmccomas, obedin
Target Milestone: ---   
Target Release: 3.10.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-08-08 20:40:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                       Flags
before reboot iptables output     none
after reboot iptables output      none

Description Øystein Bedin 2018-08-06 22:32:06 UTC
Description of problem:
After a fresh install of v3.10, the cluster is up and operational, with all nodes registered to the master and the web console available. However, after a reboot of the master, iptables rules are lost; as a result, nodes fail to re-register and the web console is unavailable.


Version-Release number of selected component (if applicable):
3.10.14


How reproducible:
Reboot the master of an installed v3.10 cluster


Steps to Reproduce:
1. Install a v3.10 cluster 
2. Reboot master
3. Observe that nodes are in a "NotReady" state and the web console is not available

Actual results:
See step 3 above

Expected results:
Nodes re-register after a master reboot and the web console is accessible

Additional info:

Comment 1 Øystein Bedin 2018-08-06 22:34:01 UTC
Created attachment 1473789 [details]
before reboot iptables output

Comment 2 Øystein Bedin 2018-08-06 22:34:39 UTC
Created attachment 1473790 [details]
after reboot iptables output

Comment 3 Øystein Bedin 2018-08-06 22:36:39 UTC
Note in the attached files how the 'OS_FIREWALL_ALLOW' chain is gone after the reboot, taking with it important rules such as those for ports 443, 2379, etc.

Unless I'm missing something, this seems broken to me. Given that the nodes aren't re-registering and the web console isn't accessible, I suspect this is a bug.
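A quick way to spot the missing chain is to grep the saved ruleset for its definition line. This is a hedged sketch: the inlined ruleset below is a made-up stand-in for post-reboot `iptables-save` output (capturing the real thing needs root on the master); it illustrates the broken state described above, where the chain definition is absent.

```shell
# Check whether a saved ruleset still defines the OS_FIREWALL_ALLOW chain.
# The sample below stands in for post-reboot `iptables-save` output; note
# it contains no ":OS_FIREWALL_ALLOW" chain definition line.
sample_rules=':INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]'
if printf '%s\n' "$sample_rules" | grep -q '^:OS_FIREWALL_ALLOW'; then
    echo "OS_FIREWALL_ALLOW chain present"
else
    echo "OS_FIREWALL_ALLOW chain missing"
fi
```

On a live master you would pipe `iptables-save` output through the same grep instead of the sample variable.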

Comment 4 Øystein Bedin 2018-08-07 00:31:31 UTC
Running iptables-restore fixes the issue, but I'm not sure whether it introduces other problems:

> iptables-restore < /etc/sysconfig/iptables

Comment 5 Scott Dodson 2018-08-07 14:43:45 UTC
What's the status of `iptables` service post reboot?

Comment 6 Øystein Bedin 2018-08-07 15:03:54 UTC
From what I recall, "firewalld" was not running post-install and "iptables" was active. Now, post-reboot, I see "firewalld" running and "iptables" inactive.

Comment 7 Scott Dodson 2018-08-07 17:08:21 UTC
Any chance something external is affecting that state? The service should be enabled.

https://github.com/openshift/openshift-ansible/blob/master/roles/os_firewall/tasks/iptables.yml#L30-L40

Comment 9 Øystein Bedin 2018-08-08 12:54:04 UTC
The task outlined in the link above (iptables.yml) does not run at all: I captured the output of a run, and the "Start and enable iptables service" task does not appear in the "openshift-ansible" run output.

Also, here's the state of "iptables" and "firewalld" after install, BEFORE reboot:

> systemctl status iptables
  iptables.service - IPv4 firewall with iptables
   Loaded: loaded (/usr/lib/systemd/system/iptables.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
> systemctl status firewalld
  firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: inactive (dead)

Comment 10 Scott Dodson 2018-08-08 14:07:36 UTC
Thanks, we'll look at it. If complete logs from ansible-playbook and an inventory are available, that will help explain why the tasks were skipped.

Comment 14 Russell Teague 2018-08-08 19:24:34 UTC
@Øystein,
The os_firewall role is run once as part of the prerequisites.yml playbook.  This is why you are not seeing the task mentioned above in your playbook run.  As part of an install, the prerequisites.yml playbook should be run before deploy_cluster.yml.

https://docs.openshift.com/container-platform/3.10/install/running_install.html#running-the-advanced-installation-rpm

Please run prerequisites.yml and see if this problem persists.
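For reference, the install order then looks like this. This is a sketch assuming the RPM-installed playbook paths described in the linked documentation; the inventory path is a placeholder to adjust for your environment.

```shell
# Run prerequisites first (this is where the os_firewall role configures
# and enables the iptables service), then the main deploy.
ansible-playbook -i /path/to/inventory \
    /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml
ansible-playbook -i /path/to/inventory \
    /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml
```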

Comment 15 Øystein Bedin 2018-08-08 19:41:54 UTC
@Russell - ahh, yes, that's probably it. I will give it another try with prerequisites before deployment. Sorry for missing that part.

Comment 16 Øystein Bedin 2018-08-08 20:30:10 UTC
@Russell + @Scott - thank you, that was it.