Bug 1507225

Summary: OSP11 -> OSP12 upgrade: undercloud upgrade removes the '125 heat ipv4' iptables rule making the overcloud nodes unable to reach the heat_api_cfn service during upgrade
Product: Red Hat OpenStack Reporter: Marius Cornea <mcornea>
Component: puppet-firewall Assignee: RHOS Maint <rhos-maint>
Status: CLOSED CURRENTRELEASE QA Contact: nlevinki <nlevinki>
Severity: medium Docs Contact:
Priority: medium    
Version: 12.0 (Pike) CC: aschultz, ccamacho, dbecker, ekuris, emacchi, jjoyce, jschluet, mandreou, mburns, mcornea, morazi, rhel-osp-director-maint, sathlang, slinaber, tvignaud
Target Milestone: --- Keywords: Triaged, ZStream
Target Release: 12.0 (Pike)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-01-09 18:34:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1516634, 1516952    
Attachments:
Description                      Flags
undercloud upgrade output        none
/etc/sysconfig/iptables.save     none

Description Marius Cornea 2017-10-28 13:18:40 UTC
Created attachment 1344747 [details]
undercloud upgrade output

Description of problem:

OSP11 -> OSP12 upgrade: undercloud upgrade removes the '125 heat ipv4' iptables rule making the overcloud nodes unable to reach the heat_api_cfn service during upgrade. This results in the upgrade process getting stuck when running 

## before upgrade
[stack@undercloud-0 ~]$ sudo iptables -nL | grep heat
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            multiport dports 8004 /* 100 heat_api_haproxy ipv4 */ state NEW
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            multiport dports 13004 /* 100 heat_api_haproxy_ssl ipv4 */ state NEW
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            multiport dports 8000,13800,8003,13003,8004,13004 /* 125 heat ipv4 */ state NEW

[stack@undercloud-0 ~]$ sudo yum -y update python-tripleoclient python-openstackclient
[stack@undercloud-0 ~]$ openstack undercloud upgrade
[..]
## after upgrade
[stack@undercloud-0 ~]$ sudo iptables -nL | grep heat
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            multiport dports 8004 /* 100 heat_api_haproxy ipv4 */ state NEW
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            multiport dports 13004 /* 100 heat_api_haproxy_ssl ipv4 */ state NEW


Version-Release number of selected component (if applicable):
instack-7.0.1-1.el7ost.noarch
instack-undercloud-7.4.2-0.20171010064304.el7ost.noarch


How reproducible:
100% on deployments with custom roles

Steps to Reproduce:
1. Deploy OSP11 with 3cont_3db_3msg_2net_2comp_3ceph
2. Upgrade undercloud to OSP12
3. Check iptables rules on undercloud

Actual results:
There is no rule allowing access to port 8000

Expected results:
Access is allowed to port 8000
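
The check in step 3 can be scripted. The following is a minimal sketch, assuming the `iptables -nL` output format shown in the description above. The function name `accepted_ports` is hypothetical, and only the `multiport dports` and `dpt:` forms are handled.

```python
import re


def accepted_ports(iptables_nl_output):
    """Collect destination ports covered by ACCEPT rules in `iptables -nL` output."""
    ports = set()
    for line in iptables_nl_output.splitlines():
        if not line.startswith("ACCEPT"):
            continue
        # multiport rules list ports as "8000,13800" or as a range "8000:8010".
        specs = re.findall(r"multiport dports (\S+)", line)
        # Single-port rules are printed as "dpt:8000".
        specs += re.findall(r"\bdpt:(\d+)", line)
        for spec in specs:
            for item in spec.split(","):
                low, _, high = item.partition(":")
                ports.update(range(int(low), int(high or low) + 1))
    return ports


# Sample lines copied from the "before" and "after" listings in this report.
BEFORE = """\
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            multiport dports 8004 /* 100 heat_api_haproxy ipv4 */ state NEW
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            multiport dports 8000,13800,8003,13003,8004,13004 /* 125 heat ipv4 */ state NEW
"""
AFTER = """\
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            multiport dports 8004 /* 100 heat_api_haproxy ipv4 */ state NEW
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            multiport dports 13004 /* 100 heat_api_haproxy_ssl ipv4 */ state NEW
"""

print(8000 in accepted_ports(BEFORE))  # True
print(8000 in accepted_ports(AFTER))   # False
```

On a live undercloud the text would come from `sudo iptables -nL` instead of the embedded samples.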

Additional info:
Not sure how this is related, but this is not reproducible on setups with fewer nodes (3 controllers, 2 computes, 3 ceph).

Attaching the undercloud upgrade output and log.

Comment 1 Marius Cornea 2017-10-28 13:34:09 UTC
Created attachment 1344760 [details]
/etc/sysconfig/iptables.save

Comment 2 Marius Cornea 2017-10-28 13:37:51 UTC
In /etc/sysconfig/iptables.save:

the '125 heat ipv4' line got replaced by:

-A INPUT -p tcp -m multiport --dports 8977,13779 -m state --state NEW -m comment --comment "143 panko-api ipv4" -j ACCEPT

which is a duplicate of line 151:
 
-A INPUT -p tcp -m multiport --dports 8779,13779 -m comment --comment "143 panko-api ipv4" -m state --state NEW -j ACCEPT
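
Duplicated comments like the one above can be spotted mechanically. This is a rough sketch, assuming the `iptables-save` line format quoted in this comment; `duplicated_comments` is a hypothetical helper name, and the sample text reuses only rule lines that appear in this report.

```python
import re
from collections import Counter

# iptables-save quotes comments containing spaces and leaves single words bare.
COMMENT_RE = re.compile(r'--comment\s+(?:"([^"]*)"|(\S+))')


def duplicated_comments(iptables_save_text):
    """Return {comment: count} for comments carried by more than one rule."""
    counts = Counter()
    for line in iptables_save_text.splitlines():
        if not line.startswith("-A "):
            continue  # skip table headers, chain policies and COMMIT lines
        match = COMMENT_RE.search(line)
        if match:
            counts[match.group(1) or match.group(2)] += 1
    return {comment: n for comment, n in counts.items() if n > 1}


SAMPLE = """\
-A INPUT -p tcp -m multiport --dports 8977,13779 -m state --state NEW -m comment --comment "143 panko-api ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 8787,13787 -m comment --comment "138 docker registry ipv4" -m state --state NEW -j ACCEPT
-A INPUT -p tcp -m multiport --dports 8779,13779 -m comment --comment "143 panko-api ipv4" -m state --state NEW -j ACCEPT
"""

print(duplicated_comments(SAMPLE))  # {'143 panko-api ipv4': 2}
```

A comment that shows up twice after an upgrade is a hint that one of the two rules has overwritten something else.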

Comment 6 Sofer Athlan-Guyot 2017-11-03 12:58:16 UTC
Hi,

The issue has been in the puppet-firewall module since forever.  To trigger it you need:

 - firewall rules that are not managed by puppet;
 - an update to a puppet-managed rule that comes after those unmanaged ones.

The net effect is that puppet wrongly updates a “random” rule.

The fix is either:
 - fix puppet-firewall (in progress);
 - make sure that all your “outside puppet” firewall rules:
   * have no comment at all;
   * or have a comment that matches “\d{3}: \S+”, like “800: blah”
     (make sure your numbering doesn’t collide with the numbers used by
     tripleo; currently 8XX is safe).

The last two bullet points can serve as a workaround.  Those are modifications you should make *before* the undercloud update.

External tracker for the full story:

 - https://github.com/puppetlabs/puppetlabs-firewall/pull/729
 - https://tickets.puppetlabs.com/browse/MODULES-5924?filter=-2

Comment 7 Sofer Athlan-Guyot 2017-11-03 13:24:22 UTC
I was not that clear in the previous comment.  Let me rephrase it with an example.

If you add, outside of puppet, INPUT rules whose comment is "Infrared: vbmc ports" 16 times, then puppet will update the 80th rule instead of updating the 96th rule as it misses them in its bookkeeping.  As a net result you have overwritten the 80th rule and still have the old rule in the 96th position.

That may be worth a doc bug:

"If you add custom iptables rules on the undercloud, make sure that either:
 - their comment is empty;
 - or their comment follows the "8\d\d: blah" style."
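
The off-by-16 effect in the example above can be illustrated with a toy model. This is only a sketch of the described symptom, not the provider's actual logic: it assumes that rules whose comment does not start with digits are simply left out of the position count. The names `positions` and `starts_with_digits` and the "managed rule" comments are made up for the illustration.

```python
def positions(chain, target, is_tracked):
    """Return (actual, computed) 1-based positions of `target` in `chain`.

    Toy model: rules for which `is_tracked` is false are not counted, so the
    computed position is too small by the number of such rules before it.
    """
    actual = chain.index(target) + 1
    skipped = sum(1 for comment in chain[: actual - 1] if not is_tracked(comment))
    return actual, actual - skipped


def starts_with_digits(comment):
    """Assumed bookkeeping test: the comment begins with a digit."""
    return comment[:1].isdigit()


# 79 numbered rules, then 16 rules added outside puppet, then the rule to update.
chain = (
    [f"{i:03d} managed rule" for i in range(1, 80)]
    + ["Infrared: vbmc ports"] * 16
    + ["143 panko-api ipv4"]
)

print(positions(chain, "143 panko-api ipv4", starts_with_digits))  # (96, 80)
```

The rule sits at position 96, but the model computes 80, so the 80th rule gets overwritten while the 96th keeps its old content.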

Comment 8 Sofer Athlan-Guyot 2017-11-06 10:01:22 UTC
Hi,

slight correction.  You have to use

 - "8\d\d blah" -> no colon.

(8 is only a currently free range)

The actual regex is there[1].  

[1] https://github.com/puppetlabs/puppetlabs-firewall/blob/master/lib/puppet/provider/firewall/iptables.rb#L613
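
Before the undercloud update, existing rules can be screened for comments that do not follow the form given above. This is a minimal sketch that assumes the safe form is three digits, whitespace, then text, as stated in this comment. The regex at the linked line is authoritative and may accept more. `risky_rules` is a hypothetical helper, and the first and third sample rules are invented for illustration (only their comment text or lack of one matters).

```python
import re

# Assumption: "8\d\d blah" generalised to any three-digit prefix, no colon.
NUMBERED_COMMENT_RE = re.compile(r"^\d{3}\s+\S")
# iptables-save quotes comments containing spaces and leaves single words bare.
COMMENT_RE = re.compile(r'--comment\s+(?:"([^"]*)"|(\S+))')


def risky_rules(iptables_save_text):
    """Return rule lines whose comment exists but lacks the numeric prefix."""
    risky = []
    for line in iptables_save_text.splitlines():
        match = COMMENT_RE.search(line)
        if not match:
            continue  # rules without any comment are described as safe
        comment = match.group(1) if match.group(1) is not None else match.group(2)
        if not NUMBERED_COMMENT_RE.match(comment):
            risky.append(line)
    return risky


SAMPLE = """\
-A INPUT -p udp -m udp --dport 6230 -m comment --comment "Infrared: vbmc ports" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 8787,13787 -m comment --comment "138 docker registry ipv4" -m state --state NEW -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
"""

for rule in risky_rules(SAMPLE):
    print(rule)
```

With the sample text only the "Infrared: vbmc ports" rule is reported. Such rules would need their comment removed or renamed to something like "800 vbmc ports".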

Comment 12 Marius Cornea 2017-11-23 09:40:11 UTC
We hit this bug on an environment with a lower number of nodes (5), where the iptables rule for port 8787 got replaced by the panko-api one. Workaround:

 sudo iptables -I INPUT -p tcp -m multiport --dports 8787,13787 -m comment --comment "138 docker registry ipv4" -m state --state NEW -j ACCEPT

Comment 15 Jakub Libosvar 2017-11-29 12:06:54 UTC
*** Bug 1516634 has been marked as a duplicate of this bug. ***

Comment 17 Marius Cornea 2017-11-30 17:59:26 UTC
Removing the blocker flag as this bug shows up only on particular test environments.