Bug 1430384
Summary: | OSP10 -> OSP11 upgrade on IPv6 deployment gets stuck during major-upgrade-composable-steps | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Marius Cornea <mcornea> |
Component: | openstack-tripleo-heat-templates | Assignee: | Sofer Athlan-Guyot <sathlang> |
Status: | CLOSED ERRATA | QA Contact: | Marius Cornea <mcornea> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | 11.0 (Ocata) | CC: | aschultz, dbecker, jcoufal, jschluet, mburns, mcornea, michele, morazi, rhel-osp-director-maint, sathlang |
Target Milestone: | rc | Keywords: | Triaged |
Target Release: | 11.0 (Ocata) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | openstack-tripleo-heat-templates-6.0.0-0.10.el7ost | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-05-17 20:06:03 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1394019 |
Description
Marius Cornea
2017-03-08 13:48:05 UTC
    [root@overcloud-controller-1 ~]# iptables -nL
    Chain INPUT (policy ACCEPT)
    target     prot opt source               destination
    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED
    ACCEPT     icmp --  0.0.0.0/0            0.0.0.0/0
    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            state NEW tcp dpt:22
    REJECT     all  --  0.0.0.0/0            0.0.0.0/0            reject-with icmp-host-prohibited

    Chain FORWARD (policy ACCEPT)
    target     prot opt source               destination
    REJECT     all  --  0.0.0.0/0            0.0.0.0/0            reject-with icmp-host-prohibited

    Chain OUTPUT (policy ACCEPT)
    target     prot opt source               destination
    [root@overcloud-controller-1 ~]# ip6tables -nL
    Chain INPUT (policy ACCEPT)
    target     prot opt source               destination
    ACCEPT     all      ::/0                 ::/0                 state RELATED,ESTABLISHED
    ACCEPT     icmpv6    ::/0                 ::/0
    ACCEPT     all      ::/0                 ::/0
    ACCEPT     tcp      ::/0                 ::/0                 state NEW tcp dpt:22
    ACCEPT     udp      ::/0                 fe80::/64            udp dpt:546 state NEW
    REJECT     all      ::/0                 ::/0                 reject-with icmp6-adm-prohibited

    Chain FORWARD (policy ACCEPT)
    target     prot opt source               destination
    REJECT     all      ::/0                 ::/0                 reject-with icmp6-adm-prohibited

    Chain OUTPUT (policy ACCEPT)
    target     prot opt source               destination
    [root@overcloud-controller-1 ~]#

    [stack@instack ~]$ heat resource-list overcloud -n5 | grep -v COMPLETE
    WARNING (shell) "heat resource-list" is deprecated, please use "openstack stack resource list" instead
    +----------------------------+--------------------------------------+-------------------------------------+--------------------+----------------------+------------------------------------------------------------------------------------+
    | resource_name              | physical_resource_id                 | resource_type                       | resource_status    | updated_time         | stack_name                                                                         |
    +----------------------------+--------------------------------------+-------------------------------------+--------------------+----------------------+------------------------------------------------------------------------------------+
    | AllNodesDeploySteps        | f9326dbe-bf6e-4aee-8d7b-5ee57095ad66 | OS::TripleO::PostDeploySteps        | CREATE_IN_PROGRESS | 2017-03-08T13:11:04Z | overcloud                                                                          |
    | ControllerDeployment_Step1 | 0f163e3f-70e0-404b-8004-c9d553e51edb | OS::Heat::StructuredDeploymentGroup | CREATE_IN_PROGRESS | 2017-03-08T13:26:25Z | overcloud-AllNodesDeploySteps-orrronjgqs57                                         |
    | 0                          | 717ee17b-db18-4de0-9fec-08a50a119358 | OS::Heat::StructuredDeployment      | CREATE_IN_PROGRESS | 2017-03-08T13:27:09Z | overcloud-AllNodesDeploySteps-orrronjgqs57-ControllerDeployment_Step1-bc2ca62qd4bj |
    | 1                          | b44a6bbb-1bb4-40e0-95af-60aa71e8d0a1 | OS::Heat::StructuredDeployment      | CREATE_IN_PROGRESS | 2017-03-08T13:27:09Z | overcloud-AllNodesDeploySteps-orrronjgqs57-ControllerDeployment_Step1-bc2ca62qd4bj |
    | 2                          | eb4f2387-9aeb-4009-a754-452e4ebc1528 | OS::Heat::StructuredDeployment      | CREATE_IN_PROGRESS | 2017-03-08T13:27:10Z | overcloud-AllNodesDeploySteps-orrronjgqs57-ControllerDeployment_Step1-bc2ca62qd4bj |
    +----------------------------+--------------------------------------+-------------------------------------+--------------------+----------------------+------------------------------------------------------------------------------------+

So for the record we had a similar (same) issue in M->N upgrades but with ipv4, which we worked around with:
https://github.com/openstack/tripleo-heat-templates/commit/ae8aac36143d5dadb08af0d275f513678909dcc7

But in that case we went from the firewall being off by default to applying firewall rules, which created this disruption.

The reason we have blocked traffic is likely this one:
https://bugs.launchpad.net/tripleo/+bug/1657108
(See https://tickets.puppetlabs.com/browse/MODULES-3184)

What I am not sure about, though, is why /etc/sysconfig/ip[6]tables is populated with stock rules. Those files should already be populated with the right rules after the OSP10 deployment.

Marius, would it be possible to get sosreports from controller0 after the OSP10 deployment and after the failed OSP11 upgrade?

The only theory I can think of is that you deployed OSP10 with the firewall disabled and we enabled it when moving to the OSP11 templates. Would that be a possible theory?

(In reply to Michele Baldessari from comment #2)
> So for the record we had a similar (same) issue in M->N upgrades but with
> ipv4, which we worked around with:
> https://github.com/openstack/tripleo-heat-templates/commit/ae8aac36143d5dadb08af0d275f513678909dcc7
>
> But in that case we went from the firewall being off by default to applying
> firewall rules, which created this disruption.
>
> The reason we have blocked traffic is likely this one:
> https://bugs.launchpad.net/tripleo/+bug/1657108
> (See https://tickets.puppetlabs.com/browse/MODULES-3184)
>
> What I am not sure about, though, is why /etc/sysconfig/ip[6]tables is
> populated with stock rules. Those files should already be populated with the
> right rules after the OSP10 deployment.
>
> Marius, would it be possible to get sosreports from controller0 after the
> OSP10 deployment and after the failed OSP11 upgrade?
>
> The only theory I can think of is that you deployed OSP10 with the firewall
> disabled and we enabled it when moving to the OSP11 templates. Would that be
> a possible theory?
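To illustrate why the stock rules block the upgrade traffic: the stock ip6tables INPUT chain in the description accepts only related/established traffic, ICMPv6, loopback, new SSH connections and DHCPv6 client traffic (udp dpt:546 to fe80::/64), and rejects everything else. The minimal Python sketch below (a hypothetical helper, not part of TripleO or iptables) evaluates a new inbound packet against that chain with first-match semantics. It assumes the third rule, listed as `ACCEPT all ::/0 ::/0`, is the stock loopback rule (`-i lo`), because `-nL` does not print interfaces unless `-v` is added; the `Rule`/`Packet` types, `first_match`, port 3306 and the 2001:db8:: addresses are illustrative only.

```python
import ipaddress
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Rule:
    """One INPUT rule in a simplified, hypothetical model; None matches anything."""
    target: str
    proto: Optional[str] = None
    dport: Optional[int] = None
    in_iface: Optional[str] = None
    states: Optional[FrozenSet[str]] = None
    dst_net: Optional[str] = None


@dataclass(frozen=True)
class Packet:
    proto: str
    dport: Optional[int]
    in_iface: str
    state: str  # conntrack state, e.g. "NEW" or "ESTABLISHED"
    dst: str


# INPUT chain from the ip6tables -nL listing; rule 3 is assumed to be "-i lo".
STOCK_INPUT = [
    Rule("ACCEPT", states=frozenset({"RELATED", "ESTABLISHED"})),
    Rule("ACCEPT", proto="icmpv6"),
    Rule("ACCEPT", in_iface="lo"),
    Rule("ACCEPT", proto="tcp", dport=22, states=frozenset({"NEW"})),
    Rule("ACCEPT", proto="udp", dport=546, states=frozenset({"NEW"}),
         dst_net="fe80::/64"),
    Rule("REJECT"),  # catch-all: reject-with icmp6-adm-prohibited
]


def matches(rule: Rule, pkt: Packet) -> bool:
    """True if every field the rule specifies matches the packet."""
    return ((rule.proto is None or rule.proto == pkt.proto)
            and (rule.dport is None or rule.dport == pkt.dport)
            and (rule.in_iface is None or rule.in_iface == pkt.in_iface)
            and (rule.states is None or pkt.state in rule.states)
            and (rule.dst_net is None
                 or ipaddress.ip_address(pkt.dst) in ipaddress.ip_network(rule.dst_net)))


def first_match(chain, pkt: Packet, policy: str = "ACCEPT") -> str:
    """Return the target of the first matching rule, else the chain policy."""
    for rule in chain:
        if matches(rule, pkt):
            return rule.target
    return policy


if __name__ == "__main__":
    # New TCP connection to an example service port on a non-loopback NIC.
    print(first_match(STOCK_INPUT, Packet("tcp", 3306, "eth1", "NEW", "2001:db8::10")))  # REJECT
    # A new SSH connection is still allowed by the stock rules.
    print(first_match(STOCK_INPUT, Packet("tcp", 22, "eth1", "NEW", "2001:db8::10")))    # ACCEPT
```

Under these assumptions, any new inbound IPv6 connection other than SSH or DHCPv6 falls through to the final REJECT, which is consistent with the deployment steps hanging in CREATE_IN_PROGRESS. A real check would read `ip6tables -vnL` or `ip6tables-save` output so the interface match is visible.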
I think what happens is that during the OSP10 deployment the default firewall rules are there, but the ip6tables service is not running, so they're not applied. During the upgrade the ip6tables service gets started, so the rules set in /etc/sysconfig/ip6tables get applied and block the IPv6 traffic:

http://paste.openstack.org/show/602106/

(In reply to Marius Cornea from comment #3)
> I think what happens is that during the OSP10 deployment the default
> firewall rules are there, but the ip6tables service is not running, so
> they're not applied. During the upgrade the ip6tables service gets started,
> so the rules set in /etc/sysconfig/ip6tables get applied and block the IPv6
> traffic:
>
> http://paste.openstack.org/show/602106/

Hi Marius,

thanks, yes that explains it fully. I think the bug though is that OSP10 does not have ip6tables running and with the proper rules configured, no?
Unless you are deploying OSP10 with ManageFirewall: false and then in the OSP11 upgrade you set ManageFirewall: true, in which case this problem is probably expected (although I am assuming you are not doing this).

Am I correct in assuming that we do not have https://github.com/openstack/puppet-tripleo/commit/8c990738900cd74c2c5c046435517393d1afb92e in our OSP10 puppet-tripleo packages? If you can confirm that that is indeed the case, then I think we have two options:
A) We backport it to OSP10 and that way we should not hit this issue during upgrades
B) We come up with some hack to open up ip6tables traffic at the beginning of the upgrade as it will be reinstantiated during the converge.

If you instead confirm that the patch is already in your OSP10 deployments, we should probably investigate why the rules are not populated.

Oops, I forgot I have your sosreports ;) I can confirm that in puppet-tripleo-5.5.0-3.el7ost.noarch there is no IPv6 support yet. So the problem is fully explained. Let's discuss today how we should best proceed.

(In reply to Michele Baldessari from comment #5)
> Hi Marius,
>
> thanks, yes that explains it fully. I think the bug though is that OSP10
> does not have ip6tables running and with the proper rules configured, no?
> Unless you are deploying OSP10 with ManageFirewall: false and then in the
> OSP11 upgrade you set ManageFirewall: true, in which case this problem is
> probably expected (although I am assuming you are not doing this).

During the OSP10 deployment I'm not manually setting the ManageFirewall parameter, so I guess the default one is used.

> Am I correct in assuming that we do not have
> https://github.com/openstack/puppet-tripleo/commit/8c990738900cd74c2c5c046435517393d1afb92e
> in our OSP10 puppet-tripleo packages?
> If you can confirm that that is indeed the case, then I think we
> have two options:
> A) We backport it to OSP10 and that way we should not hit this issue during
> upgrades
> B) We come up with some hack to open up ip6tables traffic at the beginning
> of the upgrade as it will be reinstantiated during the converge.

Yes, I can confirm that we don't have the patch in OSP10 (puppet-tripleo-5.5.0-4.el7ost.noarch).

> If you instead confirm that the patch is already in your OSP10 deployments,
> we should probably investigate why the rules are not populated.

Going down the road of blanking the previous rules. OSP10 IPv6 firewall support should go in another BZ, with z-stream delivery if required.

Point to the ocata branch.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1245
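The "blank the previous rules" direction chosen in this thread can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the shipped tripleo-heat-templates change: it assumes the idea is to save a copy of the stock /etc/sysconfig/ip6tables and empty the live file before the upgrade starts the ip6tables service, so no stock REJECT rule is loaded before the managed rules are reinstantiated during the converge step. The function name `blank_rules_file` and the `.pre-upgrade` backup suffix are hypothetical.

```python
import shutil
from pathlib import Path


def blank_rules_file(path: str, backup_suffix: str = ".pre-upgrade") -> bool:
    """Save a one-time copy of a saved-rules file, then empty the file.

    Hypothetical helper illustrating the "blank the previous rules" idea.
    Returns True if this call created the backup and False if a backup
    already existed, so re-running the step never overwrites the copy
    of the original rules with an already-blanked file.
    """
    src = Path(path)
    backup = src.with_name(src.name + backup_suffix)
    created = False
    if not backup.exists():
        shutil.copy2(src, backup)  # keep the stock rules for reference/rollback
        created = True
    src.write_text("")  # the file now contains no rules to restore
    return created
```

A call such as `blank_rules_file("/etc/sysconfig/ip6tables")` would run once per node, before the upgrade step that starts ip6tables. The backup-only-if-absent check is the main design choice: upgrade steps may be retried, and a naive copy on the second run would replace the saved stock rules with an empty file.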