Bug 1448420 - OSP9 -> OSP10 upgrade fails because httpd is unable to start
Summary: OSP9 -> OSP10 upgrade fails because httpd is unable to start
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: z3
Target Release: 10.0 (Newton)
Assignee: Marios Andreou
QA Contact: Marius Cornea
URL:
Whiteboard:
Duplicates: 1455640
Depends On:
Blocks:
Reported: 2017-05-05 12:05 UTC by Marius Cornea
Modified: 2017-06-28 14:50 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-5.2.0-16.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-28 14:50:50 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 463529 | 0 | None | MERGED | [Newton only] - Manually touch ssl.conf before installing mod_ssl | 2020-02-27 21:09:40 UTC
Red Hat Product Errata | RHBA-2017:1585 | 0 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 10 director Bug Fix Advisory | 2017-06-28 18:42:51 UTC

Description Marius Cornea 2017-05-05 12:05:25 UTC
Description of problem:
OSP9 -> OSP10 upgrade fails because httpd is unable to start:

[stack@undercloud-0 ~]$ openstack stack failures list overcloud
overcloud.UpdateWorkflow.ControllerPacemakerUpgradeDeployment_Step4.0:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: 12dc669e-03a8-4d41-be88-a32305a8bb4e
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
  deploy_stdout: |
    ...
    Fri May  5 11:45:28 UTC 2017 69d25c64-db27-450a-836d-3850761bac87 tripleo-upgrade controller-0 Node is bootstrap checking rabbitmq to be started here
    rabbitmq has started
    Fri May  5 11:45:38 UTC 2017 69d25c64-db27-450a-836d-3850761bac87 tripleo-upgrade controller-0 Starting or enabling redis
    Fri May  5 11:45:38 UTC 2017 69d25c64-db27-450a-836d-3850761bac87 tripleo-upgrade controller-0 Going to pcs resource enable redis
    Fri May  5 11:45:40 UTC 2017 69d25c64-db27-450a-836d-3850761bac87 tripleo-upgrade controller-0 Node is bootstrap checking redis to be started here
    redis has started
    Fri May  5 11:45:42 UTC 2017 69d25c64-db27-450a-836d-3850761bac87 tripleo-upgrade controller-0 Starting or enabling openstack-cinder-volume
    Fri May  5 11:45:42 UTC 2017 69d25c64-db27-450a-836d-3850761bac87 tripleo-upgrade controller-0 Going to pcs resource enable openstack-cinder-volume
    Fri May  5 11:45:44 UTC 2017 69d25c64-db27-450a-836d-3850761bac87 tripleo-upgrade controller-0 Node is bootstrap checking openstack-cinder-volume to be started here
    openstack-cinder-volume has started
    (truncated, view all with --long)
  deploy_stderr: |
    Job for httpd.service failed because the control process exited with error code. See "systemctl status httpd.service" and "journalctl -xe" for details.


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-5.2.0-15.el7ost.noarch

How reproducible:
2/2

Steps to Reproduce:
1. Deploy OSP9 latest
2. Upgrade to OSP10 latest

Actual results:
Upgrade fails during major-upgrade-pacemaker.yaml

Expected results:
Upgrade completes ok.

Additional info:

HAProxy is already binding on 443:

[root@controller-0 heat-admin]# netstat -tupan | grep 443
tcp        0      0 172.17.1.10:443         0.0.0.0:*               LISTEN      25337/haproxy       
tcp        0      0 10.0.0.101:443          0.0.0.0:*               LISTEN      25337/haproxy       

[root@controller-0 heat-admin]# httpd
[Fri May 05 12:02:57.924999 2017] [so:warn] [pid 20373] AH01574: module access_compat_module is already loaded, skipping
[Fri May 05 12:02:57.925181 2017] [so:warn] [pid 20373] AH01574: module actions_module is already loaded, skipping
[Fri May 05 12:02:57.925188 2017] [so:warn] [pid 20373] AH01574: module alias_module is already loaded, skipping
[Fri May 05 12:02:57.925265 2017] [so:warn] [pid 20373] AH01574: module auth_basic_module is already loaded, skipping
[Fri May 05 12:02:57.925272 2017] [so:warn] [pid 20373] AH01574: module auth_digest_module is already loaded, skipping
[Fri May 05 12:02:57.925293 2017] [so:warn] [pid 20373] AH01574: module authn_anon_module is already loaded, skipping
[Fri May 05 12:02:57.925298 2017] [so:warn] [pid 20373] AH01574: module authn_core_module is already loaded, skipping
[Fri May 05 12:02:57.925372 2017] [so:warn] [pid 20373] AH01574: module authn_dbm_module is already loaded, skipping
[Fri May 05 12:02:57.925379 2017] [so:warn] [pid 20373] AH01574: module authn_file_module is already loaded, skipping
[Fri May 05 12:02:57.925457 2017] [so:warn] [pid 20373] AH01574: module authz_core_module is already loaded, skipping
[Fri May 05 12:02:57.925533 2017] [so:warn] [pid 20373] AH01574: module authz_dbm_module is already loaded, skipping
[Fri May 05 12:02:57.925540 2017] [so:warn] [pid 20373] AH01574: module authz_groupfile_module is already loaded, skipping
[Fri May 05 12:02:57.925549 2017] [so:warn] [pid 20373] AH01574: module authz_host_module is already loaded, skipping
[Fri May 05 12:02:57.925556 2017] [so:warn] [pid 20373] AH01574: module authz_owner_module is already loaded, skipping
[Fri May 05 12:02:57.925563 2017] [so:warn] [pid 20373] AH01574: module authz_user_module is already loaded, skipping
[Fri May 05 12:02:57.925569 2017] [so:warn] [pid 20373] AH01574: module autoindex_module is already loaded, skipping
[Fri May 05 12:02:57.925575 2017] [so:warn] [pid 20373] AH01574: module cache_module is already loaded, skipping
[Fri May 05 12:02:57.925828 2017] [so:warn] [pid 20373] AH01574: module deflate_module is already loaded, skipping
[Fri May 05 12:02:57.925840 2017] [so:warn] [pid 20373] AH01574: module dir_module is already loaded, skipping
[Fri May 05 12:02:57.925974 2017] [so:warn] [pid 20373] AH01574: module env_module is already loaded, skipping
[Fri May 05 12:02:57.925980 2017] [so:warn] [pid 20373] AH01574: module expires_module is already loaded, skipping
[Fri May 05 12:02:57.925987 2017] [so:warn] [pid 20373] AH01574: module ext_filter_module is already loaded, skipping
[Fri May 05 12:02:57.925992 2017] [so:warn] [pid 20373] AH01574: module filter_module is already loaded, skipping
[Fri May 05 12:02:57.926078 2017] [so:warn] [pid 20373] AH01574: module include_module is already loaded, skipping
[Fri May 05 12:02:57.926193 2017] [so:warn] [pid 20373] AH01574: module log_config_module is already loaded, skipping
[Fri May 05 12:02:57.926201 2017] [so:warn] [pid 20373] AH01574: module logio_module is already loaded, skipping
[Fri May 05 12:02:57.926206 2017] [so:warn] [pid 20373] AH01574: module mime_magic_module is already loaded, skipping
[Fri May 05 12:02:57.926212 2017] [so:warn] [pid 20373] AH01574: module mime_module is already loaded, skipping
[Fri May 05 12:02:57.926217 2017] [so:warn] [pid 20373] AH01574: module negotiation_module is already loaded, skipping
[Fri May 05 12:02:57.926377 2017] [so:warn] [pid 20373] AH01574: module rewrite_module is already loaded, skipping
[Fri May 05 12:02:57.926384 2017] [so:warn] [pid 20373] AH01574: module setenvif_module is already loaded, skipping
[Fri May 05 12:02:57.926768 2017] [so:warn] [pid 20373] AH01574: module status_module is already loaded, skipping
[Fri May 05 12:02:57.926777 2017] [so:warn] [pid 20373] AH01574: module substitute_module is already loaded, skipping
[Fri May 05 12:02:57.926783 2017] [so:warn] [pid 20373] AH01574: module suexec_module is already loaded, skipping
[Fri May 05 12:02:57.926860 2017] [so:warn] [pid 20373] AH01574: module unixd_module is already loaded, skipping
[Fri May 05 12:02:57.926951 2017] [so:warn] [pid 20373] AH01574: module version_module is already loaded, skipping
[Fri May 05 12:02:57.926961 2017] [so:warn] [pid 20373] AH01574: module vhost_alias_module is already loaded, skipping
[Fri May 05 12:02:57.926984 2017] [so:warn] [pid 20373] AH01574: module dav_module is already loaded, skipping
[Fri May 05 12:02:57.926990 2017] [so:warn] [pid 20373] AH01574: module dav_fs_module is already loaded, skipping
[Fri May 05 12:02:57.927850 2017] [so:warn] [pid 20373] AH01574: module mpm_prefork_module is already loaded, skipping
[Fri May 05 12:02:57.936800 2017] [so:warn] [pid 20373] AH01574: module systemd_module is already loaded, skipping
[Fri May 05 12:02:57.936903 2017] [so:warn] [pid 20373] AH01574: module cgi_module is already loaded, skipping
[Fri May 05 12:02:57.943869 2017] [alias:warn] [pid 20373] AH00671: The Alias directive in /etc/httpd/conf.d/autoindex.conf at line 21 will probably never match because it overlaps an earlier Alias.
(98)Address already in use: AH00073: make_sock: unable to listen for connections on address [::]:443
(98)Address already in use: AH00073: make_sock: unable to listen for connections on address 0.0.0.0:443
no listening sockets available, shutting down
AH00015: Unable to open logs

Comment 1 Marios Andreou 2017-05-05 14:00:58 UTC
I think this is related to https://bugzilla.redhat.com/show_bug.cgi?id=1441977 and fixed by the stable/newton backport at https://review.openstack.org/#/c/460560/2, which landed 2 days ago. @mcornea can you check whether you had this in the environment? For example, running grep -rn "include ::apache::mod::ssl" /usr/share/openstack-puppet/modules/tripleo/manifests/profile/base/keystone.pp on any of the overcloud nodes should tell us.

Comment 2 Marius Cornea 2017-05-05 15:03:16 UTC
(In reply to marios from comment #1)
> i think this is related to
> https://bugzilla.redhat.com/show_bug.cgi?id=1441977 and fixed by the
> stable/newton backport at https://review.openstack.org/#/c/460560/2 - it
> landed 2 days ago. @mcornea can you check if you had this in the
> environment? grep -rn "include ::apache::mod::ssl"
> /usr/share/openstack-puppet/modules/tripleo/manifests/profile/base/keystone.pp
> for example on any of the overcloud nodes should tell us.

[root@controller-0 heat-admin]# grep -rn "include ::apache::mod::ssl" /usr/share/openstack-puppet/modules/tripleo/manifests/profile/base/keystone.pp
90:    include ::apache::mod::ssl

[root@controller-0 heat-admin]# rpm -qa | grep puppet-tripleo
puppet-tripleo-5.5.0-12.el7ost.noarch

Comment 3 Marios Andreou 2017-05-05 15:44:20 UTC
Hey Marius, thanks for giving me access to the box. The env does already have https://review.openstack.org/#/c/460560/ [0]. I still think it is related, though, and we may be missing the other part of the fix for BZ 1441977 (https://review.openstack.org/#/c/460555) on OSP9.

It's a bit confusing because the tripleo-heat-templates on the undercloud (openstack-tripleo-heat-templates-5.2.0-15.el7ost.noarch) *do* have the relevant "touch ssl.conf" line, but they are the latest for OSP10. This 'fix'/workaround needs to happen during the minor update, so the fix is needed in OSP9 first, and afaics we do not have this workaround in the latest OSP9 tripleo-heat-templates.

I believe it goes something like this: during the update we touch ssl.conf to prevent the update of mod_ssl from creating /etc/httpd/conf.d/ssl.conf, since that file contains a "Listen 443" line that causes the conflict. If we touch it before the update, it won't get created/updated by mod_ssl. I think Lucas/Sofer can validate my understanding, as they worked on the related bug. I'll revisit early next week.

For testing, you'd need to run the minor update with https://review.openstack.org/#/c/460555/2 included before doing the major upgrade with https://review.openstack.org/#/c/458033/ (which is already in OSP10 and this env afaics).

thanks. 
   
    
[0] 
[root@controller-0 httpd]# grep -rn "include ::apache::mod::ssl" /usr/share/openstack-puppet/modules/tripleo/manifests/profile/base/
/usr/share/openstack-puppet/modules/tripleo/manifests/profile/base/keystone.pp:90:    include ::apache::mod::ssl
/usr/share/openstack-puppet/modules/tripleo/manifests/profile/base/aodh/api.pp:40:    include ::apache::mod::ssl
/usr/share/openstack-puppet/modules/tripleo/manifests/profile/base/ceilometer/api.pp:33:    include ::apache::mod::ssl
/usr/share/openstack-puppet/modules/tripleo/manifests/profile/base/gnocchi/api.pp:53:    include ::apache::mod::ssl

Comment 5 Marios Andreou 2017-05-09 09:37:10 UTC
FYI/more info as I'm looking at this some more today: the actual problem here is that the overcloud nodes have an ssl.conf with an uncommented 'Listen 443' in it, causing the conflict as in BZ 1441977. From the environment when I checked on Friday:

[root@controller-0 httpd]# grep Listen /etc/httpd/conf.d/ssl.conf 
Listen 443 https

It's not clear to me why you didn't hit this issue on the minor update, or how you ended up with the file created and the previous stack update operation completed. But this ^^^ (ssl.conf with Listen) can be prevented by https://review.openstack.org/#/c/460555/. For now you can even try sudo mv /etc/httpd/conf.d/ssl.conf /etc/httpd/conf.d/ssl.conf.BACKUP and re-run the upgrade. Since we have https://review.openstack.org/#/c/460560/ in the environment already, it should re-create the file without that Listen line in it.
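To check a node for the condition described above before re-running the upgrade, something like the following could be used. This is only a sketch: has_active_listen_443 is a hypothetical helper name, not something shipped in tripleo-heat-templates, and the regular expression is an assumption about which Listen forms matter here (a bare port, or an address followed by :443).

```shell
#!/bin/sh
# Hypothetical helper (not part of tripleo-heat-templates): succeed when the
# given Apache config file has an uncommented Listen directive for port 443.
has_active_listen_443() {
    conf_file="$1"
    [ -f "$conf_file" ] || return 1
    # Matches "Listen 443", "Listen 443 https", "Listen 0.0.0.0:443" and
    # "Listen [::]:443"; skips commented lines and other ports such as 8443.
    grep -Eiq '^[[:space:]]*listen[[:space:]]+([^[:space:]]+:)?443([[:space:]]|$)' "$conf_file"
}

# Example on a controller (path taken from this bug):
#   has_active_listen_443 /etc/httpd/conf.d/ssl.conf && echo "ssl.conf binds 443"
```

The function only reads the file, so it is safe to run on every overcloud node; moving the file aside remains a separate, manual step.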

Comment 6 Marius Cornea 2017-05-09 09:43:11 UTC
(In reply to marios from comment #5)
> FYI/more info as I'm looking at this some more today - the actual problem
> here is that the overcloud nodes have an ssl.conf with an uncommented
> 'Listen 443' in it causing the conflict as in BZ 1441977 - from the
> environment when I checked on Friday like:
> 
> [root@controller-0 httpd]# grep Listen /etc/httpd/conf.d/ssl.conf 
> Listen 443 https
> 
> It's not clear to me why you didn't hit this issue on minor update and how
> you ended up with the file created and the previous stack update operation
> completed. But this ^^^ (ssl.conf with Listen) can be prevented by
> https://review.openstack.org/#/c/460555/ - for now you can even try sudo mv
> /etc/httpd/conf.d/ssl.conf /etc/httpd/conf.d/ssl.conf.BACKUP; and re-run the
> upgrade. Since we have https://review.openstack.org/#/c/460560/ in the
> environment already it should re-create it without that listen line in it.

I didn't run minor update - I started with latest OSP9 and then upgraded to OSP10.

Comment 7 Marios Andreou 2017-05-09 10:05:22 UTC
@mcornea I just found the OSP9 clone for the ssl.conf issue at BZ 1446289 and added a comment there.

OK, I think I understand a bit better now. On OSP9 you didn't have mod_ssl, or at least shouldn't have, as per the discussion on BZ 1446289. However, we landed https://review.openstack.org/#/c/461060/, which will install mod_ssl on the mitaka to newton upgrade (as here). I saw that mod_ssl was in fact installed in the env on Friday:

[root@controller-0 httpd]# grep ssl /var/log/yum.log 
May 05 14:50:08 Updated: erlang-ssl-18.3.4.4-1.el7ost.x86_64
May 05 14:54:14 Installed: 1:mod_ssl-2.4.6-45.el7_3.4.x86_64

I think the fix then could be adding a 'touch ssl.conf' before the installation of mod_ssl.

So, to confirm this, mcornea, before running the upgrade, can you:

1. Confirm you don't have mod_ssl installed on OSP9.
2. Manually 'create' the ssl.conf file with "touch /etc/httpd/conf.d/ssl.conf" on all the overcloud nodes (well, the controllers really, or wherever httpd is running, but it shouldn't hurt everywhere).

Once we confirm we can add that into the upgrade script for stable/newton.
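As a rough sketch of step 2 above, the manual workaround could be wrapped like this. ensure_ssl_conf_placeholder is a hypothetical name used only for illustration; the default directory is the one quoted in this bug, and the premise that installing mod_ssl leaves an existing ssl.conf alone is the working assumption from this discussion, not something the snippet verifies.

```shell
#!/bin/sh
# Hypothetical helper sketching the manual workaround: make sure an ssl.conf
# exists before mod_ssl is installed. An existing file is left untouched.
ensure_ssl_conf_placeholder() {
    conf_dir="${1:-/etc/httpd/conf.d}"   # default path taken from this bug
    conf_file="$conf_dir/ssl.conf"
    if [ -e "$conf_file" ]; then
        echo "exists: $conf_file"
    else
        touch "$conf_file" && echo "created: $conf_file"
    fi
}

# Assumed usage on an overcloud node, as root:
#   rpm -q mod_ssl                 # step 1: check whether mod_ssl is installed
#   ensure_ssl_conf_placeholder    # step 2: create the empty placeholder
```

Taking the directory as an argument keeps the helper testable against a scratch directory before it is pointed at a real node.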

Comment 8 Marios Andreou 2017-05-09 10:07:37 UTC
(In reply to Marius Cornea from comment #6)
> (In reply to marios from comment #5)
...

> > It's not clear to me why you didn't hit this issue on minor update and how
> > you ended up with the file created and the previous stack update operation
> > completed. But this ^^^ (ssl.conf with Listen) can be prevented by
> > https://review.openstack.org/#/c/460555/ - for now you can even try sudo mv
> > /etc/httpd/conf.d/ssl.conf /etc/httpd/conf.d/ssl.conf.BACKUP; and re-run the
> > upgrade. Since we have https://review.openstack.org/#/c/460560/ in the
> > environment already it should re-create it without that listen line in it.
> 
> I didn't run minor update - I started with latest OSP9 and then upgraded to
> OSP10.


ACK yeah I understand a bit better... I think it is a case we missed with all the mod_ssl workarounds for the different branches. 

Here we start with OSP9 without mod_ssl, and during the upgrade we actually install it (see comment #7), which creates the ssl.conf with the problematic Listen 443. I am hoping we can prevent that by doing a touch on the file before running the upgrade, even if we don't have mod_ssl installed at that point.

Comment 9 Marios Andreou 2017-05-09 11:42:41 UTC
@mcornea I posted this for stable/newton: https://review.openstack.org/#/c/463529/ - can you try it (unless you've already started a manual verification) ... it just adds the touch before the yum install for mod_ssl.


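    # Back up the installed templates, then apply the proposed change in place.
    sudo cp -r /usr/share/openstack-tripleo-heat-templates /usr/share/openstack-tripleo-heat-templates.ORIG
    # The URL is quoted so the shell does not treat "?" as a glob character.
    curl "https://review.openstack.org/changes/463529/revisions/current/patch?download" | \
        base64 -d | sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1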


should do it unless there are merge conflicts

Comment 10 Marius Cornea 2017-05-09 12:40:58 UTC
(In reply to marios from comment #9)
> @mcornea I posted this for stable/newton:
> https://review.openstack.org/#/c/463529/ - can you try it (unless you've
> already started a manual verification) ... it just adds the touch before the
> yum install for mod_ssl.
> 
> 
>     sudo cp -r /usr/share/openstack-tripleo-heat-templates /usr/share/openstack-tripleo-heat-templates.ORIG
>     curl https://review.openstack.org/changes/463529/revisions/current/patch?download | \
>         base64 -d | sudo patch  -d /usr/share/openstack-tripleo-heat-templates/ -p1
> 
> 
> should do it unless there are merge conflicts

Manually creating an empty ssl.conf before starting the upgrade worked, and major-upgrade-pacemaker.yaml completed fine.

Comment 14 Lukas Bezdicka 2017-06-21 10:46:32 UTC
*** Bug 1455640 has been marked as a duplicate of this bug. ***

Comment 16 errata-xmlrpc 2017-06-28 14:50:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1585

