Description of problem:
OSP9 -> OSP10 upgrade fails because httpd is unable to start:

[stack@undercloud-0 ~]$ openstack stack failures list overcloud
overcloud.UpdateWorkflow.ControllerPacemakerUpgradeDeployment_Step4.0:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: 12dc669e-03a8-4d41-be88-a32305a8bb4e
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
  deploy_stdout: |
    ...
    Fri May 5 11:45:28 UTC 2017 69d25c64-db27-450a-836d-3850761bac87 tripleo-upgrade controller-0 Node is bootstrap checking rabbitmq to be started here
    rabbitmq has started
    Fri May 5 11:45:38 UTC 2017 69d25c64-db27-450a-836d-3850761bac87 tripleo-upgrade controller-0 Starting or enabling redis
    Fri May 5 11:45:38 UTC 2017 69d25c64-db27-450a-836d-3850761bac87 tripleo-upgrade controller-0 Going to pcs resource enable redis
    Fri May 5 11:45:40 UTC 2017 69d25c64-db27-450a-836d-3850761bac87 tripleo-upgrade controller-0 Node is bootstrap checking redis to be started here
    redis has started
    Fri May 5 11:45:42 UTC 2017 69d25c64-db27-450a-836d-3850761bac87 tripleo-upgrade controller-0 Starting or enabling openstack-cinder-volume
    Fri May 5 11:45:42 UTC 2017 69d25c64-db27-450a-836d-3850761bac87 tripleo-upgrade controller-0 Going to pcs resource enable openstack-cinder-volume
    Fri May 5 11:45:44 UTC 2017 69d25c64-db27-450a-836d-3850761bac87 tripleo-upgrade controller-0 Node is bootstrap checking openstack-cinder-volume to be started here
    openstack-cinder-volume has started
    (truncated, view all with --long)
  deploy_stderr: |
    Job for httpd.service failed because the control process exited with error code. See "systemctl status httpd.service" and "journalctl -xe" for details.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-5.2.0-15.el7ost.noarch

How reproducible:
2/2

Steps to Reproduce:
1. Deploy OSP9 latest
2. Upgrade to OSP10 latest

Actual results:
Upgrade fails during major-upgrade-pacemaker.yaml

Expected results:
Upgrade completes ok.

Additional info:
HAProxy is already binding on 443:

[root@controller-0 heat-admin]# netstat -tupan | grep 443
tcp 0 0 172.17.1.10:443 0.0.0.0:* LISTEN 25337/haproxy
tcp 0 0 10.0.0.101:443 0.0.0.0:* LISTEN 25337/haproxy

[root@controller-0 heat-admin]# httpd
[Fri May 05 12:02:57.924999 2017] [so:warn] [pid 20373] AH01574: module access_compat_module is already loaded, skipping
[Fri May 05 12:02:57.925181 2017] [so:warn] [pid 20373] AH01574: module actions_module is already loaded, skipping
[Fri May 05 12:02:57.925188 2017] [so:warn] [pid 20373] AH01574: module alias_module is already loaded, skipping
[Fri May 05 12:02:57.925265 2017] [so:warn] [pid 20373] AH01574: module auth_basic_module is already loaded, skipping
[Fri May 05 12:02:57.925272 2017] [so:warn] [pid 20373] AH01574: module auth_digest_module is already loaded, skipping
[Fri May 05 12:02:57.925293 2017] [so:warn] [pid 20373] AH01574: module authn_anon_module is already loaded, skipping
[Fri May 05 12:02:57.925298 2017] [so:warn] [pid 20373] AH01574: module authn_core_module is already loaded, skipping
[Fri May 05 12:02:57.925372 2017] [so:warn] [pid 20373] AH01574: module authn_dbm_module is already loaded, skipping
[Fri May 05 12:02:57.925379 2017] [so:warn] [pid 20373] AH01574: module authn_file_module is already loaded, skipping
[Fri May 05 12:02:57.925457 2017] [so:warn] [pid 20373] AH01574: module authz_core_module is already loaded, skipping
[Fri May 05 12:02:57.925533 2017] [so:warn] [pid 20373] AH01574: module authz_dbm_module is already loaded, skipping
[Fri May 05 12:02:57.925540 2017] [so:warn] [pid 20373] AH01574: module authz_groupfile_module is already loaded, skipping
[Fri May 05 12:02:57.925549 2017] [so:warn] [pid 20373] AH01574: module authz_host_module is already loaded, skipping
[Fri May 05 12:02:57.925556 2017] [so:warn] [pid 20373] AH01574: module authz_owner_module is already loaded, skipping
[Fri May 05 12:02:57.925563 2017] [so:warn] [pid 20373] AH01574: module authz_user_module is already loaded, skipping
[Fri May 05 12:02:57.925569 2017] [so:warn] [pid 20373] AH01574: module autoindex_module is already loaded, skipping
[Fri May 05 12:02:57.925575 2017] [so:warn] [pid 20373] AH01574: module cache_module is already loaded, skipping
[Fri May 05 12:02:57.925828 2017] [so:warn] [pid 20373] AH01574: module deflate_module is already loaded, skipping
[Fri May 05 12:02:57.925840 2017] [so:warn] [pid 20373] AH01574: module dir_module is already loaded, skipping
[Fri May 05 12:02:57.925974 2017] [so:warn] [pid 20373] AH01574: module env_module is already loaded, skipping
[Fri May 05 12:02:57.925980 2017] [so:warn] [pid 20373] AH01574: module expires_module is already loaded, skipping
[Fri May 05 12:02:57.925987 2017] [so:warn] [pid 20373] AH01574: module ext_filter_module is already loaded, skipping
[Fri May 05 12:02:57.925992 2017] [so:warn] [pid 20373] AH01574: module filter_module is already loaded, skipping
[Fri May 05 12:02:57.926078 2017] [so:warn] [pid 20373] AH01574: module include_module is already loaded, skipping
[Fri May 05 12:02:57.926193 2017] [so:warn] [pid 20373] AH01574: module log_config_module is already loaded, skipping
[Fri May 05 12:02:57.926201 2017] [so:warn] [pid 20373] AH01574: module logio_module is already loaded, skipping
[Fri May 05 12:02:57.926206 2017] [so:warn] [pid 20373] AH01574: module mime_magic_module is already loaded, skipping
[Fri May 05 12:02:57.926212 2017] [so:warn] [pid 20373] AH01574: module mime_module is already loaded, skipping
[Fri May 05 12:02:57.926217 2017] [so:warn] [pid 20373] AH01574: module negotiation_module is already loaded, skipping
[Fri May 05 12:02:57.926377 2017] [so:warn] [pid 20373] AH01574: module rewrite_module is already loaded, skipping
[Fri May 05 12:02:57.926384 2017] [so:warn] [pid 20373] AH01574: module setenvif_module is already loaded, skipping
[Fri May 05 12:02:57.926768 2017] [so:warn] [pid 20373] AH01574: module status_module is already loaded, skipping
[Fri May 05 12:02:57.926777 2017] [so:warn] [pid 20373] AH01574: module substitute_module is already loaded, skipping
[Fri May 05 12:02:57.926783 2017] [so:warn] [pid 20373] AH01574: module suexec_module is already loaded, skipping
[Fri May 05 12:02:57.926860 2017] [so:warn] [pid 20373] AH01574: module unixd_module is already loaded, skipping
[Fri May 05 12:02:57.926951 2017] [so:warn] [pid 20373] AH01574: module version_module is already loaded, skipping
[Fri May 05 12:02:57.926961 2017] [so:warn] [pid 20373] AH01574: module vhost_alias_module is already loaded, skipping
[Fri May 05 12:02:57.926984 2017] [so:warn] [pid 20373] AH01574: module dav_module is already loaded, skipping
[Fri May 05 12:02:57.926990 2017] [so:warn] [pid 20373] AH01574: module dav_fs_module is already loaded, skipping
[Fri May 05 12:02:57.927850 2017] [so:warn] [pid 20373] AH01574: module mpm_prefork_module is already loaded, skipping
[Fri May 05 12:02:57.936800 2017] [so:warn] [pid 20373] AH01574: module systemd_module is already loaded, skipping
[Fri May 05 12:02:57.936903 2017] [so:warn] [pid 20373] AH01574: module cgi_module is already loaded, skipping
[Fri May 05 12:02:57.943869 2017] [alias:warn] [pid 20373] AH00671: The Alias directive in /etc/httpd/conf.d/autoindex.conf at line 21 will probably never match because it overlaps an earlier Alias.
(98)Address already in use: AH00073: make_sock: unable to listen for connections on address [::]:443
(98)Address already in use: AH00073: make_sock: unable to listen for connections on address 0.0.0.0:443
no listening sockets available, shutting down
AH00015: Unable to open logs
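To confirm which process already holds port 443 (the cause of the AH00073 `make_sock` errors above), the `netstat -tupan` lines can be filtered programmatically. This is a minimal Python sketch, not part of any TripleO tooling: `listeners_on_port` is a hypothetical helper, and it assumes the column layout shown in the transcript (local address in the fourth column, state in the sixth, `PID/program` in the seventh).

```python
from __future__ import annotations

import re

# One TCP LISTEN line of `netstat -tupan`, for example:
#   tcp 0 0 172.17.1.10:443 0.0.0.0:* LISTEN 25337/haproxy
# The local address may itself contain colons (IPv6), so the port is
# whatever follows the last ":" in the fourth column.
LISTEN_LINE = re.compile(
    r"^tcp6?\s+\d+\s+\d+\s+(?P<addr>\S+):(?P<port>\d+)\s+\S+\s+LISTEN\s+"
    r"(?P<pid>\d+)/(?P<prog>\S+)"
)


def listeners_on_port(netstat_output: str, port: int) -> list[tuple[str, int, str]]:
    """Return (local_address, pid, program) for each TCP socket listening on `port`."""
    hits = []
    for line in netstat_output.splitlines():
        match = LISTEN_LINE.match(line.strip())
        if match and int(match.group("port")) == port:
            hits.append((match.group("addr"), int(match.group("pid")), match.group("prog")))
    return hits


if __name__ == "__main__":
    sample = (
        "tcp 0 0 172.17.1.10:443 0.0.0.0:* LISTEN 25337/haproxy\n"
        "tcp 0 0 10.0.0.101:443 0.0.0.0:* LISTEN 25337/haproxy\n"
    )
    for addr, pid, prog in listeners_on_port(sample, 443):
        print(f"{prog} (pid {pid}) already listens on {addr}:443")
```

On a controller, `subprocess.run(["netstat", "-tupan"], capture_output=True, text=True).stdout` could supply the input; `ss -tlnp` prints a different layout and would need its own pattern.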
I think this is related to https://bugzilla.redhat.com/show_bug.cgi?id=1441977 and fixed by the stable/newton backport at https://review.openstack.org/#/c/460560/2, which landed 2 days ago. @mcornea can you check whether you had this in the environment? For example, running this on any of the overcloud nodes should tell us:

grep -rn "include ::apache::mod::ssl" /usr/share/openstack-puppet/modules/tripleo/manifests/profile/base/keystone.pp
(In reply to marios from comment #1)
> i think this is related to
> https://bugzilla.redhat.com/show_bug.cgi?id=1441977 and fixed by the
> stable/newton backport at https://review.openstack.org/#/c/460560/2 - it
> landed 2 days ago. @mcornea can you check if you had this in the
> environment? grep -rn "include ::apache::mod::ssl"
> /usr/share/openstack-puppet/modules/tripleo/manifests/profile/base/keystone.pp
> for example on any of the overcloud nodes should tell us.

[root@controller-0 heat-admin]# grep -rn "include ::apache::mod::ssl" /usr/share/openstack-puppet/modules/tripleo/manifests/profile/base/keystone.pp
90: include ::apache::mod::ssl
[root@controller-0 heat-admin]# rpm -qa | grep puppet-tripleo
puppet-tripleo-5.5.0-12.el7ost.noarch
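The `grep -rn` check above can also be written as a small Python helper, for example to run it over the whole `profile/base/` tree rather than a single manifest. This is only a sketch: `find_include` is a hypothetical name, and it assumes the manifests are `.pp` files readable as UTF-8.

```python
from __future__ import annotations

from pathlib import Path

NEEDLE = "include ::apache::mod::ssl"


def find_include(root: str | Path, needle: str = NEEDLE) -> list[tuple[Path, int]]:
    """Like `grep -rn needle root`, limited to *.pp files: return (file, line_number) pairs.

    `root` may be a single manifest or a directory searched recursively.
    """
    root = Path(root)
    files = [root] if root.is_file() else sorted(root.rglob("*.pp"))
    hits = []
    for path in files:
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    hits.append((path, number))
    return hits
```

Called as `find_include("/usr/share/openstack-puppet/modules/tripleo/manifests/profile/base")`, it should report the same files and line numbers that `grep -rn` prints.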
Hey Marius, thanks for giving me access to the box. The env does already have https://review.openstack.org/#/c/460560/ [0]. I still think it is related, though, and we may be missing the other part of the fix for BZ 1441977, https://review.openstack.org/#/c/460555, on OSP9. It's a bit confusing because the tripleo-heat-templates on the undercloud (openstack-tripleo-heat-templates-5.2.0-15.el7ost.noarch) *do* have the relevant "touch ssl.conf" line, but they are the latest for OSP10. This 'fix'/workaround needs to happen during the minor update, so the fix is needed in OSP9 first, and afaics we do not have this workaround in the latest OSP9 tripleo-heat-templates.

I believe it goes something like this: during the update we touch ssl.conf to prevent the update of mod_ssl from creating /etc/httpd/conf.d/ssl.conf, since that file contains a "Listen 443" line causing the conflict. If we touch it before the update, it won't get created/updated by mod_ssl. I think Lucas/Sofer can validate my understanding as they worked on the related bug. I'll revisit early next week.

For testing, you'd need to run the minor update with https://review.openstack.org/#/c/460555/2 included before doing the major upgrade with https://review.openstack.org/#/c/458033/ (which is already in OSP10 and this env afaics).

thanks.

[0]
[root@controller-0 httpd]# grep -rn "include ::apache::mod::ssl" /usr/share/openstack-puppet/modules/tripleo/manifests/profile/base/
/usr/share/openstack-puppet/modules/tripleo/manifests/profile/base/keystone.pp:90: include ::apache::mod::ssl
/usr/share/openstack-puppet/modules/tripleo/manifests/profile/base/aodh/api.pp:40: include ::apache::mod::ssl
/usr/share/openstack-puppet/modules/tripleo/manifests/profile/base/ceilometer/api.pp:33: include ::apache::mod::ssl
/usr/share/openstack-puppet/modules/tripleo/manifests/profile/base/gnocchi/api.pp:53: include ::apache::mod::ssl
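The workaround described above amounts to running `touch /etc/httpd/conf.d/ssl.conf` before mod_ssl is installed or updated. As a Python sketch (the helper name `touch_ssl_conf` is hypothetical), the important property is that an existing file keeps its contents, so the placeholder never clobbers a configuration that puppet has already written. That an existing file stops mod_ssl from laying down its own `Listen 443` version is the assumption stated in the comment; this code does not check it.

```python
from __future__ import annotations

from pathlib import Path

# The file named in the workaround; tests can pass a different path.
SSL_CONF = Path("/etc/httpd/conf.d/ssl.conf")


def touch_ssl_conf(path: Path = SSL_CONF) -> bool:
    """Behave like `touch path`: create an empty file if it does not exist.

    An existing file keeps its contents; only its modification time changes.
    Like `touch`, this does not create missing parent directories.
    Returns True when the file was newly created.
    """
    created = not path.exists()
    path.touch(exist_ok=True)
    return created
```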
FYI/more info as I'm looking at this some more today: the actual problem here is that the overcloud nodes have an ssl.conf with an uncommented 'Listen 443' in it, causing the conflict as in BZ 1441977. From the environment when I checked on Friday:

[root@controller-0 httpd]# grep Listen /etc/httpd/conf.d/ssl.conf
Listen 443 https

It's not clear to me why you didn't hit this issue on the minor update, or how you ended up with the file created and the previous stack update operation completed. But this ^^^ (ssl.conf with Listen) can be prevented by https://review.openstack.org/#/c/460555/. For now you can even try

sudo mv /etc/httpd/conf.d/ssl.conf /etc/httpd/conf.d/ssl.conf.BACKUP

and re-run the upgrade. Since we have https://review.openstack.org/#/c/460560/ in the environment already, it should re-create the file without that Listen line in it.
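The `grep Listen` check and the manual `mv` workaround above can be combined into one small Python sketch, useful for checking whether a node is affected before re-running the upgrade. It is not the fix from review 460555; `active_listen_ports` and `backup_if_listening` are hypothetical helpers, and the `.BACKUP` suffix simply follows the `mv` command above.

```python
from __future__ import annotations

import re
from pathlib import Path

# An active (uncommented) Listen directive, e.g. "Listen 443 https".
# Apache directive names are case-insensitive; [ \t] keeps matches on one line.
LISTEN = re.compile(r"^[ \t]*Listen[ \t]+(\S+)", re.IGNORECASE | re.MULTILINE)


def active_listen_ports(conf_text: str) -> list[str]:
    """Return the first argument of every uncommented Listen directive."""
    return LISTEN.findall(conf_text)


def backup_if_listening(conf: Path, port: str = "443") -> Path | None:
    """Move `conf` to `conf.BACKUP` if it has an active Listen on `port`.

    Returns the backup path, or None when nothing was moved.
    """
    if not conf.is_file():
        return None
    ports = active_listen_ports(conf.read_text(encoding="utf-8", errors="replace"))
    # "Listen 443", "Listen 0.0.0.0:443" and "Listen [::]:443" all bind the port.
    if not any(p == port or p.endswith(":" + port) for p in ports):
        return None
    backup = conf.with_name(conf.name + ".BACKUP")
    conf.rename(backup)
    return backup
```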
(In reply to marios from comment #5)
> FYI/more info as I'm looking at this some more today - the actual problem
> here is that the overcloud nodes have an ssl.conf with an uncommented
> 'Listen 443' in it causing the conflict as in BZ 1441977 - from the
> environment when I checked on friday like:
>
> [root@controller-0 httpd]# grep Listen /etc/httpd/conf.d/ssl.conf
> Listen 443 https
>
> Its not clear to me why you didn't hit this issue on minor update and how
> you ended up with the file created and the previous stack update operation
> completed. But this ^^^ (ssl.conf with Listen) can be prevented by
> https://review.openstack.org/#/c/460555/ - for now you can even try sudo mv
> /etc/httpd/conf.d/ssl.conf /etc/httpd/conf.d/ssl.conf.BACKUP; and re-run the
> upgrade. Since we have https://review.openstack.org/#/c/460560/ in the
> environment already it should re-create it without that listen line it it.

I didn't run the minor update; I started with the latest OSP9 and then upgraded to OSP10.
@mcornea I just found the OSP9 clone for the ssl.conf issue at BZ 1446289 and added a comment there.

OK, I think I understand a bit better now. On OSP9 you didn't have mod_ssl, or at least shouldn't have, as per the discussion about this on BZ 1446289. However, we landed https://review.openstack.org/#/c/461060/, which installs mod_ssl during the mitaka to newton upgrade (as here). I saw that mod_ssl was in fact installed in the env on Friday:

[root@controller-0 httpd]# grep ssl /var/log/yum.log
May 05 14:50:08 Updated: erlang-ssl-18.3.4.4-1.el7ost.x86_64
May 05 14:54:14 Installed: 1:mod_ssl-2.4.6-45.el7_3.4.x86_64

I think the fix then could be adding a 'touch ssl.conf' before the installation of mod_ssl. So, to confirm this, mcornea, before running the upgrade can you:

1. confirm you don't have mod_ssl installed on OSP9
2. manually 'create' the ssl.conf file with "touch /etc/httpd/conf.d/ssl.conf" on all the overcloud nodes (well, controllers really, or wherever httpd is running, but it shouldn't hurt to do it everywhere)

Once we confirm, we can add that into the upgrade script for stable/newton.
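Step 1 can also be checked after the fact from a node's yum history, as done above with `grep ssl /var/log/yum.log` (on a live node, `rpm -q mod_ssl` is the direct check for whether the package is installed now). A minimal Python sketch, assuming the `Mon DD HH:MM:SS Action: [epoch:]name-version-release.arch` layout seen in the transcript; `package_events` is a hypothetical helper. Unlike `grep ssl`, it matches the package name exactly, so `erlang-ssl` is not reported for `mod_ssl`.

```python
from __future__ import annotations

import re

# yum.log lines as in the transcript, e.g.:
#   May 05 14:54:14 Installed: 1:mod_ssl-2.4.6-45.el7_3.4.x86_64
YUM_LINE = re.compile(
    r"^(?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2}) (?P<action>\w+): (?P<nevra>\S+)$"
)


def package_events(yum_log: str, name: str) -> list[tuple[str, str, str]]:
    """Return (timestamp, action, nevra) for every yum.log entry for package `name`."""
    events = []
    for line in yum_log.splitlines():
        match = YUM_LINE.match(line.strip())
        if not match:
            continue
        nevra = match.group("nevra")
        # Drop an optional "epoch:" prefix, then require "<name>-<digit>..." so that
        # e.g. "httpd" does not also match "httpd-tools-...".
        bare = nevra.split(":", 1)[-1]
        after = bare[len(name) + 1 : len(name) + 2]
        if bare.startswith(name + "-") and after.isdigit():
            events.append((match.group("stamp"), match.group("action"), nevra))
    return events
```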
(In reply to Marius Cornea from comment #6)
> (In reply to marios from comment #5)
...
> > Its not clear to me why you didn't hit this issue on minor update and how
> > you ended up with the file created and the previous stack update operation
> > completed. But this ^^^ (ssl.conf with Listen) can be prevented by
> > https://review.openstack.org/#/c/460555/ - for now you can even try sudo mv
> > /etc/httpd/conf.d/ssl.conf /etc/httpd/conf.d/ssl.conf.BACKUP; and re-run the
> > upgrade. Since we have https://review.openstack.org/#/c/460560/ in the
> > environment already it should re-create it without that listen line it it.
>
> I didn't run minor update - I started with latest OSP9 and then upgraded to
> OSP10.

ACK, yeah, I understand a bit better... I think it is a case we missed with all the mod_ssl workarounds for the different branches. Here we start with OSP9 without mod_ssl, and during the upgrade we actually install it (see comment #7), and it creates the ssl.conf with the problematic Listen 443. I am hoping we can prevent that by doing a touch on the file before running the upgrade, even if we don't have mod_ssl installed at that point.
@mcornea I posted this for stable/newton: https://review.openstack.org/#/c/463529/. Can you try it (unless you've already started a manual verification)? It just adds the touch before the yum install for mod_ssl.

sudo cp -r /usr/share/openstack-tripleo-heat-templates /usr/share/openstack-tripleo-heat-templates.ORIG
curl "https://review.openstack.org/changes/463529/revisions/current/patch?download" | \
  base64 -d | sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1

should do it, unless there are merge conflicts.
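The `curl | base64 -d` pipeline works because Gerrit's REST "get patch" endpoint (`/changes/{change}/revisions/{revision}/patch`) returns the formatted patch base64-encoded. The same download can be sketched with Python's standard library; `patch_url`, `decode_patch` and `download_patch` are hypothetical helpers, and the server URL and change number are the ones from this comment. This is an illustration, not how the fix was actually tested here.

```python
from __future__ import annotations

import base64
import urllib.request

GERRIT = "https://review.openstack.org"


def patch_url(change: int, revision: str = "current") -> str:
    """Build the Gerrit REST URL used in the curl command for a change's patch."""
    return f"{GERRIT}/changes/{change}/revisions/{revision}/patch?download"


def decode_patch(body: bytes) -> str:
    """Gerrit returns the patch base64-encoded; decode it to unified-diff text."""
    return base64.b64decode(body).decode("utf-8")


def download_patch(change: int) -> str:
    """Fetch and decode a change's patch (needs network access to the Gerrit server)."""
    with urllib.request.urlopen(patch_url(change), timeout=30) as response:
        return decode_patch(response.read())
```

The decoded text from `download_patch(463529)` could then be written to a file and applied with `sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1`; for a one-off test the shell pipeline above is simpler.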
(In reply to marios from comment #9)
> @mcornea I posted this for stable/newton:
> https://review.openstack.org/#/c/463529/ - can you try it (unless you've
> already started a manual verification) ... it just adds the touch before the
> yum install for mod_ssl.
>
> sudo cp -r /usr/share/openstack-tripleo-heat-templates /usr/share/openstack-tripleo-heat-templates.ORIG
> curl https://review.openstack.org/changes/463529/revisions/current/patch?download | \
> base64 -d | sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1
>
> should do it unless there are merge conflicts

Manually creating an empty ssl.conf before starting the upgrade worked, and major-upgrade-pacemaker.yaml completed fine.
*** Bug 1455640 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1585