Bug 1372040

Summary: [Docs] OSP-Director-10: when upgrading the undercloud from OSP9 to OSP10, the yum update command hangs for about 20 minutes at 'Yum Cleanup: 1:openstack-nova'.
Product: Red Hat OpenStack
Reporter: Omri Hochman <ohochman>
Component: documentation
Assignee: Dan Macpherson <dmacpher>
Status: CLOSED CURRENTRELEASE
QA Contact: Martin Lopes <mlopes>
Severity: medium
Priority: medium
Version: 10.0 (Newton)
CC: cjanisze, dbecker, dmacpher, lbopf, mandreou, mburns, mcornea, michele, morazi, ohochman, rhel-osp-director-maint, sathlang, srevivo
Keywords: Documentation
Target Release: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Doc Type: If docs needed, set a value
Last Closed: 2017-02-23 07:59:36 UTC
Type: Bug
Bug Blocks: 1367466

Description Omri Hochman 2016-08-31 18:38:44 UTC
OSP-Director-10: when upgrading the undercloud from OSP9 to OSP10, the yum update command hangs for about 20 minutes at 'Yum Cleanup: 1:openstack-nova'.


Environment:
------------
instack-undercloud-5.0.0-0.20160818065636.41ef775.el7ost.noarch
instack-5.0.0-0.20160802165724.5aabf5c.el7ost.noarch
openstack-heat-api-cfn-7.0.0-0.20160823082523.1106458.el7ost.noarch
openstack-tripleo-heat-templates-liberty-2.0.0-33.el7ost.noarch
openstack-heat-templates-0.0.1-0.20160822094546.1ac2823.el7ost.noarch
python-heat-tests-7.0.0-0.20160823082523.1106458.el7ost.noarch
openstack-heat-engine-7.0.0-0.20160823082523.1106458.el7ost.noarch
puppet-heat-9.1.0-0.20160815142726.d364553.el7ost.noarch
python-heatclient-1.3.0-0.20160802194627.44dfe53.el7ost.noarch
openstack-heat-common-7.0.0-0.20160823082523.1106458.el7ost.noarch
openstack-heat-api-7.0.0-0.20160823082523.1106458.el7ost.noarch
heat-cfntools-1.3.0-2.el7ost.noarch
openstack-tripleo-heat-templates-5.0.0-0.20160823140311.072404b.el7ost.noarch

The cleanup issue seems to be with: 
openstack-nova-13.1.1-4.el7ost.noarch    

Description:
------------
When following the steps to upgrade the undercloud from OSP9 to OSP10, running 'yum update' just after fixing the repos causes the yum process to hang for roughly 15 to 20 minutes at the step:
'yum cleanup for openstack-nova'

Info about upgrade process: https://gitlab.cee.redhat.com/sathlang/ospd-9-to-10-upgrade#controller-and-block-storage-upgrade


How to reproduce:
-----------------
(1) Deploy OSP9
(2) Update the repos on the undercloud to point to OSP10
(3) Run 'yum update'

logs: 
-----
19:41:32   Cleanup    : 1:openstack-nova-13.1.1-4.el7ost.noarch                  360/504 
20:02:08   Cleanup    : 1:openstack-nova-compute-13.1.1-4.el7ost.noarch          361/504 
20:02:11   Cleanup    : 1:openstack-nova-api-13.1.1-4.el7ost.noarch              362/504       <- rabbitmq-server was restarted manually here
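The length of the stall can be read off the timestamps in the log above. A minimal sketch, assuming each log line starts with an HH:MM:SS timestamp in the layout shown and that the run does not cross midnight; the function name `cleanup_gaps` is hypothetical and not part of yum:

```shell
# Hypothetical helper: for each 'Cleanup' line read on stdin, print the number
# of seconds that passed before the next 'Cleanup' line, followed by the
# package that was being cleaned up during that time.
# Assumes lines look like: "HH:MM:SS   Cleanup    : <package>   <n>/<total>"
cleanup_gaps() {
    awk '/Cleanup/ {
        split($1, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]
        # The gap is attributed to the previous package, which was in progress.
        if (seen) printf "%d %s\n", now - prev, prevpkg
        prev = now
        prevpkg = $4
        seen = 1
    }'
}
```

Fed the first two log lines above, this reports 1236 seconds (about 20 and a half minutes) for 1:openstack-nova-13.1.1-4.el7ost.noarch.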

in nova-compute.log
---------------------
AMQP server on 192.0.2.1:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in

Comment 2 Sofer Athlan-Guyot 2016-09-06 16:36:13 UTC
Hi,

Does this workaround fix the problem?

Before running the undercloud upgrade:

sudo systemctl stop 'openstack-*'
sudo systemctl stop 'neutron-*'

If so, then the upstream bug is https://bugs.launchpad.net/tripleo/+bug/1593182

Comment 4 Omri Hochman 2016-09-08 19:50:50 UTC
(In reply to Sofer Athlan-Guyot from comment #2)
> Hi,
> 
> Does this workaround fix the problem?
> 
> Before running the undercloud upgrade:
> 
> sudo systemctl stop 'openstack-*'
> sudo systemctl stop 'neutron-*'
> 
> If so, then the upstream bug is
> https://bugs.launchpad.net/tripleo/+bug/1593182

The workaround seems valid; the cleanup steps now complete within seconds:
19:40:04   Cleanup    : 1:openstack-nova-13.1.1-4.el7ost.noarch                 702/1118 
19:40:05   Cleanup    : 1:openstack-nova-compute-13.1.1-4.el7ost.noarch         703/1118 
19:40:06   Cleanup    : libvirt-python-1.2.17-2.el7.x86_64                      704/1118

Comment 5 Marios Andreou 2016-10-10 12:03:23 UTC
Adding info here from my environment. I was about to file a new BZ, so I had already collected the info when I found this Bugzilla:


Doing this:
sudo yum localinstall -y http://rhos-release.virt.bos.redhat.com/repos/rhos-release/rhos-release-latest.noarch.rpm
sudo yum -y update
sudo rhos-release -P 10 -r 7.3
sudo yum-config-manager --disable 'rhelosp-9.0*'
openstack undercloud upgrade

At some point during the cleanup phase of the yum update that runs before the openstack undercloud install [1], the update hangs on 'Cleanup' for openstack-nova-compute:

     "   Cleanup    : 1:openstack-nova-compute-13.1.1-10.el7ost.noarch        801/1156 " 

systemctl status reports "activating":
    ● openstack-nova-compute.service - OpenStack Nova Compute Server
       Loaded: loaded (/usr/lib/systemd/system/openstack-nova-compute.service; enabled; vendor preset: disabled)
       Active: activating (start) since Fri 2016-10-07 04:05:04 EDT; 1h 0min ago
     Main PID: 25500 (nova-compute)
       CGroup: /system.slice/openstack-nova-compute.service
               └─25500 /usr/bin/python2 /usr/bin/nova-compute

    Oct 07 04:05:04 instack.localdomain systemd[1]: Starting OpenStack Nova Compute Server...
    Oct 07 04:05:06 instack.localdomain nova-compute[25500]: Option "rpc_backend" from group "DEFAULT" is deprecated for removal.  Its value may be silently ignored in the future.
    Oct 07 04:05:06 instack.localdomain nova-compute[25500]: Option "notification_driver" from group "DEFAULT" is deprecated. Use option "driver" from group "oslo_messaging_notifications".
    Oct 07 04:05:06 instack.localdomain nova-compute[25500]: Option "notification_topics" from group "DEFAULT" is deprecated. Use option "topics" from group "oslo_messaging_notifications".


    [stack@instack ~]$ systemctl | grep nova
      openstack-nova-api.service                                                               loaded active     running         OpenStack Nova API Server
      openstack-nova-cert.service                                                              loaded active     running         OpenStack Nova Cert Server
      openstack-nova-compute.service                                                           loaded activating start     start OpenStack Nova Compute Server
      openstack-nova-conductor.service                                                         loaded active     running         OpenStack Nova Conductor Server

As soon as I "sudo systemctl stop openstack-nova-compute" the update continues and the undercloud upgrade eventually completes OK.
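Services stuck in this state can be spotted from the unit list. A minimal sketch, assuming input in the column layout shown above (UNIT, LOAD, ACTIVE, SUB, DESCRIPTION); the function name `stuck_units` is hypothetical:

```shell
# Hypothetical helper: print the names of units whose ACTIVE column reads
# "activating". Expects unit-list lines on stdin, one unit per line, with
# columns: UNIT LOAD ACTIVE SUB DESCRIPTION
stuck_units() {
    awk '$3 == "activating" { print $1 }'
}

# Example usage on the undercloud (not run here):
#   systemctl list-units --type=service --no-legend --plain | stuck_units
```

The `--plain` option is used in the example because the default listing can prefix some units with a status marker, which would shift the columns this filter relies on.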

The openstack-nova-compute package versions involved are:

    [m@m PACKAGES_FOR_BZ_STOP_SERVICES]$ grepr openstack-nova-compute ./*
    ./osp10_upgraded_packages:66:openstack-nova-compute-14.0.0-1.el7ost.noarch
    ./osp9_updated_packages:739:openstack-nova-compute-13.1.1-10.el7ost.noarch
    ./osp9_deployed_packages:246:openstack-nova-compute-13.1.0-6.el7ost.noarch


The workaround is to stop the services before running openstack undercloud upgrade:

    sudo yum localinstall -y http://rhos-release.virt.bos.redhat.com/repos/rhos-release/rhos-release-latest.noarch.rpm
    sudo yum -y update
    sudo rhos-release -P 10 -r 7.3
    sudo yum-config-manager --disable 'rhelosp-9.0*'
    #STOP services as workaround
    sudo systemctl stop 'openstack-*'
    sudo systemctl stop 'neutron-*'
    openstack undercloud upgrade


I am filing the BZ for now to capture this information, but we aren't sure yet whether it is confined to these specific package versions or whether we need a more permanent fix that stops the services before the undercloud upgrade.

My development environment is an OSP9 poodle being upgraded to an OSP10 puddle; this may be a significant factor since, as far as I know, I am the only person hitting this.

[1] https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/v1/undercloud.py#L50

Comment 6 Marios Andreou 2016-10-11 16:06:52 UTC
Update after spending some time on this issue today: I filed a Launchpad bug and a quick fix at https://review.openstack.org/385012 (linked in related changes above). However, during discussion of this issue in today's upstream tripleo meeting (http://eavesdrop.openstack.org/meetings/tripleo/2016/tripleo.2016-10-11-14.00.log.txt), the consensus was to handle this as a documentation fix.

The upstream tripleo docs already document stopping the openstack-* and neutron-* services before running the undercloud upgrade; see http://tripleo.org/installation/installation.html#updating-undercloud-components

Re-assigning this to the docs team for now. For clarity: the OSP10 undercloud upgrade documentation needs to state that, before running the "openstack undercloud upgrade" command, the operator should stop the services like this:


        sudo systemctl stop 'openstack-*'
        sudo systemctl stop 'neutron-*'
        openstack undercloud upgrade
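The three commands above can also be chained so that the upgrade only starts if both stop commands succeed. A minimal sketch, not part of the documented procedure; the function names `run` and `upgrade_undercloud` and the `DRY_RUN` variable are hypothetical:

```shell
# Hypothetical wrapper. With DRY_RUN=1 the commands are printed instead of run.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop the OpenStack and Neutron services, then start the undercloud upgrade.
# Each step runs only if the previous one succeeded.
upgrade_undercloud() {
    run sudo systemctl stop 'openstack-*' &&
    run sudo systemctl stop 'neutron-*' &&
    run openstack undercloud upgrade
}
```

Running it once with `DRY_RUN=1` first shows the exact commands before anything is stopped.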

Comment 7 Lucy Bopf 2016-10-13 06:21:01 UTC
Changing the component to 'documentation' for tracking purposes.

Comment 8 Marios Andreou 2016-10-13 10:29:14 UTC
Removing this as a blocker of the upgrades RFE https://bugzilla.redhat.com/show_bug.cgi?id=1337794 since this is now a docs bug.

Comment 11 Sofer Athlan-Guyot 2016-12-02 07:55:32 UTC
*** Bug 1391686 has been marked as a duplicate of this bug. ***

Comment 18 Dan Macpherson 2017-02-03 03:01:43 UTC
Hi Omri,

This content is now live:

https://access.redhat.com/documentation/en/red-hat-openstack-platform/10/single/upgrading-red-hat-openstack-platform/#sect-Major-Updating_Director_Packages

Was there anything else to add for this BZ? If not, I'll close this BZ.

Comment 19 Dan Macpherson 2017-02-23 07:59:36 UTC
No response in over 2 weeks. If nothing else to add to this BZ, I'm closing it. If further changes are required for this issue, please feel free to reopen it.