Bug 1426945

Summary: OSP10 -> OSP11 upgrade: the upgrade process is stuck for ~13 minutes waiting for nova services to get started after packages were updated
Product: Red Hat OpenStack Reporter: Marius Cornea <mcornea>
Component: rhosp-directorAssignee: Sofer Athlan-Guyot <sathlang>
Status: CLOSED ERRATA QA Contact: Marius Cornea <mcornea>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 11.0 (Ocata)CC: dbecker, emacchi, jcoufal, mburns, mcornea, morazi, rhel-osp-director-maint, sathlang
Target Milestone: rcKeywords: TestOnly, Triaged
Target Release: 11.0 (Ocata)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-05-17 20:02:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Marius Cornea 2017-02-26 17:10:53 UTC
Description of problem:
OSP10 -> OSP11 upgrade: the upgrade process is stuck for ~13 minutes waiting for nova services to get started after packages were updated:

We can see in yum.log - see the diference between the time nova-api got updated 16:47:00 and the time when the following package got updated - 16:59:53

eb 26 16:47:00 Updated: 1:openstack-nova-common-15.0.1-0.20170222170803.10a32dd.el7ost.noarch
Feb 26 16:47:00 Updated: 1:openstack-nova-novncproxy-15.0.1-0.20170222170803.10a32dd.el7ost.noarch
Feb 26 16:47:00 Updated: 1:openstack-nova-console-15.0.1-0.20170222170803.10a32dd.el7ost.noarch
Feb 26 16:47:00 Updated: 1:openstack-nova-cert-15.0.1-0.20170222170803.10a32dd.el7ost.noarch
Feb 26 16:47:00 Updated: 1:openstack-nova-conductor-15.0.1-0.20170222170803.10a32dd.el7ost.noarch
Feb 26 16:47:00 Updated: 1:openstack-nova-scheduler-15.0.1-0.20170222170803.10a32dd.el7ost.noarch
Feb 26 16:47:00 Updated: 1:openstack-nova-compute-15.0.1-0.20170222170803.10a32dd.el7ost.noarch
Feb 26 16:47:00 Updated: 1:openstack-nova-api-15.0.1-0.20170222170803.10a32dd.el7ost.noarch
Feb 26 16:59:53 Updated: python-oslo-serialization-2.16.0-0.20170208094041.ac5787d.el7ost.noarch


During this timeframe:

[root@overcloud-controller-0 heat-admin]# systemctl list-units | grep nova
  openstack-nova-api.service                                                  loaded activating   start        start OpenStack Nova API Server
  openstack-nova-consoleauth.service                                          loaded activating   start        start OpenStack Nova VNC console auth Server
  openstack-nova-novncproxy.service                                           loaded active       running            OpenStack Nova NoVNC Proxy Server
  openstack-nova-scheduler.service                                            loaded activating   start        start OpenStack Nova Scheduler Server


[root@overcloud-controller-0 heat-admin]# tail -5 /var/log/nova/nova-api.log 
2017-02-26 16:56:12.351 383452 ERROR nova     value = self._do_get(name, group, namespace)
2017-02-26 16:56:12.351 383452 ERROR nova   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2715, in _do_get
2017-02-26 16:56:12.351 383452 ERROR nova     % (opt.name, str(ve)))
2017-02-26 16:56:12.351 383452 ERROR nova ConfigFileValueError: Value for option osapi_compute_workers is not valid: Should be greater than or equal to 1
2017-02-26 16:56:12.351 383452 ERROR nova 
[root@overcloud-controller-0 heat-admin]# tail -5 /var/log/nova/nova-consoleauth.log 
2017-02-26 16:56:17.420 383492 ERROR nova     value = self._do_get(name, group, namespace)
2017-02-26 16:56:17.420 383492 ERROR nova   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2715, in _do_get
2017-02-26 16:56:17.420 383492 ERROR nova     % (opt.name, str(ve)))
2017-02-26 16:56:17.420 383492 ERROR nova ConfigFileValueError: Value for option metadata_workers is not valid: Should be greater than or equal to 1
2017-02-26 16:56:17.420 383492 ERROR nova 
[root@overcloud-controller-0 heat-admin]# tail -5 /var/log/nova/nova-scheduler.log 
2017-02-26 16:55:20.597 382652 WARNING oslo_db.sqlalchemy.engines [req-40dc07a2-5c32-429d-8b9a-57254463e473 - - - - -] SQL connection failed. 6 attempts left.
2017-02-26 16:55:33.608 382652 WARNING oslo_db.sqlalchemy.engines [req-40dc07a2-5c32-429d-8b9a-57254463e473 - - - - -] SQL connection failed. 5 attempts left.
2017-02-26 16:55:46.619 382652 WARNING oslo_db.sqlalchemy.engines [req-40dc07a2-5c32-429d-8b9a-57254463e473 - - - - -] SQL connection failed. 4 attempts left.
2017-02-26 16:55:59.637 382652 WARNING oslo_db.sqlalchemy.engines [req-40dc07a2-5c32-429d-8b9a-57254463e473 - - - - -] SQL connection failed. 3 attempts left.
2017-02-26 16:56:12.648 382652 WARNING oslo_db.sqlalchemy.engines [req-40dc07a2-5c32-429d-8b9a-57254463e473 - - - - -] SQL connection failed. 2 attempts left.

We can notice the nova processes CPU usage during this timeframe but the overall upgrade process doesn't show any progress at all:  

[root@overcloud-controller-0 heat-admin]# pidstat 1  
Linux 3.10.0-514.6.1.el7.x86_64 (overcloud-controller-0.localdomain) 	02/26/2017 	_x86_64_	(4 CPU)

04:58:48 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:49 PM     0         1    0.00    0.99    0.00    0.99     3  systemd
04:58:49 PM     0         9    0.00    0.99    0.00    0.99     0  rcu_sched
04:58:49 PM   996    139233    0.99    0.00    0.00    0.99     2  neutron-server
04:58:49 PM     0    349039    6.93    0.00    0.00    6.93     0  ansible-playboo
04:58:49 PM   162    384699   93.07    6.93    0.00  100.00     2  nova-consoleaut
04:58:49 PM     0    384706    0.00    0.99    0.00    0.99     0  pidstat
04:58:49 PM   162    384709   59.41    7.92    0.00   67.33     3  nova-scheduler
04:58:49 PM   162    384710   46.53    7.92    0.00   54.46     1  nova-api

04:58:49 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:50 PM     0       544    0.00    1.00    0.00    1.00     0  systemd-journal
04:58:50 PM     0      1045    0.00    1.00    0.00    1.00     0  ovs-vswitchd
04:58:50 PM     0    349039    9.00    1.00    0.00   10.00     1  ansible-playboo
04:58:50 PM   162    384699   99.00    0.00    0.00   99.00     2  nova-consoleaut
04:58:50 PM   162    384709   94.00    6.00    0.00  100.00     3  nova-scheduler
04:58:50 PM   162    384710   92.00    6.00    0.00   98.00     1  nova-api

04:58:50 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:51 PM     0       990    1.00    0.00    0.00    1.00     2  ovsdb-server
04:58:51 PM     0    349039    9.00    1.00    0.00   10.00     0  ansible-playboo
04:58:51 PM     0    384706    0.00    1.00    0.00    1.00     0  pidstat
04:58:51 PM   162    384709   99.00    0.00    0.00   99.00     3  nova-scheduler
04:58:51 PM   162    384710  100.00    0.00    0.00  100.00     2  nova-api
04:58:51 PM   162    384731   47.00    8.00    0.00   55.00     1  nova-consoleaut

04:58:51 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:52 PM     0    349039    9.00    0.00    0.00    9.00     3  ansible-playboo
04:58:52 PM   162    384709    1.00    0.00    0.00    1.00     3  nova-scheduler
04:58:52 PM   162    384710   94.00    7.00    0.00  100.00     2  nova-api
04:58:52 PM   162    384731   92.00    7.00    0.00   99.00     0  nova-consoleaut

04:58:52 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:53 PM     0      1045    1.00    0.00    0.00    1.00     0  ovs-vswitchd
04:58:53 PM     0    349039    8.00    1.00    0.00    9.00     2  ansible-playboo
04:58:53 PM   162    380921    1.00    0.00    0.00    1.00     1  nova-novncproxy
04:58:53 PM     0    384706    1.00    0.00    0.00    1.00     1  pidstat
04:58:53 PM   162    384731  100.00    0.00    0.00  100.00     0  nova-consoleaut
04:58:53 PM   162    384741   49.00    6.00    0.00   55.00     3  nova-api

04:58:53 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:54 PM     0    349039    9.00    1.00    0.00   10.00     3  ansible-playboo
04:58:54 PM     0    384706    0.00    1.00    0.00    1.00     1  pidstat
04:58:54 PM   162    384741   92.00    7.00    0.00   99.00     0  nova-api
04:58:54 PM   162    384749   70.00   10.00    0.00   80.00     2  nova-consoleaut

04:58:54 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:55 PM     0    349039    9.00    0.00    0.00    9.00     0  ansible-playboo
04:58:55 PM     0    384706    0.00    1.00    0.00    1.00     2  pidstat
04:58:55 PM   162    384741   98.00    3.00    0.00  100.00     3  nova-api
04:58:55 PM   162    384749   95.00    5.00    0.00  100.00     1  nova-consoleaut

04:58:55 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:56 PM     0         1    0.00    1.00    0.00    1.00     2  systemd
04:58:56 PM     0         9    0.00    1.00    0.00    1.00     0  rcu_sched
04:58:56 PM    81       690    1.00    0.00    0.00    1.00     1  dbus-daemon
04:58:56 PM   996    139236    1.00    0.00    0.00    1.00     2  neutron-server
04:58:56 PM     0    349039    8.00    1.00    0.00    9.00     0  ansible-playboo
04:58:56 PM   162    384741   98.00    1.00    0.00   99.00     3  nova-api
04:58:56 PM   162    384761    5.00    0.00    0.00    5.00     1  nova-consoleaut

04:58:56 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:57 PM     0    349039    9.00    0.00    0.00    9.00     3  ansible-playboo
04:58:57 PM     0    383753    0.00    1.00    0.00    1.00     2  systemctl
04:58:57 PM   162    384761   90.00   10.00    0.00  100.00     0  nova-consoleaut
04:58:57 PM   162    384769   24.00    5.00    0.00   29.00     1  nova-api

04:58:57 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:58 PM     0    349039    9.00    1.00    0.00   10.00     0  ansible-playboo
04:58:58 PM     0    384706    1.00    1.00    0.00    2.00     1  pidstat
04:58:58 PM   162    384761   96.00    4.00    0.00  100.00     2  nova-consoleaut
04:58:58 PM   162    384769   92.00    8.00    0.00  100.00     3  nova-api

04:58:58 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:59 PM     0       544    0.00    1.00    0.00    1.00     1  systemd-journal
04:58:59 PM     0       687    1.00    0.00    0.00    1.00     2  systemd-logind
04:58:59 PM     0    349039    9.00    1.00    0.00   10.00     0  ansible-playboo
04:58:59 PM   162    384769   98.00    2.00    0.00  100.00     3  nova-api
04:58:59 PM   162    384782    5.00    1.00    0.00    6.00     2  nova-consoleaut

04:58:59 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:59:00 PM     0    349039    9.00    1.00    0.00   10.00     2  ansible-playboo
04:59:00 PM     0    384706    0.00    1.00    0.00    1.00     1  pidstat
04:59:00 PM   162    384769   96.00    5.00    0.00  100.00     3  nova-api
04:59:00 PM   162    384782   88.00   10.00    0.00   98.00     0  nova-consoleaut

04:59:00 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:59:01 PM     0         9    0.00    1.00    0.00    1.00     3  rcu_sched
04:59:01 PM     0    349039    9.00    0.00    0.00    9.00     2  ansible-playboo
04:59:01 PM   162    384782   96.00    4.00    0.00  100.00     1  nova-consoleaut
04:59:01 PM   162    384792   25.00    5.00    0.00   30.00     0  nova-api

04:59:01 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:59:02 PM     0         1    1.00    0.00    0.00    1.00     2  systemd
04:59:02 PM     0       655    0.00    1.00    0.00    1.00     3  auditd
04:59:02 PM     0    349039    8.00    2.00    0.00   10.00     2  ansible-playboo
04:59:02 PM     0    384706    0.00    1.00    0.00    1.00     1  pidstat
04:59:02 PM   162    384792   91.00    8.00    0.00   99.00     0  nova-api
04:59:02 PM   162    384800    5.00    1.00    0.00    6.00     3  nova-consoleaut

04:59:02 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:59:03 PM     0    349039    9.00    0.00    0.00    9.00     0  ansible-playboo
04:59:03 PM   162    384792   99.00    1.00    0.00  100.00     3  nova-api
04:59:03 PM   162    384800   87.00   12.00    0.00   99.00     2  nova-consoleaut


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-6.0.0-0.20170222195630.46117f4.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Upgrade OSP10->OSP11
2. watch openstack stack list --nested | grep PROGRESS

Actual results:
At some point the nested stacks don't change the status. Checking the controller for upgrade related activity we can notice in the yum.log that openstack-nova-api was the last package updated and it doesn't go further.

Expected results:
I'm not sure exactly what the yum update is waiting for but it looks the upgrade process is stuck during this large timeframe. 

Additional info:

Comment 1 Sofer Athlan-Guyot 2017-03-21 16:50:52 UTC
The reordering of the step should fix the issue here.  Waiting for Marius for  confirmation.

Comment 2 Marius Cornea 2017-03-22 12:43:21 UTC
(In reply to Sofer Athlan-Guyot from comment #1)
> The reordering of the step should fix the issue here.  Waiting for Marius
> for  confirmation.

This is fixed in latest build:

Mar 22 10:21:42 Updated: 1:openstack-nova-common-15.0.0-2.el7ost.noarch
Mar 22 10:21:43 Updated: 1:openstack-nova-scheduler-15.0.0-2.el7ost.noarch
Mar 22 10:21:43 Installed: 1:openstack-nova-placement-api-15.0.0-2.el7ost.noarch
Mar 22 10:21:43 Updated: 1:openstack-nova-conductor-15.0.0-2.el7ost.noarch
Mar 22 10:21:43 Updated: 1:openstack-nova-compute-15.0.0-2.el7ost.noarch
Mar 22 10:21:43 Updated: 1:openstack-nova-cert-15.0.0-2.el7ost.noarch
Mar 22 10:21:43 Updated: 1:openstack-nova-console-15.0.0-2.el7ost.noarch
Mar 22 10:21:43 Updated: 1:openstack-nova-api-15.0.0-2.el7ost.noarch
Mar 22 10:21:43 Updated: 1:openstack-nova-novncproxy-15.0.0-2.el7ost.noarch
Mar 22 10:22:00 Updated: python-oslo-serialization-2.16.0-0.20170208094041.ac5787d.el7ost.noarch

Comment 4 errata-xmlrpc 2017-05-17 20:02:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1245