Bug 1426945 - OSP10 -> OSP11 upgrade: the upgrade process is stuck for ~13 minutes waiting for nova services to get started after packages were updated
Summary: OSP10 -> OSP11 upgrade: the upgrade process is stuck for ~13 minutes waiting ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: rc
: 11.0 (Ocata)
Assignee: Sofer Athlan-Guyot
QA Contact: Marius Cornea
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-02-26 17:10 UTC by Marius Cornea
Modified: 2017-05-17 20:02 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-17 20:02:14 UTC


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:1245 normal SHIPPED_LIVE Red Hat OpenStack Platform 11.0 Bug Fix and Enhancement Advisory 2017-05-17 23:01:50 UTC
OpenStack gerrit 439807 None None None 2017-03-21 16:50:52 UTC

Description Marius Cornea 2017-02-26 17:10:53 UTC
Description of problem:
OSP10 -> OSP11 upgrade: the upgrade process is stuck for ~13 minutes waiting for nova services to get started after packages were updated:

We can see in yum.log - see the diference between the time nova-api got updated 16:47:00 and the time when the following package got updated - 16:59:53

eb 26 16:47:00 Updated: 1:openstack-nova-common-15.0.1-0.20170222170803.10a32dd.el7ost.noarch
Feb 26 16:47:00 Updated: 1:openstack-nova-novncproxy-15.0.1-0.20170222170803.10a32dd.el7ost.noarch
Feb 26 16:47:00 Updated: 1:openstack-nova-console-15.0.1-0.20170222170803.10a32dd.el7ost.noarch
Feb 26 16:47:00 Updated: 1:openstack-nova-cert-15.0.1-0.20170222170803.10a32dd.el7ost.noarch
Feb 26 16:47:00 Updated: 1:openstack-nova-conductor-15.0.1-0.20170222170803.10a32dd.el7ost.noarch
Feb 26 16:47:00 Updated: 1:openstack-nova-scheduler-15.0.1-0.20170222170803.10a32dd.el7ost.noarch
Feb 26 16:47:00 Updated: 1:openstack-nova-compute-15.0.1-0.20170222170803.10a32dd.el7ost.noarch
Feb 26 16:47:00 Updated: 1:openstack-nova-api-15.0.1-0.20170222170803.10a32dd.el7ost.noarch
Feb 26 16:59:53 Updated: python-oslo-serialization-2.16.0-0.20170208094041.ac5787d.el7ost.noarch


During this timeframe:

[root@overcloud-controller-0 heat-admin]# systemctl list-units | grep nova
  openstack-nova-api.service                                                  loaded activating   start        start OpenStack Nova API Server
  openstack-nova-consoleauth.service                                          loaded activating   start        start OpenStack Nova VNC console auth Server
  openstack-nova-novncproxy.service                                           loaded active       running            OpenStack Nova NoVNC Proxy Server
  openstack-nova-scheduler.service                                            loaded activating   start        start OpenStack Nova Scheduler Server


[root@overcloud-controller-0 heat-admin]# tail -5 /var/log/nova/nova-api.log 
2017-02-26 16:56:12.351 383452 ERROR nova     value = self._do_get(name, group, namespace)
2017-02-26 16:56:12.351 383452 ERROR nova   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2715, in _do_get
2017-02-26 16:56:12.351 383452 ERROR nova     % (opt.name, str(ve)))
2017-02-26 16:56:12.351 383452 ERROR nova ConfigFileValueError: Value for option osapi_compute_workers is not valid: Should be greater than or equal to 1
2017-02-26 16:56:12.351 383452 ERROR nova 
[root@overcloud-controller-0 heat-admin]# tail -5 /var/log/nova/nova-consoleauth.log 
2017-02-26 16:56:17.420 383492 ERROR nova     value = self._do_get(name, group, namespace)
2017-02-26 16:56:17.420 383492 ERROR nova   File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2715, in _do_get
2017-02-26 16:56:17.420 383492 ERROR nova     % (opt.name, str(ve)))
2017-02-26 16:56:17.420 383492 ERROR nova ConfigFileValueError: Value for option metadata_workers is not valid: Should be greater than or equal to 1
2017-02-26 16:56:17.420 383492 ERROR nova 
[root@overcloud-controller-0 heat-admin]# tail -5 /var/log/nova/nova-scheduler.log 
2017-02-26 16:55:20.597 382652 WARNING oslo_db.sqlalchemy.engines [req-40dc07a2-5c32-429d-8b9a-57254463e473 - - - - -] SQL connection failed. 6 attempts left.
2017-02-26 16:55:33.608 382652 WARNING oslo_db.sqlalchemy.engines [req-40dc07a2-5c32-429d-8b9a-57254463e473 - - - - -] SQL connection failed. 5 attempts left.
2017-02-26 16:55:46.619 382652 WARNING oslo_db.sqlalchemy.engines [req-40dc07a2-5c32-429d-8b9a-57254463e473 - - - - -] SQL connection failed. 4 attempts left.
2017-02-26 16:55:59.637 382652 WARNING oslo_db.sqlalchemy.engines [req-40dc07a2-5c32-429d-8b9a-57254463e473 - - - - -] SQL connection failed. 3 attempts left.
2017-02-26 16:56:12.648 382652 WARNING oslo_db.sqlalchemy.engines [req-40dc07a2-5c32-429d-8b9a-57254463e473 - - - - -] SQL connection failed. 2 attempts left.

We can notice the nova processes CPU usage during this timeframe but the overall upgrade process doesn't show any progress at all:  

[root@overcloud-controller-0 heat-admin]# pidstat 1  
Linux 3.10.0-514.6.1.el7.x86_64 (overcloud-controller-0.localdomain) 	02/26/2017 	_x86_64_	(4 CPU)

04:58:48 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:49 PM     0         1    0.00    0.99    0.00    0.99     3  systemd
04:58:49 PM     0         9    0.00    0.99    0.00    0.99     0  rcu_sched
04:58:49 PM   996    139233    0.99    0.00    0.00    0.99     2  neutron-server
04:58:49 PM     0    349039    6.93    0.00    0.00    6.93     0  ansible-playboo
04:58:49 PM   162    384699   93.07    6.93    0.00  100.00     2  nova-consoleaut
04:58:49 PM     0    384706    0.00    0.99    0.00    0.99     0  pidstat
04:58:49 PM   162    384709   59.41    7.92    0.00   67.33     3  nova-scheduler
04:58:49 PM   162    384710   46.53    7.92    0.00   54.46     1  nova-api

04:58:49 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:50 PM     0       544    0.00    1.00    0.00    1.00     0  systemd-journal
04:58:50 PM     0      1045    0.00    1.00    0.00    1.00     0  ovs-vswitchd
04:58:50 PM     0    349039    9.00    1.00    0.00   10.00     1  ansible-playboo
04:58:50 PM   162    384699   99.00    0.00    0.00   99.00     2  nova-consoleaut
04:58:50 PM   162    384709   94.00    6.00    0.00  100.00     3  nova-scheduler
04:58:50 PM   162    384710   92.00    6.00    0.00   98.00     1  nova-api

04:58:50 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:51 PM     0       990    1.00    0.00    0.00    1.00     2  ovsdb-server
04:58:51 PM     0    349039    9.00    1.00    0.00   10.00     0  ansible-playboo
04:58:51 PM     0    384706    0.00    1.00    0.00    1.00     0  pidstat
04:58:51 PM   162    384709   99.00    0.00    0.00   99.00     3  nova-scheduler
04:58:51 PM   162    384710  100.00    0.00    0.00  100.00     2  nova-api
04:58:51 PM   162    384731   47.00    8.00    0.00   55.00     1  nova-consoleaut

04:58:51 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:52 PM     0    349039    9.00    0.00    0.00    9.00     3  ansible-playboo
04:58:52 PM   162    384709    1.00    0.00    0.00    1.00     3  nova-scheduler
04:58:52 PM   162    384710   94.00    7.00    0.00  100.00     2  nova-api
04:58:52 PM   162    384731   92.00    7.00    0.00   99.00     0  nova-consoleaut

04:58:52 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:53 PM     0      1045    1.00    0.00    0.00    1.00     0  ovs-vswitchd
04:58:53 PM     0    349039    8.00    1.00    0.00    9.00     2  ansible-playboo
04:58:53 PM   162    380921    1.00    0.00    0.00    1.00     1  nova-novncproxy
04:58:53 PM     0    384706    1.00    0.00    0.00    1.00     1  pidstat
04:58:53 PM   162    384731  100.00    0.00    0.00  100.00     0  nova-consoleaut
04:58:53 PM   162    384741   49.00    6.00    0.00   55.00     3  nova-api

04:58:53 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:54 PM     0    349039    9.00    1.00    0.00   10.00     3  ansible-playboo
04:58:54 PM     0    384706    0.00    1.00    0.00    1.00     1  pidstat
04:58:54 PM   162    384741   92.00    7.00    0.00   99.00     0  nova-api
04:58:54 PM   162    384749   70.00   10.00    0.00   80.00     2  nova-consoleaut

04:58:54 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:55 PM     0    349039    9.00    0.00    0.00    9.00     0  ansible-playboo
04:58:55 PM     0    384706    0.00    1.00    0.00    1.00     2  pidstat
04:58:55 PM   162    384741   98.00    3.00    0.00  100.00     3  nova-api
04:58:55 PM   162    384749   95.00    5.00    0.00  100.00     1  nova-consoleaut

04:58:55 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:56 PM     0         1    0.00    1.00    0.00    1.00     2  systemd
04:58:56 PM     0         9    0.00    1.00    0.00    1.00     0  rcu_sched
04:58:56 PM    81       690    1.00    0.00    0.00    1.00     1  dbus-daemon
04:58:56 PM   996    139236    1.00    0.00    0.00    1.00     2  neutron-server
04:58:56 PM     0    349039    8.00    1.00    0.00    9.00     0  ansible-playboo
04:58:56 PM   162    384741   98.00    1.00    0.00   99.00     3  nova-api
04:58:56 PM   162    384761    5.00    0.00    0.00    5.00     1  nova-consoleaut

04:58:56 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:57 PM     0    349039    9.00    0.00    0.00    9.00     3  ansible-playboo
04:58:57 PM     0    383753    0.00    1.00    0.00    1.00     2  systemctl
04:58:57 PM   162    384761   90.00   10.00    0.00  100.00     0  nova-consoleaut
04:58:57 PM   162    384769   24.00    5.00    0.00   29.00     1  nova-api

04:58:57 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:58 PM     0    349039    9.00    1.00    0.00   10.00     0  ansible-playboo
04:58:58 PM     0    384706    1.00    1.00    0.00    2.00     1  pidstat
04:58:58 PM   162    384761   96.00    4.00    0.00  100.00     2  nova-consoleaut
04:58:58 PM   162    384769   92.00    8.00    0.00  100.00     3  nova-api

04:58:58 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:58:59 PM     0       544    0.00    1.00    0.00    1.00     1  systemd-journal
04:58:59 PM     0       687    1.00    0.00    0.00    1.00     2  systemd-logind
04:58:59 PM     0    349039    9.00    1.00    0.00   10.00     0  ansible-playboo
04:58:59 PM   162    384769   98.00    2.00    0.00  100.00     3  nova-api
04:58:59 PM   162    384782    5.00    1.00    0.00    6.00     2  nova-consoleaut

04:58:59 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:59:00 PM     0    349039    9.00    1.00    0.00   10.00     2  ansible-playboo
04:59:00 PM     0    384706    0.00    1.00    0.00    1.00     1  pidstat
04:59:00 PM   162    384769   96.00    5.00    0.00  100.00     3  nova-api
04:59:00 PM   162    384782   88.00   10.00    0.00   98.00     0  nova-consoleaut

04:59:00 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:59:01 PM     0         9    0.00    1.00    0.00    1.00     3  rcu_sched
04:59:01 PM     0    349039    9.00    0.00    0.00    9.00     2  ansible-playboo
04:59:01 PM   162    384782   96.00    4.00    0.00  100.00     1  nova-consoleaut
04:59:01 PM   162    384792   25.00    5.00    0.00   30.00     0  nova-api

04:59:01 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:59:02 PM     0         1    1.00    0.00    0.00    1.00     2  systemd
04:59:02 PM     0       655    0.00    1.00    0.00    1.00     3  auditd
04:59:02 PM     0    349039    8.00    2.00    0.00   10.00     2  ansible-playboo
04:59:02 PM     0    384706    0.00    1.00    0.00    1.00     1  pidstat
04:59:02 PM   162    384792   91.00    8.00    0.00   99.00     0  nova-api
04:59:02 PM   162    384800    5.00    1.00    0.00    6.00     3  nova-consoleaut

04:59:02 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
04:59:03 PM     0    349039    9.00    0.00    0.00    9.00     0  ansible-playboo
04:59:03 PM   162    384792   99.00    1.00    0.00  100.00     3  nova-api
04:59:03 PM   162    384800   87.00   12.00    0.00   99.00     2  nova-consoleaut


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-6.0.0-0.20170222195630.46117f4.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Upgrade OSP10->OSP11
2. watch openstack stack list --nested | grep PROGRESS

Actual results:
At some point the nested stacks don't change the status. Checking the controller for upgrade related activity we can notice in the yum.log that openstack-nova-api was the last package updated and it doesn't go further.

Expected results:
I'm not sure exactly what the yum update is waiting for but it looks the upgrade process is stuck during this large timeframe. 

Additional info:

Comment 1 Sofer Athlan-Guyot 2017-03-21 16:50:52 UTC
The reordering of the step should fix the issue here.  Waiting for Marius for  confirmation.

Comment 2 Marius Cornea 2017-03-22 12:43:21 UTC
(In reply to Sofer Athlan-Guyot from comment #1)
> The reordering of the step should fix the issue here.  Waiting for Marius
> for  confirmation.

This is fixed in latest build:

Mar 22 10:21:42 Updated: 1:openstack-nova-common-15.0.0-2.el7ost.noarch
Mar 22 10:21:43 Updated: 1:openstack-nova-scheduler-15.0.0-2.el7ost.noarch
Mar 22 10:21:43 Installed: 1:openstack-nova-placement-api-15.0.0-2.el7ost.noarch
Mar 22 10:21:43 Updated: 1:openstack-nova-conductor-15.0.0-2.el7ost.noarch
Mar 22 10:21:43 Updated: 1:openstack-nova-compute-15.0.0-2.el7ost.noarch
Mar 22 10:21:43 Updated: 1:openstack-nova-cert-15.0.0-2.el7ost.noarch
Mar 22 10:21:43 Updated: 1:openstack-nova-console-15.0.0-2.el7ost.noarch
Mar 22 10:21:43 Updated: 1:openstack-nova-api-15.0.0-2.el7ost.noarch
Mar 22 10:21:43 Updated: 1:openstack-nova-novncproxy-15.0.0-2.el7ost.noarch
Mar 22 10:22:00 Updated: python-oslo-serialization-2.16.0-0.20170208094041.ac5787d.el7ost.noarch

Comment 4 errata-xmlrpc 2017-05-17 20:02:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1245


Note You need to log in before you can comment on or make changes to this bug.