Bug 1465776 - OSP8 -> OSP9 -> OSP10 upgrade: major-upgrade-pacemaker-converge.yaml fails restarting openstack-cinder-scheduler: ServiceTooOld: One of the services is in Liberty version. We do not provide backward compatibility with Liberty now, you need to upgrade
Status: MODIFIED
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: z4
Target Release: 10.0 (Newton)
Assigned To: Sofer Athlan-Guyot
QA Contact: Amit Ugol
Keywords: Regression, Triaged, ZStream
Reported: 2017-06-28 03:45 EDT by Marius Cornea
Modified: 2017-08-14 09:19 EDT
CC List: 7 users

Fixed In Version: openstack-tripleo-heat-templates-5.3.0-2.el7ost
Type: Bug


Attachments
cinder-scheduler.log (3.69 MB, text/plain)
2017-06-28 03:45 EDT, Marius Cornea


External Trackers
Tracker            ID        Last Updated
Launchpad          1701259   2017-06-29 09:03 EDT
OpenStack gerrit   478922    2017-06-29 09:04 EDT

Description Marius Cornea 2017-06-28 03:45:03 EDT
Created attachment 1292577
cinder-scheduler.log

Description of problem:
OSP8 -> OSP9 -> OSP10 upgrade: major-upgrade-pacemaker-converge.yaml fails restarting openstack-cinder-scheduler:  ServiceTooOld: One of the services is in Liberty version. We do not provide backward compatibility with Liberty now, you need to upgrade to Mitaka first.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-5.2.0-20.el7ost.noarch
openstack-cinder-9.1.4-3.el7ost.noarch
puppet-cinder-9.5.0-1.el7ost.noarch
python-cinder-9.1.4-3.el7ost.noarch
python-cinderclient-1.9.0-6.el7ost.noarch

How reproducible:
1/1

Steps to Reproduce:
1. Deploy OSP8
2. Upgrade to OSP9
3. Upgrade to OSP10

Actual results:
During the OSP9 -> OSP10 upgrade, major-upgrade-pacemaker-converge.yaml fails because the cinder-scheduler service cannot restart.

Expected results:
major-upgrade-pacemaker-converge.yaml completes successfully.

Additional info:
Attaching cinder-scheduler.log
Comment 2 Sofer Athlan-Guyot 2017-06-28 07:08:28 EDT
Hi,

so the problem happens during the osp8->osp9 upgrade:

 - Jun 27 19:28:46 -> end of controller upgrade to osp9
 - First error in volume:
   - 2017-06-27 19:25:09.142 6574 ERROR cinder.cmd.volume ServiceTooOld: One of the services is in Liberty version. We do not provide backward compatibility with Liberty now, you need to upgrade to Mitaka first.

 - Then error with the scheduler:
   - 2017-06-27 19:28:06.474 4167 CRITICAL cinder [req-3f310722-468e-48e4-99be-6668872890c5 - - - - -] ServiceTooOld: One of the services is in Liberty version. We do not provide backward compatibility with Liberty now, you need to upgrade to Mitaka first.
 - Jun 27 19:55:30 -> start osp9 convergence.
 - Jun 28 08:47:40 -> start upgrade to osp10

But we only caught the problem when the services were restarted during the osp10 upgrade.

This looks like an orchestration issue between the cinder services. The Mitaka release notes[1] say:

  "As cinder-backup was strongly reworked in this release, the recommended upgrade order when executing live (rolling) upgrade is c-api->c-sch->c-vol->c-bak."

We don't use cinder-backup, but in the logs[3] we can see that c-vol is restarted before c-sch, and that must be the root cause.
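The release-note order can be expressed as a simple check against observed restart times. This is a minimal Python sketch under stated assumptions: `RECOMMENDED_ORDER` and `order_violations` are hypothetical helpers, not part of cinder or TripleO, and the two times are the failure times from footnote [3], used only as an approximation of when each service was restarted.

```python
from datetime import time

# Rolling-upgrade order quoted from the Mitaka release notes.
# Hypothetical helper data; not part of cinder or TripleO.
RECOMMENDED_ORDER = ["c-api", "c-sch", "c-vol", "c-bak"]


def order_violations(observed):
    """Return (expected_first, expected_second) pairs that were seen in reverse.

    `observed` maps a short service name to the time it was (re)started;
    services missing from `observed` are skipped.
    """
    present = [name for name in RECOMMENDED_ORDER if name in observed]
    violations = []
    for i, first in enumerate(present):
        for second in present[i + 1:]:
            if observed[second] < observed[first]:
                violations.append((first, second))
    return violations


# Approximation: failure times from footnote [3] stand in for restart times.
observed = {"c-sch": time(19, 28), "c-vol": time(19, 25)}
print(order_violations(observed))  # [('c-sch', 'c-vol')]
```

An empty list means the observed restarts respected the recommended order for the services that were seen.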

There is also this launchpad bug[2] that seems to confirm this.

Looking at the code for confirmation.


[1]: https://docs.openstack.org/releasenotes/cinder/mitaka.html#id6
[2]: https://bugs.launchpad.net/devstack/+bug/1612781
[3]: c-vol fails to start at 19:25 and c-sch fails to start at 19:28
Comment 4 Sofer Athlan-Guyot 2017-06-28 11:58:43 EDT
Hi,

so my previous timeline is wrong, just discard it.

The cause of the error is the second row of the services table in the database:

INSERT INTO `services` VALUES
('2017-06-27 15:44:41','2017-06-27 19:13:24',NULL,0,2,'hostgroup','cinder-scheduler','cinder-scheduler',3040,0,'nova',NULL,NULL,'2.0','1.3','not-capable',0,NULL,NULL),
('2017-06-27 15:44:41','2017-06-27 16:59:01',NULL,0,5,'hostgroup','cinder-scheduler','cinder-scheduler',440,0,'nova',NULL,NULL,NULL,NULL,'not-capable',0,NULL,NULL),
('2017-06-27 15:45:23','2017-06-28 15:11:30',NULL,0,8,'hostgroup@tripleo_ceph','cinder-volume','cinder-volume',1199,0,'nova',NULL,NULL,'3.0','1.11','disabled',0,NULL,NULL);

This entry has NULL versions, while the other two have 1.3 and 1.11. This triggers an error: the osp10 cinder-volume (version 8.1.1) and cinder-scheduler (8.1.1) services fail with

 "ServiceTooOld: One of the services is in Liberty version. We do not provide backward compatibility with Liberty now, you need to upgrade to Mitaka first."

This second entry was created before the osp8->osp9 upgrade and has not been updated since the services were first stopped for that upgrade.
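The failure can be reproduced outside cinder with the three rows above. This is a minimal Python sketch, not cinder's actual check: it only encodes the reading that a NULL version means a Liberty-era service. The trimmed-down table, the column names `rpc_current_version` and `object_current_version`, and the helpers `liberty_rows`, `check_no_liberty_services` and the stand-in `ServiceTooOld` class are assumptions for illustration.

```python
import sqlite3

# Hypothetical, trimmed-down model of the services table; the version
# column names are assumptions, only the columns needed here are kept.
SCHEMA = """
CREATE TABLE services (
    id INTEGER PRIMARY KEY,
    host TEXT,
    "binary" TEXT,
    updated_at TEXT,
    rpc_current_version TEXT,
    object_current_version TEXT
)
"""

# The three rows from the dump, reduced to the modeled columns.
ROWS = [
    (2, "hostgroup", "cinder-scheduler", "2017-06-27 19:13:24", "2.0", "1.3"),
    (5, "hostgroup", "cinder-scheduler", "2017-06-27 16:59:01", None, None),
    (8, "hostgroup@tripleo_ceph", "cinder-volume", "2017-06-28 15:11:30", "3.0", "1.11"),
]


class ServiceTooOld(Exception):
    """Stand-in for the exception named in the log; not cinder's class."""


def liberty_rows(conn):
    """Return (id, host, binary) for rows whose version columns are NULL."""
    return conn.execute(
        'SELECT id, host, "binary" FROM services '
        "WHERE rpc_current_version IS NULL OR object_current_version IS NULL"
    ).fetchall()


def check_no_liberty_services(conn):
    """Raise ServiceTooOld if any row looks like a Liberty-era service."""
    stale = liberty_rows(conn)
    if stale:
        raise ServiceTooOld(f"Liberty-era service rows: {stale}")


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany("INSERT INTO services VALUES (?, ?, ?, ?, ?, ?)", ROWS)
try:
    check_no_liberty_services(conn)
except ServiceTooOld as exc:
    print(exc)  # Liberty-era service rows: [(5, 'hostgroup', 'cinder-scheduler')]
```

Only the stale row with id=5 is reported; the up-to-date scheduler row with id=2 passes.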

On osp10 the services keep restarting in a loop.

We would need help from dfg:storage to get to the bottom of it.
Comment 5 Gorka Eguileor 2017-06-28 12:39:10 EDT
It looks like, for some reason, the DB had 2 scheduler entries for host hostgroup on OSP8. When it was upgraded to OSP9 only one of them was updated (the one with id=2), so when OSP10 checks the services it sees one entry with NULL versions (Liberty) and complains.

We cannot have duplicate entries or obsolete entries (from services that will not exist after the upgrade).
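Duplicate entries of the kind described here can be found by grouping rows on host and binary. This is a minimal Python sketch over a hypothetical, trimmed-down services table holding the three ids from the dump; `duplicate_services` is an illustrative helper, not a cinder or cinder-manage feature, and it does not attempt to detect obsolete services.

```python
import sqlite3

# Hypothetical, trimmed-down services table: only id, host and binary are modeled.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE services (id INTEGER PRIMARY KEY, host TEXT, "binary" TEXT)')
conn.executemany(
    "INSERT INTO services VALUES (?, ?, ?)",
    [
        (2, "hostgroup", "cinder-scheduler"),
        (5, "hostgroup", "cinder-scheduler"),
        (8, "hostgroup@tripleo_ceph", "cinder-volume"),
    ],
)


def duplicate_services(conn):
    """Return (host, binary, sorted ids) for each host/binary pair with more than one row."""
    rows = conn.execute(
        'SELECT host, "binary", GROUP_CONCAT(id) FROM services '
        'GROUP BY host, "binary" HAVING COUNT(*) > 1'
    ).fetchall()
    # GROUP_CONCAT yields a comma-separated string with no guaranteed order.
    return [
        (host, binary, sorted(int(i) for i in ids.split(",")))
        for host, binary, ids in rows
    ]


print(duplicate_services(conn))  # [('hostgroup', 'cinder-scheduler', [2, 5])]
```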
Comment 6 Sofer Athlan-Guyot 2017-06-28 13:04:23 EDT
Hi,

so after a discussion with Gorka, we came to the conclusion that, as long as we don't do a rolling upgrade, we can just delete the entries to avoid duplicates.

I've tested this one-liner:


sudo cinder-manage service list | awk '/^cinder/{print $1  " "  $2}' | while read service host; do sudo cinder-manage service remove $service $host; done

and cinder-scheduler and cinder-volume could then start.
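Before running the one-liner for real, it can help to see which commands it would issue. This is a minimal Python dry-run sketch that mirrors the awk filter: it assumes, as the one-liner does, that the binary and host are the first two whitespace-separated fields of each line starting with "cinder". `SAMPLE_LISTING` is illustrative text built from the database rows above, not captured `cinder-manage service list` output, and `removal_commands` is a hypothetical helper.

```python
# Illustrative listing only: trimmed layout, values taken from the database
# rows above; this is not captured cinder-manage output.
SAMPLE_LISTING = """\
Binary           Host                    Zone  Updated At
cinder-scheduler hostgroup               nova  2017-06-27 19:13:24
cinder-scheduler hostgroup               nova  2017-06-27 16:59:01
cinder-volume    hostgroup@tripleo_ceph  nova  2017-06-28 15:11:30
"""


def removal_commands(listing):
    """Mirror awk '/^cinder/{print $1 " " $2}': one remove command per service line."""
    commands = []
    for line in listing.splitlines():
        fields = line.split()
        # Same filter as the awk pattern: only lines starting with "cinder",
        # which also skips the header line.
        if line.startswith("cinder") and len(fields) >= 2:
            binary, host = fields[0], fields[1]
            commands.append(["sudo", "cinder-manage", "service", "remove", binary, host])
    return commands


# Dry run: print the commands instead of executing them.
for cmd in removal_commands(SAMPLE_LISTING):
    print(" ".join(cmd))
# To execute for real, each list could be passed to subprocess.run(cmd, check=True).
```

Keeping the listing and the execution apart makes it possible to review exactly which rows will be removed first, which matters because the one-liner deletes every cinder service row, not only the duplicate.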

We need to run it at the right point during the upgrade.
