Description: Valli Annamalai
2018-07-17 18:38:51 UTC
Description of problem:
OSP10 was deployed with 3 controllers and 2 computes.
The undercloud was upgraded from OSP10 to OSP13.
Fast Forward prepare was run, including all the templates.
But I missed the ffwd-upgrade run command and went straight to the controller upgrade.
As a result, during the controller upgrade_steps, the task "Install docker packages on upgrade if missing" failed:
u'TASK [Install docker packages on upgrade if missing] ***************************',
u'Tuesday 17 July 2018 11:47:43 -0400 (0:00:00.101) 0:20:22.448 ********** ',
u'fatal: [192.168.24.7]: FAILED! => {"changed": false, "msg": "There are no enabled repos.\\n Run \\"yum repolist all\\" to see the repos you have.\\n To enable Red Hat Subscription Management repositories:\\n subscription-manager repos --enable <repo>\\n To enable custom repositories:\\n yum-config-manager --enable <repo>\\n", "rc": 1, "results": []}',
u'fatal: [192.168.24.15]: FAILED! => {"changed": false, "msg": "There are no enabled repos.\\n Run \\"yum repolist all\\" to see the repos you have.\\n To enable Red Hat Subscription Management repositories:\\n subscription-manager repos --enable <repo>\\n To enable custom repositories:\\n yum-config-manager --enable <repo>\\n", "rc": 1, "results": []}',
u'fatal: [192.168.24.12]: FAILED! => {"changed": false, "msg": "There are no enabled repos.\\n Run \\"yum repolist all\\" to see the repos you have.\\n To enable Red Hat Subscription Management repositories:\\n subscription-manager repos --enable <repo>\\n To enable custom repositories:\\n yum-config-manager --enable <repo>\\n", "rc": 1, "results": []}',
u'',
u'PLAY RECAP *********************************************************************',
u'192.168.24.12 : ok=354 changed=226 unreachable=0 failed=1 ',
u'192.168.24.15 : ok=354 changed=226 unreachable=0 failed=1 ',
u'192.168.24.7 : ok=354 changed=226 unreachable=0 failed=1 ',
When I then ran the ffwd-upgrade run command, it failed with this error:
An unexpected error prevented the server from fulfilling your request. (HTTP 500) (Request-ID: req-3f978f6a-a1df-4d5d-a636-26e7d1b26bad)
And in keystone log:
[root@lorenzo stack]# tail /var/log/keystone/keystone.log
2018-07-17 13:17:26.316 48958 ERROR keystone.common.wsgi File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1152, in _request_authentication
2018-07-17 13:17:26.316 48958 ERROR keystone.common.wsgi auth_packet = self._read_packet()
2018-07-17 13:17:26.316 48958 ERROR keystone.common.wsgi File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1014, in _read_packet
2018-07-17 13:17:26.316 48958 ERROR keystone.common.wsgi packet.check_error()
2018-07-17 13:17:26.316 48958 ERROR keystone.common.wsgi File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 393, in check_error
2018-07-17 13:17:26.316 48958 ERROR keystone.common.wsgi err.raise_mysql_exception(self._data)
2018-07-17 13:17:26.316 48958 ERROR keystone.common.wsgi File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 107, in raise_mysql_exception
2018-07-17 13:17:26.316 48958 ERROR keystone.common.wsgi raise errorclass(errno, errval)
2018-07-17 13:17:26.316 48958 ERROR keystone.common.wsgi DBNonExistentDatabase: (pymysql.err.InternalError) (1049, u"Unknown database 'keystone'") (Background on this error at: http://sqlalche.me/e/2j85)
2018-07-17 13:17:26.316 48958 ERROR keystone.common.wsgi
Because the upgrade_steps playbook failed partway through, after it had already disabled all the services, the openstack CLI commands failed as well.
There should be a way to recover from this other than the hard way of redeploying OSP 10 from scratch. The playbook could revert all of its changes when it fails partway through, or a validation step at the beginning of the controller upgrade could check that the ffwd-upgrade run command completed successfully.
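The validation idea can be illustrated with a minimal Python sketch. This is not how tripleo implements anything: the step names, the REQUIRED_ORDER list and the missing_prerequisite helper are hypothetical, and how the undercloud would record which steps completed is left as an assumption.

```python
from typing import List, Optional

# Hypothetical step names, in the order this report's workflow expects them.
REQUIRED_ORDER = ["ffwd-upgrade prepare", "ffwd-upgrade run", "upgrade run"]


def missing_prerequisite(completed: List[str], requested: str) -> Optional[str]:
    """Return the first step that must finish before `requested`, or None.

    `completed` is assumed to come from some record of successful steps;
    where that record lives is outside the scope of this sketch.
    """
    if requested not in REQUIRED_ORDER:
        raise ValueError("unknown step: " + requested)
    # Every step that precedes the requested one must already be completed.
    for step in REQUIRED_ORDER[: REQUIRED_ORDER.index(requested)]:
        if step not in completed:
            return step
    return None


# The situation in this report: prepare was run, ffwd-upgrade run was skipped.
blocker = missing_prerequisite(["ffwd-upgrade prepare"], "upgrade run")
if blocker is not None:
    print("Refusing to start: '" + blocker + "' has not completed")
```

A guard of this kind would stop the controller upgrade before it disables any services, which is cheaper than reverting changes after a failure.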
Version-Release number of selected component (if applicable):
How reproducible:
Reproducible whenever the ffwd-upgrade run command is skipped and the controller upgrade is started.
Steps to Reproduce:
1. Deploy OSP10
2. Upgrade undercloud from 10 to 13
3. openstack overcloud ffwd-upgrade prepare
4. openstack overcloud upgrade run --roles Controller
5. Step 4 fails at the task: Install docker packages on upgrade if missing
6. openstack overcloud ffwd-upgrade run --yes
7. Step 6 fails with a keystone error (HTTP 500, Unknown database 'keystone')
Actual results:
When the upgrade steps on the controllers fail, it's impossible to recover the cloud.
Expected results:
When the upgrade steps fail, the changes should be reverted so that the cloud is not disturbed. Alternatively, a validation step should be added to make sure all previous commands completed successfully.
Additional info:
This RFE is not marked as an MVP for 17.0, so it is being moved for consideration to OSP 17.1. As stated in the OSP Program Call, QE and Docs only have the capacity to verify and document MVP features for OSP 17.0.
I think we pretty much addressed this in OSP13->OSP16: if an issue happens, the usual procedure is to rerun the proper step, unless a change to THT is needed. In that case, one edits the templates, reruns prepare, and continues with the same step they were at.