Description of problem:
OSP13 adds a new command that backs up the undercloud ("openstack undercloud backup"). However, the tar command in the script requires the --xattrs flag so that the filesystem backup retains the swift metadata. Without the metadata, the swift filestore is unusable, which makes re-running "openstack overcloud" commands difficult, since the overcloud plan is stored in swift.

Version-Release number of selected component (if applicable):
13

How reproducible:
Always

Steps to Reproduce:
1. Run "openstack undercloud backup"
2. Use the backup on a fresh undercloud
3. Try any swift object operation (e.g. "openstack object show overcloud roles_data.yaml")

Actual results:
The object gets quarantined and any swift command results in a 404 error

Expected results:
Successful swift operations

Additional info:
Omri, would you mind adding this to your Jenkins job? Thanks!
(In reply to Carlos Camacho from comment #18)
> Omri, mind to add this to your jenkins job?
>
> Thanks!

Added: https://code.engineering.redhat.com/gerrit/#/c/144219/
Running job: https://rhos-ci-staging-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/DFG-df-13-deployment-7.5-baremetal-3cont_1comp_3ceph-yes_UC_SSL-yes_OC_SSL-ceph-ipv4-vxlan-rhos18-undercloud-backup-restore-RHELOSP-31862/14/
(In reply to Carlos Camacho from comment #18) > Omri, mind to add this to your jenkins job? > > THanks! Hi Carlos - It looks that openstack undercloud backup failed with error, [stack@undercloud75 ~]$ source stackrc (undercloud) [stack@undercloud75 ~]$ openstack undercloud backup --add-path /etc/ --add-path /root/ --add-path /var/lib/glance/ --add-path /var/lib/docker/ --add-path /var/lib/registry/ --add-path /srv/node/ Started Mistral Workflow tripleo.undercloud_backup.v1.backup. Execution ID: fdd0e16d-6e77-4fee-ad6d-24d8e3ee39b9 Waiting for messages on queue 'tripleo' with no timeout. Undercloud backup finished with errors Output: {u'execution': {u'created_at': u'2018-07-17 21:27:49', u'id': u'fdd0e16d-6e77-4fee-ad6d-24d8e3ee39b9', u'input': {u'queue_name': u'tripleo', u'sources_path': u'/etc/,/home/stack/,/root/,/srv/node/,/var/lib/docker/,/var/lib/glance/,/var/lib/registry/'}, u'name': u'tripleo.undercloud_backup.v1.backup', u'params': {u'namespace': u''}, u'spec': {u'description': u'This workflow will launch the Undercloud backup', u'input': [{u'sources_path': u'/home/stack/'}, {u'queue_name': u'tripleo'}], u'name': u'backup', u'tags': [u'tripleo-common-managed'], u'tasks': {u'cleanup_backup': {u'action': u'tripleo.undercloud.remove_temp_dir', u'input': {u'path': u'<% $.backup_path.path %>'}, u'name': u'cleanup_backup', u'on-error': u'send_message', u'on-success': u'send_message', u'publish': {u'message': u'<% task().result %>', u'status': u'SUCCESS'}, u'publish-on-error': {u'message': u'<% task().result %>', u'status': u'FAILED'}, u'type': u'direct', u'version': u'2.0'}, u'create_backup_dir': {u'action': u'tripleo.undercloud.create_backup_dir', u'name': u'create_backup_dir', u'on-error': u'send_message', u'on-success': u'get_database_credentials', u'publish': {u'backup_path': u'<% task().result %>', u'message': u'<% task().result %>', u'status': u'SUCCESS'}, u'publish-on-error': {u'message': u'<% task().result %>', u'status': u'FAILED'}, u'type': u'direct', 
u'version': u'2.0'}, u'create_database_backup': {u'action': u'tripleo.undercloud.create_database_backup', u'input': {u'dbpassword': u'<% $.undercloud_db_password %>', u'dbuser': u'root', u'path': u'<% $.backup_path.path %>'}, u'name': u'create_database_backup', u'on-error': u'send_message', u'on-success': u'create_fs_backup', u'publish': {u'message': u'<% task().result %>', u'status': u'SUCCESS'}, u'publish-on-error': {u'message': u'<% task().result %>', u'status': u'FAILED'}, u'type': u'direct', u'version': u'2.0'}, u'create_fs_backup': {u'action': u'tripleo.undercloud.create_file_system_backup', u'input': {u'path': u'<% $.backup_path.path %>', u'sources_path': u'<% $.sources_path %>'}, u'name': u'create_fs_backup', u'on-error': u'send_message', u'on-success': u'upload_backup', u'publish': {u'message': u'<% task().result %>', u'status': u'SUCCESS'}, u'publish-on-error': {u'message': u'<% task().result %>', u'status': u'FAILED'}, u'type': u'direct', u'version': u'2.0'}, u'get_database_credentials': {u'action': u"mistral.environments_get name='tripleo.undercloud-config'", u'name': u'get_database_credentials', u'on-error': u'send_message', u'on-success': u'create_database_backup', u'publish': {u'message': u'<% task().result %>', u'status': u'SUCCESS', u'undercloud_db_password': u'<% task(get_database_credentials).result.variables.undercloud_db_password %>'}, u'publish-on-error': {u'message': u'<% task().result %>', u'status': u'FAILED'}, u'type': u'direct', u'version': u'2.0'}, u'get_free_space': {u'action': u'tripleo.undercloud.get_free_space', u'name': u'get_free_space', u'on-error': u'send_message', u'on-success': u'create_backup_dir', u'publish': {u'free_space': u'<% task().result %>', u'message': u'<% task().result %>', u'status': u'SUCCESS'}, u'publish-on-error': {u'message': u'<% task().result %>', u'status': u'FAILED'}, u'type': u'direct', u'version': u'2.0'}, u'send_message': {u'action': u'zaqar.queue_post', u'input': {u'messages': {u'body': {u'payload': 
{u'execution': u'<% execution() %>', u'message': u"<% $.get('message', '') %>", u'status': u"<% $.get('status', 'SUCCESS') %>"}, u'type': u'tripleo.undercloud_backup.v1.launch'}}, u'queue_name': u'<% $.queue_name %>'}, u'name': u'send_message', u'on-success': [{u'fail': u'<% $.get(\'status\') = "FAILED" %>'}], u'retry': u'count=5 delay=1', u'type': u'direct', u'version': u'2.0'}, u'upload_backup': {u'action': u'tripleo.undercloud.upload_backup_to_swift', u'input': {u'backup_path': u'<% $.backup_path.path %>'}, u'name': u'upload_backup', u'on-error': u'send_message', u'on-success': u'cleanup_backup', u'publish': {u'message': u'<% task().result %>', u'status': u'SUCCESS'}, u'publish-on-error': {u'message': u'<% task().result %>', u'status': u'FAILED'}, u'type': u'direct', u'version': u'2.0'}}, u'version': u'2.0'}, u'updated_at': u'2018-07-17 21:27:49'}, u'message': u"Failed to run action [action_ex_id=c893bd41-f43e-4d5e-9bbb-fdc98729ce52, action_cls='<class 'mistral.actions.action_factory.CreateFileSystemBackup'>', attributes='{}', params='{u'path': u'/var/tmp/undercloud-backup-D8Rezg', u'sources_path': u'/etc/,/home/stack/,/root/,/srv/node/,/var/lib/docker/,/var/lib/glance/,/var/lib/registry/'}']\n [Errno 2] No such file or directory: '/var/tmp/undercloud-backup-D8Rezg/filesystem-20180717172803.tar'", u'status': u'FAILED'} (undercloud) [stack@undercloud75 ~]$
Hey Omri, this was tested before. I reverted the change and it is still failing, so verification of this bug is blocked until the failure is resolved.
I found the issue: depending on the environment's permissions, the Mistral workflow fails. Here is the fix: https://code.engineering.redhat.com/gerrit/#/c/144306/ I'll merge it downstream to create the package.
Hit the same issue as Omri. I can't verify yet; waiting for a new build/RPM that includes the latest fixed-in version before retesting.
Forgot to add: my failing version was openstack-tripleo-common-8.6.1-23.el7ost.noarch
FYI I've cherry picked the fix. Edited /usr/share/tripleo-common/sudoers line 10. https://code.engineering.redhat.com/gerrit/#/c/144306/1/sudoers Then sudo cp /usr/share/tripleo-common/sudoers /etc/sudoers.d/tripleo-common Backup worked! Waiting for fix to land in RPM and retest before I verify. (undercloud) [stack@undercloud-0 ~]$ cat /usr/share/tripleo-common/sudoers Defaults!/usr/bin/run-validation !requiretty Defaults:validations !requiretty Defaults:mistral !requiretty mistral ALL = (validations) NOPASSWD:SETENV: /usr/bin/run-validation mistral ALL = NOPASSWD: /usr/bin/chown -h validations\: /tmp/validations_identity_[A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_], \ !/usr/bin/chown /tmp/validations_identity_* *, !/usr/bin/chown /tmp/validations_identity_*..* mistral ALL = NOPASSWD: /usr/bin/rm -f /tmp/validations_identity_[A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_], \ !/usr/bin/rm /tmp/validations_identity_* *, !/usr/bin/rm /tmp/validations_identity_*..* mistral ALL = NOPASSWD: /bin/nova-manage cell_v2 discover_hosts * mistral ALL = NOPASSWD: /usr/bin/tar --xattrs --ignore-failed-read -C / -cf /var/tmp/undercloud-backup-*.tar * mistral ALL = NOPASSWD: /usr/bin/chown mistral. 
/var/tmp/undercloud-backup-*/filesystem-*.tar mistral ALL = NOPASSWD: /usr/bin/yum -y install octavia-amphora-image validations ALL = NOPASSWD: ALL (undercloud) [stack@undercloud-0 ~]$ cat /etc/sudoers.d/tripleo-common cat: /etc/sudoers.d/tripleo-common: Permission denied (undercloud) [stack@undercloud-0 ~]$ sudo cat /etc/sudoers.d/tripleo-common Defaults!/usr/bin/run-validation !requiretty Defaults:validations !requiretty Defaults:mistral !requiretty mistral ALL = (validations) NOPASSWD:SETENV: /usr/bin/run-validation mistral ALL = NOPASSWD: /usr/bin/chown -h validations\: /tmp/validations_identity_[A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_], \ !/usr/bin/chown /tmp/validations_identity_* *, !/usr/bin/chown /tmp/validations_identity_*..* mistral ALL = NOPASSWD: /usr/bin/rm -f /tmp/validations_identity_[A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_], \ !/usr/bin/rm /tmp/validations_identity_* *, !/usr/bin/rm /tmp/validations_identity_*..* mistral ALL = NOPASSWD: /bin/nova-manage cell_v2 discover_hosts * mistral ALL = NOPASSWD: /usr/bin/tar --ignore-failed-read -C / -cf /var/tmp/undercloud-backup-*.tar * mistral ALL = NOPASSWD: /usr/bin/chown mistral. 
/var/tmp/undercloud-backup-*/filesystem-*.tar mistral ALL = NOPASSWD: /usr/bin/yum -y install octavia-amphora-image validations ALL = NOPASSWD: ALL (undercloud) [stack@undercloud-0 ~]$ sudo cp /usr/share/tripleo-common/sudoers /etc/sudoers.d/tripleo-common (undercloud) [stack@undercloud-0 ~]$ sudo cat /etc/sudoers.d/tripleo-common Defaults!/usr/bin/run-validation !requiretty Defaults:validations !requiretty Defaults:mistral !requiretty mistral ALL = (validations) NOPASSWD:SETENV: /usr/bin/run-validation mistral ALL = NOPASSWD: /usr/bin/chown -h validations\: /tmp/validations_identity_[A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_], \ !/usr/bin/chown /tmp/validations_identity_* *, !/usr/bin/chown /tmp/validations_identity_*..* mistral ALL = NOPASSWD: /usr/bin/rm -f /tmp/validations_identity_[A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_], \ !/usr/bin/rm /tmp/validations_identity_* *, !/usr/bin/rm /tmp/validations_identity_*..* mistral ALL = NOPASSWD: /bin/nova-manage cell_v2 discover_hosts * mistral ALL = NOPASSWD: /usr/bin/tar --xattrs --ignore-failed-read -C / -cf /var/tmp/undercloud-backup-*.tar * mistral ALL = NOPASSWD: /usr/bin/chown mistral. /var/tmp/undercloud-backup-*/filesystem-*.tar mistral ALL = NOPASSWD: /usr/bin/yum -y install octavia-amphora-image validations ALL = NOPASSWD: ALL (undercloud) [stack@undercloud-0 ~]$ openstack undercloud backup --add-path /home/stack Started Mistral Workflow tripleo.undercloud_backup.v1.backup. Execution ID: bcf97657-1e67-4f3e-968e-30a5c7fcbf04 Waiting for messages on queue 'tripleo' with no timeout. Undercloud Backup succeed
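The difference between the packaged and the deployed sudoers file above comes down to one rule: whether the tar command that mistral is allowed to run includes --xattrs. A minimal sketch of an automated check follows. The function name tar_rule_has_xattrs is hypothetical, and the sketch assumes the tar rule sits on a single line, as it does in the listings above.

```python
import re

# The two tar rules as they appear in the sudoers listings above.
OLD_RULE = ("mistral ALL = NOPASSWD: /usr/bin/tar --ignore-failed-read "
            "-C / -cf /var/tmp/undercloud-backup-*.tar *")
NEW_RULE = ("mistral ALL = NOPASSWD: /usr/bin/tar --xattrs --ignore-failed-read "
            "-C / -cf /var/tmp/undercloud-backup-*.tar *")


def tar_rule_has_xattrs(sudoers_text, user="mistral"):
    """Return True if the user has at least one tar rule and all of them pass --xattrs."""
    user_rule = re.compile(r"\s*%s\s+ALL\s*=" % re.escape(user))
    rules = [
        line for line in sudoers_text.splitlines()
        if user_rule.match(line) and "/usr/bin/tar" in line
    ]
    return bool(rules) and all("--xattrs" in rule.split() for rule in rules)


if __name__ == "__main__":
    print(tar_rule_has_xattrs(OLD_RULE))  # rule without the flag
    print(tar_rule_has_xattrs(NEW_RULE))  # rule with the flag
```

Running the check against the contents of /etc/sudoers.d/tripleo-common (read as root) would show whether the deployed file already carries the fix or still needs the copy from /usr/share/tripleo-common/sudoers.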
Phase2 produced a version that is still insufficient: openstack-tripleo-common-8.6.1-23.el7ost.noarch
I'm spinning up a new phase1 deployment; maybe that will provide the fixed-in version needed to verify this. Otherwise I'll wait a while longer before trying to re-verify.
Regarding verification, when you mentioned:
2. Use the backup on a fresh undercloud
would this be the correct guide to follow (section 2.1. Restoring the undercloud)?
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/back_up_and_restore_the_director_undercloud/restore_the_undercloud
For the most part, yes, but with two caveats:
1. You need to download a copy of the archive from the old undercloud's swift store.
2. The archive name will be different.
Apart from that, it should be the same procedure.
New doc bug: this new backup command should be mentioned in the OSPD backup/restore section: https://bugzilla.redhat.com/show_bug.cgi?id=1615077
RFE: the backup command should report where the backup file is created; currently nothing tells the customer the backup location: https://bugzilla.redhat.com/show_bug.cgi?id=1615079
Someone please (RFE) fix the error messages [0] returned if/once the backup fails because Swift has reached its maximum capacity. I ran the backup twice, and Swift filled up on the third attempt. After I removed the two previous backups from Swift, the backup completed without errors.

[0] {u'msg': u"Unexpected error while running command.\nCommand: /usr/bin/tar -C /var/tmp/undercloud-backup-dwsPE2 -cf /tmp/tmpOFpNdd --exclude .git --exclude .tox --exclude *.pyc --exclude *.pyo .\nExit code: 2\nStdout: u''\nStderr: u'/usr/bin/tar: /tmp/tmpOFpNdd: Wrote only 8192 of 10240 bytes\\n/usr/bin/tar: Error is not recoverable: exiting now\\n'"}, u'status': u'FAILED'}
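To illustrate the kind of message this RFE is asking for, here is a rough sketch that turns the raw tar stderr into something more readable. It is not how tripleo-common handles the error. The function name explain_tar_failure is hypothetical, and the sketch assumes that a "Wrote only N of M bytes" line from tar means the destination ran out of space.

```python
import re

# Assumption: a short write reported by tar means the destination is full.
_SHORT_WRITE = re.compile(r"Wrote only (\d+) of (\d+) bytes")


def explain_tar_failure(stderr):
    """Map raw tar stderr to a friendlier, hypothetical backup error message."""
    match = _SHORT_WRITE.search(stderr)
    if match:
        written, wanted = (int(value) for value in match.groups())
        return (
            "Backup failed: the destination ran out of space "
            "(wrote %d of %d bytes). Free some space, for example by "
            "removing old backups, and retry." % (written, wanted)
        )
    # Anything else is passed through with a short prefix.
    return "Backup failed: " + stderr.strip()


if __name__ == "__main__":
    raw = ("/usr/bin/tar: /tmp/tmpOFpNdd: Wrote only 8192 of 10240 bytes\n"
           "/usr/bin/tar: Error is not recoverable: exiting now\n")
    print(explain_tar_failure(raw))
```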
With the latest update, undercloud backup worked without an error. However, when I try to restore the SQL data:
mysql -u root < all-databases-20180812104955.sql.gz
ERROR: ASCII '\0' appeared in the statement, but this is not allowed unless option --binary-mode is enabled and mysql is run in non-interactive mode. Set --binary-mode to 1 if ASCII '\0' is expected. Query: ''.
Ideally I would like to see restore working before I verify this BZ. Otherwise, if we are only concerned with the backup part, which is the main focus of this BZ, I could technically verify it as is.
Fixed the SQL restore by unpacking the dump file first:
gunzip all-databases-20180812104955.sql.gz
mysql -u root < all-databases-20180812104955.sql
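The earlier error came from feeding the compressed file straight to mysql, so the client saw binary data instead of SQL. A small sketch of a guard that detects a gzip-compressed dump by its magic bytes and unpacks it before the restore is shown below. The helper names is_gzip and unpack_dump are hypothetical, and the naming of the output file is a simplification of what gunzip does.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def is_gzip(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def unpack_dump(path):
    """Decompress a gzip dump next to the original; return the plain SQL path.

    Files that are not gzip-compressed are returned unchanged.
    """
    if not is_gzip(path):
        return path
    target = path[:-3] if path.endswith(".gz") else path + ".sql"
    with gzip.open(path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams, so large dumps stay off the heap
    return target
```

The path returned by unpack_dump is the file that would then be redirected into mysql, as in the commands above. Unlike gunzip, this sketch leaves the compressed original in place.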
Now hit another restore issue:
mysql -e 'select host, user, password from mysql.user;'
ERROR 1045 (28000): Access denied for user 'stack'@'localhost' (using password: NO)
Hi Carlos, I'm stuck on the SQL restore issue from comment #35. Any idea how I could get past this and finish an undercloud restore attempt, so that I can verify this bug via a full backup/restore cycle? Or is a working undercloud backup enough to verify? I'm not keen on it being left open; then again, this BZ isn't about restore issues.
Cloning /root/.my.cnf from the source undercloud to the target undercloud fixed the issue mentioned in comment #35.
Created attachment 1475572 [details] /home/stack/.instack/install-undercloud.log While installing undercloud on target undercloud, hit below error. Unsure how to advance. ERROR: TIMEOUT waiting for execution d9713706-0345-43e4-832e-5b77673e9001 to finish. State: RUNNING 2018-08-13 08:30:51,234 DEBUG: An exception occurred Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 2337, in install _post_config(instack_env, upgrade) File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 2029, in _post_config _post_config_mistral(instack_env, mistral, swift) File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 1965, in _post_config_mistral _create_default_plan(mistral, plans) File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 1907, in _create_default_plan fail_on_error=True) File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 1852, in _wait_for_mistral_execution raise RuntimeError(error_message) RuntimeError: TIMEOUT waiting for execution d9713706-0345-43e4-832e-5b77673e9001 to finish. State: RUNNING 2018-08-13 08:30:51,234 ERROR: ############################################################################# Undercloud install failed. Reason: TIMEOUT waiting for execution d9713706-0345-43e4-832e-5b77673e9001 to finish. State: RUNNING See the previous output for details about what went wrong. The full install log can be found at /home/stack/.instack/install-undercloud.log. 
############################################################################# Traceback (most recent call last): File "<string>", line 1, in <module> File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 2337, in install _post_config(instack_env, upgrade) File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 2029, in _post_config _post_config_mistral(instack_env, mistral, swift) File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 1965, in _post_config_mistral _create_default_plan(mistral, plans) File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 1907, in _create_default_plan fail_on_error=True) File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 1852, in _wait_for_mistral_execution raise RuntimeError(error_message) RuntimeError: TIMEOUT waiting for execution d9713706-0345-43e4-832e-5b77673e9001 to finish. State: RUNNING Command 'instack-install-undercloud' returned non-zero exit status 1
Verified on:
openstack-tripleo-common-8.6.3-5.el7ost.noarch
openstack-tripleo-common-containers-8.6.3-5.el7ost.noarch

1. Deploy an undercloud and back it up.
2. Deploy a new undercloud from the backup files + dump.
3. Once the restore completed, compared some data, for example:

# . stackrc
# ironic node-list
The "ironic" CLI is deprecated and will be removed in the S* release. Please use the "openstack baremetal" CLI instead.
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| db235974-8e8c-4961-a747-283b36490a8d | compute-0    | b1242cac-1c29-4199-9d61-b9f48aebfd30 | power on    | active             | False       |
| 4b6571d0-e213-4f9a-b49d-331b34b92227 | controller-0 | d8c29026-9e14-418e-af2d-00fad25142ac | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+

# glance image-list
+--------------------------------------+------------------------+
| ID                                   | Name                   |
+--------------------------------------+------------------------+
| b2eeac74-efb9-46e2-a46a-554aeb892350 | bm-deploy-kernel       |
| 46c0a72f-4d98-4339-8e56-86d8de5179c3 | bm-deploy-ramdisk      |
| 8975b71d-823e-4ed8-81d5-c10c0eb748eb | overcloud-full         |
| 7123db92-9b64-4721-85a6-5689b7fb57b3 | overcloud-full-initrd  |
| 43a26768-fc0f-4cf9-85a0-93ec831c6fc4 | overcloud-full-vmlinuz |
+--------------------------------------+------------------------+

$ swift list
__cache__
ov-hzbw3zdpsb-0-fxrw2qb6gbzn-NovaCompute-3nahouwjjc6w
ov-ixlkdsb2rpv-0-zjgvszg6cgko-Controller-xfsrngelxb5a
overcloud
overcloud-swift-rings
undercloud-backups
undercloud-backups_segments

All data looks the same as on the source undercloud. Backup/restore works as expected.
Forgot to mention: the issue in comment #38 is safe to ignore; it was user error. I started fresh and things worked as expected in comment #39.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2574