[OSP13] Undercloud backup and restore fails when following the procedure and steps as documented for OSP12 (fails on OSP13 env).

Environment:
------------
OSP13 Puddle 2018-03-16.1

Description:
------------
Undercloud backup and restore fails when following the procedure and steps as documented for OSP12 (it fails on OSP13 env).

Results:
--------
(1) missing /etc/my.cnf.d/server.cnf (on OSP13 env)
(2) when attempting to use mariadb-server.cnf (instead of server.cnf), the command "cat /root/undercloud-all-databases.sql | mysql" failed with ERROR

Docs link:
----------
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/12/html/back_up_and_restore_the_director_undercloud/restore

List of files that exist under /etc/my.cnf.d/
---------------------------------------------
[root@undercloud75 ~]# cd /etc/my.cnf.d/
[root@undercloud75 my.cnf.d]# ls
auth_gssapi.cnf client.cnf enable_encryption.preset galera.cnf mariadb-server.cnf mysql-clients.cnf tokudb.cnf
[root@undercloud75 my.cnf.d]# ls -ls
total 28
4 -rw-r--r--. 1 root root 41 Feb 1 12:42 auth_gssapi.cnf
4 -rw-r--r--. 1 root root 295 Dec 14 2016 client.cnf
4 -rw-r--r--. 1 root root 763 Dec 14 2016 enable_encryption.preset
4 -rw-r--r--. 1 root root 999 Feb 20 15:01 galera.cnf
4 -rw-r--r--. 1 root root 1462 Feb 1 12:42 mariadb-server.cnf
4 -rw-r--r--. 1 root root 232 Dec 14 2016 mysql-clients.cnf
4 -rw-r--r--. 1 root root 285 Dec 14 2016 tokudb.cnf

When attempting to use mariadb-server.cnf instead of server.cnf ("cat /root/undercloud-all-databases.sql | mysql" -> FAILED):
-----------------------------------------------------------------
[root@undercloud75 ~]# tar -xzC / -f undercloud-backup-*.tar.gz etc/my.cnf.d/mariadb-server.cnf
[root@undercloud75 ~]# tar -xzC / -f undercloud-backup-*.tar.gz root/undercloud-all-databases.sql
tar: root/undercloud-all-databases.sql: time stamp 2018-02-20 19:31:31.598594491 is 15509.047497842 s in the future
[root@undercloud75 ~]# systemctl start mariadb
[root@undercloud75 ~]# mysql -uroot -e"set global max_allowed_packet = 16777216;"
[root@undercloud75 ~]# cat /root/undercloud-all-databases.sql | mysql
ERROR 1911 (HY000) at line 3992: Unknown option 'STATS_PERSISTENT'
Hey Omri, Can you clarify the steps executed from the docs? I'm trying to follow your steps in my environment, but I can't reproduce them correctly.
Hello Omri, We are working on a refactor of the backup/restore procedures. This is the upstream review: https://review.openstack.org/#/c/544975/ Would you mind checking it? Cheers, Carlos.
(In reply to Carlos Camacho from comment #2)
> Hey Omri,
>
> Can you clarify the steps executed from the docs?
> I'm trying to follow up your steps in my environment but I'm not getting them
> correctly.

Hi Carlos,
I was following the official RH documentation for OSP12, the section that explains the undercloud backup/restore (link in the BZ body).

(In reply to Carlos Camacho from comment #3)
> Hello Omri,
>
> We are working on a new refactor for the backup/restore procedures, this is
> the upstream review https://review.openstack.org/#/c/544975/ do you mind to
> check it?
>
> Cheers,
> Carlos.

That's great. I'll try to follow those steps and report back here.
(In reply to Carlos Camacho from comment #3)
> Hello Omri,
>
> We are working on a new refactor for the backup/restore procedures, this is
> the upstream review https://review.openstack.org/#/c/544975/ do you mind to
> check it?
>
> Cheers,
> Carlos.

When running the manual commands suggested in the documentation patch https://review.openstack.org/#/c/544975/, there is again a request to tar /etc/my.cnf.d/server.cnf, a file that no longer exists in the OSP13 environment (as reported in the BZ body).
Hello Omri, You are right: depending on the MySQL version, the configuration file can have a different name. I pushed a fix for the latest docs in https://review.openstack.org/#/c/558429
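To illustrate the point about differing file names, here is a minimal sketch of how a backup or restore script could pick whichever server config file is present instead of hard-coding one. The two candidate names are the ones mentioned in this report (server.cnf from the OSP12 docs, mariadb-server.cnf seen on the OSP13 undercloud); the helper name find_server_cnf is hypothetical and this is not the change made in the linked review.

```shell
#!/bin/sh
# Hypothetical helper (not part of any shipped tool): print the first
# MariaDB/MySQL server config file found in the given directory.
# Candidate names are taken from this bug report only:
#   server.cnf          (documented for OSP12)
#   mariadb-server.cnf  (present on the OSP13 undercloud)
find_server_cnf() {
    conf_dir="${1:-/etc/my.cnf.d}"
    for name in server.cnf mariadb-server.cnf; do
        if [ -f "$conf_dir/$name" ]; then
            printf '%s\n' "$conf_dir/$name"
            return 0
        fi
    done
    echo "no server config file found in $conf_dir" >&2
    return 1
}
```

A caller could store the result, for example `cnf=$(find_server_cnf) || exit 1`, and pass that path to tar rather than a fixed file name.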
(In reply to Carlos Camacho from comment #7)
> Hello Omri,
>
> You are right there, depending on the MySQL version we can have a different
> configuration file name.
>
> I pushed a fix for the latest docs in https://review.openstack.org/#/c/558429

Trying to follow the steps on OSP13 env:
-------------------------------------------
[root@undercloud75 ~]# tar --ignore-failed-read -czf \
> undercloud-backup-`date +%F`.tar.gz \
> /root/undercloud-all-databases.sql \
> /etc/my.cnf.d \
> /var/lib/glance/images \
> /srv/node \
> /home/stack \
> /etc/pki \
> /opt/stack
tar: Removing leading `/' from member names
tar: /root/undercloud-all-databases.sql: Warning: Cannot stat: No such file or directory

It seems we're missing undercloud-all-databases.sql:

[root@undercloud75 ~]# find / -name undercloud-all-databases.sql
[root@undercloud75 ~]#
[stack@undercloud75 ~]$ sudo find / -name *.sql
/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/migrate_repo/versions/246_sqlite_upgrade.sql
/usr/lib/python2.7/site-packages/glance/db/sqlalchemy/migrate_repo/versions/003_sqlite_upgrade.sql
/usr/lib/python2.7/site-packages/glance/db/sqlalchemy/migrate_repo/versions/006_mysql_upgrade.sql
/usr/lib/python2.7/site-packages/glance/db/sqlalchemy/migrate_repo/versions/006_sqlite_upgrade.sql
/usr/lib/python2.7/site-packages/glance/db/sqlalchemy/migrate_repo/versions/045_sqlite_upgrade.sql
/usr/lib/python2.7/site-packages/glance/db/sqlalchemy/migrate_repo/versions/011_sqlite_upgrade.sql
/usr/lib/python2.7/site-packages/glance/db/sqlalchemy/migrate_repo/versions/037_sqlite_upgrade.sql
/usr/share/mariadb/mroonga/install.sql
/usr/share/mariadb/mroonga/uninstall.sql
/usr/share/mariadb/fill_help_tables.sql
/usr/share/mariadb/install_spider.sql
/usr/share/mariadb/maria_add_gis_sp.sql
/usr/share/mariadb/maria_add_gis_sp_bootstrap.sql
/usr/share/mariadb/mysql_performance_tables.sql
/usr/share/mariadb/mysql_system_tables.sql
/usr/share/mariadb/mysql_system_tables_data.sql
/usr/share/mariadb/mysql_test_data_timezone.sql
/usr/share/mariadb/mysql_to_mariadb.sql
/usr/share/puppet/ext/dbfix.sql
/usr/share/openstack-puppet/modules/veritas_hyperscale/files/scripts/db/01_HyperScale.sql
/usr/share/openstack-puppet/modules/veritas_hyperscale/files/scripts/db/02_HyperScaleStatsSchema.sql
/usr/share/openstack-puppet/modules/veritas_hyperscale/files/scripts/db/03_HyperScaleWorkflow.sql
/usr/share/openstack-puppet/modules/veritas_hyperscale/files/scripts/db/51_HyperScaleAlertsDescription.sql
Hello Omri, did you run the previous step? It is this one:
mysqldump --opt --single-transaction --all-databases > /root/undercloud-all-databases.sql
If you don't create the DB dump first, tar has nothing to archive.
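Since tar was run with --ignore-failed-read, a missing dump only produces a warning and the archive is still created without it. A small guard like the one below could make the manual workflow stop early instead. This is only a sketch: require_dump is a hypothetical helper name, and the commented example flow simply repeats the commands already shown in this BZ.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the DB dump exists and is non-empty,
# so the backup stops before tar silently skips a missing file.
require_dump() {
    dump_file="$1"
    if [ ! -s "$dump_file" ]; then
        echo "missing or empty DB dump: $dump_file (run mysqldump first)" >&2
        return 1
    fi
    return 0
}

# Example flow, commented out so that sourcing this file has no side effects:
# mysqldump --opt --single-transaction --all-databases > /root/undercloud-all-databases.sql
# require_dump /root/undercloud-all-databases.sql || exit 1
# tar --ignore-failed-read -czf undercloud-backup-`date +%F`.tar.gz \
#     /root/undercloud-all-databases.sql /etc/my.cnf.d /var/lib/glance/images \
#     /srv/node /home/stack /etc/pki /opt/stack
```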
(undercloud) [stack@undercloud75 ~]$ openstack undercloud backup --add-path /etc/hosts \
> --add-path /var/log/ \
> --add-path /var/lib/glance/images/ \
> --add-path /srv/node/ \
> --add-path /etc/

Swift API is failing during the backup:
-----------------------------------------
u'version': u'2.0'}, u'updated_at': u'2018-02-22 15:08:57'}, u'message': {u'msg': u'Object PUT failed: https://192.168.0.2:13808/v1/AUTH_5b0b3efc458a4fc4af8e21487556aeca/undercloud-backups/UC-backup-20180222101308.tar 413 Request Entity Too Large [first 60 chars of response] <html><h1>Request Entity Too Large</h1><p>The body of your r'}, u'status': u'FAILED'}

[root@undercloud75 undercloud-backup-JjgPDr]# ls -lah
total 8.2G
drwx------. 2 mistral mistral 86 Feb 22 10:09 .
drwxrwxrwt. 14 root root 4.0K Feb 22 10:08 ..
-rw-r--r--. 1 mistral mistral 35M Feb 22 10:09 all-databases-20180222100900.sql.gz
-rw-r--r--. 1 mistral mistral 8.2G Feb 22 10:13 filesystem-20180222100929.tar
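One plausible reading of the 413 response is that the 8.2G archive is larger than what Swift accepts as a single object. The sketch below checks a file against a size limit before upload. It assumes the stock Swift single-object limit of roughly 5 GiB; the actual value is configurable per deployment and has not been confirmed for this environment, and fits_single_object is a hypothetical helper name.

```shell
#!/bin/sh
# Assumption: a single-object limit of 5 GiB (the usual Swift default).
# The real limit is set in the deployment's Swift configuration.
SWIFT_MAX_OBJECT_BYTES=$((5 * 1024 * 1024 * 1024))

# Hypothetical helper: return 0 if the file fits in one object,
# 1 if it is too large, 2 if its size cannot be read.
fits_single_object() {
    file="$1"
    limit="${2:-$SWIFT_MAX_OBJECT_BYTES}"
    size=$(wc -c < "$file") || return 2
    size=$((size))   # normalise any whitespace padding from wc
    if [ "$size" -le "$limit" ]; then
        return 0
    fi
    return 1
}
```

If the check fails, the backup would need to be made smaller (for example with fewer --add-path entries) or uploaded in segments.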
(In reply to Omri Hochman from comment #10)
> (undercloud) [stack@undercloud75 ~]$ openstack undercloud backup --add-path
> /etc/hosts \
> > --add-path /var/log/ \
> > --add-path /var/lib/glance/images/ \
> > --add-path /srv/node/ \
> > --add-path /etc/
>

Opened the following BZ for the "openstack undercloud backup" command: https://bugzilla.redhat.com/show_bug.cgi?id=1563783
We'll track the "manual backup steps" on this ticket.
Hi Omri, Can you set the pm_ack for this BZ? Thank you, Mathieu
(In reply to mathieu bultel from comment #12)
> Hi Omri,
> Can you set the pm_ack for this BZ.
>
> Thank you,
> Mathieu

The PM_ACK should be set by the PM. Adding Jarda.
granted
Attempted to restore by running the steps manually and encountered the following issue:

[root@undercloud75 ~]# cat /root/undercloud-all-databases.sql | mysql
ERROR 2006 (HY000) at line 3153: MySQL server has gone away
[root@undercloud75 ~]# for i in ceilometer glance heat ironic keystone neutron nova;do mysql -e "drop user $i";done
ERROR 1396 (HY000) at line 1: Operation DROP USER failed for 'ceilometer'@'%'
ERROR 1396 (HY000) at line 1: Operation DROP USER failed for 'glance'@'%'
ERROR 1396 (HY000) at line 1: Operation DROP USER failed for 'heat'@'%'
ERROR 1396 (HY000) at line 1: Operation DROP USER failed for 'ironic'@'%'
ERROR 1396 (HY000) at line 1: Operation DROP USER failed for 'keystone'@'%'
ERROR 1396 (HY000) at line 1: Operation DROP USER failed for 'neutron'@'%'
ERROR 1396 (HY000) at line 1: Operation DROP USER failed for 'nova'@'%'

Later, continuing despite the errors resulted in an undercloud install failure:

RuntimeError: os-refresh-config failed. See log for details.
2018-02-20 16:22:05,237 ERROR:
#############################################################################
Undercloud install failed.
After applying the following patches (if you want to run this on 13, use the backports):
https://review.openstack.org/#/c/568245/
https://review.openstack.org/#/c/564784/

The following commands should reinstall the UC from the backup (tested just now):

source ~/stackrc
rm -rf /var/tmp/test_uc_backup
mkdir -p /var/tmp/test_uc_backup
cd /var/tmp/test_uc_backup
openstack container delete undercloud-backups --recursive
openstack undercloud backup --exclude-path /home/stack/
openstack container save undercloud-backups
tar -xvf *.tar
gunzip *.gz
cat all-databases-*.sql | sudo mysql
for i in ceilometer glance heat ironic keystone neutron nova;do sudo mysql -e "drop user $i" || true;done
sudo mysql -e 'flush privileges'
openstack undercloud install

In this case, if trying to run the manual workflow, just be sure you run the DB dump correctly.
Hello, The latest docs version available is here: http://tripleo.org/install/controlplane_backup_restore/00_index.html
(In reply to Carlos Camacho from comment #20)
> Hello,
>
> The latest docs version available is here:
> http://tripleo.org/install/controlplane_backup_restore/00_index.html

We attempted to follow the docs and failed on a conflict: /root/.my.cnf was conflicting with the install. We would probably need to add a patch to deal with it.
Ok, I'm working on a few ansible playbooks so we can agree on the actual docs test for this feature. The playbooks will:
1) Create the backup.
2) Destroy the Undercloud node (remove packages, DB server and config files).
3) Restore the Undercloud.
Hi, here are some playbooks for testing the backup/restore: https://github.com/ccamacho/tripleo-ansible/tree/master/undercloud-backup-restore-check I'll push this upstream in docs.
Omri, Are you able to test with the playbooks Carlos has linked and provide some feedback?
Carlos, correct me if I'm wrong, but AFAIK the last time the playbooks only partially succeeded, as we had an issue connecting to the endpoints after the undercloud was restored. We can run them again when it's ready. I think it would also be great to verify the steps from the documentation.
Hey Omri,

The difference is in how we are testing it. I was making it work on the same Undercloud used to launch the upgrade: removing the files and DB server, then running the install again (which currently works).

1) The first check worked using the same Undercloud node, which is what the playbooks currently do (tested in both my environment and yours).
2) What we tried on your env last Friday was to use another clean machine with the Undercloud installed (but not configured). After the restore and the reinstall finished, "openstack stack list" worked but "openstack server list" failed with a keystone issue (probably because we are skipping the restore of another config file).

The restore workflow works, as it's the same one we have been using all along [1]. The only new addition is the way we create the backup: running 'openstack undercloud backup', which creates the DB dump and copies the files.

I'm trying to reproduce this using Quickstart to integrate the playbooks upstream, but creating an Undercloud snapshot before deploying the Overcloud does not seem to be easy to do.

[1]: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/12/html/back_up_and_restore_the_director_undercloud/restore

I don't think this should be a blocker, as we are able to create the backup correctly and we can also test it using the same Undercloud node.
(In reply to Carlos Camacho from comment #27)
>
> I don't think this should be a blocker as we are able to create the backup
> correctly, also we can test it using the same Undercloud node.

We were using the same undercloud node, but it was cleaned (reverted to a snapshot) and re-installed (as the instructions suggest). Removing the blocker should be a PM decision.

In the past, AFAIR, we were testing backup and restore with the following scenario:
(1) deploy UC + OC
(2) back up the undercloud
(3) copy the backup files to a side folder
(4) revert the undercloud machine to clean RHEL
(5) re-install the undercloud
(6) copy the backup files back to the undercloud machine
(7) attempt to restore the undercloud from backup -> it works

With the latest OSP13, this scenario ^ did not PASS for us.
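Steps (3) and (6) of the scenario above can be sketched as a copy with a checksum, so that the file brought back to the re-installed undercloud is known to match what was backed up. This is only an illustration: stash_backup and verify_backup are hypothetical helper names, and sha256sum from coreutils is assumed to be available.

```shell
#!/bin/sh
# Hypothetical helpers for steps (3) and (6): copy a backup file to a side
# folder with a checksum, and verify the checksum after copying it back.

# Copy one backup file into dest_dir and record its SHA-256 next to it.
stash_backup() {
    src="$1"
    dest_dir="$2"
    name=$(basename "$src")
    mkdir -p "$dest_dir" || return 1
    cp -p "$src" "$dest_dir/" || return 1
    ( cd "$dest_dir" && sha256sum "$name" > "$name.sha256" )
}

# Check a backup file against the "<name>.sha256" file stored beside it.
verify_backup() {
    file="$1"
    name=$(basename "$file")
    ( cd "$(dirname "$file")" && sha256sum -c "$name.sha256" >/dev/null )
}
```

Running verify_backup before starting the restore rules out a damaged or truncated archive as the cause of a restore failure.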
Let's get some feedback from PM. In any case, this should be a documentation bug.
I think Omri's use case is the most common backup/restore test case: the ability to restore backed-up data from a remote location onto a clean undercloud machine. This covers the scenario where the undercloud machine is lost (due to a hardware failure, for example). To bring the undercloud machine back online, we should be able to restore the previously backed-up data onto a clean RHEL undercloud machine. I think the ability to restore the undercloud from backup after a failed upgrade attempt is a different test case, but it should work in addition to Omri's test.
Hey Omri, you are right here: the .pem files were failing. I was able to restore them, and now I'm having issues with glance (I didn't restore the glance images on the filesystem). I'm collecting all the steps here for verification: https://gist.github.com/ccamacho/f22037dcb305d326182b81d5be61b279 Just let me finish restoring all the data to verify this docs amendment. I'll try to put all the steps together upstream in an ansible playbook, to test the steps in an upstream CI job (https://review.openstack.org/#/c/569991/)
(In reply to Carlos Camacho from comment #34)
> Hey Omri you are right here, the .pem was failing I was able to restore them
> and now I'm having issues with glance (didn't restore the glance images on
> fs).

Hi Carlos,
Yes, we're testing with SSL enabled on the undercloud, so the .pem was one issue. The other issue, as you mentioned, was the glance images.

>
> I'm taking all the steps here for verification
> https://gist.github.com/ccamacho/f22037dcb305d326182b81d5be61b279
>
> Just let me finish to restore all the data to verify these docs amend.
>
> I'll try to put all the steps together upstream, in an ansible playbook to
> test the steps in an upstream CI Job
> (https://review.openstack.org/#/c/569991/)

The plan looks good: https://gist.github.com/ccamacho/f22037dcb305d326182b81d5be61b279
It's also great to have it tested with playbooks upstream.

To verify this bug, we need the updated/fixed steps from the playbooks to be in the documentation, so that QE can validate that the steps work for OSP13 RC downstream puddles.
Hello, Last Friday I verified this on ohochman's env. The issue was that the Undercloud was using SSL to retrieve endpoint data, but the docs had no reference on how to update the certificates, so the post-install steps failed when installing the Undercloud. I'm waiting for the QE people in Canada to wake up so I can show them how to run the steps and verify this BZ.
I'll move this to MODIFIED when https://review.openstack.org/#/c/570554 is merged. Those steps were verified with QE yesterday.
Docs merged.
Hi Derek, can you check this ticket and explain how we can close it? Thanks
Moving to NEW, because this work has not been accepted or assigned yet by the RHOSP docs team.
Formally accepting this work into the RHOSP 13 z program and assigning to Dan for review. Dan, we've since spoken about what you think is required for this update, and I believe it renders comment 48 obsolete.
We also have this info in https://docs.openstack.org/tripleo-docs/latest/install/controlplane_backup_restore/01_undercloud_backup.html, which is tested in Omri's Jenkins job. Maybe this just needs a QA verification, but it should be fine.
I think the original issue has been resolved. Since we're up to 58 comments here, I'll close this BZ down so we can refocus on the "openstack undercloud backup" command in this new BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1612697