Description of problem:
luci_admin restore does not fully restore the database to its original state. That is, when a backup is created on a clean DB and a cluster is then added in luci, the created cluster is still present in the database after restoring the backup.

Version-Release number of selected component (if applicable):
luci-0.12.2-38.el5

How reproducible:
Always

Steps to Reproduce:
1. Install and initialize luci: "luci_admin init"
2. Do a backup: "luci_admin backup"
3. Start luci and create a new cluster
4. Stop luci and do a db restore: "luci_admin restore"
5. Start luci again and check the cluster list

Actual results:
The cluster created in step 3 is still present.

Expected results:
The database should be restored to its state from step 2, i.e. no cluster is present.
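For reference, the command-line part of the reproduction can be scripted roughly as follows; this is a sketch only, and step 3 (creating the cluster) still has to be done by hand in the luci web UI:

# Rough automation of the command-line steps above; the luci_admin and
# service invocations are taken from the reproduction steps, nothing else
# is assumed about the environment.
import subprocess

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.call(cmd)

run(["luci_admin", "init"])        # step 1: install and initialize luci
run(["luci_admin", "backup"])      # step 2: back up the clean database
run(["service", "luci", "start"])  # step 3: start luci ...
input("Create a new cluster in the luci web UI, then press Enter...")
run(["service", "luci", "stop"])   # step 4: stop luci ...
run(["luci_admin", "restore"])     #         ... and restore the backup
run(["service", "luci", "start"])  # step 5: start luci and check the cluster list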
Created attachment 535023 [details] Luci backup containing no clusters.
Created attachment 535118 [details]
Proposed change to luci_admin script.

In fact, the luci_admin script was not doing what could properly be called a "restore" operation: it simply added the items contained in a previously generated backup XML file. This behaviour is kept as the default for backward compatibility, but a new switch '-r' ('--replace') is added to the "restore" subcommand that activates the commonly desired semantics: remove the old data items first, then import the new ones. Currently, such data items are the storage and cluster ones (the latter solves the original issue raised by Radek).
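To make the intended semantics concrete, here is a minimal Python sketch of the difference between the default ("update") behaviour and the new "--replace" one. The in-memory dicts and the XML element names are illustrative assumptions, not the actual luci_admin data model:

# Illustrative sketch of "--replace" vs. default "update" restore semantics.
# The db dict and the element names below are assumptions for the example.
import xml.etree.ElementTree as ET

def restore(db, backup_path, replace=False):
    """db is a dict like {"clusters": {...}, "storage": {...}}."""
    root = ET.parse(backup_path).getroot()

    if replace:
        # "--replace": remove the old data items first, so the database
        # ends up exactly as described by the backup.
        db["clusters"].clear()
        db["storage"].clear()

    # In both modes, import what the backup contains; the default mode
    # merely adds it on top of whatever is already in the database.
    for elem in root.iter("cluster"):          # element name is an assumption
        db["clusters"][elem.get("name")] = elem
    for elem in root.iter("storageSystem"):    # element name is an assumption
        db["storage"][elem.get("name")] = elem
    return db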
The patch (attachment 535118 [details]) in the previous comment is meant to be applied after the patch for bug 755935 (attachment 535040 [details]). This is not only so that it applies cleanly, but also because this one modifies the built-in help message to reflect both changes [*]. For this reason, I am setting the (weak) dependency between the bugs.

[*] The man page should also be modified accordingly.
That "luci_admin restore -r <XML-file>" works properly can be checked, e.g., with the case from the bug description. Instead of checking the set of clusters currently in the DB via luci, the output of another "luci_admin backup" launched afterwards provides the same information (under the <clusterList> tag).
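For convenience, a small script along these lines could print whatever ends up under <clusterList> in such a backup. Only the <clusterList> tag comes from the comment above; the default backup path and the "name" attribute are assumptions:

# Prints any cluster entries recorded under <clusterList> in a backup XML.
import sys
import xml.etree.ElementTree as ET

path = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/luci/var/luci_backup.xml"
root = ET.parse(path).getroot()

cluster_list = root.find(".//clusterList")
children = list(cluster_list) if cluster_list is not None else []
if not children:
    print("no clusters in backup -- the database is clean")
else:
    for child in children:
        print("cluster entry: %s" % (child.get("name") or child.tag))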
Acking QA.
Created attachment 538214 [details] slightly modified patch from attachment 535118 [details]
This applies to both this bug 755941 and bug 755935: they should both be fixed as of conga-0.12.2-41.el5.

However, to ensure that the fix for bug 755941 works properly, my first step was to do a little cleanup in luci's code to ensure that special values (temporary flags for nodes and for the whole cluster) are handled consistently within luci. This fix was introduced as of conga-0.12.2-40.el5. These temporary flags could presumably have got into the backup XML if, e.g., the cluster creation process was interrupted in the middle (luci shutdown) and the database was then backed up using "luci_admin backup" (not verified).
Created attachment 538238 [details]
patch containing the mentioned cleanup

In addition, it should also solve (theoretically only; it is hard to test in isolation) a problem where "busy" nodes (those currently handling another task) were not properly excluded in luci's internals when desired. Thus, e.g., when listing cluster nodes, a busy node could have been chosen as a data source (see the sketch below).

So it should also be tested that common actions with a cluster work as they should in build conga-0.12.2-41.el5 (or -40, but -41 is probably more convenient, covering both bugs together) and that no obviously wrong information is displayed (such as a node being shown offline when it is not, etc.), mainly in combination with user-initiated tasks being performed on particular nodes.
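For illustration, the kind of exclusion meant here could look like the following sketch. The Node class and its flags are hypothetical, made up for the example, and do not mirror luci's actual internals:

# Hypothetical sketch: prefer idle nodes when picking a data source for
# cluster information, falling back to any online node.
class Node:
    def __init__(self, name, busy=False, online=True):
        self.name = name
        self.busy = busy       # node is currently handling another task
        self.online = online

def pick_data_source(nodes):
    idle = [n for n in nodes if n.online and not n.busy]
    candidates = idle or [n for n in nodes if n.online]
    return candidates[0] if candidates else None

nodes = [Node("node1", busy=True), Node("node2"), Node("node3", online=False)]
source = pick_data_source(nodes)
print(source.name if source else "no usable node")   # -> node2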
Created attachment 544668 [details]
reiterated patch (should reflect all of Radek's suggestions)

In fact, the luci_admin script was not doing what could properly be called a "restore" operation: it simply added the items contained in a previously generated backup XML file (and in some cases even generated tracebacks when it did not succeed). This behaviour is kept as the default for backward compatibility, but a new switch '-r' ('--replace') is added to the "restore" subcommand that activates the commonly desired semantics: remove the old configuration first, then import the new one. Currently, such data items are the storage and cluster ones. Complementarily, a '-u' ('--update') switch has been added, which explicitly requests the (as of now default) "update" semantics.

The restore is also now permissive when a smooth merge cannot be done because of a collision between an existing and a new item: the existing item is kept and the restore continues (this applies to clusters, storage systems, and certificates). Some comments were also added and the behaviour restricted a bit.
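A rough sketch of how the two switches and the collision-tolerant merge fit together is below. The getopt-based option handling, the helper names, and the message text are assumptions for illustration, not a copy of the actual patch code:

# Illustrative sketch of -u/--update vs. -r/--replace parsing and of the
# "keep existing on collision and continue" merge behaviour.
import getopt

def merge_item(existing, item_id, item):
    # On an id collision, keep the existing record, report it, and go on.
    if item_id in existing:
        print('The id "%s" is already in use; keeping the existing record.' % item_id)
        return
    existing[item_id] = item

def parse_restore_args(argv):
    opts, args = getopt.getopt(argv, "ur", ["update", "replace"])
    replace = any(o in ("-r", "--replace") for o, _ in opts)
    backup_file = args[0] if args else "/var/lib/luci/var/luci_backup.xml"
    return replace, backup_file

replace, backup_file = parse_restore_args(["-r", "test.xml"])
print(replace, backup_file)                       # -> True test.xml

existing = {"mycluster": object()}
merge_item(existing, "mycluster", object())       # collision: old record is kept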
Marking as VERIFIED in luci-0.12.2-42.el5.

ad 1) Trying to restore existing cluster records in the database now gives a reasonable message instead of a traceback:

# luci_admin restore -u test.xml
Restoring the luci server...
An error occurred while restoring storage system "west-13.lab.bos.redhat.com": The id "west-13.lab.bos.redhat.com" is invalid - it is already in use.
An error occurred while restoring storage system "west-16.lab.bos.redhat.com": The id "west-16.lab.bos.redhat.com" is invalid - it is already in use.
An error occurred while restoring storage system "west-14.lab.bos.redhat.com": The id "west-14.lab.bos.redhat.com" is invalid - it is already in use.
An error occurred while restoring storage system "west-15.lab.bos.redhat.com": The id "west-15.lab.bos.redhat.com" is invalid - it is already in use.
An error occurred while restoring the cluster "mycluster": The id "mycluster" is invalid - it is already in use.
Certificate "/var/lib/luci/var/certs/privkey.pem" already exists, skipping...
Certificate "/var/lib/luci/var/certs/https.key.pem" already exists, skipping...
Certificate "/var/lib/luci/var/certs/cacert.pem" already exists, skipping...
Certificate "/var/lib/luci/var/certs/https.pem" already exists, skipping...
Certificate "/var/lib/luci/var/certs/cacert.config" already exists, skipping...
Certificate "/var/lib/luci/var/certs/peers/west-16.lab.bos.redhat.com_cert_pub" already exists, skipping...
Certificate "/var/lib/luci/var/certs/peers/west-14.lab.bos.redhat.com_cert_pub" already exists, skipping...
Certificate "/var/lib/luci/var/certs/peers/west-15.lab.bos.redhat.com_cert_pub" already exists, skipping...
Certificate "/var/lib/luci/var/certs/peers/west-13.lab.bos.redhat.com_cert_pub" already exists, skipping...

The luci restore was successful.
You must restart the luci server for changes to take effect.
Run "service luci restart" to do so

Replacing an existing configuration is indeed okay:

# luci_admin restore -r test.xml
Restoring the luci server...
Certificate "/var/lib/luci/var/certs/privkey.pem" already exists, skipping...
Certificate "/var/lib/luci/var/certs/https.key.pem" already exists, skipping...
Certificate "/var/lib/luci/var/certs/cacert.pem" already exists, skipping...
Certificate "/var/lib/luci/var/certs/https.pem" already exists, skipping...
Certificate "/var/lib/luci/var/certs/cacert.config" already exists, skipping...

The luci restore was successful.
You must restart the luci server for changes to take effect.
Run "service luci restart" to do so

Omitting the -u parameter still does an update restore, as that is the default behaviour.

ad 3) Replacing a valid configuration with an empty one (created from a clean backup taken right after "luci_admin init") cleans up the database and the certificates as well:

# ls -ls /var/lib/luci/var/certs/peers/
total 0

ad 2) The new man page content is sufficient:

<...>
       backup [backup-file]
              Create a backup of luci's configuration to backup-file (default is /var/lib/luci/var/luci_backup.xml). Managed clusters, storage systems, local users, user permissions, and SSL certificates will be stored as part of the backup.

       restore [-u|--update|-r|--replace] [backup-file]
              Restore luci's configuration using data from backup-file previously created using backup (default is /var/lib/luci/var/luci_backup.xml). Switches -u (--update) and -r (--replace) control the semantics of the command: keep (default) or remove existing configuration, respectively. The latter has probably the expected effect, but has to be specified explicitly (backward compatibility).
</...>
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-0151.html