Bug 1316014

Summary: Cinder backup is misconfigured, making backup tests fail
Product: Red Hat OpenStack Reporter: Arx Cruz <acruz>
Component: openstack-tripleo-heat-templates Assignee: Angus Thomas <athomas>
Status: CLOSED ERRATA QA Contact: Arik Chernetsky <achernet>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 8.0 (Liberty) CC: acruz, dbecker, eharney, geguileo, jcoufal, jschluet, mburns, morazi, pgrist, rhel-osp-director-maint, tbarron, tkammer, tshefi
Target Milestone: rc Keywords: Automation, AutomationBlocker, Triaged
Target Release: 10.0 (Newton)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-12-14 15:27:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Comment 2 Mike Burns 2016-04-07 21:14:44 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 4 Paul Grist 2016-10-13 22:08:22 UTC
Marking this as ON_QA, as I think this was covered by your recent OSP10 testing. Can you confirm and provide the build info for the build you tested?

Comment 6 Tzach Shefi 2016-10-25 13:01:21 UTC
Paul, I don't have that setup any more; I'm installing a new one for this.

Arx - when you say "misconfigured", which of the options below (or another one) would verify this bug in terms of QA?

Would bringing up 1 controller + 1 compute on RHOS10 with the Cinder backup service yaml, and passing the mentioned tempest tests, be sufficient?

Or should I bring up an HA system (3 controllers + 1 compute + Cinder backup service) and then verify that the tempest backup tests pass, followed by restarting the controller that currently hosts the Cinder services to also verify that they, including backup, move over to another controller?

One last question: can I move this to VERIFIED before we have a fixed-in version?
Who needs to fill that in?

Comment 7 Arx Cruz 2016-10-25 17:25:24 UTC
I believe it doesn't matter whether it is HA or not. The problem is that the cinder backup service must run on the same server as cinder volume, otherwise the backup won't work. As you can see in my comments, cinder backup was running on the controller:

|  cinder-backup   | overcloud-controller-0.localdomain | nova | enabled |   up  | 2016-03-02T11:25:32.000000 |        -        |

While cinder volume was running on a separate server:

|  cinder-volume   |      rbd:volumes@tripleo_ceph      | nova | enabled |   up  | 2016-03-02T11:25:52.000000 |        -        |

Both the volume and backup services must run on the same server for the test to pass, and the installation isn't ensuring that. Here are the steps I did manually to fix it and make the test pass:

On the controller, edit /etc/cinder/cinder.conf and remove or comment out the backend_host=rbd:volumes option in the [tripleo_ceph] section (at the bottom of the config file).

Note the values of rbd_pool and rbd_user in the same [tripleo_ceph] section, and use them for the backup_ceph_pool and backup_ceph_user options in the [DEFAULT] section.
It will look something like this:

backup_ceph_conf = /etc/ceph/ceph.conf
backup_ceph_user = openstack
backup_ceph_chunk_size = 134217728
backup_ceph_pool = volumes
backup_ceph_stripe_unit = 0
backup_ceph_stripe_count = 0
backup_driver = cinder.backup.drivers.ceph
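As a rough illustration only (not part of the original manual fix), the two edits above could be scripted along these lines. The function name sync_backup_with_rbd is hypothetical, and the sketch assumes that Python's configparser can read the file, which holds only for simple, single-valued options. Note that configparser drops comments when it writes the file back, so this is a sketch for a test box, not a tool for production configs.

```python
import configparser
import io


def sync_backup_with_rbd(conf_text, backend="tripleo_ceph"):
    """Return cinder.conf text with the two manual edits applied.

    Hypothetical helper: assumes the RBD backend section is named
    `backend` and defines rbd_pool and rbd_user.
    """
    cfg = configparser.ConfigParser(strict=False, interpolation=None)
    cfg.read_string(conf_text)

    # Edit 1: drop backend_host from the backend section.
    cfg.remove_option(backend, "backend_host")

    # Edit 2: point the Ceph backup driver at the same pool and user
    # that the volume backend already uses.
    section = cfg[backend]
    defaults = cfg["DEFAULT"]
    defaults["backup_driver"] = "cinder.backup.drivers.ceph"
    defaults["backup_ceph_conf"] = "/etc/ceph/ceph.conf"
    defaults["backup_ceph_pool"] = section["rbd_pool"]
    defaults["backup_ceph_user"] = section["rbd_user"]

    out = io.StringIO()
    cfg.write(out)  # comments from the original file are not preserved
    return out.getvalue()
```

The function works on text rather than on /etc/cinder/cinder.conf directly, so the result can be reviewed (and the original backed up) before anything is written to disk.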

On the controller, restart the services:
systemctl restart openstack-cinder-api openstack-cinder-scheduler openstack-cinder-volume openstack-cinder-backup

Then, if you run cinder service-list, you will see both backup and volume running on the controller, and the test will pass.
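The service-list check can also be done mechanically. The following is a minimal sketch, not an official tool: both function names are hypothetical, and it assumes the table layout shown in the rows quoted earlier (binary in the first column, host in the second) and that the volume host is reported as host@backend.

```python
def service_hosts(table_text):
    """Map each cinder binary in `cinder service-list` output to its hosts."""
    hosts = {}
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 2 or not cells[0].startswith("cinder-"):
            continue  # skip borders, the header row, and blank lines
        hosts.setdefault(cells[0], set()).add(cells[1])
    return hosts


def backup_colocated_with_volume(table_text):
    """True if some cinder-backup host matches a cinder-volume host."""
    hosts = service_hosts(table_text)
    # Assumption: the volume host has the form "host@backend".
    volume_hosts = {h.split("@", 1)[0] for h in hosts.get("cinder-volume", set())}
    return bool(hosts.get("cinder-backup", set()) & volume_hosts)
```

With the two rows from the failing deployment the check returns False, because the volume host part is rbd:volumes while backup runs on overcloud-controller-0.localdomain.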

Of course, backup and volume can run on a server other than the controller; this was just the easiest solution for a manual fix.


So, in terms of QA, the "misconfigured" part doesn't matter; what matters for QA is that the test passes. All I did was show how the installation should manage the volume and backup services: both on the same host, be it the controller or any other server, as long as they are on the same host.

A passing tempest test means the client will be able to do what the test exercises: in this case, create a VM, create a volume, attach the volume, write something, detach the volume, create the backup, restore the backup, and delete it, using both V1 and V2 of the volume backup API.

Regarding moving to VERIFIED before having a fixed-in version, here's what the Bugzilla workflow says:

This bug fix is available for the Assigned Quality Engineer to test. If not using the
Errata Tool, typically moved to ON_QA by RCM when the bug fix has been
incorporated into a build which is available to be tested. If using the Errata Tool, moved
to ON_QA automatically when the bug number is added to the erratum’s bug list, and a
build is available to be tested. Test cases or plans are written and available.
Possible Transitions:
● VERIFIED: bug fix has successfully finished testing by QE
● ASSIGNED: bug report has been returned to an engineer due to a failure in testing.
QE provides a description of the failure and any additional debugging information

I hope this answers your questions.

Comment 10 Tzach Shefi 2016-10-31 10:16:12 UTC
Both tests passed in the Tempest automation phase2 run.
Version: openstack-tripleo-heat-templates-5.0.0-0.8.0rc3.el7ost.noarch

I personally failed to build a working deployment to test this.
I'm unsure whether the source of the problem is the virt infra automation or limited virt resources.

Comment 14 errata-xmlrpc 2016-12-14 15:27:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html