Description of problem:
When the active share service stops, a passive share service takes over as the active one. When the share service that was stopped comes back, the shares that were created on the other share service become unavailable for Manila to control.

Version-Release number of selected component (if applicable):
openstack-manila-3.0.0-8.el7ost.noarch
openstack-manila-ui-2.5.1-9.el7ost.noarch
puppet-manila-9.5.0-1.el7ost.noarch
python-manilaclient-1.11.0-1.el7ost.noarch
python-manila-3.0.0-8.el7ost.noarch
openstack-manila-share-3.0.0-8.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Use Infrared to deploy an OSP-10z4 deployment with Manila, at least 2 controller nodes, and any number of compute nodes. I used a NetApp backend for the storage.
2. Disable the active Manila share service. Observe that the service starts on another controller node.
3. Create a share on the new share service.
4. Re-enable the disabled share service and observe that the share that was created is no longer controllable by Manila.

Actual results:
Manila shares created on the other share service become uncontrollable when the first share service is reactivated.

Expected results:
Disruption of the share service shall not impact Manila shares.

Additional info:
I looked into how Cinder does volume service HA. It uses a hostgroup for all of the volume services, so the "hostname" of the volume service does not change when another volume service takes over the active role. Chances are something similar will need to happen for Manila as well.
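To illustrate why the service "hostname" matters, here is a minimal Python sketch of the failure mode. It is a simplified model, not Manila code: it assumes a share records the `host@backend#pool` string of the service that created it, and that a share service only manages shares whose recorded host matches its own name. The names `Share`, `service_host`, `manageable`, `simulate`, and the `netapp`/`aggr1` values are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Share:
    name: str
    host: str  # assumed format: 'host@backend#pool', recorded at creation time


def service_host(share_host: str) -> str:
    """Strip the '#pool' suffix to get the 'host@backend' service name."""
    return share_host.split("#", 1)[0]


def manageable(shares, active_service):
    """Return the names of shares whose recorded host matches the active service."""
    return [s.name for s in shares if service_host(s.host) == active_service]


def simulate(host_a, host_b, backend="netapp", pool="aggr1"):
    """Model steps 2-4 of the report for two controllers (hypothetical values)."""
    shares = []
    # Steps 2-3: the service on controller B is active and creates a share.
    active = f"{host_b}@{backend}"
    shares.append(Share("share-1", f"{active}#{pool}"))
    # Step 4: the service on controller A is active again.
    active = f"{host_a}@{backend}"
    return manageable(shares, active)


# Distinct per-node host names: the share is orphaned after failback.
print(simulate("controller-0", "controller-1"))
# One shared host name for every share service: the share stays manageable.
print(simulate("hostgroup", "hostgroup"))
```

With distinct host names the first call returns an empty list, and with a common host name the second call still returns `share-1`, which is the behavior the Cinder-style hostgroup approach is meant to provide.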
puppet-manila patch 499937 has merged upstream in stable/newton, but we still need to cherry-pick THT patch 499111 to stable/newton after it merges to stable/ocata.
The stable/ocata tripleo-heat-templates patch 499111 has been cherry-picked to stable/newton as 508117.
Patch 508117 has merged upstream to stable/newton.
I repeated the procedure listed above with the OSP-10z6 puddle. Shares survived the loss of the controller node where the share service was running: all shares created before the controller was killed were still listed and available while it was down. Looks like we're good here.
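As a complement to the manual check, the following Python sketch compares the `host` value in the `[DEFAULT]` section of each controller's manila.conf. It assumes the fix works by giving every share service the same `host` value (the value `hostgroup` below mirrors the Cinder approach mentioned in the description and is an assumption, as are the helper names `configured_host` and `hosts_consistent`). The configuration text is inlined sample data, not output from a real deployment.

```python
import configparser


def configured_host(conf_text):
    """Return the [DEFAULT] host value from manila.conf text, or None if unset."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser["DEFAULT"].get("host")


def hosts_consistent(conf_texts):
    """True if every config sets host and all configs agree on its value."""
    hosts = {configured_host(text) for text in conf_texts}
    return len(hosts) == 1 and None not in hosts


# Sample data standing in for the manila.conf of two controllers.
fixed = "[DEFAULT]\nhost = hostgroup\n"
unfixed = "[DEFAULT]\n"

print(hosts_consistent([fixed, fixed]))
print(hosts_consistent([fixed, unfixed]))
```

In a real check the strings would be read from each controller's configuration file rather than inlined.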
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3231