Bug 1457437

Summary: Customer requires support clarification on SCSI 3 Persistent Reservation in OpenStack
Product: Red Hat OpenStack Reporter: David Peacock <dpeacock>
Component: openstack-nova Assignee: Eoghan Glynn <eglynn>
Status: CLOSED DEFERRED QA Contact: Joe H. Rahme <jhakimra>
Severity: low Docs Contact:
Priority: unspecified    
Version: 10.0 (Newton) CC: awaugama, berrange, dasmith, eglynn, hmatsumo, kchamart, knoel, mburns, mtessun, pbonzini, sbauza, sferdjao, sgordon, srevivo, virt-maint, vromanso, ykawada
Target Milestone: --- Keywords: TestOnly
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-09-29 14:51:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1464908, 1470007, 1484075, 1519019, 1519021    
Bug Blocks:    

Description David Peacock 2017-05-31 17:52:29 UTC
Description of problem:

For use with Microsoft Failover Cluster (MSFC) on OpenStack instances, we have a customer who would like us to clarify the support position regarding SCSI 3 Persistent Reservation on OpenStack.

The closest information found so far are two ambiguous BZs:

https://bugzilla.redhat.com/show_bug.cgi?id=1219841
https://bugzilla.redhat.com/show_bug.cgi?id=1111784

Would someone from RHOSP engineering please weigh in here?

Thank you,
David

Comment 1 Kashyap Chamarthy 2017-06-02 12:43:01 UTC
(In reply to David Peacock from comment #0)
> Description of problem:
> 
> For use with Microsoft Failover Cluster (MSFC) on OpenStack instances, we
> have a customer who would like us to clarify the support position regarding
> SCSI 3 Persistent Reservation on OpenStack.
> 
> The closest information found so far are two ambiguous BZs:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1219841
> https://bugzilla.redhat.com/show_bug.cgi?id=1111784
> 
> Would someone from RHOSP engineering please weigh in here?
> 
> Thank you,
> David

Hi David,

I am just triaging this issue and trying to figure out where to start myself.  One of the core Virtualization engineers who has worked on this is on a public holiday (I have set NEEDINFO on him here), and I can't reach another one at the moment.

Also, this involves Microsoft Failover Cluster (MSFC), which potentially means a partnership / certification in this area is needed.  So these details need to be worked out.

I see from the RHV article (https://access.redhat.com/support/cases/#/case/01857927) that the RHV team is still working to get support for this, as evidenced by one of the bugs linked there being in ASSIGNED state.

---

Paolo: Do you have any comments here re: "SCSI 3 Persistent Reservations" in combination with Microsoft Failover Cluster (MSFC)?

Given, you've filed the following bug (and did work in this area): 

    https://bugzilla.redhat.com/show_bug.cgi?id=1219841 -- 
   [RFE] vioscsi.sys should support MS Cluster Services

Comment 2 David Peacock 2017-06-02 13:37:20 UTC
Thank you, let me know if you need anything from myself or the customer.

Comment 5 Martin Tessun 2017-06-05 22:42:12 UTC
(In reply to Kashyap Chamarthy from comment #1)
> Paolo: Do you have any comments here re: "SCSI 3 Persistent Reservations" in
> combination with Microsoft Failover Cluster (MSFC)?
> 
> Given, you've filed the following bug (and did work in this area): 
> 
>     https://bugzilla.redhat.com/show_bug.cgi?id=1219841 -- 
>    [RFE] vioscsi.sys should support MS Cluster Services

So there are quite a few issues around that:
1. S3-PR from a virtio-scsi device that is added to a VM is problematic
   for multipath devices based on FC devices (see mpathpersist, and the
   resulting need for the host to know the reservation key)
2. The VMs *must* run on different nodes (compute nodes), as the reservation
   is done at host level
3. multipath needs to propagate unpriv_SGIO to its underlying devices;
   otherwise the S3-PR commands are not forwarded to the device, because
   the qemu process does not run as root.

For (3) there is a BZ open with the multipath team, and that should be addressed soon. It can be worked around by manually setting unpriv_SGIO on all needed devices.
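
A minimal sketch of that manual workaround, assuming a RHEL 7 kernel that exposes the attribute as /sys/block/<dev>/queue/unpriv_sgio and that the multipath device appears as a dm-N node whose paths are listed under slaves/. The function name and the sysfs_root parameter are illustrative only (the latter exists so the logic can be tried against a scratch directory); on a real host this must run as root.

```python
from pathlib import Path


def enable_unpriv_sgio(mpath_dev, sysfs_root="/sys"):
    """Write 1 to queue/unpriv_sgio for a dm device and each of its slave paths.

    Returns the names of the devices that were updated.
    """
    block = Path(sysfs_root) / "block"
    # The slaves/ directory of a device-mapper device lists its underlying paths.
    slaves_dir = block / mpath_dev / "slaves"
    names = [mpath_dev]
    if slaves_dir.is_dir():
        names += sorted(p.name for p in slaves_dir.iterdir())
    updated = []
    for name in names:
        attr = block / name / "queue" / "unpriv_sgio"
        if attr.exists():  # only present on kernels that carry this attribute
            attr.write_text("1\n")
            updated.append(name)
    return updated
```

For example, enable_unpriv_sgio("dm-0") would be called once per multipath device that backs a shared LUN. The setting does not persist across reboots, so it would need to be reapplied (for instance from a udev rule) until the multipath fix lands.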

(2) can be achieved with anti-affinity rules (not sure this feature exists in OSP)
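
For reference, Nova does offer server groups with an anti-affinity policy, which may cover this requirement. The sketch below only builds the JSON request bodies and makes no API calls; the helper names are hypothetical, the bodies are abbreviated (newer compute API microversions require additional fields such as networks, and use a single "policy" key instead of "policies"), and whether this is sufficient for MSFC has not been verified here.

```python
def server_group_request(name):
    """Body for POST /os-server-groups asking for an anti-affinity policy."""
    return {"server_group": {"name": name, "policies": ["anti-affinity"]}}


def server_create_request(name, image_ref, flavor_ref, group_id):
    """Abbreviated body for POST /servers that places the instance in a group.

    The scheduler hint tells Nova to honour the policy of the server group
    identified by group_id when picking a compute node.
    """
    return {
        "server": {"name": name, "imageRef": image_ref, "flavorRef": flavor_ref},
        "os:scheduler_hints": {"group": group_id},
    }
```

Each cluster node VM would be created with the same group ID, so that the scheduler keeps them on different compute nodes.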

(1) is the most problematic, but using iSCSI works perfectly as long as the same host initiator is used across all connections on *one* host (and of course each host uses its own initiator IQN).
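
A small sketch of how the "each host uses its own initiator" condition could be checked, assuming open-iscsi hosts where the name is stored as an InitiatorName=... line in /etc/iscsi/initiatorname.iscsi. Both function names are illustrative, and collecting the file contents from each compute node is left out.

```python
def parse_initiator_name(text):
    """Return the InitiatorName value from initiatorname.iscsi content, or None."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not key=value
        key, _, value = line.partition("=")
        if key.strip() == "InitiatorName":
            return value.strip()
    return None


def duplicate_initiators(host_to_name):
    """Map each initiator name used by more than one host to those hosts."""
    seen = {}
    for host, name in host_to_name.items():
        seen.setdefault(name, []).append(host)
    return {name: sorted(hosts) for name, hosts in seen.items() if len(hosts) > 1}
```

An empty result from duplicate_initiators means every host has a distinct initiator name; any entry in the result points at hosts whose reservations would be indistinguishable to the target.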

Does this answer your question?

Comment 11 Kashyap Chamarthy 2017-09-22 13:47:01 UTC
From what I gather, there is in-progress work for the underlying
components:

(1) QEMU

    As of today, the patch set is ready for merge into upstream QEMU
    (I confirmed with Paolo on #qemu, OFTC IRC):

        https://lists.nongnu.org/archive/html/qemu-devel/2017-09/msg04922.html
        -- [PATCH v2 0/4] scsi, block: introduce persistent reservation
        managers

(2) libvirt

    Paolo started the libvirt design discussion in August:
    
        https://www.redhat.com/archives/libvir-list/2017-August/msg00631.html
        -- New QEMU daemon for persistent reservations

    The design discussion continued in the same thread in September 2017,
    covering libvirt usage / security modeling / SELinux, etc.:

        https://www.redhat.com/archives/libvir-list/2017-September/msg00206.html

(3) SELinux policy

    Notice the "Depends On" bug that Paolo added above (1484075):
   
        https://bugzilla.redhat.com/show_bug.cgi?id=1484075 -- [RFE] Add 
        S3 PR support to qemu (similar to mpathpersist)

Comment 12 Kashyap Chamarthy 2017-09-29 14:51:08 UTC
I'm closing this Nova bug, as we don't yet support it in Nova.

And the lower-layer work is still in progress, as noted in comment #11.