Cinder introduced the "shared_targets" and "service_uuid" fields on volumes so that volume consumers can protect themselves from unintended leftover devices when handling iSCSI connections with shared targets. One way to protect against the automatic scans that happen during detach/map race conditions is to lock so that only one attach or one detach operation per server happens at a time. With an up to date Open iSCSI initiator we don't need locks, because the initiator can disable automatic LUN scans (which are the real cause of the leftover devices), and OS-Brick already supports this feature. Currently Nova blindly locks whenever "shared_targets" is set to True, even when the iSCSI initiator and OS-Brick already prevent such races, which introduces unnecessary serialization when connecting volumes. The current Nova code also serializes all connections for non-iSCSI backends, like RBD, that don't report "shared_targets" because the field doesn't mean anything to them.
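To illustrate the difference, here is a rough sketch only, not actual Nova code: the attach helper names, the dict-style volume and the lock key are made up for this example, and it assumes the guard_connection context manager that os-brick exposes (mentioned in the comments below).

    from oslo_concurrency import lockutils
    from os_brick import utils as brick_utils

    def attach_nova_style(volume, connector, conn_properties):
        # Today's behaviour: take a coarse lock whenever the volume reports
        # shared_targets=True (or doesn't report it), no matter the transport
        # protocol or whether the initiator already prevents the race.
        if volume.get('shared_targets', True):
            with lockutils.lock(volume['service_uuid']):
                connector.connect_volume(conn_properties)
        else:
            connector.connect_volume(conn_properties)

    def attach_with_guard(volume, connector, conn_properties):
        # Proposed behaviour: delegate the decision to os-brick, which only
        # serializes when the lock is actually needed.
        with brick_utils.guard_connection(volume):
            connector.connect_volume(conn_properties)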
I've slept since I discussed this last. Remind me what shared targets means in the context of iscsi? IIRC shared targets means you attach 1 thing and you get all the things. E.g. when you mount an NFS export you get all the volumes on that export even if you only need 1. Is shared targets the same with iscsi?
Yes, it is a similar concept. Some iSCSI backends have a 1 to 1 relationship between the iSCSI target-portal and the volume/LUN (in that case the iSCSI initiator needs to log in for each one), whereas others share the same target-portal for all volumes/LUNs (we log in once and we see all the LUNs that get mapped there).

There was a race condition for shared targets between the mapping/unmapping at the backend and the attach/detach on the host, caused by iSCSI AEN/AER and the Open iSCSI initiator behavior, which resulted in leftover devices on the host. To fix it I added a feature to the Open iSCSI initiator and support for it in OS-Brick (backporting it downstream all the way back to OSP8), so at RH we no longer had these issues, and upstream anyone using a modern iSCSI initiator would not have them either.

About 6 months later Nova added a big lock around the mapping/unmapping + attaching/detaching operations that serialized them for all the backends that didn't report that they don't have shared targets (the driver set it to True or didn't set it at all), regardless of the transport protocol used by the backend (shared targets mean nothing to backends like Ceph or FC), and it also didn't care whether the initiator running on the host had the new feature that makes the lock unnecessary. So the lock is really only necessary if you are doing iSCSI and the initiator doesn't have the manual scans feature, and that is what the os-brick context manager does.
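Roughly, the guard only needs to serialize when all the conditions that enable the race hold at the same time. A minimal sketch of that decision, purely illustrative and not the real os-brick implementation (the function name, arguments and lock key are made up):

    from contextlib import contextmanager
    from oslo_concurrency import lockutils

    @contextmanager
    def maybe_serialize(volume, protocol, initiator_has_manual_scans):
        # Lock only for iSCSI backends that share one target-portal across
        # LUNs, and only when the host's Open iSCSI initiator cannot disable
        # the automatic LUN scans that cause the leftover devices.
        needs_lock = (protocol in ('iscsi', 'iser')
                      and volume.get('shared_targets', True)
                      and not initiator_has_manual_scans)
        if needs_lock:
            # One lock per Cinder volume service, so host-side attach/detach
            # cannot interleave with the backend-side map/unmap.
            with lockutils.lock(volume['service_uuid']):
                yield
        else:
            yield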
Oh, nice! Locking in os-brick had previously been hard NAKed. This was always where it needed to live, as the locking requirements are specific not just to the backend but also to the bugs present in that backend. Attempting to do it in Nova, or even worse in c-vol, made this a minefield. I suggest you might want to add some intent to the interface, e.g. brick_utils.guard_attach(volume) and brick_utils.guard_detach(volume). In the first instance these can both be aliases for guard_connection, but IIRC there was at least 1 driver which could handle concurrency in one operation but not the other. Anyway, assuming the locking is correct in os-brick I'm all in favour of this in Nova.
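Something like this is what I mean, just a sketch, assuming guard_connection is the os-brick context manager discussed above:

    from os_brick import utils as brick_utils

    def guard_attach(volume):
        # For now just an alias that carries intent at the call site.
        return brick_utils.guard_connection(volume)

    def guard_detach(volume):
        # Same today, but free to diverge later for a driver that only needs
        # serialization on one of the two operations.
        return brick_utils.guard_connection(volume)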
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6543