Bug 1310330 - [RFE] Provide a way to remove stale LUNs from hypervisors
Status: NEW
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: RFEs
Version: 4.0.0
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ovirt-4.2.0
Target Release: ---
Assigned To: Allon Mureinik
QA Contact: Raz Tamir
Keywords: FutureFeature, Reopened
Depends On:
Blocks: 1417161
Reported: 2016-02-20 08:13 EST by Greg Scott
Modified: 2017-11-16 04:21 EST
CC: 33 users

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-06-14 04:51:52 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
ratamir: testing_plan_complete-


Attachments
domain dialog with warnings about unzoned iSCSI LUN (48.47 KB, image/png)
2016-10-04 11:36 EDT, Tim Speetjens


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Bugzilla 880738 None None None 2016-02-20 08:13 EST
Red Hat Knowledge Base (Solution) 129983 None None None 2016-11-07 19:59 EST

Description Greg Scott 2016-02-20 08:13:56 EST
Description of problem:
Bug number 880738 asks for the capability for RHEV-M to orchestrate getting rid of stale LUNs after removing a storage domain.  That RFE was satisfied in 3.6.1 by reducing the impact of stale LUNs, but stale LUN removal itself was not automated.  We were asked to put together this RFE to request that same capability.

Version-Release number of selected component (if applicable):
3.n, 4.n

How reproducible:
At will

Steps to Reproduce:
1. Remove a storage domain
2. Remove the associated LUNs advertised by the SAN
3. RHEV hypervisors still have references to these now non-existent LUNs.
4. Either log into each hypervisor individually and remove these stale LUNs by hand, or put hypervisors into maintenance mode one by one and do a rolling reboot.

Actual results:
Hours of wasted time.  With 68 hypervisors and 2500 VMs, 2 minutes to live-migrate each VM, 5 minutes to reboot each hypervisor, plus overhead, the time to clean up after removing a storage domain adds up to around 6000 minutes for each cycle (2500 VMs x 2 minutes + 68 hypervisors x 5 minutes is roughly 5340 minutes before overhead).  For customers who need to rotate storage domains regularly, the unnecessary overhead is unacceptable.

Removing all the paths and LUNs by hand on each hypervisor may take less time once the UUID paths are identified, but the process is error prone and requires an unacceptably high skill level.

Expected results:
Pasting word for word from bug number 880738:
If RHEV-H is supposed to be an appliance managed by RHEV-M, then RHEV-M should also be orchestrating the storage removal as well, all the way down to removing the paths.

Additional info:
The original bug number 880738 links to 39 support cases.  Adding this capability will save countless support and customer hours.
Comment 1 Nir Soffer 2016-02-20 16:35:38 EST
(In reply to Greg Scott from comment #0)
> Steps to Reproduce:
> 1. Remove a storage domain
> 2. Remove the associated LUNs advertised by the SAN
> 3. RHEV hypervisors still have references to these now non-existant LUNs.
> 4. Either log into each hypervisor individually and remove these stale LUNs
> by hand, or put hypervisors into maintenance mode one by one and do a
> rolling reboot.
> 
> Actual results:
> Hours of wasted time.  With 68 hypervisors and 2500 VMs, 2 minutes to
> live-migrate each VM, 5 minutes to reboot each hypervisor, plus overhead,
> the time to clean up after removing a storage domain adds up to around 6000
> minutes for each cycle.  

You don't need to live migrate vms or reboot a host to remove stale devices.

> For customers who need to rotate storage domains
> regularly, the unnecessary overhead is unacceptable.

Why do you need to rotate storage domains regularly?

What is regularly?

> Removing all the paths and LUNs by hand on each hypervisor may take less
> time once the UUID paths are identified, but the process is error prone and
> requires an unacceptably high skill level.

The storage administrator who provided the LUNs in the first place has
all the info needed to remove them - the LUN GUID. Using the GUID, you can
find the underlying devices and remove the multipath device and the
underlying SCSI devices.

There is nothing RHEV-specific about the stale devices; they are not used
by RHEV at this point. The procedure is the same for any host that
has stale devices.

I suggest we start by documenting this procedure for RHEV customers.

Maybe a tool for removing stale devices could be used to prevent errors?
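
For illustration, a minimal sketch of what such a helper could look like, assuming the WWID of the stale multipath device is known and nothing on the host still uses it. The sysfs paths and the multipath command below are the standard Linux mechanisms, not an existing RHEV tool:

#!/usr/bin/env python
# Sketch only: removes a stale multipath device and its SCSI paths by WWID.
# Assumes the LUN is already unzoned and nothing on the host still uses it.
import glob
import os
import subprocess
import sys

def find_dm_node(wwid):
    """Return the dm-N node backing the multipath map for this WWID, or None."""
    for uuid_path in glob.glob("/sys/block/dm-*/dm/uuid"):
        with open(uuid_path) as f:
            if f.read().strip() == "mpath-" + wwid:
                return uuid_path.split("/")[3]  # "dm-N"
    return None

def remove_stale_lun(wwid):
    dm_node = find_dm_node(wwid)
    if dm_node is None:
        raise SystemExit("no multipath map found for WWID %s" % wwid)
    # Record the underlying SCSI devices before the map is flushed.
    slaves = os.listdir("/sys/block/%s/slaves" % dm_node)
    # Flush (remove) the multipath map.
    subprocess.check_call(["multipath", "-f", wwid])
    # Delete each underlying SCSI device from the SCSI layer.
    for sd in slaves:
        with open("/sys/block/%s/device/delete" % sd, "w") as f:
            f.write("1")

if __name__ == "__main__":
    remove_stale_lun(sys.argv[1])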

> Expected results:
> Pasting word for word from bug number 880738:
> If RHEV-H is supposed to be an appliance managed by RHEV-M, then RHEV-M
> should also be orchestrating the storage removal as well, all the way down
> to removing the paths.

RHEV-M does not control adding the devices to the hypervisors, so it is 
arguable if it should control removal of the devices. The removal process
can be automated by other means.

RHEV-H may be harder to automate, but this should be solved by RHEV-H.

Fabian, do we have a solution for RHEV-H for automating operations in a cluster?
Comment 2 Greg Scott 2016-02-20 18:01:13 EST
> Why do you need to rotate storage domains regularly?
>
> What is regularly?

Once per month.  The application uses a few thousand Windows 7 VMs in pools, created from a template.  Every month the template is patched, and new pools and VMs are created from the newly patched template.  At least until 3.6, they had to create a new storage domain and get rid of the old one every month to keep enough contiguous free space in the SAN. Creating new pools and VMs in the same storage domain apparently caused too much fragmentation, so Red Hat recommended using new storage domains on a new set of LUNs.

So every month, the customer has to go through this process involving hours and hours and hours of manual work.

> RHEV-M does not control adding the devices to the hypervisors . . .

OK, this is technically true with fiber channel storage domains, but not for iSCSI or NFS.  For iSCSI and NFS, RHEV-M tells RHEV-H everything it needs to log in or mount and set up the correct LVM entities.  For fiber channel, rhev-m already "sees" the LUN and rhev-m tells rhev-h to do the rest.  OK, fair enough.

So when I tell rhev-m to tear down a storage domain, rhev-m knows everything it needs to instruct the SPM host to take care of business, including all the fiber channel WWID info.  It should be possible to store that info somewhere, so when the storage administrator gets rid of the underlying LUN, I can tell rhev-m to tell all the rhev-h systems to get rid of references to the now-stale LUN.

Why do it? Well, we did tell lots of paying customers we were going to do it back in 2013.  That seems like a pretty good reason to me.
Comment 3 Greg Scott 2016-02-21 00:40:53 EST
Typo above and bz doesn't let me edit comments. This sentence:

>  For fiber channel, rhev-m already "sees" the LUN and rhev-m tells rhev-h to do
> the rest.

Should say

For fiber channel, rhev-h already "sees" the LUN and rhev-m tells rhev-h to do the rest.

Now it makes sense.

- Greg
Comment 5 Fabian Deutsch 2016-02-22 10:55:14 EST
(In reply to Nir Soffer from comment #1)
> (In reply to Greg Scott from comment #0)> > Expected results:
> > Pasting word for word from bug number 880738:
> > If RHEV-H is supposed to be an appliance managed by RHEV-M, then RHEV-M
> > should also be orchestrating the storage removal as well, all the way down
> > to removing the paths.
> 
> RHEV-M does not control adding the devices to the hypervisors, so it is 
> arguable if it should control removal of the devices. The removal process
> can be automated by other means.
> 
> RHEV-H may be harder to automate, but this should be solved by RHEV-H.
> 
> Fabian, do we have a solution for RHEV-H for automating operations in a
> cluster?

Today Node itself does not do anything with storage.

FCoE/FC is getting configured manually.
iSCSI is connected using vdsm.

At the bottom line I'd expect vdsm to do the automation, or leave it for the admin to do it manually. At least I don't see a point where Node should do something.
Comment 6 Pavel Zhukov 2016-02-23 08:52:25 EST
Doesn't multipath's "deferred_remove yes" option help here? https://bugzilla.redhat.com/show_bug.cgi?id=631009
Once the SD is removed from RHEV and unzoned (all paths have failed), multipath will take care of all underlying devices and remove them.
Comment 7 Marina 2016-02-23 12:01:38 EST
Yaniv,
Bringing this RFE to your attention please.
Comment 9 Nir Soffer 2016-03-05 15:44:20 EST
Ben, would the fix for bug 631009 (deferred_remove yes) resolve this issue?

The use case is this:

1. An unused multipath device on a host is unzoned on the storage
   server.

2. All the paths on this device become faulty, since the server
   no longer exposes this LUN to this host

3. Using this multipath configuration:

defaults {
    polling_interval            5
    no_path_retry               fail
    user_friendly_names         no
    flush_on_last_del           yes
    fast_io_fail_tmo            5
    dev_loss_tmo                30
    max_fds                     4096
    deferred_remove             yes
}

devices {
    device {
        all_devs                yes
        no_path_retry           fail
    }
}

After some timeout (dev_loss_tmo?), are the paths removed from the system,
and is the multipath device removed?

Or does this only help if you manually delete the faulty paths? (I don't
remember seeing paths removed automatically.)
Comment 10 Ben Marzinski 2016-03-07 14:53:48 EST
deferred_remove won't help remove the faulty paths at all. The system should remove them after dev_loss_tmo has passed.  Once all paths to a multipath device are removed, the device itself should be removed.  Setting deferred_remove helps in cases where the device is open when multipath tries to remove it.  In this case, multipath can't remove the device, so it starts a deferred remove.  When the device is finally closed, it is automatically removed by device-mapper.

But that doesn't sound like the case you are seeing.  It sounds like the paths are failing but not being automatically removed.  This happens under multipath, in the scsi layer. dev_loss_tmo only causes a scsi device to be removed if there is a loss of connection, AFAIK. It sounds to me like the errors that the scsi device is reporting are not causing the scsi-layer to automatically remove it.
In this case, there is nothing in multipath to force-remove devices that have
been failed for too long.
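
For anyone checking which timeout actually applies on a host, the effective dev_loss_tmo for FC remote ports can be read from sysfs; a small sketch, assuming the usual RHEL sysfs layout:

# Inspect the dev_loss_tmo currently in effect for each FC remote port
# (sysfs layout as on RHEL 7; adjust if your transport differs).
import glob

for path in glob.glob("/sys/class/fc_remote_ports/rport-*/dev_loss_tmo"):
    with open(path) as f:
        print("%s = %s" % (path, f.read().strip()))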
Comment 16 Tim Speetjens 2016-10-04 11:36 EDT
Created attachment 1207249 [details]
domain dialog with warnings about unzoned iSCSI LUN

Both lines represent a LUN that was attached before but is now unzoned. The orange one was an SD that has since been removed. The other was never used: only zoned, then unzoned.
Comment 22 Greg Scott 2017-01-17 12:19:43 EST
Let's not get hung up on the word "remove."  If I'm following this, the challenge is that the raw LUNs will still exist immediately after getting rid of a storage domain, at least until the storage admin gets rid of them. How about this:

The RHEV admin tells RHEVM to get rid of a storage domain.

If the storage domain is block (FC or FCOE or iSCSI), then RHVM tells all the RHV-H systems to treat the LUNs that used to be part of that storage domain as just raw LUNs if they still exist, or just get rid of references to them if the LUNs no longer exist.

But thinking this through, there's a timing challenge.  After RHVM gets rid of the storage domain, maybe the SAN administrator gets rid of the raw LUNs, maybe not.  So RHVM needs to "know" about all the LUNs from the hosts' point of view.  And then once the storage admin gets rid of the raw LUNs, RHVM can tell the hypervisors to get rid of their references to the now stale raw LUNs.

But that gets ugly - now we have a manager tracking LUN objects over which it has no control.

So what about this - the manager gets rid of the storage domain as before. Add a GUI element and API for RHVM to tell all the hosts later on to re-enumerate all the LUNs they see, which should clean up stale LUNs after the storage admin removes them from the SAN. So the steps would be:

1 - The RHEV admin tells RHVM to get rid of the storage domain.
2 - Later on, the SAN admin gets rid of the raw LUNs from the SAN point of view.
3 - The RHEV admin clicks the RHVM GUI button to tell the hosts to clean up their stale LUNs.

- Greg
Comment 23 Nir Soffer 2017-01-17 12:34:08 EST
Here is a possible way RHV can help remove devices from hypervisors.

1. System administrator unzones the devices on the storage server

The system administrator must do this before trying to remove the devices from
a RHV setup.

RHV is not responsible for adding or removing devices, only for *discovering*
devices added by the system administrator.

To be responsible for removing devices, RHV must have control of the storage
server, similar to OpenStack Cinder.

2. System administrator selects the devices to remove

The system will show the available devices the same way we show devices for
creating a new storage domain, using a host selected by the system
administrator (Host.getDeviceList).

3. System sends a request to remove the devices to all connected hosts

The system will first send the request to all hosts except the host selected
for enumerating the devices. If removal was successful on all of those hosts
(the specified devices are no longer available on them), remove the devices on
the host selected for enumerating the devices. Finally, remove the devices from
the RHV database.

Notes:

- You cannot remove devices from the setup if the devices are not available
  on the host selected for enumerating devices.

- System cannot remove devices from hosts which are not connected.

- If the devices were not unzoned on the storage server, they will appear
  again on all hosts once we perform the next SCSI scan, and will be added back
  to the RHV database on the next creation/edit of a storage domain.

This requires adding a new vdsm API, plus new UI and an engine flow similar to
resizing a device. In that flow, the user selects a device and the system
sends a request to all hosts to resize the device.
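
To make the proposed ordering concrete, here is a rough, hypothetical sketch of the engine-side flow; the method names (remove_devices, get_device_list, delete_devices) are placeholders and do not exist in oVirt or vdsm today:

# Hypothetical sketch of the engine-side flow described above; the host/db
# method names are placeholders only - the vdsm verb does not exist yet.
def remove_stale_devices(device_guids, hosts, enumerating_host, db):
    others = [h for h in hosts if h is not enumerating_host and h.is_connected()]

    # 1. Ask every other connected host to drop the devices.
    for host in others:
        host.remove_devices(device_guids)

    # 2. Verify the devices are really gone from those hosts.
    for host in others:
        remaining = set(host.get_device_list()) & set(device_guids)
        if remaining:
            raise RuntimeError("devices still present on %s: %s" % (host, remaining))

    # 3. Only then remove them on the host used to enumerate the devices.
    enumerating_host.remove_devices(device_guids)

    # 4. Finally, drop them from the engine database.
    db.delete_devices(device_guids)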
Comment 25 Yaniv Kaul 2017-02-22 03:53:04 EST
(In reply to Nir Soffer from comment #23)
> Here is possible way RHV can help to remove devices from hypervisors.
> 
> 1. System administrator unzone the devices on the storage server
> 
> The system administrator must do this before trying to removing the devices
> from
> a RHV setup.
> 
> RHV is not responsible for adding or removing devices, only for *discovering*
> devices added by the system administrator.

When does discovery take place? If it's only a user-initiated action, then I assume we can use 'rescan-scsi-bus.sh' with '-a -r' (and perhaps '-m' as well)

> 
> To be responsible for removing devices, RHV must have control of the storage
> server, similar to OpenStack Cinder.
> 
> 2. System administrator select the devices to remove
> 
> The system will show the available devices available using the same way we
> show devices for creating new storage domain, using a host selected by
> the system administrator (Host.getDeviceList).

Is the available device seen based on Engine data or data from VDSM? Was the device unzoned already? I assume not - where exactly is the step where the storage admin unzones it, so it won't be re-discovered?


> 
> 3. System send request to remove the devices to all connected hosts
> 
> The system will first send a request to all hosts except the host selected
> for enumerating the devices. If removal was successful (specified devices are
> not available on a host) on all hosts, remove the devices on the host
> selected
> for enumerating the devices. Finally remove the devices from RHV database.
> 
> Notes:
> 
> - You cannot remove devices from the setup if the devices are not available
>   on the host selected for enumerating devices.

ACK - that means it is seen by that host, not from Engine DB?

> 
> - System cannot remove devices from hosts which are not connected.

Agreed. What happens when they come back?

> 
> - If the devices were not unzoned on the storage server, they will appear 
>   again on all hosts once we perform the next scsi scan, and be added to 
>   RHV database on the next creation/edit of storage domain.

Makes sense as well - but I'd like to make sure this is the only time re-discovery happens - what happens when a server reboots or goes back from maintenance to up? Is it seen on the host, but not in Engine?

> 
> This requires adding new vdsm api, new UI and flow in engine similar to 
> resizing of a device. In this flow the user select a device and and the
> system
> send a request to all hosts for resizing the device.



Yaniv D. - this is assigned to Allon, but not targeted yet to 4.2?
Comment 26 Nir Soffer 2017-02-22 04:51:19 EST
(In reply to Yaniv Kaul from comment #25)
> (In reply to Nir Soffer from comment #23)
> > Here is possible way RHV can help to remove devices from hypervisors.
> > 
> > 1. System administrator unzone the devices on the storage server
> > 
> > The system administrator must do this before trying to removing the devices
> > from
> > a RHV setup.
> > 
> > RHV is not responsible for adding or removing devices, only for *discovering*
> > devices added by the system administrator.
> 
> When is discovery taking place? If it's only a user initiated action, then I
> assume we can use 'rescan-scsi-bus.sh' with '-a -r' (and perhaps '-m' as
> well)

We are not using rescan-scsi-bus.sh but iscsiadm and our helper for scanning
FC (/usr/libexec/vdsm/fc-scan).

I'm not sure that using rescan-scsi-bus.sh is a good idea; it does too many
things that may not be wanted or safe for us.

Scanning is done by the system without user interaction.
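
For reference, a minimal sketch of the two scan mechanisms mentioned here, using the standard Linux interfaces (an illustration, not vdsm's actual code):

# Rough illustration of the two scan mechanisms, not vdsm's implementation.
import glob
import subprocess

def rescan_iscsi():
    # Rescan all logged-in iSCSI sessions.
    subprocess.call(["iscsiadm", "-m", "session", "-R"])

def rescan_fc():
    # Trigger a full scan ("- - -" = all channels/targets/LUNs) on every SCSI
    # host; this sysfs interface is what a helper like fc-scan relies on.
    for scan_path in glob.glob("/sys/class/scsi_host/host*/scan"):
        with open(scan_path, "w") as f:
            f.write("- - -")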

> > To be responsible for removing devices, RHV must have control of the storage
> > server, similar to OpenStack Cinder.
> > 
> > 2. System administrator select the devices to remove
> > 
> > The system will show the available devices available using the same way we
> > show devices for creating new storage domain, using a host selected by
> > the system administrator (Host.getDeviceList).
> 
> The available device is seen based on Engine data or data from VDSM? 

Based on what vdsm reports, and what engine knows about the devices
(for example, it will gray out used devices).

> Was the
> device unzoned already? I assume not - where exactly is the step the storage
> admin unzones it, so it won't be re-discovered?

It should be unzoned at this point, see step 1.

> > 3. System send request to remove the devices to all connected hosts
> > 
> > The system will first send a request to all hosts except the host selected
> > for enumerating the devices. If removal was successful (specified devices are
> > not available on a host) on all hosts, remove the devices on the host
> > selected
> > for enumerating the devices. Finally remove the devices from RHV database.
> > 
> > Notes:
> > 
> > - You cannot remove devices from the setup if the devices are not available
> >   on the host selected for enumerating devices.
> 
> ACK - that means it is seen by that host, not from Engine DB?

Yes.

We can use the engine DB as well, but we must support the case where you lost
your engine DB, or you restored an older version that does not reflect the
storage. The truth is what we actually see on storage.

If we want to make this more robust (and more complex), we can ask all hosts to
return the device list at the same time and merge the results, marking devices
that are not available on all hosts.

> > - System cannot remove devices from hosts which are not connected.
> 
> Agreed. What happens when they come back?

In the simplest solution, nothing; you have to open the dialog again
using a host that sees the device and ask to remove the device again.

In the more complex solution, the system will ask the host to remove the
device when it comes back online. This is how we implement Ceph secrets:
each time we connect, the host gets the list of secrets it must keep. Any
secrets not in this list will be removed, and new secrets will be added.
We can do a similar thing by sending the list of devices a host should see
when connecting the host to storage.

In the Kubernetes world, the host can get the list of devices from etcd,
remove unneeded devices, or rescan the bus to find devices which are not
available on the host but are listed in etcd.

I suggest we start with the simplest possible solution, which will be much
better than no solution.
 
> > - If the devices were not unzoned on the storage server, they will appear 
> >   again on all hosts once we perform the next scsi scan, and be added to 
> >   RHV database on the next creation/edit of storage domain.
> 
> Makes sense as well - but I'd like to make sure this is the only time of
> re-discovery - what happens when a server reboots or goes back from
> maintenance to up? It is seen on the host, but not in Engine?

Currently vdsm will discover devices when vdsm starts, when connecting
to a storage server, when a domain lookup in the domain cache fails, etc.
Comment 41 Gianluca Cecchi 2017-10-06 10:33:37 EDT
Hello,
Any chance of having some solution for 4.2? Or any testing in 4.1.6?
Could it be an option to have a solution for RHEV and FC-based storage domains, similar to the OS-level one in
https://access.redhat.com/solutions/20063
but richer, due to the involvement of different hosts?
Thanks
Comment 42 Yaniv Kaul 2017-10-06 14:45:13 EDT
(In reply to Gianluca Cecchi from comment #41)
> Hello,
> Any chance to have some solution for 4.2? Or any testing for 4.1.6?
> Could it be an option to have a solution for RHEV and FC bases storage
> domains, similar to OS version
> https://access.redhat.com/solutions/20063
> but richer, due to involvement of different hosts?
> Thanks

I've been looking at implementing this in Ansible (where the input is the WWN).
Of course, it's taking me a while, as it's not my day-to-day work on oVirt.
I'm still hoping to complete it soon.
