Bug 1435335 - [BLOCKED] When a RHEVH host loses its path to storage, filesystems go to read-only. When paths come back, filesystems should recover (depends on platform bug 1436415 )
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.7
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Nir Soffer
QA Contact: Yosi Ben Shimon
URL:
Whiteboard:
Depends On: 1436415
Blocks: CEECIR_RHV43_proposed 1558102
 
Reported: 2017-03-23 14:47 UTC by Greg Scott
Modified: 2022-03-22 13:06 UTC
CC List: 20 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-13 16:52:20 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-38442 0 None None None 2022-03-22 13:06:53 UTC
Red Hat Knowledge Base (Solution) 3234761 0 None None None 2018-03-19 15:45:53 UTC
Red Hat Knowledge Base (Solution) 3386121 0 None None None 2018-03-20 18:57:28 UTC
oVirt gerrit 89135 0 master MERGED multipath: Add user devices blacklist 2021-02-09 10:00:41 UTC
oVirt gerrit 93301 0 ovirt-4.2 ABANDONED multipath: Add user devices blacklist 2021-02-09 10:00:41 UTC

Internal Links: 1436415

Description Greg Scott 2017-03-23 14:47:56 UTC
Description of problem:

When a RHEVH or RHVH host loses connectivity to its storage, file systems go read-only, especially in boot-from-SAN scenarios.  This is expected.  But when the storage connection paths come back, the file systems stay read-only.  The only cure is to reboot the RHEV-H host.  See the writeup at https://access.redhat.com/solutions/276283.

This is not good enough for RHEV-H, which might be hosting dozens of VMs that now can't migrate anywhere else.  And if the host happens to hold the SPM role, the entire data center goes non-responsive.  With no way to recover the booted file system and no way to migrate VMs somewhere else, the only recovery is to kill the existing VMs and mass-reboot all affected hosts.


Version-Release number of selected component (if applicable):

RHEVH-6, probably also RHVH-7

How reproducible:
Always

Steps to Reproduce:
1. Set up a RHEV / RHV environment with Fibre Channel storage.  The RHEV-H / RHV-H systems boot from SAN.
2. Disconnect all paths to storage and wait for file systems to go read-only.
3. Reconnect the SAN paths.

Actual results:

Once the SAN paths go bad, the file systems stay read-only even after the paths are reconnected.  There is no recovery short of a reboot.


Expected results:

If we can't recover the boot file systems, then at least provide a way to migrate the VMs away.  But the best solution is to find a way to recover the boot file system so everything can continue operating.

Additional info:

We've seen this problem multiple times recently with SAN firmware upgrades and blade chassis upgrades.  The controllers and paths are all redundant, but the firmware upgrades don't wait long enough between the controller A and controller B upgrades, so all SAN paths are dead for a time.  This wreaks havoc on large environments with lots of hosts and thousands of guest VMs.

Comment 6 Greg Scott 2017-03-27 21:25:14 UTC
See https://bugzilla.redhat.com/show_bug.cgi?id=1436415

Comment 7 Tal Nisan 2017-03-28 16:54:12 UTC
Targeting tentatively to 4.1.2 in case the RHEL bug is fixed by then

Comment 8 Allon Mureinik 2017-04-25 06:39:06 UTC
This depends on a platform bug targeted to RHEL 7.4, so I'm changing the target here accordingly.

If the platform fix is backported to 7.3.z we can rethink this.

Comment 9 Ryan Barry 2017-09-19 09:17:19 UTC
Allon -- it looks like the platform bug is still NEW. Do we plan to fix this?

Comment 10 Allon Mureinik 2017-09-19 15:39:55 UTC
(In reply to Ryan Barry from comment #9)
> Allon -- it looks like the platform bug is still NEW. Do we plan to fix this?

I can't see anything we can do without the platform's fix.
Note this is currently [tentatively] targeted for RHV 4.3.

Comment 11 Marina Kalinin 2017-12-07 21:16:25 UTC
Closing this bug for now.
Once the platform bug is fixed, this will probably be fixed automatically; if not, we will review it again. But right now there is not much we can do on the RHV side.

I attached all customer tickets from this bug to the platform bug as well.

Comment 12 Dave Wysochanski 2018-03-14 18:23:32 UTC
An "all paths down" scenario is a common scenario in SAN storage.  To avoid data corruption (i.e. read-only filesystems) the standard solution is to use either infinite retries or very high number of retries of IO operations that time out.  For RHEL that means multipath and queue_if_no_path.

Here are instructions on setting up multipath on the root device (RHEL 7):
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/dm_multipath/index#move_root_to_multipath

RHEL6:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/dm_multipath/move_root_to_multipath

If for some reason this does not solve the problem, please provide more details about the multipath setup you are using and why it does not prevent the file systems from going read-only.  At this point, platform does not think this is a bug, or the RFE cannot be solved or would require an unreasonable amount of work, given that a properly configured multipath setup would likely have avoided this issue.

Thanks.

Comment 13 Nir Soffer 2018-03-19 15:03:15 UTC
I think the solution is to define a special rule for the multipath devices used by the host that ensures queuing when all paths are down.

The best way to do this is to add a drop-in configuration file like:

$ cat /etc/multipath/conf.d/host.conf
multipaths {
    multipath {
        wwid xxxyyy
        no_path_retry queue
    }
}
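
The wwid above is a placeholder.  For illustration (the map name and wwid below are example values), the wwid of the boot LUN can be read from the multipath topology; it is the string in parentheses on the first line of the map that holds the root/boot file systems:

$ multipath -ll
mpatha (3600a098038304437415d4b6a59684a52) dm-0 NETAPP,LUN
...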

A drop-in like this should prevent the file system from going read-only (see comment 12).

This also has a bad side effect: when all paths are down, vdsm will get stuck on this
multipath device when running LVM commands, or when trying to write to the vdsm log.

The log issue was recently fixed in 4.2 for bug 1516831 - vdsm should keep running
even when writes to /var/log block. If the internal log queue is overloaded, log
messages are dropped.

Getting stuck on the host multipath device is solved by
https://gerrit.ovirt.org/#/c/89135/ - this requires additional configuration on
the admin side.

We can hit many other issues when the local file system blocks, but I believe
working in degraded mode for a couple of minutes and recovering to full
functionality when some paths come back is good enough.

The next step is to reproduce this issue with the vdsm patch and a proper multipath
configuration to see if more work is needed.

Comment 14 Marina Kalinin 2018-03-19 18:58:49 UTC
Nir, one thing not to forget - maybe it will require a separate bug?
It is great if we blacklist the boot LUN or just change its multipath configuration as in bz#1558102, but what I am concerned about is how the admin of the system would be notified about a problem on the host.
In the past they would probably have been notified by the storage monitoring thread, but if we blacklist the boot LUN, what will monitor the host's health in that regard? And if the boot LUN never comes back and we set it to infinite retry, who would be responsible for alerting about it?

Comment 15 Nir Soffer 2018-04-01 23:07:58 UTC
(In reply to Marina from comment #14)
> It is great if we blacklist the boot LUN or just change its multipath
> configuration as in bz#1558102, but what I am concerned about is how the
> admin of the system would be notified about a problem on the host.

Storage monitoring only monitors LUNs used by storage domains, so it has never monitored
the LUNs used for the root file system.

> And if the boot LUN never comes back and we set it to infinite retry, who
> would be responsible for alerting about it?

The new multipath alerts feature in 4.2 does monitor all LUNs, so the admin will
get events about the host LUNs.

See https://www.ovirt.org/develop/release-management/features/storage/multipath-events/

Comment 16 Nir Soffer 2018-04-01 23:19:47 UTC
Elad, the patch is merged in master. We now need to test the behavior of the system
when booting from SAN and the boot LUN becomes inaccessible for several minutes.

To set up this test:

1. Add a multipath configuration drop-in file for the LUN used for booting.

See comment 13 for the details.

2. Configure vdsm to blacklist the same LUN:

Add this file:

$ cat /etc/vdsm/vdsm.conf.d/host.conf
[multipath]
blacklist = <wwid of boot lun>

And restart vdsm.
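
On a systemd-based host this is typically (shown here for illustration):

$ systemctl restart vdsmd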

3. Activate the host and start a couple of VMs.

4. Simulate a storage outage for 5 minutes

I think unmapping the LUN used for boot on the storage server should work.
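
If changing the LUN mapping on the storage server is not convenient, one possible alternative (assuming the paths appear on the host as plain sdX SCSI devices; the device names below are examples) is to take each path device of the boot LUN offline from the host side:

    echo offline > /sys/block/sdb/device/state     (repeat for each path device)

and, after the outage window, bring the paths back so multipath can reinstate them:

    echo running > /sys/block/sdb/device/state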

Expected behavior:

- The host will function in degraded mode, as applications will get stuck on the
  root file system while storage is not available.

- When storage becomes available again, the host should recover to normal 
  operation.

- The root file system should not become read-only.

Comment 17 Yaniv Kaul 2018-04-02 14:31:33 UTC
(In reply to Nir Soffer from comment #16)
> Elad, the patch is merged in master. We now need to test the behavior of the
> system when booting from SAN and the boot LUN becomes inaccessible for
> several minutes.
> 
> To set up this test:
> 
> 1. Add a multipath configuration drop-in file for the LUN used for booting.
> 
> See comment 13 for the details.
> 
> 2. Configure vdsm to blacklist the same LUN:
> 
> Add this file:
> 
> $ cat /etc/vdsm/vdsm.conf.d/host.conf
> [multipath]
> blacklist = <wwid of boot lun>
> 
> And restart vdsm.
> 
> 3. Activate the host and start a couple of VMs.
> 
> 4. Simulate a storage outage for 5 minutes.
> 
> I think unmapping the LUN used for boot on the storage server should work.
> 
> Expected behavior:
> 
> - The host will function in degraded mode, as applications will get stuck on
>   the root file system while storage is not available.
> 
> - When storage becomes available again, the host should recover to normal
>   operation.
> 
> - The root file system should not become read-only.

Why are we even investing in this? There are plenty of other loggers that might be trying to log to the root fs and will get stuck.

Comment 18 Nir Soffer 2018-04-02 14:40:33 UTC
(In reply to Yaniv Kaul from comment #17)
> Why are we even investing in this? There are enough other loggers that might
> be trying to log on the root fs and will become stuck?

There is a need to survive a storage outage without a reboot.

Current vdsm in master should not be affected by a blocked /var/log or by the host
LUN when using the new multipath:blacklist option.

We want to know whether more work is needed to survive such an event with minimal damage.

Comment 19 Elad 2018-04-11 09:26:52 UTC
Hi Nir, as the patch is merged in master, we'll test it when it moves to ON_QA.

Comment 20 Nir Soffer 2018-04-11 09:31:57 UTC
(In reply to Elad from comment #19)
> Hi Nir, as the patch is merged in master, we'll test it when it moves to
> ON_QA.

This requires backporting the patch to 4.2. We would like to test this with master before
we backport, if possible.

Comment 21 Elad 2018-04-11 12:04:02 UTC
OK, we'll do our best to test it in the upcoming days.

Comment 22 Elad 2018-04-16 15:01:03 UTC
Yosi, please take a look

Comment 23 Elad 2018-07-08 08:08:32 UTC
Hi Nir, is this request still relevant?

Comment 24 Nir Soffer 2018-07-10 20:15:41 UTC
(In reply to Elad from comment #23)
> Hi Nir, is this request still relevant?

Yes.

Comment 27 Nir Soffer 2018-07-31 21:35:15 UTC
Yosi, the backport is ready, you can test it.

Comment 28 Yosi Ben Shimon 2018-08-21 08:23:33 UTC
Hi Nir,
I'm still testing this bug. It takes more time than I thought it would.

Comment 30 Nir Soffer 2018-10-09 16:39:53 UTC
Yosi, what kind of info do you need?

Comment 31 Sandro Bonazzola 2019-01-28 09:41:02 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 36 Nir Soffer 2019-03-17 13:23:11 UTC
Tal, the vdsm part is merged and available in 4.3. Since QE does not have the
capacity to test this, I suggest closing as CURRENTRELEASE.

