Description of problem:

When a RHEV-H or RHV-H host loses connectivity to its storage, its file systems go read-only, especially in boot-from-SAN scenarios. This is expected. But when the storage paths come back, the file systems stay read-only, and the only cure is to reboot the RHEV-H host. See the writeup at https://access.redhat.com/solutions/276283.

This is not good enough for RHEV-H, which might be hosting dozens of VMs that now can't migrate anywhere else. And if the host happens to hold the SPM role, the entire data center goes non-responsive. With no way to recover the booted file system and no way to migrate VMs elsewhere, the only recovery is to kill the existing VMs and mass-reboot all affected hosts.

Version-Release number of selected component (if applicable):
RHEV-H 6, probably also RHV-H 7

How reproducible:
Always

Steps to Reproduce:
1. Set up a RHEV / RHV environment with Fibre Channel storage. The RHEV-H / RHV-H systems boot from SAN.
2. Disconnect all paths to storage and wait for the file systems to go read-only.
3. Reconnect the SAN paths.

Actual results:
It's all over once the SAN paths go bad. There is no recovery.

Expected results:
If we can't recover the boot file systems, then at least provide a way to migrate the VMs away. The best solution, though, is a way to recover the boot file system so everything can continue operating.

Additional info:
We've seen this problem multiple times recently with SAN firmware upgrades and blade chassis upgrades. The controllers and paths are all redundant, but the firmware upgrades don't wait long enough between the controller A and controller B upgrades, so all SAN paths are dead for a time. This wreaks havoc on large environments with many hosts and thousands of guest VMs.
See https://bugzilla.redhat.com/show_bug.cgi?id=1436415
Tentatively targeting 4.1.2 in case the RHEL bug is fixed by then.
This depends on a platform bug targeted to RHEL 7.4, so I'm changing the target here accordingly. If the platform fix is backported to 7.3.z we can rethink this.
Allon -- it looks like the platform bug is still NEW. Do we plan to fix this?
(In reply to Ryan Barry from comment #9)
> Allon -- it looks like the platform bug is still NEW. Do we plan to fix this?

I can't see anything we can do without the platform fix. Note this is currently [tentatively] targeted for RHV 4.3.
Closing this bug for now. Once the platform bug is fixed, this will probably be fixed automatically, and if not, we will review it again. But right now there is not much we can do on the RHV side. I attached all of the customers' tickets from this bug to the platform bug as well.
An "all paths down" scenario is a common scenario in SAN storage. To avoid data corruption (i.e. read-only filesystems) the standard solution is to use either infinite retries or very high number of retries of IO operations that time out. For RHEL that means multipath and queue_if_no_path. Here is some instructions on setting up multipath on root (RHEL7) https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/dm_multipath/index#move_root_to_multipath RHEL6: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/dm_multipath/move_root_to_multipath If for some reason this does not solve the problem please provide more details of the multipath setup you are using and why this doesn't avoid the filesystems going readonly. At this point platform does not think this is a bug or the RFE cannot be solved or requires an unreasonable amount of work given the fact a properly configured multipath would likely have avoided this bug. Thanks.
I think the solution is to define a special rule for the multipath devices used by the host, ensuring queuing when all paths are down. The best way to do this is to add a drop-in configuration file like:

$ cat /etc/multipath/conf.d/host.conf
multipaths {
    multipath {
        wwid xxxyyy
        no_path_retry queue
    }
}

This should prevent the file system from becoming read-only (see comment 12).

This also has the bad side effect that when all paths are down, vdsm will get stuck on this multipath device when running lvm commands, or when trying to write to the vdsm log.

The log issue was recently fixed in 4.2 for bug 1516831 - vdsm should keep running even when writes to /var/log block. If the internal log queue is overloaded, log messages are dropped.

Getting stuck on the host multipath device is solved by https://gerrit.ovirt.org/#/c/89135/ - this requires additional configuration on the admin side.

We can have many other issues when the local file system blocks, but I believe working in degraded mode for a couple of minutes and recovering to full functionality when some paths come back is good enough.

The next step is reproducing this issue with the vdsm patch and proper multipath configuration to see if more work is needed.
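A hedged sketch of how the WWID for that drop-in might be determined and the configuration applied (device and map names below are placeholders, not taken from an affected host; on RHEL 6 scsi_id lives in /lib/udev/scsi_id):

$ findmnt -n -o SOURCE /                                   # e.g. /dev/mapper/rhel-root
$ lsblk -s /dev/mapper/rhel-root                           # walk down to the multipath map and its SCSI paths
$ /usr/lib/udev/scsi_id --whitelisted --device=/dev/sda    # print the WWID of one underlying path
# Put that WWID into /etc/multipath/conf.d/host.conf as shown above, then reload:
$ multipathd -k"reconfigure"
$ multipath -ll                                            # verify the map now shows queue_if_no_path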
Nir, one thing not to forget. Maybe it will require a separate bug?

It is great if we blacklist the boot LUN or just change its multipath configuration as in bz#1558102, but what I am concerned about is how the admin of the system would be notified about a problem on the host. In the past they would probably be notified by the storage monitoring thread, but if we blacklist the LUN, what will monitor the host's health in that regard?

What if the boot LUN never comes back and we set it to infinite retry - who would be responsible for alerting about it?
(In reply to Marina from comment #14)
> It is great if we blacklist the boot LUN or just change its multipath
> configuration as in bz#1558102, but what I am concerned about is how the
> admin of the system would be notified about a problem on the host.

Storage monitoring only monitors LUNs used by storage domains, so it never monitored the LUNs used for the root file system.

> What if the boot LUN never comes back and we set it to infinite retry -
> who would be responsible for alerting about it?

The new multipath alerts feature in 4.2 does monitor all LUNs, so the admin will get events about the host LUNs. See
https://www.ovirt.org/develop/release-management/features/storage/multipath-events/
Elad, the patch is merged in master. We now need to test the behavior of the system when booting from SAN and the boot LUN becomes inaccessible for several minutes.

To set up this test:

1. Add a multipath configuration drop-in file for the LUN used for booting.
   See comment 13 for the details.

2. Configure vdsm to blacklist the same LUN. Add this file:

   $ cat /etc/vdsm/vdsm.conf.d/host.conf
   [multipath]
   blacklist = <wwid of boot lun>

   And restart vdsm.

3. Activate the host and start a couple of VMs.

4. Simulate a storage outage for 5 minutes. I think unmapping the LUN used for boot on the storage server should work.

Expected behavior:

- The host will function in degraded mode, as applications will get stuck on the root file system while storage is not available.

- When storage becomes available again, the host should recover to normal operation.

- The root file system should not become read-only.
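If changing the LUN mapping on the storage side is not convenient, a rough host-side alternative is to offline the SCSI path devices behind the boot multipath map and bring them back later (sdb/sdc below are placeholders; check multipath -ll for the real path names):

# Fail all paths of the boot LUN (placeholder path devices).
for dev in sdb sdc; do
    echo offline > /sys/block/$dev/device/state
done

# ...wait ~5 minutes; the host should keep running in degraded mode...

# Bring the paths back and let multipathd reinstate them.
for dev in sdb sdc; do
    echo running > /sys/block/$dev/device/state
done
multipath -ll    # all paths should return to active/ready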
(In reply to Nir Soffer from comment #16)
> Elad, the patch is merged in master. We now need to test the behavior of the
> system when booting from SAN and the boot LUN becomes inaccessible for
> several minutes.
> [...]

Why are we even investing in this? There are enough other loggers that might be trying to log to the root fs and will become stuck.
(In reply to Yaniv Kaul from comment #17)
> Why are we even investing in this? There are enough other loggers that might
> be trying to log to the root fs and will become stuck.

There is a need to survive a storage outage without a reboot. Current vdsm in master should not be affected by a blocked /var/log or by the host LUN if using the new multipath:blacklist option. We want to know if more work is needed to survive such an event with minimal damage.
Hi Nir, as the patch is merged in master, we'll test it when it moves to ON_QA.
(In reply to Elad from comment #19)
> Hi Nir, as the patch is merged in master, we'll test it when it moves to
> ON_QA.

That requires backporting the patch to 4.2. We would like to test this with master before we backport, if possible.
OK, we'll do our best to test it in the upcoming days.
Yosi, please take a look
Hi Nir, is this request still relevant?
(In reply to Elad from comment #23)
> Hi Nir, is this request still relevant?

Yes.
Yosi, the backport is ready, you can test it.
Hi Nir, I'm still testing this bug. It takes more time than I thought it would.
Yosi, what kind of info do you need?
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
Tal, the vdsm part is merged and available in 4.3. Since QE does not have the capacity to test this, I suggest closing as CURRENTRELEASE.