Bug 1370564

Summary: require ioprocess 0.15.2 in vdsm spec to fix NFS and POSIX storage domain not unmounted cleanly due to open ioprocesses
Product: Red Hat Enterprise Virtualization Manager
Reporter: Allie DeVolder <adevolder>
Component: vdsm
Assignee: Nir Soffer <nsoffer>
Status: CLOSED ERRATA
QA Contact: Kevin Alon Goldblatt <kgoldbla>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.5.7
CC: acanan, adevolder, amureini, bazulay, gklein, kgoldbla, lsurette, melewis, mkalinin, nsoffer, ratamir, srevivo, tnisan, ycui, ykaul, ylavi
Target Milestone: ovirt-3.6.10
Keywords: ZStream
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the ioprocess helper kept a file open on shared storage while it was running. As a result, the host could not mount the storage domain. A version of ioprocess that fixes this issue is now required, so ioprocess no longer keeps files open on shared storage and the mount succeeds.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-01-17 18:06:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1371634
Bug Blocks:

Description Allie DeVolder 2016-08-26 15:30:05 UTC
Description of problem:
After putting a hypervisor in maintenance mode, activating it again results in a "Non-operational" state because the POSIX storage domain cannot be mounted due to an open file handle.

Version-Release number of selected component (if applicable):
3.5 with RHEL 6.8 hosts, as well as RHEL 7.2 hosts. (We only have data from the 6.8 hosts)

How reproducible:
100% according to customer

Steps to Reproduce:
1. Place a host in maintenance mode
2. Attempt to activate host

Actual results:
Failure to activate due to unclean unmount

Expected results:
Clean activation

Additional info:

/tmp/1 is the lsof output from an active host and /tmp/2 is the lsof output from a host in maintenance mode.

[root@acorehost103 ~]# grep DIRECT /tmp/1
ioprocess 17188    vdsm    0r      REG               0,30        0      84992 /rhev/data-center/mnt/_dev_gcegpfs01/__DIRECT_IO_TEST__
ioprocess 17213    vdsm    0r      REG               0,29        0     164864 /rhev/data-center/mnt/_dev_gcegpfs02/__DIRECT_IO_TEST__
ioprocess 31749    vdsm    0r      REG               0,28        0   29159682 /rhev/data-center/mnt/gpfs16.rcf.bnl.gov:_gpfs02_admin_New__RHEV/__DIRECT_IO_TEST__ (gpfs16.rcf.bnl.gov:/gpfs02/admin/New_RHEV)
[root@acorehost103 ~]# grep DIRECT /tmp/2
ioprocess 17188    vdsm    0r      REG               0,30        0      84992 /__DIRECT_IO_TEST__
ioprocess 17213    vdsm    0r      REG               0,29        0     164864 /__DIRECT_IO_TEST__
ioprocess 31749    vdsm    0r      REG               0,28        0   29159682 __DIRECT_IO_TEST__
[root@acorehost103 ~]#

System recovers if vdsm is manually restarted on the hypervisor.
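
For reference, a quick way to check for these leaked handles directly on a host (a sketch, assuming lsof is installed; the file name comes from the customer output above):

# any ioprocess instance still holding __DIRECT_IO_TEST__ will block a clean unmount
lsof -c ioprocess | grep __DIRECT_IO_TEST__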

Comment 1 Fabian Deutsch 2016-08-26 17:49:08 UTC
Currently I don't see a Node impact, thus moving to storage.

Comment 2 Yaniv Kaul 2016-08-26 18:18:02 UTC
Does it happen with 3.6?

Comment 3 Allon Mureinik 2016-08-28 08:00:49 UTC
Nir - shouldn't this be covered by your work on ioprocess leakage?

Comment 4 Nir Soffer 2016-08-29 16:10:35 UTC
This is a duplicate of bug 1339777, fixed in ioprocess-0.16.1.

The fix was not backported to ioprocess-0.15, which is available in the rhev-3.6 channel,
but we can backport it if needed (trivial fix).

Comment 5 Nir Soffer 2016-08-29 17:10:07 UTC
Backport is here: https://gerrit.ovirt.org/62953

Comment 6 Nir Soffer 2016-08-30 17:08:55 UTC
This depends on ioprocess bug 1371634; we can require the ioprocess version that
fixes it once it is available.
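
For illustration only (the real change is the vdsm spec patch tracked by this bug, and the package names here are an assumption about how vdsm splits the dependency), the requirement would look roughly like this in vdsm.spec:

# illustrative requirement bump, not the literal patch hunk
Requires: ioprocess >= 0.15.2
Requires: python-ioprocess >= 0.15.2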

Comment 8 Aharon Canan 2016-09-05 11:21:38 UTC
This bug is for requiring a newer ioprocess package, right?

If so, can you please add the version we want to require?

Comment 9 Nir Soffer 2016-09-06 15:29:30 UTC
We need 0.15.2, see comment 7.

Comment 10 Allon Mureinik 2016-09-06 15:44:57 UTC
*** Bug 1373491 has been marked as a duplicate of this bug. ***

Comment 16 Nir Soffer 2016-09-14 14:15:17 UTC
How to verify:

I don't think we can reproduce the mount issue described in the description. We
can easily reproduce the open __DIRECT_IO_TEST__ file, but we already did that
when verifying bug 1371634.

So the only thing we want to verify here is that when installing vdsm on a system
running ioprocess < 0.15.2, ioprocess 0.15.2 is installed during the vdsm
upgrade/installation.

We can also check that after vdsm upgrade, none of the ioprocess instances has
an open file on shared storage (using lsof).
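
A rough outline of those checks in shell (a sketch; exact package names and mount paths may differ per setup):

# vdsm should now carry a versioned dependency on the fixed ioprocess
rpm -q --requires vdsm | grep -i ioprocess
# the installed ioprocess should be 0.15.2 or later after the vdsm upgrade
rpm -q ioprocess
# no ioprocess instance should hold open files on shared storage
lsof -c ioprocess | grep /rhev/data-center/mnt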

Comment 19 errata-xmlrpc 2017-01-17 18:06:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0109.html