Description of problem:
After I put a hypervisor in maintenance mode, activating the hypervisor again results in a "Non-operational" state because the POSIX storage cannot be mounted due to an open file handle.

Version-Release number of selected component (if applicable):
3.5 with RHEL 6.8 hosts, as well as RHEL 7.2 hosts. (We only have data from the 6.8 hosts.)

How reproducible:
100% according to the customer

Steps to Reproduce:
1. Place a host in maintenance mode
2. Attempt to activate the host

Actual results:
Failure to activate due to an unclean unmount

Expected results:
Clean activation

Additional info:
/tmp/1 is the lsof output from an active host and /tmp/2 is the lsof output from a host in maintenance mode.

[root@acorehost103 ~]# grep DIRECT /tmp/1
ioprocess 17188 vdsm 0r REG 0,30 0 84992 /rhev/data-center/mnt/_dev_gcegpfs01/__DIRECT_IO_TEST__
ioprocess 17213 vdsm 0r REG 0,29 0 164864 /rhev/data-center/mnt/_dev_gcegpfs02/__DIRECT_IO_TEST__
ioprocess 31749 vdsm 0r REG 0,28 0 29159682 /rhev/data-center/mnt/gpfs16.rcf.bnl.gov:_gpfs02_admin_New__RHEV/__DIRECT_IO_TEST__ (gpfs16.rcf.bnl.gov:/gpfs02/admin/New_RHEV)

[root@acorehost103 ~]# grep DIRECT /tmp/2
ioprocess 17188 vdsm 0r REG 0,30 0 84992 /__DIRECT_IO_TEST__
ioprocess 17213 vdsm 0r REG 0,29 0 164864 /__DIRECT_IO_TEST__
ioprocess 31749 vdsm 0r REG 0,28 0 29159682 __DIRECT_IO_TEST__

The system recovers if vdsm is manually restarted on the hypervisor.
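For anyone hitting this, a minimal diagnostic/workaround sketch (the /tmp/1 and /tmp/2 filenames are just where the output was saved; the vdsm service name is assumed to be vdsmd):

# Capture lsof output while the host is active and again after maintenance mode
lsof > /tmp/1     # host active
lsof > /tmp/2     # host in maintenance mode

# ioprocess entries still holding __DIRECT_IO_TEST__ after the storage domains
# were unmounted point to the leaked handles that block re-activation
grep DIRECT /tmp/2

# Workaround until the ioprocess fix is available: restart vdsm on the hypervisor
service vdsmd restart     # or: systemctl restart vdsmd on RHEL 7 hosts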
Currently I don't see a Node impact, so I'm moving this to Storage.
Does it happen with 3.6?
Nir - shouldn't this be covered by your work on ioprocess leakage?
This is a duplicate of bug 1339777, fixed in ioprocess-0.16.1. The fix was not backported to ioprocess-0.15, available in the rhev-3.6 channel, but we can backport it if needed (trivial fix).
Backport is here: https://gerrit.ovirt.org/62953
This depends on ioprocess bug 1371634; we can require the ioprocess version that fixes it once it is available.
This bug is for requiring a newer ioprocess package, right? If so, can you please add the version we want to require?
We need 0.15.2, see comment 7.
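As a quick sanity check (a sketch, not taken from the actual spec change), once vdsm carries the new requirement it can be confirmed on a host with:

# Show which ioprocess version the installed vdsm package requires
rpm -q --requires vdsm | grep -i ioprocess

# Show the installed ioprocess version; it should be 0.15.2 or newer
rpm -q ioprocess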
*** Bug 1373491 has been marked as a duplicate of this bug. ***
How to verify: I don't think we can reproduce the mount issue described in the description. We can easily reproduce the open __DIRECT_IO_TEST__ file, but we already did that when verifying bug 1371634. So the only thing we want to verify here is that when installing vdsm on a system running ioprocess < 0.15.2, ioprocess 0.15.2 is installed during the vdsm upgrade/installation. We can also check that after the vdsm upgrade, none of the ioprocess instances has an open file on shared storage (using lsof). A rough sketch of that flow is below.
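Sketch of the verification flow on a test host (assuming the /rhev/data-center mount root shown in the description):

# 1. Before the upgrade, confirm the old ioprocess is installed
rpm -q ioprocess              # expect a version < 0.15.2

# 2. Upgrade/install vdsm from the new channel
yum update vdsm

# 3. The dependency should have pulled in the fixed ioprocess
rpm -q ioprocess              # expect >= 0.15.2

# 4. No ioprocess instance should keep a file open on shared storage
lsof -c ioprocess | grep /rhev/data-center || echo "no open files on shared storage"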
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2017-0109.html