Description of problem:
-----------------------
After updating from RHV 4.3.7 to RHV 4.3.8, the HC host goes non-operational.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHV 4.3.7 (w/ RHGS 3.4.4 - glusterfs-3.12.2-47.5)

How reproducible:
-----------------
1/1

Steps to Reproduce:
-------------------
1. Deploy RHHI-V: RHVH 4.3.7 + hosted engine setup with 3 HC hosts
2. Create 16 VMs running the kernel untar workload
3. Update the engine to RHV 4.3.8
4. Update the RHVH node from the RHV Manager UI

Actual results:
---------------
VMs running on that RHVH host are migrated to other hosts, the host is moved to maintenance, and post reboot the host goes non-operational.

Expected results:
-----------------
The host should be in the activated state post reboot.

Additional info:
Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHV 4.3.7 (w/ RHGS 3.4.4 - glusterfs-3.12.2-47.5)
RHV 4.3.8 (w/ RHGS 3.5 - glusterfs-6.0-24.el7rhgs)
Suspicious statements in the vdsm logs suggest something went wrong in the 4K native block size detection:

<snip>
2019-12-05 20:42:48,670+0530 ERROR (jsonrpc/3) [storage.TaskManager.Task] (Task='a02a9ba0-4fb8-434b-be0a-a6b22ab55de6') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in getStorageDomainInfo
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2753, in getStorageDomainInfo
    dom = self.validateSdUUID(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 305, in validateSdUUID
    sdDom = sdCache.produce(sdUUID=sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
    domain.getRealDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/glusterSD.py", line 62, in findDomain
    return GlusterStorageDomain(GlusterStorageDomain.findDomainPath(sdUUID))
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 378, in __init__
    manifest.sdUUID, manifest.mountpoint)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line 853, in _detect_block_size
    block_size = iop.probe_block_size(mountpoint)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/outOfProcess.py", line 384, in probe_block_size
    return self._ioproc.probe_block_size(dir_path)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 602, in probe_block_size
    "probe_block_size", {"dir": dir_path}, self.timeout)
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 448, in _sendCommand
    raise OSError(errcode, errstr)
OSError: [Errno 2] No such file or directory
2019-12-05 20:42:48,670+0530 INFO (jsonrpc/3) [storage.TaskManager.Task] (Task='a02a9ba0-4fb8-434b-be0a-a6b22ab55de6') aborting: Task is aborted: u'[Errno 2] No such file or directory' - code 100 (task:1181)
2019-12-05 20:42:48,670+0530 ERROR (jsonrpc/3) [storage.Dispatcher] FINISH getStorageDomainInfo error=[Errno 2] No such file or directory (dispatcher:87)
</snip>
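For context, the failing call is vdsm's 4K support probing the storage domain's block size through ioprocess: conceptually, the probe opens a scratch file on the mountpoint with O_DIRECT and tries increasing write sizes, and an OSError like the one above surfaces if the mountpoint or probe file cannot be opened. A minimal Python sketch of the idea follows; it is illustrative only, not vdsm's or ioprocess's actual code, and the helper name and probe file name are hypothetical:

<snip>
import errno
import mmap
import os


def probe_block_size(mountpoint):
    """Return the smallest block size usable for direct I/O on mountpoint.

    Illustrative sketch: if the mountpoint is missing or unreachable
    (e.g. the gluster mount is not up), os.open() raises OSError with
    ENOENT -- the same error class seen in the traceback above.
    """
    path = os.path.join(mountpoint, ".probe-block-size")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o600)
    try:
        for block_size in (512, 4096):
            # O_DIRECT needs an aligned buffer; anonymous mmap is page-aligned.
            buf = mmap.mmap(-1, block_size)
            try:
                os.write(fd, buf)
                return block_size  # first size the filesystem accepted
            except OSError as e:
                if e.errno != errno.EINVAL:  # EINVAL: size too small, try next
                    raise
        raise OSError(errno.EINVAL, "no supported block size found")
    finally:
        os.close(fd)
        os.unlink(path)
</snip>

On a 512-byte sector backing store the first write succeeds; on 4K native storage the 512-byte write fails with EINVAL and the 4096-byte write succeeds.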
Vojtech, could you check?
the issue is tracked under BZ #1780290
(In reply to Vojtech Juranek from comment #4)
> the issue is tracked under BZ #1780290

Thanks Vojta. As per the RHHI-V process, we have an additional bug created to track this issue from the RHHI-V product perspective.
The issue is also seen with a fresh installation of RHVH 4.3.8 (RHHI-V 1.7 with RHGS 3.5.1). With the latest upstream version of ioprocess, the issue is fixed.
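As a spot check on an updated host, the same ioprocess call that failed in the traceback can be invoked directly against the mounted gluster storage domain. A minimal sketch, assuming the IOProcess class from the ioprocess Python bindings; the timeout value and the mountpoint path are illustrative, not taken from this setup:

<snip>
from ioprocess import IOProcess

# Hypothetical mountpoint of a gluster storage domain; adjust to the host.
MOUNTPOINT = "/rhev/data-center/mnt/glusterSD/server:_vmstore"

iop = IOProcess(timeout=60)
try:
    # The same call that raised ENOENT in the traceback above; on a fixed
    # ioprocess build this is expected to return 512 or 4096.
    print("detected block size: %s" % iop.probe_block_size(MOUNTPOINT))
finally:
    iop.close()
</snip>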
Verified the upgrade from RHV 4.3.7 to RHV 4.3.8. Steps involved:

1. RHHI-V deployment (self-hosted engine deployment) with 3 nodes.
2. 10 RHEL 7.7 VMs are created, all continuously running I/O with the kernel untar workload.
   Note: the kernel untar workload downloads a kernel tarball, untars it, and computes the sha256sum of all the extracted files (see the sketch after this list).
3. Enabled the local repo that contains the RHVH 4.3.8 redhat-virtualization-host-image-update package.
4. Enabled global maintenance for the Hosted Engine VM.
5. Updated RHV Manager 4.3.7 to RHV Manager 4.3.8, updated all software packages, and rebooted.
6. Started the HE VM and moved it out of global maintenance.
7. Once the RHV Manager UI was up, logged in and upgraded the hosts one after the other from the UI.
8. Once all the hosts were upgraded, edited the cluster 'Default', went to 'General' -> 'Compatibility Version', and updated it from '4.2' to '4.3'. Existing VMs need to be powered off and restarted after updating the compatibility version.

Known issues:
1. Sometimes, in the 2-network configuration, the gluster network never came up, and one had to set 'BOOTPROTO=dhcp' and bring the network up.
2. Because of 1, gluster bricks do not come up, leading to pending self-heals. Once the network is brought up, healing starts.
3. After all healing is completed, it takes some time (not more than 5 minutes) for the heal status to be reflected in the RHV Manager UI.
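For reference, a minimal Python sketch of the kernel untar workload described in step 2; the tarball URL and working directory are illustrative, not the exact scripts used in this test:

<snip>
import hashlib
import os
import tarfile
import urllib.request

# Illustrative values; any large source tarball exercises the same I/O pattern.
URL = "https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.4.tar.gz"
WORKDIR = "/var/tmp/kernel-untar"

os.makedirs(WORKDIR, exist_ok=True)
tarball = os.path.join(WORKDIR, "kernel.tar.gz")
urllib.request.urlretrieve(URL, tarball)      # download the kernel tarball

with tarfile.open(tarball) as tar:            # untar it
    tar.extractall(WORKDIR)

# Compute the sha256sum of every extracted file.
for root, _dirs, files in os.walk(WORKDIR):
    for name in files:
        path = os.path.join(root, name)
        with open(path, "rb") as f:
            print(hashlib.sha256(f.read()).hexdigest(), path)
</snip>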
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0508