Description of problem:
RHV storage failures when using NFS v4

Version-Release number of selected component (if applicable):
Red Hat Virtualization Host 4.2.5
vdsm-4.20.35-1.el7ev.x86_64
nfs-utils-1.3.0-0.54.el7.x86_64

How reproducible:
Always in the customer environment; not able to reproduce in my environment.

Steps to Reproduce:
1. Edit /etc/nfsmount.conf (uncomment "Defaultvers=4").
2. Try to create a snapshot, or create a VM out of a template, using an NFS storage domain.

Actual results:
2018-08-13 13:35:11,634+0300 ERROR (tasks/1) [root] Job u'11c69282-2ed9-497c-b948-4a3e81db2b5e' failed (jobs:221)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/jobs.py", line 157, in run
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdm/api/copy_data.py", line 83, in _run
    self._operation.run()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/qemuimg.py", line 271, in run
    for data in self._operation.watch():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 104, in watch
    self._finalize(b"", err)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 178, in _finalize
    raise cmdutils.Error(self._cmd, rc, out, err)
Error: Command ['/usr/bin/taskset', '--cpu-list', '0-11', '/usr/bin/nice', '-n', '19', '/usr/bin/ionice', '-c', '3', '/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', u'/rhev/data-center/mnt/172.15.16.22:_root__vdm__6_rhv__pre__data/9f6bfaa5-c106-4921-b998-25619badb123/images/50a815b6-6756-48eb-a540-9533d9ac2e2a/309de5ea-ceaa-4f3b-b5cd-28275b018cce', '-O', 'raw', u'/rhev/data-center/mnt/172.15.16.22:_export/6bb380a7-f4ef-450f-9a25-0a9127db90a3/images/411b12475-47cc-4bf4-a18f-0620678b79a3/df8633b5-ed5c-45e7-8c72-f7b39dc00714'] failed with rc=1 out='' err=bytearray(b'qemu-img: error while writing sector 0: Resource temporarily unavailable\n')

Expected results:
The operation does not fail.

Additional info:
When using NFS version 3, there are no errors.

rpcinfo output for the NFS
server queried from the RHV host:

program    vers proto  port   service
102660     1    udp    65160
102660     1    tcp    65076
536870919  3    tcp    12345
536870919  1    tcp    12345
536870919  3    udp    12345
536870919  1    udp    12345
824395111  1    tcp    59767
824395111  1    udp    57352
100011     1    tcp    49968  rquotad
100011     1    udp    56735  rquotad
536870914  1    udp    4658
536870914  1    tcp    4658
100021     3    udp    62679  nlockmgr
100021     2    udp    62679  nlockmgr
100021     1    udp    62679  nlockmgr
100021     4    udp    62679  nlockmgr
100021     3    tcp    60103  nlockmgr
100021     2    tcp    60103  nlockmgr
100021     1    tcp    60103  nlockmgr
100021     4    tcp    60103  nlockmgr
100024     1    udp    63770  status
100024     1    tcp    63770  status
100003     2    udp    2049   nfs
100003     3    udp    2049   nfs
100003     3    tcp    2049   nfs
100003     2    tcp    2049   nfs
140391     1    udp    31491
100005     3    tcp    1234   mountd
100005     2    tcp    1234   mountd
100005     1    tcp    1234   mountd
100005     3    udp    1234   mountd
100005     2    udp    1234   mountd
100005     1    udp    1234   mountd
100003     4    tcp    2049   nfs
100000     4    tcp    111    portmapper
100000     4    udp    111    portmapper
100000     3    tcp    111    portmapper
100000     3    udp    111    portmapper
100000     2    udp    111    portmapper
100000     2    tcp    111    portmapper
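The reproduction steps above can be sketched as follows. This is a safe-to-run illustration: it works on a copy of /etc/nfsmount.conf rather than the real file, and the remount/verification step is only shown as a comment.

```shell
# Step 1 of the reproducer: uncomment Defaultvers=4. Done on a copy in /tmp
# so this sketch does not modify the host configuration.
cp /etc/nfsmount.conf /tmp/nfsmount.conf 2>/dev/null \
    || printf '# Defaultvers=4\n' > /tmp/nfsmount.conf
sed -i 's/^[# ]*Defaultvers=4/Defaultvers=4/' /tmp/nfsmount.conf
grep '^Defaultvers=4' /tmp/nfsmount.conf

# On the real host you would then remount the storage domain and confirm
# the negotiated version, e.g.:
#   mount | grep 'type nfs4'
```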
Freddy, can you have a look please?
Isn't it the problem with OFD? From RHEL 7.5, hypervisors use a new kind of lock for NFSv4 - OFD (open file description) locks. If the NFS protocol in use is 4 and we have a Windows NFS server, then qemu-kvm attempts to lock an already locked byte on the VM's disk. That is tracked by:

Bug 1589627 - Improve image locking error handling
https://bugzilla.redhat.com/show_bug.cgi?id=1589627

It is likely that the EMC storage (if it is some old CLARiiON or VNX) doesn't have POSIX lock splitting and merging semantics. RFC 5661, Section 18.10.4:

   When the client sends a LOCK operation that corresponds to a range
   that the lock-owner has locked already (with the same or different
   lock type), or to a sub-range of such a range, or to a byte-range
   that includes multiple locks already granted to that lock-owner, in
   whole or in part, and the server does not support such locking
   operations (i.e., does not support POSIX locking semantics), the
   server will return the error NFS4ERR_LOCK_RANGE. In that case, the
   client may return an error, or it may emulate the required
   operations, using only LOCK for ranges that do not include any bytes
   already locked by that lock-owner and LOCKU of locks held by that
   lock-owner (specifying an exactly matching range and type).
Marian, can you check comment #2? It may be an issue with the NFS box implementation. Thanks
(In reply to Olimp Bockowski from comment #2)
> isn't it the problem with OFD?
> From RHEL 7.5 hyperviors there is new kind of locks for NFSv4 - OFD. If the
> NFS protocol in use is 4 and we have Windows NFS server, then qemu-kvm
> attempts to lock an already locked byte on the VM's disk. That is tracked by
> bug:
>
> Bug 1589627 - Improve image locking error handling
> https://bugzilla.redhat.com/show_bug.cgi?id=1589627
>
> It is likely that EMC storage (if it is some old Clarion or VNX) doesn't
> have POSIX lock splitting and merging semantics. RFC 5661, Section 18.10.4
...

If this is the case, this bug should move to qemu. RHV has no control over the locks taken by qemu.
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
So on the hypervisor there is a problem reading the metadata file from the NFSv4-based storage domain.

strace -fvttTyyx -s 4096 -o /tmp/strace.txt
lsof -b +M -n -l

In general, process 12150 is asked for file descriptors 0, 1 and 2, and it hangs for ~6 minutes:

12765 11:03:19.307373 openat(AT_FDCWD, "/proc/12150/fd", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4</proc/12150/fd> <0.000015>
12765 11:03:19.307425 getdents(4</proc/12150/fd>, [{d_ino=126674, d_off=1, d_reclen=24, d_name=".", d_type=DT_DIR}, {d_ino=126673, d_off=2, d_reclen=24, d_name="..", d_type=DT_DIR}, {d_ino=126675, d_off=3, d_reclen=24, d_name="0", d_type=DT_LNK}, {d_ino=126676, d_off=4, d_reclen=24, d_name="1", d_type=DT_LNK}, {d_ino=126677, d_off=130, d_reclen=24, d_name="2", d_type=DT_LNK}], 32768) = 120 <0.000021>
12765 11:03:19.307492 readlink("/proc/12150/fd/0", "/rhev/data-center/mnt/172.29.4.1:_sd__sna__zc4__01__gl__template__02/26e8ef88-e23a-4753-81a5-7309b3cb2b15/dom_md/metadata", 4096) = 121 <0.000018>
12765 11:03:19.307543 lstat("/proc/12150/fd/0", {st_dev=makedev(0, 3), st_ino=126675, st_mode=S_IFLNK|0500, st_nlink=1, st_uid=36, st_gid=36, st_blksize=1024, st_blocks=0, st_size=64, st_atime=2019/03/21-11:03:19.307375483, st_mtime=2019/03/21-10:56:48.506369660, st_ctime=2019/03/21-10:56:48.506369660}) = 0 <0.000012>
12765 11:03:19.307606 stat("/proc/12150/fd/0", {st_dev=makedev(0, 45), st_ino=31536, st_mode=S_IFREG|0644, st_nlink=1, st_uid=36, st_gid=36, st_blksize=65536, st_blocks=8, st_size=366, st_atime=2019/03/21-11:08:52.912759000, st_mtime=2018/04/12-17:30:45.610240000, st_ctime=2018/04/12-17:30:45.638236000}) = 0 <390.728956>

--- ~6 minutes later:

12765 11:09:50.036825 open("/proc/12150/fdinfo/0", O_RDONLY) = -1 ENOENT (No such file or directory) <0.001793>
12765 11:09:50.038754 readlink("/proc/12150/fd/1", 0x7ffc72a01e20, 4096) = -1 ENOENT (No such file or directory) <0.000031>
12765 11:09:50.038865 open("/proc/12150/fdinfo/1", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000028>
12765 11:09:50.038964 readlink("/proc/12150/fd/2", 0x7ffc72a01e20, 4096) = -1 ENOENT (No such file or directory) <0.000069>
12765 11:09:50.039095 open("/proc/12150/fdinfo/2", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000019>
12765 11:09:50.039158 getdents(4</proc/12150/fd>, 0x19a6960, 32768) = -1 ENOENT (No such file or directory) <0.000011>
12765 11:09:50.039226 close(4</proc/12150/fd>) = 0 <0.000016>

So it hung on /rhev/data-center/mnt/172.29.4.1:_sd__sna__zc4__01__gl__template__02/26e8ef88-e23a-4753-81a5-7309b3cb2b15/dom_md/metadata

In messages we have many call traces, all the same:

Mar 21 10:57:23 cdvhost046 kernel: pool D ffffa17ec876bf40 0 10897 10860 0x00000084
Mar 21 10:57:23 cdvhost046 kernel: Call Trace:
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffc089375e>] ? nfs4_file_open+0x14e/0x2a0 [nfsv4]
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffab119e59>] schedule_preempt_disabled+0x29/0x70
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffab117c17>] __mutex_lock_slowpath+0xc7/0x1d0
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffab116fff>] mutex_lock+0x1f/0x2f
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffaacf7db6>] ima_file_check+0xa6/0x1b0
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffaac2f42a>] do_last+0x59a/0x12c0
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffaac30227>] path_openat+0xd7/0x640
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffaac31dbd>] do_filp_open+0x4d/0xb0
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffaac3f167>] ? __alloc_fd+0x47/0x170
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffaac1e0d7>] do_sys_open+0x137/0x240
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffab1256d5>] ? system_call_after_swapgs+0xa2/0x146
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffaac1e1fe>] SyS_open+0x1e/0x20
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffab12579b>] system_call_fastpath+0x22/0x27
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffab1256e1>] ?
system_call_after_swapgs+0xae/0x146
Mar 21 10:57:23 cdvhost046 kernel: INFO: task pool:11416 blocked for more than 120 seconds.
Mar 21 10:57:23 cdvhost046 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 21 10:57:23 cdvhost046 kernel: pool D ffffa1beeae62f70 0 11416 10860 0x00000084
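For context, the lsof hang above comes down to walking /proc/&lt;pid&gt;/fd. A minimal sketch of that walk (an illustration, not lsof's actual code):

```python
import os

def list_fds(pid):
    """Walk /proc/<pid>/fd the way lsof does: readlink each entry.
    In the trace above it was the follow-up stat() on fd 0 (the file on
    the stuck NFSv4 mount) that blocked for ~6 minutes, not the readlink.
    Illustrative sketch, not lsof code."""
    fd_dir = "/proc/%d/fd" % pid
    fds = {}
    for name in os.listdir(fd_dir):
        try:
            fds[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # fd closed between listdir() and readlink() -- the same race
            # behind the ENOENT results seen at 11:09:50 above
            pass
    return fds

if __name__ == "__main__":
    for fd, target in sorted(list_fds(os.getpid()).items()):
        print(fd, target)
```

Note that readlink() on /proc entries is cheap; it is stat()/open() on the link target that can block uninterruptibly when the target sits on an unresponsive NFS mount.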
I believe the call traces and the lsof output are not tightly related; the call traces appeared before the strace/lsof run. I see the call traces in messages every time we try to activate the hypervisor; it then goes non-operational and vdsm gives:

2019-01-23 13:49:25,871+0100 ERROR (monitor/26e8ef8) [storage.Monitor] Error checking domain 26e8ef88-e23a-4753-81a5-7309b3cb2b15
...
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 442, in _sendCommand
    raise Timeout(os.strerror(errno.ETIMEDOUT))
Timeout: Connection timed out

The more specific error is:

2019-01-23 13:46:36,439+0100 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/172.29.4.2:$CUSTOMER_STORAGE_DOMAIN/060adb2d-1c3c-4f0e-b13b-24d5ca8fac0c/dom_md/metadata (monitor:498)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 496, in _pathChecked
    delay = result.delay()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/check.py", line 391, in delay
    raise exception.MiscFileReadException(self.path, self.rc, self.err)
MiscFileReadException: Internal file read failure: (u'/rhev/data-center/mnt/172.29.4.2:#CUSTOMER_STORAGE_DOMAIN/060adb2d-1c3c-4f0e-b13b-24d5ca8fac0c/dom_md/metadata', 1, 'Read timeout')
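The failing check above is vdsm's storage monitor timing out while reading the domain metadata file. A rough sketch of that kind of liveness check (a simplification using dd; vdsm actually does this through its ioprocess helper, and the 10-second default here is an assumption):

```python
import subprocess

def check_path(path, timeout=10.0, direct=True):
    """Read the first 4 KiB of `path`, failing if the read does not
    complete within `timeout` seconds -- roughly what vdsm's monitor
    does before reporting 'Read timeout' on dom_md/metadata.
    Simplified sketch, not vdsm code."""
    cmd = ["dd", "if=" + path, "of=/dev/null", "bs=4096", "count=1"]
    if direct:
        # bypass the page cache so a dead NFS server cannot be masked
        # by a cached copy of the file
        cmd.append("iflag=direct")
    try:
        subprocess.run(cmd, timeout=timeout, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return False
```

On a healthy mount this returns True in milliseconds; on the stuck NFSv4 mount described above, the dd would block in the kernel and the timeout would fire, matching the 'Read timeout' in the log.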
Removing needinfo, as Olimp provided a reply in comment 10 and comment 11.
Can't reproduce the bug. When following the bug description, I see no error. How can I reproduce this bug?
Hi, did you use NFSv4 from an EMC VNX5800, as reported by Olimp and Andrea?
No, I don't have this device. How can I connect to one? Any idea?
(In reply to Marian Jankular from comment #0)
> Error: Command ['/usr/bin/taskset', '--cpu-list', '0-11', '/usr/bin/nice',
> '-n', '19', '/usr/bin/ionice', '-c', '3', '/usr/bin/qemu-img', 'convert',
> '-p', '-t', 'none', '-T', 'none', '-f', 'raw',
> u'/rhev/data-center/mnt/172.15.16.22:_root__vdm__6_rhv__pre__data/9f6bfaa5-
> c106-4921-b998-25619badb123/images/50a815b6-6756-48eb-a540-9533d9ac2e2a/
> 309de5ea-ceaa-4f3b-b5cd-28275b018cce', '-O', 'raw',
> u'/rhev/data-center/mnt/172.15.16.22:_export/6bb380a7-f4ef-450f-9a25-
> 0a9127db90a3/images/411b12475-47cc-4bf4-a18f-0620678b79a3/df8633b5-ed5c-45e7-
> 8c72-f7b39dc00714'] failed with rc=1 out='' err=bytearray(b'qemu-img: error
> while writing sector 0: Resource temporarily unavailable\n')

This is not the error message that qemu-img would print in cases where it can't lock the image. I think locking is not the problem here.

I'm not completely sure where this EAGAIN comes from. The block drivers involved in this conversion (raw and file-posix) don't produce this error code. Can we get strace output for the qemu-img invocation to check whether it is something the kernel returned?

Also, comment 10 is a completely different operation that seems to fail. Is this an indication that there is a more general problem with the host, or at least with this NFS mount?
@Kevin - we can't get anything; the customer abandoned it and switched to NFSv3 as a workaround.
sync2jira
(In reply to Olimp Bockowski from comment #19)
> @Kevin - we can't get anything, customer abandoned it and changed for NFSv3
> as a workaround

Marian, can you please create a KCS and mention this as a workaround? If we can't reproduce internally, we may need to close this BZ with insufficient data.
I suggest closing it as INSUFFICIENT DATA at this point. If this were something real, it would be impacting all customers: NFSv4 is very common today, and we would have had this reported by others as well. The bug has been open for 2.5 years; unless the issue is still happening in the customer environment, I suggest closing it.
Closing the bug: INSUFFICIENT DATA
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days