Bug 1624734 - RHV storage failures when using NFS v4
Summary: RHV storage failures when using NFS v4
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.2.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Tal Nisan
QA Contact: Avihai
URL:
Whiteboard:
Depends On: 1551486 1694148
Blocks: 1541529
 
Reported: 2018-09-03 07:38 UTC by Marian Jankular
Modified: 2023-09-15 00:11 UTC
CC List: 15 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-21 20:44:22 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Links:
Red Hat Knowledge Base (Solution) 5070191 - last updated 2020-05-13 14:52:20 UTC

Description Marian Jankular 2018-09-03 07:38:12 UTC
Description of problem:
RHV storage failures when using NFS v4

Version-Release number of selected component (if applicable):
Red Hat Virtualization Host 4.2.5
vdsm-4.20.35-1.el7ev.x86_64
nfs-utils-1.3.0-0.54.el7.x86_64

How reproducible:
Always in the customer environment; not reproducible in my environment.

Steps to Reproduce:
1. Edit /etc/nfsmount.conf and uncomment "Defaultvers=4" (see the snippet below).
2. Try to create a snapshot, or create a VM from a template, on an NFS storage domain.
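
For reference, step 1 amounts to uncommenting the default NFS version in /etc/nfsmount.conf so that mounts without an explicit version use v4 (illustrative snippet; the section header may vary between nfs-utils versions):

[ NFSMount_Global_Options ]
# Mounts that do not request a specific version will use NFSv4
Defaultvers=4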

Actual results:
2018-08-13 13:35:11,634+0300 ERROR (tasks/1) [root] Job u'11c69282-2ed9-497c-b948-4a3e81db2b5e' failed (jobs:221)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/jobs.py", line 157, in run
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sdm/api/copy_data.py", line 83, in _run
    self._operation.run()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/qemuimg.py", line 271, in run
    for data in self._operation.watch():
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 104, in watch
    self._finalize(b"", err)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/operation.py", line 178, in _finalize
    raise cmdutils.Error(self._cmd, rc, out, err)
Error: Command ['/usr/bin/taskset', '--cpu-list', '0-11', '/usr/bin/nice', '-n', '19', '/usr/bin/ionice', '-c', '3', '/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 'raw', u'/rhev/data-center/mnt/172.15.16.22:_root__vdm__6_rhv__pre__data/9f6bfaa5-c106-4921-b998-25619badb123/images/50a815b6-6756-48eb-a540-9533d9ac2e2a/309de5ea-ceaa-4f3b-b5cd-28275b018cce', '-O', 'raw', u'/rhev/data-center/mnt/172.15.16.22:_export/6bb380a7-f4ef-450f-9a25-0a9127db90a3/images/411b12475-47cc-4bf4-a18f-0620678b79a3/df8633b5-ed5c-45e7-8c72-f7b39dc00714'] failed with rc=1 out='' err=bytearray(b'qemu-img: error while writing sector 0: Resource temporarily unavailable\n')

Expected results:
The operation does not fail.

Additional info:
When using NFS version 3, there are no errors.

rpcinfo output for the NFS server, queried from the RHV host:

program vers proto   port  service
    102660    1   udp  65160
    102660    1   tcp  65076
 536870919    3   tcp  12345
 536870919    1   tcp  12345
 536870919    3   udp  12345
 536870919    1   udp  12345
 824395111    1   tcp  59767
 824395111    1   udp  57352
    100011    1   tcp  49968  rquotad
    100011    1   udp  56735  rquotad
 536870914    1   udp   4658
 536870914    1   tcp   4658
    100021    3   udp  62679  nlockmgr
    100021    2   udp  62679  nlockmgr
    100021    1   udp  62679  nlockmgr
    100021    4   udp  62679  nlockmgr
    100021    3   tcp  60103  nlockmgr
    100021    2   tcp  60103  nlockmgr
    100021    1   tcp  60103  nlockmgr
    100021    4   tcp  60103  nlockmgr
    100024    1   udp  63770  status
    100024    1   tcp  63770  status
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    2   tcp   2049  nfs
    140391    1   udp  31491
    100005    3   tcp   1234  mountd
    100005    2   tcp   1234  mountd
    100005    1   tcp   1234  mountd
    100005    3   udp   1234  mountd
    100005    2   udp   1234  mountd
    100005    1   udp   1234  mountd
    100003    4   tcp   2049  nfs
    100000    4   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100000    2   tcp    111  portmapper

Comment 1 Tal Nisan 2018-09-04 07:03:50 UTC
Freddy, can you have a look please?

Comment 2 Olimp Bockowski 2018-09-05 08:21:57 UTC
Isn't this a problem with OFD locks?
Starting with RHEL 7.5 hypervisors, a new kind of lock is used for NFSv4 - OFD (open file description) locks. If the NFS protocol in use is v4 and the server is a Windows NFS server, then qemu-kvm attempts to lock an already-locked byte on the VM's disk. That is tracked by:

Bug 1589627 - Improve image locking error handling
https://bugzilla.redhat.com/show_bug.cgi?id=1589627

It is likely that the EMC storage (if it is an old CLARiiON or VNX) does not implement POSIX lock splitting and merging semantics. From RFC 5661, Section 18.10.4 (a small illustration follows the quoted text):

When the client sends a LOCK operation that corresponds to a range
that the lock-owner has locked already (with the same or different
lock type), or to a sub-range of such a range, or to a byte-range
that includes multiple locks already granted to that lock-owner, in
whole or in part, and the server does not support such locking
operations (i.e., does not support POSIX locking semantics), the
server will return the error NFS4ERR_LOCK_RANGE.  In that case, the
client may return an error, or it may emulate the required
operations, using only LOCK for ranges that do not include any bytes
already locked by that lock-owner and LOCKU of locks held by that
lock-owner (specifying an exactly matching range and type).
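
To illustrate the byte-range semantics quoted above (a generic sketch, not what qemu-kvm itself does internally; the path is hypothetical): with POSIX semantics a lock owner can re-lock or downgrade a sub-range of a range it already holds, because the kernel splits and merges the ranges. Against a server that lacks those semantics, the second lock call below is expected to fail (NFS4ERR_LOCK_RANGE on the wire), which is one way to probe the storage box directly from the hypervisor:

import fcntl
import os

# Hypothetical scratch file; point it at the NFSv4 storage domain mount.
path = "/rhev/data-center/mnt/<server>:<export>/locktest"

fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
os.ftruncate(fd, 4096)
try:
    # Exclusive lock on bytes 0-1023.
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1024, 0)
    # Downgrade a sub-range (bytes 256-511) to a shared lock. POSIX
    # splitting/merging makes this succeed; a server without those
    # semantics is expected to reject it, raising IOError/OSError here.
    fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB, 256, 256)
    print("overlapping byte-range lock accepted")
except (IOError, OSError) as e:
    print("byte-range lock rejected: %s" % e)
finally:
    fcntl.lockf(fd, fcntl.LOCK_UN, 0, 0)
    os.close(fd)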

Comment 4 Fred Rolland 2018-10-03 11:54:46 UTC
Marian,
Can you check comment #2?
It may be an issue with the NFS box implementation.

Thanks

Comment 6 Nir Soffer 2018-11-18 17:48:49 UTC
(In reply to Olimp Bockowski from comment #2)
> isn't it the problem with OFD? 
> From RHEL 7.5 hypervisors there is a new kind of lock for NFSv4 - OFD. If the
> NFS protocol in use is 4 and we have Windows NFS server, then qemu-kvm
> attempts to lock an already locked byte on the VM's disk. That is tracked by
> bug:
> 
> Bug 1589627 - Improve image locking error handling 
> https://bugzilla.redhat.com/show_bug.cgi?id=1589627
> 
> It is likely that EMC storage (if it is some old Clarion or VNX) doesn't
> have POSIX lock splitting and merging semantics. RFC 5661, Section 18.10.4
...
If this is the case, this bug should move to qemu. RHV has no control over the locks
taken by qemu.

Comment 7 Sandro Bonazzola 2019-01-28 09:40:31 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 10 Olimp Bockowski 2019-03-25 09:50:18 UTC
So on the hypervisor there is a problem reading the metadata file from an NFSv4-based storage domain.

strace -fvttTyyx -s 4096 -o /tmp/strace.txt lsof -b +M -n -l

In general, lsof queries process 12150 for file descriptors 0, 1 and 2; the stat() on fd 0 blocks for roughly 6 minutes (390 s in the timing below) and the remaining calls then fail with ENOENT.

12765 11:03:19.307373 openat(AT_FDCWD, "/proc/12150/fd", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4</proc/12150/fd> <0.000015>
12765 11:03:19.307425 getdents(4</proc/12150/fd>, [{d_ino=126674, d_off=1, d_reclen=24, d_name=".", d_type=DT_DIR}, {d_ino=126673, d_off=2, d_reclen=24, d_name="..", d_type=DT_DIR}, {d_ino=126675, d_off=3, d_reclen=24, d_name="0", d_type=DT_LNK}, {d_ino=126676, d_off=4, d_reclen=24, d_name="1", d_type=DT_LNK}, {d_ino=126677, d_off=130, d_reclen=24, d_name="2", d_type=DT_LNK}], 32768) = 120 <0.000021>
12765 11:03:19.307492 readlink("/proc/12150/fd/0", "/rhev/data-center/mnt/172.29.4.1:_sd__sna__zc4__01__gl__template__02/26e8ef88-e23a-4753-81a5-7309b3cb2b15/dom_md/metadata", 4096) = 121 <0.000018>
12765 11:03:19.307543 lstat("/proc/12150/fd/0", {st_dev=makedev(0, 3), st_ino=126675, st_mode=S_IFLNK|0500, st_nlink=1, st_uid=36, st_gid=36, st_blksize=1024, st_blocks=0, st_size=64, st_atime=2019/03/21-11:03:19.307375483, st_mtime=2019/03/21-10:56:48.506369660, st_ctime=2019/03/21-10:56:48.506369660}) = 0 <0.000012>
12765 11:03:19.307606 stat("/proc/12150/fd/0", {st_dev=makedev(0, 45), st_ino=31536, st_mode=S_IFREG|0644, st_nlink=1, st_uid=36, st_gid=36, st_blksize=65536, st_blocks=8, st_size=366, st_atime=2019/03/21-11:08:52.912759000, st_mtime=2018/04/12-17:30:45.610240000, st_ctime=2018/04/12-17:30:45.638236000}) = 0 <390.728956>

--- 
~ 6 minutes later

12765 11:09:50.036825 open("/proc/12150/fdinfo/0", O_RDONLY) = -1 ENOENT (No such file or directory) <0.001793>
12765 11:09:50.038754 readlink("/proc/12150/fd/1", 0x7ffc72a01e20, 4096) = -1 ENOENT (No such file or directory) <0.000031>
12765 11:09:50.038865 open("/proc/12150/fdinfo/1", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000028>
12765 11:09:50.038964 readlink("/proc/12150/fd/2", 0x7ffc72a01e20, 4096) = -1 ENOENT (No such file or directory) <0.000069>
12765 11:09:50.039095 open("/proc/12150/fdinfo/2", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000019>
12765 11:09:50.039158 getdents(4</proc/12150/fd>, 0x19a6960, 32768) = -1 ENOENT (No such file or directory) <0.000011>
12765 11:09:50.039226 close(4</proc/12150/fd>) = 0 <0.000016>

So it hangs on /rhev/data-center/mnt/172.29.4.1:_sd__sna__zc4__01__gl__template__02/26e8ef88-e23a-4753-81a5-7309b3cb2b15/dom_md/metadata.

In /var/log/messages we have many call traces, all the same:

Mar 21 10:57:23 cdvhost046 kernel: pool            D ffffa17ec876bf40     0 10897  10860 0x00000084
Mar 21 10:57:23 cdvhost046 kernel: Call Trace:
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffc089375e>] ? nfs4_file_open+0x14e/0x2a0 [nfsv4]
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffab119e59>] schedule_preempt_disabled+0x29/0x70
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffab117c17>] __mutex_lock_slowpath+0xc7/0x1d0
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffab116fff>] mutex_lock+0x1f/0x2f
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffaacf7db6>] ima_file_check+0xa6/0x1b0
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffaac2f42a>] do_last+0x59a/0x12c0
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffaac30227>] path_openat+0xd7/0x640
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffaac31dbd>] do_filp_open+0x4d/0xb0
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffaac3f167>] ? __alloc_fd+0x47/0x170
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffaac1e0d7>] do_sys_open+0x137/0x240
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffab1256d5>] ? system_call_after_swapgs+0xa2/0x146
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffaac1e1fe>] SyS_open+0x1e/0x20
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffab12579b>] system_call_fastpath+0x22/0x27
Mar 21 10:57:23 cdvhost046 kernel: [<ffffffffab1256e1>] ? system_call_after_swapgs+0xae/0x146
Mar 21 10:57:23 cdvhost046 kernel: INFO: task pool:11416 blocked for more than 120 seconds.
Mar 21 10:57:23 cdvhost046 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 21 10:57:23 cdvhost046 kernel: pool            D ffffa1beeae62f70     0 11416  10860 0x00000084

Comment 11 Olimp Bockowski 2019-03-25 10:26:15 UTC
I believe the call traces and the lsof hang are not tightly related; the call traces appeared before the strace (lsof) run.
I see the call traces in /var/log/messages every time we try to activate the hypervisor; it then goes Non Operational and vdsm reports:


2019-01-23 13:49:25,871+0100 ERROR (monitor/26e8ef8) [storage.Monitor] Error checking domain 26e8ef88-e23a-4753-81a5-7309b3cb2b15
...
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 442, in _sendCommand
    raise Timeout(os.strerror(errno.ETIMEDOUT))
Timeout: Connection timed out

The more specific error is below (a standalone check sketch follows the traceback):

2019-01-23 13:46:36,439+0100 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/172.29.4.2:$CUSTOMER_STORAGE_DOMAIN/060adb2d-1c3c-4f0e-b13b-24d5ca8fac0c/dom_md/metadata (monitor:498)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 496, in _pathChecked
    delay = result.delay()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/check.py", line 391, in delay
    raise exception.MiscFileReadException(self.path, self.rc, self.err)
MiscFileReadException: Internal file read failure: (u'/rhev/data-center/mnt/172.29.4.2:#CUSTOMER_STORAGE_DOMAIN/060adb2d-1c3c-4f0e-b13b-24d5ca8fac0c/dom_md/metadata', 1, 'Read timeout')
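
To check this symptom independently of vdsm, a bounded direct read of the domain metadata can be run by hand (a minimal sketch, not vdsm's actual monitor code; substitute the metadata path of the affected storage domain):

import subprocess

# Placeholder path - use the dom_md/metadata file of the affected domain.
path = "/rhev/data-center/mnt/<server>:<export>/<domain-uuid>/dom_md/metadata"

# One 4 KiB O_DIRECT read, capped at 10 seconds so a hung NFSv4 mount
# cannot block the caller indefinitely (similar in spirit to the path
# check that raised the Read timeout above).
rc = subprocess.call(["timeout", "10", "dd", "if=" + path, "of=/dev/null",
                      "bs=4096", "count=1", "iflag=direct"])
print("metadata read OK" if rc == 0 else "metadata read failed or timed out (rc=%d)" % rc)

If the dd itself hangs or times out here, the problem sits between the kernel NFS client and the server rather than in vdsm.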

Comment 13 Marian Jankular 2019-04-24 13:32:54 UTC
Removing needinfo as Olimp provided a reply in comment 10 and comment 11.

Comment 14 Ahmad Khiet 2019-07-09 13:41:48 UTC
I can't reproduce the bug.
When following the bug description, I see no error.
How can this bug be reproduced?

Comment 15 Marian Jankular 2019-07-09 14:04:28 UTC
Hi,

Did you use NFSv4 from an EMC VNX5800, as reported by Olimp and Andrea?

Comment 16 Ahmad Khiet 2019-07-10 14:25:22 UTC
No, I don't have this device.

How can I connect to one? Any idea?

Comment 17 Kevin Wolf 2019-07-17 16:09:02 UTC
(In reply to Marian Jankular from comment #0)
> Error: Command ['/usr/bin/taskset', '--cpu-list', '0-11', '/usr/bin/nice',
> '-n', '19', '/usr/bin/ionice', '-c', '3', '/usr/bin/qemu-img', 'convert',
> '-p', '-t', 'none', '-T', 'none', '-f', 'raw',
> u'/rhev/data-center/mnt/172.15.16.22:_root__vdm__6_rhv__pre__data/9f6bfaa5-
> c106-4921-b998-25619badb123/images/50a815b6-6756-48eb-a540-9533d9ac2e2a/
> 309de5ea-ceaa-4f3b-b5cd-28275b018cce', '-O', 'raw',
> u'/rhev/data-center/mnt/172.15.16.22:_export/6bb380a7-f4ef-450f-9a25-
> 0a9127db90a3/images/411b12475-47cc-4bf4-a18f-0620678b79a3/df8633b5-ed5c-45e7-
> 8c72-f7b39dc00714'] failed with rc=1 out='' err=bytearray(b'qemu-img: error
> while writing sector 0: Resource temporarily unavailable\n')

This is not the error message that qemu-img would print in cases where it can't lock the image. I think locking is not the problem here.

I'm not completely sure where this EAGAIN comes from. The block drivers involved in this conversion (raw and file-posix) don't produce this error code. Can we get an strace output for the qemu-img invocation to check whether it is something the kernel returned?
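
For reference, a capture along these lines would show which syscall returns EAGAIN (illustrative invocation; SRC and DST stand for the source and destination paths from the failing command in the description):

strace -f -tt -T -o /tmp/qemu-img-convert.strace \
    /usr/bin/qemu-img convert -p -t none -T none -f raw SRC -O raw DST

With -T the time spent in each syscall is recorded as well, so the write that fails with EAGAIN shows up together with the file descriptor it was issued against.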

Also, comment 10 shows a completely different operation failing. Is this an indication that there is a more general problem with the host, or at least with this NFS mount?

Comment 19 Olimp Bockowski 2019-08-22 12:47:54 UTC
@Kevin - we can't get anything; the customer abandoned it and switched to NFSv3 as a workaround.

Comment 20 Daniel Gur 2019-08-28 13:11:52 UTC
sync2jira

Comment 21 Daniel Gur 2019-08-28 13:16:05 UTC
sync2jira

Comment 22 Marina Kalinin 2019-11-20 22:09:40 UTC
(In reply to Olimp Bockowski from comment #19)
> @Kevin - we can't get anything, customer abandoned it and changed for NFSv3
> as a workaround

Marian, can you please create a KCS and mention this as a workaround?

If we can't reproduce internally, we may need to close this BZ with insufficient data.

Comment 26 Marina Kalinin 2021-02-19 19:47:05 UTC
I suggest closing this as INSUFFICIENT DATA at this point.
If this were a general problem, it would be impacting all customers: NFSv4 is very common today and we would have had this reported by others as well. The bug has been open for about 2.5 years; unless the issue is still happening in the customer environment, I suggest closing it.

Comment 27 Ahmad Khiet 2021-02-21 20:44:22 UTC
Closing the bug: INSUFFICIENT DATA.

Comment 29 Red Hat Bugzilla 2023-09-15 00:11:56 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

