Bug 1609701 - libvirt/qemu doesn't report about guest i/o error while blocking nfs/gluster storage
Summary: libvirt/qemu doesn't report about guest i/o error while blocking nfs/gluster...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.5
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: Han Han
URL:
Whiteboard:
Depends On:
Blocks: 1481022
 
Reported: 2018-07-30 08:14 UTC by Polina
Modified: 2018-09-10 08:41 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-10 08:35:19 UTC
Target Upstream Version:
Embargoed:


Attachments
qemu, libvirt, vdsm, engine logs (1.86 MB, application/x-gzip) - 2018-07-30 08:14 UTC, Polina
engine_qemu_libvirt_vdsm_nfs_vm.tar.gz (2.72 MB, application/x-gzip) - 2018-08-02 08:28 UTC, Polina
attached logs for gluster vm (1006.20 KB, application/x-gzip) - 2018-08-02 08:29 UTC, Polina

Description Polina 2018-07-30 08:14:34 UTC
Created attachment 1471443 [details]
qemu, libvirt, vdsm, engine logs

Description of problem: while blocking the connection from the host to the storage domain, qemu/libvirt is expected to report an i/o error, so that vdsm can then report the pause state due to the storage i/o error.

Version-Release number of selected component (if applicable):
rhv-release-4.2.5-5-001.noarch
libvirt-3.9.0-14.el7_5.6.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.4.x86_64

How reproducible:100%

Steps to Reproduce:
1. Run VM associated with nfs disk (similar problem with gluster). 
2. block connection from the host to the SD, like
   iptables -I INPUT -s yellow-vdsb.qa.lab.tlv.redhat.com -j DROP

Actual results: no reporting about i/o error 

Expected results: i/o error reported.

Additional info: attached logs: qemu, libvirt, vdsm, engine

Comment 2 Han Han 2018-07-31 09:28:06 UTC
Test on libvirt-3.9.0-14.el7_5.6.x86_64 qemu-kvm-rhev-2.10.0-21.el7_5.4.x86_64

1. Set up a glusterfs server and mount it on /mnt
2. According to the engine.log, prepare a running VM using the following disk XML:
...
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='threads' iothread='1'/>
      <source file='/mnt/A.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0d' function='0x0'/>
    </disk>
..

3. Block the gluster server and check disk error:
# iptables -I INPUT -s 10.66.6.81 -j DROP
# virsh domblkerror fuse
No errors found

After a while, check the disk error:
# virsh domblkerror fuse
vda: unspecified error

Libvirt doesn't report the disk error on the first try.

Comment 3 Han Han 2018-08-02 02:42:08 UTC
Hi, Polina,
Could you please check whether comment 2 matches your RHV scenarios? BTW, the libvirtd.log provided earlier is empty. Please set libvirtd.conf as follows, restart libvirtd, and gather the libvirtd.log again:
log_level = 3 
log_filters="1:qemu 1:libvirt 4:object 4:json 4:event 1:util"
log_outputs="1:file:/var/log/libvirtd.log"
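
For reference, a minimal way to apply this on a systemd-managed RHEL 7 host could look like the following (a sketch; the config path and commands are the stock libvirt ones and are not spelled out in this thread):

# vi /etc/libvirt/libvirtd.conf      (add the three settings above)
# systemctl restart libvirtd
# tail -f /var/log/libvirtd.log      (confirm entries appear before re-running the scenario)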

Comment 4 Polina 2018-08-02 08:28:21 UTC
Created attachment 1472431 [details]
engine_qemu_libvirt_vdsm_nfs_vm.tar.gz

Hi Han,
I configured the libvirt log according to the instructions and restarted libvirtd. Please see the logs:

1. For the NFS VM scenario the logs are in engine_qemu_libvirt_vdsm_nfs_vm.tar.gz
   In engine.log the scenario starts at line 6194

2. For the gluster VM scenario the logs are in engine_qemu_libvirt_vdsm_gluster_vm.tar.gz
   In engine.log the scenario starts at line 5747

About comment 2: please see below the CreateBrokerVDSCommand report to the engine.log (it also appears in the attached engine.log).
Here is the fragment of the VM start XML:
    <disk type="file" device="cdrom" snapshot="no">
      <driver name="qemu" type="raw" error_policy="report"/>
      <source file="" startupPolicy="optional"/>
      <target dev="hdc" bus="ide"/>
      <readonly/>
      <alias name="ua-d39ebacf-1731-4858-9a96-f909147d07f3"/>
      <address bus="1" controller="0" unit="0" type="drive" target="0"/>
    </disk>

Comment 5 Polina 2018-08-02 08:29:52 UTC
Created attachment 1472437 [details]
attached logs for gluster vm

Comment 6 Martin Tessun 2018-08-29 08:40:16 UTC
(In reply to Polina from comment #0)
> Created attachment 1471443 [details]
> qemu, libvirt, vdsm, engine logs
> 
> Description of problem: while blocking the connection from the host to the
> storage domain, qemu/libvirt is expected to report an i/o error, so that
> vdsm can then report the pause state due to the storage i/o error.

How NFS/Gluster report I/O errors depends mainly on the mount options:

soft: an error is reported after the timeo value is reached
hard: no errors are reported, but I/O is stuck until the NFS server comes back online.

So depending on the mount options, the error is reported at some point in time. There are more options that influence error reporting, but knowing the exact mount options RHV uses for such storage would be helpful.
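
For reference, a quick way to capture the exact options in use is to list the storage-domain mounts on the host (a sketch; /rhev/data-center/mnt is the usual RHV mount location, as the paths later in this bug show):

# mount | grep /rhev/data-center/mnt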


> 
> Version-Release number of selected component (if applicable):
> rhv-release-4.2.5-5-001.noarch
> libvirt-3.9.0-14.el7_5.6.x86_64
> qemu-kvm-rhev-2.10.0-21.el7_5.4.x86_64
> 
> How reproducible:100%
> 
> Steps to Reproduce:
> 1. Run VM associated with nfs disk (similar problem with gluster). 
> 2. block connection from the host to the SD, like
>    iptables -I INPUT -s yellow-vdsb.qa.lab.tlv.redhat.com -j DROP
> 
> Actual results: no reporting about i/o error 
> 
> Expected results: i/o error reported.

So what are the settings of the nfs mount the storage is placed on. (See also the above explanation).

> 
> Additional info: attached logs: qemu, libvirt, vdsm, engine

Comment 7 Martin Tessun 2018-08-29 08:47:31 UTC
Well, answering my own questions by looking at an RHV installation with NFS mounts I have access to:

NFSv3:
(rw,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,soft,nolock,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountvers=3,mountport=30000,mountproto=udp,local_lock=all)

NFSv4.1:
(rw,relatime,vers=4.1,rsize=65536,wsize=65536,namlen=255,soft,nosharecache,proto=tcp,port=0,timeo=600,retrans=6,sec=sys,local_lock=none)

So an I/O error cannot occur before 10 minutes (timeo=600) have passed.

@Polina: Please add more information to the NFS or Gluster Mounts as you see fit.

Comment 8 Martin Tessun 2018-08-29 08:58:24 UTC
Oops, I calculated the timeout/error reporting time wrong. According to https://linux.die.net/man/5/nfs it should be way more than 10 minutes:

retrans=6 and timeo=600 means 60s for the first try and every further retry adds timeo to the timeout, so this would be:

60s + 120s + 180s + 240s + 300s + 360s + 420s = 1680s = 28 minutes
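
The same sum as a quick shell sketch (assuming one initial transmission plus retrans=6 retries, with the timeout growing linearly by timeo=60s each time):

t=60; total=0
for i in 1 2 3 4 5 6 7; do total=$((total + i * t)); done
echo "${total}s"    # prints 1680s, i.e. 28 minutes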

Did I misunderstand that section, or should we really reduce the parameters here?


From the man page:
Options supported by all versions

These options are valid to use with any NFS version.
soft / hard

Determines the recovery behavior of the NFS client after an NFS request times out. If neither option is specified (or if the hard option is specified), NFS requests are retried indefinitely. If the soft option is specified, then the NFS client fails an NFS request after retrans retransmissions have been sent, causing the NFS client to return an error to the calling application.

NB: A so-called "soft" timeout can cause silent data corruption in certain cases. As such, use the soft option only when client responsiveness is more important than data integrity. Using NFS over TCP or increasing the value of the retrans option may mitigate some of the risks of using the soft option.

timeo=n

The time in deciseconds (tenths of a second) the NFS client waits for a response before it retries an NFS request.

For NFS over TCP the default timeo value is 600 (60 seconds). The NFS client performs linear backoff: After each retransmission the timeout is increased by timeo up to the maximum of 600 seconds.

Comment 9 Martin Tessun 2018-08-29 12:10:53 UTC
Some further tests on my RHV show that the behaviour is exactly as predicted:

Test scenario:

1. Default VM on NFS storage:
    <disk type='file' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads'/>
      <source file='/rhev/data-center/mnt/192.168.5.1:_rhv4a__NFS__volume/f2a3e7f5-3509-4bb9-aac2-4813bea044b7/images/f64c269f-e3f4-4a0e-b28c-4831a550b40d/36f81764-bae1-438c-b532-6c4c29b0e342'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <serial>f64c269f-e3f4-4a0e-b28c-4831a550b40d</serial>
      <boot order='1'/>
      <alias name='ua-f64c269f-e3f4-4a0e-b28c-4831a550b40d'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>


Disconnect storage:
Aug 29 11:48:48 inf6 sanlock[959]: 2018-08-29 11:48:48 1654432 [14299]: s7 delta_renew read timeout 10 sec offset 0 /rhev/data-center/mnt/192.168.5.1:_rhv4a__NFS__volume/f2a3e7f5-3509-4bb9-aac2-4813bea044b7/dom_md/ids

Followed by NFS timeout messages:
Aug 29 11:56:21 inf6 kernel: nfs: server 192.168.5.1 not responding, timed out
Aug 29 11:58:35 inf6 kernel: nfs: server 192.168.5.1 not responding, timed out
Aug 29 12:08:01 inf6 kernel: nfs: server 192.168.5.1 not responding, timed out
Aug 29 12:15:08 inf6 kernel: nfs: server 192.168.5.1 not responding, timed out
Aug 29 12:22:15 inf6 kernel: nfs: server 192.168.5.1 not responding, timed out
Aug 29 12:29:22 inf6 kernel: nfs: server 192.168.5.1 not responding, timed out
Aug 29 12:36:29 inf6 kernel: nfs: server 192.168.5.1 not responding, timed out


and later also stuck-task messages occurred (as the timeo grew):
Aug 29 11:58:56 inf6 kernel: INFO: task qemu-kvm:28187 blocked for more than 120 seconds.
Aug 29 12:00:56 inf6 kernel: INFO: task qemu-kvm:28187 blocked for more than 120 seconds.
Aug 29 12:02:56 inf6 kernel: INFO: task qemu-kvm:28187 blocked for more than 120 seconds.
Aug 29 12:04:56 inf6 kernel: INFO: task qemu-kvm:28187 blocked for more than 120 seconds.
Aug 29 12:06:56 inf6 kernel: INFO: task qemu-kvm:28187 blocked for more than 120 seconds.
Aug 29 12:08:56 inf6 kernel: INFO: task qemu-kvm:28187 blocked for more than 120 seconds.
Aug 29 12:10:56 inf6 kernel: INFO: task qemu-kvm:28187 blocked for more than 120 seconds.
Aug 29 12:12:57 inf6 kernel: INFO: task qemu-kvm:28187 blocked for more than 120 seconds.
Aug 29 12:14:57 inf6 kernel: INFO: task qemu-kvm:28187 blocked for more than 120 seconds.
Aug 29 12:16:57 inf6 kernel: INFO: task qemu-kvm:28187 blocked for more than 120 seconds.


Trying to get domblkerrors during the nfs recovery time resulted in timeouts:
[root@inf6 ~]# virsh -r domblkerror mtessun_nfs_test
error: Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainGetDiskErrors)

[root@inf6 ~]# 

I could only get the errors listed after I enabled access to the NFS storage again:
[root@inf6 ~]# virsh -r domblkerror mtessun_nfs_test
No errors found

[root@inf6 ~]# 


This looks to me as if the NFS settings in RHV need to be revised: it takes too long to identify an NFS storage that is no longer reachable, and all I/O is blocked for far too long, as *every* I/O takes 28 minutes to fail. For error handling this adds up to effectively infinite time.

The VM itself stays in "unknown state" until the storage connection is restored.


2. HA VM with lease:
    <disk type='file' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='threads'/>
      <source file='/rhev/data-center/mnt/192.168.5.1:_rhv4a__NFS__volume/f2a3e7f5-3509-4bb9-aac2-4813bea044b7/images/f64c269f-e3f4-4a0e-b28c-4831a550b40d/36f81764-bae1-438c-b532-6c4c29b0e342'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <serial>f64c269f-e3f4-4a0e-b28c-4831a550b40d</serial>
      <boot order='1'/>
      <alias name='ua-f64c269f-e3f4-4a0e-b28c-4831a550b40d'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

Stopped communication:
[root@inf6 ~]# iptables -I INPUT -s 192.168.5.1 -j DROP
[root@inf6 ~]# date
Wed Aug 29 12:45:46 CEST 2018
[root@inf6 ~]# 


Looking for errors in the logs from qemu/kernel and NFS:

NFS timeout:
Aug 29 12:53:29 inf6 kernel: nfs: server 192.168.5.1 not responding, timed out
Aug 29 12:55:42 inf6 kernel: nfs: server 192.168.5.1 not responding, timed out
Aug 29 13:02:46 inf6 kernel: nfs: server 192.168.5.1 not responding, timed out
Aug 29 13:09:53 inf6 kernel: nfs: server 192.168.5.1 not responding, timed out
Aug 29 13:17:00 inf6 kernel: nfs: server 192.168.5.1 not responding, timed out
Aug 29 13:24:07 inf6 kernel: nfs: server 192.168.5.1 not responding, timed out

Hung task:
Nothing this time.

The VM itself goes to a paused state after a while:
Aug 29, 2018, 1:17:00 PM VM mtessun_nfs_test has been paused due to storage I/O problem.

The time is a bit more than the expected 28 minutes (in fact it is 32 minutes), but ok.

The virsh domblkerror still gets timeouts as expected, but after removing the storage condition, you get:
[root@inf6 ~]# virsh -r domblkerror mtessun_nfs_test
sda: unspecified error

[root@inf6 ~]# 


That said, I don't think this is a bug; instead, the NFS data domains in RHV need some tuning.

Comment 10 Michal Skrivanek 2018-08-29 13:07:15 UTC
> This looks to me as if the NFS settings in RHV need to be revised: it takes too long to identify an NFS storage that is no longer reachable, and all I/O is blocked for far too long, as *every* I/O takes 28 minutes to fail. For error handling this adds up to effectively infinite time.

yeah, that was the point of opening bug 1569926

Comment 11 Jiri Denemark 2018-08-29 13:35:32 UTC
Yeah, I don't see any libvirt bug here. The domblkerror virsh command results
in a "query-block" QMP command being sent to QEMU, and libvirt just takes the
errors from the result and shows them to the user.

In other words if we want to do something with the "unspecified error"
message, we'd need to pass this BZ to qemu-kvm-rhev for investigation.
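
For reference, the raw QMP data that domblkerror is built on can be inspected directly through virsh (a sketch; "mtessun_nfs_test" is the domain from comment 9, and the per-device "io-status" field in the reply is what carries the ok/failed/nospace value discussed below):

# virsh qemu-monitor-command mtessun_nfs_test --pretty '{"execute":"query-block"}'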

Comment 12 Polina 2018-09-02 07:55:33 UTC
(In reply to Martin Tessun from comment #7)

> @Polina: Please add more information to the NFS or Gluster Mounts as you see
> fit.

Please see the mount output for nfs and gluster:

yellow-vdsb.qa.lab.tlv.redhat.com:/Compute_NFS/GE/compute-he-4/nfs_0 on /rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com:_Compute__NFS_GE_compute-he-4_nfs__0 type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,port=0,timeo=600,retrans=6,sec=sys,clientaddr=10.35.160.95,local_lock=none,addr=10.35.80.5)

gluster01.scl.lab.tlv.redhat.com:/compute-ge-he-4-volume3 on /rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com:_compute-ge-he-4-volume3 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

####

gluster volume info compute-ge-he-4-volume1:
 
Volume Name: compute-ge-he-4-volume1
Type: Replicate
Volume ID: d5b2e835-9648-4a4f-bc1f-8c9fea338dbc
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster01.scl.lab.tlv.redhat.com:/gluster_volumes/compute-ge-he-4-volume1
Brick2: gluster02.scl.lab.tlv.redhat.com:/gluster_volumes/compute-ge-he-4-volume1
Brick3: gluster03.scl.lab.tlv.redhat.com:/gluster_volumes/compute-ge-he-4-volume1
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
storage.owner-gid: 36
storage.owner-uid: 36

Comment 13 Jiri Denemark 2018-09-10 08:35:19 UTC
As explained in comment #9, everything behaves as expected given the
configured NFS mount parameters.

And to clarify comment #11 a bit, we report "unspecified error" because QEMU
either reports "ok", "failed", or "nospace". Since this is clearly neither "ok"
nor "nospace", we see "failed" and virsh reports that as "unspecified error".

I don't see any bug here. Except for suboptimal NFS mount parameters, which
are tracked in bug 1569926.

