Bug 1089873

Summary:

hang on startup using ISO image from inaccessible NFS share

Product:

Red Hat Enterprise Linux 6

Reporter:

Francesco Romani <fromani>

Component:

libvirt

Assignee:

Jiri Denemark <jdenemar>

Status:

CLOSED NOTABUG

QA Contact:

Virtualization Bugs <virt-bugs>

Severity:

unspecified

Docs Contact:

Priority:

unspecified

Version:

6.5

CC:

acathrow, bili, chhu, dallan, dyuan, fromani, michal.skrivanek, mzhan, shyu

Target Milestone:

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2014-05-14 08:07:39 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

755528

Attachments:

Description	Flags
additional information about the test conditions	none
strace logs	none
output of strace -tt -o libvirt_strace.txt -e trace=file -f -p 2310 and libvirtd logs	none

Description Francesco Romani 2014-04-22 06:29:46 UTC

Created attachment 888369 [details]
additional information about the test conditions

Description of problem:
When investigating bz75528, found what documented in https://bugzilla.redhat.com/show_bug.cgi?id=755528#c23

When running a vm with ISO on a NFS share attached as first boot device, with startup policy "optional", but having the NFS share unreachable, the vm won't start, because libvirt will hang trying to get the permission of the said device.

output of /cat/1132/stack, being 1132 the pid of the libvirt process stuck:
[<ffffffffa04eb1b2>] rpc_wait_bit_killable+0x42/0xa0 [sunrpc]
[<ffffffffa04eb6f5>] __rpc_execute+0xf5/0x350 [sunrpc]
[<ffffffffa04eb9b1>] rpc_execute+0x61/0xa0 [sunrpc]
[<ffffffffa04e23a5>] rpc_run_task+0x75/0x90 [sunrpc]
[<ffffffffa04e24c2>] rpc_call_sync+0x42/0x70 [sunrpc]
[<ffffffffa077f73d>] nfs3_rpc_wrapper.clone.0+0x3d/0xd0 [nfs]
[<ffffffffa077fb1c>] nfs3_proc_access+0xbc/0x180 [nfs]
[<ffffffffa076758c>] nfs_do_access+0x19c/0x240 [nfs]
[<ffffffffa07676d8>] nfs_permission+0xa8/0x1e0 [nfs]
[<ffffffff81198d43>] __link_path_walk+0xb3/0xff0
[<ffffffff81199f3a>] path_walk+0x6a/0xe0
[<ffffffff8119a14b>] filename_lookup+0x6b/0xc0
[<ffffffff8119b277>] user_path_at+0x57/0xa0
[<ffffffff81186d11>] sys_faccessat+0xe1/0x230
[<ffffffff81186e78>] sys_access+0x18/0x20
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

Version-Release number of selected component (if applicable):
libvirt-0.10.2-29

How reproducible:
100%

Steps to Reproduce:
1. define a VM using a cdrom as first boot device using an ISO image on a NFS share
2. block the NFS share before to boot the VM
3. boot the VM

Actual results:
libvirt stuck trying to access the ISO image; the VM won't boot

Expected results:
VM boots

Additional info:
see the attached file for more details (XML domain, process stack trace)

Comment 2 Francesco Romani 2014-04-23 13:04:39 UTC

relevant mount/fstab line of the NFS mount on which the ISO image is stored:

shinji:/srv/ovirt/iso on /rhev/data-center/mnt/shinji:_srv_ovirt_iso type nfs (rw,soft,nosharecache,timeo=600,retrans=6,nfsvers=3,addr=192.168.1.64)

where "shinji" is 192.168.1.64

Comment 3 Jiri Denemark 2014-04-23 13:25:37 UTC

That is the NFS request should timeout in ~6 minutes.

Could you try to reproduce and keep it waiting for a significantly more time (let's say half an hour) and if it's still waiting, provide debug logs from libvirtd so that we see if we are still waiting on the first request or if multiple requests were involved? Also running "strace -e trace=file" on libvirtd at the same time could be useful too.

Comment 4 Francesco Romani 2014-04-28 11:56:41 UTC

Reproduced. It took a bit more then 12 minutes 

2014-04-28 11:37:19.680+0000: 2354: debug : qemuDomainObjBeginJobInternal:808 : Starting job: modify (async=none)
2014-04-28 11:37:19.680+0000: 2354: debug : qemuProcessStart:3606 : Beginning VM startup process
2014-04-28 11:37:19.680+0000: 2354: debug : qemuProcessStart:3618 : Setting current domain def as transient
2014-04-28 11:37:19.680+0000: 2354: debug : qemuProcessStart:3644 : Preparing host devices
2014-04-28 11:37:19.680+0000: 2354: debug : qemuProcessStart:3648 : Preparing chr devices
2014-04-28 11:37:19.680+0000: 2354: debug : qemuProcessStart:3657 : Generating domain security label (if required)
2014-04-28 11:37:19.686+0000: 2354: debug : qemuProcessStart:3666 : Ensuring no historical cgroup is lying around
2014-04-28 11:37:19.689+0000: 2354: debug : qemuProcessStart:3744 : Creating domain log file
2014-04-28 11:37:19.689+0000: 2354: debug : qemuProcessStart:3749 : Checking for KVM availability
2014-04-28 11:37:19.689+0000: 2354: debug : qemuProcessStart:3759 : Determining emulator version
2014-04-28 11:37:19.689+0000: 2354: debug : qemuCapsCacheLookup:2151 : Returning caps 0x7f0ce81166a0 for /usr/libexec/qemu-kvm
2014-04-28 11:37:19.689+0000: 2354: debug : qemuProcessStart:3768 : Checking for CDROM and floppy presence

2014-04-28 11:49:39.009+0000: 2354: debug : qemuDomainCheckDiskPresence:1920 : Dropping disk 'hdc' on domain 'F20_1' (UUID '025d59e3-f12c-41b8-be3f-de8dc22d23ce') due to inaccessible source '/rhev/data-center/mnt/192.168.1.20:_srv_virtstore_nfs_generic_iso/b0390d9e-2f1d-43c0-a56c-a20af032e934/images/11111111-1111-1111-1111-111111111111/Fedora-Live-Desktop-x86_64-20-1.iso'
2014-04-28 11:49:39.009+0000: 2354: debug : virStorageFileGetMetadata:993 : path=/rhev/data-center/mnt/blockSD/00249736-3f63-497a-b357-e6092e457e35/images/a447e318-db6c-420f-8a7e-2900190ec3f1/2133baf1-22a8-4f67-9466-160ca8df2a04 format=1 uid=107 gid=107 probe=0
2014-04-28 11:49:39.009+0000: 2354: debug : virStorageFileGetMetadataRecurse:925 : path=/rhev/data-center/mnt/blockSD/00249736-3f63-497a-b357-e6092e457e35/images/a447e318-db6c-420f-8a7e-2900190ec3f1/2133baf1-22a8-4f67-9466-160ca8df2a04 format=1 uid=107 gid=107 probe=0
2014-04-28 11:49:39.010+0000: 2354: debug : virStorageFileGetMetadataInternal:673 : path=/rhev/data-center/mnt/blockSD/00249736-3f63-497a-b357-e6092e457e35/images/a447e318-db6c-420f-8a7e-2900190ec3f1/2133baf1-22a8-4f67-9466-160ca8df2a04, fd=26, format=1
2014-04-28 11:49:39.011+0000: 2354: debug : qemuProcessStart:3798 : Setting up domain cgroup (if required)
2014-04-28 11:49:39.021+0000: 2354: debug : qemuProcessStart:3807 : Preparing monitor state
2014-04-28 11:49:39.021+0000: 2354: debug : qemuProcessStart:3843 : Assigning domain PCI addresses

It may be good enough for me. There is any way to control this timeout?

Comment 5 Michal Skrivanek 2014-05-05 14:31:27 UTC

since it's 12 minutes, that would hint that it's accessing a file twice somewhere. If we can reduce that to one and say that the reliability of the check depends on the "timeo" parameter that might be good enough.

Do you have the strace to confirm this?

Comment 6 Francesco Romani 2014-05-06 09:03:08 UTC

Created attachment 892814 [details]
strace logs

Comment 7 Francesco Romani 2014-05-06 09:34:22 UTC

Created attachment 892821 [details]
output of strace -tt -o libvirt_strace.txt  -e trace=file -f -p 2310 and libvirtd logs

Comment 8 Jiri Denemark 2014-05-06 12:42:44 UTC

Unfortunately, the strace is not very helpful:

2389  11:16:41.257783 open("/etc/group", O_RDONLY|O_CLOEXEC) = 26
2389  11:26:41.261512 --- SIGCHLD (Child exited) @ 0 (0) ---

The checks happen in a new child between these two lines. I'm not sure why -f was not enough to follow that child.

Comment 9 Jiri Denemark 2014-05-14 08:04:27 UTC

So it appears that strace on RHEL-6 is unable to trace children when it is attached to a running process. To see them, one needs to run the command using strace:

    strace -t -o nfs.strace -e trace=file -ff libvirtd --daemon --listen

Searching for the inaccessible CDROM image (RHEL-7-boot.iso in my case) gives the following result:

# grep boot.iso nfs.strace.*
nfs.strace.29789:09:59:09 access("/mnt/test/isos/RHEL-7-boot.iso", F_OK) = -1 EIO (Input/output error)


That is, we only access the file once. So there's no bug in libvirt in this area.

Comment 10 Jiri Denemark 2014-05-14 08:07:39 UTC

And after reading nfs(5), it may even work as expected as the timeout seems to be 10 minutes rather then 6:

       timeo=n        The  time  in  deciseconds  (tenths of a second) the NFS
                      client waits for a response before  it  retries  an  NFS
                      request.

                      For NFS over TCP the default timeo value is 600 (60 sec‐
                      onds).  The NFS client performs  linear  backoff:  After
                      each retransmission the timeout is increased by timeo up
                      to the maximum of 600 seconds.

                      However, for NFS over UDP, the client uses  an  adaptive
                      algorithm  to  estimate an appropriate timeout value for
                      frequently used request types (such as  READ  and  WRITE
                      requests),  but  uses the timeo setting for infrequently
                      used request types (such as FSINFO  requests).   If  the
                      timeo option is not specified, infrequently used request
                      types  are  retried  after  1.1  seconds.   After   each
                      retransmission,  the  NFS client doubles the timeout for
                      that request, up to a maximum timeout length of 60  sec‐
                      onds.

       retrans=n      The  number  of  times  the NFS client retries a request
                      before it  attempts  further  recovery  action.  If  the
                      retrans  option  is  not specified, the NFS client tries
                      each request three times.

                      The NFS client generates a "server not responding"  mes‐
                      sage after retrans retries, then attempts further recov‐
                      ery (depending on whether the hard mount  option  is  in
                      effect).

I don't really understand what they are trying to say but it seems that the real timeout is not just "timeo / 10 * retrans" as one would hope for.