Bug 688858
| Summary: | VM can't be stopped when hit read error | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | juzhang <juzhang> |
| Component: | qemu-kvm | Assignee: | Jes Sorensen <Jes.Sorensen> |
| Status: | CLOSED NOTABUG | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 6.1 | CC: | kwolf, michen, mkenneth, tburke, virt-maint |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-03-29 13:32:32 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 580951 | | |
Are you sure read requests for this secondary disk were executed? What's the output of strace or tcpdump to the NFS server?

(In reply to comment #2)

Mount command:
# mount 10.66.8.113:/home/ nfs/ -o soft,timeo=2,retrans=2

> Are you sure read requests for this secondary disk were executed?

Yes.

> What's the output of strace or tcpdump to the nfs server?

# ls -la /proc/`pgrep qemu`/fd
dr-x------. 2 root root 0 Mar 20 17:17 .
dr-xr-xr-x. 7 root root 0 Mar 20 17:16 ..
lrwx------. 1 root root 64 Mar 20 17:17 0 -> /dev/pts/2
lrwx------. 1 root root 64 Mar 20 17:17 1 -> /dev/pts/2
lrwx------. 1 root root 64 Mar 20 17:17 10 -> /root/zhangjunyi/rhel5.6-virtio-64.qcow2
lrwx------. 1 root root 64 Mar 20 17:17 11 -> anon_inode:[signalfd]
lr-x------. 1 root root 64 Mar 20 17:17 12 -> /dev/sr0
lrwx------. 1 root root 64 Mar 20 17:17 13 -> /root/nfs/junzhang.qcow2
lrwx------. 1 root root 64 Mar 20 17:17 14 -> anon_inode:kvm-vcpu
lrwx------. 1 root root 64 Mar 20 17:17 15 -> anon_inode:kvm-vcpu
lrwx------. 1 root root 64 Mar 20 17:17 16 -> anon_inode:kvm-vcpu
lrwx------. 1 root root 64 Mar 20 17:17 17 -> anon_inode:kvm-vcpu
lrwx------. 1 root root 64 Mar 20 17:17 18 -> socket:[1999005]
lrwx------. 1 root root 64 Mar 20 17:17 19 -> socket:[1999002]
lrwx------. 1 root root 64 Mar 20 17:17 2 -> /dev/pts/2
lrwx------. 1 root root 64 Mar 20 17:17 20 -> anon_inode:[eventfd]
lrwx------. 1 root root 64 Mar 20 17:17 21 -> anon_inode:[eventfd]
lrwx------. 1 root root 64 Mar 20 17:17 22 -> anon_inode:[signalfd]
lrwx------. 1 root root 64 Mar 20 17:17 23 -> anon_inode:[eventfd]
lrwx------. 1 root root 64 Mar 20 17:17 24 -> anon_inode:[eventfd]
lrwx------. 1 root root 64 Mar 20 17:17 25 -> socket:[1999006]
lrwx------. 1 root root 64 Mar 20 17:17 3 -> socket:[1998960]
lrwx------. 1 root root 64 Mar 20 17:17 4 -> /dev/kvm
lrwx------. 1 root root 64 Mar 20 17:17 5 -> anon_inode:kvm-vm
lr-x------. 1 root root 64 Mar 20 17:17 6 -> pipe:[1998962]
l-wx------. 1 root root 64 Mar 20 17:17 7 -> pipe:[1998962]
lrwx------. 1 root root 64 Mar 20 17:17 8 -> /dev/net/tun
lrwx------. 1 root root 64 Mar 20 17:17 9 -> /dev/vhost-net

Please note, /root/nfs/junzhang.qcow2 is the secondary disk, which resides on the NFS server.

# strace -p `pidof qemu-kvm` -e trace=desc 2> a.txt

Disconnect the NFS server:
# service nfs stop
# tail -f a.txt | grep 13

Results: no error was reported for file descriptor 13 in the strace output while NFS was down.

# tail -f a.txt | grep 13
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999613})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999413})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999613})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999613})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999513})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999513})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999513})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999513})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999513})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999513})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999513})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999413})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999613})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999613})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999613})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999613})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999613})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999613})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999613})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999613})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [11], left {0, 998813})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999613})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999613})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999613})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999513})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999513})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999513})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999513})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999513})
select(27, [0 6 8 11 18 20 22 23 24 25 26], [], [], {1, 0}) = 1 (in [24], left {0, 999513})

(In reply to comment #3)
> Please note, /root/nfs/junzhang.qcow2 is the secondary disk, which resides on the NFS server.
> # strace -p `pidof qemu-kvm` -e trace=desc 2> a.txt
> Disconnect the NFS server:
> # service nfs stop
> # tail -f a.txt | grep 13
>
> Results: no error was reported for file descriptor 13 in the strace output while NFS was down.

However, on the host, when the NFS server was down, I tried to enter the mount directory and hit the following message. I think this proves that the host detected that the NFS server was disconnected.

# cd /root/nfs
-bash: cd: /root/nfs: Input/output error

What happens if you use hard nfs mounts, instead of soft mounts?

(In reply to comment #5)
> What happens if you use hard nfs mounts, instead of soft mounts?
Using
# mount 10.66.8.113:/home/ nfs/
instead of
# mount 10.66.8.113:/home/ nfs/ -o soft,timeo=2,retrans=2

After disconnecting the NFS server, the qemu-kvm process hangs. If you reconnect the NFS server, qemu-kvm comes back.

I believe soft mounts are default, could you try with -ohard?

Thanks,
Jes

(In reply to comment #7)
> I believe soft mounts are default, could you try with -ohard?
>
> Thanks,
> Jes

# mount 10.66.8.162:/home/ nfs/ -o hard

Hit the same issue as in comment #6.

Hi,

I did some more digging into this. The problem is that, whether the mount is soft or hard, the process can get stuck on a semaphore in the kernel, and that wait does not get interrupted if the NFS server disappears. Even if you mount using the 'intr' flag, you may still get stuck.

The select() calls you are seeing in the strace log are simply the QEMU AIO code sitting and waiting for I/Os to complete. Unfortunately I don't see anything in the AIO headers that allows us to set a timeout for these operations.

This is a property of NFS; unfortunately there is not much we can do about it. You might want to check http://nfs.sourceforge.net/#section_d for more details, see under D6.

Cheers,
Jes
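
The situation described in the last comment, where one thread is blocked inside the kernel while the main loop keeps returning from select(), can usually be observed from per-thread states under /proc. The following is a minimal sketch added for illustration; it was not part of the original investigation. It assumes a Linux host, and the helper names `parse_state` and `thread_states` are hypothetical. A thread in uninterruptible sleep is reported with state `D`, which is what one would typically expect for an I/O thread waiting on an unreachable NFS server.

```python
from pathlib import Path


def parse_state(status_text):
    """Return the one-letter state code from the text of a /proc status file."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like "State:\tD (disk sleep)".
            return line.split()[1]
    return None


def thread_states(pid):
    """Map each thread ID of `pid` to its state code (Linux /proc only)."""
    states = {}
    for task in Path(f"/proc/{pid}/task").iterdir():
        try:
            states[int(task.name)] = parse_state((task / "status").read_text())
        except OSError:
            # The thread exited between listing and reading; skip it.
            continue
    return states
```

Calling `thread_states(pid)` with the qemu-kvm PID and looking for entries equal to `"D"` would show whether any thread is blocked this way. Note that `strace -p` without `-f` follows only the thread whose ID is given, which may explain why only the select() loop shows up in the log above.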
Description of problem:
VM can't be stopped when it hits a read error.

Version-Release number of selected component (if applicable):
# rpm -qa | grep qemu-kvm
qemu-kvm-tools-0.12.1.2-2.149.el6.x86_64
# uname -r
2.6.32-118.el6.x86_64
guest: rhel5.6

How reproducible:
100%

Steps to Reproduce:
1. On the NFS server, create the image:
# qemu-img create -f qcow2 junzhang.qcow2 6G
2. Mount the NFS export on the host:
# mount 10.66.8.113:/home/ nfs/ -o soft,timeo=2,retrans=2
3. Boot the guest with junzhang.qcow2 as the secondary disk:
# /usr/libexec/qemu-kvm -m 2G -smp 4 -drive file=/root/zhangjunyi/rhel5.6-virtio-64.qcow2,if=none,id=test,cache=none,format=qcow2,werror=stop,rerror=stop -device virtio-blk-pci,drive=test -cpu qemu64,+sse2,+x2apic -boot c -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:11:22:45:66:94 -monitor stdio -drive file=/dev/cdrom,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -vnc :10 -drive file=/root/nfs/junzhang.qcow2,if=none,id=test1,cache=none,format=qcow2,werror=stop,rerror=stop -device virtio-blk-pci,drive=test1 -qmp tcp:0:4446,server,nowait
4. In the guest, read data from the secondary disk:
while true;do dd if=/dev/vdb of=/dev/null ;done
5. Disconnect the NFS server and make sure the nfs service is stopped.

Actual results:
The VM is not stopped; the guest can still read data from vdb.

Expected results:
The VM should be stopped with an error:
{"timestamp": {"seconds": 1300438562, "microseconds": 236084}, "event": "BLOCK_IO_ERROR", "data": {"device": "test1", "__com.redhat_debug_info": {"message": "Input/output error", "errno": 5}, "__com.redhat_reason": "eio", "operation": "stop", "action": "stop"}}

Additional info:
I also tested writing data to vdb; in that case the VM is stopped with the message:
{"timestamp": {"seconds": 1300438562, "microseconds": 237911}, "event": "BLOCK_IO_ERROR", "data": {"device": "test1", "__com.redhat_debug_info": {"message": "Input/output error", "errno": 5}, "__com.redhat_reason": "eio", "operation": "write", "action": "stop"}}
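
When checking for the stop event on the QMP socket, it can help to pull out just the fields that matter. The sketch below is illustrative only: it parses the write-error event quoted in this report with Python's standard `json` module. The function name `describe_block_error` is a hypothetical helper, and the `__com.redhat_*` keys are taken from the event text above rather than from any documented schema.

```python
import json

# The write-error event exactly as quoted in this report.
WRITE_EVENT = (
    '{"timestamp": {"seconds": 1300438562, "microseconds": 237911}, '
    '"event": "BLOCK_IO_ERROR", "data": {"device": "test1", '
    '"__com.redhat_debug_info": {"message": "Input/output error", "errno": 5}, '
    '"__com.redhat_reason": "eio", "operation": "write", "action": "stop"}}'
)


def describe_block_error(event_line):
    """Summarize one QMP event line, or return None if it is not BLOCK_IO_ERROR."""
    event = json.loads(event_line)
    if event.get("event") != "BLOCK_IO_ERROR":
        return None
    data = event.get("data", {})
    # Vendor-specific debug details; may be absent in other builds.
    debug = data.get("__com.redhat_debug_info", {})
    return {
        "device": data.get("device"),
        "operation": data.get("operation"),
        "action": data.get("action"),
        "errno": debug.get("errno"),
        "message": debug.get("message"),
    }


summary = describe_block_error(WRITE_EVENT)
```

For the quoted event, `summary` identifies device `test1`, operation `write`, action `stop`, and errno 5 (EIO). In the read case reported here, no such event is ever emitted, because the read never returns an error to QEMU.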