Bug 1252804

Summary: Stopping the nfs service while a guest writes to a data disk on the NFS server: QEMU is not paused and a Call Trace appears in the guest
Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.2
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Target Milestone: rc
Reporter: Pei Zhang <pezhang>
Assignee: Stefan Hajnoczi <stefanha>
QA Contact: Virtualization Bugs <virt-bugs>
CC: chayang, juzhang, knoel, michen, qzhang, virt-maint, xfu
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-08-12 22:05:20 UTC

Description Pei Zhang 2015-08-12 09:23:28 UTC
Description of problem:
Boot a guest with a data disk that resides on an NFS server, then stop the nfs service while dd is writing data to that disk inside the guest. A 'Call Trace' appears in the guest and QEMU is not paused.

Version-Release number of selected component (if applicable):
Host:
Kernel: 3.10.0-304.el7.x86_64
qemu-kvm-rhev: qemu-kvm-rhev-2.3.0-16.el7.x86_64

Guest:
Kernel: 3.10.0-295.el7.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Mount the export from the NFS server:
# mount -o soft,timeo=60,retrans=2,nosharecache 10.66.9.120:/home /mnt

2. Create the data disk on the NFS mount:
# qemu-img create -f qcow2 /mnt/disk1.qcow2 10G

3. Boot a guest with the data disk (on the NFS mount) attached, using werror=stop,rerror=stop:
# /usr/libexec/qemu-kvm -name rhel7.2 -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -cpu SandyBridge -m 2G,slots=256,maxmem=40G -numa node -smp 4,sockets=2,cores=2,threads=1 -uuid 12b1a01e-5f6c-4f5f-8d27-3855a74e4b6b \
-drive file=/home/rhel7.2.qcow2,format=qcow2,if=none,id=drive-virtio-blk-0,werror=stop,rerror=stop \
-device virtio-blk-pci,bus=pci.0,addr=0x8,drive=drive-virtio-blk-0,id=virtio-blk-0,bootindex=0 \
-drive file=/mnt/disk1.qcow2,format=qcow2,if=none,id=drive-virtio-blk-1,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk-1,id=virtio-blk-1  \
-netdev tap,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:54:00:5c:08:6d \
-device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pci.0 \
-spice port=5900,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on \
-monitor stdio -serial unix:/tmp/monitor,server,nowait

4. While dd is writing to the data disk (/dev/vda) inside the guest, stop the nfs service on the NFS server (a sketch for watching the VM state from the host follows these steps):

In the guest:
# dd if=/dev/urandom of=/dev/vda bs=1M count=2048

On the NFS server:
# service nfs stop
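
One way to watch the VM state and block-error events from the host during step 4 (a sketch, not part of the original setup: it assumes the guest was started with an additional QMP socket such as '-qmp unix:/tmp/qmp.sock,server,nowait'):

# nc -U /tmp/qmp.sock
{"execute": "qmp_capabilities"}
{"execute": "query-status"}

With werror=stop,rerror=stop, a write failure that QEMU recognizes should emit a BLOCK_IO_ERROR event on this socket, and query-status should then report the paused state.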


Actual results:
After step 4, a 'Call Trace' appears in the guest, but QEMU remains in the running state.
In the guest:
# dmesg
......
[  600.565732] Call Trace:
[  600.565737]  [<ffffffff8162de49>] schedule+0x29/0x70
[  600.565739]  [<ffffffff8162bb39>] schedule_timeout+0x209/0x2d0
[  600.565742]  [<ffffffff81057b7f>] ? kvm_clock_get_cycles+0x1f/0x30
[  600.565745]  [<ffffffff810d0bcc>] ? ktime_get_ts64+0x4c/0xf0
[  600.565747]  [<ffffffff8162d47e>] io_schedule_timeout+0xae/0x130
[  600.565748]  [<ffffffff8162d518>] io_schedule+0x18/0x20
[  600.565751]  [<ffffffff812cb735>] bt_get+0x135/0x1c0
[  600.565761]  [<ffffffff8109ed20>] ? wake_up_atomic_t+0x30/0x30
[  600.565764]  [<ffffffff812cbb5f>] blk_mq_get_tag+0xbf/0xf0
[  600.565765]  [<ffffffff812c744b>] __blk_mq_alloc_request+0x1b/0x200
[  600.565767]  [<ffffffff812c8e11>] blk_mq_map_request+0x191/0x1f0
[  600.565769]  [<ffffffff812ca220>] blk_sq_make_request+0x80/0x380
[  600.565772]  [<ffffffff812bb8df>] ? generic_make_request_checks+0x24f/0x380
[  600.565774]  [<ffffffff81163e09>] ? mempool_alloc+0x69/0x170
[  600.565776]  [<ffffffff812bbaf2>] generic_make_request+0xe2/0x130
[  600.565779]  [<ffffffff812bbbb1>] submit_bio+0x71/0x150
[  600.565781]  [<ffffffff8120e2ed>] ? bio_alloc_bioset+0x1fd/0x350
[  600.565783]  [<ffffffff81209303>] _submit_bh+0x143/0x210
[  600.565784]  [<ffffffff8120bf52>] __block_write_full_page+0x162/0x380
[  600.565786]  [<ffffffff8120f750>] ? I_BDEV+0x10/0x10
[  600.565788]  [<ffffffff8120f750>] ? I_BDEV+0x10/0x10
[  600.565789]  [<ffffffff8120c33b>] block_write_full_page_endio+0xeb/0x100
[  600.565791]  [<ffffffff8120c365>] block_write_full_page+0x15/0x20
[  600.565793]  [<ffffffff8120fec8>] blkdev_writepage+0x18/0x20
[  600.565795]  [<ffffffff8116b923>] __writepage+0x13/0x50
[  600.565797]  [<ffffffff8116c441>] write_cache_pages+0x251/0x4d0
[  600.565799]  [<ffffffff8116b910>] ? global_dirtyable_memory+0x70/0x70
[  600.565801]  [<ffffffff8116c70d>] generic_writepages+0x4d/0x80
[  600.565803]  [<ffffffff8116d7be>] do_writepages+0x1e/0x40
[  600.565805]  [<ffffffff811fef90>] __writeback_single_inode+0x40/0x220
[  600.565806]  [<ffffffff811ff9fe>] writeback_sb_inodes+0x25e/0x420
[  600.565808]  [<ffffffff811ffc5f>] __writeback_inodes_wb+0x9f/0xd0
[  600.565810]  [<ffffffff812004a3>] wb_writeback+0x263/0x2f0
[  600.565812]  [<ffffffff8120272b>] bdi_writeback_workfn+0x2cb/0x460
[  600.565814]  [<ffffffff81095a9b>] process_one_work+0x17b/0x470
[  600.565815]  [<ffffffff8109686b>] worker_thread+0x11b/0x400
[  600.565816]  [<ffffffff81096750>] ? rescuer_thread+0x400/0x400
[  600.565818]  [<ffffffff8109dd2f>] kthread+0xcf/0xe0
[  600.565820]  [<ffffffff8109dc60>] ? kthread_create_on_node+0x140/0x140
[  600.565823]  [<ffffffff81638d58>] ret_from_fork+0x58/0x90
[  600.565824]  [<ffffffff8109dc60>] ? kthread_create_on_node+0x140/0x140
[  720.563248] INFO: task kworker/u8:0:6 blocked for more than 120 seconds.
[  720.564093] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  720.564768] kworker/u8:0    D ffff880036623610     0     6      2 0x00000000
[  720.564776] Workqueue: writeback bdi_writeback_workfn (flush-252:0)
[  720.564779]  ffff88007c08b5e0 0000000000000046 ffff88007c7f3980 ffff88007c08bfd8
[  720.564781]  ffff88007c08bfd8 ffff88007c08bfd8 ffff88007c7f3980 ffff88007fd94bc0
[  720.564784]  0000000000000000 7fffffffffffffff ffff88007fd9c980 ffff880036623610


In the host, QEMU still reports the running state:
(qemu) info status
VM status: running


Expected results:
After step 4, there should be no 'Call Trace' in the guest, and QEMU should enter the paused state (as requested by werror=stop,rerror=stop).
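
For reference, when a write error does propagate to QEMU with werror=stop, the monitor would be expected to report the I/O-error pause state along these lines (illustrative output):

(qemu) info status
VM status: paused (io-error)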

Additional info:

Comment 3 Ademar Reis 2015-08-12 22:05:20 UTC
Same as in bug 1249911: this is expected behavior, given the architecture of QEMU. To work around it, we would need a very complex change in the way QEMU deals with local images, which is just not feasible.

You'll see similar behavior in most Linux applications that access an NFS mount if communication with the NFS server is lost.
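
The same failure mode can be demonstrated on the host against the mount from step 1 (a sketch; '/mnt/probe' is an arbitrary test file name). With the soft,timeo=60,retrans=2 options the write is retried for a few timeout intervals and then fails with EIO, while a default hard mount would block indefinitely:

# dd if=/dev/zero of=/mnt/probe bs=1M count=1 oflag=direct
dd: error writing '/mnt/probe': Input/output error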

Closing it as WONTFIX.