Bug 908154

Summary: guest hang and call trace after remove data disk device while guest fs was in frozen status
Product: Red Hat Enterprise Linux 7 Reporter: Sibiao Luo <sluo>
Component: qemu-kvmAssignee: Virtualization Maintenance <virt-maint>
Status: CLOSED NOTABUG QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: high    
Version: 7.0CC: acathrow, chayang, flang, juzhang, lcapitulino, michen, qzhang, sluo, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-02-06 11:25:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Sibiao Luo 2013-02-06 02:51:26 UTC
Description of problem:
boot a guest with a data disk, and mkfs to it, then frozen the guest fs via virtagent commands "guest-fsfreeze-freeze", and try to delete the data disk device via monitor, the guest will hang and call trace.

host info:
kernel-3.7.0-0.34.el7
qemu-kvm-1.3.0-3.el7
guest info:
kernel-3.7.0-0.34.el7
qemu-guest-agent-1.3.0-3.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1.start guest with virtio-serial and attach a data disk.
e.g:...-drive file=/home/my-data-disk.qcow2,if=none,id=drive-data-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x8 -device scsi-hd,drive=drive-data-disk,bus=scsi1.0,id=data-disk -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtio-serial -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0,id=sr0
2.start guest agent inside guest and connect the chardev socket in host side for sending commands to guest.
guest] # /usr/bin/qemu-ga on
host ] # nc -U /tmp/qga.sock
3.format the data disk in guest.
guest] # mkfs.ext4 /dev/sdb
4.frozen the guest fs via virtagent commands "guest-fsfreeze-freeze".
{"execute":"guest-fsfreeze-freeze"}
{"return": 2}
{"execute":"guest-fsfreeze-status"}
{"return": "frozen"}
6.try to delete the data disk device via monitor.
(qemu) device_del data-disk
 
Actual results:
after step 3, format the data disk successfully.
after step 6, guest will hang and call trace, i will paste the call trace log later.

Expected results:
can remove the data disk device via monitor without any error.

Additional info:

Comment 1 Sibiao Luo 2013-02-06 02:52:05 UTC
[  955.213337] sd 3:0:0:0: [sdb] Synchronizing SCSI cache
[  955.213910] sd 3:0:0:0: [sdb]  
[  955.213912] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 1080.563138] INFO: task auditd:585 blocked for more than 120 seconds.
[ 1080.564563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1080.566335] auditd          D ffff88007fc14100     0   585      1 0x00000000
[ 1080.566341]  ffff880079a7bc48 0000000000000086 ffff880078874bf0 ffff880079a7bfd8
[ 1080.566345]  ffff880079a7bfd8 ffff880079a7bfd8 ffff880078920000 ffff880078874bf0
[ 1080.566348]  ffff880079a7bc88 ffff880035d64000 0000000000000001 0000000000000001
[ 1080.566352] Call Trace:
[ 1080.566361]  [<ffffffff815f2a39>] schedule+0x29/0x70
[ 1080.566367]  [<ffffffff81198394>] __sb_start_write+0xc4/0x110
[ 1080.566373]  [<ffffffff81084390>] ? wake_up_bit+0x40/0x40
[ 1080.566379]  [<ffffffff81133ff3>] generic_file_aio_write+0x63/0x100
[ 1080.566384]  [<ffffffff81211e19>] ext4_file_write+0xa9/0x490
[ 1080.566388]  [<ffffffff81195c47>] do_sync_write+0xa7/0xe0
[ 1080.566392]  [<ffffffff811962fc>] vfs_write+0xac/0x180
[ 1080.566395]  [<ffffffff81196642>] sys_write+0x52/0xa0
[ 1080.566399]  [<ffffffff815f4c69>] ? do_device_not_available+0x19/0x20
[ 1080.566826]  [<ffffffff815fc219>] system_call_fastpath+0x16/0x1b
[ 1080.566830] INFO: task rs:main Q:Reg:619 blocked for more than 120 seconds.
[ 1080.568075] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1080.569714] rs:main Q:Reg   D ffff88007fc14100     0   619      1 0x00000080
[ 1080.569718]  ffff8800781f7c48 0000000000000086 ffff8800782a9950 ffff8800781f7fd8
[ 1080.569722]  ffff8800781f7fd8 ffff8800781f7fd8 ffffffff818cb420 ffff8800782a9950
[ 1080.569725]  ffff8800781f7c88 ffff880035d64000 0000000000000001 0000000000000001
[ 1080.569729] Call Trace:
[ 1080.569734]  [<ffffffff815f2a39>] schedule+0x29/0x70
[ 1080.569738]  [<ffffffff81198394>] __sb_start_write+0xc4/0x110
[ 1080.569742]  [<ffffffff81084390>] ? wake_up_bit+0x40/0x40
[ 1080.569746]  [<ffffffff81133ff3>] generic_file_aio_write+0x63/0x100
[ 1080.569750]  [<ffffffff81211e19>] ext4_file_write+0xa9/0x490
[ 1080.569756]  [<ffffffff810b9988>] ? get_futex_key+0x148/0x290
[ 1080.569760]  [<ffffffff81195c47>] do_sync_write+0xa7/0xe0
[ 1080.569764]  [<ffffffff811962fc>] vfs_write+0xac/0x180
[ 1080.569767]  [<ffffffff81196642>] sys_write+0x52/0xa0
[ 1080.569771]  [<ffffffff815fc219>] system_call_fastpath+0x16/0x1b
[ 1080.570204] INFO: task gnome-shell:1477 blocked for more than 120 seconds.
[ 1080.571383] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1080.573134] gnome-shell     D ffff88007fc14100     0  1477   1366 0x00000080
[ 1080.573138]  ffff880035b75c48 0000000000000086 ffff8800787f0000 ffff880035b75fd8
[ 1080.573142]  ffff880035b75fd8 ffff880035b75fd8 ffffffff818cb420 ffff8800787f0000
[ 1080.573145]  0000000000001000 ffff880035d64000 0000000000000001 0000000000000001
[ 1080.573149] Call Trace:
[ 1080.573154]  [<ffffffff815f2a39>] schedule+0x29/0x70
[ 1080.573158]  [<ffffffff81198394>] __sb_start_write+0xc4/0x110
[ 1080.573162]  [<ffffffff81084390>] ? wake_up_bit+0x40/0x40
[ 1080.573166]  [<ffffffff81133ff3>] generic_file_aio_write+0x63/0x100
[ 1080.573170]  [<ffffffff81211e19>] ext4_file_write+0xa9/0x490
[ 1080.573174]  [<ffffffff8115b668>] ? handle_pte_fault+0x1a8/0x430
[ 1080.573178]  [<ffffffff81195c47>] do_sync_write+0xa7/0xe0
[ 1080.573182]  [<ffffffff811962fc>] vfs_write+0xac/0x180
[ 1080.573185]  [<ffffffff811967da>] sys_pwrite64+0x9a/0xb0
[ 1080.573189]  [<ffffffff815fc219>] system_call_fastpath+0x16/0x1b
[ 1080.573619] INFO: task sadc:1653 blocked for more than 120 seconds.
[ 1080.574540] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1080.576350] sadc            D ffff88007fc14100     0  1653   1651 0x00000084
[ 1080.576354]  ffff88006069dc48 0000000000000082 ffff880077d5e540 ffff88006069dfd8
[ 1080.576358]  ffff88006069dfd8 ffff88006069dfd8 ffff88007c7f6540 ffff880077d5e540
[ 1080.576361]  ffff88006069dc28 ffff880035d64000 0000000000000001 0000000000000001
[ 1080.576364] Call Trace:
[ 1080.576369]  [<ffffffff815f2a39>] schedule+0x29/0x70
[ 1080.576373]  [<ffffffff81198394>] __sb_start_write+0xc4/0x110
[ 1080.576377]  [<ffffffff81084390>] ? wake_up_bit+0x40/0x40
[ 1080.576381]  [<ffffffff81133ff3>] generic_file_aio_write+0x63/0x100
[ 1080.576386]  [<ffffffff81211e19>] ext4_file_write+0xa9/0x490
[ 1080.576391]  [<ffffffff8116ce3d>] ? free_pages_and_swap_cache+0xad/0xd0
[ 1080.576395]  [<ffffffff81195c47>] do_sync_write+0xa7/0xe0
[ 1080.576398]  [<ffffffff811962fc>] vfs_write+0xac/0x180
[ 1080.576402]  [<ffffffff81196642>] sys_write+0x52/0xa0
[ 1080.576405]  [<ffffffff815fc219>] system_call_fastpath+0x16/0x1b
[ 1200.576177] INFO: task auditd:585 blocked for more than 120 seconds.
[ 1200.577483] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1200.579148] auditd          D ffff88007fc14100     0   585      1 0x00000000
[ 1200.579154]  ffff880079a7bc48 0000000000000086 ffff880078874bf0 ffff880079a7bfd8
[ 1200.579158]  ffff880079a7bfd8 ffff880079a7bfd8 ffff880078920000 ffff880078874bf0
[ 1200.579161]  ffff880079a7bc88 ffff880035d64000 0000000000000001 0000000000000001
[ 1200.579165] Call Trace:
[ 1200.579173]  [<ffffffff815f2a39>] schedule+0x29/0x70
[ 1200.579180]  [<ffffffff81198394>] __sb_start_write+0xc4/0x110
[ 1200.579185]  [<ffffffff81084390>] ? wake_up_bit+0x40/0x40
[ 1200.579191]  [<ffffffff81133ff3>] generic_file_aio_write+0x63/0x100
[ 1200.579196]  [<ffffffff81211e19>] ext4_file_write+0xa9/0x490
[ 1200.579200]  [<ffffffff81195c47>] do_sync_write+0xa7/0xe0
[ 1200.579204]  [<ffffffff811962fc>] vfs_write+0xac/0x180
[ 1200.579207]  [<ffffffff81196642>] sys_write+0x52/0xa0
[ 1200.579633]  [<ffffffff815f4c69>] ? do_device_not_available+0x19/0x20
[ 1200.579638]  [<ffffffff815fc219>] system_call_fastpath+0x16/0x1b
[ 1200.579638] INFO: task rs:main Q:Reg:619 blocked for more than 120 seconds.
[ 1200.580598] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1200.582191] rs:main Q:Reg   D ffff88007fc14100     0   619      1 0x00000080
[ 1200.582195]  ffff8800781f7c48 0000000000000086 ffff8800782a9950 ffff8800781f7fd8
[ 1200.582199]  ffff8800781f7fd8 ffff8800781f7fd8 ffffffff818cb420 ffff8800782a9950
[ 1200.582202]  ffff8800781f7c88 ffff880035d64000 0000000000000001 0000000000000001
[ 1200.582205] Call Trace:
[ 1200.582210]  [<ffffffff815f2a39>] schedule+0x29/0x70
[ 1200.582214]  [<ffffffff81198394>] __sb_start_write+0xc4/0x110
[ 1200.582218]  [<ffffffff81084390>] ? wake_up_bit+0x40/0x40
[ 1200.582222]  [<ffffffff81133ff3>] generic_file_aio_write+0x63/0x100
[ 1200.582226]  [<ffffffff81211e19>] ext4_file_write+0xa9/0x490
[ 1200.582232]  [<ffffffff810b9988>] ? get_futex_key+0x148/0x290
[ 1200.582236]  [<ffffffff81195c47>] do_sync_write+0xa7/0xe0
[ 1200.582240]  [<ffffffff811962fc>] vfs_write+0xac/0x180
[ 1200.582243]  [<ffffffff81196642>] sys_write+0x52/0xa0
[ 1200.582247]  [<ffffffff815fc219>] system_call_fastpath+0x16/0x1b
[ 1200.582681] INFO: task gnome-shell:1477 blocked for more than 120 seconds.
[ 1200.583605] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1200.584578] gnome-shell     D ffff88007fc14100     0  1477   1366 0x00000080
[ 1200.584581]  ffff880035b75c48 0000000000000086 ffff8800787f0000 ffff880035b75fd8
[ 1200.584583]  ffff880035b75fd8 ffff880035b75fd8 ffffffff818cb420 ffff8800787f0000
[ 1200.584585]  0000000000001000 ffff880035d64000 0000000000000001 0000000000000001
[ 1200.584587] Call Trace:
[ 1200.584590]  [<ffffffff815f2a39>] schedule+0x29/0x70
[ 1200.584593]  [<ffffffff81198394>] __sb_start_write+0xc4/0x110
[ 1200.584595]  [<ffffffff81084390>] ? wake_up_bit+0x40/0x40
[ 1200.584598]  [<ffffffff81133ff3>] generic_file_aio_write+0x63/0x100
[ 1200.584600]  [<ffffffff81211e19>] ext4_file_write+0xa9/0x490
[ 1200.584603]  [<ffffffff8115b668>] ? handle_pte_fault+0x1a8/0x430
[ 1200.584605]  [<ffffffff81195c47>] do_sync_write+0xa7/0xe0
[ 1200.584608]  [<ffffffff811962fc>] vfs_write+0xac/0x180
[ 1200.584610]  [<ffffffff811967da>] sys_pwrite64+0x9a/0xb0
[ 1200.584612]  [<ffffffff815fc219>] system_call_fastpath+0x16/0x1b
[ 1200.584617] INFO: task sadc:1653 blocked for more than 120 seconds.
[ 1200.585398] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1200.586281] sadc            D ffff88007fc14100     0  1653   1651 0x00000084
[ 1200.586283]  ffff88006069dc48 0000000000000082 ffff880077d5e540 ffff88006069dfd8
[ 1200.586285]  ffff88006069dfd8 ffff88006069dfd8 ffff88007c7f6540 ffff880077d5e540
[ 1200.586287]  ffff88006069dc28 ffff880035d64000 0000000000000001 0000000000000001
[ 1200.586289] Call Trace:
[ 1200.586292]  [<ffffffff815f2a39>] schedule+0x29/0x70
[ 1200.586295]  [<ffffffff81198394>] __sb_start_write+0xc4/0x110
[ 1200.586297]  [<ffffffff81084390>] ? wake_up_bit+0x40/0x40
[ 1200.586300]  [<ffffffff81133ff3>] generic_file_aio_write+0x63/0x100
[ 1200.586302]  [<ffffffff81211e19>] ext4_file_write+0xa9/0x490
[ 1200.586306]  [<ffffffff8116ce3d>] ? free_pages_and_swap_cache+0xad/0xd0
[ 1200.586308]  [<ffffffff81195c47>] do_sync_write+0xa7/0xe0
[ 1200.586310]  [<ffffffff811962fc>] vfs_write+0xac/0x180
[ 1200.586312]  [<ffffffff81196642>] sys_write+0x52/0xa0
[ 1200.586315]  [<ffffffff815fc219>] system_call_fastpath+0x16/0x1b
[ 1320.586188] INFO: task auditd:585 blocked for more than 120 seconds.
[ 1320.587549] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1320.589048] auditd          D ffff88007fc14100     0   585      1 0x00000000
[ 1320.589053]  ffff880079a7bc48 0000000000000086 ffff880078874bf0 ffff880079a7bfd8
[ 1320.589057]  ffff880079a7bfd8 ffff880079a7bfd8 ffff880078920000 ffff880078874bf0
[ 1320.589060]  ffff880079a7bc88 ffff880035d64000 0000000000000001 0000000000000001
[ 1320.589064] Call Trace:
[ 1320.589073]  [<ffffffff815f2a39>] schedule+0x29/0x70
[ 1320.589079]  [<ffffffff81198394>] __sb_start_write+0xc4/0x110
[ 1320.589085]  [<ffffffff81084390>] ? wake_up_bit+0x40/0x40
[ 1320.589090]  [<ffffffff81133ff3>] generic_file_aio_write+0x63/0x100
[ 1320.589095]  [<ffffffff81211e19>] ext4_file_write+0xa9/0x490
[ 1320.589099]  [<ffffffff81195c47>] do_sync_write+0xa7/0xe0
[ 1320.589103]  [<ffffffff811962fc>] vfs_write+0xac/0x180
[ 1320.589107]  [<ffffffff81196642>] sys_write+0x52/0xa0
[ 1320.589111]  [<ffffffff815f4c69>] ? do_device_not_available+0x19/0x20
[ 1320.589116]  [<ffffffff815fc219>] system_call_fastpath+0x16/0x1b
[ 1320.589120] INFO: task rs:main Q:Reg:619 blocked for more than 120 seconds.
[ 1320.590461] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1320.592056] rs:main Q:Reg   D ffff88007fc14100     0   619      1 0x00000080
[ 1320.592060]  ffff8800781f7c48 0000000000000086 ffff8800782a9950 ffff8800781f7fd8
[ 1320.592064]  ffff8800781f7fd8 ffff8800781f7fd8 ffffffff818cb420 ffff8800782a9950
[ 1320.592067]  ffff8800781f7c88 ffff880035d64000 0000000000000001 0000000000000001
[ 1320.592070] Call Trace:
[ 1320.592075]  [<ffffffff815f2a39>] schedule+0x29/0x70
[ 1320.592079]  [<ffffffff81198394>] __sb_start_write+0xc4/0x110
[ 1320.592083]  [<ffffffff81084390>] ? wake_up_bit+0x40/0x40
[ 1320.592088]  [<ffffffff81133ff3>] generic_file_aio_write+0x63/0x100
[ 1320.592092]  [<ffffffff81211e19>] ext4_file_write+0xa9/0x490
[ 1320.592097]  [<ffffffff810b9988>] ? get_futex_key+0x148/0x290
[ 1320.592101]  [<ffffffff81195c47>] do_sync_write+0xa7/0xe0
[ 1320.592105]  [<ffffffff811962fc>] vfs_write+0xac/0x180
[ 1320.592108]  [<ffffffff81196642>] sys_write+0x52/0xa0
[ 1320.592112]  [<ffffffff815fc219>] system_call_fastpath+0x16/0x1b

Comment 2 Sibiao Luo 2013-02-06 02:55:11 UTC
my qemu-kvm command line:
# /usr/libexec/qemu-kvm -M pc-1.3 -cpu Westmere -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -no-kvm-pit-reinjection -usb -device usb-tablet,id=input0 -name sluo-test -uuid 350e716b-5f98-4bf0-9a2a-c8e423295244 -rtc base=localtime,clock=host,driftfix=slew -device virtio-serial-pci,id=virtio-serial0,max_ports=16,vectors=0,bus=pci.0,addr=0x3 -chardev socket,id=channel1,path=/tmp/helloworld1,server,nowait -device virtserialport,chardev=channel1,name=com.redhat.rhevm.vdsm,bus=virtio-serial0.0,id=port1 -chardev socket,id=channel2,path=/tmp/helloworld2,server,nowait -device virtserialport,chardev=channel2,name=com.redhat.rhevm.vdsm,bus=virtio-serial0.0,id=port2 -drive file=/home/RHEL-Server-7.0-64-virtio.qcow2,if=none,id=drive-system-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device scsi-hd,drive=drive-system-disk,bus=scsi0.0,id=system-disk,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=08:2E:5F:0A:0D:B1,bus=pci.0,addr=0x5,bootindex=2,event_idx=off -device virtio-balloon-pci,id=ballooning,bus=pci.0,addr=0x6 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -device usb-ehci,id=ehci,bus=pci.0,addr=0x7 -chardev spicevmc,name=usbredir,id=usbredirchardev1 -device usb-redir,chardev=usbredirchardev1,id=usbredirdev1,bus=ehci.0,debug=3 -drive file=/home/my-data-disk.qcow2,if=none,id=drive-data-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x8 -device scsi-hd,drive=drive-data-disk,bus=scsi1.0,id=data-disk -k en-us -boot menu=on -qmp tcp:0:4444,server,nowait -serial unix:/tmp/ttyS0,server,nowait -vnc :1 -drive file=/home/floppy.vfd,if=none,id=drive-fdc0-0-0,format=raw -global isa-fdc.driveA=drive-fdc0-0-0 -drive file=/home/my-cdrom.iso,if=none,media=cdrom,format=raw,id=drive-ide1-0-1 -device ide-drive,drive=drive-ide1-0-1,id=ide1-0-1,bus=ide.0,unit=0 -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 -device virtio-serial -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0,id=sr0 -monitor stdio

Comment 3 Luiz Capitulino 2013-02-06 11:25:09 UTC
The backtrace is because you have several tasks blocked for a long time, it doesn't seem to have anything to do with the disk removal.