Bug 1132765
| Summary: | guest kernel crash while repeated hotplug/hot-unplug virtio scsi disk with busy serving I/O | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | mazhang <mazhang> |
| Component: | qemu-kvm | Assignee: | Fam Zheng <famz> |
| Status: | CLOSED DUPLICATE | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 6.6 | CC: | bsarathy, chayang, juzhang, mazhang, michen, mkenneth, qzhang, rbalakri, sluo, virt-maint, xigao |
| Target Milestone: | rc | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-03-18 06:33:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Created attachment 929415 [details]
vmcore-dmesg.txt
qemu-kvm-0.12.1.2-2.431.el6.x86_64 hit this problem.

I cannot reproduce with the steps in comment 0. I'm using a 2.6.32-525.el6.x86_64 kernel. Error messages are seen in guest dmesg:
scsi 2:0:0:0: Direct-Access QEMU QEMU HARDDISK 0.12 PQ: 0 ANSI: 5
sd 2:0:0:0: Attached scsi generic sg2 type 0
sd 2:0:0:0: [sdb] 209715200 512-byte logical blocks: (107 GB/100 GiB)
sd 2:0:0:0: [sdb] Write Protect is off
sd 2:0:0:0: [sdb] Mode Sense: 63 00 00 08
sd 2:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sdb: unknown partition table
sd 2:0:0:0: [sdb] Attached SCSI disk
sd 2:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 2:0:0:0: [sdb] CDB: Read(10): 28 00 00 14 6e 00 00 01 00 00
end_request: I/O error, dev sdb, sector 1338880
Buffer I/O error on device sdb, logical block 167360
Buffer I/O error on device sdb, logical block 167361
Buffer I/O error on device sdb, logical block 167362
Buffer I/O error on device sdb, logical block 167363
Buffer I/O error on device sdb, logical block 167364
Buffer I/O error on device sdb, logical block 167365
Buffer I/O error on device sdb, logical block 167366
Buffer I/O error on device sdb, logical block 167367
Buffer I/O error on device sdb, logical block 167368
Buffer I/O error on device sdb, logical block 167369
mazhang, can you test the latest kernel image?
Fam

Tried to reproduce this bug with the latest guest kernel, but hit a qemu-kvm crash.
Host:
qemu-kvm-tools-0.12.1.2-2.451.el6.x86_64
qemu-kvm-0.12.1.2-2.451.el6.x86_64
qemu-kvm-debuginfo-0.12.1.2-2.451.el6.x86_64
gpxe-roms-qemu-0.9.7-6.12.el6.noarch
qemu-img-0.12.1.2-2.451.el6.x86_64
2.6.32-504.el6.x86_64
Guest:
kernel-2.6.32-526.el6
Result:
qemu-kvm crash.
(gdb) bt full
#0 0x00007ffff76ef6fd in write () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00007ffff724ccf1 in ?? () from /lib64/libglib-2.0.so.0
No symbol table info available.
#2 0x00007ffff71fc837 in g_io_channel_write_chars () from /lib64/libglib-2.0.so.0
No symbol table info available.
#3 0x00007ffff7e4474e in io_channel_send (fd=0x7ffff8d1ff00, buf=0x7ffff8e87960, len=16) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-char.c:736
bytes_written = 0
offset = <value optimized out>
status = <value optimized out>
__PRETTY_FUNCTION__ = "io_channel_send"
#4 0x00007ffff7dbb471 in monitor_flush (mon=0x7ffff88f1f00) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:292
rc = <value optimized out>
len = 16
buf = 0x7ffff8e87960 "{\"return\": {}}\r\n"
#5 0x00007ffff7dbb5e4 in monitor_puts (mon=0x7ffff88f1f00, str=0x7ffff8f36fbf "") at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:326
c = <value optimized out>
#6 0x00007ffff7dbb629 in monitor_json_emitter (mon=0x7ffff88f1f00, data=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:428
json = 0x7ffff8e69fd0
__PRETTY_FUNCTION__ = "monitor_json_emitter"
#7 0x00007ffff7dbb798 in monitor_protocol_emitter (mon=0x7ffff88f1f00, data=0x0) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:464
qmp = 0x7fff4a22b020
#8 0x00007ffff7dbb960 in monitor_call_handler (mon=0x7ffff88f1f00, cmd=0x7ffff82bfa00, params=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:4390
ret = <value optimized out>
data = 0x0
#9 0x00007ffff7dbc574 in handle_qmp_command (parser=<value optimized out>, tokens=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:5003
err = <value optimized out>
obj = <value optimized out>
input = <value optimized out>
args = 0x7ffffc8eff70
cmd = 0x7ffff82bfa00
mon = 0x7ffff88f1f00
cmd_name = <value optimized out>
query_cmd = 0x0
__func__ = "handle_qmp_command"
#10 0x00007ffff7e20f04 in json_message_process_token (lexer=0x7ffff88f1fb0, token=0x7ffff909efd0, type=JSON_OPERATOR, x=52, y=499)
at /usr/src/debug/qemu-kvm-0.12.1.2/json-streamer.c:87
parser = 0x7ffff88f1fa8
dict = 0x7fff48cb7f00
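A note for readers of the trace above: the innermost frames are `write()` called from `monitor_flush()`, which is consistent with a SIGPIPE stop, the conclusion a later comment in this report reaches. A process that writes to a pipe or socket whose other end has closed is sent SIGPIPE, and gdb halts on that signal by default. The following is a minimal shell sketch of that behaviour. It does not involve QEMU; the `yes | head` pipeline is only an illustrative stand-in for a monitor client that exits while the server is still writing.

```shell
#!/bin/bash
# Illustrative stand-in, not QEMU: 'yes' plays the writer (the monitor),
# 'head' plays a client that reads one line and then goes away.
yes | head -n 1 >/dev/null

# PIPESTATUS must be read immediately after the pipeline.
writer_status=${PIPESTATUS[0]}

# With the default SIGPIPE disposition on Linux this is typically 141
# (128 + signal number 13). If SIGPIPE is ignored in the environment,
# the writer instead sees EPIPE and exits with an ordinary error code.
echo "writer exit status: $writer_status"
```

When debugging, gdb's `handle SIGPIPE nostop noprint pass` keeps the debugger from halting on this signal, so a closed monitor connection is not mistaken for a crash.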
Created attachment 984509 [details]
back trace
It's not a crash, because your monitor connection is disconnected for some reason, which is not related to virtio-scsi or guest kernel. Do you get this all the time? What are the steps?
Anyway it's different from results in comment 0.
Fam

(In reply to Fam Zheng from comment #7)
> It's not a crash, because your monitor connection is disconnected for some
> reason, which is not related to virtio-scsi or guest kernel. Do you get this
> all the time? What are the steps?
>
> Anyway it's different from results in comment 0.
>
> Fam
Not all the time, about 50%, and I can't reproduce the guest kernel crash.
Steps are the same as in comment 0.

So I'm going to close this bug as the guest crash issue is gone. Please file new bugs if there are other issues. (Again, SIGPIPE is not a crash, it's just the other end of the monitor is closed. In your test it's possibly the bash script exited.)
Fam

(In reply to Fam Zheng from comment #9)
> So I'm going to close this bug as the guest crash issue is gone. Please file
> new bugs if there are other issues. (Again, SIGPIPE is not a crash, it's
> just the other end of the monitor is closed. In your test it's possibly the
> bash script exited.)
>
> Fam
Set gdb SIGPIPE nostop, break didn't happen.

Hit this problem on kernel-2.6.32-544.el6.
Host:
2.6.32-544.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.457.el6.x86_64
qemu-img-rhev-0.12.1.2-2.457.el6.x86_64
qemu-kvm-rhev-debuginfo-0.12.1.2-2.457.el6.x86_64
qemu-kvm-rhev-tools-0.12.1.2-2.457.el6.x86_64
Guest:
2.6.32-544.el6.x86_64
Steps:
1. Boot vm
/usr/libexec/qemu-kvm \
-M pc \
-cpu SandyBridge \
-m 2G \
-smp 4,sockets=2,cores=2,threads=1 \
-enable-kvm \
-name rhel6 \
-uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
-smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 \
-k en-us \
-rtc base=localtime,driftfix=slew \
-nodefaults \
-monitor stdio \
-qmp tcp:0:6779,server,nowait \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-serial unix:/tmp/console0,server,nowait \
-spice port=5900,disable-ticketing \
-vga qxl \
-usb -device usb-tablet,id=input0 \
-netdev tap,id=tap0 \
-device virtio-net-pci,netdev=tap0,id=net0,mac=52:54:00:11:11:15 \
-device virtio-scsi-pci,id=scsi0 \
-drive file=/home/rhel6-64.qcow2,if=none,id=drive-scsi-disk,format=qcow2,cache=none,werror=stop,rerror=stop \
-device scsi-hd,drive=drive-scsi-disk,bus=scsi0.0,scsi-id=0,lun=0,id=scsi-disk,bootindex=1 \
-device virtio-scsi-pci,id=bus1,bus=pci.0,addr=0x7 \
-drive file=/home/storage0.qcow2,if=none,media=disk,format=qcow2,rerror=stop,werror=stop,cache=none,aio=native,id=scsi-disk0 \
-device scsi-hd,bus=bus1.0,drive=scsi-disk0,id=disk
2. Run dd test in guest.
# dd if=/dev/zero of=/dev/sdb bs=1M count=10240
3. Hotplug/unplug virtio scsi disk.
[root@dhcp-9-236 ~]# cat repeat-hotplug.sh
#!/bin/bash
# Repeated virtio-scsi hotplug/unplug stress test, driven over QMP.
let i=0
# Open a read/write TCP connection to the QMP socket on fd 3.
exec 3<>/dev/tcp/localhost/6779
echo -e "{ 'execute': 'qmp_capabilities' }" >&3
read response <&3
echo "$response"
while [ $i -lt 200 ]
do
    # Hotplug: controller, then backing drive, then the disk itself.
    echo -e '{"execute":"device_add","arguments":{"driver":"virtio-scsi-pci","id":"test30"}}' >&3
    read response <&3; echo "$i: $response"
    echo -e '{"execute":"__com.redhat_drive_add", "arguments": {"file":"/tmp/storage.qcow2","format":"qcow2","id":"test30"}}' >&3
    read response <&3; echo "$i: $response"
    echo -e '{"execute":"device_add","arguments":{"driver":"scsi-hd","drive":"test30","id":"test31"}}' >&3
    read response <&3; echo "$i: $response"
    sleep 2
    # Unplug in reverse order: disk first, then controller.
    echo -e '{"execute":"device_del","arguments":{"id":"test31"}}' >&3
    read response <&3; echo "$i: $response"
    echo -e '{"execute":"device_del","arguments":{"id":"test30"}}' >&3
    read response <&3; echo "$i: $response"
    let i=$i+1
    sleep 2
done
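One caveat about reading replies this way: the script treats each single line read from the socket as the reply to the command just sent. A QMP server sends a greeting banner on connect and can deliver asynchronous event messages on the same connection, so a one-line `read` may return something other than the reply. The sketch below shows one way to skip such lines. The function name `read_qmp_reply` is hypothetical, the matching is a plain substring heuristic rather than a JSON parse, and the input is canned rather than taken from a live monitor.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): read from file
# descriptor $1 until a line that looks like a QMP command reply arrives.
# Matching is a plain substring heuristic, not a JSON parse.
read_qmp_reply() {
    local fd=$1 line
    while IFS= read -r -u "$fd" line; do
        line=${line%$'\r'}                 # strip the trailing CR of "\r\n"
        case $line in
            *'"event"'*) continue ;;       # asynchronous event: skip it
            *'"return"'*|*'"error"'*)      # success or failure reply
                printf '%s\n' "$line"
                return 0 ;;
        esac                               # anything else (e.g. greeting): skip
    done
    return 1                               # stream ended without a reply
}

# Self-contained demonstration using canned input instead of a live monitor.
exec 9< <(printf '%s\r\n' \
    '{"QMP": {"version": {}, "capabilities": []}}' \
    '{"timestamp": {"seconds": 0, "microseconds": 0}, "event": "RESET"}' \
    '{"return": {}}')
reply=$(read_qmp_reply 9)
echo "reply: $reply"
exec 9<&-
```

In a loop like the one above, each `read response <&3` could be replaced by `response=$(read_qmp_reply 3)`, assuming the helper is defined earlier in the script. The non-zero return on end of stream also gives the loop a way to stop once the monitor connection has gone away.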
Created attachment 1002139 [details]
vmcore-dmesg
Yes. Might be the same with bz 1199421. I'm looking at it.

*** This bug has been marked as a duplicate of bug 1199421 ***

mazhang, could you test this build (fix for 1199421) to make sure this one is a duplicate?
https://brewweb.devel.redhat.com/taskinfo?taskID=8864631
Description of problem:
Guest kernel crashes during repeated hotplug/hot-unplug of a virtio-scsi disk while the guest is busy serving I/O.

Version-Release number of selected component (if applicable):
Host:
qemu-img-0.12.1.2-2.438.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.438.el6.x86_64
qemu-kvm-0.12.1.2-2.438.el6.x86_64
gpxe-roms-qemu-0.9.7-6.12.el6.noarch
qemu-kvm-debuginfo-0.12.1.2-2.438.el6.x86_64
kernel-2.6.32-497.el6.x86_64
Guest:
kernel-2.6.32-497.el6.x86_64

How reproducible:
3/3

Steps to Reproduce:
1. Boot guest:
/usr/libexec/qemu-kvm \
-machine rhel6.6.0,dump-guest-core=off \
-cpu SandyBridge \
-m 2G \
-smp 4,sockets=2,cores=2,threads=1,maxcpus=160 \
-enable-kvm \
-name rhel6.6 \
-uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
-smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 \
-k en-us \
-rtc base=localtime,clock=host,driftfix=slew \
-nodefaults \
-monitor stdio \
-qmp tcp:0:5555,server,nowait \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-monitor unix:/tmp/monitor2,server,nowait \
-vga qxl \
-spice port=5900,disable-ticketing \
-netdev tap,id=hostnet0,vhost=on \
-device e1000,netdev=hostnet0,id=net0,mac=00:01:02:B6:40:22 \
-usb \
-device usb-tablet,id=tablet0 \
-device virtio-scsi-pci,id=scsi0 \
-drive file=/home/RHEL-Server-6.6-64.qcow2,if=none,media=disk,id=drive-scsi-disk,format=qcow2,cache=none,werror=stop,rerror=stop,aio=native \
-device scsi-hd,drive=drive-scsi-disk,bus=scsi0.0,id=scsi-disk0,bootindex=0
2. Hotplug/unplug disk.
[root@dhcp-11-12 ~]# cat repeated_plug_and_unplug.sh
#!/bin/bash
# Repeated virtio-scsi hotplug/unplug stress test, driven over QMP.
let i=0
exec 3<>/dev/tcp/localhost/5555
echo -e "{ 'execute': 'qmp_capabilities' }" >&3
read response <&3
echo "$response"
while [ $i -lt 100 ]
do
    echo -e '{"execute":"device_add","arguments":{"driver":"virtio-scsi-pci","id":"test30"}}' >&3
    read response <&3; echo "$i: $response"
    echo -e '{"execute":"__com.redhat_drive_add", "arguments": {"file":"/home/storage.qcow2","format":"qcow2","id":"test30"}}' >&3
    read response <&3; echo "$i: $response"
    echo -e '{"execute":"device_add","arguments":{"driver":"scsi-hd","drive":"test30","id":"test31"}}' >&3
    read response <&3; echo "$i: $response"
    sleep 1
    echo -e '{"execute":"device_del","arguments":{"id":"test31"}}' >&3
    read response <&3; echo "$i: $response"
    echo -e '{"execute":"device_del","arguments":{"id":"test30"}}' >&3
    read response <&3; echo "$i: $response"
    let i=$i+1
    sleep 1
done
3. Run dd test in guest during hotplug/unplug disk.
[root@vm0 ~]# dd if=/dev/zero of=/home/aaa bs=1M count=4096

Actual results:
Guest kernel crash.
This GDB was configured as "x86_64-unknown-linux-gnu"...
KERNEL: /usr/lib/debug/lib/modules/2.6.32-497.el6.x86_64/vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 4
DATE: Fri Aug 22 19:07:43 2014
UPTIME: 00:12:22
LOAD AVERAGE: 3.01, 1.29, 0.51
TASKS: 213
NODENAME: vm0
RELEASE: 2.6.32-497.el6.x86_64
VERSION: #1 SMP Fri Aug 15 17:13:42 EDT 2014
MACHINE: x86_64 (3392 Mhz)
MEMORY: 2 GB
PANIC: ""
PID: 3391
COMMAND: "hald-probe-stor"
TASK: ffff88007a062ae0 [THREAD_INFO: ffff880076e90000]
CPU: 2
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 3391 TASK: ffff88007a062ae0 CPU: 2 COMMAND: "hald-probe-stor"
#0 [ffff880076e91640] machine_kexec at ffffffff8103b5cb
#1 [ffff880076e916a0] crash_kexec at ffffffff810c9922
#2 [ffff880076e91770] oops_end at ffffffff8152dd50
#3 [ffff880076e917a0] die at ffffffff81010fab
#4 [ffff880076e917d0] do_general_protection at ffffffff8152d852
#5 [ffff880076e91800] general_protection at ffffffff8152d025
    [exception RIP: strnlen+9]
    RIP: ffffffff81293449 RSP: ffff880076e918b8 RFLAGS: 00010086
    RAX: ffffffff817bbf5b RBX: ffffffff81eb8c00 RCX: 0000000000000002
    RDX: 0002000100001000 RSI: ffffffffffffffff RDI: 0002000100001000
    RBP: ffff880076e918b8 R8: 0000000000000073 R9: 27203a7463656a62
    R10: 726177647261483e R11: 203a656d616e2065 R12: ffffffff81eb880d
    R13: 0002000100001000 R14: 00000000ffffffff R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#6 [ffff880076e918c0] string at ffffffff81294730
#7 [ffff880076e91900] vsnprintf at ffffffff81296168
#8 [ffff880076e919a0] vscnprintf at ffffffff812965f1
#9 [ffff880076e919c0] vprintk at ffffffff81075bc6
#10 [ffff880076e91a60] warn_slowpath_common at ffffffff81074ded
#11 [ffff880076e91aa0] warn_slowpath_fmt at ffffffff81074ee6
#12 [ffff880076e91b00] kobject_put at ffffffff8128dac0
#13 [ffff880076e91b20] put_device at ffffffff81367727
#14 [ffff880076e91b30] scsi_host_dev_release at ffffffff8138089c
#15 [ffff880076e91b60] device_release at ffffffff81367ec7
#16 [ffff880076e91b80] kobject_release at ffffffff8128dc1d
#17 [ffff880076e91bb0] kref_put at ffffffff8128f107
#18 [ffff880076e91bd0] kobject_put at ffffffff8128da97
#19 [ffff880076e91bf0] put_device at ffffffff81367727
#20 [ffff880076e91c00] scsi_target_dev_release at ffffffff81389852
#21 [ffff880076e91c20] device_release at ffffffff81367ec7
#22 [ffff880076e91c40] kobject_release at ffffffff8128dc1d
#23 [ffff880076e91c70] kref_put at ffffffff8128f107
#24 [ffff880076e91c90] kobject_put at ffffffff8128da97
#25 [ffff880076e91cb0] put_device at ffffffff81367727
#26 [ffff880076e91cc0] scsi_device_dev_release_usercontext at ffffffff8138d3c0
#27 [ffff880076e91d10] execute_in_process_context at ffffffff81098955
#28 [ffff880076e91d20] scsi_device_dev_release at ffffffff8138d2cc
#29 [ffff880076e91d30] device_release at ffffffff81367ec7
#30 [ffff880076e91d50] kobject_release at ffffffff8128dc1d
#31 [ffff880076e91d80] kref_put at ffffffff8128f107
#32 [ffff880076e91da0] kobject_put at ffffffff8128da97
#33 [ffff880076e91dc0] put_device at ffffffff81367727
#34 [ffff880076e91dd0] scsi_device_put at ffffffff8137e2f4
#35 [ffff880076e91df0] scsi_disk_put at ffffffffa006952a [sd_mod]
#36 [ffff880076e91e10] sd_release at ffffffffa006a708 [sd_mod]
#37 [ffff880076e91e30] __blkdev_put at ffffffff811cb6d6
#38 [ffff880076e91e80] blkdev_put at ffffffff811cb6f0
#39 [ffff880076e91e90] blkdev_close at ffffffff811cb733
#40 [ffff880076e91ec0] __fput at ffffffff8118f7c5
#41 [ffff880076e91f10] fput at ffffffff8118f905
#42 [ffff880076e91f20] filp_close at ffffffff8118ab5d
#43 [ffff880076e91f50] sys_close at ffffffff8118ac35
#44 [ffff880076e91f80] system_call_fastpath at ffffffff8100b072
    RIP: 0000003ed380e7a0 RSP: 00007fff31f65f38 RFLAGS: 00010202
    RAX: 0000000000000003 RBX: ffffffff8100b072 RCX: 0000003ed8e20bc0
    RDX: 0000000000f7c360 RSI: 0000000000000000 RDI: 0000000000000004
    RBP: 0000000000f7a010 R8: 00000000ffffffff R9: 0000000000200000
    R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000f7b240
    R13: 0000000000f7a420 R14: 00007fff31f65fc0 R15: 0000000000f7a420
    ORIG_RAX: 0000000000000003 CS: 0033 SS: 002b

Expected results:
Guest works well.

Additional info: