Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1132765

Summary: guest kernel crash while repeated hotplug/hot-unplug virtio scsi disk with busy serving I/O
Product: Red Hat Enterprise Linux 6
Reporter: mazhang <mazhang>
Component: qemu-kvm
Assignee: Fam Zheng <famz>
Status: CLOSED DUPLICATE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 6.6
CC: bsarathy, chayang, juzhang, mazhang, michen, mkenneth, qzhang, rbalakri, sluo, virt-maint, xigao
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-03-18 06:33:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  vmcore-dmesg.txt (flags: none)
  back trace (flags: none)
  vmcore-dmesg (flags: none)

Description mazhang 2014-08-22 03:23:41 UTC
Description of problem:
Guest kernel crash while repeated hotplug/hot-unplug virtio scsi disk with busy serving I/O

Version-Release number of selected component (if applicable):

Host:
qemu-img-0.12.1.2-2.438.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.438.el6.x86_64
qemu-kvm-0.12.1.2-2.438.el6.x86_64
gpxe-roms-qemu-0.9.7-6.12.el6.noarch
qemu-kvm-debuginfo-0.12.1.2-2.438.el6.x86_64
kernel-2.6.32-497.el6.x86_64

Guest:
kernel-2.6.32-497.el6.x86_64

How reproducible:
3/3

Steps to Reproduce:
1. Boot guest:
/usr/libexec/qemu-kvm \
-machine rhel6.6.0,dump-guest-core=off \
-cpu SandyBridge \
-m 2G \
-smp 4,sockets=2,cores=2,threads=1,maxcpus=160 \
-enable-kvm \
-name rhel6.6 \
-uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
-smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 \
-k en-us \
-rtc base=localtime,clock=host,driftfix=slew \
-nodefaults \
-monitor stdio \
-qmp tcp:0:5555,server,nowait \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-monitor unix:/tmp/monitor2,server,nowait \
-vga qxl \
-spice port=5900,disable-ticketing \
-netdev tap,id=hostnet0,vhost=on \
-device e1000,netdev=hostnet0,id=net0,mac=00:01:02:B6:40:22 \
-usb \
-device usb-tablet,id=tablet0 \
-device virtio-scsi-pci,id=scsi0 \
-drive file=/home/RHEL-Server-6.6-64.qcow2,if=none,media=disk,id=drive-scsi-disk,format=qcow2,cache=none,werror=stop,rerror=stop,aio=native \
-device scsi-hd,drive=drive-scsi-disk,bus=scsi0.0,id=scsi-disk0,bootindex=0 \

2. Hotplug/unplug the disk.

[root@dhcp-11-12 ~]# cat repeated_plug_and_unplug.sh 
#!/bin/bash
# a simple hotplug/unplug stress test

let i=0
exec 3<>/dev/tcp/localhost/5555
echo -e "{ 'execute': 'qmp_capabilities' }" >&3
read response <&3
echo $response
while [ $i -lt 100 ]
do
    echo -e '{"execute":"device_add","arguments":{"driver":"virtio-scsi-pci","id":"test30"}}' >&3
    read response <&3;  echo "$i: $response"
    echo -e '{"execute":"__com.redhat_drive_add", "arguments": {"file":"/home/storage.qcow2","format":"qcow2","id":"test30"}}' >&3
    read response <&3;  echo "$i: $response"
    echo -e '{"execute":"device_add","arguments":{"driver":"scsi-hd","drive":"test30","id":"test31"}}' >&3
    read response <&3;  echo "$i: $response"
    sleep 1
    echo -e '{"execute":"device_del","arguments":{"id":"test31"}}' >&3
    read response <&3;  echo "$i: $response"
    echo -e '{"execute":"device_del","arguments":{"id":"test30"}}' >&3
    read response <&3;  echo "$i: $response"
    let i=$i+1
    sleep 1
done
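As an editorial aside (not part of the original report): the script above drives QMP by hand over bash's `/dev/tcp`, and the same plug/unplug message sequence can be sketched in Python. The helper name `plug_unplug_cycle` is hypothetical; actually issuing the commands would require a live QMP socket on port 5555, so this sketch only builds and prints the payloads.

```python
#!/usr/bin/env python3
# Sketch of the QMP message sequence used by repeated_plug_and_unplug.sh.
# This only constructs the JSON payloads; sending them would need a live
# QMP connection, e.g. socket.create_connection(("localhost", 5555)).
import json

def plug_unplug_cycle(drive_file="/home/storage.qcow2"):
    """Return one hotplug/hot-unplug cycle as a list of QMP command dicts."""
    return [
        {"execute": "device_add",
         "arguments": {"driver": "virtio-scsi-pci", "id": "test30"}},
        # __com.redhat_drive_add is the downstream RHEL-6 qemu-kvm command
        # used in the report; upstream QEMU exposes different block commands.
        {"execute": "__com.redhat_drive_add",
         "arguments": {"file": drive_file, "format": "qcow2", "id": "test30"}},
        {"execute": "device_add",
         "arguments": {"driver": "scsi-hd", "drive": "test30", "id": "test31"}},
        {"execute": "device_del", "arguments": {"id": "test31"}},
        {"execute": "device_del", "arguments": {"id": "test30"}},
    ]

if __name__ == "__main__":
    for cmd in plug_unplug_cycle():
        print(json.dumps(cmd))
```

The script's 1-second sleep between plug and unplug is what races the in-guest I/O against device teardown.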


3. Run a dd test in the guest during disk hotplug/unplug.
[root@vm0 ~]# dd if=/dev/zero of=/home/aaa bs=1M count=4096

Actual results:
Guest kernel crash.

This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-497.el6.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 4
        DATE: Fri Aug 22 19:07:43 2014
      UPTIME: 00:12:22
LOAD AVERAGE: 3.01, 1.29, 0.51
       TASKS: 213
    NODENAME: vm0
     RELEASE: 2.6.32-497.el6.x86_64
     VERSION: #1 SMP Fri Aug 15 17:13:42 EDT 2014
     MACHINE: x86_64  (3392 Mhz)
      MEMORY: 2 GB
       PANIC: ""
         PID: 3391
     COMMAND: "hald-probe-stor"
        TASK: ffff88007a062ae0  [THREAD_INFO: ffff880076e90000]
         CPU: 2
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 3391   TASK: ffff88007a062ae0  CPU: 2   COMMAND: "hald-probe-stor"
 #0 [ffff880076e91640] machine_kexec at ffffffff8103b5cb
 #1 [ffff880076e916a0] crash_kexec at ffffffff810c9922
 #2 [ffff880076e91770] oops_end at ffffffff8152dd50
 #3 [ffff880076e917a0] die at ffffffff81010fab
 #4 [ffff880076e917d0] do_general_protection at ffffffff8152d852
 #5 [ffff880076e91800] general_protection at ffffffff8152d025
    [exception RIP: strnlen+9]
    RIP: ffffffff81293449  RSP: ffff880076e918b8  RFLAGS: 00010086
    RAX: ffffffff817bbf5b  RBX: ffffffff81eb8c00  RCX: 0000000000000002
    RDX: 0002000100001000  RSI: ffffffffffffffff  RDI: 0002000100001000
    RBP: ffff880076e918b8   R8: 0000000000000073   R9: 27203a7463656a62
    R10: 726177647261483e  R11: 203a656d616e2065  R12: ffffffff81eb880d
    R13: 0002000100001000  R14: 00000000ffffffff  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff880076e918c0] string at ffffffff81294730
 #7 [ffff880076e91900] vsnprintf at ffffffff81296168
 #8 [ffff880076e919a0] vscnprintf at ffffffff812965f1
 #9 [ffff880076e919c0] vprintk at ffffffff81075bc6
#10 [ffff880076e91a60] warn_slowpath_common at ffffffff81074ded
#11 [ffff880076e91aa0] warn_slowpath_fmt at ffffffff81074ee6
#12 [ffff880076e91b00] kobject_put at ffffffff8128dac0
#13 [ffff880076e91b20] put_device at ffffffff81367727
#14 [ffff880076e91b30] scsi_host_dev_release at ffffffff8138089c
#15 [ffff880076e91b60] device_release at ffffffff81367ec7
#16 [ffff880076e91b80] kobject_release at ffffffff8128dc1d
#17 [ffff880076e91bb0] kref_put at ffffffff8128f107
#18 [ffff880076e91bd0] kobject_put at ffffffff8128da97
#19 [ffff880076e91bf0] put_device at ffffffff81367727
#20 [ffff880076e91c00] scsi_target_dev_release at ffffffff81389852
#21 [ffff880076e91c20] device_release at ffffffff81367ec7
#22 [ffff880076e91c40] kobject_release at ffffffff8128dc1d
#23 [ffff880076e91c70] kref_put at ffffffff8128f107
#24 [ffff880076e91c90] kobject_put at ffffffff8128da97
#25 [ffff880076e91cb0] put_device at ffffffff81367727
#26 [ffff880076e91cc0] scsi_device_dev_release_usercontext at ffffffff8138d3c0
#27 [ffff880076e91d10] execute_in_process_context at ffffffff81098955
#28 [ffff880076e91d20] scsi_device_dev_release at ffffffff8138d2cc
#29 [ffff880076e91d30] device_release at ffffffff81367ec7
#30 [ffff880076e91d50] kobject_release at ffffffff8128dc1d
#31 [ffff880076e91d80] kref_put at ffffffff8128f107
#32 [ffff880076e91da0] kobject_put at ffffffff8128da97
#33 [ffff880076e91dc0] put_device at ffffffff81367727
#34 [ffff880076e91dd0] scsi_device_put at ffffffff8137e2f4
#35 [ffff880076e91df0] scsi_disk_put at ffffffffa006952a [sd_mod]
#36 [ffff880076e91e10] sd_release at ffffffffa006a708 [sd_mod]
#37 [ffff880076e91e30] __blkdev_put at ffffffff811cb6d6
#38 [ffff880076e91e80] blkdev_put at ffffffff811cb6f0
#39 [ffff880076e91e90] blkdev_close at ffffffff811cb733
#40 [ffff880076e91ec0] __fput at ffffffff8118f7c5
#41 [ffff880076e91f10] fput at ffffffff8118f905
#42 [ffff880076e91f20] filp_close at ffffffff8118ab5d
#43 [ffff880076e91f50] sys_close at ffffffff8118ac35
#44 [ffff880076e91f80] system_call_fastpath at ffffffff8100b072
    RIP: 0000003ed380e7a0  RSP: 00007fff31f65f38  RFLAGS: 00010202
    RAX: 0000000000000003  RBX: ffffffff8100b072  RCX: 0000003ed8e20bc0
    RDX: 0000000000f7c360  RSI: 0000000000000000  RDI: 0000000000000004
    RBP: 0000000000f7a010   R8: 00000000ffffffff   R9: 0000000000200000
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000f7b240
    R13: 0000000000f7a420  R14: 00007fff31f65fc0  R15: 0000000000f7a420
    ORIG_RAX: 0000000000000003  CS: 0033  SS: 002b
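An analysis aside (not from the original report): in the register dump above, RDI/RDX hold the non-canonical value 0x0002000100001000 that `strnlen` faulted on, while R9-R11 contain ASCII bytes. Since x86-64 is little-endian, decoding those registers recovers fragments of the kobject WARN message that was being formatted when the stale string pointer was dereferenced:

```python
# Decode the ASCII payload visible in the crash registers above.
# Each 64-bit register holds 8 bytes in little-endian order, i.e. the
# text reads from the lowest byte upward.
import struct

def reg_to_ascii(value):
    """Render a 64-bit register value as its little-endian byte string."""
    return struct.pack("<Q", value).decode("ascii")

for name, value in [("R9",  0x27203A7463656A62),
                    ("R10", 0x726177647261483E),
                    ("R11", 0x203A656D616E2065)]:
    print(name, repr(reg_to_ascii(value)))
# R9/R10/R11 decode to "bject: '", ">Hardwar", "e name: " -- pieces of a
# kobject warning string, consistent with the vsnprintf/string frames in
# the backtrace.
```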


Expected results:
Guest works well.

Additional info:

Comment 1 mazhang 2014-08-22 03:24:24 UTC
Created attachment 929415 [details]
vmcore-dmesg.txt

Comment 3 mazhang 2014-08-26 06:34:56 UTC
qemu-kvm-0.12.1.2-2.431.el6.x86_64 hit this problem.

Comment 4 Fam Zheng 2015-01-26 06:57:19 UTC
I cannot reproduce this with the steps in comment 0. I'm using a 2.6.32-525.el6.x86_64 kernel.

Error messages are seen in guest dmesg:

scsi 2:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    0.12 PQ: 0 ANSI: 5
sd 2:0:0:0: Attached scsi generic sg2 type 0
sd 2:0:0:0: [sdb] 209715200 512-byte logical blocks: (107 GB/100 GiB)
sd 2:0:0:0: [sdb] Write Protect is off
sd 2:0:0:0: [sdb] Mode Sense: 63 00 00 08
sd 2:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
 sdb: unknown partition table
sd 2:0:0:0: [sdb] Attached SCSI disk
sd 2:0:0:0: [sdb]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 2:0:0:0: [sdb] CDB: Read(10): 28 00 00 14 6e 00 00 01 00 00
end_request: I/O error, dev sdb, sector 1338880
Buffer I/O error on device sdb, logical block 167360
Buffer I/O error on device sdb, logical block 167361
Buffer I/O error on device sdb, logical block 167362
Buffer I/O error on device sdb, logical block 167363
Buffer I/O error on device sdb, logical block 167364
Buffer I/O error on device sdb, logical block 167365
Buffer I/O error on device sdb, logical block 167366
Buffer I/O error on device sdb, logical block 167367
Buffer I/O error on device sdb, logical block 167368
Buffer I/O error on device sdb, logical block 167369

mazhang, can you test the latest kernel image?

Fam

Comment 5 mazhang 2015-01-27 07:43:14 UTC
Tried to reproduce this bug with the latest guest kernel, but hit a qemu-kvm crash instead.

Host:
qemu-kvm-tools-0.12.1.2-2.451.el6.x86_64
qemu-kvm-0.12.1.2-2.451.el6.x86_64
qemu-kvm-debuginfo-0.12.1.2-2.451.el6.x86_64
gpxe-roms-qemu-0.9.7-6.12.el6.noarch
qemu-img-0.12.1.2-2.451.el6.x86_64
2.6.32-504.el6.x86_64

Guest:
kernel-2.6.32-526.el6

Result:
qemu-kvm crash.

(gdb) bt full
#0  0x00007ffff76ef6fd in write () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007ffff724ccf1 in ?? () from /lib64/libglib-2.0.so.0
No symbol table info available.
#2  0x00007ffff71fc837 in g_io_channel_write_chars () from /lib64/libglib-2.0.so.0
No symbol table info available.
#3  0x00007ffff7e4474e in io_channel_send (fd=0x7ffff8d1ff00, buf=0x7ffff8e87960, len=16) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-char.c:736
        bytes_written = 0
        offset = <value optimized out>
        status = <value optimized out>
        __PRETTY_FUNCTION__ = "io_channel_send"
#4  0x00007ffff7dbb471 in monitor_flush (mon=0x7ffff88f1f00) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:292
        rc = <value optimized out>
        len = 16
        buf = 0x7ffff8e87960 "{\"return\": {}}\r\n"
#5  0x00007ffff7dbb5e4 in monitor_puts (mon=0x7ffff88f1f00, str=0x7ffff8f36fbf "") at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:326
        c = <value optimized out>
#6  0x00007ffff7dbb629 in monitor_json_emitter (mon=0x7ffff88f1f00, data=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:428
        json = 0x7ffff8e69fd0
        __PRETTY_FUNCTION__ = "monitor_json_emitter"
#7  0x00007ffff7dbb798 in monitor_protocol_emitter (mon=0x7ffff88f1f00, data=0x0) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:464
        qmp = 0x7fff4a22b020
#8  0x00007ffff7dbb960 in monitor_call_handler (mon=0x7ffff88f1f00, cmd=0x7ffff82bfa00, params=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:4390
        ret = <value optimized out>
        data = 0x0
#9  0x00007ffff7dbc574 in handle_qmp_command (parser=<value optimized out>, tokens=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/monitor.c:5003
        err = <value optimized out>
        obj = <value optimized out>
        input = <value optimized out>
        args = 0x7ffffc8eff70
        cmd = 0x7ffff82bfa00
        mon = 0x7ffff88f1f00
        cmd_name = <value optimized out>
        query_cmd = 0x0
        __func__ = "handle_qmp_command"
#10 0x00007ffff7e20f04 in json_message_process_token (lexer=0x7ffff88f1fb0, token=0x7ffff909efd0, type=JSON_OPERATOR, x=52, y=499)
    at /usr/src/debug/qemu-kvm-0.12.1.2/json-streamer.c:87
        parser = 0x7ffff88f1fa8
        dict = 0x7fff48cb7f00

Comment 6 mazhang 2015-01-27 07:45:44 UTC
Created attachment 984509 [details]
back trace

Comment 7 Fam Zheng 2015-01-27 11:48:52 UTC
It's not a crash, because your monitor connection is disconnected for some reason, which is not related to virtio-scsi or guest kernel. Do you get this all the time? What are the steps?

Anyway it's different from results in comment 0.

Fam

Comment 8 mazhang 2015-01-28 01:45:41 UTC
(In reply to Fam Zheng from comment #7)
> It's not a crash, because your monitor connection is disconnected for some
> reason, which is not related to virtio-scsi or guest kernel. Do you get this
> all the time? What are the steps?
> 
> Anyway it's different from results in comment 0.
> 
> Fam

Not all the time, about 50%, and I can't reproduce the guest kernel crash.
The steps are the same as in comment 0.

Comment 9 Fam Zheng 2015-01-28 02:55:26 UTC
So I'm going to close this bug as the guest crash issue is gone. Please file new bugs if there are other issues. (Again, SIGPIPE is not a crash, it's just the other end of the monitor is closed. In your test it's possibly the bash script exited.)

Fam

Comment 10 mazhang 2015-01-28 05:19:19 UTC
(In reply to Fam Zheng from comment #9)
> So I'm going to close this bug as the guest crash issue is gone. Please file
> new bugs if there are other issues. (Again, SIGPIPE is not a crash, it's
> just the other end of the monitor is closed. In your test it's possibly the
> bash script exited.)
> 
> Fam

With gdb set to "handle SIGPIPE nostop", the break didn't happen.
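For context (an editorial aside, not from the original report): the SIGPIPE gdb was stopping on means qemu-kvm wrote its QMP reply after the peer end of the connection had closed, which is exactly what happens when the test script exits. A minimal sketch of that failure mode, using a Unix socketpair in place of the QMP socket; Python ignores SIGPIPE by default, so the error surfaces as BrokenPipeError rather than a signal:

```python
# Hypothetical demonstration of the EPIPE failure mode discussed above:
# once the peer of a stream socket is gone, further writes fail.
import errno
import socket

mon, client = socket.socketpair()   # stand-in for the QMP connection
client.close()                      # the test script exits; peer disappears
try:
    mon.send(b'{"return": {}}\r\n')  # what monitor_flush() was writing
    print("send succeeded")
except BrokenPipeError as exc:
    print("EPIPE:", exc.errno == errno.EPIPE)
finally:
    mon.close()
```

In a process with the default SIGPIPE disposition (like qemu-kvm), the same write delivers SIGPIPE, which gdb reports as a stop unless told "handle SIGPIPE nostop".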

Comment 11 mazhang 2015-03-16 07:29:28 UTC
Hit this problem on kernel-2.6.32-544.el6.

Host:
2.6.32-544.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.457.el6.x86_64
qemu-img-rhev-0.12.1.2-2.457.el6.x86_64
qemu-kvm-rhev-debuginfo-0.12.1.2-2.457.el6.x86_64
qemu-kvm-rhev-tools-0.12.1.2-2.457.el6.x86_64

Guest:
2.6.32-544.el6.x86_64

Steps:
1. Boot vm
/usr/libexec/qemu-kvm \
-M pc \
-cpu SandyBridge \
-m 2G \
-smp 4,sockets=2,cores=2,threads=1 \
-enable-kvm \
-name rhel6 \
-uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
-smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 \
-k en-us \
-rtc base=localtime,driftfix=slew \
-nodefaults \
-monitor stdio \
-qmp tcp:0:6779,server,nowait \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-serial unix:/tmp/console0,server,nowait \
-spice port=5900,disable-ticketing \
-vga qxl \
-usb -device usb-tablet,id=input0 \
-netdev tap,id=tap0 \
-device virtio-net-pci,netdev=tap0,id=net0,mac=52:54:00:11:11:15 \
-device virtio-scsi-pci,id=scsi0 \
-drive file=/home/rhel6-64.qcow2,if=none,id=drive-scsi-disk,format=qcow2,cache=none,werror=stop,rerror=stop \
-device scsi-hd,drive=drive-scsi-disk,bus=scsi0.0,scsi-id=0,lun=0,id=scsi-disk,bootindex=1 \
-device virtio-scsi-pci,id=bus1,bus=pci.0,addr=0x7 \
-drive file=/home/storage0.qcow2,if=none,media=disk,format=qcow2,rerror=stop,werror=stop,cache=none,aio=native,id=scsi-disk0 \
-device scsi-hd,bus=bus1.0,drive=scsi-disk0,id=disk \

2. Run dd test in guest.
# dd if=/dev/zero of=/dev/sdb bs=1M count=10240

3. Hotplug/unplug virtio scsi disk.
[root@dhcp-9-236 ~]# cat repeat-hotplug.sh 
#!/bin/bash
# a simple hotplug/unplug stress test

let i=0
exec 3<>/dev/tcp/localhost/6779
echo -e "{ 'execute': 'qmp_capabilities' }" >&3
read response <&3
echo $response
while [ $i -lt 200 ]
do
    echo -e '{"execute":"device_add","arguments":{"driver":"virtio-scsi-pci","id":"test30"}}' >&3
    read response <&3;  echo "$i: $response"
    echo -e '{"execute":"__com.redhat_drive_add", "arguments": {"file":"/tmp/storage.qcow2","format":"qcow2","id":"test30"}}' >&3
    read response <&3;  echo "$i: $response"
    echo -e '{"execute":"device_add","arguments":{"driver":"scsi-hd","drive":"test30","id":"test31"}}' >&3
    read response <&3;  echo "$i: $response"
    sleep 2
    echo -e '{"execute":"device_del","arguments":{"id":"test31"}}' >&3
    read response <&3;  echo "$i: $response"
    echo -e '{"execute":"device_del","arguments":{"id":"test30"}}' >&3
    read response <&3;  echo "$i: $response"
    let i=$i+1
    sleep 2
done

Comment 12 mazhang 2015-03-16 07:30:10 UTC
Created attachment 1002139 [details]
vmcore-dmesg

Comment 13 Fam Zheng 2015-03-16 07:34:19 UTC
Yes. Might be the same as bz 1199421. I'm looking at it.

Comment 14 Fam Zheng 2015-03-18 06:33:28 UTC

*** This bug has been marked as a duplicate of bug 1199421 ***

Comment 15 Fam Zheng 2015-03-19 01:27:41 UTC
mazhang, could you test this build (fix for 1199421) to make sure this one is a duplicate?

https://brewweb.devel.redhat.com/taskinfo?taskID=8864631