Bug 1243349

Summary: RHEL 6.0 guest hangs when booted with more than one queue
Product: Red Hat Enterprise Linux 7 Reporter: Quan Wenli <wquan>
Component: qemu-kvm-rhev Assignee: jason wang <jasowang>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: high    
Version: 7.2 CC: huding, jasowang, jen, juzhang, knoel, mrezanin, qiguo, virt-maint, wquan, xfu
Target Milestone: rc Keywords: Regression
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.3.0-18.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-12-04 16:50:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Quan Wenli 2015-07-15 09:29:18 UTC
Description of problem:
When a RHEL 6.0 guest, whose virtio-net driver does not support multiqueue (mq), is booted with more than one queue, the guest hangs with the following error:

Determining IP information for eth0...BUG: soft lockup - CPU#2 stuck for 61s! [ip:1114]
Modules linked in: ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ext3 jbd serio_raw microcode virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio ata_generic pata_acpi ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
CPU 2:
Modules linked in: ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ext3 jbd serio_raw microcode virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio ata_generic pata_acpi ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 1114, comm: ip Not tainted 2.6.32-71.el6.x86_64 #1 KVM
RIP: 0010:[<ffffffffa0059348>]  [<ffffffffa0059348>] virtqueue_get_buf+0x78/0x140 [virtio_ring]
RSP: 0018:ffff880136331598  EFLAGS: 00000246
RAX: ffff880137c02000 RBX: ffff8801363315c8 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff880136331684 RDI: ffff880136331684
RBP: ffffffff81013c8e R08: ffff8801362576c0 R09: ffff880137c00000
R10: 0000000000000697 R11: 6db6db6db6db6db7 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f9f27431700(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000003cfc4ff300 CR3: 0000000136279000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
 [<ffffffffa01148d2>] ? virtnet_send_command.clone.0+0x252/0x290 [virtio_net]
 [<ffffffffa01148de>] ? virtnet_send_command.clone.0+0x25e/0x290 [virtio_net]
 [<ffffffff81267150>] ? sg_init_table+0x30/0x50
 [<ffffffffa0114be4>] ? virtnet_set_rx_mode+0x94/0x390 [virtio_net]
 [<ffffffff8140ea1e>] ? __dev_set_rx_mode+0x3e/0xb0
 [<ffffffff8140ec90>] ? dev_set_rx_mode+0x30/0x50
 [<ffffffff814115b9>] ? dev_open+0xb9/0x100
 [<ffffffff81410cd1>] ? dev_change_flags+0xa1/0x1d0
 [<ffffffff8141cb35>] ? do_setlink+0x1f5/0x860
 [<ffffffff8126e8d4>] ? nla_parse+0x34/0x110
 [<ffffffff8141d5ee>] ? rtnl_newlink+0x44e/0x530
 [<ffffffff8141c720>] ? rtnetlink_rcv_msg+0x1e0/0x220
 [<ffffffff8141c540>] ? rtnetlink_rcv_msg+0x0/0x220
 [<ffffffff81433b49>] ? netlink_rcv_skb+0xa9/0xd0
 [<ffffffff8141c525>] ? rtnetlink_rcv+0x25/0x40
 [<ffffffff814337ae>] ? netlink_unicast+0x2de/0x2f0
 [<ffffffff81434140>] ? netlink_sendmsg+0x200/0x2e0
 [<ffffffff813ff69e>] ? sock_sendmsg+0x11e/0x150
 [<ffffffff81091ca0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8110be6e>] ? find_get_page+0x1e/0xa0
 [<ffffffff8110dade>] ? filemap_fault+0xbe/0x510
 [<ffffffff813fdde4>] ? move_addr_to_kernel+0x64/0x70
 [<ffffffff814093e9>] ? verify_iovec+0x69/0xc0
 [<ffffffff813ff963>] ? sys_sendmsg+0x233/0x3a0
 [<ffffffff8126c118>] ? __percpu_counter_add+0x68/0x90
 [<ffffffff811363fd>] ? handle_mm_fault+0x1ed/0x2b0
 [<ffffffff814cd504>] ? do_page_fault+0x154/0x3a0
 [<ffffffff81013172>] ? system_call_fastpath+0x16/0x1b
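
The trace shows the `ip` process stuck in virtqueue_get_buf, reached from virtnet_set_rx_mode through virtnet_send_command, so the guest appears to be waiting for a control-queue command that never completes. A minimal sketch for spotting this signature in a captured guest console log; the function name has_ctrl_vq_lockup and the log path in the usage comment are hypothetical and not part of the original report:

```shell
#!/bin/sh
# Sketch only: has_ctrl_vq_lockup is an illustrative helper name.
# Exit status 0 if the given log file contains both a soft lockup
# report and a virtnet_send_command frame, non-zero otherwise.
has_ctrl_vq_lockup() {
    log=$1
    grep -q 'BUG: soft lockup' "$log" &&
        grep -q 'virtnet_send_command' "$log"
}

# Example (hypothetical log path):
#   has_ctrl_vq_lockup /tmp/guest-serial.log && echo "hang reproduced"
```

Matching on both strings keeps unrelated soft lockups from being counted as this bug.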

Version-Release number of selected component (if applicable):
qemu-kvm-rhev: qemu-kvm-rhev-2.3.0-10.el7.x86_64
guest kernel: kernel-2.6.32-71.el6.x86_64
host kernel: kernel-3.10.0-293.el7.x86_64

How reproducible:
always 

Steps to Reproduce:
1. Boot a RHEL 6.0 guest (its driver has no mq support) with mq=on on the virtio-net device and queues=4 on the tap netdev:
/usr/libexec/qemu-kvm -name virt-tests-vm1 -nodefaults -chardev socket,id=hmp_id_humanmonitor1,path=/tmp/monitor-humanmonitor1-20141015-045730-ROavVfbX,server,nowait -mon chardev=hmp_id_humanmonitor1,mode=readline -chardev socket,id=serial_id_serial1,path=/tmp/serial-serial1-20141015-045730-ROavVfbX,server,nowait -device isa-serial,chardev=serial_id_serial1 -chardev socket,id=seabioslog_id_20141015-045730-ROavVfbX,path=/tmp/seabios-20141015-045730-ROavVfbX,server,nowait -device isa-debugcon,chardev=seabioslog_id_20141015-045730-ROavVfbX,iobase=0x402 -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=0x3 -drive file=/home/kvm_autotest_root/images/RHEL-Server-6.0-64.qcow2,if=none,id=drive-virtio-disk1,media=disk,snapshot=off,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,bootindex=0 -device virtio-net-pci,netdev=id7BvXJE,mac=9a:37:37:37:37:6e,bus=pci.0,addr=0x5,id=idgcnxyL,mq=on,vectors=10 -netdev tap,id=id7BvXJE,vhost=on,queues=4 -m 4096 -smp 4,cores=1,threads=1,sockets=4 -cpu Westmere,+x2apic -M pc -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -vnc :0 -vga qxl -rtc base=utc,clock=host,driftfix=none -boot order=cdn,once=c,menu=off -device sga -enable-kvm -serial stdio

2. The RHEL 6.0 guest hangs during boot.
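
Most options in the command line above come from the test harness and are unrelated to the hang. A minimal sketch of just the networking part follows. It assumes the usual QEMU multiqueue sizing rule of 2*N+2 MSI-X vectors for N queue pairs, which matches queues=4 and vectors=10 above. The helper names mq_vectors and mq_net_args and the IDs net0/nic0 are hypothetical, and the script only prints the options rather than starting a guest:

```shell
#!/bin/sh
# Sketch only: prints the multiqueue-related QEMU options from the reproducer.
# mq_vectors, mq_net_args and the IDs net0/nic0 are illustrative names,
# not taken from the original report.

# MSI-X vectors for N queue pairs: one RX and one TX vector per pair,
# plus one for the control queue and one for configuration changes.
mq_vectors() {
    echo $(( 2 * $1 + 2 ))
}

# Emit the -netdev/-device option pair for a vhost-backed tap with N queues.
mq_net_args() {
    queues=$1
    printf '%s\n' \
        "-netdev tap,id=net0,vhost=on,queues=${queues}" \
        "-device virtio-net-pci,netdev=net0,id=nic0,mq=on,vectors=$(mq_vectors "$queues")"
}

mq_net_args 4
```

Appending these two options to an otherwise working RHEL 6.0 guest command line should be enough to exercise the same code path, though only the full command above was actually tested in this report.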

Actual results:


Expected results:


Additional info:

Comment 2 Quan Wenli 2015-07-17 04:52:54 UTC
This is a regression since qemu-kvm-rhev-2.3.0-4.el7; the offending commit is below:

commit 34376fc9201e8bb18439aeace2a9a9c1a85c0bb7
Author: Xiao Wang <jasowang@redhat.com>
Date:   Thu Jun 18 06:11:47 2015 +0200

    virtio-net: adding all queues in .realize()
    
    Message-id: <1434607916-15166-12-git-send-email-jasowang@redhat.com>
    Patchwork-id: 66309
    O-Subject: [RHEL7.2 qemu-kvm-rhev PATCH 11/20] virtio-net: adding all queues in .realize()
    Bugzilla: 1231610
    RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
    RH-Acked-by: Vlad Yasevich <vyasevic@redhat.com>
    RH-Acked-by: Michael S. Tsirkin <mst@redhat.com>
    
    Instead of adding queues for multiqueue during feature set. This patch
    did this in .realize(), this will help the following patches that
    count the number of virtqueues used in .device_plugged() callback.
    
    Signed-off-by: Jason Wang <jasowang@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    (cherry picked from commit da51a335aa61ec0e45879d80f3c5e2ee4f87cd2f)
    Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>

Comment 12 Jeff Nelson 2015-09-10 05:24:58 UTC
Patch series in Comment 11 was accepted, setting status to MODIFIED.

Comment 13 Quan Wenli 2015-09-15 05:15:08 UTC
Verified with qemu-kvm-rhev-2.3.0-18.el7 using the steps from comment #0. The issue is fixed: the RHEL 6 guest boots with more than one queue.
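
For anyone checking whether a host already carries the fix, a rough sketch that compares a qemu-kvm-rhev version-release string against the fixed build, 2.3.0-18.el7. The helper name is_fixed_build is hypothetical, and `sort -V` (GNU coreutils) is used only as an approximation of RPM version ordering:

```shell
#!/bin/sh
# Sketch only: is_fixed_build is an illustrative helper name.
# Build named in "Fixed In Version" for this bug.
FIXED_BUILD="2.3.0-18.el7"

# Exit status 0 if version-release $1 is at or after the fixed build.
is_fixed_build() {
    # sort -V orders version strings; the older of the two sorts first.
    oldest=$(printf '%s\n%s\n' "$FIXED_BUILD" "$1" | sort -V | head -n 1)
    [ "$oldest" = "$FIXED_BUILD" ]
}

# Example on a host with the package installed:
#   is_fixed_build "$(rpm -q --qf '%{VERSION}-%{RELEASE}' qemu-kvm-rhev)" \
#       && echo "fix present"
```

With the build from the original report, `is_fixed_build 2.3.0-10.el7` returns non-zero, as expected for an affected version.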

Comment 14 juzhang 2015-09-21 02:45:28 UTC
Per comment 13, setting this issue to VERIFIED.

Comment 18 errata-xmlrpc 2015-12-04 16:50:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2546.html