Bug 726008 - switch cgroup cause guest soft lockup
Summary: switch cgroup cause guest soft lockup
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Vivek Goyal
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 846704
 
Reported: 2011-07-27 10:16 UTC by Suqin Huang
Modified: 2016-07-19 17:59 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-19 17:59:30 UTC
Target Upstream Version:


Attachments
call trace (4.04 KB, text/plain)
2011-07-27 10:16 UTC, Suqin Huang

Description Suqin Huang 2011-07-27 10:16:06 UTC
Description of problem:
Create two cgroups, bind qemu-kvm to one of them, and run dd in the guest. Then move qemu-kvm to the other group. This causes a soft lockup in the guest.

Version-Release number of selected component (if applicable):
2.6.32-171.el6.x86_64

How reproducible:
Sometimes

Steps to Reproduce:
1. Start the guest with this command:
/usr/libexec/qemu-kvm -monitor stdio -chardev socket,id=serial,path=/tmp/serial,server,nowait -device isa-serial,chardev=serial -drive file='/home/images/RHEL-Server-6.1-64-virtio.qcow2',index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=idYZCWrV,mac=9a:5e:b5:68:ef:c5,id=ndev00idYZCWrV,bus=pci.0,addr=0x3 -netdev tap,id=idYZCWrV,vhost=on,script='/home/scripts/qemu-ifup-switch',downscript='no' -m 2048 -smp 2,cores=1,threads=1,sockets=2 -cpu cpu64-rhel6,+sse2,+x2apic -vnc :0 -rtc base=utc,clock=host,driftfix=none -M rhel6.1.0 -boot order=cdn,once=c,menu=off   -usbdevice tablet -no-kvm-pit-reinjection -enable-kvm -name rhel6
2. # mount -t cgroup -o blkio blkio /cgroup
3. # mkdir /cgroup/blkio1
4. # mkdir /cgroup/blkio2
5. # echo <PID of qemu-kvm> > /cgroup/blkio1/tasks (repeat for each of its thread IDs)
6. # echo 8:0 1024000 > /cgroup/blkio1/blkio.throttle.write_bps_device
7. # echo 8:0 1024000 > /cgroup/blkio1/blkio.throttle.read_bps_device
8. Run dd in the guest: dd if=/dev/zero of=/mnt/file bs=1M count=1000
9. Switch qemu-kvm to the other group:
10. # echo <PID of qemu-kvm> > /cgroup/blkio2/tasks (repeat for each of its thread IDs)
11. # echo 100 > /cgroup/blkio2/blkio.weight
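
Steps 5 and 10 move the qemu-kvm process and all of its threads into a cgroup. The following is a minimal sketch of how that could be scripted, not the exact commands used in this report. The function name move_threads and its optional third argument (an alternate /proc root, useful for dry runs) are assumptions made for illustration. It relies on the kernel listing one directory per thread under /proc/<pid>/task and on the cgroup v1 tasks file accepting one thread ID per write.

```shell
# Move every thread of a process into a cgroup by writing each
# thread ID (TID) to the cgroup's tasks file, one TID per write.
#   $1: PID of the process (e.g. the qemu-kvm PID)
#   $2: path to the target tasks file (e.g. /cgroup/blkio2/tasks)
#   $3: optional /proc root (hypothetical parameter, defaults to /proc)
move_threads() {
    pid=$1
    tasks_file=$2
    proc_root=${3:-/proc}

    for task_dir in "$proc_root/$pid/task"/*; do
        # Skip the unexpanded glob if the process has already exited.
        [ -d "$task_dir" ] || continue
        tid=${task_dir##*/}
        # Each echo issues a separate write, so each TID is moved on its own.
        echo "$tid" >> "$tasks_file" || return 1
    done
}
```

For example, assuming a single qemu-kvm instance is running, `move_threads "$(pidof qemu-kvm)" /cgroup/blkio2/tasks` would correspond to step 10.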
  
Actual results:

The guest reports a soft lockup.

Expected results:


Additional info:

1. guest:
rhel6.1.64  2.6.32-131.0.15.el6.x86_64

2. host
qemu-kvm-0.12.1.2-2.172.el6.x86_64

processor	: 3
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 2
model name	: AMD Phenom(tm) 9600B Quad-Core Processor
stepping	: 3
cpu MHz		: 1150.000
cache size	: 512 KB

wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs npt lbrv svm_lock
bogomips	: 4587.45
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

Comment 1 Suqin Huang 2011-07-27 10:16:41 UTC
Created attachment 515463
call trace

Comment 3 Vivek Goyal 2011-08-29 18:18:04 UTC
Suqin,

Is this issue reproducible? I can't seem to reproduce it.

Also CCing Eric Sandeen in case he has any insight, from an ext4 point of view, into what happened here. To me, a soft lockup means that interrupts were enabled on the CPU but a single thread was monopolizing the CPU for a long time without yielding. If that's true, then it could have something to do with ext4 too. I am not sure. Eric might have some thoughts on this.

Comment 4 RHEL Program Management 2011-10-07 15:42:31 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 5 Suqin Huang 2011-10-11 09:58:06 UTC
Cannot reproduce the original issue; got a different call trace:

Call Trace:
 [<ffffffff8105dc32>] ? default_wake_function+0x12/0x20
 [<ffffffffa00800dd>] do_get_write_access+0x29d/0x520 [jbd2]
 [<ffffffff8108e1c0>] ? wake_bit_function+0x0/0x50
 [<ffffffffa00804b1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
 [<ffffffffa00d4318>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
 [<ffffffffa00b0863>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
 [<ffffffffa00b08dc>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
 [<ffffffffa00b0bd0>] ext4_dirty_inode+0x40/0x60 [ext4]
 [<ffffffff8119b67b>] __mark_inode_dirty+0x3b/0x160
 [<ffffffff8118bf82>] file_update_time+0xf2/0x170
 [<ffffffff8117c4c2>] pipe_write+0x2d2/0x650
 [<ffffffff811724fa>] do_sync_write+0xfa/0x140
 [<ffffffff8108e180>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81211d5b>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff812051c6>] ? security_file_permission+0x16/0x20
 [<ffffffff811727f8>] vfs_write+0xb8/0x1a0
 [<ffffffff810d1b52>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81173231>] sys_write+0x51/0x90
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
INFO: task pickup:1649 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
pickup        D 0000000000000000     0  1649   1642 0x00000080
 ffff88007b553a68 0000000000000082 0000000000000000 ffff880078e055b8
 ffff88007b553a28 0000000000000096 0000000000000000 000000010003044c
 ffff8800379de638 ffff88007b553fd8 000000000000f598 ffff8800379de638
Call Trace:
 [<ffffffffa00800dd>] do_get_write_access+0x29d/0x520 [jbd2]
 [<ffffffff8108e1c0>] ? wake_bit_function+0x0/0x50
 [<ffffffffa00804b1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
 [<ffffffffa00d4318>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
 [<ffffffffa00b0863>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
 [<ffffffffa00b08dc>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
 [<ffffffff811b461b>] ? ep_poll_callback+0xbb/0xf0
 [<ffffffffa00b0bd0>] ext4_dirty_inode+0x40/0x60 [ext4]
 [<ffffffff8119b67b>] __mark_inode_dirty+0x3b/0x160
 [<ffffffff8118c12d>] touch_atime+0x12d/0x170
 [<ffffffff8117cb15>] pipe_read+0x2d5/0x4e0
 [<ffffffff8117263a>] do_sync_read+0xfa/0x140
 [<ffffffff8108e180>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81211d1f>] ? selinux_file_permission+0xbf/0x150
 [<ffffffff812051c6>] ? security_file_permission+0x16/0x20
 [<ffffffff81173065>] vfs_read+0xb5/0x1a0
 [<ffffffff810d1b52>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff811731a1>] sys_read+0x51/0x90
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b

