Description of problem:
Create two cgroups, bind qemu-kvm to one group, and run dd in the guest. Then move qemu-kvm to the other group. This causes a guest soft lockup.

Version-Release number of selected component (if applicable):
2.6.32-171.el6.x86_64

How reproducible:
sometimes

Steps to Reproduce:
1. Start the guest:
   /usr/libexec/qemu-kvm -monitor stdio \
     -chardev socket,id=serial,path=/tmp/serial,server,nowait \
     -device isa-serial,chardev=serial \
     -drive file='/home/images/RHEL-Server-6.1-64-virtio.qcow2',index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,format=qcow2,aio=native \
     -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 \
     -device virtio-net-pci,netdev=idYZCWrV,mac=9a:5e:b5:68:ef:c5,id=ndev00idYZCWrV,bus=pci.0,addr=0x3 \
     -netdev tap,id=idYZCWrV,vhost=on,script='/home/scripts/qemu-ifup-switch',downscript='no' \
     -m 2048 -smp 2,cores=1,threads=1,sockets=2 \
     -cpu cpu64-rhel6,+sse2,+x2apic -vnc :0 \
     -rtc base=utc,clock=host,driftfix=none -M rhel6.1.0 \
     -boot order=cdn,once=c,menu=off -usbdevice tablet \
     -no-kvm-pit-reinjection -enable-kvm -name rhel6
2. # mount -t cgroup -o blkio blkio /cgroup
3. # mkdir /cgroup/blkio1
4. # mkdir /cgroup/blkio2
5. Echo the PID of qemu-kvm (and the IDs of its threads) into /cgroup/blkio1/tasks
6. # echo 8:0 1024000 > /cgroup/blkio1/blkio.throttle.write_bps_device
7. # echo 8:0 1024000 > /cgroup/blkio1/blkio.throttle.read_bps_device
8. Run dd in the guest: dd if=/dev/zero of=/mnt/file bs=1M count=1000
9. Switch qemu-kvm to the other group:
10. Echo the PID of qemu-kvm (and the IDs of its threads) into /cgroup/blkio2/tasks
11. # echo 100 > /cgroup/blkio2/blkio.weight

Actual results:
guest soft lockup

Expected results:

Additional info:
1. guest: rhel6.1.64 2.6.32-131.0.15.el6.x86_64
2. host: qemu-kvm-0.12.1.2-2.172.el6.x86_64

processor        : 3
vendor_id        : AuthenticAMD
cpu family       : 16
model            : 2
model name       : AMD Phenom(tm) 9600B Quad-Core Processor
stepping         : 3
cpu MHz          : 1150.000
cache size       : 512 KB
wp               : yes
flags            : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs npt lbrv svm_lock
bogomips         : 4587.45
TLB size         : 1024 4K pages
clflush size     : 64
cache_alignment  : 64
address sizes    : 48 bits physical, 48 bits virtual
power management : ts ttp tm stc 100mhzsteps hwpstate
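Steps 5 and 10 say to move the qemu-kvm PID "also threads" but do not give the exact commands. The following is a minimal sketch of one way to do that, assuming a cgroup v1 hierarchy where each write to a tasks file moves a single thread. The function name move_threads and the CGROUP_ROOT and DRY_RUN variables are hypothetical helpers, not part of the original report. With DRY_RUN=1 (the default here) it only prints the writes it would perform.

```shell
#!/bin/sh
# Sketch only: move every thread of a process into a cgroup v1 group.
# CGROUP_ROOT and DRY_RUN are illustrative knobs, not from the bug report.
CGROUP_ROOT="${CGROUP_ROOT:-/cgroup}"
DRY_RUN="${DRY_RUN:-1}"

# move_threads PID GROUP
# Walks /proc/PID/task (one directory per thread ID) and writes each
# thread ID to $CGROUP_ROOT/GROUP/tasks, one write per thread.
move_threads() {
    pid="$1"
    group="$2"
    for t in /proc/"$pid"/task/*; do
        [ -d "$t" ] || continue
        tid="${t##*/}"
        if [ "$DRY_RUN" = 1 ]; then
            # Print the command instead of touching the cgroup filesystem.
            echo "echo $tid > $CGROUP_ROOT/$group/tasks"
        else
            # A thread may exit between listing and writing; report and go on.
            echo "$tid" > "$CGROUP_ROOT/$group/tasks" ||
                echo "could not move thread $tid" >&2
        fi
    done
}
```

For example, `move_threads "$(pidof qemu-kvm)" blkio1` would cover step 5 and `move_threads "$(pidof qemu-kvm)" blkio2` step 10, assuming a single qemu-kvm process is running on the host.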
Created attachment 515463 [details] call trace
Squin, is this issue reproducible? I can't seem to reproduce it. I am also CCing Eric Sandeen in case he has any insight, from an ext4 point of view, into what happened here. To me, a soft lockup would mean that interrupts were enabled on the CPU but a single thread was monopolizing the CPU for a long time without giving it up. If that's true, it could have something to do with ext4 too. I am not sure. Eric might have some thoughts on this.
Since RHEL 6.2 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
Cannot reproduce it; I get a different call trace:

Call Trace:
 [<ffffffff8105dc32>] ? default_wake_function+0x12/0x20
 [<ffffffffa00800dd>] do_get_write_access+0x29d/0x520 [jbd2]
 [<ffffffff8108e1c0>] ? wake_bit_function+0x0/0x50
 [<ffffffffa00804b1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
 [<ffffffffa00d4318>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
 [<ffffffffa00b0863>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
 [<ffffffffa00b08dc>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
 [<ffffffffa00b0bd0>] ext4_dirty_inode+0x40/0x60 [ext4]
 [<ffffffff8119b67b>] __mark_inode_dirty+0x3b/0x160
 [<ffffffff8118bf82>] file_update_time+0xf2/0x170
 [<ffffffff8117c4c2>] pipe_write+0x2d2/0x650
 [<ffffffff811724fa>] do_sync_write+0xfa/0x140
 [<ffffffff8108e180>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81211d5b>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff812051c6>] ? security_file_permission+0x16/0x20
 [<ffffffff811727f8>] vfs_write+0xb8/0x1a0
 [<ffffffff810d1b52>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81173231>] sys_write+0x51/0x90
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
INFO: task pickup:1649 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
pickup        D 0000000000000000     0  1649   1642 0x00000080
 ffff88007b553a68 0000000000000082 0000000000000000 ffff880078e055b8
 ffff88007b553a28 0000000000000096 0000000000000000 000000010003044c
 ffff8800379de638 ffff88007b553fd8 000000000000f598 ffff8800379de638
Call Trace:
 [<ffffffffa00800dd>] do_get_write_access+0x29d/0x520 [jbd2]
 [<ffffffff8108e1c0>] ? wake_bit_function+0x0/0x50
 [<ffffffffa00804b1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
 [<ffffffffa00d4318>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
 [<ffffffffa00b0863>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
 [<ffffffffa00b08dc>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
 [<ffffffff811b461b>] ? ep_poll_callback+0xbb/0xf0
 [<ffffffffa00b0bd0>] ext4_dirty_inode+0x40/0x60 [ext4]
 [<ffffffff8119b67b>] __mark_inode_dirty+0x3b/0x160
 [<ffffffff8118c12d>] touch_atime+0x12d/0x170
 [<ffffffff8117cb15>] pipe_read+0x2d5/0x4e0
 [<ffffffff8117263a>] do_sync_read+0xfa/0x140
 [<ffffffff8108e180>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81211d1f>] ? selinux_file_permission+0xbf/0x150
 [<ffffffff812051c6>] ? security_file_permission+0x16/0x20
 [<ffffffff81173065>] vfs_read+0xb5/0x1a0
 [<ffffffff810d1b52>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff811731a1>] sys_read+0x51/0x90
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
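The hung-task message in this trace fires once a task has been blocked for more than 120 seconds. As a rough back-of-envelope check only, the sketch below computes how long the dd run from the reproduction steps (bs=1M count=1000) would take if every byte were limited to the 1024000 bytes/s set on blkio1. This assumes the throttle applies to all of the guest's writes for the whole run, which may not hold, since qemu-kvm is moved to blkio2 partway through. The helper name throttled_seconds is hypothetical.

```shell
#!/bin/sh
# throttled_seconds BYTES BPS
# Whole seconds needed to transfer BYTES at a limit of BPS bytes per second.
throttled_seconds() {
    echo $(( $1 / $2 ))
}

# dd writes 1000 blocks of 1 MiB; the blkio1 limit is 1024000 bytes/s.
throttled_seconds $(( 1000 * 1024 * 1024 )) 1024000   # prints 1024
```

Under that assumption the write would take on the order of 1024 seconds, well beyond the 120-second hung-task threshold, so long waits on journal access in the guest would not be surprising while the throttle is in effect. This does not by itself explain the soft lockup.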