Bug 726008

Summary: switch cgroup cause guest soft lockup
Product: Red Hat Enterprise Linux 6
Reporter: Suqin Huang <shuang>
Component: kernel
Assignee: Vivek Goyal <vgoyal>
Status: CLOSED DEFERRED
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.2
CC: esandeen, juzhang, michen
Target Milestone:
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-07-19 17:59:30 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 846704

Attachments:

Description	Flags
call trace	none

Description Suqin Huang 2011-07-27 10:16:06 UTC

Description of problem:
Create two cgroups, bind qemu-kvm to one of them, and run dd in the guest. Then move qemu-kvm to the other group. This causes a soft lockup in the guest.

Version-Release number of selected component (if applicable):
2.6.32-171.el6.x86_64

How reproducible:
Sometimes

Steps to Reproduce:
1. Start the guest with the following command:
/usr/libexec/qemu-kvm -monitor stdio -chardev socket,id=serial,path=/tmp/serial,server,nowait -device isa-serial,chardev=serial -drive file='/home/images/RHEL-Server-6.1-64-virtio.qcow2',index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=idYZCWrV,mac=9a:5e:b5:68:ef:c5,id=ndev00idYZCWrV,bus=pci.0,addr=0x3 -netdev tap,id=idYZCWrV,vhost=on,script='/home/scripts/qemu-ifup-switch',downscript='no' -m 2048 -smp 2,cores=1,threads=1,sockets=2 -cpu cpu64-rhel6,+sse2,+x2apic -vnc :0 -rtc base=utc,clock=host,driftfix=none -M rhel6.1.0 -boot order=cdn,once=c,menu=off   -usbdevice tablet -no-kvm-pit-reinjection -enable-kvm -name rhel6
2. # mount -t cgroup -o blkio blkio /cgroup
3. # mkdir /cgroup/blkio1
4. # mkdir /cgroup/blkio2
5. # echo <PID of qemu-kvm> > /cgroup/blkio1/tasks (repeat for each thread ID)
6. # echo 8:0 1024000 > /cgroup/blkio1/blkio.throttle.write_bps_device
7. # echo 8:0 1024000 > /cgroup/blkio1/blkio.throttle.read_bps_device
8. Run dd in the guest: dd if=/dev/zero of=/mnt/file bs=1M count=1000
9. Switch qemu-kvm to the other group:
10. # echo <PID of qemu-kvm> > /cgroup/blkio2/tasks (repeat for each thread ID)
11. # echo 100 > /cgroup/blkio2/blkio.weight
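
Steps 2 through 11 can be sketched as a small script. This is only a sketch under assumptions, not the reporter's actual procedure: the function names, the CGROUP_ROOT variable, and the use of /proc/<pid>/task to enumerate thread IDs are illustrative choices. The paths, device number, and limits are the ones given in the steps.

```shell
#!/bin/sh
# Sketch of steps 2-11. Assumes a cgroup v1 blkio hierarchy; run as root.
# CGROUP_ROOT and the function names are illustrative, not from the report.
CGROUP_ROOT="${CGROUP_ROOT:-/cgroup}"

# Move a process and all of its threads into a cgroup.
# The cgroup v1 "tasks" file accepts one thread ID per write.
move_to_group() {
    local pid="$1" group="$2" task
    for task in /proc/"$pid"/task/*; do
        echo "${task##*/}" > "$CGROUP_ROOT/$group/tasks"
    done
}

# Cap read and write bandwidth (bytes per second) for one block device,
# given as major:minor.
throttle_group() {
    local group="$1" dev="$2" bps="$3"
    echo "$dev $bps" > "$CGROUP_ROOT/$group/blkio.throttle.write_bps_device"
    echo "$dev $bps" > "$CGROUP_ROOT/$group/blkio.throttle.read_bps_device"
}

# Full sequence for one qemu-kvm PID; dd in the guest is started by hand.
reproduce() {
    local pid="$1"
    mount -t cgroup -o blkio blkio "$CGROUP_ROOT"
    mkdir "$CGROUP_ROOT/blkio1" "$CGROUP_ROOT/blkio2"
    move_to_group "$pid" blkio1
    throttle_group blkio1 8:0 1024000
    echo "Start dd in the guest, then press Enter to switch groups"
    read -r _
    move_to_group "$pid" blkio2
    echo 100 > "$CGROUP_ROOT/blkio2/blkio.weight"
}
```

With a single qemu-kvm instance running, this would be invoked as root on the host with: reproduce "$(pidof qemu-kvm)"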
  
Actual results:

The guest hits a soft lockup.

Expected results:


Additional info:

1. guest:
rhel6.1.64  2.6.32-131.0.15.el6.x86_64

2. host
qemu-kvm-0.12.1.2-2.172.el6.x86_64

processor	: 3
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 2
model name	: AMD Phenom(tm) 9600B Quad-Core Processor
stepping	: 3
cpu MHz		: 1150.000
cache size	: 512 KB

wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs npt lbrv svm_lock
bogomips	: 4587.45
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

Comment 1 Suqin Huang 2011-07-27 10:16:41 UTC

Created attachment 515463
call trace

Comment 3 Vivek Goyal 2011-08-29 18:18:04 UTC

Suqin,

Is this issue reproducible? I can't seem to reproduce it.

Also CCing Eric Sandeen in case he has any insight, from an ext4 point of view, into what happened here. To me, a soft lockup would mean that interrupts were enabled on the CPU but a single thread was monopolizing the CPU for a long time without giving it up. If that's true, then it could have something to do with ext4 too. I am not sure. Eric might have some thoughts on this.

Comment 4 RHEL Program Management 2011-10-07 15:42:31 UTC

Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 5 Suqin Huang 2011-10-11 09:58:06 UTC

I cannot reproduce the original issue, but I get a different call trace:

Call Trace:
 [<ffffffff8105dc32>] ? default_wake_function+0x12/0x20
 [<ffffffffa00800dd>] do_get_write_access+0x29d/0x520 [jbd2]
 [<ffffffff8108e1c0>] ? wake_bit_function+0x0/0x50
 [<ffffffffa00804b1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
 [<ffffffffa00d4318>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
 [<ffffffffa00b0863>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
 [<ffffffffa00b08dc>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
 [<ffffffffa00b0bd0>] ext4_dirty_inode+0x40/0x60 [ext4]
 [<ffffffff8119b67b>] __mark_inode_dirty+0x3b/0x160
 [<ffffffff8118bf82>] file_update_time+0xf2/0x170
 [<ffffffff8117c4c2>] pipe_write+0x2d2/0x650
 [<ffffffff811724fa>] do_sync_write+0xfa/0x140
 [<ffffffff8108e180>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81211d5b>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff812051c6>] ? security_file_permission+0x16/0x20
 [<ffffffff811727f8>] vfs_write+0xb8/0x1a0
 [<ffffffff810d1b52>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81173231>] sys_write+0x51/0x90
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
INFO: task pickup:1649 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
pickup        D 0000000000000000     0  1649   1642 0x00000080
 ffff88007b553a68 0000000000000082 0000000000000000 ffff880078e055b8
 ffff88007b553a28 0000000000000096 0000000000000000 000000010003044c
 ffff8800379de638 ffff88007b553fd8 000000000000f598 ffff8800379de638
Call Trace:
 [<ffffffffa00800dd>] do_get_write_access+0x29d/0x520 [jbd2]
 [<ffffffff8108e1c0>] ? wake_bit_function+0x0/0x50
 [<ffffffffa00804b1>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
 [<ffffffffa00d4318>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
 [<ffffffffa00b0863>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
 [<ffffffffa00b08dc>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
 [<ffffffff811b461b>] ? ep_poll_callback+0xbb/0xf0
 [<ffffffffa00b0bd0>] ext4_dirty_inode+0x40/0x60 [ext4]
 [<ffffffff8119b67b>] __mark_inode_dirty+0x3b/0x160
 [<ffffffff8118c12d>] touch_atime+0x12d/0x170
 [<ffffffff8117cb15>] pipe_read+0x2d5/0x4e0
 [<ffffffff8117263a>] do_sync_read+0xfa/0x140
 [<ffffffff8108e180>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81211d1f>] ? selinux_file_permission+0xbf/0x150
 [<ffffffff812051c6>] ? security_file_permission+0x16/0x20
 [<ffffffff81173065>] vfs_read+0xb5/0x1a0
 [<ffffffff810d1b52>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff811731a1>] sys_read+0x51/0x90
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
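
The "blocked for more than 120 seconds" line above comes from the kernel's hung-task detector, which reports tasks stuck in uninterruptible sleep. That is a different watchdog from the soft-lockup detector, which reports a CPU that stayed in kernel mode without scheduling. A small sketch for telling the two apart in a saved console log follows. The function name and the log file argument are hypothetical, and the patterns assume the message prefixes "BUG: soft lockup" and "INFO: task NAME:PID blocked for more than".

```shell
# Count watchdog reports of each kind in a saved kernel log.
# classify_lockups and its argument are illustrative names.
classify_lockups() {
    local logfile="$1" soft hung
    # grep -c prints 0 and exits non-zero when nothing matches; ignore that.
    soft=$(grep -c 'BUG: soft lockup' "$logfile" || true)
    hung=$(grep -c 'INFO: task .* blocked for more than' "$logfile" || true)
    echo "soft_lockup=$soft hung_task=$hung"
}
```

Run against a log that contains only traces like the ones in this comment, the soft-lockup count would stay at zero while the hung-task count would be non-zero.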