Bug 695206

Summary: autotest.ltp: INFO: task msgctl10:18993 blocked for more than 120
Product: Red Hat Enterprise Linux 6
Component: qemu-kvm
Version: 6.1
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Reporter: Amos Kong <akong>
Assignee: Amos Kong <akong>
QA Contact: Virtualization Bugs <virt-bugs>
CC: ailan, ddutile, fyang, juzhang, lmr, mkenneth, tburke, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2012-05-04 02:49:17 UTC

Description Amos Kong 2011-04-11 04:45:16 UTC
Description of problem:
I ran the autotest.ltp test in a RHEL 6.1 guest and got a call trace during testing.

More information on autotest.ltp:
https://github.com/autotest/autotest/tree/master/client/tests/ltp

Version-Release number of selected component (if applicable):
guest kernel: 2.6.32-130.el6.x86_64
host kernel: 2.6.32-128.el6.x86_64
qemu-kvm-0.12.1.2-2.156.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. Boot up a rhel6.1 guest
2. Execute autotest.ltp test in guest
guest)# cd autotest
guest)# client/bin/autotest client/tests/ltp/control
  
Actual results:
got some call trace

Expected results:
test can pass.

Additional info:

1. call trace of guest

2011-04-11 12:40:29: INFO: task msgctl10:18993 blocked for more than 120 seconds.
2011-04-11 12:40:29: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2011-04-11 12:40:29: msgctl10      D 0000000000000001     0 18993  18706 0x00000080
2011-04-11 12:40:29:  ffff880040c45d38 0000000000000046 00007f16bf0f9000 ffffea00018a2f90
2011-04-11 12:40:29:  ffffea00016a5618 ffff880040c1e188 ffff88005b1237c8 0000000000000000
2011-04-11 12:40:29:  ffff880040c00678 ffff880040c45fd8 000000000000f598 ffff880040c00678
2011-04-11 12:40:29: Call Trace:
2011-04-11 12:40:29:  [<ffffffff814dd4a5>] rwsem_down_failed_common+0x95/0x1d0
2011-04-11 12:40:29:  [<ffffffff814dd603>] rwsem_down_write_failed+0x23/0x30
2011-04-11 12:40:29:  [<ffffffff8126e493>] call_rwsem_down_write_failed+0x13/0x20
2011-04-11 12:40:29:  [<ffffffff814dcb02>] ? down_write+0x32/0x40
2011-04-11 12:40:29:  [<ffffffff8116b839>] __khugepaged_exit+0x109/0x130
2011-04-11 12:40:29:  [<ffffffff81064578>] mmput+0xe8/0x120
2011-04-11 12:40:29:  [<ffffffff8106ba21>] exit_mm+0x111/0x150
2011-04-11 12:40:29:  [<ffffffff8106bdbf>] do_exit+0x15f/0x860
2011-04-11 12:40:29:  [<ffffffff8106b4be>] ? sys_wait4+0xae/0x100
2011-04-11 12:40:29:  [<ffffffff810d1b82>] ? audit_syscall_entry+0x272/0x2a0
2011-04-11 12:40:29:  [<ffffffff8106c518>] do_group_exit+0x58/0xd0
2011-04-11 12:40:29:  [<ffffffff8106c5a7>] sys_exit_group+0x17/0x20
2011-04-11 12:40:29:  [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b

2. qemu-kvm cmdline

#qemu-kvm -chardev socket,id=qmp_monitor_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20110410-070702-0oSe,server,nowait -mon chardev=qmp_monitor_id_qmpmonitor1,mode=control -chardev socket,id=serial_id_20110410-070702-0oSe,path=/tmp/serial-20110410-070702-0oSe,server,nowait -device isa-serial,chardev=serial_id_20110410-070702-0oSe -drive file=/home/devel/autotest-devel/client/tests/kvm/images/RHEL-Server-6.1-64-virtio.qcow2,index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=idSfVU9r,mac=9a:8a:ec:e1:42:9b,id=ndev00idSfVU9r,bus=pci.0,addr=0x3 -netdev tap,id=idSfVU9r,vhost=on,ifname=t0-070702-0oSe,script=/home/devel/autotest-devel/client/tests/kvm/scripts/qemu-ifup-switch,downscript=no -m 2048 -smp 2,cores=1,threads=1,sockets=2 -cpu cpu64-rhel6,+sse2,+x2apic -vnc :0 -rtc base=utc,clock=host,driftfix=none -M rhel6.1.0 -boot order=cdn,once=c,menu=off -usbdevice tablet -no-kvm-pit-reinjection -enable-kvm
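
When triaging a guest log like the one in item 1, it can help to pull out just the hung-task warnings. The following is a minimal sketch, not part of autotest: the function name `find_hung_tasks` and the regular expression are assumptions made for illustration, written against the warning line shown in the call trace above. It also assumes the task name contains no whitespace.

```python
import re

# Matches the kernel hung-task warning, for example:
# "INFO: task msgctl10:18993 blocked for more than 120 seconds."
# Assumption: the task name (comm) has no whitespace in it.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (comm, pid, seconds) tuples, one per warning found."""
    hits = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append(
                (match.group("comm"),
                 int(match.group("pid")),
                 int(match.group("secs")))
            )
    return hits


# Two lines copied from the guest log in this report.
sample = (
    "2011-04-11 12:40:29: INFO: task msgctl10:18993 "
    "blocked for more than 120 seconds.\n"
    "2011-04-11 12:40:29: Call Trace:\n"
)
print(find_hung_tasks(sample))
```

The timestamp prefix added by the logging harness is ignored because the pattern is searched for anywhere in the line.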

Comment 1 RHEL Program Management 2011-04-11 06:00:13 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 4 Gleb Natapov 2011-06-15 14:27:52 UTC
(In reply to comment #0)
> 
> How reproducible:
> always
> 
> Steps to Reproduce:
> 1. Boot up a rhel6.1 guest
> 2. Execute autotest.ltp test in guest
> guest)# cd autotest
> guest)# client/bin/autotest client/tests/ltp/control
> 
> Actual results:
> got some call trace
The trace is harmless.

> 
> Expected results:
> test can pass.
How does the trace stop it from passing?


> 2. qemu-kvm cmdline
> 

> -m 2048 -smp 2,cores=1,threads=1,sockets=2 -cpu cpu64-rhel6,+sse2,+x2apic -vnc
How much memory and how many CPUs does the host have?

Comment 5 Amos Kong 2011-07-20 03:05:33 UTC
(In reply to comment #4)
> (In reply to comment #0)
> > 
> > How reproducible:
> > always
> > 
> > Steps to Reproduce:
> > 1. Boot up a rhel6.1 guest
> > 2. Execute autotest.ltp test in guest
> > guest)# cd autotest
> > guest)# client/bin/autotest client/tests/ltp/control
> > 
> > Actual results:
> > got some call trace
> The trace is harmless.

So we can close this bug as NOTABUG.

> > Expected results:
> > test can pass.
> How does the trace stop it from passing?

The ltp test always times out.

> > 2. qemu-kvm cmdline
> > 
> 
> > -m 2048 -smp 2,cores=1,threads=1,sockets=2 -cpu cpu64-rhel6,+sse2,+x2apic -vnc
> How much memory and how many CPUs does the host have?

host memory: 4G
host cpus: 4

Comment 6 Gleb Natapov 2011-07-20 08:56:13 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #0)
> > > 
> > > How reproducible:
> > > always
> > > 
> > > Steps to Reproduce:
> > > 1. Boot up a rhel6.1 guest
> > > 2. Execute autotest.ltp test in guest
> > > guest)# cd autotest
> > > guest)# client/bin/autotest client/tests/ltp/control
> > > 
> > > Actual results:
> > > got some call trace
> > The trace is harmless.
> 
> So we can close this bug as NOTABUG.
If the ltp test always times out, then we probably can't.

> 
> > > Expected results:
> > > test can pass.
> > How does the trace stop it from passing?
> 
> The ltp test always times out.
Have you tried to reproduce manually (without autotest)?

> 
> > > 2. qemu-kvm cmdline
> > > 
> > 
> > > -m 2048 -smp 2,cores=1,threads=1,sockets=2 -cpu cpu64-rhel6,+sse2,+x2apic -vnc
> > How much memory and how many CPUs does the host have?
> 
> host memory: 4G
> host cpus: 4

Is this the only guest on the host during the test?

Comment 7 Amos Kong 2011-08-02 08:18:23 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > > > How reproducible:
> > > > always
> > > > 
> > > > Steps to Reproduce:
> > > > 1. Boot up a rhel6.1 guest
> > > > 2. Execute autotest.ltp test in guest
> > > > guest)# cd autotest
> > > > guest)# client/bin/autotest client/tests/ltp/control
> > > > 
> > > > Actual results:
> > > > got some call trace
> > > The trace is harmless.
> > 
> > So we can close this bug as NOTABUG.
> If the ltp test always times out, then we probably can't.
> 
> > 
> > > > Expected results:
> > > > test can pass.
> > > How does the trace stop it from passing?
> > 
> > The ltp test always times out.
> Have you tried to reproduce manually (without autotest)?

No trace is output in a manual test.
I also tested on a physical machine: the system was very slow, but there was no call trace.

> > > > 2. qemu-kvm cmdline
> > > > 
> > > 
> > > > -m 2048 -smp 2,cores=1,threads=1,sockets=2 -cpu cpu64-rhel6,+sse2,+x2apic -vnc
> > > How much memory and how many CPUs does the host have?
> > 
> > host memory: 4G
> > host cpus: 4
> 
> Is this the only guest on the host during the test?

Yes.

Comment 9 Amos Kong 2012-05-04 02:49:17 UTC
Tested with the latest qemu/kernel; the _harmless_ call trace still appears because of heavy load.
qemu-kvm-0.12.1.2-2.285.el6.x86_64
host/guest kernel: 2.6.32-262.el6.x86_64

---
client/tests/ltp is a client test in upstream autotest; it has no time limit.

The internal autotest adds a new test, autotest.ltp, with the test timeout set to 1000000 (11 days). This test is not executed in routine testing.
---
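
As a rough check of the "11 days" figure above, here is a small sketch of the conversion. It assumes the timeout value is in seconds, which the comment does not state; the helper name `seconds_to_days` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def seconds_to_days(seconds):
    """Convert a duration in seconds to (fractional) days."""
    return seconds / SECONDS_PER_DAY


# Assumption: the autotest.ltp timeout of 1000000 is expressed in seconds.
timeout_secs = 1000000
print(round(seconds_to_days(timeout_secs), 2))  # 11.57
```

Under that assumption the timeout is about 11.6 days, far longer than the 120-second hung-task threshold reported in the guest log.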