Red Hat Bugzilla – Bug 213362
Assertion failed in dlm/plock.c with LTP test
Last modified: 2009-04-16 16:31:22 EDT
Description of problem:
I experienced a system hang after running the latest LTP
(http://ltp.sourceforge.net/) tool as part of durability testing in our GFS
environment. I was using two DELL PE1950 servers running CentOS 4.3 (IA32),
with a DELL EMC AX150 set up as GFS shared storage connected to both
servers. We ran "./runltp -d /gfs3" to start LTP on one server while the
other server remained idle. Here are the extracted /var/log/messages entries
from when the system stopped:
Oct 4 13:15:47 centos1 kernel: lock_dlm: Assertion failed on line 500 of
Oct 4 13:15:47 centos1 kernel: lock_dlm: assertion: "!error"
Oct 4 13:15:47 centos1 kernel: lock_dlm: time = 71704458
Oct 4 13:15:47 centos1 kernel: error=-11
Oct 4 13:15:47 centos1 kernel:
Oct 4 13:15:47 centos1 kernel: ------------[ cut here ]------------
Oct 4 13:15:47 centos1 kernel: kernel BUG
Oct 4 13:15:47 centos1 kernel: invalid operand: 0000 [#1]
Oct 4 13:15:47 centos1 kernel: SMP
Oct 4 13:15:47 centos1 kernel: Modules linked in: parport_pc lp parport
autofs i2c_dev i2c_core lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U)
sunrpc dm_mirror dm_multipath dm_mod button battery ac md5 ipv6 joydev
uhci_hcd ehci_hcd hw random shpchp bnx2 ext3 jbd qla6312 qla2xxx
scsi_transport_fc megaraid_sas sd_md scsi_mod
Oct 4 13:15:47 centos1 kernel: CPU: 0
Version-Release number of selected component (if applicable):
CentOS 4.3 (i386): kernel 2.6.9-34.ELsmp
How reproducible:
It happens every time.
Steps to Reproduce:
See the description.
Actual results:
See the description.
Expected results:
No system hang happens with LTP.
Additional info:
We may be able to take a diskdump of this problem if you need one.
Can you tell us which test case caused the assertion? There should be more
output after the kernel BUG message that states the name of the process and a
Created attachment 140560 [details]
Output of 'log' command on crash utility
Created attachment 140561 [details]
LTP runlog taken after the panic happened
Sorry for the delayed response. We were able to reproduce this problem in the
same environment with the same tool. Attached are the output of the 'log'
command from the crash utility and the LTP run log taken after the panic
happened. The logs clearly show that the test case running was fcntl11.
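For readers unfamiliar with what an fcntl test exercises, here is a minimal
sketch of a POSIX record lock (plock) request made through fcntl. It is not
the fcntl11 test itself and is not taken from LTP; the helper name
lock_region and the temporary file are hypothetical, chosen only for
illustration. To exercise GFS plocks, the path would need to point at a file
on the GFS mount (for example under /gfs3), which is an assumption here.

```python
import fcntl
import os
import tempfile


def lock_region(path, start, length):
    """Take, then release, an exclusive POSIX record lock on a byte range.

    Returns True if the lock was granted, False if the request was refused.
    Hypothetical helper for illustration; not part of LTP.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes the request non-blocking: a conflicting lock held by
        # another process raises OSError instead of waiting.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start, os.SEEK_SET)
        # Release the same byte range again.
        fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)


if __name__ == "__main__":
    # A local temporary file stands in for a file on the shared filesystem.
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"0123456789")
        f.flush()
        print(lock_region(f.name, 0, 5))
```

On a cluster filesystem, each such request is handled by the filesystem's
lock module rather than purely by the local kernel, which is why a test built
around fcntl locking can reach code in dlm/plock.c.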
This shouldn't block a release since it's not been an issue outside
of this specific test.
I doubt we'll want to fiddle much with plocks on rhel4 at this late stage
unless it's a really crucial issue people are facing.
It should work in the new rhel5 code, though.