While attempting to narrow down the syscalls used to hit 126537 using iogen/doio, I hit the following assertion in lock_dlm.

(6 node cluster, all nodes running:
iogen -o -m random -s write,writev,readv -t 1b -T1000b 10000b:tfile1 | doio -avk)

k 5892 error 0
en plock 5892 7,40a9f03
req 7,40a9f03 ex e680f-12eb12 5892 w 1
ex plock 5892 error 0
en punlock 5892 7,40a9f03
remove 7,40a9f03 5892
ex punlock 5892 error 0
en plock 5892 7,40a9f03
req 7,40a9f03 ex 100753-1719bd 5892 w 1
ex plock 5892 error 0
en punlock 5892 7,40a9f03
remove 7,40a9f03 5892
ex punlock 5892 error 0
en plock 5892 7,40a9f03
req 7,40a9f03 ex 15e7ad-1b1306 5892 w 1
ex plock 5892 error 0
en punlock 5892 7,40a9f03
remove 7,40a9f03 5892
ex punlock 5892 error 0
en plock 5892 7,40a9f03
req 7,40a9f03 ex 177959-1e4de3 5892 w 1
ex plock 5892 error 0
en punlock 5892 7,40a9f03
remove 7,40a9f03 5892
ex punlock 5892 error 0
en plock 5892 7,40a9f03
req 7,40a9f03 ex 123e6f-188ee0 5892 w 1
ex plock 5892 error 0
en punlock 5892 7,40a9f03
remove 7,40a9f03 5892
ex punlock 5892 error 0
en plock 5892 7,40a9f03
req 7,40a9f03 ex 12a7f1-135265 5892 w 1
ex plock 5892 error 0
en punlock 5892 7,40a9f03
remove 7,40a9f03 5892
ex punlock 5892 error 0
en plock 5892 7,40a9f03
req 7,40a9f03 ex 41eb3-b9c4c 5892 w 1

lock_dlm: Assertion failed on line 272 of file fs/gfs_locking/lock_dlm/lock.c
lock_dlm: assertion: "!error"
lock_dlm: time = 106036030
gfs0: error=-16 num=11,40a9f03

Kernel panic: lock_dlm: Record message above and reboot.

This then caused all the nodes to fail with stacks which look just like recovery bug 126604.

Version-Release number of selected component (if applicable):

How reproducible:
Didn't try yet.

Steps to Reproduce:
1. Run iogen -o -m random -s write,writev,readv -t 1b -T1000b 10000b:tfile1 | doio -avk on all nodes in your cluster in a gfs fs.
2.
3.

Additional info:
/home/msp/djansa/pub/bugs/126757 contains the console output of all the nodes.
I ran this test for about an hour today and didn't have any problem. I'm curious if kernel preemption might be a factor here.
Our kernels are configured with: # CONFIG_PREEMPT is not set
This should be fixed after all the recent testing/fixing with iogen/doio related to plocks.
I hit a new assertion while attempting to verify this bug; that bug number is 130665.
I haven't seen this assertion in the past two nights while running this I/O load.
Updating version to the right level in the defects. Sorry for the storm.