Bug 156635

Summary: gfs i/o regression on rhel3-u5
Product: [Retired] Red Hat Cluster Suite
Reporter: Corey Marthaler <cmarthal>
Component: gfs
Assignee: Ben Marzinski <bmarzins>
Status: CLOSED CURRENTRELEASE
QA Contact: GFS Bugs <gfs-bugs>
Severity: medium
Priority: medium
Version: 3
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2010-10-29 21:49:35 UTC

Description Corey Marthaler 2005-05-02 19:23:47 UTC
Description of problem:
As soon as we start I/O (seen on x86-64 and ia64 so far), the machines panic:

link-02:

GFS: fsid=LINK_12:vedder1.1: Joined cluster. Now mounting FS...
GFS: fsid=LINK_12:vedder1.1: jid=1: Trying to acquire journal lock...
GFS: fsid=LINK_12:vedder1.1: jid=1: Looking at journal...
GFS: fsid=LINK_12:vedder1.1: jid=1: Done
general protection fault: 0000
CPU 1
Pid: 6157, comm: doio Not tainted
RIP: 0010:[<ffffffffa01e97ae>]{:gfs:gfs_ail_trans_check_empty+62}
RSP: 0000:000001003755bbc8  EFLAGS: 00010282
RAX: 0000010038efbcc0 RBX: 000001003755bbf0 RCX: 0000010004b85aa0
RDX: 00000000000000f4 RSI: ffffffffa02091eb RDI: ffffff0000172000
RBP: 08538b482574c085 R08: c6c7480000009dba R09: 0000010004b9cd40
R10: 0000000000000001 R11: 0000000000000000 R12: 000001003755bb48
R13: ffffff0000172000 R14: 000000000087c4b0 R15: ffffff00001f69b0
FS:  0000002a95add4c0(0000) GS:ffffffff805e4380(005b) knlGS:0000000040016aa0
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000401d5000 CR3: 0000000004b6e000 CR4: 00000000000006e0

Call Trace: [<ffffffffa02091eb>]{:gfs:gfs_pull_tail+75}
       [<ffffffffa020957f>]{:gfs:gfs_log_reserv+527}
[<ffffffffa02017c2>]{:gfs:rq_promote+402}
       [<ffffffffa0218881>]{:gfs:gfs_trans_begin+257}
[<ffffffffa01ed03b>]{:gfs:do_do_write+379}
       [<ffffffffa01ed44f>]{:gfs:do_write+287}
[<ffffffffa01ebb9c>]{:gfs:gfs_walk_vma+444}
       [<ffffffffa01ed330>]{:gfs:do_write+0}
[<ffffffffa01eb9bb>]{:gfs:gfs_llseek+123}
       [<ffffffffa01ed55e>]{:gfs:gfs_write+174} [<ffffffff8015fb92>]{sys_write+178}
       [<ffffffff801aa2f5>]{ia32_syscall+105}
Process doio (pid: 6157, stackpage=1003755b000)
Stack: 000001003755bbc8 0000000000000000 0000010038efbcc0 0000010001000040
       000001003755bb48 ffffff0000172000 ffffff00001f6988 ffffff00001f6988
       ffffffffa02091eb 000001003ac998c0 ffffff0000172000 ffffff00001f6970
       000001003755bcc8 000001003755bc98 000001003755bc68 0000000000000001
       ffffffffa020957f ffffff00001f6960 ffffff00001f6960 0000006100172000
       0000000000007e31 0000000000000212 0000010039b29cc0 0000000000000000
       000001003755a000 0000000000000000 0000000000000000 0000000000000000
       ffffffffa02017c2 0000000000000000 000001003755a000 0000000000000000
       0000000000000000 0000000000000000 0000010039b29cc0 ffffff00001f6960
       ffffff00001f6960 0000000000000001 0000000000000000 0000010038f5a380
Call Trace: [<ffffffffa02091eb>]{:gfs:gfs_pull_tail+75}
       [<ffffffffa020957f>]{:gfs:gfs_log_reserv+527}
[<ffffffffa02017c2>]{:gfs:rq_promote+402}
       [<ffffffffa0218881>]{:gfs:gfs_trans_begin+257}
[<ffffffffa01ed03b>]{:gfs:do_do_write+379}
       [<ffffffffa01ed44f>]{:gfs:do_write+287}
[<ffffffffa01ebb9c>]{:gfs:gfs_walk_vma+444}
       [<ffffffffa01ed330>]{:gfs:do_write+0}
[<ffffffffa01eb9bb>]{:gfs:gfs_llseek+123}
       [<ffffffffa01ed55e>]{:gfs:gfs_write+174} [<ffffffff8015fb92>]{sys_write+178}
       [<ffffffff801aa2f5>]{ia32_syscall+105}

Code: 49 8b 88 80 00 00 00 48 8d 79 58 f0 ff 49 58 0f 88 26 20 00

Kernel panic: Fatal exception


link-01:

GFS: fsid=LINK_12:vedder1.0: Joined cluster. Now mounting FS...
GFS: fsid=LINK_12:vedder1.0: jid=0: Trying to acquire journal lock...
GFS: fsid=LINK_12:vedder1.0: jid=0: Looking at journal...
GFS: fsid=LINK_12:vedder1.0: jid=0: Done
GFS: fsid=LINK_12:vedder1.0: jid=1: Trying to acquire journal lock...
GFS: fsid=LINK_12:vedder1.0: jid=1: Looking at journal...
GFS: fsid=LINK_12:vedder1.0: jid=1: Done
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058
 printing rip:
ffffffffa022a936
PML4 35095067 PGD 35a25067 PMD 0
Oops: 0002
CPU 1
Pid: 6110, comm: doio Not tainted
RIP: 0010:[<ffffffffa022a936>]{:gfs:gfs_ail_trans_start+86}
RSP: 0000:0000010036d29bc8  EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffffff00001fa928 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffff00001fa9b0 RDI: 0000000000000058
RBP: 00000001ffffffff R08: 0000010035bb7958 R09: 0000000000000008
R10: 0000000000000008 R11: 0000010035bb79f0 R12: 0000010036d29bf0
R13: ffffff0000176000 R14: 0000010036d29b48 R15: 0000000000000000
FS:  0000002a95add4c0(0000) GS:ffffffff805e4380(005b) knlGS:0000000040016aa0
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000000000058 CR3: 0000000004b71000 CR4: 00000000000006e0

Call Trace: [<ffffffffa022aa45>]{:gfs:gfs_ail_trans_start+357}
       [<ffffffffa024a34d>]{:gfs:ail_start+109}
[<ffffffffa024a577>]{:gfs:gfs_log_reserv+519}
       [<ffffffffa02686e0>]{:gfs:gfs_trans_glops+0}
[<ffffffffa02427c2>]{:gfs:rq_promote+402}
       [<ffffffffa0243539>]{:gfs:glock_wait_internal+201}
       [<ffffffffa0259881>]{:gfs:gfs_trans_begin+257}
[<ffffffffa022e03b>]{:gfs:do_do_write+379}
       [<ffffffffa022e44f>]{:gfs:do_write+287}
[<ffffffffa022cb9c>]{:gfs:gfs_walk_vma+444}
       [<ffffffffa022e330>]{:gfs:do_write+0}
[<ffffffffa022c9bb>]{:gfs:gfs_llseek+123}
       [<ffffffff80120005>]{thread_return+0}
[<ffffffffa022e55e>]{:gfs:gfs_write+174}
       [<ffffffff8015fb92>]{sys_write+178} [<ffffffff801aa2f5>]{ia32_syscall+105}

Process doio (pid: 6110, stackpage=10036d29000)
Stack: 0000010036d29bc8 0000000000000000 ffffffffa022aa45 0000010036d29bc8
       0000010036d29b48 ffffff0000176000 ffffff00001fa988 00000100378091c0
       ffffff00001fa9b0 0000000000000001 ffffffffa024a34d ffffff0000176000
       ffffff00001fa970 0000010036d29cc8 0000000000000001 0000000000000000
       ffffffffa024a577 ffffff00001fa960 ffffff00001fa960 0000000200176000
       00000000000084ac 0000000000000212 000001003a6857c0 ffffff0000176000
       ffffffffa02686e0 0000000000000000 000000004015b008 0000010037987830
       ffffffffa02427c2 0000000000000007 000001003a6857f8 0000010037987830
       ffffffffa0243539 0000000000000001 000001003a6857c0 0000010035abdcc8
       ffffff00001fa960 0000000000000001 0000000000000000 0000010035710580
Call Trace: [<ffffffffa022aa45>]{:gfs:gfs_ail_trans_start+357}
       [<ffffffffa024a34d>]{:gfs:ail_start+109}
[<ffffffffa024a577>]{:gfs:gfs_log_reserv+519}
       [<ffffffffa02686e0>]{:gfs:gfs_trans_glops+0}
[<ffffffffa02427c2>]{:gfs:rq_promote+402}
       [<ffffffffa0243539>]{:gfs:glock_wait_internal+201}
       [<ffffffffa0259881>]{:gfs:gfs_trans_begin+257}
[<ffffffffa022e03b>]{:gfs:do_do_write+379}
       [<ffffffffa022e44f>]{:gfs:do_write+287}
[<ffffffffa022cb9c>]{:gfs:gfs_walk_vma+444}
       [<ffffffffa022e330>]{:gfs:do_write+0}
[<ffffffffa022c9bb>]{:gfs:gfs_llseek+123}
       [<ffffffff80120005>]{thread_return+0}
[<ffffffffa022e55e>]{:gfs:gfs_write+174}
       [<ffffffff8015fb92>]{sys_write+178} [<ffffffff801aa2f5>]{ia32_syscall+105}


Code: f0 ff 49 58 0f 88 d8 1e 00 00 31 c0 85 c0 0f 85 3b 01 00 00

Kernel panic: Fatal exception

  
Version-Release number of selected component (if applicable):
GFS-6.0.2.17-4.x86_64.rpm
GFS-modules-smp-6.0.2.17-4.x86_64.rpm


How reproducible:
Every time.

Comment 1 Dean Jansa 2005-05-02 19:28:59 UTC
ia64 stack:

Pid: 6666, comm:                 doio
EIP is at kfree [kernel] 0xe0 (2.4.21-32.EL)
psr : 0000101008022018 ifs : 800000000000038a ip  : [<e0000000044e55a0>]    Not tainted
unat: 0000000000000000 pfs : 000000000000038a rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : fffefff5a5559555
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
b0  : e0000000044e5530 b6  : e0000000044e54c0 b7  : e000000004415240
f6  : 1003e0000000000000010 f7  : 0fff3ac25500000000000
f8  : 1003e0000000000007ff0 f9  : 1003e00000000000007ff
r1  : e000000004cbbd00 r2  : 00000000000000b0 r3  : 0000000000000000
r8  : 0000000000000000 r9  : e000000035a6803c r10 : a0000000006f4990
r11 : e0000000015fde30 r12 : e000000035a6fc30 r13 : e000000035a68000
r14 : e000000004b8bb10 r15 : 0000000000000000 r16 : 000000000005de3d
r17 : e00000000101fa60 r18 : 000000000006b4d8 r19 : e000000004d33848
r20 : e000000004415240 r21 : e0000000049913b8 r22 : e000000004b8bb40
r23 : e000000004b8bb10 r24 : a0000000006f4998 r25 : e0000000042ef838
r26 : e0000000042ef830 r27 : e000000035a69810 r28 : fffefff5a555a555
r29 : 0000000000000001 r30 : 0000000000000000 r31 : 0000000001584ab1

Call Trace: [<e0000000044159a0>] sp=0xe000000035a6f830 bsp=0xe000000035a696e0 show_stack [kernel] 0x80
[<e0000000044326f0>] sp=0xe000000035a6fa00 bsp=0xe000000035a696b8 die [kernel] 0x1b0
[<e000000004452a90>] sp=0xe000000035a6fa00 bsp=0xe000000035a69660 ia64_do_page_fault [kernel] 0x350
[<e00000000440ea60>] sp=0xe000000035a6fa90 bsp=0xe000000035a69660 ia64_leave_kernel [kernel] 0x0
[<e0000000044e55a0>] sp=0xe000000035a6fc30 bsp=0xe000000035a69610 kfree [kernel] 0xe0
[<a0000000005461b0>] sp=0xe000000035a6fc40 bsp=0xe000000035a695b8 gfs_pull_tail [gfs] 0x350
[<a0000000005466c0>] sp=0xe000000035a6fc40 bsp=0xe000000035a694f8 gfs_log_reserv [gfs] 0x380
[<a00000000056cc70>] sp=0xe000000035a6fc90 bsp=0xe000000035a694a8 gfs_trans_begin [gfs] 0x290
[<a0000000005016d0>] sp=0xe000000035a6fc90 bsp=0xe000000035a69410 do_do_write [gfs] 0x130
[<a000000000502200>] sp=0xe000000035a6fcb0 bsp=0xe000000035a693a0 do_write [gfs] 0x200
[<a0000000004fe890>] sp=0xe000000035a6fcb0 bsp=0xe000000035a69328 gfs_walk_vma [gfs] 0x1f0
[<a000000000502470>] sp=0xe000000035a6fd60 bsp=0xe000000035a692e8 gfs_write [gfs] 0x170
[<e00000000451c0a0>] sp=0xe000000035a6fd60 bsp=0xe000000035a69290 fallback_readv_writev [kernel] 0xc0
[<e00000000451c600>] sp=0xe000000035a6fd60 bsp=0xe000000035a69238 do_readv_writev [kernel] 0x4e0
[<e00000000451c8b0>] sp=0xe000000035a6fde0 bsp=0xe000000035a691c0 sys_writev [kernel] 0xf0
[<e00000000445b290>] sp=0xe000000035a6fde0 bsp=0xe000000035a69150 sys32_writev [kernel] 0x90
[<e0000000044563c0>] sp=0xe000000035a6fe60 bsp=0xe000000035a69150 ia32_ret_from_syscall [kernel] 0x0
Kernel panic: Fatal exception


Comment 2 Ben Marzinski 2005-05-03 00:01:02 UTC
I've put fixes for this into the RHEL4, FC4, and HEAD branches of cluster. I'm
going to wait until the meeting tomorrow before I check them into RHEL4U1 or RHEL3.

Comment 3 Ben Marzinski 2005-05-03 00:05:12 UTC
Oh, in case anyone was wondering: I allocated a static variable that got put on
a linked list and worked on later, causing all sorts of pain. The fix simply
changes the variable to a dynamically allocated one.

Comment 4 Corey Marthaler 2006-11-06 17:53:04 UTC
This was verified a long time ago.

Comment 6 Lon Hohberger 2010-10-29 21:49:35 UTC
This bugzilla is reported to have been fixed years ago.