Bug 129042

Summary: Opteron panics when mounting a single file system
Product: [Retired] Red Hat Cluster Suite
Component: gfs
Version: 3
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Derek Anderson <danderso>
Assignee: michael conrad tadpol tilstra <mtilstra>
QA Contact: Cluster QE <mspqa-list>
Doc Type: Bug Fix
Last Closed: 2004-08-03 22:37:23 UTC

Description Derek Anderson 2004-08-03 14:39:17 UTC
Description of problem:
Opteron panics when mounting a single file system

nodes:   l-01, l-02, l-07.  All are embedded servers.

I got this trace from link-01 when it issued its first GFS mount call.
Node link-07 has also gone down multiple times with no trace.  Seems
like another Opteron stack overflow.

general protection fault: 0000
CPU 0
Pid: 5887, comm: mount Tainted: P
RIP: 0010:[<ffffffffa00e426d>]{:audit:audit_copy_vm+13}
RSP: 0018:0000010039bd7400  EFLAGS: 00010202
RAX: 12e8c120e8c14843 RBX: 0000010039bcac00 RCX: 0000000000000000
RDX: ffffffff8042dd18 RSI: ffffffff80115e23 RDI: 0000010039bcac00
RBP: 0000010039ba2000 R08: 0000010039bd6250 R09: 0000000000000003
R10: 0000010039ba2000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff80115e23 R14: 0000000000000000 R15: 0000000000000000
FS:  0000002a955786c0(0000) GS:ffffffff805d9840(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a958acf30 CR3: 0000000000101000 CR4: 00000000000006e0

Call Trace: [<ffffffffa00e2ff2>]{:audit:__audit_attach+274}
       [<ffffffff80115e23>]{do_gettimeofday+67}
[<ffffffffa0157650>]{:gfs:gfs_glock_cb+0}
       [<ffffffffa00e291f>]{:audit:__audit_fork+95}
[<ffffffff80225f3b>]{audit_fork+59}
       [<ffffffff801234c5>]{do_fork+293}
[<ffffffffa0157650>]{:gfs:gfs_glock_cb+0}
       [<ffffffffa0190102>]{:lock_gulm:data_recv+34}
[<ffffffff80110b2e>]{arch_kernel_thread+162}
       [<ffffffffa0157650>]{:gfs:gfs_glock_cb+0}
[<ffffffffa0187e40>]{:lock_gulm:cm_io_recving_thread+0}
       [<ffffffff80110b89>]{child_rip+0}
[<ffffffff80122725>]{kernel_thread+85}
       [<ffffffffa0188c1f>]{:lock_gulm:cm_login+543}
[<ffffffffa018920e>]{:lock_gulm:start_gulm_threads+62}
       [<ffffffffa0189512>]{:lock_gulm:gulm_mount+610}
[<ffffffffa013c5fd>]{:lock_harness:lm_mount_R5c74bedb+205}
       [<ffffffffa0157650>]{:gfs:gfs_glock_cb+0}
[<ffffffffa015e46f>]{:gfs:gfs_mount_lockproto+303}
       [<ffffffff8013d8d2>]{do_anonymous_page+1234}
[<ffffffff8013d94f>]{do_no_page+95}
       [<ffffffff801a5103>]{do_page_fault+627}
[<ffffffff801109d6>]{error_exit+0}
       [<ffffffff80184cb3>]{create_elf_tables+211}
[<ffffffff802b5798>]{strnlen_user+56}
       [<ffffffff80184f47>]{create_elf_tables+871}
[<ffffffffa01482a4>]{:gfs:gfs_read_super+1204}
       [<ffffffffa0184680>]{:gfs:gfs_fs_type+0}
[<ffffffff80164c0c>]{get_sb_bdev+588}
       [<ffffffffa0184680>]{:gfs:gfs_fs_type+0}
[<ffffffff80164ec9>]{do_kern_mount+121}
       [<ffffffff8017baa1>]{do_add_mount+161}
[<ffffffff8017bdb9>]{do_mount+345}
       [<ffffffff80154b40>]{__get_free_pages+16}
[<ffffffff8017c1d5>]{sys_mount+197}
       [<ffffffff80110177>]{system_call+119}
Process mount (pid: 5887, stackpage=10039bd7000)
Stack: 0000010039bd7400 0000000000000018 ffffffffa00e2ff2 0000010039ba2000
       ffffffff80115e23 0000010039ba2000 0000010039bd6000 000000000000170b
       0000010039bd7898 ffffffffa0157650 ffffffffa00e291f ffffffffffffffff
       0000010039ba2000 ffffffff8044c050 ffffffff80225f3b 0000010039ba2000
       0000000000000100 0000000000000000 ffffffff801234c5 ffffffffa0157650
       ffffffffa0190102 0000010039f1a002 0000000000000000 0000010039bd75f8
       0000000000000001 0000000000000000 0000000000000000 0000010039bd75f8
       ffffffff80110b2e ffffffffa0157650 0000010039bd7898 0000010039bd75f8
       0000000000000000 0000000000000000 0000000000000001 000000000000000a
       00000000ffffffff 0000000000000002 00000000fffffff9 0000000000000000
Call Trace: [<ffffffffa00e2ff2>]{:audit:__audit_attach+274}
       [<ffffffff80115e23>]{do_gettimeofday+67}
[<ffffffffa0157650>]{:gfs:gfs_glock_cb+0}
       [<ffffffffa00e291f>]{:audit:__audit_fork+95}
[<ffffffff80225f3b>]{audit_fork+59}
       [<ffffffff801234c5>]{do_fork+293}
[<ffffffffa0157650>]{:gfs:gfs_glock_cb+0}
       [<ffffffffa0190102>]{:lock_gulm:data_recv+34}
[<ffffffff80110b2e>]{arch_kernel_thread+162}
       [<ffffffffa0157650>]{:gfs:gfs_glock_cb+0}
[<ffffffffa0187e40>]{:lock_gulm:cm_io_recving_thread+0}
       [<ffffffff80110b89>]{child_rip+0}
[<ffffffff80122725>]{kernel_thread+85}
       [<ffffffffa0188c1f>]{:lock_gulm:cm_login+543}
[<ffffffffa018920e>]{:lock_gulm:start_gulm_threads+62}
       [<ffffffffa0189512>]{:lock_gulm:gulm_mount+610}
[<ffffffffa013c5fd>]{:lock_harness:lm_mount_R5c74bedb+205}
       [<ffffffffa0157650>]{:gfs:gfs_glock_cb+0}
[<ffffffffa015e46f>]{:gfs:gfs_mount_lockproto+303}
       [<ffffffff8013d8d2>]{do_anonymous_page+1234}
[<ffffffff8013d94f>]{do_no_page+95}
       [<ffffffff801a5103>]{do_page_fault+627}
[<ffffffff801109d6>]{error_exit+0}
       [<ffffffff80184cb3>]{create_elf_tables+211}
[<ffffffff802b5798>]{strnlen_user+56}
       [<ffffffff80184f47>]{create_elf_tables+871}
[<ffffffffa01482a4>]{:gfs:gfs_read_super+1204}
       [<ffffffffa0184680>]{:gfs:gfs_fs_type+0}
[<ffffffff80164c0c>]{get_sb_bdev+588}
       [<ffffffffa0184680>]{:gfs:gfs_fs_type+0}
[<ffffffff80164ec9>]{do_kern_mount+121}
       [<ffffffff8017baa1>]{do_add_mount+161}
[<ffffffff8017bdb9>]{do_mount+345}
       [<ffffffff80154b40>]{__get_free_pages+16}
[<ffffffff8017c1d5>]{sys_mount+197}
       [<ffffffff80110177>]{system_call+119}

Code: f0 ff 00 c3 66 66 66 90 66 66 66 90 66 66 66 90 66 66 90 41

Kernel panic: Fatal exception
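
As a rough check on the stack-overflow theory: the register dump gives RSP 0000010039bd7400 and the process line gives stackpage=10039bd7000. Assuming stackpage is the low end of the kernel stack region for the mount process (an assumption about this kernel's layout, not something stated in this report), the difference between the two is the room left below the stack pointer when the fault hit. The helper name stack_headroom is hypothetical and only illustrates the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes between the faulting stack pointer and the low end of the
 * stack region. Assumes the stack grows down toward stack_base.
 */
uint64_t stack_headroom(uint64_t rsp, uint64_t stack_base)
{
    assert(rsp >= stack_base); /* otherwise the pointer has already left the region */
    return rsp - stack_base;
}
```

With the values from the trace this comes to 0x400, that is 1024 bytes, and anything the kernel keeps at the base of that region would reduce the usable part further.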

Version-Release number of selected component (if applicable):
GFS 6.0.0-7

How reproducible:
Sometimes.


Comment 1 michael conrad tadpol tilstra 2004-08-03 14:42:19 UTC
This is the same stack overflow that was in 5.2.1.
Moved a 256-byte buffer off the stack and into the malloc heap,
which seems to fix the bug.
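
The kind of change described above can be sketched as follows. This is a userspace illustration only, not the actual GFS patch: the report does not say which function or buffer was changed, so the function names, the NAME_BUF_LEN constant, and the "cluster:fs" formatting are all hypothetical. Kernel code would use kmalloc()/kfree() rather than malloc()/free().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_BUF_LEN 256 /* hypothetical; matches the 256-byte size in the comment */

/* Before: the 256-byte scratch buffer sits in this function's stack frame. */
int format_name_on_stack(const char *cluster, const char *fs,
                         char *out, size_t out_len)
{
    char buf[NAME_BUF_LEN];
    int n;

    assert(cluster != NULL && fs != NULL && out != NULL);
    n = snprintf(buf, sizeof(buf), "%s:%s", cluster, fs);
    if (n < 0 || (size_t)n >= sizeof(buf) || (size_t)n >= out_len)
        return -1;
    memcpy(out, buf, (size_t)n + 1);
    return n;
}

/* After: only a pointer stays on the stack; the buffer lives on the heap. */
int format_name_on_heap(const char *cluster, const char *fs,
                        char *out, size_t out_len)
{
    char *buf;
    int n, rc = -1;

    assert(cluster != NULL && fs != NULL && out != NULL);
    buf = malloc(NAME_BUF_LEN);
    if (buf == NULL)
        return -1; /* allocation can fail, unlike the stack version */

    n = snprintf(buf, NAME_BUF_LEN, "%s:%s", cluster, fs);
    if (n >= 0 && (size_t)n < NAME_BUF_LEN && (size_t)n < out_len) {
        memcpy(out, buf, (size_t)n + 1);
        rc = n;
    }
    free(buf); /* every exit path after malloc() must release the buffer */
    return rc;
}
```

The trade-off is an extra allocation and a new failure path in exchange for a stack frame that is about 256 bytes smaller, which matters when the call chain is as deep as the one in the trace.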

Comment 2 Derek Anderson 2004-08-03 14:57:25 UTC
*** Bug 129044 has been marked as a duplicate of this bug. ***

Comment 3 Mark J. Cox 2004-08-03 22:37:23 UTC
An erratum has been issued that should address the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-424.html