Description of problem:
Opteron nodes panic when mounting a single GFS file system. Nodes: l-01, l-02, l-07; all are embedded servers. I got this trace from link-01 when it issued its first GFS mount call. Node link-07 has also gone down multiple times with no trace. Seems like another Opteron stack overflow.

general protection fault: 0000
CPU 0
Pid: 5887, comm: mount Tainted: P
RIP: 0010:[<ffffffffa00e426d>]{:audit:audit_copy_vm+13}
RSP: 0018:0000010039bd7400  EFLAGS: 00010202
RAX: 12e8c120e8c14843 RBX: 0000010039bcac00 RCX: 0000000000000000
RDX: ffffffff8042dd18 RSI: ffffffff80115e23 RDI: 0000010039bcac00
RBP: 0000010039ba2000 R08: 0000010039bd6250 R09: 0000000000000003
R10: 0000010039ba2000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff80115e23 R14: 0000000000000000 R15: 0000000000000000
FS:  0000002a955786c0(0000) GS:ffffffff805d9840(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a958acf30 CR3: 0000000000101000 CR4: 00000000000006e0
Call Trace:
 [<ffffffffa00e2ff2>]{:audit:__audit_attach+274} [<ffffffff80115e23>]{do_gettimeofday+67} [<ffffffffa0157650>]{:gfs:gfs_glock_cb+0}
 [<ffffffffa00e291f>]{:audit:__audit_fork+95} [<ffffffff80225f3b>]{audit_fork+59} [<ffffffff801234c5>]{do_fork+293}
 [<ffffffffa0157650>]{:gfs:gfs_glock_cb+0} [<ffffffffa0190102>]{:lock_gulm:data_recv+34} [<ffffffff80110b2e>]{arch_kernel_thread+162}
 [<ffffffffa0157650>]{:gfs:gfs_glock_cb+0} [<ffffffffa0187e40>]{:lock_gulm:cm_io_recving_thread+0} [<ffffffff80110b89>]{child_rip+0}
 [<ffffffff80122725>]{kernel_thread+85} [<ffffffffa0188c1f>]{:lock_gulm:cm_login+543} [<ffffffffa018920e>]{:lock_gulm:start_gulm_threads+62}
 [<ffffffffa0189512>]{:lock_gulm:gulm_mount+610} [<ffffffffa013c5fd>]{:lock_harness:lm_mount_R5c74bedb+205} [<ffffffffa0157650>]{:gfs:gfs_glock_cb+0}
 [<ffffffffa015e46f>]{:gfs:gfs_mount_lockproto+303} [<ffffffff8013d8d2>]{do_anonymous_page+1234} [<ffffffff8013d94f>]{do_no_page+95}
 [<ffffffff801a5103>]{do_page_fault+627} [<ffffffff801109d6>]{error_exit+0} [<ffffffff80184cb3>]{create_elf_tables+211}
 [<ffffffff802b5798>]{strnlen_user+56} [<ffffffff80184f47>]{create_elf_tables+871} [<ffffffffa01482a4>]{:gfs:gfs_read_super+1204}
 [<ffffffffa0184680>]{:gfs:gfs_fs_type+0} [<ffffffff80164c0c>]{get_sb_bdev+588} [<ffffffffa0184680>]{:gfs:gfs_fs_type+0}
 [<ffffffff80164ec9>]{do_kern_mount+121} [<ffffffff8017baa1>]{do_add_mount+161} [<ffffffff8017bdb9>]{do_mount+345}
 [<ffffffff80154b40>]{__get_free_pages+16} [<ffffffff8017c1d5>]{sys_mount+197} [<ffffffff80110177>]{system_call+119}

Process mount (pid: 5887, stackpage=10039bd7000)
Stack:
 0000010039bd7400 0000000000000018 ffffffffa00e2ff2 0000010039ba2000
 ffffffff80115e23 0000010039ba2000 0000010039bd6000 000000000000170b
 0000010039bd7898 ffffffffa0157650 ffffffffa00e291f ffffffffffffffff
 0000010039ba2000 ffffffff8044c050 ffffffff80225f3b 0000010039ba2000
 0000000000000100 0000000000000000 ffffffff801234c5 ffffffffa0157650
 ffffffffa0190102 0000010039f1a002 0000000000000000 0000010039bd75f8
 0000000000000001 0000000000000000 0000000000000000 0000010039bd75f8
 ffffffff80110b2e ffffffffa0157650 0000010039bd7898 0000010039bd75f8
 0000000000000000 0000000000000000 0000000000000001 000000000000000a
 00000000ffffffff 0000000000000002 00000000fffffff9 0000000000000000
Call Trace:
 [<ffffffffa00e2ff2>]{:audit:__audit_attach+274} [<ffffffff80115e23>]{do_gettimeofday+67} [<ffffffffa0157650>]{:gfs:gfs_glock_cb+0}
 [<ffffffffa00e291f>]{:audit:__audit_fork+95} [<ffffffff80225f3b>]{audit_fork+59} [<ffffffff801234c5>]{do_fork+293}
 [<ffffffffa0157650>]{:gfs:gfs_glock_cb+0} [<ffffffffa0190102>]{:lock_gulm:data_recv+34} [<ffffffff80110b2e>]{arch_kernel_thread+162}
 [<ffffffffa0157650>]{:gfs:gfs_glock_cb+0} [<ffffffffa0187e40>]{:lock_gulm:cm_io_recving_thread+0} [<ffffffff80110b89>]{child_rip+0}
 [<ffffffff80122725>]{kernel_thread+85} [<ffffffffa0188c1f>]{:lock_gulm:cm_login+543} [<ffffffffa018920e>]{:lock_gulm:start_gulm_threads+62}
 [<ffffffffa0189512>]{:lock_gulm:gulm_mount+610} [<ffffffffa013c5fd>]{:lock_harness:lm_mount_R5c74bedb+205} [<ffffffffa0157650>]{:gfs:gfs_glock_cb+0}
 [<ffffffffa015e46f>]{:gfs:gfs_mount_lockproto+303} [<ffffffff8013d8d2>]{do_anonymous_page+1234} [<ffffffff8013d94f>]{do_no_page+95}
 [<ffffffff801a5103>]{do_page_fault+627} [<ffffffff801109d6>]{error_exit+0} [<ffffffff80184cb3>]{create_elf_tables+211}
 [<ffffffff802b5798>]{strnlen_user+56} [<ffffffff80184f47>]{create_elf_tables+871} [<ffffffffa01482a4>]{:gfs:gfs_read_super+1204}
 [<ffffffffa0184680>]{:gfs:gfs_fs_type+0} [<ffffffff80164c0c>]{get_sb_bdev+588} [<ffffffffa0184680>]{:gfs:gfs_fs_type+0}
 [<ffffffff80164ec9>]{do_kern_mount+121} [<ffffffff8017baa1>]{do_add_mount+161} [<ffffffff8017bdb9>]{do_mount+345}
 [<ffffffff80154b40>]{__get_free_pages+16} [<ffffffff8017c1d5>]{sys_mount+197} [<ffffffff80110177>]{system_call+119}

Code: f0 ff 00 c3 66 66 66 90 66 66 66 90 66 66 66 90 66 66 90 41
Kernel panic: Fatal exception

Version-Release number of selected component (if applicable):
GFS 6.0.0-7

How reproducible:
Sometimes.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
This is the same stack overflow that was in 5.2.1. Moved a 256-byte buffer off of the stack and into the malloc heap; that seems to fix the bug.
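For reference, the general shape of that kind of change looks like the sketch below. This is only an illustration under assumed names: the functions and surrounding logic are made up, not the actual lock_gulm/GFS source. It shows the pattern of trading a 256-byte on-stack array for a kmalloc()'d buffer so a deep mount call chain consumes less kernel stack.

    /* Illustrative sketch only -- hypothetical helpers, not the real
     * lock_gulm/GFS code.  Shows moving a 256-byte buffer off the
     * kernel stack and onto the heap.
     */
    #include <linux/slab.h>
    #include <linux/string.h>
    #include <linux/errno.h>

    /* Before: 256 bytes of the (small) kernel stack per call. */
    static int handle_reply_stack(void)
    {
            unsigned char buf[256];

            /* ... receive into buf and parse it ... */
            memset(buf, 0, sizeof(buf));
            return 0;
    }

    /* After: only a pointer lives on the stack; the buffer itself
     * comes from the heap and is freed before returning.
     */
    static int handle_reply_heap(void)
    {
            unsigned char *buf;
            int err = 0;

            buf = kmalloc(256, GFP_KERNEL);
            if (!buf)
                    return -ENOMEM;

            /* ... receive into buf and parse it ... */
            memset(buf, 0, 256);

            kfree(buf);
            return err;
    }

The trade-off is an extra allocation (and a possible -ENOMEM path) on a mount-time code path, which is cheap compared to risking a stack overflow in a call chain this deep.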
*** Bug 129044 has been marked as a duplicate of this bug. ***
An erratum has been issued which should resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-424.html