Description of problem:
The system panicked while booting the kexec kernel.

Version-Release number of selected component (if applicable):
2.6.32-161.el6

How reproducible:
The panic is not consistent: the system booted the kexec kernel successfully during retesting to reproduce the issue.
See "success" here: https://beaker.engineering.redhat.com/jobs/101905 http://tinyurl.com/5vsroew
See "failure" reproduced here: https://beaker.engineering.redhat.com/jobs/102692 http://tinyurl.com/5tzmpeg

Actual results:
Initial failure seen here: https://beaker.engineering.redhat.com/recipes/207164 http://tinyurl.com/3bxpjw7
<-SNIP->
Kernel panic - not syncing: Fatal exception
Pid: 0, comm: swapper Tainted: G D ---------------- 2.6.32-161.el6.x86_64 #1
Call Trace:
[<ffffffff814dc630>] ? panic+0x78/0x143 [<ffffffff814df65c>] ? _spin_unlock_irqrestore+0x1c/0x20 [<ffffffff814e07a4>] ? oops_end+0xe4/0x100 [<ffffffff8104104b>] ? no_context+0xfb/0x260 [<ffffffff810412d5>] ? __bad_area_nosemaphore+0x125/0x1e0 [<ffffffff810413a3>] ? bad_area_nosemaphore+0x13/0x20 [<ffffffff81041a7d>] ? __do_page_fault+0x31d/0x480 [<ffffffff814e278e>] ? do_page_fault+0x3e/0xa0 [<ffffffff814dfb15>] ? page_fault+0x25/0x30 [<ffffffff8127471c>] ? list_del+0xc/0xa0 [<ffffffff8111d2c3>] ? __rmqueue+0xc3/0x490 [<ffffffff8111f148>] ? get_page_from_freelist+0x598/0x820 [<ffffffff8111ee8e>] ? get_page_from_freelist+0x2de/0x820 [<ffffffff81120151>] ? __alloc_pages_nodemask+0x111/0x8b0 [<ffffffff8112f5e9>] ? zone_statistics+0x99/0xc0 [<ffffffff81120151>] ? __alloc_pages_nodemask+0x111/0x8b0 [<ffffffff8115a162>] ? kmem_getpages+0x62/0x170 [<ffffffff8115a7cf>] ? cache_grow+0x2cf/0x320 [<ffffffff8112f75a>] ? mod_zone_page_state+0x2a/0x30 [<ffffffff8115aa22>] ? cache_alloc_refill+0x202/0x240 [<ffffffff8115bc90>] ? alloc_arraycache+0x30/0x60 [<ffffffff8115ba58>] ? kmem_cache_alloc_node_notrace+0x128/0x130 [<ffffffff8115bbdb>] ? __kmalloc_node+0x7b/0x100 [<ffffffff8115bc90>] ? alloc_arraycache+0x30/0x60 [<ffffffff8115d05f>] ? 
do_tune_cpucache+0x4df/0x630 [<ffffffff8115d38b>] ? enable_cpucache+0x3b/0xf0 [<ffffffff814c751f>] ? setup_cpu_cache+0x22f/0x340 [<ffffffff8115b922>] ? kmem_cache_alloc+0x182/0x190 [<ffffffff8115e01a>] ? kmem_cache_create+0x3fa/0x580 [<ffffffff81be26f5>] ? numa_policy_init+0x34/0x124 [<ffffffff814de20e>] ? mutex_lock+0x1e/0x50 [<ffffffff814ddf39>] ? mutex_unlock+0x9/0x20 [<ffffffff810694e5>] ? cpu_maps_update_done+0x15/0x20 [<ffffffff814c5f5c>] ? register_cpu_notifier+0x2c/0x40 [<ffffffff81bbde71>] ? start_kernel+0x366/0x429 [<ffffffff81bbd33a>] ? x86_64_start_reservations+0x125/0x129 [<ffffffff81bbd438>] ? x86_64_start_kernel+0xfa/0x109 Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0 Pid: 1269, comm: rs:main Q:Reg Tainted: G W ---------------- 2.6.32-161.el6.x86_64 #1 Call Trace: <NMI> [<ffffffff814dc630>] ? panic+0x78/0x143 [<ffffffff810d6abd>] ? watchdog_overflow_callback+0xcd/0xd0 [<ffffffff81109146>] ? __perf_event_overflow+0x116/0x290 [<ffffffff81109739>] ? perf_event_overflow+0x19/0x20 [<ffffffff8101ced4>] ? p4_pmu_handle_irq+0x224/0x2f0 [<ffffffff814e2216>] ? kprobe_exceptions_notify+0x16/0x430 [<ffffffff814e0ce8>] ? perf_event_nmi_handler+0x58/0xe0 [<ffffffff814e2845>] ? notifier_call_chain+0x55/0x80 [<ffffffff814e28aa>] ? atomic_notifier_call_chain+0x1a/0x20 [<ffffffff8109468e>] ? notify_die+0x2e/0x30 [<ffffffff814e0493>] ? do_nmi+0x173/0x2b0 [<ffffffff814dfda0>] ? nmi+0x20/0x30 [<ffffffff814df60e>] ? _spin_lock+0x1e/0x30 <<EOE>> [<ffffffff8111f107>] ? get_page_from_freelist+0x557/0x820 [<ffffffff81120151>] ? __alloc_pages_nodemask+0x111/0x8b0 [<ffffffff8126e576>] ? vsnprintf+0x2b6/0x5f0 [<ffffffff81154f7a>] ? alloc_pages_vma+0x9a/0x150 [<ffffffff8113841b>] ? handle_pte_fault+0x76b/0xb50 [<ffffffff8105d3c2>] ? default_wake_function+0x12/0x20 [<ffffffff8104b569>] ? __wake_up_common+0x59/0x90 [<ffffffff811389e4>] ? handle_mm_fault+0x1e4/0x2b0 [<ffffffff81041899>] ? __do_page_fault+0x139/0x480 [<ffffffff814de20e>] ? 
mutex_lock+0x1e/0x50 [<ffffffffa009a108>] ? ext4_llseek+0x98/0x110 [ext4] [<ffffffff814e278e>] ? do_page_fault+0x3e/0xa0 [<ffffffff814dfb15>] ? page_fault+0x25/0x30
<-SNIP->

Expected results:
The system consistently boots the kexec kernel.

Additional info:
System hostname in the following comment.
All, The reproducer provided a clearer console log: https://beaker.engineering.redhat.com/jobs/102692 http://beaker-archive.app.eng.bos.redhat.com/beaker-logs/2011/06/1026/102692/210405//console.log <-SNIP-> BUG: Bad page state in process irqbalance pfn:0f4e5 page:ffffea0000359218 flags:0020000000000400 count:1 mapcount:0 mapping: (null) index:0 (Not tainted) Pid: 1253, comm: irqbalance Not tainted 2.6.32-161.el6.x86_64 #1 Call Trace: [<ffffffff8111dc77>] ? bad_page+0x107/0x160 [<ffffffff8111f2d4>] ? get_page_from_freelist+0x724/0x820 [<ffffffff81120151>] ? __alloc_pages_nodemask+0x111/0x8b0 [<ffffffff81225544>] ? context_struct_compute_av+0x324/0x420 [<ffffffff8115b922>] ? kmem_cache_alloc+0x182/0x190 [<ffffffff8115b922>] ? kmem_cache_alloc+0x182/0x190 [<ffffffff81154e7a>] ? alloc_pages_current+0xaa/0x110 [<ffffffff8111e95e>] ? __get_free_pages+0xe/0x50 [<ffffffff81213fad>] ? inode_doinit_with_dentry+0x34d/0x690 [<ffffffff8108e5b7>] ? bit_waitqueue+0x17/0xd0 [<ffffffff8121430c>] ? selinux_d_instantiate+0x1c/0x20 [<ffffffff81206f4b>] ? security_d_instantiate+0x1b/0x30 [<ffffffff8118ae75>] ? d_instantiate+0x55/0x70 [<ffffffff811dc26d>] ? proc_lookup_de+0xad/0x110 [<ffffffff811dc2eb>] ? proc_lookup+0x1b/0x20 [<ffffffff811d6787>] ? proc_root_lookup+0x27/0x50 [<ffffffff811813bb>] ? do_lookup+0x18b/0x220 [<ffffffff811819b9>] ? __link_path_walk+0x569/0x820 [<ffffffff81060a98>] ? dequeue_entity+0xf8/0x1d0 [<ffffffff8118233a>] ? path_walk+0x6a/0xe0 [<ffffffff8118250b>] ? do_path_lookup+0x5b/0xa0 [<ffffffff81174a41>] ? get_empty_filp+0xa1/0x170 [<ffffffff8118317b>] ? do_filp_open+0xfb/0xd90 [<ffffffff810938bf>] ? hrtimer_try_to_cancel+0x3f/0xd0 [<ffffffff810d0442>] ? audit_alloc_name+0x62/0x100 [<ffffffff8118fe72>] ? alloc_fd+0x92/0x160 [<ffffffff81170599>] ? do_sys_open+0x69/0x140 [<ffffffff811706b0>] ? sys_open+0x20/0x30 [<ffffffff8100b0b2>] ? 
system_call_fastpath+0x16/0x1b netxen_nic 0000:02:00.0: loading firmware from phanfw.bin netxen_nic 0000:02:00.0: using 64-bit dma mask netxen_nic: Quad Gig LP Board S/N QG88BK0159 Chip rev 0x42 netxen_nic 0000:02:00.0: firmware v4.0.534 [legacy] netxen_nic 0000:02:00.0: using msi-x interrupts Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0 Pid: 9, comm: khelper Not tainted 2.6.32-161.el6.x86_64 #1 Call Trace: <NMI> [<ffffffff814dc630>] ? panic+0x78/0x143 [<ffffffff810d6abd>] ? watchdog_overflow_callback+0xcd/0xd0 [<ffffffff81109146>] ? __perf_event_overflow+0x116/0x290 [<ffffffff81109739>] ? perf_event_overflow+0x19/0x20 [<ffffffff8101ced4>] ? p4_pmu_handle_irq+0x224/0x2f0 [<ffffffff814e2216>] ? kprobe_exceptions_notify+0x16/0x430 [<ffffffff814e0ce8>] ? perf_event_nmi_handler+0x58/0xe0 [<ffffffff814e2845>] ? notifier_call_chain+0x55/0x80 [<ffffffff814e28aa>] ? atomic_notifier_call_chain+0x1a/0x20 [<ffffffff8109468e>] ? notify_die+0x2e/0x30 [<ffffffff814e0493>] ? do_nmi+0x173/0x2b0 [<ffffffff814dfda0>] ? nmi+0x20/0x30 [<ffffffff814df4ff>] ? _spin_lock_irqsave+0x2f/0x40 <<EOE>> [<ffffffff8111f324>] ? get_page_from_freelist+0x774/0x820 [<ffffffff81054253>] ? perf_event_task_sched_out+0x33/0x80 [<ffffffff81120151>] ? __alloc_pages_nodemask+0x111/0x8b0 [<ffffffff81154e7a>] ? alloc_pages_current+0xaa/0x110 [<ffffffff8111e95e>] ? __get_free_pages+0xe/0x50 [<ffffffff81065464>] ? copy_process+0xe4/0x1300 [<ffffffff81066714>] ? do_fork+0x94/0x480 [<ffffffff81060a98>] ? dequeue_entity+0xf8/0x1d0 [<ffffffff810096d0>] ? __switch_to+0xd0/0x320 [<ffffffff81087530>] ? __call_usermodehelper+0x0/0xa0 [<ffffffff8100c0a2>] ? kernel_thread+0x82/0xe0 [<ffffffff81087530>] ? __call_usermodehelper+0x0/0xa0 [<ffffffff810875d0>] ? ____call_usermodehelper+0x0/0x140 [<ffffffff8100c100>] ? child_rip+0x0/0x20 [<ffffffff8108756e>] ? __call_usermodehelper+0x3e/0xa0 [<ffffffff81088dd0>] ? worker_thread+0x170/0x2a0 [<ffffffff8108e6f0>] ? 
autoremove_wake_function+0x0/0x40 [<ffffffff81088c60>] ? worker_thread+0x0/0x2a0 [<ffffffff8108e386>] ? kthread+0x96/0xa0 [<ffffffff81088c60>] ? worker_thread+0x0/0x2a0 [<ffffffff8100c10a>] ? child_rip+0xa/0x20 [<ffffffff8108e2f0>] ? kthread+0x0/0xa0 [<ffffffff8100c100>] ? child_rip+0x0/0x20 BUG: scheduling while atomic: khelper/9/0x14010000 Modules linked in: netxen_nic(+) tg3 ipv6 speedstep_lib freq_table sunrpc dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod Pid: 9, comm: khelper Not tainted 2.6.32-161.el6.x86_64 #1 Call Trace: <NMI> [<ffffffff81054216>] ? __schedule_bug+0x66/0x70 [<ffffffff814dd33c>] ? thread_return+0x65b/0x75f [<ffffffff814dc6d8>] ? panic+0x120/0x143 [<ffffffff814e054a>] ? do_nmi+0x22a/0x2b0 [<ffffffff810604ba>] ? __cond_resched+0x2a/0x40 [<ffffffff814e054a>] ? do_nmi+0x22a/0x2b0 [<ffffffff814dd6e0>] ? _cond_resched+0x30/0x40 [<ffffffff8100df36>] ? is_valid_bugaddr+0x16/0x40 [<ffffffff81263c9f>] ? report_bug+0x1f/0xc0 [<ffffffff8100f24f>] ? die+0x7f/0x90 [<ffffffff814e0074>] ? do_trap+0xc4/0x160 [<ffffffff8100cdf5>] ? do_invalid_op+0x95/0xb0 [<ffffffff814e054a>] ? do_nmi+0x22a/0x2b0 [<ffffffff814e2216>] ? kprobe_exceptions_notify+0x16/0x430 [<ffffffff8100be9b>] ? invalid_op+0x1b/0x20 [<ffffffff814e054a>] ? do_nmi+0x22a/0x2b0 [<ffffffff814e033c>] ? do_nmi+0x1c/0x2b0 [<ffffffff814dfda0>] ? nmi+0x20/0x30 [<ffffffff814dc6d8>] ? panic+0x120/0x143 <<EOE>> ------------[ cut here ]------------ kernel BUG at arch/x86/kernel/traps.c:547! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/firmware /0000:02:00.0/loading CPU 0 Modules linked in: netxen_nic(+) tg3 ipv6 speedstep_lib freq_table sunrpc dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod Pid: 9, comm: khelper Not tainted 2.6.32-161.el6.x86_64 #1 Dell Inc. 
PowerEdge SC430 /0M9873 RIP: 0010:[<ffffffff814e054a>] [<ffffffff814e054a>] do_nmi+0x22a/0x2b0 RSP: 0018:ffff880002207f28 EFLAGS: 00010002 RAX: ffff880011245fd8 RBX: ffff880002207f58 RCX: 00000000c0000101 RDX: 00000000ffff8800 RSI: ffffffffffffffff RDI: ffff880002207f58 RBP: ffff880002207f48 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000001 R14: ffff880002207de8 R15: ffff880002207f58 FS: 0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00000000004b47a5 CR3: 000000001188c000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process khelper (pid: 9, threadinfo ffff880011244000, task ffff88001123c0c0) Stack: 0000000000000000 0000000000000001 0000000000000000 0000000000000001 <0> ffff880002207cb8 ffffffff814dfda0 ffff880002207f58 ffff880002207de8 <0> 0000000000000001 0000000000000000 ffff880002207cb8 ffffffff8178ed18 Call Trace: <NMI> [<ffffffff814dfda0>] nmi+0x20/0x30 [<ffffffff814dc6d8>] ? panic+0x120/0x143 <<EOE>> Code: ff ff 83 3d 18 0d 83 00 00 75 28 83 3d 33 0d 83 00 00 75 1f 48 c7 c7 70 88 77 81 31 c0 e8 ba c1 ff ff e9 2d fe ff ff 0f 0b eb fe <0f> 0b 0f 1f 40 00 eb fa 48 c7 c7 74 49 77 81 31 c0 e8 58 c0 ff RIP [<ffffffff814e054a>] do_nmi+0x22a/0x2b0 RSP <ffff880002207f28> ---[ end trace d19396ea14965863 ]--- Kernel panic - not syncing: Fatal exception in interrupt Pid: 9, comm: khelper Tainted: G D ---------------- 2.6.32-161.el6.x86_64 #1 Call Trace: <NMI> [<ffffffff814dc630>] ? panic+0x78/0x143 [<ffffffff814e07b2>] ? oops_end+0xf2/0x100 [<ffffffff8100f22b>] ? die+0x5b/0x90 [<ffffffff814e0074>] ? do_trap+0xc4/0x160 [<ffffffff8100cdf5>] ? do_invalid_op+0x95/0xb0 [<ffffffff814e054a>] ? do_nmi+0x22a/0x2b0 [<ffffffff814e2216>] ? kprobe_exceptions_notify+0x16/0x430 [<ffffffff8100be9b>] ? 
invalid_op+0x1b/0x20 [<ffffffff814e054a>] ? do_nmi+0x22a/0x2b0 [<ffffffff814e033c>] ? do_nmi+0x1c/0x2b0 [<ffffffff814dfda0>] ? nmi+0x20/0x30 [<ffffffff814dc6d8>] ? panic+0x120/0x143 <<EOE>> Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1 Pid: 1253, comm: irqbalance Tainted: G B ---------------- 2.6.32-161.el6.x86_64 #1 Call Trace: <NMI> [<ffffffff814dc630>] ? panic+0x78/0x143 [<ffffffff810d6abd>] ? watchdog_overflow_callback+0xcd/0xd0 [<ffffffff81109146>] ? __perf_event_overflow+0x116/0x290 [<ffffffff81109739>] ? perf_event_overflow+0x19/0x20 [<ffffffff8101ced4>] ? p4_pmu_handle_irq+0x224/0x2f0 [<ffffffff814e2216>] ? kprobe_exceptions_notify+0x16/0x430 [<ffffffff814e0ce8>] ? perf_event_nmi_handler+0x58/0xe0 [<ffffffff814e2845>] ? notifier_call_chain+0x55/0x80 [<ffffffff814e28aa>] ? atomic_notifier_call_chain+0x1a/0x20 [<ffffffff8109468e>] ? notify_die+0x2e/0x30 [<ffffffff814e0493>] ? do_nmi+0x173/0x2b0 [<ffffffff814dfda0>] ? nmi+0x20/0x30 [<ffffffff814e0834>] ? oops_begin+0x74/0xc0 <<EOE>> [<ffffffff8100f1fe>] ? die+0x2e/0x90 [<ffffffff814e0312>] ? do_general_protection+0x152/0x160 [<ffffffff814dfae5>] ? general_protection+0x25/0x30 [<ffffffff81274720>] ? list_del+0x10/0xa0 [<ffffffff8111d2c3>] ? __rmqueue+0xc3/0x490 [<ffffffff814dc5b1>] ? dump_stack+0x6f/0x76 [<ffffffff8111f148>] ? get_page_from_freelist+0x598/0x820 [<ffffffff81120151>] ? __alloc_pages_nodemask+0x111/0x8b0 [<ffffffff81225544>] ? context_struct_compute_av+0x324/0x420 [<ffffffff8115b922>] ? kmem_cache_alloc+0x182/0x190 [<ffffffff8115b922>] ? kmem_cache_alloc+0x182/0x190 [<ffffffff81154e7a>] ? alloc_pages_current+0xaa/0x110 [<ffffffff8111e95e>] ? __get_free_pages+0xe/0x50 [<ffffffff81213fad>] ? inode_doinit_with_dentry+0x34d/0x690 [<ffffffff8108e5b7>] ? bit_waitqueue+0x17/0xd0 [<ffffffff8121430c>] ? selinux_d_instantiate+0x1c/0x20 [<ffffffff81206f4b>] ? security_d_instantiate+0x1b/0x30 [<ffffffff8118ae75>] ? d_instantiate+0x55/0x70 [<ffffffff811dc26d>] ? 
proc_lookup_de+0xad/0x110 [<ffffffff811dc2eb>] ? proc_lookup+0x1b/0x20 [<ffffffff811d6787>] ? proc_root_lookup+0x27/0x50 [<ffffffff811813bb>] ? do_lookup+0x18b/0x220 [<ffffffff811819b9>] ? __link_path_walk+0x569/0x820 [<ffffffff81060a98>] ? dequeue_entity+0xf8/0x1d0 [<ffffffff8118233a>] ? path_walk+0x6a/0xe0 [<ffffffff8118250b>] ? do_path_lookup+0x5b/0xa0 [<ffffffff81174a41>] ? get_empty_filp+0xa1/0x170 [<ffffffff8118317b>] ? do_filp_open+0xfb/0xd90 [<ffffffff810938bf>] ? hrtimer_try_to_cancel+0x3f/0xd0 [<ffffffff810d0442>] ? audit_alloc_name+0x62/0x100 [<ffffffff8118fe72>] ? alloc_fd+0x92/0x160 [<ffffffff81170599>] ? do_sys_open+0x69/0x140 [<ffffffff811706b0>] ? sys_open+0x20/0x30 [<ffffffff8100b0b2>] ? system_call_fastpath+0x16/0x1b <-SNIP-> -pbunyan
This looks like the same issue as Bug 690301.
*** Bug 722958 has been marked as a duplicate of this bug. ***
FYI, the upstream kernel doesn't have this bug.
Since RHEL 6.2 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as an exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.