SGI reported a crash on any ia64 system with perfmon (not perfmon2):
http://marc.theaimsgroup.com/?l=linux-ia64&m=113882384921688

I tested this again on RHEL4 U4 (kernel version 2.6.9-40) with libpfm-3.0-3_EL.ia64.rpm and libpfm-devel-3.0-3_EL.ia64.rpm installed on my Tiger4. I compiled the reproducer test program and ran it. At first I was unable to reproduce the system crash, but after 10 to 15 attempts I finally reproduced it. Here is the stack backtrace:

csdor-tiger1.jf.intel.com login: kernel BUG at mm/rmap.c:479!
perfmonstat[5662]: bugcheck! 0 [1]
Modules linked in: md5(U) ipv6(U) parport_pc(U) lp(U) parport(U) autofs4(U) sunrpc(U) ds(U) yenta_socket(U) pcmcia_core(U) vfat(U) fat(U) dm_mirror(U) dm_mod(U) button(U) joydev(U) uhci_hcd(U) ehci_hcd(U) e1000(U) sg(U) ext3(U) jbd(U) mptscsih(U) mptsas(U) mptspi(U) mptfc(U) mptscsi(U) mptbase(U) sd_mod(U) scsi_mod(U)

Pid: 5662, CPU 3, comm: perfmonstat
psr : 0000101008126010 ifs : 8000000000000207 ip : [<a000000100108b50>] Not tainted
ip is at page_remove_rmap+0x170/0x180
unat: 0000000000000000 pfs : 0000000000000207 rsc : 0000000000000003
rnat: 00000000000002fd bsps: a0000001007d6fb0 pr : 19a981556655a969
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a000000100108b50 b6 : a000000100015f80 b7 : a00000010025e6a0
f6 : 1003e0000000000001200 f7 : 1003e8080808080808081
f8 : 1003e00000000000023dc f9 : 1003e000000000e580000
f10 : 1003e00000000356f424c f11 : 1003e44b831eee7285baf
r1 : a0000001009bb700 r2 : 00000000000c4180 r3 : 00000000000c4180
r8 : 000000000000001d r9 : 00000000000000fd r10 : a0000001007cdd18
r11 : 000000000000c418 r12 : e00000000c627b70 r13 : e00000000c620000
r14 : 0000000000004000 r15 : a00000010074fbc0 r16 : a00000010074fbc8
r17 : e00000003e35fde8 r18 : a0000001009e2c40 r19 : a0000001009e2c40
r20 : 0000000000000004 r21 : 0000000000000000 r22 : 0000000000000000
r23 : 0000000000000000 r24 : 0000000000000000 r25 : 0000000000000004
r26 : e000000008b00dd0 r27 : 0000000000000003 r28 : e00000000c620dd4
r29 : e000000008b00dd4 r30 : e00000003e358050 r31 : 00000000356f424c

Call Trace:
 [<a000000100016da0>] show_stack+0x80/0xa0 sp=e00000000c6276e0 bsp=e00000000c6213b0
 [<a0000001000176b0>] show_regs+0x890/0x8c0 sp=e00000000c6278b0 bsp=e00000000c621368
 [<a00000010003e8f0>] die+0x150/0x240 sp=e00000000c6278d0 bsp=e00000000c621328
 [<a00000010003ea20>] die_if_kernel+0x40/0x60 sp=e00000000c6278d0 bsp=e00000000c6212f8
 [<a00000010003ebc0>] ia64_bad_break+0x180/0x600 sp=e00000000c6278d0 bsp=e00000000c6212d0
 [<a00000010000f600>] ia64_leave_kernel+0x0/0x260 sp=e00000000c6279a0 bsp=e00000000c6212d0
 [<a000000100108b50>] page_remove_rmap+0x170/0x180 sp=e00000000c627b70 bsp=e00000000c621298
 [<a0000001000f7f50>] unmap_vmas+0xa70/0x1280 sp=e00000000c627b70 bsp=e00000000c621110
 [<a000000100103f60>] exit_mmap+0x120/0x4c0 sp=e00000000c627c20 bsp=e00000000c6210b8
 [<a000000100072d60>] mmput+0x100/0x1a0 sp=e00000000c627ce0 bsp=e00000000c621098
 [<a0000001001ac6f0>] do_task_stat+0x9f0/0xb40 sp=e00000000c627ce0 bsp=e00000000c620f78
 [<a0000001001ac8d0>] proc_tgid_stat+0x30/0x60 sp=e00000000c627e20 bsp=e00000000c620f50
 [<a0000001001a4a50>] proc_info_read+0xb0/0x160 sp=e00000000c627e20 bsp=e00000000c620f08
 [<a000000100123850>] vfs_read+0x290/0x360 sp=e00000000c627e20 bsp=e00000000c620eb8
 [<a000000100123e90>] sys_read+0x70/0xe0 sp=e00000000c627e20 bsp=e00000000c620e40
 [<a00000010000f4a0>] ia64_ret_from_syscall+0x0/0x20 sp=e00000000c627e30 bsp=e00000000c620e40
 [<a000000000010640>] 0xa000000000010640 sp=e00000000c628000 bsp=e00000000c620e40
Kernel panic - not syncing: Fatal exception

After extensive testing, I narrowed it down to the following patches, which could be what fixes the problem in 2.6.13-rc1:

2005-06-25 Nick Piggin [PATCH] sched: no aggressive idle balancing
2005-06-25 Nick Piggin [PATCH] sched: tweak affine wakeup
2005-06-25 Nick Piggin [PATCH] sched: balance timers

More testing is still needed to verify that it is really fixed in 2.6.13-rc1. However, none of these patches can be applied to the rhel4 kernel. So the problem becomes complex: either no upstream patch can be used to fix it, or upstream just happens to work. More debugging and analysis are still needed to find the root cause and a real fix for the rhel4 kernel.

Actually, the testing results above are misleading. The scheduler changes probably just make the bug very hard to reproduce. The following patch fixed the problem:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=41d5e5d73ecef4ef56b7b4cde962929a712689b4
I did notice in an email regarding this BZ that rhel-4.5.z+ was set, but I do not see any "rhel-4.5.z" flag box below. Shouldn't the owner of the bugzilla see that box?

Anyway, after looking at the patch referenced by the last line of the problem description above, I recalled seeing Luming Yu's RHKL post of the same patch go by recently. It would have been helpful if there had been a reference to the RHEL4.6 bugzilla that this was generated from.

So, just for clarification, please confirm that this is a request to backport the following update to RHEL4.5:

[RHEL 4.6 PATCH] BZ 185082 CVE-2006-0558 ia64 crash
http://post-office.corp.redhat.com/archives/rhkernel-list/2007-July/msg00568.html

Bugzilla Bug 185082: CVE-2006-0558 ia64 crash
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=185082

Thanks,
Dave
Sorry -- I thought that I was the assignee...
A patch addressing this issue has been included in build 2.6.9-55.0.3.EL.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2007-0774.html
for historical record, comment #1 points to a patch on an internal mailing list:

--- linux-2.6.9/arch/ia64/kernel/perfmon.c.0 2007-07-18 10:44:51.000000000 +0800
+++ linux-2.6.9/arch/ia64/kernel/perfmon.c 2007-07-18 10:48:30.000000000 +0800
@@ -2267,7 +2267,7 @@
  * allocate a sampling buffer and remaps it into the user address space of the task
  */
 static int
-pfm_smpl_buffer_alloc(struct task_struct *task, pfm_context_t *ctx, unsigned long rsize, void **user_vaddr)
+pfm_smpl_buffer_alloc(struct task_struct *task, struct file *filp, pfm_context_t *ctx, unsigned long rsize, void **user_vaddr)
 {
 	struct mm_struct *mm = task->mm;
 	struct vm_area_struct *vma = NULL;
@@ -2317,6 +2317,7 @@
 	 * partially initialize the vma for the sampling buffer
 	 */
 	vma->vm_mm = mm;
+	vma->vm_file = filp;
 	vma->vm_flags = VM_READ| VM_MAYREAD |VM_RESERVED;
 	vma->vm_page_prot = PAGE_READONLY; /* XXX may need to change */

@@ -2354,6 +2355,8 @@
 		goto error;
 	}

+	get_file(filp);
+
 	/*
 	 * now insert the vma in the vm list for the process, must be
 	 * done with mmap lock held
@@ -2430,7 +2433,7 @@
 }

 static int
-pfm_setup_buffer_fmt(struct task_struct *task, pfm_context_t *ctx, unsigned int ctx_flags,
+pfm_setup_buffer_fmt(struct task_struct *task, struct file *filp, pfm_context_t *ctx, unsigned int ctx_flags,
 		     unsigned int cpu, pfarg_context_t *arg)
 {
 	pfm_buffer_fmt_t *fmt = NULL;
@@ -2471,7 +2474,7 @@
 	/*
 	 * buffer is always remapped into the caller's address space
 	 */
-	ret = pfm_smpl_buffer_alloc(current, ctx, size, &uaddr);
+	ret = pfm_smpl_buffer_alloc(current, filp, ctx, size, &uaddr);
 	if (ret) goto error;

 	/* keep track of user address of buffer */
@@ -2682,7 +2685,7 @@
 	 * does the user want to sample?
 	 */
 	if (pfm_uuid_cmp(req->ctx_smpl_buf_id, pfm_null_uuid)) {
-		ret = pfm_setup_buffer_fmt(current, ctx, ctx_flags, 0, req);
+		ret = pfm_setup_buffer_fmt(current, filp, ctx, ctx_flags, 0, req);
 		if (ret) goto buffer_error;
 	}

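
The two functional additions in the patch above are the vma->vm_file assignment and the get_file() call. This report does not state the root cause, so the following is only an interpretation of the diff: it assumes the point is to make the sampling-buffer mapping hold its own reference on the perfmon file, so that closing the descriptor cannot release the buffer while it is still mapped. The sketch below is a toy user-space model of that reference-counting pattern, not kernel code. Every name in it (toy_file, toy_vma, toy_get_file, toy_fput, buffer_survives_close) is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a struct file that owns a sampling buffer. */
struct toy_file {
    int refcount;      /* number of holders keeping the file alive */
    bool buffer_live;  /* becomes false once the buffer is released */
};

/* Hypothetical stand-in for a vma; vm_file stays NULL in the "unpatched" case. */
struct toy_vma {
    struct toy_file *vm_file;
};

static void toy_get_file(struct toy_file *f)
{
    f->refcount++;
}

static void toy_fput(struct toy_file *f)
{
    assert(f->refcount > 0);
    if (--f->refcount == 0)
        f->buffer_live = false;  /* last reference dropped: buffer goes away */
}

/* Map the buffer. With take_ref set, record the file in the vma and take a
 * reference, mirroring the vm_file assignment and get_file() in the patch. */
static void toy_map_buffer(struct toy_vma *vma, struct toy_file *f, bool take_ref)
{
    vma->vm_file = NULL;
    if (take_ref) {
        vma->vm_file = f;
        toy_get_file(f);
    }
}

/* Tear down the mapping and drop the reference it held, if any. */
static void toy_unmap_buffer(struct toy_vma *vma)
{
    if (vma->vm_file != NULL) {
        toy_fput(vma->vm_file);
        vma->vm_file = NULL;
    }
}

/* Returns true if the buffer is still alive after the descriptor is closed
 * while the mapping still exists. */
bool buffer_survives_close(bool take_ref)
{
    struct toy_file f = { .refcount = 1, .buffer_live = true }; /* open fd */
    struct toy_vma vma;
    bool alive;

    toy_map_buffer(&vma, &f, take_ref);
    toy_fput(&f);            /* the user closes the descriptor */
    alive = f.buffer_live;   /* state seen by the still-present mapping */
    toy_unmap_buffer(&vma);
    return alive;
}
```

In this model, buffer_survives_close(false) corresponds to a mapping that holds no reference: the buffer is gone as soon as the descriptor is closed, even though the mapping is still there. buffer_survives_close(true) corresponds to the patched behavior under the assumption above, where the buffer lives until the mapping itself is torn down.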