Bug 528066 - [Cisco/LSI 4.9 bug] mptctl module dereferences a userspace address, triggering a crash
Summary: [Cisco/LSI 4.9 bug] mptctl module dereferences a userspace address, triggerin...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.8
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 4.9
Assignee: Rob Evers
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 533798
Reported: 2009-10-08 19:47 UTC by Issue Tracker
Modified: 2018-10-20 04:17 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
A bug in the mptctl_do_mpt_command() function in the mpt driver may have resulted in crashes during boot on i386 systems with certain adapters using the mpt driver, and also running the hugemem kernel.
Clone Of:
Environment:
Last Closed: 2011-02-16 15:22:59 UTC
Target Upstream Version:
Embargoed:


Attachments
Proposed fix in mptctl.c for kernel crash (1.11 KB, text/plain)
2009-10-29 08:58 UTC, kashyap
recreated patch for RHEL4.8 kernel (1.08 KB, patch)
2009-10-30 05:56 UTC, kashyap


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0263 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 4.9 kernel security and bug fix update 2011-02-16 15:14:55 UTC

Description Issue Tracker 2009-10-08 19:47:34 UTC
Escalated to Bugzilla from IssueTracker

Comment 2 Issue Tracker 2009-10-08 19:47:37 UTC
Event posted on 10-08-2009 12:02pm EDT by jabrown

crash> sys
      KERNEL: vmlinux
    DUMPFILE: 351790-vmcore
        CPUS: 16
        DATE: Wed Oct  7 17:22:03 2009
      UPTIME: 00:03:44
LOAD AVERAGE: 1.24, 0.41, 0.14
       TASKS: 185
    NODENAME: svlhiav-ventura-1
     RELEASE: 2.6.9-89.0.9.ELhugemem
     VERSION: #1 SMP Wed Aug 19 08:12:26 EDT 2009
     MACHINE: i686  (2926 Mhz)
      MEMORY: 64 GB
       PANIC: "Oops: 0000 [#1]" (check log for details)

crash> bt
PID: 474    TASK: 8b56adf0  CPU: 2   COMMAND: "mpt-status"
 #0 [8c90ebb4] netpoll_start_netdump at f8f1b570
 #1 [8c90ebd4] die at 210633d
 #2 [8c90ec08] do_invalid_op at 2106718
 #3 [8c90ecb8] error_code (via invalid_op) at fffecede
    EAX: 0000002f  EBX: 8c90e000  ECX: 8c90ecf0  EDX: 022e72a3  EBP: 022e6f22
    DS:  007b      ESI: 022dfa48  ES:  007b      EDI: 00000000
    CS:  0060      EIP: 02122cce  ERR: ffffffff  EFLAGS: 00010286 
 #4 [8c90ecf4] panic at 2122cce
 #5 [8c90ecfc] die at 21063bf
 #6 [8c90ed34] do_page_fault at 211bac5
 #7 [8c90ee14] error_code (via page_fault) at fffecede
    EAX: 00000080  EBX: 00000028  ECX: 8d719000  EDX: 8d719000  EBP: feedf604
    DS:  007b      ESI: 8c90ef20  ES:  007b      EDI: 00000000
    CS:  0060      EIP: f88afeb0  ERR: ffffffff  EFLAGS: 00010283 
 #8 [8c90ee50] mptctl_do_mpt_command at f88afeb0
 #9 [8c90eedc] mptctl_mpt_command at f88afd7b
#10 [8c90ef68] mptctl_ioctl at f88ae7bf
#11 [8c90ef94] sys_ioctl at 216cc13
#12 [8c90efc0] system_call at fffec219
    EAX: 00000036  EBX: 00000003  ECX: c0386d14  EDX: feedf5d0 
    DS:  007b      ESI: 08fa02e8  ES:  007b      EDI: feedcdd0
    SS:  007b      ESP: feeda5a8  EBP: feee1de8
    CS:  0073      EIP: 08056074  ERR: 00000036  EFLAGS: 00000282 

crash> ps | grep "> "
>     0      1   1  236b10b0  RU   0.0       0      0  [swapper]
>     0      1   3  236b05b0  RU   0.0       0      0  [swapper]
>     0      1   4  236b0030  RU   0.0       0      0  [swapper]
>     0      1   5  236b3670  RU   0.0       0      0  [swapper]
>     0      1   6  236b30f0  RU   0.0       0      0  [swapper]
>     0      1   7  236b2b70  RU   0.0       0      0  [swapper]
>     0      1   8  236b25f0  RU   0.0       0      0  [swapper]
>     0      1   9  236b2070  RU   0.0       0      0  [swapper]
>     0      1  10  236c56b0  RU   0.0       0      0  [swapper]
>     0      1  11  236c5130  RU   0.0       0      0  [swapper]
>     0      1  12  236c4bb0  RU   0.0       0      0  [swapper]
>     0      1  13  236c4630  RU   0.0       0      0  [swapper]
>     0      1  14  236c40b0  RU   0.0       0      0  [swapper]
>     0      1  15  236c76f0  RU   0.0       0      0  [swapper]
>   474    473   2  8b56adf0  RU   0.0    1708    144  mpt-status
>   722      1   0  8b2ae630  RU   0.0    3232    720  xinetd

crash> ps | grep UN
  31717      1   2  8ba36db0  UN   0.0       0      0  [kjournald]
  32382      1  10  8cb60c70  UN   0.0    2368    728  syslogd




This event sent from IssueTracker by djeffery  [SEG - Storage]
 issue 351790

Comment 4 Issue Tracker 2009-10-08 19:47:41 UTC
Event posted on 10-08-2009 03:41pm EDT by djeffery

The system crash is due to a bug in the mpt driver.  The second parameter of
mptctl_do_mpt_command(), mfPtr, is a pointer to a userspace address.  The
function dereferences this pointer directly:

  if (((MPIHeader_t *)(mfPtr))->MsgContext == 0x02012020) {

which is the instruction the kernel crashed at:

  0xf88afeb0 <mptctl_do_mpt_command+295>: cmpl   $0x2012020,0x8(%ebp)
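
As a rough illustration of how the faulting instruction maps back to the C expression, the sketch below uses a stand-in struct whose layout is an assumption (modeled on the MPI request header, not copied from the driver source). If MsgContext sits at byte offset 8, the operand 0x8(%ebp) is the MsgContext field of whatever %ebp points at, and %ebp held feedf604 in the backtrace, a value in the same range as the user-mode ESP (feeda5a8). The names mpi_header_sketch and faulting_address are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the MPI request header; the field layout is an assumption. */
struct mpi_header_sketch {
	uint8_t  reserved[2];   /* function specific */
	uint8_t  chain_offset;
	uint8_t  function;
	uint8_t  reserved1[3];  /* function specific */
	uint8_t  msg_flags;
	uint32_t msg_context;   /* field compared against 0x02012020 */
};

/* Under the assumed layout, the context field lands at offset 8. */
static_assert(offsetof(struct mpi_header_sketch, msg_context) == 8,
	      "msg_context expected at byte offset 8");

/* Address read by "cmpl $0x2012020,0x8(%ebp)" for a given %ebp (mfPtr). */
static uint32_t faulting_address(uint32_t mf_ptr)
{
	return mf_ptr + (uint32_t)offsetof(struct mpi_header_sketch, msg_context);
}
```

Under that assumed layout, faulting_address(0xfeedf604) is 0xfeedf60c, which is a userspace address and not something the kernel should read without going through the user-copy helpers.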

While always unsafe, the driver can usually get away with this addressing
violation because the race window for the page to be unmapped is small on most
kernels.  But this is a hugemem kernel.  Direct userspace address
accesses from the kernel don't work on hugemem kernels.  So this bug may
trigger only rarely on non-hugemem kernels while failing horribly
on systems like this one that use a hugemem kernel.
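
The general remedy for this class of bug is to copy the data into kernel memory first and test the copy. The following is a minimal userspace simulation of that pattern, not the driver patch: mock_copy_from_user() only imitates the contract of the kernel's copy_from_user() (it returns the number of bytes it could not copy), and hdr_sketch, map_user_buffer() and check_msg_context() are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAGIC_CONTEXT 0x02012020u  /* value checked in the report above */

/* Hypothetical stand-in for the request header (layout is an assumption). */
struct hdr_sketch {
	uint8_t  pad[8];
	uint32_t msg_context;
};

/* Simulated user address space: one mapped buffer, or none when len is 0. */
static const unsigned char *user_base;
static size_t user_len;

static void map_user_buffer(const void *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	user_base = buf;
	user_len = len;
}

/*
 * Mock with the same contract as copy_from_user(): returns the number of
 * bytes that could NOT be copied, so 0 means success.
 */
static unsigned long mock_copy_from_user(void *to, const void *from,
					 unsigned long n)
{
	uintptr_t start = (uintptr_t)from;
	uintptr_t base = (uintptr_t)user_base;

	if (start < base || n > user_len || start - base > user_len - n)
		return n;	/* simulated fault: nothing copied */
	memcpy(to, from, n);
	return 0;
}

/* Returns 1 for the magic context, 0 otherwise, -EFAULT if the copy faults. */
static int check_msg_context(const void *user_mf)
{
	struct hdr_sketch hdr;

	if (mock_copy_from_user(&hdr, user_mf, sizeof(hdr)))
		return -EFAULT;
	/* Only the kernel-side copy is inspected, never user_mf itself. */
	return hdr.msg_context == MAGIC_CONTEXT;
}
```

With this shape, an unmapped or invalid user pointer produces an error return instead of a page fault inside the driver.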


This event sent from IssueTracker by djeffery  [SEG - Storage]
 issue 351790

Comment 14 Rob Evers 2009-10-28 14:39:08 UTC
Sathya,

Can you take a look at comment 4 and provide a patch similar to upstream:

mptctl_do_mpt_command()

	if (copy_from_user(mf, mfPtr, karg.dataSgeOffset * 4)) {
		printk(MYIOC_s_ERR_FMT "%s@%d::mptctl_do_mpt_command - "
			"Unable to read MF from mpt_ioctl_command struct @ %p\n",
			ioc->name, __FILE__, __LINE__, mfPtr);
		function = -1;
		rc = -EFAULT;
		goto done_free_mem;
	}
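
For readers unfamiliar with this error path, here is a small userspace sketch of the same control flow. It assumes, based on the multiplication by 4 above, that dataSgeOffset is counted in 32-bit words. copy_stub(), do_command_sketch() and frames_live are hypothetical stand-ins, and malloc()/free() take the place of the driver's frame allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: the offset counts 32-bit words, so bytes = words * 4. */
static size_t frame_bytes(unsigned int data_sge_offset)
{
	return (size_t)data_sge_offset * 4;
}

/* Hypothetical stub: behaves like a faulting copy_from_user() when from is NULL. */
static unsigned long copy_stub(void *to, const void *from, size_t n)
{
	if (from == NULL)
		return n;	/* bytes not copied */
	memcpy(to, from, n);
	return 0;
}

static int frames_live;	/* frames allocated but not yet freed */

static int do_command_sketch(const void *user_mf, unsigned int data_sge_offset)
{
	size_t len = frame_bytes(data_sge_offset);
	void *mf;
	int rc = 0;

	mf = malloc(len ? len : 1);
	if (mf == NULL)
		return -ENOMEM;
	frames_live++;

	if (copy_stub(mf, user_mf, len)) {
		rc = -EFAULT;
		goto done_free_mem;
	}
	/* ... the request would be built and issued here ... */

done_free_mem:
	/* Single exit point: the frame is released on success and on failure. */
	assert(frames_live > 0);
	free(mf);
	frames_live--;
	return rc;
}
```

The point of the goto label is that a failed copy still releases the frame, so the error return does not leak it.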

Rob

Comment 15 Andrius Benokraitis 2009-10-28 16:03:34 UTC
I'm adding Cisco Engineering to this bugzilla.

Question for Cisco Eng.: Can a B200 be used to test this or must this be tested on a B250 (due to the high memory)? If a B250 is required, Cisco and or Cisco/IT must assist in testing and verifying the fix.

Thanks!

Comment 16 Andrius Benokraitis 2009-10-28 16:16:26 UTC
BTW: I'm noticing this was found on an i386 arch - something that was not included in the current 4.8 hardware certification. Gary, can you confirm?

Comment 17 Gary Case 2009-10-28 16:37:07 UTC
That's correct. This system was never certified on 32-bit RHEL. 

-Gary

Comment 21 Rob Evers 2009-10-28 17:43:45 UTC
> While always unsafe, the driver can usually get away with this addressing
> violation because the race window for the page to be unmapped is small on most
> kernels.  But this is a hugemem kernel.  Direct userspace address
> accesses from the kernel don't work on hugemem kernels.  So this bug may
> trigger only rarely on non-hugemem kernels while failing horribly
> on systems like this one that use a hugemem kernel.

Since this bug can also occur on non-hugemem kernels, it still needs to be fixed, regardless of the Cisco certification status or the kernel on which it was observed.

Rob

Comment 22 Sathya Prakash 2009-10-29 05:11:54 UTC
I am adding Kashyap, who currently handles this driver, for further analysis and action.

Comment 23 kashyap 2009-10-29 08:58:34 UTC
Created attachment 366579 [details]
Proposed fix in mptctl.c for kernel crash

Rob,

I have attached a proposed patch for this issue.
The kernel crash at
"if (((MPIHeader_t *)(mfPtr))->MsgContext == 0x02012020)" is valid. This code is not present upstream. We cannot remove it because some old userspace applications require this condition check.

Treating the above line as required for RHEL4.8, I have provided a patch that uses a better way of accessing userspace memory from kernel address space.

Thanks,
Kashyap

Comment 25 Rob Evers 2009-10-29 14:19:23 UTC
Kashyap,

I had a few problems with this patch:

- The patch needs to be generated from rhel4.8.  There is context in the patch
  that differs from rhel4.8.
- The patch needs to apply cleanly to rhel4.8 using the -p1 option.

- The 4.8 code has the following snippet that appears to conflict with the one in this patch:

	/* Copy the request frame
	 * Reset the saved message context.
	 */
	if (copy_from_user(mf, mfPtr, karg.dataSgeOffset * 4)) {
		printk(KERN_ERR "%s@%d: mptctl_do_mpt_command - "
			"Unable to read MF from mpt_ioctl_command struct @ %p\n",
			__FILE__, __LINE__, mfPtr);
		rc = -EFAULT;
		goto done_free_mem;
	}

Please resolve these issues and re-attach the patch.

Rob

Comment 26 kashyap 2009-10-30 05:56:15 UTC
Created attachment 366762 [details]
recreated patch for RHEL4.8 kernel

Recreated patch for RHEL4.8 kernel. Please try this new patch with -p1 option.

Thanks,
Kashyap

Comment 31 Rob Evers 2009-11-04 15:09:00 UTC
Hi Kashyap,

Can you confirm whether this problem exists in rhel5?  It looks like the driver has diverged a bit in rhel5 from rhel4.

Thanks, Rob

Comment 33 kashyap 2009-11-06 05:41:57 UTC
This problem does not exist in RHEL5.
- Kashyap

Comment 36 Vivek Goyal 2009-11-16 16:02:03 UTC
Committed in 89.16.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 38 Chris Ward 2009-12-04 13:10:14 UTC
The latest EUS .z stream bits can be found here:

http://people.redhat.com/~cward/4.8.z/kernel/

Please report back on the testing status of this bug as soon as possible.

The target END TESTING date for this 4.8.z kernel is
approximately December 15th, 2009 (2009-12-15)

When reporting your results, make sure to indicate
which version of the kernel build you tested.

Thank you for your expedited response!

Comment 44 Douglas Silas 2011-01-30 22:43:28 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
A bug in the mptctl_do_mpt_command() function in the mpt driver may have resulted in crashes during boot on i386 systems with certain adapters using the mpt driver, and also running the hugemem kernel.

Comment 45 errata-xmlrpc 2011-02-16 15:22:59 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html

