564249 – [LSI 5.6 feat] update megaraid_sas to version 4.31

Summary: [LSI 5.6 feat] update megaraid_sas to version 4.31

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.6
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	high
Target Milestone:	rc
Target Release:	5.6
Assignee:	Tomas Henzl
QA Contact:	Storage QE
Docs Contact:
URL:
Whiteboard:
Duplicates (5):	499876 563083 563370 568570 602714
Depends On:
Blocks:	531800 536863 547220 554476 557597 566321 566322 566323 619362 619363 619365 649132 655106

Reported:	2010-02-12 04:58 UTC by bo yang
Modified:	2018-12-01 15:17 UTC
CC List:	22 users
Fixed In Version:
Doc Type:	Enhancement
Doc Text:	A bug was found in the way the megaraid_sas driver (for SAS based RAID controllers) handled physical disks and management IOCTLs (Input/Output Control). All physical disks were exported to the disk layer, allowing an oops in megasas_complete_cmd_dpc() when completing the IOCTL command if a timeout occurred. One possible trigger for this bug was running mkfs. This update resolves this issue by updating the megaraid_sas driver to version 4.31.
Clone Of:
Environment:
Last Closed:	2011-01-13 21:05:11 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
Add the megaraid sas 4.27 driver patch to rhel5.6 (94.27 KB, patch) 2010-03-09 15:33 UTC, bo yang	no flags
attached the 4.31 patch (based on 4.27 driver) (10.71 KB, patch) 2010-06-15 15:24 UTC, bo yang	no flags
patch proposed by Fujitsu (11.57 KB, patch) 2010-06-23 08:38 UTC, Moritoshi Oshiro	no flags
diff file generated from comment-22 (1.14 KB, patch) 2010-07-15 17:12 UTC, Rob Evers	no flags
recreate the patch for rhel5.6 (from 4.17 to 4.31) (101.48 KB, patch) 2010-07-21 09:49 UTC, bo yang	no flags
change the max_sectors sysfs from bin attr to scsi host attr (2.65 KB, patch) 2010-07-28 20:02 UTC, bo yang	no flags
changed the max_sectors again (10.27 KB, patch) 2010-07-30 15:31 UTC, Tomas Henzl	no flags
take off the lock from complete_cmd_dpc routine (709 bytes, patch) 2010-07-30 21:01 UTC, bo yang	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2011:0017	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update	2011-01-13 10:37:42 UTC

Description bo yang 2010-02-12 04:58:44 UTC

1. Add the CTIO support to the driver
2. Add the Online controller reset support to the driver
3. The driver will automatically report device additions/deletions to the system without the megaraid application installed.

Comment 1 Andrius Benokraitis 2010-03-02 20:29:55 UTC

*** Bug 563370 has been marked as a duplicate of this bug. ***

Comment 3 bo yang 2010-03-09 15:33:16 UTC

Created attachment 398826 [details]
Add the megaraid sas 4.27 driver patch to rhel5.6

This is the patch to upgrade megaraid sas driver to version 4.27

Comment 9 Andrius Benokraitis 2010-06-01 20:18:02 UTC

Bo - you planning on submitting 4.27 or 4.28 as an update soon?

Comment 10 bo yang 2010-06-02 21:02:40 UTC

Andrius,

Do I need new IDs to submit the new driver?  I have 4.31.

Thanks,

Bo Yang

Comment 11 Andrius Benokraitis 2010-06-02 21:07:59 UTC

Tomas, have you started work on the 4.26 driver previously attached? 

Bo, the answer to your question depends on if Tomas has already submitted 4.26 driver internally...

Comment 12 Tomas Henzl 2010-06-03 08:53:16 UTC

(In reply to comment #10)

Post the additional set here, so that as a result we have one patch for 4.27 and another for 4.31. (We prefer smaller patches, not one big patch.)
Please also post the new set upstream; it is better to do it now so we do not end up with various problems like last time.

Thanks,
Tomas

Comment 13 bo yang 2010-06-03 15:03:05 UTC

Tomas,

Is 4.27 already in RHEL5.6?  If yes, I only need to post the diff between 4.27 and 4.31.

Regards,

Bo Yang

Comment 14 Tomas Henzl 2010-06-03 15:16:41 UTC

(In reply to comment #13)
> Does 4.27 already in RHEL5.6?  If yes, I only need to post the diff between
> 4.27 and 4.31.

It is not. Use the patch you posted here as a base for your new incremental patch.
And yes post only the diff between 4.27 and 4.31.

Comment 15 Marizol Martinez 2010-06-04 14:50:16 UTC

*** Bug 568570 has been marked as a duplicate of this bug. ***

Comment 17 bo yang 2010-06-15 15:24:03 UTC

Created attachment 424193 [details]
attached the 4.31 patch (based on 4.27 driver)

1.      Retry the online controller reset (OCR) up to three times if the driver detects the FW in a failure state.
        Also run an OCR before the driver kills the adapter, as a last chance
        to bring up the FW.

2.      Add a max_sectors input parameter, up to 1MB.  Customers can raise the max_sectors size up to
        1MB.

3.      Fix a system hang during the reboot test, where the FW raised an interrupt for a FW state change.

4.      Fix the FW reporting the enclosure as a system PD.
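
A minimal userspace sketch of the retry policy described in item 1, not the driver's actual code. It assumes the reset is attempted up to three times and the adapter is only declared dead after the last attempt fails. The names recover_adapter, ocr_fn, fw_recovers_on_nth and the adapter_state enum are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

#define MAX_OCR_ATTEMPTS 3   /* "three times", per item 1 above */

/* Hypothetical adapter states, loosely modeled on the adprecovery values. */
enum adapter_state { ADAPTER_OPERATIONAL, ADAPTER_CRITICAL_ERROR };

/* Hypothetical callback: performs one online controller reset and
 * returns true if the firmware came back up. */
typedef bool (*ocr_fn)(void *ctx);

/* Try OCR up to MAX_OCR_ATTEMPTS times. Only when every attempt fails is
 * the adapter declared dead. *attempts reports how many resets were run. */
static enum adapter_state recover_adapter(ocr_fn do_ocr, void *ctx, int *attempts)
{
    for (*attempts = 1; *attempts <= MAX_OCR_ATTEMPTS; (*attempts)++) {
        if (do_ocr(ctx))
            return ADAPTER_OPERATIONAL;
    }
    *attempts = MAX_OCR_ATTEMPTS;
    return ADAPTER_CRITICAL_ERROR;
}

/* Illustration helper: the firmware recovers on the Nth reset,
 * where N is the integer that ctx points at. */
static bool fw_recovers_on_nth(void *ctx)
{
    int *remaining = ctx;
    return --(*remaining) == 0;
}
```

With a firmware that recovers on the second reset, recover_adapter stops after two attempts. With one that never recovers within three resets, it reports ADAPTER_CRITICAL_ERROR, which corresponds to the point where the real driver would kill the adapter.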

Comment 18 Tomas Henzl 2010-06-16 09:30:57 UTC

(In reply to comment #17)
> Created an attachment (id=424193) [details]
> attached the 4.31 patch (based on 4.27 driver)

There are some compile issues, please correct that,
thanks, tomas

# make -j16 drivers/scsi/megaraid/
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CC [M]  drivers/scsi/megaraid/megaraid_sas.o
drivers/scsi/megaraid/megaraid_sas.c: In function ‘sysfs_max_sectors_read’:
drivers/scsi/megaraid/megaraid_sas.c:3766: warning: initialization from incompatible pointer type
drivers/scsi/megaraid/megaraid_sas.c: At top level:
drivers/scsi/megaraid/megaraid_sas.c:3783: warning: initialization from incompatible pointer type
drivers/scsi/megaraid/megaraid_sas.c: In function ‘megasas_io_attach’:
drivers/scsi/megaraid/megaraid_sas.c:3850: error: ‘struct Scsi_Host’ has no member named ‘shost_dev’
make[1]: *** [drivers/scsi/megaraid/megaraid_sas.o] Error 1
make: *** [drivers/scsi/megaraid/] Error 2

Comment 19 Tomas Henzl 2010-06-22 11:50:36 UTC

Bo,
do you need some help with this?

Comment 20 Moritoshi Oshiro 2010-06-23 08:38:45 UTC

Created attachment 426198 [details]
patch proposed by Fujitsu

Comment from Fujitsu: 
I think something like this patch should work.

 - Fixes the compile error
 - Changes to use class_device_attribute instead of bin_attribute, so sysfs entry "max_sectors" will appear in /sys/class/scsi_host/host#/ (But, I'm not sure if it's the place LSI wants to have it.)
 - Adds class_device_remove_file() in megasas_detach_one()

Bo, Tomas,

Could you take a look at it, please?

Kei Tokunaga

Comment 21 bo yang 2010-06-23 11:31:45 UTC

I will get back to you today.

Comment 22 bo yang 2010-06-24 01:22:22 UTC

Tomas,

Please apply the following patch to fix this issue (I am also attaching the patch):

diff -rupN old/drivers/scsi/megaraid/megaraid_sas.c new/drivers/scsi/megaraid/megaraid_sas.c
--- old/drivers/scsi/megaraid/megaraid_sas.c	2010-06-22 13:25:52.000000000 -0400
+++ new/drivers/scsi/megaraid/megaraid_sas.c	2010-06-22 13:53:11.000000000 -0400
@@ -3758,17 +3758,17 @@ static int megasas_start_aen(struct mega
 }
 
 static ssize_t
-sysfs_max_sectors_read(struct kobject *kobj, struct bin_attribute *bin_attr,
-			char *buf, loff_t off, size_t count)
+sysfs_max_sectors_read(struct kobject *kobj, char *buf,
+		loff_t off, size_t count)
 {
-	struct device *dev = container_of(kobj, struct device, kobj);
-
-	struct Scsi_Host *host = class_to_shost(dev);
-
+        
+	struct Scsi_Host *host = class_to_shost(container_of(kobj,
+				 struct class_device, kobj));
+        
 	struct megasas_instance *instance =
 				(struct megasas_instance *)host->hostdata;
 
-	count = sprintf(buf, "%u\n", instance->max_sectors_per_req);
+	count = sprintf(buf,"%u\n", instance->max_sectors_per_req);
 
 	return count+1;
 }
@@ -3847,8 +3847,9 @@ static int megasas_io_attach(struct mega
  	/*
 	 * Create sysfs entries for module paramaters
 	 */
-	error = sysfs_create_bin_file(&instance->host->shost_dev.kobj,
-			&sysfs_max_sectors_attr);
+ 	error = sysfs_create_bin_file(&instance->host->shost_classdev.kobj,
+		&sysfs_max_sectors_attr);
+
 
 	if (error) {
 		printk(KERN_INFO "megasas: Error in creating the sysfs entry"

Comment 23 Issue Tracker 2010-06-25 08:01:16 UTC

Event posted on 06-25-2010 05:01pm JST by moshiro

Hi, 

Forwarding Fujitsu's comment:
============================================================
> Tomas,
>
> Please apply the following patch to fix this issue (I am also attach the
> patch):
>
> diff -rupN old/drivers/scsi/megaraid/megaraid_sas.c new/drivers/scsi/megaraid/megaraid_sas.c
> --- old/drivers/scsi/megaraid/megaraid_sas.c 2010-06-22 13:25:52.000000000 -0400
> +++ new/drivers/scsi/megaraid/megaraid_sas.c 2010-06-22 13:53:11.000000000 -0400
> @@ -3758,17 +3758,17 @@ static int megasas_start_aen(struct mega
>  }
>
>  static ssize_t
> -sysfs_max_sectors_read(struct kobject *kobj, struct bin_attribute *bin_attr,
> -   char *buf, loff_t off, size_t count)
> +sysfs_max_sectors_read(struct kobject *kobj, char *buf,
> +  loff_t off, size_t count)
>  {
> - struct device *dev = container_of(kobj, struct device, kobj);
> -
> - struct Scsi_Host *host = class_to_shost(dev);
> -
> +        
> + struct Scsi_Host *host = class_to_shost(container_of(kobj,
> +     struct class_device, kobj));
> +        
>   struct megasas_instance *instance =
>      (struct megasas_instance *)host->hostdata;
>
> - count = sprintf(buf, "%u\n", instance->max_sectors_per_req);
> + count = sprintf(buf,"%u\n", instance->max_sectors_per_req);
>
>   return count+1;
>  }
> @@ -3847,8 +3847,9 @@ static int megasas_io_attach(struct mega
>    /*
>    * Create sysfs entries for module paramaters
>    */
> - error = sysfs_create_bin_file(&instance->host->shost_dev.kobj,
> -   &sysfs_max_sectors_attr);
> +  error = sysfs_create_bin_file(&instance->host->shost_classdev.kobj,
> +  &sysfs_max_sectors_attr);
> +
>
>   if (error) {
>    printk(KERN_INFO "megasas: Error in creating the sysfs entry"   

With this patch, it compiles just fine.

One minor thing is that "max_sectors" sysfs file shows its value twice.

# cat /sys/class/scsi_host/host0/max_sectors
640
640

Kei Tokunaga
============================================================


This event sent from IssueTracker by moshiro 
 issue 962043

Comment 24 Tomas Henzl 2010-06-25 08:55:42 UTC

(In reply to comment #23)
> 
> One minor thing is that "max_sectors" sysfs file shows its value twice.
> 
> # cat /sys/class/scsi_host/host0/max_sectors
> 640
> 640
 
This is new with this patch?
I've such a feeling I've already talked with Bo about this in the past.

Comment 25 Tomas Henzl 2010-06-28 14:39:38 UTC

(In reply to comment #22)
Bo,
I'm just working on the test kernel. Let's assume for now that it will work properly. In the meantime, could you please refresh bz#602714 and post a patch there as well?
Thanks, Tomas

Comment 26 Moritoshi Oshiro 2010-06-29 06:23:17 UTC

Hi Tomas, 

(In reply to comment #24)
> (In reply to comment #23)
> > 
> > One minor thing is that "max_sectors" sysfs file shows its value twice.
> > 
> > # cat /sys/class/scsi_host/host0/max_sectors
> > 640
> > 640
> 
> This is new with this patch?
> I've such a feeling I've already talked with Bo about this in the past.    


From FJ:

=============================================================
>> One minor thing is that "max_sectors" sysfs file shows its value twice.
>>
>> # cat /sys/class/scsi_host/host0/max_sectors
>> 640
>> 640
>
> This is new with this patch?

It's not new with the patch posted on Comment 22, but with the one posted on Comment 17 as the max_sectors file was introduced by it.

> I've such a feeling I've already talked with Bo about this in the past.

Ah, OK.  Thanks for the information.
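
The thread never states why the value is printed twice, so the following is only a plausible explanation, sketched in userspace C. It assumes the caller keeps invoking the read handler until the offset passes the declared attribute size (7 in the patch), and that the handler from comment 22 ignores the offset and returns one byte more than sprintf() wrote. The names read_ignoring_offset, read_honoring_offset and drain are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define ATTR_SIZE 7   /* .size of the bin_attribute in the patch */

typedef long (*read_fn)(char *buf, long off, size_t count, unsigned value);

/* Shape of the handler in the patch: formats the value on every call,
 * ignores off and count, and returns one more than sprintf() wrote. */
static long read_ignoring_offset(char *buf, long off, size_t count, unsigned value)
{
    (void)off;
    (void)count;
    return sprintf(buf, "%u\n", value) + 1;
}

/* Offset-aware variant: returns only the bytes at and after off. */
static long read_honoring_offset(char *buf, long off, size_t count, unsigned value)
{
    char tmp[16];
    long len = sprintf(tmp, "%u\n", value);

    if (off >= len)
        return 0;
    if ((long)count > len - off)
        count = (size_t)(len - off);
    memcpy(buf, tmp + off, count);
    return (long)count;
}

/* Assumed model of the caller: keep reading until the handler returns 0
 * or the offset passes ATTR_SIZE, clipping count to the declared size. */
static long drain(read_fn rd, unsigned value, char *out)
{
    long off = 0, total = 0;
    char page[64];

    while (off <= ATTR_SIZE) {
        size_t count = sizeof(page);
        long n;

        if (off + (long)count > ATTR_SIZE)
            count = (size_t)(ATTR_SIZE - off);
        n = rd(page, off, count, value);
        if (n <= 0)
            break;
        memcpy(out + total, page, (size_t)n);
        total += n;
        off += n;
    }
    return total;
}
```

Under these assumptions, draining the offset-ignoring handler yields the formatted value twice, while the offset-aware handler yields it once. A text attribute with a plain show callback avoids the offset handling altogether.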

Comment 27 Rob Evers 2010-07-15 17:12:02 UTC

Created attachment 432149 [details]
diff file generated from comment-22

diff file generated from comment-22 in this bz enabling update to 4.31 to compile

Comment 28 Rob Evers 2010-07-15 17:14:48 UTC

Request Fujitsu commit to testing this update in rhel5.6

Comment 29 Rob Evers 2010-07-15 17:21:22 UTC

Bo,

While reviewing the updates here against what you proposed for the rhel5.3z update, I noticed a few snippets from the diffs that looked like they might apply to the other versions of the driver.

Can you review the 5.3 version after the patch in bz602714 against the updates here and determine if any updates should be applied in the other?

Thanks, Rob

What I noticed:

Patch for 5.3 contains:

/**
+ * megasas_check_reset_gen2 -   For controller reset check
+ * @regs:                               MFI register set
+ */
+static int
+megasas_check_reset_gen2(struct megasas_instance *instance, struct megasas_register_set __iomem * regs)
+{
+        if (instance->adprecovery != MEGASAS_HBA_OPERATIONAL) {
+                return 1;
+        }
+
+        return 0;
+}

Corresponding rhel5.6 patch snippet:

+/**
+ * megasas_check_reset_gen2 -   For controller reset check
+ * @regs:                               MFI register set
+ */
+static int
+megasas_check_reset_gen2(struct megasas_instance *instance, struct megasas_register_set __iomem * regs)
+{
+        return 0;
+}
+

Why the differences?

Patch for 5.3 contains:

    /* If we have already declared adapter dead, donot complete cmds */
-    if (instance->hw_crit_error)
+    if (instance->adprecovery == MEGASAS_HW_CRITICAL_ERROR )
        return;

Rhel5.6 version of patch has:

       /* If we have already declared adapter dead, donot complete cmds */
-       if (instance->hw_crit_error)
+       spin_lock_irqsave(&instance->hba_lock, flags);
+       if (instance->adprecovery == MEGASAS_HW_CRITICAL_ERROR ) {
+               spin_unlock_irqrestore(&instance->hba_lock, flags);
               return;
+       }
+       spin_unlock_irqrestore(&instance->hba_lock, flags);

Is the locking needed in 5.3z patch as well?

Comment 30 Martin Wilck 2010-07-15 17:27:52 UTC

(In reply to comment #28)
> Request Fujitsu commit to testing this update in rhel5.6    

Could you provide a test kernel now, please? I'd like to do the retest ASAP.
Note that we have requested the same fix also for RHEL6.0 (bug #607930).

Comment 31 Rob Evers 2010-07-15 19:40:37 UTC

Bo,

Can you please remove // style comments in your next rhel5.6 update for the megaraid driver?

Thanks, Rob

Comment 32 Rob Evers 2010-07-15 19:42:08 UTC

(In reply to comment #30)
> (In reply to comment #28)
> > Request Fujitsu commit to testing this update in rhel5.6    
> 
> Could you provide a test kernel now, please? I'd like to do the retest ASAP.
> Note that we have requested the same fix also for RHEL6.0 (bug #607930).    

Hopefully early next week.  I'm on pto tomorrow and am still trying to get these patches posted.

Comment 35 Rob Evers 2010-07-20 13:50:57 UTC

(In reply to comment #34)

Yes, please make and test the rhel5.6 patch against the current rhel5.6 sources.

Comment 36 bo yang 2010-07-21 09:49:27 UTC

Created attachment 433362 [details]
recreate the patch for rhel5.6 (from 4.17 to 4.31)

Rob,

This patch is based on the src from:
http://people.redhat.com/jwilson/el5/206.el5/

I also took the // comments out of the src.

Comment 38 Rob Evers 2010-07-21 13:52:40 UTC

(In reply to comment #36)
> Created an attachment (id=433362) [details]
> recreate the patch for rhel5.6 (from 4.17 to 4.31)
> 
> Rob,
> 
> This patch based on the src from:
> http://people.redhat.com/jwilson/el5/206.el5/
> 
> Also take off the // in the src.    

Thanks Bo.  In the interest of getting this posted, and enabling the 5.3z patch in bz602714, I am obsoleting this patch.  We can get these changes in using a new bz for rhel5.6.  This is not a pressing issue at the moment.

Please use the rhel5.6 base with the three patches that are not obsoleted here as the code base for the new bz.

Thanks, Rob

Comment 39 Rob Evers 2010-07-21 17:49:01 UTC

Hi Bo,

It was brought to my attention that there is an outstanding rhel5 kdump issue as well:

https://bugzilla.redhat.com/show_bug.cgi?id=435698

Do you typically test kdump before providing driver updates?  This is pretty important for Red Hat.

I recommend testing kdump under load and repeatedly, perhaps 10 times?

Rob

Comment 40 Rob Evers 2010-07-23 14:42:19 UTC

(In reply to comment #38)
> (In reply to comment #36)
> > Created an attachment (id=433362) [details] [details]
> > recreate the patch for rhel5.6 (from 4.17 to 4.31)
> > 
> > Rob,
> > 
> > This patch based on the src from:
> > http://people.redhat.com/jwilson/el5/206.el5/
> > 
> > Also take off the // in the src.    
> 
> Thanks Bo.  In the interest of getting this posted, and enabling the 5.3z patch
> in bz602714, I am obsoleting this patch.  Can get these changes in using a new
> bz for rhel5.6.  This is not a pressing issue at the moment.
> 
> Please use the rhel5.6 base with the three patches that are not obsoleted here
> at the code base for the new bz.
> 
> Thanks, Rob    

Bo,

When I posted this patch internally, another issue came up.  The maintainer of the rhel5.6 kernel prefers that all the issues get addressed before this will get accepted in rhel5.6.  This means the rhel5.3-z update is still blocked until this gets accepted.

Can you:

- Post an update for the issue below on linux-scsi
- add another patch to this bz that addresses the issue for rhel5.6.  Please make this patch apply on top of the last patch you attached here.  I will remove the obsolete status of it.

Once this is done, all 4 patches can then be re-posted internally and all should be good.

Not sure if you opened another rhel5.6 bz for megaraid-sas yet but it won't be needed after this.

See issue below.

Thanks, Rob


The new issue that came up:

>
> +static ssize_t
> +sysfs_max_sectors_read(struct kobject *kobj, char *buf,
> +               loff_t off, size_t count)
> +{
> +    struct Scsi_Host *host =
> +        class_to_shost(container_of(kobj,
> +                        struct class_device, kobj));
> +
> +    struct megasas_instance *instance =
> +                (struct megasas_instance *)host->hostdata;
> +
> +    count = sprintf(buf, "%u\n", instance->max_sectors_per_req);
> +
> +    return count+1;
> +}
> +
> +static struct bin_attribute sysfs_max_sectors_attr = {
> +    .attr = {
> +        .name = "max_sectors",
> +        .mode = S_IRUSR|S_IRGRP|S_IROTH,
> +        .owner = THIS_MODULE,
> +    },
> +    .size = 7,
> +    .read = sysfs_max_sectors_read,
> +};

I do not think this needs to be a binary sysfs file.

It should be a scsi_host_template->shost_attrs attr. Maybe it could even be a default scsi_sysfs.c:scsi_sysfs_shost_attrs attr.

Comment 41 bo yang 2010-07-23 16:34:52 UTC

Rob,
 
I am trying to understand the issue.  If you can give me the details of the issue, I can test the changes and check the fix.

I am in China now and am traveling back to the US.  I hope I can do the test next Monday or Tuesday after I am back in the office.

Bo Yang

Comment 42 Tomas Henzl 2010-07-26 13:38:46 UTC

(In reply to comment #41)
> I am trying to understand the issue.  If you can give me the details of the
> issue, I can test the changes and see the fix.

The issues are 
a) that the max_sectors shouldn't be a binary sysfs file, but added under  scsi_host_template->shost_attrs attr. 

b) kdump. I haven't tested kdump with your latest patch (I'll do that). Have you done some kdump testing?

Comment 43 Tomas Henzl 2010-07-26 13:42:36 UTC

(In reply to comment #42)
> a) that the max_sectors shouldn't be a binary sysfs file, but added under 
> scsi_host_template->shost_attrs attr. 

For an example look here:
drivers/scsi/megaraid/megaraid_mbox.c
static struct scsi_host_template megaraid_template_g = {
...
	.shost_attrs			= megaraid_shost_attrs,
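
To make the suggestion concrete, here is a simplified userspace model of the pattern: a NULL-terminated table of named attributes, each with a show callback that formats the value once. It is not kernel code. struct host_instance, struct host_attr, host_attrs and read_host_attr are hypothetical stand-ins, and the real callback signature depends on the kernel version.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the driver's per-host data. */
struct host_instance { unsigned max_sectors_per_req; };

/* Hypothetical stand-in for a host attribute: a name plus a show callback. */
struct host_attr {
    const char *name;
    int (*show)(const struct host_instance *inst, char *buf, size_t len);
};

/* "show" callback: formats the value once; there is no offset to handle. */
static int show_max_sectors(const struct host_instance *inst, char *buf, size_t len)
{
    return snprintf(buf, len, "%u\n", inst->max_sectors_per_req);
}

static const struct host_attr attr_max_sectors = { "max_sectors", show_max_sectors };

/* NULL-terminated attribute table, analogous to what a host template's
 * .shost_attrs member points at. */
static const struct host_attr *host_attrs[] = { &attr_max_sectors, NULL };

/* Look up an attribute by name and render it; returns -1 if unknown. */
static int read_host_attr(const struct host_instance *inst, const char *name,
                          char *buf, size_t len)
{
    size_t i;

    for (i = 0; host_attrs[i] != NULL; i++)
        if (strcmp(host_attrs[i]->name, name) == 0)
            return host_attrs[i]->show(inst, buf, len);
    return -1;
}
```

The design point is that the attribute table is declared once and handed to the template, so the driver neither creates nor removes the sysfs file by hand.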

Comment 44 bo yang 2010-07-28 13:59:56 UTC

Tomas,

As I said in comment #41, I am making the changes and testing them.

Regards,

Bo Yang

Comment 45 bo yang 2010-07-28 20:02:21 UTC

Created attachment 435115 [details]
change the  max_sectors sysfs from bin attr to scsi host attr

Tomas,

Please find the attached patch for this change.  I tested it and it works fine.

Comment 46 Tomas Henzl 2010-07-29 10:53:59 UTC

*** Bug 602714 has been marked as a duplicate of this bug. ***

Comment 47 Jiri Pirko 2010-07-29 10:58:27 UTC

as bug 602714 is closed as a duplicate of this one, moving z-stream proposals here.

Comment 50 Tomas Henzl 2010-07-29 11:36:32 UTC

Posted today http://post-office.corp.redhat.com/archives/rhkernel-list/2010-July/msg01550.html

Comment 54 Tomas Henzl 2010-07-29 11:55:39 UTC

(In reply to comment #45)
> Created an attachment (id=435115) [details]
> change the  max_sectors sysfs from bin attr to scsi host attr
> 
> Tomas,
> 
> Please find the attached patch for this changes.  I tested it works fine.    

Thanks Bo

Comment 55 Tomas Henzl 2010-07-29 14:54:21 UTC

Hi Bo,

could you please answer these questions (if possible today)?
They were originally asked in comment#29,
but they are still valid.

Thanks, Tomas

Patch for 5.3 contains:

/**
+ * megasas_check_reset_gen2 -   For controller reset check
+ * @regs:                               MFI register set
+ */
+static int
+megasas_check_reset_gen2(struct megasas_instance *instance, struct
megasas_register_set __iomem * regs)
+{
+        if (instance->adprecovery != MEGASAS_HBA_OPERATIONAL) {
+                return 1;
+        }
+
+        return 0;
+}

Corresponding rhel5.6 patch snippet:

+/**
+ * megasas_check_reset_gen2 -   For controller reset check
+ * @regs:                               MFI register set
+ */
+static int
+megasas_check_reset_gen2(struct megasas_instance *instance, struct
megasas_register_set __iomem * regs)
+{
+        return 0;
+}
+

Why the differences?

Patch for 5.3 contains:

    /* If we have already declared adapter dead, donot complete cmds */
-    if (instance->hw_crit_error)
+    if (instance->adprecovery == MEGASAS_HW_CRITICAL_ERROR )
        return;

Rhel5.6 version of patch has:

       /* If we have already declared adapter dead, donot complete cmds */
-       if (instance->hw_crit_error)
+       spin_lock_irqsave(&instance->hba_lock, flags);
+       if (instance->adprecovery == MEGASAS_HW_CRITICAL_ERROR ) {
+               spin_unlock_irqrestore(&instance->hba_lock, flags);
               return;
+       }
+       spin_unlock_irqrestore(&instance->hba_lock, flags);

Is the locking needed in 5.3z patch as well?

Comment 56 bo yang 2010-07-29 17:08:26 UTC

Tomas,

Patch for 5.3 contains:

/**
+ * megasas_check_reset_gen2 -   For controller reset check
+ * @regs:                               MFI register set
+ */
+static int
+megasas_check_reset_gen2(struct megasas_instance *instance, struct
megasas_register_set __iomem * regs)
+{
+        if (instance->adprecovery != MEGASAS_HBA_OPERATIONAL) {
+                return 1;
+        }
+
+        return 0;
+}

Corresponding rhel5.6 patch snippet:

+/**
+ * megasas_check_reset_gen2 -   For controller reset check
+ * @regs:                               MFI register set
+ */
+static int
+megasas_check_reset_gen2(struct megasas_instance *instance, struct
megasas_register_set __iomem * regs)
+{
+        return 0;
+}
+

Why the differences?

The change was made after we submitted the rhel5.6 patch (version 4.32), which is required for rhel5.6.  


The lock in 5.6 is required for 5.3z.  

Regards,

Bo Yang

Comment 57 bo yang 2010-07-29 17:15:52 UTC

Tomas,

You can send me an e-mail if you want to know the information immediately or make changes.

Regards,

Bo Yang

Comment 58 Issue Tracker 2010-07-30 05:50:21 UTC

Event posted on 2010-07-30 14:50 JST by myamazak

Hi all,

I'll forward a comment from FJ.
----------------------------------------------------------------------
---
Hereafter, we will test test kernel (5.3 x86_64).
---

We tested the test kernel (5.3 x86_64) and confirmed there is no problem
with it.
Moreover, could you please make the PAE test kernel available for us?

---
New bugzilla tickets for z-stream have been created.
---

Thank you for the information.

Best Regards,
Masahiro Maeda
----------------------------------------------------------------------

Regards,
M Yamazaki



This event sent from IssueTracker by myamazak 
 issue 1000913

Comment 59 Tomas Henzl 2010-07-30 09:22:46 UTC

(In reply to comment #58)
> We tested the test kernel (5.3 x86_64) and confirmed there is no problem
> with it.
> Moreover, could you please make the PAE test kernel available for us?
> 

Uploaded to http://people.redhat.com/thenzl/602714/

Sorry my diskspace there is limited so I can't post all the kernels, let me know if you need some other arch.

Comment 60 Tomas Henzl 2010-07-30 11:46:36 UTC

(In reply to comment #56)
> 
> The lock in 5.6 is required for 5.3z.  
> 

Bo,
to sum it up the first change is OK, because something else has also changed - OK?

The second issue - the lock. I just checked the sources and "instance->adprecovery" is often protected with a lock, but it is also often left unprotected. 
It's really hard for me to be sure where the lock is not needed.

Please repost the 5.3.z patch with corrected locking. If a change is needed to the 5.6 patch please repost it also.

Thanks, Tomas

Comment 61 Tomas Henzl 2010-07-30 15:31:29 UTC

Created attachment 435585 [details]
changed the max_sectors again

Bo,
the latest version of the max_sectors sysfs change has been rejected on our internal list. This one has passed. Please let me know if you are OK with this version and, if possible, retest it in your lab. 
If you want to change something else in your patch (locking), use the patch here as the base so the sysfs changes aren't lost.
Thanks, Tomas

Comment 62 bo yang 2010-07-30 21:01:15 UTC

Created attachment 435643 [details]
take off the lock from complete_cmd_dpc routine

The lock around the MEGASAS_HW_CRITICAL_ERROR check in the complete_cmd_dpc routine makes no difference, so take it out.

Comment 63 bo yang 2010-07-30 21:16:00 UTC

The driver declares MEGASAS_HW_CRITICAL_ERROR after three OCR attempts fail to bring back the controller FW, or when the controller reset cannot succeed.  In the complete_cmd_dpc routine, checking MEGASAS_HW_CRITICAL_ERROR with or without the lock makes no difference, so take it out.

Comment 66 Tomas Henzl 2010-08-01 18:22:06 UTC

(In reply to comment #62)
> The lock for MEGASAS_HW_CRITICAL_ERROR in complete_cmd_dpc routine will not
> make difference.  take it out.

Thanks. Besides that, I was hoping you would also look at the other places where the locking differs; in megasas_mgmt_ioctl_aen the locking also differs. Please look at this too.

Please also look at comment#61 and answer the questions there.

Comment 67 Jarod Wilson 2010-08-02 21:48:02 UTC

in kernel-2.6.18-210.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 69 Issue Tracker 2010-08-03 01:36:26 UTC

Event posted on 08-03-2010 10:36am JST by moshiro

From FJ: We confirmed the PAE kernel works fine. 


This event sent from IssueTracker by moshiro 
 issue 1000913

Comment 70 Tomas Henzl 2010-08-03 10:46:03 UTC

Bo,
we are still waiting on your reply to comment#61 and #66

Thanks, Tomas

Comment 71 bo yang 2010-08-03 13:50:42 UTC

Tomas,

Here is the summary of why 5.3z doesn't have the lock, but 5.6 does.

in 5.3z --> base 4.01, no MEGASAS_HW_CRITICAL_ERROR check in ioctl and ioctl_aen.  
in 5.6 ---> base 4.17. The MEGASAS_HW_CRITICAL_ERROR check was added to fix some issues between 4.01 and 4.17.  

Please let me know if we need to port the changes between version 4.01 and 4.17 to 5.3z.

Thanks,

Bo Yang

Comment 72 bo yang 2010-08-03 13:56:21 UTC

Tomas,

Here is the summary of why 5.3z doesn't have the MEGASAS_HW_CRITICAL_ERROR check, but 5.6 does.

in 5.3z --> base 4.01, no MEGASAS_HW_CRITICAL_ERROR check in ioctl and
ioctl_aen.  
in 5.6 ---> base 4.17. The MEGASAS_HW_CRITICAL_ERROR check was added to fix some
issues between 4.01 and 4.17.  

Please let me know if we need to port the changes between version 4.01 and 4.17
to 5.3z.

Thanks,

Bo Yang

Comment 73 Tomas Henzl 2010-08-03 14:48:05 UTC

(In reply to comment #72)
Bo,

I wanted to be sure that you know about the differences; it is OK now.

Could you please also reply to comment#61 and look at the patch attached there?

Thanks

Comment 75 Issue Tracker 2010-08-19 04:50:44 UTC

Event posted on 08-19-2010 01:50pm JST by moshiro

Comment from FJ regarding each fix request:
--- 
We did test on a 5.6 kernel (210.el5) and confirmed it works OK.

Here is the status of each stream.

 5.3 hotfix:
   - Official hotfix provided to Fujitsu.
   - DONE

 5.3.z:
   - Early errata provided to Fujitsu.
   - Fujitsu tested and confirmed the early errata works OK.
   - Fujitsu waiting for the official errata.

 5.5.z:
   - Fujitsu waiting for early errata and/or the official errata.

 5.6:
   - Fujitsu tested and confirmed 210.el5 works OK.

Kei Tokunaga 
---


This event sent from IssueTracker by moshiro 
 issue 1000913

Comment 76 Tomas Henzl 2010-08-25 11:33:53 UTC

*** Bug 563083 has been marked as a duplicate of this bug. ***

Comment 77 Tomas Henzl 2010-09-03 13:24:56 UTC

Bo,
I think that this part of the patch is wrong:
@@ -3432,20 +4209,30 @@ megasas_suspend(struct pci_dev *pdev, pm
 	instance->instancet->disable_intr(instance->reg_set);
 	free_irq(instance->pdev->irq, instance);
 
+	scsi_host_put(host);
 	pci_save_state(pdev);
 	pci_disable_device(pdev);

The scsi_host_put decrements the host's refcount, I think it makes no sense here.
---------------

A similar case is megasas_resume. When megasas_resume fails on a device that is not the root device and someone then runs rmmod on the driver, scsi_host_put(host) ends up being called both in megasas_resume and in megasas_detach_one. 
Calling scsi_host_put(host) twice seems to be wrong. The same goes for the calls to pci_free_consistent; I think they are not needed when the resume fails.
Comments?
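
The refcount concern can be illustrated with a small userspace model. This is a sketch under the assumption that the host holds a single reference taken at attach time. struct host, host_get and host_put are hypothetical stand-ins, not the SCSI midlayer API.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted host object. */
struct host {
    int refcount;
    bool released;      /* set once the object has been torn down */
    int release_calls;  /* how many times teardown was triggered */
};

/* Take one reference. */
static void host_get(struct host *h)
{
    h->refcount++;
}

/* Drop one reference; teardown runs whenever the count is at or below zero. */
static void host_put(struct host *h)
{
    if (--h->refcount <= 0) {
        h->released = true;
        h->release_calls++;
    }
}
```

In this model, one host_get at attach followed by an extra host_put in the suspend path releases the object while the driver still expects to use it. The later host_put at detach then drives the count negative and triggers teardown a second time.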

Comment 78 bo yang 2010-09-15 02:18:14 UTC

Tomas,
 
You are right. It is extra.

Thanks,

Comment 79 Tomas Henzl 2010-09-29 10:34:17 UTC

*** Bug 435698 has been marked as a duplicate of this bug. ***

Comment 80 Moritoshi Oshiro 2010-09-30 08:36:05 UTC

=== In Red Hat Customer Portal Case 00323953 ===
--- Comment by Oshiro, Moritoshi on 9/30/2010 5:36 PM ---

Hi Bo-san, 

Fujitsu would like to wait until the code becomes complete and then ask for a test package. Could you please make a test package again once you are done with code cleaning up?

Comment 83 bo yang 2010-10-07 21:09:59 UTC

Tomas,

Do I need to create a new patch?  If yes, which is the base src?

Thanks,

Bo Yang

Comment 84 Tomas Henzl 2010-10-08 09:34:09 UTC

(In reply to comment #83)
> 
> Do I need to create a new patch?  if yes.  Which is the base src?
> 
If you are asking because of comment#77 and #78, then no, I created the patch myself.

You can download this test kernel from http://people.redhat.com/jwilson/el5

Comment 92 Martin Prpič 2010-11-11 14:00:02 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
A bug was found in the way the megaraid_sas driver (for SAS based RAID controllers) handled physical disks and management IOCTLs (Input/Output Control). All physical disks were exported to the disk layer, allowing an oops in megasas_complete_cmd_dpc() when completing the IOCTL command if a timeout occurred. One possible trigger for this bug was running mkfs. This update resolves this issue by updating the megaraid_sas driver to version 4.31.

Comment 93 Tomas Henzl 2010-11-18 14:36:24 UTC

*** Bug 499876 has been marked as a duplicate of this bug. ***

Comment 96 John Jarvis 2010-11-18 20:53:20 UTC

This enhancement request was evaluated by the full Red Hat Enterprise Linux 
team for inclusion in a Red Hat Enterprise Linux minor release.   As a 
result of this evaluation, Red Hat has tentatively approved inclusion of 
this feature in the next Red Hat Enterprise Linux Update minor release.   
While it is a goal to include this enhancement in the next minor release 
of Red Hat Enterprise Linux, the enhancement is not yet committed for 
inclusion in the next minor release pending the next phase of actual 
code integration and successful Red Hat and partner testing.

Comment 97 IBM Bug Proxy 2010-11-19 02:21:39 UTC

------- Comment From jmtt.com 2010-11-18 21:16 EDT-------
This driver was present during my verification of Bug 60123   -  [5.6 FEAT] x3690 X5 - Megalon Tracker .  That test was conducted on the partner beta drop which contained this version of the driver:

[root@elm3a88 lsscsi-0.24b1]# modinfo megaraid_sas
filename:       /lib/modules/2.6.18-229.el5/kernel/drivers/scsi/megaraid/megaraid_sas.ko
description:    LSI MegaRAID SAS Driver
author:         megaraidlinux
version:        00.00.04.31-RH1
license:        GPL
srcversion:     D2465CC945888034BE573A9
vermagic:       2.6.18-229.el5 SMP mod_unload gcc-4.1

Comment 99 Raghavendra Biligiri 2010-11-30 11:08:08 UTC

Verified that megaraid_sas driver (version - 4.31-RH1) is included in RHEL5.6-Snapshot2

Comment 101 errata-xmlrpc 2011-01-13 21:05:11 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html

andriusb
bdonahue
bmr
bugproxy
bzeranski
coughlan
cward
dhoward
jjarvis
jpirko
ltroan
martinez
martin.wilck
mfuruta
moshiro
narayanan_d
nobody+PNT0273897
qcai
raghavendra_biligiri
revers
sbest
tao