Bug 363961

Summary: [Areca 5.2 feat] Update arcmsr to 1.20.00.15.RH
Product: Red Hat Enterprise Linux 5 Reporter: Tomas Henzl <thenzl>
Component: kernelAssignee: Tomas Henzl <thenzl>
Status: CLOSED ERRATA QA Contact: Martin Jenner <mjenner>
Severity: high Docs Contact:
Priority: high    
Version: 5.2CC: andriusb, coughlan, nick.cheng
Target Milestone: ---Keywords: FutureFeature, OtherQA
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2008-0314 Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-05-21 15:00:35 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 262201, 429877    
Attachments:
Description Flags
The patch for Areca driver, arcmsr, against RHEL5.2
none
The patch to fix the issue of handling unknown command and let kernel process command timeout
none
The patch for Kconfig in arcmsr part
none
The patch for Kconfig in arcmsr part
none
Areca modified patch1
none
Modified patch to fix the issue of handling unknown command and let kernel process command timeout
none
patch files of arcmsr none

Description Tomas Henzl 2007-11-02 15:12:16 UTC
Nick,

we are working on RHEL 5.2. I think that we should update the arcmsr driver with
some bugfixes from upstream. 
Could you provide tha patch for us ?

Comment 1 Nick Cheng 2007-11-05 02:00:58 UTC
Tomas,
There are some patches that is submitted by others after I upstreamed to kernel org.
Do you just need the my patch or other sequential patches?

Comment 2 Tomas Henzl 2007-11-05 10:26:05 UTC
Nick,
we don't need everything backported. On the other hand it would be nice to keep
the driver up to date as much as possible. I'm mostly interested in bugfixes and
new HW support and it doesn't matter who posted the patches upstream.
When you send a patch to us I would also like to know if it is in upstream and
what does it relate to (bugfix, new hw support etc.).

Comment 3 Nick Cheng 2007-11-06 03:45:20 UTC
Tomas,
I am in a dilemma as to the patch contents.
There are some changes that accompany the functions in kernel 2.6.23 or later.
If I have to backport to kernel 2.6.18-XXXX in RedHat, I need to modify it and
therefore it will not the same version that I upstream to kernel org.
Could you agree to that? 
Thank you,

Comment 4 Tomas Henzl 2007-11-06 10:21:57 UTC
Nick,
because of that we are not backporting everything, but only important parts
(bugfixes, new hw support). We should try to be with the sources so close as
possible to the upstream, but sometimes it is not feasible, then part of the
code has to be rewritten. This exactly is what is kindly expected from you when
you backport your driver to RHEL(4-5).

Simply, yes I agree with that.

Comment 5 Nick Cheng 2007-11-07 03:21:03 UTC
OK,
I got it. I will make a version for RH ASAP.


Comment 6 RHEL Program Management 2007-11-07 07:25:14 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Tomas Henzl 2007-11-21 09:18:27 UTC
Nick,
do you need any help with posting the patch from me ? Or is there something
where I could help with ?

Comment 8 Nick Cheng 2007-11-21 09:56:10 UTC
Tomas,
Sorry, I am almost occupied by other stuffs.
Do you mind if I make a patch file for all arcmsr-relevant files, including
arcmsr.h, arcmsr_hba.c, arcmsr_attr.c, Kconfig, Kconfig.arcmsr,
ChangeLog.arcmsr, pci_ids, put in the same directory ?
I don't have free machine to install RHEL5.1 to make to patch.
Thank you, 

Comment 9 Tomas Henzl 2007-11-21 10:16:04 UTC
Nick,
it should be working with our kernel (2.6.18), the correct directories I'm able
to find myself. I hope that afterwards you will able to find a free machine to
check  if the patched kernel is working with the new hardware.
Thank you.

Comment 10 Nick Cheng 2007-11-26 11:28:35 UTC
Created attachment 268831 [details]
The patch for Areca driver, arcmsr, against RHEL5.2

Comment 11 Nick Cheng 2007-12-05 08:35:44 UTC
Created attachment 277881 [details]
The patch to fix the issue of handling unknown command and let kernel process command timeout

Comment 12 Tomas Henzl 2007-12-07 14:17:55 UTC
Nick,
there is a lof of changes in the Kconfig part of your patch. (for example
+config SCSI_SNI_53C710) Please post the part with the chandes for Kconfig ONLY
needed for your driver as a separate file. The part which belongs to
arcmsr*.{h,c} is OK. 

Comment 13 Nick Cheng 2007-12-08 03:38:22 UTC
Really?
I just checked it twice.
But I don't see any change on it.
Actually there is no any changes on kconfig.arcmsr file this time.
Do you still need the patch for kconfig.arcmsr? 

Comment 14 Tomas Henzl 2007-12-10 09:47:03 UTC
Nick,
I'm sorry I should have said it more precise. What I meant is for example this
from your first patch (https://bugzilla.redhat.com/attachment.cgi?id=268831):

diff -Naurp arcmsr-1.20.00.14/Kconfig arcmsr-1.20.00.15.RH-71106/Kconfig
--- arcmsr-1.20.00.14/Kconfig	2007-11-26 15:40:54.000000000 +0800
+++ arcmsr-1.20.00.15.RH-71106/Kconfig	2007-11-26 15:40:54.000000000 +0800
@@ -3,11 +3,14 @@ menu "SCSI device support"
 config RAID_ATTRS
 	tristate "RAID Transport Class"
 	default n
+	depends on BLOCK
 	---help---
 	  Provides RAID
 
 config SCSI
 	tristate "SCSI device support"
+	depends on BLOCK
+	select SCSI_DMA if HAS_DMA
 	---help---
 	  If you want to use a SCSI hard disk, SCSI tape drive, SCSI CD-ROM or
 	  any other SCSI device under Linux, say Y and make sure that you know
@@ -27,6 +30,17 @@ config SCSI
 	  However, do not compile this as a module if your root file system
 	  (the one containing the directory /) is located on a SCSI device.
 
+config SCSI_DMA
+	bool
+	default n
+
+config SCSI_TGT
+	tristate "SCSI target support"
+	depends on SCSI && EXPERIMENTAL
+	---help---
+	  If you want to use SCSI target mode drivers enable this option.
+	  If you choose M, the module will be called scsi_tgt.
+
 config SCSI_NETLINK

Comment 15 Nick Cheng 2007-12-11 02:39:37 UTC
Created attachment 283511 [details]
The patch for Kconfig in arcmsr part

Comment 16 Nick Cheng 2007-12-11 02:39:48 UTC
Created attachment 283521 [details]
The patch for Kconfig in arcmsr part

Comment 17 Tomas Henzl 2007-12-11 14:50:25 UTC
Nick,
you are going to remove with your patch the shutdown function from your driver
 	.remove			= arcmsr_remove,
-	.shutdown		= arcmsr_shutdown
Because I was not able to find it upstream I'd like to ask 
- why do you remove this function ?
- is the change posted upstream and where ?
Are there also some other changes which are not in the upstream kernel ?

From the Kconfig -
+	  on condition that the kernel version is greater than 2.6.19.
RHEL5 is based on kernel 2.6.18, does it mean that the SCSI_ARCMSR_AER is not
used in your patch and we shouldn't change the Kconfig file ?



Comment 18 Tomas Henzl 2007-12-12 10:31:32 UTC
>With regard to first question, shutdown function should be there.
>Would you mind if I post you a whole new patch?
No need for that I'll post an updated version here hope soon,so you can check it.

Besides that is in the patch something what is not accepted upstream ?

Comment 19 Nick Cheng 2007-12-12 10:55:19 UTC
Sorry, I am not very sure what you mean.
But the version that I upstream is not applicable to RHEL5.X because there are
some changes which are coherent to kernel 2.619 or later.
Therefore, I have to modify it to fit the kernel version in RHEL5.X. 

Comment 20 Tomas Henzl 2007-12-12 14:09:38 UTC
It's OK that have modified the sources. What I meant are new features not
accepted upstream, something like this could cause as problems in the future,
that's why I'm asking. 
I've posted now two modified patches, so please check them. Besides the shutdown
function I've removed the changes in Kconfig(arcmsr).


Comment 21 Tomas Henzl 2007-12-12 14:11:46 UTC
Created attachment 285711 [details]
Areca modified patch1

Comment 22 Tomas Henzl 2007-12-12 14:12:54 UTC
Created attachment 285721 [details]
Modified  patch to fix the issue of handling unknown command and let kernel process command timeout

Comment 23 Tomas Henzl 2007-12-17 09:41:14 UTC
Nick,
I've posted it to our list. Have you posted the patch form #11 (The patch to fix
the issue of handling unknown command and let kernel process command timeout) to
lkml ? I'm not able to find it there.



Comment 24 nickcheng 2007-12-17 09:59:50 UTC
Tomas,
Sorry. I am fully occupied by other case currently.
I will upstream to kernel org tomorrow.
Thank you for your patience,

Comment 26 nickcheng 2007-12-21 11:24:44 UTC
Tomas,
I found a lot of changes that I never saw before on
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git.
I am still waiting for the response from the scsi maintainer.
It will influence the patch that I will submit. 
Sorry for my delay.

Comment 27 Nick Cheng 2008-01-02 10:28:56 UTC
Tomas,
I have got the response from the maintainer and followed the version on
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git.
But I found a questionable segment of code.
I need to clarify it.
Sorry about that

Comment 28 Tomas Henzl 2008-01-02 11:07:36 UTC
Nick,
if you want to change the posted patch please keep in mind that the our deadline
is in about one week.

Comment 29 Tomas Henzl 2008-01-09 11:44:25 UTC
(In reply to comment #24)
> Sorry. I am fully occupied by other case currently.
> I will upstream to kernel org tomorrow.
Nick,
it's important (it will be better accepted in our list) when it is at least
posted on lkml. Have you already done it ? If yes, where can I find the post ?


Comment 30 nickcheng 2008-01-09 13:00:00 UTC
Tomas,
It is really complicated.
Last week I would submit the patch to kernel.org but I found some changes are
not acknowledged by me and than submitted directly to kernel.org.
Among the changes, I found two places which are questionable.
I tried to discuss with the submitter but don't get the response as yet. 
Therefore, I have to correct it and then do the long-term test on my site in
case of any accidents.
I know the deadline is approaching and I will submit my version to kernel.org
and Red Hat tomorrow.
But I have to remind you the patch to RedHat based on the version that is
accepted by kernel.org will be different with the one that I submitted to you. 
Sorry about that,



Comment 31 nickcheng 2008-01-10 10:47:25 UTC
Created attachment 291262 [details]
patch files of arcmsr

Comment 32 Tomas Henzl 2008-01-10 12:52:25 UTC
Nick,
what's has been changed in your patch ? 
Are the changes accepted upstream ?
And last, but maybe most important - how important are the changes (data
corruption,etc..), because I'm now not able to catch our internal beta deadline,
so the has to be important to be accepted.

Comment 33 nickcheng 2008-01-10 13:07:29 UTC
Tomas,
This is the patch against RHEL5.1.
Most of parts are accepted by upstream.
But others  changes that fix the scsi error handling and unknown commands
processing on new model HBA are not accepted.
These are extremely important.

Comment 34 Tomas Henzl 2008-01-10 13:36:46 UTC
Nick,
I think that unknown command processing is not new
https://bugzilla.redhat.com/attachment.cgi?id=285721 and I hope it'll go into
RHEL5.2 beta, so as your new hw support.
The question is - what is new in your latest patch against the previous stand ?

As I've written in Comment #23 your previous patch is (I hope at least) in our
RHEl5.2 internal beta.

Comment 35 nickcheng 2008-01-11 01:43:31 UTC
Tomas,
The latest patch includes,
(1). fix the portability problems
(2). fix the iomem release on type B
(3). add return -ENOMEM in case of ioremap() failing
(4). modify acb->devstate[i][j] as ARECA_RAID_GONE in the initial stage instead
of ARECA_RAID_GOOD in arcmsr_alloc_ccb_pool()
(5). fix the assignment of arcmsr_cdb->Context as (unsigned long)arcmsr_cdb
(6). add the checking state of (outbound_intstatus &
ARCMSR_MU_OUTBOUND_HANDLE_INT) == 0 in arcmsr_handle_hba_isr()
(7). fix the scsi error handling in arcmsr_polling_hbb_ccbdone()

No.1 ~ No.3 are from kernel.org fellow and the others are from us.
Of course, the first three have been upstreamed.
The rest were submitted by me yesterday.


Comment 36 Tomas Henzl 2008-01-11 15:31:46 UTC
Nick,
is some part of your latest patch fixing a problem introduced with your previous
patch ?
That means - is there a regression in RHEL5.2 compared to RHEL5.1 ?
(I'm currently not aware of any problem, I'm only questioning.)

Is any of your latest patches needed to fix a problem with 
1) data corruption ?
2) system crash/hang ? 

If you'll find something which corresponds to this, please explain it thoroughly
 and elaborate a new patch only with the needed part.

For the other patches we'll open a new bugzilla and move them into RHEL5.3


Comment 37 nickcheng 2008-01-12 05:42:17 UTC
Tomas,
Previous patch does not introduce any bugs.
In other words, the latest patch does not fix the bugs which are provoked by
previous patch.
It fixes the inherent bugs in arcmsr-1.20.00.15.RH.
The patch does fix the probelms with data corruption and system crash/hang in
certain cases, such as certain motherboards with multi-devices and less memory
and long-term running.
These fixes are for these issues.
Do you still need more detailed explanation?
We wish the patch could be included in RHEL5.2/RHEL4.7, because the fixs are not
minor.
If you still have any comments, please let me know.
Thanks for your time and patience,

 


Comment 38 Tomas Henzl 2008-01-14 10:48:39 UTC
Nick,
thanks for your effort, but I still have some points.

I think I should explain my situation better: the deadline for internal beta has
passed and only data corruption+system crash/hang fixes are accepted. And it's
not me who accepts/rejects those patches I only post them to our list. For
example I somehow could not follow how 
'(1). fix the portability problems' could be causing a system crash.
Please be so patient with me and go through the seven parts of your patch and
either drop the part from your patch or write a explanation why is this
particular part important and how it's causing data corruption/system crash.

Example:
(1). fix the portability problems - several customers reported system hang when
not applied  because of ...
(2). .... and so on until (7)

Then remove from the patch the dropped parts and post the reduced version again.


> We wish the patch could be included in RHEL5.2/RHEL4.7, because the fixs are not
> minor.
This bugzilla is for RHEL5.2 if want to include the patch also in RHEL4.7, the
only way is to open a new bugzilla for it, and please do it today as the
planning freeze for 4.7 is (I think) tomorrow. Send me then the bz number, so
I'm aware of it.



Comment 39 Tomas Henzl 2008-01-14 14:04:28 UTC
Nick,
also this 
-		ver_addr = pci_alloc_consistent(acb->pdev, 1032, &buf_handle);
-		if (!ver_addr) {
+		tmp = kmalloc(1032, GFP_KERNEL|GFP_DMA);
+		ver_addr = (unsigned long *)tmp;
itself is a change in the right direction (I've sent to you a letter about this
Jun.6.2007.) but it looks for me 'only' as a waste of resources and not as an
critical error.

Comment 40 Don Zickus 2008-01-14 20:05:08 UTC
in 2.6.18-68.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 41 nickcheng 2008-01-15 02:10:44 UTC
Hi Tomas,
About Comment #39, I found if I use pci_alloc_consistent and then open Areca AP,
the memory usage will increase gradually. But if I use the kmalloc, it will not.
This was ever reported by Falconstor and Pogolinux.
Especially for Falconstor, if their AP is open, it will use up the memory
overnight and the system will hang. 
We though it only happens on specific version and the  pci_alloc_consistent()
was also proposed by the kernel org fellow as well then there is no need to
change, but to my surprise it happens again.

I check the kernel source for this piece of code of pci_alloc_consistent but
found it has been unchanged for a long time, therefore we guest it malfunctions
on the memory management or somewhere in the kernel code.

This is why this time I decide to upstream to RedHat and kernel org.
The most important is it will hang the system for long-term running.


 

Comment 42 nickcheng 2008-01-15 04:38:39 UTC
Tomas,
Thanks for your patience.
I would explain one by one and let you decide which is acceptable and which is not.
(1). fix the portability problems
[Discription]
This fixs the endian issue. This will hang the system, which defines big-endian,
while I/O.

(2). fix the iomem release on type B
[Discription]
This fixs the io-memory allocation and release issue on Type B. It could cause
memory leakage on the heavy  load system and leads the system unstable. 

(3). add return -ENOMEM in case of ioremap() failing
[Discription]
It fixs the system unstable issue while iomemory allocation fails.

(4). modify acb->devstate[i][j] as ARECA_RAID_GONE in the initial stage instead
of ARECA_RAID_GOOD in arcmsr_alloc_ccb_pool()
[Discription]
This fixs the exiting volumes' initial states. The wrong setting could lead to
system hang. 

(5). fix the assignment of arcmsr_cdb->Context as (unsigned long)arcmsr_cdb
[Discription]
This fixs the scsi command allocation to a predefined structure. It could lead
to system unstable while the AP accesses this address.

(6). add the checking state of (outbound_intstatus &
ARCMSR_MU_OUTBOUND_HANDLE_INT) == 0 in arcmsr_handle_hba_isr()
[Discription]
This fixs the interrupt routine to handle the controller abnormal interrupts. If
not, it could not handle share IRQ  and leads to system crash. 

(7). fix the scsi error handling in arcmsr_polling_hbb_ccbdone()
[Discription]
It fixs the host scsi error handling callback in case of I/O on failed volumes.
If not, the continual I/O on failed volume could hang the system. 


Comment 43 Tomas Henzl 2008-01-15 09:47:01 UTC
Nick,
thanks for the explanation. (No more questions now.)

Please do not forget to create bugzilla for 4.7 even without having the patch
right now.




Comment 44 Tomas Henzl 2008-01-15 16:06:09 UTC
Nick,
I've made minor changes to your previous patches, it would help me if you could
create the patch against our latest sources.I'm going to send the files to you
separately asap. 
Btw. I could swear that I've seen this part somewhere, but now it magically 
vanished from the patch
-		ver_addr = pci_alloc_consistent(acb->pdev, 1032, &buf_handle);
-		if (!ver_addr) {
+		tmp = kmalloc(1032, GFP_KERNEL|GFP_DMA);
+		ver_addr = (unsigned long *)tmp;

Comment 45 Andrius Benokraitis 2008-01-23 15:58:37 UTC
Moving back to MODIFIED. At the request of lwang, I created a new BZ (bug
429877) that should be used for the follow-up patch (The patch proposed after
what was committed to -68 kernel).

Please move the continuing discussion to bug 429877.

Comment 48 errata-xmlrpc 2008-05-21 15:00:35 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html