Bug 363961
Summary: | [Areca 5.2 feat] Update arcmsr to 1.20.00.15.RH | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Tomas Henzl <thenzl> |
Component: | kernel | Assignee: | Tomas Henzl <thenzl> |
Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 5.2 | CC: | andriusb, coughlan, nick.cheng |
Target Milestone: | --- | Keywords: | FutureFeature, OtherQA |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | RHBA-2008-0314 | Doc Type: | Enhancement |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2008-05-21 15:00:35 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 262201, 429877 | ||
Attachments: |
Description
Tomas Henzl
2007-11-02 15:12:16 UTC
Tomas, There are some patches that is submitted by others after I upstreamed to kernel org. Do you just need the my patch or other sequential patches? Nick, we don't need everything backported. On the other hand it would be nice to keep the driver up to date as much as possible. I'm mostly interested in bugfixes and new HW support and it doesn't matter who posted the patches upstream. When you send a patch to us I would also like to know if it is in upstream and what does it relate to (bugfix, new hw support etc.). Tomas, I am in a dilemma as to the patch contents. There are some changes that accompany the functions in kernel 2.6.23 or later. If I have to backport to kernel 2.6.18-XXXX in RedHat, I need to modify it and therefore it will not the same version that I upstream to kernel org. Could you agree to that? Thank you, Nick, because of that we are not backporting everything, but only important parts (bugfixes, new hw support). We should try to be with the sources so close as possible to the upstream, but sometimes it is not feasible, then part of the code has to be rewritten. This exactly is what is kindly expected from you when you backport your driver to RHEL(4-5). Simply, yes I agree with that. OK, I got it. I will make a version for RH ASAP. This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. Nick, do you need any help with posting the patch from me ? Or is there something where I could help with ? Tomas, Sorry, I am almost occupied by other stuffs. Do you mind if I make a patch file for all arcmsr-relevant files, including arcmsr.h, arcmsr_hba.c, arcmsr_attr.c, Kconfig, Kconfig.arcmsr, ChangeLog.arcmsr, pci_ids, put in the same directory ? I don't have free machine to install RHEL5.1 to make to patch. Thank you, Nick, it should be working with our kernel (2.6.18), the correct directories I'm able to find myself. I hope that afterwards you will able to find a free machine to check if the patched kernel is working with the new hardware. Thank you. Created attachment 268831 [details]
The patch for Areca driver, arcmsr, against RHEL5.2
Created attachment 277881 [details]
The patch to fix the issue of handling unknown command and let kernel process command timeout
Nick, there is a lof of changes in the Kconfig part of your patch. (for example +config SCSI_SNI_53C710) Please post the part with the chandes for Kconfig ONLY needed for your driver as a separate file. The part which belongs to arcmsr*.{h,c} is OK. Really? I just checked it twice. But I don't see any change on it. Actually there is no any changes on kconfig.arcmsr file this time. Do you still need the patch for kconfig.arcmsr? Nick, I'm sorry I should have said it more precise. What I meant is for example this from your first patch (https://bugzilla.redhat.com/attachment.cgi?id=268831): diff -Naurp arcmsr-1.20.00.14/Kconfig arcmsr-1.20.00.15.RH-71106/Kconfig --- arcmsr-1.20.00.14/Kconfig 2007-11-26 15:40:54.000000000 +0800 +++ arcmsr-1.20.00.15.RH-71106/Kconfig 2007-11-26 15:40:54.000000000 +0800 @@ -3,11 +3,14 @@ menu "SCSI device support" config RAID_ATTRS tristate "RAID Transport Class" default n + depends on BLOCK ---help--- Provides RAID config SCSI tristate "SCSI device support" + depends on BLOCK + select SCSI_DMA if HAS_DMA ---help--- If you want to use a SCSI hard disk, SCSI tape drive, SCSI CD-ROM or any other SCSI device under Linux, say Y and make sure that you know @@ -27,6 +30,17 @@ config SCSI However, do not compile this as a module if your root file system (the one containing the directory /) is located on a SCSI device. +config SCSI_DMA + bool + default n + +config SCSI_TGT + tristate "SCSI target support" + depends on SCSI && EXPERIMENTAL + ---help--- + If you want to use SCSI target mode drivers enable this option. + If you choose M, the module will be called scsi_tgt. + config SCSI_NETLINK Created attachment 283511 [details]
The patch for Kconfig in arcmsr part
Created attachment 283521 [details]
The patch for Kconfig in arcmsr part
Nick, you are going to remove with your patch the shutdown function from your driver .remove = arcmsr_remove, - .shutdown = arcmsr_shutdown Because I was not able to find it upstream I'd like to ask - why do you remove this function ? - is the change posted upstream and where ? Are there also some other changes which are not in the upstream kernel ? From the Kconfig - + on condition that the kernel version is greater than 2.6.19. RHEL5 is based on kernel 2.6.18, does it mean that the SCSI_ARCMSR_AER is not used in your patch and we shouldn't change the Kconfig file ? >With regard to first question, shutdown function should be there.
>Would you mind if I post you a whole new patch?
No need for that I'll post an updated version here hope soon,so you can check it.
Besides that is in the patch something what is not accepted upstream ?
Sorry, I am not very sure what you mean. But the version that I upstream is not applicable to RHEL5.X because there are some changes which are coherent to kernel 2.619 or later. Therefore, I have to modify it to fit the kernel version in RHEL5.X. It's OK that have modified the sources. What I meant are new features not accepted upstream, something like this could cause as problems in the future, that's why I'm asking. I've posted now two modified patches, so please check them. Besides the shutdown function I've removed the changes in Kconfig(arcmsr). Created attachment 285711 [details]
Areca modified patch1
Created attachment 285721 [details]
Modified patch to fix the issue of handling unknown command and let kernel process command timeout
Nick, I've posted it to our list. Have you posted the patch form #11 (The patch to fix the issue of handling unknown command and let kernel process command timeout) to lkml ? I'm not able to find it there. Tomas, Sorry. I am fully occupied by other case currently. I will upstream to kernel org tomorrow. Thank you for your patience, Tomas, I found a lot of changes that I never saw before on git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git. I am still waiting for the response from the scsi maintainer. It will influence the patch that I will submit. Sorry for my delay. Tomas, I have got the response from the maintainer and followed the version on git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git. But I found a questionable segment of code. I need to clarify it. Sorry about that Nick, if you want to change the posted patch please keep in mind that the our deadline is in about one week. (In reply to comment #24) > Sorry. I am fully occupied by other case currently. > I will upstream to kernel org tomorrow. Nick, it's important (it will be better accepted in our list) when it is at least posted on lkml. Have you already done it ? If yes, where can I find the post ? Tomas, It is really complicated. Last week I would submit the patch to kernel.org but I found some changes are not acknowledged by me and than submitted directly to kernel.org. Among the changes, I found two places which are questionable. I tried to discuss with the submitter but don't get the response as yet. Therefore, I have to correct it and then do the long-term test on my site in case of any accidents. I know the deadline is approaching and I will submit my version to kernel.org and Red Hat tomorrow. But I have to remind you the patch to RedHat based on the version that is accepted by kernel.org will be different with the one that I submitted to you. Sorry about that, Created attachment 291262 [details]
patch files of arcmsr
Nick, what's has been changed in your patch ? Are the changes accepted upstream ? And last, but maybe most important - how important are the changes (data corruption,etc..), because I'm now not able to catch our internal beta deadline, so the has to be important to be accepted. Tomas, This is the patch against RHEL5.1. Most of parts are accepted by upstream. But others changes that fix the scsi error handling and unknown commands processing on new model HBA are not accepted. These are extremely important. Nick, I think that unknown command processing is not new https://bugzilla.redhat.com/attachment.cgi?id=285721 and I hope it'll go into RHEL5.2 beta, so as your new hw support. The question is - what is new in your latest patch against the previous stand ? As I've written in Comment #23 your previous patch is (I hope at least) in our RHEl5.2 internal beta. Tomas, The latest patch includes, (1). fix the portability problems (2). fix the iomem release on type B (3). add return -ENOMEM in case of ioremap() failing (4). modify acb->devstate[i][j] as ARECA_RAID_GONE in the initial stage instead of ARECA_RAID_GOOD in arcmsr_alloc_ccb_pool() (5). fix the assignment of arcmsr_cdb->Context as (unsigned long)arcmsr_cdb (6). add the checking state of (outbound_intstatus & ARCMSR_MU_OUTBOUND_HANDLE_INT) == 0 in arcmsr_handle_hba_isr() (7). fix the scsi error handling in arcmsr_polling_hbb_ccbdone() No.1 ~ No.3 are from kernel.org fellow and the others are from us. Of course, the first three have been upstreamed. The rest were submitted by me yesterday. Nick, is some part of your latest patch fixing a problem introduced with your previous patch ? That means - is there a regression in RHEL5.2 compared to RHEL5.1 ? (I'm currently not aware of any problem, I'm only questioning.) Is any of your latest patches needed to fix a problem with 1) data corruption ? 2) system crash/hang ? If you'll find something which corresponds to this, please explain it thoroughly and elaborate a new patch only with the needed part. For the other patches we'll open a new bugzilla and move them into RHEL5.3 Tomas, Previous patch does not introduce any bugs. In other words, the latest patch does not fix the bugs which are provoked by previous patch. It fixes the inherent bugs in arcmsr-1.20.00.15.RH. The patch does fix the probelms with data corruption and system crash/hang in certain cases, such as certain motherboards with multi-devices and less memory and long-term running. These fixes are for these issues. Do you still need more detailed explanation? We wish the patch could be included in RHEL5.2/RHEL4.7, because the fixs are not minor. If you still have any comments, please let me know. Thanks for your time and patience, Nick,
thanks for your effort, but I still have some points.
I think I should explain my situation better: the deadline for internal beta has
passed and only data corruption+system crash/hang fixes are accepted. And it's
not me who accepts/rejects those patches I only post them to our list. For
example I somehow could not follow how
'(1). fix the portability problems' could be causing a system crash.
Please be so patient with me and go through the seven parts of your patch and
either drop the part from your patch or write a explanation why is this
particular part important and how it's causing data corruption/system crash.
Example:
(1). fix the portability problems - several customers reported system hang when
not applied because of ...
(2). .... and so on until (7)
Then remove from the patch the dropped parts and post the reduced version again.
> We wish the patch could be included in RHEL5.2/RHEL4.7, because the fixs are not
> minor.
This bugzilla is for RHEL5.2 if want to include the patch also in RHEL4.7, the
only way is to open a new bugzilla for it, and please do it today as the
planning freeze for 4.7 is (I think) tomorrow. Send me then the bz number, so
I'm aware of it.
Nick, also this - ver_addr = pci_alloc_consistent(acb->pdev, 1032, &buf_handle); - if (!ver_addr) { + tmp = kmalloc(1032, GFP_KERNEL|GFP_DMA); + ver_addr = (unsigned long *)tmp; itself is a change in the right direction (I've sent to you a letter about this Jun.6.2007.) but it looks for me 'only' as a waste of resources and not as an critical error. in 2.6.18-68.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5 Hi Tomas, About Comment #39, I found if I use pci_alloc_consistent and then open Areca AP, the memory usage will increase gradually. But if I use the kmalloc, it will not. This was ever reported by Falconstor and Pogolinux. Especially for Falconstor, if their AP is open, it will use up the memory overnight and the system will hang. We though it only happens on specific version and the pci_alloc_consistent() was also proposed by the kernel org fellow as well then there is no need to change, but to my surprise it happens again. I check the kernel source for this piece of code of pci_alloc_consistent but found it has been unchanged for a long time, therefore we guest it malfunctions on the memory management or somewhere in the kernel code. This is why this time I decide to upstream to RedHat and kernel org. The most important is it will hang the system for long-term running. Tomas, Thanks for your patience. I would explain one by one and let you decide which is acceptable and which is not. (1). fix the portability problems [Discription] This fixs the endian issue. This will hang the system, which defines big-endian, while I/O. (2). fix the iomem release on type B [Discription] This fixs the io-memory allocation and release issue on Type B. It could cause memory leakage on the heavy load system and leads the system unstable. (3). add return -ENOMEM in case of ioremap() failing [Discription] It fixs the system unstable issue while iomemory allocation fails. (4). modify acb->devstate[i][j] as ARECA_RAID_GONE in the initial stage instead of ARECA_RAID_GOOD in arcmsr_alloc_ccb_pool() [Discription] This fixs the exiting volumes' initial states. The wrong setting could lead to system hang. (5). fix the assignment of arcmsr_cdb->Context as (unsigned long)arcmsr_cdb [Discription] This fixs the scsi command allocation to a predefined structure. It could lead to system unstable while the AP accesses this address. (6). add the checking state of (outbound_intstatus & ARCMSR_MU_OUTBOUND_HANDLE_INT) == 0 in arcmsr_handle_hba_isr() [Discription] This fixs the interrupt routine to handle the controller abnormal interrupts. If not, it could not handle share IRQ and leads to system crash. (7). fix the scsi error handling in arcmsr_polling_hbb_ccbdone() [Discription] It fixs the host scsi error handling callback in case of I/O on failed volumes. If not, the continual I/O on failed volume could hang the system. Nick, thanks for the explanation. (No more questions now.) Please do not forget to create bugzilla for 4.7 even without having the patch right now. Nick, I've made minor changes to your previous patches, it would help me if you could create the patch against our latest sources.I'm going to send the files to you separately asap. Btw. I could swear that I've seen this part somewhere, but now it magically vanished from the patch - ver_addr = pci_alloc_consistent(acb->pdev, 1032, &buf_handle); - if (!ver_addr) { + tmp = kmalloc(1032, GFP_KERNEL|GFP_DMA); + ver_addr = (unsigned long *)tmp; Moving back to MODIFIED. At the request of lwang, I created a new BZ (bug 429877) that should be used for the follow-up patch (The patch proposed after what was committed to -68 kernel). Please move the continuing discussion to bug 429877. An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html |