Description of problem:
The Mylex extremeRaid 3000 PCI based card does not work with FC5. I don't think it is working with any 2.6 based kernel.

Version-Release number of selected component (if applicable):
Any 2.6 based kernel

How reproducible:
Always

Steps to Reproduce:
1. Install FC5
2. Try to access a created system drive (ex fdisk /dev/rd/c0d0)
3. Boom. It gives an illegal seek.

Actual results:
Illegal Seek on device

Expected results:
fdisk should open up the device for partitioning.

Additional info:
This same card worked under 2.4 based Red Hat OS's. In fact, this exact box and card was running on RH 7.3. I upgraded the box, and it no longer works. Other symptoms include a line speed of 125MB/s when checking /proc/rd/c0/current_status instead of the familiar 1000MB/s. I know that this card works because I installed FreeBSD on this exact box, and I am able to access the Mylex drives with no problems. I have tried this exact combination of cards with several different boxes (Dell 1650, Dell 2650, white box with dual P4 Xeons and SuperMicro motherboard, Sun Opteron Workstation), and with both FC5 and RHEL 4.3, and have had no luck. I know that other people are using Mylex controllers, probably SCSI version, and they are working, but since this is the last Fibre RAID controller available, it would be nice to have it working like it used to. I have also tried generic (vanilla) 2.6 kernels from kernel.org, same results.
A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem please check you only have one version of device-mapper & lvm2 installed. See bug 207474 for further details. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem. Thank you.
Created attachment 141078 [details] lspci -vv and cat of /proc/rd/c0/current_status for a RH7.3 box for comparison This is a log file from a working RH 7.3 box, kernel 2.4.21 that has its Mylex 3000 PCI Raid card functioning properly.
I have sent an lspci -vv and cat /proc/rd/c0/current_status of a working Mylex ExtremeRaid 3000 PCI card from a RH 7.3 based box. Here are some things that I noticed are different between the 2.4 kernel and the 2.6 kernels.

NOTE: The Mylex card I am currently using has 64 MB of cache, where the one in the attachment (id=141078) only has 32 MB cache.

1st) When the box boots up and goes through POST, this Mylex card says that it is trying to attach itself to IRQ 11 (Bus=2 DEV/Slot=8 IRQ=11).

2nd) When a 2.6 based kernel boots up, the Mylex card is found, and all of the drives are scanned. The enclosure is enumerated, and even the temperature and fans of the enclosure are read and checked. It even sees the current logical drive (/dev/rd/c0d0) and says it is online.

Here is where I start to see differences:

lspci of 2.4 kernel shows I/O port of bc80 [size=128]. Granted, this is for the 32 MB card.
lspci of 2.6 kernel (64MB card) shows I/O port of dc80 [size=128].

Is this significant? (I see this with both the 2.6.15 and the 2.6.18 kernels that I have tested.)

Also, the PCI address mapping space has changed.

In 2.4 kernel, the driver reports the following:
PCI Address: 0xF8000000 mapped at 0xF880D000, IRQ Channel: 20

2.6.15 FC4smp kernel reports the following:
PCI Address: 0xF8000000 mapped at 0xF881A000, IRQ Channel: 11

2.6.18 FC6 install kernel reports the following:
PCI Address: 0xF8000000 mapped at 0xF8826000, IRQ Channel: 201

When you try to fdisk /dev/rd/c0d0, you get an illegal seek.

Also, I double checked. This card was working before the box was moved to FC4. When it didn't work in FC4 in this particular box (a Dell 1650), I placed this card in a Sun Opteron Workstation running RHEL 4.2 (64 bit), and it still failed the same way. I then put the card back into the Dell 1650, installed NetBSD on it, and the card worked (slowly, but it worked). I then tried to install FC6 just last week, thinking that the new kernel might have the fix, but alas, it is still broken in Linux.
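The three driver lines quoted in the comment above can be lined up with a short script. This is only a sketch for comparing the reports side by side; the dictionary keys and the function names (`parse_mapping`, `summarize`) are made up for illustration, and the regular expression assumes the driver always prints the line in exactly the form quoted above.

```python
import re

# Driver lines exactly as quoted in the comment above.
# The dictionary keys are just labels chosen for this sketch.
REPORTED = {
    "2.4": "PCI Address: 0xF8000000 mapped at 0xF880D000, IRQ Channel: 20",
    "2.6.15 FC4smp": "PCI Address: 0xF8000000 mapped at 0xF881A000, IRQ Channel: 11",
    "2.6.18 FC6 install": "PCI Address: 0xF8000000 mapped at 0xF8826000, IRQ Channel: 201",
}

# Assumption: the line always has this shape.
PATTERN = re.compile(
    r"PCI Address: (0x[0-9A-Fa-f]+) mapped at (0x[0-9A-Fa-f]+), IRQ Channel: (\d+)"
)


def parse_mapping(line):
    """Return (pci_address, mapped_address, irq) as ints, or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16), int(m.group(3))


def summarize(reported):
    """Map each kernel label to its parsed (pci, mapped, irq) tuple."""
    rows = {}
    for kernel, line in reported.items():
        parsed = parse_mapping(line)
        if parsed is not None:
            rows[kernel] = parsed
    return rows


if __name__ == "__main__":
    for kernel, (pci, mapped, irq) in summarize(REPORTED).items():
        print(f"{kernel:20} pci={pci:#010x} mapped={mapped:#010x} irq={irq}")
```

Laid out this way, the PCI address is the same in all three reports; only the mapped address and the IRQ number differ. The mapped address is a kernel virtual address, so it would be expected to vary between kernel builds, which leaves the IRQ number as the more interesting difference.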
This could be related to: http://marc.theaimsgroup.com/?l=linux-scsi&m=115281981307012&w=2 It looks like a change was made in this area recently. I'm not sure exactly which kernel version. Is there any kernel error reported when you try fdisk and get an illegal seek?
No, I don't remember any particular errors coming from the kernel... It is weird. The card is recognized, but it is not like it is 100% there. And it does not matter what system (node) I move this card into and test with. If I have it in a Sun 64 bit Opteron workstation using a 2.6 kernel (RHEL 4), no go. If I have it in a white box system using an Asus motherboard and dual Xeons running FC2, 3, or 5, no go. I put it into a Dell 1650 or 2650 using FC4, 5, or 6.... No go. The differences that I see are the I/O range using lspci, and the IRQ reported once the card and node have booted, plus the fact that the drives now say they are 125 MB/s instead of the 1000 MB/s bus that they should be on. This is getting real frustrating... After reading the link provided in comment 4, I could see that this could be a problem if it was misidentifying the card. I am not above getting my hands dirty by looking into the code, but I am an "extreme" neophyte when it comes to this level of coding... Where should I start? Any suggestions would be a great help (I have a system completely dedicated to this work at this time, but I don't know how much longer they are going to let me have it). Thanks again for your help. Norman Weathers
It does seem as though interrupts are being received. Otherwise you would not see the storage devices being configured. Please post 1. /var/log/messages showing the boot messages, and 2. /proc/interrupts from a working and a non-working kernel, on the same hardware if possible. It appears as though some people have this working with 2.6...
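For comparing /proc/interrupts between the working and non-working kernels, something like the following could pull out just the controller's line. This is a rough sketch: it assumes the driver registers its interrupt under a name containing "DAC960" (not confirmed anywhere in this report), and `find_irq_lines` is a made-up helper name.

```python
import sys


def find_irq_lines(interrupts_text, needle="DAC960"):
    """Return (irq_label, per_cpu_counts, description) for matching lines.

    Assumption: the driver's name in /proc/interrupts contains `needle`.
    """
    results = []
    for line in interrupts_text.splitlines():
        if needle.lower() not in line.lower():
            continue
        # Each interrupt line looks like "<irq>: <count per CPU>... <description>".
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        while fields and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        results.append((label.strip(), counts, " ".join(fields)))
    return results


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/proc/interrupts"
    try:
        with open(path) as f:
            text = f.read()
    except OSError as e:
        print(f"could not read {path}: {e}")
    else:
        for irq, counts, desc in find_irq_lines(text):
            print(f"IRQ {irq}: counts={counts} ({desc})")
```

Running it once right after boot and again after an fdisk attempt would show whether the interrupt count for the controller is still increasing on the non-working kernel.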
Created attachment 144414 [details] Messages snippet from a working system (2.4.21 RH7.3)
Created attachment 144415 [details] /proc/interrupts on a working system. (RH7.3 with custom 2.4.21)
There are no errors on a non-working kernel during an fdisk except that it says that it cannot seek on the device. Further, all of the drives are showing up as 125 MB/s instead of 1000 MB/s in the /proc/rd/c0/current_status. I am trying to get my system back up now (I just tried to reboot with a very old kernel, FC2 based, to see if it would at least recognize it, but it appears that it is too old of a kernel). Also, I tried FC6 with a custom 2.6.19 kernel... Still no joy. The devices are showing up the same, as 125MB/s drives, and still a seek error when trying to partition the devices. I will try to get a /var/log/messages and /proc/interrupts from the non-working kernel soon. Thanks.
I have the non-working dmesg during boot and the /proc/interrupts. It is interesting... The node thinks that it should be on interrupt 18, but during boot up, the card, during POST, tells me that it is at interrupt 11. The function and slot information are correct in both cases, but the interrupt has changed...
Created attachment 144467 [details] /proc/interrupts on a non working system (Dell 2650, FC6, 2.6.19 custom kernel) This is a 2.6.19.1 kernel with the cks2 patch set. It exhibits the same type of errors as any "recent" 2.6 kernel.
Created attachment 144468 [details] dmesg bootup for a non working system (Dell 2650, FC6, 2.6.19 custom kernel) This is a custom 2.6.19.1 kernel with cks2 patch set. It exhibits same problems as all "recent" 2.6 kernels, ie., the Mylex card is not functioning.
Created attachment 144469 [details] Try using parted to partition the base mylex disk This is the output from an attempt to run parted on the base Mylex system disk. It shows the "Invalid argument during seek" that I get during install or any other time I try to run a command on the Mylex disk.
I patched the DAC960 driver from the 2.6.10 base kernel into the 2.6.19.1 kernel with the cks2 patch set. It still has the same issues, which was really no surprise, since neither FC2 nor FC3 worked with the Mylex extremeRaid 3000 PLUS card. I am now trying to go clear back to the 2.6.0 drivers and see if I can somehow squeeze them into the current kernel source and try to get that driver to work...
I tried the 2.6.0 kernel level driver. It hung up during boot up (after the DAC960 driver banner, right either before or after the line containing the IRQ). I realize that the driver in the 2.6.0 kernel is 2.5.47, and the driver version in the later kernels is 2.5.48. I was able to get the driver compiled, but there was a warning, and it was about the irq (I remember having chased that one down quite a ways in the 2.6.10 kernel version of the driver, which runs as "well" as the 2.6.19 version, ie, the driver sees the array, but the system drive is not right). I am about at the level of what I can do here. I also changed the geometry setting on the card itself, from 2GB to 8GB disk geometry, no help (although, now the disk geometry shows up as 255/63 instead of 180/32). I tried passing various combinations of pci and acpi command lines, trying to see if that was it, still no joy. Combinations used: pci=biosirq,rom,assign-busses acpi=noirq pci=routeirq Still nothing.
Has this bug gone anywhere? It is still not working as of 2.6.19.2 (I haven't tried the 2.6.20 vanilla kernel). Thanks.
(In reply to comment #16)
> Has this bug gone anywhere?

I have looked at the logs, but I don't see anything that points to the problem. It seems odd to have no kernel I/O error messages when you try to do I/O, yet the I/Os appear to be failing.

If you are still willing and able to try something, let's see if a simple command like badblocks fails. If it does, get an strace and post it.

Start with a really simple read test:

badblocks -v -b512 /dev/rd/c0d0 1

increase number of blocks, and remove -b, until you get a failure. If none, add a write to the test:

badblocks -vw -b512 /dev/rd/c0d0 1

When you get a failure, then remove -v and get an strace:

strace badblocks -b512 /dev/rd/c0d0 1

Hopefully this will indicate where the problem is.

Tom
Created attachment 148633 [details] strace of a failed fdisk /dev/rd/c0d0 for Mylex ExtremeRAID 3000 PCI Here is an strace of the fdisk /dev/rd/c0d0. Notice the EINVAL during the _llseek. The output I get from doing the fdisk is one of "Unable to seek".
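The failing seek described above can also be reproduced outside fdisk with a small probe that opens the device, attempts one seek, and prints the errno name. This is a minimal sketch; `probe_seek` and the 512-byte default offset are arbitrary choices for illustration, and the system call Python ends up issuing may not be the same `_llseek` variant that appears in the strace.

```python
import errno
import os
import sys


def probe_seek(path, offset=512):
    """Open `path` read-only and seek to `offset`; return (ok, message)."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as e:
        return False, f"open failed: {errno.errorcode.get(e.errno, e.errno)}"
    try:
        pos = os.lseek(fd, offset, os.SEEK_SET)
        return True, f"seek ok, position {pos}"
    except OSError as e:
        name = errno.errorcode.get(e.errno, e.errno)
        return False, f"lseek failed: {name} ({e.strerror})"
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Device path taken from this report; pass another path as the first argument.
    device = sys.argv[1] if len(sys.argv) > 1 else "/dev/rd/c0d0"
    ok, message = probe_seek(device)
    print(f"{device}: {message}")
```

If this reports EINVAL on /dev/rd/c0d0 but succeeds on another block device on the same kernel, that would suggest the problem is specific to how the DAC960 device is presented rather than to fdisk itself.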
For comment #18, uname -a is: Linux hoepld25 2.6.18-1.2869.fc6 #1 SMP Wed Dec 20 14:51:19 EST 2006 i686 i686 i386 GNU/Linux And I get the same error during fdisk on any recent kernels (2.6.19 custom, 2.6.19 FC6 kernel).
Created attachment 148634 [details] Here is a /proc/rd/c0/current_status from the "broken" box Please compare this /proc/rd/c0/current_status to the attachment # 141078 [details]. This is a broken current_status. Notice how in this current_status the drives are saying that they are 125 MB/s, and on the RH73 boxes they are saying that they are 1000 MB/s drives. The RH73 boxes are the ones that work, while the FC (any kernel 2.6 based) builds do not work.
Removing NeedsRetesting from whiteboard so we can repurpose it.
Fedora apologizes that these issues have not been resolved yet. We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks. If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained and closing them. http://fedoraproject.org/wiki/LifeCycle/EOL If this bug is still open against Fedora Core 1 through 6, thirty days from now, it will be closed 'WONTFIX'. If you can reproduce this bug in the latest Fedora version, please change to the respective version. If you are unable to do this, please add a comment to this bug requesting the change. Thanks for your help, and we apologize again that we haven't handled these issues to this point. The process we are following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again. And if you'd like to join the bug triage team to help make things better, check out http://fedoraproject.org/wiki/BugZappers
This bug is open for a Fedora version that is no longer maintained and will not be fixed by Fedora. Therefore we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.