200777 – Mylex extremeRaid 3000 PCI not working

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	5
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Tom Coughlan
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:	OldNeedsRetesting bzcl34nup
Depends On:
Blocks:

Reported:	2006-07-31 16:06 UTC by Norman R. Weathers
Modified:	2008-05-06 16:10 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2008-05-06 16:10:51 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
lspci -vv and cat of /proc/rd/c0/current_status for a RH7.3 box for comparison (3.38 KB, text/plain) 2006-11-13 17:30 UTC, Norman R. Weathers
Messages snippet from a working system (2.4.21 RH7.3) (5.39 KB, text/plain) 2006-12-27 20:05 UTC, Norman R. Weathers
/proc/interrupts on a working system. (RH7.3 with custom 2.4.21) (533 bytes, text/plain) 2006-12-27 20:08 UTC, Norman R. Weathers
/proc/interrupts on a non working system (Dell 2650, FC6, 2.6.19 custom kernel) (719 bytes, text/plain) 2006-12-28 15:15 UTC, Norman R. Weathers
dmesg bootup for a non working system (Dell 2650, FC6, 2.6.19 custom kernel) (22.23 KB, text/plain) 2006-12-28 15:17 UTC, Norman R. Weathers
Try using parted to partition the base mylex disk (2.23 KB, text/plain) 2006-12-28 15:22 UTC, Norman R. Weathers
strace of a failed fdisk /dev/rd/c0d0 for Mylex ExtremeRAID 3000 PCI (4.03 KB, text/plain) 2007-02-22 21:18 UTC, Norman R. Weathers
Here is a /proc/rd/c0/current_status from the "broken" box (3.08 KB, application/octet-stream) 2007-02-22 21:27 UTC, Norman R. Weathers

Description Norman R. Weathers 2006-07-31 16:06:37 UTC

Description of problem:
The Mylex extremeRaid 3000 PCI based card does not work with FC5.  I don't think
it is working with any 2.6 based kernel.

Version-Release number of selected component (if applicable):
Any 2.6 based kernel

How reproducible:
Always

Steps to Reproduce:
1. Install FC5
2. Try to access a created system drive (e.g. fdisk /dev/rd/c0d0)
3. Boom.  It gives an illegal seek.
  
Actual results:
Illegal Seek on device

Expected results:
fdisk should open up the device for partitioning.

Additional info:
This same card worked under 2.4 based Red Hat OSes.  In fact, this exact box and
card were running on RH 7.3.  I upgraded the box, and it no longer works.  Other
symptoms include a line speed of 125MB/s when checking
/proc/rd/c0/current_status instead of the familiar 1000MB/s.  I know that this
card works because I installed FreeBSD on this exact box, and I am able to
access the Mylex drives with no problems.  I have tried this exact combination
of cards with several different boxes (Dell 1650, Dell 2650, white box with dual
P4 Xeons and SuperMicro motherboard, Sun Opteron Workstation), and with both FC5
and RHEL 4.3, and have had no luck.  I know that other people are using Mylex
controllers, probably SCSI version, and they are working, but since this is the
last Fibre RAID controller available, it would be nice to have it working like
it used to.

I have also tried generic (vanilla) 2.6 kernels from kernel.org, same results.

Comment 1 Dave Jones 2006-10-16 19:01:46 UTC

A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 2 Norman R. Weathers 2006-11-13 17:30:14 UTC

Created attachment 141078
lspci -vv and cat of /proc/rd/c0/current_status for a RH7.3 box for comparison

This is a log file from a working RH 7.3 box, kernel 2.4.21 that has its Mylex
3000 PCI Raid card functioning properly.

Comment 3 Norman R. Weathers 2006-11-13 17:41:59 UTC

I have attached an lspci -vv and cat /proc/rd/c0/current_status of a working Mylex
ExtremeRaid 3000 PCI card from a RH 7.3 based box.  Here are some things that I
noticed are different between the 2.4 kernel and the 2.6 kernels.

NOTE:  The Mylex card I am currently using has 64 MB of cache, where the one in
the attachment (id=141078) only has 32 MB cache.

1st)  When the box boots up and goes through POST, this Mylex card says that it
is trying to attach itself to IRQ 11 (Bus=2  DEV/Slot=8  IRQ=11).

2nd)  When a 2.6 based kernel boots up, the Mylex card is found, and all of the
drives are scanned.  The enclosure is enumerated, and even the temperature and
fans of the enclosure are read and checked.  It even sees the current logical
drive (/dev/rd/c0d0) and says it is online.

Here is where I start to see differences:

lspci of 2.4 kernel shows I/O port of bc80 [size=128].  Granted, this is for the
32 MB card.
lspci of 2.6 kernel (64MB card) shows I/O port of dc80 [size=128] on both the
2.6.15 and the 2.6.18 kernels that I have tested.  Is this significant?

Also, the PCI address mapping space has changed.

In 2.4 kernel, the driver reports the following:
PCI Address: 0xF8000000 mapped at 0xF880D000, IRQ Channel: 20

2.6.15 FC4smp kernel reports the following:
PCI Address: 0xF8000000 mapped at 0xF881A000, IRQ Channel: 11

2.6.18 FC6 install kernel reports the following:
PCI Address: 0xF8000000 mapped at 0xF8826000, IRQ Channel: 201
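
To line the three driver reports up side by side, something like the following could be used.  It is only a sketch: the regular expression is written against the three lines quoted above and nothing else, and parse_report / summarize are made-up helper names, not anything from the DAC960 driver.

```python
import re

# The three driver lines quoted above, keyed by the kernel that printed them.
REPORTS = {
    "2.4": "PCI Address: 0xF8000000 mapped at 0xF880D000, IRQ Channel: 20",
    "2.6.15 FC4smp": "PCI Address: 0xF8000000 mapped at 0xF881A000, IRQ Channel: 11",
    "2.6.18 FC6": "PCI Address: 0xF8000000 mapped at 0xF8826000, IRQ Channel: 201",
}

# Pattern derived only from the quoted lines (an assumption about the format).
LINE_RE = re.compile(
    r"PCI Address: (0x[0-9A-Fa-f]+) mapped at (0x[0-9A-Fa-f]+), IRQ Channel: (\d+)"
)


def parse_report(line):
    """Return (pci_address, mapped_address, irq) as integers."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError("unrecognized driver line: %r" % line)
    return int(match.group(1), 16), int(match.group(2), 16), int(match.group(3))


def summarize(reports):
    """Parse every report and say whether the PCI address and IRQ agree."""
    rows = {kernel: parse_report(line) for kernel, line in reports.items()}
    same_pci = len({row[0] for row in rows.values()}) == 1
    same_irq = len({row[2] for row in rows.values()}) == 1
    return rows, same_pci, same_irq


def print_table(reports):
    """Print one row per kernel for a quick visual comparison."""
    rows, _, _ = summarize(reports)
    for kernel, (pci, mapped, irq) in rows.items():
        print("%-14s pci=%#x mapped=%#x irq=%d" % (kernel, pci, mapped, irq))
```

Calling print_table(REPORTS) gives one row per kernel.  The "mapped at" value is a kernel virtual address and would be expected to move around between kernels, so the IRQ column is probably the more interesting difference.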

When you try and fdisk /dev/rd/c0d0, you get an illegal seek.

Also, I double checked.  This card was working before the box was moved to FC4.
When it didn't work in FC4 in this particular box (a Dell 1650), I placed this
card in a Sun Opteron Workstation (Opteron) running RHEL 4.2 (64 bit), and it
still failed the same way.  I then put the card back into the Dell 1650,
installed NetBSD on it, and the card worked (slowly, but it worked).  I then
tried to install FC6 just last week, thinking that the new kernel may have the
fix, but alas, it still is broken in Linux.

Comment 4 Tom Coughlan 2006-11-21 19:54:37 UTC

This could be related to: 

http://marc.theaimsgroup.com/?l=linux-scsi&m=115281981307012&w=2

It looks like a change was made in this area recently. I'm not sure exactly
which kernel version. 

Is there any kernel error reported when you try fdisk and get an illegal seek?

Comment 5 Norman R. Weathers 2006-11-25 03:54:53 UTC

No, I don't remember any particular errors coming from the kernel... It is
weird.  The card is recognized, but it is not like it is 100% there.  And, it
does not matter what system (node) that I move this card into and test with.  If
I have it in a Sun 64 bit Opteron workstation using a 2.6 kernel (RHEL 4), no
go.  If I have it in a white box system using an Asus motherboard and dual Xeons
running FC2, 3, or 5, no go.  I put it into a Dell 1650 or 2650 using FC4, 5,
or 6... No go.

The differences that I see are the I/O range using lspci, and the IRQ reported
once the card and node have booted, plus the fact that the drives now say they
are 125 MB/s instead of the 1000 MB/s bus that they should be on.   This is
getting real frustrating...

After reading the link provided in comment 4, I could see that this could be a
problem if it was misidentifying the card.  I am not above getting my hands
dirty by looking into the code, but I am an "extreme" neophyte when it comes to
this level of coding...  Where should I start?  Any suggestions would be a great
help (I have a system completely dedicated for this work at this time, but I
don't know how much longer they are going to let me have it).

Thanks again for your help.

Norman Weathers

Comment 6 Tom Coughlan 2006-11-28 18:11:29 UTC

It does seem as though interrupts are being received. Otherwise you would not
see the storage devices being configured. 

Please post 

1. /var/log/messages showing the boot messages, and 
2. /proc/interrupts

from a working and a non-working kernel, on the same hardware if possible. 

It appears as though some people have this working with 2.6...
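
For pulling the controller's row out of /proc/interrupts on each kernel, a rough helper along these lines might do.  This is a sketch only: find_irq_lines and read_interrupts are hypothetical names, and the string to search for is an assumption, since the name the driver registers its interrupt under has to be read off the actual file.

```python
def read_interrupts(path="/proc/interrupts"):
    """Return the text of /proc/interrupts (or of a saved copy)."""
    with open(path) as handle:
        return handle.read()


def find_irq_lines(interrupts_text, needle):
    """Return (irq_label, line) pairs for rows whose description mentions needle.

    needle is whatever name the driver shows up under; that name is an
    assumption here and should be checked against the real file.
    """
    hits = []
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the CPU header row has no "IRQ:" prefix
        if needle.lower() in rest.lower():
            hits.append((label.strip(), line.rstrip()))
    return hits
```

Running find_irq_lines(read_interrupts(), name) on the working and the non-working kernel, with name set to whatever the controller's row is labelled, gives the two IRQ numbers to compare against what the card reports during POST.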

Comment 7 Norman R. Weathers 2006-12-27 20:05:24 UTC

Created attachment 144414
Messages snippet from a working system (2.4.21 RH7.3)

Comment 8 Norman R. Weathers 2006-12-27 20:08:11 UTC

Created attachment 144415
/proc/interrupts on a working system. (RH7.3 with custom 2.4.21)

Comment 9 Norman R. Weathers 2006-12-27 20:12:32 UTC

There are no errors on a non-working kernel during an fdisk except that it says
that it cannot seek on the device.  Further, all of the drives are showing up as
125 MB/s instead of 1000 MB/s in the /proc/rd/c0/current_status.  I am trying to
get my system back up now (I just tried to reboot with a very old kernel, FC2
based, to see if it would at least recognize it, but it appears that it is too
old of a kernel).

Also, I tried FC6 with a custom 2.6.19 kernel...  Still no joy.  The devices are
showing up the same, as 125MB/s drives, and still a seek error when trying to
partition the devices.

I will try to get a /var/log/messages and /proc/interrupts from the non working
kernel soon.

Thanks.

Comment 10 Norman R. Weathers 2006-12-28 15:14:14 UTC

I have the non-working dmesg during boot and the /proc/interrupts.  It is
interesting...  The node thinks that it should be on interrupt 18, but during
boot up, the card, during POST, tells me that it is at interrupt 11.  The
function and slot information are correct in both cases, but the interrupt has
changed...

Comment 11 Norman R. Weathers 2006-12-28 15:15:47 UTC

Created attachment 144467
/proc/interrupts on a non working system (Dell 2650, FC6, 2.6.19 custom kernel)

This is a 2.6.19.1 kernel with the cks2 patch set.  It exhibits the same type
of errors as any "recent" 2.6 kernel.

Comment 12 Norman R. Weathers 2006-12-28 15:17:26 UTC

Created attachment 144468
dmesg bootup for a non working system (Dell 2650, FC6, 2.6.19 custom kernel)

This is a custom 2.6.19.1 kernel with cks2 patch set.  It exhibits the same
problems as all "recent" 2.6 kernels, i.e., the Mylex card is not functioning.

Comment 13 Norman R. Weathers 2006-12-28 15:22:12 UTC

Created attachment 144469
Try using parted to partition the base mylex disk

This is the output from an attempt to run parted on the base Mylex system disk.
It shows the "Invalid argument during seek" that I get during install or any
other time I try to run a command on the Mylex disk.

Comment 14 Norman R. Weathers 2006-12-28 18:45:26 UTC

I patched together a DAC960 driver from the 2.6.10 base kernel into the 2.6.19.1
with cks2 patch set.  It still has the same issues, which really was no
surprise since neither FC2 nor FC3 worked with the Mylex extremeRaid 3000
PLUS card.  I am now trying to go clear back to the 2.6.0 drivers and see if I
can somehow squeeze them into the current kernel source and try to get that
driver to work...

Comment 15 Norman R. Weathers 2006-12-28 19:48:19 UTC

I tried the 2.6.0 kernel level driver.  It hung up during boot up (after the
DAC960 driver banner, either right before or right after the line containing the IRQ).
I realize that the driver in the 2.6.0 kernel is 2.5.47, and the driver version
in the later kernels is 2.5.48.  I was able to get the driver compiled, but
there was a warning, and it was about the irq (I remember having chased that one
down quite a ways in the 2.6.10 kernel version of the driver, which runs as
"well" as the 2.6.19 version, i.e., the driver sees the array, but the system
drive is not right).

I am about at the level of what I can do here.

I also changed the geometry setting on the card itself, from 2GB to 8GB disk
geometry, no help (although, now the disk geometry shows up as 255/63 instead of
180/32).  I tried passing various combinations of pci and acpi command lines,
trying to see if that was it, still no joy. 

Combinations used:

pci=biosirq,rom,assign-busses

acpi=noirq pci=routeirq

Still nothing.

Comment 16 Norman R. Weathers 2007-02-15 19:17:55 UTC

Has this bug gone anywhere?  It is still not working as of 2.6.19.2 (I haven't
tried the 2.6.20 vanilla kernel).

Thanks.

Comment 17 Tom Coughlan 2007-02-22 19:29:00 UTC

(In reply to comment #16)
> Has this bug gone anywhere?  

I have looked at the logs, but I don't see anything that points to the problem.
It seems odd to have no kernel I/O error messages when you try to do I/O, yet
the I/Os appear to be failing.  

If you are still willing and able to try something, let's see if a simple
command like badblocks fails. If it does, get an strace and post it.

Start with a really simple read test:

badblocks -v -b512 /dev/rd/c0d0 1

Increase the number of blocks, and remove -b, until you get a failure. If there
is none, add a write to the test:

badblocks -vw -b512 /dev/rd/c0d0 1

When you get a failure, then remove -v and get an strace: 

strace badblocks -b512 /dev/rd/c0d0 1

Hopefully this will indicate where the problem is. 
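
If badblocks is not handy, the read-only half of the test can be approximated with a few lines of Python.  This is a rough stand-in and not a replacement for badblocks; first_failing_block is a made-up helper name and the 512-byte block size just mirrors the -b512 above.

```python
import errno
import os


def first_failing_block(path, count, block_size=512):
    """Read `count` blocks in order; return (index, reason) at the first
    failure, or None if every block was read in full."""
    fd = os.open(path, os.O_RDONLY)
    try:
        for index in range(count):
            try:
                data = os.pread(fd, block_size, index * block_size)
            except OSError as exc:
                # Report the symbolic errno name (EIO, EINVAL, ...) if known.
                return index, errno.errorcode.get(exc.errno, str(exc.errno))
            if len(data) < block_size:
                return index, "short read"
        return None
    finally:
        os.close(fd)
```

Start with first_failing_block("/dev/rd/c0d0", 1) and raise the count until something other than None comes back.  It only reads, so it does not cover the write test.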

Tom

Comment 18 Norman R. Weathers 2007-02-22 21:18:13 UTC

Created attachment 148633
strace of a failed fdisk /dev/rd/c0d0 for Mylex ExtremeRAID 3000 PCI

Here is an strace of the fdisk /dev/rd/c0d0.  Notice the EINVAL during the
_llseek.  The output I get from doing the fdisk is one of "Unable to seek".
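
To check the seek by itself, without the rest of fdisk in the way, a small probe like this could be run against the device.  It is a sketch assuming Python is available on the box; probe_seek is a hypothetical helper, and it goes through whatever seek call the Python runtime uses, which may not be the exact _llseek call shown in the strace.

```python
import errno
import os


def probe_seek(path, offset, whence=os.SEEK_SET):
    """Open path read-only and try one seek.

    Returns (new_position, None) on success, or (None, errno_name) if the
    kernel rejects the seek.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            return os.lseek(fd, offset, whence), None
        except OSError as exc:
            return None, errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)
```

For example, probe_seek("/dev/rd/c0d0", 0) and probe_seek("/dev/rd/c0d0", 0, os.SEEK_END) can be compared between a working and a non-working kernel; an (None, "EINVAL") result would match what the strace shows.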

Comment 19 Norman R. Weathers 2007-02-22 21:20:18 UTC

For comment #18, uname -a is:

Linux hoepld25 2.6.18-1.2869.fc6 #1 SMP Wed Dec 20 14:51:19 EST 2006 i686 i686
i386 GNU/Linux

And I get the same error during fdisk on any recent kernels (2.6.19 custom,
2.6.19 FC6 kernel).

Comment 20 Norman R. Weathers 2007-02-22 21:27:27 UTC

Created attachment 148634
Here is a /proc/rd/c0/current_status from the "broken" box

Please compare this /proc/rd/c0/current_status to attachment 141078.
This is a broken current_status.  Notice how in this current_status the drives
are saying that they are 125 MB/s, and on the RH73 boxes they are saying that
they are 1000 MB/s drives.  The RH73 boxes are the ones that work, while the FC
(any kernel 2.6 based) builds do not work.
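
The two current_status captures can also be compared mechanically rather than by eye.  A minimal sketch using Python's difflib, assuming both outputs have been saved as text; diff_status is a made-up name and nothing here depends on the layout of the status file.

```python
import difflib


def diff_status(working_text, broken_text,
                working_name="working", broken_name="broken"):
    """Return unified-diff lines showing only what changed between two
    saved copies of current_status (no surrounding context lines)."""
    return list(difflib.unified_diff(
        working_text.splitlines(),
        broken_text.splitlines(),
        fromfile=working_name,
        tofile=broken_name,
        lineterm="",
        n=0,
    ))
```

Printing "\n".join(diff_status(a, b)) for the RH73 capture and the FC capture should make lines such as the 1000 MB/s versus 125 MB/s ones stand out.  The cards in the two attachments have different cache sizes, so some differences are expected regardless.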

Comment 21 Jon Stanley 2008-03-31 18:31:16 UTC

Removing NeedsRetesting from whiteboard so we can repurpose it.

Comment 22 Bug Zapper 2008-04-04 03:24:48 UTC

Fedora apologizes that these issues have not been resolved yet. We're
sorry it's taken so long for your bug to be properly triaged and acted
on. We appreciate the time you took to report this issue and want to
make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6,
please note that Fedora no longer maintains these releases. We strongly
encourage you to upgrade to a current Fedora release. In order to
refocus our efforts as a project we are flagging all of the open bugs
for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6, thirty days
from now, it will be closed 'WONTFIX'. If you can reproduce this bug in
the latest Fedora version, please change to the respective version. If
you are unable to do this, please add a comment to this bug requesting
the change.

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

And if you'd like to join the bug triage team to help make things
better, check out http://fedoraproject.org/wiki/BugZappers

Comment 23 Bug Zapper 2008-05-06 16:10:49 UTC

This bug is open for a Fedora version that is no longer maintained and
will not be fixed by Fedora. Therefore we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
