Bug 242254

Summary: New firewire stack only recognizing half of a chain of drives
Product: [Fedora] Fedora
Reporter: George Shearer <doc>
Component: kernel
Assignee: Kristian Høgsberg <krh>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: low
Version: 7
CC: cebbert, chris.brown, davej, jarod, stefan-r-rhbz
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-04-25 03:52:24 UTC
Attachments:
Description Flags
bootlog of 2.6.22.1-33.local.fc7 none

Description George Shearer 2007-06-02 18:56:31 UTC
Description of problem:

I have a Norco DS-1210 Firewire chassis. It has 12 slots for PATA drives. The
chassis converts all of these drives to nodes on a single 1394a chain.
Interestingly, the PATA drives must be used in master and slave pairs. In other
words, odd numbered drives (1,3,5,7,9,11) are set to 'master mode', while even
numbered drives (2,4,6,8,10,12) are set to 'slave mode'.

Kernels prior to 2.6.21 recognize all drives perfectly, and present them as
/dev/sdf through /dev/sdp.

Fedora 7 with 2.6.21 only recognizes my odd-numbered drives and presents them as
/dev/sdf through /dev/sdk.


Version-Release number of selected component (if applicable):


How reproducible:

Consistently.

Steps to Reproduce:
1. Boot F7 base install.

Actual results:

New FW code appears to lack the ability to deal with master AND slave based
1394a/PATA bridge chips.

Expected results:

All drives recognized and accessible.

Additional info:

Comment 1 Stefan Richter 2007-06-10 12:13:25 UTC
This is a known bug (or missing feature) of the new fw-sbp2 driver.  On devices
with multiple logical units (in the same unit directory of the ROM), only one
unit is used.  Many if not all FireWire-IDE bridges which support two IDE
devices on the same IDE channel are of this type.

One reason this does not work with fw-sbp2 (and did not work with the old
stack before Linux 2.6.12 either) is that the SBP-2 transport specification has
additional provisions on top of IEEE 1212's generic scheme to represent
multi-unit devices.
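
For illustration only, here is a minimal sketch of how logical unit numbers
might be collected from the immediate entries of one SBP-2 unit directory. It
assumes the IEEE 1212 entry layout (8-bit key, 24-bit value), a key of 0x14 for
Logical_Unit_Number, and a LUN held in the low 16 bits of the value. The
function name and the sample quadlets are made up, and nested logical unit
directories are not followed. This is not the fw-sbp2 code.

```python
# Sketch only: scan the entries of one SBP-2 unit directory for logical units.
# Assumption: each directory entry is a 32-bit quadlet with an 8-bit key in the
# top byte and a 24-bit value below it (IEEE 1212 style).
KEY_LOGICAL_UNIT_NUMBER = 0x14  # assumed key for an immediate LUN entry


def collect_luns(entries):
    """Return every LUN found in the given directory entries, in order."""
    luns = []
    for quadlet in entries:
        key = (quadlet >> 24) & 0xFF
        value = quadlet & 0xFFFFFF
        if key == KEY_LOGICAL_UNIT_NUMBER:
            luns.append(value & 0xFFFF)  # assumed: low 16 bits hold the LUN
    return luns


# Made-up sample: a bridge exposing a master (LUN 0) and a slave (LUN 1) disk.
sample_entries = [0x1200609E, 0x13010483, 0x140E0000, 0x140E0001]
found = collect_luns(sample_entries)
print(found)      # all logical units in the directory
print(found[:1])  # what a driver sees if it stops after the first entry
```

A driver that keeps only the first matching entry would end up with one disk
per bridge, which is consistent with every second drive going missing.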

Comment 2 Stefan Richter 2007-06-10 12:17:04 UTC
PS, regarding the impact of this bug:  The popular dual-disk bridges with
RAID-1/-0/JBOD implemented in firmware are of course not affected.  They only
show one logical unit.

Comment 3 George Shearer 2007-06-14 13:36:41 UTC
Sadly, I must report that the status of this issue has not changed with the
recent release of kernel 2.6.21-1.3194.fc7.

Comment 4 George Shearer 2007-06-18 20:00:40 UTC
Still a problem with 2.6.21-1.3228.fc7 :(


Comment 5 Stefan Richter 2007-06-18 20:23:02 UTC
I'm sure Kristian will add a notification here as soon as an update package with
the necessary fix is available.  If you need an intermediary solution for the
short term, you need to install a kernel which has the older drivers ieee1394,
ohci1394, and sbp2 enabled.

Comment 6 Stefan Richter 2007-06-18 20:29:11 UTC
PS: I am currently doing a little bit of work with the fw-sbp2 driver (mainly on
weekends, and not in affiliation with Fedora or Red Hat) and will post a patch
here when I have one.

Comment 7 George Shearer 2007-07-21 08:23:59 UTC
kernel-2.6.22.1-27.fc7 == :-(


Comment 8 Stefan Richter 2007-07-22 00:39:48 UTC
I'm in the middle of implementing it, but I'm currently slowed down by other
work. I will post here when I'm through with it unless somebody else gets it
done faster.

Comment 9 Stefan Richter 2007-07-25 23:57:56 UTC
Here is my first take: 
http://thread.gmane.org/gmane.linux.kernel.firewire.devel/10453

The patches are also temporarily available at
http://me.in-berlin.de/~s5r6/linux1394/pending/.

Comment 10 George Shearer 2007-07-26 01:04:54 UTC
(In reply to comment #9)
> Here is my first take: 
> http://thread.gmane.org/gmane.linux.kernel.firewire.devel/10453
> 
> The patches are also temporarily available at
> http://me.in-berlin.de/~s5r6/linux1394/pending/.

Thank you very much. I am eager to try the patches; unfortunately, I am
traveling at the moment. I will be home on Tuesday, Jul 31st, and will try them
then. Thanks again!


Comment 11 Stefan Richter 2007-07-29 19:34:09 UTC
I fixed a few errors in my patches.  You can get them for 2.6.23-rc1 or later
from git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6.git or
for 2.6.23-rc1, 2.6.22 and a few older kernels from
http://me.in-berlin.de/~s5r6/linux1394/updates/.  I don't know which patchset
would be best to apply on top of the Fedora kernel sources.

Comment 12 George Shearer 2007-07-31 19:29:10 UTC
Created attachment 160351 [details]
bootlog of 2.6.22.1-33.local.fc7

This is the bootlog of the latest Fedora-released i686 kernel, with Stefan's
2.6.22 patch applied. Looks like it recognizes all drives. However, I see lots
of 'attempt to access beyond end of device' errors, which don't sound good to
me.

Comment 13 Stefan Richter 2007-07-31 19:48:48 UTC
You have one Seagate ST350063, correctly recognized as 500 GB disk, a few Maxtor
6 Y250P0 and Maxtor 6 B250R0, correctly recognized as 250 GB disks, and two
Samsung SP2514N, which are incorrectly recognized as 32 GiB disks.

Did you change the jumpers on the Samsung disks some time after you partitioned
them?  Many HDDs can be jumpered to pretend a 32 GiB limit for old BIOSes, and
maybe you accidentally enabled that limitation.

Comment 14 Stefan Richter 2007-07-31 19:51:19 UTC
PS:  Note these log lines:

scsi 8:0:1:1: Direct-Access-RBC SAMSUNG  SP2514N          VF10 PQ: 0 ANSI: 4
sd 8:0:1:1: [sdh] 66055248 512-byte hardware sectors (33820 MB)
...
 sdh: sdh1
 sdh: p1 exceeds device capacity
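
The capacity in that log line can be checked by hand. A small sketch, using
only the sector count and sector size quoted above (the function name is made
up for illustration):

```python
import re

LOG_LINE = "sd 8:0:1:1: [sdh] 66055248 512-byte hardware sectors (33820 MB)"


def capacity_bytes(line):
    """Extract sector count and sector size from an sd log line, return bytes."""
    match = re.search(r"(\d+) (\d+)-byte hardware sectors", line)
    if match is None:
        raise ValueError("no sector information in line")
    sectors, sector_size = (int(group) for group in match.groups())
    return sectors * sector_size


size = capacity_bytes(LOG_LINE)
print(size)                           # total bytes
print(size // 10**6, "MB")            # decimal megabytes, as the kernel prints
print(round(size / 2**30, 2), "GiB")  # binary gibibytes
```

The result is 33820286976 bytes, which is 33820 MB and about 31.5 GiB, that is,
just under the 32 GiB limit mentioned in comment 13.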


Comment 15 George Shearer 2007-08-01 01:30:36 UTC
Ugh! I should have looked closer. Your hunch was correct. The instructions on
the Samsung drives are a bit misleading. So I downloaded the PDF from the
manufacturer's website, which clearly indicates the correct settings. The
problem has been fixed.

Humorously enough, they've always been set this way and I've never had a problem
out of them. Looks like the new driver is much better at reporting such issues.

As a side note, I unmounted all of my firewire drives and then extracted the two
drives from the chassis to change the jumper. Unfortunately, the kernel panicked
in this process, which never happened with the old driver during hot swapping.


Comment 16 Stefan Richter 2007-08-01 07:02:41 UTC
In theory, the capacity should have been detected and checked the same way with
the old drivers.  It happens at the SCSI level, transparent to the FireWire
drivers.

Does the hotswap mechanism detach the disks from IDE, or did the detachment
happen on the FireWire side of the FireWire-IDE bridge board?

Comment 17 Stefan Richter 2007-08-01 07:40:17 UTC
PS, re capacity:  Maybe something else in the FS, block IO, or SCSI code or
configuration changed along with the FireWire drivers.  The differences in
the FireWire drivers, as far as I am aware of them, seem to me very unlikely to
be related.

PS, re panic:  The pictures and manual from http://www.norcotek.com/ suggest
that the physical disconnection happens on the IDE side.  A kernel panic message
would be good to have to debug this.  Alas, judging by the backplane in the
enclosure, there seems to be no source other than Norco for a test sample of
the bridge board(s).

Comment 18 George Shearer 2007-08-01 14:38:38 UTC
AFAIK, this Norco box uses Oxford 911 bridge chips, though it uses both the
primary & slave portions, which I've found to be unusual. At any rate, it seems
that this particular problem has been solved. Thank you for taking this on. All
of my drives now work again, and it's nice to see all of them under the same
scsi process. When I get time I'll attempt to reproduce the panic and capture
the kernel output with serial console, and open a new ticket.

Comment 19 Stefan Richter 2007-08-01 15:39:52 UTC
I've got an old 911 and newer 922 and 912 based enclosures but they don't have
hotpluggable IDE headers.  It would probably not be a good idea to attempt IDE
hotplugging with them... :-)

Regarding the dual disk recognition, there are some refinements that Kristian
suggested to me which I will implement sometime soon, and then the Fedora kernel
maintainers have to merge the patch(es) in one of their next kernel package
updates.  I hope they will inform you here when they released that update.

Comment 20 Kristian Høgsberg 2007-08-27 15:18:44 UTC
I added the patch and started a build:

  http://koji.fedoraproject.org/koji/taskinfo?taskID

It will be available as kernel-2.6.23-0.140.rc3.git10.fc8 when it's done.  You
can download the build from that page or wait for tomorrow's rawhide.  Please
give it a try and let us know if it works for you.

Thanks (and to you too, Stefan)
Kristian

Comment 21 George Shearer 2007-10-02 16:58:06 UTC
These fixes have not made it into a released F7 kernel yet. What's the process
to make that happen?

Comment 22 Chuck Ebbert 2007-10-02 17:30:59 UTC
That patch does not apply to kernel 2.6.22.

Comment 23 George Shearer 2007-10-02 17:45:35 UTC
(In reply to comment #22)
> That patch does not apply to kernel 2.6.22.

I'm running a kernel I built myself using Stefan's patches, and it's 2.6.22.
See comment #12. No reason why this can't be included in an F7 official kernel
release. This is a pretty major bug for those of us who rely on large Firewire
arrays.


Comment 24 Stefan Richter 2007-10-02 20:31:15 UTC
The patch will make it into mainline (kernel.org's kernel) in 2.6.24-rc1.  Even
though it is more or less a bug fix, I decided against pushing the patch to
Linus earlier because the patch has a huge line count and modifies core data
structures of the firewire-sbp2 driver.  Surely, distributors who switched to
the new stack may consider incorporating the patch into their 2.6.{22,23} based
kernels.  I am quite confident that the patch is correct and safe.  (Famous last
words.)

If you guys think about taking it into an FC7 kernel, you could either wait a
few days until Linus has pulled all post-2.6.23 driver updates, then look at
firewire-sbp2's history in Linus' tree to grab the relevant patches on which the
multi-LU patch depends.  Or you could have a look at my personal site (see
comment #11) to get a picture of the patch queue.  (kernel.org's
linux1394-2.6.git is somewhat messy at the moment and will change soon after
Linus releases 2.6.23, therefore this git tree is not so well suited for picking
backport candidates.)

Comment 25 George Shearer 2007-11-05 00:41:46 UTC
I recently updated to the latest F7 kernel 2.6.23.1-10.fc7.i686. All of my FW
drives are recognized properly. 

I can read reliably from any firewire drive. I can write reliably to any
firewire drive as well. However, if I attempt to do both simultaneously either
to the same drive or different drives, a kernel panic will happen. :(



Comment 26 Chuck Ebbert 2007-11-05 20:23:31 UTC
Can you post the panic?


Comment 27 Christopher Brown 2008-01-09 01:00:13 UTC
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

Are you still having this issue, and if so, could you attach the kernel panic as
text/plain to this bug?

If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.

Comment 28 Jarod Wilson 2008-03-05 21:52:06 UTC
Can the kernel panic still be reproduced with the latest kernel available in the
updates repo?

Comment 29 Brian Powell 2008-04-25 03:52:24 UTC
Note that maintenance for Fedora 7 will end 30 days after the GA of Fedora 9.

Comment 30 Brian Powell 2008-04-25 04:03:42 UTC
The information we've requested above is required in order
to review this problem report further and diagnose/fix the
issue if it is still present.  Since there have been no
updates to the report for thirty (30) days or more after we
requested additional information, we're assuming the problem
is either no longer present in the current Fedora release, or
that there is no longer any interest in tracking the problem.

Setting status to "CLOSED INSUFFICIENT_DATA".  If you still
experience this problem after updating to our latest Fedora
release and can provide the information previously requested, 
please feel free to reopen the bug report.

Thank you in advance.
