Bug 66885 - Tape drive and HDs hang
Summary: Tape drive and HDs hang
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Doug Ledford
QA Contact: David Lawrence
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2002-06-18 02:11 UTC by Tom Tilmant
Modified: 2008-08-01 16:22 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-09-30 15:39:40 UTC
Embargoed:


Attachments

Description Tom Tilmant 2002-06-18 02:11:33 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

Description of problem:
After a clean install of 7.3 with the aic7xxx module, I receive the following 
errors when trying to restore files from a tape backup.

Jun 17 18:26:31 ns kernel: (scsi0:A:5:0): Unexpected busfree in Data-in phase
Jun 17 18:26:31 ns kernel: SEQADDR == 0xa6
Jun 17 18:26:31 ns kernel: st0: Error 70000 (sugg. bt 0x0, driver bt 0x0, host bt 0x7).
Jun 17 18:26:31 ns kernel: st0: Error 10000 (sugg. bt 0x0, driver bt 0x0, host bt 0x1).
Jun 17 18:26:31 ns kernel: st0: Error 10000 (sugg. bt 0x0, driver bt 0x0, host bt 0x1).
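For readers decoding the st0 lines above: in the Linux 2.4 SCSI mid-layer, the "Error" value printed by st is the command's 32-bit result word (status byte in bits 0-7, message byte in bits 8-15, host byte in bits 16-23, driver byte in bits 24-31). The sketch below splits the logged values apart and names the host byte. The helper name `decode_scsi_result` is hypothetical, and the `DID_*` table is an assumption to check against the SCSI headers of the kernel actually in use.

```python
# Host-byte codes (DID_*) as assumed from the Linux SCSI headers.
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}


def decode_scsi_result(result: int) -> dict:
    """Split a SCSI mid-layer result word into its four byte fields."""
    host = (result >> 16) & 0xFF
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": host,
        "host_name": HOST_BYTE_NAMES.get(host, "unknown"),
        "driver": (result >> 24) & 0xFF,
    }


# st prints the "Error" value in hex, so parse the logged strings base 16.
for logged in ("70000", "10000"):
    print(logged, decode_scsi_result(int(logged, 16)))
```

Under that reading, `Error 70000` carries host byte 0x07 (`DID_ERROR`) and `Error 10000` carries host byte 0x01 (`DID_NO_CONNECT`), which matches the `host bt` values the driver printed alongside them.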

Then, after I shut down the backup application, the hard drives hang for 
approximately 3 minutes and then the following error appears:

Jun 16 22:51:23 ns kernel: scsi0:0:0:0: Attempting to queue an ABORT message
Jun 16 22:51:23 ns kernel: scsi0:0:0:0: Command not found
Jun 16 22:51:23 ns kernel: aic7xxx_abort returns 0x2002
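The `0x2002` returned by `aic7xxx_abort` is a SCSI error-handler status code, not an errno. A minimal lookup, assuming the 2.4-era mid-layer values `NEEDS_RETRY = 0x2001`, `SUCCESS = 0x2002` and `FAILED = 0x2003` (verify against the kernel source in use; `decode_eh_return` is a hypothetical helper name):

```python
# Error-handler return codes, assumed from the 2.4 SCSI mid-layer headers.
EH_RETURN_CODES = {
    0x2001: "NEEDS_RETRY",
    0x2002: "SUCCESS",
    0x2003: "FAILED",
}


def decode_eh_return(code: int) -> str:
    """Map an eh_abort/eh_reset handler return value to its symbolic name."""
    return EH_RETURN_CODES.get(code, "unknown")


print(hex(0x2002), decode_eh_return(0x2002))
```

Read this way, each abort attempt in these logs reports `SUCCESS`, including the one immediately preceded by `Command not found`.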

I have tried this with both the tar command and the Arkeia backup software. I 
don't think the problem lies with either, since the HDs hang at least once a day 
even when no backup application is running.

I have tried Adaptec's suggestions (different cables, terminators, different 
settings, and even a new 29160), with the same results. I went back to 7.2 and 
saw the same results; I guess I had never needed to restore since upgrading to 
that version :-)  One other note: I can back up to tape without any problems. I 
am using RH RAID 0, but have also tried it without RAID.

This looks to me like a problem with the aic7xxx driver. The driver has never 
been the same since 7.0, when the OS changed and the tape drive was no longer 
supported under the old driver, forcing me to use the new driver module.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Use an Adaptec 29160 with an HP DLT8000 drive and two Quantum Atlas 34G drives.
2. Install 7.3 using the aic7xxx driver (not the old driver).
3. Use tar to back up to tape until the tape is full.
4. Restore all files; the error occurs about 30-40 minutes into the restore.

Expected Results:  A fix to the aic7xxx driver.

Additional info:

Comment 1 Tom Tilmant 2002-06-18 04:00:52 UTC
I just received another error message; it looks like there is more information this time. Here it is:

Jun 17 20:54:36 ns kernel: scsi0:0:0:0: Attempting to queue an ABORT message
Jun 17 20:54:36 ns kernel: scsi0: Dumping Card State while idle, at SEQADDR 0x9
Jun 17 20:54:36 ns kernel: ACCUM = 0x0, SINDEX = 0x7, DINDEX = 0xe4, ARG_2 = 0x0
Jun 17 20:54:36 ns kernel: HCNT = 0x0 SCBPTR = 0x11
Jun 17 20:54:36 ns kernel: SCSISEQ = 0x12, SBLKCTL = 0xa
Jun 17 20:54:36 ns kernel:  DFCNTRL = 0x0, DFSTATUS = 0x89
Jun 17 20:54:36 ns kernel: LASTPHASE = 0x1, SCSISIGI = 0x0, SXFRCTL0 = 0x80
Jun 17 20:54:36 ns kernel: SSTAT0 = 0x0, SSTAT1 = 0x8
Jun 17 20:54:36 ns kernel: SCSIPHASE = 0x0
Jun 17 20:54:38 ns kernel: STACK == 0x3, 0x108, 0x160, 0x0
Jun 17 20:54:38 ns kernel: SCB count = 96
Jun 17 20:54:43 ns kernel: Kernel NEXTQSCB = 18
Jun 17 20:54:43 ns kernel: Card NEXTQSCB = 18
Jun 17 20:54:43 ns kernel: QINFIFO entries: 
Jun 17 20:54:44 ns kernel: Waiting Queue entries: 
Jun 17 20:54:44 ns kernel: Disconnected Queue entries: 8:15 
Jun 17 20:54:44 ns kernel: QOUTFIFO entries: 
Jun 17 20:54:45 ns kernel: Sequencer Free SCB List: 17 26 1 14 6 24 19 16 28 21 29 25 27 3 12 4 22 15 0 20 5 31 23 13 18 11 2 30 7 10 9 
Jun 17 20:54:45 ns kernel: Sequencer SCB Info: 0(c 0x60, s 0x7, l 0, t 0xff) 1
(c 0x60, s 0x7, l 0, t 0xff) 2(c 0x60, s 0x7, l 0, t 0xff) 3(c 0x60, s 0x7, l 
0, t 0xff) 4(c 0x60, s 0x7, l 0, t 0xff) 5(c 0x60, s 0x7, l 0, t 0xff) 6(c 
0x60, s 0x7, l 0, t 0xff) 7(c 0x60, s 0x7, l 0, t 0xff) 8(c 0x64, s 0x7, l 0, 
t 0xf) 9(c 0x60, s 0x7, l 0, t 0xff) 10(c 0x60, s 0x7, l 0, t 0xff) 11(c 0x60, 
s 0x7, l 0, t 0xff) 12(c 0x60, s 0x7, l 0, t 0xff) 13(c 0x60, s 0x7, l 0, t 
0xff) 14(c 0x60, s 0xc7, l 0, t 0xff) 15(c 0x60, s 0x7, l 0, t 0xff) 16(c 0x0, 
s 0x57, l 0, t 0xff) 17(c 0x60, s 0x7, l 0, t 0xff) 18(c 0x60, s 0x7, l 0, t 
0xff) 19(c 0x0, s 0x57, l 0, t 0xff) 20(c 0x60, s 0x7, l 0, t 0xff) 21(c 0x60, 
s 0x7, l 0, t 0xff) 22(c 0x60, s 0x7, l 0, t 0xff) 23(c 0x60, s 0x7, l 0, t 
0xff) 24(c 0x0, s 0x57, l 0, t 0xff) 25(c 0x60, s 0x7, l 0, t 0xff) 26(c 0x60, 
s 0x7, l 0, t 0xff) 27(c 0x60, s 0x7, l 0, t 0xff) 28(c 0x60, s 0x7, l 0, t 
0xff) 29(c 0x60, s 0x7, l 0, t 0xff) 30(c 0x60, s 0x7, l 0, t 0xff) 31(c 0x60, 
s 0x7, l 0, t
Jun 17 20:54:45 ns kernel: 0xff) 
Jun 17 20:54:47 ns kernel: Pending list: 15(c 0x60, s 0x7, l 0)
Jun 17 20:54:47 ns kernel: Kernel Free SCB list: 7 60 85 42 35 29 32 13 14 41 
37 46 38 66 61 5 63 11 54 30 12 25 65 0 40 17 58 28 19 50 8 55 48 3 57 20 34 
45 94 31 26 2 24 44 90 21 23 1 33 39 6 49 84 67 47 4 59 62 51 52 10 16 27 56 
36 91 88 89 95 93 64 9 43 86 87 80 81 82 83 76 77 78 79 72 73 74 75 68 69 70 
71 53 22 92 
Jun 17 20:54:48 ns kernel: DevQ(0:0:0): 0 waiting
Jun 17 20:54:48 ns kernel: DevQ(0:5:0): 0 waiting
Jun 17 20:54:48 ns kernel: DevQ(0:12:0): 0 waiting
Jun 17 20:54:48 ns kernel: (scsi0:A:0:0): Queuing a recovery SCB
Jun 17 20:54:49 ns kernel: scsi0:0:0:0: Device is disconnected, re-queuing SCB
Jun 17 20:54:49 ns kernel: Recovery code sleeping
Jun 17 20:54:49 ns kernel: (scsi0:A:0:0): Abort Tag Message Sent
Jun 17 20:54:51 ns kernel: (scsi0:A:0:0): SCB 15 - Abort Tag Completed.
Jun 17 20:54:51 ns kernel: Recovery SCB completes
Jun 17 20:54:51 ns kernel: Recovery code awake
Jun 17 20:54:51 ns kernel: aic7xxx_abort returns 0x2002
Jun 17 20:54:52 ns kernel: scsi0:0:0:0: Attempting to queue an ABORT message
Jun 17 20:54:52 ns kernel: scsi0:0:0:0: Command not found
Jun 17 20:54:52 ns kernel: aic7xxx_abort returns 0x2002

Comment 2 Ngo Than 2002-06-18 10:55:15 UTC
assigned to component kernel.

Comment 3 Doug Ledford 2002-06-26 22:54:01 UTC
Justin, do you have any input on this one?

Comment 4 Michael Abboud 2002-10-17 23:30:32 UTC
We are seeing similar behavior and similar messages during backups to a Compaq 
SDLT.  I can provide the full sequencer code dump on request, but in our case 
it starts as follows:

Oct  3 03:32:04 castor kernel: scsi0:0:6:0: Attempting to queue an ABORT message
Oct  3 03:32:04 castor kernel: scsi0: Dumping Card State in Command phase, at SEQADDR 0x168
.
.
and finishes with:

Oct  3 03:32:04 castor kernel: Pending list: 2(c 0x40, s 0x67, l 0)
Oct  3 03:32:04 castor kernel: Kernel Free SCB list: 1 0
Oct  3 03:32:04 castor kernel: Untagged Q(6): 2
Oct  3 03:32:04 castor kernel: DevQ(0:6:0): 0 waiting
Oct  3 03:32:04 castor kernel: scsi0:0:6:0: Device is active, asserting ATN
Oct  3 03:32:04 castor kernel: Recovery code sleeping
Oct  3 03:32:04 castor kernel: (scsi0:A:6:0): Abort Message Sent
Oct  3 03:32:04 castor kernel: (scsi0:A:6:0): SCB 2 - Abort Completed.
Oct  3 03:32:04 castor kernel: Recovery SCB completes
Oct  3 03:32:04 castor kernel: Recovery code awake
Oct  3 03:32:04 castor kernel: aic7xxx_abort returns 0x2002

The kernel version we are using is:

2.4.18-10smp

Relevant dmesg output:

SCSI subsystem driver Revision: 1.00
kmod: failed to exec /sbin/modprobe -s -k scsi_hostadapter, errno = 2
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.6
        <Adaptec (Compaq OEM) 3960D Ultra160 SCSI adapter>
        aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.6
        <Adaptec (Compaq OEM) 3960D Ultra160 SCSI adapter>
        aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs

  Vendor: COMPAQ    Model: SuperDLT1         Rev: 2E2E
  Type:   Sequential-Access                  ANSI SCSI revision: 02

The tape unit has been replaced, and its firmware upgraded to the latest level 
by Compaq.  No dice.  Also, the tape unit was attached to a second, 
identically configured system (RH7.3, same kernel) and, sure enough, the 
backups fail the same way.

The underlying hardware is a Compaq ML 370 G2 with twin CPUs.

Michael.

Comment 5 Need Real Name 2002-12-04 18:14:10 UTC
As with others, please review bug 75916 or http://www.linuxtapecert.org.  This
is not a kernel or Linux issue.  Tape drives and Adaptec 789x chipsets are not
getting along.

Tim


Comment 6 Bugzilla owner 2004-09-30 15:39:40 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
