Bug 75916 - 'dump' hangs while backing up filesystem to tape
Summary: 'dump' hangs while backing up filesystem to tape
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 8.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Doug Ledford
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2002-10-14 23:24 UTC by Mark Harig
Modified: 2008-08-01 16:22 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-09-30 15:40:04 UTC
Embargoed:


Attachments
The ChangeLog for the 'ips' SCSI device driver (5.44 KB, text/plain)
2003-07-30 23:13 UTC, Mark Harig
The source file 'ips.c' for version 6.00.00 of the 'ips' device driver. (278.18 KB, text/plain)
2003-07-30 23:14 UTC, Mark Harig
The source file 'ips.h' for version 6.00.00 of the 'ips' device driver (43.23 KB, text/plain)
2003-07-30 23:19 UTC, Mark Harig

Description Mark Harig 2002-10-14 23:24:41 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020830

Description of problem:
The 'dump' program hangs after displaying the following messages:

# dump -0u -f /dev/st0 /
  DUMP: Date of this level 0 dump: Mon Oct 14 18:51:48 2002
  DUMP: Dumping /dev/sda2 (/) to /dev/st0
  DUMP: Added inode 8 to exclude list (journal inode)
  DUMP: Added inode 7 to exclude list (resize inode)
  DUMP: Label: /
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 1526917 tape blocks.
  DUMP: Volume 1 started with block 1 at: Mon Oct 14 18:51:51 2002
  DUMP: dumping (Pass III) [directories]

At this point, /var/log/messages contains the following lines:

Oct 14 18:51:51 localhost kernel: scsi0:A:5: Missed busfree. Lastphase = 0xe0, Curphase = 0x0
Oct 14 18:51:51 localhost last message repeated 2 times
Oct 14 18:51:51 localhost kernel: scsi0: Missing case in ahc_handle_scsiint. status = 8

A check of /lib/modules/2.4.18-14/kernel/drivers/scsi/aic7xxx/aic7xxx.o shows
that it is the source of those error messages.

# lsmod | grep aic
aic7xxx               137140   1
scsi_mod              107144   4  [st ips aic7xxx sd_mod]



Version-Release number of selected component (if applicable):

# rpm -q dump kernel
dump-0.4b28-4
kernel-2.4.18-14

How reproducible:
Always

Steps to Reproduce:
1. Before starting Linux, set the SCSI data transfer rate of the onboard Adaptec
SCSI controller to 'ASYNC' for the DLT tape drive.  The default rate, 160 MB/s, is
too high for the tape drive, which accepts data at only about 6 MB/s.

2. Insert a tape into the DLT tape drive.

3. Boot Linux and log in as 'root'.

4. At the shell prompt, run the 'dump' command:
  # dump -0u -f /dev/st0 /


Actual Results:

'dump' prints no further messages after those shown above, even after 30
minutes.

/var/log/messages:
------------------
Oct 14 14:55:12 localhost kernel: (scsi0:A:5): 6.600MB/s transfers (16bit)
Oct 14 14:55:12 localhost kernel: st0: Block limits 2 - 16777214 bytes.

(text omitted)

Oct 14 18:51:51 localhost kernel: scsi0:A:5: Missed busfree. Lastphase = 0xe0, Curphase = 0x0
Oct 14 18:51:51 localhost last message repeated 2 times
Oct 14 18:51:51 localhost kernel: scsi0: Missing case in ahc_handle_scsiint. status = 8


Expected Results:  'dump' should copy the entire contents of the root ('/')
filesystem to the backup tape.


Additional info:

From /var/log/dmesg:
--------------------
SCSI subsystem driver Revision: 1.00
kmod: failed to exec /sbin/modprobe -s -k scsi_hostadapter, errno = 2
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
        <Adaptec aic7899 Ultra160 SCSI adapter>
        aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
        <Adaptec aic7899 Ultra160 SCSI adapter>
        aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs

blk: queue c3620e14, I/O limit 4095Mb (mask 0xffffffff)
  Vendor: QUANTUM   Model: DLT8000           Rev: 0250
  Type:   Sequential-Access                  ANSI SCSI revision: 02

blk: queue f7fbea14, I/O limit 4095Mb (mask 0xffffffff)
blk: queue f7fbea14, I/O limit 4095Mb (mask 0xffffffff)
scsi2 : IBM PCI ServeRAID 5.10.21  <ServeRAID 4Lx>
  Vendor: IBM       Model: SERVERAID         Rev: 1.00
  Type:   Direct-Access                      ANSI SCSI revision: 02
  Vendor: IBM       Model: SERVERAID         Rev: 1.00
  Type:   Processor                          ANSI SCSI revision: 02
  Vendor: IBM       Model: YGLv3 S2          Rev: 0   
  Type:   Processor                          ANSI SCSI revision: 02
Attached scsi disk sda at scsi2, channel 0, id 0, lun 0
SCSI device sda: 71096320 512-byte hdwr sectors (36401 MB)
Partition check:
 sda: sda1 sda2 sda3
Journalled Block Device driver loaded

(text omitted)

EXT3 FS 2.4-0.9.18, 14 May 2002 on sd(8,2), internal journal
Adding Swap: 2040244k swap-space (priority -1)
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.18, 14 May 2002 on sd(8,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
st: Version 20020205, bufsize 32768, wrt 30720, max init. bufs 4, s/g segs 16
Attached scsi tape st0 at scsi0, channel 0, id 5, lun 0


From /var/log/messages:  (prior to starting 'dump')
-----------------------

Oct 14 14:50:27 localhost kernel: EXT3 FS 2.4-0.9.18, 14 May 2002 on sd(8,1), internal journal
Oct 14 14:50:27 localhost kernel: EXT3-fs: mounted filesystem with ordered data mode.
Oct 14 14:50:27 localhost kernel: st: Version 20020205, bufsize 32768, wrt 30720, max init. bufs 4, s/g segs 16
Oct 14 14:50:27 localhost kernel: Attached scsi tape st0 at scsi0, channel 0, id 5, lun 0
Oct 14 14:50:27 localhost kernel: parport0: PC-style at 0x378 [PCSPP]
Oct 14 14:50:27 localhost kernel: ohci1394: pci_module_init failed

Comment 1 Mark Harig 2002-10-14 23:27:46 UTC
I reported this problem to the 'dump' mailing list (dump-users),
and got the following reply from dump's maintainer:

> I am having a problem getting 'dump' to work.
> It hangs shortly after starting.
[...]

> Oct  9 17:00:02 localhost kernel: scsi0:A:5: Missed busfree. Lastphase = 0xe0, Curphase = 0x0
> Oct  9 17:00:03 localhost last message repeated 3 times
> Oct  9 17:00:03 localhost kernel: scsi0: Missing case in ahc_handle_scsiint. status = 8

This is not a dump issue but a kernel issue. For some reason, the
big amount of data dump is trying to send to your tape drive causes
problems in the kernel's SCSI subsystem.

Be sure to report this to the maintainer of this specific SCSI
driver and/or to the linux-kernel mailing list.

Thanks,

Stelian.
--
Stelian Pop <stelian.pop.com>
Alcove - http://www.alcove.com


Comment 2 Mark Harig 2002-10-15 16:41:51 UTC
I tried using the older aic7xxx module, 'aic7xxx_old', but got the same result:
'dump' hangs after printing the line that starts with 'DUMP: Volume 1
started with block 1 at:...'.

Here are the steps I took (as 'root'):

# vi /etc/modules.conf
Replace 'alias scsi_hostadapter aic7xxx' with 'alias scsi_hostadapter aic7xxx_old'

# /sbin/rmmod aic7xxx
# /sbin/modprobe -s -k aic7xxx_old
# /sbin/lsmod | grep aic7xxx
aic7xxx_old           125664   1  (autoclean)
scsi_mod              107144   4  [aic7xxx_old st ips sd_mod]
# /sbin/dump -0u -f /dev/st0 /
  DUMP: Date of this level 0 dump: Tue Oct 15 12:18:12 2002
  DUMP: Dumping /dev/sda2 (/) to /dev/st0
  DUMP: Added inode 8 to exclude list (journal inode)
  DUMP: Added inode 7 to exclude list (resize inode)
  DUMP: Label: /
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 1527330 tape blocks.
  DUMP: Volume 1 started with block 1 at: Tue Oct 15 12:18:15 2002
('dump' hangs until Ctrl-C is pressed)

aic7xxx_old is less informative about what is causing the problem; there are no
messages in /var/log/messages after:

Oct 15 12:18:15 localhost kernel: st0: Block limits 2 - 16777214 bytes.


Comment 3 Mark Harig 2002-10-23 16:47:20 UTC
I upgraded the kernel on my computer to the latest patch level, 2.4.18-17.8.0, 
and retried backing up my RAID drive to tape.  The 'dump' utility fails/hangs 
again in the same location and the SCSI driver displays the same messages 
in /var/log/messages:

localhost kernel: scsi0:A:5: Missed busfree. Lastphase = 0xe0, Curphase = 0x0
localhost kernel: scsi0: Missing case in ahc_handle_scsiint. status = 8


Comment 4 Mark Harig 2002-10-23 17:03:33 UTC
I might be able to provide more information about this problem if someone at 
Red Hat could tell me the following:

  1. Where to get the kernel 2.4.18 source.  I looked at www.kernel.org.  It 
listed 2.4.19.

  2. The exact steps for applying the patches that the kernel source .rpm 
provides (essentially, it's a big patch file).  Probably '# patch -p0 < 
patchfile'?  I am assuming that I can use kernel-2.4.18-17.8.0.src.rpm to 
patch 2.4.18.

  3. The steps Red Hat takes to build the kernel (optimization flags, etc.).  
Or, if I don't need the whole kernel, the steps to build just the device 
driver aic7xxx.o.

  4. The steps needed to install a new kernel over a running kernel.

  5. Any help you can give me with these steps will help me to produce a valid, 
debuggable aic7xxx.o file.



Comment 5 Mark Harig 2002-10-23 21:54:34 UTC
Please disregard my most recent questions.  The answers are in the Red 
Hat Linux 8.0 Customization Guide, Appendix A, "Building a Custom Kernel."  After 
installing the kernel-source .rpm file from CD 2 and running the Red Hat Update 
Agent, I now have the source code for kernel 2.4.18-17.8.0 installed.

Comment 6 Need Real Name 2002-12-04 17:18:33 UTC
This may not be a Linux problem at all.  We are seeing tape issues
under FreeBSD, Windows, and Solaris when the tape drive is connected to any
Adaptec U160 (aic789x) controller.  In some instances, the system actually
hangs during BIOS POST, long before any Linux components come into play.

Tested controllers include the 29160, the 39160, and the embedded 7899 (Dell and
Supermicro), on Intel 810, 815, and 845 chipsets and on Asus, FIC, MSI, and
Shuttle mainboards.

Tests with other U160 controllers (Symbios and LSI) have resulted in proper
operation.

This has been reported to both the tape drive manufacturers and to Adaptec, but
no solution seems readily forthcoming.

Our recommendation is to add a non-789x controller (Adaptec 7880 cards,
Advansys, ACARD, Symbios, Initio, et al.) to the system for the tape drive.

One easy fix is the SiiG AP40 from sources like Microcenter and CompUSA.  US$70
gets you an Ultra Wide 40 MB/s controller that uses the ACARD atp870u driver.

Tim


Comment 7 Mark Harig 2002-12-04 17:54:31 UTC
FYI, I rebuilt the kernel including the (diagnostic) changes to aic7xxx_core.c 
that were requested by the maintainer of this driver, Justin Gibbs.  Below are
the results that I sent to him on 25 October 2002.  I haven't received a reply (I
sent the message a second time a few weeks later).

Justin,

   Below is the log from /var/log/messages after I added the diagnostic code 
that you requested to aic7xxx_core.c and rebuilt the kernel.  (Please reply to 
this message to let me know that you got it.)  Thanks for your help with this 
problem!

-mark

Oct 25 18:35:53 localhost kernel: (scsi0:A:5): 6.600MB/s transfers (16bit)
Oct 25 18:35:53 localhost kernel: st0: Block limits 2 - 16777214 bytes.
Oct 25 18:35:53 localhost kernel: scsi0:A:5: Missed busfree. Lastphase = 0xe0, Curphase = 0x0
Oct 25 18:35:53 localhost kernel: scsi0: Missing case in ahc_handle_scsiint. status = 8
Oct 25 18:35:53 localhost kernel: scsi0: Dumping Card State while idle, at SEQADDR 0x1
Oct 25 18:35:53 localhost kernel: ACCUM = 0x0, SINDEX = 0xb1, DINDEX = 0xe4, ARG_2 = 0x0
Oct 25 18:35:53 localhost kernel: HCNT = 0x0 SCBPTR = 0x0
Oct 25 18:35:53 localhost kernel: SCSISEQ = 0x12, SBLKCTL = 0xa
Oct 25 18:35:53 localhost kernel:  DFCNTRL = 0x4, DFSTATUS = 0x89
Oct 25 18:35:53 localhost kernel: LASTPHASE = 0x1, SCSISIGI = 0x0, SXFRCTL0 = 0x80
Oct 25 18:35:53 localhost kernel: SSTAT0 = 0x0, SSTAT1 = 0x9
Oct 25 18:35:53 localhost kernel: SCSIPHASE = 0x0
Oct 25 18:35:53 localhost kernel: STACK == 0x43, 0x0, 0x160, 0x108
Oct 25 18:35:53 localhost kernel: SCB count = 4
Oct 25 18:35:53 localhost kernel: Kernel NEXTQSCB = 3
Oct 25 18:35:53 localhost kernel: Card NEXTQSCB = 3
Oct 25 18:35:53 localhost kernel: QINFIFO entries: 
Oct 25 18:35:53 localhost kernel: Waiting Queue entries: 
Oct 25 18:35:53 localhost kernel: Disconnected Queue entries: 
Oct 25 18:35:53 localhost kernel: QOUTFIFO entries: 
Oct 25 18:35:53 localhost kernel: Sequencer Free SCB List: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
Oct 25 18:35:53 localhost kernel: Sequencer SCB Info: 0(c 0x40, s 0x57, l 0, t 
0xff) 1(c 0x0, s 0xff, l 255, t 0xff) 2(c 0x0, s 0xff, l 255, t 0xff) 3(c 0x0, 
s 0xff, l 255, t 0xff) 4(c 0x0, s 0xff, l 255, t 0xff) 5(c 0x0, s 0xff, l 255, 
t 0xff) 6(c 0x0, s 0xff, l 255, t 0xff) 7(c 0x0, s 0xff, l 255, t 0xff) 8(c 
0x0, s 0xff, l 255, t 0xff) 9(c 0x0, s 0xff, l 255, t 0xff) 10(c 0x0, s 0xff, l 
255, t 0xff) 11(c 0x0, s 0xff, l 255, t 0xff) 12(c 0x0, s 0xff, l 255, t 0xff) 
13(c 0x0, s 0xff, l 255, t 0xff) 14(c 0x0, s 0xff, l 255, t 0xff) 15(c 0x0, s 
0xff, l 255, t 0xff) 16(c 0x0, s 0xff, l 255, t 0xff) 17(c 0x0, s 0xff, l 255, 
t 0xff) 18(c 0x0, s 0xff, l 255, t 0xff) 19(c 0x0, s 0xff, l 255, t 0xff) 20(c 
0x0, s 0xff, l 255, t 0xff) 21(c 0x0, s 0xff, l 255, t 0xff) 22(c 0x0, s 0xff, 
l 255, t 0xff) 23(c 0x0, s 0xff, l 255, t 0xff) 24(c 0x0, s 0xff, l 255, t 
0xff) 25(c 0x0, s 0xff, l 255, t 0xff) 26(c 0x0, s 0xff, l 255, t 0xff) 27(c 
0x0, s 0xff, l 255, t 0xff) 28(c 0x0, s 0xff, l 255, t 0xff) 29(c 0x0, s 0xff, 
l 255, t 0x
Oct 25 18:35:53 localhost kernel: f) 30(c 0x0, s 0xff, l 255, t 0xff) 31(c 0x0, s 0xff, l 255, t 0xff) 
Oct 25 18:35:53 localhost kernel: Pending list: 
Oct 25 18:35:53 localhost kernel: Kernel Free SCB list: 2 1 0 
Oct 25 18:35:53 localhost kernel: DevQ(0:5:0): 0 waiting

> -----Original Message-----
> From: Gibbs, Justin [Justin_Gibbs]
> Sent: Tuesday, October 22, 2002 1:30 PM
> To: Harig, Mark A.
> Subject: RE: Problem with Adaptec aic7899 Ultra160 SCSI adapter driver
> 
> 
> Mark,
> 
> This is going to be a bit tricky to debug remotely.  One thing you can
> do for me is to insert a call to "ahc_dump_card_state(ahc);" right below
> the line in drivers/scsi/aic7xxx/aic7xxx_core.c:ahc_handle_scsiint where
> it says:
> 
> 	printf("%s: Missing case in ahc_handle_scsiint.  status = 0x%x\n",
> 	       ahc_name(ahc), status);
> 
> It's near the end of the function.
> 
> BTW, just because your tape drive can only stream at 6.6MB/s, you don't
> need to set the sync rate speed that low.  The drive will actually burst
> transfers across the SCSI bus at much higher rates.
> 
> We have a DLT4000 here that I'm trying to use to reproduce your problem.
> I'll let you know what I find.
> 
> --
> Justin
> 


Comment 8 Mark Harig 2003-06-18 20:13:28 UTC
IBM has released a new version of their UpdateXpress CD (version 2.03).  This 
includes updates to the POST/BIOS firmware and to the ServeRAID firmware (to 
version 6.00).  These updates appear to have fixed the problem that I reported 
originally, i.e., 'dump' no longer hangs.

However, during the boot process the ips.o device driver reports several warning 
messages (recorded in /var/log/dmesg):

Warning: Adapter 0 Firmware Compatible version is MR600, but should be SA510
Warning: Adapter 0 BIOS Compatible version is MR600, but should be SA510
Warning: ! ! ! ServeRAID Version mismatch

I examined the source code for ips.c that IBM provides on the UpdateXpress CD 
and compared it with the source code in the latest Red Hat 8.0 kernel, 
2.4.20-18.8.  According to the changelogs at the top of the source files, the 
Red Hat 8.0 version of ips.c is version 5.00.01, while the IBM version is 
6.00.00 and includes changes from three other versions that the Red Hat 
version lacks.

Are there any plans to include version 6.00.00 of ips.c in a future release of 
the Red Hat kernels?






Comment 9 Mark Harig 2003-07-30 23:13:17 UTC
Created attachment 93279
The ChangeLog for the 'ips' SCSI device driver

Comment 10 Mark Harig 2003-07-30 23:14:47 UTC
Created attachment 93280
The source file 'ips.c' for version 6.00.00 of the 'ips' device driver.

Comment 11 Mark Harig 2003-07-30 23:19:11 UTC
Created attachment 93281
The source file 'ips.h' for version 6.00.00 of the 'ips' device driver

I have rebuilt the kernel from the 2.4.20-18.8 and 2.4.20-19.9 sources with
these attached 'ips' device driver files, without any problems.

Comment 12 Bugzilla owner 2004-09-30 15:40:04 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/


