Bug 468680 - kudzu causes tapes to rewind causing data corruption.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kudzu
Version: 4.7
Hardware: All Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Bill Nottingham
QA Contact: BaseOS QE
Depends On:
Blocks:
Reported: 2008-10-27 08:48 EDT by Brian Parks (Quantum Corp)
Modified: 2014-03-16 23:16 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
* during SCSI probing, kudzu would open the /dev/st device node for tape drives. This would cause the tape drive to rewind, which created a risk of data loss. A new version of the probing code is included in this updated package, which does not open this node and therefore avoids this issue.
Last Closed: 2009-05-18 16:12:01 EDT

Attachments
potential patch (975 bytes, patch)
2008-11-25 12:23 EST, Bill Nottingham
proposed_patch_mod1 (1007 bytes, patch)
2008-12-08 16:49 EST, Brian Parks (Quantum Corp)

Description Brian Parks (Quantum Corp) 2008-10-27 08:48:00 EDT
Description of problem:
"kudzu" opens the rewind-on-close tape device (/dev/st<x>) when obtaining drive information instead of opening the no-rewind tape device (/dev/nst<x>). Other programs that link against libkudzu, for example /sbin/kmodule, have the same problem.

If a tape drive is in use when kudzu runs, the tape in it is likely to be rewound without the application knowing it. This may cause data to be lost. The problem is compounded when the tape drive is on a SAN: running kudzu on any system with access to the tape drive will cause the problem if a "scsi reservation" is not in place.

This also occurs when /sbin/kmodule is run during system boot, so booting another system on a SAN with access to a tape drive may cause the tape in that drive to rewind.
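
As an illustration of the distinction described above, here is a minimal sketch that maps a tape device name to its no-rewind node. It is written in Python for brevity (kudzu itself is C), and the helper name no_rewind_node is hypothetical; it is not part of kudzu or of either attached patch. It only handles plain names such as "st0" or "nst0".

```python
import re

# Hypothetical helper, not part of kudzu. Accepts plain SCSI tape device
# names ("st0", "nst0") and rejects everything else.
_ST_NAME = re.compile(r"^n?st(\d+)$")


def no_rewind_node(name):
    """Return the /dev/nst<x> path for a tape device name such as 'st0'.

    Opening and closing /dev/st<x> rewinds the tape on close; the
    /dev/nst<x> node leaves the tape position unchanged.
    """
    match = _ST_NAME.match(name)
    if match is None:
        raise ValueError("not a SCSI tape device name: %r" % (name,))
    return "/dev/nst" + match.group(1)
```

For example, no_rewind_node("st0") yields "/dev/nst0", which is the node a probe could open without repositioning the tape.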


Version-Release number of selected component (if applicable):
All versions of Red Hat Enterprise Linux 4, including update 7.


How reproducible:
100% if the tape device appears in the first block read from /proc/scsi/scsi (see "Additional info" below).


Steps to Reproduce:
1. insert a tape in a drive
2. position tape past BOT, e.g. "mt -f /dev/nst0 fsr 1"
3. run kudzu
4. check tape status, e.g. "mt -f /dev/nst0 status"
  
Actual results:
Tape is at BOT

Expected results:
Tape position should remain unchanged.
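
The check in step 4 could be automated along these lines. This is a sketch only: it assumes the "File number=..., block number=..." line format that mt prints in the status output quoted later in this report, and the function names are hypothetical. The status text would be captured before and after running kudzu, e.g. from "mt -f /dev/nst0 status".

```python
import re

# Matches the position line of `mt status`, e.g.
# "File number=0, block number=0, partition=0."
_POSITION = re.compile(r"File number=(-?\d+), block number=(-?\d+)")


def tape_position(status_text):
    """Return (file_number, block_number) from mt status text, or None."""
    match = _POSITION.search(status_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def was_rewound(before, after):
    """True if the tape was past BOT before and is at file 0, block 0 after.

    A position of (-1, -1) (unknown, e.g. no tape loaded) never counts
    as "past BOT".
    """
    if before is None or after is None:
        return False
    past_bot = before[0] > 0 or before[1] > 0
    return past_bot and after == (0, 0)
```

With the reproduction steps above, the position parsed after step 2 would be passed as before and the position parsed in step 4 as after; a True result corresponds to the "Actual results" reported here.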

Additional info:
Sometimes not all tape drives are seen by kudzu. When reading /proc/scsi/scsi on Red Hat Enterprise Linux 4, a single read sometimes does not fill the buffer even though there is more data to be read.

For example, "dd if=/proc/scsi/scsi of=/tmp/proc_scsi_scsi.txt bs=16k count=1"
may return 3968 bytes while "dd if=/proc/scsi/scsi of=/tmp/proc_scsi_scsi.txt bs=16k count=2" returns 6970 bytes (the entire contents of /proc/scsi/scsi).

Because of this, kudzu may only read the first 3968 bytes and not see any tape drives listed after that point.
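
The usual way to tolerate short reads like this is to keep reading until a read returns zero bytes. The sketch below shows that loop in Python; it is an illustration of the general technique, not kudzu's code, and read_all is a hypothetical name.

```python
import os


def read_all(path, chunk_size=16384):
    """Read a file to end-of-file, tolerating short reads.

    A single read may return fewer bytes than requested even when more
    data is available, so only an empty read is treated as EOF.
    """
    chunks = []
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # empty result means end of file
                break
            chunks.append(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks)
```

Called as read_all("/proc/scsi/scsi"), this would return the whole listing regardless of how the kernel splits it across reads.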

The problem does not seem to exist in Red Hat Enterprise Linux 5.
Comment 1 Bill Nottingham 2008-11-25 12:23:13 EST
Created attachment 324632
potential patch

Does the following solve it for you?
Comment 3 Brian Parks (Quantum Corp) 2008-12-08 16:49:30 EST
Created attachment 326207
proposed_patch_mod1

The proposed patch dumped core sometimes, because of the inability to open one of the /dev/nst* files. The following is from a printf added for debug purposes in proposed_patch_mod1 (which worked okay to the extent it has been tested):

dev->device nst0
dev->device nst1
dev->device nst2
dev->device (null)
dev->device nst4
dev->device nst5
dev->device nst6
dev->device nst7


The (null) caused the problem. An additional "if (dev->device)" in proposed_patch_mod1 seems to have resolved that. The following shows some status from references to /dev/nst[2-4] (there was only a tape in /dev/nst4):

# mt -f /dev/nst2 status
SCSI 2 tape drive:
File number=-1, block number=-1, partition=0.
Tape block size 0 bytes. Density code 0x0 (default).
Soft error count since last status=0
General status bits on (50000):
 DR_OPEN IM_REP_EN

# mt -f /dev/nst3 status
/dev/nst3: Input/output error

# mt -f /dev/nst4 status
SCSI 2 tape drive:
File number=0, block number=0, partition=0.
Tape block size 0 bytes. Density code 0x44 (no translation).
Soft error count since last status=0
General status bits on (81010000):
 EOF ONLINE IM_REP_EN

Additional fyi:
# cat /proc/scsi/scsi | grep ULT
  Vendor: IBM      Model: ULTRIUM-TD2      Rev: 4772
  Vendor: IBM      Model: ULTRIUM-TD2      Rev: 53Y2
  Vendor: IBM      Model: ULTRIUM-TD3      Rev: 73P5
  Vendor: IBM      Model: ULTRIUM-TD3      Rev: 73P5
  Vendor: IBM      Model: ULTRIUM-TD4      Rev: 7BG2
  Vendor: IBM      Model: ULTRIUM-TD3      Rev: 73P5
  Vendor: IBM      Model: ULTRIUM-TD2      Rev: 4772
  Vendor: IBM      Model: ULTRIUM-TD2      Rev: 53Y2

There were also 65 fibre-attached disks in addition to these fibre-attached tape drives (the last 2 were actually a second view of the first 2 due to an alternate fibre path).

It is unclear whether or not the "if (dev->type == CLASS_TAPE) {" addition is needed for kudzu (it seems to work okay, i.e. not rewind, without that change), though perhaps it is needed for other executables using libkudzu* (/sbin/kmodule has not yet been tested).
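
The two guards discussed in this comment (the device-class check and the check for a missing device name) can be sketched as follows. The attached patches are not reproduced in this report, so this is not the patch itself: it is a Python illustration of the logic, and Device, CLASS_TAPE, CLASS_HD, and tape_nodes_to_probe are hypothetical stand-ins for the C structures in libkudzu.

```python
from collections import namedtuple

# Hypothetical stand-ins for libkudzu's device record and class constants.
CLASS_TAPE = "tape"
CLASS_HD = "disk"
Device = namedtuple("Device", ["type", "device"])


def tape_nodes_to_probe(devices):
    """Return /dev paths for tape devices that actually have a device name."""
    nodes = []
    for dev in devices:
        if dev.type != CLASS_TAPE:  # analogous to: if (dev->type == CLASS_TAPE)
            continue
        if not dev.device:  # analogous to the added: if (dev->device)
            continue
        nodes.append("/dev/" + dev.device)
    return nodes
```

Given a list like the debug output above, where one entry has no device name, the entry is skipped rather than passed on to an open call.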
Comment 5 Bill Nottingham 2008-12-22 12:54:14 EST
Fixed in 1.1.95.24-1.
Comment 7 Brian Parks (Quantum Corp) 2008-12-24 13:05:41 EST
(In reply to comment #5)
> Fixed in 1.1.95.24-1.

Initial customer test looks promising.

Will this and related fixes be included in RH4u8?

Might this fix be available via up2date?

Might a fix for /sbin/kmodule (statically linked with libkudzu, part of the "initscripts" package) also be available via up2date?

"anaconda" and related python scripts/libraries also use the affected kudzu routine. Obviously, it is impractical for a fix for those to be available by up2date. If/when released in an official update, will "anaconda" and "python" also include the fix?

Do any additional bugs need to be submitted?
Comment 8 Bill Nottingham 2009-01-05 13:31:27 EST
(In reply to comment #7)
> Might a fix for /sbin/kmodule (statically linked with libkudzu, part of the
> "initscripts" package) also be available via up2date?

Please file an additional bug for this.

> "anaconda" and related python scripts/libraries also use the affected kudzu
> routine. Obviously, it is impractical for a fix for those to be available by
> up2date. If/when released in an official update, will "anaconda" and "python"
> also include the fix?

And for this as well.
Comment 9 Ruediger Landmann 2009-01-16 01:55:14 EST
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
* during SCSI probing, kudzu would open the /dev/st device node for tape drives. This would cause the tape drive to rewind, which created a risk of data loss. A new version of the probing code is included in this updated package, which does not open this node and therefore avoids this issue.
Comment 12 Brian Parks (Quantum Corp) 2009-04-22 09:42:41 EDT
(In reply to comment #8)
Initial tests with RedHat Enterprise Linux 4 update 8 BETA indicate that the rewind problem is fixed, including /sbin/kmodule (bug # 478881). The anaconda/python scripts variant (bug # 478885 - state closed) still has the bug in the update 8 BETA. Hopefully, a version containing the fix will be in the official release.
Comment 13 errata-xmlrpc 2009-05-18 16:12:01 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0970.html
