Bug 427576 - Reading/writing a tape drive sometimes fails with scsi errors
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: i386 Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
Reported: 2008-01-04 15:52 EST by Kern Sibbald
Modified: 2013-04-30 08:47 EDT (History)
Doc Type: Bug Fix
Last Closed: 2013-04-30 08:47:03 EDT

Attachments
scsi errors found in /var/log/messages corresponding to failures (8.85 KB, text/plain)
2008-01-04 15:52 EST, Kern Sibbald

Description Kern Sibbald 2008-01-04 15:52:58 EST
Description of problem:
When Bacula (network backup) attempts to write to a tape, the write fails.  
The kernel reports a number of SCSI errors.


Version-Release number of selected component (if applicable):
kernel-2.6.18-53.1.4.el5

The problem does not exist in:
kernel-2.6.18-8.1.15.el5



How reproducible:
Run Bacula regression testing with a tape test. Activating the autochanger 
causes the situation to worsen to the point that no Bacula I/O can be done to 
the drive even when the autochanger is not used.


Steps to Reproduce:
1. Run tape regression scripts for Bacula preferably with an autochanger.
  
Actual results:
Unable to use Bacula to write to a tape drive.


Expected results:
With a different kernel, everything works fine.


Additional info:
- I am actually running CentOS 5.1
- The problem does not exist on the CentOS 5.0 kernel (running on a system 
that is otherwise 5.1).
- This may have something to do with autochangers, but I don't think so.  In 
any case, activating an autochanger using mtx increases the problem.
- This problem looks much like one I saw on SuSE a year ago when they were 
running a similar kernel version.  It was a *very* serious problem that led 
to a system lockup and, on one of my machines, the total loss of a hard disk. 

Kernel (SCSI) error messages are attached.
Comment 1 Kern Sibbald 2008-01-04 15:52:58 EST
Created attachment 290859
scsi errors found in /var/log/messages corresponding to failures
Comment 2 Kern Sibbald 2008-01-05 15:11:21 EST
In checking back on the previous SuSE kernel problem, it was in kernel 2.6.16 
and so is not related to this bug.

This is a bit complicated and I am still analyzing it, but it looks like while 
one thread is doing write() operations to the drive, another thread calls mtx, 
which removes the tape from the drive.  That confuses the kernel tape driver, 
and thereafter, even though the drive is close()ed and open()ed again, it 
always fails with either a "Resource is Busy" or an "I/O Error".  The only way 
I have found to re-use the drive is to reboot.  If you have any less drastic 
way of clearing the kernel driver, I would like to know.
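
For illustration only, the race described above (a write() in flight while another thread unloads the tape) could be avoided in an application by serializing drive I/O and changer operations behind one lock. This is a minimal sketch and not Bacula's code: DriveGuard, write_fn and unload_fn are hypothetical names, and the callables stand in for the real write() and mtx invocations.

```python
import threading


class DriveGuard:
    """Serialize tape I/O and changer operations on one drive.

    Hypothetical helper for illustration; not part of Bacula or mtx.
    """

    def __init__(self):
        self._lock = threading.Lock()

    def write_block(self, write_fn, data):
        # Hold the lock for the whole write so no unload can start mid-write.
        with self._lock:
            return write_fn(data)

    def unload(self, unload_fn):
        # Wait for any in-flight write to finish before the tape is moved.
        with self._lock:
            return unload_fn()
```

With this arrangement, a thread calling unload() blocks until the current write_block() returns, so the tape is never removed underneath an active write. It does not address the kernel-side state once the driver is already confused.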

In addition, despite what I previously said, it seems to occur on both kernels 
mentioned, though I cannot reproduce it on later kernels (SuSE 10.2).
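
To tell the two failure modes mentioned earlier in this comment apart (busy versus I/O error) without running a full backup job, a small probe that only opens and closes the device node may help. This is a sketch under assumptions: probe_tape is a hypothetical helper, the report does not name the device node (a path such as /dev/nst0 would be a guess), and O_NONBLOCK is used with the intent of not waiting for the drive to become ready.

```python
import errno
import os


def probe_tape(path):
    """Try to open a device node and report how the open fails (sketch)."""
    try:
        # O_NONBLOCK is intended to keep open() from waiting on the drive.
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            return "busy"
        if exc.errno == errno.EIO:
            return "io-error"
        # Any other failure is reported by its symbolic errno name.
        return errno.errorcode.get(exc.errno, "unknown")
    os.close(fd)
    return "ok"
```

A result of "busy" or "io-error" after the application has closed the drive would match the stuck state described above; "ok" would suggest the driver has recovered.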
Comment 3 Tom Coughlan 2009-02-11 09:12:47 EST
(In reply to comment #2)
 
> If you have any less drastic way of clearing the kernel driver, I would like 
> to know.

I assume you tried rmmod st.ko?

From the messages "Dumping Card State" it appears you have an aic* HBA. You should rmmod/insmod that as well to try to clear the error. 
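
As a rough sketch of the reload sequence suggested above, the following builds the commands in order: unload the tape driver, cycle the HBA driver, then load the tape driver again. It uses modprobe -r / modprobe in place of rmmod/insmod so that no module file path is needed. The function names are hypothetical, and the HBA module name is an assumption: the messages only indicate an aic* adapter, so a name such as aic7xxx would have to be confirmed with lsmod first.

```python
import subprocess


def reload_commands(tape_module="st", hba_module=None):
    """Return the module unload/load commands in the order to run them."""
    cmds = [["modprobe", "-r", tape_module]]
    if hba_module:
        # Cycle the HBA driver only when a module name is given explicitly.
        cmds.append(["modprobe", "-r", hba_module])
        cmds.append(["modprobe", hba_module])
    cmds.append(["modprobe", tape_module])
    return cmds


def reload_drivers(tape_module="st", hba_module=None, dry_run=True):
    """Print the commands (dry run) or execute them; must run as root."""
    cmds = reload_commands(tape_module, hba_module)
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return cmds
```

Calling reload_drivers(hba_module="aic7xxx") only prints the commands by default. Unloading an HBA driver is likely to fail, or to be unsafe, if disks in use sit on the same controller, so the dry run is worth checking first.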


> In addition, despite what I previously said, it seems to occur on both kernels 
> mentioned, though I cannot reproduce it on later kernels (SuSE 10.2).

So, kernel-2.6.18-53.1.4.el5 fails. Please try a newer kernel-2.6.18...el5 version.
Comment 4 Jes Sorensen 2013-04-30 08:47:03 EDT
This bugzilla is really old and there has been no reply to Tom's questions
for years, so taking the liberty to close it.

If you experience similar problems with a newer version of RHEL, please open
a new bugzilla.

Jes
