Red Hat Bugzilla – Bug 427576
Reading/writing a tape drive sometimes fails with scsi errors
Last modified: 2013-04-30 08:47:03 EDT
Description of problem:
When running Bacula (network backup) and attempting to write to a tape, the
write fails. The kernel reports a number of SCSI errors.
Version-Release number of selected component (if applicable):
The problem does not exist in:
Run Bacula regression testing with a tape test. Activating the autochanger
causes the situation to worsen to the point that no Bacula I/O can be done to
the drive even when the autochanger is not used.
Steps to Reproduce:
1. Run tape regression scripts for Bacula preferably with an autochanger.
Actual results:
Unable to use Bacula to write to a tape drive.
Expected results:
With a different kernel, everything works fine.
Additional info:
- I am actually running CentOS 5.1
- The problem does not exist on the CentOS 5.0 kernel (running on a system
that is otherwise 5.1).
- This may have something to do with autochangers, but I don't think so. In
any case, activating an autochanger using mtx increases the problem.
- This problem looks much like one I saw on SuSE a year ago when they were
running a similar kernel version. It was a *very* serious problem that led
to system lockups and, on one of my machines, the total loss of a hard disk.
Kernel (scsi) error messages attached
Created attachment 290859
scsi errors found in /var/log/messages corresponding to failures
In checking back on the previous SuSE kernel problem, it was in kernel 2.6.16
and so is not related to this bug.
This is a bit complicated and I am still analyzing it, but it looks like while
one thread is doing write() operations to the drive, another thread calls mtx,
which removes the tape from the drive. That confuses the kernel tape driver,
and thereafter, even if the drive is close()ed and open()ed again, it never
works: every attempt returns either a "Resource is Busy" or an "I/O Error".
The only way I have found to be able to re-use the drive is to reboot.
If you have any less drastic way of clearing the kernel driver, I would like
to know.
In addition, despite what I previously said, it seems to occur on both kernels
mentioned, though I cannot reproduce it on later kernels (SuSE 10.2).
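The interleaving described in this comment (one thread in write() while another runs mtx and unloads the tape) could in principle be avoided on the application side by putting drive I/O and changer commands behind a single lock. The following is only a sketch of that idea: `FakeDrive` and `run` are hypothetical names, no real tape device or mtx call is involved, and it is not a description of how Bacula itself works.

```python
import threading
import time


class FakeDrive:
    """Hypothetical stand-in for a tape drive; no real device is touched."""

    def __init__(self):
        self.loaded = True
        self.writing = False
        self.unloaded_during_write = False
        self.blocks_written = 0

    def write_block(self):
        if not self.loaded:
            raise OSError("no tape loaded")
        self.writing = True
        time.sleep(0.001)  # simulate a slow write()
        self.blocks_written += 1
        self.writing = False

    def unload(self):
        # Record the bad interleaving: tape removed while a write is in flight.
        if self.writing:
            self.unloaded_during_write = True
        self.loaded = False


def run(drive, lock, blocks=20):
    """Run a writer and a changer thread that share one lock."""

    def writer():
        for _ in range(blocks):
            with lock:
                if not drive.loaded:
                    return  # tape is gone; stop cleanly instead of erroring
                drive.write_block()

    def changer():
        time.sleep(0.005)
        with lock:  # waits for any in-flight write to finish
            drive.unload()

    threads = [threading.Thread(target=writer), threading.Thread(target=changer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return drive


drive = run(FakeDrive(), threading.Lock())
print("unloaded during write:", drive.unloaded_during_write)
```

Because the changer thread must take the same lock as the writer, the unload can only happen between two complete writes. This does nothing for a drive that is already wedged, and it does not explain why the kernel driver fails to recover.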
(In reply to comment #2)
> If you have any less drastic way of clearing the kernel driver, I would like
> to know.
I assume you tried rmmod st.ko?
From the messages "Dumping Card State" it appears you have an aic* HBA. You should rmmod/insmod that as well to try to clear the error.
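For reference, the reload suggested here might be scripted roughly as follows. This is a hedged sketch: the exact aic* module name is not given in this report, so `HBA_MODULE` is a placeholder, and the helper names `reload_commands` and `reload_drivers` are made up for illustration. It uses modprobe rather than insmod to load the modules again, because insmod needs the path to the .ko file. If system disks hang off the same adapter, unloading its driver is likely to fail or be unsafe.

```python
import subprocess


def reload_commands(hba_module):
    """Return the commands that unload st and the HBA driver, then load them back.

    hba_module is a placeholder: the exact aic* module name is not given above.
    """
    return [
        ["rmmod", "st"],           # unload the tape driver before the adapter driver
        ["rmmod", hba_module],
        ["modprobe", hba_module],  # load the adapter driver again
        ["modprobe", "st"],        # then the tape driver
    ]


def reload_drivers(hba_module, dry_run=True):
    """Print the commands, and run them only when dry_run is False (needs root)."""
    for cmd in reload_commands(hba_module):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# "HBA_MODULE" is a hypothetical placeholder, not a real module name.
reload_drivers("HBA_MODULE", dry_run=True)
```

With `dry_run=True` nothing is executed; the commands are only printed so they can be checked before being run as root.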
> In addition, despite what I previously said, it seems to occur on both kernels
> mentioned, though I cannot reproduce it on later kernels (SuSE 10.2).
So, kernel-2.6.18-53.1.4.el5 fails. Please try a newer kernel-2.6.18...el5 version.
This bugzilla is really old and there has been no reply to Tom's questions
for years, so I am taking the liberty of closing it.
If you experience similar problems with a newer version of RHEL, please open
a new bugzilla.