Description of problem:
When running Bacula (network backup) and attempting to write to a tape, the write fails. The kernel reports a number of SCSI errors.

Version-Release number of selected component (if applicable):
kernel-2.6.18-53.1.4.el5
The problem does not exist in: kernel-2.6.18-8.1.15.el5

How reproducible:
Run the Bacula regression tests with a tape test. Activating the autochanger makes the situation worse, to the point that no Bacula I/O can be done to the drive even when the autochanger is not used.

Steps to Reproduce:
1. Run the tape regression scripts for Bacula, preferably with an autochanger.

Actual results:
Unable to use Bacula to write to a tape drive.

Expected results:
With a different kernel, everything works fine.

Additional info:
- I am actually running CentOS 5.1.
- The problem does not exist on the CentOS 5.0 kernel (running on a system that is otherwise 5.1).
- This may have something to do with autochangers, but I don't think so. In any case, activating an autochanger using mtx makes the problem worse.
- This problem looks much like one I saw on SuSE a year ago, when they were running a similar kernel version. It was a *very* serious problem that led to system lock-ups and, on one of my machines, the total loss of a hard disk.

Kernel (SCSI) error messages are attached.
Created attachment 290859
SCSI errors found in /var/log/messages corresponding to the failures
In checking back on the previous SuSE kernel problem, I found it was in kernel 2.6.16, so it is not related to this bug.

This is a bit complicated and I am still analyzing it, but it looks like while one thread is doing write() operations to the drive, another thread calls mtx, which removes the tape from the drive. That confuses the kernel tape driver, and thereafter, even though the drive is close()ed and open()ed again, it fails to work, always getting either a "Resource is Busy" or an "I/O Error". The only way I have found to re-use the drive is to reboot. If you have any less drastic way of clearing the kernel driver, I would like to know.

In addition, despite what I previously said, the problem seems to occur on both kernels mentioned, though I cannot reproduce it on later kernels (SuSE 10.2).
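The suspected sequence above (a changer unload landing in the middle of another thread's write() calls) can be illustrated with a small simulation. This is only a sketch of one way an application could keep the two operations from overlapping; it is not Bacula's code, it touches no real tape device, and `DriveGuard`, `write_block`, `unload` and `run_demo` are hypothetical names made up for the illustration.

```python
import threading


class DriveGuard:
    """Hypothetical helper: one lock shared by drive I/O and changer calls."""

    def __init__(self):
        self._lock = threading.Lock()
        self.log = []  # ordered record of simulated operations

    def write_block(self, n):
        # Stands in for a write() to the tape device.
        with self._lock:
            self.log.append(("write-start", n))
            self.log.append(("write-end", n))

    def unload(self):
        # Stands in for invoking mtx to remove the tape from the drive.
        with self._lock:
            self.log.append(("unload", None))


def run_demo(blocks=50):
    """Run a writer thread and a changer thread against the same guard."""
    guard = DriveGuard()

    def writer():
        for i in range(blocks):
            guard.write_block(i)

    t_writer = threading.Thread(target=writer)
    t_changer = threading.Thread(target=guard.unload)
    t_writer.start()
    t_changer.start()
    t_writer.join()
    t_changer.join()
    return guard.log
```

Because both operations take the same lock, the simulated unload can only appear between complete writes in the log, never inside one. Whether this kind of serialization would avoid the kernel-side failure described here is an assumption, not something the report confirms.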
(In reply to comment #2)
> If you have any less drastic way of clearing the kernel driver, I would like
> to know.

I assume you tried rmmod st.ko? From the "Dumping Card State" messages, it appears you have an aic* HBA. You should rmmod/insmod that as well to try to clear the error.

> In addition, despite what I previously said, it seems to occur on both kernels
> mentioned, though I cannot reproduce it on later kernels (SuSE 10.2).

So, kernel-2.6.18-53.1.4.el5 fails. Please try a newer kernel-2.6.18...el5 version.
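The unload/reload suggestion above could be scripted roughly as follows. This is a minimal sketch under stated assumptions: the HBA module name is not given in this report beyond "aic*", so it is passed in as a parameter rather than guessed; modprobe is used for the reload step in place of the insmod mentioned above; and `reload_plan` and `run_plan` are hypothetical helper names. By default the script only prints the commands.

```python
import subprocess


def reload_plan(hba_module, tape_module="st"):
    """Return the commands, in order, to unload and reload both drivers.

    hba_module is whatever module drives the HBA on the affected machine;
    the report only says "aic*", so no concrete name is assumed here.
    """
    return [
        ["rmmod", tape_module],      # unload the SCSI tape driver first
        ["rmmod", hba_module],       # then the HBA driver
        ["modprobe", hba_module],    # reload the HBA driver
        ["modprobe", tape_module],   # reload the tape driver last
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False (needs root)."""
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, `run_plan(reload_plan("HBA_MODULE_NAME"))` prints the four commands without running them, where `HBA_MODULE_NAME` is a placeholder. Note that rmmod refuses to remove a module that is still in use, so this cannot work if the same HBA also hosts a mounted disk.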
This bugzilla is really old and there has been no reply to Tom's questions for years, so I am taking the liberty of closing it.

If you experience similar problems with a newer version of RHEL, please open a new bugzilla.

Jes