Description of problem:
The st (SCSI tape) driver is broken in (all?) Fedora 29 kernel(s); tested with at least 4.19.5-300, 4.19.6-300, and 4.19.10-300. The Fedora 28 kernel 4.18.18-200.fc28.x86_64 works fine.

Tested on a Tandberg Storage Loader 1U and on a Quantum Superloader, both with LTO4 tape drives attached over SAS.

Version-Release number of selected component (if applicable):
kernel-4.19.5-300.fc29.x86_64
kernel-4.19.6-300.fc29.x86_64
kernel-4.19.10-300.fc29.x86_64

How reproducible:
Always.

Steps to Reproduce:
0. Test running under the most recent Fedora 29 kernel:

# uname -a
Linux xxx 4.19.10-300.fc29.x86_64 #1 SMP Mon Dec 17 15:34:44 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

1. Create some test data:

# dd if=/dev/urandom of=/var/tmp/testdata bs=512k count=1
1+0 records in
1+0 records out
524288 bytes (524 kB, 512 KiB) copied, 0.00547149 s, 95.8 MB/s
# hexdump /var/tmp/testdata | head
0000000 e238 2af2 2b42 842b 1376 3346 675a 39e5
0000010 30a9 79a7 7053 1dab de04 3562 849e 6b9c
0000020 0f74 d947 89df dc43 8282 58d3 d53c b507
0000030 223a 57e6 85a5 d834 c08d 3978 80d2 8d9d
0000040 bb82 4b8e 32e0 c684 974d cd96 2a23 b850
0000050 a4fc 337e 2373 cb9e b64a 3a32 a147 6277
0000060 818c 94c7 01ce 181b b279 9a96 8b40 4831
0000070 f10d 6a23 29b0 09b9 f045 6372 3ddc 1cd7
0000080 3289 4328 6de1 7df6 7ca4 a15d a0e2 af19
0000090 3bc9 b3e5 56ef 0fb9 dd0a a708 6154 a1f5

2. Write test data to a tape:

# mt -f /dev/nst0 rewind
# dd if=/var/tmp/testdata of=/dev/nst0 bs=512k
1+0 records in
1+0 records out
524288 bytes (524 kB, 512 KiB) copied, 2.40414 s, 218 kB/s

3. Read test data back and compare:

# mt -f /dev/nst0 rewind
# dd if=/dev/nst0 of=/tmp/foo bs=512k
1+0 records in
1+0 records out
524288 bytes (524 kB, 512 KiB) copied, 0.00749064 s, 70.0 MB/s
# hexdump /tmp/foo | head
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0080000
# cmp /tmp/foo /var/tmp/testdata ; echo RC=$?
/tmp/foo /var/tmp/testdata differ: byte 1, line 1
RC=1

Actual results:
Instead of the real data, only zeros were written, without any indication of an error. This means your backup system will write nothing but garbage and you won't even notice it!

Expected results:
Compare with the same procedure running the Fedora 28 kernel 4.18.18-200.fc28.x86_64 [using the same test data]:

# mt -f /dev/nst0 rewind
# dd if=/var/tmp/testdata of=/dev/nst0 bs=512k
1+0 records in
1+0 records out
524288 bytes (524 kB, 512 KiB) copied, 2.38308 s, 220 kB/s
# mt -f /dev/nst0 rewind
# dd if=/dev/nst0 of=/tmp/foo bs=512k
1+0 records in
1+0 records out
524288 bytes (524 kB, 512 KiB) copied, 0.00834666 s, 62.8 MB/s
# hexdump /tmp/foo | head
0000000 e238 2af2 2b42 842b 1376 3346 675a 39e5
0000010 30a9 79a7 7053 1dab de04 3562 849e 6b9c
0000020 0f74 d947 89df dc43 8282 58d3 d53c b507
0000030 223a 57e6 85a5 d834 c08d 3978 80d2 8d9d
0000040 bb82 4b8e 32e0 c684 974d cd96 2a23 b850
0000050 a4fc 337e 2373 cb9e b64a 3a32 a147 6277
0000060 818c 94c7 01ce 181b b279 9a96 8b40 4831
0000070 f10d 6a23 29b0 09b9 f045 6372 3ddc 1cd7
0000080 3289 4328 6de1 7df6 7ca4 a15d a0e2 af19
0000090 3bc9 b3e5 56ef 0fb9 dd0a a708 6154 a1f5
# cmp /tmp/foo /var/tmp/testdata ; echo RC=$?
RC=0

Additional info:
The test with the Fedora 28 kernel was run from the very same root file system, i.e. this is a kernel problem only.

The problem was noticed when the bacula system would no longer append data to previously used tapes. It turned out that, instead of valid tape labels and data, only zeros (garbage) had been written. In other words: for several days all backups were just garbage, without any indication of an error.

Hardware:
- msi Z370 TOMAHAWK Mainboard with 16 GB RAM
- LSI SAS3444E (= IBM 25R8060/8071)
- Quantum and Tandberg LTO4 tape libraries

The problem happens on all tape drives and with all tested LTO3 and LTO4 media.
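Since the failure is silent, the write/rewind/read/compare steps above could be wrapped in a small check that is run before trusting a drive. The following is only a minimal sketch: the function names, return codes, and messages are made up for illustration, and the 512k block size is simply copied from the reproduction steps. It uses nothing beyond the mt, dd, cmp, tr, and wc calls already shown or commonly available.

```shell
#!/bin/sh
# Sketch only: function names, return codes and messages are assumptions
# for illustration, not part of any existing tool.

# Succeeds if the file is non-empty and contains nothing but zero bytes.
is_all_zeros() {
    [ -s "$1" ] || return 1
    # Delete every NUL byte; if nothing is left, the file was all zeros.
    [ "$(tr -d '\000' < "$1" | wc -c)" -eq 0 ]
}

# Rewind only real character devices, so the check can also be exercised
# against a regular file standing in for the tape.
rewind_if_tape() {
    if [ -c "$1" ]; then
        mt -f "$1" rewind
    fi
}

# Usage: verify_tape_roundtrip TAPE_DEVICE SOURCE_FILE READBACK_FILE
# Returns 0 if the read-back matches, 1 if it differs, 2 on I/O errors.
verify_tape_roundtrip() {
    vt_tape=$1; vt_src=$2; vt_back=$3

    rewind_if_tape "$vt_tape" || return 2
    dd if="$vt_src" of="$vt_tape" bs=512k 2>/dev/null || return 2
    rewind_if_tape "$vt_tape" || return 2
    dd if="$vt_tape" of="$vt_back" bs=512k 2>/dev/null || return 2

    if cmp -s "$vt_src" "$vt_back"; then
        echo "OK: read-back matches source"
        return 0
    fi
    if is_all_zeros "$vt_back"; then
        echo "FAIL: read-back contains only zeros" >&2
    else
        echo "FAIL: read-back differs from source" >&2
    fi
    return 1
}
```

With the paths from the steps above, this would be called as `verify_tape_roundtrip /dev/nst0 /var/tmp/testdata /tmp/foo`. Note that it rewinds and overwrites the tape from the beginning, so it should only be pointed at a scratch tape.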
To exclude problems with the SAS controller, I ran the same tests on an LTO3 tape drive attached to an Adaptec ASC-29320ALP U320 SCSI controller. The problem is the same there.

Now for the interesting part: the same tests with the 4.18.18-300.fc29.x86_64 kernel with an LTO4 library attached to an LSI SAS1068E SAS controller work fine. This system is running on a Supermicro X6DH8-G motherboard.

So I am starting to suspect that the msi Z370 TOMAHAWK mainboard is causing the issues?
I don't think it is your mainboard. There was a kernel bug that has been fixed. There is more info here: https://bugzilla.kernel.org/show_bug.cgi?id=201935
(In reply to Steven A. Falco from comment #2)
> I don't think it is your mainboard. There was a kernel bug that has been
> fixed. There is more info here:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=201935

Indeed, this would explain what I'm seeing. However, the other, working system is also running the 4.19.10-300.fc29.x86_64 kernel, and it appears to be working fine. This I don't understand, then...

Will update the kernel and re-test ASAP.
I'll be interested in how it works for you. It definitely fixed the issue for me on an LTO-6 drive. Perhaps it is somehow speed related?
I confirm that the problem is fixed in a recent kernel version. Tested with 4.19.13-300.fc29.x86_64, which works without problems. Thanks!
Thanks for letting us know.