Bug 1930388
Summary: | [RHEL8] st driver unit attention behavior over iSCSI | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Alex Wang <alex.wang> | ||||||||
Component: | kernel | Assignee: | John Meneghini <jmeneghi> | ||||||||
kernel sub component: | Storage Drivers | QA Contact: | Martin Hoyer <mhoyer> | ||||||||
Status: | CLOSED ERRATA | Docs Contact: | |||||||||
Severity: | high | ||||||||||
Priority: | high | CC: | ccalhoun, cleech, jmagrini, jmeneghi, jpittman, loberman, mgandhi, mhoyer, mpatalan, revers | ||||||||
Version: | 8.3 | Keywords: | MigratedToJIRA, Reopened, Triaged | ||||||||
Target Milestone: | rc | ||||||||||
Target Release: | --- | ||||||||||
Hardware: | Unspecified | ||||||||||
OS: | Unspecified | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | kernel-4.18.0-553.el8_10 | Doc Type: | If docs needed, set a value | ||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | |||||||||||
: | 2210024 (view as bug list) | Environment: | |||||||||
Last Closed: | 2024-05-22 09:48:46 UTC | Type: | Bug | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Bug Depends On: | |||||||||||
Bug Blocks: | 2210024 | ||||||||||
Attachments: |
|
Comment 13
loberman
2021-12-14 19:39:06 UTC
Customers current workaround (*) It turns out that an otherwise-innocuous MTIOCTOP with opcode MTTELL, while not actually supported by st, will reset STps->rw to ST_IDLE, preventing the automatic filemark WEOF that leads to the problem. This itself is probably a bug, but we see no shame in exploiting it to get the behavior we need. I am trying to help now I have a device ready now served by another server connected to a Ultrium tape drive [ 388.884193] iscsi: registered transport (tcp) [ 465.744253] scsi host15: iSCSI Initiator over TCP/IP [ 494.465017] scsi host15: iSCSI Initiator over TCP/IP [ 494.504173] scsi 15:0:0:0: Sequential-Access HP Ultrium 5-SCSI Z64D PQ: 0 ANSI: 6 [ 494.546018] scsi 15:0:0:0: alua: disable for non-disk devices [ 494.573621] scsi 15:0:0:0: Attached scsi generic sg5 type 1 [ 494.601449] st: Version 20160209, fixed bufsize 32768, s/g segs 256 [ 494.632130] st 15:0:0:0: Attached scsi tape st0 [ 494.653792] st 15:0:0:0: st0: try direct i/o: yes (alignment 1 B) [15:0:0:0] tape HP Ultrium 5-SCSI Z64D /dev/st0 /dev/sg5 Reproducer Start a backup to the iscsi tape Down the network interface on the target server and wait for the initiator to time out ifconfig eno1 down /usr/lib/debug/usr/lib/modules/4.18.0-305.30.1.el8_4.x86_64/kernel/drivers/mfd/ /usr/lib/debug/usr/lib/modules/4.18.0-305.30.1.el8_4.x86_64/kernel/drivers/mfd/intel-lpss-acpi.ko.debug /usr/lib/debug/usr/lib/modules/4.18.0-305.30.1.el8_4.x86_64/kernel/drivers/mfd/intel-lpss-pci.ko.debug tar: /dev/st0: Cannot write: Input/output error tar: Error is not recoverable: exiting now ifconfig eno1 up I hear the tape rewind List the contents and they are there so in this case I never got the issue reproduced This issue "our next rewind attempt also triggers an implied MTWEOF, which does _not_ fail" [root@ml150 ~]# tar tvf /dev/st0 drwxr-xr-x root/root 0 2021-11-11 16:04 usr/ dr-xr-xr-x root/root 0 2022-07-27 12:27 usr/bin/ -rwxr-xr-x root/root 71504 2019-11-19 09:21 usr/bin/diff3 -rwxr-xr-x root/root 3283 2022-06-08 15:13 usr/bin/catchsegv -rwxr-xr-x root/root 54768 2019-11-19 09:21 usr/bin/sdiff -rwxr-xr-x root/root 25424 2022-06-08 15:25 usr/bin/gencat -rwxr-xr-x root/root 163416 2021-09-23 08:56 usr/bin/cpio From the BZ Per AWS support, when the gateway crashes in this manner, its notion of the current tape position is reset to the beginning of the tape. It also sets a unit attention condition, such that the next request results in a CHECK CONDITION status with sense key UNIT ATTENTION and asc/ascq indicating a device reset. According to their logs and our own, separate analysis, the next command being sent is WRITE FILEMARK, which results in writing an FM at the beginning of the tape, effectively discarding its contents. In fact, once the write fails with EIO, our software attempts to recover by rewinding and repositioning the tape, then resuming operation. If this fails, it attempts to rewind and reposition again, write a marker at the end of the tape, and then unmount. It does not under any circumstances write either data or filemarks without having successfully positioned the tape to a known point. What actually happens is that, since the last operation was a write, the kernel executes an implied MTWEOF operation (which translate to a Write Filemarks command) before the rewind that was actually requested. This seems not entirely unreasonable, provided the tape position is known. However, once this request fails (due to the unit attention condition), our next rewind attempt also triggers an implied MTWEOF, which does _not_ fail (the unit attention condition persists only until the initiator has been notified); this is the command that unexpectedly erases the tape. I will keep at this but it will probably need code inspection review etc. to figure out unless we can get a reproducer from the customer to use. Its been so long, I am not keen to reach back out the customer for a reproducer. Regards Laurence Multiple attempts using tar to reproduce this have not worked The last attempt included stopping and then restarting the target daemon on the target server but still never resulted in the MTWEOF being sent such that the beginning of the tape was compromised. We will need to be doing what their backup software was doing to reproduce this I think. [root@dl385g10 ~]# systemctl stop target /usr/bin/mysql_waitpid /usr/bin/mysqlaccess /usr/bin/mysqladmin /usr/bin/mysqlbinlog /usr/bin/mysqlcheck /usr/bin/mysqldump tar: /dev/nst0: Cannot write: Input/output error tar: Error is not recoverable: exiting now After waiting a while and restarting the target daemon (iscsit) tar: Error is not recoverable: exiting now Then list the tape, I hear it rewind, we are at BOT and data is still intact [root@loberman ~]# tar tvf /dev/st0 drwxr-xr-x root/root 0 2021-06-28 14:41 usr/ drwxr-xr-x root/root 0 2022-05-13 09:11 usr/bin/ -rwxr-xr-x root/root 2161 2016-03-04 07:56 usr/bin/xzmore -rwxr-xr-x root/root 3336 2021-03-26 08:45 usr/bin/catchsegv -rwxr-xr-x root/root 199288 2018-05-03 11:50 usr/bin/find -rwxr-xr-x root/root 154808 2014-08-15 00:05 usr/bin/m4 -rwxr-xr-x root/root 22640 2021-03-26 09:15 usr/bin/gencat (In reply to loberman from comment #18) > > Per AWS support, when the gateway crashes in this manner, its notion of the > current tape position is reset to the beginning of the tape. It also sets a > unit attention condition, such that the next request results in a CHECK > CONDITION status with sense key UNIT ATTENTION and asc/ascq indicating a > device reset. According to their logs and our own, separate analysis, the > next command being sent is WRITE FILEMARK, which results in writing an FM at > the beginning of the tape, effectively discarding its contents. Yes, agreed. I've looked at the provided .pcap trace with Wireshark. The problem is a number of Malformed Packet errors led to a TCP RST. This had the effect of an iSCSI logout, which is a nexus clearing event. This resulted in SCSI Target device doing in internal LUN reset. There is no way the SCSI Target can be expected to preserve any internal state following the LUN reset. After the host logs in again the first unit attention advises the initiator that the device has been reset and the tape should be re-positioned. There is no way to continue at the same position following a reset. This has nothing to do with iSCSI, the SCSI protocols dictates that the tape must be re-positioned following a reset. > In fact, once the write fails with EIO, our software attempts to recover by > rewinding and repositioning the tape, then resuming operation. If this > fails, it attempts to rewind and reposition again, write a marker at the end > of the tape, and then unmount. It does not under any circumstances write > either data or filemarks without having successfully positioned the tape to > a known point. Yes, this is what it should do. > What actually happens is that, since the last operation was a write, the > kernel executes an implied MTWEOF operation (which translate to a Write > Filemarks command) before the rewind that was actually requested. This seems > not entirely unreasonable, provided the tape position is known. However, > once this request fails (due to the unit attention condition), our next > rewind attempt also triggers an implied MTWEOF, which does _not_ fail (the > unit attention condition persists only until the initiator has been > notified); this is the command that unexpectedly erases the tape. Following a reset many tape devices will immediately rewind. So writing a filemark without knowing the position of the tape is an error. The initiator should first rewind and then forward to the end of the tape to find the last valid file mark. I agree that these implied MTWEOF operations coming from the kernel are probably the cause of the unexpected erase. > I will keep at this but it will probably need code inspection review etc. to > figure out unless we can get a reproducer from the customer to use. > > Its been so long, I am not keen to reach back out the customer for a > reproducer. Agreed. If you have the ability to reproduce this, we should probably work on the MTWEOF code. These implied write filemarks operations are probably the source of the problem. See the attached Wireshark images. Created attachment 1904603 [details]
Wireshark Expert Summary
Created attachment 1904604 [details]
Wireshark command summary
Created attachment 1904605 [details]
Wireshark SCSI Unit Attention
Yep, its not iscsi, the customer specifically points out behavior in the st driver. The WireShark has the customer comments you can click on to see each event they speak of. Whatever is creating the MTWEOF as an un-flushed pending command to the tape does not happen with the test setup I have. I am looking writing a test program that directly communicated with the tape device and issues commands in the order the customer describes. This may be a combination of behavior in that AWS VTL and the customers backup (AFS) application. Regards Laurence Customer Update -------------------- I'm sorry; I don't have a test program per se. I'm attaching the relevant portions of the backup software, which may be helpful in following my earlier description of the problem and understanding the sequence of operations. The bulk of the work happens in tapeappend(), dumptotape(), and the support functions in tapedrive.c. You may also be interested in the sections protected by #ifdef MTTELL, which are the workaround we applied to avoid the problem. Taking down the network interface on the target server is not going to trigger this problem. That results in a timeout generated in the iscsi stack on the initiator, but doesn't actually change the state of the target. In order to trigger the problem, you need the following sequence of events: 1) iSCSI communication is interrupted, e.g. because the target process crashed 2) the target resets its notion of the tape position to the beginning of the tape 3) the target responds to the next command with CHECK CONDITION / UNIT ATTENTION / device reset In the test case you describe, all you are doing is causing (1) by temporarily taking down the network interface. Since the target itself is still fully operational, the system recovers (albeit after generating EIO) once network connectivity is restored. If I understand correctly, your test case involves a software iscsi target fronting for a physical tape drive. In that case, even if the target process crashes, it seems unlikely that the drive is actually rewound, so the unwanted EOF (if written) will not be at the beginning of the tape. Our targets are virtual tape drives; the state is presumably stored entirely in memory and certainly not in hardware that doesn't magically rewind when the process crashes. Item (3) is also essential, because the bug is that the st driver is ignoring the UNIT ATTENTION condition entirely, and therefore does not realize that its model of the state of the tape drive may no longer be correct. If, after a crash or similar situation, the target does _not_ respond to the next command with a UNIT ATTENTION condition, then there is no way for the initiator to know anything is wrong. That would be a (probable) bug in the target, but it's not what we're seeing. Our test rig involved an AWS virtual tape library using their Storage Gateway product. The library is an on-premises VM which communicates with the AWS-hosted storage from behind a firewall, via an HTTP proxy we control. Testing consisted of running a copy of our software against one of those drives, with the proxy configured to inject failures into the VTL appliance's connections to AWS (_not_ into the iSCSI connection). This eventually causes the iSCSI target process to fail in the manner described in the original ticket, thereby reproducing the problem. I did try restarting the target to see if that made a difference but I don't have one of these Amazon VTL's. I updated the customer as follows: Hello Jeff Many thanks for the details. I did restart the iscsi target service as well to try reproduce, i.e. not interrupt the network. I think its the fact that have a physical-backed tape drive perhaps that is preventing a reproducer. I do have a way to use mhvtl but I suppose what we really need is to get access to AWS and a virt guest with the AWS type tape devices. I will keep you updated, but I expect engineering will deal with this via code review rather than a reproducer and then provide a test you can run yourself. Sincerely Laurence Oberman My update to them was in response to their detailed update. I will attach the two c code files they attached to the case for review. Chris, can we at least get an explanation on the close here? The customer put in a lot of effort on this and deserves an explanation. Hello John I have this setup Client (iscsi initiator Virtual machine) is accessing a tape on the ISCSI server Linux rhel9.1 5.14.0-284.18.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Wed May 31 10:39:18 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux [root@rhel9 ~]# lsscsi -g [33:0:0:0] tape HP Ultrium 5-SCSI Z64D /dev/st0 /dev/sg1 The tape is connected via SAS to a ML150G9 and shared with ISCSI over 1Gbit Ethernet root@dl385 ~]# targetcli ls o- / ..................................................................... [...] o- backstores .......................................................... [...] | o- block .............................................. [Storage Objects: 0] | o- fileio ............................................. [Storage Objects: 0] | o- pscsi .............................................. [Storage Objects: 1] | | o- tape0 ............................................ [/dev/st0 activated] | | o- alua ............................................... [ALUA Groups: 0] | o- ramdisk ............................................ [Storage Objects: 0] o- iscsi ........................................................ [Targets: 1] | o- iqn.2003-01.org.linux-iscsi.dl385.x8664:sn.f4fb265e42ba ....... [TPGs: 1] | o- tpg1 ........................................... [no-gen-acls, no-auth] | o- acls ...................................................... [ACLs: 1] | | o- iqn.1994-05.com.redhat:9aab1462433f .............. [Mapped LUNs: 1] | | o- mapped_lun0 ............................. [lun0 pscsi/tape0 (rw)] | o- luns ...................................................... [LUNs: 1] | | o- lun0 .............................. [pscsi/tape0 (/dev/st0) (None)] | o- portals ................................................ [Portals: 1] | o- 0.0.0.0:3260 ................................................. [OK] o- loopback ..................................................... [Targets: 0] o- srpt ......................................................... [Targets: 0] It is here at my house so what would you like to capture etc. We can install an upstream kernel if needed. Regards Laurence Rest by other initiator Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: a 0 0 28 0 0 Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Error on write: Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Number of r/w requests 35913, dio used in 35913, pages 107739. Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Async write waits 0, finished 0. Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 10 0 0 0 1 0 Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Error on write filemark. Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Buffer flushed, 1 EOF(s) written Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Rewinding tape. Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 1 0 0 0 0 0 Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress Tape powered off here Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: a 0 0 28 0 0 Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : Not Ready [current] Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Add. Sense: Logical unit communication failure Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Error on write: Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Number of r/w requests 27982, dio used in 27982, pages 83946. Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Async write waits 0, finished 0. Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 10 0 0 0 1 0 Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : Not Ready [current] Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Add. Sense: Logical unit communication failure Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Error on write filemark. Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Buffer flushed, 1 EOF(s) written Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Rewinding tape. Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 1 0 0 0 0 0 Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : Not Ready [current] Next steps capture a wireshark trace on the initiator where it fails and match the trace to the logging. wireshark/tcpdump of this sequence Jul 27 15:58:51 iscsi2rhel9 kernel: device ens160 entered promiscuous mode Jul 27 15:59:12 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Block limits 1 - 16777215 bytes. Jul 27 15:59:12 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Mode sense. Length 11, medium 0, WBS 10, BLL 8 Jul 27 15:59:12 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Density 58, tape length: 0, drv buffer: 1 Jul 27 15:59:12 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Block size: 0, buffer size: 4096 (1 blocks). Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: a 0 0 28 0 0 Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error on write: Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Number of r/w requests 6045, dio used in 6045, pages 18135. Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Async write waits 0, finished 0. Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 10 0 0 0 1 0 Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error on write filemark. Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Buffer flushed, 1 EOF(s) written Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Rewinding tape. Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 1 0 0 0 0 0 Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress Jul 27 15:59:31 iscsi2rhel9 kernel: device ens160 left promiscuous mode Opcode 0x10 is Synchronize buffer in below trace See attachments screenshots.tar.xz has the screenshots from WireShark taken on the initiator doing the writes I will add a people page link for the pcap file ens160.pcap https://people.redhat.com/loberman/customer/.1930388/ Second attempt no wireshark trace Resetting device using sg_reset from other initiator Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: a 0 0 28 0 0 Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error on write: Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Number of r/w requests 104974, dio used in 104974, pages 314922. Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Async write waits 0, finished 0. Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 10 0 0 0 1 0 Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error on write filemark. Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Buffer flushed, 1 EOF(s) written Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Rewinding tape. Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 1 0 0 0 0 0 Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress Repeating the original description to tie it in with what we are seeing Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: a 0 0 28 0 0 Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error on write: Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Number of r/w requests 6045, dio used in 6045, pages 18135. Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Async write waits 0, finished 0. Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 10 0 0 0 1 0 Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error on write filemark. ***** Failed here Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Buffer flushed, 1 EOF(s) written Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Rewinding tape. Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 1 0 0 0 0 0 Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress 31890 15:59:20.574462 192.168.1.146 → 192.168.1.143 iSCSI 10306 SCSI: Write(6) LUN: 0x00 (LBA: 0x000028, Len: 0)SCSI: Data Out LUN: 0x00 (Write(6) Request Data) 31893 15:59:20.575448 192.168.1.143 → 192.168.1.146 iSCSI 114 SCSI: Response LUN: 0x00 (Write(6)) (Good) 31895 15:59:20.575711 192.168.1.146 → 192.168.1.143 iSCSI 10306 SCSI: Write(6) LUN: 0x00 (LBA: 0x000028, Len: 0)SCSI: Data Out LUN: 0x00 (Write(6) Request Data) 31898 15:59:20.576752 192.168.1.143 → 192.168.1.146 iSCSI 114 SCSI: Response LUN: 0x00 (Write(6)) (Good) 31900 15:59:20.577061 192.168.1.146 → 192.168.1.143 iSCSI 10306 SCSI: Write(6) LUN: 0x00 (LBA: 0x000028, Len: 0)SCSI: Data Out LUN: 0x00 (Write(6) Request Data) reset from other initiator here 31903 15:59:20.579732 192.168.1.143 → 192.168.1.146 iSCSI 214 SCSI Response (Check Condition) LUN:0x00 31904 15:59:20.579947 192.168.1.146 → 192.168.1.143 iSCSI 114 SCSI Command: 0x10 LUN:0x00 *** 10 SYNCHRONIZE BUFFER 31907 15:59:20.581635 192.168.1.143 → 192.168.1.146 iSCSI 214 SCSI Response (Check Condition) LUN:0x00 31908 15:59:20.581734 192.168.1.146 → 192.168.1.143 iSCSI 114 SCSI Command: 0x01 LUN:0x00 *** 01 M REWIND 31909 15:59:20.583450 192.168.1.143 → 192.168.1.146 iSCSI 214 SCSI Response (Check Condition) LUN:0x00 31926 15:59:25.639608 192.168.1.146 → 192.168.1.143 iSCSI 114 NOP Out 31927 15:59:25.641052 192.168.1.143 → 192.168.1.146 iSCSI 114 NOP In I dont see a Unit attention in the trace, just the check condition, going to put SCSI debug on and test again tomorrow Per AWS support, when the gateway crashes in this manner, its notion of the current tape position is reset to the beginning of the tape. It also sets a unit attention condition, such that the next request results in a CHECK CONDITION status with sense key UNIT ATTENTION and asc/ascq indicating a device reset. According to their logs and our own, separate analysis, the next command being sent is WRITE FILEMARK, which results in writing an FM at the beginning of the tape, effectively discarding its contents. In fact, once the write fails with EIO, our software attempts to recover by rewinding and repositioning the tape, then resuming operation. If this fails, it attempts to rewind and reposition again, write a marker at the end of the tape, and then unmount. It does not under any circumstances write either data or filemarks without having successfully positioned the tape to a known point. What actually happens is that, since the last operation was a write, the kernel executes an implied MTWEOF operation (which translate to a Write Filemarks command) before the rewind that was actually requested. This seems not entirely unreasonable, provided the tape position is known. However, once this request fails (due to the unit attention condition), our next rewind attempt also triggers an implied MTWEOF, which does _not_ fail (the unit attention condition persists only until the initiator has been notified); this is the command that unexpectedly erases the tape. Our analysis is that the st driver is in fact completely ignoring the UNIT ATTENTION and associated reset notification from the device. This is not a condition that can be detected in the transport or mid-layer, as it occurs entirely within the target and is reported only via the UNIT ATTENTION sense key. The upper driver (i.e. st) needs to detect this indication and reset its internal model of the device to an unknown state. Typical tape stuff that should apply to all tapes Table 1 gives the ASC and ASCQ summary for Sense Key 6 (Unit Attention) in the 3584 Tape Library. SENSE key 6 is UA Table 1. ASC and ASCQ Summary for Sense Key 6 (Unit Attention) ASC ASCQ Description 28 00 Not Ready to Ready Transition, Medium May Have Changed 28 01 Import or Export Element Accessed 29 00 Power On, Reset, or Bus Device Reset Occurred 2A 01 Mode Parameters Changed 3F 01 Microcode Has Been Changed Added scsi_logging Did not do any wireshark trace as its not showing anything extra I am thinking we have not reproduced enough of this to make sense of it. I mean we send a reset and there is data at the BOT until th reset happened so we are not seeing a write-file-mark at BOT like The customer sees -rwxr-xr-x root/root 30472 2021-08-10 00:27 usr/bin/iptc -rwxr-xr-x root/root 16040 2021-08-10 19:41 usr/bin/dltest -rwxr-xr-x root/root 199672 2021-08-10 12:47 usr/bin/patch -rwxr-xr-x root/root 40952 2021-08-10 19:41 usr/bin/isql -rwxr-xr-x root/root 132976 2021-10-18 09:56 usr/bin/pigz -rwxr-xr-x root/root 9926208 2022-10-13 09:46 usr/bin/runc tar: /dev/st0: Cannot read: Input/output error tar: /dev/st0: Cannot read: Input/output error tar: /dev/st0: Cannot read: Input/output error tar: /dev/st0: Cannot read: Input/output error tar: /dev/st0: Cannot read: Input/output error tar: /dev/st0: Cannot read: Input/output error Where is Sense Key 6 Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#111 CDB: Write(6) 0a 00 00 28 00 00 Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#111 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#111 CDB: Write(6) 0a 00 00 28 00 00 Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#111 scsi host busy 1 failed 0 Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: Notifying upper driver of completion (result 0) Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#111 20 sectors total, 10240 bytes done. Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#96 Send: scmd 0x00000000f79ed774 Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#96 CDB: Write(6) 0a 00 00 28 00 00 Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#96 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#96 CDB: Write(6) 0a 00 00 28 00 00 Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#96 Sense Key : No Sense [current] Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#96 Add. Sense: Rewind operation in progress Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#96 scsi host busy 1 failed 0 Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: Notifying upper driver of completion (result 2) Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#96 20 sectors total, 10240 bytes done. Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: a 0 0 28 0 0 Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error on write: Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Number of r/w requests 2478, dio used in 2478, pages 7434. Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Async write waits 0, finished 0. Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#97 Send: scmd 0x000000000cfe376e Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#97 CDB: Write Filemarks 10 00 00 00 01 00 Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#97 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#97 CDB: Write Filemarks 10 00 00 00 01 00 Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#97 Sense Key : No Sense [current] Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#97 Add. Sense: Rewind operation in progress Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#97 scsi host busy 1 failed 0 Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: Notifying upper driver of completion (result 2) Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#97 0 sectors total, 0 bytes done. Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 10 0 0 0 1 0 Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error on write filemark. Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Buffer flushed, 1 EOF(s) written Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Rewinding tape. Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#98 Send: scmd 0x0000000029d3c08e Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#98 CDB: Rezero Unit/Rewind 01 00 00 00 00 00 Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#98 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#98 CDB: Rezero Unit/Rewind 01 00 00 00 00 00 Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#98 Sense Key : No Sense [current] Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#98 Add. Sense: Rewind operation in progress Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#98 scsi host busy 1 failed 0 Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: Notifying upper driver of completion (result 2) Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#98 0 sectors total, 0 bytes done. Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 1 0 0 0 0 0 Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress Thanks Laurence. Here's what I think we've learned. The tape drive we are using is not returning proper sense data. It looks like it isn't returning any kind of Unit Attention at all. However, what is is doing is: rewinding the tape. This is good because that's exactly what the customer is seeing with the AWS gateway: after a reset or power up event: 1) the tape rewinds and 2) a Unit Attention 0x29 is returned. I suspect that the AWS gateway has properly implemented Unit Attentions, according to the SCSI specification - while the tape drive is not. Comments below. (In reply to loberman from comment #33) > Reset by other initiator > > Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: a 0 0 28 0 > 0 This is a write command... which is what we expect. > Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : No Sense > [current] > Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind > operation in progress The write command failed and returned an additional sense code of 00h/1Ah - rewind in progress. The problem is, the Tape drive did not fill in the Key Field in the sense data. So the sense key is 0x0, No Sense. This should have been a Unit Attention sense key 0x6, 29h/03h BUS DEVICE RESET FUNCTION OCCURRED or something. I think that's a bug in the tape drive, but at least the tape is rewinding, so that's what we expect. > Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Error on write: > Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Number of r/w requests > 35913, dio used in 35913, pages 107739. > Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Async write waits 0, > finished 0. > Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 10 0 0 0 1 > 0 This is very important also. The next command is a Write Filemark. So it look like either the application, or the driver is attempting to write a file make immediately after the error. We need to figure out if this Write Filemark is coming from the application or from the driver. This is what the customer is seeing too, a write filemark after a rewind. > Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : No Sense > [current] > Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind > operation in progress The Write Filemark command fails, but that's only because the tape device is still rewinding. The Tape returns the same error: No Sense with an ASC/ASCQ of 00h/1Ah - rewind in progress. One difference between this tape and the AWS gateway is: rewinds are probably instantaneous with AWS. So I'll bet that if this were the AWS gateway, instead of this physical tape which is slowly rewinding, we would have seen the corrupted data. > Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Error on write filemark. > Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Buffer flushed, 1 EOF(s) > written Next the driver sends a Rewind command - 0x1. Again we need understand if this is the driver, or the application that's doing this. > Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Rewinding tape. > Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 1 0 0 0 0 0 > Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : No Sense > [current] > Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind > operation in progress The Rewind command also fails with No Sense and ASC/ASCQ of 00h/1Ah. So the driver is just plain clueless. It has no idea that the tape has been re-positioned already. > > Tape powered off here The power on condition is slightly different because the Tape drive is returning a different error. But the st driver looks like it's behaving the same. > Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: a 0 0 28 0 > 0 Here's the Write command, which we expect. > Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : Not Ready > [current] > Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Add. Sense: Logical unit > communication failure The tape is returning a Not Ready Sense Key (0x2) with an asc/ascq of 08h/00h LOGICAL UNIT COMMUNICATION FAILURE. This is still incorrect. This should have been a Unit Attention sense key 0x6, 29h/00h POWER ON, RESET, OR BUS DEVICE RESET OCCURRED. It's not reporting anything about a rewind or tape position error. We should probably double check and see if the tape has physically rewound when you power cycled. It should have. > Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Error on write: > Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Number of r/w requests > 27982, dio used in 27982, pages 83946. > Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Async write waits 0, > finished 0. The driver or the application turns around and does the same thing: tries to write a file make. If this is really the driver, it should NOT be doing this. Write Filemarks after an unknown error should not be done. > Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 10 0 0 0 1 > 0 > Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : Not Ready > [current] > Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Add. Sense: Logical unit > communication failure Again, the tape is going slow, and returning the same Not Ready condition. If this were the AWS gateway, or some other software emulated SCSI Tape target, this command would have succeeded. > Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Error on write filemark. > Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Buffer flushed, 1 EOF(s) > written And the driver, or something, tries to Rewind the tape again. > Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Rewinding tape. > Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 1 0 0 0 0 0 > Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : Not Ready > [current] I think a wireshark trace would be helpful, but the information we have so far has given me a lot to work with. Thanks Excellent, Always happy to try help Engineering when possible I have multiple wireshark traces from before, just not the last one, Let me redo the test with all logging on and do a trace on initiator and target I will summarize it all and report back later I do expect a UA from this tape device for a media error or when media is inserted though. I will check that to make sure generic UA's are sent and maybe just not for the rewind. Regards Laurence (In reply to loberman from comment #41) > Excellent, Always happy to try help Engineering when possible > > I have multiple wireshark traces from before, just not the last one, > Let me redo the test with all logging on and do a trace on initiator and > target. I will summarize it all and report back later No need Laurence, if you can just get me the .pcap files I'd like to look at the trace myself. > I do expect a UA from this tape device for a media error or when media is > inserted though. Yes, this tape drive is definitely NOT returning Unit Attentions when it should. If you can verify that ejecting media returns a unit attention, that would be good. > I will check that to make sure generic UA's are sent and maybe just not for > the rewind. Agreed. Thanks. > Per AWS support, when the gateway crashes in this manner, its notion of the > current tape position is reset to the beginning of the tape. It also sets a > unit attention condition, such that the next request results in a CHECK > CONDITION status with sense key UNIT ATTENTION and asc/ascq indicating a > device reset. According to their logs and our own, separate analysis, the > next command being sent is WRITE FILEMARK, which results in writing an FM at > the beginning of the tape, effectively discarding its contents. This is what we are seeing for ANY unknown error. Not just a Unit Attention. This a problem with the st.c driver. It should not be writing file marks after error. I will look at this further and see if I can come up with a patch. I think the customers patch would help with AWS, but I think there is also a fundamental problem here in that the driver should not be writing filemarks following an error... any error. Hi John See comment https://bugzilla.redhat.com/show_bug.cgi?id=1930388#c35 I had already provided pcap and other notes Set needinfo so you see this server, and both clients running 4.18.0-425.13.1.el8.stfix.x86_64 Has the patch [loberman@lobefedora wrk]$ cat linux-kernel-test.patch diff -Nurp linux-4.18.0-425.13.1.el8_7.orig/drivers/scsi/st.c linux-4.18.0-425.13.1.el8_7/drivers/scsi/st.c --- linux-4.18.0-425.13.1.el8_7.orig/drivers/scsi/st.c 2023-02-02 11:42:46.000000000 -0500 +++ linux-4.18.0-425.13.1.el8_7/drivers/scsi/st.c 2023-08-21 11:25:15.677487154 -0400 @@ -419,6 +419,13 @@ static int st_chk_result(struct scsi_tap cmdstatp->sense_hdr.asc == 0 && cmdstatp->sense_hdr.ascq == 0x17) STp->cleaning_req = 1; /* ASC and ASCQ => cleaning requested */ + if (cmdstatp->have_sense && scode == NOT_READY && cmdstatp->sense_hdr.asc == 0x08 && cmdstatp->sense_hdr.ascq == 0) + STp->pos_unknown = 1; /* ASC => Logical Unit Commncation Failure */ + if (cmdstatp->have_sense && cmdstatp->sense_hdr.asc == 0 && cmdstatp->sense_hdr.ascq == 0x1a) + STp->pos_unknown = 1; /* ASCQ => rewind in progress */ + if (cmdstatp->have_sense && cmdstatp->sense_hdr.asc == 0x29) + STp->pos_unknown = 1; /* ASC => power on / reset */ + STp->pos_unknown |= STp->device->was_reset; if (cmdstatp->have_sense && client doing the tar writing to tape Aug 21 13:33:57 rhel8iscsi1 kernel: st 32:0:0:0: Attached scsi tape st0 Aug 21 13:33:57 rhel8iscsi1 kernel: st 32:0:0:0: st0: try direct i/o: yes (alignment 1 B) Aug 21 13:34:04 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Block limits 1 - 16777215 bytes. Aug 21 13:34:04 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Mode sense. Length 11, medium 0, WBS 10, BLL 8 Aug 21 13:34:04 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Density 58, tape length: 0, drv buffer: 1 Aug 21 13:34:04 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Block size: 0, buffer size: 4096 (1 blocks). Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Error: 8000002, cmd: a 0 0 28 0 0 Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Sense Key : No Sense [current] Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Add. Sense: Rewind operation in progress Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Error on write: Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Number of r/w requests 1624, dio used in 1624, pages 4872. Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Rewinding tape. Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Error: 8000002, cmd: 1 0 0 0 0 0 Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Sense Key : No Sense [current] Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Add. Sense: Rewind operation in progress .. .. /usr/bin/rview /usr/bin/zcmp /usr/bin/nl-link-stats /usr/bin/dltest tar: /dev/st0: Cannot write: Input/output error tar: /dev/st0: Cannot close: Input/output error tar: Error is not recoverable: exiting now Other client doing the sg_reset while writes are happening from rhel8iscsi1 Aug 21 13:33:45 rhel8iscsi2 kernel: st: Debugging enabled debug_flag = 1 Aug 21 13:33:45 rhel8iscsi2 kernel: st 32:0:0:0: Attached scsi tape st0 Aug 21 13:33:45 rhel8iscsi2 kernel: st 32:0:0:0: st0: try direct i/o: yes (alignment 1 B) Aug 21 13:34:08 rhel8iscsi2 kernel: st 32:0:0:0: [st0] Block limits 1 - 16777215 bytes. Aug 21 13:34:08 rhel8iscsi2 kernel: st 32:0:0:0: [st0] Mode sense. Length 11, medium 0, WBS 10, BLL 8 Aug 21 13:34:08 rhel8iscsi2 kernel: st 32:0:0:0: [st0] Density 58, tape length: 0, drv buffer: 1 Aug 21 13:34:08 rhel8iscsi2 kernel: st 32:0:0:0: [st0] Block size: 0, buffer size: 4096 (1 blocks). Aug 21 13:34:08 rhel8iscsi2 kernel: st 32:0:0:0: [st0] Rewinding tape. [root@rhel8iscsi2 ~]# sg_reset /dev/st0 [root@rhel8iscsi2 ~]# Let me know what you see and think about this Thanks Laurence (In reply to loberman from comment #45) > Aug 21 13:33:57 rhel8iscsi1 kernel: st 32:0:0:0: Attached scsi tape st0 > Aug 21 13:33:57 rhel8iscsi1 kernel: st 32:0:0:0: st0: try direct i/o: yes > (alignment 1 B) > Aug 21 13:34:04 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Block limits 1 - > 16777215 bytes. > Aug 21 13:34:04 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Mode sense. Length > 11, medium 0, WBS 10, BLL 8 > Aug 21 13:34:04 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Density 58, tape > length: 0, drv buffer: 1 > Aug 21 13:34:04 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Block size: 0, buffer > size: 4096 (1 blocks). > Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Error: 8000002, cmd: > a 0 0 28 0 0 This is the write command. We expect to see this because the application is writing to the tape. > Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Sense Key : No Sense > [current] > Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Add. Sense: Rewind > operation in progress And here is the sense data indicating the write command failed because the tape drive is rewinding. This tape device is NOT correctly returning a Unit Attention or any other Sense Key, but our patch anticipates this handles the NOSENSE condition by setting the pos_unknown state in the driver. This should be safe with any tape drive because the result of a bus device reset should always rewind the tape. > Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Error on write: > Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Number of r/w > requests 1624, dio used in 1624, pages 4872. Here's the error sent to the application. This is expected. > Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Rewinding tape. > Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Error: 8000002, cmd: > 1 0 0 0 0 0 This is a rewind command. The driver responds by sending a rewind command to the tape. > Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Sense Key : No Sense > [current] > Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Add. Sense: Rewind > operation in progress OK. This is good. We see the tape drive fails the rewind command with the same No Sense/Rewind status. The important part is, setting the pos_unknown state in the driver does, in fact, prevent the Write Filemarks command. We see no interposing Write File Marks command, and this is what we want. > .. > .. > /usr/bin/rview > /usr/bin/zcmp > /usr/bin/nl-link-stats > /usr/bin/dltest > tar: /dev/st0: Cannot write: Input/output error > tar: /dev/st0: Cannot close: Input/output error > tar: Error is not recoverable: exiting now The application fails, but the data on the tape should be intact up to the last valid filemark. So at this point, some application level recover should be possible by opening the tape device and seaching for the last valie filemark. This is what I wanted to see. Excellent Thanks for all your work on this. I guess an upstream submission by you then may follow. Regards Laurence Oberman (In reply to loberman from comment #47) > Excellent > I guess an upstream submission by you then may follow. Yes, I will put together a patch to send up stream. It will basically be the patch proposed by the customer. At this point I'm convinced the proposed patch, which catches the Unit Attention/Reset condition and sets pos_unknown in the driver is safe. However, we need the customer to be aware that this may not completely solve their problem. As mentioned above, when the AWS gateway experiences a crash and then restarts, I expect the tape backup application running on the client will see an error. The data on the tape should still be valid up to the most recently written filemark, but there may be no valid EOM/filemark at the end of the tape. It will take some addition work by the application to find the end of any valid data on the tape device and recover before restarting any backup that was in progress. So, we can push the patch upstream and take the fix into RHEL, but it is still TBD if the customers problem will be completely solved. I can email the customer and let him know the situation. However I think we must add to the upstream submission Suggested-by: Jeffrey Hutzelman <jhutz) The patch that addressed this issue in the st driver has been accepted upstream. https://lore.kernel.org/linux-scsi/20230822181413.1210647-1-jmeneghi@redhat.com/ Targeting this fix for RHEL 8.10 Marco, this is for RHEL 8.10. Please set qa_ack+. Laurence, we as no real way to test this. Please see if the customer can test this build for us. I'm setting OtherQA on this BZ. Seems customer is on 7, not 8 yest but I will reach out and see if Jeff can do anything to test this. We tested the functionality but we dot have access to these Amazon VTL tape servers. I attached a RHEL7.9 test kernel to the customer case, and asked if the customer had time to test it. Thanks Laurence I'm moving this but to jira. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:3138 |