Bug 1930388

Summary:

[RHEL8] st driver unit attention behavior over iSCSI

Product:

Red Hat Enterprise Linux 8

Reporter:

Alex Wang <alex.wang>

Component:

kernel

Assignee:

John Meneghini <jmeneghi>

kernel sub component:

Storage Drivers

QA Contact:

Martin Hoyer <mhoyer>

Status:

CLOSED ERRATA

Docs Contact:

Severity:

high

Priority:

high

CC:

ccalhoun, cleech, jmagrini, jmeneghi, jpittman, loberman, mgandhi, mhoyer, mpatalan, revers

Version:

8.3

Keywords:

MigratedToJIRA, Reopened, Triaged

Target Milestone:

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

kernel-4.18.0-553.el8_10

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Clones:

2210024 (view as bug list)

Environment:

Last Closed:

2024-05-22 09:48:46 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

2210024

Attachments:

Description	Flags
Wireshark Expert Summary	none
Wireshark command summary	none
Wireshark SCSI Unit Attention	none

Comment 13 loberman 2021-12-14 19:39:06 UTC

Chris

I managed to get this working with ESOS and a physical tape drive.

I will work on this now, may need help though if I get stuck.
We can get protocol traces using tcpdump

[7:0:0:0]    tape    HP       Ultrium 5-SCSI   Z64D  /dev/st0

target: iqn.2003-01.org.linux-iscsi.trace.x8664:sn.a59f99ea082b
	Portal: 192.168.1.107:3260,1
		Iface Name: default
Target: iqn.2021-12.esos.localhost:89136
	Portal: 192.168.1.178:3260,1
		Iface Name: default

[root@loberhel ~]# iscsiadm -m session -P3
iSCSI Transport Class version 2.0-870
version 6.2.0.874-19
Target: iqn.2021-12.esos.localhost:89136 (non-flash)
	Current Portal: 192.168.1.178:3260,1
	Persistent Portal: 192.168.1.178:3260,1
		**********
		Interface:
		**********
		Iface Name: default
		Iface Transport: tcp
		Iface Initiatorname: iqn.1994-05.com.redhat:29d829ee4a97
		Iface IPaddress: 192.168.1.214
		Iface HWaddress: <empty>
		Iface Netdev: <empty>
		SID: 3
		iSCSI Connection State: LOGGED IN
		iSCSI Session State: LOGGED_IN
		Internal iscsid Session State: NO CHANGE
		*********
		Timeouts:
		*********
		Recovery Timeout: 120
		Target Reset Timeout: 30
		LUN Reset Timeout: 30
		Abort Timeout: 15
		*****
		CHAP:
		*****
		username: <empty>
		password: ********
		username_in: <empty>
		password_in: ********
		************************
		Negotiated iSCSI params:
		************************
		HeaderDigest: None
		DataDigest: None
		MaxRecvDataSegmentLength: 262144
		MaxXmitDataSegmentLength: 1048576
		FirstBurstLength: 65536
		MaxBurstLength: 1048576
		ImmediateData: Yes
		InitialR2T: No
		MaxOutstandingR2T: 1
		************************
		Attached SCSI devices:
		************************
		Host Number: 7	State: running
		scsi7 Channel 00 Id 0 Lun: 0

Comment 14 loberman 2021-12-15 17:15:55 UTC

Customers current workaround

(*) It turns out that an otherwise-innocuous MTIOCTOP with opcode MTTELL, while not actually supported by st, will reset STps->rw to ST_IDLE, preventing the automatic filemark WEOF that leads to the problem. This itself is probably a bug, but we see no shame in exploiting it to get the behavior we need.

Comment 17 loberman 2022-08-09 15:00:59 UTC

I am trying to help now 

I have a device ready now served by another server connected to a Ultrium tape drive

[  388.884193] iscsi: registered transport (tcp)
[  465.744253] scsi host15: iSCSI Initiator over TCP/IP
[  494.465017] scsi host15: iSCSI Initiator over TCP/IP
[  494.504173] scsi 15:0:0:0: Sequential-Access HP       Ultrium 5-SCSI   Z64D PQ: 0 ANSI: 6
[  494.546018] scsi 15:0:0:0: alua: disable for non-disk devices
[  494.573621] scsi 15:0:0:0: Attached scsi generic sg5 type 1
[  494.601449] st: Version 20160209, fixed bufsize 32768, s/g segs 256
[  494.632130] st 15:0:0:0: Attached scsi tape st0
[  494.653792] st 15:0:0:0: st0: try direct i/o: yes (alignment 1 B)

[15:0:0:0]   tape    HP       Ultrium 5-SCSI   Z64D  /dev/st0   /dev/sg5

Comment 18 loberman 2022-08-09 15:14:30 UTC

Reproducer

Start a backup to the iscsi tape
Down the network interface on the target server and wait for the initiator to time out

ifconfig eno1 down

/usr/lib/debug/usr/lib/modules/4.18.0-305.30.1.el8_4.x86_64/kernel/drivers/mfd/
/usr/lib/debug/usr/lib/modules/4.18.0-305.30.1.el8_4.x86_64/kernel/drivers/mfd/intel-lpss-acpi.ko.debug
/usr/lib/debug/usr/lib/modules/4.18.0-305.30.1.el8_4.x86_64/kernel/drivers/mfd/intel-lpss-pci.ko.debug
tar: /dev/st0: Cannot write: Input/output error

tar: Error is not recoverable: exiting now

ifconfig eno1 up

I hear the tape rewind

List the contents and they are there so in this case I never got the issue reproduced
This issue "our next rewind attempt also triggers an implied MTWEOF, which does _not_ fail"

[root@ml150 ~]# tar tvf /dev/st0
drwxr-xr-x root/root         0 2021-11-11 16:04 usr/
dr-xr-xr-x root/root         0 2022-07-27 12:27 usr/bin/
-rwxr-xr-x root/root     71504 2019-11-19 09:21 usr/bin/diff3
-rwxr-xr-x root/root      3283 2022-06-08 15:13 usr/bin/catchsegv
-rwxr-xr-x root/root     54768 2019-11-19 09:21 usr/bin/sdiff
-rwxr-xr-x root/root     25424 2022-06-08 15:25 usr/bin/gencat
-rwxr-xr-x root/root    163416 2021-09-23 08:56 usr/bin/cpio


From the BZ

Per AWS support, when the gateway crashes in this manner, its notion of the current tape position is reset to the beginning of the tape. It also sets a unit attention condition, such that the next request results in a CHECK CONDITION status with sense key UNIT ATTENTION and asc/ascq indicating a device reset. According to their logs and our own, separate analysis, the next command being sent is WRITE FILEMARK, which results in writing an FM at the beginning of the tape, effectively discarding its contents.

In fact, once the write fails with EIO, our software attempts to recover by rewinding and repositioning the tape, then resuming operation. If this fails, it attempts to rewind and reposition again, write a marker at the end of the tape, and then unmount. It does not under any circumstances write either data or filemarks without having successfully positioned the tape to a known point.

What actually happens is that, since the last operation was a write, the kernel executes an implied MTWEOF operation (which translate to a Write Filemarks command) before the rewind that was actually requested. This seems not entirely unreasonable, provided the tape position is known.  However, once this request fails (due to the unit attention condition), our next rewind attempt also triggers an implied MTWEOF, which does _not_ fail (the unit attention condition persists only until the initiator has been notified); this is the command that unexpectedly erases the tape.

I will keep at this but it will probably need code inspection review etc. to figure out unless we can get a reproducer from the customer to use.

Its been so long, I am not keen to reach back out the customer for a reproducer.

Regards
Laurence

Comment 19 loberman 2022-08-09 20:55:52 UTC

Multiple attempts using tar to reproduce this have not worked

The last attempt included stopping and then restarting the target daemon on the target server but still never resulted in the MTWEOF being sent such that the beginning of the tape was compromised.

We will need to be doing what their backup software was doing to reproduce this I think. 

[root@dl385g10 ~]# systemctl stop target

/usr/bin/mysql_waitpid
/usr/bin/mysqlaccess
/usr/bin/mysqladmin
/usr/bin/mysqlbinlog
/usr/bin/mysqlcheck
/usr/bin/mysqldump
tar: /dev/nst0: Cannot write: Input/output error
tar: Error is not recoverable: exiting now

After waiting a while and restarting the target daemon (iscsit)

tar: Error is not recoverable: exiting now

Then list the tape, I hear it rewind, we are at BOT and data is still intact

[root@loberman ~]# tar tvf /dev/st0
drwxr-xr-x root/root         0 2021-06-28 14:41 usr/
drwxr-xr-x root/root         0 2022-05-13 09:11 usr/bin/
-rwxr-xr-x root/root      2161 2016-03-04 07:56 usr/bin/xzmore
-rwxr-xr-x root/root      3336 2021-03-26 08:45 usr/bin/catchsegv
-rwxr-xr-x root/root    199288 2018-05-03 11:50 usr/bin/find
-rwxr-xr-x root/root    154808 2014-08-15 00:05 usr/bin/m4
-rwxr-xr-x root/root     22640 2021-03-26 09:15 usr/bin/gencat

Comment 20 John Meneghini 2022-08-10 03:07:18 UTC

(In reply to loberman from comment #18)
> 
> Per AWS support, when the gateway crashes in this manner, its notion of the
> current tape position is reset to the beginning of the tape. It also sets a
> unit attention condition, such that the next request results in a CHECK
> CONDITION status with sense key UNIT ATTENTION and asc/ascq indicating a
> device reset. According to their logs and our own, separate analysis, the
> next command being sent is WRITE FILEMARK, which results in writing an FM at
> the beginning of the tape, effectively discarding its contents.

Yes, agreed.

I've looked at the provided .pcap trace with Wireshark.  The problem is a number of Malformed Packet
errors led to a TCP RST.  This had the effect of an iSCSI logout, which is a nexus clearing event. 
This resulted in SCSI Target device doing in internal LUN reset.  There is no way the SCSI Target
can be expected to preserve any internal state following the LUN reset. 

After the host logs in again the first unit attention advises the initiator that the device has
been reset and the tape should be re-positioned.  There is no way to continue at the same position
following a reset.  This has nothing to do with iSCSI, the SCSI protocols dictates that the tape must
be re-positioned following a reset.
   
> In fact, once the write fails with EIO, our software attempts to recover by
> rewinding and repositioning the tape, then resuming operation. If this
> fails, it attempts to rewind and reposition again, write a marker at the end
> of the tape, and then unmount. It does not under any circumstances write
> either data or filemarks without having successfully positioned the tape to
> a known point.

Yes, this is what it should do.

> What actually happens is that, since the last operation was a write, the
> kernel executes an implied MTWEOF operation (which translate to a Write
> Filemarks command) before the rewind that was actually requested. This seems
> not entirely unreasonable, provided the tape position is known.  However,
> once this request fails (due to the unit attention condition), our next
> rewind attempt also triggers an implied MTWEOF, which does _not_ fail (the
> unit attention condition persists only until the initiator has been
> notified); this is the command that unexpectedly erases the tape.

Following a reset many tape devices will immediately rewind.  So writing a filemark without knowing
the position of the tape is an error.  The initiator should first rewind and then forward to the end of
the tape to find the last valid file mark.  I agree that these implied MTWEOF operations coming from
the kernel are probably the cause of the unexpected erase.
  
> I will keep at this but it will probably need code inspection review etc. to
> figure out unless we can get a reproducer from the customer to use.
> 
> Its been so long, I am not keen to reach back out the customer for a
> reproducer.

Agreed. If you have the ability to reproduce this, we should probably work on the MTWEOF code. These implied
write filemarks operations are probably the source of the problem.

See the attached Wireshark images.

Comment 21 John Meneghini 2022-08-10 03:10:27 UTC

Created attachment 1904603 [details]
Wireshark Expert Summary

Comment 22 John Meneghini 2022-08-10 03:11:21 UTC

Created attachment 1904604 [details]
Wireshark command summary

Comment 23 John Meneghini 2022-08-10 03:12:32 UTC

Created attachment 1904605 [details]
Wireshark SCSI Unit Attention

Comment 24 loberman 2022-08-10 13:13:59 UTC

Yep, its not iscsi, the customer specifically points out behavior in the st driver.

The WireShark has the customer comments you can click on to see each event they speak of.
Whatever is creating the MTWEOF as an un-flushed pending command to the tape does not happen with the test setup I have.

I am looking writing a test program that directly communicated with the tape device and issues commands in the order the customer describes.

This may be a combination of behavior in that AWS VTL and the customers backup (AFS) application.

Regards
Laurence

Comment 25 loberman 2022-08-25 16:11:39 UTC

Customer Update
--------------------

I'm sorry; I don't have a test program per se. I'm attaching the relevant portions of the backup software, which may be helpful in following my earlier description of the problem and understanding the sequence of operations. The bulk of the work happens in tapeappend(), dumptotape(), and the support functions in tapedrive.c. You may also be interested in the sections protected by #ifdef MTTELL, which are the workaround we applied to avoid the problem.

Taking down the network interface on the target server is not going to trigger this problem. That results in a timeout generated in the iscsi stack on the initiator, but doesn't actually change the state of the target. In order to trigger the problem, you need the following sequence of events:
1) iSCSI communication is interrupted, e.g. because the target process crashed
2) the target resets its notion of the tape position to the beginning of the tape
3) the target responds to the next command with CHECK CONDITION / UNIT ATTENTION / device reset

In the test case you describe, all you are doing is causing (1) by temporarily taking down the network interface. Since the target itself is still fully operational, the system recovers (albeit after generating EIO) once network connectivity is restored.

If I understand correctly, your test case involves a software iscsi target fronting for a physical tape drive. In that case, even if the target process crashes, it seems unlikely that the drive is actually rewound, so the unwanted EOF (if written) will not be at the beginning of the tape. Our targets are virtual tape drives; the state is presumably stored entirely in memory and certainly not in hardware that doesn't magically rewind when the process crashes.

Item (3) is also essential, because the bug is that the st driver is ignoring the UNIT ATTENTION condition entirely, and therefore does not realize that its model of the state of the tape drive may no longer be correct. If, after a crash or similar situation, the target does _not_ respond to the next command with a UNIT ATTENTION condition, then there is no way for the initiator to know anything is wrong. That would be a (probable) bug in the target, but it's not what we're seeing.

Our test rig involved an AWS virtual tape library using their Storage Gateway product. The library is an on-premises VM which communicates with the AWS-hosted storage from behind a firewall, via an HTTP proxy we control. Testing consisted of running a copy of our software against one of those drives, with the proxy configured to inject failures into the VTL appliance's connections to AWS (_not_ into the iSCSI connection). This eventually causes the iSCSI target process to fail in the manner described in the original ticket, thereby reproducing the problem.

I did try restarting the target to see if that made a difference but I don't have one of these Amazon VTL's.

I updated the customer as follows:

Hello Jeff

Many thanks for the details.
I did restart the iscsi target service as well to try reproduce, i.e. not interrupt the network.
I think its the fact that have a physical-backed tape drive perhaps that is preventing a reproducer.
I do have a way to use mhvtl but I suppose what we really need is to get access to AWS and a virt guest with the AWS type tape devices.

I will keep you updated, but I expect engineering will deal with this via code review rather than a reproducer and then provide a test you can run yourself.

Sincerely
Laurence Oberman

My update to them was in response to their detailed update. I will attach the two c code files they attached to the case for review.

Comment 28 John Pittman 2023-03-14 23:04:30 UTC

Chris, can we at least get an explanation on the close here?  The customer put in a lot of effort on this and deserves an explanation.

Comment 32 loberman 2023-07-26 14:36:34 UTC

Hello John

I have this setup 
Client (iscsi initiator Virtual machine) is accessing a tape on the ISCSI server

Linux rhel9.1 5.14.0-284.18.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Wed May 31 10:39:18 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
[root@rhel9 ~]# lsscsi -g
[33:0:0:0]   tape    HP       Ultrium 5-SCSI   Z64D  /dev/st0   /dev/sg1 

The tape is connected via SAS to a ML150G9 and shared with ISCSI over 1Gbit Ethernet

root@dl385 ~]# targetcli ls
o- / ..................................................................... [...]
  o- backstores .......................................................... [...]
  | o- block .............................................. [Storage Objects: 0]
  | o- fileio ............................................. [Storage Objects: 0]
  | o- pscsi .............................................. [Storage Objects: 1]
  | | o- tape0 ............................................ [/dev/st0 activated]
  | |   o- alua ............................................... [ALUA Groups: 0]
  | o- ramdisk ............................................ [Storage Objects: 0]
  o- iscsi ........................................................ [Targets: 1]
  | o- iqn.2003-01.org.linux-iscsi.dl385.x8664:sn.f4fb265e42ba ....... [TPGs: 1]
  |   o- tpg1 ........................................... [no-gen-acls, no-auth]
  |     o- acls ...................................................... [ACLs: 1]
  |     | o- iqn.1994-05.com.redhat:9aab1462433f .............. [Mapped LUNs: 1]
  |     |   o- mapped_lun0 ............................. [lun0 pscsi/tape0 (rw)]
  |     o- luns ...................................................... [LUNs: 1]
  |     | o- lun0 .............................. [pscsi/tape0 (/dev/st0) (None)]
  |     o- portals ................................................ [Portals: 1]
  |       o- 0.0.0.0:3260 ................................................. [OK]
  o- loopback ..................................................... [Targets: 0]
  o- srpt ......................................................... [Targets: 0]

It is here at my house so what would you like to capture etc.

We can install an upstream kernel if needed.

Regards
Laurence

Comment 33 loberman 2023-07-27 19:32:10 UTC

Rest by other initiator

Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: a 0 0 28 0 0
Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] 
Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress
Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Error on write:
Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Number of r/w requests 35913, dio used in 35913, pages 107739.
Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Async write waits 0, finished 0.
Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 10 0 0 0 1 0
Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] 
Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress
Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Error on write filemark.
Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Buffer flushed, 1 EOF(s) written
Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Rewinding tape.
Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 1 0 0 0 0 0
Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] 
Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress



Tape powered off here

Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: a 0 0 28 0 0
Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : Not Ready [current] 
Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Add. Sense: Logical unit communication failure
Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Error on write:
Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Number of r/w requests 27982, dio used in 27982, pages 83946.
Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Async write waits 0, finished 0.
Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 10 0 0 0 1 0
Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : Not Ready [current] 
Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Add. Sense: Logical unit communication failure
Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Error on write filemark.
Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Buffer flushed, 1 EOF(s) written
Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Rewinding tape.
Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 1 0 0 0 0 0
Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : Not Ready [current]

Comment 34 loberman 2023-07-27 19:39:31 UTC

Next steps
capture a wireshark trace on the initiator where it fails and match the trace to the logging.

Comment 35 loberman 2023-07-27 20:15:21 UTC

wireshark/tcpdump of this sequence

Jul 27 15:58:51 iscsi2rhel9 kernel: device ens160 entered promiscuous mode
Jul 27 15:59:12 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Block limits 1 - 16777215 bytes.
Jul 27 15:59:12 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Mode sense. Length 11, medium 0, WBS 10, BLL 8
Jul 27 15:59:12 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Density 58, tape length: 0, drv buffer: 1
Jul 27 15:59:12 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Block size: 0, buffer size: 4096 (1 blocks).
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: a 0 0 28 0 0
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] 
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error on write:
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Number of r/w requests 6045, dio used in 6045, pages 18135.
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Async write waits 0, finished 0.
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 10 0 0 0 1 0
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] 
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error on write filemark.
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Buffer flushed, 1 EOF(s) written
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Rewinding tape.
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 1 0 0 0 0 0
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] 
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress
Jul 27 15:59:31 iscsi2rhel9 kernel: device ens160 left promiscuous mode


Opcode 0x10 is Synchronize buffer in below trace

See attachments
screenshots.tar.xz has the screenshots from WireShark taken on the initiator doing the writes

I will add a people page link for the pcap file ens160.pcap
https://people.redhat.com/loberman/customer/.1930388/

Comment 36 loberman 2023-07-27 20:24:15 UTC

Second attempt no wireshark trace
Resetting device using sg_reset from other initiator

Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: a 0 0 28 0 0
Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] 
Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress
Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error on write:
Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Number of r/w requests 104974, dio used in 104974, pages 314922.
Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Async write waits 0, finished 0.
Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 10 0 0 0 1 0
Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] 
Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress
Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error on write filemark.
Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Buffer flushed, 1 EOF(s) written
Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Rewinding tape.
Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 1 0 0 0 0 0
Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] 
Jul 27 16:22:48 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress

Comment 37 loberman 2023-07-27 21:13:55 UTC

Repeating the original description to tie it in with what we are seeing


Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: a 0 0 28 0 0
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] 
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error on write:
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Number of r/w requests 6045, dio used in 6045, pages 18135.
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Async write waits 0, finished 0.
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 10 0 0 0 1 0
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] 
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error on write filemark.                               ***** Failed here
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Buffer flushed, 1 EOF(s) written
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Rewinding tape.
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 1 0 0 0 0 0
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] 
Jul 27 15:59:20 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress


31890 15:59:20.574462 192.168.1.146 → 192.168.1.143 iSCSI 10306 SCSI: Write(6) LUN: 0x00 (LBA: 0x000028, Len: 0)SCSI: Data Out LUN: 0x00 (Write(6) Request Data) 
31893 15:59:20.575448 192.168.1.143 → 192.168.1.146 iSCSI 114 SCSI: Response LUN: 0x00 (Write(6)) (Good)
31895 15:59:20.575711 192.168.1.146 → 192.168.1.143 iSCSI 10306 SCSI: Write(6) LUN: 0x00 (LBA: 0x000028, Len: 0)SCSI: Data Out LUN: 0x00 (Write(6) Request Data) 
31898 15:59:20.576752 192.168.1.143 → 192.168.1.146 iSCSI 114 SCSI: Response LUN: 0x00 (Write(6)) (Good)
31900 15:59:20.577061 192.168.1.146 → 192.168.1.143 iSCSI 10306 SCSI: Write(6) LUN: 0x00 (LBA: 0x000028, Len: 0)SCSI: Data Out LUN: 0x00 (Write(6) Request Data) 

reset from other initiator here

31903 15:59:20.579732 192.168.1.143 → 192.168.1.146 iSCSI 214 SCSI Response (Check Condition) LUN:0x00 
31904 15:59:20.579947 192.168.1.146 → 192.168.1.143 iSCSI 114 SCSI Command: 0x10 LUN:0x00                       ***  10                 SYNCHRONIZE BUFFER 
31907 15:59:20.581635 192.168.1.143 → 192.168.1.146 iSCSI 214 SCSI Response (Check Condition) LUN:0x00 
31908 15:59:20.581734 192.168.1.146 → 192.168.1.143 iSCSI 114 SCSI Command: 0x01 LUN:0x00                       *** 01    M            REWIND
31909 15:59:20.583450 192.168.1.143 → 192.168.1.146 iSCSI 214 SCSI Response (Check Condition) LUN:0x00 
31926 15:59:25.639608 192.168.1.146 → 192.168.1.143 iSCSI 114 NOP Out
31927 15:59:25.641052 192.168.1.143 → 192.168.1.146 iSCSI 114 NOP In

I dont see a Unit attention in the trace, just the check condition, going to put SCSI debug on and test again tomorrow


Per AWS support, when the gateway crashes in this manner, its notion of the current tape position is reset to the beginning of the tape. It also sets a unit attention condition, such that the next request results in a CHECK CONDITION status with sense key UNIT ATTENTION and asc/ascq indicating a device reset. According to their logs and our own, separate analysis, the next command being sent is WRITE FILEMARK, which results in writing an FM at the beginning of the tape, effectively discarding its contents.

In fact, once the write fails with EIO, our software attempts to recover by rewinding and repositioning the tape, then resuming operation. If this fails, it attempts to rewind and reposition again, write a marker at the end of the tape, and then unmount. It does not under any circumstances write either data or filemarks without having successfully positioned the tape to a known point.

What actually happens is that, since the last operation was a write, the kernel executes an implied MTWEOF operation (which translate to a Write Filemarks command) before the rewind that was actually requested. This seems not entirely unreasonable, provided the tape position is known.  However, once this request fails (due to the unit attention condition), our next rewind attempt also triggers an implied MTWEOF, which does _not_ fail (the unit attention condition persists only until the initiator has been notified); this is the command that unexpectedly erases the tape.

Our analysis is that the st driver is in fact completely ignoring the UNIT ATTENTION and associated reset notification from the device. This is not a condition that can be detected in the transport or mid-layer, as it occurs entirely within the target and is reported only via the UNIT ATTENTION sense key. The upper driver (i.e. st) needs to detect this indication and reset its internal model of the device to an unknown state.

Comment 38 loberman 2023-07-27 21:21:57 UTC

Typical tape stuff that should apply to all tapes

Table 1 gives the ASC and ASCQ summary for Sense Key 6 (Unit Attention) in the 3584 Tape Library.

SENSE key 6 is UA

Table 1. ASC and ASCQ Summary for Sense Key 6 (Unit Attention)
ASC ASCQ
Description
28 00	Not Ready to Ready Transition, Medium May Have Changed
28 01	Import or Export Element Accessed
29 00	Power On, Reset, or Bus Device Reset Occurred
2A 01	Mode Parameters Changed
3F 01	Microcode Has Been Changed

Comment 39 loberman 2023-07-27 21:48:34 UTC

Added scsi_logging

Did not do any wireshark trace as its not showing anything extra
I am thinking we have not reproduced enough of this to make sense of it.

I mean we send a reset and there is data at the BOT until th reset happened
so we are not seeing a write-file-mark at BOT like The customer sees

-rwxr-xr-x root/root     30472 2021-08-10 00:27 usr/bin/iptc
-rwxr-xr-x root/root     16040 2021-08-10 19:41 usr/bin/dltest
-rwxr-xr-x root/root    199672 2021-08-10 12:47 usr/bin/patch
-rwxr-xr-x root/root     40952 2021-08-10 19:41 usr/bin/isql
-rwxr-xr-x root/root    132976 2021-10-18 09:56 usr/bin/pigz
-rwxr-xr-x root/root   9926208 2022-10-13 09:46 usr/bin/runc
tar: /dev/st0: Cannot read: Input/output error
tar: /dev/st0: Cannot read: Input/output error
tar: /dev/st0: Cannot read: Input/output error
tar: /dev/st0: Cannot read: Input/output error
tar: /dev/st0: Cannot read: Input/output error
tar: /dev/st0: Cannot read: Input/output error


Where is Sense Key 6

Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#111 CDB: Write(6) 0a 00 00 28 00 00
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#111 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#111 CDB: Write(6) 0a 00 00 28 00 00
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#111 scsi host busy 1 failed 0
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: Notifying upper driver of completion (result 0)
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#111 20 sectors total, 10240 bytes done.
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#96 Send: scmd 0x00000000f79ed774
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#96 CDB: Write(6) 0a 00 00 28 00 00
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#96 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#96 CDB: Write(6) 0a 00 00 28 00 00

Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#96 Sense Key : No Sense [current] 
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#96 Add. Sense: Rewind operation in progress
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#96 scsi host busy 1 failed 0
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: Notifying upper driver of completion (result 2)
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#96 20 sectors total, 10240 bytes done.

Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: a 0 0 28 0 0
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] 
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error on write:
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Number of r/w requests 2478, dio used in 2478, pages 7434.
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Async write waits 0, finished 0.
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#97 Send: scmd 0x000000000cfe376e
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#97 CDB: Write Filemarks 10 00 00 00 01 00
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#97 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#97 CDB: Write Filemarks 10 00 00 00 01 00
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#97 Sense Key : No Sense [current] 
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#97 Add. Sense: Rewind operation in progress
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#97 scsi host busy 1 failed 0
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: Notifying upper driver of completion (result 2)
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#97 0 sectors total, 0 bytes done.
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 10 0 0 0 1 0
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] 
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error on write filemark.
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Buffer flushed, 1 EOF(s) written
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Rewinding tape.
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#98 Send: scmd 0x0000000029d3c08e
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#98 CDB: Rezero Unit/Rewind 01 00 00 00 00 00
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#98 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#98 CDB: Rezero Unit/Rewind 01 00 00 00 00 00
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#98 Sense Key : No Sense [current] 
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#98 Add. Sense: Rewind operation in progress
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#98 scsi host busy 1 failed 0
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: Notifying upper driver of completion (result 2)
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: tag#98 0 sectors total, 0 bytes done.
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 1 0 0 0 0 0
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Sense Key : No Sense [current] 
Jul 27 17:42:39 iscsi2rhel9 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind operation in progress

Comment 40 John Meneghini 2023-07-28 16:04:24 UTC

Thanks Laurence.  Here's what I think we've learned.

The tape drive we are using is not returning proper sense data.  It looks like it isn't returning any kind of Unit Attention at all.  However, what is is doing is: rewinding the tape.  This is good because that's exactly what the customer is seeing with the AWS gateway: after a reset or power up event: 1) the tape rewinds and 2) a Unit Attention 0x29 is returned.  I suspect that the AWS gateway has properly implemented Unit Attentions, according to the SCSI specification - while the tape drive is not.

Comments below.
 
(In reply to loberman from comment #33)
> Reset by other initiator
> 
> Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: a 0 0 28 0
> 0

This is a write command... which is what we expect.

> Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : No Sense
> [current] 
> Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind
> operation in progress


The write command failed and returned an additional sense code of 00h/1Ah - rewind in progress.  The problem is, the Tape drive did not fill in the Key Field in the sense data.  So the sense key is 0x0, No Sense.  This should have been a Unit Attention sense key 0x6, 29h/03h  BUS DEVICE RESET FUNCTION OCCURRED or something. I think that's a bug in the tape drive, but at least the tape is rewinding, so that's what we expect. 

> Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Error on write:
> Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Number of r/w requests
> 35913, dio used in 35913, pages 107739.
> Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Async write waits 0,
> finished 0.
> Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 10 0 0 0 1
> 0

This is very important also.  The next command is a Write Filemark.  So it look like either the application, or the driver is attempting to write a file make immediately after the error.  We need to figure out if this Write Filemark is coming from the application or from the driver. This is what the customer is seeing too, a write filemark after a rewind.

> Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : No Sense
> [current] 
> Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind
> operation in progress

The Write Filemark command fails, but that's only because the tape device is still rewinding.  The Tape returns the same error: No Sense with an ASC/ASCQ of 00h/1Ah - rewind in progress.  One difference between this tape and the AWS gateway is: rewinds are probably instantaneous with AWS. So I'll bet that if this were the AWS gateway, instead of this physical tape which is slowly rewinding, we would have seen the corrupted data. 
 
> Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Error on write filemark.
> Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Buffer flushed, 1 EOF(s)
> written

Next the driver sends a Rewind command - 0x1. Again we need understand if this is the driver, or the application that's doing this.

> Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Rewinding tape.
> Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 1 0 0 0 0 0
> Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : No Sense
> [current] 
> Jul 27 15:18:29 rhel9.1 kernel: st 33:0:0:0: [st0] Add. Sense: Rewind
> operation in progress

The Rewind command also fails with No Sense and ASC/ASCQ of 00h/1Ah.  So the driver is just plain clueless. It has no idea that the tape has been re-positioned already.
 
> 
> Tape powered off here

The power on condition is slightly different because the Tape drive is returning a different error.  But the st driver looks like it's behaving the same. 

> Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: a 0 0 28 0
> 0

Here's the Write command, which we expect.

> Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : Not Ready
> [current] 
> Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Add. Sense: Logical unit
> communication failure

The tape is returning a Not Ready Sense Key (0x2) with an asc/ascq of 08h/00h LOGICAL UNIT COMMUNICATION FAILURE.  This is still incorrect. This should have been a Unit Attention sense key 0x6, 29h/00h  POWER ON, RESET, OR BUS DEVICE RESET OCCURRED.

It's not reporting anything about a rewind or tape position error.  We should probably double check and see if the tape has physically rewound when you power cycled.  It should have. 

> Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Error on write:
> Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Number of r/w requests
> 27982, dio used in 27982, pages 83946.
> Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Async write waits 0,
> finished 0.

The driver or the application turns around and does the same thing: tries to write a file make. If this is really the driver, it should NOT be doing this.  Write Filemarks after an unknown error should not be done.
  
> Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 10 0 0 0 1
> 0
> Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : Not Ready
> [current] 
> Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Add. Sense: Logical unit
> communication failure

Again, the tape is going slow, and returning the same Not Ready condition.  If this were the AWS gateway, or some other software emulated SCSI Tape target, this command would have succeeded. 

> Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Error on write filemark.
> Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Buffer flushed, 1 EOF(s)
> written

And the driver, or something, tries to Rewind the tape again.

> Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Rewinding tape.
> Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Error: 2, cmd: 1 0 0 0 0 0
> Jul 27 15:21:48 rhel9.1 kernel: st 33:0:0:0: [st0] Sense Key : Not Ready
> [current]

I think a wireshark trace would be helpful, but the information we have so far has given me a lot to work with.

Thanks

Comment 41 loberman 2023-07-28 16:09:07 UTC

Excellent, Always happy to try help Engineering when possible

I have multiple wireshark traces from before, just not the last one,
Let me redo the test with all logging on and do a trace on initiator and target 
I will summarize it all and report back later

I do expect a UA from this tape device for a media error or when media is inserted though.
I will check that to make sure generic UA's are sent and maybe just not for the rewind.

Regards
Laurence

Comment 42 John Meneghini 2023-07-28 16:18:45 UTC

(In reply to loberman from comment #41)
> Excellent, Always happy to try help Engineering when possible
> 
> I have multiple wireshark traces from before, just not the last one,
> Let me redo the test with all logging on and do a trace on initiator and
> target. I will summarize it all and report back later

No need Laurence, if you can just get me the .pcap files I'd like to look at the trace myself.

> I do expect a UA from this tape device for a media error or when media is
> inserted though.

Yes, this tape drive is definitely NOT returning Unit Attentions when it should.  If you can verify that ejecting media returns a unit attention, that would be good.
 
> I will check that to make sure generic UA's are sent and maybe just not for
> the rewind.

Agreed. Thanks.

> Per AWS support, when the gateway crashes in this manner, its notion of the
> current tape position is reset to the beginning of the tape. It also sets a
> unit attention condition, such that the next request results in a CHECK
> CONDITION status with sense key UNIT ATTENTION and asc/ascq indicating a
> device reset. According to their logs and our own, separate analysis, the
> next command being sent is WRITE FILEMARK, which results in writing an FM at
> the beginning of the tape, effectively discarding its contents.

This is what we are seeing for ANY unknown error.  Not just a Unit Attention.  
This a problem with the st.c driver.  It should not be writing file marks after error.

I will look at this further and see if I can come up with a patch. I think the customers patch would help 
with AWS, but I think there is also a fundamental problem here in that the driver should not be writing filemarks 
following an error... any error.

Comment 43 loberman 2023-07-28 17:51:41 UTC

Hi John

See comment https://bugzilla.redhat.com/show_bug.cgi?id=1930388#c35
I had already provided pcap and other notes

Set needinfo so you see this

Comment 45 loberman 2023-08-21 17:38:18 UTC

server, and both clients running
4.18.0-425.13.1.el8.stfix.x86_64

Has the patch
[loberman@lobefedora wrk]$ cat linux-kernel-test.patch 
diff -Nurp linux-4.18.0-425.13.1.el8_7.orig/drivers/scsi/st.c linux-4.18.0-425.13.1.el8_7/drivers/scsi/st.c
--- linux-4.18.0-425.13.1.el8_7.orig/drivers/scsi/st.c	2023-02-02 11:42:46.000000000 -0500
+++ linux-4.18.0-425.13.1.el8_7/drivers/scsi/st.c	2023-08-21 11:25:15.677487154 -0400
@@ -419,6 +419,13 @@ static int st_chk_result(struct scsi_tap
 	    cmdstatp->sense_hdr.asc == 0 && cmdstatp->sense_hdr.ascq == 0x17)
 		STp->cleaning_req = 1; /* ASC and ASCQ => cleaning requested */
 
+     if (cmdstatp->have_sense && scode == NOT_READY && cmdstatp->sense_hdr.asc == 0x08 && cmdstatp->sense_hdr.ascq == 0)
+ 		STp->pos_unknown = 1; /* ASC => Logical Unit Commncation Failure */
+     if (cmdstatp->have_sense && cmdstatp->sense_hdr.asc == 0 && cmdstatp->sense_hdr.ascq == 0x1a)
+ 		STp->pos_unknown = 1; /* ASCQ => rewind in progress */
+     if (cmdstatp->have_sense && cmdstatp->sense_hdr.asc == 0x29)
+ 		STp->pos_unknown = 1; /* ASC => power on / reset */
+
 	STp->pos_unknown |= STp->device->was_reset;
 
 	if (cmdstatp->have_sense &&



client doing the tar writing to tape



Aug 21 13:33:57 rhel8iscsi1 kernel: st 32:0:0:0: Attached scsi tape st0
Aug 21 13:33:57 rhel8iscsi1 kernel: st 32:0:0:0: st0: try direct i/o: yes (alignment 1 B)
Aug 21 13:34:04 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Block limits 1 - 16777215 bytes.
Aug 21 13:34:04 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Mode sense. Length 11, medium 0, WBS 10, BLL 8
Aug 21 13:34:04 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Density 58, tape length: 0, drv buffer: 1
Aug 21 13:34:04 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Block size: 0, buffer size: 4096 (1 blocks).
Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Error: 8000002, cmd: a 0 0 28 0 0
Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Sense Key : No Sense [current] 
Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Add. Sense: Rewind operation in progress
Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Error on write:
Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Number of r/w requests 1624, dio used in 1624, pages 4872.
Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Rewinding tape.
Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Error: 8000002, cmd: 1 0 0 0 0 0
Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Sense Key : No Sense [current] 
Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Add. Sense: Rewind operation in progress

..
..
/usr/bin/rview
/usr/bin/zcmp
/usr/bin/nl-link-stats
/usr/bin/dltest
tar: /dev/st0: Cannot write: Input/output error
tar: /dev/st0: Cannot close: Input/output error
tar: Error is not recoverable: exiting now

Other client doing the sg_reset while writes are happening from rhel8iscsi1

Aug 21 13:33:45 rhel8iscsi2 kernel: st: Debugging enabled debug_flag = 1
Aug 21 13:33:45 rhel8iscsi2 kernel: st 32:0:0:0: Attached scsi tape st0
Aug 21 13:33:45 rhel8iscsi2 kernel: st 32:0:0:0: st0: try direct i/o: yes (alignment 1 B)
Aug 21 13:34:08 rhel8iscsi2 kernel: st 32:0:0:0: [st0] Block limits 1 - 16777215 bytes.
Aug 21 13:34:08 rhel8iscsi2 kernel: st 32:0:0:0: [st0] Mode sense. Length 11, medium 0, WBS 10, BLL 8
Aug 21 13:34:08 rhel8iscsi2 kernel: st 32:0:0:0: [st0] Density 58, tape length: 0, drv buffer: 1
Aug 21 13:34:08 rhel8iscsi2 kernel: st 32:0:0:0: [st0] Block size: 0, buffer size: 4096 (1 blocks).
Aug 21 13:34:08 rhel8iscsi2 kernel: st 32:0:0:0: [st0] Rewinding tape.

[root@rhel8iscsi2 ~]# sg_reset /dev/st0
[root@rhel8iscsi2 ~]# 

Let me know what you see and think about this
Thanks
Laurence

Comment 46 John Meneghini 2023-08-21 18:01:52 UTC

(In reply to loberman from comment #45)
 
> Aug 21 13:33:57 rhel8iscsi1 kernel: st 32:0:0:0: Attached scsi tape st0
> Aug 21 13:33:57 rhel8iscsi1 kernel: st 32:0:0:0: st0: try direct i/o: yes
> (alignment 1 B)
> Aug 21 13:34:04 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Block limits 1 -
> 16777215 bytes.
> Aug 21 13:34:04 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Mode sense. Length
> 11, medium 0, WBS 10, BLL 8
> Aug 21 13:34:04 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Density 58, tape
> length: 0, drv buffer: 1
> Aug 21 13:34:04 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Block size: 0, buffer
> size: 4096 (1 blocks).
> Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Error: 8000002, cmd:
> a 0 0 28 0 0

This is the write command.  We expect to see this because the application is writing to the tape.

> Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Sense Key : No Sense
> [current] 
> Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Add. Sense: Rewind
> operation in progress

And here is the sense data indicating the write command failed because the tape drive is rewinding.  This tape device is NOT correctly returning a Unit Attention or any other Sense Key, but our patch anticipates this handles the NOSENSE condition by setting the pos_unknown state in the driver.  This should be safe with any tape drive because the result of a bus device reset should always rewind the tape.
 
> Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Error on write:
> Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Number of r/w
> requests 1624, dio used in 1624, pages 4872.

Here's the error sent to the application. This is expected.

> Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Rewinding tape.
> Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Error: 8000002, cmd:
> 1 0 0 0 0 0

This is a rewind command.  The driver responds by sending a rewind command to the tape.

> Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Sense Key : No Sense
> [current] 
> Aug 21 13:34:08 rhel8iscsi1 kernel: st 32:0:0:0: [st0] Add. Sense: Rewind
> operation in progress

OK.  This is good.  We see the tape drive fails the rewind command with the same No Sense/Rewind status.

The important part is, setting the pos_unknown state in the driver does, in fact, prevent the Write Filemarks command.  We see no interposing Write File Marks command, and this is what we want.


> ..
> ..
> /usr/bin/rview
> /usr/bin/zcmp
> /usr/bin/nl-link-stats
> /usr/bin/dltest
> tar: /dev/st0: Cannot write: Input/output error
> tar: /dev/st0: Cannot close: Input/output error
> tar: Error is not recoverable: exiting now

The application fails, but the data on the tape should be intact up to the last valid filemark.  So at this point, some application level recover should be possible by opening the tape device and seaching for the last valie filemark.

This is what I wanted to see.

Comment 47 loberman 2023-08-21 18:04:12 UTC

Excellent
Thanks for all your work on this.
I guess an upstream submission by you then may follow.

Regards
Laurence Oberman

Comment 48 John Meneghini 2023-08-21 18:40:45 UTC

(In reply to loberman from comment #47)
> Excellent
> I guess an upstream submission by you then may follow.

Yes, I will put together a patch to send up stream.  It will basically be the patch proposed by the customer. At this point I'm convinced the proposed patch, which catches the Unit Attention/Reset condition and sets pos_unknown in the driver is safe.  However, we need the customer to be aware that this may not completely solve their problem. As mentioned above, when the AWS gateway experiences a crash and then restarts, I expect the tape backup application running on the client will see an error.  The data on the tape should still be valid up to the most recently written filemark, but there may be no valid EOM/filemark at the end of the tape. It will take some addition work by the application to find the end of any valid data on the tape device and recover before restarting any backup that was in progress.

So, we can push the patch upstream and take the fix into RHEL, but it is still TBD if the customers problem will be completely solved.

Comment 49 loberman 2023-08-21 18:59:10 UTC

I can email the customer and let him know the situation.

However I think we must add to the upstream submission
Suggested-by: Jeffrey Hutzelman <jhutz)

Comment 50 John Meneghini 2023-08-31 16:28:35 UTC

The patch that addressed this issue in the st driver has been accepted upstream.

https://lore.kernel.org/linux-scsi/20230822181413.1210647-1-jmeneghi@redhat.com/

Targeting this fix for RHEL 8.10

Comment 52 John Meneghini 2023-09-01 14:13:17 UTC

Marco, this is for RHEL 8.10. Please set qa_ack+.  

Laurence, we as no real way to test this.  Please see if the customer can test this build for us.  I'm setting OtherQA on this BZ.

Comment 53 loberman 2023-09-01 17:20:43 UTC

Seems customer is on 7, not 8 yest but I will reach out and see if Jeff can do anything to test this.
We tested the functionality but we dot have access to these Amazon VTL tape servers.

Comment 59 loberman 2023-09-19 19:22:53 UTC

I attached a RHEL7.9 test kernel to the customer case,  and asked if the customer had time to test it.

Thanks
Laurence

Comment 66 John Meneghini 2024-03-08 15:33:49 UTC

I'm moving this but to jira.

Comment 68 errata-xmlrpc 2024-05-22 09:48:46 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3138