Description of problem:
Problem reported by Jim Kam (jim.kam - HP 3rd level escalation engineer)
Red Hat TAM: Chris Williams

[Hardware]
The machine is a ProLiant server (DL 360) with an LSI Ultra 320 controller (which uses the mptscsi driver), to which an HP Ultrium 460 tape drive is attached. The box runs RHEL 3; U3, U4, and U6-beta have all been tried and show the same problem.

[Application]
The application is Legato tape backup software with a tape diags utility that lets users send SCSI commands to the device. They can send an Inquiry and it works fine. However, once they do a write to tape and then follow it with an Inquiry, the tape device "hangs" until a signal is sent to the process.

[Additional Info]
When an Adaptec controller is used, the operation completes normally. In discussions between HP and Adaptec some time ago, the Adaptec folk mentioned that they bypass some of the mid layer in favor of their own routines. With the straces as well as SCSI traces, Jim observed that after the tape has done the write, nothing else gets sent to the driver.

Uploading two sets of data sent by Jim Kam. The first set is an strace log showing the difference between an Adaptec controller and the LSI controller on U3. The second set repeats the experiment with the U6-beta kernel (2.4.21-34.ELsmp).

Version-Release number of selected component (if applicable):
smp-2.4.21-34.EL.i686
Created attachment 117676 [details] strace and scsi trace
Created attachment 117679 [details] U6-beta.strace.txt.data-3-1
Created attachment 117680 [details] U6-beta-messages-data-3-2
Created attachment 117681 [details] U6-beta-sgdump-data-3-3
Highlight of Jim's trace - the process hung at #788, __wait_event_interruptible(), whenever an INQ command is issued after a tape write on the mptscsi driver:

    772         case SG_IO:
    773         {
    774             int blocking = 1;   /* ignore O_NONBLOCK flag */
    775
    776             if (sdp->detached)
    777                 return -ENODEV;
    778             if(! scsi_block_when_processing_errors(sdp->device) )
    779                 return -ENXIO;
    780             result = verify_area(VERIFY_WRITE, (void *)arg, SZ_SG_IO_HDR);
    781             if (result) return result;
    782             result = sg_new_write(sfp, (const char *)arg, SZ_SG_IO_HDR,
    783                                   blocking, read_only, &srp);
    784             if (result < 0) return result;
    785             srp->sg_io_owned = 1;
    786             while (1) {
    787                 result = 0;  /* following macro to beat race condition */
    788                 __wait_event_interruptible(sfp->read_wait,
    789                     (sdp->detached || sfp->closed || srp->done), result);
    790                 if (sdp->detached)
    791                     return -ENODEV;
    792                 if (sfp->closed)
    793                     return 0;       /* request packet dropped already */
    794                 if (0 == result)

"drivers/scsi/sg.c" line 794 of 3104
Created attachment 117772 [details] data requested by dledford - 2-1: dmesg file

Data generated via:

    echo "scsi log mlqueue 3" > /proc/scsi/scsi
    echo "scsi log mlcomplete 3" > /proc/scsi/scsi
Created attachment 117773 [details] data requested by dledford - 2-2: messages file
From Jim:

On the CD, I have included some rpms - nsrserv-7.1A00-04.i386.rpm and nsrdiag.rpm. I believe I also included openmotif as well. Before nsrserv can be installed, openmotif must first be installed. Then install nsrserv, then nsrdiag.

To duplicate the problem, run the following:

    /opt/nsr/diag/tapediag -vvv /dev/st0

From here you can send commands to the tape drive. For instance, from the prompt you can send an inq command to do an inquiry. Here is the sequence of commands:

    inq            /* send it an initial inquiry to make sure that it is talking to the tape drive */
    readonly off   /* set it so that it can write to tape */
    open wr        /* open for writing */
    write          /* write to tape */
    inq            /* Here's where it will "hang" until you issue a ^C. Prior inquiries are fine. Once you do the write, inq will result in the hang */

If you have any questions, please contact me by email or phone. If I do not pick up the phone, please feel free to page me.

Thanks,
Jimk

Jim M. Kam
Engineering Problem Resolution
* Jim.Kam * 281-518-1076, Pager 713-710-6504
The reason aic79xx works is that it always has a queue depth of at least 2. From the aic79xx code:

    /*
     * We allow the OS to queue 2 untagged transactions to
     * us at any time even though we can only execute them
     * serially on the controller/device.  This should
     * remove some latency.
     */
    scsi_adjust_queue_depth(dev->scsi_device, /*NON-TAGGED*/0,
                            /*queue depth*/2);

mpt fusion sets queue_depth to 1 for tapes. This is appropriate, since tapes do not support SCSI tagged commands. Other drivers probably do this as well. Unfortunately, it causes a problem with the way the st driver does asynchronous writes (the default behavior).

The st driver holds on to the write command structure after the write completes. Then, when the next request comes along, the st driver frees the old command structure (in write_behind_check) and issues the new command. Everything works fine, even if there is only one command structure. This is why a write followed by another write, or followed by a read, works fine even on mpt fusion.

The problem comes up when other commands, like Inquiry and Rewind, are issued through the sg driver instead of the st driver. If the sg driver finds that the one and only command structure is still being held by the st driver, the process hangs waiting for a free command structure.

As a test, I changed the minimum mpt fusion queue_depth to 2, and now I see the same behavior for both mpt fusion and aic79xx. Unfortunately, this is not an appropriate long-term solution because, as I mentioned above, a queue depth of one is appropriate for non-tagged devices. It would be very difficult to ensure that all the drivers issue just one command at a time to non-tagged devices even though we increased the queue depth to >1.

So, the right solution is to modify the st driver so that it releases the command structure after the write is complete, rather than waiting for the next request to cause it to do so.
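To illustrate the interaction described above, here is a minimal user-space sketch in C. It is only a toy model, not kernel code: struct toy_device and every function name in it are hypothetical and exist only for this illustration, and where the real kernel would put the caller to sleep, the model simply returns false. The release_on_completion flag stands in for the proposed st change; it is an assumption about the shape of the fix, not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device's pool of command structures. */
struct toy_device {
    int  queue_depth;   /* command structures the host driver allows */
    int  in_use;        /* command structures currently allocated */
    bool st_holds_cmd;  /* st still owns the command from its last write */
};

/* Take a command structure; false means the real caller would sleep. */
bool cmd_alloc(struct toy_device *d)
{
    if (d->in_use >= d->queue_depth)
        return false;
    d->in_use++;
    return true;
}

void cmd_free(struct toy_device *d)
{
    assert(d->in_use > 0);
    d->in_use--;
}

/* Model of st freeing the command left over from a previous write. */
void st_write_behind_check(struct toy_device *d)
{
    if (d->st_holds_cmd) {
        cmd_free(d);
        d->st_holds_cmd = false;
    }
}

/* Model of an asynchronous st write. */
bool st_write(struct toy_device *d, bool release_on_completion)
{
    st_write_behind_check(d);       /* frees st's own stale command first */
    if (!cmd_alloc(d))
        return false;
    /* ... the write completes here ... */
    if (release_on_completion)
        cmd_free(d);                /* assumed fix: release right away */
    else
        d->st_holds_cmd = true;     /* old behavior: keep until next st call */
    return true;
}

/* Model of an sg request such as Inquiry; sg cannot free st's command. */
bool sg_inquiry(struct toy_device *d)
{
    if (!cmd_alloc(d))
        return false;               /* the real process would hang here */
    cmd_free(d);
    return true;
}

/* True if an Inquiry issued after a write would proceed. */
bool inquiry_after_write_ok(int queue_depth, bool release_on_completion)
{
    struct toy_device d = { queue_depth, 0, false };

    if (!sg_inquiry(&d))
        return false;               /* initial Inquiry */
    if (!st_write(&d, release_on_completion))
        return false;
    return sg_inquiry(&d);
}
```

In this model, inquiry_after_write_ok(1, false) fails (the mpt fusion case), while inquiry_after_write_ok(2, false) and inquiry_after_write_ok(1, true) both succeed (the aic79xx case and the assumed st fix). Two consecutive st_write() calls succeed even with a queue depth of 1, which matches the observation that write-then-write works.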
I am currently investigating such a patch. One of the issues that we will need to deal with is to determine whether this problem exists in the upstream 2.6 kernel, and if so, to get the fix reviewed and approved by them. Have you tried this test on RHEL-4, or some other 2.6 kernel? Would you be able to? If you can not, can you give me a version of nsrserv and nsrdiag that work on 2.6 (I have not tried the ones I have)? Thanks. Tom
Email from Jim:

In answer to Tom's question - We did test it with RHEL 4, and the problem did not show up. The Legato files that I had sent do work with the 2.6 kernel.

BTW, we do appreciate the sterling job that RH has done (in particular, what Tom Coughlan and Doug Ledford have done to get to the source of the issue). Thank you so much for your help.

jimk
Created attachment 118894 [details] patch, to release buffer after write completes

Here is a patch to fix the problem, thanks to Doug Ledford. Initial testing looks good. Jim, please test this thoroughly and let us know the results.
On Sept. 20 I received the following from Jim Kam: "I have been doing some testing on it, and thus far it looks great. I need to test it on an Ultrium drive, since that is what the customer has. I should have that done sometime tomorrow. Plus we are also going to do some large backup jobs to make sure that it works properly with no other discernable issues." Please let us know the results of these tests. If the patch is accepted, it will go in RHEL 3 U7. It can be provided as a hotfix as needed prior to U7.
A fix for this problem has just been committed to the RHEL3 U7 patch pool this evening (in kernel version 2.4.21-37.6.EL).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0144.html