Bug 244097
| Summary: | SCSI: set error value when failing commands in prep_fn | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Issue Tracker <tao> | ||||
| Component: | kernel | Assignee: | Mike Christie <mchristi> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 4.0 | CC: | tao, tdunnon | ||||
| Target Milestone: | --- | Keywords: | OtherQA | ||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | RHBA-2007-0791 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2007-11-15 16:28:45 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
File uploaded: linux-2.6.9-55.EL-scsi_prep_fn-error-handling.patch This event sent from IssueTracker by dmilburn [Support Engineering Group] issue 119253 it_file 90437 Matsuya-san,
The attached patch is against 2.6.9-55.EL based upon these 2 patches, I
think this is what we are looking for, we need to verify the attached
patch. Would you please Fujitsu to review this patch? Would you like me to
build a test kernel rpm with this patch, if so, what arch do you need?
commit 6f16b5359ceb96780eac4178393b0e8a3c8aa1ea
Author: Mike Christie <michaelc.edu>
Date: Sat Sep 10 16:45:35 2005 -0500
[SCSI] set error value when failing commands in prep_fn
set DID_NO_CONNECT for the BLKPREP_KILL case and correct a few
BLKPREP_DEFER cases that weren't checking for the need to plug the
queue.
Signed-Off-By: Mike Christie <michaelc.edu>
Signed-off-by: James Bottomley <James.Bottomley>
commit beb6617d994161a6b12c5f69afc6fb154f085447
Author: Tejun Heo <htejun>
Date: Sun Apr 24 02:04:53 2005 -0500
[SCSI] remove REQ_SPECIAL in scsi_init_io()
scsi_init_io() used to set REQ_SPECIAL when it fails sg
allocation before requeueing the request by returning
BLKPREP_DEFER. REQ_SPECIAL is being updated to mean special
requests. So, remove REQ_SPECIAL setting.
Signed-off-by: Tejun Heo <htejun>
Signed-off-by: James Bottomley <James.Bottomley>
Internal Status set to 'Waiting on Support'
This event sent from IssueTracker by dmilburn [Support Engineering Group]
issue 119253
Matsuya-san, We confirmed the patch and tried on the system(2.6.9-55.EL). The system did not hang-up as a result. It worked just fine. Shintaro Minemoto Internal Status set to 'Waiting on Support' Status set to: Waiting on Tech This event sent from IssueTracker by dmilburn [Support Engineering Group] issue 119253 Hello Matsuya-san, I was looking back over this patch, I think to solve this problem we really only need this fix: ===================================================== commit 6f16b5359ceb96780eac4178393b0e8a3c8aa1ea Author: Mike Christie <michaelc.edu> Date: Sat Sep 10 16:45:35 2005 -0500 [SCSI] set error value when failing commands in prep_fn set DID_NO_CONNECT for the BLKPREP_KILL case and correct a few BLKPREP_DEFER cases that weren't checking for the need to plug the queue. Signed-Off-By: Mike Christie <michaelc.edu> Signed-off-by: James Bottomley <James.Bottomley> ====================================================== Do you think Fujitsu could do a quick re-verification on this patch? I am only trying to keep the RHEL4 change as simple as possible, plus this second patch is consistent with RHEL5, sorry I didn't realize this earlier. Internal Status set to 'Waiting on Support' This event sent from IssueTracker by dmilburn [Support Engineering Group] issue 119253 Matsuya-san, We confirmed the linux-2.6.9-55.EL-scsi_prep_fn-error-handling2.patch and tried on the system(2.6.9-55.EL). The system did not hang-up as a result. It worked just fine. Shintaro Minemoto Internal Status set to 'Waiting on Support' Status set to: Waiting on Tech This event sent from IssueTracker by dmilburn [Support Engineering Group] issue 119253 Hi David, Fujitsu got the positive result from your patch. This issue was gone. Regards, Internal Status set to 'Waiting on SEG' This event sent from IssueTracker by dmilburn [Support Engineering Group] issue 119253 Posted to rhkernel-list: http://post-office.corp.redhat.com/archives/rhkernel-list/2007-June/msg01323.html Matsuya-san , We think that there is no problem if this program fix done with EL4.7 now. Shintaro minemoto This event sent from IssueTracker by L3support_drv issue 119253 Created attachment 158807 [details]
handle retry and fix leak
This is probably a dup of 233010. I think Dave mentioned that.
Dave, I have not tested this patch. It is basically what you sent to rh-kernel,
but has the scsi cmnd leak fixed too. I did not backport all the of the scsi
leak patch from upstream. This is just the parts you guys are hitting since
that part is not very invassive and will be easy to test.
Thanks Mike, we will ask Fujitsu to verify your patch in Comment #12. Matsuya-san, We confirmed the fix-scsi-prep-leak-and-retry2.patch and tried on the system(2.6.9-55.EL). The system did not hang-up as a result. It worked just fine. Shintaro Minemoto Internal Status set to 'Waiting on Support' Status set to: Waiting on Tech This event sent from IssueTracker by L3support_drv issue 119253 Mike, Fujitsu reported that patch in Comment #12 worked fine. Thanks Dave and Fujitsu. I will send the patch to rh-kernel or you can if you want. It does not matter to me. Mike, I can re-send the patch with your changes to rhkernel-list, thanks for your help. Ok thanks, and no problem. committed in stream U6 build 55.23. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/ I confirmed a fix for this issue in the RHEL4.6 SP2. Thank you. Internal Status set to 'Waiting on Support' Status set to: Waiting on Tech This event sent from IssueTracker by L3support_drv issue 119253 An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0791.html |
Customer Name: Fujitsu Red Hat Customer ID# or RHN System ID: None Customer Contact Name: Shintaro Minemoto Drivers or hardware dependency: SCSI Summary: The system hangs up when sgtable cannot be acquired due to memory shortage. Version-Release of Selected Component: RHLE4 U4 Description of Problem: When sgtable was not able to be acquired due to memory shortage while we were testing, the phenomenon of hanging up the system was generated. The problem occurs because processing is completed without setting the QUEUE_FLAG_PLUGGED flag when scsi_alloc_sgtable() of scsi_init_io() returns 0. The QUEUE_FLAG_PLUGGED flag is set in blk_plug_device(). __Generic_unplug_device() confirms the QUEUE_FLAG_PLUGGED flag of request_queue with blk_remove_plug() before calling elv_next_request(). If this flag is set, elv_next_request() is called. If the flag isn't set, processing is ended without calling elv_next_request(). elv_next_request() is not called even if __ generic_unplug_device() is called again because processing is completed without setting the QUEUE_FLAG_PLUGGED flag when scsi_alloc_sgtable() returned 0, and request doesn't take over to the SCSI layer. void __generic_unplug_device(request_queue_t *q) { if (test_bit(QUEUE_FLAG_STOPPED, &q-queue_flags)) return; if (!blk_remove_plug(q)) <- QUEUE_FLAG_PLUGGED is confirmed. return; /* * was plugged, fire request_fn if queue has stuff to do */ if (elv_next_request(q)) q-request_fn(q); } When the return of scsi_init_io() is an error, scsi_prep_fn() sets the QUEUE_FLAG_PLUGGED flag in RHEL5. RHEL5: static int scsi_prep_fn(struct request_queue *q, struct request *req) { struct scsi_device *sdev = q-queuedata; struct scsi_cmnd *cmd; int specials_only = 0; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ if (req-flags & (REQ_CMD | REQ_BLOCK_PC)) { int ret; /* * This will do a couple of things: * 1) Fill in the actual SCSI command. * 2) Fill in any other upper-level specific fields * (timeout). * * If this returns 0, it means that the request failed * (reading past end of disk, reading offline device, * etc). This won't actually talk to the device, but * some kinds of consistency checking may cause the * request to be rejected immediately. */ /* * This sets up the scatter-gather table (allocating if * required). */ ret = scsi_init_io(cmd); switch(ret) { /* For BLKPREP_KILL/DEFER the cmd was released */ case BLKPREP_KILL: goto kill; case BLKPREP_DEFER: <- It goes to defer when the return value is BLKPREP_DEFER. goto defer; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ defer: /* If we defer, the elv_next_request() returns NULL, but the * queue must be restarted, so we plug here if no returning * command will automatically do that. */ if (sdev-device_busy == 0) blk_plug_device(q); <- The QUEUE_FLAG_PLUGGED flag is hoisted. return BLKPREP_DEFER; kill: req-errors = DID_NO_CONNECT << 16; return BLKPREP_KILL; } scsi_prep_fn() does return without setting the flag when the return of scsi_init_io() is an error in RHEL4. RHEL4: static int scsi_prep_fn(struct request_queue *q, struct request *req) { struct scsi_device *sdev = q-queuedata; struct scsi_cmnd *cmd; int specials_only = 0; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ if (req-flags & (REQ_CMD | REQ_BLOCK_PC)) { struct scsi_driver *drv; int ret; /* * This will do a couple of things: * 1) Fill in the actual SCSI command. * 2) Fill in any other upper-level specific fields * (timeout). * * If this returns 0, it means that the request failed * (reading past end of disk, reading offline device, * etc). This won't actually talk to the device, but * some kinds of consistency checking may cause the * request to be rejected immediately. */ /* * This sets up the scatter-gather table (allocating if * required). */ ret = scsi_init_io(cmd); if (ret) /* BLKPREP_KILL return also releases the command */ return ret; <- It is a return without hoisting the QUEUE_FLAG_PLUGGED flag. /* * Initialize the actual SCSI command for this request. */ drv = *(struct scsi_driver **)req-rq_disk-private_data; if (unlikely(!drv-init_command(cmd))) { scsi_release_buffers(cmd); scsi_put_command(cmd); return BLKPREP_KILL; } } /* * The request is now prepped, no need to come back here */ req-flags |= REQ_DONTPREP; return BLKPREP_OK; defer: /* If we defer, the elv_next_request() returns NULL, but the * queue must be restarted, so we plug here if no returning * command will automatically do that. */ if (sdev-device_busy == 0) blk_plug_device(q); return BLKPREP_DEFER; } If you can't fix the problem in RHEL 4.5, please fix them in RHEL4.6 . How reproducible: One time Step to Reproduce: Actual Results: Expected Results: Summary of actions taken to resolve issue: Location of diagnostic data: Hardware configuration: Fujitsu PRIMERGY TX200S3 (x86) Software: RHEL4 U4 Additional Info: Business Impact: This event sent from IssueTracker by dmilburn [Support Engineering Group] issue 119253