Bug 1634070

Summary: [ESXi][RHEL6.9]vmw_pvscsi driver crashes during host reset from accessing stale scsi_cmnd
Product: Red Hat Enterprise Linux 6
Reporter: David Jeffery <djeffery>
Component: qemu-kvm
Assignee: Cathy Avery <cavery>
Status: CLOSED WORKSFORME
QA Contact: Bo Yang <boyang>
Severity: high
Priority: high
Version: 6.9
CC: ailan, apanagio, boyang, cavery, djeffery, emilne, glen.gold, jen, jgill, jsavanyo, jsuchane, ldu, leiwang, mkenneth, nkshirsa, revers, virt-maint, vmware-gos-qa, yacao
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2019-09-26 11:43:10 UTC

Description David Jeffery 2018-09-28 16:15:34 UTC
A customer's VM crashed during error recovery in the vmw_pvscsi driver.

    crash> bt
    PID: 269    TASK: ffff8802359caab0  CPU: 0   COMMAND: "scsi_eh_3"
     #0 [ffff880235aeb920] machine_kexec at ffffffff8103fddb
     #1 [ffff880235aeb980] crash_kexec at ffffffff810d1f02
     #2 [ffff880235aeba50] oops_end at ffffffff8154f160
     #3 [ffff880235aeba80] no_context at ffffffff8105186b
     #4 [ffff880235aebad0] __bad_area_nosemaphore at ffffffff81051af5
     #5 [ffff880235aebb20] bad_area_nosemaphore at ffffffff81051bc3
     #6 [ffff880235aebb30] __do_page_fault at ffffffff810522dc
     #7 [ffff880235aebc50] do_page_fault at ffffffff815510ee
     #8 [ffff880235aebc80] page_fault at ffffffff8154e3e5
        [exception RIP: pvscsi_host_reset+339]
        RIP: ffffffffa016d403  RSP: ffff880235aebd30  RFLAGS: 00010082
        RAX: 0000000000000000  RBX: ffff880235b036c0  RCX: 000000000000000a
        RDX: ffffc90009820000  RSI: 0000000000000020  RDI: 0000000000000100
        RBP: ffff880235aebd80   R8: ffffffff8160dc20   R9: 0000000000000000
        R10: 736f482049534353  R11: 0a74657365722074  R12: ffff88011b19ee80
        R13: ffff880235ad3de0  R14: 00000000000000db  R15: ffff880235ad3e88
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
     #9 [ffff880235aebd88] scsi_try_host_reset at ffffffff8139e6d2
    #10 [ffff880235aebda8] scsi_eh_ready_devs at ffffffff813a06d6
    #11 [ffff880235aebe38] scsi_error_handler at ffffffff813a11c6
    #12 [ffff880235aebee8] kthread at ffffffff810a649e
    #13 [ffff880235aebf48] kernel_thread at ffffffff8100c28a


After sending the reset to the adapter, the system crashed inside vmw_pvscsi's inlined function pvscsi_reset_all.  While checking for any command that had not yet been returned, the driver found an unfinished scsi_cmnd ffff88011b19ee80 and crashed when accessing it.  The scsi_cmnd was found in entry 0xdb of the adapter's cmd_map.
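
For orientation, pvscsi_reset_all walks the adapter's cmd_map and force-completes anything it still considers outstanding.  The following is a hedged sketch, paraphrased from memory of the upstream driver of this era rather than quoted from the RHEL6 source (the exact message text and field names may differ):

/* Sketch of pvscsi_reset_all(), approximate -- not the verbatim RHEL6 code. */
static void pvscsi_reset_all(struct pvscsi_adapter *adapter)
{
	unsigned i;

	/* req_depth is assumed to be the number of cmd_map entries. */
	for (i = 0; i < adapter->req_depth; i++) {
		struct pvscsi_ctx *ctx = &adapter->cmd_map[i];
		struct scsi_cmnd *cmd = ctx->cmd;

		if (cmd) {
			/* scmd_printk() reads through cmd->request; for the
			 * stale entry at cmd_map[0xdb] that pointer is junk. */
			scmd_printk(KERN_ERR, cmd,
				    "Forced reset on cmd %p\n", cmd);
			pvscsi_unmap_buffers(adapter, ctx);
			pvscsi_release_context(adapter, ctx);
			cmd->result = (DID_RESET << 16);
			cmd->scsi_done(cmd);
		}
	}
}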


crash> p ((struct pvscsi_adapter *)0xffff880235ad3de0)->cmd_map[0xdb].cmd
$9 = (struct scsi_cmnd *) 0xffff88011b19ee80


This scsi_cmnd 0xffff88011b19ee80 had been freed and its memory reused for something other than a scsi_cmnd.  The address now falls inside a 2k object in the size-2048 slab cache, and the pointers within the old scsi_cmnd are no longer valid.

crash> kmem 0xffff88011b19ee80
CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
ffff88023fcf03c0 size-2048               2048       1427      1428    714     4k
SLAB              MEMORY            TOTAL  ALLOCATED  FREE
ffff88001b2f3300  ffff88011b19e000      2          2     0
FREE / [ALLOCATED]
  [ffff88011b19e800]



Consequently, when the driver called the inlined function scmd_printk and dereferenced the request pointer field of this invalid scsi_cmnd, the invalid (NULL) value crashed the system.


crash> p ((struct pvscsi_adapter *)0xffff880235ad3de0)->cmd_map[0xdb].cmd->request
$12 = (struct request *) 0x0
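
For reference, scmd_printk() in kernels of this vintage is a macro that reads through scmd->request to fetch the disk name for the log prefix.  Approximately (paraphrased from the 2.6.32-era SCSI headers, not quoted from the RHEL6 tree):

#define scmd_printk(prefix, scmd, fmt, a...)				\
	(scmd)->request->rq_disk ?					\
	sdev_printk(prefix, (scmd)->device, "[%s] " fmt,		\
		    (scmd)->request->rq_disk->disk_name, ##a) :		\
	sdev_printk(prefix, (scmd)->device, fmt, ##a)

With request == NULL, the rq_disk load becomes a read at a small offset from NULL; if rq_disk sits at offset 0xb0 in this kernel's struct request (an assumption, not verified against the RHEL6 layout), that lines up with the fault address in the log below.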


In the kernel log, we can see the scsi_cmnd 0xffff88011b19ee80 timed out and went through error recovery in the past.

sd 3:0:0:0: [sdb] task abort on host 3, ffff88011b19ee80
sd 3:0:0:0: [sdb] Failed to get completion for aborted cmd ffff88011b19ee80
sd 3:0:0:0: [sdb] SCSI device reset on scsi3:0
[0]: VMCI: Updating context from (ID=0x7a9140b5) to (ID=0x7a9140b5) on event (type=0).
[0]: VMCI: Updating context from (ID=0x7a9140b5) to (ID=0x7a9140b5) on event (type=0).
[0]: VMCI: Updating context from (ID=0x7a9140b5) to (ID=0x7a9140b5) on event (type=0).
[32413]: VMCI: Updating context from (ID=0x7a9140b5) to (ID=0x7a9140b5) on event (type=0).
sd 3:0:0:0: [sdb] task abort on host 3, ffff88010dceb9c0
sd 3:0:0:0: [sdb] Failed to abort cmd ffff88010dceb9c0
sd 3:0:5:0: [sdc] task abort on host 3, ffff8801059b57c0
sd 3:0:5:0: [sdc] Failed to abort cmd ffff8801059b57c0
sd 3:0:5:0: [sdc] task abort on host 3, ffff8801059b54c0
sd 3:0:5:0: [sdc] Failed to abort cmd ffff8801059b54c0
sd 3:0:5:0: [sdc] task abort on host 3, ffff88010dcebac0
sd 3:0:5:0: [sdc] Failed to abort cmd ffff88010dcebac0
sd 3:0:5:0: [sdc] task abort on host 3, ffff8801059b5ac0
sd 3:0:5:0: [sdc] Failed to abort cmd ffff8801059b5ac0
sd 3:0:0:0: [sdb] task abort on host 3, ffff88010dceb9c0
sd 3:0:0:0: [sdb] Failed to abort cmd ffff88010dceb9c0
sd 3:0:0:0: [sdb] SCSI device reset on scsi3:0
sd 3:0:0:0: [sdb] task abort on host 3, ffff88010dceb9c0
sd 3:0:0:0: [sdb] Failed to abort cmd ffff88010dceb9c0
sd 3:0:0:0: [sdb] SCSI Bus reset
sd 3:0:0:0: [sdb] task abort on host 3, ffff88010dceb9c0
sd 3:0:0:0: [sdb] Failed to abort cmd ffff88010dceb9c0
sd 3:0:0:0: [sdb] SCSI Host reset
BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0

Based on the messages, it seems the vmw_pvscsi driver did a device reset as part of its error recovery and believed it had recovered the command.  There were then more timeouts and another run of the error handler later.  This last run of the error handler progressed all the way to a host reset, which then found this stale scsi_cmnd and crashed trying to complete it.

I suspect this is an issue with vmw_pvscsi's device reset routine, pvscsi_device_reset, very similar to the old pvscsi_abort issue from BZ1002727.  pvscsi_device_reset always reports success without checking that all commands to the device being reset have actually been returned and completed.  If a command does not get returned by VMware in time, the SCSI layer may complete it anyway or cause a double-submit after vmw_pvscsi reports the device reset as a success.  A sketch of the pattern follows below.
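
The sketch below is illustrative only, paraphrased from the shape of the upstream handler rather than the exact RHEL6 code; ll_device_reset is assumed here to be the helper that issues PVSCSI_CMD_RESET_DEVICE to the backend.

/* Hedged sketch of the suspected pattern in pvscsi_device_reset() --
 * approximate, not the verbatim RHEL6 source. */
static int pvscsi_device_reset(struct scsi_cmnd *cmd)
{
	struct pvscsi_adapter *adapter = shost_priv(cmd->device->host);
	unsigned long flags;

	scmd_printk(KERN_INFO, cmd, "SCSI device reset on scsi%u:%u\n",
		    adapter->host->host_no, cmd->device->id);

	spin_lock_irqsave(&adapter->hw_lock, flags);

	/* Ask the backend to reset the device (PVSCSI_CMD_RESET_DEVICE)... */
	ll_device_reset(adapter, cmd->device->id);
	/* ...and drain whatever completions the backend has posted so far. */
	pvscsi_process_completion_ring(adapter);

	spin_unlock_irqrestore(&adapter->hw_lock, flags);

	/*
	 * Reported as SUCCESS with no check that every ctx for this device
	 * was actually completed.  Anything still outstanding stays in
	 * cmd_map and can later be found, stale, by pvscsi_reset_all().
	 */
	return SUCCESS;
}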


Version-Release number of selected component (if applicable):
2.6.32-696.16.1.el6.x86_64

How reproducible:
Only one known occurrence

Additional info:
Vmcore is on optimus.gsslab.rdu2.redhat.com, /cores/retrace/tasks/353114606/crash/vmcore

Comment 2 ldu 2018-10-08 03:02:08 UTC
Hi David,
I am a QE who tests RHEL on the ESXi platform. For this bug, are there reproduction steps we can use to reproduce it in our internal test environment?
If there are no reproduction steps, or it cannot be reliably reproduced, could the customer help verify a fix?

Thanks
ldu

Comment 3 Alexandros Panagiotou 2018-10-09 13:48:22 UTC
Hello,
I'll answer based on my understanding: for this to appear, we need a VM that uses the vmw_pvscsi driver for its disks. At least one of the disks presented by the vmw_pvscsi adapter needs to become non-responsive, so that we start seeing aborted commands, target resets and eventually a host reset. The crash happens exactly at the moment of the host reset. The tricky part, as far as I can understand, is that there needs to be a SCSI command that was supposed to be cleared by a previous target reset but is still present.

I'm not sure how easy it is to reproduce this, either on a test system or even within the customer's environment. There are messages consistent with a live migration of the VM (vMotion) in parallel with the aborted SCSI commands/device resets (more specifically, there are multiple occurrences of the VMCI messages described in https://access.redhat.com/solutions/179393). This could explain why the disks were not responding fast enough, but we cannot tell for sure from within the VM what was causing the disks to be non-responsive.

I'm leaving the needinfo flag on, as there may be details that I'm missing in this answer.

Regards,
Alexandros

Comment 5 Jaroslav Suchanek 2019-02-06 11:31:49 UTC
This is not related to virt-manager or libvirt. Moving to qemu for further analysis.

Comment 7 Cathy Avery 2019-03-20 11:27:26 UTC
Hi Lili and Bo,

I'm going to be looking at this. Do you think it is reproducible?

Thanks,

Cathy

Comment 8 Bo Yang 2019-03-21 11:56:33 UTC
WW 12.4 UPDATE


Hi Cathy, 

We haven't reproduced it in our test environment. Since there are no specific reproduction steps, we tried resetting the disk with "sg_reset -v -b /dev/sdb" while live migrating a guest, but have not reproduced it so far.

Do you have any suggestions for how to reproduce it?


BR,
BO

Comment 9 Cathy Avery 2019-03-25 14:16:37 UTC
(In reply to boyang from comment #8)

Hi Bo,

Could I see your test setup, along with instructions on how to do the live migration? I want to play around with it.

Thanks!

Comment 12 Cathy Avery 2019-03-26 18:47:03 UTC
@Lili or Bo

I just read this comment from https://bugzilla.redhat.com/show_bug.cgi?id=1002727#c52 

We might be able to do something similar but use 2 external iSCSI data stores (disks served out through iSCSI/LIO on a server) that we could take online and offline through iscsiadm on the disk server while we migrated the VM between the two data stores. Does this sound doable?

Cathy

Comment 13 Bo Yang 2019-03-27 11:16:16 UTC
(In reply to Cathy Avery from comment #12)
> @Lili or Bo
> 
> I just read this comment from
> https://bugzilla.redhat.com/show_bug.cgi?id=1002727#c52 
> 
> We might be able to do something similar but use 2 external iscsi data
> stores ( disks served out through iscsi/LIO on a server ) that we could
> could control online and offline through iscsiadm on the disk server while
> we migrated the vm between the two data stores. Does this sound doable?
> 
> Cathy


HI Cathy,


From my understanding, the target VM should be migrated between these two iSCSI datastores, right?

I think we can set up the iSCSI configuration you describe in our test environment. Currently the iSCSI setup in our ESXi test environment isn't managed by us.

We are trying to set up the above environment on ESXi 6.0 and hope it meets our reproduction needs as soon as possible.


BR,
BO

Comment 14 Cathy Avery 2019-03-27 11:29:19 UTC
(In reply to boyang from comment #13)

Sounds good! While you are doing that I'm going to check out vSphere PowerCLI to see if the data store can be disabled out from underneath the VM. That may be another option. Either way we would need dedicated data stores so we don't disrupt other VMs.

Comment 15 Cathy Avery 2019-03-27 13:22:15 UTC
@Bo

So it looks like we can disconnect the iSCSI target used for the data store with esxcli on the host. We might be able to get away with just unmounting it, but I don't know if you can unmount a data store with running VMs attached to it. It also looks like PowerCLI can do it too: Set-ScsiLunPath -ScsiLunPath $path -Active:$false -Confirm:$false. Sorry if I am telling you something you already know; I have used esxcli but never PowerCLI. So maybe using a script to migrate back and forth while activating/deactivating iSCSI or the ScsiLunPath, plus pounding the driver with fio jobs, would do the trick?

Comment 16 Cathy Avery 2019-03-29 17:37:48 UTC
More crash analysis

Here is the pvscsi_ctx for the stale command 0xffff88011b19ee80:

struct pvscsi_ctx {
  cmd = 0xffff88011b19ee80,
  sgl = 0xffff880235bd6000,
  list = {
    next = 0xdead000000100100,
    prev = 0xdead000000200200       
  },
  dataPA = 0, 
  sensePA = 5491922112,
  sglPA = 0,
  abort_cmp = 0x0
}

The context id in adapter->cmd_map is 219 (0xdb).

Here are the matches for context id 219 on the request ring. None of the sense or data addresses match the ctx above.

search -s ffff880235af0000 -l 32752 db 
ffff880235af1980: db
ffff880235af3100: db
ffff880235af5080: db
ffff880235af7c80: db

and completion ring.

search -s ffff880235af8000 -l 32752 db
ffff880235afb500: db
ffff880235afbd20: db
ffff880235afc3e0: db
ffff880235afcbe0: db
ffff880235afd6c0: db
ffff880235afde00: db
ffff880235afe680: db

I poked around the completion ring but there does not seem to be much there to go on. So about all I know is that the command appears to have never been completed by the backend emulation driver and, as David pointed out, pvscsi_host_reset tripped on it. Ewan has some crash extensions for vmw_pvscsi he will be showing me how to use.

Comment 17 Bo Yang 2019-04-01 10:14:48 UTC
(In reply to Cathy Avery from comment #15)
> @Bo
> 
> So it looks like we can disconnect the iscsi target used for the data store
> with esxcli on the host. We might be able to get away with just unmounting
> it but I don't know if you can unmount a data store with running VMs
> attached to it. It also looks like PowerCli can do it too  Set-ScsiLunPath
> -ScsiLunPath $path -Active:$false -Confirm:$false. Sorry if I am telling you
> something you already know. I have used esxcli but never powercli. So maybe
> using a script to migrate back and forth while activating/deactivating iscsi
> or ScsiLunPath plus pounding the driver with fio jobs would do the trick?


WW14.1 UPDATEs:


Sorry, I missed some things while on PTO and over the weekend.


I set up two software iSCSI datastores which are not currently attached to the host. We can confirm that we cannot unmount a datastore with running VM(s) on it from the UI; maybe "esxcli iscsi software" can disable/enable iSCSI forcefully.

We have a script to migrate a VM between datastores (non-iSCSI type). I can check/debug the script/cmdlet to see whether Set-ScsiLunPath works.


BR,
BO

Comment 18 Cathy Avery 2019-04-03 12:12:48 UTC
John,

Could you place me in contact with someone from VMware who can explain exactly what happens on the backend hypervisor/host side
when the vmw_pvscsi driver issues:

PVSCSI_CMD_RESET_DEVICE
PVSCSI_CMD_RESET_BUS
PVSCSI_CMD_ADAPTER_RESET

Specifically, what happens to outstanding requests on the ring during resets? Am I guaranteed that all will be completed? What happens if a request is not completed by the backend? Can I free its resources, given that the backend will no longer touch it after a certain RESET is issued? We seem to have an outstanding request that is not being completed by the driver, and possibly not by the backend.

Thanks,

Cathy

Comment 19 John Savanyo 2019-04-03 17:17:27 UTC
Jim Gill is the official maintainer for the pvscsi kernel module.  Jim, can you look into this?

Comment 20 Jim Gill 2019-04-03 18:42:42 UTC
Yes, I will look into this.

Comment 21 Cathy Avery 2019-04-03 18:51:10 UTC
(In reply to Jim Gill from comment #20)
> Yes, I will look into this.

Thanks. The reset information requested in comment 18 will be very helpful.

Cathy

Comment 24 Cathy Avery 2019-04-15 12:40:22 UTC
@Jim

Hi,

Have you heard anything back regarding the backend's response to RESET ( comment 18 )? 

Thanks,

Cathy

Comment 26 Cathy Avery 2019-05-10 11:30:36 UTC
@John

Hi,

Is there any update on the reset information I requested? I haven't seen anything yet.

Thanks

Comment 27 Jim Gill 2019-05-10 16:51:22 UTC
I have a writeup explaining the various resets. I will run it by the VMware storage people (waiting for them to get in) and you will have this by end-of-day. Apologies for my delay.

Comment 28 Jim Gill 2019-05-10 23:58:02 UTC
Response to comment 18:
> What happens in the backend hypervisor/host side when the vmw_pvscsi driver
> issues RESET_DEVICE, RESET_BUS, and ADAPTER_RESET?
 
PVSCSI_RESET_DEVICE: Any commands in flight are tagged internally as “device reset”.
As underlying I/O to the backend completes, the ESX kernel posts completions to the
PVSCSI completion ring with status “device reset”.
 
PVSCSI_RESET_BUS simply iterates PVSCSI_RESET_DEVICE for all devices attached to that
adapter.
 
PVSCSI_ADAPTER_RESET waits briefly for any commands in flight to complete in the backend.
Completions will be posted. The request, completion, and message rings are unmapped on
the ESX side. All four interrupt pins are deasserted.
 
> What happens to outstanding requests on the ring during resets? Am I guaranteed
> that all will be completed?
 
Ordinarily, they complete with device-reset status. You are not guaranteed that they will be completed.
 
> What happens if a request is not completed by the backend?
 
If a command is held up internally for some reason, it may complete (by having an entry
posted to the completion ring) at any time until the device, bus or adapter
is reset. In the unlikely case that a command is stuck somewhere in the backend, and the
guest driver issues a RESET, ESX will keep trying to reset the stuck command.
 
> Can I free its resources as the backend will no longer touch it after a certain RESET
> is issued?
 
No.


I notice that the "Version-Release number of selected component" is 2.6.32-696.16.1.el6.x86_64, which seems quite old to me. This looks to be quite close to the vmw_pvscsi driver's original open-source release. Has the reset code been modified since then, or might I be misinterpreting the version number?

Comment 29 Cathy Avery 2019-05-22 12:59:21 UTC
(In reply to Jim Gill from comment #28)

Thanks for the info.

> 
> I notice that the "Version-Release number of selected component" is
> 2.6.32-696.16.1.el6.x86_64, which seems quite old to me. This looks to be quite
> close to the vmw_pvscsi driver's original open-source release. Has the reset
> code been modified since then, or might I be misinterpreting the version number?

The IO that caused the issue does not look to have been completed, as it is not currently on the adapter->cmd_pool list (see the note on the list-poison values after the ctx dump below).

struct pvscsi_ctx {
  cmd = 0xffff88011b19ee80, 
  sgl = 0xffff880235bd6000, 
  list = {
    next = 0xdead000000100100, 
    prev = 0xdead000000200200        
  }, 
  dataPA = 0,
  sensePA = 5491922112, 
  sglPA = 0, 
  abort_cmp = 0x0
}
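
For context on that reasoning: next/prev values of 0xdead000000100100 / 0xdead000000200200 are the kernel's list-poison constants, which list_del() writes when an entry is removed from a list. The ctx is taken off adapter->cmd_pool when a command is issued and list_add()ed back by pvscsi_release_context(), so the poisoned pointers are consistent with the ctx having been acquired and never released. Approximately, from include/linux/poison.h in kernels of this generation (assuming CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000 on x86_64):

/* Approximate definitions, paraphrased -- not quoted from the RHEL6 tree. */
#define POISON_POINTER_DELTA	0xdead000000000000UL	/* CONFIG_ILLEGAL_POINTER_VALUE */
#define LIST_POISON1		((void *) 0x00100100 + POISON_POINTER_DELTA)
#define LIST_POISON2		((void *) 0x00200200 + POISON_POINTER_DELTA)

/* list_del(entry) leaves entry->next = LIST_POISON1 and
 * entry->prev = LIST_POISON2, matching the ctx->list values above. */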

The last commit we have is:

commit aac173e9618faadf8f92af6cc05e64f7acc64d79
Author: David Jeffery <djeffery>
Date:   Fri Oct 28 12:27:26 2016 -0400

    scsi: vmw_pvscsi: return SUCCESS for successful command aborts


I did not see any more recent commits upstream that would have addressed this issue, and I could not identify any code changes that would have made a difference, although I could have missed something. If you have a commit that provides a solution, I'll add it.

Thanks!

Comment 30 Cathy Avery 2019-09-17 11:27:46 UTC
@John Savanyo

Hi. Unfortunately we are not able to reproduce this crash, and I have taken it as far as I could with the customer's original crash dump. It would be better at this time if your folks took over the problem.

It is possible that the recent commit

commit 240b4cc8fd5db138b675297d4226ec46594d9b3b
Author: Jan Kara <jack>
Date:   Wed Jun 19 09:05:41 2019 +0200

    scsi: vmw_pscsi: Fix use-after-free in pvscsi_queue_lck()

will fix the problem, but we cannot verify this.

Comment 31 John Savanyo 2019-09-25 21:08:47 UTC
It is hard for us to narrow down the cause of the problem without a reproducible test case or logs. Jim and another engineer looked into this. They examined the source code regarding how resets are handled and explored some theories, but could not find anything problematic in the backend or in the driver code. We are also not seeing other reports like this from customers.

We recommend that we close this as Can't Reproduce. Cathy, can you resolve this bug?

Please advise the customer that they can also contact VMware Support to open an SR. VMware Support can help with reproduction, collecting logs, etc.

Comment 32 Cathy Avery 2019-09-26 11:40:33 UTC
David, could you please contact the customer and ask them to open an SR with VMware per John's request in comment 31? I am going to close this bug for now; we can reopen it if anything else develops.

Thanks,

Cathy

Comment 33 Glen 2020-07-28 06:09:53 UTC
Hello all,

I've been attempting to find a patch for bug 1002727.
Can you kindly point me in the right direction?

Thanks,
Glen

Comment 34 Ewan D. Milne 2020-07-29 13:38:28 UTC
These are the patches to vmw_pvscsi in RHEL6.  The second one is the one
for bug 1002727; the first one was for a different bug, 1372465, but you
would probably need both.

I think there may also have been a change to the VMware hypervisor around
the same time having to do with command ordering (i.e., the TEST UNIT READY
issued by the SCSI error handling passing other commands), but I'm not
certain what that might have been.  That was several years ago, though.

---

commit 570dd5c289298a83e3bc2afc2a443c41a705e42f
Author: Ewan Milne <emilne>
Date:   Mon Nov 14 21:39:03 2016 -0500

    [scsi] vmw_pvscsi: return SUCCESS for successful command aborts
    
    upstream commit aac173e9618faadf8f92af6cc05e64f7acc64d79
    Author: David Jeffery <djeffery>
    Date:   Fri Oct 28 12:27:26 2016 -0400
    
        scsi: vmw_pvscsi: return SUCCESS for successful command aborts
    
        The vmw_pvscsi driver reports most successful aborts as FAILED to the
        scsi error handler.  This is do to a misunderstanding of how
        completion_done() works and its interaction with a successful wait using
        wait_for_completion_timeout().  The vmw_pvscsi driver is expecting
        completion_done() to always return true if complete() has been called on
        the completion structure.  But completion_done() returns true after
        complete() has been called only if no function like
        wait_for_completion_timeout() has seen the completion and cleared it as
        part of successfully waiting for the completion.
    
        Instead of using completion_done(), vmw_pvscsi should just use the
        return value from wait_for_completion_timeout() to know if the wait
        timed out or not.
    
        [mkp: bumped driver version per request]
    
        Signed-off-by: David Jeffery <djeffery>
        Reviewed-by: Laurence Oberman <loberman>
        Reviewed-by: Ewan D. Milne <emilne>
        Acked-by: Jim Gill <jgill>
        Signed-off-by: Martin K. Petersen <martin.petersen>
    
    (Modified for RHEL6 -- Omitted change to driver version number, because
     RHEL6 does not contain the additional driver functionality between versions
     "1.0.3.0-k" and "1.0.6.0-k", so updating the version to "1.0.7.0-k" would
     be misleading.)
    
    Signed-off-by: Ewan D. Milne <emilne>
    Signed-off-by: Phillip Lougher <plougher>

diff --git a/drivers/scsi/vmw_pvscsi.c b/drivers/scsi/vmw_pvscsi.c
index 7f154081137..daa3fe71d4d 100644
--- a/drivers/scsi/vmw_pvscsi.c
+++ b/drivers/scsi/vmw_pvscsi.c
@@ -739,6 +739,7 @@ static int pvscsi_abort(struct scsi_cmnd *cmd)
        unsigned long flags;
        int result = SUCCESS;
        DECLARE_COMPLETION_ONSTACK(abort_cmp);
+       int done;
 
        scmd_printk(KERN_DEBUG, cmd, "task abort on host %u, %p\n",
                    adapter->host->host_no, cmd);
@@ -770,10 +771,10 @@ static int pvscsi_abort(struct scsi_cmnd *cmd)
        pvscsi_abort_cmd(adapter, ctx);
        spin_unlock_irqrestore(&adapter->hw_lock, flags);
        /* Wait for 2 secs for the completion. */
-       wait_for_completion_timeout(&abort_cmp, msecs_to_jiffies(2000));
+       done = wait_for_completion_timeout(&abort_cmp, msecs_to_jiffies(2000));
        spin_lock_irqsave(&adapter->hw_lock, flags);
 
-       if (!completion_done(&abort_cmp)) {
+       if (!done) {
                /*
                 * Failed to abort the command, unmark the fact that it
                 * was requested to be aborted.



commit 10ad4e5302fc1c4f79abc1d4b53eb7f9b54b2c90
Author: Ewan Milne <emilne>
Date:   Mon Mar 10 20:25:56 2014 -0400

    [scsi] vmw_pvscsi: Fix pvscsi_abort() function
    
    This change ensures that pvscsi_abort() function returns SUCCESS
    only when the command in question was actually completed, otherwise
    returns FAILURE. The code before change, was causing a bug where
    driver tries to complete a command to the mid-layer while the mid-layer
    has already requested the driver to abort that command, in response
    to which the driver has responded with SUCCESS causing mid-layer
    to free the command struct.
    
    Signed-off-by: Arvind Kumar <arvindkumar>
    Signed-off-by: Ewan D. Milne <emilne>
    Signed-off-by: Rafael Aquini <aquini>

diff --git a/drivers/scsi/vmw_pvscsi.c b/drivers/scsi/vmw_pvscsi.c
index 75445c6e70a..7f154081137 100644
--- a/drivers/scsi/vmw_pvscsi.c
+++ b/drivers/scsi/vmw_pvscsi.c
@@ -1,7 +1,7 @@
 /*
  * Linux driver for VMware's para-virtualized SCSI HBA.
  *
- * Copyright (C) 2008-2009, VMware, Inc. All Rights Reserved.
+ * Copyright (C) 2008-2014, VMware, Inc. All Rights Reserved.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License as published by the
@@ -61,6 +61,7 @@ struct pvscsi_ctx {
        dma_addr_t              dataPA;
        dma_addr_t              sensePA;
        dma_addr_t              sglPA;
+       struct completion       *abort_cmp;
 };
 
 struct pvscsi_adapter {
@@ -176,6 +177,7 @@ static void pvscsi_release_context(struct pvscsi_adapter *adapter,
                                   struct pvscsi_ctx *ctx)
 {
        ctx->cmd = NULL;
+       ctx->abort_cmp = NULL;
        list_add(&ctx->list, &adapter->cmd_pool);
 }
 
@@ -495,15 +497,27 @@ static void pvscsi_complete_request(struct pvscsi_adapter *adapter,
 {
        struct pvscsi_ctx *ctx;
        struct scsi_cmnd *cmd;
+       struct completion *abort_cmp;
        u32 btstat = e->hostStatus;
        u32 sdstat = e->scsiStatus;
 
        ctx = pvscsi_get_context(adapter, e->context);
        cmd = ctx->cmd;
+       abort_cmp = ctx->abort_cmp;
        pvscsi_unmap_buffers(adapter, ctx);
        pvscsi_release_context(adapter, ctx);
-       cmd->result = 0;
+       if (abort_cmp) {
+               /*
+                * The command was requested to be aborted. Just signal that
+                * the request completed and swallow the actual cmd completion
+                * here. The abort handler will post a completion for this
+                * command indicating that it got successfully aborted.
+                */
+               complete(abort_cmp);
+               return;
+       }
 
+       cmd->result = 0;
        if (sdstat != SAM_STAT_GOOD &&
            (btstat == BTSTAT_SUCCESS ||
             btstat == BTSTAT_LINKED_COMMAND_COMPLETED ||
@@ -723,6 +737,8 @@ static int pvscsi_abort(struct scsi_cmnd *cmd)
        struct pvscsi_adapter *adapter = shost_priv(cmd->device->host);
        struct pvscsi_ctx *ctx;
        unsigned long flags;
+       int result = SUCCESS;
+       DECLARE_COMPLETION_ONSTACK(abort_cmp);
 
        scmd_printk(KERN_DEBUG, cmd, "task abort on host %u, %p\n",
                    adapter->host->host_no, cmd);
@@ -745,13 +761,40 @@ static int pvscsi_abort(struct scsi_cmnd *cmd)
                goto out;
        }
 
+       /*
+        * Mark that the command has been requested to be aborted and issue
+        * the abort.
+        */
+       ctx->abort_cmp = &abort_cmp;
+
        pvscsi_abort_cmd(adapter, ctx);
+       spin_unlock_irqrestore(&adapter->hw_lock, flags);
+       /* Wait for 2 secs for the completion. */
+       wait_for_completion_timeout(&abort_cmp, msecs_to_jiffies(2000));
+       spin_lock_irqsave(&adapter->hw_lock, flags);
 
-       pvscsi_process_completion_ring(adapter);
+       if (!completion_done(&abort_cmp)) {
+               /*
+                * Failed to abort the command, unmark the fact that it
+                * was requested to be aborted.
+                */
+               ctx->abort_cmp = NULL;
+               result = FAILED;
+               scmd_printk(KERN_DEBUG, cmd,
+                           "Failed to get completion for aborted cmd %p\n",
+                           cmd);
+               goto out;
+       }
+
+       /*
+        * Successfully aborted the command.
+        */
+       cmd->result = (DID_ABORT << 16);
+       cmd->scsi_done(cmd);
 
 out:
        spin_unlock_irqrestore(&adapter->hw_lock, flags);
-       return SUCCESS;
+       return result;
 }
 
 /*
diff --git a/drivers/scsi/vmw_pvscsi.h b/drivers/scsi/vmw_pvscsi.h
index 3546e8662e3..a6437758384 100644
--- a/drivers/scsi/vmw_pvscsi.h
+++ b/drivers/scsi/vmw_pvscsi.h
@@ -1,7 +1,7 @@
 /*
  * VMware PVSCSI header file
  *
- * Copyright (C) 2008-2009, VMware, Inc. All Rights Reserved.
+ * Copyright (C) 2008-2014, VMware, Inc. All Rights Reserved.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License as published by the
@@ -26,7 +26,7 @@
 
 #include <linux/types.h>
 
-#define PVSCSI_DRIVER_VERSION_STRING   "1.0.2.0-k"
+#define PVSCSI_DRIVER_VERSION_STRING   "1.0.3.0-k"
 
 #define PVSCSI_MAX_NUM_SG_ENTRIES_PER_SEGMENT 128