Bug 1118123 - [Hyper-V][RHEL 6.6] fcopy large file from host to guest failed
Summary: [Hyper-V][RHEL 6.6] fcopy large file from host to guest failed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.6
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: rc
: ---
Assignee: jason wang
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 1006229
Depends On:
Blocks: 1056239 1019213
 
Reported: 2014-07-10 03:32 UTC by dnie
Modified: 2017-02-07 12:25 UTC

Fixed In Version: kernel-2.6.32-495.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-10-14 06:17:23 UTC


Links
System: Red Hat Product Errata
ID: RHSA-2014:1392
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: kernel security, bug fix, and enhancement update
Last Updated: 2014-10-14 01:28:44 UTC

Description dnie 2014-07-10 03:32:20 UTC
Description of problem:
Copying a large file from host to guest with fcopy fails after about 4% has been copied; the host then reports a "timeout period expired" exception.


Version-Release number of selected component (if applicable):
Host: Windows server 2012 R2
kernel version:
2.6.32-482.el6fcopy.x86_64
hypervfcopyd-0-0.14.20130826git.el6.log_timestamp.x86_64.rpm


How reproducible:
100%

Steps to Reproduce:
1. Log in to the RHEL 6 guest and run: yum -y install hypervfcopyd
2. Enable the "Guest services" integration service in Hyper-V Manager
3. Run "service hypervfcopyd start" and make sure the hypervfcopyd service is running
4. Log in to the Windows Server 2012 R2 host and use Copy-VMFile to copy a large file (larger than 1G) to the guest
5. Check the guest and host logs

Actual Result:
The copy reaches about 4% and then hangs. No log messages appear until the host displays a "timeout period expired" exception.

Expected Result:
The large file should be copied from host to guest correctly.



Additional info:

Comment 2 dnie 2014-07-10 05:47:41 UTC
Sorry, changing the component to kernel.

Comment 6 Kylie 2014-07-11 09:43:38 UTC
The above fix has been accepted in Greg's tree and will be in mainline soon.
This issue also affects Red Hat 6.5.
Could you please apply this fix to Red Hat 6.5 as well?

Comment 7 Ronen Hod 2014-07-13 08:49:50 UTC
Makes sense for 6.5.z too. The risk is low and hv specific.

Comment 8 RHEL Product and Program Management 2014-07-13 08:50:04 UTC
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 9 dnie 2014-07-14 02:12:09 UTC
Hi Jason,

kernel: 2.6.32-489.el6bzfcopy.x86_64
hypervfcopyd: hypervfcopyd-0-0.14.20130826git.el6.log_timestamp.x86_64.rpm

    I checked this build and the result still has some issues.
After about 70% of the file has been copied, the Copy-VMFile command on the host stops and throws an exception, and in the guest the partially copied large file is deleted.
    In the guest, the hypervfcopyd daemon is still running and there are no error messages. Below is the exception from the host.
===============================================================================
PS C:\Users\vdcadmin.HYPERV> Copy-VMFile "hv14_RHEL6.5_x64_dnie" -SourcePath ".\RHEL-7.0-20140507.0-Server-x86_64-dvd1.iso" -DestinationPath "/tmp" -CreateFullPath -FileSource Host
Copy-VMFile : 'hv14_RHEL6.5_x64_dnie' failed to copy file. (Virtual machine ID
5949191C-81EA-4698-A409-8EF06158F9E6)
'hv14_RHEL6.5_x64_dnie' failed to initiate copying files to the guest:
Unspecified error (0x80004005). (Virtual machine ID
5949191C-81EA-4698-A409-8EF06158F9E6)
'hv14_RHEL6.5_x64_dnie' failed to copy the source file
'C:\Users\vdcadmin.HYPERV\RHEL-7.0-20140507.0-Server-x86_64-dvd1.iso' to the
destination '/tmp' in the guest: Unspecified error (0x80004005). (Virtual
machine ID 5949191C-81EA-4698-A409-8EF06158F9E6)
At line:1 char:1
+ Copy-VMFile "hv14_RHEL6.5_x64_dnie" -SourcePath
".\RHEL-7.0-20140507.0-Server-x8 ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~
    + CategoryInfo          : NotSpecified: (Microsoft.Hyper...FileToGuestTask
   :VMCopyFileToGuestTask) [Copy-VMFile], VirtualizationOperationFailedExcept
  ion
    + FullyQualifiedErrorId : OperationFailed,Microsoft.HyperV.PowerShell.Comm
   ands.CopyVMFileCommand
==============================================================================

Comment 10 jason wang 2014-07-14 03:16:08 UTC
K.Y:

See above comment, after picking

9bd2d0dfe4714dd5d7c09a93a5c9ea9e14ceb3fc (Drivers: hv: util: Fix a bug in the KVP code)
and
affb1aff300ddee54df307812b38f166e8a865ef (Drivers: hv: vmbus: Fix a bug in the channel callback dispatch code)

Our QE can still see some issues.

Comment 11 K. Y. Srinivasan 2014-07-14 15:07:54 UTC
Jason,

I will take a look. Vivek, could you have somebody repro this locally here so I can take a look? I think Chris tested this.

Comment 12 Dexuan Cui 2014-07-15 10:59:47 UTC
Hi, jason wang,
Can you please confirm this new issue only happens to *SMP* VM?

I think I got the root cause: there is a race condition in SMP case.
Can you please test the below small patch?

diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
index eaaa3d8..23b2ce2 100644
--- a/drivers/hv/hv_fcopy.c
+++ b/drivers/hv/hv_fcopy.c
@@ -246,8 +246,8 @@ void hv_fcopy_onchannelcallback(void *context)
                /*
                 * Send the information to the user-level daemon.
                 */
-               fcopy_send_data();
                schedule_delayed_work(&fcopy_work, 5*HZ);
+               fcopy_send_data();
                return;
        }
        icmsghdr->icflags = ICMSGHDRFLAG_TRANSACTION | ICMSGHDRFLAG_RESPONSE;
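
The reordering in the diff above can be illustrated with a small user-space model. This is only a sketch in Python, not kernel code: the class and method names are hypothetical, and it assumes that the daemon's reply path cancels the pending 5-second timeout work, so that a reply arriving before schedule_delayed_work() runs leaves a timer armed for a transfer that has already completed.

```python
class Transaction:
    """Toy stand-in for one fcopy request; not the kernel data structure."""

    def __init__(self):
        self.timer_armed = False  # models schedule_delayed_work(&fcopy_work, 5*HZ)
        self.completed = False

    def arm_timer(self):
        self.timer_armed = True

    def send_data(self, daemon_replies_immediately):
        # Models fcopy_send_data(): hands the request to the daemon. On an
        # SMP guest the daemon can run on another CPU and reply before the
        # caller reaches its next statement.
        if daemon_replies_immediately:
            self.on_daemon_reply()

    def on_daemon_reply(self):
        # Assumption: the reply path cancels the pending timeout, if any.
        self.timer_armed = False
        self.completed = True

    def timeout_fires_later(self):
        # A timer that is still armed will eventually fire.
        return self.timer_armed


def run(order, daemon_replies_immediately=True):
    """Apply the steps in the given order; return (completed, timer_will_fire)."""
    txn = Transaction()
    for step in order:
        if step == "send":
            txn.send_data(daemon_replies_immediately)
        elif step == "arm":
            txn.arm_timer()
    return txn.completed, txn.timeout_fires_later()


before_patch = run(["send", "arm"])  # original order in the callback
after_patch = run(["arm", "send"])   # order after the patch
print(before_patch, after_patch)
```

With the original order the model ends with the request completed and a timer still armed, (True, True); with the patched order the reply cancels the timer that was armed first, (True, False).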

Comment 14 jason wang 2014-07-18 02:57:23 UTC
(In reply to Dexuan Cui from comment #12)
> Hi, jason wang,
> Can you please confirm this new issue only happens to *SMP* VM?
> 
> I think I got the root cause: there is a race condition in SMP case.
> Can you please test the below small patch?
> 
> diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
> index eaaa3d8..23b2ce2 100644
> --- a/drivers/hv/hv_fcopy.c
> +++ b/drivers/hv/hv_fcopy.c
> @@ -246,8 +246,8 @@ void hv_fcopy_onchannelcallback(void *context)
>                 /*
>                  * Send the information to the user-level daemon.
>                  */
> -               fcopy_send_data();
>                 schedule_delayed_work(&fcopy_work, 5*HZ);
> +               fcopy_send_data();
>                 return;
>         }
>         icmsghdr->icflags = ICMSGHDRFLAG_TRANSACTION | ICMSGHDRFLAG_RESPONSE;

Thanks, I will pick this.

Comment 18 Kylie 2014-07-22 09:21:57 UTC
Hi Jason, 

To ensure KVP works well, an additional patch is also needed.
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9bd2d0dfe4714dd5d7c09a93a5c9ea9e14ceb3fc

11 days Drivers: hv: util: Fix a bug in the KVP code K. Y. Srinivasan 2 -4/+15 

Could you please help ensure the above patch is added to 6.5? Thank you.

I will send you a summary in another comment.

Thanks.

Comment 19 Kylie 2014-07-22 09:23:03 UTC
Summary:

Issue #1: 
Red Hat 6.5 has built-in KVP. But KVP will stop working after several hours. 

To fix this issue, two patches in connection.c, hv_util.c and hv_kvp.c are
needed. 
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9bd2d0dfe4714dd5d7c09a93a5c9ea9e14ceb3fc 

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=affb1aff300ddee54df307812b38f166e8a865ef

Logs:

11 days Drivers: hv: util: Fix a bug in the KVP code K. Y. Srinivasan 2 -4/+15 
11 days Drivers: hv: vmbus: Fix a bug in the channel callback dispatch code K.
Y. Srinivasan 1 -2/+6 

Issue #2:
File copy hangs when copying large file. Red Hat 6.5 + additional fcopy
service or Red Hat 6.6/7.0. 

To fix this issue, two patches in connection.c and hv_fcopy.c are needed. 

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=affb1aff300ddee54df307812b38f166e8a865ef

https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/commit/?h=char-misc-linus&id=2ef82d24f445e82f80e235f44eb9d1bc933e3670

Logs:
11 days Drivers: hv: vmbus: Fix a bug in the channel callback dispatch code K.
Y. Srinivasan 1 -2/+6 

4 days Drivers: hv: hv_fcopy: fix a race condition for SMP guest Dexuan Cui 1
-1/+1

Comment 20 jason wang 2014-07-22 09:52:03 UTC
(In reply to Kylie from comment #19)
> Summary:
> 
> Issue #1: 
> Red Hat 6.5 has built-in KVP. But KVP will stop working after several hours. 
> 
> To fix this issue, two patches in connection.c, hv_util.c and hv_kvp.c are
> needed. 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9bd2d0dfe4714dd5d7c09a93a5c9ea9e14ceb3fc
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=affb1aff300ddee54df307812b38f166e8a865ef
> 

Got it. I will create a bug for 6.5z.
> Logs:
> 
> 11 days Drivers: hv: util: Fix a bug in the KVP code K. Y. Srinivasan 2
> -4/+15 
> 11 days Drivers: hv: vmbus: Fix a bug in the channel callback dispatch code
> K.
> Y. Srinivasan 1 -2/+6 
> 
> Issue #2:
> File copy hangs when copying large file. Red Hat 6.5 + additional fcopy
> service or Red Hat 6.6/7.0. 
> 
> To fix this issue, two patches in connection.c and hv_fcopy.c are needed. 
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=affb1aff300ddee54df307812b38f166e8a865ef
> 
> https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/commit/?h=char-misc-linus&id=2ef82d24f445e82f80e235f44eb9d1bc933e3670
> 

I've picked them for 6.6. 7.0 does not support file copy, so they are probably needed only for 7.1. We will probably sync from upstream during 7.1, so they will be included at that time.

Thanks
> Logs:
> 11 days Drivers: hv: vmbus: Fix a bug in the channel callback dispatch code
> K.
> Y. Srinivasan 1 -2/+6 
> 
> 4 days Drivers: hv: hv_fcopy: fix a race condition for SMP guest Dexuan Cui 1
> -1/+1

Comment 21 Kylie 2014-07-25 03:08:06 UTC
Hi Jason, 

Did you create a KVP bug for 6.5z? If yes, could you please share the bug number with me? Thanks.

Comment 24 Rafael Aquini 2014-08-05 00:49:16 UTC
Patch(es) available on kernel-2.6.32-495.el6

Comment 27 Kylie 2014-08-19 11:39:12 UTC
(In reply to Rafael Aquini from comment #24)
> Patch(es) available on kernel-2.6.32-495.el6

Thank you for the update. It is good to know the patches are available.
I would like to double-check: is kernel-2.6.32-495.el6 for RHEL 6.5z? If yes, I have no further questions.
If not, is there any plan for RHEL 6.5z? Thank you.

BTW, the kernel version of the RHEL 6.5 I have on hand is 2.6.32-431. Thank you.
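
Whether a given kernel is at or past the build named in comment 24 can be checked by comparing build numbers. A minimal sketch in Python, assuming release strings of the form 2.6.32-N... as they appear in this report; the function names are made up for illustration. It says nothing about z-stream kernels (2.6.32-431.x and similar) or private test builds, which can carry backported fixes under a lower build number.

```python
import re

# From comment 24: "Patch(es) available on kernel-2.6.32-495.el6"
FIXED_BUILD = 495


def el6_build_number(release):
    """Return N from a '2.6.32-N...' release string, or None if it does not match."""
    match = re.match(r"(?:kernel-)?2\.6\.32-(\d+)", release)
    return int(match.group(1)) if match else None


def at_or_past_fixed_build(release):
    """True if the build number is at least FIXED_BUILD (main 6.6 line only)."""
    build = el6_build_number(release)
    if build is None:
        raise ValueError("not a 2.6.32 release string: %r" % release)
    return build >= FIXED_BUILD


for release in ("2.6.32-431", "2.6.32-498.el6.x86_64"):
    print(release, at_or_past_fixed_build(release))
```

A False result for a 6.5 kernel such as 2.6.32-431 only means the build predates 495; whether a 6.5z kernel carries the fix has to be answered from its own changelog.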

Comment 29 Kylie 2014-08-26 02:28:23 UTC
Could you please share me whether this patch is on RHEL6.5z? Is kernel-2.6.32-495.el6 for RHEL6.5z? If yes, no more question. Thank you.

Comment 30 jason wang 2014-08-26 03:18:37 UTC
(In reply to Kylie from comment #29)
> Could you please share me whether this patch is on RHEL6.5z? Is
> kernel-2.6.32-495.el6 for RHEL6.5z? If yes, no more question. Thank you.

Hi Kylie:

We're still considering this bug for 6.5z. We only consider high-severity bugs for 6.5z, so we need to know the severity of this one. You can raise the severity of this bug if you think the fix is necessary for the z-stream.

Thanks

Comment 31 Kylie 2014-08-26 03:26:02 UTC
Thanks Jason. I will come back for severity.
 
Does 6.5z consider high priority bugs? Priority could be driven by 6.5z customer. Right?

Comment 32 jason wang 2014-08-26 04:36:39 UTC
(In reply to Kylie from comment #31)
> Thanks Jason. I will come back for severity.
>  
> Does 6.5z consider high priority bugs? Priority could be driven by 6.5z
> customer. Right?

Yes, I think so.

Comment 33 Kylie 2014-08-27 02:38:10 UTC
This is high priority for our customer who is using 6.5. Very important.  
As for severity, checked the definition. Yes, it has high impact which our customer's operation through KVP exchange on Hyper-v is disrupted.

Comment 34 jason wang 2014-08-27 02:51:15 UTC
(In reply to Kylie from comment #33)
> This is high priority for our customer who is using 6.5. Very important.  
> As for severity, checked the definition. Yes, it has high impact which our
> customer's operation through KVP exchange on Hyper-v is disrupted.

Increasing the priority to high based on this comment.

Comment 36 dnie 2014-09-02 07:34:41 UTC
Verified this bug with a RHEL 6 x64 guest on Hyper-V 2012 R2; changing status to VERIFIED.
At present hypervfcopyd does not work on i386 guests; that is tracked in bug 1123156.

Verify Version:
kernel 2.6.32-498.el6.x86_64

Verify steps:
1. Make sure the hypervfcopyd service is started
2. Use Copy-VMFile to copy a large file (3.5G) from the host to the VM guest
3. Repeat 3 times

Verify result:
The large file is copied from host to guest correctly and the issue is fixed.
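
The report does not say how "copied correctly" was checked. One possible way, sketched here in Python as an assumption rather than the QE procedure, is to hash the file on both sides and compare the digests. The function name and chunk size are arbitrary.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so a multi-gigabyte ISO is never held in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Run it in the guest on the copied file and compare the result with a SHA-256 digest computed on the host, for example with Get-FileHash in PowerShell 4.0 or later.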

Comment 37 Kylie 2014-09-02 09:33:09 UTC
How about accepting it into 6.5z? Thanks.

Comment 38 Kylie 2014-09-15 23:55:37 UTC
Ping again. Could we get it into 6.5z? Customers need it. Thank you.

Comment 39 Mike Surcouf 2014-09-29 11:08:31 UTC
Regarding

hv: vmbus: Fix a bug in the channel callback dispatch code

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=affb1aff300ddee54df307812b38f166e8a865ef

Unhelpfully the above patch does not state it solves another important scenario.

Thomas Shao <huishao@microsoft.com>

said

>For pause/resume or save/restore case, the time sync IC will set the guest time using host time sample. (In this case, host will send a ICTIMESYNCFLAG_SYNC message). But there is a bug in VM Bus Channel code, that will cause time sync IC service stop running after a long time (like one day). It is fixed by the above patch:

I had a problem where pause and resume (backup of vm) and live migration of 6.5 on hyperv  would not set the time or would set it 2 days in the past on restore.

I applied the patch and the problem is resolved.  Therefore this is a major defect for anyone backing up their VM through Hyper-V or running their VM on a Hyper-V cluster.

Certainly the patch in this comment should be considered for 7.0 and possibly 6.5.
As I understand from the comments above, it has already been pulled into 6.6.

Comment 40 Mike Surcouf 2014-09-30 11:28:50 UTC
My problem not resolved by patch
Sorry for misinformation.  Did not test for long enough.

Comment 42 dnie 2014-10-03 08:25:48 UTC
(In reply to Mike Surcouf from comment #40)
> My problem not resolved by patch
> Sorry for misinformation.  Did not test for long enough.

Hi, Mike Surcouf
    Did you enable "Guest services" under "Integration Services" in Hyper-V Manager? Could you also provide your test steps, build info and log messages?
    
    I checked fcopy in my environment again:
    Hyper-V 2012 R2  x86_64
    kernel 2.6.32-502.el6.x86_64 
    hypervfcopyd 0-0.15.20130826git.el6 

    I repeated the steps from comment 36. A 3.6G ISO file was copied from host to guest correctly, with no errors or failures in the guest log or the host log.

Comment 43 Mike Surcouf 2014-10-03 08:41:24 UTC
Please ignore all my comments as they do not apply here (sorry for noise).

I would not let my issue stop the patches in this bug as they are all important and don't have any effect on my issue and others are seeing good results.

I am working with Microsoft to repro my specific issue.  I may open a separate bug report and if I do I will link it here for others.

Thanks

Mike

Comment 45 lijing 2014-10-08 11:02:09 UTC
Hi  Mike Surcouf,

I am still somewhat confused about comment 39 and would like to confirm some details with you.

--> I had a problem where pause and resume (backup of vm) and live migration of 6.5 on hyperv  would not set the time or would set it 2 days in the past on restore.

Could you please give me detailed steps to reproduce it? I would like to reproduce it in my local environment. I ran the tests on RHEL 6.5, RHEL 6.6 and RHEL 7 and did not hit the issue you describe.

1. Pause and resume test against the VM
--> Right-click the VM, select Pause, and resume the VM about 10 minutes later. The time is set correctly.
2. Migration test, using the command below to migrate the VM.
PS C:\> Move-VM "${name of the vm guest}" ${target host} -IncludeStorage -DestinationStoragePath ${destination path}
The result is the same as above: the time is set correctly.

Are your steps different from these? BTW, your comment mentions backing up the VM; what do you mean by that?

Did you run the backup function with the commands below and then check the time on the VM?
#wbadmin get items -version:<version string of the backup you want> -backupTarget:K:\
#wbadmin start recovery -version:02/05/2013-23:38 -itemType:HyperV -items:{$VM name} -backuptarget:K:\

I guess you backed up the VM first and then restored it to check the time, but I am not sure whether that is right.


We have a bug (https://bugzilla.redhat.com/show_bug.cgi?id=1006229) where the time can be set but there is time drift after save/start of the VM.

Comment 46 Mike Surcouf 2014-10-08 13:09:34 UTC
I think this is running off topic so I created

https://bugzilla.redhat.com/show_bug.cgi?id=1150584

Hopefully I have answered your questions here and provided a clear repro.

Comment 47 errata-xmlrpc 2014-10-14 06:17:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-1392.html

Comment 48 Vitaly Kuznetsov 2015-01-06 14:09:17 UTC
*** Bug 1006229 has been marked as a duplicate of this bug. ***

