Bug 1118123
Summary: | [Hyper-V][RHEL 6.6] fcopy large file from host to guest failed | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | dnie <dnie> |
Component: | kernel | Assignee: | jason wang <jasowang> |
kernel sub component: | Hyper-V | QA Contact: | Virtualization Bugs <virt-bugs> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | medium | CC: | decui, dnie, henning.sackewitz, jasowang, jingli, kyliel, kys, leiwang, mike, pm-rhel, rhod, salmy, shwang, thozza, yacao, yangcao |
Version: | 6.6 | ||
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | kernel-2.6.32-495.el6 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2014-10-14 06:17:23 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1019213, 1056239 |
Description
dnie
2014-07-10 03:32:20 UTC
Sorry, change component to kernel

A fix is ready.
https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/commit/?h=char-misc-linus&id=affb1aff300ddee54df307812b38f166e8a865ef
The above fix is accepted in Greg's tree and will be in mainline soon.
This issue is also in Red Hat 6.5. Could you please patch this fix in Red Hat 6.5?

Makes sense for 6.5.z too. The risk is low and hv specific.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.

Hi, jason

kernel: 2.6.32-489.el6bzfcopy.x86_64
hypervfcopyd: hypervfcopyd-0-0.14.20130826git.el6.log_timestamp.x86_64.rpm

I checked this build, and the result still has some issues.
After fcopy reaches about 70%, the Copy-VMFile command on the host stops and throws an exception, and in the guest the large file being copied is deleted permanently. In the guest, the hypervfcopyd daemon is still running and there are no error messages. Below is the exception from the host.
===============================================================================
PS C:\Users\vdcadmin.HYPERV> Copy-VMFile "hv14_RHEL6.5_x64_dnie" -SourcePath ".\RHEL-7.0-20140507.0-Server-x86_64-dvd1.iso" -DestinationPath "/tmp" -CreateFullPath -FileSource Host
Copy-VMFile : 'hv14_RHEL6.5_x64_dnie' failed to copy file. (Virtual machine ID 5949191C-81EA-4698-A409-8EF06158F9E6)
'hv14_RHEL6.5_x64_dnie' failed to initiate copying files to the guest: Unspecified error (0x80004005). (Virtual machine ID 5949191C-81EA-4698-A409-8EF06158F9E6)
'hv14_RHEL6.5_x64_dnie' failed to copy the source file 'C:\Users\vdcadmin.HYPERV\RHEL-7.0-20140507.0-Server-x86_64-dvd1.iso' to the destination '/tmp' in the guest: Unspecified error (0x80004005).
(Virtual machine ID 5949191C-81EA-4698-A409-8EF06158F9E6)
At line:1 char:1
+ Copy-VMFile "hv14_RHEL6.5_x64_dnie" -SourcePath ".\RHEL-7.0-20140507.0-Server-x8 ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (Microsoft.Hyper...FileToGuestTask:VMCopyFileToGuestTask) [Copy-VMFile], VirtualizationOperationFailedException
    + FullyQualifiedErrorId : OperationFailed,Microsoft.HyperV.PowerShell.Commands.CopyVMFileCommand
==============================================================================

K.Y:

See above comment, after picking 9bd2d0dfe4714dd5d7c09a93a5c9ea9e14ceb3fc (Drivers: hv: util: Fix a bug in the KVP code) and affb1aff300ddee54df307812b38f166e8a865ef (Drivers: hv: vmbus: Fix a bug in the channel callback dispatch code)

Our QE still can see some issues.

Jason, I will take a look. Vivek could you have somebody repro this locally here so I can take a look.

I think Chris tested this.

Hi, jason wang,
Can you please confirm this new issue only happens to *SMP* VM?

I think I got the root cause: there is a race condition in SMP case.
Can you please test the below small patch?

diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
index eaaa3d8..23b2ce2 100644
--- a/drivers/hv/hv_fcopy.c
+++ b/drivers/hv/hv_fcopy.c
@@ -246,8 +246,8 @@ void hv_fcopy_onchannelcallback(void *context)
 		/*
 		 * Send the information to the user-level daemon.
 		 */
-		fcopy_send_data();
 		schedule_delayed_work(&fcopy_work, 5*HZ);
+		fcopy_send_data();
 		return;
 	}
 	icmsghdr->icflags = ICMSGHDRFLAG_TRANSACTION | ICMSGHDRFLAG_RESPONSE;

The patch of comment #12 was accepted into Greg's tree:
https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/commit/?h=char-misc-linus&id=2ef82d24f445e82f80e235f44eb9d1bc933e3670

(In reply to Dexuan Cui from comment #12)
> Hi, jason wang,
> Can you please confirm this new issue only happens to *SMP* VM?
>
> I think I got the root cause: there is a race condition in SMP case.
> Can you please test the below small patch?
>
> diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
> index eaaa3d8..23b2ce2 100644
> --- a/drivers/hv/hv_fcopy.c
> +++ b/drivers/hv/hv_fcopy.c
> @@ -246,8 +246,8 @@ void hv_fcopy_onchannelcallback(void *context)
>  		/*
>  		 * Send the information to the user-level daemon.
>  		 */
> -		fcopy_send_data();
>  		schedule_delayed_work(&fcopy_work, 5*HZ);
> +		fcopy_send_data();
>  		return;
>  	}
>  	icmsghdr->icflags = ICMSGHDRFLAG_TRANSACTION | ICMSGHDRFLAG_RESPONSE;

Thanks, I will pick this.

Hi Jason,

To ensure KVP works well, additional patch is also needed.
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9bd2d0dfe4714dd5d7c09a93a5c9ea9e14ceb3fc
11 days  Drivers: hv: util: Fix a bug in the KVP code  K. Y. Srinivasan  2  -4/+15

Could you please help ensure above patch is added into 6.5? Thank you.

And let me send you a summary in another comment. Thanks.

Summary:

Issue #1:
Red Hat 6.5 has built-in KVP. But KVP will stop working after several hours.

To fix this issue, two patches in connection.c, hv_util.c and hv_kvp.c are needed.
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9bd2d0dfe4714dd5d7c09a93a5c9ea9e14ceb3fc
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=affb1aff300ddee54df307812b38f166e8a865ef

Logs:
11 days  Drivers: hv: util: Fix a bug in the KVP code  K. Y. Srinivasan  2  -4/+15
11 days  Drivers: hv: vmbus: Fix a bug in the channel callback dispatch code  K. Y. Srinivasan  1  -2/+6

Issue #2:
File copy hangs when copying large file. Red Hat 6.5 + additional fcopy service or Red Hat 6.6/7.0.

To fix this issue, two patches in connection.c and hv_fcopy.c are needed.
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=affb1aff300ddee54df307812b38f166e8a865ef
https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/commit/?h=char-misc-linus&id=2ef82d24f445e82f80e235f44eb9d1bc933e3670

Logs:
11 days  Drivers: hv: vmbus: Fix a bug in the channel callback dispatch code  K. Y. Srinivasan  1  -2/+6
4 days  Drivers: hv: hv_fcopy: fix a race condition for SMP guest  Dexuan Cui  1  -1/+1

(In reply to Kylie from comment #19)
> Summary:
>
> Issue #1:
> Red Hat 6.5 has built-in KVP. But KVP will stop working after several hours.
>
> To fix this issue, two patches in connection.c, hv_util.c and hv_kvp.c are
> needed.
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9bd2d0dfe4714dd5d7c09a93a5c9ea9e14ceb3fc
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=affb1aff300ddee54df307812b38f166e8a865ef
>

Get it, I will create a bug for 6.5z.

> Logs:
>
> 11 days Drivers: hv: util: Fix a bug in the KVP code K. Y. Srinivasan 2 -4/+15
> 11 days Drivers: hv: vmbus: Fix a bug in the channel callback dispatch code K. Y. Srinivasan 1 -2/+6
>
> Issue #2:
> File copy hangs when copying large file. Red Hat 6.5 + additional fcopy
> service or Red Hat 6.6/7.0.
>
> To fix this issue, two patches in connection.c and hv_fcopy.c are needed.
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=affb1aff300ddee54df307812b38f166e8a865ef
>
> https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/commit/?h=char-misc-linus&id=2ef82d24f445e82f80e235f44eb9d1bc933e3670
>

I've picked them to 6.6. 7.0 does not support file copy, so probably needed only for 7.1. We will probably do a sync from upstream during 7.1 so it will be included at that time.

Thanks

> Logs:
> 11 days Drivers: hv: vmbus: Fix a bug in the channel callback dispatch code K. Y. Srinivasan 1 -2/+6
>
> 4 days Drivers: hv: hv_fcopy: fix a race condition for SMP guest Dexuan Cui 1 -1/+1

Hi Jason,

Did you create a KVP bug for 6.5z? If yes, could you please share me the bug number? Thanks.

Patch(es) available on kernel-2.6.32-495.el6

(In reply to Rafael Aquini from comment #24)
> Patch(es) available on kernel-2.6.32-495.el6

Thank you for update. It is good to know patches are available.

I would like to double check. Is kernel-2.6.32-495.el6 for RHEL6.5z? If yes, no more question. If not, any plan on RHEL6.5z? Thank you.

BTW, the kernel version of RHEL6.5 on my hand is 2.6.32-431. Thank you.

Could you please share me whether this patch is on RHEL6.5z? Is kernel-2.6.32-495.el6 for RHEL6.5z? If yes, no more question. Thank you.

(In reply to Kylie from comment #29)
> Could you please share me whether this patch is on RHEL6.5z? Is
> kernel-2.6.32-495.el6 for RHEL6.5z? If yes, no more question. Thank you.

Hi Kylie:

We're still considering the bug for 6.5z. We want to know the severity of this bug and only consider high severity bugs for 6.5z. So you can try to increase the severity of this bug if you think it was necessary for z stream.

Thanks

Thanks Jason. I will come back for severity.

Does 6.5z consider high priority bugs? Priority could be driven by 6.5z customer. Right?

(In reply to Kylie from comment #31)
> Thanks Jason. I will come back for severity.
>
> Does 6.5z consider high priority bugs? Priority could be driven by 6.5z
> customer. Right?

Yes, I think so.

This is high priority for our customer who is using 6.5. Very important.
As for severity, checked the definition. Yes, it has high impact which our customer's operation through KVP exchange on Hyper-v is disrupted.

(In reply to Kylie from comment #33)
> This is high priority for our customer who is using 6.5. Very important.
> As for severity, checked the definition. Yes, it has high impact which our
> customer's operation through KVP exchange on Hyper-v is disrupted.
Increase the priority to high according to this comment.

Verified this bug with a RHEL6 x64 guest on Hyper-V 2012 R2; changing status to VERIFIED.
At present hypervfcopyd does not work on i386 guests; the bug number for that is 1123156.

Verify Version:
kernel 2.6.32-498.el6.x86_64

Verify steps:
1. Make sure the hypervfcopyd service is started
2. Copy-VMFile a large file (3.5G) from host to VM guest
3. Repeat 3 times

Verify result:
The large file can be copied from host to guest correctly and the issue is fixed

How about accepting it into 6.5z? Thanks.

Ping again. Could we have it into 6.5z which is needed for customers? Thank you.

Regarding
hv: vmbus: Fix a bug in the channel callback dispatch code
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=affb1aff300ddee54df307812b38f166e8a865ef

Unhelpfully the above patch does not state it solves another important scenario.

Thomas Shao <huishao> said
>For pause/resume or save/restore case, the time sync IC will set the guest time using host time sample. (In this case, host will send a ICTIMESYNCFLAG_SYNC message). But there is a bug in VM Bus Channel code, that will cause time sync IC service stop running after a long time (like one day). It is fixed by the above patch:

I had a problem where pause and resume (backup of vm) and live migration of 6.5 on hyperv would not set the time or would set it 2 days in the past on restore.
I applied the patch and the problem is resolved.
Therefore this is a major defect for anyone backing up their vm through hyperv or using their vm on a hyperv cluster.

Certainly the patch in this comment should be considered for 7.0 possibly 6.5. As I understand from the comments above it is already pulled for 6.6.

My problem not resolved by patch
Sorry for misinformation. Did not test for long enough.

(In reply to Mike Surcouf from comment #40)
> My problem not resolved by patch
> Sorry for misinformation. Did not test for long enough.
Hi, Mike Surcouf

Have you enabled "Guest services" under "Integration Services" in Hyper-V Manager? Could you also provide your test steps, build info and log messages?

I checked fcopy in my environment again.
Hyper-V 2012 R2 x86_64
kernel 2.6.32-502.el6.x86_64
hypervfcopyd 0-0.15.20130826git.el6

Repeated the steps from comment 36.
A 3.6G iso file could be copied from host to guest correctly, with no error or failure messages in the guest log or the host log.

Please ignore all my comments as they do not apply here (sorry for noise).
I would not let my issue stop the patches in this bug as they are all important and don't have any effect on my issue and others are seeing good results.
I am working with Microsoft to repro my specific issue.
I may open a separate bug report and if I do I will link it here for others.

Thanks
Mike

Hi Mike Surcouf,

I am still confused about comment 39 and would like to confirm a few things with you.
--> I had a problem where pause and resume (backup of vm) and live migration of 6.5 on hyperv would not set the time or would set it 2 days in the past on restore.

Could you please give me detailed steps to reproduce it? I would like to reproduce it in my local environment.
I did the test on rhel6.5, rhel6.6 and rhel7 and didn't hit the issue you described.
1. Pause and resume test against the VM --> right-click the VM, select Pause, then resume the VM about 10 min later. The time can be set.
2. Migration test: I used the command below to migrate the VM.
PS C:\> Move-VM "${name of the vm guest}" ${target host} -IncludeStorage -DestinationStoragePath ${destination path}
The result is the same as above: the time can be set as well.

Are your steps different from mine?

BTW, your comments referred to the backup of a VM; what does that mean? Do you run the backup function with the commands below to check the time on the VM?
#wbadmin get items -version:<version string of the backup you want> -backupTarget:K:\
#wbadmin start recovery -version:02/05/2013-23:38 -itemType:HyperV -items:{$VM name} -backuptarget:K:\

I guess you back up the VM first and then restore it to check the time, but I am not sure whether I am right.

We have had a bug (https://bugzilla.redhat.com/show_bug.cgi?id=1006229) where the time can be set but there is time drift after save/start of the VM.

I think this is running off topic so I created
https://bugzilla.redhat.com/show_bug.cgi?id=1150584
Hopefully I have answered your questions here and provided a clear repro.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-1392.html

*** Bug 1006229 has been marked as a duplicate of this bug. ***
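For anyone reading this later: the one-line reordering from comment #12 may be easier to follow with a small user-space model. This is only a sketch, not kernel code. The names `struct txn`, `arm_timeout`, `daemon_reply`, `timeout_expires` and `run_transaction` are hypothetical, and the model assumes that the daemon's reply is forwarded to the host only when a pending timeout can be cancelled, which is my reading of the driver and is not stated in this bug. It replays the worst case for an SMP guest, where the daemon answers on another CPU immediately after the data is sent.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of one fcopy transaction; not kernel code. */
struct txn {
	bool timer_armed;   /* stands in for the queued 5*HZ delayed work */
	bool host_answered; /* the daemon's result was forwarded to the host */
	bool timed_out;     /* the timeout path reported a failure instead */
};

/* Stands in for schedule_delayed_work(&fcopy_work, 5*HZ). */
static void arm_timeout(struct txn *t)
{
	t->timer_armed = true;
}

/*
 * Stands in for the daemon's reply arriving on another CPU.
 * ASSUMPTION: the reply is forwarded only if a pending timeout
 * could be cancelled; otherwise it is silently dropped.
 */
static void daemon_reply(struct txn *t)
{
	if (t->timer_armed) {
		t->timer_armed = false;
		t->host_answered = true;
	}
}

/* Stands in for the delayed work running after the timeout period. */
static void timeout_expires(struct txn *t)
{
	if (t->timer_armed) {
		t->timer_armed = false;
		t->timed_out = true;
	}
}

/*
 * Replays the worst-case interleaving: the daemon replies immediately
 * after the data is sent. arm_before_send selects the patched order.
 */
static struct txn run_transaction(bool arm_before_send)
{
	struct txn t = { false, false, false };

	if (arm_before_send) {
		arm_timeout(&t);   /* patched: timer exists before the send */
		daemon_reply(&t);  /* reply finds the timer and cancels it */
	} else {
		daemon_reply(&t);  /* original: reply lands before the timer */
		arm_timeout(&t);   /* timer armed too late, nobody cancels it */
	}
	timeout_expires(&t);

	/* Exactly one of the two outcomes must have happened. */
	assert(t.host_answered != t.timed_out);
	return t;
}
```

In this model, `run_transaction(false)` ends with `timed_out` set and `host_answered` clear, which is consistent with the host-side copy failing while the guest daemon logs nothing. `run_transaction(true)` ends with `host_answered` set, because the timer already exists when the reply arrives.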