Bug 654937
Summary: | kvm freezes after migration cancellation attempt. | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Mike Cao <bcao> | |
Component: | kvm | Assignee: | Juan Quintela <quintela> | |
Status: | CLOSED WONTFIX | QA Contact: | Virtualization Bugs <virt-bugs> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 5.6 | CC: | bcao, gcosta, juzhang, jyang, lcapitulino, michen, mkenneth, tburke, virt-maint, ykaul | |
Target Milestone: | rc | Keywords: | Reopened | |
Target Release: | --- | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 669581 | Environment: | ||
Last Closed: | 2011-07-18 10:20:57 UTC | Type: | --- | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 580949 |
Description
Mike Cao
2010-11-19 05:09:07 UTC
Mike, how many minutes did you wait?

(In reply to comment #4)
> Mike, how many minutes did you wait?

More than 30mins

(In reply to comment #6)
> (In reply to comment #4)
> > Mike, how many minutes did you wait?
>
> More than 30mins

Ok, then it's likely a bug. I mean, if we did get a response then this wouldn't be a bug.

This is not a kvm bug, it is a libvirt/virt-manager/rhev-m bug. This is TCP for you :-( The only thing the migration code can do is add a timeout, say 10 minutes by default, and "cancel" the migration if it was not able to send any data in that time. Nothing else. But any management app can do exactly the same: issue "info migrate", and if "transferred ram" is stuck for <timeout>, just cancel the migration. What do you expect migration to do here?

Later, Juan.

You're right that the real issue should be solved at the management level, i.e. hardcoded timeouts should _not_ be added to qemu.

However, I believe that the scenario described in the bug report is expected to eventually fail. It doesn't matter if it's 1 or 40 minutes; I think send()/write() should eventually fail. If this assumption is correct and this is not happening, then we're likely ignoring or not reporting the error back to the user. That would be a valid bug.

Note that I'm not discussing its severity (i.e. does it really matter to report an error after 30 minutes), but it's worth investigating.

Closing this bug due to the above comments. TCP keepalive over the live migration socket will help, but it is not that important. I'll add it to the todo list.

(In reply to comment #10)
> However, I believe that the scenario described in the bug report is expected to
> eventually fail. Doesn't matter if it's 1 or 40 minutes, I think send()/write()
> should eventually fail. [...]

This is not how qemu works. We do a non-blocking write() when the connection is ready to accept packets. If the connection is blocked after some communication, the host kernel buffers for that socket are full, and the socket will never become ready again. So we don't do any further write, and we never see the error. As I said, a timeout is needed, and it is as easy to add it at the management level as in qemu.

Later, Juan.

According to comment #9, run migrate_cancel after the steps in comment #0:

1. (qemu) migrate_cancel

Actual results: the qemu-kvm process froze.

Based on the above, reopening this issue for further investigation.

Marking this issue ack+:

1. It can be reproduced.
2. See comment #13.
3. The same bug is still open in RHEL 6.1, where it is proposed to be fixed in RHEL 6.2.

I am changing the bug description. Based on the last comments, this is an issue of migration freezing after a cancellation attempt. It has nothing to do with timeouts.
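Juan proposed that a management application, not qemu, should poll `info migrate` and cancel any migration whose "transferred ram" counter stops growing. That idea can be sketched as a small polling loop, with some assumptions. `query_transferred` and `cancel` are hypothetical callables. A caller would wire them to the monitor's `info migrate` output and to `migrate_cancel`, respectively. No real libvirt or QMP API is used here. Note that the thread later reports `migrate_cancel` itself freezing qemu-kvm in this scenario, so a watchdog like this would trigger the bug tracked here rather than avoid it.

```python
import time
from typing import Callable, Optional


def watch_migration(
    query_transferred: Callable[[], Optional[int]],  # hypothetical hook: bytes transferred, None once finished
    cancel: Callable[[], None],                      # hypothetical hook: would issue migrate_cancel
    stall_timeout: float,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Cancel a migration whose transferred-RAM counter stops growing.

    Returns "completed" once query_transferred() reports completion (None),
    or "cancelled" after calling cancel() because no progress was seen
    for stall_timeout seconds.
    """
    last_value = query_transferred()
    if last_value is None:
        return "completed"
    last_progress = clock()
    while True:
        sleep(poll_interval)
        value = query_transferred()
        if value is None:
            return "completed"
        now = clock()
        if value > last_value:
            # Progress was made: restart the stall timer.
            last_value, last_progress = value, now
        elif now - last_progress >= stall_timeout:
            # Counter has been stuck for stall_timeout seconds: give up.
            cancel()
            return "cancelled"
```

The clock and sleep functions are injected so the loop can be tested against a simulated clock. A real management tool would keep the defaults and pass a timeout such as the 10 minutes mentioned in the thread.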
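Juan's explanation can be reproduced outside qemu. In his account, qemu issues a non-blocking write() only when the socket reports ready. A stalled peer therefore leaves the socket never ready, and no error is ever seen. The sketch below is illustrative, not qemu code. A Unix-domain `socket.socketpair()` stands in for the migration TCP connection to keep the result deterministic. The receiving end is never read, which models a destination that stops accepting data rather than a broken network link. Once the kernel buffers fill, `select()` does not report the socket as writable, and `SO_ERROR` is still 0. A readiness-driven writer therefore has nothing to react to.

```python
import select
import socket


def fill_until_stalled(chunk_size: int = 4096, wait: float = 0.5):
    """Write to a non-blocking socket whose peer never reads.

    Returns (bytes_queued, became_writable, pending_error).
    """
    # Unix-domain pair used as a stand-in for the migration TCP socket.
    sender, receiver = socket.socketpair()
    try:
        sender.setblocking(False)
        chunk = b"\0" * chunk_size
        queued = 0
        try:
            while True:
                queued += sender.send(chunk)
        except BlockingIOError:
            pass  # EAGAIN: the kernel buffers for this socket are full
        # A readiness-driven writer would now wait for "writable"...
        _, writable, _ = select.select([], [sender], [], wait)
        # ...but the peer is alive and silent, so no error is pending either.
        pending_error = sender.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        return queued, bool(writable), pending_error
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    queued, writable, err = fill_until_stalled()
    print(f"queued {queued} bytes; writable={writable}; SO_ERROR={err}")
```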
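One comment suggests TCP keepalive on the live migration socket, which can be illustrated with standard socket options. This sketch is an assumption, not what qemu does. The helper name and all timing values are hypothetical. The per-socket tuning constants are platform-dependent, so they are set only when Python exposes them.

There is one caveat relevant to this bug. On Linux, keepalive probes are sent only while the connection has no outstanding data, so they may never fire for a transfer stalled with a full send queue. For that reason the sketch also sets the Linux-only `TCP_USER_TIMEOUT`. Per tcp(7), this option bounds how long transmitted data may remain unacknowledged, or buffered data untransmitted because of a zero window.

```python
import socket


def enable_liveness_checks(
    sock: socket.socket,
    idle_s: int = 60,                # hypothetical: idle seconds before the first probe
    interval_s: int = 10,            # hypothetical: seconds between probes
    probes: int = 6,                 # hypothetical: unanswered probes before the connection is dropped
    user_timeout_ms: int = 600_000,  # hypothetical: 10 minutes
) -> None:
    """Turn on TCP keepalive and, where available, TCP_USER_TIMEOUT for sock."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket keepalive tuning; availability of these constants varies by platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    # Keepalive only probes idle connections; this bounds time with data stuck in flight.
    if hasattr(socket, "TCP_USER_TIMEOUT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, user_timeout_ms)
```

These options would be applied to the migration socket before the transfer starts. With the illustrative values above, a silent peer on an idle connection would be detected after roughly 60 + 6 × 10 = 120 seconds.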