Description of problem:
After a migration is started, if iptables is used to block the migration TCP port on the dst host and the qemu-kvm process on the dst host is then quit, "(qemu) info migrate" on the src host should report that the migration failed, but it always shows the migration as in progress.

Version-Release number of selected component (if applicable):
# uname -r
2.6.18-231.el5
# rpm -q kvm
kvm-83-207.el5

How reproducible:
100%

Steps to Reproduce:
1. Start the VM on the src host:
CLI: /usr/libexec/qemu-kvm -m 10G -smp 1 -name RHEL3_64 -uuid 59960563-0abf-79df-fdfb-8462354d62b8 -no-kvm-pit-reinjection -boot c -drive file=/mnt/RHEL3_32.raw,if=ide,format=raw,cache=none,boot=on -net nic,macaddr=04:52:00:35:e8:6a,vlan=0,model=e1000 -net tap,script=/etc/qemu-ifup,vlan=0 -serial pty -parallel none -usb -vnc :4 -monitor stdio
2. Clear all the firewall rules on the dst host:
# iptables -F
3. Start listening on the migration port on the dst host:
<commandLine> -incoming tcp:0:5888
4. Begin live migration; before the migration completes, use iptables to reject the migration TCP port:
# iptables -A INPUT -p tcp --dport 5888 -j REJECT
5. Kill the qemu-kvm process on the dst host.

Actual results:
(qemu) info migrate
Migration status: active
transferred ram: 904060 kbytes
remaining ram: 9638584 kbytes
total ram: 10506252 kbytes
and the migration never ends.

Expected results:
(qemu) info migrate
Migration failed.

Additional info:
After step 2, check the firewall rules on the dst host:
# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination

After step 5, check the firewall rules on the dst host:
# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
REJECT tcp -- anywhere anywhere tcp dpt:5888 reject-with icmp-port-unreachable
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Mike, how many minutes did you wait?
(In reply to comment #4)
> Mike, how many minutes did you wait?

More than 30 minutes.
(In reply to comment #6)
> (In reply to comment #4)
> > Mike, how many minutes did you wait?
>
> More than 30mins

OK, then it's likely a bug. I mean, if we did get a response then this wouldn't be a bug.
This is not a kvm bug, it is a libvirt/virt-manager/rhev-m bug. This is TCP for you :-( The only thing the migration code can do is add a timeout, say 10 minutes by default, and "cancel" the migration if it was not able to send any data in that much time. Nothing else. But any management app can do exactly the same: issue "info migrate", and if "transferred ram" is stuck for <timeout>, just cancel the migration. What do you expect migration to do here? Later, Juan.
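The management-level check described above can be sketched roughly as follows. This is only an illustration, not code from libvirt, virt-manager, or rhev-m: the names parse_transferred_kbytes and StallDetector are hypothetical, the 600-second timeout in the usage note is just the "10 minutes" figure mentioned above, and how the monitor output is obtained is left to the caller. The parser assumes the "transferred ram: N kbytes" line format shown in comment #0.

```python
import re
from typing import Optional


def parse_transferred_kbytes(info_migrate_output: str) -> Optional[int]:
    """Extract the 'transferred ram' counter from 'info migrate' output.

    Returns None if the line is not present (assumed format from comment #0).
    """
    match = re.search(r"transferred ram:\s*(\d+)\s*kbytes", info_migrate_output)
    return int(match.group(1)) if match else None


class StallDetector:
    """Hypothetical helper: flags a migration whose counter stops moving."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self._last_value: Optional[int] = None
        self._last_change: Optional[float] = None

    def update(self, transferred: Optional[int], now: float) -> bool:
        """Record a sample; return True once the counter has been
        unchanged for at least timeout_s seconds."""
        if self._last_change is None or transferred != self._last_value:
            # First sample, or progress was made: restart the clock.
            self._last_value = transferred
            self._last_change = now
            return False
        return now - self._last_change >= self.timeout_s
```

A caller would poll "info migrate" periodically, feed each result to StallDetector(600).update(value, time.monotonic()), and issue migrate_cancel once update() returns True.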
You're right that the real issue should be solved at the management level, ie. hardcoded timeouts should _not_ be added to qemu. However, I believe that the scenario described in the bug report is expected to eventually fail. Doesn't matter if it's 1 or 40 minutes, I think send()/write() should eventually fail. If this assumption is correct and if this is not happening, then we're likely ignoring or not reporting the error back to the user. That would be a valid bug. Note that I'm not discussing its severity (ie. does it really matter to report an error after 30 minutes), but it's worth investigating.
Closing this bug due to the above comments. TCP keepalive on the live migration socket would help, but it is not that important. I'll add it to the todo list.
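For reference, enabling TCP keepalive on a socket looks roughly like the sketch below. It is written in Python for brevity rather than as a qemu patch; the helper name enable_keepalive and the idle/interval/probe values are illustrative assumptions, and the TCP_KEEP* tuning options are only applied where the platform exposes them. Keepalive probes are aimed at idle connections, so whether this alone would catch the blocked-port case in this report is not established here.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 60,
                     interval_s: int = 10, probes: int = 5) -> None:
    """Turn on TCP keepalive for sock (values are illustrative only)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning knobs; not every platform defines all of them,
    # so each one is applied only if the constant exists.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```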
(In reply to comment #10)
> You're right that the real issue should be solved at the management level, ie.
> hardcoded timeouts should _not_ be added to qemu.
>
> However, I believe that the scenario described in the bug report is expected to
> eventually fail. Doesn't matter if it's 1 or 40 minutes, I think send()/write()
> should eventually fail. If this assumption is correct and if this is not
> happening, then we're likely ignoring or not reporting the error back to the
> user. That would be a valid bug.
>
> Note that I'm not discussing its severity (ie. does it really matter to report
> an error after 30 minutes), but it's worth investigating.

This is not how qemu works. We do a non-blocking write() when the connection is ready to accept packets. If the connection is blocked after some communication, the host kernel buffers for that socket fill up and the socket never becomes ready again. So we never do another write(), and we never see the error. As said, a timeout is needed, and it is as easy to add at the management level as in qemu.

Later, Juan.
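The behaviour described above (non-blocking writes stop once the kernel buffer is full, and the socket no longer reports as writable) can be reproduced in isolation with a small sketch. This is not qemu code: it uses a local socketpair as a stand-in for the migration TCP connection, with a peer that never reads, and the function names fill_until_blocked and is_writable are hypothetical.

```python
import select
import socket


def fill_until_blocked(sock: socket.socket, chunk: bytes = b"x" * 4096) -> int:
    """Send non-blocking until the kernel buffer is full.

    Returns the number of bytes accepted before send() would block.
    """
    sock.setblocking(False)
    sent = 0
    while True:
        try:
            sent += sock.send(chunk)
        except BlockingIOError:
            # Buffer is full; a writer that waits for readiness stops here.
            return sent


def is_writable(sock: socket.socket) -> bool:
    """Poll once, without waiting, for write readiness."""
    _, writable, _ = select.select([], [sock], [], 0)
    return bool(writable)
```

With writer, reader = socket.socketpair(), the writer is writable at first; after fill_until_blocked(writer), is_writable(writer) stays false for as long as the reader does not drain data, so a loop that only writes on readiness never calls write() again.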
Following comment #9, I ran migrate_cancel after the steps in comment #0.
1. (qemu) migrate_cancel
Actual results: the qemu-kvm process froze.
Based on the above, reopening this issue for further investigation.
Marking this issue as ack+:
1. It can be reproduced.
2. See comment 13.
3. The same bug is still open in RHEL 6.1 and is proposed to be fixed in RHEL 6.2.
I am changing the bug description. Based on the last comments, this is an issue of migration freezing after a cancellation attempt. It has nothing to do with timeouts.