I am running a proof of concept of Domino on Xen with live migration support, but have run into some strange behavior.

- I have updated the kernel to -8.el5, along with all Xen/virt-related tools.
- One problem is that when starting or migrating the guest OS, it crashes around 3 to 4 times at a line like "xvda:". Normally that line should look like "xvda: xvda1 xvda2", showing the partitions of xvda.
- After several restarts it does finally boot.
- Once booted, it can migrate even while running Domino.
- The image is a raw image created with virt-manager.

Now for the most interesting problem. First, my setup:

xenhost1: IBM T60 laptop, running RHEL5 Beta 2 with the latest kernel and Xen-related packages
xenhost2: IBM T60p laptop, running RHEL5 Beta 2 with the latest kernel and Xen-related packages

Basically the hardware configurations are the same, except for a larger LCD panel on the T60p.

xenhost1 uses NFS to export the rhel5.img disk image in /var/lib/xen/images, and xenhost2 mounts it with:

mount -t nfs xenhost1:/var/lib/xen/images /var/lib/xen/images

Test case 1: xenhost1 started a rhel5 guest, which booted successfully to the command prompt. I then tried to migrate it to xenhost2; after several seconds of network traffic, the rhel5 guest disappeared on both hosts.

Test case 2: xenhost2 started a rhel5 guest, which booted. I then tried to migrate it to xenhost1, and the migration succeeded. After that, I migrated it back to xenhost2, which also finished in several seconds.

I have attached my log files from both hosts for investigation. Thanks!

Log files (taken on both hosts during the test cases):
- 00-start-on-xenhost1.log: Test case 1, starting rhel5
- 01-migrate-to-xenhost2-fail.log: Test case 1, migration failed
- 02-start-on-xenhost2.log: Test case 2, starting rhel5
- 03-migrate-to-xenhost1-ok.log: Test case 2, migration OK
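The Test case 1 steps can be sketched as a small shell script for anyone trying to reproduce this. It is a hypothetical reconstruction, not the reporter's exact procedure: the report does not say which tool was used to start and migrate the guest (virt-manager may have been used rather than xm), and the config file /etc/xen/rhel5 and domain name "rhel5" are assumptions. Each command is printed, and it is only executed when RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical sketch of Test case 1. Assumptions, not taken from the report:
# the guest's xend config is /etc/xen/rhel5, the domain is named "rhel5",
# and xm is used (the reporter may have used virt-manager instead).
# Each command is printed; it is executed only when RUN=1 is set.

run() {
    printf '+ %s\n' "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# On xenhost2: mount the image directory that xenhost1 exports over NFS.
step_mount_on_xenhost2() {
    run mount -t nfs xenhost1:/var/lib/xen/images /var/lib/xen/images
}

# On xenhost1: start the guest, then live-migrate it to xenhost2.
step_start_and_migrate_on_xenhost1() {
    run xm create rhel5
    run xm migrate --live rhel5 xenhost2
}
```

To use it, source the script on each host and call the matching function (mount on xenhost2 first, then start and migrate on xenhost1), setting RUN=1 to actually execute the commands.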
Created attachment 149080: log files collected on both machines
OK, it looks like there are two separate problems you are reporting: failure to start 75% of the time due to the xvda: hang, and failure to migrate. Given that they are both in the same bugzilla, I think it will be much less confusing if we take them one at a time. We need to understand the failure to boot first; the "xvda:" hang indicates a failure of some sort in accessing or connecting up the block devices for the guest, and any such failure could easily also affect connection of a new guest arriving on migrate. Can I assume that the logs you posted are all from the migrate case and don't show the xvda: hang case? Do you have logs for the fail-to-boot case? Does it fail to boot on both hosts in the same way? Is the exact same kernel/xen being used on both hosts, with the same kernel on the guest? Thanks.
At the moment I cannot provide logs for the fail-to-boot case, so let's deal with the migration problem first. I cleared the log files and restarted the xend service after each test case, so we can assume each log file reflects a clean start. And yes, the kernel version on both machines, for both the guest and host OS, is -8.el5.
We really need to attack the boot case first, for the reasons I gave above: it is quite possible that the boot problem is causing the migration problem too, and the boot case is likely to be simpler to debug, so we need to get that working before we tackle the more complex case.
Created attachment 149127: boot crash log

This is the log file for the boot failure. It crashes at the point where it scans the partitions on xvda. Guest and host are both RHEL5 beta with the -8.el5 kernel.
Please also attach the crash output itself. Thanks!
The output is nearly the same as a normal boot; it just crashes suddenly after detecting the hard disk and scanning its partitions, before mounting root. I cannot get the crash output at the moment, but I will attach it as soon as possible. Thanks.
In the past 2 weeks I have not been able to reproduce the "Cannot start" problem, and I have been busy with other things recently. I will update this bug when I have new findings.
Numerous fixes were added to 5.1 to improve migration stability/repeatability. If the reporter still sees problems with 5.1, please re-open this ticket for 5.2.