Red Hat Bugzilla – Bug 230672
Strange migration problem (RHEL5 beta2 with -8.el5 kernel)
Last modified: 2007-12-07 15:38:16 EST
I am running a proof of concept (POC) of Domino on Xen with live migration
support, but I have run into some strange behavior.
- I have updated the kernel to -8.el5, along with all xen/virt related tools.
- One problem occurs when starting or migrating the guest OS: it crashes around
3 to 4 times at a line like: "xvda:"
- Normally that line should be something like "xvda: xvda1 xvda2", showing the
"partitions" of xvda.
- After several restarts it finally boots.
- It can migrate even while running Domino.
- The image is a raw image created with virt-manager.
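To make the symptom above easier to spot in a saved guest console log, here is a minimal Python sketch. It assumes the partition-scan line looks like the "xvda: xvda1 xvda2" example quoted above, with no timestamp or other prefix; the function names are hypothetical and not part of any Xen tool.

```python
import re

# Matches a partition-scan line such as " xvda: xvda1 xvda2".
# Assumption: the line carries no timestamp or other prefix.
PARTITION_SCAN = re.compile(r"^\s*(xvd[a-z]+):\s*(.*)$")


def scan_partition_lines(console_text):
    """Return {device: [partitions]} for each partition-scan line found."""
    result = {}
    for line in console_text.splitlines():
        match = PARTITION_SCAN.match(line)
        if match:
            device, rest = match.groups()
            result[device] = rest.split()
    return result


def looks_like_xvda_hang(console_text):
    """True if an xvda scan line is present but lists no partitions."""
    devices = scan_partition_lines(console_text)
    return "xvda" in devices and not devices["xvda"]
```

Running `looks_like_xvda_hang` over the captured console text of each boot attempt would separate the failed boots from the good ones, which may help when collecting logs for the fail-to-boot case.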
Now for the most interesting problem. First, my setup:
xenhost1: IBM T60 laptop, running RHEL5 beta 2 with latest kernel and xen
xenhost2: IBM T60p laptop, running RHEL5 beta 2 with latest kernel and xen
Basically, the hardware configurations are the same except for a larger LCD panel.
xenhost1 uses NFS to share out the rhel5.img disk image in
/var/lib/xen/images, and xenhost2 mounts
it using " mount -t nfs xenhost1:/var/lib/xen/images /var/lib/xen/images "
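Since migration depends on both hosts seeing the same image path, it may be worth confirming the NFS mount on xenhost2 before each test. The following Python sketch parses mount-table text in the /proc/mounts format and reports what is mounted at a given path; the helper name is hypothetical, and the check is only an illustration, not something the Xen tools do themselves.

```python
def nfs_source_for(mounts_text, mount_point):
    """Return the NFS source (host:/path) mounted at mount_point, or None.

    mounts_text is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    source = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, target, fstype = fields[0], fields[1], fields[2]
        if target == mount_point and fstype.startswith("nfs"):
            source = device  # a later entry shadows an earlier one
    return source


if __name__ == "__main__":
    with open("/proc/mounts") as mounts:
        print(nfs_source_for(mounts.read(), "/var/lib/xen/images"))
```

On xenhost2 the expected result would be `xenhost1:/var/lib/xen/images`; a result of `None` would mean the guest image is not reachable at the shared path.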
Test case 1:
I started a rhel5 guest on xenhost1, and it booted successfully to the command
prompt. I then tried to migrate
it to xenhost2; after several seconds of network traffic, the rhel5 guest
disappeared on both sides.
Test case 2:
I started a rhel5 guest on xenhost2, and it booted. I then tried to migrate it to
xenhost1, and the migration succeeded.
After that, I migrated it back to xenhost2, which also finished within several seconds.
I have attached my log files on both hosts for investigation. Thanks!
Log files (taken on both hosts during the test cases):
00-start-on-xenhost1.log - Test case 1, starting rhel5
01-migrate-to-xenhost2-fail.log - Test case 1, migration failed
02-start-on-xenhost2.log - Test case 2, starting rhel5
03-migrate-to-xenhost1-ok.log - Test case 2, migration OK
Created attachment 149080
Log files collected on both machines
OK, it looks like there are two separate problems you are reporting: failure to
start 75% of the time due to the xvda: hang, and failure to migrate.
Given that they are both in the same bugzilla, I think it will be much less
confusing if we take them one at a time. We need to understand the failure to
boot first; the "xvda:" hang indicates a failure of some sort in accessing or
connecting up the block devices for the guest, and any such failure could easily
also affect connection of a new guest arriving on migrate.
Can I assume that the logs you posted are all from the migrate case and don't
show the xvda: hang case? Do you have logs for the fail-to-boot case? Does it
fail to boot on both hosts in the same way? Is the exact same kernel/xen being
used on both hosts, with the same kernel on the guest?
At the moment I cannot provide logs for the fail-to-boot case.
So let's deal with the migration problem first. I cleared the
log files and restarted the xend service after each test case, so we can
assume that the log files reflect a clean boot.
And yes, the kernel version on both machines, for both guest and host OS, is -8.el5.
We really need to attack the boot case first, for the reasons I said above ---
it is quite possible that the boot problem is causing the migrate problem too,
and the boot case is likely to be simpler to debug, so we need to get that
working before we tackle the more complex case.
Created attachment 149127
Boot crash log
This is the log file for the boot failure. The guest crashes at a line like the
partition scan on xvda.
Guest and host are both RHEL5 beta with the -8.el5 kernel.
Please also attach the crash output itself. Thanks!
The output is nearly the same as a normal boot; the guest just crashes suddenly,
after detecting the hard disk and scanning partitions but before mounting root.
I cannot get the crash output at the moment, but I will attach it as soon as possible.
In the past 2 weeks I have not been able to reproduce the "Cannot start" problem, and
I have been busy with other things recently. I will update when I have new findings.
Numerous fixes were added to 5.1 to improve migration stability/repeatability.
If the reporter still sees problems with 5.1, please re-open this ticket for 5.2.