Bug 230672 - Strange migration problem (RHEL5 beta2 with -8.el5 kernel)
Summary: Strange migration problem (RHEL5 beta2 with -8.el5 kernel)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Xen Maintainance List
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-03-02 01:06 UTC by Fai Wong
Modified: 2007-12-07 20:38 UTC
CC List: 3 users

Fixed In Version: 5.1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-12-07 20:38:16 UTC
Target Upstream Version:
Embargoed:


Attachments
Log files collected on both machines (25.59 KB, application/zip)
2007-03-02 01:06 UTC, Fai Wong
Boot crash log (9.89 KB, text/plain)
2007-03-02 15:53 UTC, Fai Wong

Description Fai Wong 2007-03-02 01:06:31 UTC
I am running a proof of concept (POC) of Domino on Xen with live migration
support, but have run into some strange behavior.

- I have updated the kernel to -8.el5, along with all Xen/virt-related tools.
- One problem occurs when starting or migrating the guest OS: it crashes around
3 to 4 times at a line like: "xvda:"
- Normally that line should look something like "xvda: xvda1 xvda2", showing the
partitions of xvda.
- After several restarts, it does eventually boot.
- It can migrate even while running Domino.
- The image is a raw image created with virt-manager.

Now for the most interesting problem. First, my setup is:
xenhost1: IBM T60 laptop, running RHEL5 beta 2 with the latest kernel and
Xen-related packages
xenhost2: IBM T60p laptop, running RHEL5 beta 2 with the latest kernel and
Xen-related packages
Basically, the hardware configurations are the same except for the larger LCD
panel on the T60p.

xenhost1 uses NFS to export the rhel5.img disk image in /var/lib/xen/images,
and xenhost2 mounts it using "mount -t nfs xenhost1:/var/lib/xen/images /var/lib/xen/images"

Test case 1:
xenhost1 started a rhel5 guest, which booted successfully to the command
prompt. I then tried to migrate it to xenhost2. After several seconds of
network traffic, the rhel5 guest disappeared on both hosts.

Test case 2:
xenhost2 started a rhel5 guest, which booted. I then migrated it to xenhost1
successfully. After that, I migrated it back to xenhost2, which also finished
within several seconds.

I have attached my log files on both hosts for investigation. Thanks!

Log files (taken on both hosts in the test cases):
00-start-on-xenhost1.log - Test case 1, starting rhel5
01-migrate-to-xenhost2-fail.log - Test case 1, migration failed
02-start-on-xenhost2.log - Test case 2, starting rhel5
03-migrate-to-xenhost1-ok.log - Test case 2, migration OK

Comment 1 Fai Wong 2007-03-02 01:06:31 UTC
Created attachment 149080 [details]
Log files collected on both machines

Comment 2 Stephen Tweedie 2007-03-02 12:06:20 UTC
OK, it looks like there are two separate problems you are reporting: failure to
start 75% of the time due to the xvda: hang, and failure to migrate.

Given that they are both in the same bugzilla, I think it will be much less
confusing if we take them one at a time.  We need to understand the failure to
boot first; the "xvda:" hang indicates a failure of some sort in accessing or
connecting up the block devices for the guest, and any such failure could easily
also affect connection of a new guest arriving on migrate.

Can I assume that the logs you posted are all from the migrate case and don't
show the xvda: hang case?  Do you have logs for the fail-to-boot case?  Does it
fail to boot on both hosts in the same way?  Is the exact same kernel/xen being
used on both hosts, with the same kernel on the guest?

Thanks.


Comment 3 Fai Wong 2007-03-02 15:20:29 UTC
At the moment, I cannot provide logs for the fail-to-boot case.
So let's deal with the migration problem first. I cleared the log files and
restarted the xend service before each test case, so we can assume the log
files reflect a clean start.

And yes, both machines run the -8.el5 kernel, in both the guest and the host OS.

Comment 4 Stephen Tweedie 2007-03-02 15:36:44 UTC
We really need to attack the boot case first, for the reasons I said above ---
it is quite possible that the boot problem is causing the migrate problem too,
and the boot case is likely to be simpler to debug, so we need to get that
working before we tackle the more complex case.


Comment 5 Fai Wong 2007-03-02 15:53:13 UTC
Created attachment 149127 [details]
Boot crash log

This is the log file for the boot failure. It crashes at the point where it
scans the partitions on xvda.
Guest and host are both RHEL5 beta with the -8.el5 kernel.

Comment 6 Stephen Tweedie 2007-03-02 16:05:32 UTC
Please also attach the crash output itself.  Thanks!

Comment 7 Fai Wong 2007-03-02 16:09:01 UTC
The output is nearly the same as a normal boot; it just crashes suddenly, before
mounting root and after detecting the hard disk and scanning its partitions.
I cannot get the crash output at the moment, but I will attach it as soon as
possible. Thanks.
Thanks.

Comment 8 Fai Wong 2007-03-13 14:41:31 UTC
In the past 2 weeks I have not been able to reproduce the "Cannot start"
problem. I have been busy with other work recently and will update this bug when
I have new findings.

Comment 9 Don Dutile (Red Hat) 2007-12-07 20:38:16 UTC
Numerous fixes were added to 5.1 to improve migration stability/repeatability.
If the reporter still sees problems with 5.1, please re-open this ticket for 5.2.


