Bug 220166

Summary: Apparent Lack of Disk Sync After Virt-Manager Install
Product: [Fedora] Fedora    Reporter: Thomas J. Baker <tjb>
Component: xen    Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED NOTABUG    QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 6    CC: bstein, katzj
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2007-03-13 14:48:09 EDT
Attachments:
  rpm versions, modules
  xend.log
  xend-debug.log
  /etc/xen/fc6 config
  /etc/xen/as5 config
  virt-manager.log

Description Thomas J. Baker 2006-12-19 09:16:25 EST
I did a virt-manager install of FC6 on an FC6 host. I used a real disk
(/dev/sdw) for the guest domain disk. The install went fine. Upon reboot, I get
a pygrub error and the following xend-debug log entries:

[root@wintermute xen]# more xend-debug.log 
Traceback (most recent call last):
  File "/usr/bin/pygrub", line 489, in ?
    g = Grub(file, isconfig)
  File "/usr/bin/pygrub", line 147, in __init__
    self.read_config(file, isconfig)
  File "/usr/bin/pygrub", line 345, in read_config
    raise RuntimeError, "Unable to read filesystem" 
RuntimeError: Unable to read filesystem
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/SrvDomainDir.py", line 77, in op_create
    dominfo = self.xd.domain_create(config)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 228, in domain_create
    dominfo = XendDomainInfo.create(config)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 195, in create
    vm.initDomain()
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1290, in initDomain
    self.configure_bootloader()
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1756, in configure_bootloader
    self.info['image'])
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendBootloader.py", line 85, in bootloader
    raise VmError, msg
VmError: Boot loader didn't return any data!
Traceback (most recent call last):
  File "/usr/bin/pygrub", line 489, in ?
    g = Grub(file, isconfig)
  File "/usr/bin/pygrub", line 147, in __init__
    self.read_config(file, isconfig)
  File "/usr/bin/pygrub", line 345, in read_config
    raise RuntimeError, "Unable to read filesystem" 
RuntimeError: Unable to read filesystem



When I do an fdisk -l of the disk I just did the install on, it returns
an empty disk:

fdisk -l /dev/sdw

Disk /dev/sdw: 9105 MB, 9105018880 bytes
255 heads, 63 sectors/track, 1106 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System


I go home for the night. The next morning I try to start the domain and
it starts fine (except for the missing console; more on that later).
Then, for grins, I look at the disk again:

 fdisk -l /dev/sdw

Disk /dev/sdw: 9105 MB, 9105018880 bytes
255 heads, 63 sectors/track, 1106 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdw1   *           1          13      104391   83  Linux
/dev/sdw2              14        1106     8779522+  8e  Linux LVM

It shows the correct layout that the domain would have written to the
disk. Maybe I made a mistake; I don't know. So then I unpack the RHEL5B2
CDs into an install tree to see whether a virt-manager install works with
it, and it does. I point it at a blank disk (/dev/sdx) and do a complete
install. On reboot I get the same pygrub errors, and fdisk shows this
disk as empty too.

fdisk -l /dev/sdx

Disk /dev/sdx: 50.0 GB, 50019202560 bytes
255 heads, 63 sectors/track, 6081 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System


I remember the apparent lack of sync, so I run sync a couple of times and
restart xend, trying to get whatever holds the disk data in memory
written back out to disk. No luck. I go home, come back this morning,
and run:

fdisk -l /dev/sdx

Disk /dev/sdx: 50.0 GB, 50019202560 bytes
255 heads, 63 sectors/track, 6081 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdx1   *           1          13      104391   83  Linux
/dev/sdx2              14        6081    48741210   8e  Linux LVM


The domain starts up fine again (except for the console).

Any clue what's going on here?
Comment 1 Daniel Berrange 2006-12-19 09:42:57 EST
Never seen this behaviour before; installs to physical devices have
always 'just worked'. So perhaps this is some edge case with the drivers used
for your host disks. We'll need to examine some more debugging output and logs
to get an idea of the likely problem areas, so could you attach the contents of
the following files to this ticket:

/proc/modules
/var/log/xen/xend.log
/var/log/xen/xend-debug.log
/etc/xen/<domain name>    (the config file for the guest)
/root/.virt-manager/virt-manager.log

Also, can you comment on what RPM versions you have for kernel-xen, xen,
libvirt, python-virtinst & virt-manager.
Comment 2 Thomas J. Baker 2006-12-19 09:52:32 EST
Created attachment 144000
rpm versions, modules
Comment 3 Thomas J. Baker 2006-12-19 09:57:34 EST
Created attachment 144001
xend.log
Comment 4 Thomas J. Baker 2006-12-19 09:58:40 EST
Created attachment 144002
xend-debug.log
Comment 5 Thomas J. Baker 2006-12-19 10:00:33 EST
Created attachment 144003
/etc/xen/fc6 config
Comment 6 Thomas J. Baker 2006-12-19 10:01:32 EST
Created attachment 144004
/etc/xen/as5 config
Comment 7 Thomas J. Baker 2006-12-19 10:02:30 EST
Created attachment 144005
virt-manager.log
Comment 8 Thomas J. Baker 2006-12-19 10:05:08 EST
There's all the info you requested. There is probably some extraneous stuff in
the logs from failed install attempts, etc.; hopefully you can weed out what you
need. Neither domain has a working graphical console, either. I'm not sure
whether I've done anything wrong.
Comment 9 Stephen Tweedie 2006-12-19 12:19:49 EST
> When I do an fdisk -l of the disk I just did the install on, it returns
> an empty disk:
>
> fdisk -l /dev/sdw
>
> ...


This is often a sign that you have software running which is holding the
disk's partition table pinned in cache.  As a result, the previously-read copy
of the partition table in cache never gets invalidated, and we continue to see
it even after the guest install has overwritten the partition table.
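To make the "stale partition table" explanation concrete: for a DOS-labelled disk, the table that fdisk -l prints is decoded from the four 16-byte entries in the disk's first 512-byte sector. The Python 3 sketch below is not from the thread; `parse_mbr` and `read_sector0` are hypothetical helper names. It decodes that sector the same way, and because `read_sector0` goes through the host's normal cached block-device path, a stale cached copy would show up here exactly as it does in fdisk.

```python
import struct
from typing import List, NamedTuple

SECTOR_SIZE = 512
TABLE_OFFSET = 446            # four 16-byte entries start here in sector 0
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a valid MBR


class Partition(NamedTuple):
    number: int       # 1-4 for primary entries
    bootable: bool    # status byte 0x80, shown as '*' by fdisk
    type_id: int      # e.g. 0x83 = Linux, 0x8e = Linux LVM
    start_lba: int    # first sector, in 512-byte units
    num_sectors: int  # length in 512-byte sectors


def parse_mbr(sector0: bytes) -> List[Partition]:
    """Decode the primary entries of a DOS/MBR partition table.

    Returns [] both when the boot signature is missing and when all
    four slots are unused.
    """
    if len(sector0) < SECTOR_SIZE or sector0[510:512] != BOOT_SIGNATURE:
        return []
    parts = []
    for i in range(4):
        entry = sector0[TABLE_OFFSET + 16 * i:TABLE_OFFSET + 16 * (i + 1)]
        status, type_id = entry[0], entry[4]
        # Bytes 8-15: little-endian 32-bit start LBA and sector count.
        start_lba, num_sectors = struct.unpack("<II", entry[8:16])
        if type_id == 0 or num_sectors == 0:
            continue  # unused slot
        parts.append(Partition(i + 1, status == 0x80, type_id,
                               start_lba, num_sectors))
    return parts


def read_sector0(path: str) -> bytes:
    """Read sector 0 through the normal (page-cached) block-device path."""
    with open(path, "rb") as f:
        return f.read(SECTOR_SIZE)
```

For the layout the guest wrote to /dev/sdw, a fresh read would be expected to decode as entry 1 (bootable, type 0x83) starting at sector 63 with 208782 sectors, i.e. 104391 1K blocks, and entry 2 (type 0x8e) starting at sector 208845, the first sector of cylinder 14. Both values are consistent with the fdisk listing in the description.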

Have you any such software running?  LVM or raw devices bound to /dev/sdw or
sdx?  "Smart" running on them?
Comment 10 Daniel Berrange 2006-12-19 12:46:44 EST
I just noticed that you mapped the entire disk /dev/sdw through to the guest
OS, rather than a particular partition within /dev/sdw. This means that the host
OS will be exposed to the partition tables created and managed by the guest. As
well as the partition-table refresh issues, this will (more seriously) affect
mount-by-label: the host OS will see the individual partitions and filesystems
created by the guest and may well mistake them for its own. So when doing
'mount FS with label /' it may well end up mounting the guest's root filesystem
in the host, with obviously disastrous results. Our recommended approach for
physical disks is to have nested partition tables.

For example, on the host create a partition on your disk, e.g. /dev/sdw1, and
map that to the guest as /dev/xvda. Then, when provisioning the guest, let it
create its own partitions within /dev/xvda.
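In the guest's xm config file (these files, such as the attached /etc/xen/fc6, use Python syntax), the recommended layout comes down to the `disk` line. The excerpt below is hypothetical, not the reporter's actual config. Only the `disk` and `bootloader` keys are shown, and /dev/sdw1 is the example partition from the comment above.

```python
# Hypothetical excerpt of /etc/xen/fc6 after following the advice above.
# Each disk entry is 'BACKEND:HOST_DEVICE,GUEST_DEVICE,MODE':
#   phy:/dev/sdw1  - a host partition, not the whole disk /dev/sdw
#   xvda           - what the guest sees as its whole first disk
#   w              - writable
disk = ['phy:/dev/sdw1,xvda,w']

# pygrub then reads the guest's boot files from inside /dev/sdw1.
bootloader = '/usr/bin/pygrub'
```

Inside the guest, the installer partitions xvda itself (xvda1, xvda2, ...). The host kernel does not scan partition tables nested inside a partition, so those guest partitions and their labels stay out of the host's view. That is what avoids the mount-by-label hazard described above.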

Can you retry with this partitioning approach and confirm that pygrub then works
as expected? It should avoid the data pinning/refresh issues that Stephen
described in comment #9.
Comment 11 Thomas J. Baker 2006-12-19 13:36:40 EST
Thanks. I was wondering about best practices regarding disks. I had only seen
references to using partitions but I didn't find any definitive documentation.
I'll reinstall those two guests with partitions.
Comment 12 Thomas J. Baker 2006-12-19 16:52:35 EST
I reinstalled using a partition and things went much better.
Comment 13 Stephen Tweedie 2007-03-13 14:48:09 EDT
OK, closing this; it's not a bug as such, and is no different from the behaviour
you'd get if any other application tried accessing a whole-disk device while
it's already partition-scanned.