Bug 465914

Summary: rhel4 PV guest installations busted on rhel 5.3 i386 intel dom0
Product: Red Hat Enterprise Linux 4
Reporter: Gurhan Ozen <gozen>
Component: kernel-xen
Assignee: Chris Lalancette <clalance>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: high
Docs Contact:
Priority: urgent
Version: 4.7
CC: clalance, dhoward, duck, ijc, jburke, jplans, kmoriwak, martin.wilck, qcai, rlerch, syeghiay, tao, vmayatsk, xen-maint
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: i386
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-05-18 19:06:02 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 458752, 483748    
Attachments:
Description: Backport of upstream Xen c/s 10425, to fix the RHEL-4 crash
Flags: none

Description Gurhan Ozen 2008-10-07 05:29:43 UTC
Description of problem:

When trying to install RHEL 4 paravirt guests on a RHEL 5.3 dom0, the installation crashes with the following kernel backtrace:

Oops: 0003 [#1]
SMP 
Modules linked in: dm_snapshot dm_mirror dm_zero dm_mod ext3 jbd msdos raid6 raid5 xor raid1 raid0 xenblk xennet sr_mod sd_mod scsi_mod cdrom loop nfs nfs_acl lockd sunrpc vfat fat cramfs                                                            
CPU:    0                                                                       
EIP:    0061:[<c0112510>]    Not tainted VLI                                    
EFLAGS: 00010246   (2.6.9-78.ELxenU)
EIP is at pgd_free+0x11b/0x158
eax: 00000000   ebx: d40bb000   ecx: 00000400   edx: 80000001
esi: 00000000   edi: d40bb000   ebp: 00000003   esp: d41ede4c
ds: 007b   es: 007b   ss: 0068
Process make_fonts_map. (pid: 2289, threadinfo=d41ed000 task=d42118f0)
Stack: eb50c300 eb50c300 00007ff0 eb50c300 eb50c300 c011ad70 d4530000 d41ede78 
       c0165240 eb50c300 d42118f0 00000005 001c801e 00000080 c015b6a7 d40c6544 
       d40c6040 d41ed000 c0000000 00000000 d4527900 d41ed000 ec1cd400 c016530f 
Call Trace:
 [<c011ad70>] __mmdrop+0x21/0x3a
 [<c0165240>] exec_mmap+0x1df/0x200
 [<c015b6a7>] vfs_read+0xcf/0xd8
 [<c016530f>] flush_old_exec+0x46/0x24b
 [<c0181cc4>] load_elf_binary+0x385/0xd3d
 [<c0181ed3>] load_elf_binary+0x594/0xd3d
 [<c0148e7b>] kmap_high+0x19/0x21c
 [<c0149091>] kunmap_high+0x13/0x95
 [<c01490f6>] kunmap_high+0x78/0x95
 [<c0164b87>] copy_strings+0x22f/0x23a
 [<c018193f>] load_elf_binary+0x0/0xd3d
 [<c0165dfa>] search_binary_handler+0xb8/0x257
 [<c0166111>] do_execve+0x178/0x210
 [<c0105da0>] sys_execve+0x2c/0x8e
 [<c010740f>] syscall_call+0x7/0xb
Code: 8b 04 98 89 f1 c1 e0 0c 81 e1 ff 0f 00 00 89 c6 09 ce 6a 00 8d 9e ff ff ff bf 89 df 53 e8 55 01 00 00 59 31 c0 b9 00 04 00 00 5e <f3> ab 53 ff 35 44 31 36 c0 e8 b6 26 03 00 80 3d 04 77 2f c0 00 
 <0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception
 
Guest installation complete... restarting guest.
virDomainCreate() failed POST operation failed: (xend.err "Error creating domain: Boot loader didn't return any data!")
Domain installation may not have been
 successful.  If it was, you can restart your domain
 by running 'virsh start rhel4.7_i386_pv_guest'; otherwise, please
 restart your installation.
Mon, 06 Oct 2008 18:52:26 ERROR    virDomainCreate() failed POST operation failed: (xend.err "Error creating domain: Boot loader didn't return any data!")
Traceback (most recent call last):
  File "/usr/sbin/virt-install", line 559, in ?
    main()
  File "/usr/sbin/virt-install", line 545, in main
    dom.create()
  File "/usr/lib/python2.4/site-packages/libvirt.py", line 228, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: virDomainCreate() failed POST operation failed: (xend.err "Error creating domain: Boot loader didn't return any data!")
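
The number after "Oops:" can be read as an x86 page-fault error code, assuming the oops came from the page-fault handler (the excerpt above does not include the handler's first message, so this is an assumption). The sketch below decodes it; `decode_oops_code` is a hypothetical helper name, and only the three low bits are modeled.

```python
# Editorial sketch: decode the hex value printed after "Oops:" on i386,
# assuming it is a page-fault error code. The helper name is hypothetical.

def decode_oops_code(code_hex):
    """Return the meaning of the three low bits of a page-fault error code."""
    code = int(code_hex, 16)
    return {
        # bit 0: set = protection violation on a present page, clear = page not present
        "protection_violation": bool(code & 0x1),
        # bit 1: set = the access was a write, clear = read
        "write_access": bool(code & 0x2),
        # bit 2: set = fault happened in user mode, clear = kernel mode
        "user_mode": bool(code & 0x4),
    }


if __name__ == "__main__":
    for key, value in decode_oops_code("0003").items():
        print(key, value)
```

Under that assumption, 0003 denotes a kernel-mode write to a page that is present but not writable. This is consistent with the faulting bytes `<f3> ab` (a `rep stos`) and ecx = 0x400 in the register dump, which would correspond to a 4096-byte fill in progress.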


Version-Release number of selected component (if applicable):
# uname -a
Linux intel-s3ea2-03.rhts.bos.redhat.com 2.6.18-117.el5xen #1 SMP Mon Sep 29 22:57:44 EDT 2008 i686 i686 i386 GNU/Linux
# rpm -qa | grep xen 
xen-3.0.3-73.el5
xen-devel-3.0.3-73.el5
kernel-xen-2.6.18-117.el5
xen-libs-3.0.3-73.el5


How reproducible:
Very. This machine (intel-s3ea2-03.rhts.bos.redhat.com) is an RHTS machine, so you can reserve it and reproduce the problem there too.


Steps to Reproduce:
1. Install a rhel5.3 tree. 
2. virt-install --name rhel4.7_i386_pv_guest --location nfs:bigpapi.bos.redhat.com:/vol/engineering/redhat/released/RHEL-4/U7/AS/i386/tree --nonsparse --paravirt --file /var/lib/xen/images/rhel4.7_i386_pv_guest.img -s 5 -r 1024 --nographics
3. Continue with installation process 
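
Step 2 can also be scripted. The following is a minimal Python sketch that only assembles the command line shown above and does not run it; `build_virt_install_argv` and its parameter names are hypothetical, and the units (GB for -s, MB for -r) are an assumption based on the values used in this report.

```python
# Editorial sketch: assemble the virt-install command from step 2.
# Only flags that appear in this report are used; names are hypothetical.
import shlex


def build_virt_install_argv(name, location, image_path, disk_gb, ram_mb):
    """Return the argument list for a paravirt, text-mode guest install."""
    return [
        "virt-install",
        "--name", name,
        "--location", location,
        "--nonsparse",
        "--paravirt",
        "--file", image_path,
        "-s", str(disk_gb),   # assumed to be the disk image size in GB
        "-r", str(ram_mb),    # assumed to be the guest memory in MB
        "--nographics",
    ]


if __name__ == "__main__":
    argv = build_virt_install_argv(
        name="rhel4.7_i386_pv_guest",
        location="nfs:bigpapi.bos.redhat.com:/vol/engineering/redhat/released/RHEL-4/U7/AS/i386/tree",
        image_path="/var/lib/xen/images/rhel4.7_i386_pv_guest.img",
        disk_gb=5,
        ram_mb=1024,
    )
    # Print a shell-safe rendering; pass argv to subprocess.run() to execute it.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the arguments as a list avoids shell quoting problems if the guest name or paths ever contain spaces.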
  
Actual results:
The guest kernel panics during installation (see the oops above).

Expected results:
The installation should complete.

Additional info:

This is an Intel box; hardware information about it can be seen here:
http://rhts.redhat.com/cgi-bin/rhts/system.cgi?id=224

Also from xend.log:
[2008-10-06 18:52:11 xend.XendDomainInfo 4413] WARNING (XendDomainInfo:931) Domain has crashed: name=rhel4.7_i386_pv_guest id=1.
[2008-10-06 18:52:25 xend.XendDomainInfo 4413] DEBUG (XendDomainInfo:1568) XendDomainInfo.destroy: domid=1
[2008-10-06 18:52:25 xend.XendDomainInfo 4413] DEBUG (XendDomainInfo:1576) XendDomainInfo.destroyDomain(1)
[2008-10-06 18:52:26 xend.XendDomainInfo 4413] DEBUG (XendDomainInfo:200) XendDomainInfo.create(['vm', ['name', 'rhel4.7_i386_pv_guest'], ['memory', '1024'], ['maxmem', '1024'], ['vcpus', '1'], ['uuid', '8e252747-1594-aade-8767-4359dfe26704'], ['bootloader', '/usr/bin/pygrub'], ['on_poweroff', 'destroy'], ['on_reboot', 'restart'], ['on_crash', 'restart'], ['device', ['tap', ['dev', 'xvda'], ['uname', 'tap:aio:/var/lib/xen/images/rhel4.7_i386_pv_guest.img'], ['mode', 'w']]], ['device', ['vif', ['mac', '00:16:3e:09:9b:b2'], ['bridge', 'xenbr1']]]])
[2008-10-06 18:52:26 xend.XendDomainInfo 4413] DEBUG (XendDomainInfo:312) parseConfig: config is ['vm', ['name', 'rhel4.7_i386_pv_guest'], ['memory', '1024'], ['maxmem', '1024'], ['vcpus', '1'], ['uuid', '8e252747-1594-aade-8767-4359dfe26704'], ['bootloader', '/usr/bin/pygrub'], ['on_poweroff', 'destroy'], ['on_reboot', 'restart'], ['on_crash', 'restart'], ['device', ['tap', ['dev', 'xvda'], ['uname', 'tap:aio:/var/lib/xen/images/rhel4.7_i386_pv_guest.img'], ['mode', 'w']]], ['device', ['vif', ['mac', '00:16:3e:09:9b:b2'], ['bridge', 'xenbr1']]]]
[2008-10-06 18:52:26 xend.XendDomainInfo 4413] DEBUG (XendDomainInfo:417) parseConfig: result is {'shadow_memory': None, 'start_time': None, 'uuid': '8e252747-1594-aade-8767-4359dfe26704', 'on_crash': 'restart', 'on_reboot': 'restart', 'localtime': None, 'image': None, 'on_poweroff': 'destroy', 'bootloader_args': None, 'cpus': None, 'name': 'rhel4.7_i386_pv_guest', 'backend': [], 'vcpus': 1, 'cpu_weight': None, 'features': None, 'vcpu_avail': None, 'memory': 1024, 'device': [('tap', ['tap', ['dev', 'xvda'], ['uname', 'tap:aio:/var/lib/xen/images/rhel4.7_i386_pv_guest.img'], ['mode', 'w']]), ('vif', ['vif', ['mac', '00:16:3e:09:9b:b2'], ['bridge', 'xenbr1']])], 'bootloader': '/usr/bin/pygrub', 'cpu': None, 'maxmem': 1024}
[2008-10-06 18:52:26 xend.XendDomainInfo 4413] DEBUG (XendDomainInfo:1358) XendDomainInfo.construct: None
[2008-10-06 18:52:26 xend 4413] DEBUG (balloon:143) Balloon: 1048772 KiB free; need 2048; done.
[2008-10-06 18:52:26 xend.XendDomainInfo 4413] DEBUG (XendDomainInfo:1406) XendDomainInfo.initDomain: 2 1.0
[2008-10-06 18:52:26 xend 4413] ERROR (XendBootloader:84) Boot loader didn't return any data!
[2008-10-06 18:52:26 xend.XendDomainInfo 4413] ERROR (XendDomainInfo:212) Domain construction failed
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 205, in create
    vm.initDomain()
  File "/usr/lib/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1411, in initDomain
    self.configure_bootloader()
  File "/usr/lib/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1932, in configure_bootloader
    self.info['image'])
  File "/usr/lib/python2.4/site-packages/xen/xend/XendBootloader.py", line 85, in bootloader
    raise VmError, msg
VmError: Boot loader didn't return any data!
[2008-10-06 18:52:26 xend.XendDomainInfo 4413] DEBUG (XendDomainInfo:1568) XendDomainInfo.destroy: domid=2
[2008-10-06 18:52:26 xend.XendDomainInfo 4413] DEBUG (XendDomainInfo:1576) XendDomainInfo.destroyDomain(2)
[2008-10-06 18:52:26 xend 4413] ERROR (SrvBase:88) Request create failed.
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/xen/web/SrvBase.py", line 85, in perform
    return op_method(op, req)
  File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvDomainDir.py", line 82, in op_create
    raise XendError("Error creating domain: " + str(ex))
XendError: Error creating domain: Boot loader didn't return any data!
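
The xend.log excerpt above mixes DEBUG noise with the few entries that matter. As a rough aid, the sketch below filters lines by level, assuming every entry follows the `[time logger pid] LEVEL (module:line) message` layout seen in this excerpt; the helper name and regular expression are illustrative and are not part of xend.

```python
# Editorial sketch: pull WARNING/ERROR entries out of xend.log-style lines.
# The layout is inferred from the excerpt in this report, not from xend itself.
import re

LINE_RE = re.compile(
    r"^\[(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<logger>\S+) (?P<pid>\d+)\] "
    r"(?P<level>[A-Z]+) \((?P<module>\w+):(?P<line>\d+)\) (?P<message>.*)$"
)


def important_entries(lines, levels=("WARNING", "ERROR")):
    """Return (time, level, message) for entries whose level is in `levels`."""
    found = []
    for line in lines:
        match = LINE_RE.match(line)
        # Traceback lines and other continuation text do not match and are skipped.
        if match and match.group("level") in levels:
            found.append((match.group("time"), match.group("level"), match.group("message")))
    return found


# Three lines copied from the excerpt above.
SAMPLE = [
    "[2008-10-06 18:52:11 xend.XendDomainInfo 4413] WARNING (XendDomainInfo:931) Domain has crashed: name=rhel4.7_i386_pv_guest id=1.",
    "[2008-10-06 18:52:25 xend.XendDomainInfo 4413] DEBUG (XendDomainInfo:1568) XendDomainInfo.destroy: domid=1",
    "[2008-10-06 18:52:26 xend 4413] ERROR (XendBootloader:84) Boot loader didn't return any data!",
]

if __name__ == "__main__":
    for entry in important_entries(SAMPLE):
        print(entry)
```

On this excerpt the filter surfaces the "Domain has crashed" warning for domain 1 first, which suggests that the later "Boot loader didn't return any data!" error is a follow-on symptom of the guest panic and not a separate failure.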

Comment 2 Chris Lalancette 2008-10-07 15:37:10 UTC
I tested this out on an Intel machine locally:

Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz

With kernel -118 and xen -73, I wasn't able to reproduce the problem.  I also tried kernel -117 with xen -73 and still couldn't reproduce it, so it's probably hardware-specific.  We'll have to jump on the RHTS machine that Gurhan mentioned in the initial description and reproduce it there.

Chris Lalancette

Comment 5 Chris Lalancette 2008-10-09 10:24:53 UTC
Well, the good news here is that I'm pretty sure this is a RHEL-4 Xen guest bug, not a dom0 bug.  I wasn't able to start the install *at all* with 5.2 on this particular hardware, and on 5.3 I get the crash reported here.  So it will have to be looked at for 4.8, but I don't think (at the moment) this is a RHEL-5 blocker.  I'm going to update the component to reflect this.

Chris Lalancette

Comment 6 Chris Lalancette 2008-10-09 13:42:31 UTC
OK, I found it.  We've been missing a patch that's been in upstream Xen dom kernels basically forever; RHEL-5 has this patch, but RHEL-4 does not.  It lets the kernel take a "spurious" page fault when the hypervisor has changed the pte mapping underneath us from R/O -> R/W; that's exactly what is happening in this sequence in arch/i386/mm/pgtable-xen.c:

				make_lowmem_page_writable(
					pmd, XENFEAT_writable_page_tables);
				memset(pmd, 0, PTRS_PER_PMD*sizeof(pmd_t));


It all makes sense; the only thing I don't understand is why we've gotten away with it up until this point.  Maybe this processor family changes something about how TLBs are handled, which is why we only see it here.  Anyway, I'll attach a backport of upstream Xen c/s 10425 which fixes the issue for me; I've only tested it on i386 so far, and I'll also need to test it on x86_64.

Chris Lalancette
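
To illustrate the idea described in comment 6, here is a simplified Python model of a spurious-fault check. It is a sketch and not the attached backport: the function name, the constants and the flat `pte_flags` argument are assumptions made for illustration, whereas real kernel code would walk the page tables for the faulting address.

```python
# Editorial sketch: a toy model of deciding whether a kernel page fault is
# "spurious", i.e. the pte already permits the access by the time it is checked.
# All names are hypothetical; this is not the code from the attached patch.

# x86 pte flag bits
PTE_PRESENT = 0x001
PTE_RW = 0x002

# x86 page-fault error code bits
ERR_WRITE = 0x2
ERR_USER = 0x4
ERR_RESERVED = 0x8


def is_spurious_fault(error_code, pte_flags):
    """Return True if the current pte would allow the faulting access."""
    # User-mode faults and reserved-bit violations are never treated as spurious here.
    if error_code & (ERR_USER | ERR_RESERVED):
        return False
    # No present mapping: this is a real fault.
    if not (pte_flags & PTE_PRESENT):
        return False
    # A write to a mapping that is still read-only is a real fault.
    if (error_code & ERR_WRITE) and not (pte_flags & PTE_RW):
        return False
    # The mapping now allows the access, so the fault can be ignored and retried.
    return True


if __name__ == "__main__":
    # Error code 0003 (kernel write, page present) against a pte that has
    # meanwhile become writable, as in the R/O -> R/W case described above.
    print(is_spurious_fault(0x3, PTE_PRESENT | PTE_RW))
```

In this model, a handler that sees a spurious fault would simply return so that the faulting instruction is retried, instead of reporting an oops.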

Comment 7 Chris Lalancette 2008-10-09 13:44:13 UTC
Created attachment 319854 [details]
Backport of upstream Xen c/s 10425, to fix the RHEL-4 crash

Comment 8 Chris Lalancette 2008-10-14 16:01:20 UTC
*** Bug 466932 has been marked as a duplicate of this bug. ***

Comment 9 Vivek Goyal 2008-10-28 12:56:51 UTC
Committed in 78.16.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 13 Gurhan Ozen 2009-02-01 00:00:59 UTC
Just as an update, I was able to manually install a 4.8 guest on the RHEL 5.3 release kernel without an issue. I won't verify the bug just yet; however, so far things are looking good:

[root@dhcp71-25 ~]# uname -a
Linux dhcp71-25.rhts.bos.redhat.com 2.6.9-80.ELxenU #1 SMP Fri Jan 23 16:57:22 EST 2009 i686 i686 i386 GNU/Linux

Comment 20 errata-xmlrpc 2009-05-18 19:06:02 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html