Bug 592806

Summary: rhel4 PV guest installations busted on rhel 5.5 i386 intel boxboro dom0
Product: Red Hat Enterprise Linux 4
Component: kernel-xen
Version: 4.7.z
Status: CLOSED NOTABUG
Severity: high
Priority: high
Target Milestone: rc
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-05-17 10:11:17 UTC
Reporter: yanfu,wang <yanwang>
Assignee: Xen Maintainance List <xen-maint>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: drjones, xen-maint
Attachments:
virt workflow xml to reproduce (flags: none)

Description yanfu,wang 2010-05-17 02:48:57 UTC
Created attachment 414444
virt workflow xml to reproduce

Description of problem:
When I try to install RHEL 4 paravirtualized guests on a RHEL 5.5 dom0 running on an Intel Boxboro machine, the installation crashes with the following kernel backtrace:

SMP
Modules linked in: dm_snapshot dm_mirror dm_zero dm_mod ext3 jbd msdos raid6 raid5 xor raid1 raid0 xenblk xennet sr_mod sd_mod scsi_mod cdrom loop nfs nfs_acl lockd sunrpc vfat fat cramfs
CPU:    0
EIP:    0061:[<c0112510>]    Not tainted VLI
EFLAGS: 00010246   (2.6.9-78.ELxenU)
EIP is at pgd_free+0x11b/0x158
eax: 00000000   ebx: cb812000   ecx: 00000400   edx: 80000000
esi: 00000000   edi: cb812000   ebp: 00000003   esp: c327ef6c
ds: 007b   es: 007b   ss: 0068
Process 05-pam_console. (pid: 765, threadinfo=c327e000 task=dfac6070)
Stack: de4ab300 de4ab300 de4ab300 dfac6070 00000000 c011ad70 cb80e000 dfac65c0
       c011ecb4 de4ab300 00000001 c163f4c0 00000000 c327e000 c327e000 c011efc6
       00000000 00000000 00000000 4014d6f8 c327e000 c010740f 00000000 00000000
Call Trace:
 [<c011ad70>] __mmdrop+0x21/0x3a
 [<c011ecb4>] do_exit+0x1f4/0x412
 [<c011efc6>] sys_exit_group+0x0/0x11
 [<c010740f>] syscall_call+0x7/0xb
Code: 8b 04 98 89 f1 c1 e0 0c 81 e1 ff 0f 00 00 89 c6 09 ce 6a 00 8d 9e ff ff ff bf 89 df 53 e8 55 01 00 00 59 31 c0 b9 00 04 00 00 5e <f3> ab 53 ff 35 44 31 36 c0 e8 b6 26 03 00 80 3d 04 77 2f c0 00
 <0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception
 KERNEL PANIC!


Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-194.3.1.el5 

How reproducible:
always

Steps to Reproduce:
1. Install a RHEL 5.5 GA tree and upgrade the kernel to 2.6.18-194.3.1.el5.

2. virt-install --name rhel4u7_i386_pv --mac 00:16:3E:50:83:E7 --location nfs:bigpapi.bos.redhat.com:/vol/engarchive2/redhat/released/RHEL-4/U7/AS/i386/tree --paravirt --file /var/lib/xen/images/rhel4u7_i386_pv.img -s 10 --debug --extra-args ks=http://lab2.rhts.eng.bos.redhat.com/cblr/svc/op/ks/system/guest-80-131.rhts.eng.bos.redhat.com --prompt --nographics --noreboot

3. Continue with the installation process.
  
Actual results:
Kernel panic.


Expected results:
The installation should complete.


Additional info:
https://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=156033
https://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=156034
Please use the attached XML to submit the job.

Comment 1 yanfu,wang 2010-05-17 05:43:19 UTC
I installed a RHEL 5.5 GA tree without upgrading to the latest 5.5.z kernel, and the same problem still occurs. Please refer to the job below:
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=157517

Comment 2 yanfu,wang 2010-05-17 10:11:17 UTC
4.7.z does not support Boxboro (Nehalem-EX); please refer to bz491338.

Comment 3 Andrew Jones 2010-05-17 14:22:44 UTC
(In reply to comment #2)
> 4.7.z does not support Boxboro (Nehalem-EX), pls refer to bz491338.    

The bz referenced here is for a deadlock, but the backtrace in this bug is in paging code, and looks unrelated. Does this machine have >= 64G of memory? If so, then this bug is likely a dup of bug 504988.

Comment 4 yanfu,wang 2010-05-18 03:22:18 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > 4.7.z does not support Boxboro (Nehalem-EX), pls refer to bz491338.    
> 
> The bz referenced here is for a deadlock, but the backtrace in this bug is in
> paging code, and looks unrelated. Does this machine have >= 64G of memory? If
> so, then this bug is likely a dup of bug 504988.    

I checked the machine my job ran on, intel-s3e36-01.lab.bos.redhat.com, and it seems not:
[root@intel-s3e36-01 ~]# cat /proc/meminfo 
MemTotal:     14042112 kB
MemFree:      13552432 kB
Buffers:         31072 kB
Cached:         144136 kB
SwapCached:          0 kB
Active:          76292 kB
Inactive:       140100 kB
HighTotal:    13303304 kB
HighFree:     13096432 kB
LowTotal:       738808 kB
LowFree:        456000 kB
SwapTotal:     3899384 kB
SwapFree:      3899384 kB
Dirty:              28 kB
Writeback:          84 kB
AnonPages:       41164 kB
Mapped:          12196 kB
Slab:            39600 kB
PageTables:       2076 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  10920440 kB
Committed_AS:   276988 kB
VmallocTotal:   114680 kB
VmallocUsed:      9796 kB
VmallocChunk:   104784 kB
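The check done by eye above can be sketched in Python: parse the /proc/meminfo text, convert MemTotal from kB to MiB, and compare it with the 64G threshold asked about in comment 3. This is a minimal sketch; `parse_meminfo` and `SAMPLE` are hypothetical names, and the sample holds only three lines copied from the dump above. As the later comments in this bug show, on a kernel-xen host this number is what the dom0 kernel sees, not necessarily the installed RAM.

```python
def parse_meminfo(text: str) -> dict:
    """Hypothetical helper: map each /proc/meminfo key to its numeric value (kB)."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])  # value is in kB
    return values


# Three lines copied from the /proc/meminfo output in this comment.
SAMPLE = """\
MemTotal:     14042112 kB
HighTotal:    13303304 kB
LowTotal:       738808 kB
"""

info = parse_meminfo(SAMPLE)
mem_mib = info["MemTotal"] // 1024
print(mem_mib)               # 13713
print(mem_mib >= 64 * 1024)  # False
```

The 13713 MiB result matches the "Mem Size = 13713 MB" reported for the same host later in this bug, and on this i386 kernel MemTotal is exactly HighTotal + LowTotal.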

Comment 5 Andrew Jones 2010-05-18 14:16:41 UTC
[root@intel-s3e36-01 ~]# uname -a
Linux intel-s3e36-01.lab.bos.redhat.com 2.6.32-19.el6.x86_64 #1 SMP Tue Mar 9 17:48:46 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@intel-s3e36-01 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         64439       2429      62009          0         75        752
-/+ buffers/cache:       1602      62837
Swap:        66495          0      66495

Looks like it has 64G.

Comment 6 yanfu,wang 2010-05-19 09:13:33 UTC
(In reply to comment #5)
> [root@intel-s3e36-01 ~]# uname -a
> Linux intel-s3e36-01.lab.bos.redhat.com 2.6.32-19.el6.x86_64 #1 SMP Tue Mar 9
> 17:48:46 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
> [root@intel-s3e36-01 ~]# free -m
>              total       used       free     shared    buffers     cached
> Mem:         64439       2429      62009          0         75        752
> -/+ buffers/cache:       1602      62837
> Swap:        66495          0      66495
> 
> Looks like it has 64G.    


Hi Andrew,
Thanks for the reminder. I checked the memory size reported in my failed jobs again, and it seems the memory available to kernel-xen on the host is limited. The information is below:
 
********** System Information **********
Hostname                = intel-s3e36-01.lab.bos.redhat.com
Kernel Version          = 2.6.18-194.el5xen
Machine Hardware Name   = i686
Processor Type          = i686
uname -a output         = Linux intel-s3e36-01.lab.bos.redhat.com 2.6.18-194.el5xen #1 SMP Tue Mar 16 22:08:06 EDT 2010 i686 i686 i386 GNU/Linux
Swap Size               = 3807 MB
Mem Size                = 13713 MB

Please refer to the links below. Sorry, I can't reserve the machine to double-check because there is a problem with the inventory today.
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=14084523
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=14058503
http://rhts.redhat.com/testlogs/2010/05/156034/400833/3242361/sys.log

Comment 7 Andrew Jones 2010-05-19 20:46:18 UTC
(In reply to comment #6)
> hi Andrew,
> Thanks your reminder, I checked my failed jobs about the mem size again, seems
> there is limit with the available memory in kernel-xen on host, the info is

Right, there are limits: 32G for a 64-bit dom0 and 16G for a 32-bit dom0. There's a warning in the third log file you linked to.

<4>RAM exceeds maximum supported memory for x86, Truncating to 64GB
<4>Warning only 4GB will be used.
<4>Use a PAE enabled kernel.

When I hopped on the machine it was booted to a bare-metal kernel, so I just did 'free -m'. If you want to check system memory on a dom0 (kernel-xen) machine then you should do 'xm info | grep total_mem'.
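The `xm info | grep total_mem` check, combined with the dom0 limits quoted above, could be scripted roughly as follows. This is a sketch under assumptions: it assumes `xm info` prints a `total_memory : <MiB>` line, the function names and the `"x86_64"`/`"i686"` keys are hypothetical, and `SAMPLE_XM_INFO` is made-up text, not output captured from intel-s3e36-01.

```python
# Dom0 memory limits as stated in this bug (GiB): 32 for 64-bit, 16 for 32-bit.
# The architecture keys are hypothetical labels chosen for this sketch.
DOM0_LIMIT_GIB = {"x86_64": 32, "i686": 16}


def parse_total_memory_mib(xm_info_output: str) -> int:
    """Return the total_memory value (assumed MiB) from `xm info` text."""
    for line in xm_info_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "total_memory":
            return int(value.strip())
    raise ValueError("no total_memory line found")


def check_host(xm_info_output: str, dom0_arch: str) -> dict:
    """Compare host memory with the dom0 limit for the given architecture."""
    total_mib = parse_total_memory_mib(xm_info_output)
    limit_mib = DOM0_LIMIT_GIB[dom0_arch] * 1024
    return {"total_mib": total_mib, "exceeds_dom0_limit": total_mib > limit_mib}


# Made-up sample output, for illustration only.
SAMPLE_XM_INFO = """\
host                   : example-host
total_memory           : 65536
free_memory            : 1024
"""

print(check_host(SAMPLE_XM_INFO, "i686"))
```

Reading the value from `xm info` rather than from `free -m` or /proc/meminfo matters on a kernel-xen host, because the dom0 kernel's own view can be truncated, as the log warning quoted above shows.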

Since I saw 64G on the system, as well as the warning in the log that I've copied above, I'm pretty sure this bug is a dup of bug 504988. I guess leaving it closed as NOTABUG is fine as well, though.