Bug 210070

Summary: blkbk/netbk modules don't load
Product: Red Hat Enterprise Linux 5
Reporter: Chris Wright <chrisw>
Component: kernel
Assignee: Aron Griffis <agriffis>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: high
Docs Contact:
Priority: medium
Version: 5.0
CC: bstein, jburke, prarit, riel
Target Milestone: ---   
Target Release: ---   
Hardware: ia64   
OS: Linux   
URL: http://rhts.lab/cgi-bin/rhts/test_log.cgi?id=863692
Whiteboard:
Fixed In Version: 2.6.18-1.2728.el5
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2006-10-25 18:47:50 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 202971, 210015    
Bug Blocks:    

Description Chris Wright 2006-10-09 20:40:01 UTC
+++ This bug was initially created as a clone of Bug #202971 +++

Description of problem:
modprobe blkbk/netbk fails due to memory allocation failures

Version-Release number of selected component (if applicable):
2.6.17-1.2566.fc6xen

Additional info:
FATAL: Error inserting blkbk (/lib/modules/2.6.17-1.2566.fc6xen/kernel/drivers/xen/blkback/blkbk.ko): Cannot allocate memory
modprobe: page allocation failure. order:8, mode:0xd0

Call Trace:
 [<a00000010001c8c0>] show_stack+0x40/0xa0
                                sp=e00000001f3ffc20 bsp=e00000001f3f9230
 [<a00000010001c950>] dump_stack+0x30/0x60
                                sp=e00000001f3ffdf0 bsp=e00000001f3f9218
 [<a0000001000fdce0>] __alloc_pages+0x500/0x540
                                sp=e00000001f3ffdf0 bsp=e00000001f3f91a0
 [<a0000001000fddd0>] __get_free_pages+0xb0/0x140
                                sp=e00000001f3ffe00 bsp=e00000001f3f9178
 [<a0000001003b35c0>] balloon_alloc_empty_page_range+0x80/0x400
                                sp=e00000001f3ffe00 bsp=e00000001f3f9130
 [<a0000002003cc190>] netback_init+0x190/0x440 [netbk]
                                sp=e00000001f3ffe30 bsp=e00000001f3f90f8
 [<a0000001000c5140>] sys_init_module+0x180/0x400
                                sp=e00000001f3ffe30 bsp=e00000001f3f9088
 [<a0000001000140e0>] ia64_ret_from_syscall+0x0/0x40
                                sp=e00000001f3ffe30 bsp=e00000001f3f9088
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e00000001f400000 bsp=e00000001f3f9088

Full log attached
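
For readers unfamiliar with the "order" field in the failure message: it is the base-2 logarithm of the number of physically contiguous pages requested, so order:8 means 2^8 = 256 pages in one block. The sketch below works that out from the log line above. The 16 KB page size is an assumption (a common ia64 kernel configuration, not stated anywhere in this report), and the helper names are made up for illustration.

```python
import re


def parse_alloc_failure(line):
    """Extract (order, mode) from a 'page allocation failure' log line."""
    m = re.search(r"order:(\d+), mode:(0x[0-9a-fA-F]+)", line)
    if m is None:
        raise ValueError("not a page allocation failure line")
    return int(m.group(1)), int(m.group(2), 16)


def contiguous_bytes(order, page_size):
    """Size in bytes of one order-N request: 2**order contiguous pages."""
    return (1 << order) * page_size


# Log line copied from this report.
line = "modprobe: page allocation failure. order:8, mode:0xd0"
order, mode = parse_alloc_failure(line)

# ASSUMPTION: 16 KB pages; the report does not give the kernel's page size.
PAGE_SIZE = 16 * 1024

size = contiguous_bytes(order, PAGE_SIZE)
print(f"order {order}: {1 << order} pages, {size // (1024 * 1024)} MB contiguous")
```

Under that page-size assumption the failing request needs a single 4 MB physically contiguous region, which is hard to satisfy once memory has become fragmented.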

I believe this is fixed by
http://xenbits.xensource.com/ext/xen-ia64-unstable.hg?cs=bef360142b62
I'll follow up with confirmation on that after testing.

-- Additional comment from aron on 2006-09-14 15:31 EST --
Mentioned changeset doesn't fix this problem.

There are three ways to fix this problem:
(1) link blkback and netback into xen kernel statically instead of as modules.
(2) wait for upstream xen-ia64 to finish xencomm implementation (in progress)
(3) debug the balloon driver issues that cause this problem presently

Of these, I've been working on #3 but may not have a fix soon enough.  I think
we should go ahead with #1 while we wait for #2 (which hopefully will be ready
for rhel5 release).  Statically linking the blkback and netback modules has no
negative effect on domU, but it enables us to put more effort into testing
fedora/rhel5 xen/ia64, which we can't do well right now.

-- Additional comment from aron on 2006-09-29 14:00 EST --
The xencomm implementation is nearly complete, but I was wrong to expect that
it would fix the problem.  It seems that the real issue is that the netbk and
blkbk drivers need to be loaded as early as possible in the initramfs instead
of being deferred to the xend script.

This is basically the same problem as
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=201796 which I think was
closed prematurely.

Comment 1 RHEL Program Management 2006-10-09 20:49:09 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux release.  Product Management has requested further review
of this request by Red Hat Engineering.  This request is not yet committed for
inclusion in release.

Comment 2 Chris Wright 2006-10-13 18:07:53 UTC
This should be fixed when we rebase to 3.0.3
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=210015 

Comment 3 Peter Martuccelli 2006-10-13 18:54:33 UTC
We are in a day-for-day slip for RHEL5 Beta2.  I do not think we have a choice
other than to proceed with option 1 mentioned in the original problem report.

Comment 4 Aron Griffis 2006-10-13 20:41:10 UTC
We don't need to do #1.  This problem is already fixed in xen-3.0.3-testing.hg
upstream, and it's already fixed in Juan's tree because he's pulled the latest
bits.  We're just waiting for that to bubble to RHEL5 now.

Additionally there is a workaround.  Booting with dom0_mem=1G makes enough
physically contiguous space available to avoid the issue.
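
To check whether a given hypervisor command line already carries this workaround, a small helper along these lines may be useful. It is only a sketch: the function name and the sample command lines are hypothetical, and it deliberately handles only values with an explicit K, M, or G suffix (such as the 1G used here) rather than guessing a default unit.

```python
import re

# Multipliers for the explicit size suffixes this sketch understands.
_UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def dom0_mem_bytes(cmdline):
    """Return the dom0_mem value in bytes, or None if the option is absent
    or is not written as <digits><K|M|G>."""
    m = re.search(r"(?:^|\s)dom0_mem=(\d+)([KMG])\b", cmdline, re.IGNORECASE)
    if m is None:
        return None
    return int(m.group(1)) * _UNITS[m.group(2).upper()]


# Hypothetical command lines, for illustration only.
with_workaround = "com2=57600,8n1 dom0_mem=1G"
without_workaround = "com2=57600,8n1"

print(dom0_mem_bytes(with_workaround))
print(dom0_mem_bytes(without_workaround))
```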

Comment 5 Peter Martuccelli 2006-10-16 15:56:38 UTC
Rik is working on the xen 3.0.3-rc4 update for R5 B2.

Comment 7 Don Zickus 2006-10-17 02:35:02 UTC
Fixed in kernel-2.6.18-1.2728.el5

Comment 8 Jay Turner 2006-10-17 11:47:26 UTC
I'm not entirely sure what changes were applied in 2.6.18-1.2728.el5 . . . QE
ack to taking the upstream xen patch.  We need as much testing as possible on
these changes.

Comment 9 Aron Griffis 2006-10-25 18:37:36 UTC
This problem has been fixed for all architectures, ia64 included, since rhel5
was rebased to the xen-3.0.3 release.  This bug can be closed.

Comment 10 Aron Griffis 2006-10-25 18:47:50 UTC
Closing