Bug 201796

Summary: [x86-32/PAE] loading modular netbk causes panic
Product: Fedora
Reporter: Matt C <wago>
Component: xen
Assignee: Herbert Xu <herbert.xu>
Status: CLOSED DUPLICATE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 5
CC: bstein, katzj, nobody+bjmason, wago
Target Milestone: ---
Target Release: ---
Hardware: i686
OS: Linux
URL: https://www.redhat.com/archives/fedora-xen/2006-August/msg00041.html
Doc Type: Bug Fix
Last Closed: 2006-08-15 10:14:12 UTC
Bug Blocks: 150224
Attachments:
  Output from 'xm info' and 'xm dmesg' on system (flags: none)
  Complete boot dmesg and 'shift-scrlock' meminfo: xen.gz mem=800M (flags: none)
  Complete boot dmesg and 'shift-scrlock' meminfo: xen.gz mem=16G (flags: none)

Description Matt C 2006-08-08 21:40:58 UTC
Description of problem:
On my x86-32/PAE system, I'm seeing a crash as soon as netbk.ko is loaded in
Dom0. I'm using a fresh FC5 system with all updates applied.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.17-1.2157_FC5
xen-3.0.2-3.FC5

How reproducible:
'modprobe netbk' on a system with PAE enabled. Tested with 16GB of RAM, haven't
tested other sizes.

Steps to Reproduce:
1. Build a system with 16GB of RAM (mine is a dual Opteron)
2. Install FC5/i386 and kernel-xen
3. 'modprobe netbk'
  
Actual results:
-bash-3.1# modprobe netbk
printk: 27 messages suppressed.
modprobe: page allocation failure. order:8, mode:0xd0

<c044844a> __alloc_pages+0x298/0x2ac
<c0448483> __get_free_pages+0x25/0x34
<c0549ebe> balloon_alloc_empty_page_range+0x34/0x167
<c902c05b> netback_init+0x5b/0x16d [netbk]
<c902c0b1> netback_init+0xb1/0x16d [netbk]
<c042b092> blocking_notifier_call_chain+0x31/0x48
<c0438def> sys_init_module+0x15df/0x178a
<c05b115a> register_netdevice+0x0/0x31a
<c0404bff> syscall_call+0x7/0xb
Mem-info:
DMA per-cpu:
cpu 0 hot: high 42, batch 7 used:2
cpu 0 cold: high 14, batch 3 used:0
cpu 1 hot: high 42, batch 7 used:8
cpu 1 cold: high 14, batch 3 used:11
DMA32 per-cpu: empty
Normal per-cpu: empty
HighMem per-cpu: empty
Free pages:        6504kB (0kB HighMem)
Active:7358 inactive:2693 dirty:34 writeback:0 unstable:0 free:1626 slab:2194 mapped:5098 pagetables:331
DMA free:6504kB min:1492kB low:1864kB high:2236kB active:29432kB inactive:10772kB present:139264kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 293*4kB 205*8kB 49*16kB 16*32kB 3*64kB 3*128kB 3*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 6476kB
DMA32: empty
Normal: empty
HighMem: empty
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap  = 1052248kB
Total swap = 1052248kB
Free swap:       1052248kB
34816 pages of RAM
0 pages of HIGHMEM
17798 reserved pages
8132 pages shared
0 pages swap cached
34 pages dirty
0 pages writeback
5093 pages mapped
2195 pages slab
321 pages pagetables
------------[ cut here ]------------
kernel BUG at drivers/xen/netback/netback.c:1073!
invalid opcode: 0000 [#1]
SMP
Modules linked in: netbk ipv6 eeprom adm1026 hwmon_vid hwmon hidp l2cap
bluetooth dm_mirror dm_mod video button battery ac parport_pc lp parport
i2c_amd756 i2c_core sg hw_random tg3 ext3 jbd 3w_9xxx sd_mod scsi_mod
CPU:    1
EIP:    0061:[<c902c0b8>]    Not tainted VLI
EFLAGS: 00010246   (2.6.17-1.2157_FC5xen #1)
EIP is at netback_init+0xb8/0x16d [netbk]
eax: 00000000   ebx: c658fed0   ecx: c0622b14   edx: ffffff29
esi: c91f6000   edi: c5c7fff8   ebp: c5c7ffc0   esp: c658feb0
ds: 007b   es: 007b   ss: 0069
Process modprobe (pid: 1722, threadinfo=c658e000 task=c08130d0)

Stack: <0>c066c3e0 c91f6000 c5c7fff8 c042b092 00000001 dead4ead ffffffff
ffffffff 00000001 dead4ead ffffffff ffffffff c5c7ffe0 c0438def 00000017 00000398
00007354 00000ae0 c9203f80 00000000 00000000 00000000 00000000 00000000

Call Trace:
 <c042b092> blocking_notifier_call_chain+0x31/0x48
 <c0438def> sys_init_module+0x15df/0x178a
 <c05b115a> register_netdevice+0x0/0x31a
 <c0404bff> syscall_call+0x7/0xb
Code: 00 e8 61 b0 3f f7 c7 05 14 51 20 c9 00 00 00 00 c7 05 10 51 20 c9 b7 f3 1f
c9 68 00 01 00 00 e8 d9 dd 51 f7 83 c4 10 85 c0 75 08 <0f> 0b 31 04 28 13 20 c9
89 c2 31 c9 2b 15 10 7a 74 c0 c1 fa 05

EIP: [<c902c0b8>] netback_init+0xb8/0x16d [netbk] SS:ESP 0069:c658feb0
 Segmentation fault

Expected results:
netbk module loads
xen backend networking works

Additional info:
If I flip back to the kernel-xen0-2.6.17-1.2157_FC5 kernel (and non-PAE domU
kernel, of course), the network works perfectly all the way from domU to the LAN.

The hardware is a dual-processor Opteron system with 16GB of RAM. The NICs are
Broadcom BCM5702s using the tg3 driver, for what it's worth.

Comment 1 Matt C 2006-08-08 21:40:58 UTC
Created attachment 133830 [details]
Output from 'xm info' and 'xm dmesg' on system

Comment 2 Matt C 2006-08-09 05:10:10 UTC
This is isolated to the modular network backend driver. I've worked around it by
building both the network backend and loopback drivers directly into the kernel
instead of as modules:

CONFIG_XEN_NETDEV_BACKEND=y
CONFIG_XEN_NETDEV_LOOPBACK=y

The non-PAE kernel has these settings already, so it could be that this is just
a module/non-module difference instead of PAE. I'll try that permutation next.
I'm also going to test the FC6 1.2517 kernel to see if this is still broken
upstream.

Comment 3 Matt C 2006-08-09 06:45:34 UTC
Okay, so this bug _is_ limited to the combination of PAE and the modular netbk
driver. I built the xen kernel without PAE support (but with a modular
netbk/netloop), booted with a non-PAE xen hypervisor, and the netbk/netloop
modules worked just fine. Everything else (hardware, etc.) remained the same; the
only kernel config change was:

@@ -164,11 +163,10 @@
 CONFIG_DELL_RBU=m
 CONFIG_DCDBAS=m
 # CONFIG_NOHIGHMEM is not set
-# CONFIG_HIGHMEM4G is not set
-CONFIG_HIGHMEM64G=y
+CONFIG_HIGHMEM4G=y
+# CONFIG_HIGHMEM64G is not set
 CONFIG_PAGE_OFFSET=0xC0000000
 CONFIG_HIGHMEM=y
-CONFIG_X86_PAE=y
 CONFIG_SELECT_MEMORY_MODEL=y
 CONFIG_FLATMEM_MANUAL=y
 # CONFIG_DISCONTIGMEM_MANUAL is not set



Comment 4 Matt C 2006-08-09 07:00:16 UTC
As far as I can tell, however, this bug is tied to PAE itself. I tried a couple of
permutations of the mem=xxx flag to the Xen hypervisor at boot time, all using the
as-shipped kernel-xen-2.6.17-1.2157 xen/PAE kernel:

16GB: broken
4GB: broken
800M: works!

I assume that this is because the 800M test was under the 896M (?) lowmem
boundary, and therefore the PAE paging technology was not used.

Comment 5 Herbert Xu 2006-08-09 15:11:42 UTC
netbk requires 1M of contiguous physical memory to load.  On Linux it is very
difficult to get that much contiguous memory once the system has been up and
running for a while.

Therefore if it is built as a module it must be loaded as early as possible
during the boot process, before significant memory fragmentation sets in.

The best option is to load it from initrd/initramfs which should be equivalent
to building it into the kernel.
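The 1M figure lines up with the allocator message in the original trace: "order:8" is a request for 2^8 = 256 physically contiguous pages, which with 4 KiB pages is 1024 kB. The sketch below (plain POSIX sh, assuming 4 KiB pages; order_kb and buddy_summary are hypothetical helper names) does that conversion and also totals the "DMA:" free-list line from the Mem-info dump in the description, counting how many free blocks are at least that large.

```sh
#!/bin/sh
# Sketch: relate the "order:N" in a page-allocation failure to a size, and
# summarise a buddy-allocator free-list line from a Mem-info dump.
# Assumption: 4 KiB pages, as on x86-32 with or without PAE.
PAGE_KB=4

# Size in kB of an order-N request: 2^N physically contiguous pages.
order_kb() {
    echo $(( (1 << $1) * PAGE_KB ))
}

# Usage: buddy_summary NEED_KB 'COUNT*SIZEkB' ...
# Prints "<total free kB> <number of free blocks of at least NEED_KB>".
buddy_summary() {
    need_kb=$1
    shift
    total=0
    big=0
    for pair in "$@"; do
        count=${pair%%[*]*}   # text before the '*'
        size=${pair#*[*]}     # text after the '*', e.g. "1024kB"
        size=${size%kB}
        total=$(( total + count * size ))
        if [ "$size" -ge "$need_kb" ]; then
            big=$(( big + count ))
        fi
    done
    echo "$total $big"
}

need=$(order_kb 8)
echo "order:8 request: ${need} kB"

# DMA free list copied from the Mem-info dump in the description.
# prints: 6476 1
buddy_summary "$need" '293*4kB' '205*8kB' '49*16kB' '16*32kB' '3*64kB' \
    '3*128kB' '3*256kB' '0*512kB' '1*1024kB' '0*2048kB' '0*4096kB'
```

The total matches the kernel's own "= 6476kB", and exactly one free block in the DMA zone is 1024 kB or larger. The dump is printed after the failed attempt, so it illustrates how little contiguous space is left rather than reproducing the allocator's exact decision.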

Comment 6 Matt C 2006-08-10 05:12:18 UTC
Thanks for the explanation.

I've confirmed that this works around my problem. I was able to use
/etc/rc.modules as a simpler method of loading these modules early in the boot
process (albeit not as early as the initrd, of course, but it worked).
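A minimal sketch of that /etc/rc.modules workaround, assuming (as on FC5-era systems) that rc.sysinit runs /etc/rc.modules early in boot, and that the backend modules are named netbk, netloop and blkbk as elsewhere in this report. The MODPROBE override and the load_early_modules function are illustrative additions, not part of any Fedora script.

```sh
#!/bin/sh
# /etc/rc.modules (sketch): load the Xen backend drivers as early as
# rc.sysinit allows, before low memory becomes fragmented.
# Assumption: these module names match your kernel-xen build (check modinfo).
MODPROBE=${MODPROBE:-/sbin/modprobe}
EARLY_MODULES="netbk netloop blkbk"

# Try every module; report failures on stderr and return how many failed.
load_early_modules() {
    failed=0
    for mod in $EARLY_MODULES; do
        if ! "$MODPROBE" "$mod"; then
            echo "rc.modules: could not load $mod" >&2
            failed=$(( failed + 1 ))
        fi
    done
    return "$failed"
}

# Never abort the rest of the boot because a backend failed to load.
load_early_modules || :
```

The file has to be executable (chmod +x /etc/rc.modules). The same body could live in an executable /etc/sysconfig/modules/*.modules file instead, the other mechanism mentioned later in this bug.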

So, it's still interesting that the presence of PAE tables in the xen hypervisor
impacts this. In my test cases above, I found that constraining the RAM
available to the xen hypervisor would make this work... even when the dom0
memory footprint remained constant. To state this another way:

"xen.gz dom0_mem=128M mem=4G" fails
"xen.gz dom0_mem=128M mem=800M" works

Also, while rc.modules is a fine workaround for me, it seems like this should be
fixed before RHEL5. I can see a lot of problems with this if users have to hack
the initrd or init scripts.


Comment 7 Herbert Xu 2006-08-10 05:17:22 UTC
The presence of large amounts of high memory causes various kernel data
structures to grow to occupy unreasonable amounts of low memory (the lower 896M
of your RAM).

Since netbk is only able to use low memory, this greatly reduces the likelihood
of it finding 1M of contiguous low memory once the system has been running for a
while.

For RHEL5, my recommendation would be for the system bootup scripts to be
organised so that blkbk as well as netbk are loaded as early as possible.

Comment 8 Matt C 2006-08-10 05:29:51 UTC
I'm probably just being dense here, so I apologize...

Since I constrained dom0_mem to 128M in both cases, the domain0 linux kernel
only had 128MB of lowmem at all times. The only parameter that changed was the
amount of memory made available to the hypervisor itself. The memory was never
allocated to any running domain.

If the netbk driver required 1MB of contig lowmem inside the xen hypervisor, I
suppose this would make sense to me. It sounds like it requires 1MB of contig
lowmem inside linux, though, correct?

Thanks for taking the time to explain this to me :)

Comment 9 Herbert Xu 2006-08-10 06:09:28 UTC
Sorry, I missed this interesting observation. This doesn't change the fact that
allocating 1MB of contiguous low memory is *not* expected to succeed other than
during early boot (people have problems allocating 8KB of memory, let alone 1MB :)

It does sound as if the presence of the extra memory in Xen is somehow
influencing what memory is available in dom0.  Could you please get a memory
dump by hitting SHIFT-SCROLLLOCK from the console in dom0? I'd like to see what
it says when you boot xen with mem=800M just before you load netbk (the highmem
case is already evident from your backtrace at the beginning).

Comment 10 Matt C 2006-08-10 23:37:31 UTC
Created attachment 133998 [details]
Complete boot dmesg and 'shift-scrlock' meminfo: xen.gz mem=800M

Comment 11 Matt C 2006-08-10 23:38:33 UTC
Created attachment 133999 [details]
Complete boot dmesg and 'shift-scrlock' meminfo: xen.gz mem=16G

Comment 12 Matt C 2006-08-10 23:41:22 UTC
Check out the two files that I attached. Each contains the complete dmesg and
meminfo dump from one of two consecutive boots of the system: one with the full
16GB, the other with the xen.gz mem=800M constraint.

I found that there are some interesting differences in the dmesg output when you
diff the two, such as:

--- dmesg-16G
+++ dmesg-800M
-Memory: 57356k/139264k available (2090k kernel code, 73564k reserved, 841k data, 172k init, 0k highmem)
+Memory: 121228k/139264k available (2090k kernel code, 9704k reserved, 841k data, 172k init, 0k highmem)
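The two Memory: lines can be compared directly. A small worked sketch of the arithmetic, with every figure in kB copied from the diff above:

```sh
#!/bin/sh
# Worked arithmetic on the dom0 "Memory:" lines from the two boots.
# All figures are in kB and copied from the dmesg diff above.
total_kb=139264                       # identical in both boots
avail_16g=57356;   reserved_16g=73564
avail_800m=121228; reserved_800m=9704

avail_drop_kb=$(( avail_800m - avail_16g ))
reserved_growth_kb=$(( reserved_16g - reserved_800m ))

echo "total dom0 memory:      ${total_kb} kB"
echo "available drop at 16G:  ${avail_drop_kb} kB (~$(( avail_drop_kb / 1024 )) MB)"
echo "reserved growth at 16G: ${reserved_growth_kb} kB (~$(( reserved_growth_kb / 1024 )) MB)"
```

Both lines move by about 62 MB (63872 kB and 63860 kB) while the dom0 total stays at 139264 kB, so in the 16G boot 73564 kB, more than half of dom0's memory, is already reserved before netbk is loaded.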


Comment 13 Herbert Xu 2006-08-11 06:32:28 UTC
Thanks a lot for the dmesg attachments. They show that 4G/16G runs out of memory
when loading netbk because of the size of the software IOTLB.

The software IOTLB is used to bounce IO buffers coming from guest domains so
that they are contiguous when presented to the hardware.

The size of the software IOTLB in dom0 is indeed dependent on the amount of
memory in the hypervisor (although its size can be overridden by setting
swiotlb=).  In particular, for <2G systems it takes 2MB while everyone else gets
a 64MB swiotlb.

So in your case you really should be assigning at least another 64MB of memory
to your dom0 to compensate for the bigger swiotlb.
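The sizing rule and the dom0_mem advice above can be written down as a small sketch. default_swiotlb_mb and suggested_dom0_mb are hypothetical helpers that encode only what this comment states (2MB below 2G of hypervisor memory, 64MB otherwise, and "another 64MB" for dom0 in the large case); the dom0 kernel's own code remains the authority, and "2G" is assumed to mean 2048 MB.

```sh
#!/bin/sh
# Sketch of the default swiotlb sizing as described in this comment.
# Hypothetical helpers for illustration only; the real default is computed by
# the dom0 kernel and can be overridden with swiotlb=.
# Assumption: "<2G" means less than 2048 MB of memory given to the hypervisor.

default_swiotlb_mb() {
    # $1 = hypervisor memory in MB (e.g. the value passed as xen.gz mem=)
    if [ "$1" -lt 2048 ]; then
        echo 2
    else
        echo 64
    fi
}

suggested_dom0_mb() {
    # $1 = hypervisor memory in MB
    # $2 = dom0_mem in MB that worked with a small mem= (128 in this report)
    # Adds another 64MB whenever the large swiotlb is in use.
    if [ "$(default_swiotlb_mb "$1")" -eq 64 ]; then
        echo $(( $2 + 64 ))
    else
        echo "$2"
    fi
}

for mem in 800 4096 16384; do
    echo "mem=${mem}M: swiotlb ~$(default_swiotlb_mb "$mem")MB," \
         "dom0_mem >= $(suggested_dom0_mb "$mem" 128)M"
done

# Cross-check against the Memory: lines from the two boots (kB):
# a 64MB swiotlb instead of a 2MB one should cost about (64 - 2) * 1024 kB.
expected_kb=$(( (64 - 2) * 1024 ))
observed_kb=$(( 121228 - 57356 ))
echo "expected ${expected_kb} kB, observed ${observed_kb} kB"
```

The expected 63488 kB is within a few hundred kB of the observed 63872 kB drop, and for the "xen.gz dom0_mem=128M mem=4G" case the advice works out to dom0_mem=192M or more.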

However, this does not change the fact that loading blkbk/netbk after early boot
is like playing Russian Roulette :)

Comment 14 Matt C 2006-08-14 22:49:36 UTC
Yep, no argument that a late modprobe is dangerous here. Thanks for the further
explanation.

I guess I'd just stress that it's important to fix this operationally before
RHEL5 lands. It could be a hacked mkinitrd script, or an
/etc/sysconfig/modules/*.modules file, something like that. We can work around
it for our testing, of course.

Comment 15 Herbert Xu 2006-08-15 10:14:12 UTC
Thanks.  I'm going to merge this with #202182 which is really the same issue
(the boot script isn't loading the modules automatically).

*** This bug has been marked as a duplicate of 202182 ***