Description of problem:
On my x86-32/PAE system, I'm seeing a crash as soon as netbk.ko is loaded in Dom0. I'm using a fresh FC5 system with all updates merged.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.17-1.2157_FC5
xen-3.0.2-3.FC5

How reproducible:
'modprobe netbk' on a system with PAE enabled. Tested with 16GB of RAM, haven't tested other sizes.

Steps to Reproduce:
1. build system with 16GB of RAM (mine is a dual-opteron)
2. install FC5/i386 and kernel-xen
3. 'modprobe netbk'

Actual results:
-bash-3.1# modprobe netbk
printk: 27 messages suppressed.
modprobe: page allocation failure. order:8, mode:0xd0
 <c044844a> __alloc_pages+0x298/0x2ac
 <c0448483> __get_free_pages+0x25/0x34
 <c0549ebe> balloon_alloc_empty_page_range+0x34/0x167
 <c902c05b> netback_init+0x5b/0x16d [netbk]
 <c902c0b1> netback_init+0xb1/0x16d [netbk]
 <c042b092> blocking_notifier_call_chain+0x31/0x48
 <c0438def> sys_init_module+0x15df/0x178a
 <c05b115a> register_netdevice+0x0/0x31a
 <c0404bff> syscall_call+0x7/0xb
Mem-info:
DMA per-cpu:
cpu 0 hot: high 42, batch 7 used:2
cpu 0 cold: high 14, batch 3 used:0
cpu 1 hot: high 42, batch 7 used:8
cpu 1 cold: high 14, batch 3 used:11
DMA32 per-cpu: empty
Normal per-cpu: empty
HighMem per-cpu: empty
Free pages: 6504kB (0kB HighMem)
Active:7358 inactive:2693 dirty:34 writeback:0 unstable:0 free:1626 slab:2194 mapped:5098 pagetables:331
DMA free:6504kB min:1492kB low:1864kB high:2236kB active:29432kB inactive:10772kB present:139264kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 293*4kB 205*8kB 49*16kB 16*32kB 3*64kB 3*128kB 3*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 6476kB
DMA32: empty
Normal: empty
HighMem: empty
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap = 1052248kB
Total swap = 1052248kB
Free swap: 1052248kB
34816 pages of RAM
0 pages of HIGHMEM
17798 reserved pages
8132 pages shared
0 pages swap cached
34 pages dirty
0 pages writeback
5093 pages mapped
2195 pages slab
321 pages pagetables
------------[ cut here ]------------
kernel BUG at drivers/xen/netback/netback.c:1073!
invalid opcode: 0000 [#1]
SMP
Modules linked in: netbk ipv6 eeprom adm1026 hwmon_vid hwmon hidp l2cap bluetooth dm_mirror dm_mod video button battery ac parport_pc lp parport i2c_amd756 i2c_core sg hw_random tg3 ext3 jbd 3w_9xxx sd_mod scsi_mod
CPU: 1
EIP: 0061:[<c902c0b8>] Not tainted VLI
EFLAGS: 00010246 (2.6.17-1.2157_FC5xen #1)
EIP is at netback_init+0xb8/0x16d [netbk]
eax: 00000000 ebx: c658fed0 ecx: c0622b14 edx: ffffff29
esi: c91f6000 edi: c5c7fff8 ebp: c5c7ffc0 esp: c658feb0
ds: 007b es: 007b ss: 0069
Process modprobe (pid: 1722, threadinfo=c658e000 task=c08130d0)
Stack: <0>c066c3e0 c91f6000 c5c7fff8 c042b092 00000001 dead4ead ffffffff ffffffff
       00000001 dead4ead ffffffff ffffffff c5c7ffe0 c0438def 00000017 00000398
       00007354 00000ae0 c9203f80 00000000 00000000 00000000 00000000 00000000
Call Trace:
 <c042b092> blocking_notifier_call_chain+0x31/0x48
 <c0438def> sys_init_module+0x15df/0x178a
 <c05b115a> register_netdevice+0x0/0x31a
 <c0404bff> syscall_call+0x7/0xb
Code: 00 e8 61 b0 3f f7 c7 05 14 51 20 c9 00 00 00 00 c7 05 10 51 20 c9 b7 f3 1f c9 68 00 01 00 00 e8 d9 dd 51 f7 83 c4 10 85 c0 75 08 <0f> 0b 31 04 28 13 20 c9 89 c2 31 c9 2b 15 10 7a 74 c0 c1 fa 05
EIP: [<c902c0b8>] netback_init+0xb8/0x16d [netbk] SS:ESP 0069:c658feb0
Segmentation fault

Expected results:
netbk module loads
xen backend networking works

Additional info:
If I flip back to the kernel-xen0-2.6.17-1.2157_FC5 kernel (and non-PAE domU kernel, of course), the network works perfectly all the way from domU to the LAN. The hardware is a 2p Opteron system with 16GB of RAM. The NICs are broadcom BCM5702s using the tg3 driver, for what it's worth.
Created attachment 133830
Output from 'xm info' and 'xm dmesg' on system
This is isolated to the module-based network driver. I've worked around it by building the kernel with both the network and loopback drivers into the monolithic kernel: CONFIG_XEN_NETDEV_BACKEND=y CONFIG_XEN_NETDEV_LOOPBACK=y The non-PAE kernel has these settings already, so it could be that this is just a module/non-module difference instead of PAE. I'll try that permutation next. I'm also going to test the FC6 1.2517 kernel to see if this is still broken upstream.
Okay, so this bug _is_ limited to the combination of PAE and the modular netbk driver. I built the xen kernel without PAE support (but with a modular netbk/netloop), booted with a non-PAE xen hypervisor, and the netbk/netloop modules worked just fine. Everything else remained the same (hardware, etc), the kernel config change was just:

@@ -164,11 +163,10 @@
 CONFIG_DELL_RBU=m
 CONFIG_DCDBAS=m
 # CONFIG_NOHIGHMEM is not set
-# CONFIG_HIGHMEM4G is not set
-CONFIG_HIGHMEM64G=y
+CONFIG_HIGHMEM4G=y
+# CONFIG_HIGHMEM64G is not set
 CONFIG_PAGE_OFFSET=0xC0000000
 CONFIG_HIGHMEM=y
-CONFIG_X86_PAE=y
 CONFIG_SELECT_MEMORY_MODEL=y
 CONFIG_FLATMEM_MANUAL=y
 # CONFIG_DISCONTIGMEM_MANUAL is not set
This bug is limited to PAE itself, however (as far as I can tell). I tried a couple of permutations of the mem=xxx flag to the xen hypervisor at boot time. These all use the as-shipped kernel-xen-2.6.17-1.2157 xen/PAE kernel:

16GB: broken
4GB: broken
800M: works!

I assume that this is because the 800M test was under the 896M (?) lowmem boundary, and therefore the PAE paging technology was not used.
netbk requires 1M of contiguous physical memory to load. On Linux it is very difficult to get that much contiguous memory once the system has been up and running for a while. Therefore if it is built as a module it must be loaded as early as possible during the boot process, before significant memory fragmentation sets in. The best option is to load it from initrd/initramfs which should be equivalent to building it into the kernel.
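For reference, the 1M figure matches the `order:8` in the reported allocation failure: with 4kB pages, an order-n request asks for 2^n physically contiguous pages. The following is a minimal sketch of that arithmetic, which also tallies the per-order free list (the `DMA:` line) from the Mem-info dump in the report. The function names are illustrative only and are not kernel APIs, and the 4kB page size is an assumption for this x86 system.

```python
import re

PAGE_KB = 4  # assumed x86 page size (4kB)

def order_to_kb(order):
    """Size in kB of one contiguous block of the given buddy order."""
    return PAGE_KB << order

def parse_buddy_line(line):
    """Parse 'COUNT*SIZEkB' pairs into {block_size_kb: free_count}."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}

# Free-list line copied from the Mem-info dump in the original report.
dma_line = ("DMA: 293*4kB 205*8kB 49*16kB 16*32kB 3*64kB 3*128kB 3*256kB "
            "0*512kB 1*1024kB 0*2048kB 0*4096kB = 6476kB")

free_blocks = parse_buddy_line(dma_line)
total_kb = sum(size * count for size, count in free_blocks.items())
needed_kb = order_to_kb(8)
big_enough = sum(count for size, count in free_blocks.items()
                 if size >= needed_kb)

print(f"order:8 request = {needed_kb} kB contiguous")
print(f"free total = {total_kb} kB, free blocks >= {needed_kb} kB: {big_enough}")
```

The dump is a snapshot taken after the failure, and the allocator also applies watermark checks, so the block count is only a rough indicator of how fragmented low memory was at that point.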
Thanks for the explanation. I've confirmed that this works around my problem. I was able to use /etc/rc.modules as a simpler method of loading these modules early in the boot process (albeit not as early as the initrd, of course, but it worked).

So, it's still interesting that the presence of PAE tables in the xen hypervisor impacts this. In my test cases above, I found that constraining the RAM available to the xen hypervisor would make this work... even when the dom0 memory footprint remained constant. To state this another way:

"xen.gz dom0_mem=128M mem=4G" fails
"xen.gz dom0_mem=128M mem=800M" works

Also, while rc.modules is a fine workaround for me, it seems like this should be fixed before RHEL5. I can see a lot of problems with this if users have to hack the initrd or init scripts.
The presence of large amounts of high memory causes various kernel data structures to grow to occupy unreasonable amounts of low memory (the lower 896M of your RAM). Since netbk is only able to use low memory, this greatly reduces the likelihood of it finding 1M of contiguous low memory once the system has been running for a while. For RHEL5, my recommendation would be for the system bootup scripts to be organised so that blkbk as well as netbk are loaded as early as possible.
So I'm probably just being dense here, so I apologize... Since I constrained dom0_mem to 128M in both cases, the domain0 linux kernel only had 128MB of lowmem at all times. The only parameter that changed was the amount of memory made available to the hypervisor itself. The memory was never allocated to any running domain. If the netbk driver required 1MB of contig lowmem inside the xen hypervisor, I suppose this would make sense to me. It sounds like it requires 1MB of contig lowmem inside linux, though, correct? Thanks for taking the time to explain this to me :)
Sorry, I missed this interesting observation. This doesn't change the fact that allocating 1MB of contiguous low memory is *not* expected to succeed other than during early boot (people have problems allocating 8KB of memory, let alone 1MB :) It does sound as if the presence of the extra memory in Xen is somehow influencing what memory is available in dom0. Could you please get a memory dump by hitting SHIFT-SCROLLLOCK from the console in dom0? I'd like to see what it says when you boot xen with mem=800M just before you load netbk (the highmem case is already evident from your backtrace at the beginning).
Created attachment 133998
Complete boot dmesg and 'shift-scrlock' meminfo: xen.gz mem=800M
Created attachment 133999
Complete boot dmesg and 'shift-scrlock' meminfo: xen.gz mem=16G
Check out the two files that I attached. Each is the complete dmesg and the meminfo dump from two consecutive boots of the system. One is with the full 16GB, the other is with xen.gz mem=800M constraint. I found that there are some interesting differences in the dmesg output when you diff the two, such as:

--- dmesg-16G
+++ dmesg-800M
-Memory: 57356k/139264k available (2090k kernel code, 73564k reserved, 841k data, 172k init, 0k highmem)
+Memory: 121228k/139264k available (2090k kernel code, 9704k reserved, 841k data, 172k init, 0k highmem)
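To put a number on that difference, the two `Memory:` lines can be compared directly. This is a small sketch that extracts the available and reserved figures and reports how much more memory is reserved in the 16G boot. The helper name `parse_memory_line` is made up for illustration.

```python
import re

def parse_memory_line(line):
    """Extract the kB figures from a kernel 'Memory:' boot message."""
    avail, total = map(int, re.search(r"(\d+)k/(\d+)k available", line).groups())
    reserved = int(re.search(r"(\d+)k reserved", line).group(1))
    return {"available": avail, "total": total, "reserved": reserved}

# The two lines from the dmesg diff above.
line_16g = ("Memory: 57356k/139264k available (2090k kernel code, "
            "73564k reserved, 841k data, 172k init, 0k highmem)")
line_800m = ("Memory: 121228k/139264k available (2090k kernel code, "
             "9704k reserved, 841k data, 172k init, 0k highmem)")

boot_16g = parse_memory_line(line_16g)
boot_800m = parse_memory_line(line_800m)

extra_reserved_kb = boot_16g["reserved"] - boot_800m["reserved"]
lost_available_kb = boot_800m["available"] - boot_16g["available"]

print(f"extra reserved in 16G boot: {extra_reserved_kb} kB "
      f"({extra_reserved_kb / 1024:.1f} MB)")
print(f"less available in 16G boot: {lost_available_kb} kB")
```

Both boots report the same 139264k total for dom0, so the roughly 62MB gap comes entirely from the reserved portion.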
Thanks a lot for the dmesg attachments. They show that 4G/16G runs out of memory when loading netbk because of the software IOTLB size. The software IOTLB is used to bounce IO buffers coming from guest domains so that they are contiguous when presented to the hardware. The size of the software IOTLB in dom0 is indeed dependent on the amount of memory in the hypervisor (although its size can be overridden by setting swiotlb=). In particular, for <2G systems it takes 2MB while everyone else gets a 64MB swiotlb. So in your case you really should be assigning at least another 64MB of memory to your dom0 to compensate for the bigger swiotlb. However, this does not change the fact that loading blkbk/netbk after early boot is like playing Russian Roulette :)
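The sizing rule described in this comment can be sketched as follows. The 2MB and 64MB figures and the 2G boundary are taken from the comment itself and have not been checked against the kernel source. Treating "<2G" as 2048MB is an assumption, and the function names are hypothetical.

```python
SMALL_SWIOTLB_MB = 2    # hosts below 2G, per the comment above
LARGE_SWIOTLB_MB = 64   # everything else, per the comment above
THRESHOLD_MB = 2048     # "<2G" boundary, taken as 2048MB (assumption)

def swiotlb_mb(host_mem_mb):
    """Default swiotlb size for a given amount of hypervisor memory."""
    if host_mem_mb < THRESHOLD_MB:
        return SMALL_SWIOTLB_MB
    return LARGE_SWIOTLB_MB

def dom0_left_mb(dom0_mem_mb, host_mem_mb):
    """dom0 memory remaining once the swiotlb has been carved out."""
    return dom0_mem_mb - swiotlb_mb(host_mem_mb)

# The three mem= settings tried earlier in this report, all with dom0_mem=128M.
for label, host_mb in (("800M", 800), ("4G", 4096), ("16G", 16384)):
    print(f"mem={label}: swiotlb {swiotlb_mb(host_mb)}MB, "
          f"dom0_mem=128M leaves {dom0_left_mb(128, host_mb)}MB")
```

Under these assumptions, about half of a 128M dom0 goes to the swiotlb on the 4G and 16G boots, which is why adding at least 64MB to dom0_mem is suggested.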
Yep, no argument that a late modprobe is dangerous here. Thanks for the further explanation. I guess I'd just stress that it's important to fix this operationally before RHEL5 lands. It could be a hacked mkinitrd script, or an /etc/sysconfig/modules/*.modules file, something like that. We can work around it for our testing, of course.
Thanks. I'm going to merge this with #202182 which is really the same issue (the boot script isn't loading the modules automatically). *** This bug has been marked as a duplicate of 202182 ***