Bug 120685
Summary: (C3) Via C3 reboots immediately on load of kernel
Product: [Fedora] Fedora
Component: kernel
Version: rawhide
Hardware: i586
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Paul Coleman <pdcoleman>
Assignee: Dave Jones <davej>
CC: andy, bbooth, carsten, cdelasaux, cdhiller, cpjunk, earlt, erich, glen, g.mansfield, jms87, jvanveelen, klgage, k_paulsen, lee.wilson, mingo, mulix, paul.morgan, pfrields, rschaal_95135, steve, trickreed
Doc Type: Bug Fix
Last Closed: 2004-07-12 20:57:16 UTC
Description
Paul Coleman
2004-04-12 23:08:04 UTC
kernel-2.6.5-1.319 does the same.
kernel-2.6.5-1.322 does the same.
kernel-2.6.5-1.309 was the last to boot on the Via C3.

"Me too" on an EPIA-M 600MHz Via-C3 based board. Reboots right after "Uncompressing Linux" on both .322 *i586* (the i586 guys used to work okay) and the FC2 Test 2 install kernel (!).

Can you cat /proc/cpuinfo from a kernel that works, please? I'm trying to reproduce it here, but it looks like it might only affect certain C3s, as the latest kernel works just fine on the ones I've tried so far.

    processor       : 0
    vendor_id       : CentaurHauls
    cpu family      : 6
    model           : 7
    model name      : VIA Samuel 2
    stepping        : 3
    cpu MHz         : 599.725
    cache size      : 64 KB
    fdiv_bug        : no
    hlt_bug         : no
    f00f_bug        : no
    coma_bug        : no
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 1
    wp              : yes
    flags           : fpu de tsc msr cx8 mtrr pge mmx 3dnow
    bogomips        : 1196.03

It boots for me with a Samuel 2 too. You are using the 586 kernel, right?

Yum installed it; here is the package from the yum cache:

    [root@backup root]# ll /var/cache/yum/development/packages/kern*
    -rw-r--r-- 1 root root 14670525 Apr 15 17:30 /var/cache/yum/development/packages/kernel-2.6.5-1.322.i586.rpm
    -rw-r--r-- 1 root root 391711 Apr 15 17:40 /var/cache/yum/development/packages/kernel-utils-2.4-9.1.127.i386.rpm
    [root@backup root]# rpm -q kernel
    kernel-2.4.22-1.2061.nptl
    kernel-2.4.22-1.2179.nptl
    kernel-2.6.5-1.322

I don't know a more direct way to show that it is an i586 image, but I don't think any other package was downloaded. I updated this image from FC1 through to FC2 development by using yum and some "by hand" rpm installs. I'm wondering if the stuff necessary to make a good initrd was present when the kernel package was installed. (The matching initrd is present in /boot and, at 179K, is about the right size.) Yum reports that the package set is now up to date (except libselinux, which depends on a not-yet-released glibc). I will remove and reinstall the same kernel package and see if that makes any difference.
No, it is the same behaviour after erasing and reinstalling the .322 i586 kernel package:

    ...
    Uncompressing Linux .... Okay, booting the kernel
    <reboot>

In any event, exactly the same thing happens with the FC2 Test2 install kernel/initrd. I'm quite willing to believe there is something pathological about the motherboard/chipset/CPU, but it is a bare, unmodified EPIA-M 600MHz fanless board with a 256MB DIMM, running the current BIOS (the same happened with its original BIOS from Dec 2002). It has been working great, on 24/7 under our TV serving video, for a year or so, with no flakiness.

    processor       : 0
    vendor_id       : CentaurHauls
    cpu family      : 6
    model           : 7
    model name      : VIA Ezra
    stepping        : 8
    cpu MHz         : 800.264
    cache size      : 64 KB
    fdiv_bug        : no
    hlt_bug         : no
    f00f_bug        : no
    coma_bug        : no
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 1
    wp              : yes
    flags           : fpu de tsc msr cx8 mtrr pge mmx 3dnow
    bogomips        : 1585.15

Kernels 315, 319 and 322 all rebooted on initial load. I got 326 out of people and it boots, but USB is borked, so I am still using 309 to have a functional system.

Installed Arjan's .326 i586 version and got this error during install:

    device-mapper: ioctl interface mismatch: kernel(1.0.3), user(4.0.0), cmd(0)

But like pdcoleman says, it boots! USB didn't come up, but that may have been because modprobe.conf was trying to pull in the old name usb-uhci. I modprobed uhci-hcd by hand (via ssh) and my keyboard came up. Anyway, the big news is that Arjan's .326 boots.

This C3 bug actually arrived with the introduction of fc2t1 (2.6 and new gcc) and seems to be a hit-or-miss situation as far as getting a kernel that boots. Andy's hardware is different from mine (northbridge PLE266 vs PLE133, C3 version, BIOS), so I don't think it's that. There may be some sort of address alignment/location problem of a critical section specific to the C3. Just an uneducated guess.

.326 and .327 boot normally; .332 kernel panics: kill idle at kernel start (sorry, no backtrace).

Yeppers, just rebooted the Via machine after yum update last night. Booting into .332, a series of panics scroll up quickly; the last one is something to do with some kernel mount routine and contains a stack dump of four or five named kernel routines+IP offsets. Then "Attempted to kill init", sayonara. .327 boots and works fine, except USB doesn't seem to recognize any devices despite modprobing the hcd by hand. Will try to note the panic details down later; the kids are watching stuff on it at the moment.

Correction: Paul's kill *idle* was right, not init. Here is what I copied down. Because the TV is the output device, e and c might be conflated. The error happens very early in the kernel boot, after a page or so of output. There is a scrolling spew of these errors, but I can only copy the last one. (Possibly, since it is the idle task being killed, all the other processes were being killed in the spew.) I truncated leading zeros after the first few.

    EIP: 0060: {<c01b7873>} Not tainted
    EFLAGS: 00010002 (2.6.5-1.332)
    EIP is at avc_lookup:0x53/0x9a
    eax 50 ebx 3 ecx 5 edx 6b6b6b6b
    esi 1 edi 2 ebp c0387eb4 esp c0387eb0
    ds 7b es 7b ss 68
    Process swapper 0 5 246 5 c0387edc 0 c01be892 1 c0387edc 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0
    Call Trace:
    c01b8e92 avc_has_perm_noaudit+0x10d/0x48a
    c01b9233 avc_has_perm+0x24/0x49
    c01baa2a superblock_has_perm+0x24/0xe9
    c01bbcd1 selinux_sb_kern_mount+0x3e/0x49
    c01970f2 proc_get_sb+0xe/0x10
    c0166b5a do_kern_mount+0xa0/0x124
    c0166beb kern_mount+0xd/0xf
    c0393a22 proc_root_init+0x29/0xcf
    c038867d start_kernel+0x1f3/0x21b
    Code: 3b 32 75 f4 66 3b 4a 08 75 ee 3b 5a 04 75 e9 85 d2 74 23 85
    <0> kernel panic: attempted to kill the idle task in idle task - not syncing

Kernel 2.6.5-1.349 i586 is back to rebooting spontaneously just after Uncompressing Linux...

I can use 327, but with it I have stability problems with the motherboard: after 48 hours or so it stops responding on the network.
I did not have a chance to see what it is doing on its display so far; I had to reboot it quickly as it is my mailserver.

We are approaching FC2 and the last 2 kernels (356 & 358) do not boot on a Via C3. Is this the right forum to discuss this problem, or does it need attention upstream? This has been an issue since FC1.

I've attempted a fresh install of FC2 final on my Via C3 Ezra system, and it reboots itself as well. Does it have something to do with the optimization flags that the kernel was compiled with? See http://www.epiawiki.org/wiki/tiki-index.php?page=EpiaInstallingGentoo. I doubt that an i686 kernel will boot on anything less than a Via Nehemiah. Is there an i586 or i386 kernel that can be used to boot the installer?

The 686 kernel isn't entering the picture at all here. Read the comments above; they're all from 586 kernels.

OK, I just upgraded my system from FC1 to FC2 via yum upgrade. Here are the kernels I have installed that are working:

    kernel-2.4.22-1.2115.nptl
    kernel-2.4.22-1.2135.nptl
    kernel-2.4.22-1.2149.nptl
    kernel-2.4.22-1.2163.nptl
    kernel-2.4.22-1.2166.nptl
    kernel-2.4.22-1.2174.nptl
    kernel-2.4.22-1.2188.nptl
    kernel-2.6.5-1.358

But the kernel on FC2 Disk1, whatever it is, causes my system to reboot endlessly. Any other info I can provide?

    processor       : 0
    vendor_id       : CentaurHauls
    cpu family      : 6
    model           : 8
    model name      : VIA C3 Ezra
    stepping        : 9
    cpu MHz         : 1002.294
    cache size      : 64 KB
    fdiv_bug        : no
    hlt_bug         : no
    f00f_bug        : no
    coma_bug        : no
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 1
    wp              : yes
    flags           : fpu de tsc msr cx8 mtrr pge mmx 3dnow
    bogomips        : 1998.84

    # uname -a
    Linux gollum.techgooroo.net 2.4.22-1.2188.nptl #1 Wed Apr 21 20:10:59 EDT 2004 i686 i686 i386 GNU/Linux

Me too. Trying to perform a clean install of FC2. 933MHz VIA C3 Ezra.

Similar experience on a new EPIA 800 system, 512 MB memory.

This issue could be a duplicate of issue 121819. Others who are using ASUS P4 800 motherboards are also having reboot problems on install.
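None of the three /proc/cpuinfo dumps posted so far lists cmov among its flags. As far as I know, code built for i686 (for example with gcc -march=i686) may use the CMOV instruction, which these Samuel 2 and Ezra cores do not implement. That is the usual reason given for i686 builds not booting on C3s older than Nehemiah. As pointed out above, this bug involves the 586 kernels, so treat this as background rather than a diagnosis. Below is a minimal sketch for checking a flags line for one feature. cpuinfo_has_flag() is a hypothetical helper, not an existing API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `flag` appears as a whole, whitespace-
 * separated word after the ':' of a /proc/cpuinfo "flags" line. */
static bool cpuinfo_has_flag(const char *flags_line, const char *flag)
{
    const char *p = strchr(flags_line, ':');
    size_t len = strlen(flag);

    p = p ? p + 1 : flags_line;   /* skip the "flags :" label if present */
    for (;;) {
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            return false;
        const char *word = p;     /* start of the next flag name */
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        /* exact word match only, so "mm" does not match "mmx" */
        if ((size_t)(p - word) == len && strncmp(word, flag, len) == 0)
            return true;
    }
}
```

For the Ezra flags line above, cpuinfo_has_flag(line, "cmov") is false and cpuinfo_has_flag(line, "3dnow") is true.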
This appears to be a major problem that has gone as far as Fedora Core 2 final. Why?

Ed Almos
Budapest, Hungary

Similar experience on my EPIA M 1000 system. I tried various memory configurations of 256 and 512 MB, always with the same result.

I downloaded the kernel source RPM to play around with, and this bug seems to be happening because CONFIG_M686 is set in the kernel-2.6.5-i586-config file. This causes the kernel in the i586 RPM to be built with some i686-specific code. I was able to get a bootable kernel on my VIA C3 Ezra (933MHz) by commenting out the CONFIG_M686 line in that config file and rebuilding the RPM.

Hmm, now it's stalling on startup. Forget my last post.

Great news that someone at RH is able to see a failing board. FWIW, this evening I sat down with the -385 kernel and screwed with all the BIOS settings I could find. I removed all the USB peripherals and USB keyboard support, reset the BIOS settings to 'safe', etc., with no change. Then I appended all kinds of noacpi, pci=bios, pci=off, nousb, nomce, etc., with no difference. It always reboots reliably after Uncompressing Linux.

Maybe worth noting: this reboot is 100% reliable. It is not the case that it can boot okay after 20 tries or something. Because it loops from the reboot, it keeps trying; I have left it for 15 minutes or more and there was no successful boot. So if you ever got a kernel to boot even once, then that is new behaviour.

I'm downloading the kernel source for 358 and will try to compile it tonight, and tomorrow I'll look at moving hang loops around its init if there is no joy in the meantime.

Re: comments 25 and 26 (Brian Booth): At what point in startup is it stalling for you? And this is a stab in the dark, but does adding "vdso=0" to the kernel boot command line help matters?

(vdso=0 makes no difference here on -358.)

I would expect vdso=0 to be much more likely to help if the kernel is stalling on startup, as opposed to instantly rebooting.
(Whether it has any chance of helping depends on where in the startup process it's stalling, though.)

*** Bug 123843 has been marked as a duplicate of this bug. ***

Identical problem here, Via ME6000 board. /proc/cpuinfo contents:

    processor       : 0
    vendor_id       : CentaurHauls
    cpu family      : 6
    model           : 7
    model name      : VIA Samuel 2
    stepping        : 3
    cpu MHz         : 599.721
    cache size      : 64 KB
    fdiv_bug        : no
    hlt_bug         : no
    f00f_bug        : no
    coma_bug        : no
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 1
    wp              : yes
    flags           : fpu de tsc msr cx8 mtrr pge mmx 3dnow
    bogomips        : 1196.03

Compiled a new kernel from the SRPM on another machine. Diff from the default i586 config:

    --- configs/kernel-2.6.5-i586.config 2004-05-08 13:56:48.000000000 +0100
    +++ .config 2004-05-21 02:07:17.000000000 +0100
    @@ -61,7 +61,7 @@
     # CONFIG_X86_ES7000 is not set
     # CONFIG_M386 is not set
     # CONFIG_M486 is not set
    -CONFIG_M586=y
    +# CONFIG_M586 is not set
     # CONFIG_M586TSC is not set
     # CONFIG_M586MMX is not set
     # CONFIG_M686 is not set
    @@ -76,21 +76,21 @@
     # CONFIG_MWINCHIPC6 is not set
     # CONFIG_MWINCHIP2 is not set
     # CONFIG_MWINCHIP3D is not set
    -# CONFIG_MCYRIXIII is not set
    +CONFIG_MCYRIXIII=y
     # CONFIG_MVIAC3_2 is not set
     CONFIG_X86_GENERIC=y
     CONFIG_X86_CMPXCHG=y
     CONFIG_X86_XADD=y
     CONFIG_X86_L1_CACHE_SHIFT=7
     CONFIG_RWSEM_XCHGADD_ALGORITHM=y
    -CONFIG_X86_PPRO_FENCE=y
    -CONFIG_X86_F00F_BUG=y
     CONFIG_X86_WP_WORKS_OK=y
     CONFIG_X86_INVLPG=y
     CONFIG_X86_BSWAP=y
     CONFIG_X86_POPAD_OK=y
     CONFIG_X86_ALIGNMENT_16=y
     CONFIG_X86_INTEL_USERCOPY=y
    +CONFIG_X86_USE_PPRO_CHECKSUM=y
    +CONFIG_X86_USE_3DNOW=y
     # CONFIG_X86_4G is not set
     # CONFIG_X86_SWITCH_PAGETABLES is not set
     # CONFIG_X86_4G_VM_LAYOUT is not set
    @@ -101,6 +101,7 @@
     # CONFIG_SMP is not set
     # CONFIG_PREEMPT is not set
     # CONFIG_X86_UP_APIC is not set
    +CONFIG_X86_TSC=y
     CONFIG_X86_MCE=y
     # CONFIG_X86_MCE_NONFATAL is not set
     CONFIG_TOSHIBA=m
    @@ -2317,7 +2318,7 @@
     # CONFIG_DEBUG_SPINLOCK is not set
     # CONFIG_DEBUG_PAGEALLOC is not set
     CONFIG_DEBUG_HIGHMEM=y
    -CONFIG_DEBUG_INFO=y
    +# CONFIG_DEBUG_INFO is not set
     CONFIG_DEBUG_SPINLOCK_SLEEP=y
     # CONFIG_FRAME_POINTER is not set

This kernel boots fine over PXE and the installation dialogs start. Now I need to figure out how to build modules.cgz in the initrd so that the modules will actually load.

I first compiled the -358 kernel source without changes and confirmed that when I boot the 358-custom kernel, I get the same reboot behaviour, which is a very good start. I am compiling with the current development/FC2 gcc-3.3.3-7 version. Next I added a for(;;) ; at the top of start_kernel() in init/main.c, and it hung instead of rebooting. So I am going to move it around a bit and report what happens. My main worry is that the behaviour is changing merely because inserting the loop changes the layout of the binary. In that case I may find that there is no place to put the loop that gets me the reboot behaviour back again.

Cool! I stuck the loop at the end of this section and got a reboot. One of these does the badness:

    lock_kernel();
    page_address_init();
    printk(linux_banner);
    setup_arch(&command_line);
    setup_per_cpu_areas();
    /*
     * Mark the boot cpu "online" so that it can call console drivers in
     * printk() and can access its per-cpu storage.
     */
    smp_prepare_boot_cpu();
    build_all_zonelists();
    page_alloc_init();
    printk("Kernel command line: %s\n", saved_command_line);

Going to stick the loop after setup_per_cpu_areas(); next.

After a slight pause while a 250G filesystem was checked, I can report we are down to:

    setup_arch(&command_line);
    setup_per_cpu_areas();

Place your bets, ladies and gentlemen! The winner is...

    ./init/main.c: setup_arch(&command_line);

I moved the loop into setup_arch() in ./arch/i386/kernel/setup.c (removing it from main.c to limit the footprint changes).
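When comparing configs like the diff above, it can help to ask what state a single option is in. Below is a small sketch that assumes only the .config conventions visible in that diff: "CONFIG_X=y", "CONFIG_X=m", "CONFIG_X=<value>" and "# CONFIG_X is not set". config_line_state() and config_state() are hypothetical helpers written for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: classify one .config line for option `name`
 * (e.g. "CONFIG_MCYRIXIII"). The line may end in '\n' or '\0'.
 *   'y' or 'm' -> "NAME=y" / "NAME=m"
 *   'v'        -> "NAME=<other value>", e.g. CONFIG_X86_L1_CACHE_SHIFT=7
 *   'n'        -> "# NAME is not set"
 *   0          -> the line is about some other option */
static char config_line_state(const char *line, const char *name)
{
    static const char not_set[] = " is not set";
    size_t len = strlen(name);

    /* exact name followed by '=', so CONFIG_M586 != CONFIG_M586TSC */
    if (strncmp(line, name, len) == 0 && line[len] == '=') {
        const char *val = line + len + 1;
        if ((val[0] == 'y' || val[0] == 'm') &&
            (val[1] == '\0' || val[1] == '\n'))
            return val[0];
        return 'v';
    }
    if (strncmp(line, "# ", 2) == 0 &&
        strncmp(line + 2, name, len) == 0 &&
        strncmp(line + 2 + len, not_set, sizeof not_set - 1) == 0)
        return 'n';
    return 0;
}

/* Scan a whole .config (newline-separated text) for the first line that
 * mentions `name`. Returns 0 if the option does not appear at all. */
static char config_state(const char *text, const char *name)
{
    const char *line = text;

    while (*line != '\0') {
        char state = config_line_state(line, name);
        if (state != 0)
            return state;
        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return 0;
}
```

Run against the "+" side of the diff, config_state(text, "CONFIG_MCYRIXIII") gives 'y' and config_state(text, "CONFIG_M586") gives 'n'.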
The reboot action is bracketed in here somewhere:

    ./arch/i386/kernel/setup.c:
        ROOT_DEV = old_decode_dev(ORIG_ROOT_DEV);
        drive_info = DRIVE_INFO;
        screen_info = SCREEN_INFO;
        edid_info = EDID_INFO;
        apm_info.bios = APM_BIOS_INFO;
        ist_info = IST_INFO;
        saved_videomode = VIDEO_MODE;
        if( SYS_DESC_TABLE.length != 0 ) {
            MCA_bus = SYS_DESC_TABLE.table[3] &0x2;
            machine_id = SYS_DESC_TABLE.table[0];
            machine_submodel_id = SYS_DESC_TABLE.table[1];
            BIOS_revision = SYS_DESC_TABLE.table[2];
        }
        aux_device_present = AUX_DEVICE_INFO;
    #ifdef CONFIG_BLK_DEV_RAM
        rd_image_start = RAMDISK_FLAGS & RAMDISK_IMAGE_START_MASK;
        rd_prompt = ((RAMDISK_FLAGS & RAMDISK_PROMPT_FLAG) != 0);
        rd_doload = ((RAMDISK_FLAGS & RAMDISK_LOAD_FLAG) != 0);
    #endif
        ARCH_SETUP
        if (efi_enabled)
            efi_init();
        else
            setup_memory_region();
        copy_edd();
        if (!MOUNT_ROOT_RDONLY)
            root_mountflags &= ~MS_RDONLY;
        init_mm.start_code = (unsigned long) _text;
        init_mm.end_code = (unsigned long) _etext;
        init_mm.end_data = (unsigned long) _edata;
        init_mm.brk = init_pg_tables_end + PAGE_OFFSET;
        code_resource.start = virt_to_phys(_text);
        code_resource.end = virt_to_phys(_etext)-1;
        data_resource.start = virt_to_phys(_etext);
        data_resource.end = virt_to_phys(_edata)-1;
        parse_cmdline_early(cmdline_p);
        max_low_pfn = setup_memory();
        /*
         * NOTE: before this point _nobody_ is allowed to allocate
         * any memory using the bootmem allocator.
         */
    #ifdef CONFIG_SMP
        smp_alloc_memory(); /* AP processor realmode stacks in low memory*/
    #endif
        paging_init();
    #ifdef CONFIG_EARLY_PRINTK
        {
            char *s = strstr(*cmdline_p, "earlyprintk=");
            if (s) {
                extern void setup_early_printk(char *);
                setup_early_printk(s);
                printk("early console enabled\n");
            }
        }
    #endif

Stop me if you have a guess! setup_memory() looks the most likely candidate (next to paging_init()).

No, it seems to return from setup_memory(). At least, it hangs when the loop is placed after that.
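The hang-loop technique used here amounts to a manual bisection. With the for(;;) ; placed after statement i, a hang means the bad statement has not run yet, and a reboot means it already has. If that outcome is monotone along the code path, the narrowing can be written as a binary search. Note that the worry above about binary layout changes is exactly the case where monotonicity could fail. The probe below is a simulation standing in for a real build-and-boot cycle, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One build-and-boot cycle with the hang loop placed right after
 * statement `checkpoint`. Returns true if the box rebooted (the bad
 * statement already ran), false if it hung in the loop. */
typedef bool (*boot_probe)(size_t checkpoint, void *ctx);

/* Binary search for the first checkpoint that reboots, assuming the
 * outcomes are monotone: hang, ..., hang, reboot, ..., reboot.
 * Returns n if no checkpoint in [0, n) reboots. */
static size_t first_rebooting_checkpoint(size_t n, boot_probe probe, void *ctx)
{
    size_t lo = 0, hi = n;   /* invariant: the answer lies in [lo, hi] */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (probe(mid, ctx))
            hi = mid;        /* rebooted: culprit is at or before mid */
        else
            lo = mid + 1;    /* hung: culprit is after mid */
    }
    return lo;
}

/* Simulation standing in for real hardware: statement `bad` is the one
 * that kills the machine; `probes` counts how many boots were needed. */
struct sim_machine {
    size_t bad;
    unsigned probes;
};

static bool simulated_probe(size_t checkpoint, void *ctx)
{
    struct sim_machine *m = ctx;

    m->probes++;
    return checkpoint >= m->bad;
}
```

Take the nine start_kernel() statements quoted earlier, with setup_arch() at index 3. The search settles on 3 after three simulated boots, at checkpoints 4, 2 and 3.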
The bad region is currently:

    ./init/main.c:
        setup_arch(&command_line);
    ./arch/i386/kernel/setup.c:
    #ifdef CONFIG_SMP
        smp_alloc_memory(); /* AP processor realmode stacks in low memory*/
    #endif
        paging_init();
    #ifdef CONFIG_EARLY_PRINTK
        {
            char *s = strstr(*cmdline_p, "earlyprintk=");
            if (s) {
                extern void setup_early_printk(char *);
                setup_early_printk(s);
                printk("early console enabled\n");
            }
        }
    #endif

Next try is just before the printk stuff.

Since I assume CONFIG_SMP is undefined, our next winner is:

    ./init/main.c:
        setup_arch(&command_line);
    ./arch/i386/kernel/setup.c:
        paging_init();
    ./arch/i386/mm/init.c:
        pagetable_init();
        load_cr3(swapper_pg_dir);
    #ifdef CONFIG_X86_PAE
        /*
         * We will bail out later - printk doesn't work right now so
         * the user would just see a hanging kernel.
         */
        if (cpu_has_pae)
            set_in_cr4(X86_CR4_PAE);
    #endif
        __flush_tlb_all();
        /*
         * Subtle. SMP is doing it's boot stuff late (because it has to
         * fork idle threads) - but it also needs low mappings for the
         * protected-mode entry to work. We zap these entries only after
         * the WP-bit has been tested.
         */
    #ifndef CONFIG_SMP
        zap_low_mappings();
    #endif
        kmap_init();
        zone_sizes_init();

New winner!

    ./init/main.c:
        setup_arch(&command_line);
    ./arch/i386/kernel/setup.c:
        paging_init();
    ./arch/i386/mm/init.c:
        load_cr3(swapper_pg_dir);
    include/asm-i386/processor.h:
        #define load_cr3(pgdir) \
            asm volatile("movl %0,%%cr3": :"r" (__pa(pgdir)))

Can't really go any further with the loop trick. I can imagine:

- swapper_pg_dir is corrupt or wrongly computed
- the CPU or cache reacts badly to loading cr3 on Via, or needs a special environment for it
- the peepholer came and did evil
- where swapper_pg_dir points to is somehow diseased or electrified

Please advise if I can make any further useful moves.

OK, based on this, can you try adding mem=nopentium to the kernel command line and see if that makes a difference?

Sorry, mem=nopentium does not make a difference on either the RH-compiled -358 or my modified one: instant reboot in both cases.
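For reference, here is the arithmetic behind load_cr3() as a userspace sketch. It also covers the "entries 768 and 769" discussion further down. It assumes the standard non-PAE i386 layout with 4 KB pages and PAGE_OFFSET = 0xC0000000 (the config diff above has CONFIG_X86_4G unset). The helper names are made up for the example. 0xC0347000 is the swapper_pg_dir address printed later in this report.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: non-PAE i386, 4 KB pages, 3G/1G split (no 4G/4G patch). */
#define ASSUMED_PAGE_OFFSET 0xC0000000u

/* Userspace stand-in for __pa(): kernel virtual -> physical address
 * for addresses in the kernel's direct mapping. */
static uint32_t kernel_pa(uint32_t kvaddr)
{
    return kvaddr - ASSUMED_PAGE_OFFSET;
}

/* Value load_cr3(pgdir) puts into CR3: the directory's physical address.
 * The directory must be 4 KB aligned, because CR3 bits 0-11 are not
 * part of the base address. */
static uint32_t cr3_for(uint32_t pgdir_kvaddr)
{
    uint32_t phys = kernel_pa(pgdir_kvaddr);

    assert((phys & 0xFFFu) == 0);
    return phys;
}

/* Two-level walk: bits 31-22 index the page directory (4 MB per entry),
 * bits 21-12 index a page table (4 KB per entry). */
static unsigned pgd_slot(uint32_t vaddr) { return vaddr >> 22; }
static unsigned pte_slot(uint32_t vaddr) { return (vaddr >> 12) & 0x3FFu; }
```

With these, virtual 0x00000000 and 0xC0000000 fall in directory slots 0 and 768. Both slots can therefore point at the same pg0, which is the aliasing the kernel relies on while it runs at 3GB.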
Could you uncomment these lines from arch/i386/mm/init.c:

    /* Enable PGE if available */
    if (cpu_has_pge) {
        set_in_cr4(X86_CR4_PGE);
        __PAGE_KERNEL |= _PAGE_GLOBAL;
    }

Does this make any difference to the problem?

The only place in that file mentioning PGE is this:

    /* Make it "global" too if supported */
    if (cpu_has_pge) {
        set_in_cr4(X86_CR4_PGE);
    #if !defined(CONFIG_X86_SWITCH_PAGETABLES)
        __pe += _PAGE_GLOBAL;
        __PAGE_KERNEL |= _PAGE_GLOBAL;
    #endif
    }

I do not find any commented section as you describe in the sources for 2.6.5-1.358.

yeah - the cpu_has_pge branch - could you uncomment it? or change it to:

    if (0) {

Sorry Ingo, do you mean "comment" instead of "uncomment" then? I will do this now.

another thing to try: replace the pagetable loading (the load_cr3() line) with __flush_tlb_global(). the cr3 doesn't have to be loaded - we already loaded swapper_pg_dir in arch/i386/kernel/head.S. So it must be the flush somehow causing trouble - we most likely somehow created pagetable contents that cause the next instruction to triple-fault right away. This has to be some really fubar situation though - all of the kernel's mappings have to go away, including the GDT, TSS and IDT. But it's all very weird.

yeah - comment it. Just make sure that code doesn't get run. (it's the code that sets the PGE bit in the kernel mappings. This is on the theory that perhaps the CPU has some weirdness with PGE handling.) (i quoted the wrong code because FC2 has the 4:4 patch applied.)

Commenting out that if { ... } block made no difference; it still reboots. I still have the for(;;) ; waiting after the load_cr3(), so it is not getting any further. Now I will try the __flush_tlb_global(); replacement.

I removed the commenting around the if { ... } block we tried. Still reboots. Maybe that is interesting... could it merely be to do with the position of the code in memory? Something crapping on the code or breaking the decompression?
If you add a __flush_tlb_global() _before_ the call to pagetable_init() [in paging_init()], does that cause a reboot too? I.e. are the pagetables already corrupt when we enter paging_init(), or do they get corrupted during pagetable_init()?

pagetable init goes like this: we've got some pre-constructed pagetables that are present in the kernel image when we boot - these cover the first 8 MB of RAM. pagetable_init() extends the pagetable setup to cover the whole RAM. It still redoes the whole pagetable though, so if it's somehow messed up (or the CPU is confused), it could corrupt the pagetable for this code. the pagetable had to be correct at some earlier point, or we'd not be executing this code ...

another (random) suggestion: do you see the same symptoms if you remove one RAM module from the system? [my theory is that smaller RAM will cause smaller initialization in pagetable_init(), and could avert/impact this corruption problem.]

Here is the situation in paging_init() at the moment, then:

    void __init paging_init(void)
    {
        __flush_tlb_global();
        for(;;) ;
        pagetable_init(); //loop hangs
    //  load_cr3(swapper_pg_dir); // !!!! rebooter
        __flush_tlb_global();
        for(;;) ;

*** Result: a hang, NOT a reboot.

I do have a dim idea of the pagetable stuff from the work I did on the Xbox Clean BIOS; I got paging (more importantly, segfaulting) working on that. I also designed a hardware device that sat on the Xbox's LPC bus, memory-mapped some SRAM and allowed debugging IO back to a PC, something I'm starting to wish we had on this guy.

I only have one 256MB stick of RAM on this board.

How do you normally get debug info out at this early stage? This motherboard has a serial port; maybe we could add a loop that dumps stuff, like the pagetable contents, by directly tickling the serial IO ports?

cool - it would be great if you could try to dump the (relevant) pagetable contents prior to pagetable_init() and after pagetable_init().
what 'relevant' is, is hard to tell, but to make it easier to compare, could you run with mem=nopentium from now on? This will force 4K paging, and the 'relevant' pagetable contents should be identical prior to and after pagetable_init() - making comparison easier.

i'd wager that 'relevant' right now means the following 3 pages: swapper_pg_dir [you guessed that], pg0 and pg1. You can access pg0 and pg1 by declaring them like this:

    extern char pg0[4096];
    extern char pg1[4096];

pg0 and pg1 are the first two 'pte' pagetables, covering the linear addresses of 0-4MB and 4MB-8MB. This linear range is also aliased to 3GB (via entries 768 and 769 in swapper_pg_dir) - this is where the kernel executes in fact. so swapper_pg_dir should have entries 0 and 1 set to pg0 and pg1, and entries 768 and 769 set to pg0 and pg1 too. None of these 4 entries ought to change during pagetable_init(), nor should the contents of pg0 and pg1 change. [this all is only true if mem=nopentium is used.]

you can do printks this early over the serial console - activate CONFIG_EARLY_PRINTK in your .config and add the following boot command line option:

    earlyprintk=serial,ttyS0,38400

after this point all kernel messages should go to the serial console. I use this feature quite often; i typically use 'minicom' on another Linux box and connect the two via a null modem cable.

I will try to set this up, but it will take a while. Last time I needed a serial cable was about ten years ago :-)

Re: Comment 28

> At what point in startup is it stalling for you?

It stalls when trying to boot INIT. The last output I see is:

    Mounting root filesystem
    kjournald starting. Commit interval 5 seconds
    EXT3-fs: mounted filesystem with ordered data mode.
    Freeing unused kernel memory: 144k freed

> And this is a stab in the dark, but does adding "vdso=0" to the
> kernel boot command line help matters?

Not in my case either. I should mention that I am not a kernel developer, so my knowledge is limited in this area.
If there's any further help I can give, let me know.

Brian: Are you fully running a FC2 userspace? Also, does apm=off on the command line help? The "hangs at Freeing unused kernel memory" thing is a different bug as far as I can see, so I would like to propose opening a separate bugzilla for it, so we don't mix the 2 bugs up and confuse matters too much.

> Are you fully running a FC2 userspace ?
> Also does apm=off on the commandline help ?

No to both.

> The "hangs at Freeing unused kernel memory" thing is a different bug
> as far as I can see

Well, it came up after removing CONFIG_M686 from the i586 kernel config file in hopes of fixing this bug. Since it was a result of fiddling around with this bug, I'll avoid opening another bug report for it. If I can be of any assistance, let me know.

I have a working null-modem DIY frankencable, tested with Minicom on both sides. I added a "Hello World" printk() before the spin for(), and I added earlyprintk=serial,ttyS0,38400 to the grub command line for my custom kernel. I verified that CONFIG_EARLY_PRINTK=y in .config (it was on by default). I don't get any messages on my terminal when it boots and hangs :-(

Aha: under the same circumstances with the -327 kernel, which I am booting into to compile and so on, I DO get the early kernel messages on the serial terminal. I'll move my hello world up a bit earlier then.

Nope, I don't get any output from my custom -358 kernel, even with the printk() just before the call to paging_init() in ./arch/i386/kernel/setup.c. What I have is printk("HELLO WORLD");. Should I flush the printk buffer somehow before I enter the spinloop? Or will printk() just not work until this stuff is right? I had a google around and could not find a kernel function to flush the printk buffer. Can the data easily be issued at a lower, unbuffered level than printk? If the earlyprintk command line option has been parsed already, the UART will be set up.
Maybe some code can sit there polling the UART status regs and poking in new data when the old data is gone.

hm, i think the problem is that the UART has not been initialized yet. in arch/i386/kernel/setup.c, there's this code:

        paging_init();
    #ifdef CONFIG_EARLY_PRINTK
        {
            char *s = strstr(*cmdline_p, "earlyprintk=");
            if (s) {
                extern void setup_early_printk(char *);

                setup_early_printk(s);
                printk("early console enabled\n");
            }
        }
    #endif

could you try to move the paging_init() code to after this code? I am not 100% certain that setup_early_printk() will work fine without having the full pagetables, but it ought to. If it doesn't work [i.e. setup_early_printk() crashes and you don't get the 'early console enabled' message over the serial line], then there's yet another way: you can trick the UART into being set up via GRUB. Just enable the serial console in GRUB with something like this in /etc/grub.conf:

    serial --unit=0 --speed=19200
    terminal --timeout=0 serial

(NOTE: the maximum speed of GRUB's UART driver is somewhere around 38400 - while the kernel can drive it at 115200 - so use the lower speed for both.)

and after this point you can try the attached lowlevel code that implements a simple printk based on UART output. (it hardcodes ttyS1 iirc.) but let's hope the simpler method of reordering the initialization will help too.

Created attachment 100447
very early printk code
very early printk code - it relies on the UART being set up via GRUB or LILO.
It's hardcoded to ttyS1.
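The attachment itself is not reproduced here. To give a rough idea of what polled, unbuffered UART output looks like, here is a sketch. It assumes a 16550-compatible UART that GRUB has already configured. The routine waits for the Transmit Holding Register Empty bit (0x20) in the Line Status Register at base + 5, then writes the byte to base + 0. ttyS1 is conventionally at I/O port 0x2F8. In kernel code the port accesses would be inb()/outb(). Here they are passed in through a hypothetical struct uart_io, so the logic can be exercised against a fake UART in userspace.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UART_THR      0     /* transmit holding register (write, DLAB = 0) */
#define UART_LSR      5     /* line status register */
#define UART_LSR_THRE 0x20  /* THR empty: ready for the next byte */
#define TTYS1_BASE    0x2F8 /* conventional PC I/O base for ttyS1 (COM2) */

/* Port I/O hooks; in early kernel code these would simply be inb()/outb(). */
struct uart_io {
    uint8_t (*in)(void *ctx, uint16_t port);
    void (*out)(void *ctx, uint16_t port, uint8_t val);
    void *ctx;
};

/* Unbuffered output: spin until the transmitter is empty, then write. */
static void uart_putc(const struct uart_io *io, uint16_t base, char c)
{
    while (!(io->in(io->ctx, (uint16_t)(base + UART_LSR)) & UART_LSR_THRE))
        ;
    io->out(io->ctx, (uint16_t)(base + UART_THR), (uint8_t)c);
}

static void uart_puts(const struct uart_io *io, uint16_t base, const char *s)
{
    for (; *s != '\0'; s++) {
        if (*s == '\n')
            uart_putc(io, base, '\r');   /* serial terminals expect CR LF */
        uart_putc(io, base, *s);
    }
}

/* Fake UART for testing: every other LSR read reports "busy", and bytes
 * written to THR are collected in buf. */
struct fake_uart {
    char buf[64];
    int len;
    unsigned lsr_reads;
};

static uint8_t fake_in(void *ctx, uint16_t port)
{
    struct fake_uart *u = ctx;

    (void)port;   /* only the LSR is ever read */
    return (u->lsr_reads++ % 2) ? UART_LSR_THRE : 0;
}

static void fake_out(void *ctx, uint16_t port, uint8_t val)
{
    struct fake_uart *u = ctx;

    (void)port;   /* only the THR is ever written */
    if (u->len < (int)sizeof u->buf - 1) {
        u->buf[u->len++] = (char)val;
        u->buf[u->len] = '\0';
    }
}
```

Because nothing is buffered, each byte is on the wire before uart_putc() returns. That matters when the very next instruction may reboot the machine.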
>hm, i think the problem is that the UART has not
>been initialized yet.
Doh!!! There it was right in front of me.
Okay, I moved the paging_init() call to after the printk init and now
I have some normal-looking output! Good news!!
Linux version 2.6.5-1.358custom (root.ath.cx) (gcc version
3.3.3 20040412 (Red Hat Linux 3.3.3-7)) #22 Sat May 22 08:28:21 BST 2004
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000000eff0000 (usable)
BIOS-e820: 000000000eff0000 - 000000000eff3000 (ACPI NVS)
BIOS-e820: 000000000eff3000 - 000000000f000000 (ACPI data)
BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
239MB LOWMEM available.
early console enabled
Pre-paging_init() call HELLO WORLD
Now I will stick in some loops to issue the paging table data as %02X
Unfortunately the sources aren't completely matching up with the
earlier (#57) advice about what to dump.
It seems there is a single array holding the paging data, pg0[], which
is defined at include/asm/pgtable.h:191 as being an unsigned long [].
It has a comment by it suggesting that perhaps pg1 was merged into it
at some point. At any rate at link time pg1[] is undefined so I can't
dump it.
What I am dumping at the moment then is 1024 %08lX from
swapper_pg_dir[] and 2048 %08lX from pg0[], both before the call to
paging_init() and just before the for(;;) ; loop
Here is the dump code so you can be certain of what you are getting:
{
    extern unsigned long pg0[];
    unsigned long *pb = (unsigned long *)swapper_pg_dir;
    int n;

    printk("***DUMPING PAGING TABLES\n");
    printk("swapper_pg_dir = 0x%08X\n", (unsigned int)pb);
    /* 1024 page directory entries, four per line */
    for (n = 0; n < 1024; n++) {
        if (!(n & 3)) printk("%04X: ", n);
        printk("%08lX ", (unsigned long)pb[n]);
        if ((n & 3) == 3) printk("\n");
    }
    printk("\n");
    printk("pg0 = 0x%08X\n", (unsigned int)&pg0[0]);
    /* 2048 PTEs: pg0[] covers the first 8MB */
    for (n = 0; n < 2048; n++) {
        if (!(n & 3)) printk("%04X: ", n);
        printk("%08lX ", (unsigned long)pg0[n]);
        if ((n & 3) == 3) printk("\n");
    }
}
There ARE some small differences in the pg0[] array before and after.
< 0354: 00354067 00355067 00356067 00357007
< 0358: 00358007 00359007 0035A007 0035B007
< 035C: 0035C007 0035D007 0035E007 0035F007
---
> 0354: 00354067 00355067 00356067 00357067
> 0358: 00358067 00359067 0035A067 0035B067
> 035C: 0035C067 0035D067 0035E067 0035F067
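Reading these entries with the standard non-PAE i386 PTE bit layout, the frame numbers are identical and only bits 0x020 and 0x040 differ. 0x007 is PRESENT|RW|USER, and 0x067 adds ACCESSED and DIRTY. The CPU sets those two bits itself when a page is used or written. So this particular difference plausibly reflects ordinary use of those pages rather than corruption. That is an interpretation, not something established in the thread. Below is a small decoder with hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Low flag bits of a non-PAE i386 page-table entry. Bit 7 (PSE in a
 * directory entry, PAT in a PTE) is left out of this sketch. */
static const struct {
    uint32_t bit;
    const char *name;
} pte_bits[] = {
    { 0x001, "PRESENT" }, { 0x002, "RW" },  { 0x004, "USER" },
    { 0x008, "PWT" },     { 0x010, "PCD" }, { 0x020, "ACCESSED" },
    { 0x040, "DIRTY" },   { 0x100, "GLOBAL" },
};

/* Format a PTE as "frame=0xNNNNN FLAG FLAG ..." into buf. */
static void decode_pte(uint32_t pte, char *buf, size_t size)
{
    int n = snprintf(buf, size, "frame=0x%05X", (unsigned)(pte >> 12));

    for (size_t i = 0; i < sizeof pte_bits / sizeof pte_bits[0]; i++) {
        if (n < 0 || (size_t)n >= size)
            return;                      /* output truncated */
        if (pte & pte_bits[i].bit)
            n += snprintf(buf + n, size - (size_t)n, " %s", pte_bits[i].name);
    }
}

/* Flag bits (low 12 bits) that differ between two dumps of one entry. */
static uint32_t pte_flag_diff(uint32_t before, uint32_t after)
{
    return (before ^ after) & 0xFFFu;
}

/* Nonzero if the two values map different physical frames. */
static int pte_frame_changed(uint32_t before, uint32_t after)
{
    return ((before ^ after) & ~0xFFFu) != 0;
}
```

For entry 0357, decode_pte() turns 00357007 into "frame=0x00357 PRESENT RW USER" and 00357067 into "frame=0x00357 PRESENT RW USER ACCESSED DIRTY".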
I will attach the full log now.
Created attachment 100448
dump of paging tables
Hold on, I had the second dump BEFORE the call to pagetable_init(). I am compiling this as the second dump now:

    void __init paging_init(void)
    {
        __flush_tlb_global();
        pagetable_init();
        printk("JUST BEFORE HANG LOOP\n");
        {
            extern unsigned long pg0[];
            unsigned long *pb = (unsigned long *)swapper_pg_dir;
            int n;

            printk("***DUMPING PAGING TABLES\n");
            printk("swapper_pg_dir = 0x%08X\n", (unsigned int)pb);
            for (n = 0; n < 1024; n++) {
                if (!(n & 3)) printk("%04X: ", n);
                printk("%08lX ", (unsigned long)pb[n]);
                if ((n & 3) == 3) printk("\n");
            }
            printk("\n");
            printk("pg0 = 0x%08X\n", (unsigned int)&pg0[0]);
            for (n = 0; n < 2048; n++) {
                if (!(n & 3)) printk("%04X: ", n);
                printk("%08lX ", (unsigned long)pg0[n]);
                if ((n & 3) == 3) printk("\n");
            }
        }
    //  pagetable_init();
        for(;;) ; //loop hangs
    //  load_cr3(swapper_pg_dir); // !!!! rebooter

That test resets before it can print the second dump. The initial __flush_tlb_global() was not in the original -358 source (it was added as a test), so I am commenting it out and trying again. If it still resets, that's a big clue that pagetable_init() may be doing something to destroy the environment.

Woohoo, forced a new behaviour out of it; we must be on the right track. This time it prints only this from the second dump:

    JUST BEFORE HANG LOOP
    ***DUMPING PAGING TABLES

and then HANGS ITSELF, which we never saw before. For whatever reason it can handle a simple printk() but not one with %08lX? Or one that touches swapper_pg_dir? Going to move a copy of the dump block inside pagetable_init() and see if we can probe out when the environment-trashing action begins.

Yes - the hang you get in the dumper is a likely sign that the pagetables are somehow corrupted/invalid. The dumping itself activates more kernel code, so the TLBs get flushed 'naturally', then get reloaded from the now-invalid pagetable entry - kaboom. (you are doing all these runs with mem=nopentium, correct?)
One alternative to dumping would be to validate that the entries pagetable_init() creates match the previous content. This code is at around line 217 in arch/i386/mm/init.c:

    *pte = mk_pte_phys(vaddr-start, PAGE_KERNEL);

Could you add a sanity check, something along the lines of:

    prev_val = pte->pte_low;
    *pte = mk_pte_phys(vaddr-start, PAGE_KERNEL);
    if ((vaddr <= 8*1024*1024) ||
        ((vaddr >= 3*1024*1024*1024) && (vaddr < 3*1024*1024*1024+8*1024*1024)))
            if (prev_val != pte->pte_low)
                    printk("ouch! %08lx != %08lx for vaddr %08lx\n",
                           prev_val, pte->pte_low, vaddr);

i.e. in the 0...8MB and 3GB...3GB+8MB virtual address ranges, check that the new and the old values of the pte match. (There are other places where the pagetable can get corrupted, but this would be the most likely one.)

Another method: could you write a function that is a copy of setup_identity_mappings() but does not actually modify the pagetables, and only checks that the existing contents of the pagetable match the expected values? Then you can add calls to this function (let's call it check_pagetables()) from every possible place in the pagetable code, even from within setup_identity_mappings().

Created attachment 100450
check_pagetables()
I've attached a quick implementation of check_pagetables().
You should be able to call this function from any place in the kernel;
it iterates over these two 8 MB ranges and simply returns if everything
is OK. If it finds an illegal value, it prints the values and does
a BUG() [which is an assertion that halts the kernel].
I haven't tested this code yet, but it compiles.
Okay, the bizarre corrupt-text-from-printk() problems start partway through pagetable_init():

    static void __init pagetable_init (void)
    {
        unsigned long vaddr, end;
        pgd_t *pgd_base;
    #ifdef CONFIG_X86_PAE
        int i;
    #endif

        // pagetable dump here OKAY

        /*
         * This can be zero as well - no problem, in that case we exit
         * the loops anyway due to the PTRS_PER_* conditions.
         */
        end = (unsigned long)__va(max_low_pfn*PAGE_SIZE);

        pgd_base = swapper_pg_dir;
    #ifdef CONFIG_X86_PAE
        /*
         * It causes too many problems if there's no proper pmd set up
         * for all 4 entries of the PGD - so we allocate all of them.
         * PAE systems will not miss this extra 4-8K anyway ...
         */
        for (i = 0; i < PTRS_PER_PGD; i++) {
            pmd_t *pmd = (pmd_t *) alloc_bootmem_low_pages(PAGE_SIZE);
            set_pgd(pgd_base + i, __pgd(__pa(pmd) + 0x1));
        }
    #endif

        // pagetable dump here (test 1) OKAY

        /*
         * Set up lowmem-sized identity mappings at PAGE_OFFSET:
         */
        setup_identity_mappings(pgd_base, PAGE_OFFSET, end);

        /*
         * Add flat-mode identity-mappings - SMP needs it when
         * starting up on an AP from real-mode. (In the non-PAE
         * case we already have these mappings through head.S.)
         * All user-space mappings are explicitly cleared after
         * SMP startup.
         */
    #if defined(CONFIG_SMP) && defined(CONFIG_X86_PAE)
        setup_identity_mappings(pgd_base, 0, 16*1024*1024);
    #endif

        // pagetable dump (test 2) here BROKEN

Output of test 2:

    pagetable_init() test 2
    ***DUMPING PAGING TABLES
    swapper_pg_dir = 0xC0347000
    0000: 00391027 00000000 00000000 00000000
    0004: 00000000 00000000 00000000 00000000
    0008: 00000000 00000000 00000000 00000000
    .....
    00F8: 00000000 00000000 00000000 000^W^D^Aë,ö^F^W^D^Ct^D.<8a>g^A<86>Ã<8b>^^^\^D<89><87>u^D<8b>^^<80>^D;^^^Dt^D<89>^^^\^Dú° æ ûa^_Ã!2@3#4$5%6^7&8*9(0)-_=+^H^H RtTÃYuUiIoOpP[{]}^DÿaAsSdDfFgGhHjJkKlL;:'"`~^Bÿ\|zZxXcCvVbBnNmM,<.>/?^Aÿÿ @ÿÿÿÿ
    ....(more crap)...
    Drive A error.
    System halt DISK BOOT FAILURE, INSERT SYSTEM DISK AND PRESS ENTERBIOS ROM checksum errorKeyboard controller errorKeyboard error or no keyboard present Detecting floppy drive A media...Drive media is : 1.44Mb 1.2Mb 720Kb 360Kb
    ...(more crap)... 003EE007 003EF007
    03F0: 003F0007 003F1007 003F2007 003F3007
    03F4: 003F4007 003F5007 003F6007 003F7007
    03F8: 003F8007 003F9007 003FA007 003FB007
    ...(rest of dump)

Subsequent test dumps are running, but with this weird corruption. It seems to run all the test dumps this time. Anyway, the point is that something in setup_identity_mappings(pgd_base, PAGE_OFFSET, end); seems to corrupt the environment such that printk() becomes unreliable.

Hm, the early pagetable setup code has changed a bit since I last touched it. We now construct the early pagetables at runtime, in startup_32, so the check_pagetables() code is not fully usable. We don't reuse pg0; all pagetables are allocated anew via bootmem_alloc(), so the kernel changes the pgd entries during pagetable init. I cannot see any immediate bug in this new method, but it's conceivable that it somehow causes the C3 problems. This recent patch changed the early pagetable handling:

    [PATCH] i386 very early memory detection cleanup patch

I cannot convince myself that it's correct: it uses an area of RAM for pagetable init that it knows nothing about (the end of the kernel image). Furthermore, I cannot see how it can guarantee that bootmem doesn't stomp over this area as soon as we start constructing the new pagetables. _Normally_ it could go well if we manage to hold on to our TLBs, but if the CPU's TLBs are small enough then this could be a problem. I'll extract and attach the patch; could you try to unapply it? [But the 4G patch likely interferes, so I don't have high hopes ...]

Created attachment 100451
Please try to unapply it from your tree; it will likely not succeed though.
Please make a copy of your tree first, to make sure a failed unapply
of this patch doesn't damage it.
Unfortunately I have to go pick my wife up from the airport, which is an NMI for me :-) I will probably not be able to do more tests until tomorrow :-( I do not have any experience unapplying patches, but I can do this if given exact instructions. I suspect someone else reading may well have the experience and the time to try it in the meantime :-) I must say your last note is very encouraging!

Created attachment 100452
test-patch
I've attached a small patch that is an easy way to check whether my
theory holds. It changes pg0 to be part of the kernel image and allocates
32K of space for it, enough for the root pagetable and the pagetable
entries. So if the memdetect patch doesn't get the allocation right, this
patch will automatically protect the early pagetable contents.
Unapplying is easy: add -R to the patch command you use to apply patches, e.g. 'patch -p1 -R < 1'. But let's not do the unapplying; I don't think it will succeed. Please try my last patch instead.

The main argument against the bootmem-stomp idea is that it crashes without nopentium too. The PSE case is really simple: there are no additional pagetables, everything is set up within swapper_pg_dir.

Ah ... Arjan says that the C3 might not have the PSE feature. What does your /proc/cpuinfo look like? If the CPU does not have PSE, then the pagetable arguments get stronger: it's an atypical case on other systems. (Almost all CPUs these days have the PSE bit, so a bug in the non-PSE case does not get noticed quickly.)

The memdetect patch introduced init_pg_tables_end, which seems to guarantee that bootmem does not stomp on the early pagetables. The patch is still suspect though. Does a vanilla kernel (e.g. 2.6.6) fail on your box too?

Hi Ingo - I am back but just driving by; I am cooking tea. My cpuinfo can be found in comment #5. It mentions PGE but not PSE. I did not try a vanilla kernel; maybe some of the other users can comment if they tried one. Tomorrow morning, if there is no resolution in the meantime, I will follow your test patch directions. People probably don't say this often enough: I'm very glad you guys are around and well funded.

Same here :( kernel-2.6.5-1.358.i586 reboots while decompressing the kernel. Vanilla 2.6.6 (CyrixIII/C3) works like a charm.

    [root@pajonk rc.d]# cat /proc/cpuinfo
    processor       : 0
    vendor_id       : CentaurHauls
    cpu family      : 6
    model           : 7
    model name      : VIA Samuel 2
    stepping        : 3
    cpu MHz         : 733.376
    cache size      : 64 KB
    fdiv_bug        : no
    hlt_bug         : no
    f00f_bug        : no
    coma_bug        : no
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 1
    wp              : yes
    flags           : fpu de tsc msr cx8 mtrr pge mmx 3dnow
    bogomips        : 1464.72

*** Bug 123935 has been marked as a duplicate of this bug. ***

Could you attach your kernel config or mail it to me?
My "vanilla" 2.6.6 failed to boot after having been built with VIA C3 selected. The failure was pretty much what we've seen as the symptom of this bug. I'll compare and report any differences. Thanks. Originally logged under 123935.

Do the boot.iso or rescuecd.iso images use a differently configured kernel? I have tried booting from each of those images and exactly the same thing happens. I would have assumed that rescuecd.iso would have a much simpler kernel (e.g. fewer compiled-in features, more loaded as modules, etc.).

I have tried the patch and unfortunately it did not seem to make much difference; quite possibly I am not testing quite what we want to test. I should perhaps recap the current state of the code being tried. arch/i386/mm/init.c setup_identity_mappings() now has this:

    // original code:
    for (k = 0; k < PTRS_PER_PTE; pte++, k++) {
        vaddr = i*PGDIR_SIZE + j*PMD_SIZE + k*PAGE_SIZE;
        if (end && (vaddr >= end))
            break;
        if (vaddr < start)
            continue;
        // added code:
        {
            unsigned long prev_val = (unsigned long)pte->pte_low;
            *pte = mk_pte_phys(vaddr-start, PAGE_KERNEL);
            if ((vaddr <= 8*1024*1024) ||
                ((vaddr >= ((unsigned int)3*1024*1024*1024)) &&
                 (vaddr < ((unsigned int)3*1024*1024*1024)+((unsigned int)8*1024*1024))))
                if (prev_val != pte->pte_low)
                    printk("!!!!!!!!!!!!!!!!!!!! ouch! %08lx != %08lx for vaddr %08lx\n",
                           prev_val, pte->pte_low, vaddr);
        }
    }
    set_pmd(pmd, __pmd(_KERNPG_TABLE + __pa(pte_base)));
    }
    }

arch/i386/kernel/vmlinux.lds.S:

    __bss_start = .;        /* BSS */
    .bss : {
        *(.bss.page_aligned)
        *(.bss)
    }
    . = ALIGN(4);
    __bss_stop = .;

    /* _end = . ; */

    /* This is where the kernel creates the early boot page tables */
    . = ALIGN(4096);
    pg0 = .;
    . = pg0 + 32768 ;
    _end = . ;

    /* Sections to be discarded */
    /DISCARD/ : {
        *(.exitcall.exit)
    }

I took out all my for(;;) ; hanging loops, but I still have my dumping loops in pagetable_init(), and they still start failing with garbage after the call to setup_identity_mappings(pgd_base, PAGE_OFFSET, end); in there.
I will attach the dump.

Created attachment 100473
dump of paging tables degenerating into garbage
I should add: if you look down the dumps, there are a lot of complaints coming out of the sanity check code we added to setup_identity_mappings(). Possibly the sanity check code is broken (signed compares?), or this is telling us about the actual pagetable corruption. These pop out just before printk() becomes unreliable.

    !!!!!!!!!!!!!!!!!!!! ouch! 00000001 != 00000063 for vaddr c0000000
    !!!!!!!!!!!!!!!!!!!! ouch! f000e816 != 00001063 for vaddr c0001000
    !!!!!!!!!!!!!!!!!!!! ouch! f000e2c3 != 00002063 for vaddr c0002000
    !!!!!!!!!!!!!!!!!!!! ouch! f000e816 != 00003063 for vaddr c0003000
    !!!!!!!!!!!!!!!!!!!! ouch! f000e816 != 00004063 for vaddr c0004000
    !!!!!!!!!!!!!!!!!!!! ouch! f000ff54 != 00005063 for vaddr c0005000
    ....

Hum, I had the idea to compare arch/i386/mm/init.c from the working -327 and the broken -358; they seem pretty much identical :-( I think the magic ingredient that causes the disaster must be elsewhere, even if the car crash is happening in setup_identity_mappings().

Just downloaded the rescuecd.iso image for 1.92 (I think that is test3?). This successfully boots. I have copied uname and /proc/cpuinfo below:

    Linux localhost.localdomain 2.6.5-1.327 #1 Sun Apr 18 04:51:55 EDT 2004 i686 unknown

    processor       : 0
    vendor_id       : CentaurHauls
    cpu family      : 6
    model           : 7
    model name      : VIA Ezra
    stepping        : 10
    cpu MHz         : 800.252
    cache size      : 64 KB
    fdiv_bug        : no
    hlt_bug         : no
    f00f_bug        : no
    coma_bug        : no
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 1
    wp              : yes
    flags           : fpu de tsc msr cx8 mtrr pge mmx 3dnow
    bogomips        : 1576.96

Can anyone point me towards finding out what has changed between test3 and the release version? I really want to get Fedora 2 running on this machine.

Lee, you're in the same boat as I am: kernel -327 boots and works (but I find it will freeze after 48hrs or so). Read the early posts on this bug carefully and you'll see that kernels either side of -326 and -327 do not boot.
The rest of the posts are trying to find out why; if you are reading this bug then I guess you have the very latest info on the problem.

Decided to stick some printk()s in setup_identity_mappings() to try to see what is happening. I have a very limited idea of what the code is trying to achieve. Here it is with my dumps:

    void setup_identity_mappings(pgd_t *pgd_base, unsigned long start, unsigned long end)
    {
        unsigned long vaddr;
        pgd_t *pgd;
        int i, j, k;
        pmd_t *pmd;
        pte_t *pte, *pte_base;

        pgd = pgd_base;

        printk("setup_identity_mappings(pdg_base=%p, start=0x%08lX, end=0x%08lX);\n", pgd_base, start, end);
        printk("PTRS_PER_PGD=0x%08X, PTRS_PER_PMD=0x%08X, PTRS_PER_PTE=0x%08X, cpu_has_pse=%d, cpu_has_pge=%d, PGDIR_SIZE=0x%08lX\n",
               PTRS_PER_PGD, PTRS_PER_PMD, PTRS_PER_PTE, cpu_has_pse, cpu_has_pge, PGDIR_SIZE);

        for (i = 0; i < PTRS_PER_PGD; pgd++, i++) {
            vaddr = i*PGDIR_SIZE; // PGDIR_SIZE=4M
            if (end && (vaddr >= end))
                break;
            pmd = pmd_offset(pgd, 0);
            printk("i=%d, vaddr=0x%08lX, pmd=%p\n", i, vaddr, pmd);
            for (j = 0; j < PTRS_PER_PMD; pmd++, j++) {
                vaddr = i*PGDIR_SIZE + j*PMD_SIZE;
                printk(" i=%d, j=%d, vaddr=0x%08lX, ", i, j, vaddr);
                if (end && (vaddr >= end))
                    break;
                if (vaddr < start)
                    continue;
                if (cpu_has_pse) {
                    unsigned long __pe;

                    set_in_cr4(X86_CR4_PSE);
                    boot_cpu_data.wp_works_ok = 1;
                    __pe = _KERNPG_TABLE + _PAGE_PSE + vaddr - start;
                    /* Make it "global" too if supported */
                    if (cpu_has_pge) {
                        set_in_cr4(X86_CR4_PGE);
    #if !defined(CONFIG_X86_SWITCH_PAGETABLES)
                        __pe += _PAGE_GLOBAL;
                        __PAGE_KERNEL |= _PAGE_GLOBAL;
    #endif
                    }
                    set_pmd(pmd, __pmd(__pe));
                    continue;
                }
                if (!pmd_present(*pmd)) {
                    pte_base = (pte_t *) alloc_bootmem_low_pages(PAGE_SIZE);
                    printk(" (pmd not present) ");
                } else {
                    pte_base = (pte_t *) page_address(pmd_page(*pmd));
                    printk(" (pmd present) ");
                }
                pte = pte_base;
                printk("pte_base=0x%p\n", pte);
                for (k = 0; k < PTRS_PER_PTE; pte++, k++) {
                    vaddr = i*PGDIR_SIZE + j*PMD_SIZE + k*PAGE_SIZE;
                    printk(" i=%d, j=%d, k=%d: vaddr=0x%08lX ", i, j, k, vaddr);
                    if (end && (vaddr >= end))
                        break;
                    if (vaddr < start)
                        continue;
                    {
                        //unsigned long prev_val = (unsigned long)pte->pte_low;
                        *pte = mk_pte_phys(vaddr-start, PAGE_KERNEL);
                        printk("--> 0x%08lX\n", pte->pte_low);
                        /* if ((vaddr <= 8*1024*1024) || ((vaddr >= ((unsigned int)3*1024*1024*1024)) && (vaddr < ((unsigned int)3*1024*1024*1024)+((unsigned int)8*1024*1024)))) if ((unsigned long)prev_val != (unsigned long)pte->pte_low) printk("!! ouch! %08lx != %08lx for vaddr %08lx\n", prev_val, pte->pte_low, vaddr); */
                    }
                }
                set_pmd(pmd, __pmd(_KERNPG_TABLE + __pa(pte_base)));
            }
        }
    }

The behaviour of the code is like this. First the header values are dumped, showing the calling parameters etc.:

    setup_identity_mappings(pdg_base=c0347000, start=0xC0000000, end=0xCEFF0000);
    PTRS_PER_PGD=0x00000400, PTRS_PER_PMD=0x00000001, PTRS_PER_PTE=0x00000400, cpu_has_pse=0, cpu_has_pge=1, PGDIR_SIZE=0x00400000

Then for the first 768 times around the outer loop (i=0 to 767), it does a continue in the j loop, because vaddr is still below start:

    i=0, vaddr=0x00000000, pmd=c0347000
     i=0, j=0, vaddr=0x00000000,
    i=1, vaddr=0x00400000, pmd=c0347004
     i=1, j=0, vaddr=0x00400000,
    i=2, vaddr=0x00800000, pmd=c0347008
     i=2, j=0, vaddr=0x00800000,
    ....
    i=768, vaddr=0xC0000000, pmd=c0347c00
     i=768, j=0, vaddr=0xC0000000,

Then at i=768 it enters the innermost loop, first finding the pte address:

    (pmd present) pte_base=0x00000000

WHICH IS NULL (this seems WRONG??? It looks like it is used as a POINTER???). Then it does the inner loop 1K times:

    i=768, j=0, k=0: vaddr=0xC0000000 --> 0x00000063
    i=768, j=0, k=1: vaddr=0xC0001000 --> 0x00001063
    i=768, j=0, k=2: vaddr=0xC0002000 --> 0x00002063
    i=768, j=0, k=3: vaddr=0xC0003000 --> 0x00003063
    ....
    i=768, j=0, k=1021: vaddr=0xC03FD000 --> 0x003FD063
    i=768, j=0, k=1022: vaddr=0xC03FE000 --> 0x003FE063
    i=768, j=0, k=1023: vaddr=0xC03FF000 --> 0x003FF063

It proceeds to do 1024 inner iterations for i=769 through 827, although at i=827 it seems to abort early at k=1008...

    i=827, j=0, k=1005: vaddr=0xCEFED000 --> 0x0EFED063
    i=827, j=0, k=1006: vaddr=0xCEFEE000 --> 0x0EFEE063
    i=827, j=0, k=1007: vaddr=0xCEFEF000 --> 0x0EFEF063
    i=827, j=0, k=1008: vaddr=0xCEFF0000
    pagetable_init() test 2
    ***DUMPING PAGING TABLES
    swapper_pg_dir = 0xC0347000
    0000: 00391027 00000000 00000000 00000000
    0004: 00000000 00000000 00000000 00000000
    ...

It then seems to return and do the pagetable_init() test 2 dump, which comes after the call to this routine. Maybe it crapped on its printk() buffer for the last 15 times around the loop? Don't know. It completes the dump and then reboots. Anyway, the interesting thing is that pte_base in the code above comes out as 0x00000000 at runtime. That seems wrong to my undereducated eyes.

Looking a bit harder, it is not aborting early: it hit the end address of 0xCEFF0000 set by the third calling parameter, then returned cleanly to the caller, which does the dump. But 0x00000000 can't be right for that pte pointer, unless it 'just so happens' that the memory at 0x00000000 is being used as the pte table... this seems unlikely?

Indeed you seem to be on to something. pte_base = NULL is almost certainly incorrect. Even if we allocated physical address zero as the pagetable (which is close to impossible, since we mark it as reserved - certain BIOSes rely on it for suspend), even then it should be 0xc0000000. So pte_base = NULL means that the pmd_present() == true condition is wrong. The only way this can happen is if the head.S code does the root-pagetable setup incorrectly. To test this theory, could you comment out the true branch of the pmd_present() condition? Something like:

    // if (!pmd_present(*pmd))
            pte_base = (pte_t *) alloc_bootmem_low_pages(PAGE_SIZE);
    // else
    //      pte_base = (pte_t *) page_address(pmd_page(*pmd));

This will cause us to allocate new pagetables and not accept the pre-generated head.S layout.

Yep, BINGO. Early printk stuff completes my dumps and then finally:

    zapping low mappings.
    On node 0 totalpages: 61424
      DMA zone: 4096 pages, LIFO batch:1
      Normal zone: 57328 pages, LIFO batch:13
      HighMem zone: 0 pages, LIFO batch:1
    DMI 2.2 present.
    ACPI: RSDP (v000 VT9174 ) @ 0x000f6650
    ACPI: RSDT (v001 VT9174 AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x0eff3000
    ACPI: FADT (v001 VT9174 AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x0eff3040
    ACPI: DSDT (v001 VT9174 AWRDACPI 0x00001000 MSFT 0x0100000c) @ 0x00000000
    ACPI: PM-Timer IO Port: 0x408
    Built 1 zonelists
    Kernel command line: ro root=LABEL=/ mem=nopentium earlyprintk=serial,ttyS0,38400
    Initializing CPU#0
    CPU 0 irqstacks, hard=c034d000 soft=c034c000
    PID hash table entries: 1024 (order 10: 8192 bytes)
    Detected 599.892 MHz processor.
    Using pmtmr for high-res timesource
    disabling early console

It has booted up all the way to the login prompt in fact :-D Adding one of those BUG() asserts you mentioned, checking pte_base != NULL, seems like a good addition to the actual kernel code; it is not in the innermost loop, so there would be no real performance penalty. After all, pmd_present() can disagree with getting a non-NULL result from page_address().... Can I help probe the problem behind this, presumably in head.S then?

I've reviewed the head.S code and it seems to be correct. Could you please print out some more state in the pte_base == NULL case? It would be quite useful to print out the raw pmd value. One of your earlier dumps showed these swapper_pg_dir contents:

    0000: 00391027 00000000 00000000 00000000
    0004: 00000000 00000000 00000000 00000000

This means that the pmd value was 0x00391027 or 0x00000000. pmd_present() tests bit 0 of the pmd, so only the first entry could be present - but in that case pte_base should have been 0xc0391000. pte_base = NULL together with pmd_present() means that the entry in swapper_pg_dir must have been 0x00000001 (or perhaps 0x00000027). None of the dumps suggest this though. Another (remote) possibility would be that some sort of non-RAM page ends up being used for pagetables.
This can lead to similarly funny results. What does a full bootup log look like on your box, and what does the e820 map (the RAM map provided by the BIOS) look like? Another question: what is the precise value of pg0 on your box? (It should be in your System.map.) Perhaps there's a boundary-condition bug in the head.S code, but this should only be possible if pg0+INIT_MAP_BEYOND_END [== pg0+128K] is exactly on a 4 MB boundary [quite unlikely ...].

    May 23 08:56:03 backup kernel: BIOS-provided physical RAM map:
    May 23 08:56:03 backup kernel:  BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
    May 23 08:56:03 backup kernel:  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
    May 23 08:56:03 backup kernel:  BIOS-e820: 0000000000100000 - 000000000eff0000 (usable)
    May 23 08:56:03 backup kernel:  BIOS-e820: 000000000eff0000 - 000000000eff3000 (ACPI NVS)
    May 23 08:56:03 backup kernel:  BIOS-e820: 000000000eff3000 - 000000000f000000 (ACPI data)
    May 23 08:56:03 backup kernel:  BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
    May 23 08:56:03 backup kernel: 0MB HIGHMEM available.
    May 23 08:56:03 backup kernel: 239MB LOWMEM available.

From the System.map for the custom kernel:

    c0391000 A pg0
    c0399000 A _end

OK, could you undo the vmlinux.lds.S hack (which I suggested some time ago, and which still seems to be in your tree)? That changed pg0/_end; they should be equal in an unhacked tree. Still, they are near a 4MB boundary, but not near enough to cause trouble, I think. What we need is a full dump in the pte_base == NULL case: how did we get into the pmd_present() branch? I think the dump of swapper_pg_dir should be enough as a starter. (The raw value of the pmd is pmd_val(*pmd); please print that one out too.)

The e820 map looks sane and simple: there are only two RAM ranges, 0...640K and 1MB...~240MB. The ACPI areas are all after the end of RAM, so the likelihood of something weird being near the pagetables is quite slim. (We load the kernel at 1MB physical.)
    swapper_pg_dir = 0xC0347000
    0000: 00391027 00000000 00000000 00000000
    ...(all zeros)...
    0300: 00391027 00000000 00000000 00000000
    ...(all zeros)...

Going to remove the pg0/_end thing, which is indeed still in. I added some code to find all cases where there is a disagreement:

    if (pmd_present(*pmd)) {
        if (((pte_t *) page_address(pmd_page(*pmd))) == NULL) {
            printk("pmd_present TRUE page_address NULL at 0x%08lX\n", pmd_page(*pmd));
        }
    }

Your pmd_page(*pmd) condition is not correct. The best would be to use 'pmd_val(*pmd) < PAGE_SIZE'.

Hm ... this line is incorrect:

    pte_base = (pte_t *) page_address(pmd_page(*pmd));

I have little understanding of what I am working with; however, here is the original failing code:

    if (!pmd_present(*pmd)) {
        pte_base = (pte_t *) alloc_bootmem_low_pages(PAGE_SIZE);
        printk(" (pmd not present) ");
    } else {
        pte_base = (pte_t *) page_address(pmd_page(*pmd)); // NULL
        printk(" (pmd present) ");
    }

The error happens because pmd_present(*pmd) is true, and the line commented // NULL yields NULL. That is the basis of my test code. Can you explain why using the actual failing code above is 'not correct' when I am looking for the instances where that exact code fails? Ah, just saw your next comment... is this in fact The Bug?

The correct code would be something like:

    pte_base = pte_offset_kernel(pmd, 0);

Does it boot if you fix that line?

Created attachment 100481
fix
ok, the fix as a patch against the Fedora kernel.
Catching up with previous tests first, just in case: it still boots on my test tree with pg0 == _end, and I got this single instance of disagreement between pmd_present() and page_address():

    pmd_present TRUE page_address NULL at 0x00007220

Now I make your fix.

I do think this is The Bug. But let's wait for your test first! :-) The bug got introduced the following way: I did a cleanup of the pagetable setup code for the 4:4 patch, but then the meaning of pmd_page() changed sometime in the -mm tree, and this stray use remained...

It's a winner Ingo, well done :-)

Cool :) You did all the hard work though! (This fix does not mean all VIA C3 problems are necessarily fixed. E.g. if vanilla 2.6.6 fails for someone, that's an indication of some other problem; 2.6.6 does not have this buggy line of code.)

It continues to boot fine with the mem=nopentium and earlyprintk command-line parameters removed too. I was experiencing flaky behaviour (freezing) on this board with -327; it would not stay up for >48hrs, and sometimes a lot less. This machine does my mail, so it is highly visible when it goes down. It'll be interesting to see whether that was coming from the same direction and has gone away, or whether it is another issue. At least I have a serial cable now if it is something else!

I did get a vanilla 2.6.6 kernel to boot correctly on the second attempt at building: I changed the highmem config option from 64G to none, and that seems to be it. I can't say when or why it got turned on in the first place; I don't have anywhere near that much memory.

*** Bug 124066 has been marked as a duplicate of this bug. ***

CONFIG_HIGHMEM_64G will certainly not work on this CPU; that feature needs the PAE capability. We should bail out a bit more gracefully in that case though ... right now we just hang, I think, with a printk'd message that doesn't make it to the console because console_init() has not been done yet.

Just noting that -327 has the same bad line, yet it worked.
I suppose this could come down to the exact details of why

    pte_base = (pte_t *) page_address(pmd_page(*pmd));

is not right, yet managed to work on most CPUs anyway. It could also point to subtly trashed pagetables on -327 being responsible for the 48hr freeze behaviour; e.g. it might have mapped the same linear page twice. Do we have a story for why -327 was okay on C3?

We used a really broken way to establish the pte_base pointer, and we used that pointer blindly from that point on. In addition, the initial mappings map the NULL address too, so dereferencing it doesn't cause traps. Plus, the first page in the system is _typically_ not used. On top of that, 99% of the CPUs out there have the PSE feature, so most boxes won't ever hit this codepath. So I'm not surprised -327 worked fine on your box. But we ended up corrupting page 0, which could have nasty side effects for BIOS-related things like SMM handlers or suspend. If you feel inclined, could you print out the value of pte_base on a rebuilt -327 tree, to confirm that it's different?

I'm trying to make a new boot.iso so that people can install FC2 on their VIA C3 anyway; I'd love it if some of the people on this bug would help test it. The URL is http://people.redhat.com/arjanv/c3boot.iso

Bad news: the test iso comes up in the grub-type VGA display; when I hit Enter it pulls in the initrd and kernel, then... reboots :-/ Mounting the test iso image and poking around, an obvious difference is that this is an SMP kernel.... Is it i586 though? No obvious way to tell.

Just tried the c3boot.iso as well; still the same problem. In fact I would probably say it is worse. Whereas the original kernel got as far as the "uncompressing the kernel" message, this one finishes loading the initrd and then reboots. Hope this helps.

Sorry guys, I screwed it up (you'd THINK after 3 years in the job I could build a kernel correctly ... but no). Working on a better iso.

OK, reuploaded now with a sane kernel, hopefully; same URL.

Yes, booting now; it comes up on the first page of the install script in text mode, language selection I think.

Tried the new boot.iso and the installation process works fine now, all the way through to the end. But upon the reboot after installation, it crashes in the same way (consistently rebooting) after the GRUB page with kernel 2.6.5-1.358. I've got a C3/Eden (Ezra) at 1GHz. Are there any command line options I should try?

Presumably the crash after install is just because the kernel package from the original FC2 media is being installed. Maybe shortly you will be able to come up off the install kernel with linux rescue (IIRC), run rpm -Uvf kernel-new-one-from-Arjan.i586.rpm, and that'll be it.

Tested the new c3boot.iso; it seems to work fine, and my install is in progress. I will look forward to the next kernel update! Thanks! Richard

VIA C3 800MHz, 256Mb, original BIOS. Tried using the new ISO and the install now works fine after following the instruction to change CD to FC2 disk 1. I do however have the same problem as post #133: the system reboots after the GRUB page. You're getting there guys, so well done for the work so far. Would it be reasonable to assume that the eventual outcome of all this will be a new FC2 disk 1 iso that contains the modifications? Ed Almos

Folks, Arjan's new -383 kernel is up at http://people.redhat.com/arjanv/2.6/RPMS.kernel/kernel-2.6.6-1.383.i586.rpm and boots fine on my machine... If you

- boot off Arjan's ISO (http://people.redhat.com/arjanv/c3boot.iso)
- swap in the normal FC2 CD1 and install FC2
- on completion, boot again off Arjan's ISO
- type linux rescue at the grub prompt
- swap in a CD with the new 383 kernel RPM
- rpm -Uvf kernel-2.6.6-1.383.i586.rpm
- type exit (I think) to reboot

On reboot you should come up in -383 and hopefully all this will be an ugly memory ;-)

I apologize in advance for what are going to be basic questions.

1. I have completed steps 1-4 above. After typing "linux rescue", I get the first couple of screens from the FC2 install (keyboard, language, etc.). It's clearly in rescue mode though. At the end of these screens it asks for disc 1 of the FC2 isos. It will not accept the CD with the new rpm.
2. I give it FC2 disc 1, and then it says that it's mounting my systems at /mnt/sysimage and gives me a bash prompt.
3. At this point, I put in the CD with the new kernel rpm and type the command above. The shell responds that no such file exists. I use the find command and confirm that it cannot see the new rpm file.
4. I try to mount the cdrom with the mount command, but this does not work either.
5. I try to chroot to /mnt/sysimage and do steps 3 and 4, but this does not work either.

What am I doing wrong? This is way over my head.

Sounds like you're really close. What exactly was the mount command you tried in your step 4? I would try something like this:

    mkdir mymount
    mount /dev/hdc mymount -t iso9660
    cd mymount
    ls

This assumes /dev/hdc is your CD reader. /dev/cdrom might work too; I think this is a symlink to the actual device. Is that similar to what you tried?

Before I try this, do I need to chroot to /mnt/sysimage?

Yeah, quite possibly, since that is where your real rpm database is. You seem to know what chroot is about, but just in case, or if anyone else is wondering: it basically replaces / with some other directory, so /mnt/sysimage/bin becomes /bin and so on. Rescue mode comes up with some utils and stuff in /, and your normal root filesystem in /mnt/sysimage.

Success!! Thanks for all your help. Now I need to solve the X server problems.

Version 2 of the FC2 install sequence for C3s, battle-tested by J Spells, then ...
1) Download http://people.redhat.com/arjanv/2.6/RPMS.kernel/kernel-2.6.6-1.383.i586.rpm
2) Burn it onto a CD as a file on its own; call it KERNEL RPM CD
3) Download http://people.redhat.com/arjanv/c3boot.iso
4) Burn this onto a CD as a CD image; call it C3 BOOT CD
5) Boot off C3 BOOT CD
6) At the first menu, swap in the normal FC2 CD1 and install FC2
7) On install completion, boot again off C3 BOOT CD... but...
8) Type linux rescue at the grub prompt
9) When it has finished booting, type chroot /mnt/sysimage
10) Swap in the KERNEL RPM CD
11) mkdir mymount
12) mount /dev/hdc mymount -t iso9660
13) cd mymount
14) rpm -Uvf kernel-2.6.6-1.383.i586.rpm
15) Type exit (I think) to reboot

On reboot you should come up in -383 and hopefully all this will be an ugly memory ;-)

Would it be useful if I stuck the 383 kernel onto the boot.iso?

What is boot.iso, the first FC2 CD? That would be a full solution, assuming the kernel package is on CD1: issue a new 700MB (or whatever) iso for FC2 CD1 which boots into -383 and has the -383 kernel and kernel-source package on it too. I saw your message in fedora-test-list about your intention along these lines. What would be REALLY COOL is if the installer had an option to go to a yum repository before it started and download headers for updated packages, favouring the updates over the ones on the CD where the updates are newer. That would allow you to solve this problem by issuing an updated kernel package over yum and telling people they must get it at install time for C3 installs. Another cool idea would be a small script which wget-ed the newer kernel, mounted the standard FC2 CD1 ISO with -o loop, replaced the kernel package, and updated the install kernel footprint too. Then a download of a few megabytes would automate the update of FC2 CD1 to C3 compatibility.
I meant putting the 383 kernel RPM file inside the c3boot.iso, actually. That would make it unnecessary to make 2 CDs, and in fact one could rpm -i the kernel during the normal installation already, with no need to rescue boot...

Yes, that's a great idea; it would clearly cut out a lot of fiddling. Can you just go to a different virtual console and install the kernel when the main install is over, then? I still like the yum idea for the future; it would ensure that every install had the latest patches from the get-go.

OK, I uploaded http://people.redhat.com/arjanv/c3boot-2.iso with the RPM on it.

Version 3pre1 ;-) of the FC2 install sequence for C3s
1) Download http://people.redhat.com/arjanv/c3boot-2.iso
2) Burn it onto a CD as a CD image
3) Boot off it
4) At the first menu, swap in the normal FC2 CD1 and install FC2
5) On install completion, type ctrl-alt-F2 to get to a console
6) rpm -Uvf /!!!Where is the CD mounted???/kernel-2.6.6-1.383.i586.rpm
7) Type ctrl-alt-F1 and complete the install

Arjan,
- what is the path that the CD is mounted at during the install action?
- is it true that ctrl-alt-F2 will get you to a bash prompt?
- is it true that the installer is back on virtual console 1?
- is it true that the installer waits at the end to allow you to do the RPM in another vc?

Further to comment #136: an ISO image of FC2 CD1 with the 383 kernel would really make my day. Can this be put on one or two mirrors just for us C3 folk?
Ed Almos
Budapest, Hungary

Just an additional note that this problem also affects my CM-588 single board computer. It's based on the Geode 5530 chipset. Thanks for all your hard work, everyone,
-Hugh

c3boot.iso #1 worked for my Syntax S635MP motherboard with an integrated VIA C3 Samuel 2. I had to download the kernel RPM since I failed to mount a separate CD (user error, no doubt) containing the kernel. It seems to work fine so far, other than X seeming slow.

If you have VIA graphics, look in xorg.conf and use the "via" driver rather than "vesa".
This driver will be much faster, but Xorg's via driver is a bit out of date and it does have stability problems sometimes. I don't remember the URL for the latest driver.

*** Bug 124385 has been marked as a duplicate of this bug. ***

*** Bug 118255 has been marked as a duplicate of this bug. ***

Andy, 3pre1 looks mostly good :)
> - what is the path that the CD is mounted at during the install action?
It's /mnt/source/ in FC2.
> - is it true that ctrl-alt-F2 will get you to a bash prompt?
Yes.
> - is it true that the installer is back on virtual console 1?
Only in a text install. In a graphical install you need Alt-F7.
> - is it true that the installer waits at the end to allow you to do the RPM in another vc?
The installer waits at the end to do a reboot, so that you can take out the installation CD (otherwise it would boot into the installer again instead of the HD). You still have a shell prompt on vc2 at this stage, IIRC.

I've upgraded from FC1 to FC2 and got the new kernel in; however, my machine bombs out at the point where it fscks the root LVM volume. The kernel finds the rootvg, but it is not able to mount it read-write or fsck it. Has anyone run into this? Is this an LVM problem? I can still boot a 2.4 kernel without any issue.

I have got nearly all the way there... but every time I try to rpm -Uvh the kernel off the VIA boot disk, it moans about missing dependencies. This makes no sense, as I actually did a full install. I've tried twice now and each time it has failed :( ... A.

Well, personally I would hit it with --nodeps and --force on the rpm command line too. That will install the thing regardless. But what were the missing deps?

You need to do this before the RPM command:
chroot /mnt/sysimage
Then the RPM command will work fine. Works for me :)

OK, slight problem... I've now rebooted. Do I have to go through the whole install again? I have rebooted with the VIA disc, then inserted FC2 disk 1 at the relevant point.... Sorry..
I should know this... A.

Ahh... sorry, I see. Wait until you reach the point where you select that it's already installed and choose that; then, before making a GRUB selection, press ALT+F2, do the chroot, force the CD to unmount/eject, and remount the VIA boot CD... Currently RPMing the kernel :) Hope it works <g> A.

Yes, thanks for all the work on getting this fixed. My processor (details below) now works fine with this patched kernel.
processor : 0
vendor_id : CentaurHauls
cpu family : 6
model : 7
model name : VIA Samuel 2
stepping : 3
cpu MHz : 401.175
cache size : 64 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu de tsc msr cx8 mtrr pge mmx 3dnow
bogomips : 793.67

Using c3boot-2.iso for the kernel and initrd file on a VIA EPIA 800 motherboard, I get several errors. When viewed with ALT-F4, I see messages like
<3>via_rhine: version magic '2.6.5-1.358 586 REGPARM 4KSTACKS gcc-3.3' should be '2.6.5-1.358 586 REGPARM 4KSTACKS gcc-3.4'
When viewed with ALT-F3, I see messages like
* failed to insert /tmp/uhci-hcd.ko
As a result I do not have access to any devices.

Ignore my last posting; it may be user error on my part. (I am DHCP booting the system since I have no floppy. I have a USB DVD drive but can't boot from it, so I DHCP boot pointing to the c3boot-2 ISO image.
I think I had more than one ISO mounted on the same path; now booting works.)

Version 3 of the FC2 install sequence for C3s
1) Download http://people.redhat.com/arjanv/c3boot-2.iso
2) Burn it onto a CD as a CD image
3) Boot off it
4) At the first menu, swap in the normal FC2 CD1 and install FC2
5) On install completion, where it says to remove the CD and it will reboot, type ctrl-alt-F2 to get to a console
6) Type eject
7) Stick the CD you booted from back in
8) (replace hdc with your CD-ROM device if different) mount /dev/hdc /mnt/source -t iso9660
9) chroot /mnt/sysimage [NOT SURE IF THIS IS NEEDED]
10) rpm -Uvf /mnt/source/kernel*
11) Type ctrl-alt-F1 (for a text mode install) or ctrl-alt-F7 (graphical install) and complete the install
12) You should reboot into a working kernel

BTW, with Arjan's 383 my C3 board is back to its old reliable behaviour: nearly 4 days of uptime already, where -327 would freeze within 48 hours. JOB WELL DONE EVERYBODY!

Hi! Thanks for all your hints, but my problem is that there is no /dev/hdc on my system. I can't 'find' any CD-ROM device on my system. A log said
Method hdc://mnt/cdrom
My board (I don't know the real name) is the smallest fanless VIA EPIA, at 566 MHz. Any further hints on what to do?
Regards
Carsten

Hi! I also had problems figuring out how to mount the CD-ROM. To be able to mount it, I had to chroot FIRST.
So I think that you should do this:
1) Download http://people.redhat.com/arjanv/c3boot-2.iso
2) Burn it onto a CD as a CD image
3) Boot off it
4) At the first menu, swap in the normal FC2 CD1 and install FC2
5) On install completion, where it says to remove the CD and it will reboot, type ctrl-alt-F2 to get to a console
6) chroot /mnt/sysimage
7) Stick the CD you booted from back in
8) (replace hdc with your CD-ROM device if different) mount /dev/hdc /mnt/source -t iso9660
9) rpm -Uvf /mnt/source/kernel*
10) Type ctrl-alt-F1 (for a text mode install) or ctrl-alt-F7 (graphical install) and complete the install
11) You should reboot into a working kernel
Regards,
Gerald

*** Bug 125139 has been marked as a duplicate of this bug. ***

I have the same problem as Carsten (#168). Even with the changes from Gerald (#169) I can't find my CD-ROM. I do have an hdc (hdc1 to hdc32), but I can't mount anything (error message: could not find /dev/hdc in /etc/mtab or /etc/fstab). What have I done wrong?

hda is the master device on the primary IDE channel.
hdb is the slave device on the primary IDE channel.
hdc is the master device on the secondary IDE channel.
hdd is the slave device on the secondary IDE channel.
Depending on where your CD-ROM is attached, you have to adapt the script to /dev/hdb, /dev/hdc or /dev/hdd. Check the output of "dmesg"; it shows which device your CD-ROM is attached to, e.g.:
...
hdc: CD-224E, ATAPI CD/DVD-ROM drive
...

Thx Arnaud for your answer, but...
1. I knew that my CD-ROM is attached to hdc
2. I still have the problem

Hi Christian, after doing what Gerald suggested in #169, it works fine for me. Did you get other errors? Maybe an unreadable CD?
Regards
Carsten

I'd be interested in hearing back some stability reports on use with these C3 CPUs (Samuel 2/Ezra).
I've had two occasions of serious weird bugs with software crashing: once back in my days with Antefacto, and again over a year ago when testing some IPC units. I always put this down to rogue cmov instructions in packages that should have been i386, though. I experienced this FC2 install CD kernel bug myself last week when testing an Ezra unit. Andy, now that your mail server is up and running again, how's the stability? I'd email you off Bugzilla, but I don't know your email address (and can't see how to get it from Bugzilla). Cheers for fixing this issue. I'll get the boot CD and give that Ezra another spin tomorrow.

I followed the procedure and installed FC2 on my VIA EPIA 5000 system (bug 125139). Thanks to all for finding this so fast and giving me a fix.

Emails are right there at the top of each post, Glen (andy in my case). Right this second:
12:44:10 up 7 days, 3:36, 2 users, load average: 0.38, 0.47, 0.45
It has frozen once since the new kernel; I don't know why or how to debug that, so I just rebooted. Before the recent FC2 kernels, the box had been up for months without a reset. However, it is a bit of a sealed unit codewise: it just runs a small set of apps and that's it. If you are having random app crashes on a machine without a stable history, I would be thinking about the RAM.

Hi, a new kernel has arrived: kernel-2.6.6-1.427.i586.rpm. Is this kernel ready to use on the C3 CPU boards, or do we need another user-compiled one? Any experiences?
Regards,
Carsten

I ran yum update and got kernel-2.6.6-1.435. It works for me! (See my previous comment for my config.)

kernel-2.6.6-1.435 (i586) works like a champ for me as well, on an EPIA ME6000 board. Many thanks to Andy, Ingo and Arjan for the troubleshooting, the fix and the workaround ISO image. It was great to see, when trying to install this sucker last night, that the problem had already been tracked down and fixed.
I followed the workaround in comment 169, except that I didn't have a /mnt/source directory and so mounted to /mnt instead.

Hello, I am using the workaround found in comment 169 to install FC2 on a Compaq Presario laptop with a Cyrix chip (yes, it's a little old); I experienced the instant reboot problem as well. When I press [ENTER] at the boot: prompt using the c3boot-2 CD, everything appears fine (the kernel uncompresses with no reboot) until a series of "hdc: lost interrupt" errors that effectively prevent installation. I've tried booting with "nodma", "pci=noacpi" and "hdc=nodma". Any suggestions on how to get around this? If I've posted to the wrong forum, please let me know. Thanks in advance!

Hi, please ignore my previous post; for some reason IRQ 14 was being assigned to both ide0 and ide1. Once I passed the following to the kernel at the boot prompt, installation proceeded normally:
ide1=0x170,0x376,15
Thanks for solving the Cyrix install issue!

Does anyone know how to make this work with the DVD install????

The workarounds described in this bug report do not work with the following setup:
Motherboard: EPIA-V10000A
CPU: VIA C3 1GHz
Memory: 256MB (248M + 8M shared)
When trying the workaround with either c3boot.iso or c3boot-2.iso, boot-up is successful, but when asked for the installation medium (choosing any of the options yields the same result) I get a blue screen with the following message:
install exited abnormally -- received signal 11
sending termination signals...done
sending kill signals...done
disabling swap...
unmounting filesystems...
/proc/bus/usb done
/proc done
/dev/pts done
/sys done
/tmp/ramfs done
you may safely reboot your system
Any help with this would be greatly appreciated. I've tried 4 different distributions in desperation to get any flavour of Linux up and running on this EPIA board, without any success so far.
Cheers,
L

Les, is this with the graphical or the text install?
What is the last message given before the one you have already supplied? Any mention of Anaconda? Also, what version of Fedora/RHEL are you using? This bug was originally posted against FC2, I think; are you now using FC3?

Hi Lee, I'm using an official RH pressed version of the FC2 DVD (navy blue background with white print). What I described is what happens when I try the graphical install, but strangely enough it doesn't actually give me a graphical install; instead it shows an ncurses-style text install :( What's even worse is that if I try the text mode install, I don't even get past the boot stage. The last thing that I see when I do a text install is:
------------
Greetings.
anaconda installer init version 10.0 starting
mounting /proc filesystem... done
mounting /dev/pts (unix98 pty) filesystem... done
mounting /sys filesystem... done
trying to remount root filesystem read write... done
mounting /tmp as ramfs... done
running install...
running /sbin/loader
install exited abnormally -- received signal 11
sending termination signals... done
sending kill signals... done
disabling swap...
unmounting filesystems...
/proc/bus/usb done
/proc done
/dev/pts done
/sys done
/tmp/ramfs done
you may safely reboot your system
------------
When I do a graphical mode install as described in my first post, yes, I do see "anaconda installer init version 10.0 starting" just before it goes into the blue-screen text mode installer. I select "English" when prompted at the first install item, "Choose a Language", then choose "us" at the "Keyboard Type" question. Then, if I select any of the options "Local CDROM, Hard drive, NFS image, FTP, HTTP", the installer immediately dies, displaying nothing but a blue screen (white text) with the text I mentioned in my earlier post. Unlike the text in this post, which was nicely aligned against the left-hand side of the screen, that text is scattered all over the screen with different levels of tabbing.
Any help with this would be greatly appreciated; so far I've tried RH FC2, SuSE Professional 9.2, Debian woody, Ubuntu, Minislack and Mandrake and have not been able to actually finish an installation :(

For what it's worth, I also downloaded the FC2 CD ISOs last night. I would have put my money on it not making a blind bit of difference, seeing as whichever install medium I selected, it always ended in the "install exited abnormally" shame spiral. True to form, my problem shows the same failure regardless of DVD or CD install media :( From what I can gather, the problem seems to be that the C3 Ezra CPU on the motherboard doesn't support the "cmov" instruction that other i686 CPUs do. Unfortunately, when GCC is in i686 mode it uses the cmov instruction, which strictly speaking isn't part of the i686 instruction set. Apparently there are patches available for both GCC and the kernel to get around the problem, but I'm scared that I'm going to have to recompile everything in FC, which just sounds insane :(

Looks like this was down to a faulty EPIA V10000A motherboard. I should be getting a replacement sent through the mail in the next few days and will report back when it gets here.

Is that issue (... really ...) solved now? ... I mean without downgrading to an old kernel? My RHEL4 (which has 2.6.9) has the same problems on an EPIA V8000 ... so it doesn't seem fixed, right? I found many sites referencing this bug number/list, but is there any solution for new kernels (>= 2.6.8)???

RHEL4 does not support VIA C3 processors (at least not the ones without the cmov instruction; some newer VIA CPUs do have it).

OK, just to get it straight ... it's not necessarily only a kernel problem but also a problem of the "native target architecture" of the distribution, right?
I started RHEL4 with a 2.6.8_i586 kernel and it does NOT reboot after a few seconds; it just hangs at "switching to new root", which may be the point where the first non-i586 code (or the mentioned cmov) comes in, right? Can it be boiled down to "if your distribution supports i586, then it should work with a C3 processor" (with an i586 kernel, of course)? Is there a list anywhere of C3s that support cmov??

Can anyone advise us on how to get a VIA C3 through a diskless RH install? We have one of the teeny cappuccinopc boxes with a VIA CPU and are trying to boot it off an install CD image via DHCP/PXE/TFTP. We get as far as uncompressing vmlinuz and, bang, off into infinite reboot land. Do we need a whole new set of CD images? Do we have to hand-craft them, or can we download a "safe" version from RH? We are not used to dealing with anything outside the mainstream of Intel CPUs, so this is new territory; we are newbies and could do with some advice to help reduce time wasted trying stuff that doesn't work :*(
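On the question of which C3s support cmov: rather than a list of models, the flags line of /proc/cpuinfo answers it for any given box. Both Samuel 2 dumps posted in this bug show "fpu de tsc msr cx8 mtrr pge mmx 3dnow", with no cmov. The sketch below is a hypothetical helper: the function names are made up here, and mapping "no cmov" to "use i586 kernels and packages" is a simplifying assumption drawn from this thread, not an official Fedora or RHEL rule.

```shell
# Hypothetical helpers: check a cpuinfo-style file for the cmov flag.
# Defaults to /proc/cpuinfo; pass another path to inspect a saved copy.
has_cmov() {
    local cpuinfo=${1:-/proc/cpuinfo}
    # Look at the first "flags" line only; -w matches cmov as a whole word.
    grep -m 1 '^flags' "$cpuinfo" | grep -qw cmov
}

# Simplifying assumption from this thread: without cmov, stick to i586
# builds; with cmov, i686 builds are at least not ruled out by this check.
suggest_arch() {
    if has_cmov "$1"; then
        echo i686
    else
        echo i586
    fi
}
```

Run suggest_arch with no argument on the machine itself. For the Samuel 2 flags quoted above it prints i586, which matches the advice in this bug to use the i586 kernel on those CPUs.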