Bug 120685

Summary: (C3) Via C3 reboots immediately on load of kernel
Product: [Fedora] Fedora
Reporter: Paul Coleman <pdcoleman>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: rawhide
CC: andy, bbooth, carsten, cdelasaux, cdhiller, cpjunk, earlt, erich, glen, g.mansfield, jms87, jvanveelen, klgage, k_paulsen, lee.wilson, mingo, mulix, paul.morgan, pfrields, rschaal_95135, steve, trickreed

Hardware: i586
OS: Linux

Doc Type: Bug Fix

Last Closed: 2004-07-12 20:57:16 UTC

Attachments:

Description	Flags
very early printk code	none
dump of paging tables	none
check_pagetables()	none
please try to unapply it from your tree - it will likely not succeed though.	none
test-patch	none
dump of paging tables degenerating into garbage	none
fix	none

Description Paul Coleman 2004-04-12 23:08:04 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040312

Description of problem:
On bootup right after decompressing the kernel the system
resets/reboots. System is an Epia V with Via C3 processor.

Version-Release number of selected component (if applicable):
kernel-2.6.5-1.315

How reproducible:
Always

Steps to Reproduce:
1. Reset system
2. Choose kernel-2.6.5-1.315 to boot

Actual Results:  System resets

Expected Results:  kernel load

Additional info:

Comment 1 Paul Coleman 2004-04-13 14:58:16 UTC

kernel-2.6.5-1.319 does the same

Comment 2 Paul Coleman 2004-04-14 19:00:45 UTC

kernel-2.6.5-1.322 does the same
kernel-2.6.5-1.309 was the last to boot on the Via C3

Comment 3 Andy Green 2004-04-15 18:20:32 UTC

"Me too" on an EPIA-M 600MHz Via-C3 based board. 
Reboots right after "Uncompressing Linux" on both .322 *i586* (the 
i586 guys used to work okay) and the FC2 Test 2 Install Kernel (!).

Comment 4 Dave Jones 2004-04-16 10:31:40 UTC

can you cat /proc/cpuinfo from a kernel that works please ?
I'm trying to reproduce it here, but it looks like it might only
affect certain C3s, as the latest kernel works just fine on the ones
I've tried so far.

Comment 5 Andy Green 2004-04-16 10:49:26 UTC

processor       : 0 
vendor_id       : CentaurHauls 
cpu family      : 6 
model           : 7 
model name      : VIA Samuel 2 
stepping        : 3 
cpu MHz         : 599.725 
cache size      : 64 KB 
fdiv_bug        : no 
hlt_bug         : no 
f00f_bug        : no 
coma_bug        : no 
fpu             : yes 
fpu_exception   : yes 
cpuid level     : 1 
wp              : yes 
flags           : fpu de tsc msr cx8 mtrr pge mmx 3dnow 
bogomips        : 1196.03

Comment 6 Dave Jones 2004-04-16 12:16:58 UTC

boots for me with a samuel2 too.
You are using the 586 kernel right ?

Comment 7 Andy Green 2004-04-16 12:50:33 UTC

Yum installed it, here is the package from the yum cache: 
 
[root@backup root]# ll /var/cache/yum/development/packages/kern* 
-rw-r--r--  1 root root 14670525 Apr 15 
17:30 /var/cache/yum/development/packages/kernel-2.6.5-1.322.i586.rpm 
-rw-r--r--  1 root root   391711 Apr 15 
17:40 /var/cache/yum/development/packages/kernel-utils-2.4-9.1.127.i386.rpm 
[root@backup root]# rpm -q kernel 
kernel-2.4.22-1.2061.nptl 
kernel-2.4.22-1.2179.nptl 
kernel-2.6.5-1.322 
 
I don't know a more direct way to show that it is an i586 image, but 
I don't think any other package was downloaded. 
 
I updated this image from FC1 through to FC2 development by using 
yum and some "by hand" rpm installs.  I'm wondering if the stuff 
necessary to make a good initrd was present when the kernel package 
was installed.  (The matching initrd is present in /boot and at 179K 
is about the right size).  Yum reports that the package set is now 
up to date (except libselinux which depends on a not yet released 
glibc).  I will remove and reinstall the same kernel package and see 
if that makes any difference.

Comment 8 Andy Green 2004-04-16 13:05:03 UTC

No, it is the same behaviour after erasing and reinstalling the .322 
i586 kernel package. 
 
... 
Uncompressing Linux .... Okay, booting the kernel 
<reboot> 
 
In any event, exactly the same thing happens with the FC2 Test2 
install kernel/initrd. 
 
I'm quite willing to believe there is something pathological about
the motherboard/chipset/CPU, but it is a bare, unmodified EPIA-M 
600MHz fanless, 256MB DIMM, running the current BIOS (but the same 
happened with its original BIOS from Dec 2002).  It has been working 
great, on 24/7 under our TV serving video, for a year or so, no 
flakiness.

Comment 9 Paul Coleman 2004-04-16 13:19:19 UTC

processor       : 0
vendor_id       : CentaurHauls
cpu family      : 6
model           : 7
model name      : VIA Ezra
stepping        : 8
cpu MHz         : 800.264
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu de tsc msr cx8 mtrr pge mmx 3dnow
bogomips        : 1585.15

Kernels 315, 319 and 322 all rebooted on initial load. I got 326 out of
people and it boots, but its USB is borked, so I am still using 309 to
have a functional system.

Comment 10 Andy Green 2004-04-16 14:00:12 UTC

Installed Arjan's .326 i586 version, got this error during install:  
  
device-mapper: ioctl interface mismatch: kernel(1.0.3), user(4.0.0),  
cmd(0)  
  
But like pdcoleman says, it boots!  
  
USB didn't come up, but that may have been because modprobe.conf was  
trying to pull in the old name usb-uhci.  I modprobed uhci-hcd by  
hand (via ssh) and my keyboard came up.  
  
Anyway, big news is Arjan's .326 boots.

Comment 11 Paul Coleman 2004-04-16 15:18:17 UTC

This C3 bug actually arrived with the introduction of fc2t1 (2.6 and
new gcc) and seems to be a hit-or-miss situation as far as getting a
kernel that boots. Andy's hardware is different from mine (northbridge
ple266 vs ple133, C3 version, BIOS), so I don't think it's that. There
may be some sort of address alignment/location problem of a critical
section specific to the C3. Just an uneducated guess.

Comment 12 Paul Coleman 2004-04-21 18:21:01 UTC

.326 and .327 boot normally
.332 kernel panics.. kill idle at kernel start (sorry no backtrace)

Comment 13 Andy Green 2004-04-21 18:32:06 UTC

Yeppers -- just rebooted the via machine after yum update last 
night, booting into .332 a series of panics? scroll up quickly, last 
one is something to do with some kernel mount routine and contains a 
stack dump of four or five named kernel routines+IP offsets.  Then
"Attempted to kill init", sayonara.
 
.327 boots and works fine, except USB doesn't seem to recognize any 
devices despite modprobing the hcd by hand. 
 
Will try to note panic details down later, kids watching stuff on it 
at the moment.

Comment 14 Andy Green 2004-04-21 19:33:36 UTC

Correction, Paul's kill *idle* was right, not init.  Here is what I 
copied down... due to the TV being the output device, e and c might 
be conflated. 
 
The error happens very early in the Kernel boot, after a page or so 
of output.  There is a scrolling spew of these errors but I can only 
copy the last one.  (Possibly since it is the idle task being killed 
maybe all the other processes were being killed in the spew).  I 
truncated leading zeros after the first few. 
 
EIP: 0060: {<c01b7873>} Not tainted 
EFLAGS: 00010002 (2.6.5-1.332) 
EIP is at avc_lookup:0x53/0x9a 
eax 50  ebx 3  ecx 5  edx 6b6b6b6b 
esi 1  edi 2  ebp c0387eb4  esp c0387eb0 
ds 7b  es  7b  ss 68 
Process swapper 0 5 246 5 c0387edc 0 c01be892 1 
 c0387edc 3 1 0 0 0 0 0 
 0 0 0 0 0 0 0 0 
 
Call Trace: 
c01b8e92 avc_has_perm_noaudit+0x10d/0x48a 
c01b9233 avc_has_perm+0x24/0x49 
c01baa2a superblock_has_perm+0x24/0xe9 
c01bbcd1 selinux_sb_kern_mount+0x3e/0x49 
c01970f2 proc_get_sb+0xe/0x10 
c0166b5a do_kern_mount+0xa0/0x124 
c0166beb kern_mount+0xd/0xf 
c0393a22 proc_root_init+0x29/0xcf 
c038867d start_kernel+0x1f3/0x21b 
Code: 3b 32 75 f4 66 3b 4a 08 75 ee 3b 5a 04 75 e9 85 d2 74 23 85 
<0> kernel panic: attempted to kill the idle task 
in idle task - not syncing

Comment 15 Andy Green 2004-05-04 22:11:14 UTC

Kernel 2.6.5-1.349 i586 is back to rebooting spontaneously just 
after Uncompressing Linux... 
 
I can use 327, but with this I have stability probs with the 
motherboard, after 48hrs or so it stops responding on the network.  
I did not have a chance to see what it is doing on its display so 
far, I had to reboot it quickly as it is my mailserver.

Comment 16 Paul Coleman 2004-05-12 03:22:50 UTC

We are approaching fc2 and the last 2 kernels (356 & 358) do not boot on
a via c3. Is this the right forum to discuss this problem, or does it
need attention upstream? This has been an issue since fc1.

Comment 17 Jeremy Van Veelen 2004-05-18 20:03:31 UTC

I've attempted a fresh install of fc2 final on my Via C3 Ezra system,
and it reboots itself as well.  Does it have something to do with the
optimization flags that the kernel was compiled with?  See
http://www.epiawiki.org/wiki/tiki-index.php?page=EpiaInstallingGentoo.
I doubt that an i686 kernel will boot anything less than a Via Nehemiah.

Is there an i586 or i386 kernel that can be used to boot the installer
with?

Comment 18 Dave Jones 2004-05-19 10:55:47 UTC

the 686 kernel isn't entering the picture at all here.
read the comments above, they're all from 586 kernels.

Comment 19 Jeremy Van Veelen 2004-05-19 13:20:03 UTC

OK, I just upgraded my system from FC1 to FC2 via yum upgrade.
Here are the kernels I have installed and working:
kernel-2.4.22-1.2115.nptl
kernel-2.4.22-1.2135.nptl
kernel-2.4.22-1.2149.nptl
kernel-2.4.22-1.2163.nptl
kernel-2.4.22-1.2166.nptl
kernel-2.4.22-1.2174.nptl
kernel-2.4.22-1.2188.nptl
kernel-2.6.5-1.358

But the kernel on FC2 Disk1, whatever it is, causes my system to reboot
endlessly.  Any other info I can provide?

Comment 20 Jeremy Van Veelen 2004-05-19 15:23:17 UTC

processor       : 0
vendor_id       : CentaurHauls
cpu family      : 6
model           : 8
model name      : VIA C3 Ezra
stepping        : 9
cpu MHz         : 1002.294
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu de tsc msr cx8 mtrr pge mmx 3dnow
bogomips        : 1998.84

# uname -a
Linux gollum.techgooroo.net 2.4.22-1.2188.nptl #1 Wed Apr 21 20:10:59
EDT 2004 i686 i686 i386 GNU/Linux

Comment 21 Steve Springett 2004-05-19 20:52:58 UTC

Me Too
Trying to perform a clean install of FC2.
933MHz VIA C3 Ezra

Comment 22 Paul Morgan 2004-05-19 21:59:46 UTC

similar experience on new epia 800 system, 512 MB memory

Comment 23 Edward Almos 2004-05-20 16:12:18 UTC

This issue could be a duplicate of issue 121819. Others who are using
ASUS P4 800 motherboards also are having reboot problems on install.

This appears to be a major problem that has gone as far as Fedora Core
Two Final - Why ?

Ed Almos
Budapest, Hungary

Comment 24 Michael Koch 2004-05-20 16:13:42 UTC

similar experience on my epia M 1000 system - tried various memory 
configurations of 256 and 512 MB always with same result

Comment 25 Brian Booth 2004-05-20 17:00:17 UTC

I downloaded the kernel source rpm to play around with and this bug
seems to be happening because CONFIG_M686 is set in the
kernel-2.6.5-i586-config file. This is causing the kernel in the i586
rpm to be built with some i686 specific code.

I was able to get a bootable kernel on my VIA C3 Ezra (933MHz) by
commenting out the CONFIG_M686 line in that config file and rebuilding
the rpm.

Comment 26 Brian Booth 2004-05-20 17:03:22 UTC

hmm, now it's stalling on startup. Forget my last post.

Comment 27 Andy Green 2004-05-20 20:29:07 UTC

Great news that someone at RH is able to see a failing board. 
 
FWIW this evening I sat down with the -385 kernel and screwed with  
all the BIOS settings I could find, removed all the USB  
peripherals / USB keyboard support, reset the BIOS settings to  
'safe', etc, etc, no change.  Then I appended all kinds of noacpi,  
pci=bios, pci=off, nousb, nomce etc, etc, no difference.  Always  
reboots reliably after Uncompressing Linux.  
  
Maybe worth noting -- this reboot is 100% reliable, it is not the  
case that it can boot okay after 20 tries or something.  Because it  
loops from the reboot, it keeps trying, I have left it for 15  
minutes or more and there was no successful boot.  So if you ever  
got a kernel to boot even once then that is new behaviour.  
  
I'm downloading the kernel source for 358 and will try to compile it  
tonight and look at moving hang loops around its init tomorrow, if  
there is no joy in the meantime.

Comment 28 Barry K. Nathan 2004-05-20 20:38:17 UTC

Re: comments 25 and 26 (Brian Booth)

At what point in startup is it stalling for you? And this is a stab in
the dark, but does adding "vdso=0" to the kernel boot command line
help matters?

Comment 29 Andy Green 2004-05-20 20:53:27 UTC

(vdso=0 makes no difference here on -358)

Comment 30 Barry K. Nathan 2004-05-20 21:08:22 UTC

I would expect vdso=0 to be much more likely to help if the kernel is
stalling on startup, as opposed to instantly rebooting. (Whether it
has any chance of helping depends on where in the startup process it's
stalling, though.)

Comment 31 Dave Jones 2004-05-21 00:13:10 UTC

*** Bug 123843 has been marked as a duplicate of this bug. ***

Comment 32 Lucas Maneos 2004-05-21 05:22:30 UTC

Identical problem here, Via ME6000 board, /proc/cpuinfo contents:

processor       : 0
vendor_id       : CentaurHauls
cpu family      : 6
model           : 7
model name      : VIA Samuel 2
stepping        : 3
cpu MHz         : 599.721
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu de tsc msr cx8 mtrr pge mmx 3dnow
bogomips        : 1196.03

Comment 33 Lucas Maneos 2004-05-21 05:36:05 UTC

Compiled a new kernel from the SRPM on another machine - diff from the
default i586 config:

--- configs/kernel-2.6.5-i586.config    2004-05-08 13:56:48.000000000 +0100
+++ .config     2004-05-21 02:07:17.000000000 +0100
@@ -61,7 +61,7 @@
 # CONFIG_X86_ES7000 is not set
 # CONFIG_M386 is not set
 # CONFIG_M486 is not set
-CONFIG_M586=y
+# CONFIG_M586 is not set
 # CONFIG_M586TSC is not set
 # CONFIG_M586MMX is not set
 # CONFIG_M686 is not set
@@ -76,21 +76,21 @@
 # CONFIG_MWINCHIPC6 is not set
 # CONFIG_MWINCHIP2 is not set
 # CONFIG_MWINCHIP3D is not set
-# CONFIG_MCYRIXIII is not set
+CONFIG_MCYRIXIII=y
 # CONFIG_MVIAC3_2 is not set
 CONFIG_X86_GENERIC=y
 CONFIG_X86_CMPXCHG=y
 CONFIG_X86_XADD=y
 CONFIG_X86_L1_CACHE_SHIFT=7
 CONFIG_RWSEM_XCHGADD_ALGORITHM=y
-CONFIG_X86_PPRO_FENCE=y
-CONFIG_X86_F00F_BUG=y
 CONFIG_X86_WP_WORKS_OK=y
 CONFIG_X86_INVLPG=y
 CONFIG_X86_BSWAP=y
 CONFIG_X86_POPAD_OK=y
 CONFIG_X86_ALIGNMENT_16=y
 CONFIG_X86_INTEL_USERCOPY=y
+CONFIG_X86_USE_PPRO_CHECKSUM=y
+CONFIG_X86_USE_3DNOW=y
 # CONFIG_X86_4G is not set
 # CONFIG_X86_SWITCH_PAGETABLES is not set
 # CONFIG_X86_4G_VM_LAYOUT is not set
@@ -101,6 +101,7 @@
 # CONFIG_SMP is not set
 # CONFIG_PREEMPT is not set
 # CONFIG_X86_UP_APIC is not set
+CONFIG_X86_TSC=y
 CONFIG_X86_MCE=y
 # CONFIG_X86_MCE_NONFATAL is not set
 CONFIG_TOSHIBA=m
@@ -2317,7 +2318,7 @@
 # CONFIG_DEBUG_SPINLOCK is not set
 # CONFIG_DEBUG_PAGEALLOC is not set
 CONFIG_DEBUG_HIGHMEM=y
-CONFIG_DEBUG_INFO=y
+# CONFIG_DEBUG_INFO is not set
 CONFIG_DEBUG_SPINLOCK_SLEEP=y
 # CONFIG_FRAME_POINTER is not set

This kernel boots fine over PXE, and the installation dialogs start,
now I need to figure out how to build modules.cgz in the initrd so
that the modules will actually load.

Comment 34 Andy Green 2004-05-21 08:56:08 UTC

I compiled the -358 kernel source without changes first, and 
confirmed that when I boot with the 358-custom kernel, I get the 
same reboot behaviour, which is a very good start.  I am compiling 
with the current development/FC2 gcc-3.3.3-7 version. 
 
Now I added a for(;;) ; at the top of init/main.c start_kernel(), 
and it hung instead of rebooting.  So I am going to move it around a 
bit and report what happens. 
 
My main worry is that merely changing the layout of the binary by 
inserting the loop is what is changing the behaviour, ie, I may find 
that there is no place to put the loop that gets me the reboot 
behaviour back again.
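
The loop trick amounts to a search over the init sequence. Below is a minimal sketch in user-space C, assuming the outcome is monotonic: a loop placed before the culprit hangs, a loop placed after it reboots. That is exactly the assumption the layout worry above could break. The names `first_rebooting_step`, `observe` and `simulated_boot` are made up for illustration; on real hardware each `observe` call is a rebuild and a reboot.

```c
#include <assert.h>
#include <stddef.h>

/* What the machine did with the hang loop placed after step k. */
enum outcome { HANGS, REBOOTS };

/*
 * observe(k, ctx) reports the result of booting with `for (;;);` placed
 * immediately after step k (0-based).  Precondition: with the loop after
 * the last step the machine reboots, i.e. the fault is inside the list.
 * Returns the index of the first step whose execution triggers the reboot.
 */
static size_t first_rebooting_step(size_t nsteps,
                                   enum outcome (*observe)(size_t k, void *ctx),
                                   void *ctx)
{
    assert(nsteps > 0 && observe != NULL);

    size_t lo = 0, hi = nsteps - 1;   /* invariant: culprit is in [lo, hi] */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (observe(mid, ctx) == REBOOTS)
            hi = mid;        /* fault happened at or before step mid */
        else
            lo = mid + 1;    /* the loop was reached: steps 0..mid are fine */
    }
    return lo;
}

/* Simulated machine for trying the search off-target; ctx points at the culprit index. */
static enum outcome simulated_boot(size_t k, void *ctx)
{
    size_t culprit = *(const size_t *)ctx;
    return k >= culprit ? REBOOTS : HANGS;
}
```

For a list of n calls this takes roughly log2(n) boots rather than one per call, which matters when every boot involves a kernel rebuild.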

Comment 35 Andy Green 2004-05-21 09:24:23 UTC

Cool!  I stuck the loop at the end of this section, and got a 
reboot.  One of these does the badness. 
 
 
        lock_kernel(); 
        page_address_init(); 
        printk(linux_banner); 
        setup_arch(&command_line); 
        setup_per_cpu_areas(); 
 
        /* 
         * Mark the boot cpu "online" so that it can call console drivers in
         * printk() and can access its per-cpu storage.
         */ 
        smp_prepare_boot_cpu(); 
 
        build_all_zonelists(); 
        page_alloc_init(); 
        printk("Kernel command line: %s\n", saved_command_line); 
 
Going to stick the loop after setup_per_cpu_areas(); next

Comment 36 Andy Green 2004-05-21 10:02:11 UTC

After a slight pause while a 250G filesystem was checked, I can 
report we are down to: 
 
        setup_arch(&command_line);  
        setup_per_cpu_areas();  
 
Place your bets, ladies and gentlemen!

Comment 37 Andy Green 2004-05-21 10:33:18 UTC

The winner is... 
 
./init/main.c: 
        setup_arch(&command_line);  
 
I moved the loop into ./arch/i386/kernel/setup.c setup_arch(), 
(removing it from main.c to limit the footprint changes).  The 
reboot action is bracketed in here somewhere: 
 
./arch/i386/kernel/setup.c: 
 	ROOT_DEV = old_decode_dev(ORIG_ROOT_DEV); 
 	drive_info = DRIVE_INFO; 
 	screen_info = SCREEN_INFO; 
	edid_info = EDID_INFO; 
	apm_info.bios = APM_BIOS_INFO; 
	ist_info = IST_INFO; 
	saved_videomode = VIDEO_MODE; 
	if( SYS_DESC_TABLE.length != 0 ) { 
		MCA_bus = SYS_DESC_TABLE.table[3] &0x2; 
		machine_id = SYS_DESC_TABLE.table[0]; 
		machine_submodel_id = SYS_DESC_TABLE.table[1]; 
		BIOS_revision = SYS_DESC_TABLE.table[2]; 
	} 
	aux_device_present = AUX_DEVICE_INFO; 
 
#ifdef CONFIG_BLK_DEV_RAM 
	rd_image_start = RAMDISK_FLAGS & RAMDISK_IMAGE_START_MASK; 
	rd_prompt = ((RAMDISK_FLAGS & RAMDISK_PROMPT_FLAG) != 0); 
	rd_doload = ((RAMDISK_FLAGS & RAMDISK_LOAD_FLAG) != 0); 
#endif 
	ARCH_SETUP 
	if (efi_enabled) 
		efi_init(); 
	else 
		setup_memory_region(); 
 
	copy_edd(); 
 
	if (!MOUNT_ROOT_RDONLY) 
		root_mountflags &= ~MS_RDONLY; 
	init_mm.start_code = (unsigned long) _text; 
	init_mm.end_code = (unsigned long) _etext; 
	init_mm.end_data = (unsigned long) _edata; 
	init_mm.brk = init_pg_tables_end + PAGE_OFFSET; 
 
	code_resource.start = virt_to_phys(_text); 
	code_resource.end = virt_to_phys(_etext)-1; 
	data_resource.start = virt_to_phys(_etext); 
	data_resource.end = virt_to_phys(_edata)-1; 
 
	parse_cmdline_early(cmdline_p); 
 
	max_low_pfn = setup_memory(); 
 
	/* 
	 * NOTE: before this point _nobody_ is allowed to allocate 
	 * any memory using the bootmem allocator. 
	 */ 
 
#ifdef CONFIG_SMP 
	smp_alloc_memory(); /* AP processor realmode stacks in low memory*/
#endif 
	paging_init(); 
 
#ifdef CONFIG_EARLY_PRINTK 
	{ 
		char *s = strstr(*cmdline_p, "earlyprintk="); 
		if (s) { 
			extern void setup_early_printk(char *); 
 
			setup_early_printk(s); 
			printk("early console enabled\n"); 
		} 
	} 
#endif 
 
Stop me if you have a guess!

Comment 38 Arjan van de Ven 2004-05-21 10:35:54 UTC

setup_memory()
looks the most likely candidate
(next to paging_init() )

Comment 39 Andy Green 2004-05-21 10:57:34 UTC

No, it seems to return from setup_memory(), at least, it hangs when 
the loop is placed after that.  The bad region is currently: 
 
./init/main.c:  
        setup_arch(&command_line);   
 
./arch/i386/kernel/setup.c:  
 
#ifdef CONFIG_SMP  
        smp_alloc_memory(); /* AP processor realmode stacks in low memory*/
#endif  
        paging_init();  
  
#ifdef CONFIG_EARLY_PRINTK  
        {  
                char *s = strstr(*cmdline_p, "earlyprintk=");  
                if (s) {  
                        extern void setup_early_printk(char *);  
  
                        setup_early_printk(s);  
                        printk("early console enabled\n");  
                }  
        }  
#endif  
 
 
Next try is just before the printk stuff

Comment 40 Andy Green 2004-05-21 11:07:17 UTC

Since I assume CONFIG_SMP is undefined, our next winner is: 
 
./init/main.c:   
        setup_arch(&command_line);    
  
./arch/i386/kernel/setup.c:   
        paging_init();  
 
./arch/i386/mm/init.c: 
        pagetable_init(); 
 
        load_cr3(swapper_pg_dir); 
 
#ifdef CONFIG_X86_PAE 
        /* 
         * We will bail out later - printk doesn't work right now so 
         * the user would just see a hanging kernel. 
         */ 
        if (cpu_has_pae) 
                set_in_cr4(X86_CR4_PAE); 
#endif 
        __flush_tlb_all(); 
        /* 
         * Subtle. SMP is doing it's boot stuff late (because it has to
         * fork idle threads) - but it also needs low mappings for the
         * protected-mode entry to work. We zap these entries only after
         * the WP-bit has been tested.
         */ 
#ifndef CONFIG_SMP 
        zap_low_mappings(); 
#endif 
        kmap_init(); 
        zone_sizes_init();

Comment 41 Andy Green 2004-05-21 11:35:27 UTC

New winner! 
 
./init/main.c:    
        setup_arch(&command_line);     
   
./arch/i386/kernel/setup.c:    
        paging_init();   
  
./arch/i386/mm/init.c:  
        load_cr3(swapper_pg_dir);

Comment 42 Andy Green 2004-05-21 11:42:36 UTC

include/asm-i386/processor.h: 
#define load_cr3(pgdir) \ 
        asm volatile("movl %0,%%cr3": :"r" (__pa(pgdir))) 
 
Can't really go any further with the loop trick.  I can imagine: 
 
 - swapper_pg_dir is corrupt or wrongly computed 
 - CPU or cache reacts badly to or needs special environment when 
loading cr3 on via 
 - peepholer came and did evil 
 - where swapper_pg_dir points to is somehow diseased or electrified 
 
Please advise if I can make any further useful moves.

Comment 43 Arjan van de Ven 2004-05-21 11:49:42 UTC

Ok based on this, can you try adding
mem=nopentium
to the kernel commandline and see if that makes a difference ?

Comment 44 Andy Green 2004-05-21 11:56:59 UTC

Sorry, mem=nopentium does not make a difference either on the 
RH-compiled -358 or my modified one, instant reboot in both cases.

Comment 45 Ingo Molnar 2004-05-21 12:06:18 UTC

Could you uncomment these lines from arch/i386/mm/init.c:

        /* Enable PGE if available */
        if (cpu_has_pge) {
                set_in_cr4(X86_CR4_PGE);
                __PAGE_KERNEL |= _PAGE_GLOBAL;
        }

does this make any difference to the problem?

Comment 46 Andy Green 2004-05-21 12:09:49 UTC

The only place in that file mentioning PGE is this: 
 
                                /* Make it "global" too if supported */
                                if (cpu_has_pge) {
                                        set_in_cr4(X86_CR4_PGE);
#if !defined(CONFIG_X86_SWITCH_PAGETABLES)
                                        __pe += _PAGE_GLOBAL;
                                        __PAGE_KERNEL |= _PAGE_GLOBAL;
#endif 
                                } 
 
I do not find any commented section as you describe in the sources 
for 2.6.5-1.358

Comment 47 Ingo Molnar 2004-05-21 12:12:23 UTC

yeah - the cpu_has_pge branch - could you uncomment it?

Comment 48 Ingo Molnar 2004-05-21 12:13:28 UTC

or change it to:

     if (0) {

Comment 49 Andy Green 2004-05-21 12:18:14 UTC

Sorry Ingo, do you mean "comment" instead of "uncomment" then?  I 
will do this now.

Comment 50 Ingo Molnar 2004-05-21 12:19:12 UTC

another thing to try: replace the pagetable loading (load_cr3() line)
with __flush_tlb_global().

the cr3 doesnt have to be loaded - we already loaded swapper_pg_dir in
arch/i386/kernel/head.S. So it must be the flush somehow causing
trouble - we most likely somehow created pagetable contents that cause
the next instruction to triple-fault right away. This has to be some
really fubar situation though - all of the kernel's mapping have to go
away, including the GDT, TSS and IDT.

But it's all very weird.

Comment 51 Ingo Molnar 2004-05-21 12:20:21 UTC

yeah - comment it. Just make sure that code doesnt get run. (it's the
code that sets the PGE bit in the kernel mappings. This is on the
theory that perhaps the CPU has some weirdness with PGE handling.)

Comment 52 Ingo Molnar 2004-05-21 12:23:05 UTC

(i quoted the wrong code because FC2 has the 4:4 patch applied.)

Comment 53 Andy Green 2004-05-21 12:30:09 UTC

Commenting out that if { ... } block made no difference, it still 
reboots.  I have the for(;;) ; still waiting after the load_cr3() so 
it is not like it is getting any further. 
 
Now I will try the __flush_tlb_global(); replacement.  I removed the
commenting around the if { ... } block we tried.

Comment 54 Andy Green 2004-05-21 12:38:48 UTC

Still reboots.  Maybe that is interesting... could it be merely to 
do with the position of the code in memory?  Something crapping on 
the code or breaking the decompression?

Comment 55 Ingo Molnar 2004-05-21 12:45:57 UTC

If you add a __flush_tlb_global() _before_ the call to
pagetable_init() [in paging_init()], does that cause a reboot too?

I.e. are the pagetables already corrupt when we enter paging_init(),
or do they get corrupted during pagetable_init().

pagetable init goes on like this: we've got some pre-constructed
pagetables that are present in the kernel image when we boot - these
cover the first 8 MB of RAM. pagetable_init() extends the pagetable
setup to cover the whole RAM. It still redoes the whole pagetable
though, so if it's somehow messed up (or the CPU is confused), it
could corrupt the pagetable for this code.

the pagetable had to be correct at some earlier point, or we'd not be
executing this code ...

another (random) suggestion: do you see the same symptoms if you
remove one RAM module from the system? [my theory is that smaller RAM
will cause smaller initialization in pagetable_init(), and could
avert/impact this corruption problem.]
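
To put numbers on the coverage described above: on 32-bit x86 without PAE, one 4 KB page table holds 1024 entries and so maps 4 MB, which is why two boot-time tables cover 8 MB. The following is a small sketch of the arithmetic with hypothetical helper names, assuming 4 KB pages throughout (no 4 MB PSE pages).

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_4K   4096u
#define PTES_PER_TABLE 1024u   /* 32-bit non-PAE: 1024 four-byte entries per 4 KB table */

/* Bytes of linear address space covered by one page table. */
static uint32_t bytes_per_page_table(void)
{
    return PAGE_SIZE_4K * PTES_PER_TABLE;
}

/* Page tables needed to map ram_bytes with 4 KB pages, rounding up. */
static uint32_t page_tables_needed(uint32_t ram_bytes)
{
    uint32_t span = bytes_per_page_table();
    assert(span == (4u << 20));   /* sanity check: one table spans 4 MB */
    return ram_bytes / span + (ram_bytes % span != 0);
}
```

For a 256 MB machine, `page_tables_needed(256u << 20)` gives 64 tables, against the 2 that are present in the boot image.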

Comment 56 Andy Green 2004-05-21 12:59:50 UTC

Here is the situation in paging_init() at the moment, then: 
 
 
void __init paging_init(void) 
{ 
        __flush_tlb_global(); 
for(;;) ; 
 
        pagetable_init(); 
//loop hangs 
//      load_cr3(swapper_pg_dir); // !!!! rebooter 
 __flush_tlb_global(); 
 
for(;;) ; 
 
 
*** Result: a hang, NOT a reboot. 
 
 
I do have a dim idea of the pagetable stuff from the work I did on 
the Xbox Clean BIOS, I got paging (more importantly, segfaulting) 
working on that.  I also designed a hardware device that sat on the 
Xbox's LPC bus and memory mapped some SRAM and allowed debugging IO 
back to a PC, something I'm starting to wish we had on this guy. 
 
I only have one 256MB stick of RAM on this board. 
 
How do you normally get debug info out at this early stage?  This 
motherboard has a serial port, maybe it is possible to consider 
adding a loop to dump stuff, like the pagetable contents, by 
directly tickling the serial IO ports?

Comment 57 Ingo Molnar 2004-05-21 13:14:23 UTC

cool - it would be great if you could try to dump the (relevant)
pagetable contents prior to pagetable_init() and after pagetable_init().

what 'relevant' is hard to tell, but to make it easier to compare,
could you run with mem=nopentium from now on? This will force 4K
paging and the 'relevant' pagetable contents should be identical prior
and after pagetable_init() - making comparison easier.

i'd wager that 'relevant' right now means the following 3 pages:
swapper_pg_dir [you guessed that], pg0 and pg1. You can access pg0 and
pg1 by declaring them like this:

 extern char pg0[4096];
 extern char pg1[4096];

pg0 and pg1 are the first two 'pte' pagetables, covering the linear
addresses of 0-4MB and 4MB-8MB. This linear range is also aliased to
3GB (via entries 768 and 769 in swapper_pg_dir) - this is where the
kernel executes in fact.

so swapper_pg_dir should have entry 0 and entry 1 set to pg0 and pg1,
and entry 768 and 769 set to pg0 and pg1 too. Neither of these 4
entries ought to change during pagetable_init(), nor should the
contents of pg0 and pg1 change. [this all is only true if
mem=nopentium is used.]

you can do printks this early over the serial console - activate
CONFIG_EARLY_PRINTK in your .config and add the following boot command
line option:

   earlyprintk=serial,ttyS0,38400

after this point all kernel messages should go to the serial console.
I use this feature quite often, i typically use 'minicom' on another
Linux box and connect the two via a null modem cable.
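
The entry numbers follow from the address split: without PAE, the top 10 bits of a linear address select the page-directory entry, so 3GB (0xC0000000) lands on entry 768. Here is a hedged sketch of the check described above, in plain C that can be tried in user space. `pgd_index_of` and `boot_mappings_intact` are illustrative names, not kernel functions, and the directory is passed in as an ordinary array of 32-bit entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PGDIR_SHIFT   22           /* each directory entry maps 4 MB without PAE */
#define FRAME_MASK_4K 0xFFFFF000u  /* upper 20 bits of an entry: physical frame address */

/* Index of the page-directory entry that maps a 32-bit linear address. */
static unsigned pgd_index_of(uint32_t linear)
{
    return linear >> PGDIR_SHIFT;
}

/*
 * Hypothetical checker: returns 1 if entries 0/1 and 768/769 of dir point
 * at the physical addresses of pg0 and pg1, 0 otherwise.  Flag bits in the
 * low 12 bits of each entry are ignored.
 */
static int boot_mappings_intact(const uint32_t *dir,
                                uint32_t pg0_phys, uint32_t pg1_phys)
{
    assert(dir != NULL);

    unsigned hi = pgd_index_of(0xC0000000u);   /* 768 */
    return (dir[0]      & FRAME_MASK_4K) == pg0_phys
        && (dir[1]      & FRAME_MASK_4K) == pg1_phys
        && (dir[hi]     & FRAME_MASK_4K) == pg0_phys
        && (dir[hi + 1] & FRAME_MASK_4K) == pg1_phys;
}
```

Calling the checker once before pagetable_init() and once after it would show whether any of the four entries moved, under the mem=nopentium assumption stated above.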

Comment 58 Andy Green 2004-05-21 13:33:14 UTC

I will try to set this up, but it will take a while.  Last time I 
needed a serial cable was about ten years ago :-)

Comment 59 Brian Booth 2004-05-21 14:43:40 UTC

Re: Comment 28

> At what point in startup is it stalling for you? 

It stalls when trying to boot INIT. The last output I see is:

Mounting root filesystem
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Freeing unused kernel memory: 144k freed

> And this is a stab in the dark, but does adding "vdso=0" to the 
> kernel boot command line help matters?

Not in my case either.

I feel I should mention that I am not a kernel developer so my
knowledge is limited in this area. If there's any further help I can
give, let me know.

Comment 60 Arjan van de Ven 2004-05-21 14:45:43 UTC

Brian: Are you fully running a FC2 userspace ?
Also does apm=off on the commandline help ?

The "hangs at Freeing unused kernel memory" thing is a different bug
as far as I can see, so I would like to propose to open a separate
bugzilla for that, to not mix the 2 bugs up and confuse matters too much.

Comment 61 Brian Booth 2004-05-21 15:09:55 UTC

> Are you fully running a FC2 userspace ?
> Also does apm=off on the commandline help ?

No to both. 

> The "hangs at Freeing unused kernel memory" thing is a different bug
> as far as I can see

Well, it came up after removing CONFIG_M686 from the i586 kernel
config file in hopes of fixing this bug, so since it was a result of
fiddling around with this bug, I'll avoid opening another bug report
for it. 

If I can be of any assistance, let me know.

Comment 62 Andy Green 2004-05-21 16:28:04 UTC

I have a working Null DIY frankencable tested with Minicom on both 
sides. 
 
I added a "Hello World" printk() before the spin for(). 
 
I added earlyprintk=serial,ttyS0,38400 on the grub commandline for 
my custom kernel. 
 
I verified that CONFIG_EARLY_PRINTK=y in .config (it was on by 
default) 
 
I don't get any messages on my terminal when it boots and hangs :-(

Comment 63 Andy Green 2004-05-21 16:34:31 UTC

Aha -- under the same circumstances with -327 kernel, which I am 
booting into to compile and so on, I DO get the early kernel 
messages on the serial terminal.  I'll move my hello world up a bit 
earlier then.

Comment 64 Andy Green 2004-05-21 16:51:13 UTC

Nope, I don't get any output from my custom -358 kernel, even with 
the printk() just before the call to paging_init() 
in ./arch/i386/kernel/setup.c. 
 
printk("HELLO WORLD"); 
 
is what I have... should I flush the printk buffer somehow before I 
enter the spinloop?  Or will printk() just not work until this stuff 
is right?

Comment 65 Andy Green 2004-05-21 22:55:57 UTC

I had a google around, I could not find a kernel function to flush 
the printk buffer. 
 
Can the data be easily issued at a lower, unbuffered level than 
printk?  If the earlyprintk commandline thing has been parsed 
already, the UART will be set up.  Maybe some code can sit there 
polling the UART status regs and poking some new data in when the 
old stuff is gone.
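
For illustration only, the polling idea could look roughly like the sketch below. It assumes a 16550-style UART (line status register at offset 5, "transmit holding register empty" bit 0x20) that something else has already configured. The struct uart_io hooks and the fake UART are hypothetical, added so the loop can be exercised outside the kernel; real early-boot code would use inb()/outb() on the port base instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16550-style register offsets and bits. */
#define UART_THR      0     /* transmit holding register (write) */
#define UART_LSR      5     /* line status register (read) */
#define UART_LSR_THRE 0x20  /* transmit holding register empty */

/* Hypothetical I/O hooks; in the kernel these would be inb()/outb(). */
struct uart_io {
    uint8_t (*read_reg)(void *ctx, int reg);
    void (*write_reg)(void *ctx, int reg, uint8_t val);
    void *ctx;
};

/* Busy-wait until the transmitter is free, then send one byte. */
void uart_putc_polled(const struct uart_io *io, char c)
{
    assert(io != NULL);
    while (!(io->read_reg(io->ctx, UART_LSR) & UART_LSR_THRE))
        ;  /* spin: no interrupts and no buffering involved */
    io->write_reg(io->ctx, UART_THR, (uint8_t)c);
}

/* Send a string, expanding '\n' to "\r\n" for a serial terminal. */
void uart_puts_polled(const struct uart_io *io, const char *s)
{
    assert(s != NULL);
    for (; *s; s++) {
        if (*s == '\n')
            uart_putc_polled(io, '\r');
        uart_putc_polled(io, *s);
    }
}

/* Fake UART for demonstration: always ready, records what was sent. */
struct fake_uart {
    char buf[64];
    size_t len;
};

uint8_t fake_read(void *ctx, int reg)
{
    (void)ctx;
    return reg == UART_LSR ? UART_LSR_THRE : 0;
}

void fake_write(void *ctx, int reg, uint8_t val)
{
    struct fake_uart *u = ctx;
    if (reg == UART_THR && u->len < sizeof(u->buf) - 1)
        u->buf[u->len++] = (char)val;
}
```

Nothing here depends on the printk buffer, which is the point of the approach: each byte leaves as soon as the transmitter reports it is empty.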

Comment 66 Ingo Molnar 2004-05-22 07:06:53 UTC

hm, i think the problem is that the UART has not been initialized yet.

in arch/i386/kernel/setup.c, there's this code:

        paging_init();

#ifdef CONFIG_EARLY_PRINTK
        {
                char *s = strstr(*cmdline_p, "earlyprintk=");
                if (s) {
                        extern void setup_early_printk(char *);

                        setup_early_printk(s);
                        printk("early console enabled\n");
                }
        }
#endif

could you try to move the paging_init() code to after this code?

I am not 100% certain that setup_early_printk() will work fine without
having the full pagetables, but it ought to.

If it doesnt work [i.e. setup_early_printk() crashes and you dont get
the 'early console enabled' message over the serial line], then
there's yet another way: you can trick the UART into being set up via
GRUB. Just enable the serial console in GRUB via something like this
in /etc/grub.conf:

    serial --unit=0 --speed=19200
    terminal --timeout=0 serial

(NOTE: the maximum speed of GRUB's UART driver is somewhere around
38400 - while the kernel can drive it at 115200 - so use the lower
speed for both.)

and after this point you can try the attached lowlevel code that
implements a simple printk based on UART output. (it hardcodes ttyS1
iirc.)

but lets hope the simpler method of reordering the initialization will
help too.

Comment 68 Ingo Molnar 2004-05-22 07:10:58 UTC

Created attachment 100447 [details]
very early printk code

very early printk code - it relies on the UART being set up via GRUB or LILO.
It's hardcoded to ttyS1.

Comment 69 Andy Green 2004-05-22 07:44:17 UTC

>hm, i think the problem is that the UART has not 
>been initialized yet.

Doh!!!  There it was right in front of me.

Okay, I moved the paging_init() call to after the printk init and now
I have some normal-looking output!  Good news!!

Linux version 2.6.5-1.358custom (root.ath.cx) (gcc version
3.3.3 20040412 (Red Hat Linux 3.3.3-7)) #22 Sat May 22 08:28:21 BST 2004
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000000eff0000 (usable)
 BIOS-e820: 000000000eff0000 - 000000000eff3000 (ACPI NVS)
 BIOS-e820: 000000000eff3000 - 000000000f000000 (ACPI data)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
239MB LOWMEM available.
early console enabled
Pre-paging_init() call HELLO WORLD

Now I will stick in some loops to issue the paging table data as %02X

Comment 70 Andy Green 2004-05-22 08:58:47 UTC

Unfortunately the sources aren't completely matching up with the
earlier (#57) advice about what to dump.

It seems there is a single array holding the paging data, pg0[], which
is defined at include/asm/pgtable.h:191 as being an unsigned long [].
 It has a comment by it suggesting that perhaps pg1 was merged into it
at some point.  At any rate at link time pg1[] is undefined so I can't
dump it.

What I am dumping at the moment then is 1024 %08lX from
swapper_pg_dir[] and 2048 %08lX from pg0[], both before the call to
paging_init() and just before the for(;;) ; loop

Here is the dump code so you can be certain of what you are getting:

	{
		extern unsigned long pg0[];
		unsigned long *pb=(unsigned long *)swapper_pg_dir;
		int n;
		printk("***DUMPING PAGING TABLES\n");
		printk("swapper_pg_dir = 0x%08X\n", (unsigned int)pb);
		for (n = 0; n < 1024; n++) {
			if (!(n & 3))
				printk("%04X: ", n);
			printk("%08lX ", (unsigned long)pb[n]);
			if ((n & 3) == 3)
				printk("\n");
		}
		printk("\n");
		printk("pg0 = 0x%08X\n", (unsigned int)&pg0[0]);
		for (n = 0; n < 2048; n++) {
			if (!(n & 3))
				printk("%04X: ", n);
			printk("%08lX ", (unsigned long)pg0[n]);
			if ((n & 3) == 3)
				printk("\n");
		}
	}


There ARE some small differences in the pg0[] array before and after.

< 0354: 00354067 00355067 00356067 00357007
< 0358: 00358007 00359007 0035A007 0035B007
< 035C: 0035C007 0035D007 0035E007 0035F007
---
> 0354: 00354067 00355067 00356067 00357067
> 0358: 00358067 00359067 0035A067 0035B067
> 035C: 0035C067 0035D067 0035E067 0035F067

I will attach the full log now.
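
For reading the diff above: in the standard 32-bit (non-PAE) x86 page table entry layout, 0x20 is the "accessed" flag and 0x40 is the "dirty" flag, both of which the CPU sets on its own when a page is touched. So 007 -> 067 with an unchanged frame address looks like pages having been used in between rather than like corruption. A small sketch for decoding entries follows; the helper names are made up for this example and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Low flag bits of a 32-bit (non-PAE) x86 page table entry. */
#define PTE_PRESENT  0x001u
#define PTE_RW       0x002u
#define PTE_USER     0x004u
#define PTE_ACCESSED 0x020u
#define PTE_DIRTY    0x040u
#define PTE_FRAME    0xFFFFF000u  /* physical frame address bits */

/* Bits that differ between two dumps of the same entry. */
uint32_t pte_changed_bits(uint32_t before, uint32_t after)
{
    return before ^ after;
}

/* Write e.g. "frame=00357000 P RW US A D" into buf. */
void pte_describe(uint32_t pte, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "frame=%08lX%s%s%s%s%s",
             (unsigned long)(pte & PTE_FRAME),
             (pte & PTE_PRESENT)  ? " P"  : "",
             (pte & PTE_RW)       ? " RW" : "",
             (pte & PTE_USER)     ? " US" : "",
             (pte & PTE_ACCESSED) ? " A"  : "",
             (pte & PTE_DIRTY)    ? " D"  : "");
}
```

For the first changed entry, pte_changed_bits(0x00357007, 0x00357067) is 0x60, i.e. exactly accessed plus dirty.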

Comment 71 Andy Green 2004-05-22 08:59:59 UTC

Created attachment 100448 [details]
dump of paging tables

Comment 72 Andy Green 2004-05-22 09:44:50 UTC

Hold on, I had the second dump BEFORE the call to pagetable_init();

I am compiling this as the second dump now:

void __init paging_init(void)
{
        __flush_tlb_global();


        pagetable_init();

        printk("JUST BEFORE HANG LOOP\n");
        {
                extern unsigned long pg0[];
                unsigned long *pb=(unsigned long *)swapper_pg_dir;
                int n;
                printk("***DUMPING PAGING TABLES\n");
                printk("swapper_pg_dir = 0x%08X\n", (unsigned int)pb);
                for (n = 0; n < 1024; n++) {
                        if (!(n & 3))
                                printk("%04X: ", n);
                        printk("%08lX ", (unsigned long)pb[n]);
                        if ((n & 3) == 3)
                                printk("\n");
                }
                printk("\n");
                printk("pg0 = 0x%08X\n", (unsigned int)&pg0[0]);
                for (n = 0; n < 2048; n++) {
                        if (!(n & 3))
                                printk("%04X: ", n);
                        printk("%08lX ", (unsigned long)pg0[n]);
                        if ((n & 3) == 3)
                                printk("\n");
                }
        }


//      pagetable_init();

        for(;;) ;
//loop hangs
//      load_cr3(swapper_pg_dir); // !!!! rebooter

Comment 73 Andy Green 2004-05-22 09:59:58 UTC

That test resets before it can print the second dump.  The initial
__flush_tlb_global() was not in the original -358 source (it was added
as a test), commenting it out and trying again.

If it still resets that's a big clue that maybe pagetable_init() is
doing something to destroy the environment.

Comment 74 Andy Green 2004-05-22 10:09:55 UTC

Woohoo, forced a new behaviour out of it, we must be on the right track.

This time it prints only this from the second dump:

JUST BEFORE HANG LOOP
***DUMPING PAGING TABLES

and then HANGS ITSELF, we never saw that before.  For whatever reason
it can handle a simple printk() but not one with %08lX?  Or one that
touches swapper_pg_dir?

Going to move a copy of the dump block inside pagetable_init() and see
if we can probe out when the environment trashing action begins.

Comment 75 Ingo Molnar 2004-05-22 10:22:46 UTC

Yes - the hang you get in the dumper is a likely sign that the
pagetables are somehow corrupted/invalid. The dumping itself activates
more kernel code, so the TLBs get flushed 'naturally', then get
reloaded from the now-invalid pagetable entry - kaboom.

(you are doing all these runs with mem=nopentium, correct?)

One other method opposed to dumping would be to validate that the
entries pagetable_init() is creating match the previous content. This
code is at around line 217 in arch/i386/mm/init.c:

    *pte = mk_pte_phys(vaddr-start, PAGE_KERNEL);

could you add a sanity-check, something along the lines of:

    prev_val = pte->pte_low;
    *pte = mk_pte_phys(vaddr-start, PAGE_KERNEL);
    if ((vaddr <= 8*1024*1024) ||
          ((vaddr >= 3UL*1024*1024*1024) &&
                 (vaddr < 3UL*1024*1024*1024+8*1024*1024)))
         if (prev_val != pte->pte_low)
                printk("ouch! %08lx != %08lx for vaddr %08lx\n",
                          prev_val, pte->pte_low, vaddr);

i.e. in the 0...8MB and 3GB...3GB+8MB virtual address ranges, check
that the new and the old values of the pte match.

(there are other places where the pagetable can get corrupted, but
this would be the most likely one.)
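
One thing to watch when typing in a range test like this: 3*1024*1024*1024 does not fit in a 32-bit signed int, so the constants need an unsigned type. A standalone sketch of the predicate, using uint32_t to stand in for the 32-bit unsigned long of an i386 kernel (the name in_checked_range and the half-open ranges are choices made for this example):

```c
#include <assert.h>
#include <stdint.h>

#define MB          (1024u * 1024u)
#define LOW_END     (8u * MB)          /* 0x00800000 */
#define KERNEL_BASE (3u * 1024u * MB)  /* 0xC0000000, computed unsigned */

/* True if vaddr lies in [0, 8MB) or in [3GB, 3GB+8MB). */
int in_checked_range(uint32_t vaddr)
{
    assert(KERNEL_BASE == 0xC0000000u);
    return vaddr < LOW_END ||
           (vaddr >= KERNEL_BASE && vaddr < KERNEL_BASE + LOW_END);
}
```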

Comment 76 Ingo Molnar 2004-05-22 10:25:04 UTC

another method:

could you write a function that is a copy of setup_identity_mappings()
but does not actually modify the pagetables, only checks that the
already existing contents of the pagetable match the expected value.

then you can add calls to this function (lets call it
check_pagetables()) from every possible place in the pagetable code -
even from within setup_identity_mappings().
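
To make the read-only idea concrete, here is a rough userspace sketch that checks one table of 32-bit entries against the expected identity mapping without writing to it. It is not the attached implementation; the function name, the plain uint32_t table and the choice to ignore the hardware-set accessed/dirty bits are all assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
/* accessed (0x20) + dirty (0x40): set by the CPU, so ignored here */
#define PTE_STICKY (0x020u | 0x040u)

/*
 * Compare n entries against the mapping "entry k -> phys_base + k pages"
 * with the given flag bits. Returns the index of the first mismatch,
 * or -1 if every entry matches. The table is never modified.
 */
long check_identity_ptes(const uint32_t *pte, size_t n,
                         uint32_t phys_base, uint32_t flags)
{
    size_t k;

    assert(pte != NULL);
    for (k = 0; k < n; k++) {
        uint32_t expect = (phys_base + ((uint32_t)k << PAGE_SHIFT)) | flags;

        if ((pte[k] & ~PTE_STICKY) != (expect & ~PTE_STICKY))
            return (long)k;
    }
    return -1;
}
```

A kernel version would print the offending values and stop at the first mismatch instead of returning an index.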

Comment 77 Ingo Molnar 2004-05-22 10:32:45 UTC

Created attachment 100450 [details]
check_pagetables()

i've attached a quick implementation of check_pagetables().

you should be able to call this function from any place in the kernel,
it iterates over these two 8 MB ranges and simply returns if everything
is OK. If it finds an illegal value then it prints the values and does
a BUG() [which is an assert that halts the kernel].

i havent tested this code yet, but it compiles.

Comment 78 Andy Green 2004-05-22 10:37:04 UTC

Okay, bizarro corrupt text from printk() problems start partway
through pagetable_init().

static void __init pagetable_init (void)
{
	unsigned long vaddr, end;
	pgd_t *pgd_base;
#ifdef CONFIG_X86_PAE
	int i;
#endif

	
// pagetable dump here OKAY
	
	/*
	 * This can be zero as well - no problem, in that case we exit
	 * the loops anyway due to the PTRS_PER_* conditions.
	 */
	end = (unsigned long)__va(max_low_pfn*PAGE_SIZE);

	pgd_base = swapper_pg_dir;
#ifdef CONFIG_X86_PAE
	/*
	 * It causes too many problems if there's no proper pmd set up
	 * for all 4 entries of the PGD - so we allocate all of them.
	 * PAE systems will not miss this extra 4-8K anyway ...
	 */
	for (i = 0; i < PTRS_PER_PGD; i++) {
		pmd_t *pmd = (pmd_t *) alloc_bootmem_low_pages(PAGE_SIZE);
		set_pgd(pgd_base + i, __pgd(__pa(pmd) + 0x1));
	}
#endif

// pagetable dump here (test 1) OKAY

	/*
	 * Set up lowmem-sized identity mappings at PAGE_OFFSET:
	 */
	setup_identity_mappings(pgd_base, PAGE_OFFSET, end);

	/*
	 * Add flat-mode identity-mappings - SMP needs it when
	 * starting up on an AP from real-mode. (In the non-PAE
	 * case we already have these mappings through head.S.)
	 * All user-space mappings are explicitly cleared after
	 * SMP startup.
	 */
#if defined(CONFIG_SMP) && defined(CONFIG_X86_PAE)
	setup_identity_mappings(pgd_base, 0, 16*1024*1024);
#endif

// pagetable dump (test 2) here BROKEN



pagetable_init() test 2
***DUMPING PAGING TABLES
swapper_pg_dir = 0xC0347000
0000: 00391027 00000000 00000000 00000000
0004: 00000000 00000000 00000000 00000000
0008: 00000000 00000000 00000000 00000000
.....
00F8: 00000000 00000000 00000000
000^W^D^AÃ«,Ã¶^F^W^D^Ct^D.<8a>g^A<86>Ã<8b>^^^\^D<89><87>u^D<8b>^^<80>^D;^^^Dt^D<89>^^^\^DÃºÂ°
Ã¦ Ã»a^_Ã!2@3#4$5%6^7&8*9(0)-_=+^H^H  
RtTÃYuUiIoOpP[{]}^DÃ¿aAsSdDfFgGhHjJkKlL;:'"`~^BÃ¿\|zZxXcCvVbBnNmM,<.>/?^AÃ¿Ã¿
 @Ã¿Ã¿Ã¿Ã¿
....(more crap)...
Drive A error. System halt
DISK BOOT FAILURE, INSERT SYSTEM DISK AND PRESS ENTERBIOS ROM checksum
errorKeyboard controller errorKeyboard error or no keyboard present
Detecting floppy drive A media...Drive media is : 1.44Mb
1.2Mb
720Kb
360Kb
...(more crap)...
003EE007 003EF007
03F0: 003F0007 003F1007 003F2007 003F3007
03F4: 003F4007 003F5007 003F6007 003F7007
03F8: 003F8007 003F9007 003FA007 003FB007
...(rest of dump)

Subsequent test dumps are running, but with this weirdo corruption. 
It seems to run all the test dumps this time.

Anyway, the point is something in setup_identity_mappings(pgd_base,
PAGE_OFFSET, end); seems to corrupt the environment such that printk()
becomes unreliable.

Comment 79 Ingo Molnar 2004-05-22 10:42:31 UTC

hm, the early pagetable setup code changed a bit since i last touched
it. We now runtime-construct the early pagetables, in startup_32.

so the check_pagetables() code is not fully usable. We dont reuse
pg0, all pagetables are allocated anew via bootmem_alloc(). So the
kernel changes the pgd entries during pagetable init.

i cannot see any immediate bug in this new method, but it's
conceivable that this somehow causes the C3 problems.

Comment 80 Ingo Molnar 2004-05-22 10:49:22 UTC

this recent patch changed the early pagetable handling:

  [PATCH] i386 very early memory detection cleanup patch

i cannot convince myself that it's correct - it uses an area of RAM
for pagetable init that it knows nothing about (end of the kernel
image). Furthermore, i cannot see how it can guarantee that bootmem
doesnt stomp over this area as soon as we start constructing the new
pagetables. _Normally_ it could go well if we manage to hold on to our
TLBs, but if the CPU's TLBs are small enough then this could be a problem.

i'll extract and attach the patch - could you try to unapply it? [but
the 4G patch likely interferes so i dont have high hopes ...]

Comment 81 Ingo Molnar 2004-05-22 10:51:42 UTC

Created attachment 100451 [details]
please try to unapply it from your tree - it will likely not succeed though.

please make a copy of your tree first, to make sure the failed unapply
of this patch doesnt damage it.

Comment 82 Andy Green 2004-05-22 10:54:49 UTC

Unfortunately I have to go pick my wife up from the airport, which is
an NMI for me :-)  I will probably not be able to do more tests until
tomorrow :-(

I do not have any experience unapplying patches, I can do this if
given exact instructions.  But I suspect perhaps someone else reading
may well have the experience and the time to try it in the meanwhile :-) 

I must say your last note is very encouraging!

Comment 83 Ingo Molnar 2004-05-22 10:55:05 UTC

Created attachment 100452 [details]
test-patch

i've attached a small patch that is an easy way to check whether my
theory holds. It changes pg0 to be part of the kernel image and allocates
32K of space for it - enough for the root pagetable and the pagetable
entries. So if the memdetect patch doesnt get allocation right then this
patch will automatically protect the early pagetable contents.

Comment 84 Ingo Molnar 2004-05-22 10:56:58 UTC

unapplying is easy: add -R to the patch command you use to apply
patches. E.g. 'patch -p1 -R < 1'.

but lets not do the unapplying, i dont think it will succeed. Please
try my last patch instead.

Comment 85 Ingo Molnar 2004-05-22 11:14:19 UTC

the main argument against the bootmem-stomp idea is that without
nopentium it crashes too. The PSE case is really simple: there are no
additional pagetables, everything is set up within swapper_pg_dir.

Comment 86 Ingo Molnar 2004-05-22 11:16:16 UTC

ah ... Arjan says that the C3 might not have the PSE feature.

what does your /proc/cpuinfo look like?

if the CPU does not have a PSE then the pagetable arguments get
stronger - it's an atypical case on other systems. (almost all CPUs
these days have the PSE bit, so a bug in the non-PSE case does not get
noticed too quickly.)
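
A quick way to check this from a saved "flags" line, sketched as a whole-word match so that "pse" is not confused with "pse36" (has_cpu_flag is a name invented for this example; on a live box grepping /proc/cpuinfo does the same job):

```c
#include <assert.h>
#include <string.h>

/* Return 1 if name appears as a whitespace-delimited word in flags. */
int has_cpu_flag(const char *flags, const char *name)
{
    const char *p;
    size_t n;

    assert(flags != NULL && name != NULL);
    n = strlen(name);
    if (n == 0)
        return 0;

    p = flags;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[n] == '\0' || p[n] == ' ' ||
                   p[n] == '\t' || p[n] == '\n';

        if (starts && ends)
            return 1;
        p++;  /* partial match only, keep looking */
    }
    return 0;
}
```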

Comment 87 Ingo Molnar 2004-05-22 11:20:40 UTC

the memdetect patch introduced init_pg_tables_end, which seems to
guarantee that bootmem does not stomp on the early pagetables.

the patch is still suspect though.

does a vanilla kernel (eg 2.6.6) fail on your box too?

Comment 88 Andy Green 2004-05-22 15:58:53 UTC

Hi Ingo -  I am back but I am just driving by, I am cooking tea.  My 
cpuinfo can be found in comment #5.  It mentions PGE but not PSE. 
 
I did not try a vanilla kernel.  Maybe some of the other users can 
comment if they tried a vanilla kernel. 
 
Tomorrow morning if there is no resolution in the meanwhile I will 
follow your test patch directions. 
 
People probably don't say this often enough: I'm very glad you guys 
are around and well funded.

Comment 89 Beerman 2004-05-22 19:20:44 UTC

Same here :(
kernel-2.6.5-1.358.i586 and rebooting on decompressing kernel.

Vanilla 2.6.6 (CyrixIII/C3) works like a charm.

[root@pajonk rc.d]# cat /proc/cpuinfo
processor       : 0
vendor_id       : CentaurHauls
cpu family      : 6
model           : 7
model name      : VIA Samuel 2
stepping        : 3
cpu MHz         : 733.376
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu de tsc msr cx8 mtrr pge mmx 3dnow
bogomips        : 1464.72

Comment 90 Dave Jones 2004-05-22 19:51:23 UTC

*** Bug 123935 has been marked as a duplicate of this bug. ***

Comment 91 Richard Schaal 2004-05-22 20:31:34 UTC

Could you attach your kernel config or mail it to me?  My "Vanilla"
2.6.6 failed to boot after having been built with the VIA C3 selected.
Failure was pretty much what we've seen as the symptom of this bug.

I'll compare and report any differences..  Thanks.

Comment 92 Lee Wilson 2004-05-23 00:44:52 UTC

Originally logged under 123935

Do the boot.iso or rescuecd.iso images use a differently configured 
kernel? 

I have tried booting from each of those images and the exact same 
thing happens.  I would have assumed that the rescuecd.iso would have 
had a much simpler kernel (e.g. less compiled in features/load as 
module, etc).

Comment 93 Andy Green 2004-05-23 08:17:48 UTC

I have tried the patch and unfortunately it did not seem to make much
difference, quite possibly I am not testing quite what we want to
test.  I should recap with the current state of the actual code being
tried, perhaps.

arch/i386/mm/init.c setup_identity_mappings() now has this

// original code:

			for (k = 0; k < PTRS_PER_PTE; pte++, k++) {
				vaddr = i*PGDIR_SIZE + j*PMD_SIZE + k*PAGE_SIZE;
				if (end && (vaddr >= end))
					break;
				if (vaddr < start)
					continue;

// added code:

				{
					unsigned long prev_val = (unsigned long)pte->pte_low;

					*pte = mk_pte_phys(vaddr-start, PAGE_KERNEL);
					if ((vaddr <= 8*1024*1024) ||
					    ((vaddr >= ((unsigned int)3*1024*1024*1024)) &&
					     (vaddr < ((unsigned int)3*1024*1024*1024)+((unsigned int)8*1024*1024))))
						if (prev_val != pte->pte_low)
							printk("!!!!!!!!!!!!!!!!!!!! ouch! %08lx != %08lx for vaddr %08lx\n",
							       prev_val, pte->pte_low, vaddr);
				}
			}
			set_pmd(pmd, __pmd(_KERNPG_TABLE + __pa(pte_base)));
		}
	}


arch/i386/kernel/vmlinux.lds.S:

  __bss_start = .;              /* BSS */
  .bss : {
        *(.bss.page_aligned)
        *(.bss)
  }
  . = ALIGN(4);
  __bss_stop = .;

 /* _end = . ; */

  /* This is where the kernel creates the early boot page tables */
  . = ALIGN(4096);
  pg0 = .;

 . = pg0 + 32768 ;
 _end = . ;

  /* Sections to be discarded */
  /DISCARD/ : {
        *(.exitcall.exit)
        }


I took out all my for(;;) ; hanging loops, but I still have my dumping
loops in pagetable_init() and they still start failing with garbage
after the call to  setup_identity_mappings(pgd_base, PAGE_OFFSET,
end); in there.  I will attach the dump.

Comment 94 Andy Green 2004-05-23 08:19:17 UTC

Created attachment 100473 [details]
dump of paging tables degenerating into garbage

Comment 95 Andy Green 2004-05-23 08:26:10 UTC

Should add if you look down the dumps there are a lot of complaints
coming out of the sanity check code we added to
setup_identity_mappings(), possibly the sanity check code is broken
(signed compares?) or this is telling us about the actual pagetable
corruption.  These pop out just before printk() becomes unreliable.

!!!!!!!!!!!!!!!!!!!! ouch! 00000001 != 00000063 for vaddr c0000000
!!!!!!!!!!!!!!!!!!!! ouch! f000e816 != 00001063 for vaddr c0001000
!!!!!!!!!!!!!!!!!!!! ouch! f000e2c3 != 00002063 for vaddr c0002000
!!!!!!!!!!!!!!!!!!!! ouch! f000e816 != 00003063 for vaddr c0003000
!!!!!!!!!!!!!!!!!!!! ouch! f000e816 != 00004063 for vaddr c0004000
!!!!!!!!!!!!!!!!!!!! ouch! f000ff54 != 00005063 for vaddr c0005000
....

Comment 96 Andy Green 2004-05-23 09:47:52 UTC

Hum, had the idea to go compare the arch/i386/mm/init.c from the
working -327 and broken -358, they seem pretty much identical :-(  I
think the magic ingredient to cause the disaster must be elsewhere,
even if the car crash is happening in setup_identity_mappings().

Comment 97 Lee Wilson 2004-05-23 11:18:29 UTC

Just downloaded the rescuecd.iso image for 1.92 (I think that is 
test3?).  This successfully boots.  I have copied uname 
& /proc/cpuinfo below:-

Linux localhost.localdomain 2.6.5-1.327 #1 Sun Apr 18 04:51:55 EDT 
2004 i686 unknown

processor       : 0
vendor_id       : CentaurHauls
cpu family      : 6
model           : 7
model name      : VIA Ezra
stepping        : 10
cpu MHz         : 800.252
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu de tsc msr cx8 mtrr pge mmx 3dnow
bogomips        : 1576.96

Can anyone point me in the direction of finding out what has changed 
between test3 & the release version. I really want to get Fedora2 
running on this machine.

Comment 98 Andy Green 2004-05-23 11:26:38 UTC

Lee, you're in the same boat as I am, kernel -327 boots and works (but
I find it will freeze after 48hrs or so).  Read the early posts on
this bug carefully and you'll see that kernels either side of -326 and
-327 do not boot.  The rest of the posts are trying to find out why,
if you are reading this bug then I guess you have the very latest info
on the problem.

Comment 99 Andy Green 2004-05-23 18:28:11 UTC

Decided to stick in some printk()s in setup_identity_mappings() to try
to see what is happening.  I have a very limited idea of what the code
is trying to achieve.  Here it is with my dumps:


void setup_identity_mappings(pgd_t *pgd_base, unsigned long start,
unsigned long end)
{
	unsigned long vaddr;
	pgd_t *pgd;
	int i, j, k;
	pmd_t *pmd;
	pte_t *pte, *pte_base;

	pgd = pgd_base;

	printk("setup_identity_mappings(pdg_base=%p, start=0x%08lX, end=0x%08lX);\n",
	       pgd_base, start, end);
	printk("PTRS_PER_PGD=0x%08X, PTRS_PER_PMD=0x%08X, PTRS_PER_PTE=0x%08X, "
	       "cpu_has_pse=%d, cpu_has_pge=%d, PGDIR_SIZE=0x%08lX\n",
	       PTRS_PER_PGD, PTRS_PER_PMD, PTRS_PER_PTE,
	       cpu_has_pse, cpu_has_pge, PGDIR_SIZE);

		
	for (i = 0; i < PTRS_PER_PGD; pgd++, i++) {
		vaddr = i*PGDIR_SIZE;  // PGDIR_SIZE=4M
		if (end && (vaddr >= end))
			break;
		pmd = pmd_offset(pgd, 0);
		
		printk("i=%d, vaddr=0x%08lX, pmd=%p\n", i, vaddr, pmd);
				
		for (j = 0; j < PTRS_PER_PMD; pmd++, j++) {
			vaddr = i*PGDIR_SIZE + j*PMD_SIZE;
			
			printk(" i=%d, j=%d, vaddr=0x%08lX,  ", i, j, vaddr);
			
			if (end && (vaddr >= end))
				break;
			if (vaddr < start)
				continue;
			if (cpu_has_pse) {
				unsigned long __pe;

				set_in_cr4(X86_CR4_PSE);
				boot_cpu_data.wp_works_ok = 1;
				__pe = _KERNPG_TABLE + _PAGE_PSE + vaddr - start;
				/* Make it "global" too if supported */
				if (cpu_has_pge) {
					set_in_cr4(X86_CR4_PGE);
#if !defined(CONFIG_X86_SWITCH_PAGETABLES)
					__pe += _PAGE_GLOBAL;
					__PAGE_KERNEL |= _PAGE_GLOBAL;

#endif
				}
				set_pmd(pmd, __pmd(__pe));
				continue;
			}
			if (!pmd_present(*pmd)) {
				pte_base = (pte_t *) alloc_bootmem_low_pages(PAGE_SIZE);
				printk(" (pmd not present) ");
			} else {
				pte_base = (pte_t *) page_address(pmd_page(*pmd));
				printk(" (pmd present) ");
			}
			pte = pte_base;
			
			printk("pte_base=0x%p\n", pte);
			
			for (k = 0; k < PTRS_PER_PTE; pte++, k++) {
				vaddr = i*PGDIR_SIZE + j*PMD_SIZE + k*PAGE_SIZE;
				
				printk("   i=%d, j=%d, k=%d: vaddr=0x%08lX ", i, j, k, vaddr);
				
				if (end && (vaddr >= end))
					break;
				if (vaddr < start)
					continue;
					
				{ //unsigned long prev_val = (unsigned long)pte->pte_low;
				  *pte = mk_pte_phys(vaddr-start, PAGE_KERNEL);
				  
				  printk("--> 0x%08lX\n", pte->pte_low);
/*				  
					if ((vaddr <= 8*1024*1024) ||
						((vaddr >= ((unsigned int)3*1024*1024*1024)) &&
							(vaddr < ((unsigned int)3*1024*1024*1024)+((unsigned
int)8*1024*1024))))
						if ((unsigned long)prev_val != (unsigned long)pte->pte_low)
							printk("!! ouch! %08lx != %08lx for vaddr %08lx\n",     
								prev_val, pte->pte_low, vaddr);
*/
				}
			}
			set_pmd(pmd, __pmd(_KERNPG_TABLE + __pa(pte_base)));
		}
	}

	
}


The behaviour of the code is like this.  First there are these header
vars dumped, indicating the calling params, etc.

setup_identity_mappings(pdg_base=c0347000, start=0xC0000000,
end=0xCEFF0000);
PTRS_PER_PGD=0x00000400, PTRS_PER_PMD=0x00000001,
PTRS_PER_PTE=0x00000400, cpu_has_pse=0, cpu_has_pge=1,
PGDIR_SIZE=0x00400000


Then for the first 768 times around the outer loop (i=0 thru 767), it
does a continue in the j loop, because vaddr < start:

i=0, vaddr=0x00000000, pmd=c0347000
 i=0, j=0, vaddr=0x00000000,  i=1, vaddr=0x00400000, pmd=c0347004
 i=1, j=0, vaddr=0x00400000,  i=2, vaddr=0x00800000, pmd=c0347008
 i=2, j=0, vaddr=0x00800000, 
....
 i=768, vaddr=0xC0000000, pmd=c0347c00
 i=768, j=0, vaddr=0xC0000000,   

Then at i=768, it enters the innermost loop, first finding the
pte address -->

(pmd present) pte_base=0x00000000

WHICH IS NULL (this seems WRONG???  Seems like it is used as a POINTER???)

Then it does the inner loop 1K times

   i=768, j=0, k=0: vaddr=0xC0000000 --> 0x00000063
   i=768, j=0, k=1: vaddr=0xC0001000 --> 0x00001063
   i=768, j=0, k=2: vaddr=0xC0002000 --> 0x00002063
   i=768, j=0, k=3: vaddr=0xC0003000 --> 0x00003063
....
   i=768, j=0, k=1021: vaddr=0xC03FD000 --> 0x003FD063
   i=768, j=0, k=1022: vaddr=0xC03FE000 --> 0x003FE063
   i=768, j=0, k=1023: vaddr=0xC03FF000 --> 0x003FF063

It proceeds to do 1024 inner blocks for i=769 thru 827, although on
the 827 one it seems to abort early at k=1008...

   i=827, j=0, k=1005: vaddr=0xCEFED000 --> 0x0EFED063
   i=827, j=0, k=1006: vaddr=0xCEFEE000 --> 0x0EFEE063
   i=827, j=0, k=1007: vaddr=0xCEFEF000 --> 0x0EFEF063
   i=827, j=0, k=1008: vaddr=0xCEFF0000 pagetable_init() test 2
***DUMPING PAGING TABLES
swapper_pg_dir = 0xC0347000
0000: 00391027 00000000 00000000 00000000 
0004: 00000000 00000000 00000000 00000000 

... it then seems to return and do the pagetable_init() test 2 dump
which is after the call to this routine.  Maybe it crapped on its
printk() buffer for the last 15 times around the loop?  Don't know.

It completes the dump and then reboots.

Anyway, the interesting thing is that pte_base in the above copied
code comes out as 0x00000000 at runtime.  That seems wrong to my
undereducated eyes.

Comment 100 Andy Green 2004-05-23 18:37:35 UTC

Looking a bit harder it is not aborting early, but because it hit the
end address of 0xceff0000 set by the third calling param, then
returned cleanly to the caller which does the dump.

But 0x00000000 can't be right for that pte pointer, unless it 'just so
happens' that the memory at 0x00000000 is being used as the pte
table... this seems unlikely?????
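
As a cross-check of the loop counters in the log against the addresses: with PTRS_PER_PMD=1 and PGDIR_SIZE=4M as printed above, the i and k indices are just bit fields of the virtual address. A small sketch (the helper names are made up for this example):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT   12      /* 4K pages */
#define PGDIR_SHIFT  22      /* each pgd entry covers 4M */
#define PTRS_PER_PTE 1024u

/* Outer loop counter i for a given virtual address. */
unsigned pgd_index_of(uint32_t vaddr)
{
    return vaddr >> PGDIR_SHIFT;
}

/* Innermost loop counter k for a given virtual address. */
unsigned pte_index_of(uint32_t vaddr)
{
    return (vaddr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
}

/* Rebuild the virtual address from i and k (j is always 0 here). */
uint32_t vaddr_of(unsigned i, unsigned k)
{
    assert(i < 1024 && k < PTRS_PER_PTE);
    return ((uint32_t)i << PGDIR_SHIFT) | ((uint32_t)k << PAGE_SHIFT);
}
```

With these, start=0xC0000000 gives i=768 and end=0xCEFF0000 gives i=827, k=1008, which is where the log stops.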

Comment 101 Ingo Molnar 2004-05-23 19:06:12 UTC

Indeed you seem to be on to something.

pte_base = NULL is almost certainly incorrect. Even if we allocated
physical address zero as the pagetable (which is close to impossible,
since we mark it as reserved - certain BIOSes rely on it for suspend),
even then it should be 0xc0000000.

so pte_base = NULL means that the pmd_present() == true condition is
wrong.

the only way this can happen is if the head.S code does the
root-pagetable-setup incorrectly.

to test this theory, could you comment out the true branch from the
pmd_present() condition? Something like:

//     if (!pmd_present(*pmd))
             pte_base = (pte_t *) alloc_bootmem_low_pages(PAGE_SIZE);
//     else
//           pte_base = (pte_t *) page_address(pmd_page(*pmd));

this will cause us to allocate new pagetables and not accept the
pre-generated head.S layout.

Comment 102 Andy Green 2004-05-23 19:21:37 UTC

Yep, BINGO

Early printk stuff completes my dumps and then finally:

zapping low mappings.
On node 0 totalpages: 61424
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 57328 pages, LIFO batch:13
  HighMem zone: 0 pages, LIFO batch:1
DMI 2.2 present.
ACPI: RSDP (v000 VT9174                                    ) @ 0x000f6650
ACPI: RSDT (v001 VT9174 AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x0eff3000
ACPI: FADT (v001 VT9174 AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x0eff3040
ACPI: DSDT (v001 VT9174 AWRDACPI 0x00001000 MSFT 0x0100000c) @ 0x00000000
ACPI: PM-Timer IO Port: 0x408
Built 1 zonelists
Kernel command line: ro root=LABEL=/ mem=nopentium
earlyprintk=serial,ttyS0,38400
Initializing CPU#0
CPU 0 irqstacks, hard=c034d000 soft=c034c000
PID hash table entries: 1024 (order 10: 8192 bytes)
Detected 599.892 MHz processor.
Using pmtmr for high-res timesource
disabling early console

It has booted up all the way to the login prompt in fact :-D

Seems like adding one of those BUG() asserts you mentioned checking
pte_base!=NULL would be a good addition to the actual kernel code, it
is not in the innermost loop so there would be no real performance
penalty.  Since pmd_present() can disagree with getting a non-null
result from page_address()....

Can I help probe the problem behind this, presumably in head.S then?

Comment 103 Ingo Molnar 2004-05-23 19:37:35 UTC

i've reviewed the head.S code and it seems to be correct.

Could you please print out some more state in the pte_base == NULL
case? It would be quite useful to print out the raw pmd value. One of
your earlier dumps showed these swapper_pg_dir contents:

0000: 00391027 00000000 00000000 00000000 
0004: 00000000 00000000 00000000 00000000 

this means that the pmd value was 0x00391027 or 0x00000000.
pmd_present() tests bit 0 of the pmd - so only the first entry could
be present - but in that case pte_base should have been 0xc0391000.

pte_base = NULL and pmd_present() means that the entry in
swapper_pg_dir must have been 0x00000001 (or perhaps 0x00000027).
None of the dumps suggest this though.
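
The arithmetic behind those numbers, as a standalone sketch: it assumes 4K pages and PAGE_OFFSET=0xC0000000 (matching the start= value in the dumps), and the helper names are invented for the example rather than taken from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE    4096u
#define PAGE_MASK    (~(PAGE_SIZE - 1))  /* 0xFFFFF000 */
#define PAGE_OFFSET  0xC0000000u         /* kernel virtual base */
#define PAGE_PRESENT 0x001u              /* bit 0 of the entry */

/* Same test as described above: bit 0 of the raw pmd value. */
int pmd_is_present(uint32_t pmd)
{
    return (pmd & PAGE_PRESENT) != 0;
}

/*
 * Kernel virtual address of the page table a present pmd points at:
 * strip the flag bits to get the physical address, then add the
 * kernel's virtual offset.
 */
uint32_t pte_base_from_pmd(uint32_t pmd)
{
    assert(pmd_is_present(pmd));
    return (pmd & PAGE_MASK) + PAGE_OFFSET;
}
```

Even a pmd value of 0x00000001 would come out as 0xC0000000 this way, never as 0, so a NULL pte_base points at how the address is derived rather than at the pmd contents.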

Comment 104 Ingo Molnar 2004-05-23 19:39:31 UTC

Another (remote) possibility would be that some sort of non-RAM page
ends up being used for pagetables. This can lead to similarly funny
results. What does a full bootup log look like on your box - what does
the e820 map (the RAM map, provided by the BIOS) look like?

Comment 105 Ingo Molnar 2004-05-23 19:42:04 UTC

Another question - what is the precise value of pg0 on your box?
(should be in your System.map).

perhaps there's a boundary condition bug in the head.S code - this
should only be possible if pg0+INIT_MAP_BEYOND_END [==pg0+128k] is
exactly on a 4 MB boundary. [quite unlikely ...]

Comment 106 Andy Green 2004-05-23 19:45:55 UTC

May 23 08:56:03 backup kernel: BIOS-provided physical RAM map:
May 23 08:56:03 backup kernel:  BIOS-e820: 0000000000000000 -
00000000000a0000 (usable)
May 23 08:56:03 backup kernel:  BIOS-e820: 00000000000f0000 -
0000000000100000 (reserved)
May 23 08:56:03 backup kernel:  BIOS-e820: 0000000000100000 -
000000000eff0000 (usable)
May 23 08:56:03 backup kernel:  BIOS-e820: 000000000eff0000 -
000000000eff3000 (ACPI NVS)
May 23 08:56:03 backup kernel:  BIOS-e820: 000000000eff3000 -
000000000f000000 (ACPI data)
May 23 08:56:03 backup kernel:  BIOS-e820: 00000000ffff0000 -
0000000100000000 (reserved)
May 23 08:56:03 backup kernel: 0MB HIGHMEM available.
May 23 08:56:03 backup kernel: 239MB LOWMEM available.

Comment 107 Andy Green 2004-05-23 19:47:47 UTC

From the System.map for the custom kernel

c0391000 A pg0
c0399000 A _end

Comment 108 Ingo Molnar 2004-05-23 19:52:35 UTC

ok, could you undo the vmlinux.lds.S hack (which i suggested some time
ago, and which still seems to be in your tree)? That changed pg0/end.
They should be equal in an unhacked tree.

still, they are near to a 4MB boundary, but not near enough to cause
trouble i think.

what we need is a full dump in the pte_base == NULL case - how did we
get into the pmd_present() branch? I think the dump of swapper_pg_dir
should be enough as a starter.

(the raw value of the pmd is pmd_val(*pmd) - please print that one out
too.)

Comment 109 Ingo Molnar 2004-05-23 19:56:41 UTC

The e820 map looks sane and simple - there are only two RAM ranges:
0...640K, 1MB...~240MB. The ACPI areas are all after the end of RAM.
So the likelihood of something weird being near the pagetables is
quite slim. (we load the kernel at 1MB physical.)
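
The reading of the map above can be reproduced with a short sketch. The table is copied from the boot log in comment 106; the function and variable names are made up for illustration.

```python
# (start, end, type) triples copied from the BIOS-e820 lines in comment 106.
E820 = [
    (0x0000000000000000, 0x00000000000A0000, "usable"),
    (0x00000000000F0000, 0x0000000000100000, "reserved"),
    (0x0000000000100000, 0x000000000EFF0000, "usable"),
    (0x000000000EFF0000, 0x000000000EFF3000, "ACPI NVS"),
    (0x000000000EFF3000, 0x000000000F000000, "ACPI data"),
    (0x00000000FFFF0000, 0x0000000100000000, "reserved"),
]


def usable_ranges(e820):
    """Return the (start, end) pairs the BIOS marked as usable RAM."""
    return [(start, end) for start, end, kind in e820 if kind == "usable"]


ram = usable_ranges(E820)
top_of_ram = max(end for _, end in ram)
acpi_starts = [start for start, _, kind in E820 if kind.startswith("ACPI")]

for start, end in ram:
    print(f"{start // 1024} KB .. {end // 1024} KB")
print("top of RAM:", top_of_ram // (1024 * 1024), "MB")
print("ACPI areas after end of RAM:", all(s >= top_of_ram for s in acpi_starts))
```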

Comment 110 Andy Green 2004-05-23 20:00:04 UTC

swapper_pg_dir = 0xC0347000
0000: 00391027 00000000 00000000 00000000 
...(all zeros)... 
0300: 00391027 00000000 00000000 00000000 
...(all zeros)...
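
For readers following along, a sketch of how the two non-zero entries decode. It assumes the standard x86 non-PAE page-directory entry layout, the default PAGE_OFFSET of 0xC0000000, and that the "0300" label in the dump is an entry index rather than a byte offset; none of that is stated in the dump itself.

```python
PAGE_OFFSET = 0xC0000000  # assumed default 3G/1G split
PAGE_MASK = 0xFFFFF000

# Low flag bits of an x86 (non-PAE) page-directory entry.
FLAG_BITS = {
    0x001: "present",
    0x002: "rw",
    0x004: "user",
    0x008: "pwt",
    0x010: "pcd",
    0x020: "accessed",
    0x040: "dirty",
    0x080: "pse (4 MB page)",
}


def decode_pde(value):
    """Split a raw entry into (physical frame address, list of flag names)."""
    frame = value & PAGE_MASK
    flags = [name for bit, name in sorted(FLAG_BITS.items()) if value & bit]
    return frame, flags


frame, flags = decode_pde(0x00391027)
print(hex(frame), flags)
print("page table seen through the kernel mapping:", hex(frame + PAGE_OFFSET))
print("slot 0x300 covers virtual addresses from:", hex(0x300 << 22))
```

Under those assumptions both slots point at the same page table at physical 0x391000, which is where pg0 sits (c0391000) in the System.map excerpt from comment 107.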


Going to remove the pg0/end thing, which is indeed still in.

I added some code to find all cases where there is a disagreement problem

			if(pmd_present(*pmd)) {
				if(((pte_t *) page_address(pmd_page(*pmd)))==NULL ) {
				   printk("pmd_present TRUE page_address NULL at 0x%08lX\n", pmd_page(*pmd));
				}
			}

Comment 111 Ingo Molnar 2004-05-23 20:02:43 UTC

your pmd_page(*pmd) condition is not correct.

the best would be to use: 'pmd_val(*pmd) < PAGE_SIZE'.

Comment 112 Ingo Molnar 2004-05-23 20:07:00 UTC

hm ... 

this line is incorrect:

   pte_base = (pte_t *) page_address(pmd_page(*pmd));

Comment 113 Andy Green 2004-05-23 20:08:32 UTC

I have little understanding of what I am working with, however, here
is the original failing code:

			if (!pmd_present(*pmd)) {
				pte_base = (pte_t *) alloc_bootmem_low_pages(PAGE_SIZE);
				printk(" (pmd not present) ");
			} else {
				pte_base = (pte_t *) page_address(pmd_page(*pmd));  //  NULL
				printk(" (pmd present) ");
			}

The error is happening because pmd_present(*pmd) is true, and the line
commented //NULL is NULL.  Therefore that is the basis of my test code.

Can you explain why using the actual failing code above is 'not
correct' when I am looking for the instances where that exact code fails?

Ah, just saw your next comment... is this in fact The Bug?

Comment 114 Ingo Molnar 2004-05-23 20:09:24 UTC

the correct code would be something like:

   pte_base = pte_offset_kernel(pmd, 0);

does it boot if you fix that line?
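
A rough user-space model of what the suggested replacement computes. This is based on my understanding of how the two-level i386 macros behave (mask the flag bits off the pmd value, add PAGE_OFFSET, index by the pte slot); it is not taken from the attached patch, and the Python names only mirror the kernel ones for readability.

```python
PAGE_OFFSET = 0xC0000000  # assumed default 3G/1G split
PAGE_MASK = 0xFFFFF000
PAGE_SHIFT = 12
PTRS_PER_PTE = 1024
PTE_SIZE = 4              # bytes per entry without PAE


def va(phys):
    """Model of __va(): physical address -> kernel virtual address."""
    return (phys + PAGE_OFFSET) & 0xFFFFFFFF


def pte_offset_kernel(pmd_val, address):
    """Virtual address of the pte slot for `address` in the table the pmd points at."""
    table = va(pmd_val & PAGE_MASK)
    index = (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)
    return table + index * PTE_SIZE


pte_base = pte_offset_kernel(0x00391027, 0)
print(hex(pte_base))
```

Fed the pmd value from the swapper_pg_dir dump in comment 110, the model yields the pg0 address from the System.map excerpt rather than NULL.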

Comment 115 Ingo Molnar 2004-05-23 20:10:26 UTC

Created attachment 100481 [details]
fix

ok, the fix as a patch against the Fedora kernel.

Comment 116 Andy Green 2004-05-23 20:12:50 UTC

Catching up with previous tests first just in case

 - Still boots on my test tree with pg0 == _end

 - got this single instance of the disagreement between pmd_present()
and page_address()

pmd_present TRUE page_address NULL at 0x00007220

Now I make your fix.

Comment 117 Ingo Molnar 2004-05-23 20:15:29 UTC

i do think this is The Bug. But let's wait for your test first! :-)

the bug got introduced the following way: i did a cleanup of the
pagetable setup code for the 4:4 patch - but then the meaning of
pmd_page() changed sometime in the -mm tree and this stray use remained...

Comment 118 Andy Green 2004-05-23 20:22:49 UTC

It's a winner Ingo, well done :-)

Comment 119 Ingo Molnar 2004-05-23 20:30:36 UTC

Cool :) You did all the hard work though!

(this fix does not mean all Via C3 problems are necessarily fixed.
E.g. if vanilla 2.6.6 fails for someone that's an indication of some
other problem - 2.6.6 does not have this buggy line of code.)

Comment 120 Andy Green 2004-05-23 20:36:26 UTC

Continues to boot fine with the mem=nopentium and the earlyprintk
cmdline params removed too.

I was experiencing flaky behaviour (freezing) on this board with -327,
it would not stay up for >48hrs and sometimes a lot less: this machine
does my mail so it is highly visible when it goes down.  It'll be
interesting to see if that was coming from the same direction and has
gone away or if that is another issue.  At least I have a serial cable
now if it is something else!

Comment 121 Richard Schaal 2004-05-23 23:25:39 UTC

I did get a vanilla 2.6.6 kernel to boot correctly on the second
attempt at building - I changed the highmem config option from 64G to
none. - that seems to be it.  Can't say when or why it first went on
in the first place - I don't have anywhere near that much memory.

Comment 122 Dave Jones 2004-05-23 23:40:00 UTC

*** Bug 124066 has been marked as a duplicate of this bug. ***

Comment 123 Ingo Molnar 2004-05-24 05:29:18 UTC

CONFIG_HIGHMEM_64G will certainly not work on this CPU - that feature
needs the PAE capability. We should bail out a bit more gracefully in
that case though ... right now we just hang i think with a message
printk'd that doesnt make it to the console because console_init() has
not been done yet.

Comment 124 Andy Green 2004-05-24 07:58:13 UTC

Just noting that -327 has the same bad line, yet it worked.  I suppose
this can be coming out of the exact details of why pte_base = (pte_t
*) page_address(pmd_page(*pmd)); is not right, yet was able to work
on most CPUs anyway.  

It can also go towards subtly trashed pagetables on -327 being
responsible for the 48hr freeze behaviour, eg, it might have mapped
the same linear page twice.

Do we have a story for why -327 was okay on C3?

Comment 125 Ingo Molnar 2004-05-24 10:42:38 UTC

We used a really broken way to establish the pte_base pointer - and we
used that pointer blindly from that point on.

in addition, the initial mappings map the NULL address too - so
dereferencing it doesnt cause traps. Plus, the first page in the
system is _typically_ not used. The icing on the cake is that 99% of
the CPUs out there have the PSE feature, so most boxes wont ever hit
this codepath.

So i'm not surprised -327 worked fine on your box. But we ended up
corrupting page 0 - which could have nasty side-effects for BIOS
related things like SMM handlers or suspend.

If you feel inclined, could you print out the value of pte_base on a
rebuilt tree of -327, to confirm that it's different?

Comment 126 Arjan van de Ven 2004-05-24 16:40:21 UTC

I'm trying to make a new boot.iso so that people can install FC2 on
their VIA C3 anyway; I'd love if some of the people on this bug would
help test this.

The url is
http://people.redhat.com/arjanv/c3boot.iso

Comment 127 Andy Green 2004-05-24 18:59:02 UTC

Bad news, the test iso comes up in the grub-type vga display, when I 
hit enter it pulls in the initrd and kernel, then... reboots :-/

Comment 128 Andy Green 2004-05-24 19:04:20 UTC

Mounting the test iso image and poking around, an obvious difference 
is that this is an SMP kernel.... is it i586 though?  No obvious way 
to tell.

Comment 129 Lee Wilson 2004-05-24 19:06:06 UTC

Just tried the c3boot.iso as well, still same problem.  In fact I 
would probably say it is worse.

Whereas the original kernel got as far as the "uncompressing the 
kernel" message, this one finishes loading the initrd and then reboots.

Hope this helps.

Comment 130 Arjan van de Ven 2004-05-24 19:08:35 UTC

Sorry guys I screwed it up (you'd THINK after 3 years in the job I
could build a kernel correctly ... but no)
working on a better iso

Comment 131 Arjan van de Ven 2004-05-24 19:18:09 UTC

ok reuploaded now with sane kernel hopefully
same url

Comment 132 Andy Green 2004-05-24 19:29:30 UTC

Yes, booting now, comes up on the first page of the install script 
in textmode, language selection I think.

Comment 133 J Spells 2004-05-24 20:49:13 UTC

Tried the new boot.iso and the installation process works fine now--
all the way through to the end.  But upon the reboot after 
installation, it crashes in the same way (consistently rebooting) 
after the GRUB page with kernel 2.6.5-1.358.  I've got C3/Eden (Ezra) 
and 1Ghz.  Are there any command line options I should try?

Comment 134 Andy Green 2004-05-24 21:10:36 UTC

Presumably the crash after install is just because the kernel 
package from the original FC2 media is being installed. 
 
Maybe shortly you will be able to come up off the install kernel 
with linux rescue (IIRC) and rpm -Uvf 
kernel-new-one-from-Arjan.i586.rpm and that'll be it.

Comment 135 Richard Schaal 2004-05-24 23:01:23 UTC

Tested the new c3boot.iso - seems to work fine - my install is in
progress.  I will look forward to the next kernel update! 

Thanks!
Richard

Comment 136 Edward Almos 2004-05-25 09:34:34 UTC

VIA C3 800MHz, 256Mb Original BIOS

Tried using the new ISO and the install now works fine after following
instruction to change CD to FC2 disk 1. I do however have the same
problem as post #133, the system reboots after the GRUB page.

You're getting there guys so well done for the work so far, would it
be reasonable to assume that the eventual outcome of all this will be
a new FC2 disk 1 iso that contains the modifications ?

Ed Almos

Comment 137 Andy Green 2004-05-25 14:39:55 UTC

Folks, Arjan's new -383 kernel is up at
http://people.redhat.com/arjanv/2.6/RPMS.kernel/kernel-2.6.6-1.383.i586.rpm
and boots fine on my machine... If you 

 - boot off Arjan's ISO (http://people.redhat.com/arjanv/c3boot.iso)
 - swap in the normal FC2 CD1 and install FC2
 - on completion, boot again off Arjan's ISO
 - type linux rescue at the grub prompt
 - swap in a CD with the new 383 kernel RPM
 - rpm -Uvf kernel-2.6.6-1.383.i586.rpm
 - type exit (I think) to reboot

On reboot you should come up in -383 and hopefully all this will be an
ugly memory ;-)

Comment 138 J Spells 2004-05-25 17:23:39 UTC

I apologize in advance for what are going to be basic questions.

1. I have completed steps 1-4 above.  After typing "linux rescue", I 
get the first couple of screens from the FC2 install (keyboard, 
language, etc).  It's clearly in rescue mode though.  At the end of 
these screens it asks for disc 1 of the FC2 isos.  It will not accept 
the CD with the new rpm.
2.  I give it the FC2 disc 1 and then it says that it's mounting my 
systems at /mnt/sysimage and then gives me a bash prompt.
3.  At this point, I put in the CD with the new kernel rpm and type 
the command above.  The shell responds that no such file exists.  I 
use the find command and confirm that it cannot see the new rpm file.
4.  I try to mount the cdrom with the mount command but this does not 
work either.
5.  I try to chroot to the /mnt/sysimage and do steps 3. and 4. but 
this does not work either.

What am I doing wrong?  This is way over my head.

Comment 139 Andy Green 2004-05-25 17:28:08 UTC

Sounds like you're real close.

What exactly was the mount command you tried in your step 4?  I would
try something like this

mkdir mymount
mount /dev/hdc mymount -t iso9660
cd mymount
ls

This assumes /dev/hdc is your CD reader.  /dev/cdrom might work too,
this is a symlink I think to the actual device.  

Is that similar to what you tried?

Comment 140 J Spells 2004-05-25 17:37:32 UTC

Before I try this, do I need to chroot to /mnt/sysimage?

Comment 141 Andy Green 2004-05-25 17:45:48 UTC

Yeah, quite possibly, since that will be where your real rpm database
is at.  You seem to know what chroot is about but just in case or if
anyone else is wondering, it basically replaces / with some other
directory.  So /mnt/sysimage/bin becomes /bin and so on.  Rescue mode
comes up with some utils and stuff in /, and your normal root
filesystem in /mnt/sysimage.
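
The path mapping described here can be written down as a tiny sketch. The function name is made up, and it only models plain absolute paths (no "..", no symlinks); /var/lib/rpm is the usual location of the rpm database.

```python
import posixpath


def path_outside_chroot(new_root, path_inside):
    """Where a path seen inside the chroot lives in the rescue environment."""
    return posixpath.join(new_root, path_inside.lstrip("/"))


# After "chroot /mnt/sysimage", /bin is really /mnt/sysimage/bin, and the
# rpm database the installed system uses is under /mnt/sysimage/var/lib/rpm.
print(path_outside_chroot("/mnt/sysimage", "/bin"))
print(path_outside_chroot("/mnt/sysimage", "/var/lib/rpm"))
```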

Comment 142 J Spells 2004-05-25 18:18:39 UTC

Success!!

Thanks for all your help.  Now I need to solve the X server problems.

Comment 143 Andy Green 2004-05-25 18:58:54 UTC

Version 2 of the FC2 install sequence for C3s, battle-tested by J
Spells, then ...

 1) Download
http://people.redhat.com/arjanv/2.6/RPMS.kernel/kernel-2.6.6-1.383.i586.rpm
 2) burn it on to a CD as a file on its own call it KERNEL RPM CD
 3) Download http://people.redhat.com/arjanv/c3boot.iso
 4) burn this on to a CD as a CD image, call it C3 BOOT CD
 5) boot off C3 BOOT CD
 6) at the first menu swap in the normal FC2 CD1 and install FC2
 7) on install completion, boot again off C3 BOOT CD... but...
 8) type linux rescue at the grub prompt
 9) when it is finished booting, type chroot /mnt/sysimage
10) swap in the KERNEL RPM CD
11) mkdir mymount
12) mount /dev/hdc mymount -t iso9660
13) cd mymount
14) rpm -Uvf kernel-2.6.6-1.383.i586.rpm
15) type exit (I think) to reboot

On reboot you should come up in -383 and hopefully all this will be an
ugly memory ;-)

Comment 144 Arjan van de Ven 2004-05-25 19:02:12 UTC

would it be useful if I stuck the 383 kernel onto the boot.iso ?

Comment 145 Andy Green 2004-05-25 19:09:50 UTC

What is boot.iso, the first FC2 CD?  That would be a full solution,
assuming the kernel package is on CD1, issue a new 700MB or whatever
iso for FC2 CD1 which boots into -383 and has the -383 kernel and
kernel-source package on it too.  I saw your message in
fedora-test-list about your intention along these lines.

I can see what would be REALLY COOL is if the installer had an option
to go to a yum repository before it started and download headers for
updated packages, favouring the updates instead of the ones on the CD
where the updates were newer.  That would allow you to solve this
problem by issuing an updated kernel package over yum and telling
people they must get that at installtime for C3 installs.

Comment 146 Andy Green 2004-05-25 19:12:42 UTC

Another cool idea would be a small script, which wget-ed the newer
kernel, mounted the standard FC2 CD1 ISO with -o loop and replaced the
kernel package and updated the install kernel footprint too.  Then a
download of a few megabytes would automate the update of FC2 CD1 to C3
compatibility.

Comment 147 Arjan van de Ven 2004-05-25 19:26:40 UTC

I meant putting the 383 kernel rpm file inside the c3boot.iso
actually, it would make it unneeded to make 2 cds, and in fact one
could rpm -i the kernel during the normal installation already, no
need to rescue boot...

Comment 148 Andy Green 2004-05-25 19:30:57 UTC

Yes, that's a great idea and would clearly cut out a lot of fiddling.  Can
you just go to a different virtual console and install the kernel when
the main install is over, then?

I still like the yum idea for the future, it would ensure that every
install had the latest patches from the get-go.

Comment 149 Arjan van de Ven 2004-05-25 19:48:22 UTC

ok I uploaded http://people.redhat.com/arjanv/c3boot-2.iso with the
RPM on it

Comment 150 Andy Green 2004-05-25 19:56:46 UTC

Version 3pre1 ;-) of the FC2 install sequence for C3s

 1) Download http://people.redhat.com/arjanv/c3boot-2.iso 
 2) burn it on to a CD as a CD image
 3) boot off it
 4) at the first menu swap in the normal FC2 CD1 and install FC2
 5) on install completion, type ctrl-alt-F2 to get to a console
 6) rpm -Uvf /!!!Where is the CD mounted???/kernel-2.6.6-1.383.i586.rpm
 7) type ctrl-alt-F1 and complete the install

Arjan, 

 - what is the path that the CD is mounted at during the install action?

 - is it true that ctrl-alt-F2 will get you to a bash prompt?

 - is it true that the installer is back on virtual console 1

 - is it true that the installer waits at the end to allow you to do
the RPM in another vc

Comment 151 Edward Almos 2004-05-25 20:46:54 UTC

Further to comment #136

An ISO image of FC2 CD1 with the 383 kernel would really make my day.
Can this be installed on one or two mirrors just for us C3 folk ?

Ed Almos
Budapest, Hungary

Comment 152 Hugh de Burgh 2004-05-25 23:58:21 UTC

Just an additional note that this problem also affects my CM-588 
single board computer. It's based on the Geode 5530 Chipset.

Thanks for all your hard work everyone,
-Hugh

Comment 153 Gil Chilton 2004-05-26 01:29:12 UTC

c3boot.iso #1 worked for my Syntax S635MP motherboard with an
Integrated VIA C3 Samuel2.  I had to download the kernel rpm since I
failed to mount a separate CD (user error no doubt) containing the
kernel.  Seems to work fine other than X seeming slow so far.

Comment 154 Barry K. Nathan 2004-05-26 03:55:57 UTC

If you have via graphics, look in xorg.conf and use the "via" driver
rather than "vesa". This driver will be much faster -- but Xorg's via
driver is a bit out of date and it does have stability problems
sometimes. I don't remember the URL for the latest driver.

Comment 155 Arjan van de Ven 2004-05-26 06:42:21 UTC

*** Bug 124385 has been marked as a duplicate of this bug. ***

Comment 156 Bill Nottingham 2004-05-26 20:31:02 UTC

*** Bug 118255 has been marked as a duplicate of this bug. ***

Comment 157 Ingo Molnar 2004-05-27 11:56:53 UTC

Andy, 3pre1 looks mostly good :)

>  - what is the path that the CD is mounted at during the install action?

it's /mnt/source/ in FC2.


> - is it true that ctrl-alt-F2 will get you to a bash prompt?

yes.

> - is it true that the installer is back on virtual console 1

only in a text install. In a graphical install you need to Alt-F7.

> - is it true that the installer waits at the end to allow you to do
the RPM in another vc

the installer waits at the end to do a reboot - so that you can take
out the installation CD - otherwise it would boot into the installer
again, instead of the HD. You still have a shell prompt on vc2 at this
stage IIRC.

Comment 158 John D 2004-05-27 12:10:37 UTC

I've upgraded from FC1 to FC2, got the new kernel in, however, my
machine bombs out at the point it will fsck the root lvm volume.   The
kernel finds the rootvg, however, it is not able to mount it
readwrite, or fsck it.  Has anyone run in to this?  Is this an LVM
problem?

I can still boot under a 2.4 kernel without any issue.

Comment 159 Alex Bloor 2004-05-27 12:45:14 UTC

I have got nearly all the way there... 

But every time I try and rpm -Uvh the kernel off the via boot disk, 
it moans about missing dependencies. This makes no sense as I 
actually did a full install.. I've tried twice now and each time it 
has failed :( ... 

A.

Comment 160 Andy Green 2004-05-27 12:47:36 UTC

Well personally I would hit it with --nodeps and --force on the rpm 
commandline too.  That will install the thing regardless. 
 
But what were the missing deps?

Comment 161 Earl Terwilliger 2004-05-27 12:53:08 UTC

You need to do this before the RPM command: 
 
chroot /mnt/sysimage 
 
then the RPM command will work fine. works for me :)

Comment 162 Alex Bloor 2004-05-27 12:59:41 UTC

Ok slight problem.. I've now rebooted.. Do I have to go thru the 
whole install again? I have rebooted with the VIA disc, then inserted 
the FC2 disk1 at the relevant point....

Sorry.. I should know this...

A.

Comment 163 Alex Bloor 2004-05-27 13:05:22 UTC

Ahh.. Sorry .. I see. Wait until you've selected the fact that it's 
already installed, choose that then prior to making a GRUB selection 
ALT+F2 do the Chroot and then force the CD to unmount/eject then 
remount the VIA boot CD... Currently RPMing the kernel.. :) Hope it 
works <g>

A.

Comment 164 Earl Terwilliger 2004-05-27 13:38:32 UTC

Yes, thanks for all the work on getting this fixed. 
My processor (details below) now works fine with this patched 
kernel. 
 
processor       : 0 
vendor_id       : CentaurHauls 
cpu family      : 6 
model           : 7 
model name      : VIA Samuel 2 
stepping        : 3 
cpu MHz         : 401.175 
cache size      : 64 KB 
fdiv_bug        : no 
hlt_bug         : no 
f00f_bug        : no 
coma_bug        : no 
fpu             : yes 
fpu_exception   : yes 
cpuid level     : 1 
wp              : yes 
flags           : fpu de tsc msr cx8 mtrr pge mmx 3dnow 
bogomips        : 793.67
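
As an aside, the flags line above can be checked mechanically for the features this thread keeps coming back to. A small sketch; the flags string is copied from the /proc/cpuinfo output above, while the helper name and the notes in the table are only illustrative.

```python
# Flags line copied from the /proc/cpuinfo output above.
CPUINFO_FLAGS = "fpu de tsc msr cx8 mtrr pge mmx 3dnow"

FEATURES_OF_INTEREST = {
    "pae": "needed by CONFIG_HIGHMEM_64G (comment 123)",
    "pse": "4 MB pages; CPUs that have it never hit the buggy codepath (comment 125)",
    "cmov": "emitted by compilers targeting i686",
}


def missing_features(flags_line, wanted):
    """Return the wanted feature names that are absent from a cpuinfo flags line."""
    have = set(flags_line.split())
    return [name for name in wanted if name not in have]


for name in missing_features(CPUINFO_FLAGS, FEATURES_OF_INTEREST):
    print(f"{name}: missing ({FEATURES_OF_INTEREST[name]})")
```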

Comment 165 Mike Gore 2004-05-28 01:25:25 UTC

Using c3boot-2.iso for the kernel and initrd file on a VIA EPIA 800
motherboard I get several errors.
When viewed with ALT-F4 I see messages like
<3>via_rhine: version magic '2.6.5-1.358 586 REGPARAM 4KSTACKS
gcc-3.3' should be '2.6.5-1.358 586 REGPARM 4KSTACKS gcc 3.4'
When viewed with ALT-F3 I see messages like
* failed to insert /tmp/uhci-hcd.ko

As a result I do not have access to any devices
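
To make the mismatch in that message easier to see, here is a sketch that compares the two version-magic strings token by token. The strings are exactly as typed above (so they may contain transcription slips), and the token comparison is only a reading aid: as far as I know the kernel simply refuses a module whose string does not match its own.

```python
# Strings exactly as quoted in the error message above.
FOUND = "2.6.5-1.358 586 REGPARAM 4KSTACKS gcc-3.3"
EXPECTED = "2.6.5-1.358 586 REGPARM 4KSTACKS gcc 3.4"


def vermagic_difference(found, expected):
    """Return (tokens only in found, tokens only in expected), in original order."""
    found_tokens, expected_tokens = found.split(), expected.split()
    only_found = [t for t in found_tokens if t not in expected_tokens]
    only_expected = [t for t in expected_tokens if t not in found_tokens]
    return only_found, only_expected


only_found, only_expected = vermagic_difference(FOUND, EXPECTED)
print("module has:    ", only_found)
print("kernel expects:", only_expected)
```

The kernel version matches in both strings; the differing tokens are the compiler tag and the REGPARM spelling, which points at modules from a different build than the running kernel.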

Comment 166 Mike Gore 2004-05-28 02:12:14 UTC

Ignore my last posting - it may be user error on my part
(I am DHCP booting the system since I have no floppy. I have a USB
DVD drive but can't boot from it, so I DHCP boot pointing to the
c3boot-2 ISO image. I think I had more than one ISO mounted on the
same path; booting works now.)

Comment 167 Andy Green 2004-05-29 06:11:34 UTC

Version 3 of the FC2 install sequence for C3s

 1) Download http://people.redhat.com/arjanv/c3boot-2.iso 
 2) burn it on to a CD as a CD image
 3) boot off it
 4) at the first menu swap in the normal FC2 CD1 and install FC2
 5) on install completion, where it says to remove the CD and it will
reboot, type ctrl-alt-F2 to get to a console
 6) type eject
 7) stick back in the CD you booted from
 8) (replace hdc with your cdrom device if different) mount /dev/hdc
/mnt/source -t iso9660
 9) chroot /mnt/sysimage [NOT SURE IF THIS IS NEEDED]
10) rpm -Uvf /mnt/source/kernel*
11) type ctrl-alt-F1 (for text mode install) or ctrl-alt-F7 (graphical
install) and complete the install
12) You should reboot into a working kernel


BTW With Arjan's 383 my C3 board is back to its old reliable
behaviour, nearly 4 day uptime already when -327 would freeze within
48hrs.  JOB WELL DONE EVERYBODY!

Comment 168 Carsten Groh 2004-06-01 20:18:51 UTC

Hi!

Thanks for all your hints, but my problem is that there is no 
/dev/hdc in my system. I can't 'find' any cdrom device in my system.

A log said Method hdc://mnt/cdrom

My board (I don't know the real name) is the smallest VIA EPIA, 
fanless with 566 MHz.

Any further hints on what to do?

Regards Carsten

Comment 169 Gerald Bastelica 2004-06-03 07:51:37 UTC

Hi !

I also had problems to find how to mount cdrom. To be able to mount 
it, i had to chroot FIRST. 

So i think that you should do that:

 1) Download http://people.redhat.com/arjanv/c3boot-2.iso 
 2) burn it on to a CD as a CD image
 3) boot off it
 4) at the first menu swap in the normal FC2 CD1 and install FC2
 5) on install completion, where it says to remove the CD and it will
reboot, type ctrl-alt-F2 to get to a console
 6) chroot /mnt/sysimage
 7) stick back in the CD you booted from
 8) (replace hdc with your cdrom device if different) mount /dev/hdc
/mnt/source -t iso9660
 9) rpm -Uvf /mnt/source/kernel*
10) type ctrl-alt-F1 (for text mode install) or ctrl-alt-F7 (graphical
install) and complete the install
11) You should reboot into a working kernel
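
The shell part of the steps above (6, 8 and 9) can be generated for a given drive with a small dry-run helper. This is only a sketch that prints the commands to type at the console; it runs nothing. The function name and defaults are made up, and the rpm flag is spelled -Uvh as in comment 159, whereas the numbered steps write -Uvf.

```python
def post_install_commands(cdrom_device="/dev/hdc", mount_point="/mnt/source"):
    """Commands to type on the second console, in order, after the install finishes."""
    return [
        "chroot /mnt/sysimage",
        f"mount {cdrom_device} {mount_point} -t iso9660",
        f"rpm -Uvh {mount_point}/kernel*",
    ]


# Example for a CD drive on the secondary slave position.
for command in post_install_commands(cdrom_device="/dev/hdd"):
    print(command)
```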

Regards,

Gerald

Comment 170 Arjan van de Ven 2004-06-03 07:56:13 UTC

*** Bug 125139 has been marked as a duplicate of this bug. ***

Comment 171 Christian 2004-06-03 17:21:34 UTC

I have the same problem as Carsten (#168). Even with the changes
from Gerald (#169) I can't find my cdrom. I do have a hdc (hdc1 to
hdc32) but I can't mount anything (error message: could not find
/dev/hdc in /etc/mtab or /etc/fstab). What have I done wrong?

Comment 172 Arnaud MOURONVAL 2004-06-03 21:23:46 UTC

hda is "master device attached to primary IDE channel".
hdb is "slave device attached to primary IDE channel".
hdc is "master device attached to secondary IDE channel".
hdd is "slave device attached to secondary IDE channel".

Depending on the slot you attached your cdrom to, you have to adapt
the script to /dev/hdb or /dev/hdc or /dev/hdd.

check out the output of "dmesg". You should find which device your
cdrom is attached to. E.g. :
...
hdc: CD-224E, ATAPI CD/DVD-ROM drive
...
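
A sketch of the lookup described above: scan dmesg-style text for ATAPI CD/DVD lines and report which IDE slot each drive sits on. The sample line is the one quoted above; the regular expression assumes that exact message format, and the names are illustrative.

```python
import re

# Device name -> (IDE channel, position), as listed above.
IDE_SLOTS = {
    "hda": ("primary", "master"),
    "hdb": ("primary", "slave"),
    "hdc": ("secondary", "master"),
    "hdd": ("secondary", "slave"),
}

CD_LINE = re.compile(r"^(hd[a-d]): (.+), ATAPI CD/DVD-ROM drive$", re.MULTILINE)


def find_cd_drives(dmesg_text):
    """Return (device, model, channel, position) for each CD/DVD drive found."""
    return [(dev, model) + IDE_SLOTS[dev] for dev, model in CD_LINE.findall(dmesg_text)]


sample = "hda: some hard disk\nhdc: CD-224E, ATAPI CD/DVD-ROM drive\n"
for dev, model, channel, position in find_cd_drives(sample):
    print(f"/dev/{dev}: {model} ({channel} {position})")
```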

Comment 173 Christian 2004-06-04 01:37:03 UTC

Thx Arnaud for your answer, but ...
1. I knew that my CD-ROM is attached to hdc
2. I still have the problem

Comment 174 Carsten Groh 2004-06-04 07:09:55 UTC

Hi Christian,

After doing what Gerald suggested in #169 it works fine for 
me. Did you get other errors? Maybe an unreadable CD?

Regards Carsten

Comment 175 Glen Gray 2004-06-10 11:02:22 UTC

I'd be interested in hearing back some stability reports on use with
these C3 CPU's (Samuel 2/Ezra). I've had two occasions of serious
weird bugs with software crashing from back in my days with Antefacto
and more over a year ago when testing some IPC units. I always put
this down to rogue cmov instructions in packages that should have been
i386 though.

I experienced this FC2 install CD kernel bug myself last week when
testing an Ezra unit. 

Andy, now your mail server is up and running again, how's the
stability? I'd email you off bugzilla but don't know your email
address (and can't see how to get it off bugzilla). 

Cheers for fixing this issue. I'll get the boot cd and give that Ezra
another spin tomorrow.

Comment 176 Greg Mansfield 2004-06-10 11:40:59 UTC

I followed the procedure and installed FC2 on my VIA EPIA 5000 system.
(bug 125139).  Thanks to all for finding this so fast and giving me a
fix.

Comment 177 Andy Green 2004-06-10 11:46:37 UTC

Emails are right there at the top of each post, Glen 
(andy in my case). 
 
Right this second: 
 
 12:44:10 up 7 days,  3:36,  2 users,  load average: 0.38, 0.47, 
0.45 
 
It has frozen once since the new kernel, I don't know why or how to 
debug that so I just rebooted.  Before the recent FC2 kernels, the 
box had been up for months without a reset.  However it is a bit of 
a sealed unit codewise, it just runs a small set of apps and that's 
it. 
 
If you are having random app crashes on a machine without a stable 
history, I would be thinking about the RAM.

Comment 178 Carsten Groh 2004-06-15 07:16:08 UTC

Hi,

a new kernel has arrived: kernel-2.6.6-1.427.i586.rpm. Is this kernel 
ready to use on the C3 CPU boards, or do we need another 
user-compiled one? Any experiences?

Regards,

Carsten

Comment 179 Earl Terwilliger 2004-06-15 14:16:36 UTC

I ran:  
 
yum update 
 
and got kernel-2.6.6-1.435. It works for me! (see my previous comment 
for my config)

Comment 180 Eric Hedström 2004-06-16 17:30:15 UTC

kernel-2.6.6-1.435 (i586) works like a champ for me as well, on an
EPIA ME6000 board.

Many thanks to Andy and Ingo and Arjan for the troubleshooting and fix
and workaround ISO image. It was great to see, when trying to install
this sucker last night, that the problem had already been tracked down
and fixed. I followed the workaround in comment 169, except that I
didn't have a /mnt/source directory and so mounted to /mnt instead.

Comment 181 Kenneth Gage 2004-07-04 14:46:51 UTC

Hello -

I am using the workaround found in Comment 169 to install FC2 on a 
Compaq Presario laptop with a Cyrix chip (yes its a little old); I 
experienced the instant reboot problem as well.

When I press [ENTER] at the boot: prompt using the c3-boot-2 CD 
everything appears fine (uncompressing the kernel and no reboot 
issues) until a series of "hdc: lost interrupt" errors that 
functionally prevent installation.  I've tried booting 
with "nodma", "pci=noacpi", "hdc=nodma"; Any suggestions on how to 
get around this? If I've posted to the wrong forum please let me 
know - Thanks in advance!

Comment 182 Kenneth Gage 2004-07-04 17:20:15 UTC

Hi - Please ignore my previous post;

for some reason IRQ 14 was being assigned to both ide0 and ide1; once 
I passed the following to the kernel: ide1=0x170,0x376,15 at the boot 
prompt installation proceeds normally.
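
For anyone unsure what that boot parameter encodes, a small sketch that splits it into its three fields (I/O base, control port, IRQ). The parsing helper is hypothetical; the values 0x170/0x376/15 are the conventional legacy settings for the secondary IDE channel, with the primary channel normally on 0x1f0/0x3f6 and IRQ 14.

```python
def parse_ide_param(arg):
    """Split an 'ideN=base,ctl,irq' boot parameter into its fields."""
    name, _, values = arg.partition("=")
    base, ctl, irq = values.split(",")
    return {
        "interface": name,
        "base": int(base, 16),  # I/O base port
        "ctl": int(ctl, 16),    # control port
        "irq": int(irq),
    }


ide1 = parse_ide_param("ide1=0x170,0x376,15")
print(ide1["interface"], hex(ide1["base"]), hex(ide1["ctl"]), "irq", ide1["irq"])
```

Forcing IRQ 15 here is what stops ide1 from sharing IRQ 14 with ide0.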

Thanks for solving the Cyrix install issue!

Comment 183 jason hensler 2004-09-12 03:28:11 UTC

anyone know how to make this work with the dvd install????

Comment 184 Les Carter 2004-11-21 04:32:26 UTC

The workarounds described in this bug report do not work with the
following setup:

Motherboard : EPIA-V10000A
CPU : VIA C3 1GHz
Memory : 256MB ( 248M + 8M shared)

When trying the workaround either with c3boot.iso or c3boot-2.iso,
boot up is successful, but when asked for the installation medium
(choosing any of the options yields the same result) I get a blue
screen with the following message:

install exited abnormally -- received signal 11
sending termination signals...done
sending kill signals...done
<Tab>/<Alt-Tab> between elements  |  <Space>  seldisabling swap... screen
unmounting filesystems...
/proc/bus/usb done
/proc done
/dev/pts done
/sys done
/tmp/rawfs done
you may safely reboot your system


Any help with this would be greatly appreciated, I've tried 4
different distributions trying in desperation to get any flavour of
Linux up and running on this EPIA board without any success so far.

Cheers,

L

Comment 185 Lee Wilson 2004-11-21 10:05:05 UTC

Les,

Is this with the Graphical or Text install?

What is the last message given prior to that which you have already 
supplied, any mention of Anaconda? 

Also what version of Fedora/RHEL are you using? This bug was 
originally posted against FC2 I think, are you now using FC3?

Comment 186 Les Carter 2004-11-21 16:26:42 UTC

Hi Lee - I'm using an official RH pressed version of FC2 DVD (navy
blue background with white print).  What I described was what happens
when I try to use the graphical install, but strangely enough it
doesn't actually result in giving me a graphical install but instead
shows ncurses style text install :(  What's even worse is that if I
try the text mode install then I don't even get past the boot stage. 
The last thing that I see when I do a text install is:
------------

Greetings.
anaconda installer init version 10.0 starting
mounting /proc filesystem... done
mounting /dev/pts (unix98 pty) filesystem... done
mounting /sys filesystem... done
trying to remount root filesystem read write... done
mounting /tmp as ramfs... done
running install...
running /sbin/loader
install exited abnormally -- received signal 11
sending termination signals... done
sending kill signals...done
disabling swap
unmounting filesystems...
         /proc/bus/usb done
         /proc done
         /dev/pts done
         /sys done
         /tmp/ramfs done
you may safely reboot your system

------------


When I do a graphical mode install as I'd described in my first post
yes I do see the "anaconda installer init version 10.0 starting" just
before it goes into the blue screen text mode installer.  I select the
"English" option when prompted at the first install item "Choose a
Language", then choose "us" at the "Keyboard Type" question then if I
select any of the options "Local CDROM, Hard drive, NFS image, FTP,
HTTP" the installer immediately dies displaying nothing but a blue
screen (white text) with the text that I'd mentioned in my earlier
post but unlike the text in this post that was formatted against the
left hand side of the screen nicely, the text is kinda all over the
screen with different levels of tabbing.

Any help with this would be greatly appreciated, so far I've tried RH
FC2, SuSE Professional 9.2, Debian woody, Ubuntu, Minislack and
Mandrake and not been able to actually finish an installation :(

Comment 187 Les Carter 2004-11-21 16:39:47 UTC

For what it's worth, I also downloaded the FC2 CD ISOs last night.
I was going to put my money on the fact that it wouldn't make a
blind bit of difference, seeing as whichever of the install media
I selected it would always result in the "install exited
abnormally" shame spiral.  True to form, this problem exhibits the
same failed behaviour regardless of DVD or CD install media :(

Comment 188 Les Carter 2004-11-22 23:22:33 UTC

From what I can gather, the problem seems to be that the C3 Ezra CPU
that is on the motherboard doesn't support the "cmov" instruction that
other i686 CPUs do.  Unfortunately when GCC is in i686 mode it uses
the cmov instruction which isn't strictly speaking part of the i686
instruction set.

Apparently there are patches available for both GCC and for the kernel
to get around the problem, but then I'm scared that I'm going to have
to recompile everything in FC, which just sounds insane :(

Comment 189 Les Carter 2004-11-27 23:30:25 UTC

Looks like this was down to a faulty EPIA V10000A motherboard, should
be getting a replacement sent through the mail in the next few days
and will report back when that gets here.

Comment 190 k_paulsen 2005-08-27 00:40:54 UTC

Is that issue (... really ...) solved now ?

... I mean without downgrading to any old kernel ?  My RHEL4 (which has 2.6.9)
has the same problems on EPIA V8000 ... so doesn't seem fixed, right ?  I found
many sites referencing to that bug number/list here but is there any solution
for new kernels (>= 2.6.8) ???

Comment 191 Arjan van de Ven 2005-08-27 07:13:18 UTC

RHEL4 does not support Via C3 processors.

Comment 192 Arjan van de Ven 2005-08-27 07:14:54 UTC

(at least not the ones without the cmov instruction, which some newer via cpus have)

Comment 193 k_paulsen 2005-08-27 14:35:51 UTC

Ok, just to get it straight ... it's not necessarily only a kernel problem but
also a problem of the "native target architecture" of the distribution, right ?

I started RHEL4 with a 2.6.8_i586 kernel and it does NOT reboot after some
seconds, it just hangs at "switching to new root" which may be the point where
the first "not i586_kernel stuff" or the mentioned cmov comes, right ?

Can it be broken down to "if your distribution supports i586" then it should work
with a C3 processor (and of course with a i586 kernel) ?

Is there anywhere a list or something of C3s that support cmov ??

Comment 194 De Clarke, UCO/Lick Observatory 2005-10-03 21:12:42 UTC

Can anyone advise us on how to get a Via C3 through a diskless RH install?  We
have got one of the teeny cappuccinopc boxes with a Via cpu, trying to boot it
off an install CD image via dhcp/pxe/tftp.  we get as far as uncompressing
vmlinuz and Bang, off into infinite reboot land.

do we need a whole new set of cd images?  do we have to hand craft them or can
we download a "safe" version from rh?  we are not used to dealing with anything
outside the mainstream of intel cpus so this is new territory, we are newbies
and could do with some advice to help reduce time wasted trying stuff that
doesn't work :*(