Kernel does not handle FP trap properly on EV4 architecture. PC is not incremented at the time of the exception, causing kernel to look at the wrong instruction over and over again!

--- linux/arch/alpha/kernel/traps.c.old Wed Jun 7 16:26:42 2000
+++ linux/arch/alpha/kernel/traps.c Thu Aug 31 22:37:32 2000
@@ -417,8 +417,12 @@
 			/* EV4 does not implement anything except normal
 			   rounding.  Everything else will come here as
 			   an illegal instruction.  Emulate them.  */
-			if (alpha_fp_emul(regs.pc - 4))
-				return;
+			if (alpha_fp_emul(regs.pc))
+			{
+				/* Increment the PC on EV4 */
+				regs.pc += 4;
+				return;
+			}
 		}
 		send_sig(SIGILL, current, 1);
 		break;
Created attachment 3159: EV4 FPE Patch
EV4 and EV5+ architectures handle floating point traps differently:

On EV5, PC is incremented before the trap is executed.
On EV4, PC is not incremented before the trap is executed.

So on EV5, PC is pointing to the instruction AFTER the one needing emulation. On EV4, PC is pointing TO the instruction needing emulation.

The bug here is that the EV4 code assumes the EV5 behaviour, and so attempts to emulate the instruction PREVIOUS to the one needing emulation. Also, the code does not increment the PC, so it returns to the same FP instruction rather than to the next instruction to be executed. That causes another exception, which again emulates the wrong instruction, ad infinitum.
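To make the arithmetic in the comment above concrete, here is a minimal user-space sketch of the two conventions as described there. It is not kernel code: the enum and the function names `insn_to_emulate` and `resume_pc` are hypothetical, and the only assumption carried over from the patch is that an Alpha instruction is 4 bytes long.

```c
#include <assert.h>
#include <stdint.h>

/* Alpha instructions are fixed-width, 4 bytes each. */
#define ALPHA_INSN_SIZE 4u

/* Hypothetical names for the two behaviours described in the report. */
enum trap_pc_convention {
    TRAP_PC_AFTER_INSN, /* PC already advanced past the trapping insn */
    TRAP_PC_AT_INSN     /* PC still points at the trapping insn */
};

/* Address of the instruction that actually needs emulation. */
uint64_t insn_to_emulate(enum trap_pc_convention c, uint64_t trap_pc)
{
    assert((trap_pc & 3) == 0); /* instruction addresses are 4-byte aligned */
    return c == TRAP_PC_AFTER_INSN ? trap_pc - ALPHA_INSN_SIZE : trap_pc;
}

/* PC to resume at once emulation succeeded: the following instruction. */
uint64_t resume_pc(enum trap_pc_convention c, uint64_t trap_pc)
{
    return insn_to_emulate(c, trap_pc) + ALPHA_INSN_SIZE;
}
```

With `TRAP_PC_AT_INSN`, the instruction to emulate is at `trap_pc` and execution resumes at `trap_pc + 4`, which is what the patch does. With `TRAP_PC_AFTER_INSN`, the instruction is at `trap_pc - 4` and the saved PC is already the resume address, which is what the unpatched code assumes.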
I see there is some activity on this bug. I have just recently seen indications that this bug only occurs when the UDB is booted from the SRM console, but *not* when it is booted from the ARC console. Which I find bizarre and inexplicable. I should be able to do some testing tonight to verify whether this is really the case...
What SRM version? This is not a generic EV4 problem -- it does not happen on Avanti or Cabriolet for instance. My only guess is a PALcode bug, which would explain why MILO works and SRM doesn't. As such, the patch would break existing working systems. Perhaps we can come up with a patch to detect and work around the breakage though.
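One possible shape for such a detection, purely as a sketch and not a tested kernel change: look at the instruction words on either side of the trap PC and see which one is a floating-point operate instruction. The function and enum names are hypothetical. The sketch assumes the major opcode sits in bits <31:26> and that opcodes 0x15 to 0x17 (FLTV, FLTI, FLTL) are the FP operate groups of interest. When both neighbouring instructions are FP operates, the heuristic cannot decide and says so.

```c
#include <stdint.h>

/* Hypothetical result of the heuristic. */
enum pc_guess { PC_ADVANCED, PC_NOT_ADVANCED, PC_UNKNOWN };

/* The Alpha major opcode occupies bits <31:26> of the instruction word. */
unsigned alpha_opcode(uint32_t insn)
{
    return insn >> 26;
}

/* Assumption: 0x15..0x17 (FLTV, FLTI, FLTL) are the FP operate opcodes. */
int is_fp_operate(uint32_t insn)
{
    unsigned op = alpha_opcode(insn);
    return op >= 0x15 && op <= 0x17;
}

/*
 * Guess whether the saved PC was advanced past the trapping instruction,
 * given the instruction words at pc - 4 and at pc.
 */
enum pc_guess guess_trap_pc(uint32_t insn_before_pc, uint32_t insn_at_pc)
{
    int before = is_fp_operate(insn_before_pc);
    int at = is_fp_operate(insn_at_pc);

    if (before && !at)
        return PC_ADVANCED;     /* FP insn is at pc - 4 */
    if (!before && at)
        return PC_NOT_ADVANCED; /* FP insn is at pc itself */
    return PC_UNKNOWN;          /* both or neither: cannot tell */
}
```

The `PC_UNKNOWN` case matters in practice, since back-to-back FP instructions are common, so a real workaround would need a fallback (for example a one-time probe at boot) rather than relying on this check alone.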
The problem is showing up on the Multia (VX40) with a 166 MHz CPU, 80 MB of memory, and a 330 MB SCSI drive. The SRM version: BL5 V3.8-3, built Aug 10 1995 at 03:22:55
Resolved in later base kernels. The fix is dependent on SRM and other Alpha magic, but it is resolved.