We have some hardware compilers provided by third parties that started failing after we updated the kernel from kernel-smp-2.6.11-1.27_FC3.i686.rpm to kernel-smp-2.6.11-1.35_FC3.i686.rpm.

These problems occur every time when running 2.6.11-1.35 with default values. If, when running 2.6.11-1.35, we set /proc/sys/kernel/exec-shield to 0 or 1 (1 was the default in 2.6.11-1.27), these problems do not occur. Booting the machine with 2.6.11-1.27 also fixes these problems, regardless of the value in /proc/sys/kernel/exec-shield.

We don't have sources for these applications. If it helps, they use j2re v1.4.1.

Sample output:

With kernel-smp-2.6.11-1.35_FC3.i686.rpm:

$ fplc -o route.fpo -s route.fps ../../np/microcode/route.fpl
/sw/tools/agere-3.3.0.78/fplc: line 10: 31226 Segmentation fault "/usr/local/j2re-1.4.1/bin/java" -Dagere.log_level="$LOG_LEVEL" -Dinstall.root="/sw/tools/agere-3.3.0.78" -classpath "/sw/tools/agere-3.3.0.78/lib:/sw/tools/agere-3.3.0.78/lib/agere.jar:/sw/tools/agere-3.3.0.78/lang:$CLASSPATH:/tmp" -Xms"$MINHEAPSIZE" $MAXHEAPSIZE com.agere.fplc.fplc "$@"
$

With kernel-smp-2.6.11-1.27_FC3.i686.rpm:

$ fplc -o route.fpo -s route.fps ../../np/microcode/route.fpl
[Info] Agere FPL Compiler 3.3.0.78(Thu Jul 1 15:10:56 CDT 2004) swbuilds
[Info] Preprocessing Stage
[Info] Compiling ../../np/microcode/route.fpl for processor -- APP500
[Info] Compilation Stage
[Info] Output Generation Stage
[Info] Compilation Done
$

Additional info:

$ uname -a
Linux XXXXX 2.6.11-1.35_FC3smp #1 SMP Mon Jun 13 01:17:35 EDT 2005 i686 i686 i386 GNU/Linux
$
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.20GHz
stepping : 1
cpu MHz : 3201.365
cache size : 1024 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm pni monitor ds_cpl cid cx16 xtpr
bogomips : 6340.60

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.20GHz
stepping : 1
cpu MHz : 3201.365
cache size : 1024 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm pni monitor ds_cpl cid cx16 xtpr
bogomips : 6389.76
$

Defaults for kernel-smp-2.6.11-1.35_FC3.i686.rpm:

$ grep '' /proc/sys/kernel/exec*
/proc/sys/kernel/exec-shield:2
/proc/sys/kernel/exec-shield-randomize:1
$

Defaults for kernel-smp-2.6.11-1.27_FC3.i686.rpm:

$ grep '' /proc/sys/kernel/exec*
/proc/sys/kernel/exec-shield:1
/proc/sys/kernel/exec-shield-randomize:1
$
I have a similar exec-shield problem with a library application that I don't have the source for. It got stuck consuming all CPU power and, what's worse, stopped my MTA (sendmail[3872]: rejecting connections on daemon MTA: load average: 27).

Interesting part of strace:

open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=39544576, ...}) = 0
mmap2(NULL, 2097152, PROT_READ, MAP_PRIVATE, 4, 0) = 0xb7d60000
mmap2(NULL, 184320, PROT_READ, MAP_PRIVATE, 4, 0xb78) = 0xb7d33000
mmap2(NULL, 28672, PROT_READ, MAP_PRIVATE, 4, 0xbc2) = 0xb7d2c000
close(4) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
sigreturn() = ? (mask now [RTMIN])
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
sigreturn() = ? (mask now [RTMIN])
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
sigreturn() = ? (mask now [RTMIN])
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
sigreturn() = ? (mask now [RTMIN])
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
sigreturn() = ? (mask now [RTMIN])
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
sigreturn() ....

See also: bug #162182 and bug #162329
FWIW, the 2.6.12 based kernel in updates-testing resets the default exec-shield setting to '1' again. Hopefully that update will be going live soon.
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem. If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'. Thank you.
I tried kernel 2.6.12-1.1372, but it fails to boot. See BugID # 163917 for the failure details. We are not planning to update to FC4 for the time being.
Ok, can you test that setting /proc/sys/kernel/exec-shield to 1 makes things work again?
Yes, that makes things work fine, as I had mentioned in the original bug report. Setting it to 0 also works fine.
ok, just go with that until we get an mkinitrd update pushed out, which should resolve 163917, allowing you to update to the fixed kernel.
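For reference, the workaround discussed above could be applied along these lines. This is only a sketch: the function names show_exec_settings and set_exec_setting are made up for illustration, and the optional root-directory argument exists only so the helpers can be tried against a scratch directory instead of the real /proc/sys.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any Fedora tool.

# Print whichever of the exec-shield related settings exist under the
# given sysctl root (default: /proc/sys).
show_exec_settings() {
    root="${1:-/proc/sys}"
    for name in exec-shield exec-shield-randomize randomize_va_space; do
        f="$root/kernel/$name"
        if [ -r "$f" ]; then
            printf '%s=%s\n' "$name" "$(cat "$f")"
        fi
    done
}

# Write a value to one setting: set_exec_setting NAME VALUE [ROOT].
# On the real /proc/sys this needs root privileges.
set_exec_setting() {
    root="${3:-/proc/sys}"
    printf '%s\n' "$2" > "$root/kernel/$1"
}
```

Run as root, `set_exec_setting exec-shield 1` followed by `show_exec_settings` should report the value that was the default under 2.6.11-1.27. A value written this way does not survive a reboot, so it has to be reapplied after each boot until the fixed kernel is installed.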
Now that the mkinitrd update has been pushed out, I tried kernel 2.6.12-1.1372, but it fails a little differently than 2.6.11-1.35. With 2.6.11-1.27, everything works perfectly. With 2.6.11-1.35, each invocation of the compiler would fail. With 2.6.12-1.1372, a few invocations of the compiler fail when doing multiple simultaneous compiles. It seems that setting kernel.randomize_va_space=0 helps. How about if the exec-shield/randomization patch is removed or disabled? It is obviously not ready for prime time.
That sounds like a bug with the compiler. Please file a separate bug. The randomisation code is now upstream btw.
Sorry, but I don't understand what you want me to do. File a separate bug for what? Using the older kernel the app works flawlessly. Using the newer kernel the app fails. Turning off exec-shield and the randomization makes the app work again. This is not a problem with the application but with the kernel.
There should be no reason for an application to fail due to randomisation. If it goes away when you disable it, the application is faulty.
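A rough way to see what randomisation actually changes is to compare the stack mapping of two freshly started processes. The sketch below assumes the kernel labels the stack mapping as [stack] in /proc/<pid>/maps; the function name stack_range and its optional file argument are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the address range of the [stack] mapping
# listed in a maps file (default: the maps of the awk process itself).
stack_range() {
    awk '/\[stack\]/ { print $1 }' "${1:-/proc/self/maps}"
}
```

Calling `stack_range` twice in a row starts two separate awk processes. With randomisation enabled the two ranges would normally differ, and with kernel.randomize_va_space=0 they would normally be the same. A program that only works in the second case is depending on a fixed address layout.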
And why would that be? The fact that randomization of the user space introduces a problem means that the problem lies with how the randomization is done or what it does, not with the app that fails. The proof is in the fact that the app works with previous kernels but doesn't work with newer kernels. The problem is the kernel, not the app.