Description of problem:
On boot I get a panic...

Last data seen on the system console:

EIP is at ia64_leave_kernel [kernel] 0x1 (2.4.21-27.EL)
psr : 0000121008022018 ifs : 8000000000000002 ip : [<e00000000440ea21>] Not Tainted
unat: 0000000000000000 pfs : 0000000000000002 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr : 00000241a595a6a7
ldrs: 0000000000000000 ccv : 0000000080000000 fpsr: 0009804c0270033f
b0 : e00000000440ea20 b6 : e0000000044031d0 b7 : e0000000044151e0
f6 : 1003ea0a0a0a0a0a0a0a1 f7 : 1003e0000000000000000
f8 : 1003e000000000001708d f9 : 1003e0000000000024b61
r1 : e000000004cb7d00 r2 : 0000000000000000 r3 : e00000003b01003c
r8 : a000000000008000 r9 : e000000004a08008 r10 : 0000000000000000
r11 : e000000004b78ee0 r12 : e00000003b010650 r13 : e00000003b010000
r14 : 000000000009a8ca r15 : e000000004a080b0 r16 : e000000004b83594
r17 : e00000007feb0000 r18 : e000000004a080b0 r19 : e00000003b012150
r20 : e00000003b011000 r21 : 0000000000001000 r22 : e000000004a08000
r23 : 0000000000000000 r24 : 0000000000000000 r25 : 0000000000000000
r26 : 0000000000004000 r27 : 0000000000000000 r28 : e0000000044151e0
r29 : e000000004994648 r30 : e000000004ac17e0 r31 : e000000004ac17b0

Call Trace: [<e000000004415960>] sp=0xe00000003b0120f8 bsp=0xe00000003b0120e0 show_stack [kernel] 0x80
<0>Kernel panic: Attempted to kill the idle task!
in idle task - not syncing

Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
Actual results:
Expected results:
Additional info:
Created attachment 111509 [details]
We think the kernel thread should always have the IA-64 address limits

A note from Gautham Swamy at EMC: This looks like an issue where a 32-bit process (powermt) is trying to execute the kernel_thread function while the IA-32 address limits are still in force. We think the kernel thread should always have the IA-64 address limits. Modifying the ia64/kernel/process.c code as shown in the attachment seems to fix the issue.
Does this happen on more than one machine?
Yes, we have seen it on multiple Itanium systems:

1) DELL PE7250 - ia64, RHEL 3.0 U4 (lk 2.4.21-27.EL)
   PowerPath (EMC application) 4.3.2
   FC HBA QLA2342, driver version 7.01.01 integrated in kernel
2) Intel white box - ia64, RHEL 3.0 U4 (lk 2.4.21-27.EL)
3) Integrity rx8620 Partition [ia64]
   Operating System - RHEL 3 U4 AS
   kernel - 2.4.21-27.0.2
   Powerpath version - 4.3.2
   HBAs - LP9802 / LP1050
   Driver - v7.1.14
Created attachment 111546 [details]
Panic output from serial port

This attachment is from the HP system and shows the stack traces after the application failed.
This only happens in U4. RHEL 3.0 U3 worked OK.
Hmm, so does the patch suggested in comment #1 completely resolve this issue? The comment says 'seems to fix the issue'...
I have not tried it. I will try to get more information from Gautham Swamy at EMC. Meanwhile, does the patch make sense to you, Jason?
We have done unit testing and found the fix to be working for Red Hat Update 4 kernels. Can we submit this patch to the general community?
I'm not sure the community will accept this patch. Kernel code is assumed to be 64-bit for ia64, and thus code like that in comment #1 is not present. Is it possible to re-compile the module for 64-bit?
Response from Swamy Gautham:

Mar 11 2005 6:18AM Gautham Swamy:
These code changes are present in 2.4.X kernels where X > 21 and in the 2.6 kernel. The issue here is not one of having to re-compile any of our modules for 64-bit. Our kernel modules/drivers are 64-bit, whereas our user-space components/libs are 32-bit, which is OK since 32-bit applications can legally be run on the IA64 kernel.

request_module() is being invoked on behalf of the 32-bit application (powermt) during a system call, so the 32-bit application is calling into the 64-bit kernel. The limits in current (the task struct of the 32-bit application process, which therefore has 32-bit limits) may have to be changed to 64-bit limits, especially when spawning kernel threads, as request_module() seems to do: it spawns a kernel thread for modprobe, which is going to be a 64-bit kernel thread anyhow. Since this kernel thread is spawned directly or indirectly and is going to be 64-bit, the address limits need to be changed appropriately if current is a 32-bit process. Thus this change is needed:

+#ifdef CONFIG_IA32_SUPPORT
+    if (IS_IA32_PROCESS(ia64_task_regs(current))) {
+        /* A kernel thread is always a 64-bit process. */
+        current->thread.map_base = DEFAULT_MAP_BASE;
+        current->thread.task_size = DEFAULT_TASK_SIZE;
+        ia64_set_kr(IA64_KR_IO_BASE, current->thread.old_iob);
+        ia64_set_kr(IA64_KR_TSSD, current->thread.old_k1);
+    }
+#endif
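The effect of the change quoted above can be pictured with a small user-space model. This is only a sketch under stated assumptions: `struct model_thread`, `model_force_64bit_limits`, and the `MODEL_*` constants are hypothetical stand-ins, the constant values are illustrative rather than copied from kernel headers, and the two `ia64_set_kr()` restores from the patch are left out because they have no user-space equivalent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative placeholder values, NOT taken from the kernel headers. */
#define MODEL_IA32_MAP_BASE     0x0000000040000000ULL
#define MODEL_IA32_TASK_SIZE    0x00000000c0000000ULL
#define MODEL_DEFAULT_MAP_BASE  0x2000000000000000ULL
#define MODEL_DEFAULT_TASK_SIZE 0xa000000000000000ULL

/* Hypothetical, simplified stand-in for the per-task fields the patch touches. */
struct model_thread {
    bool     is_ia32;   /* plays the role of IS_IA32_PROCESS() */
    uint64_t map_base;
    uint64_t task_size;
};

/*
 * Give a task that is about to run kernel-thread code the 64-bit limits.
 * Tasks that are already 64-bit are left untouched, as in the patch.
 */
void model_force_64bit_limits(struct model_thread *t)
{
    if (t->is_ia32) {
        t->map_base = MODEL_DEFAULT_MAP_BASE;
        t->task_size = MODEL_DEFAULT_TASK_SIZE;
    }
}
```

In this model a task that entered the kernel as a 32-bit process ends up with the 64-bit `map_base` and `task_size` before the kernel thread runs, which is the behaviour the patch asks for.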
OK, I didn't think of this case, where userspace can cause a kernel thread to be spawned, such as in the modprobe case. Thanks for clarifying this. It looks like a reasonable request. I'll have to look at the code a little closer though, because I don't see how the limits are turned back unless this code is already running in the context of the newly created thread. Would it be possible to produce a 'diff' with line numbers, so I can see the context more clearly? Thanks.
Created attachment 111894 [details] bk commit to force 64-bit threads
I will try this patch today or Monday.
Any testing results?
Sorry for the delay. I am having an issue with my system (I was using it for other purposes), and I need to reload the OS and the other applications I had on it today. I hope to give you some info by the end of today.
Created attachment 112064 [details]
Last part of the output from make vmlinux

I cannot rebuild the kernel with just the above change to process.c.
I tried it again by recreating the source directory and now the build works. I will let you know if the patch works soon.
After applying the BitKeeper patch supplied above and rebuilding the kernel, I was able to run the following commands without an OOPS:

[root@l82bi237 PowerPath]# service PowerPath start
Starting PowerPath:
[root@l82bi237 PowerPath]# powermt display
Symmetrix logical device count=0
CLARiiON logical device count=176
==============================================================================
----- Host Bus Adapters ---------  ------ I/O Paths -----  ------ Stats ------
###  HW Path                        Summary   Total   Dead  IO/Sec Q-IOs Errors
==============================================================================
   2 QLogic Fibre Channel 2300      optimal      52      0       -     0      0
   3 QLogic Fibre Channel 2300      optimal      52      0       -     0      0
   4 iSCSI 3.6.1 (22-Sep-2004)      optimal     150      0       -     0      0
[root@l82bi237 PowerPath]# service PowerPath stop
Stopping PowerPath:
[root@l82bi237 PowerPath]# service PowerPath stop
Stopping PowerPath:
[root@l82bi237 PowerPath]# service PowerPath stop
Stopping PowerPath:
[root@l82bi237 PowerPath]#
______________________________________________

The system is still functional after the above steps, so the patch works GREAT!

What is the next step? Do we get an RPM released to fix this problem? We do not expect our customers to rebuild the kernel.
U5 closed a few weeks ago (and is already in beta). It is likely that a fix for this problem will be incorporated into U6 (which hasn't yet opened). I will update this BZ when a fix has been committed.
This will be a problem for us. Does it mean that our PowerPath software product will not be supported on IA64 systems until August? Is it possible to add this minor change to the kernel's process.c as a hotfix RPM? Also, as the submitter of this problem, why am I not allowed to change the priority of this issue to High? This causes an OOPS!
Hello, Lucio. I haven't seen any U6 schedules, so I don't know the answer to your first question. As far as hot fix kernels go, yes, this is a possibility (but please work this through Customer Support after a U6-stream development kernel has been built with an appropriate fix). As for your last question, I think it's the "severity" field that is supposed to represent the impact from a bug, and that's already set to "high".
Ernie, thanks for your clarifications. We will pursue this through customer support.
Lucio, can you exec into 64-bit prior to the call? This should work around the problem. Also, this has been broken prior to U4.
I do not understand the workaround. Can you be more specific? It seems to me you are asking to change the application. There are times where we do not have control of this. -Lu
I think Rob was suggesting the possibility of 'exec'ing a 64-bit modprobe process from the application, and thus working around this problem for the time being. But as you point out, depending on how the code is structured, it may be difficult.
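A minimal sketch of that workaround, assuming the application knows in advance which module it needs and that a natively built modprobe is installed at /sbin/modprobe (both are assumptions, not stated in this report). `run_helper` and `load_module_natively` are hypothetical names; only the standard `fork`, `execv`, and `waitpid` calls are used.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork, exec a helper binary in the child, and wait for it.
 * Returns the helper's exit status, or -1 if it could not be run
 * or did not exit normally.
 */
int run_helper(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);
        _exit(127);             /* only reached if execv failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/*
 * Hypothetical use: load a module up front through the native modprobe,
 * so the kernel has no reason to spawn modprobe itself on behalf of the
 * 32-bit caller. The path is an assumption.
 */
int load_module_natively(const char *module)
{
    char *const argv[] = { "modprobe", (char *)module, NULL };
    return run_helper("/sbin/modprobe", argv);
}
```

Whether this is practical depends, as noted above, on how much control there is over the application that triggers the module load.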
*** Bug 155244 has been marked as a duplicate of this bug. ***
(In reply to comment #30)
> *** Bug 155244 has been marked as a duplicate of this bug. ***

I am trying to modify the file /usr/src/linux-2.4.21-27.EL//arch/ia64/kernel/process.c. I see CONFIG_IA32_SUPPORT in 3 places in the file:

1) Lines 221-224:
#ifdef CONFIG_IA32_SUPPORT
    if (IS_IA32_PROCESS(ia64_task_regs(task)))
        ia32_load_state(task);
#endif

2) Lines 353-363:
#ifdef CONFIG_IA32_SUPPORT
    /*
     * If we're cloning an IA32 task then save the IA32 extra
     * state from the current task to the new task
     */
    if (IS_IA32_PROCESS(ia64_task_regs(current))) {
        ia32_save_state(p);
        if (clone_flags & CLONE_SETTLS)
            ia32_clone_tls(p, child_ptregs);
    }
#endif

3) Lines 196-199:
#ifdef CONFIG_IA32_SUPPORT
    if (IS_IA32_PROCESS(ia64_task_regs(task)))
        ia32_save_state(task);
#endif

I am going to add the code mentioned in your resolution at each of these places, so that they read as follows:

1)
#ifdef CONFIG_IA32_SUPPORT
    if (IS_IA32_PROCESS(ia64_task_regs(task))) {
        task->thread.map_base = DEFAULT_MAP_BASE;
        task->thread.task_size = DEFAULT_TASK_SIZE;
        ia64_set_kr(IA64_KR_IO_BASE, task->thread.old_iob);
        ia64_set_kr(IA64_KR_TSSD, task->thread.old_k1);
        ia32_load_state(task);
    }
#endif

2)
#ifdef CONFIG_IA32_SUPPORT
    /*
     * If we're cloning an IA32 task then save the IA32 extra
     * state from the current task to the new task
     */
    if (IS_IA32_PROCESS(ia64_task_regs(current))) {
        task->thread.map_base = DEFAULT_MAP_BASE;
        task->thread.task_size = DEFAULT_TASK_SIZE;
        ia64_set_kr(IA64_KR_IO_BASE, task->thread.old_iob);
        ia64_set_kr(IA64_KR_TSSD, task->thread.old_k1);
        ia32_save_state(p);
        if (clone_flags & CLONE_SETTLS)
            ia32_clone_tls(p, child_ptregs);
    }
#endif

3)
#ifdef CONFIG_IA32_SUPPORT
    if (IS_IA32_PROCESS(ia64_task_regs(task))) {
        task->thread.map_base = DEFAULT_MAP_BASE;
        task->thread.task_size = DEFAULT_TASK_SIZE;
        ia64_set_kr(IA64_KR_IO_BASE, task->thread.old_iob);
        ia64_set_kr(IA64_KR_TSSD, task->thread.old_k1);
        ia32_save_state(task);
    }
#endif

Please let me know if I am doing the right thing. This is with reference to bug 155244.
Created attachment 113480 [details] rhel4 based patch against rhel4
if anybody has a test case to reproduce this issue, that would be helpful.
The test case I have involves EMC PowerPath, but I believe the problem is due to the following:

1) A 32-bit application program makes a call into the kernel, and the kernel decides to switch tasks, possibly because the current task needs to pend (I do not know why).
2) The process.c path does not realize that a 32-bit task came into the kernel, so the addressing is still limited to the 32-bit range, but the switched-to kernel task has a 64-bit range.

The result is an access to a 32-bit address in 64-bit space, and the result of that is an OOPS.

Does this make sense?
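One way to picture the hypothesis above is a range check against a per-task limit. The sketch below is a toy model, not kernel code: `model_range_ok` and the `MODEL_LIMIT_*` values are hypothetical and chosen only for illustration. It shows how an address that is valid under a 64-bit limit is rejected when a stale 32-bit limit is still in force.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative limits, NOT taken from the kernel headers. */
#define MODEL_LIMIT_32BIT 0x0000000100000000ULL  /* 4 GiB */
#define MODEL_LIMIT_64BIT 0xa000000000000000ULL

/*
 * True if the range [addr, addr + len) lies entirely below `limit`.
 * Written so that the comparison cannot overflow.
 */
bool model_range_ok(uint64_t addr, uint64_t len, uint64_t limit)
{
    return len <= limit && addr <= limit - len;
}
```

For example, under these assumed values a 4096-byte access at 0x2000000000001000 passes the check with `MODEL_LIMIT_64BIT` but fails it with `MODEL_LIMIT_32BIT`, which mirrors a 64-bit kernel thread running with limits inherited from a 32-bit task.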
yup. This is what's happening. I developed my own simple reproducer for this issue. And the patch did indeed resolve this issue. thanks.
This is an update on bug 155244, which has been marked as a duplicate of this one. The fix seems to be working for me to a certain extent: I am no longer getting kernel panics. But I am facing some other issues. One observation is that the ldd command, which is supposed to list all the shared objects linked to a particular executable, is not working properly. For example, ldd ./rpcd returns "not a dynamic executable". I tried several other executables and got the same error. I investigated further and loaded 2.4.21-27 without the fix, and I get the same error there. I guess there is something else missing in the kernel.
Are you sure that 'rpcd' not being a dynamic executable is a problem? The 'not a dynamic executable' message is not necessarily an error. It simply means that the executable does not depend on external dynamically linked libraries. Is there an error condition besides the 'not a dynamic executable' messages that is occurring?
I guess I have figured out why ldd is not working for me. We are running our application in emulated mode, as a 32-bit application on a 64-bit platform. We pick up some of the libraries from /emul/ia32-linux/lib. ldd is a script which by default does not check this directory; a small change in ldd would make this happen. Line number 30 should look like

RTLDLIST="/emul/ia32-linux/lib/ld-linux.so.2 /lib/ld-linux-ia64.so.2 /lib/ld-linux.so.2"

instead of

RTLDLIST="/lib/ld-linux-ia64.so.2 /lib/ld-linux.so.2"

I guess this fix should be added in the next release.
Jason, has this fix made it into U6 and can you tell me when it will make it into RHEL 4.0? Thank you, Wayne.
This fix is indeed queued for U6. If it's needed sooner, you can go through support. The fix was included in the RHEL4 GA kernel.
(In reply to comment #31) Has this been tested? Does it correct spawning problems? I tried to use the code and for 32-bit forks I get a segmentation violation with the changes in. Without the changes I can have a 32-bit process fork a 32-bit process. But when the second one forks a third, the third's thread processing fails.
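The fork scenario described above (a process forks a second process, which forks a third) could be exercised with a small test program along these lines. This is a hedged sketch, not the reproducer mentioned earlier in this report: `fork_chain` is a hypothetical helper, and it only says something about this bug if it is built as a 32-bit IA-32 binary and run on an affected ia64 kernel.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Create a chain of `depth` nested child processes. Each process forks
 * one child and waits for it. Returns 0 if every descendant exited
 * cleanly, 1 otherwise (for example if one was killed by a signal).
 */
int fork_chain(int depth)
{
    if (depth <= 0)
        return 0;

    pid_t pid = fork();
    if (pid < 0)
        return 1;
    if (pid == 0)
        _exit(fork_chain(depth - 1));   /* the child continues the chain */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return 1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```

Calling `fork_chain(2)` from the first process produces the second and third processes of the scenario; a segmentation violation in any descendant shows up as a non-zero return value.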
A fix for this problem has just been committed to the RHEL3 U6 patch pool this evening (in kernel version 2.4.21-32.3.EL).
*** Bug 164226 has been marked as a duplicate of this bug. ***
Is this fix being considered for porting and inclusion in RHEL 4.0?
this is not a rhel4 issue. We already have the correct code there.
Okay - thanks.
I have tested RHEL 3.0 U6 beta (lk 2.4.21-35) with EMC PowerPath 4.3.3b10 and no longer see the panic. Tracking to release of RHEL 3.0 U6. Regards, Wayne.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-663.html