Bug 149965 - panic at ia64_leave_kernel [kernel] 0x1 (2.4.21-27.EL)
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: ia64 Linux
Priority: medium
Severity: high
Assigned To: Jason Baron
Brian Brock
Blocks: 156321
Reported: 2005-02-28 19:24 EST by Lucio DiGiovanni
Modified: 2013-03-06 00:58 EST (History)
Fixed In Version: RHSA-2005-663
Doc Type: Bug Fix
Last Closed: 2005-09-28 10:49:01 EDT

Attachments
We think the kernel thread should always have the IA 64 address limits (890 bytes, text/plain)
2005-02-28 19:33 EST, Lucio DiGiovanni
Panic output from serial port (159.98 KB, text/plain)
2005-03-01 14:50 EST, Lucio DiGiovanni
bk commit to force 64-bit threads (1.23 KB, text/plain)
2005-03-11 11:59 EST, Jason Baron
Last part of the output from make vmlinux (2.94 KB, text/plain)
2005-03-16 19:44 EST, Lucio DiGiovanni
rhel4 based (626 bytes, patch)
2005-04-21 12:51 EDT, Jason Baron

Description Lucio DiGiovanni 2005-02-28 19:24:42 EST
Description of problem:
On boot I get a panic. Last data seen on the system console:

EIP is at ia64_leave_kernel [kernel] 0x1 (2.4.21-27.EL)
psr : 0000121008022018 ifs : 8000000000000002 ip : [<e00000000440ea21>] Not Tainted
unat: 0000000000000000 pfs : 0000000000000002 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 00000241a595a6a7
ldrs: 0000000000000000 ccv : 0000000080000000 fpsr: 0009804c0270033f 
b0  : e00000000440ea20 b6  : e0000000044031d0 b7  : e0000000044151e0
f6  : 1003ea0a0a0a0a0a0a0a1 f7  : 1003e0000000000000000
f8  : 1003e000000000001708d f9  : 1003e0000000000024b61
r1  : e000000004cb7d00 r2  : 0000000000000000 r3  : e00000003b01003c
r8  : a000000000008000 r9  : e000000004a08008 r10 : 0000000000000000
r11 : e000000004b78ee0 r12 : e00000003b010650 r13 : e00000003b010000
r14 : 000000000009a8ca r15 : e000000004a080b0 r16 : e000000004b83594
r17 : e00000007feb0000 r18 : e000000004a080b0 r19 : e00000003b012150
r20 : e00000003b011000 r21 : 0000000000001000 r22 : e000000004a08000
r23 : 0000000000000000 r24 : 0000000000000000 r25 : 0000000000000000
r26 : 0000000000004000 r27 : 0000000000000000 r28 : e0000000044151e0
r29 : e000000004994648 r30 : e000000004ac17e0 r31 : e000000004ac17b0

Call Trace: [<e000000004415960>] sp=0xe00000003b0120f8 bsp=0xe00000003b0120e0 show_stack [kernel] 0x80
<0>Kernel panic: Attempted to kill the idle task!
in idle task - not syncing




Comment 1 Lucio DiGiovanni 2005-02-28 19:33:51 EST
Created attachment 111509 [details]
We think the kernel thread should always have the IA 64 address limits

A note from Gautham Swamy at EMC:
This looks like an issue where a 32-bit process (powermt) is trying to
execute the kernel_thread function while still having IA-32 address limits
in force.
We think the kernel thread should always have the IA-64 address limits.
Modifying the ia64/kernel/process.c code as below seems to fix the issue.
Comment 2 Suzanne Hillman 2005-03-01 13:07:49 EST
Does this happen on more than one machine?
Comment 3 Lucio DiGiovanni 2005-03-01 14:42:58 EST
Yes,
We have seen it on multiple Itanium systems:
1) DELL PE7250 - ia64, RHEL 3.0 U4 (lk 2.4.21-27.EL)
   PowerPath (EMC application) 4.3.2
   FC HBA QLA2342 driver version 7.01.01 integrated in kernel

2) Intel white box - ia64, RHEL 3.0 U4 (lk 2.4.21-27.EL), 

3) Integrity rx8620 Partition [ia64]
Operating System - RHEL 3 U4 AS
kernel - 2.4.21-27.0.2
Powerpath version - 4.3.2 

HBAs - LP9802 / LP1050
Driver - v7.1.14
Comment 4 Lucio DiGiovanni 2005-03-01 14:50:03 EST
Created attachment 111546 [details]
Panic output from serial port

This attachment is from the HP system and shows the stack traces
after the application failed.
Comment 6 Lucio DiGiovanni 2005-03-09 09:49:53 EST
This only happens in U4
RHEL 3.0 U3 worked OK
Comment 7 Jason Baron 2005-03-09 10:00:26 EST
Hmmm, so does the patch suggested in comment #1 completely resolve this issue?
The comment says 'seems to fix the issue'...

Comment 8 Lucio DiGiovanni 2005-03-09 11:14:42 EST
I have not tried it.
I will try to get more information from Gautham Swamy at EMC

Meanwhile, does the patch make sense to you, Jason?
Comment 9 Lucio DiGiovanni 2005-03-09 12:28:13 EST
We have done unit testing and found the fix to be working for Red Hat Update 4
kernels.

Can we submit this patch to the general community?
Comment 10 Jason Baron 2005-03-09 14:07:13 EST
I'm not sure the community will accept this patch. Kernel code is assumed to be
64-bit for ia64, and thus code like that in comment #1 is not present. Is it
possible to re-compile the module for 64-bit?
Comment 13 Lucio DiGiovanni 2005-03-11 07:56:29 EST
Response from Swamy Gautham:


Mar 11 2005  6:18AM Gautham Swamy:
These code changes are present in 2.4.X kernel where X > 21 and 2.6
kernel.

The issue here is not of having to re-compile any of our modules for
64-bit. Our kernel modules/driver are 64-bit. Whereas our user-space
components/libs are 32-bit, which is Ok since 32-bit applications
can be legally run on the IA64 kernel.

request_module() is being invoked by the 32-bit application (powermt).
Since request_module() is reached through a system call, the 32-bit
application is calling into the 64-bit kernel. The limits in the current
task struct correspond to the 32-bit application process (and are thus
32-bit limits), so they may have to be changed to 64-bit limits,
especially if kernel threads are being spawned, as seems to happen from
request_module() (i.e. it spawns a kernel thread for modprobe, which is
going to be a 64-bit kernel thread anyhow).

Since this kernel thread is directly or indirectly spawned and is going
to be 64-bit, the address limits need to be changed appropriately if
current is a 32-bit process under execution.

Thus this change is needed:

+#ifdef CONFIG_IA32_SUPPORT
+		if (IS_IA32_PROCESS(ia64_task_regs(current))) {
+			/* A kernel thread is always a 64-bit process. */
+			current->thread.map_base  = DEFAULT_MAP_BASE;
+			current->thread.task_size = DEFAULT_TASK_SIZE;
+			ia64_set_kr(IA64_KR_IO_BASE, current->thread.old_iob);
+			ia64_set_kr(IA64_KR_TSSD, current->thread.old_k1);
+		}
+#endif
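
The effect of the hunk above can be illustrated outside the kernel with a
small user-space model. This is only a sketch: the model_* names and the
MODEL_* constants are hypothetical stand-ins rather than kernel identifiers,
the limit values are illustrative, and it assumes the hunk runs in the
context of the newly created kernel thread, which this comment does not
state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the per-thread limits; the values are
 * illustrative only and are not taken from the kernel headers. */
#define MODEL_IA32_MAP_BASE      UINT64_C(0x0000000040000000)
#define MODEL_IA32_TASK_SIZE     UINT64_C(0x00000000c0000000)
#define MODEL_DEFAULT_MAP_BASE   UINT64_C(0x2000000000000000)
#define MODEL_DEFAULT_TASK_SIZE  UINT64_C(0xa000000000000000)

/* Minimal model of the thread fields touched by the hunk. */
struct model_thread {
    bool     is_ia32;   /* thread was cloned from a 32-bit task */
    uint64_t map_base;  /* base address for new mappings */
    uint64_t task_size; /* upper bound of valid user addresses */
};

/* Mirrors the quoted change: a thread inherited from a 32-bit task gets
 * the default (64-bit) limits before it runs kernel-thread code. */
static void model_force_64bit_limits(struct model_thread *t)
{
    if (t->is_ia32) {
        t->map_base  = MODEL_DEFAULT_MAP_BASE;
        t->task_size = MODEL_DEFAULT_TASK_SIZE;
    }
}

/* An address is accepted only if it lies below the thread's task size. */
static bool model_addr_ok(const struct model_thread *t, uint64_t addr)
{
    return addr < t->task_size;
}
```

In this model an address above the 32-bit task size is rejected until the
limits are reset, which mirrors the failure mode described above.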
Comment 14 Jason Baron 2005-03-11 08:46:21 EST
ok, I didn't think of this case, where userspace can cause a kernel thread to be
spawned, such as in the modprobe case. Thanks for clarifying this. Looks like a
reasonable request. I'll have to look at the code a little closer though, b/c I
don't see how the limits are turned back unless this code is already running in
the context of the newly created thread. Would it be possible to produce a
'diff' with line numbers, so I can see the context more clearly? Thanks.
Comment 15 Jason Baron 2005-03-11 11:59:21 EST
Created attachment 111894 [details]
bk commit to force 64-bit threads
Comment 16 Lucio DiGiovanni 2005-03-11 13:31:57 EST
I will try this patch today or Monday.
Comment 17 Jason Baron 2005-03-15 16:51:44 EST
Any testing results?
Comment 18 Lucio DiGiovanni 2005-03-16 13:35:39 EST
Sorry for the delay.
I am having an issue with my system (I was using it for other purposes).
I need to reload the OS and the other applications I had on it today.
I hope to give you some info by the end of today.
Comment 19 Lucio DiGiovanni 2005-03-16 19:44:46 EST
Created attachment 112064 [details]
Last part of the output from make vmlinux

I cannot rebuild the kernel with just the above change to process.c
Comment 20 Lucio DiGiovanni 2005-03-17 10:40:04 EST
I tried it again by recreating the source directory and now the build works.

I will let you know if the patch works soon.
Comment 21 Lucio DiGiovanni 2005-03-17 17:55:04 EST
After applying the BitKeeper patch supplied above and rebuilding the kernel
I was able to do the following commands without an OOPS:
[root@l82bi237 PowerPath]# service PowerPath start
Starting PowerPath:
[root@l82bi237 PowerPath]# powermt display
Symmetrix logical device count=0
CLARiiON logical device count=176
==============================================================================
----- Host Bus Adapters ---------  ------ I/O Paths -----  ------ Stats ------
### HW Path                        Summary   Total   Dead  IO/Sec Q-IOs Errors
==============================================================================
  2 QLogic Fibre Channel 2300      optimal      52      0       -     0      0
  3 QLogic Fibre Channel 2300      optimal      52      0       -     0      0
  4 iSCSI 3.6.1 (22-Sep-2004)      optimal     150      0       -     0      0


[root@l82bi237 PowerPath]# service PowerPath stop
Stopping PowerPath:
[root@l82bi237 PowerPath]# service PowerPath stop
Stopping PowerPath:
[root@l82bi237 PowerPath]# service PowerPath stop
Stopping PowerPath:
[root@l82bi237 PowerPath]#
______________________________________________

The system is still functional after the above steps so the patch works GREAT!

What is the next step?
Do we get an RPM released to fix this problem?
We do not expect our customers to rebuild the kernel.
Comment 22 Ernie Petrides 2005-03-17 18:07:42 EST
U5 closed a few weeks ago (and is already in beta).  It is likely that
a fix for this problem will be incorporated into U6 (which hasn't yet
opened).  I will update this BZ when a fix has been committed.
Comment 23 Lucio DiGiovanni 2005-03-18 14:24:32 EST
This will be a problem for us. Does it mean that our PowerPath software
product will not be supported on IA64 systems until August?

Is it possible to add this minor change to the kernel source file process.c
as a hotfix in RPM?

Also, as the submitter of this problem,
why am I not allowed to change the priority of this issue to High?
This causes an OOPS!
Comment 24 Ernie Petrides 2005-03-18 18:21:38 EST
Hello, Lucio.  I haven't seen any U6 schedules, so I don't know the answer
to your first question.  As far as hot fix kernels go, yes, this is a
possibility (but please work this through Customer Support after a U6-stream
development kernel has been built with an appropriate fix).  As for your last
question, I think it's the "severity" field that is supposed to represent the
impact from a bug, and that's already set to "high".
Comment 25 Lucio DiGiovanni 2005-03-20 15:53:59 EST
Ernie,
Thanks for your clarifications.
We will pursue this through customer support.
Comment 26 Rob Kenna 2005-03-28 11:00:57 EST
Lucio,

Can you exec into 64-bit prior to the call?  This should work around the problem.
Also, this has been broken prior to U4.
Comment 27 Lucio DiGiovanni 2005-03-28 11:05:26 EST
I do not understand the workaround.
Can you be more specific?

It seems to me you are asking to change the application.
There are times where we do not have control of this.

-Lu
Comment 29 Jason Baron 2005-03-30 13:26:22 EST
I think Rob was suggesting the possibility of 'exec'ing a 64-bit modprobe process
from the application, and thus working around this problem for the time being.
But as you point out, depending on how the code is structured, it may be difficult.
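
The workaround discussed here could look roughly like the following sketch,
in which the application forks and execs a helper binary to load the module
instead of letting the kernel spawn modprobe on behalf of the 32-bit task.
The function name run_module_helper is hypothetical, the helper path and
module name are supplied by the caller, and whether this actually avoids the
panic is not confirmed anywhere in this report.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs "helper_path module_name" in a child process and waits for it.
 * Returns the helper's exit status (127 if the exec itself failed), or -1
 * if the child could not be created or did not exit normally. */
static int run_module_helper(const char *helper_path, const char *module_name)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace this process image with the helper binary. */
        char *const argv[] = { (char *)helper_path, (char *)module_name, NULL };
        execv(helper_path, argv);
        _exit(127); /* only reached when execv() fails */
    }

    /* Parent: wait for the helper, retrying if interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass the path of the 64-bit modprobe binary (commonly
/sbin/modprobe) and the name of the module to load; a return value of 0
means the helper reported success.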
Comment 30 Jason Baron 2005-04-20 18:35:02 EDT
*** Bug 155244 has been marked as a duplicate of this bug. ***
Comment 31 Vaibhav Khanduja 2005-04-21 08:36:57 EDT
(In reply to comment #30)
> *** Bug 155244 has been marked as a duplicate of this bug. ***

I am trying to modify file
/usr/src/linux-2.4.21-27.EL//arch/ia64/kernel/process.c. I see 
CONFIG_IA32_SUPPORT at 3 places in the file, these are
1)  221 #ifdef CONFIG_IA32_SUPPORT
    222         if (IS_IA32_PROCESS(ia64_task_regs(task)))
    223                 ia32_load_state(task);
    224 #endif

2)  353 #ifdef CONFIG_IA32_SUPPORT
    354         /*
    355          * If we're cloning an IA32 task then save the IA32 extra
    356          * state from the current task to the new task
    357          */
    358         if (IS_IA32_PROCESS(ia64_task_regs(current))) {
    359                 ia32_save_state(p);
    360                 if (clone_flags & CLONE_SETTLS)
    361                         ia32_clone_tls(p, child_ptregs);
    362         }
    363 #endif

3)  196 #ifdef CONFIG_IA32_SUPPORT
    197         if (IS_IA32_PROCESS(ia64_task_regs(task)))
    198                 ia32_save_state(task);
    199 #endif

I am going to put following code as mentioned in your resolution

1)  #ifdef CONFIG_IA32_SUPPORT
            if (IS_IA32_PROCESS(ia64_task_regs(task)))
                    task->thread.map_base  = DEFAULT_MAP_BASE;
                    task->thread.task_size = DEFAULT_TASK_SIZE;
                    ia64_set_kr(IA64_KR_IO_BASE, task->thread.old_iob);
                    ia64_set_kr(IA64_KR_TSSD, task->thread.old_k1);
                    ia32_load_state(task);
    #endif

2)  353 #ifdef CONFIG_IA32_SUPPORT
    354         /*
    355          * If we're cloning an IA32 task then save the IA32 extra
    356          * state from the current task to the new task
    357          */
    358         if (IS_IA32_PROCESS(ia64_task_regs(current))) {
    +                   task->thread.map_base  = DEFAULT_MAP_BASE;
    +                   task->thread.task_size = DEFAULT_TASK_SIZE;
    +                   ia64_set_kr(IA64_KR_IO_BASE, task->thread.old_iob);
    +                   ia64_set_kr(IA64_KR_TSSD, task->thread.old_k1);
    359                 ia32_save_state(p);
    360                 if (clone_flags & CLONE_SETTLS)
    361                         ia32_clone_tls(p, child_ptregs);
    362         }
    363 #endif


3)  196 #ifdef CONFIG_IA32_SUPPORT
    197         if (IS_IA32_PROCESS(ia64_task_regs(task)))
    +                   task->thread.map_base  = DEFAULT_MAP_BASE;
    +                   task->thread.task_size = DEFAULT_TASK_SIZE;
    +                   ia64_set_kr(IA64_KR_IO_BASE, task->thread.old_iob);
    +                   ia64_set_kr(IA64_KR_TSSD, task->thread.old_k1);
    198                 ia32_save_state(task);
    199 #endif

Please let me know if I am doing the right thing. This is with reference to
bug 155244.
Comment 32 Jason Baron 2005-04-21 12:51:51 EDT
Created attachment 113480 [details]
rhel4 based

patch against rhel4
Comment 33 Jason Baron 2005-04-21 12:53:21 EDT
if anybody has a test case to reproduce this issue, that would be helpful.
Comment 34 Lucio DiGiovanni 2005-04-22 09:26:36 EDT
The test case I have involves EMC PowerPath but I believe the problem is due to
the following:
1) A 32-bit application program makes a call to the kernel and the kernel
decides to switch tasks, possibly because the current task needs to pend (I do
not know why).
2) The process.c path does not realize that a 32-bit task came into the kernel,
so the addressing is still at the 32-bit range, but the switched kernel task has
a 64-bit range. The result is an access to a 32-bit address in 64-bit space,
which causes the OOPS.

Does this make sense?
Comment 35 Jason Baron 2005-04-26 14:30:29 EDT
yup. This is what's happening. I developed my own simple reproducer for this
issue. And the patch did indeed resolve this issue. thanks.
Comment 36 Vaibhav Khanduja 2005-04-27 02:08:59 EDT
This is an update on bug 155244, which has been marked as a duplicate of this
one. The fix seems to be working for me to a certain extent: I am no longer
getting kernel panic errors. But there are some other issues which I am facing.
One observation is that the ldd command, which is supposed to list all the
shared objects linked to a particular executable, is not working properly. E.g.
ldd ./rpcd returns "not a dynamic executable". I tried several other
executables and got the same error.

I did some more investigation and loaded  2.4.21-27 without the fix. I am
getting the same error. I guess there is something else also missing in the kernel. 
Comment 37 Jason Baron 2005-04-27 09:26:14 EDT
Are you sure that 'rpcd' not being a dynamic executable is a problem? The
'not a dynamic executable' message is not necessarily an error. It simply means
that the executable does not depend on external dynamically linked libraries. Is
there an error condition besides the 'not a dynamic executable' messages that is
occurring?
Comment 38 Vaibhav Khanduja 2005-04-28 10:51:30 EDT
I guess I have figured out why ldd is not working for me. We are running our
application in emulated mode, a 32-bit application on a 64-bit platform. We pick
up some of the libraries from /emul/ia32-linux/lib. ldd is a script which by
default does not check this directory; a small change in ldd would make this
happen. The change is listed below. Line number 30 should look like
RTLDLIST="/emul/ia32-linux/lib/ld-linux.so.2 /lib/ld-linux-ia64.so.2 /lib/ld-linux.so.2"

instead of
RTLDLIST="/lib/ld-linux-ia64.so.2 /lib/ld-linux.so.2"

I guess this fix should be added in the next release.
Comment 41 Wayne Berthiaume 2005-04-29 11:59:58 EDT
Jason, has this fix made it into U6 and can you tell me when it will make it 
into RHEL 4.0?
Thank you,
Wayne.
Comment 42 Jason Baron 2005-04-29 12:17:27 EDT
This fix is indeed queued for U6. If it's needed sooner, you can go through support.

The fix was included in the RHEL4 GA kernel.
Comment 43 Jeff Lee (EMC) 2005-05-02 13:44:30 EDT
(In reply to comment #31)

Has this been tested? Does it correct spawning problems? I tried to use the 
code and for 32-bit forks I get a segmentation violation with the changes in. 
Without the changes I can have a 32-bit process fork a 32-bit process. But when 
the second one forks a third, the third's thread processing fails.
Comment 44 Ernie Petrides 2005-05-04 20:38:09 EDT
A fix for this problem has just been committed to the RHEL3 U6
patch pool this evening (in kernel version 2.4.21-32.3.EL).
Comment 51 Ernie Petrides 2005-07-21 22:16:56 EDT
*** Bug 155244 has been marked as a duplicate of this bug. ***
Comment 52 Ernie Petrides 2005-07-26 15:08:19 EDT
*** Bug 164226 has been marked as a duplicate of this bug. ***
Comment 53 Heather Conway 2005-07-27 06:49:09 EDT
Is this fix being considered for porting and inclusion in RHEL 4.0?
Comment 54 Jason Baron 2005-07-27 07:58:53 EDT
this is not a rhel4 issue. We already have the correct code there.
Comment 56 Heather Conway 2005-07-28 12:39:57 EDT
Okay - thanks.
Comment 57 Wayne Berthiaume 2005-09-09 15:03:14 EDT
I have tested RHEL 3.0 U6 beta (lk 2.4.21-35) with EMC PowerPath 4.3.3b10 and 
no longer see the panic. Tracking to release of RHEL 3.0 U6.
Regards,
Wayne.
Comment 59 Red Hat Bugzilla 2005-09-28 10:49:01 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-663.html
