113555 – kernel 2.6.1-1.126 panic when executing tree /proc

Summary: kernel 2.6.1-1.126 panic when executing tree /proc

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	3
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Dave Jones
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	113814
Depends On:
Blocks:

Reported:	2004-01-15 08:36 UTC by Didier
Modified:	2015-01-04 22:04 UTC
CC List:	3 users
Fixed In Version:	1398?
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2005-08-14 19:15:43 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
dmesg output (9.37 KB, text/plain) 2004-01-15 08:38 UTC, Didier
Output of "# tree /proc > proc.txt" (16.00 KB, text/plain) 2004-01-15 08:38 UTC, Didier
kernel oops call trace (3.78 KB, text/plain) 2004-01-15 08:42 UTC, Didier
Pre-oops /var/log/messages (15.44 KB, text/plain) 2004-01-15 10:58 UTC, Didier
lsmod_261_acpi (425 bytes, text/plain) 2004-01-15 11:11 UTC, Didier
lsmod_261_apm (293 bytes, text/plain) 2004-01-15 11:11 UTC, Didier
dmesg_261_apm (8.46 KB, text/plain) 2004-01-15 11:12 UTC, Didier
proc_261_apm (23.27 KB, text/plain) 2004-01-15 11:12 UTC, Didier
lsmod_260_acpi (458 bytes, text/plain) 2004-01-15 11:12 UTC, Didier
dmesg_260_acpi (9.21 KB, text/plain) 2004-01-15 11:16 UTC, Didier
proc_260_acpi (26.18 KB, text/plain) 2004-01-15 11:17 UTC, Didier
strace_260_embedded (5.19 KB, text/plain) 2004-01-15 12:57 UTC, Didier
strace_260_embedded_EC_info (5.04 KB, text/plain) 2004-01-15 12:58 UTC, Didier
strace_261_embedded (5.19 KB, text/plain) 2004-01-15 12:59 UTC, Didier
strace_261_embedded_e^A (4.57 KB, text/plain) 2004-01-15 13:00 UTC, Didier
strace_261-131_embedded_e^A (4.57 KB, text/plain) 2004-01-15 14:58 UTC, Didier
oops_261-131 (4.06 KB, text/plain) 2004-01-15 14:58 UTC, Didier
dmesg_261-131 (9.16 KB, text/plain) 2004-01-15 14:59 UTC, Didier
Oops call trace from /etc/cron.daily/slocate.cron /usr/bin/updatedb command (3.49 KB, text/plain) 2004-01-16 11:46 UTC, Didier

Description Didier 2004-01-15 08:36:52 UTC

Description of problem:
Kernel panics when executing "# tree /proc"

Version-Release number of selected component (if applicable):
2.6.1-1.126

How reproducible:
Always

Steps to Reproduce:
1. boot kernel in single mode
2. # tree /proc

  
Actual results:
kernel panic

Expected results:
tree representation of the /proc directory

Additional info:
This looks like the same kernel panic as reported in bug #113372,
which occurred when compiling and inserting VMware modules.
However, in this case, the "tree /proc" command was executed
immediately after booting an unmodified kernel 2.6.1-1.126 in single
mode.

Comment 1 Didier 2004-01-15 08:38:21 UTC

Created attachment 97005 [details]
dmesg output

Comment 2 Didier 2004-01-15 08:38:59 UTC

Created attachment 97006 [details]
Output of "# tree /proc > proc.txt"

Comment 3 Arjan van de Ven 2004-01-15 08:41:27 UTC

I've yet to see an actual panic.....

Comment 4 Didier 2004-01-15 08:42:11 UTC

Created attachment 97007 [details]
kernel oops call trace

Comment 5 Didier 2004-01-15 08:46:16 UTC

Comment on attachment 97006 [details]
Output of "# tree /proc > proc.txt"

The file length is exactly 16 KB (16384 bytes), so the output may simply have
been cut off because buffered data was never flushed to disk (?). Consequently,
I'm not positive that the kernel oopsed while descending into the ACPI DSDT
/proc tree.

Please indicate whether you require this to be checked.

Comment 6 Arjan van de Ven 2004-01-15 08:57:02 UTC

is it possible to give a few more lines of the log from just before
the oops ?

Comment 7 Arjan van de Ven 2004-01-15 08:58:27 UTC

(the machines I have that use ACPI don't seem to do this...)
Can you do an lsmod and see which exact acpi modules are loaded ?
(makes the hunt-for-the-bug more focused)

Comment 8 Didier 2004-01-15 10:58:30 UTC

Created attachment 97010 [details]
Pre-oops /var/log/messages

Complete pre-oops log, WRT comment #6

Comment 9 Arjan van de Ven 2004-01-15 11:08:25 UTC

ok interesting. 
Can you see this as well without firewire modules loaded ?

Comment 10 Didier 2004-01-15 11:09:45 UTC

1. WRT comment #5 :
I visually confirmed the panic occurs when (or after) accessing the
/proc dsdt tree :

|   |   |       `-- info
|   |   `-- sleep
|   |       `-- SLPB
|   |           `-- info
|   |-- dsdt

>> Oops


2. WRT comment #6 :

- Immediately after booting in single mode, I 'rmmod'ed all modules
except for {usbcore,ext3,jbd} ; the oops still occurs when tree-ing /proc.

- The oops does _not_ occur with kernel 2.6.0-1.104 in ACPI mode, nor
with kernel 2.6.1-1.126 in forced APM mode ("acpi=off apm=on").
Please find in attachment some lsmod, proc and dmesg reports for both
the 2.6.0 and 2.6.1 kernels.

Comment 11 Didier 2004-01-15 11:11:07 UTC

Created attachment 97011 [details]
lsmod_261_acpi

Comment 12 Didier 2004-01-15 11:11:31 UTC

Created attachment 97012 [details]
lsmod_261_apm

Comment 13 Didier 2004-01-15 11:12:00 UTC

Created attachment 97013 [details]
dmesg_261_apm

Comment 14 Didier 2004-01-15 11:12:25 UTC

Created attachment 97014 [details]
proc_261_apm

Comment 15 Didier 2004-01-15 11:12:45 UTC

Created attachment 97015 [details]
lsmod_260_acpi

Comment 16 Arjan van de Ven 2004-01-15 11:14:59 UTC

ok to narrow it down; on my machines the "embedded controller" dir
comes after dsdt;
does 
ls /proc/acpi/embedded_controller
oops too ?
and 
tree /proc/acpi/embedded_controller
?

Comment 17 Didier 2004-01-15 11:16:12 UTC

Created attachment 97016 [details]
dmesg_260_acpi

Please note that 2.6.0-1.104 does not give the
"EXT2-fs warning (device hda7): ext2_fill_super: mounting ext3 filesystem as
ext2" warning, unlike 2.6.1-1.126, which seems to initially mount / in
ext2 mode (resulting in a long fsck after each forced reboot without sync).

Comment 18 Didier 2004-01-15 11:17:10 UTC

Created attachment 97017 [details]
proc_260_acpi

Comment 19 Didier 2004-01-15 11:27:33 UTC

WRT comment #16 : 
'ls /proc/acpi/embedded_controller' (as a regular user) reveals a "e?"
subdirectory ;
'ls /proc/acpi/embedded_controller/e?' oopses.
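
(The "e?" shown by ls is how it renders a name containing a non-printable byte; the attachment names in this bug write it as "e^A". To see the exact bytes of such a name, something like the following could be used. This is only a minimal sketch in user-space C: escape_name() and list_escaped() are hypothetical helper names, not part of any existing tool, and it assumes the default "C" locale.)

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

/* Copy `name` into `out`, writing every non-printable byte (and the
 * backslash itself) as \xNN. Stops early rather than emit a partial
 * escape if `out` is too small. Returns the number of bytes written,
 * not counting the terminating NUL. */
size_t escape_name(const char *name, char *out, size_t outsz)
{
    size_t used = 0;

    assert(name != NULL && out != NULL && outsz > 0);
    for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
        size_t room = outsz - used;
        int n;

        if (isprint(*p) && *p != '\\')
            n = snprintf(out + used, room, "%c", *p);
        else
            n = snprintf(out + used, room, "\\x%02x", (unsigned)*p);
        if (n < 0 || (size_t)n >= room) /* would not fit: stop here */
            break;
        used += (size_t)n;
    }
    out[used] = '\0';
    return used;
}

/* Print the escaped name of every entry in `path`.
 * Returns 0 on success, -1 if the directory cannot be opened. */
int list_escaped(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *ent;
    char buf[4 * 256 + 1]; /* worst case: every byte becomes \xNN */

    if (dir == NULL)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        escape_name(ent->d_name, buf, sizeof(buf));
        puts(buf);
    }
    closedir(dir);
    return 0;
}
```

(Calling list_escaped("/proc/acpi/embedded_controller") only reads the parent directory, which did not oops here; it does not descend into the broken entry.)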

WRT comment #9 : yes, it oopses with all modules removed (see comment
#10, 2.).


Note 1:
- this machine is an IBM ThinkPad A30p

Note 2:
- 2.4 APM PM worked (almost) flawlessly ;
- 2.6 ACPI CPU freq reports, thermal zones, etc. do work (except for
fan) ;
- 2.6 ACPI PM does not work (keyboard screen blanking & suspend to ram
does nothing, suspend to disk crashes hard, etc.)
- 2.6 APM does not work (keyboard screen blanking works,
suspend-to-disk suspends, but does not recover : system comes up, but
hangs hard).

Comment 20 Arjan van de Ven 2004-01-15 12:10:00 UTC

can you get a strace of those ls commands ?

Comment 21 Didier 2004-01-15 12:57:54 UTC

Created attachment 97021 [details]
strace_260_embedded

strace -f ls /proc/acpi/embedded_controller > strace_260_embedded.txt 2>&1

Comment 22 Didier 2004-01-15 12:58:39 UTC

Created attachment 97022 [details]
strace_260_embedded_EC_info

strace -f ls /proc/acpi/embedded_controller/EC/info

Comment 23 Didier 2004-01-15 12:59:32 UTC

Created attachment 97023 [details]
strace_261_embedded

strace -f ls /proc/acpi/embedded_controller
(kernel 2.6.1-1.126)

Comment 24 Didier 2004-01-15 13:00:42 UTC

Created attachment 97024 [details]
strace_261_embedded_e^A

strace -f ls /proc/acpi/embedded_controller/...  (bash TAB completion => oops)
(kernel 2.6.1-1.126)

Comment 25 Didier 2004-01-15 13:04:28 UTC

- Result of 'cat /proc/acpi/embedded_controller/EC/info' with 2.6.0 :

gpe bit:                 0x1c
ports:                   0x66, 0x62
use global lock:         no


- Kernel also oopses with Dave Jones' RH kernel-2.6.1-1.41


(Sorry for the delays in following up this bug entry, but this being
my 'production' machine, I have to constantly switch kernels & reboot.)

Comment 26 Alexander Viro 2004-01-15 13:12:52 UTC

OK...  I think I see where that code abuses procfs: remove_proc_entry()
is not recursive, so calling it on a non-empty directory is a Bad Thing(tm).
We really ought to add BUG_ON(de->subdir); in fs/proc/generic.c::remove_proc_entry() - right after
de->next = NULL; in there.  I suspect that it will trigger.
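
To illustrate the failure mode described above, here is a minimal user-space sketch. It is not the kernel code: struct entry, entry_create() and entry_remove() are hypothetical names, and only the next and subdir fields are taken from the comment. The proposed BUG_ON would fire after the entry has been unlinked; the sketch instead checks before unlinking and returns an error, so the model tree stays consistent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a procfs directory entry (illustrative only). */
struct entry {
    const char *name;      /* not copied; must outlive the entry */
    struct entry *parent;
    struct entry *next;    /* next sibling in the parent's list */
    struct entry *subdir;  /* first child, NULL if none */
};

/* Allocate an entry and link it at the head of parent's child list.
 * Returns NULL if allocation fails. */
struct entry *entry_create(const char *name, struct entry *parent)
{
    struct entry *de;

    assert(name != NULL && parent != NULL);
    de = calloc(1, sizeof(*de));
    if (de == NULL)
        return NULL;
    de->name = name;
    de->parent = parent;
    de->next = parent->subdir;
    parent->subdir = de;
    return de;
}

/* Non-recursive removal of `name` from parent's child list.
 * Returns 0 on success, -1 if no such entry exists, and -2 if the
 * entry still has children. Freeing it in that case would leave the
 * children pointing at a dead parent, which is the condition the
 * proposed BUG_ON(de->subdir) is meant to catch. */
int entry_remove(const char *name, struct entry *parent)
{
    struct entry **p;

    assert(name != NULL && parent != NULL);
    for (p = &parent->subdir; *p != NULL; p = &(*p)->next) {
        struct entry *de = *p;

        if (strcmp(de->name, name) != 0)
            continue;
        if (de->subdir != NULL)
            return -2;     /* non-empty directory: refuse */
        *p = de->next;     /* unlink from the sibling list */
        de->next = NULL;
        free(de);
        return 0;
    }
    return -1;
}
```

In this model, removing a directory that still has a child returns -2; removing the child first and then the directory succeeds, which is the order a caller has to follow when the removal routine is not recursive.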

Comment 27 Arjan van de Ven 2004-01-15 13:34:43 UTC

Ok building a kernel with this included; 2.6.1-1.131 is going to be
the version of it.

Comment 28 Didier 2004-01-15 13:50:05 UTC

The usual disclaimers notwithstanding, I suppose I can apply
2.6.1-1.131 without too much risk of data loss ?

I've been running the 2.6.0/1 series for a couple of weeks on my personal
production workstation ; I'm neither proficient in C nor very
conscientious about making data backups. ;-o


Off-topic :  RawHide kernel version 1.41 < this 1.131 ; which version
is appropriate for workstations (not bigmem servers) ?


Thanks for an incredible follow-up, incomparable to certain other
OS'es ! (happily running 4 licensed RHEL's in our datacenter)

Comment 29 Arjan van de Ven 2004-01-15 14:20:03 UTC

ok I'm uploading 131 now; the biggest change is the BUG_ON Al talks
about; what it will do is oops the kernel earlier if our assumption
about this bug is correct.

Comment 30 Didier 2004-01-15 14:58:26 UTC

Created attachment 97026 [details]
strace_261-131_embedded_e^A

Comment 31 Didier 2004-01-15 14:58:50 UTC

Created attachment 97027 [details]
oops_261-131

Comment 32 Didier 2004-01-15 14:59:16 UTC

Created attachment 97028 [details]
dmesg_261-131

Comment 33 Arjan van de Ven 2004-01-15 15:07:20 UTC

ok... so
we need to go back to the drawing board with this one ;(

Comment 34 Didier 2004-01-15 16:23:26 UTC

Please note that this oops is also triggered by 'insmod'ding a freshly
compiled vmmon.o VMware module (up to and including VMware bld6979),
as described in bug 113372, comment 7 :
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=113372#c7 ;

however, applying patch
http://platan.vc.cvut.cz/ftp/pub/vmware/vmware-any-any-update48.tar.gz
(courtesy of Petr Vandrovec) prevents the vmmon.o oops ; perhaps this
may be a clue ?
(VMware's binary-only modules taint the kernel; I do not know about
the license restrictions, if any, of the
vmware-any-any-update48.tar.gz patch).

Comment 35 Didier 2004-01-16 11:46:47 UTC

Created attachment 97051 [details]
Oops call trace from /etc/cron.daily/slocate.cron /usr/bin/updatedb command

Seems like updatedb also cops out (different call trace).

Comment 36 Arjan van de Ven 2004-01-17 21:33:52 UTC

we found the same laptop in the hands of an engineer inside RH who ran
a few more tests for us, and we located the problem.
I've uploaded a kernel with a (hackish) workaround (1.138) that works
on this laptop, could you find some time over the next few days to see
if it is also fixed on your laptop ?

Comment 37 Didier 2004-01-18 12:57:42 UTC

Great, the workaround in bld 1.138 fixed the issue both with the ls
and the updatedb (comment #35) ! (without the any-any-48 patch, VMware
still cops out with an oops though, which is probably a VMware issue).

Off-topic : would either RH or the particular engineer from comment
#36 be interested to sort out the ACPI issues with this laptop (*) ?
It goes without saying that I'm willing to help wherever I can (mostly
limited to guinea pigging, I suppose) .

(*) In particular suspend-to-ram & suspend-to-disk ; I've tried both
the /proc/acpi/sleep (after kernel recompile) and /sys/power/state
interfaces with several Arjan 2.6.x builds, to no avail.

Comment 38 Arjan van de Ven 2004-01-19 09:45:45 UTC

*** Bug 113814 has been marked as a duplicate of this bug. ***

Comment 39 Len Brown 2004-01-19 21:03:03 UTC

Arjan reports that this panic is due to a memory allocation bug in ACPI's EC code, 
and that we should be able to reproduce it on a T40.  We'll try to reproduce it and 
provide a permanent fix. 
 
Didier, 
Regarding ACPI features not working...  There are several resources available. 
acpi-devel.net is the community of folks working on ACPI 
features in Linux.  We've got a number of open bug reports: 
http://bugzilla.kernel.org/buglist.cgi?short_desc_type=allwordssubstr&short_desc=&component=ACPI&long_desc_type=allwordssubstr&long_desc=&kernel_version_type=allwordssubstr&kernel_version=&bug_status=NEW&bug_status=ASSIGNED&emailassigned_to1=1&emailtype1=substring&email1=&emailassigned_to2=1&emailreporter2=1&emailcc2=1&emailtype2=substring&email2=&bugidtype=include&bug_id=&changedin=&chfieldfrom=&chfieldto=Now&chfieldvalue=&cmdtype=doit&namedcmd=component%3DACPI&newqueryname=&order=Reuse+same+sort+as+last+time&field0-0-0=noop&type0-0-0=noop&value0-0-0= 
You might take a look at that bug list and offer your services in those 
bug reports to help run tests and debug the issues; or file new bugs where 
there isn't one open already. 
 
In any case, this bug report should be about the panic, and the other issues 
should be worked in other bug reports. 
 
thanks, 
-Len

Comment 40 Kaj J. Niemi 2004-01-19 21:31:50 UTC

Just for the record, this also happens on a T40p (although there
doesn't seem to be much of a difference to the T40 with regards to specs)

Comment 41 Dave Jones 2005-07-15 18:43:12 UTC

An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem.   Please update to this new kernel, and
report whether or not it fixes your problem.

If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.

Thank you.

Comment 42 Didier 2005-08-14 19:15:43 UTC

Tested with kernel-2.6.12-1.1398_FC4 ; resolved.
