Description of problem:
Kernel panics when executing "# tree /proc"
Version-Release number of selected component (if applicable):
2.6.1-1.126
How reproducible:
Always
Steps to Reproduce:
1. boot kernel in single mode
2. # tree /proc
Actual results:
kernel panic
Expected results:
tree representation of the /proc directory
Additional info:
This looks like the same kernel panic as reported in bug #113372, which occurred when compiling and inserting VMware modules. However, in this case, the "tree /proc" command was executed immediately after booting an unmodified kernel 2.6.1-1.126 in single mode.
Created attachment 97005 [details] dmesg output
Created attachment 97006 [details] Output of "# tree /proc > proc.txt"
I've yet to see an actual panic.....
Created attachment 97007 [details] kernel oops call trace
Comment on attachment 97006 [details] Output of "# tree /proc > proc.txt" The file length is exactly at the 16KB boundary (16384 bytes) ; consequently, as the report could be the result of the disk buffer not flushing (?), I'm not positively sure the kernel oopsed when descending in the ACPI DSDT /proc tree. Please indicate whether you require this to be checked.
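One way to settle the flushing question raised above would be to log each path and force it to disk before touching it, so that the last line of the log names the entry being accessed when the machine went down. The following is only a minimal C sketch of that idea, assuming a POSIX system; `log_path`, `walk_children` and `walk_logged` are hypothetical names, not part of `tree` or any existing tool.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: write one path to the log and push it to disk. */
static void log_path(FILE *log, const char *path)
{
    fprintf(log, "%s\n", path);
    fflush(log);                 /* drain the stdio buffer */
    (void)fsync(fileno(log));    /* ask the kernel to write it out */
}

/* Log every entry below `dir` BEFORE lstat()ing or opening it.
 * Returns the number of entries logged. Symlinks are not followed. */
static long walk_children(const char *dir, FILE *log)
{
    long count = 0;
    DIR *d = opendir(dir);
    if (d == NULL)
        return 0;                /* unreadable directory: skip it */

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;

        char path[PATH_MAX];
        int n = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;            /* path too long: skip it */

        log_path(log, path);     /* on disk before the entry is touched */
        count++;

        struct stat st;
        if (lstat(path, &st) == 0 && S_ISDIR(st.st_mode))
            count += walk_children(path, log);
    }
    closedir(d);
    return count;
}

/* Entry point: logs `root` itself, then everything below it. */
static long walk_logged(const char *root, FILE *log)
{
    assert(root != NULL && log != NULL);
    log_path(log, root);
    return 1 + walk_children(root, log);
}
```

Calling `walk_logged("/proc", log)` with `log` opened on a file outside /proc should leave the offending path as the last line of that file, at the cost of one `fsync()` per entry.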
is it possible to give a few more lines of the log from just before the oops ?
(the machines I have that use ACPI don't seem to do this...) Can you do an lsmod and see which exact acpi modules are loaded ? (makes the hunt-for-the-bug more focused)
Created attachment 97010 [details] Pre-oops /var/log/messages Complete pre-oops log, WRT comment #6
ok interesting. Can you see this as well without firewire modules loaded ?
1. WRT comment #5 : I visually confirmed the panic occurs when (or after) accessing the /proc dsdt tree :
|   |   |       `-- info
|   |   `-- sleep
|   |       `-- SLPB
|   |           `-- info
|   |-- dsdt
>> Oops
2. WRT comment #6 :
- Immediately after booting in single mode, I 'rmmod'ed all modules except for {usbcore,ext3,jbd} ; the oops still occurs when tree-ing /proc.
- The oops does _not_ occur with kernel 2.6.0-1.104 in ACPI mode, nor with kernel 2.6.1-1.126 in forced APM mode ("acpi=off apm=on").
Please find in attachment some lsmod, proc and dmesg reports for both the 2.6.0 and 2.6.1 kernels.
Created attachment 97011 [details] lsmod_261_acpi
Created attachment 97012 [details] lsmod_261_apm
Created attachment 97013 [details] dmesg_261_apm
Created attachment 97014 [details] proc_261_apm
Created attachment 97015 [details] lsmod_260_acpi
ok to narrow it down; on my machines the "embedded controller" dir comes after dsdt; does ls /proc/acpi/embedded_controller oops too ? and tree /proc/acpi/embedded_controller ?
Created attachment 97016 [details] dmesg_260_acpi
Please note 2.6.0-1.104 does not give the "EXT2-fs warning (device hda7): ext2_fill_super: mounting ext3 filesystem as ext2" warning, contrary to 2.6.1-1.126, which seems to initially mount / in ext2 mode (resulting in a long fsck after each forced reboot without sync).
Created attachment 97017 [details] proc_260_acpi
WRT comment #16 : 'ls /proc/acpi/embedded_controller' (as a regular user) reveals a "e?" subdirectory ; 'ls /proc/acpi/embedded_controller/e?' oopses.
WRT comment #9 : yes, it oopses with all modules removed (see comment #10, 2.).
Note 1:
- this machine is an IBM ThinkPad A30p
Note 2:
- 2.4 APM PM worked (almost) flawlessly ;
- 2.6 ACPI CPU freq reports, thermal zones, etc. do work (except for fan) ;
- 2.6 ACPI PM does not work (keyboard screen blanking & suspend to ram do nothing, suspend to disk crashes hard, etc.)
- 2.6 APM does not work (keyboard screen blanking works, suspend-to-disk suspends, but does not recover : system comes up, but hangs hard).
can you get a strace of those ls commands ?
Created attachment 97021 [details] strace_260_embedded strace -f ls /proc/acpi/embedded_controller > strace_260_embedded.txt 2>&1
Created attachment 97022 [details] strace_260_embedded_EC_info strace -f ls /proc/acpi/embedded_controller/EC/info
Created attachment 97023 [details] strace_261_embedded strace -f ls /proc/acpi/embedded_controller (kernel 2.6.1-1.126)
Created attachment 97024 [details] strace_261_embedded_e^A strace -f ls /proc/acpi/embedded_controller/... (bash TAB completion => oops) (kernel 2.6.1-1.126)
- Result of 'cat /proc/acpi/embedded_controller/EC/info' with 2.6.0 :
gpe bit: 0x1c
ports: 0x66, 0x62
use global lock: no
- Kernel also oopses with Dave Jones' RH kernel-2.6.1-1.41
(Sorry for the delays in following up this bug entry, but this being my 'production' machine, I have to constantly switch kernels & reboot.)
OK... I think I see where that code abuses procfs: remove_proc_entry() is not recursive, so calling it on a non-empty directory is a Bad Thing(tm). We really ought to add BUG_ON(de->subdir); in fs/proc/generic.c::remove_proc_entry() - right after de->next = NULL; in there. I suspect that it will trigger.
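To make the hypothesis above concrete, here is a toy userspace model in C of a directory tree with a non-recursive remove. It is a sketch only: `struct entry`, `add_entry`, `remove_entry` and `remove_tree` are hypothetical names and this is not the kernel's `struct proc_dir_entry` or `remove_proc_entry()` code. The `-2` return marks the spot where the proposed `BUG_ON(de->subdir)` would fire; whether the ACPI code actually takes this path is not established by the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a procfs entry; NOT the kernel structure. */
struct entry {
    const char   *name;
    struct entry *parent;
    struct entry *subdir;   /* first child, NULL if none */
    struct entry *next;     /* next sibling */
};

/* Create an entry and link it at the head of parent's child list. */
static struct entry *add_entry(struct entry *parent, const char *name)
{
    struct entry *e = calloc(1, sizeof *e);
    if (e == NULL)
        return NULL;
    e->name = name;
    e->parent = parent;
    if (parent != NULL) {
        e->next = parent->subdir;
        parent->subdir = e;
    }
    return e;
}

static struct entry *find_child(struct entry *parent, const char *name)
{
    for (struct entry *e = parent->subdir; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* Non-recursive removal of one named child.
 * Returns 0 on success, -1 if not found, -2 if the child still has
 * children. Freeing it anyway would leave those children with a
 * dangling parent pointer, which is what the BUG_ON would catch. */
static int remove_entry(struct entry *parent, const char *name)
{
    for (struct entry **p = &parent->subdir; *p != NULL; p = &(*p)->next) {
        struct entry *e = *p;
        if (strcmp(e->name, name) != 0)
            continue;
        if (e->subdir != NULL)
            return -2;
        *p = e->next;
        free(e);
        return 0;
    }
    return -1;
}

/* Caller-side alternative: remove the children first, then the directory. */
static int remove_tree(struct entry *parent, const char *name)
{
    struct entry *e = find_child(parent, name);
    if (e == NULL)
        return -1;
    while (e->subdir != NULL)
        remove_tree(e, e->subdir->name);
    return remove_entry(parent, name);
}
```

In this model, removing a directory that still holds `EC/info` is refused by `remove_entry` but succeeds through `remove_tree`, which empties the directory depth-first before unlinking it.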
Ok building a kernel with this included; 2.6.1-1.131 is going to be the version of it.
The usual disclaimers notwithstanding, I suppose I can apply 2.6.1-1.131 without too much risk of data loss ? I've been running the 2.6.0/1 series for a couple of weeks on my personal production workstation ; I'm neither proficient in C nor very conscientious in making data backups. ;-o
Off-topic : RawHide kernel version 1.41 < this 1.131 ; which version is appropriate for workstations (not bigmem servers) ?
Thanks for an incredible follow-up, incomparable to certain other OS'es ! (happily running 4 licensed RHEL's in our datacenter)
ok I'm uploading 131 now; the biggest change is the BUG_ON Al talks about; what it will do is oops the kernel earlier if our assumption about this bug is correct.
Created attachment 97026 [details] strace_261-131_embedded_e^A
Created attachment 97027 [details] oops_261-131
Created attachment 97028 [details] dmesg_261-131
ok... so we need to go back to the drawing board with this one ;(
Please note that this oops is also triggered by 'insmod'ding a freshly compiled vmmon.o VMware module (up to and including VMware bld6979), as indicated in bug & comment https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=113372#c7 ; however, applying patch http://platan.vc.cvut.cz/ftp/pub/vmware/vmware-any-any-update48.tar.gz (courtesy of Petr Vandrovec) prevents the vmmon.o oops ; perhaps this may be a clue ? (VMware's binary-only modules taint the kernel; I do not know about the license restrictions, if any, of the vmware-any-any-update48.tar.gz patch).
Created attachment 97051 [details] Oops call trace from /etc/cron.daily/slocate.cron /usr/bin/updatedb command Seems like updatedb also cops out (different call trace).
we found the same laptop in the hands of an engineer inside RH who ran a few more tests for us, and we located the problem. I've uploaded a kernel with a (hackish) workaround (1.138) that works on this laptop, could you find some time over the next few days to see if it is also fixed on your laptop ?
Great, the workaround in bld 1.138 fixed the issue both with the ls and the updatedb (comment #35) ! (without the any-any-48 patch, VMware still cops out with an oops though, which is probably a VMware issue).
Off-topic : would either RH or the particular engineer from comment #36 be interested to sort out the ACPI issues with this laptop (*) ? It goes without saying that I'm willing to help wherever I can (mostly limited to guinea pigging, I suppose).
(*) In particular suspend-to-ram & suspend-to-disk ; I've tried both the /proc/acpi/sleep (after kernel recompile) and /sys/power/state interfaces with several Arjan 2.6.x builds, to no avail.
*** Bug 113814 has been marked as a duplicate of this bug. ***
Arjan reports that this panic is due to a memory allocation bug in ACPI's EC code, and that we should be able to reproduce it on a T40. We'll try to reproduce it and provide a permanent fix.
Didier,
Regarding ACPI features not working... There are several resources available. acpi-devel.net is the community of folks working on ACPI features in Linux. We've got a number of open bug reports:
http://bugzilla.kernel.org/buglist.cgi?short_desc_type=allwordssubstr&short_desc=&component=ACPI&long_desc_type=allwordssubstr&long_desc=&kernel_version_type=allwordssubstr&kernel_version=&bug_status=NEW&bug_status=ASSIGNED&emailassigned_to1=1&emailtype1=substring&email1=&emailassigned_to2=1&emailreporter2=1&emailcc2=1&emailtype2=substring&email2=&bugidtype=include&bug_id=&changedin=&chfieldfrom=&chfieldto=Now&chfieldvalue=&cmdtype=doit&namedcmd=component%3DACPI&newqueryname=&order=Reuse+same+sort+as+last+time&field0-0-0=noop&type0-0-0=noop&value0-0-0=
You might take a look at that bug list and offer your services in those bug reports to help run tests and debug the issues; or file new bugs where there isn't one open already.
In any case, this bug report should be about the panic, and the other issues should be worked in other bug reports.
thanks,
-Len
Just for the record, this also happens on a T40p (although there doesn't seem to be much of a difference to the T40 with regards to specs)
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem. If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'. Thank you.
Tested with kernel-2.6.12-1.1398_FC4 ; resolved.