112544 – installing dev-3.3.9-2 rpm dumps trace to console and crashes

Summary: installing dev-3.3.9-2 rpm dumps trace to console and crashes

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Raw Hide
Classification:	Retired
Component:	kernel
Sub Component:
Version:	1.0
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Stephen Tweedie
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2003-12-22 17:01 UTC by Karl DeBisschop
Modified:	2007-04-18 17:00 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2004-01-16 16:56:34 UTC
Embargoed:

Attachments
Kernel trace (6.28 KB, text/plain) 2003-12-22 17:50 UTC, Paul Nasrat	no flags	Details
kernel trace with selinux off (13.06 KB, text/plain) 2003-12-22 18:24 UTC, Karl DeBisschop	no flags	Details
dev-3.3.10 oops selinux=0 (3.96 KB, text/plain) 2004-01-14 19:28 UTC, Karl DeBisschop	no flags	Details
kernel message with 2.6.1-1.43 and dev-3.3.10-1 (114.45 KB, text/plain) 2004-01-16 14:46 UTC, Karl DeBisschop	no flags	Details

Description Karl DeBisschop 2003-12-22 17:01:09 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7a) Gecko/20031221 Firebird/0.7+

Description of problem:
All my attempts to install this rpm have led to a total crash of the
system. There is an extensive trace generated and dumped to the
console screen. I have not been able to find a core yet (I may need to
change some settings to get the core - I'll try to do so as soon as I
have the time).

The computer is a Dell Inspiron 7500 laptop (p2 mobile 450 CPU, 256 MB
RAM)

Failure occurs in any system configuration I have tested - run level
3, run level 1, network up, no network running, ....

I have not tried running the rpm install with noscripts or anything
esoteric like that.


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. rpm -Uv dev-3.3.9-2.i386.rpm
2. wait
3. console filled with traceback of some sort
4. cycle power because nothing else will bring the machine back
    

Additional info:

Comment 1 Karl DeBisschop 2003-12-22 17:39:18 UTC

when run with 'rpm -Uvvvh' the last entry created is /dev/compaq

Comment 2 Bill Nottingham 2003-12-22 17:46:08 UTC

What kernel are you running under?

Do you have a kernel trace?

Comment 3 Paul Nasrat 2003-12-22 17:50:11 UTC

Created attachment 96669 [details]
Kernel trace

2.6.0-0.1.14 with selinux=0

Oops attached.  Once it had oopsed, further actions such as shutdown -h now
also caused an oops.  Had to sync/remount/poweroff with sysrq

I also have a script output from rpm -Uvvh if that helps

Comment 4 Karl DeBisschop 2003-12-22 17:56:20 UTC

I'm also on 2.6.0-0.1.14 kernel.

Not sure how to generate an Oops to attach.

I will need to wait until tonight to set up a serial capture, I think.
Any other way to get an Oops for you?

Comment 5 Karl DeBisschop 2003-12-22 18:24:32 UTC

Created attachment 96670 [details]
kernel trace with selinux off

Includes startup messages so you can see kernel command line

Comment 6 Karl DeBisschop 2003-12-22 18:28:34 UTC

OK - now that I've figured out how to capture an oops, I should note that
the trace with SElinux enabled is A LOT longer. Is there any value to
that as well? I'll assume no, unless someone asks for it.

Comment 7 Paul Nasrat 2003-12-30 13:57:21 UTC

Should this actually be filed against kernel?  Got caught by it again
on another rawhide sync.

Comment 8 Karl DeBisschop 2003-12-30 15:04:50 UTC

I don't know if it should be filed against kernel or dev - I guessed,
but I suspect either is justifiable.

Since I update as regularly as I can, I installed the rpm with the
--justdb flag for now until I see some action on this issue. But I
also checked last night and it still crashes my machines dead too.

Comment 9 Paul Nasrat 2004-01-14 15:58:16 UTC

Still occurring against rawhide whilst updating under kernel 2.6.0-1.21
and upgrading to dev 3.3.9-2.  Booting into 2.4.22-1.2149 update works
as expected.

Comment 10 Stephen Tweedie 2004-01-14 18:09:13 UTC

Hmm: can you do a "fsck" to check the root fs?  It's really useful to
know whether such problems are being triggered by something not right
on the filesystem, or whether it's repeatable on an error-free partition.

Comment 11 Karl DeBisschop 2004-01-14 18:56:26 UTC

fsck says all clear

I have been meaning to get back to this. dev-3.3.9-2 successfully
applied maybe 2 kernels ago - I'll have to check dates after I finish
getting the current oops. I was just about to report this bug closed,
but then dev-3.3.10-1 came along and broke things again. I'm still
collecting info, but the oops starts like:

Unable to handle kernel paging request at virtual address c2f84e48
 printing eip:
c01aa216
*pde = 0000b067
Oops: 0002 [#1]
CPU:    0
EIP:    0060:[<c01aa216>]    Not tainted
EFLAGS: 00010297
EIP is at sync_sb_inodes+0x56/0x3d0
eax: c9808e44   ebx: cf5d5ca0   ecx: cf5d5c98   edx: c2f84e44
esi: cf5d5ca0   edi: cf5d5bf8   ebp: 00000000   esp: cfa63e7c
ds: 007b   es: 007b   ss: 0068
Process pdflush (pid: 7, threadinfo=cfa62000 task=cfa8e960)
Stack: 00000000 00000296 c0465dc0 00000000 c026b63c 00000064 00000000 cf5d5ca0
       00000246 cf5d5c5c ffffe41a cf5d5bf8 cfa63ef0 00002f22 00000000 c01aa6fc
       cf5d5bf8 cfa63ef0 cfa63ef0 cfa63fd0 cfa63f10 00000000 c0154778 cfa63ef0
Call Trace:
 [<c026b63c>] blk_congestion_wait+0x8c/0xa0
 [<c01aa6fc>] writeback_inodes+0x16c/0x450
 [<c0154778>] get_page_state+0x18/0x20
 [<c01557a7>] wb_kupdate+0xa7/0x120
 [<c015612d>] __pdflush+0x21d/0x620
 [<c0156530>] pdflush+0x0/0x20
 [<c015653f>] pdflush+0xf/0x20
 [<c0155700>] wb_kupdate+0x0/0x120

Comment 12 Karl DeBisschop 2004-01-14 19:28:03 UTC

Created attachment 96989 [details]
dev-3.3.10 oops selinux=0

Comment 13 Karl DeBisschop 2004-01-16 05:55:44 UTC

After installing kernel-2.6.1-1.43 I was able to successfully install
dev-3.3.10-1.

If you're still listening, Paul, have you tried that combination?

Comment 14 Stephen Tweedie 2004-01-16 12:20:32 UTC

Karl --- could you try that more than once, just to be sure?  You can
re-install an already-installed rpm with "rpm -Uvh --force" for testing
purposes.  I'd like to double-check before we close this.

Thanks!
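
A minimal sketch of such a repeat test, assuming Python is available on the
test machine. The helper names and the package filename used in the note
below are illustrative assumptions; only the "rpm -Uvh --force" invocation
comes from the comment above.

```python
import subprocess


def reinstall_command(package_path):
    """Build the forced-reinstall command suggested in the comment above."""
    return ["rpm", "-Uvh", "--force", package_path]


def reinstall_repeatedly(package_path, attempts=3, run=subprocess.run):
    """Run the forced reinstall several times and collect the exit codes.

    `run` is injectable so the loop can be exercised without touching rpm.
    """
    exit_codes = []
    for _ in range(attempts):
        completed = run(reinstall_command(package_path))
        exit_codes.append(completed.returncode)
    return exit_codes
```

Run as root, reinstall_repeatedly("dev-3.3.10-1.i386.rpm") (filename assumed)
would repeat the install three times and return the exit codes. If the kernel
oopses, the loop never returns, so a console or serial capture is still needed.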

Comment 15 Karl DeBisschop 2004-01-16 14:46:08 UTC

Created attachment 97057 [details]
kernel message with 2.6.1-1.43 and dev-3.3.10-1

search for "Preparing" to get past selinux warnings

note messages from "ext3_destroy_inode"

at end of file, note that two orphaned inodes have been created. (I did an fsck
and reboot immediately before this installation)
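
The search described above can be sketched roughly as follows. This is only
an illustration: the function name and its defaults are hypothetical, and it
assumes the attachment has been saved locally as a plain text file.

```python
def lines_after_marker(lines, marker="Preparing", needle="ext3_destroy_inode"):
    """Return lines containing `needle` that appear after the first `marker` line.

    Everything before the marker (e.g. the selinux warnings) is skipped.
    """
    seen_marker = False
    hits = []
    for line in lines:
        if not seen_marker:
            # Still in the preamble; start collecting once the marker is seen.
            if marker in line:
                seen_marker = True
            continue
        if needle in line:
            hits.append(line.rstrip("\n"))
    return hits
```

With the attachment saved under some name, for example kernel-messages.txt
(an assumed filename), lines_after_marker(open("kernel-messages.txt")) would
list the ext3_destroy_inode messages that follow the start of the install.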

Comment 16 Karl DeBisschop 2004-01-16 14:53:09 UTC

To be clear, rerunning the install works on the surface - the install
completes, and the system behaves normally. Ext3 errors just tend to
worry me, but if that is a concern, it might not be related to the dev
package for all I know. It may need to be a new bugzilla.

Also, I'm not 100% sure that dev is causing the orphaned inodes - it
could well be something else that happens in bootup.

Comment 17 Karl DeBisschop 2004-01-16 15:06:35 UTC

did a few more reboots, and I'm now pretty confident that the orphaned
inodes are from the rpm install of the dev package

Comment 18 Paul Nasrat 2004-01-16 15:10:16 UTC

Yes I got ext3 errors/orphaned inodes after dev install on that kernel
although it didn't hang.

Through a combination of that and most likely my own error, / seems to
be hosed.  I do feel there is possibly still an issue here :(

Comment 19 Stephen Tweedie 2004-01-16 16:56:34 UTC

Orphans can be a natural consequence of such updates.  If you've got a
file open and you delete the file and recreate it, the orphan remains
as long as the original, deleted inode is open.  If the application
which opened it doesn't ever close it, then the kernel *cannot*
reclaim it until the next reboot.  In such cases, orphans are simply a
sign of the kernel doing its job correctly.
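
The delete-and-recreate behaviour described above can be observed from user
space. The following is a minimal sketch, assuming a local Linux filesystem
and using a regular file in a temporary directory rather than a device node;
the function name and the file name "node" are made up for illustration.

```python
import os
import tempfile


def demonstrate_orphan():
    """Show that an unlinked-but-open file keeps its inode alive."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "node")

    # Create the original file and hold it open, as a long-running process might.
    with open(path, "w") as f:
        f.write("original")
    held = open(path, "r")
    old_inode = os.fstat(held.fileno()).st_ino

    # Delete and recreate the path, roughly what a package update does.
    os.unlink(path)
    with open(path, "w") as f:
        f.write("replacement")
    new_inode = os.stat(path).st_ino

    # The old inode has no directory entry left, yet its data is still readable.
    links_after_unlink = os.fstat(held.fileno()).st_nlink
    old_contents = held.read()

    # Only after this close can the kernel reclaim the old inode.
    held.close()
    os.unlink(path)
    os.rmdir(workdir)
    return old_inode, new_inode, links_after_unlink, old_contents
```

While the descriptor stays open, the old inode has a link count of zero but
still holds its data, and the recreated path refers to a different inode. If
the system went down at that point, the old inode is what would show up as an
orphan on the next mount.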

The following messages

  Slab corruption: start=c9108084, expend=c9108303, problemat=c9108154
  Last user: [<d088dfdd>](ext3_destroy_inode+0x1d/0x30 [ext3])

are a different matter entirely, and I'll try to recreate that here.
