204638 – squashfs is causing memory corruption

Summary: squashfs is causing memory corruption

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Assignee:	Prarit Bhargava
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	204625
Depends On:
Blocks:	fedora-ia64

Reported:	2006-08-30 15:40 UTC by Doug Chapman
Modified:	2007-11-30 22:11 UTC
CC List:	9 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2006-10-04 17:26:36 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
bugcheck at list_del+0x100/0x160 (4.09 KB, text/plain) 2006-08-30 15:42 UTC, Doug Chapman	no flags
Oops at mm_init+0x190/0x240 (3.07 KB, text/plain) 2006-08-30 15:44 UTC, Doug Chapman	no flags
oops from xprt_reserve+0x150/0x2e0 [sunrpc] (10.44 KB, text/plain) 2006-08-30 20:01 UTC, Doug Chapman	no flags
reproducer for panics seen during install (486 bytes, text/plain) 2006-09-13 18:29 UTC, Doug Chapman	no flags

Description Doug Chapman 2006-08-30 15:40:28 UTC

Description of problem:
We are seeing various random panics during install of rawhide on ia64, starting
somewhere around the 20060829 build tree.  Since so many different panics all
started appearing at the same time, this looks like a memory corruption issue.

I will post stack traces as attachments as I see them.


Version-Release number of selected component (if applicable):
2.6.17-1.2600.fc6


How reproducible:
Probably 95% of the time.  So far we are only seeing it during installation.  A
couple of installs have been successful.


Steps to Reproduce:
1. NFS install on ia64
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Doug Chapman 2006-08-30 15:42:47 UTC

Created attachment 135220
bugcheck at list_del+0x100/0x160

Comment 2 Doug Chapman 2006-08-30 15:44:12 UTC

Created attachment 135221
Oops at mm_init+0x190/0x240

Comment 3 Doug Chapman 2006-08-30 20:01:29 UTC

Created attachment 135240
oops from xprt_reserve+0x150/0x2e0 [sunrpc]

Comment 4 Steve Dickson 2006-09-05 15:03:01 UTC

What, one out of three and it's an NFS problem??? :-)

Make sure that the patch from bz 204859 has been applied,
and also beware of bz 204848 if you're using NFSv4.

Comment 5 David Lawrence 2006-09-05 15:27:47 UTC

Reassigning to correct owner, kernel-maint.

Comment 7 Doug Chapman 2006-09-05 16:12:30 UTC

FYI,

Things are looking better as of today's build (20060905). However, since we could
not reproduce this 100% of the time before, the plan is to keep this open for a
day or two and make sure we don't hit it again.

But it looks like the other NFS patches may have fixed this.

Comment 8 Doug Chapman 2006-09-08 16:25:44 UTC

These issues seem to be back as of 20060907.  I have one panic that does seem to
be somewhat reproducible.  Other than that I get random hangs.  I can sometimes
get the hang even when doing an HTTP install, so either we have multiple problems
or it isn't NFS.

The panic _does_ appear to be NFS-related, however.  I get it 100% of the time
on one particular system (an rx2600 with 1 CPU).

loop0[606]: Oops 8804682956800 [1]
Modules linked in: dm_emc dm_round_robin dm_multipath dm_snapshot dm_mirror
dm_zero dm_mod xfs jfs reiserfs lock_nolock gfs2 ext3 jbd msdos raid456 xor
raid1 raid0 mptspi scsi_transport_spi mptscsih mptbase e100 mii tg3 ohci_hcd
ehci_hcd iscsi_tcp libiscsi scsi_transport_iscsi sr_mod sd_mod scsi_mod ide_cd
cdrom squashfs loop nfs nfs_acl fscache lockd sunrpc vfat fat cramfs

Pid: 606, CPU 0, comm:                loop0
psr : 0000101008526010 ifs : 800000000000038b ip  : [<a0000002004ee080>]    Not tainted
ip is at xprt_reserve+0x140/0x2e0 [sunrpc]
unat: 0000000000000000 pfs : 000000000000038b rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 0000000000009681
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000002004edfa0 b6  : a00000020069bca0 b7  : a0000002004e8340
f6  : 000000000000000000000 f7  : 0ffdd8000000000000000
f8  : 000000000000000000000 f9  : 10008c200000000000000
f10 : 1003e0000000000000082 f11 : 1003e0000000000000078
r1  : a000000200521438 r2  : e00000003eb6eca8 r3  : 0000000000000000
r8  : e00000003e349998 r9  : ffffffffffffffff r10 : e00000003e3499a0
r11 : e00000003e349808 r12 : e00000003eccfcb0 r13 : e00000003ecc8000
r14 : e00000003eb6ecb0 r15 : 0000000000000000 r16 : 00000000ffffffff
r17 : e00000003ecae0a8 r18 : e00000003ecae000 r19 : e00000003ecae0b0
r20 : e00000003ecae078 r21 : e00000003ecae0b8 r22 : e00000003ecae080
r23 : e00000003ecae0c0 r24 : e00000003e349834 r25 : e00000003cb18138
r26 : e00000003ecae108 r27 : e00000003cb18270 r28 : e00000003e349620
r29 : e00000003ecae0a0 r30 : a00000020055a858 r31 : 3466d843f6cebdf7

Call Trace:
 [<a000000100013e80>] show_stack+0x40/0xa0
                                sp=e00000003eccf840 bsp=e00000003ecc9610
 [<a000000100014780>] show_regs+0x840/0x880
                                sp=e00000003eccfa10 bsp=e00000003ecc95b0
 [<a000000100037b80>] die+0x1c0/0x2a0
                                sp=e00000003eccfa10 bsp=e00000003ecc9568
 [<a000000100624940>] ia64_do_page_fault+0x8a0/0x9e0
                                sp=e00000003eccfa30 bsp=e00000003ecc9518
 [<a00000010000c700>] __ia64_leave_kernel+0x0/0x280
                                sp=e00000003eccfae0 bsp=e00000003ecc9518
 [<a0000002004ee080>] xprt_reserve+0x140/0x2e0 [sunrpc]
                                sp=e00000003eccfcb0 bsp=e00000003ecc94c0
 [<a0000002004e8430>] call_reserve+0xf0/0x120 [sunrpc]
                                sp=e00000003eccfcb0 bsp=e00000003ecc94a0
 [<a0000002004f8950>] __rpc_execute+0x1f0/0x660 [sunrpc]
                                sp=e00000003eccfcb0 bsp=e00000003ecc9468
 [<a0000002004f8ec0>] rpc_execute+0xa0/0xc0 [sunrpc]
                                sp=e00000003eccfcb0 bsp=e00000003ecc9448
 [<a000000200690ef0>] nfs_execute_read+0x90/0xe0 [nfs]
                                sp=e00000003eccfcb0 bsp=e00000003ecc9420
 [<a000000200691cb0>] nfs_pagein_one+0x590/0x600 [nfs]
                                sp=e00000003eccfcc0 bsp=e00000003ecc93c0
 [<a0000002006923e0>] nfs_readpages+0x6c0/0x820 [nfs]
                                sp=e00000003eccfcd0 bsp=e00000003ecc9368
 [<a0000001001048b0>] __do_page_cache_readahead+0x1f0/0x400
                                sp=e00000003eccfd10 bsp=e00000003ecc9308
 [<a000000100104ba0>] blockable_page_cache_readahead+0xe0/0x1e0
                                sp=e00000003eccfda0 bsp=e00000003ecc92c0
 [<a000000100105190>] page_cache_readahead+0x350/0x4a0
                                sp=e00000003eccfda0 bsp=e00000003ecc9268
 [<a0000001000f3c30>] do_generic_mapping_read+0x190/0x8a0
                                sp=e00000003eccfda0 bsp=e00000003ecc91b0
 [<a0000001000f43d0>] generic_file_sendfile+0x90/0xe0
                                sp=e00000003eccfdf0 bsp=e00000003ecc9168
 [<a00000020067d5d0>] nfs_file_sendfile+0x110/0x140 [nfs]
                                sp=e00000003eccfe10 bsp=e00000003ecc9120
 [<a0000002005893b0>] loop_thread+0x750/0x840 [loop]
                                sp=e00000003eccfe10 bsp=e00000003ecc90a8
 [<a0000001000123f0>] kernel_thread_helper+0x30/0x60
                                sp=e00000003eccfe30 bsp=e00000003ecc9080
 [<a0000001000090c0>] start_kernel_thread+0x20/0x40
                                sp=e00000003eccfe30 bsp=e00000003ecc9080

Comment 9 Doug Chapman 2006-09-08 17:48:11 UTC

(In reply to comment #7)
> FYI,
> 
> Things are looking better as of today's build (20060905). However, since we could
> not reproduce this 100% of the time before, the plan is to keep this open for a
> day or two and make sure we don't hit it again.
> 
> But it looks like the other NFS patches may have fixed this.
> 

While trying to get one of my systems installed with something workable, I just
tried the 0905 build again and hit yet another random panic there too.  So it
appears this wasn't fixed; it just did a better job of hiding for a while.

This time it appears it happened just as anaconda was finishing and unmounting
(so I guess the install completed).

This one was on a 4-CPU HP Integrity rx4640.

sending termination signals...done
sending kill signals...done
disabling swap...
        /dev/mapper/VolGroup00-LogVol01
        /tmp/sdb1
unmounting filesystems...
init[1]: bugcheck! 0 [1]
Modules linked in: dm_emc dm_round_robin dm_multipath dm_snapshot dm_mirror
dm_zero dm_mod xfs jfs reiserfs lock_nolock gfs2 ext3 jbd msdos raid456 xor
raid1 raid0 cciss mptspi scsi_transport_spi mptscsih mptbase qla2xxx
scsi_transport_fc e1000 ohci_hcd ehci_hcd iscsi_tcp libiscsi
scsi_transport_iscsi sr_mod sd_mod scsi_mod ide_cd cdrom squashfs loop nfs
nfs_acl fscache lockd sunrpc vfat fat cramfs

Pid: 1, CPU 2, comm:                 init
psr : 00001010085a2010 ifs : 800000000000050e ip  : [<a0000001001433a0>]    Not tainted
ip is at cache_free_debugcheck+0x3c0/0x600
unat: 0000000000000000 pfs : 000000000000050e rsc : 0000000000000003
rnat: e0000040fe609034 bsps: ffffffffdead4ead pr  : 0000000000066559
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001001433a0 b6  : a00000010015a400 b7  : a00000020bcf8900
f6  : 0fffbccccccccc8c00000 f7  : 0ffdaa200000000000000
f8  : 100008000000000000000 f9  : 10002a000000000000000
f10 : 0fffcccccccccc8c00000 f11 : 1003e0000000000000000
r1  : a000000100ba13c0 r2  : a0000001009b8bf8 r3  : e0000040fe609034
r8  : 0000000000000021 r9  : a0000001009b5960 r10 : a0000001009b8c28
r11 : a0000001009b8c28 r12 : e0000040fe60fce0 r13 : e0000040fe608000
r14 : a0000001009b8bf8 r15 : 0000000000000000 r16 : ffffffffdead4ead
r17 : 00000000dead4ead r18 : a000000100841b64 r19 : a0000001009b5958
r20 : 0000000000000000 r21 : a0000001009a1a58 r22 : 0000000000000000
r23 : a0000001007f3100 r24 : a0000001009a1a58 r25 : a0000001009b8c00
r26 : a0000001009b8c00 r27 : a0000001008e8780 r28 : a0000001009a1be8
r29 : 0000000000000002 r30 : a000000100841b70 r31 : e0000040fe609034

Call Trace:
 [<a000000100013e80>] show_stack+0x40/0xa0
                                sp=e0000040fe60f870 bsp=e0000040fe609580
 [<a000000100014780>] show_regs+0x840/0x880
                                sp=e0000040fe60fa40 bsp=e0000040fe609528
 [<a000000100037ba0>] die+0x1c0/0x2a0
                                sp=e0000040fe60fa40 bsp=e0000040fe6094e0
 [<a000000100037cd0>] die_if_kernel+0x50/0x80
                                sp=e0000040fe60fa60 bsp=e0000040fe6094b0
 [<a00000010061ec50>] ia64_bad_break+0x270/0x4a0
                                sp=e0000040fe60fa60 bsp=e0000040fe609488
 [<a00000010000c700>] __ia64_leave_kernel+0x0/0x280
                                sp=e0000040fe60fb10 bsp=e0000040fe609488
 [<a0000001001433a0>] cache_free_debugcheck+0x3c0/0x600
                                sp=e0000040fe60fce0 bsp=e0000040fe609418
 [<a000000100146a40>] kmem_cache_free+0x1c0/0x600
                                sp=e0000040fe60fce0 bsp=e0000040fe6093c8
 [<a0000001001598b0>] free_buffer_head+0x90/0x100
                                sp=e0000040fe60fcf0 bsp=e0000040fe6093a8
 [<a000000100159ef0>] try_to_free_buffers+0x170/0x1c0
                                sp=e0000040fe60fcf0 bsp=e0000040fe609378
 [<a00000010015a050>] try_to_release_page+0x110/0x140
                                sp=e0000040fe60fd00 bsp=e0000040fe609350
 [<a000000100107440>] invalidate_complete_page+0x60/0x1e0
                                sp=e0000040fe60fd00 bsp=e0000040fe609320
 [<a000000100107ad0>] invalidate_mapping_pages+0x130/0x220
                                sp=e0000040fe60fd00 bsp=e0000040fe6092d0
 [<a000000100107bf0>] invalidate_inode_pages+0x30/0x60
                                sp=e0000040fe60fd80 bsp=e0000040fe6092b0
 [<a00000010015c3f0>] invalidate_bdev+0x90/0xc0
                                sp=e0000040fe60fd80 bsp=e0000040fe609290
 [<a000000100169a30>] kill_bdev+0x30/0x80
                                sp=e0000040fe60fd80 bsp=e0000040fe609270
 [<a00000010016acc0>] __blkdev_put+0xa0/0x3a0
                                sp=e0000040fe60fd80 bsp=e0000040fe609228
 [<a00000010016b050>] blkdev_put+0x30/0x60
                                sp=e0000040fe60fd90 bsp=e0000040fe609208
 [<a00000010016b0b0>] close_bdev_excl+0x30/0x60
                                sp=e0000040fe60fd90 bsp=e0000040fe6091e0
 [<a000000100167a00>] kill_block_super+0x60/0x80
                                sp=e0000040fe60fd90 bsp=e0000040fe6091b8
 [<a000000100167ca0>] deactivate_super+0x180/0x1c0
                                sp=e0000040fe60fd90 bsp=e0000040fe609190
 [<a00000010019bf30>] mntput_no_expire+0xb0/0x1e0
                                sp=e0000040fe60fd90 bsp=e0000040fe609168
 [<a0000001001772a0>] path_release_on_umount+0x40/0x60
                                sp=e0000040fe60fd90 bsp=e0000040fe609148
 [<a00000010019f300>] sys_umount+0x620/0x700
                                sp=e0000040fe60fd90 bsp=e0000040fe6090d0
 [<a00000010000c560>] ia64_ret_from_syscall+0x0/0x40
                                sp=e0000040fe60fe30 bsp=e0000040fe6090d0
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e0000040fe610000 bsp=e0000040fe6090d0
 <0>Kernel panic - not syncing: Attempted to kill init!

Comment 12 Prarit Bhargava 2006-09-10 16:51:19 UTC

Doug,

steved suggests that we try turning off selinux.  Let's try that on Monday and
retest ...

P.

Comment 13 Doug Chapman 2006-09-11 13:38:21 UTC

(In reply to comment #12)
> Doug,
> 
> steved suggests that we try turning off selinux.  Let's try that on Monday and
> retest ...
> 
> P.

Many, if not most, of my installs have been done with selinux=0.  I re-verified
just now on today's build and hit a panic again.  Oddly, today's kernel is still
the 2.6.17-1.2630.fc6 revision.  Perhaps we should try a newer kernel with the
latest git pull, just in case this has been fixed upstream?

Comment 14 Aron Griffis 2006-09-11 21:08:37 UTC

I just hit this today; the symptoms look very similar.  Installation on an rx1620
over HTTP, using VNC.

Running anaconda, the Fedora Core system installer - please wait...            
Probing for video card:   Unable to probe
No video hardware found, assuming headless
Starting VNC...
The VNC server is now running.
Please connect to 10.202.2.7:1 to begin the install...
Starting graphical installation...


Press <enter> for a shell
loadkeys(747): unaligned access to 0x2000000002c7ea54, ip=0x2000000000018050
loadkeys(747): unaligned access to 0x2000000002c7ea54, ip=0x2000000000018060
loadkeys(747): unaligned access to 0x2000000002c7ea6c, ip=0x2000000000018050
loadkeys(747): unaligned access to 0x2000000002c7ea6c, ip=0x2000000000018060
loadkeys(747): unaligned access to 0x2000000002c7ea84, ip=0x2000000000018050
XKB extension not present on :1
anaconda[714]: bugcheck! 0 [1]
Modules linked in: dm_emc dm_round_robin dm_multipath dm_snapshot dm_mirror
dm_zero dm_mod xfs jfs reiserfs lock_nolock gfs2 ext3 jbd msdos raid456 xor
raid1 raid0 mptspi scsi_transport_spi mptscsih mptbase e1000 ohci_hcd ehci_hcd
iscsi_tcp libiscsi scsi_transport_iscsi sr_mod sd_mod scsi_mod ide_cd cdrom ipv6
squashfs loop nfs nfs_acl fscache lockd sunrpc vfat fat cramfs

Pid: 714, CPU 1, comm:             anaconda
psr : 0000101008522030 ifs : 800000000000050e ip  : [<a0000001001433c0>]    Not tainted
ip is at cache_free_debugcheck+0x3c0/0x600
unat: 0000000000000000 pfs : 000000000000050e rsc : 0000000000000003
rnat: 00000000201222f6 bsps: e0000001ffcf1fac pr  : 0000000000265559
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001001433c0 b6  : a000000100066fa0 b7  : a000000100500bc0
f6  : 1003e00000000000000a0 f7  : 1003e20c49ba5e353f7cf
f8  : 1003e00000000000004e2 f9  : 1003e000000000fa00000
f10 : 1003e000000003b9aca00 f11 : 1003e431bde82d7b634db
r1  : a000000100bb1580 r2  : a0000001009c8fd8 r3  : e0000040fb0c9034
r8  : 0000000000000021 r9  : a0000001009c5748 r10 : a0000001009c9008
r11 : a0000001009c9008 r12 : e0000040fb0cfbc0 r13 : e0000040fb0c8000
r14 : a0000001009c8fd8 r15 : 0000000000000000 r16 : ffffffffdead4ead
r17 : 00000000dead4ead r18 : a0000001008f85ec r19 : a0000001009c5740
r20 : 0000000000000000 r21 : a0000001009b1c18 r22 : 0000000000000073
r23 : a0000001007fb100 r24 : a0000001009b1c18 r25 : a0000001009c8fe0
r26 : a0000001009c8fe0 r27 : 000000003fffff00 r28 : e0000040fb0c9048
r29 : e000000106c10060 r30 : e0000040fb0c802c r31 : e000000106c1002c

Call Trace:
 [<a000000100013e80>] show_stack+0x40/0xa0
                                sp=e0000040fb0cf750 bsp=e0000040fb0c94f0
 [<a000000100014780>] show_regs+0x840/0x880
                                sp=e0000040fb0cf920 bsp=e0000040fb0c9498
 [<a000000100037b80>] die+0x1c0/0x2a0
                                sp=e0000040fb0cf920 bsp=e0000040fb0c9450
 [<a000000100037cb0>] die_if_kernel+0x50/0x80
                                sp=e0000040fb0cf940 bsp=e0000040fb0c9420
 [<a000000100622070>] ia64_bad_break+0x270/0x4a0
                                sp=e0000040fb0cf940 bsp=e0000040fb0c93f0
 [<a00000010000c700>] __ia64_leave_kernel+0x0/0x280
                                sp=e0000040fb0cf9f0 bsp=e0000040fb0c93f0
 [<a0000001001433c0>] cache_free_debugcheck+0x3c0/0x600
                                sp=e0000040fb0cfbc0 bsp=e0000040fb0c9380
 [<a000000100147f30>] kfree+0x170/0x5e0
                                sp=e0000040fb0cfbc0 bsp=e0000040fb0c9340
 [<a00000010050a650>] skb_release_data+0x190/0x1c0
                                sp=e0000040fb0cfbd0 bsp=e0000040fb0c9318
 [<a000000100509ee0>] kfree_skbmem+0x20/0x160
                                sp=e0000040fb0cfbd0 bsp=e0000040fb0c92f8
 [<a00000010050a310>] __kfree_skb+0x2f0/0x320
                                sp=e0000040fb0cfbd0 bsp=e0000040fb0c92d0
 [<a000000100581e40>] tcp_recvmsg+0x1040/0x1980
                                sp=e0000040fb0cfbd0 bsp=e0000040fb0c9240
 [<a000000100500c50>] sock_common_recvmsg+0x90/0xe0
                                sp=e0000040fb0cfbf0 bsp=e0000040fb0c9200
 [<a0000001004fb4b0>] sock_recvmsg+0x1f0/0x240
                                sp=e0000040fb0cfc00 bsp=e0000040fb0c91b8
 [<a0000001004fe940>] sys_recvfrom+0x120/0x220
                                sp=e0000040fb0cfd60 bsp=e0000040fb0c9120
 [<a0000001004fea80>] sys_recv+0x40/0x60
                                sp=e0000040fb0cfe30 bsp=e0000040fb0c90c8
 [<a00000010000c560>] ia64_ret_from_syscall+0x0/0x40
                                sp=e0000040fb0cfe30 bsp=e0000040fb0c90c8
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e0000040fb0d0000 bsp=e0000040fb0c90c8

Comment 15 Doug Chapman 2006-09-11 22:30:48 UTC

And a bit more data: to try to rule out networking, I did a DVD install of the
20060911 build.  It got farther but still panicked 87% of the way through
loading packages.  This was with SELinux enabled; I will try again with
selinux=0.

bugcheck! 0 [1]
Modules linked in: dm_emc dm_round_robin dm_multipath dm_snapshot dm_mirror
dm_zero dm_mod xfs jfs reiserfs lock_nolock gfs2 ext3 jbd msdos raid456 xor
raid1 raid0 cciss mptspi scsi_transport_spi mptscsih mptbase tg3 e100 mii
ohci_hcd ehci_hcd iscsi_tcp libiscsi scsi_transport_iscsi sr_mod sd_mod scsi_mod
ide_cd cdrom ipv6 squashfs loop nfs nfs_acl fscache lockd sunrpc vfat fat
cramfs

Pid: 172, CPU 0, comm:              kswapd0
psr : 0000101008022038 ifs : 800000000000050e ip  : [<a0000001001433c0>]    Not tainted
ip is at cache_free_debugcheck+0x3c0/0x600
unat: 0000000000000000 pfs : 000000000000050e rsc : 0000000000000003
rnat: 0000000000000000 bsps: e0000040ffd78588 pr  : 0000000000009541
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a74433f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001001433c0 b6  : a0000001000112e0 b7  : a000000201be4bc0
f6  : 1003e00000000000000a0 f7  : 1003e20c49ba5e353f7cf
f8  : 1003e00000000000004e2 f9  : 1003e000000000fa00000
f10 : 1003e000000003b9aca00 f11 : 1003e431bde82d7b634db
r1  : a000000100bb1580 r2  : a0000001009c8fd8 r3  : e0000040fc301034
r8  : 0000000000000021 r9  : a0000001009c5748 r10 : a0000001009c9008
r11 : a0000001009c9008 r12 : e0000040fc307c10 r13 : e0000040fc300000
r14 : a0000001009c8fd8 r15 : 0000000000000000 r16 : ffffffffdead4ead
r17 : 00000000dead4ead r18 : a0000001008f85ec r19 : a0000001009c5740
r20 : 0000000000000000 r21 : a0000001009b1c18 r22 : 0000000000000004
r23 : a0000001007fb100 r24 : a0000001009b1c18 r25 : a0000001009c8fe0
r26 : a0000001009c8fe0 r27 : e0000040fb3f1020 r28 : e0000040fb3f0008
r29 : e0000040fc360060 r30 : e0000040fb3f002c r31 : e0000040fc36002c

Call Trace:
 [<a000000100013e80>] show_stack+0x40/0xa0
                                sp=e0000040fc3077a0 bsp=e0000040fc301468
 [<a000000100014780>] show_regs+0x840/0x880
                                sp=e0000040fc307970 bsp=e0000040fc301410
 [<a000000100037b80>] die+0x1c0/0x2a0
                                sp=e0000040fc307970 bsp=e0000040fc3013c0
 [<a000000100037cb0>] die_if_kernel+0x50/0x80
                                sp=e0000040fc307990 bsp=e0000040fc301390
 [<a000000100622070>] ia64_bad_break+0x270/0x4a0
                                sp=e0000040fc307990 bsp=e0000040fc301368
 [<a00000010000c700>] __ia64_leave_kernel+0x0/0x280
                                sp=e0000040fc307a40 bsp=e0000040fc301368
 [<a0000001001433c0>] cache_free_debugcheck+0x3c0/0x600
                                sp=e0000040fc307c10 bsp=e0000040fc3012f8
 [<a000000100146a60>] kmem_cache_free+0x1c0/0x600
                                sp=e0000040fc307c10 bsp=e0000040fc3012b0
 [<a000000201be4bf0>] [squashfs]
                                sp=e0000040fc307c20 bsp=e0000040fc301290
 [<a000000100194980>]
                                        040fc307c20 bsp=e0000040fc301270
 [<a000000100195c40>] dispose_list+0x160/0x200
 [<a0000001001968c0>] shrink_icache_memory+0x480/0x5a0
                                sp=e0000040fc307c20 bsp=e0000040fc3011e0
 [<a00000010010af60>] shrink_slab+0x220/0x380
                                sp=e00000463c307c30 20p=e0000040f4301190
 [<a00000010010c3a0>] kswapd+0x6c0/0x900
                                sp=e0000040fc307c30 bsp=e0000040fc3010f0
 [<a0000001000adc00>] kthread+0x220/0x2a0
                                sp=e0000040fc307d50 bsp=e0000040fc3010a8
 [<a0000001000123f0>] kernel_thread_helper+0x30/0x60
                                sp=e0000040fc307e30 bsp=e0000040fc301080
 [<a0000001000090c0>] start_kernel_thread+0x20/0x40
                                sp=e0000040fc307e30 bsp=e0000040fc301080

Comment 16 Doug Chapman 2006-09-11 22:40:29 UTC

The DVD install of 20060911 with selinux=0 failed even sooner.  It appears
anaconda stage2 was just starting:



anaconda[782]: bugcheck! 0 [1]
Modules linked in: dm_emc dm_round_robin dm_multipath dm_snapshot dm_mirror
dm_zero dm_mod xfs jfs reiserfs lock_nolock gfs2 ext3 jbd msdos raid456 xor
raid1 raid0 cciss mptspi scsi_transport_spi mptscsih mptbase tg3 e100 mii
ohci_hcd ehci_hcd iscsi_tcp libiscsi scsi_transport_iscsi sr_mod sd_mod scsi_mod
ide_cd cdrom ipv6 squashfs loop nfs nfs_acl fscache lockd sunrpc vfat fat cramfs
 Pid: 782, CPU 0, comm:             anaconda
psr : 0000101008522030 ifs : 800000000000038c ip  : [<a0000001001411f0>]    Not tainted
ip is at check_slabp+0x210/0x240
unat: 0000000000000000 pfs : 000000000000038c rsc : 0000000000000003
rnat: 0000000000250259 bsps: a0000001001a73e0 pr  : 00000000002a5559
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001001411f0 b6  : a000000100066fa0 b7  : a000000100010570
f6  : 0fffbccccccccc8c00000 f7  : 0ffdaa200000000000000
f8  : 100008000000000000000 f9  : 10002a000000000000000
f10 : 0fffcccccccccc8c00000 f11 : 1003e0000000000000000
r1  : a000000100bb1580 r2  : a0000001009c8fd8 r3  : e000004046af1034
r8  : 0000000000000021 r9  : a0000001009c5748 r10 : a0000001009c9008
r11 : a0000001009c9008 r12 : e000004046af7e20 r13 : e000004046af0000
r14 : a0000001009c8fd8 r15 : 0000000000000000 r16 : ffffffffdead4ead
r17 : 00000000dead4ead r18 : a0000001008f85ec r19 : a0000001009c5740
r20 : 0000000000000000 r21 : a0000001009b1c18 r22 : 0000000000000000
r23 : a0000001007fb100 r24 : a0000001009b1c18 r25 : a0000001009c8fe0
r26 : a0000001009c8fe0 r27 : e0000040fc3c7b18 r28 : e000000004c046e0
r29 : 0000000000000000 r30 : e000000004c046e8 r31 : e000004046af1034

Call Trace:
 [<a000000100013e80>] show_stack+0x40/0xa0
                                sp=e000004046af79b0 bsp=e000004046af15c0
 [<a000000100014780>] show_regs+0x840/0x880
                                sp=e000004046af7b80 bsp=e000004046af1568
 [<a000000100037b80>] die+0x1c0/0x2a0
                                sp=e000004046af7b80 bsp=e000004046af1520
 [<a000000100037cb0>] die_if_kernel+0x50/0x80
                                sp=e000004046af7ba0 bsp=e000004046af14f0
 [<a000000100622070>] ia64_bad_break+0x270/0x4a0
                                sp=e000004046af7ba0 bsp=e000004046af14c8
 [<a00000010000c700>] __ia64_leave_kernel+0x0/0x280
                                sp=e000004046af7c50 bsp=e000004046af14c8
 [<a0000001001411f0>] check_slabp+0x210/0x240
                                sp=e000004046af7e20 bsp=e000004046af1468
 [<a0000001001449b0>] cache_alloc_refill+0x1f0/0x5e0
                                sp=e000004046af7e20 bsp=e000004046af1410
 [<a000000100145e30>] kmem_cache_alloc+0x190/0x220
                                sp=e000004046af7e20 bsp=e000004046af13d8
 [<a0000001004fb070>] sock_alloc_inode+0x30/0xc0
                                sp=e000004046af7e20 bsp=e000004046af13b8
 [<a000000100194a40>] alloc_inode+0x60/0x3e0
                                sp=e000004046af7e20 bsp=e000004046af1388
 [<a000000100194e00>] new_inode+0x40/0x140
                                sp=e000004046af7e20 bsp=e000004046af1360
 [<a0000001004fd3a0>] sock_alloc+0x40/0x100
                                sp=e000004046af7e20 bsp=e000004046af1348
 [<a0000001004fd6a0>] __sock_create+0x240/0x660
                                sp=e000004046af7e20 bsp=e000004046af12f0
 [<a0000001004fdb60>] sock_create+0x40/0x60
                                sp=e000004046af7e20 bsp=e000004046af12b8
 [<a0000001004fe1d0>] sys_socket+0x30/0xc0
                                sp=e000004046af7e20 bsp=e000004046af1258
 [<a00000010000c560>] ia64_ret_from_syscall+0x0/0x40
                                sp=e000004046af7e30 bsp=e000004046af1258
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e000004046af8000 bsp=e000004046af1258
 install exited abnormally [1/1]
 <0>BUG: spinlock cpu recursion on CPU#0, events/0/8 (Not tainted)
 lock: e0000040fffb75c0, .magic: dead4ead, .owner: anaconda/782, .owner_cpu: 0

Call Trace:
 [<a000000100013e80>] show_stack+0x40/0xa0
                                sp=e0000040fe637b30 bsp=e0000040fe6312c8
 [<a000000100013f10>] dump_stack+0x30/0x60
                                sp=e0000040fe637d00 bsp=e0000040fe6312b0
 [<a0000001002acbb0>] spin_bug+0x130/0x1e0
                                sp=e0000040fe637d00 bsp=e0000040fe631258
 [<a0000001002ace30>] _raw_spin_lock+0xb0/0x260
                                sp=e0000040fe637d00 bsp=e0000040fe631220
 [<a000000100620690>] _spin_lock_irq+0x30/0x60
                                sp=e0000040fe637d00 bsp=e0000040fe631200
 [<a000000100147730>] drain_array+0xb0/0x200
                                sp=e0000040fe637d00 bsp=e0000040fe6311a8
 [<a00000010014af10>] cache_reap+0x1f0/0x580
                                sp=e0000040fe637d00 bsp=e0000040fe631160
 [<a0000001000a4000>] run_workqueue+0x1c0/0x280
                                sp=e0000040fe637d00 bsp=e0000040fe631120
 [<a0000001000a5ee0>] worker_thread+0x1a0/0x240
                                sp=e0000040fe637d00 bsp=e0000040fe6310f0
 [<a0000001000adc00>] kthread+0x220/0x2a0
                                sp=e0000040fe637d50 bsp=e0000040fe6310a8
 [<a0000001000123f0>] kernel_thread_helper+0x30/0x60
                                sp=e0000040fe637e30 bsp=e0000040fe631080
 [<a0000001000090c0>] start_kernel_thread+0x20/0x40
                                sp=e0000040fe637e30 bsp=e0000040fe631080
BUG: spinlock lockup on CPU#0, events/0/8, e0000040fffb75c0 (Not tainted)

Call Trace:
 [<a000000100013e80>] show_stack+0x40/0xa0
                                sp=e0000040fe637b30 bsp=e0000040fe631270
 [<a000000100013f10>] dump_stack+0x30/0x60
                                sp=e0000040fe637d00 bsp=e0000040fe631258
 [<a0000001002acf80>] _raw_spin_lock+0x200/0x260
                                sp=e0000040fe637d00 bsp=e0000040fe631220
 [<a000000100620690>] _spin_lock_irq+0x30/0x60
                                sp=e0000040fe637d00 bsp=e0000040fe631200
 [<a000000100147730>] drain_array+0xb0/0x200
                                sp=e0000040fe637d00 bsp=e0000040fe6311a8
 [<a00000010014af10>] cache_reap+0x1f0/0x580
                                sp=e0000040fe637d00 bsp=e0000040fe631160
 [<a0000001000a4000>] run_workqueue+0x1c0/0x280
                                sp=e0000040fe637d00 bsp=e0000040fe631120
 [<a0000001000a5ee0>] worker_thread+0x1a0/0x240
                                sp=e0000040fe637d00 bsp=e0000040fe6310f0
 [<a0000001000adc00>] kthread+0x220/0x2a0
                                sp=e0000040fe637d50 bsp=e0000040fe6310a8
 [<a0000001000123f0>] kernel_thread_helper+0x30/0x60
                                sp=e0000040fe637e30 bsp=e0000040fe631080
 [<a0000001000090c0>] start_kernel_thread+0x20/0x40
                                sp=e0000040fe637e30 bsp=e0000040fe631080
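The "cpu recursion" report above is easier to read with the debug-spinlock bookkeeping in mind: it prints the lock's magic (dead4ead), its recorded owner (anaconda/782) and owner CPU (0), and events/0/8 then tried to take the same lock on CPU 0. That is consistent with anaconda having bugchecked in check_slabp while holding the lock. The following is a toy Python model of those ownership checks only, not the kernel's code; `DebugSpinlock` and its methods are hypothetical names. In the log the report is followed by a "spinlock lockup" message because the real code keeps trying to take the lock afterwards; the model stops at the report.

```python
from typing import Optional

SPINLOCK_MAGIC = 0xDEAD4EAD  # shown as ".magic: dead4ead" in the report


class DebugSpinlock:
    """Toy model of the owner bookkeeping a debug spinlock keeps."""

    def __init__(self) -> None:
        self.magic = SPINLOCK_MAGIC
        self.owner: Optional[str] = None  # task holding the lock, None if free
        self.owner_cpu = -1               # CPU holding the lock, -1 if free

    def check_before_lock(self, task: str, cpu: int) -> Optional[str]:
        """Return the problem a debug check would report, or None."""
        if self.magic != SPINLOCK_MAGIC:
            return "bad magic"
        if self.owner == task:
            return "recursion"
        if self.owner_cpu == cpu:
            return "cpu recursion"
        return None

    def acquire(self, task: str, cpu: int) -> Optional[str]:
        """Take the lock if it is free; otherwise report why not."""
        problem = self.check_before_lock(task, cpu)
        if problem is not None:
            return problem
        if self.owner is not None:
            # The real lock would keep spinning here and could eventually
            # print a "spinlock lockup" report; the model just says busy.
            return "busy"
        self.owner, self.owner_cpu = task, cpu
        return None

    def release(self) -> None:
        self.owner, self.owner_cpu = None, -1


# anaconda/782 takes the lock on CPU 0 and then dies without releasing it.
lock = DebugSpinlock()
lock.acquire("anaconda/782", 0)
print(lock.acquire("events/0/8", 0))  # cpu recursion
```

The model makes the key point visible: the check that fires depends only on the recorded owner fields, so a task that dies with the lock held leaves those fields behind for the next locker on that CPU to trip over.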

Comment 17 Prarit Bhargava 2006-09-12 10:58:55 UTC

Adding Dave in case he's heard of any corruption issues ...

Okay -- just some thoughts here.

AFAICT the problem is only seen in the install.  HTTP, FTP, CD/DVD, and NFS
installs are all failing, so we can safely say that it is not a network or NFS
issue or a single driver issue.

I don't believe that we're seeing many issues -- I still think it's one single
issue that is affecting us in different ways (I could be convinced otherwise).

We originally saw this when we made the most recent jump to the latest git pull.

Some questions for Doug and things for Doug to try:

a) Has this ever happened on a DVD you've built -- you have a nifty buildos
utility built by a jeeneeous engineer ;) ?
b) The one BIG thing that is different between the kernel boot and the install
is the use of squashfs.  A squashfs test might be in order.
c) Moving forward and backwards through the git pulls might be a good idea.
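Item c) amounts to a bisection over builds. A minimal Python sketch follows, assuming an ordered list of builds in which the first is known good, the last is known bad, and every build after the first bad one stays bad. Because the failure here is intermittent (roughly 95% reproducible, and it went into hiding around 20060905), the hypothetical `looks_bad` helper retries a hypothetical `run_once` reproducer hook several times before calling a build good. `bisect_builds`, `looks_bad` and `run_once` are illustrative names, not existing tools.

```python
from typing import Callable, Sequence


def looks_bad(build: str, run_once: Callable[[str], bool], attempts: int = 5) -> bool:
    """Treat a build as bad if any of several reproducer runs fails.

    run_once(build) is a hypothetical hook that installs/boots the build,
    runs the reproducer once and returns True if a panic was seen.
    """
    return any(run_once(build) for _ in range(attempts))


def bisect_builds(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad build.

    Assumes builds[0] is good, builds[-1] is bad, and the failure, once
    introduced, is present in every later build.
    """
    if len(builds) < 2:
        raise ValueError("need at least one good and one bad build")
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] good, builds[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid
        else:
            lo = mid
    return builds[hi]
```

Used as `bisect_builds(builds, lambda b: looks_bad(b, run_once))`, this narrows N builds down in about log2(N) test rounds. With an intermittent bug, a build that passes every attempt can still be bad, so a surprising answer is worth rechecking with more attempts.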

Doug -- a) would be pretty important to know.  That way we can concentrate on a
specific build.

P.

Comment 18 Doug Chapman 2006-09-12 16:52:03 UTC

Prarit,

I just tried building a DVD using your buildos script.  The install appears to
be working on this one, but I got an oops during the install while loading
packages.  The screen is garbled because the install kept going, so I can't
really cut and paste all of it here; I guess we just add this to the list of
randomness.

This was a text install from DVD without specifying any selinux flags.

 Oops 11012296146944 [1]
Modules linked in: dm_emc dm_round_robin dm_multipath dm_snapshot dm_mirror
dm_zero dm_mod xfs jfs reiserfs lock_nolock gfs2 ext3 jbd msdos raid456 xor
raid1 raid0 mptspi scsi_transport_spi mptscsih mptbase e100 mii tg3 ohci_hcd
ehci_hcd iscsi_tcp libiscsi scsi_transport_iscsi sr_mod sd_mod scsi_mod ide_cd
cdrom ipv6 squashfs loop nfs nfs_acl fscache lockd sunrpc vfat fat cramfs

Pid: 2760, CPU 0, comm:          load_policy
psr : 0000121008526010 ifs : 800000000000040b ip  : [<a00000010024c301>]    Not
tainted
ip is at hashtab_map+0x61/0x140

Comment 19 Doug Chapman 2006-09-13 18:29:16 UTC

Created attachment 136187 [details]
reproducer for panics seen during install

I was able to reproduce a panic with this script on an installed and running
system.  This hopefully will aid in debugging since we were only able to see it
under anaconda before.

Comment 20 Doug Chapman 2006-09-13 20:57:31 UTC

With my reproducer the failures seem slightly less random, and I hit slab
corruption a little more often than the other failure types.  This one is from
2.6.17-1.2573.fc6.

Slab corruption: (Not tainted) start=e00000046fb62000, len=8192

Call Trace:
 [<a000000100013de0>] show_stack+0x40/0xa0
                                sp=e000000115b2f9b0 bsp=e000000115b297c0
 [<a000000100013e70>] dump_stack+0x30/0x60
                                sp=e000000115b2fb80 bsp=e000000115b297a8
 [<a000000100131fa0>] check_poison_obj+0x120/0x4c0
                                sp=e000000115b2fb80 bsp=e000000115b29748
 [<a000000100132bc0>] cache_alloc_debugcheck_after+0x60/0x480
                                sp=e000000115b2fb80 bsp=e000000115b29708
 [<a000000100135e60>] kmem_cache_alloc+0x1e0/0x220
                                sp=e000000115b2fb80 bsp=e000000115b296d8
 [<a00000020d026260>] squashfs_get_cached_block+0x380/0x8e0 [squashfs]
                                sp=e000000115b2fb80 bsp=e000000115b29648
 [<a00000020d029980>] squashfs_iget+0x3a0/0x2c00 [squashfs]
                                sp=e000000115b2fbb0 bsp=e000000115b295e8
 [<a00000020d027530>] squashfs_lookup+0xd70/0xe40 [squashfs]
                                sp=e000000115b2fc30 bsp=e000000115b29550
 [<a0000001001681e0>] do_lookup+0x1a0/0x460
                                sp=e000000115b2fc70 bsp=e000000115b294f8
 [<a00000010016dfb0>] __link_path_walk+0x1870/0x2680
                                sp=e000000115b2fc70 bsp=e000000115b29498
 [<a00000010016ee80>] link_path_walk+0xc0/0x260
                                sp=e000000115b2fc90 bsp=e000000115b29458
 [<a00000010016f900>] do_path_lookup+0x540/0x660
                                sp=e000000115b2fd20 bsp=e000000115b29418
 [<a000000100170cc0>] __user_walk_fd+0x60/0xa0
                                sp=e000000115b2fd30 bsp=e000000115b293d8
 [<a00000010015ebb0>] vfs_lstat_fd+0x30/0xa0
                                sp=e000000115b2fd30 bsp=e000000115b293a8
 [<a00000010015f170>] sys_newlstat+0x30/0x80
                                sp=e000000115b2fdc0 bsp=e000000115b29348
 [<a00000010000c560>] ia64_ret_from_syscall+0x0/0x40
                                sp=e000000115b2fe30 bsp=e000000115b29348
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e000000115b30000 bsp=e000000115b29348
000: ae 51 6b b3 ca 4f 7e 37 39 7e f8 e0 e8 f4 d4 77
010: df bd 2f 29 0a 04 24 49 74 b9 9c 05 59 ab 56 ae
020: 5c a9 cd a0 9b 79 be 2a 8f 1a 4a 92 96 61 b4 92
030: c6 69 2c 46 f3 a5 ce 2a 18 bb 35 3e d9 d3 d3 d2
040: d2 7f fe 7d ec bf 11 0d 2a 33 af 22 58 e8 5d bb
050: 75 ab 61 d5 aa 0c 6a 05 bd 78 0c 34 69 31 5e 06
Prev obj: start=e00000046fb60000, len=8192
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

Comment 21 Prarit Bhargava 2006-09-13 21:28:13 UTC

Bumping to urgent because of corruption.
Changing to all arches.
Adding Jeremy Katz.


From HP xw9400 x86_64 running kernel build 2617 with Doug's reproducer:

kernel BUG at mm/slab.c:2765!
invalid opcode: 0000 [#1]
SMP 
last sysfs file: /block/loop6/dev
Modules linked in: squashfs loop autofs4 hidp nfs lockd fscache nfs_acl rfcomm
l2cap bluetooth sunrpc ip_conntrack_netbios_ns ipt_REJECT iptable_filter
ip_tables xt_state ip_conntrack nfnetlink xt_tcpudp ip6table_filter ip6_tables
x_tables ipv6 cpufreq_ondemand dm_multipath video sbs i2c_ec button battery
asus_acpi ac parport_pc lp parport snd_hda_intel snd_hda_codec snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss
snd_pcm snd_timer ide_cd snd forcedeth cdrom sg soundcore i2c_nforce2 i2c_core
e1000 snd_page_alloc floppy serio_raw pcspkr ohci1394 ieee1394 k8_edac edac_mc
dm_snapshot dm_zero dm_mirror dm_mod sata_nv libata mptsas mptscsih
scsi_transport_sas sd_mod scsi_mod mptbase ext3 jbd ehci_hcd ohci_hcd uhci_hcd
list_del corruption. next->prev should be f2d06000, but was a1df6fb0
CPU:    1
EIP:    0060:[<c046e4d5>]    Not tainted VLI
EFLAGS: 00010002   (2.6.17-1.2617.2.1.fc6 #1) 
EIP is at cache_free_debugcheck+0x113/0x19e
eax: 02e1ed1f   ebx: f7ff1ac0   ecx: 00000054   edx: 00000012
esi: f1d70f98   edi: ffb5415a   ebp: f5aeae70   esp: f5aeae54
ds: 007b   es: 007b   ss: 0068
Process umount (pid: 23603, ti=f5aea000 task=f7171530 task.ti=f5aea000)
Stack: 02e1ed1f c04c605f f1d70000 170fc2a5 c2137164 f7ff1ac0 00000246 f5aeae8c 
       c046e8b3 f1d70f9c f6491544 f1d70f9c f1825304 f6491544 f5aeaea0 c04c605f 
       f1825304 f182530c 0000006a f5aeaeac c0489790 f1825304 f5aeaec4 c0489d9a 
Call Trace:
 [<c046e8b3>] kmem_cache_free+0x6c/0xba
 [<c04c605f>] selinux_inode_free_security+0x59/0x5e
 [<c0489790>] destroy_inode+0x25/0x4a
 [<c0489d9a>] dispose_list+0x94/0xc1
 [<c048a024>] invalidate_inodes+0xae/0xc3
 [<c0478f0b>] generic_shutdown_super+0x46/0xdf
 [<c0478fc4>] kill_block_super+0x20/0x32
 [<c0479084>] deactivate_super+0x5d/0x6f
 [<c048be95>] mntput_no_expire+0x42/0x72
 [<c047eadf>] path_release_on_umount+0x15/0x18
 [<c048cfc1>] sys_umount+0x1e7/0x21b
 [<c048d002>] sys_oldumount+0xd/0xf
 [<c0403faf>] syscall_call+0x7/0xb
DWARF2 unwinder stuck at syscall_call+0x7/0xb
Leftover inexact backtrace:
 [<c0405391>] show_stack_log_lvl+0x8a/0x95
 [<c04054c9>] show_registers+0x12d/0x19a
 [<c04056c6>] die+0x190/0x293
 [<c06144ba>] do_trap+0x7c/0x96
 [<c0405eb6>] do_invalid_op+0x89/0x93
 [<c0404be1>] error_code+0x39/0x40
 [<c046e8b3>] kmem_cache_free+0x6c/0xba
 [<c04c605f>] selinux_inode_free_security+0x59/0x5e
 [<c0489790>] destroy_inode+0x25/0x4a
 [<c0489d9a>] dispose_list+0x94/0xc1
 [<c048a024>] invalidate_inodes+0xae/0xc3
 [<c0478f0b>] generic_shutdown_super+0x46/0xdf
 [<c0478fc4>] kill_block_super+0x20/0x32
 [<c0479084>] deactivate_super+0x5d/0x6f
 [<c048be95>] mntput_no_expire+0x42/0x72
 [<c047eadf>] path_release_on_umount+0x15/0x18
 [<c048cfc1>] sys_umount+0x1e7/0x21b
 [<c048d002>] sys_oldumount+0xd/0xf
 [<c0403faf>] syscall_call+0x7/0xb
Code: 89 d8 e8 45 f7 ff ff 8b 55 e8 89 10 8b 45 ec 31 d2 8b 8b 8c 00 00 00 8b 78
 0c 89 f0 29 f8 f7 f1 3b 83 98 00 00 00 89 45 e4 72 08 <0f> 0b cd 0a 28 ae 63 c0
 8b 45 e4 0f af c1 8d 04 07 39 c6 74 08 
EIP: [<c046e4d5>] cache_free_debugcheck+0x113/0x19e SS:ESP 0068:f5aeae54
 <0>------------[ cut here ]------------
BUG: warning at kernel/exit.c:769/do_exit() (Not tainted)
 [<c04051ee>] show_trace_log_lvl+0x58/0x171
 [<c0405802>] show_trace+0xd/0x10
 [<c040591b>] dump_stack+0x19/0x1b
 [<c0426d95>] do_exit+0x4a/0x784
 [<c04057a3>] die+0x26d/0x293
 [<c06144ba>] do_trap+0x7c/0x96
 [<c0405eb6>] do_invalid_op+0x89/0x93
 [<c0404be1>] error_code+0x39/0x40
DWARF2 unwinder stuck at error_code+0x39/0x40
Leftover inexact backtrace:
 [<c0405802>] show_trace+0xd/0x10
 [<c040591b>] dump_stack+0x19/0x1b
 [<c0426d95>] do_exit+0x4a/0x784
 [<c04057a3>] die+0x26d/0x293
 [<c06144ba>] do_trap+0x7c/0x96
 [<c0405eb6>] do_invalid_op+0x89/0x93
 [<c0404be1>] error_code+0x39/0x40
 [<c046e8b3>] kmem_cache_free+0x6c/0xba
 [<c04c605f>] selinux_inode_free_security+0x59/0x5e
 [<c0489790>] destroy_inode+0x25/0x4a
 [<c0489d9a>] dispose_list+0x94/0xc1
 [<c048a024>] invalidate_inodes+0xae/0xc3
 [<c0478f0b>] generic_shutdown_super+0x46/0xdf
 [<c0478fc4>] kill_block_super+0x20/0x32
 [<c0479084>] deactivate_super+0x5d/0x6f
 [<c048be95>] mntput_no_expire+0x42/0x72
 [<c047eadf>] path_release_on_umount+0x15/0x18
 [<c048cfc1>] sys_umount+0x1e7/0x21b
 [<c048d002>] sys_oldumount+0xd/0xf
 [<c0403faf>] syscall_call+0x7/0xb
kernel BUG at lib/list_debug.c:70!
invalid opcode: 0000 [#2]
SMP 
last sysfs file: /block/loop6/dev
Modules linked in: squashfs loop autofs4 hidp nfs lockd fscache nfs_acl rfcomm
l2cap bluetooth sunrpc ip_conntrack_netbios_ns ipt_REJECT iptable_filter
ip_tables xt_state ip_conntrack nfnetlink xt_tcpudp ip6table_filter ip6_tables
x_tables ipv6 cpufreq_ondemand dm_multipath video sbs i2c_ec button battery
asus_acpi ac parport_pc lp parport snd_hda_intel snd_hda_codec snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss
snd_pcm snd_timer ide_cd snd forcedeth cdrom sg soundcore i2c_nforce2 i2c_core
e1000 snd_page_alloc floppy serio_raw pcspkr ohci1394 ieee1394 k8_edac edac_mc
dm_snapshot dm_zero dm_mirror dm_mod sata_nv libata mptsas mptscsih
scsi_transport_sas sd_mod scsi_mod mptbase ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0060:[<c04ed28b>]    Not tainted VLI
EFLAGS: 00010046   (2.6.17-1.2617.2.1.fc6 #1) 
EIP is at list_del+0x3b/0x62
eax: 00000045   ebx: f2d06000   ecx: c0424ea6   edx: dffefef4
esi: f7ff9694   edi: f7ff1ac0   ebp: dffeff00   esp: dffefef0
ds: 007b   es: 007b   ss: 0068
Process events/0 (pid: 14, ti=dffef000 task=dfd9aab0 task.ti=dffef000)
Stack: c0644271 f2d06000 a1df6fb0 f2d06000 dffeff2c c046ea76 00000009 0000001a 
       0000000d f2d06f98 f1872040 f7fc79a4 f7fc7970 0000001a f7fc7944 dffeff4c 
       c046ebfb 00000000 f7ff1ac0 f7ff96b8 f7ff9694 f7ff1ac0 f7f89464 dffeff64 
Call Trace:
 [<c046ea76>] free_block+0x65/0x175
 [<c046ebfb>] drain_array+0x75/0x99
 [<c047050f>] cache_reap+0x76/0x118
 [<c043361e>] run_workqueue+0x7a/0xbb
 [<c0433f53>] worker_thread+0xd2/0x107
 [<c04364bd>] kthread+0xc3/0xf2
 [<c0402005>] kernel_thread_helper+0x5/0xb
DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
Leftover inexact backtrace:
 [<c0405391>] show_stack_log_lvl+0x8a/0x95
 [<c04054c9>] show_registers+0x12d/0x19a
 [<c04056c6>] die+0x190/0x293
 [<c06144ba>] do_trap+0x7c/0x96
 [<c0405eb6>] do_invalid_op+0x89/0x93
 [<c0404be1>] error_code+0x39/0x40
 [<c046ea76>] free_block+0x65/0x175
 [<c046ebfb>] drain_array+0x75/0x99
 [<c047050f>] cache_reap+0x76/0x118
 [<c043361e>] run_workqueue+0x7a/0xbb
 [<c0433f53>] worker_thread+0xd2/0x107
 [<c04364bd>] kthread+0xc3/0xf2
 [<c0402005>] kernel_thread_helper+0x5/0xb
Code: 53 68 26 42 64 c0 e8 8b 7c f3 ff 0f 0b 41 00 60 42 64 c0 83 c4 0c 8b 03 8b
 40 04 39 d8 74 17 50 53 68 71 42 64 c0 e8 6b 7c f3 ff <0f> 0b 46 00 60 42 64 c0
 83 c4 0c 8b 13 8b 43 04 89 42 04 89 10 
EIP: [<c04ed28b>] list_del+0x3b/0x62 SS:ESP 0068:dffefef0
 <3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
 [<c04051ee>] show_trace_log_lvl+0x58/0x171
 [<c0405802>] show_trace+0xd/0x10
 [<c040591b>] dump_stack+0x19/0x1b
 [<c041de6f>] __might_sleep+0x8d/0x95
 [<c04390b8>] down_read+0x15/0x40
 [<c0430fd4>] blocking_notifier_call_chain+0x11/0x2d
 [<c0425977>] profile_task_exit+0x11/0x13
 [<c0426d67>] do_exit+0x1c/0x784
 [<c04057a3>] die+0x26d/0x293
 [<c06144ba>] do_trap+0x7c/0x96
 [<c0405eb6>] do_invalid_op+0x89/0x93
 [<c0404be1>] error_code+0x39/0x40
DWARF2 unwinder stuck at error_code+0x39/0x40
Leftover inexact backtrace:
 [<c0405802>] show_trace+0xd/0x10
 [<c040591b>] dump_stack+0x19/0x1b
 [<c041de6f>] __might_sleep+0x8d/0x95
 [<c04390b8>] down_read+0x15/0x40
 [<c0430fd4>] blocking_notifier_call_chain+0x11/0x2d
 [<c0425977>] profile_task_exit+0x11/0x13
 [<c0426d67>] do_exit+0x1c/0x784
 [<c04057a3>] die+0x26d/0x293
 [<c06144ba>] do_trap+0x7c/0x96
 [<c0405eb6>] do_invalid_op+0x89/0x93
 [<c0404be1>] error_code+0x39/0x40
 [<c046ea76>] free_block+0x65/0x175
 [<c046ebfb>] drain_array+0x75/0x99
 [<c047050f>] cache_reap+0x76/0x118
 [<c043361e>] run_workqueue+0x7a/0xbb
 [<c0433f53>] worker_thread+0xd2/0x107
 [<c04364bd>] kthread+0xc3/0xf2
 [<c0402005>] kernel_thread_helper+0x5/0xb
SELinux: initialized (dev loop1, type squashfs), not configured for labeling
BUG: spinlock lockup on CPU#0, find/23608, f7ff96b8 (Not tainted)
 [<c04051ee>] show_trace_log_lvl+0x58/0x171
 [<c0405802>] show_trace+0xd/0x10
 [<c040591b>] dump_stack+0x19/0x1b
 [<c04ecfb2>] _raw_spin_lock+0xba/0xd9
 [<c0613f5a>] _spin_lock+0x20/0x28
 [<c046edf7>] cache_alloc_refill+0x69/0x652
 [<c046f712>] kmem_cache_alloc+0x80/0xb5
 [<c04c6090>] selinux_inode_alloc_security+0x2c/0x87
 [<c0489896>] alloc_inode+0xe1/0x170
 [<c048993c>] new_inode+0x17/0x70
 [<f8dd73d4>] squashfs_new_inode+0x13/0x86 [squashfs]
 [<f8dda8d8>] squashfs_iget+0x3dd/0x12ee [squashfs]
 [<f8dd98fb>] squashfs_lookup+0x595/0x5d5 [squashfs]
 [<c047ef70>] do_lookup+0xab/0x153
 [<c0480dec>] __link_path_walk+0x8e7/0xdcc
 [<c0481321>] link_path_walk+0x50/0xca
 [<c0481728>] do_path_lookup+0x23b/0x28d
 [<c0481ec0>] __user_walk_fd+0x2f/0x43
 [<c047b7d4>] vfs_lstat_fd+0x16/0x3d
 [<c047b839>] vfs_lstat+0x11/0x13
 [<c047b84f>] sys_lstat64+0x14/0x28
 [<c0403faf>] syscall_call+0x7/0xb
DWARF2 unwinder stuck at syscall_call+0x7/0xb
Leftover inexact backtrace:
 [<c0405802>] show_trace+0xd/0x10
 [<c040591b>] dump_stack+0x19/0x1b
 [<c04ecfb2>] _raw_spin_lock+0xba/0xd9
 [<c0613f5a>] _spin_lock+0x20/0x28
 [<c046edf7>] cache_alloc_refill+0x69/0x652
 [<c046f712>] kmem_cache_alloc+0x80/0xb5
 [<c04c6090>] selinux_inode_alloc_security+0x2c/0x87
 [<c0489896>] alloc_inode+0xe1/0x170
 [<c048993c>] new_inode+0x17/0x70
 [<f8dd73d4>] squashfs_new_inode+0x13/0x86 [squashfs]
 [<f8dda8d8>] squashfs_iget+0x3dd/0x12ee [squashfs]
 [<f8dd98fb>] squashfs_lookup+0x595/0x5d5 [squashfs]
 [<c047ef70>] do_lookup+0xab/0x153
 [<c0480dec>] __link_path_walk+0x8e7/0xdcc
 [<c0481321>] link_path_walk+0x50/0xca
 [<c0481728>] do_path_lookup+0x23b/0x28d
 [<c0481ec0>] __user_walk_fd+0x2f/0x43
 [<c047b7d4>] vfs_lstat_fd+0x16/0x3d
 [<c047b839>] vfs_lstat+0x11/0x13
 [<c047b84f>] sys_lstat64+0x14/0x28
 [<c0403faf>] syscall_call+0x7/0xb

Comment 22 Prarit Bhargava 2006-09-13 21:36:57 UTC

Jeremy suggests a few things:

a) Disable SLAB_DEBUG
b) remove the inode-diet-squashfs patch
c) try and backport "new" upstream version

I'll build these on altix3 and test ...

P.

Comment 23 Doug Chapman 2006-09-13 21:57:44 UTC

Just hit another one on x86_64, this time not slab corruption but an ugly panic.
This was with 2.6.17-1.2630.fc6.


Oops: 0000 [1] SMP
last sysfs file: /block/loop4/dev
CPU 1
Modules linked in: squashfs loop autofs4 hidp rfcomm l2cap bluetooth sunrpc
ip_conntrack_netbios_ns ipt_REJECT iptable_filter ip_tables xt_state
ip_conntrack nfnetlink xt_tcpudp ip6table_filter ip6_tables x_tables ipv6
acpi_cpufreq dm_multipath video sbs i2c_ec button battery asus_acpi ac
parport_pc lp parport sg intel_rng e752x_edac shpchp i2c_i801 edac_mc i2c_core
e1000 pcspkr i6300esb serio_raw dm_snapshot dm_zero dm_mirror dm_mod mptsas
mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd
uhci_hcd
Pid: 7113, comm: find Not tainted 2.6.17-1.2630.fc6 #1
RIP: 0010:[<ffffffff80208970>]  [<ffffffff80208970>] __handle_mm_fault+0xce/0xc98
RSP: 0000:ffff81005d709dc8  EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffffb0b0ad4f4000 RCX: 0000000000000000
RDX: 00002fb0ad4f4000 RSI: ffff8100717117d0 RDI: ffff81007efd4dd8
RBP: ffff81005d709e58 R08: 0000000000000002 R09: 0000000000000000
R10: ffffffff80269d13 R11: ffffffff80269d13 R12: ffff810000000000
R13: ffff81005d709f58 R14: ffff81007efd4dd8 R15: ffff8100717117d0
FS:  00002aaaab673710(0000) GS:ffff810003f4f988(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffb0b0ad4f4000 CR3: 000000005ce72000 CR4: 00000000000006e0
Process find (pid: 7113, threadinfo ffff81005d708000, task ffff810059f5d040)
Stack:  0000000100000001 0000000000000000 0000000000402c24 ffff81007efd4dd8
 0000000000000001 ffff81005d709f58 ffff81007efd4dd8 0000000000000014
 ffff81005d709e38 0000000000000246 ffffffff80269d13 ffff810059f5d040
Call Trace:
 [<ffffffff80269dc9>] do_page_fault+0x487/0x84c
 [<ffffffff80261391>] error_exit+0x0/0x96
DWARF2 unwinder stuck at error_exit+0x0/0x96
Leftover inexact backtrace:


Code: 48 83 3b 00 75 18 48 8b 55 80 48 8b 7d 88 48 89 de e8 69 74
RIP  [<ffffffff80208970>] __handle_mm_fault+0xce/0xc98
 RSP <ffff81005d709dc8>
CR2: ffffb0b0ad4f4000
 <3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1

Call Trace:
 [<ffffffff8026e97d>] show_trace+0xae/0x336
 [<ffffffff8026ec1a>] dump_stack+0x15/0x17
 [<ffffffff8020badb>] __might_sleep+0xb2/0xb4
 [<ffffffff802a5dda>] down_read+0x1d/0x4a
 [<ffffffff8029dce5>] blocking_notifier_call_chain+0x1b/0x41
 [<ffffffff80294077>] profile_task_exit+0x15/0x17
 [<ffffffff80215cd7>] do_exit+0x24/0x911
 [<ffffffff8026a102>] do_page_fault+0x7c0/0x84c
 [<ffffffff80261391>] error_exit+0x0/0x96
DWARF2 unwinder stuck at error_exit+0x0/0x96
Leftover inexact backtrace:
 [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c
 [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c
 [<ffffffff80208970>] __handle_mm_fault+0xce/0xc98
 [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c
 [<ffffffff80269dc9>] do_page_fault+0x487/0x84c
 [<ffffffff80267412>] trace_hardirqs_on_thunk+0x35/0x37
 [<ffffffff80261391>] error_exit+0x0/0x96


=============================================
[ INFO: possible recursive locking detected ]
2.6.17-1.2630.fc6 #1
---------------------------------------------
find/7113 is trying to acquire lock:
 (&mm->mmap_sem){----}, at: [<ffffffff802b6d46>] acct_collect+0x58/0x1b7

but task is already holding lock:
 (&mm->mmap_sem){----}, at: [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c

other info that might help us debug this:
1 lock held by find/7113:
 #0:  (&mm->mmap_sem){----}, at: [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c

stack backtrace:

Call Trace:
 [<ffffffff8026e97d>] show_trace+0xae/0x336
 [<ffffffff8026ec1a>] dump_stack+0x15/0x17
 [<ffffffff802a844c>] __lock_acquire+0x135/0xa64
 [<ffffffff802a931e>] lock_acquire+0x4b/0x69
 [<ffffffff802a5dfb>] down_read+0x3e/0x4a
 [<ffffffff802b6d46>] acct_collect+0x58/0x1b7
 [<ffffffff80215eea>] do_exit+0x237/0x911
 [<ffffffff8026a102>] do_page_fault+0x7c0/0x84c
 [<ffffffff80261391>] error_exit+0x0/0x96
DWARF2 unwinder stuck at error_exit+0x0/0x96
Leftover inexact backtrace:
 [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c
 [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c
 [<ffffffff80208970>] __handle_mm_fault+0xce/0xc98
 [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c
 [<ffffffff80269dc9>] do_page_fault+0x487/0x84c
 [<ffffffff80267412>] trace_hardirqs_on_thunk+0x35/0x37
 [<ffffffff80261391>] error_exit+0x0/0x96

mm/memory.c:105: bad pgd ffff81005ce72000(a1df6fb0ad4f4f78).
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/mmap.c:2068
invalid opcode: 0000 [2] SMP
last sysfs file: /block/loop4/dev
CPU 1
Modules linked in: squashfs loop autofs4 hidp rfcomm l2cap bluetooth sunrpc
ip_conntrack_netbios_ns ipt_REJECT iptable_filter ip_tables xt_state
ip_conntrack nfnetlink xt_tcpudp ip6table_filter ip6_tables x_tables ipv6
acpi_cpufreq dm_multipath video sbs i2c_ec button battery asus_acpi ac
parport_pc lp parport sg intel_rng e752x_edac shpchp i2c_i801 edac_mc i2c_core
e1000 pcspkr i6300esb serio_raw dm_snapshot dm_zero dm_mirror dm_mod mptsas
mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd
uhci_hcd
Pid: 7113, comm: find Not tainted 2.6.17-1.2630.fc6 #1
RIP: 0010:[<ffffffff8023cc78>]  [<ffffffff8023cc78>] exit_mmap+0xe4/0xf9
RSP: 0000:ffff81005d709b48  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff810003c2e480 RCX: 0000000000000000
RDX: 000000000000001f RSI: ffff810001000078 RDI: ffff81007e1df908
RBP: ffff81005d709b78 R08: ffff81005d709a88 R09: 0000000000000000
R10: ffff81007e1df858 R11: 00000000000000b0 R12: 0000000000000000
R13: ffff81007efd4dd8 R14: ffff810059f5d688 R15: 0000000000000000
FS:  00002aaaab673710(0000) GS:ffff810003f4f988(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffb0b0ad4f4000 CR3: 0000000000201000 CR4: 00000000000006e0
Process find (pid: 7113, threadinfo ffff81005d708000, task ffff810059f5d040)
Stack:  ffff81005d709b78 0000000000000098 ffff810003c2e480 ffff81007efd4dd8
 ffff81007efd4ed0 ffff810059f5d040 ffff81005d709b98 ffffffff8023f07f
 ffff81007efd4dd8 ffff81007efd4e40 ffff81005d709bc8 ffffffff80243813
Call Trace:
 [<ffffffff8023f07f>] mmput+0x42/0x94
 [<ffffffff80243813>] exit_mm+0xef/0xf8
 [<ffffffff80215f52>] do_exit+0x29f/0x911
 [<ffffffff8026a102>] do_page_fault+0x7c0/0x84c
 [<ffffffff80261391>] error_exit+0x0/0x96
DWARF2 unwinder stuck at error_exit+0x0/0x96
Leftover inexact backtrace:
 [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c
 [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c
 [<ffffffff80208970>] __handle_mm_fault+0xce/0xc98
 [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c
 [<ffffffff80269dc9>] do_page_fault+0x487/0x84c
 [<ffffffff80267412>] trace_hardirqs_on_thunk+0x35/0x37
 [<ffffffff80261391>] error_exit+0x0/0x96


Code: 0f 0b 68 78 77 4a 80 c2 14 08 48 83 c4 18 5b 41 5c 41 5d c9
RIP  [<ffffffff8023cc78>] exit_mmap+0xe4/0xf9
 RSP <ffff81005d709b48>
 <1>Fixing recursive fault but reboot is needed!

Comment 24 Prarit Bhargava 2006-09-14 00:21:09 UTC

>a) Disable SLAB_DEBUG

Still panics.

>b) remove the inode-diet-squashfs patch

Kernel does not compile without this patch.

[root@altix3 linux-2.6.17.ia64]# patch -p1 -R <
../../../SOURCES/linux-2.6-inode-diet-squashfs.patch 
patching file fs/squashfs/inode.c
Hunk #1 succeeded at 604 (offset -3 lines).
patching file fs/squashfs/squashfs2_0.c
Hunk #1 succeeded at 229 (offset 1 line).
[root@altix3 linux-2.6.17.ia64]# make -j64 compressed
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CHK     include/linux/compile.h
[root@altix3 linux-2.6.17.ia64]# make -j64 modules
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CC [M]  fs/squashfs/inode.o
  CC [M]  fs/squashfs/squashfs2_0.o
fs/squashfs/inode.c:69: warning: initialization from incompatible pointer type
fs/squashfs/squashfs2_0.c: In function ‘squashfs_iget_2’:
fs/squashfs/squashfs2_0.c:232: error: ‘struct inode’ has no member named ‘i_blksize’
fs/squashfs/inode.c: In function ‘squashfs_iget’:
fs/squashfs/inode.c:607: error: ‘struct inode’ has no member named ‘i_blksize’
fs/squashfs/inode.c:660: error: ‘struct inode’ has no member named ‘i_blksize’
fs/squashfs/inode.c: In function ‘squashfs_get_sb’:
fs/squashfs/inode.c:2073: warning: return makes pointer from integer without a cast
make[2]: *** [fs/squashfs/squashfs2_0.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [fs/squashfs/inode.o] Error 1
make[1]: *** [fs/squashfs] Error 2
make: *** [fs] Error 2

>c) try and backport "new" upstream version

still panics.

Comment 25 Prarit Bhargava 2006-09-14 00:54:40 UTC

Panic happens on plain 2.6.17 + 2.6.18-rc6.patch (i.e., latest upstream).

P.

Comment 26 Eric Sandeen 2006-09-14 04:32:30 UTC

Hmm, this is scary.  Running the test corrupted the stage2.img I was testing
with... this is supposed to be a read-only fs.

Comment 27 Eric Sandeen 2006-09-14 04:44:55 UTC

I take that back; things were ok after a reboot, this was just more corrupted
memory.

Comment 28 Zhang Yanmin 2006-09-14 09:17:28 UTC

I compiled 2.6.17-1.2630 and the new kernel booted fine on my Tiger
machine, but failed on an HP cx2600. On the cx2600 I was using the RHEL4
distribution. I then installed the FC6 test2 kernel via rpm, and it also
couldn't boot.

Both failed just after the kernel/initrd were loaded. There are no console
messages other than the one saying initrd is loaded, and the serial console
shows nothing. I'm not familiar with HP ia64 machines. Does the cx2600 have
debug ports?

Comment 29 Prarit Bhargava 2006-09-14 10:35:32 UTC

Zhang,

This BZ is about random memory corruption and is not for general system boot
issues.  If you have other issues, please either open up a new BZ or ask on the
fedora-ia64 list.

Thanks,

P.

Comment 30 Prarit Bhargava 2006-09-14 10:40:54 UTC

Updating the Summary (for now ...)

p.

Comment 31 Prarit Bhargava 2006-09-14 10:47:17 UTC

Adding Peter to the mix.

Quick note for Peter so he doesn't have to read EVERYTHING in this BZ:  The
issue is that if you mount, read, umount squashfs the system eventually panics.

This is bad because squashfs is used in the system installer.

squashfs is not upstream and is maintained out of tree.

It *really* hits ia64 badly, but we've reproduced it on x86_64 and ppc now.

It can be reproduced on 2.6.17 + the 2.6.18-rc6 patch + the squashfs tree
(vanilla, no RH patches).

esandeen is doing some examination of it now... but thought you would like to
join in the fun :)

P.

Comment 32 Prarit Bhargava 2006-09-14 16:18:13 UTC

We have chased this down to an issue with the filesystem using fragments.
Avoiding fragments until squashfs is patched and fixed is a good idea and
will increase stability in the long run.

We are discussing with Phillip Lougher (squashfs maintainer).

Comment 33 Prarit Bhargava 2006-09-14 16:25:57 UTC

I've asked the anaconda team to modify the mk-images script to use -no-fragments
for now.  Please see BZ 206472.

P.

Comment 34 Eric Sandeen 2006-09-14 22:33:20 UTC

It looks very much to me like squashfs_read_data() is corrupting memory badly.

It's doing this:

         if (compressed) {
                 int zlib_err;
                 stream.next_in = c_buffer;
                 stream.avail_in = c_byte;
                 stream.next_out = buffer;
                 stream.avail_out = msblk->read_size;

where "stream" is the struct that zlib takes;
per the comments in the struct definition, this translates to:

                 stream.next_in = c_buffer;           /* next input byte */
                 stream.avail_in = c_byte;            /* number of bytes available at next_in */
                 stream.next_out = buffer;            /* next output byte should be put there */
                 stream.avail_out = msblk->read_size; /* remaining free space at next_out */

Now, "buffer" in this case is the 8k block cache buffer that we allocated,
but read_size is:

         msblk->read_size = (sblk->block_size < SQUASHFS_METADATA_SIZE) ?
                                         SQUASHFS_METADATA_SIZE :
                                         sblk->block_size;

Our block size is 64k and SQUASHFS_METADATA_SIZE is 8k, so read_size is 64k.

Now, if I understand what zlib is doing, we're giving it a pointer to 8k of
memory and telling it to fill up to 64k from there.

Printks seem to confirm this.  Peter (squashfs author) says this should be fine
for metadata because metadata is never > 8192, but if I look at the return
values from zlib, the output consistently appears to be larger than 8192 (so
perhaps this path isn't only for metadata...), and this is horribly corrupting
memory.

Comment 35 Eric Sandeen 2006-09-14 22:55:05 UTC

I flagged the calls where we pass the 8192-byte block_cache[i].data buffers
into squashfs_read_data() and dumped the stack if zlib ever wrote more than 8k:

                if (blockcache && stream.total_out > SQUASHFS_METADATA_SIZE) {
                        printk("blockcache, out %lu?\n", stream.total_out);
                        dump_stack();
                }
and sure enough:

blockcache, out 8511?

Call Trace:
 [<ffffffff80271134>] show_trace+0xb8/0x334
 [<ffffffff802713c3>] dump_stack+0x13/0x15
 [<ffffffff882d150e>] :squashfs:squashfs_read_data+0x50e/0x5a7
 [<ffffffff882d17ad>] :squashfs:squashfs_get_cached_block+0x206/0x3d1
 [<ffffffff882d1a74>] :squashfs:get_fragment_location+0xfc/0x125
 [<ffffffff882d22c1>] :squashfs:squashfs_iget+0x492/0x171e
 [<ffffffff882d5e8b>] :squashfs:squashfs_lookup+0x638/0x694
 [<ffffffff8020d200>] do_lookup+0xc3/0x175
 [<ffffffff8020a06e>] __link_path_walk+0xa35/0xf53
 [<ffffffff8020edf5>] link_path_walk+0x61/0xec
 [<ffffffff8020cf9d>] do_path_lookup+0x27e/0x2f3
 [<ffffffff80224d85>] __user_walk_fd+0x3f/0x5a
 [<ffffffff8024310c>] vfs_lstat_fd+0x24/0x5a
 [<ffffffff8022cd3d>] sys_newlstat+0x22/0x3c
 [<ffffffff80262c52>] system_call+0x7e/0x83

Comment 36 Eric Sandeen 2006-09-16 15:58:07 UTC

It looks like the fragment-fixing patch in the package is wrong; I think we need
the one attached in the 5th comment of bug #202663.

Comment 37 Jeremy Katz 2006-09-18 16:52:58 UTC

Building squashfs-tools with the updated patch now.  Want to try making an
image when it's done?  If that fixes it, close this (and then I'll change
anaconda back).

Comment 38 Prarit Bhargava 2006-09-20 12:04:56 UTC

*** Bug 204625 has been marked as a duplicate of this bug. ***
