Description of problem:
We are seeing various random panics during install of rawhide on ia64, starting somewhere around the 20060829 build tree. Since so many different panics all started appearing at the same time, this looks like a memory corruption issue. I will post stack traces as attachments as I see them.

Version-Release number of selected component (if applicable):
2.6.17-1.2600.fc6

How reproducible:
Probably 95% of the time. So far only seen during installation. A couple of installs have been successful.

Steps to Reproduce:
1. NFS install on ia64
Created attachment 135220 [details] bugcheck at list_del+0x100/0x160
Created attachment 135221 [details] Oops at mm_init+0x190/0x240
Created attachment 135240 [details] oops from xprt_reserve+0x150/0x2e0 [sunrpc]
What, one out of three and it's an NFS problem??? :-) Make sure that the patch from bz 204859 has been applied, and also beware of bz 204848 if you're using NFSv4.
Reassigning to correct owner, kernel-maint.
FYI, things are looking better as of today's build (20060905). However, since we could not reproduce this 100% of the time before, the plan is to keep this open for a day or two and make sure we don't hit it again. But it is looking like the other NFS patches may have fixed this.
These issues seem to be back as of 20060907. I have one panic that does seem to be somewhat reproducible. Other than that I get random hangs. I can sometimes get the hang even when doing an http install, so either we have multiple problems or it isn't NFS. The panic _does_ appear to be NFS related, however. I get this 100% of the time on one particular system (an rx2600 w/ 1 CPU).

loop0[606]: Oops 8804682956800 [1] Modules linked in: dm_emc dm_round_robin dm_multipath dm_snapshot dm_mirror dm_zero dm_mod xfs jfs reiserfs lock_nolock gfs2 ext3 jbd msdos raid456 xor raid1 raid0 mptspi scsi_transport_spi mptscsih mptbase e100 mii tg3 ohci_hcd ehci_hcd iscsi_tcp libiscsi scsi_transport_iscsi sr_mod sd_mod scsi_mod ide_cd cdrom squashfs loop nfs nfs_acl fscache lockd sunrpc vfat fat cramfs Pid: 606, CPU 0, comm: loop0 psr : 0000101008526010 ifs : 800000000000038b ip : [<a0000002004ee080>] Not tainted ip is at xprt_reserve+0x140/0x2e0 [sunrpc] unat: 0000000000000000 pfs : 000000000000038b rsc : 0000000000000003 rnat: 0000000000000000 bsps: 0000000000000000 pr : 0000000000009681 ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f csd : 0000000000000000 ssd : 0000000000000000 b0 : a0000002004edfa0 b6 : a00000020069bca0 b7 : a0000002004e8340 f6 : 000000000000000000000 f7 : 0ffdd8000000000000000 f8 : 000000000000000000000 f9 : 10008c200000000000000 f10 : 1003e0000000000000082 f11 : 1003e0000000000000078 r1 : a000000200521438 r2 : e00000003eb6eca8 r3 : 0000000000000000 r8 : e00000003e349998 r9 : ffffffffffffffff r10 : e00000003e3499a0 r11 : e00000003e349808 r12 : e00000003eccfcb0 r13 : e00000003ecc8000 r14 : e00000003eb6ecb0 r15 : 0000000000000000 r16 : 00000000ffffffff r17 : e00000003ecae0a8 r18 : e00000003ecae000 r19 : e00000003ecae0b0 r20 : e00000003ecae078 r21 : e00000003ecae0b8 r22 : e00000003ecae080 r23 : e00000003ecae0c0 r24 : e00000003e349834 r25 : e00000003cb18138 r26 : e00000003ecae108 r27 : e00000003cb18270 r28 : e00000003e349620 r29 : e00000003ecae0a0 
r30 : a00000020055a858 r31 : 3466d843f6cebdf7 Call Trace: [<a000000100013e80>] show_stack+0x40/0xa0 sp=e00000003eccf840 bsp=e00000003ecc9610 [<a000000100014780>] show_regs+0x840/0x880 sp=e00000003eccfa10 bsp=e00000003ecc95b0 [<a000000100037b80>] die+0x1c0/0x2a0 sp=e00000003eccfa10 bsp=e00000003ecc9568 [<a000000100624940>] ia64_do_page_fault+0x8a0/0x9e0 sp=e00000003eccfa30 bsp=e00000003ecc9518 [<a00000010000c700>] __ia64_leave_kernel+0x0/0x280 sp=e00000003eccfae0 bsp=e00000003ecc9518 [<a0000002004ee080>] xprt_reserve+0x140/0x2e0 [sunrpc] sp=e00000003eccfcb0 bsp=e00000003ecc94c0 [<a0000002004e8430>] call_reserve+0xf0/0x120 [sunrpc] sp=e00000003eccfcb0 bsp=e00000003ecc94a0 [<a0000002004f8950>] __rpc_execute+0x1f0/0x660 [sunrpc] sp=e00000003eccfcb0 bsp=e00000003ecc9468 [<a0000002004f8ec0>] rpc_execute+0xa0/0xc0 [sunrpc] sp=e00000003eccfcb0 bsp=e00000003ecc9448 [<a000000200690ef0>] nfs_execute_read+0x90/0xe0 [nfs] sp=e00000003eccfcb0 bsp=e00000003ecc9420 [<a000000200691cb0>] nfs_pagein_one+0x590/0x600 [nfs] sp=e00000003eccfcc0 bsp=e00000003ecc93c0 [<a0000002006923e0>] nfs_readpages+0x6c0/0x820 [nfs] sp=e00000003eccfcd0 bsp=e00000003ecc9368 [<a0000001001048b0>] __do_page_cache_readahead+0x1f0/0x400 sp=e00000003eccfd10 bsp=e00000003ecc9308 [<a000000100104ba0>] blockable_page_cache_readahead+0xe0/0x1e0 sp=e00000003eccfda0 bsp=e00000003ecc92c0 [<a000000100105190>] page_cache_readahead+0x350/0x4a0 sp=e00000003eccfda0 bsp=e00000003ecc9268 [<a0000001000f3c30>] do_generic_mapping_read+0x190/0x8a0 sp=e00000003eccfda0 bsp=e00000003ecc91b0 [<a0000001000f43d0>] generic_file_sendfile+0x90/0xe0 sp=e00000003eccfdf0 bsp=e00000003ecc9168 [<a00000020067d5d0>] nfs_file_sendfile+0x110/0x140 [nfs] sp=e00000003eccfe10 bsp=e00000003ecc9120 [<a0000002005893b0>] loop_thread+0x750/0x840 [loop] sp=e00000003eccfe10 bsp=e00000003ecc90a8 [<a0000001000123f0>] kernel_thread_helper+0x30/0x60 sp=e00000003eccfe30 bsp=e00000003ecc9080 [<a0000001000090c0>] start_kernel_thread+0x20/0x40 sp=e00000003eccfe30 
bsp=e00000003ecc9080
(In reply to comment #7)
> FYI,
>
> things are looking better as of today's build (20060905) however since we could
> not reproduce 100% before the plan is to keep this open for a day or two and
> make sure we don't hit this again.
>
> But, it is looking like the other NFS patches may have fixed this.
>

While trying to get one of my systems installed with something workable, I just tried the 0905 build again and hit yet another random panic there as well. So it appears this wasn't fixed; it just did a better job of hiding for a while. This time it appears to have happened just as anaconda was finishing and unmounting (so I guess the install completed). This one was on a 4-CPU HP Integrity rx4640.

sending termination signals...done
sending kill signals...done
disabling swap... /dev/mapper/VolGroup00-LogVol01 /tmp/sdb1
unmounting filesystems...
init[1]: bugcheck! 0 [1] Modules linked in: dm_emc dm_round_robin dm_multipath dm_snapshot dm_mirror dm_zero dm_mod xfs jfs reiserfs lock_nolock gfs2 ext3 jbd msdos raid456 xor raid1 raid0 cciss mptspi scsi_transport_spi mptscsih mptbase qla2xxx scsi_transport_fc e1000 ohci_hcd ehci_hcd iscsi_tcp libiscsi scsi_transport_iscsi sr_mod sd_mod scsi_mod ide_cd cdrom squashfs loop nfs nfs_acl fscache lockd sunrpc vfat fat cramfs Pid: 1, CPU 2, comm: init psr : 00001010085a2010 ifs : 800000000000050e ip : [<a0000001001433a0>] Not tainted ip is at cache_free_debugcheck+0x3c0/0x600 unat: 0000000000000000 pfs : 000000000000050e rsc : 0000000000000003 rnat: e0000040fe609034 bsps: ffffffffdead4ead pr : 0000000000066559 ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f csd : 0000000000000000 ssd : 0000000000000000 b0 : a0000001001433a0 b6 : a00000010015a400 b7 : a00000020bcf8900 f6 : 0fffbccccccccc8c00000 f7 : 0ffdaa200000000000000 f8 : 100008000000000000000 f9 : 10002a000000000000000 f10 : 0fffcccccccccc8c00000 f11 : 1003e0000000000000000 r1 : a000000100ba13c0 r2 : a0000001009b8bf8 r3 : e0000040fe609034 r8 : 0000000000000021 r9 : 
a0000001009b5960 r10 : a0000001009b8c28 r11 : a0000001009b8c28 r12 : e0000040fe60fce0 r13 : e0000040fe608000 r14 : a0000001009b8bf8 r15 : 0000000000000000 r16 : ffffffffdead4ead r17 : 00000000dead4ead r18 : a000000100841b64 r19 : a0000001009b5958 r20 : 0000000000000000 r21 : a0000001009a1a58 r22 : 0000000000000000 r23 : a0000001007f3100 r24 : a0000001009a1a58 r25 : a0000001009b8c00 r26 : a0000001009b8c00 r27 : a0000001008e8780 r28 : a0000001009a1be8 r29 : 0000000000000002 r30 : a000000100841b70 r31 : e0000040fe609034 Call Trace: [<a000000100013e80>] show_stack+0x40/0xa0 sp=e0000040fe60f870 bsp=e0000040fe609580 [<a000000100014780>] show_regs+0x840/0x880 sp=e0000040fe60fa40 bsp=e0000040fe609528 [<a000000100037ba0>] die+0x1c0/0x2a0 sp=e0000040fe60fa40 bsp=e0000040fe6094e0 [<a000000100037cd0>] die_if_kernel+0x50/0x80 sp=e0000040fe60fa60 bsp=e0000040fe6094b0 [<a00000010061ec50>] ia64_bad_break+0x270/0x4a0 sp=e0000040fe60fa60 bsp=e0000040fe609488 [<a00000010000c700>] __ia64_leave_kernel+0x0/0x280 sp=e0000040fe60fb10 bsp=e0000040fe609488 [<a0000001001433a0>] cache_free_debugcheck+0x3c0/0x600 sp=e0000040fe60fce0 bsp=e0000040fe609418 [<a000000100146a40>] kmem_cache_free+0x1c0/0x600 sp=e0000040fe60fce0 bsp=e0000040fe6093c8 [<a0000001001598b0>] free_buffer_head+0x90/0x100 sp=e0000040fe60fcf0 bsp=e0000040fe6093a8 [<a000000100159ef0>] try_to_free_buffers+0x170/0x1c0 sp=e0000040fe60fcf0 bsp=e0000040fe609378 [<a00000010015a050>] try_to_release_page+0x110/0x140 sp=e0000040fe60fd00 bsp=e0000040fe609350 [<a000000100107440>] invalidate_complete_page+0x60/0x1e0 sp=e0000040fe60fd00 bsp=e0000040fe609320 [<a000000100107ad0>] invalidate_mapping_pages+0x130/0x220 sp=e0000040fe60fd00 bsp=e0000040fe6092d0 [<a000000100107bf0>] invalidate_inode_pages+0x30/0x60 sp=e0000040fe60fd80 bsp=e0000040fe6092b0 [<a00000010015c3f0>] invalidate_bdev+0x90/0xc0 sp=e0000040fe60fd80 bsp=e0000040fe609290 [<a000000100169a30>] kill_bdev+0x30/0x80 sp=e0000040fe60fd80 bsp=e0000040fe609270 [<a00000010016acc0>] 
__blkdev_put+0xa0/0x3a0 sp=e0000040fe60fd80 bsp=e0000040fe609228 [<a00000010016b050>] blkdev_put+0x30/0x60 sp=e0000040fe60fd90 bsp=e0000040fe609208 [<a00000010016b0b0>] close_bdev_excl+0x30/0x60 sp=e0000040fe60fd90 bsp=e0000040fe6091e0 [<a000000100167a00>] kill_block_super+0x60/0x80 sp=e0000040fe60fd90 bsp=e0000040fe6091b8 [<a000000100167ca0>] deactivate_super+0x180/0x1c0 sp=e0000040fe60fd90 bsp=e0000040fe609190 [<a00000010019bf30>] mntput_no_expire+0xb0/0x1e0 sp=e0000040fe60fd90 bsp=e0000040fe609168 [<a0000001001772a0>] path_release_on_umount+0x40/0x60 sp=e0000040fe60fd90 bsp=e0000040fe609148 [<a00000010019f300>] sys_umount+0x620/0x700 sp=e0000040fe60fd90 bsp=e0000040fe6090d0 [<a00000010000c560>] ia64_ret_from_syscall+0x0/0x40 sp=e0000040fe60fe30 bsp=e0000040fe6090d0 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400 sp=e0000040fe610000 bsp=e0000040fe6090d0 <0>Kernel panic - not syncing: Attempted to kill init!
Doug, steved suggests that we try turning off selinux. Let's try that on Monday and retest ... P.
(In reply to comment #12)
> Doug,
>
> steved suggests that we try turning off selinux. Let's try that on Monday and
> retest ...
>
> P.

Many, if not most, of my installs have been done with selinux=0. I re-verified just now on today's build and hit a panic again. Oddly, today's kernel is still the 2.6.17-1.2630.fc6 rev. Perhaps we should try a newer kernel with the latest git pull, just in case this has been fixed upstream?
I just hit this today, symptoms look very similar. Installation on rx1620 using VNC, http. Running anaconda, the Fedora Core system installer - please wait... Probing for video card: Unable to probe No video hardware found, assuming headless Starting VNC... The VNC server is now running. Please connect to 10.202.2.7:1 to begin the install... Starting graphical installation... Press <enter> for a shell loadkeys(747): unaligned access to 0x2000000002c7ea54, ip=0x2000000000018050 loadkeys(747): unaligned access to 0x2000000002c7ea54, ip=0x2000000000018060 loadkeys(747): unaligned access to 0x2000000002c7ea6c, ip=0x2000000000018050 loadkeys(747): unaligned access to 0x2000000002c7ea6c, ip=0x2000000000018060 loadkeys(747): unaligned access to 0x2000000002c7ea84, ip=0x2000000000018050 XKB extension not present on :1 anaconda[714]: bugcheck! 0 [1] Modules linked in: dm_emc dm_round_robin dm_multipath dm_snapshot dm_mirror dm_zero dm_mod xfs jfs reiserfs lock_nolock gfs2 ext3 jbd msdos raid456 xor raid1 raid0 mptspi scsi_transport_spi mptscsih mptbase e1000 ohci_hcd ehci_hcd iscsi_tcp libiscsi scsi_transport_iscsi sr_mod sd_mod scsi_mod ide_cd cdrom ipv6 squashfs loop nfs nfs_acl fscache lockd sunrpc vfat fat cramfs Pid: 714, CPU 1, comm: anaconda psr : 0000101008522030 ifs : 800000000000050e ip : [<a0000001001433c0>] Not tainted ip is at cache_free_debugcheck+0x3c0/0x600 unat: 0000000000000000 pfs : 000000000000050e rsc : 0000000000000003 rnat: 00000000201222f6 bsps: e0000001ffcf1fac pr : 0000000000265559 ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f csd : 0000000000000000 ssd : 0000000000000000 b0 : a0000001001433c0 b6 : a000000100066fa0 b7 : a000000100500bc0 f6 : 1003e00000000000000a0 f7 : 1003e20c49ba5e353f7cf f8 : 1003e00000000000004e2 f9 : 1003e000000000fa00000 f10 : 1003e000000003b9aca00 f11 : 1003e431bde82d7b634db r1 : a000000100bb1580 r2 : a0000001009c8fd8 r3 : e0000040fb0c9034 r8 : 0000000000000021 r9 : a0000001009c5748 r10 : 
a0000001009c9008 r11 : a0000001009c9008 r12 : e0000040fb0cfbc0 r13 : e0000040fb0c8000 r14 : a0000001009c8fd8 r15 : 0000000000000000 r16 : ffffffffdead4ead r17 : 00000000dead4ead r18 : a0000001008f85ec r19 : a0000001009c5740 r20 : 0000000000000000 r21 : a0000001009b1c18 r22 : 0000000000000073 r23 : a0000001007fb100 r24 : a0000001009b1c18 r25 : a0000001009c8fe0 r26 : a0000001009c8fe0 r27 : 000000003fffff00 r28 : e0000040fb0c9048 r29 : e000000106c10060 r30 : e0000040fb0c802c r31 : e000000106c1002c Call Trace: [<a000000100013e80>] show_stack+0x40/0xa0 sp=e0000040fb0cf750 bsp=e0000040fb0c94f0 [<a000000100014780>] show_regs+0x840/0x880 sp=e0000040fb0cf920 bsp=e0000040fb0c9498 [<a000000100037b80>] die+0x1c0/0x2a0 sp=e0000040fb0cf920 bsp=e0000040fb0c9450 [<a000000100037cb0>] die_if_kernel+0x50/0x80 sp=e0000040fb0cf940 bsp=e0000040fb0c9420 [<a000000100622070>] ia64_bad_break+0x270/0x4a0 sp=e0000040fb0cf940 bsp=e0000040fb0c93f0 [<a00000010000c700>] __ia64_leave_kernel+0x0/0x280 sp=e0000040fb0cf9f0 bsp=e0000040fb0c93f0 [<a0000001001433c0>] cache_free_debugcheck+0x3c0/0x600 sp=e0000040fb0cfbc0 bsp=e0000040fb0c9380 [<a000000100147f30>] kfree+0x170/0x5e0 sp=e0000040fb0cfbc0 bsp=e0000040fb0c9340 [<a00000010050a650>] skb_release_data+0x190/0x1c0 sp=e0000040fb0cfbd0 bsp=e0000040fb0c9318 [<a000000100509ee0>] kfree_skbmem+0x20/0x160 sp=e0000040fb0cfbd0 bsp=e0000040fb0c92f8 [<a00000010050a310>] __kfree_skb+0x2f0/0x320 sp=e0000040fb0cfbd0 bsp=e0000040fb0c92d0 [<a000000100581e40>] tcp_recvmsg+0x1040/0x1980 sp=e0000040fb0cfbd0 bsp=e0000040fb0c9240 [<a000000100500c50>] sock_common_recvmsg+0x90/0xe0 sp=e0000040fb0cfbf0 bsp=e0000040fb0c9200 [<a0000001004fb4b0>] sock_recvmsg+0x1f0/0x240 sp=e0000040fb0cfc00 bsp=e0000040fb0c91b8 [<a0000001004fe940>] sys_recvfrom+0x120/0x220 sp=e0000040fb0cfd60 bsp=e0000040fb0c9120 [<a0000001004fea80>] sys_recv+0x40/0x60 sp=e0000040fb0cfe30 bsp=e0000040fb0c90c8 [<a00000010000c560>] ia64_ret_from_syscall+0x0/0x40 sp=e0000040fb0cfe30 bsp=e0000040fb0c90c8 
[<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400 sp=e0000040fb0d0000 bsp=e0000040fb0c90c8
And a bit more data, to try to rule out networking I did a DVD install of the 20060911 build. It got farther but still hit a panic when 87% of the way through loading packages. This was with selinux enabled, I will try again with selinux=0. bugcheck! 0 [1]s to 0x200000000009aa54, ip=0x2000000000018050 │ ─┘ Modules linked in: dm_emc dm_round_robin dm_multipath dm_snapshot dm_mirror dm_zero dm_mod xfs jfs reiserfs lock_nolock gfs2 ext3 jbd msdos raid456 xor raid1 raid0 cciss mptspi scsi_transport_spi mptscsih mptbase tg3 e100 mii ohci_hcd ehci_hcd iscsi_tcp libiscsi scsi_transport_iscsi sr_mod sd_mod scsi_mod ide_cd cdrom ipv6 squashfs loop nfs nfs_acl fscache lockd sunrpc vfat fat cramfsnext screen Pid: 172, CPU 0, comm: kswapd0 psr : 0000101008022038 ifs : 800000000000050e ip : [<a0000001001433c0>] Not tainted ip is at cache_free_debugcheck+0x3c0/0x600 unat: 0000000000000000 pfs : 000000000000050e rsc : 0000000000000003 rnat: 0000000000000000 bsps: e0000040ffd78588 pr : 0000000000009541 ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a74433f csd : 0000000000000000 ssd : 0000000000000000 b0 : a0000001001433c0 b6 : a0000001000112e0 b7 : a000000201be4bc0 f6 : 1003e00000000000000a0 f7 : 1003e20c49ba5e353f7cf f8 : 1003e00000000000004e2 f9 : 1003e000000000fa00000 f10 : 1003e000000003b9aca00 f11 : 1003e431bde82d7b634db r1 : a000000100bb1580 r2 : a0000001009c8fd8 r3 : e0000040fc301034 r8 : 0000000000000021 r9 : a0000001009c5748 r10 : a0000001009c9008 r11 : a0000001009c9008 r12 : e0000040fc307c10 r13 : e0000040fc300000 r14 : a0000001009c8fd8 r15 : 0000000000000000 r16 : ffffffffdead4ead r17 : 00000000dead4ead r18 : a0000001008f85ec r19 : a0000001009c5740 r20 : 0000000000000000 r21 : a0000001009b1c18 r22 : 0000000000000004 r23 : a0000001007fb100 r24 : a0000001009b1c18 r25 : a0000001009c8fe0 r26 : a0000001009c8fe0 r27 : e0000040fb3f1020 r28 : e0000040fb3f0008 r29 : e0000040fc360060 r30 : e0000040fb3f002c r31 : e0000040fc36002c Call Trace: [<a000000100013e80>] 
show_stack+0x40/0xa0 sp=e0000040fc3077a0 bsp=e0000040fc301468 [<a000000100014780>] show_regs+0x840/0x880 sp=e0000040fc307970 bsp=e0000040fc301410 [<a000000100037b80>] die+0x1c0/0x2a0 sp=e0000040fc307970 bsp=e0000040fc3013c0 [<a000000100037cb0>] die_if_kernel+0x50/0x80 sp=e0000040fc307990 bsp=e0000040fc301390 [<a000000100622070>] ia64_bad_break+0x270/0x4a0 sp=e0000040fc307990 bsp=e0000040fc301368 [<a00000010000c700>] __ia64_leave_kernel+0x0/0x280 sp=e0000040fc307a40 bsp=e0000040fc301368 [<a0000001001433c0>] cache_free_debugcheck+0x3c0/0x600 sp=e0000040fc307c10 bsp=e0000040fc3012f8 [<a000000100146a60>] kmem_cache_free+0x1c0/0x600 sp=e0000040fc307c10 bsp=e0000040fc3012b0 [<a000000201be4bf0>]gnome-applets-2.16.0.1-1.fc6-ia64 [squashfs] 34019k sp=e0000040fc307c20 bsp=e0000040fc301290 [<a000000100194980>]Small applications for the GNOME panel 040fc307c20 bsp=e0000040fc301270 [<a000000100195c40>] dispose_list+0x160/0x200 78% 1238 [<a0000001001968c0>] shrink_icache_memory+0x480/0x5a0 sp=e0000040fc307c20 bsp=e0000040fc3011e0 [<a00000010010af60>] shrink_slab+0x220/0x380 55 sp=e00000463c307c30 20p=e0000040f4301190 [<a00000010010c3a0>] kswapd+0x6c0/0x900 3 1 sp=e0000040fc307c30 bsp=e0000040fc3010f0 [<a0000001000adc00>] kthread+0x220/0x2a0 sp=e0000040fc307d50 bsp=e0000040fc3010a8 [<a0000001000123f0>] kernel_thread_helper+0x30/0x60 sp=e0000040fc307e30 bsp=e0000040fc301080 [<a0000001000090c0>] start_kernel_thread+0x20/0x40 sp=e0000040fc307e30 bsp=e0000040fc301080
DVD install of 20060911 with selinux=0 failed even sooner. Appears anaconda stage2 was just starting: anaconda[782]: bugcheck! 0 [1] Modules linked in: dm_emc dm_round_robin dm_multipath dm_snapshot dm_mirror dm_zero dm_mod xfs jfs reiserfs lock_nolock gfs2 ext3 jbd msdos raid456 xor raid1 raid0 cciss mptspi scsi_transport_spi mptscsih mptbase tg3 e100 mii ohci_hcd ehci_hcd iscsi_tcp libiscsi scsi_transport_iscsi sr_mod sd_mod scsi_mod ide_cd cdrom ipv6 squashfs loop nfs nfs_acl fscache lockd sunrpc vfat fat cramfs Pid: 782, CPU 0, comm: anaconda psr : 0000101008522030 ifs : 800000000000038c ip : [<a0000001001411f0>] Not tainted ip is at check_slabp+0x210/0x240 unat: 0000000000000000 pfs : 000000000000038c rsc : 0000000000000003 rnat: 0000000000250259 bsps: a0000001001a73e0 pr : 00000000002a5559 ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f csd : 0000000000000000 ssd : 0000000000000000 b0 : a0000001001411f0 b6 : a000000100066fa0 b7 : a000000100010570 f6 : 0fffbccccccccc8c00000 f7 : 0ffdaa200000000000000 f8 : 100008000000000000000 f9 : 10002a000000000000000 f10 : 0fffcccccccccc8c00000 f11 : 1003e0000000000000000 r1 : a000000100bb1580 r2 : a0000001009c8fd8 r3 : e000004046af1034 r8 : 0000000000000021 r9 : a0000001009c5748 r10 : a0000001009c9008 r11 : a0000001009c9008 r12 : e000004046af7e20 r13 : e000004046af0000 r14 : a0000001009c8fd8 r15 : 0000000000000000 r16 : ffffffffdead4ead r17 : 00000000dead4ead r18 : a0000001008f85ec r19 : a0000001009c5740 r20 : 0000000000000000 r21 : a0000001009b1c18 r22 : 0000000000000000 r23 : a0000001007fb100 r24 : a0000001009b1c18 r25 : a0000001009c8fe0 r26 : a0000001009c8fe0 r27 : e0000040fc3c7b18 r28 : e000000004c046e0 r29 : 0000000000000000 r30 : e000000004c046e8 r31 : e000004046af1034 Call Trace: [<a000000100013e80>] show_stack+0x40/0xa0 sp=e000004046af79b0 bsp=e000004046af15c0 [<a000000100014780>] show_regs+0x840/0x880 sp=e000004046af7b80 bsp=e000004046af1568 [<a000000100037b80>] die+0x1c0/0x2a0 
sp=e000004046af7b80 bsp=e000004046af1520 [<a000000100037cb0>] die_if_kernel+0x50/0x80 sp=e000004046af7ba0 bsp=e000004046af14f0 [<a000000100622070>] ia64_bad_break+0x270/0x4a0 sp=e000004046af7ba0 bsp=e000004046af14c8 [<a00000010000c700>] __ia64_leave_kernel+0x0/0x280 sp=e000004046af7c50 bsp=e000004046af14c8 [<a0000001001411f0>] check_slabp+0x210/0x240 sp=e000004046af7e20 bsp=e000004046af1468 [<a0000001001449b0>] cache_alloc_refill+0x1f0/0x5e0 sp=e000004046af7e20 bsp=e000004046af1410 [<a000000100145e30>] kmem_cache_alloc+0x190/0x220 sp=e000004046af7e20 bsp=e000004046af13d8 [<a0000001004fb070>] sock_alloc_inode+0x30/0xc0 sp=e000004046af7e20 bsp=e000004046af13b8 [<a000000100194a40>] alloc_inode+0x60/0x3e0 sp=e000004046af7e20 bsp=e000004046af1388 [<a000000100194e00>] new_inode+0x40/0x140 sp=e000004046af7e20 bsp=e000004046af1360 [<a0000001004fd3a0>] sock_alloc+0x40/0x100 sp=e000004046af7e20 bsp=e000004046af1348 [<a0000001004fd6a0>] __sock_create+0x240/0x660 sp=e000004046af7e20 bsp=e000004046af12f0 [<a0000001004fdb60>] sock_create+0x40/0x60 sp=e000004046af7e20 bsp=e000004046af12b8 [<a0000001004fe1d0>] sys_socket+0x30/0xc0 sp=e000004046af7e20 bsp=e000004046af1258 [<a00000010000c560>] ia64_ret_from_syscall+0x0/0x40 sp=e000004046af7e30 bsp=e000004046af1258 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400 sp=e000004046af8000 bsp=e000004046af1258 install exited abnormally [1/1] <0>BUG: spinlock cpu recursion on CPU#0, events/0/8 (Not tainted) lock: e0000040fffb75c0, .magic: dead4ead, .owner: anaconda/782, .owner_cpu: 0 Call Trace: [<a000000100013e80>] show_stack+0x40/0xa0 sp=e0000040fe637b30 bsp=e0000040fe6312c8 [<a000000100013f10>] dump_stack+0x30/0x60 sp=e0000040fe637d00 bsp=e0000040fe6312b0 [<a0000001002acbb0>] spin_bug+0x130/0x1e0 sp=e0000040fe637d00 bsp=e0000040fe631258 [<a0000001002ace30>] _raw_spin_lock+0xb0/0x260 sp=e0000040fe637d00 bsp=e0000040fe631220 [<a000000100620690>] _spin_lock_irq+0x30/0x60 sp=e0000040fe637d00 bsp=e0000040fe631200 
[<a000000100147730>] drain_array+0xb0/0x200 sp=e0000040fe637d00 bsp=e0000040fe6311a8 [<a00000010014af10>] cache_reap+0x1f0/0x580 sp=e0000040fe637d00 bsp=e0000040fe631160 [<a0000001000a4000>] run_workqueue+0x1c0/0x280 sp=e0000040fe637d00 bsp=e0000040fe631120 [<a0000001000a5ee0>] worker_thread+0x1a0/0x240 sp=e0000040fe637d00 bsp=e0000040fe6310f0 [<a0000001000adc00>] kthread+0x220/0x2a0 sp=e0000040fe637d50 bsp=e0000040fe6310a8 [<a0000001000123f0>] kernel_thread_helper+0x30/0x60 sp=e0000040fe637e30 bsp=e0000040fe631080 [<a0000001000090c0>] start_kernel_thread+0x20/0x40 sp=e0000040fe637e30 bsp=e0000040fe631080 BUG: spinlock lockup on CPU#0, events/0/8, e0000040fffb75c0 (Not tainted) Call Trace: [<a000000100013e80>] show_stack+0x40/0xa0 sp=e0000040fe637b30 bsp=e0000040fe631270 [<a000000100013f10>] dump_stack+0x30/0x60 sp=e0000040fe637d00 bsp=e0000040fe631258 [<a0000001002acf80>] _raw_spin_lock+0x200/0x260 sp=e0000040fe637d00 bsp=e0000040fe631220 [<a000000100620690>] _spin_lock_irq+0x30/0x60 sp=e0000040fe637d00 bsp=e0000040fe631200 [<a000000100147730>] drain_array+0xb0/0x200 sp=e0000040fe637d00 bsp=e0000040fe6311a8 [<a00000010014af10>] cache_reap+0x1f0/0x580 sp=e0000040fe637d00 bsp=e0000040fe631160 [<a0000001000a4000>] run_workqueue+0x1c0/0x280 sp=e0000040fe637d00 bsp=e0000040fe631120 [<a0000001000a5ee0>] worker_thread+0x1a0/0x240 sp=e0000040fe637d00 bsp=e0000040fe6310f0 [<a0000001000adc00>] kthread+0x220/0x2a0 sp=e0000040fe637d50 bsp=e0000040fe6310a8 [<a0000001000123f0>] kernel_thread_helper+0x30/0x60 sp=e0000040fe637e30 bsp=e0000040fe631080 [<a0000001000090c0>] start_kernel_thread+0x20/0x40 sp=e0000040fe637e30 bsp=e0000040fe631080
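To compare the panics collected so far, it can help to tally which function each one faulted in. The following is only a rough sketch: it assumes the console logs have been saved as text, and the function name `faulting_symbols` is made up for this illustration. The sample lines are the ia64 "ip is at" lines from the traces above; x86 dumps use "EIP is at" and would need a second pattern.

```python
import re
from collections import Counter

# Matches the ia64 oops line "ip is at <symbol>+<offset>/<size> [module]".
# The trailing "[module]" part is only present for code in a loadable module.
IP_RE = re.compile(r"ip is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")


def faulting_symbols(log_text):
    """Count (symbol, module) pairs; module is None for core kernel code."""
    return Counter((m.group(1), m.group(2)) for m in IP_RE.finditer(log_text))


# "ip is at" lines copied from the ia64 panics posted in this report so far.
sample = """
ip is at xprt_reserve+0x140/0x2e0 [sunrpc]
ip is at cache_free_debugcheck+0x3c0/0x600
ip is at cache_free_debugcheck+0x3c0/0x600
ip is at cache_free_debugcheck+0x3c0/0x600
ip is at check_slabp+0x210/0x240
"""

counts = faulting_symbols(sample)
for (symbol, module), n in counts.most_common():
    print(n, symbol, module or "kernel")
```

Most of the faults land in the slab allocator's debug checks rather than in any one driver or filesystem, which fits the memory corruption theory better than an NFS-only bug.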
Adding Dave in case he's heard of any corruption issues ...

Okay -- just some thoughts here. AFAICT the problem is only seen in the install. HTTP, FTP, CD/DVD, and NFS installs are all failing, so we can safely say that it is not a network issue, an NFS issue, or a single-driver issue. I don't believe that we're seeing many issues -- I still think it's one single issue that is affecting us in different ways (I could be convinced otherwise). We originally saw this when we made the most recent jump to the latest git pull.

Some questions for Doug and things for Doug to try:

a) Has this ever happened on a DVD you've built -- you have a nifty buildos utility built by a jeeneeous engineer ;) ?
b) The one BIG thing that is different between the kernel boot and the install is the use of squashfs. Maybe a squashfs test might be in order.
c) Moving forward and backwards through the git pulls might be a good idea.

Doug -- a) would be pretty important to know. That way we can concentrate on a specific build.

P.
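For (c), one thing to keep in mind is that the failure is not 100% reproducible, so a single clean install does not prove a build is good. Below is a minimal sketch of the arithmetic and of a binary search over builds. It assumes the roughly 95% failure rate from the original report and independent install attempts; the build labels and the `first_bad` and `trials_needed` helpers are hypothetical, and the real first bad build is not known.

```python
import math


def trials_needed(p_fail, max_false_good):
    """Smallest k with (1 - p_fail)**k <= max_false_good.

    k is the number of clean installs in a row needed before calling a
    build good, given that a bad build fails each install with
    probability p_fail.
    """
    return math.ceil(math.log(max_false_good) / math.log(1.0 - p_fail))


def first_bad(builds, is_bad):
    """Binary search for the first bad build.

    Assumes builds are ordered oldest to newest, that every build after
    the first bad one is also bad, and that the newest build is bad.
    """
    lo, hi = 0, len(builds) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid          # first bad build is mid or earlier
        else:
            lo = mid + 1      # first bad build is after mid
    return builds[lo]


# Hypothetical list of build trees, purely for illustration.
builds = ["20060825", "20060827", "20060829", "20060905", "20060907", "20060911"]

# Stand-in for "run k installs and report whether any of them panicked".
# The cutoff used here is made up; it is not a finding from this bug.
culprit = first_bad(builds, lambda build: build >= "20060829")

k = trials_needed(0.95, 0.001)
print(culprit, k)
```

With a 95% failure rate, two clean installs still leave a 0.25% chance of mislabelling a bad build as good, so a few installs per bisection step is cheap insurance.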
Prarit,

I just tried building a DVD using your buildos script. The install appears to be working on this one, but I got an oops during the install while loading packages. The screen was garbled because the install kept going, so I can't really cut and paste it all here; I guess we just add this one to the list of randomness. This was a text install from DVD without specifying any selinux flags.

Oops 11012296146944 [1]000009aa54, ip=0x2000000000018050 │ Modules linked in: dm_emc dm_round_robin dm_multipath dm_snapshot dm_mirror dm_zero dm_mod xfs jfs reiserfs lock_nolock gfs2 ext3 jbd msdos raid456 xor raid1 raid0 mptspi scsi_transport_spi mptscsih mptbase e100 mii tg3 ohci_hcd ehci_hcd iscsi_tcp libiscsi scsi_transport_iscsi sr_mod sd_mod scsi_mod ide_cd cdrom ipv6 squashfs loop nfs nfs_acl fscache lockd sunrpc vfat fat cramfs<F12> next screen Pid: 2760, CPU 0, comm: load_policy psr : 0000121008526010 ifs : 800000000000040b ip : [<a00000010024c301>] Not tainted ip is at hashtab_map+0x61/0x140
Created attachment 136187 [details] reproducer for panics seen during install I was able to reproduce a panic with this script on an installed and running system. This hopefully will aid in debugging since we were only able to see it under anaconda before.
with my reproducer it seems to be slightly less random and I hit slab corruption a little more often than others. This one is from 2.6.17-1.2573.fc6 Slab corruption: (Not tainted) start=e00000046fb62000, len=8192 Call Trace: [<a000000100013de0>] show_stack+0x40/0xa0 sp=e000000115b2f9b0 bsp=e000000115b297c0 [<a000000100013e70>] dump_stack+0x30/0x60 sp=e000000115b2fb80 bsp=e000000115b297a8 [<a000000100131fa0>] check_poison_obj+0x120/0x4c0 sp=e000000115b2fb80 bsp=e000000115b29748 [<a000000100132bc0>] cache_alloc_debugcheck_after+0x60/0x480 sp=e000000115b2fb80 bsp=e000000115b29708 [<a000000100135e60>] kmem_cache_alloc+0x1e0/0x220 sp=e000000115b2fb80 bsp=e000000115b296d8 [<a00000020d026260>] squashfs_get_cached_block+0x380/0x8e0 [squashfs] sp=e000000115b2fb80 bsp=e000000115b29648 [<a00000020d029980>] squashfs_iget+0x3a0/0x2c00 [squashfs] sp=e000000115b2fbb0 bsp=e000000115b295e8 [<a00000020d027530>] squashfs_lookup+0xd70/0xe40 [squashfs] sp=e000000115b2fc30 bsp=e000000115b29550 [<a0000001001681e0>] do_lookup+0x1a0/0x460 sp=e000000115b2fc70 bsp=e000000115b294f8 [<a00000010016dfb0>] __link_path_walk+0x1870/0x2680 sp=e000000115b2fc70 bsp=e000000115b29498 [<a00000010016ee80>] link_path_walk+0xc0/0x260 sp=e000000115b2fc90 bsp=e000000115b29458 [<a00000010016f900>] do_path_lookup+0x540/0x660 sp=e000000115b2fd20 bsp=e000000115b29418 [<a000000100170cc0>] __user_walk_fd+0x60/0xa0 sp=e000000115b2fd30 bsp=e000000115b293d8 [<a00000010015ebb0>] vfs_lstat_fd+0x30/0xa0 sp=e000000115b2fd30 bsp=e000000115b293a8 [<a00000010015f170>] sys_newlstat+0x30/0x80 sp=e000000115b2fdc0 bsp=e000000115b29348 [<a00000010000c560>] ia64_ret_from_syscall+0x0/0x40 sp=e000000115b2fe30 bsp=e000000115b29348 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400 sp=e000000115b30000 bsp=e000000115b29348 000: ae 51 6b b3 ca 4f 7e 37 39 7e f8 e0 e8 f4 d4 77 010: df bd 2f 29 0a 04 24 49 74 b9 9c 05 59 ab 56 ae 020: 5c a9 cd a0 9b 79 be 2a 8f 1a 4a 92 96 61 b4 92 030: c6 69 2c 46 f3 a5 ce 2a 18 bb 35 3e d9 
d3 d3 d2 040: d2 7f fe 7d ec bf 11 0d 2a 33 af 22 58 e8 5d bb 050: 75 ab 61 d5 aa 0c 6a 05 bd 78 0c 34 69 31 5e 06 Prev obj: start=e00000046fb60000, len=8192 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
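For anyone reading the hex dump above: the slab debug code fills freed objects with the poison byte 0x6b, so the "Prev obj" rows show an untouched free object, while the reported object has been overwritten after it was freed. A small sketch for checking saved dump rows follows. The helper names are made up for this illustration, and the sample rows are the first row of each object from the dump above.

```python
POISON_FREE = 0x6B  # byte the slab debug code writes into freed objects


def parse_dump(text):
    """Turn 'offset: b0 b1 ...' rows into bytes; rows may share one line."""
    data = bytearray()
    for token in text.split():
        if token.endswith(":"):
            continue  # skip the row offset, e.g. "000:"
        data.append(int(token, 16))
    return bytes(data)


def non_poison_offsets(data):
    """Offsets whose byte differs from the free-object poison value."""
    return [i for i, b in enumerate(data) if b != POISON_FREE]


# First row of the corrupted object and of the previous object, as posted.
corrupted = parse_dump("000: ae 51 6b b3 ca 4f 7e 37 39 7e f8 e0 e8 f4 d4 77")
previous = parse_dump("000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b")

bad = non_poison_offsets(corrupted)
print(len(bad), "of", len(corrupted), "bytes overwritten in the first row")
print("previous object intact:", non_poison_offsets(previous) == [])
```

A byte that happens to equal 0x6b (offset 2 in the corrupted row) is counted as intact, so the overwritten count is a lower bound.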
Bumping to urgent because of corruption. Changing to all arch's. Adding Jeremy Katz. From HP xw9400 x86_64 running kernel build 2617 with Doug's reproducer: kernel BUG at mm/slab.c:2765! invalid opcode: 0000 [#1] SMP last sysfs file: /block/loop6/dev Modules linked in: squashfs loop autofs4 hidp nfs lockd fscache nfs_acl rfcomm l 2cap bluetooth sunrpc ip_conntrack_netbios_ns ipt_REJECT iptable_filter ip_table s xt_state ip_conntrack nfnetlink xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand dm_multipath video sbs i2c_eclist_del corruption. next->pr ev should be f2d06000, but was a1df6fb0 button battery asus_acpi ac parport_pc lp parport snd_hda_intel snd_hda_codec s nd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss s nd_mixer_oss snd_pcm snd_timer ide_cd snd forcedeth cdrom sg soundcore i2c_nforc e2 i2c_core e1000 snd_page_alloc floppy serio_raw pcspkr ohci1394 ieee1394 k8_ed ac edac_mc dm_snapshot dm_zero dm_mirror dm_mod sata_nv libata mptsas mptscsih s csi_transport_sas sd_mod scsi_mod mptbase ext3 jbd ehci_hcd ohci_hcd uhci_hcd CPU: 1 EIP: 0060:[<c046e4d5>] Not tainted VLI EFLAGS: 00010002 (2.6.17-1.2617.2.1.fc6 #1) EIP is at cache_free_debugcheck+0x113/0x19e eax: 02e1ed1f ebx: f7ff1ac0 ecx: 00000054 edx: 00000012 esi: f1d70f98 edi: ffb5415a ebp: f5aeae70 esp: f5aeae54 ds: 007b es: 007b ss: 0068 Process umount (pid: 23603, ti=f5aea000 task=f7171530 task.ti=f5aea000) Stack: 02e1ed1f c04c605f f1d70000 170fc2a5 c2137164 f7ff1ac0 00000246 f5aeae8c c046e8b3 f1d70f9c f6491544 f1d70f9c f1825304 f6491544 f5aeaea0 c04c605f f1825304 f182530c 0000006a f5aeaeac c0489790 f1825304 f5aeaec4 c0489d9a Call Trace: [<c046e8b3>] kmem_cache_free+0x6c/0xba [<c04c605f>] selinux_inode_free_security+0x59/0x5e [<c0489790>] destroy_inode+0x25/0x4a [<c0489d9a>] dispose_list+0x94/0xc1 [<c048a024>] invalidate_inodes+0xae/0xc3 [<c0478f0b>] generic_shutdown_super+0x46/0xdf [<c0478fc4>] kill_block_super+0x20/0x32 [<c0479084>] 
deactivate_super+0x5d/0x6f [<c048be95>] mntput_no_expire+0x42/0x72 [<c047eadf>] path_release_on_umount+0x15/0x18 [<c048cfc1>] sys_umount+0x1e7/0x21b [<c048d002>] sys_oldumount+0xd/0xf [<c0403faf>] syscall_call+0x7/0xb DWARF2 unwinder stuck at syscall_call+0x7/0xb Leftover inexact backtrace: [<c0405391>] show_stack_log_lvl+0x8a/0x95 [<c04054c9>] show_registers+0x12d/0x19a [<c04056c6>] die+0x190/0x293 [<c06144ba>] do_trap+0x7c/0x96 [<c0405eb6>] do_invalid_op+0x89/0x93 [<c0404be1>] error_code+0x39/0x40 [<c046e8b3>] kmem_cache_free+0x6c/0xba [<c04c605f>] selinux_inode_free_security+0x59/0x5e [<c0489790>] destroy_inode+0x25/0x4a [<c0489d9a>] dispose_list+0x94/0xc1 [<c048a024>] invalidate_inodes+0xae/0xc3 [<c0478f0b>] generic_shutdown_super+0x46/0xdf [<c0478fc4>] kill_block_super+0x20/0x32 [<c0479084>] deactivate_super+0x5d/0x6f [<c048be95>] mntput_no_expire+0x42/0x72 [<c047eadf>] path_release_on_umount+0x15/0x18 [<c048cfc1>] sys_umount+0x1e7/0x21b [<c048d002>] sys_oldumount+0xd/0xf [<c0403faf>] syscall_call+0x7/0xb Code: 89 d8 e8 45 f7 ff ff 8b 55 e8 89 10 8b 45 ec 31 d2 8b 8b 8c 00 00 00 8b 78 0c 89 f0 29 f8 f7 f1 3b 83 98 00 00 00 89 45 e4 72 08 <0f> 0b cd 0a 28 ae 63 c0 8b 45 e4 0f af c1 8d 04 07 39 c6 74 08 EIP: [<c046e4d5>] cache_free_debugcheck+0x113/0x19e SS:ESP 0068:f5aeae54 <0>------------[ cut here ]------------ BUG: warning at kernel/exit.c:769/do_exit() (Not tainted) [<c04051ee>] show_trace_log_lvl+0x58/0x171 [<c0405802>] show_trace+0xd/0x10 [<c040591b>] dump_stack+0x19/0x1b [<c0426d95>] do_exit+0x4a/0x784 [<c04057a3>] die+0x26d/0x293 [<c06144ba>] do_trap+0x7c/0x96 [<c0405eb6>] do_invalid_op+0x89/0x93 [<c0404be1>] error_code+0x39/0x40 DWARF2 unwinder stuck at error_code+0x39/0x40 Leftover inexact backtrace: [<c0405802>] show_trace+0xd/0x10 [<c040591b>] dump_stack+0x19/0x1b [<c0426d95>] do_exit+0x4a/0x784 [<c04057a3>] die+0x26d/0x293 [<c06144ba>] do_trap+0x7c/0x96 [<c0405eb6>] do_invalid_op+0x89/0x93 [<c0404be1>] error_code+0x39/0x40 [<c046e8b3>] 
kmem_cache_free+0x6c/0xba [<c04c605f>] selinux_inode_free_security+0x59/0x5e [<c0489790>] destroy_inode+0x25/0x4a [<c0489d9a>] dispose_list+0x94/0xc1 [<c048a024>] invalidate_inodes+0xae/0xc3 [<c0478f0b>] generic_shutdown_super+0x46/0xdf [<c0478fc4>] kill_block_super+0x20/0x32 [<c0479084>] deactivate_super+0x5d/0x6f [<c048be95>] mntput_no_expire+0x42/0x72 [<c047eadf>] path_release_on_umount+0x15/0x18 [<c048cfc1>] sys_umount+0x1e7/0x21b [<c048d002>] sys_oldumount+0xd/0xf [<c0403faf>] syscall_call+0x7/0xb kernel BUG at lib/list_debug.c:70! invalid opcode: 0000 [#2] SMP last sysfs file: /block/loop6/dev Modules linked in: squashfs loop autofs4 hidp nfs lockd fscache nfs_acl rfcomm l 2cap bluetooth sunrpc ip_conntrack_netbios_ns ipt_REJECT iptable_filter ip_table s xt_state ip_conntrack nfnetlink xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand dm_multipath video sbs i2c_ec button battery asus_acpi ac parport_pc lp parport snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_ seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_time r ide_cd snd forcedeth cdrom sg soundcore i2c_nforce2 i2c_core e1000 snd_page_al loc floppy serio_raw pcspkr ohci1394 ieee1394 k8_edac edac_mc dm_snapshot dm_zer o dm_mirror dm_mod sata_nv libata mptsas mptscsih scsi_transport_sas sd_mod scsi _mod mptbase ext3 jbd ehci_hcd ohci_hcd uhci_hcd CPU: 0 EIP: 0060:[<c04ed28b>] Not tainted VLI EFLAGS: 00010046 (2.6.17-1.2617.2.1.fc6 #1) EIP is at list_del+0x3b/0x62 eax: 00000045 ebx: f2d06000 ecx: c0424ea6 edx: dffefef4 esi: f7ff9694 edi: f7ff1ac0 ebp: dffeff00 esp: dffefef0 ds: 007b es: 007b ss: 0068 Process events/0 (pid: 14, ti=dffef000 task=dfd9aab0 task.ti=dffef000) Stack: c0644271 f2d06000 a1df6fb0 f2d06000 dffeff2c c046ea76 00000009 0000001a 0000000d f2d06f98 f1872040 f7fc79a4 f7fc7970 0000001a f7fc7944 dffeff4c c046ebfb 00000000 f7ff1ac0 f7ff96b8 f7ff9694 f7ff1ac0 f7f89464 dffeff64 Call Trace: [<c046ea76>] free_block+0x65/0x175 [<c046ebfb>] 
drain_array+0x75/0x99 [<c047050f>] cache_reap+0x76/0x118 [<c043361e>] run_workqueue+0x7a/0xbb [<c0433f53>] worker_thread+0xd2/0x107 [<c04364bd>] kthread+0xc3/0xf2 [<c0402005>] kernel_thread_helper+0x5/0xb DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb Leftover inexact backtrace: [<c0405391>] show_stack_log_lvl+0x8a/0x95 [<c04054c9>] show_registers+0x12d/0x19a [<c04056c6>] die+0x190/0x293 [<c06144ba>] do_trap+0x7c/0x96 [<c0405eb6>] do_invalid_op+0x89/0x93 [<c0404be1>] error_code+0x39/0x40 [<c046ea76>] free_block+0x65/0x175 [<c046ebfb>] drain_array+0x75/0x99 [<c047050f>] cache_reap+0x76/0x118 [<c043361e>] run_workqueue+0x7a/0xbb [<c0433f53>] worker_thread+0xd2/0x107 [<c04364bd>] kthread+0xc3/0xf2 [<c0402005>] kernel_thread_helper+0x5/0xb Code: 53 68 26 42 64 c0 e8 8b 7c f3 ff 0f 0b 41 00 60 42 64 c0 83 c4 0c 8b 03 8b 40 04 39 d8 74 17 50 53 68 71 42 64 c0 e8 6b 7c f3 ff <0f> 0b 46 00 60 42 64 c0 83 c4 0c 8b 13 8b 43 04 89 42 04 89 10 EIP: [<c04ed28b>] list_del+0x3b/0x62 SS:ESP 0068:dffefef0 <3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20 in_atomic():0, irqs_disabled():1 [<c04051ee>] show_trace_log_lvl+0x58/0x171 [<c0405802>] show_trace+0xd/0x10 [<c040591b>] dump_stack+0x19/0x1b [<c041de6f>] __might_sleep+0x8d/0x95 [<c04390b8>] down_read+0x15/0x40 [<c0430fd4>] blocking_notifier_call_chain+0x11/0x2d [<c0425977>] profile_task_exit+0x11/0x13 [<c0426d67>] do_exit+0x1c/0x784 [<c04057a3>] die+0x26d/0x293 [<c06144ba>] do_trap+0x7c/0x96 [<c0405eb6>] do_invalid_op+0x89/0x93 [<c0404be1>] error_code+0x39/0x40 DWARF2 unwinder stuck at error_code+0x39/0x40 Leftover inexact backtrace: [<c0405802>] show_trace+0xd/0x10 [<c040591b>] dump_stack+0x19/0x1b [<c041de6f>] __might_sleep+0x8d/0x95 [<c04390b8>] down_read+0x15/0x40 [<c0430fd4>] blocking_notifier_call_chain+0x11/0x2d [<c0425977>] profile_task_exit+0x11/0x13 [<c0426d67>] do_exit+0x1c/0x784 [<c04057a3>] die+0x26d/0x293 [<c06144ba>] do_trap+0x7c/0x96 [<c0405eb6>] do_invalid_op+0x89/0x93 
[<c0404be1>] error_code+0x39/0x40 [<c046ea76>] free_block+0x65/0x175 [<c046ebfb>] drain_array+0x75/0x99 [<c047050f>] cache_reap+0x76/0x118 [<c043361e>] run_workqueue+0x7a/0xbb [<c0433f53>] worker_thread+0xd2/0x107 [<c04364bd>] kthread+0xc3/0xf2 [<c0402005>] kernel_thread_helper+0x5/0xb SELinux: initialized (dev loop1, type squashfs), not configured for labeling BUG: spinlock lockup on CPU#0, find/23608, f7ff96b8 (Not tainted) [<c04051ee>] show_trace_log_lvl+0x58/0x171 [<c0405802>] show_trace+0xd/0x10 [<c040591b>] dump_stack+0x19/0x1b [<c04ecfb2>] _raw_spin_lock+0xba/0xd9 [<c0613f5a>] _spin_lock+0x20/0x28 [<c046edf7>] cache_alloc_refill+0x69/0x652 [<c046f712>] kmem_cache_alloc+0x80/0xb5 [<c04c6090>] selinux_inode_alloc_security+0x2c/0x87 [<c0489896>] alloc_inode+0xe1/0x170 [<c048993c>] new_inode+0x17/0x70 [<f8dd73d4>] squashfs_new_inode+0x13/0x86 [squashfs] [<f8dda8d8>] squashfs_iget+0x3dd/0x12ee [squashfs] [<f8dd98fb>] squashfs_lookup+0x595/0x5d5 [squashfs] [<c047ef70>] do_lookup+0xab/0x153 [<c0480dec>] __link_path_walk+0x8e7/0xdcc [<c0481321>] link_path_walk+0x50/0xca [<c0481728>] do_path_lookup+0x23b/0x28d [<c0481ec0>] __user_walk_fd+0x2f/0x43 [<c047b7d4>] vfs_lstat_fd+0x16/0x3d [<c047b839>] vfs_lstat+0x11/0x13 [<c047b84f>] sys_lstat64+0x14/0x28 [<c0403faf>] syscall_call+0x7/0xb DWARF2 unwinder stuck at syscall_call+0x7/0xb Leftover inexact backtrace: [<c0405802>] show_trace+0xd/0x10 [<c040591b>] dump_stack+0x19/0x1b [<c04ecfb2>] _raw_spin_lock+0xba/0xd9 [<c0613f5a>] _spin_lock+0x20/0x28 [<c046edf7>] cache_alloc_refill+0x69/0x652 [<c046f712>] kmem_cache_alloc+0x80/0xb5 [<c04c6090>] selinux_inode_alloc_security+0x2c/0x87 [<c0489896>] alloc_inode+0xe1/0x170 [<c048993c>] new_inode+0x17/0x70 [<f8dd73d4>] squashfs_new_inode+0x13/0x86 [squashfs] [<f8dda8d8>] squashfs_iget+0x3dd/0x12ee [squashfs] [<f8dd98fb>] squashfs_lookup+0x595/0x5d5 [squashfs] [<c047ef70>] do_lookup+0xab/0x153 [<c0480dec>] __link_path_walk+0x8e7/0xdcc [<c0481321>] link_path_walk+0x50/0xca 
[<c0481728>] do_path_lookup+0x23b/0x28d [<c0481ec0>] __user_walk_fd+0x2f/0x43 [<c047b7d4>] vfs_lstat_fd+0x16/0x3d [<c047b839>] vfs_lstat+0x11/0x13 [<c047b84f>] sys_lstat64+0x14/0x28 [<c0403faf>] syscall_call+0x7/0xb
Jeremy suggests a few things:

a) Disable SLAB_DEBUG
b) remove the inode-diet-squashfs patch
c) try and backport "new" upstream version

I'll build these on altix3 and test ...

P.
just hit another one on x86_64, this time not slab corruption but an ugly panic. This was with 2.6.17-1.2630.fc6 Oops: 0000 [1] SMP last sysfs file: /block/loop4/dev CPU 1 Modules linked in: squashfs loop autofs4 hidp rfcomm l2cap bluetooth sunrpc ip_conntrack_netbios_ns ipt_REJECT iptable_filter ip_tables xt_state ip_conntrack nfnetlink xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 acpi_cpufreq dm_multipath video sbs i2c_ec button battery asus_acpi ac parport_pc lp parport sg intel_rng e752x_edac shpchp i2c_i801 edac_mc i2c_core e1000 pcspkr i6300esb serio_raw dm_snapshot dm_zero dm_mirror dm_mod mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd Pid: 7113, comm: find Not tainted 2.6.17-1.2630.fc6 #1 RIP: 0010:[<ffffffff80208970>] [<ffffffff80208970>] __handle_mm_fault+0xce/0xc98 RSP: 0000:ffff81005d709dc8 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffffb0b0ad4f4000 RCX: 0000000000000000 RDX: 00002fb0ad4f4000 RSI: ffff8100717117d0 RDI: ffff81007efd4dd8 RBP: ffff81005d709e58 R08: 0000000000000002 R09: 0000000000000000 R10: ffffffff80269d13 R11: ffffffff80269d13 R12: ffff810000000000 R13: ffff81005d709f58 R14: ffff81007efd4dd8 R15: ffff8100717117d0 FS: 00002aaaab673710(0000) GS:ffff810003f4f988(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffb0b0ad4f4000 CR3: 000000005ce72000 CR4: 00000000000006e0 Process find (pid: 7113, threadinfo ffff81005d708000, task ffff810059f5d040) Stack: 0000000100000001 0000000000000000 0000000000402c24 ffff81007efd4dd8 0000000000000001 ffff81005d709f58 ffff81007efd4dd8 0000000000000014 ffff81005d709e38 0000000000000246 ffffffff80269d13 ffff810059f5d040 Call Trace: [<ffffffff80269dc9>] do_page_fault+0x487/0x84c [<ffffffff80261391>] error_exit+0x0/0x96 DWARF2 unwinder stuck at error_exit+0x0/0x96 Leftover inexact backtrace: Code: 48 83 3b 00 75 18 48 8b 55 80 48 8b 7d 88 48 89 de e8 69 74 RIP [<ffffffff80208970>] __handle_mm_fault+0xce/0xc98 RSP 
<ffff81005d709dc8> CR2: ffffb0b0ad4f4000 <3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20 in_atomic():0, irqs_disabled():1 Call Trace: [<ffffffff8026e97d>] show_trace+0xae/0x336 [<ffffffff8026ec1a>] dump_stack+0x15/0x17 [<ffffffff8020badb>] __might_sleep+0xb2/0xb4 [<ffffffff802a5dda>] down_read+0x1d/0x4a [<ffffffff8029dce5>] blocking_notifier_call_chain+0x1b/0x41 [<ffffffff80294077>] profile_task_exit+0x15/0x17 [<ffffffff80215cd7>] do_exit+0x24/0x911 [<ffffffff8026a102>] do_page_fault+0x7c0/0x84c [<ffffffff80261391>] error_exit+0x0/0x96 DWARF2 unwinder stuck at error_exit+0x0/0x96 Leftover inexact backtrace: [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c [<ffffffff80208970>] __handle_mm_fault+0xce/0xc98 [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c [<ffffffff80269dc9>] do_page_fault+0x487/0x84c [<ffffffff80267412>] trace_hardirqs_on_thunk+0x35/0x37 [<ffffffff80261391>] error_exit+0x0/0x96 ============================================= [ INFO: possible recursive locking detected ] 2.6.17-1.2630.fc6 #1 --------------------------------------------- find/7113 is trying to acquire lock: (&mm->mmap_sem){----}, at: [<ffffffff802b6d46>] acct_collect+0x58/0x1b7 but task is already holding lock: (&mm->mmap_sem){----}, at: [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c other info that might help us debug this: 1 lock held by find/7113: #0: (&mm->mmap_sem){----}, at: [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c stack backtrace: Call Trace: [<ffffffff8026e97d>] show_trace+0xae/0x336 [<ffffffff8026ec1a>] dump_stack+0x15/0x17 [<ffffffff802a844c>] __lock_acquire+0x135/0xa64 [<ffffffff802a931e>] lock_acquire+0x4b/0x69 [<ffffffff802a5dfb>] down_read+0x3e/0x4a [<ffffffff802b6d46>] acct_collect+0x58/0x1b7 [<ffffffff80215eea>] do_exit+0x237/0x911 [<ffffffff8026a102>] do_page_fault+0x7c0/0x84c [<ffffffff80261391>] error_exit+0x0/0x96 DWARF2 unwinder stuck at error_exit+0x0/0x96 Leftover inexact backtrace: 
[<ffffffff80269d13>] do_page_fault+0x3d1/0x84c [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c [<ffffffff80208970>] __handle_mm_fault+0xce/0xc98 [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c [<ffffffff80269dc9>] do_page_fault+0x487/0x84c [<ffffffff80267412>] trace_hardirqs_on_thunk+0x35/0x37 [<ffffffff80261391>] error_exit+0x0/0x96 mm/memory.c:105: bad pgd ffff81005ce72000(a1df6fb0ad4f4f78). ----------- [cut here ] --------- [please bite here ] --------- Kernel BUG at mm/mmap.c:2068 invalid opcode: 0000 [2] SMP last sysfs file: /block/loop4/dev CPU 1 Modules linked in: squashfs loop autofs4 hidp rfcomm l2cap bluetooth sunrpc ip_conntrack_netbios_ns ipt_REJECT iptable_filter ip_tables xt_state ip_conntrack nfnetlink xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 acpi_cpufreq dm_multipath video sbs i2c_ec button battery asus_acpi ac parport_pc lp parport sg intel_rng e752x_edac shpchp i2c_i801 edac_mc i2c_core e1000 pcspkr i6300esb serio_raw dm_snapshot dm_zero dm_mirror dm_mod mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd Pid: 7113, comm: find Not tainted 2.6.17-1.2630.fc6 #1 RIP: 0010:[<ffffffff8023cc78>] [<ffffffff8023cc78>] exit_mmap+0xe4/0xf9 RSP: 0000:ffff81005d709b48 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff810003c2e480 RCX: 0000000000000000 RDX: 000000000000001f RSI: ffff810001000078 RDI: ffff81007e1df908 RBP: ffff81005d709b78 R08: ffff81005d709a88 R09: 0000000000000000 R10: ffff81007e1df858 R11: 00000000000000b0 R12: 0000000000000000 R13: ffff81007efd4dd8 R14: ffff810059f5d688 R15: 0000000000000000 FS: 00002aaaab673710(0000) GS:ffff810003f4f988(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffb0b0ad4f4000 CR3: 0000000000201000 CR4: 00000000000006e0 Process find (pid: 7113, threadinfo ffff81005d708000, task ffff810059f5d040) Stack: ffff81005d709b78 0000000000000098 ffff810003c2e480 ffff81007efd4dd8 ffff81007efd4ed0 ffff810059f5d040 ffff81005d709b98 
ffffffff8023f07f ffff81007efd4dd8 ffff81007efd4e40 ffff81005d709bc8 ffffffff80243813 Call Trace: [<ffffffff8023f07f>] mmput+0x42/0x94 [<ffffffff80243813>] exit_mm+0xef/0xf8 [<ffffffff80215f52>] do_exit+0x29f/0x911 [<ffffffff8026a102>] do_page_fault+0x7c0/0x84c [<ffffffff80261391>] error_exit+0x0/0x96 DWARF2 unwinder stuck at error_exit+0x0/0x96 Leftover inexact backtrace: [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c [<ffffffff80208970>] __handle_mm_fault+0xce/0xc98 [<ffffffff80269d13>] do_page_fault+0x3d1/0x84c [<ffffffff80269dc9>] do_page_fault+0x487/0x84c [<ffffffff80267412>] trace_hardirqs_on_thunk+0x35/0x37 [<ffffffff80261391>] error_exit+0x0/0x96 Code: 0f 0b 68 78 77 4a 80 c2 14 08 48 83 c4 18 5b 41 5c 41 5d c9 RIP [<ffffffff8023cc78>] exit_mmap+0xe4/0xf9 RSP <ffff81005d709b48> <1>Fixing recursive fault but reboot is needed!
> a) Disable SLAB_DEBUG

Still panics.

> b) remove the inode-diet-squashfs patch

Kernel does not compile without this patch.

[root@altix3 linux-2.6.17.ia64]# patch -p1 -R < ../../../SOURCES/linux-2.6-inode-diet-squashfs.patch
patching file fs/squashfs/inode.c
Hunk #1 succeeded at 604 (offset -3 lines).
patching file fs/squashfs/squashfs2_0.c
Hunk #1 succeeded at 229 (offset 1 line).
[root@altix3 linux-2.6.17.ia64]# make -j64 compressed
  CHK include/linux/version.h
  CHK include/linux/utsrelease.h
  CHK include/linux/compile.h
[root@altix3 linux-2.6.17.ia64]# make -j64 modules
  CHK include/linux/version.h
  CHK include/linux/utsrelease.h
  CC [M] fs/squashfs/inode.o
  CC [M] fs/squashfs/squashfs2_0.o
fs/squashfs/inode.c:69: warning: initialization from incompatible pointer type
fs/squashfs/squashfs2_0.c: In function ‘squashfs_iget_2’:
fs/squashfs/squashfs2_0.c:232: error: ‘struct inode’ has no member named ‘i_blksize’
fs/squashfs/inode.c: In function ‘squashfs_iget’:
fs/squashfs/inode.c:607: error: ‘struct inode’ has no member named ‘i_blksize’
fs/squashfs/inode.c:660: error: ‘struct inode’ has no member named ‘i_blksize’
fs/squashfs/inode.c: In function ‘squashfs_get_sb’:
fs/squashfs/inode.c:2073: warning: return makes pointer from integer without a cast
make[2]: *** [fs/squashfs/squashfs2_0.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [fs/squashfs/inode.o] Error 1
make[1]: *** [fs/squashfs] Error 2
make: *** [fs] Error 2

> c) try and backport "new" upstream version

Still panics.
The panic also happens on plain 2.6.17 + 2.6.18-rc6.patch (i.e., latest upstream). P.
Hmm this is scary. Running the test corrupted the stage2.img I was testing with... this is supposed to be a readonly fs.
I take that back; things were ok after a reboot, this was just more corrupted memory.
I compiled 2.6.17-1.2630 and the new kernel booted fine on my tiger machine, but failed on an hp cx2600. On the cx2600 I used the RHEL4 distribution. Then I just installed the FC6 test2 kernel by rpm, and it also couldn't boot. Both failed just after the kernel/initrd were loaded. There are no console messages other than that the initrd is loaded, and the serial console shows nothing. I'm not familiar with hp ia64 machines. Does the cx2600 have debug ports?
Zhang, This BZ is about random memory corruption and is not for general system boot issues. If you have other issues, please either open up a new BZ or ask on the fedora-ia64 list. Thanks, P.
Updating the Summary (for now ...) p.
Adding Peter to the mix. Quick note for Peter so he doesn't have to read EVERYTHING in this BZ: The issue is that if you mount, read, umount squashfs the system eventually panics. This is bad because squashfs is used in the system installer. squashfs is not upstream and is maintained out of tree. It *really* hits ia64 badly, but we've reproduced it on x86_64 and ppc now. It can be reproduced on the 2.6.17 + 2.6.18-rc6 patch + squashfs tree (vanilla, no RH patches). esandeen is doing some examination of it now... but thought you would like to join in the fun :) P.
We have chased this down to an issue with the filesystem using fragments. Avoiding fragments until squashfs is patched and fixed is a good idea and will increase stability in the long run. We are discussing with Phillip Lougher (squashfs maintainer).
I've asked the anaconda team to modify the mk-images script to use -no-fragments for now. Please see BZ 206472. P.
It looks very much to me like squashfs_read_data() is corrupting memory badly. It's doing this:

        if (compressed) {
                int zlib_err;

                stream.next_in = c_buffer;
                stream.avail_in = c_byte;
                stream.next_out = buffer;
                stream.avail_out = msblk->read_size;

where "stream" is a struct that zlib takes. Per the comments in the struct, this translates to:

        stream.next_in = c_buffer;              /* next input byte */
        stream.avail_in = c_byte;               /* number of bytes available at next_in */
        stream.next_out = buffer;               /* next output byte should be put there */
        stream.avail_out = msblk->read_size;    /* remaining free space at next_out */

Now, "buffer" in this case is the block cache buffer that we allocated, 8k, but read_size is:

        msblk->read_size = (sblk->block_size < SQUASHFS_METADATA_SIZE) ?
                                SQUASHFS_METADATA_SIZE : sblk->block_size;

Our block size is 64k, SQUASHFS_METADATA_SIZE is 8k, so read_size is 64k.

Now if I understand what zlib is doing, we're giving it a pointer to 8k of memory and telling it to fill up to 64k from there. printk's seem to confirm this.

Peter (squashfs author) says this should be fine for metadata because metadata is never > 8192, but if I look at the return values from zlib, it appears to consistently be larger than 8192 (so perhaps this path isn't only for metadata...) and this is horribly corrupting memory.
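To make the size mismatch above concrete, here is a minimal standalone sketch of the arithmetic. It is not the squashfs code: read_size_for() only mirrors the quoted ternary, and bounded_avail_out() is a hypothetical helper illustrating one way to avoid advertising more output space than the buffer holds. It is not claimed to be the fix that went into squashfs.

```c
#include <assert.h>
#include <stddef.h>

/* 8k, as stated in the comment above. */
#define SQUASHFS_METADATA_SIZE 8192

/* Mirrors the quoted ternary: the larger of the block size and the metadata size. */
size_t read_size_for(size_t block_size)
{
    return (block_size < SQUASHFS_METADATA_SIZE) ? SQUASHFS_METADATA_SIZE
                                                 : block_size;
}

/* Hypothetical helper (not in squashfs): cap the advertised output space
 * at the real capacity of the destination buffer. */
size_t bounded_avail_out(size_t buf_len, size_t read_size)
{
    return (read_size < buf_len) ? read_size : buf_len;
}

/* Self-check using the numbers from this report. */
void example_check(void)
{
    size_t read_size = read_size_for(64 * 1024);  /* 64k block size */

    /* zlib is told it may write 64k ... */
    assert(read_size == 64 * 1024);
    /* ... but the block cache buffer is only 8k. */
    assert(bounded_avail_out(SQUASHFS_METADATA_SIZE, read_size) == 8192);
}
```

With a 64k block size the unbounded value is eight times the size of the 8k block cache buffer, so any block that inflates past 8192 bytes lands in whatever memory follows the buffer.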
I flagged when we send the 8192-byte block_cache[i].data buffers into squashfs_read_data() and dumped stack if zlib ever wrote more than 8k:

        if (blockcache && stream.total_out > SQUASHFS_METADATA_SIZE) {
                printk("blockcache, out %lu?\n", stream.total_out);
                dump_stack();
        }

and sure enough:

blockcache, out 8511?

Call Trace:
 [<ffffffff80271134>] show_trace+0xb8/0x334
 [<ffffffff802713c3>] dump_stack+0x13/0x15
 [<ffffffff882d150e>] :squashfs:squashfs_read_data+0x50e/0x5a7
 [<ffffffff882d17ad>] :squashfs:squashfs_get_cached_block+0x206/0x3d1
 [<ffffffff882d1a74>] :squashfs:get_fragment_location+0xfc/0x125
 [<ffffffff882d22c1>] :squashfs:squashfs_iget+0x492/0x171e
 [<ffffffff882d5e8b>] :squashfs:squashfs_lookup+0x638/0x694
 [<ffffffff8020d200>] do_lookup+0xc3/0x175
 [<ffffffff8020a06e>] __link_path_walk+0xa35/0xf53
 [<ffffffff8020edf5>] link_path_walk+0x61/0xec
 [<ffffffff8020cf9d>] do_path_lookup+0x27e/0x2f3
 [<ffffffff80224d85>] __user_walk_fd+0x3f/0x5a
 [<ffffffff8024310c>] vfs_lstat_fd+0x24/0x5a
 [<ffffffff8022cd3d>] sys_newlstat+0x22/0x3c
 [<ffffffff80262c52>] system_call+0x7e/0x83
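For a sense of scale, the reported total_out of 8511 against an 8192-byte buffer works out to 319 bytes written past the end. A tiny sketch of that calculation, where overrun_bytes() is a hypothetical helper for illustration and not part of squashfs or zlib:

```c
#include <assert.h>
#include <stddef.h>

/* 8k block cache buffer size, as described above. */
#define SQUASHFS_METADATA_SIZE 8192

/* Hypothetical helper: number of bytes written past the end of a buffer
 * of `capacity` bytes, given the total bytes the decompressor produced. */
size_t overrun_bytes(size_t total_out, size_t capacity)
{
    return (total_out > capacity) ? total_out - capacity : 0;
}
```

Calling overrun_bytes(8511, SQUASHFS_METADATA_SIZE) gives 319, which is enough to clobber the header of a neighbouring slab object and would be consistent with the list_del and slab debug failures earlier in this bug.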
Looks like the fragment-fixing patch in the package is wrong; I think we need the one attached in the 5th comment of bug #202663.
Building squashfs-tools with the updated patch now -- want to try making an image when it's done and if that fixes it, close this (and then I'll change anaconda back)
*** Bug 204625 has been marked as a duplicate of this bug. ***