Bug 209062 - fc6-pre BUG searching for fedora core installations while XFS filesystem exists
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 6
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Eric Sandeen
QA Contact: Brian Brock
Reported: 2006-10-03 04:33 UTC by Roger J. Allen
Modified: 2007-11-30 22:11 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2006-10-12 16:57:57 UTC
Type: ---
Embargoed:


Attachments
syslog with BUG output 33,772 bytes (32.98 KB, text/plain), 2006-10-03 04:33 UTC, Roger J. Allen
unlock xfs inode locks before freeing the inode (426 bytes, text/x-patch), 2006-10-10 15:42 UTC, Eric Sandeen

Description Roger J. Allen 2006-10-03 04:33:50 UTC
Description of problem:

Booted FC6-pre-dvd-i386 DVD with updates-fc6pre floppy:

boot: linux updates vga=791

After message about "Searching for Fedora Core Installations" appears
with an empty bargraph, screen goes black with just the pointer and the
circling blue dots.  The last filesystem that gets mounted is on an SATAII
drive with XFS.  The XFS filesystem only has three directories in it.
There is no installation on the XFS filesystem.

Attached is the syslog which has the BUG output.

After reformatting the XFS filesystem to ext3, the install worked fine.

Then, reformatting the filesystem back to XFS reproduced the install BUG again.

Version-Release number of selected component (if applicable):

FC6-pre-dvd-i386.iso with
http://people.redhat.com/~katzj/updates-fc6pre.img

How reproducible:

Every time I try to install with or without updates floppy when there
is an XFS filesystem.

Steps to Reproduce:
1. create XFS filesystem on a partition
2. boot: linux updates
3. answer prompts with defaults till Searching for FC Installations
  
Actual results:

hangs with BUG and black screen with pointer

able to get to command prompt with CTRL-ALT-F2 to save syslog to floppy

Expected results:

install or upgrade screen should appear

Additional info:

This is a test box with many versions of FC4 through FC6-test3 (both i386
and x86_64), CentOS, and two WinXP installations.

ASRock 939DUAL-VSTA with Athlon64, 1GB memory, Nvidia 6800 Ultra, two IDE
Seagate drives and one SATAII Seagate drive.

FC6T3 was successfully installed on SATAII sda3 before creating XFS
filesystem on sda5.  Attempted to install FC6-pre onto sda6, but couldn't
get past "Searching for Fedora Core Installations" while XFS filesystem
existed.  FC6-pre dvd passed mediacheck test.

Plus Bugzilla didn't have a Version for fc6-pre, so I used fc6test3.

Comment 1 Roger J. Allen 2006-10-03 04:33:51 UTC
Created attachment 137624
syslog with BUG output 33,772 bytes

Comment 2 Roger J. Allen 2006-10-03 04:41:19 UTC
Also happens with fc6-test3-dvd-i386 DVD.

Comment 3 Dave Jones 2006-10-04 23:58:06 UTC
Perhaps Eric has some clues (and maybe he needs a break from ext3 bugs :)

Comment 4 Eric Sandeen 2006-10-05 02:12:15 UTC
Looks like lockdep problems, I'll look into it.

Comment 5 Eric Sandeen 2006-10-05 04:23:46 UTC
xfs looks completely DOA w/ lockdep enabled at this point (and probably also w/
slab debugging) - a simple mount/unmount will oops.  I'll look into it more
tomorrow, probably something straightforward.

Comment 6 Eric Sandeen 2006-10-05 22:56:55 UTC
We are blowing up in mark_lock because when it does:

        if (likely(this->class->usage_mask & new_mask))
                return 1;

this->class is 0x6b6b6b6b, meaning freed memory.

This got assigned previously in __lock_acquire,

        if (!subclass)
                class = lock->class_cache;

but class_cache is freed (0x6b6b6b6b) at this point.  I've not been able to
figure out where/why this happened yet...
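
For readers unfamiliar with the 0x6b6b6b6b value: 0x6b is the byte that the kernel's slab debugging writes over an object when it is freed, so a pointer field reading 0x6b6b6b6b on i386 strongly suggests the containing object was already freed. The following is only a user-space sketch of that idea, not kernel code. The names toy_lock_map, toy_poison_on_free, and toy_looks_freed are hypothetical, the 32-bit fields stand in for i386 pointers, and the real slab code marks the last byte of an object differently, which this sketch ignores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte written over freed objects by slab debugging. */
#define TOY_POISON_FREE 0x6b

/*
 * Hypothetical stand-in for a lock map. The field names follow the
 * debug output in this bug; uint32_t models a 32-bit (i386) pointer.
 */
struct toy_lock_map {
    uint32_t class_cache;
    uint32_t key;
    uint32_t name;
};

/* Simulate poisoning on free: overwrite the whole object with 0x6b. */
static void toy_poison_on_free(void *obj, size_t size)
{
    assert(obj != NULL);
    memset(obj, TOY_POISON_FREE, size);
}

/* A 32-bit field made entirely of poison bytes reads as 0x6b6b6b6b. */
static int toy_looks_freed(uint32_t value)
{
    return value == 0x6b6b6b6bu;
}
```

Calling toy_poison_on_free() on a toy_lock_map and then checking class_cache with toy_looks_freed() reproduces the pattern seen in mark_lock: the field no longer holds a valid address, only the poison pattern.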

Comment 7 Eric Sandeen 2006-10-06 22:02:00 UTC
Doing a little debugging in lock_release_non_nested:

lockdepth_depth 4 held_locks dde9cff4

first loop:

testing i==3, hlock dde9d06c, hlock->instance d1400ea4
testing i==2, hlock dde9d044, hlock->instance d1400edc
testing i==1, hlock dde9d01c, hlock->instance d1691110
Matched lock at d1691110 (i==1)

curr lockdep_depth now 1

i==2, hlock dde9d044, hlock->instance d1400edc, our lock d1691110
lock map at d1400edc lock->class_cache 6b6b6b6b lock->key 6b6b6b6b lock->name
6b6b6b6b?

somehow locks 2 and 3 above are freed, and -that- is why we are oopsing, not due
to the mutex_unlock(&sb->s_lock);.  Perhaps this means xfs has freed some locks
w/o unlocking them first....

Comment 8 Eric Sandeen 2006-10-09 17:51:52 UTC
Found out over the weekend that this is due to this code in xfs_ireclaim:

xfs_ireclaim(xfs_inode_t *ip)
{
   ...
        /*
         * Here we do a spurious inode lock in order to coordinate with
         * xfs_sync().  This is because xfs_sync() references the inodes
         * in the mount list without taking references on the corresponding
         * vnodes.  We make that OK here by ensuring that we wait until
         * the inode is unlocked in xfs_sync() before we go ahead and
         * free it.  We get both the regular lock and the io lock because
         * the xfs_sync() code may need to drop the regular one but will
         * still hold the io lock.
         */
        xfs_ilock(ip, XFS_ILOCK_EXCL | XFS_IOLOCK_EXCL);
...
        /*
         * Free all memory associated with the inode.
         */
        xfs_idestroy(ip);
}

So, lock & free.  This frees memory that lockdep is still pointing to.

Will huddle w/ the xfs guys when I get a chance to make sure unlocking the inode
just before freeing is acceptable.  (it does seem to fix this problem at least).
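
The "lock & free" problem can be illustrated with a small user-space sketch. This is not the XFS code and not the attached patch. It is a toy model, assuming a lock tracker that keeps pointers to held lock instances (loosely like lockdep's held-lock list) and a "destroy" that poisons the object with 0x6b. All toy_* names are hypothetical. The toy inode lives in caller-provided memory, so inspecting it after the simulated free is safe here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_POISON   0x6b  /* byte slab debugging writes on free */
#define TOY_MAX_HELD 8

struct toy_lock  { uint32_t class_id; uint32_t locked; };
/* Hypothetical inode with two locks, in the spirit of ILOCK and IOLOCK. */
struct toy_inode { struct toy_lock ilock; struct toy_lock iolock; };

/* Tracker state: pointers to every lock instance currently held. */
static struct toy_lock *held[TOY_MAX_HELD];
static int held_depth;

static void toy_lock_acquire(struct toy_lock *l)
{
    assert(held_depth < TOY_MAX_HELD);
    l->locked = 1;
    held[held_depth++] = l;
}

/* Search from the most recent entry down and drop the matching lock. */
static void toy_lock_release(struct toy_lock *l)
{
    for (int i = held_depth - 1; i >= 0; i--) {
        if (held[i] == l) {
            memmove(&held[i], &held[i + 1],
                    (size_t)(held_depth - i - 1) * sizeof(held[0]));
            held_depth--;
            l->locked = 0;
            return;
        }
    }
}

/* Simulated free: poison the inode; the tracker is not told about it. */
static void toy_inode_destroy(struct toy_inode *ip)
{
    memset(ip, TOY_POISON, sizeof(*ip));
}

/* Count held entries that now point at poisoned (freed) memory. */
static int toy_count_stale_held(void)
{
    int stale = 0;
    for (int i = 0; i < held_depth; i++)
        if (held[i]->class_id == 0x6b6b6b6bu)
            stale++;
    return stale;
}

/* Problem pattern: take both locks, then free while still holding them. */
static int toy_reclaim_lock_and_free(struct toy_inode *ip)
{
    toy_lock_acquire(&ip->ilock);
    toy_lock_acquire(&ip->iolock);
    toy_inode_destroy(ip);
    return toy_count_stale_held();
}

/* Proposed ordering: drop both locks first, then free. */
static int toy_reclaim_unlock_then_free(struct toy_inode *ip)
{
    toy_lock_acquire(&ip->ilock);
    toy_lock_acquire(&ip->iolock);
    toy_lock_release(&ip->iolock);
    toy_lock_release(&ip->ilock);
    toy_inode_destroy(ip);
    return toy_count_stale_held();
}
```

In this model the lock-and-free variant leaves two tracker entries pointing at poisoned memory, while unlocking first leaves none. That mirrors the debug output above, where two held-lock entries referred to a lock map filled with 6b6b6b6b.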

Comment 9 Eric Sandeen 2006-10-10 15:42:07 UTC
Created attachment 138154
unlock xfs inode locks before freeing the inode

This fixes the problem and has been ACKed by the SGI XFS folks.

Comment 10 Eric Sandeen 2006-10-12 16:57:57 UTC
attached patch is in rawhide now.

