Bug 475488

Summary: GFS2: mkfs.gfs2 stuck in gfs2_dirent_next loop
Product: Red Hat Enterprise Linux 5
Reporter: Nate Straz <nstraz>
Component: gfs2-utils
Assignee: Robert Peterson <rpeterso>
Status: CLOSED DUPLICATE
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: low
Version: 5.3
CC: edamato
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-12-09 23:11:19 UTC
Bug Blocks: 471618    
Attachments:
  gzipped i386 core dump for mkfs.gfs2 from gfs2-utils-0.1.51-1.el5 (flags: none)
  patch to fix the problem (flags: none)

Description Nate Straz 2008-12-09 13:45:04 UTC
Description of problem:

While trying to make a 1T file system with a 1k block size, mkfs.gfs2 got stuck in a loop and did not complete.  I attached GDB and stepped through the command:
(gdb) bt
#0  leaf_search (dip=0x1acc72f8, bh=0x1addb2b8, 
    filename=0xbfcdb350 "inum_range5", len=11, dent_out=0xbfcdb24c, 
    dent_prev=0x0) at fs_ops.c:1258
#1  0x08051f59 in linked_leaf_search (dip=0x1acc72f8, 
    filename=0xbfcdb350 "inum_range5", len=11, dent_out=0xbfcdb24c, 
    dent_prev=0x0, bh_out=0xbfcdb250) at fs_ops.c:1321
#2  0x08052023 in dir_e_search (dip=0x1acc72f8, 
    filename=0xbfcdb350 "inum_range5", len=11, type=0x0, inum=0xbfcdb2b0)
    at fs_ops.c:1358
#3  0x080521a2 in dir_search (dip=0x1acc72f8, 
    filename=0xbfcdb350 "inum_range5", len=11, type=0x0, inum=0xbfcdb2b0)
    at fs_ops.c:1423
#4  0x080525c1 in gfs2_lookupi (dip=0x1acc72f8, 
    filename=0xbfcdb350 "inum_range5", len=11, ipp=0xbfcdb2f8) at fs_ops.c:1551
#5  0x08051b62 in createi (dip=0x1acc72f8, filename=0xbfcdb350 "inum_range5", 
    mode=33152, flags=513) at fs_ops.c:1186
#6  0x080585b7 in build_inum_range (per_node=0x1acc72f8, j=5)
    at structures.c:170
#7  0x080588d9 in build_per_node (sdp=0xbfcdb4bc) at structures.c:249
#8  0x08049c6d in main_mkfs (argc=11, argv=0xbfcfda14) at main_mkfs.c:445
#9  0x08049339 in main (argc=11, argv=0xbfcfda14) at main.c:55
#10 0x0054ee8c in __libc_start_main () from /lib/libc.so.6
#11 0x080491b1 in ?? ()
(gdb) n
1275            } while (gfs2_dirent_next(dip, bh, &dent) == 0);
(gdb) 
1257                    if (!dent->de_inum.no_formal_ino){
(gdb) 
1258                            prev = dent;
(gdb) 
1259                            continue;
(gdb) 
1275            } while (gfs2_dirent_next(dip, bh, &dent) == 0);
(gdb) info locals
hash = 2442621791
dent = (struct gfs2_dirent *) 0x1addb3cc
prev = (struct gfs2_dirent *) 0x1addb3cc
entries = 0
x = 0
type = 2
(gdb) n
1257                    if (!dent->de_inum.no_formal_ino){
(gdb) 
1258                            prev = dent;
(gdb) print *dent
$1 = {de_inum = {no_formal_ino = 0, no_addr = 0}, de_hash = 0, de_rec_len = 0, 
  de_name_len = 0, de_type = 0, __pad = '\0' <repeats 13 times>}
(gdb) n
1259                            continue;
(gdb) n
1275            } while (gfs2_dirent_next(dip, bh, &dent) == 0);
(gdb) n
1257                    if (!dent->de_inum.no_formal_ino){
(gdb) print *dent
$2 = {de_inum = {no_formal_ino = 0, no_addr = 0}, de_hash = 0, de_rec_len = 0, 
  de_name_len = 0, de_type = 0, __pad = '\0' <repeats 13 times>}
(gdb) info locals
hash = 2442621791
dent = (struct gfs2_dirent *) 0x1addb3cc
prev = (struct gfs2_dirent *) 0x1addb3cc
entries = 0
x = 0
type = 2
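
The trace above shows dent staying at 0x1addb3cc across iterations while de_rec_len is 0, which suggests the walk "advances" by zero bytes each time. A minimal sketch of that failure mode follows, assuming a next-entry helper that simply adds the record length to the current position. The names sketch_dirent, sketch_next, sketch_walk and SKETCH_CORRUPT are hypothetical and are not the gfs2-utils code; the guard in sketch_walk is one possible defensive check, not the fix attached to this bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CORRUPT (-2)

/* Hypothetical, simplified directory entry: only the two fields
 * that matter to the walk are modelled. */
struct sketch_dirent {
	uint64_t formal_ino; /* 0 marks an unused slot */
	uint16_t rec_len;    /* bytes from this entry to the next one */
};

/* Move *off to the next entry. Returns 0 on success, -1 at end of block.
 * With rec_len == 0 this returns 0 without moving, as in the trace. */
int sketch_next(const unsigned char *blk, size_t blk_size, size_t *off)
{
	struct sketch_dirent d;

	memcpy(&d, blk + *off, sizeof(d));
	if (*off + d.rec_len >= blk_size)
		return -1;
	*off += d.rec_len;
	return 0;
}

/* Walk the block and count the steps taken. Unlike the loop in the
 * trace, this refuses to continue when an entry cannot advance. */
int sketch_walk(const unsigned char *blk, size_t blk_size)
{
	size_t off = 0;
	int steps = 0;

	for (;;) {
		struct sketch_dirent d;

		if (off + sizeof(d) > blk_size)
			return SKETCH_CORRUPT; /* entry would run past the block */
		memcpy(&d, blk + off, sizeof(d));
		if (d.rec_len == 0)
			return SKETCH_CORRUPT; /* would loop forever */
		if (sketch_next(blk, blk_size, &off) != 0)
			break;
		steps++;
	}
	return steps;
}
```

On an all-zero block, sketch_next reports success and leaves the offset unchanged, which matches the repeating lines 1257 to 1275 in the session; sketch_walk returns SKETCH_CORRUPT instead of spinning.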


Version-Release number of selected component (if applicable):
gfs2-utils-0.1.51-1.el5

How reproducible:
Unknown, this is the first time I've hit this.

Steps to Reproduce:
1. lvcreate -L 1T -n brawl0 brawl
2. mkfs.gfs2 -O -b 1024 -j 6 -p lock_dlm -t tankmorph:brawl0 /dev/brawl/brawl0

  
Actual results:
I'll attach the 12M core file I generated.

Expected results:
mkfs.gfs2 should complete

Additional info:

Comment 1 Nate Straz 2008-12-09 13:50:03 UTC
Created attachment 326312
gzipped i386 core dump for mkfs.gfs2 from gfs2-utils-0.1.51-1.el5

Comment 2 Nate Straz 2008-12-09 19:55:32 UTC
While trying to verify that things are working correctly, I was able to make the file system with only five journals instead of six.  But when I mount the gfs2meta file system, I cannot do an "ls -l" inside of per_node.  It does work in the root of the file system.

I also ran gfs2_jadd on the file system; unmounting after that command took a long time.

Comment 3 Nate Straz 2008-12-09 19:59:15 UTC
After a journal add, the quota_change4 file disappeared.

[root@tank-01 ~]# gfs2_jadd -j 1 /mnt/brawl
Filesystem:            /mnt/brawl
Old Journals           5
New Journals           6
[root@tank-01 ~]# ls /mnt/meta/per_node
inum_range0  inum_range4    quota_change2   statfs_change1  statfs_change5
inum_range1  inum_range5    quota_change3   statfs_change2
inum_range2  quota_change0  quota_change5   statfs_change3
inum_range3  quota_change1  statfs_change0  statfs_change4

Comment 4 Nate Straz 2008-12-09 22:02:21 UTC
I dug through old logs to see when I last ran this test case: it was on Nov 19 with gfs2-utils-0.1.49-1.el5.

I reinstalled gfs2-utils-0.1.49-1.el5 and 0.1.50-1.el5 and the test case passed with those two versions.

NOTE: The 1k block size makes mkfs.gfs2 take a lot longer to create the file system.  On a 1TB block device, a regular mkfs.gfs2 takes 0:51, while mkfs.gfs2 -b 1024 takes 13:41.

Comment 5 Robert Peterson 2008-12-09 22:58:22 UTC
Created attachment 326420
patch to fix the problem

This patch fixes the problem.

This turned out to be a regression introduced by the fix for
bug #471618.  The new function compute_heightsize needs to work
slightly differently when dealing with jdata: it must use
sdp->sd_jbsize rather than the normal block size.  This subtle
difference broke the code that converts a gfs2 directory from
linear to exhash when the directory is jdata, as is the case
for the per_node directory.
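
To illustrate the difference described above, here is a minimal sketch of a height-size table computed once with the full block size and once with a reduced per-block payload standing in for sdp->sd_jbsize. It is not the attached patch. The function name, the constants, and the header sizes (232 and 24 bytes) are assumptions for illustration; check them against the real headers before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes, for illustration only. */
#define SKETCH_DINODE_SIZE      232u /* assumed dinode header size */
#define SKETCH_META_HEADER_SIZE 24u  /* assumed per-block metadata header */
#define SKETCH_MAX_HEIGHT       10

/* Fill heightsize[h] with the bytes addressable by a tree of height h.
 * data_bsize is the payload per data block: the full block size for
 * ordinary files, or the block size minus the metadata header for
 * jdata (the role sd_jbsize plays in the comment above).
 * Returns the number of heights computed. */
int sketch_compute_heightsize(uint32_t bsize, uint32_t data_bsize,
			      uint64_t *heightsize)
{
	uint32_t diptrs = (bsize - SKETCH_DINODE_SIZE) / sizeof(uint64_t);
	uint32_t inptrs = (bsize - SKETCH_META_HEADER_SIZE) / sizeof(uint64_t);
	int h;

	heightsize[0] = bsize - SKETCH_DINODE_SIZE;     /* stuffed in the dinode */
	heightsize[1] = (uint64_t)data_bsize * diptrs;  /* direct pointers */
	for (h = 2; h < SKETCH_MAX_HEIGHT; h++) {
		if (heightsize[h - 1] > UINT64_MAX / inptrs)
			break; /* next level would overflow 64 bits */
		heightsize[h] = heightsize[h - 1] * inptrs;
	}
	return h;
}
```

With a 1024-byte block under these assumptions, height 1 holds 101376 bytes for an ordinary file but only 99000 bytes for jdata, so any code that decides when a jdata directory needs another tree level must use the jdata table.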

Comment 6 Robert Peterson 2008-12-09 23:11:19 UTC
Since this is a regression caused by the fix for bug #471618, I'm
closing this bug as a DUPLICATE of that one.

*** This bug has been marked as a duplicate of bug 471618 ***

Comment 7 Fedora Update System 2009-01-24 02:36:00 UTC
gfs2-utils-2.03.11-1.fc9, cman-2.03.11-1.fc9, and rgmanager-2.03.11-1.fc9 have been pushed to the Fedora 9 stable repository.  If problems persist, please make a note of them in this bug report.