Bug 698298

Summary:

gfs2_edit savemeta/restoremeta problem with different block size

Product:

Red Hat Enterprise Linux 5

Reporter:

Martin Juricek <mjuricek>

Component:

gfs2-utils

Assignee:

Andrew Price <anprice>

Status:

CLOSED ERRATA

QA Contact:

Cluster QE <mspqa-list>

Severity:

unspecified

Priority:

unspecified

Version:

5.6

CC:

adas, anprice, edamato, martinez, rpeterso

Keywords:

Regression

Target Release:

5.7

Hardware:

Unspecified

OS:

Linux

Fixed In Version:

gfs2-utils-0.1.62-31.el5

Doc Type:

Bug Fix

Doc Text:

Indirect blocks were being released prematurely from a gfs2_edit savemeta queue. As a result, some metadata was not saved, and the incomplete metadata set did not pass fsck when restored with gfs2_edit restoremeta. The fix leaves the required blocks on the queue so that they are saved with the rest of the metadata. Saving the metadata of a consistent file system now produces a complete metadata set that passes fsck when restored.

Last Closed:

2011-07-21 11:10:20 UTC

Attachments:

Description	Flags
Commit d8c8af9612e2e00b8d8739a3d72ca9669a593878, for reference	none
Reproducer script	none
Patch to leave indirect blocks queued for saving	none

Description Martin Juricek 2011-04-20 15:26:47 UTC

Description of problem:
gfs2_edit fails when metadata is saved with savemeta from a source volume and restored with restoremeta to a target volume that has a different block size. The restore completes without error, but fsck.gfs2 then reports a corrupted journal.


Version-Release number of selected component (if applicable):
gfs2-utils-0.1.62-30.el5

How reproducible:
100%

Steps to Reproduce:

[root@a1 ~]# rpm -qa | grep gfs2
gfs2-utils-0.1.62-30.el5
[root@a1 ~]# mkfs.gfs2 -p lock_dlm -t a_cluster:savemeta0 -b 1024 -j 3 -J 32 /dev/savemeta/savemeta0 -O
Device:                    /dev/savemeta/savemeta0
Blocksize:                 1024
Device Size                101.68 GB (106622976 blocks)
Filesystem Size:           101.68 GB (106622974 blocks)
Journals:                  3
Resource Groups:           407
Locking Protocol:          "lock_dlm"
Lock Table:                "a_cluster:savemeta0"
UUID:                      A9099A2A-7920-31CB-7783-5C755C4E5054

[root@a1 ~]# mkfs.gfs2 -p lock_dlm -t a_cluster:savemeta1 -j 3 -J 32 /dev/savemeta/savemeta1 -O
Device:                    /dev/savemeta/savemeta1
Blocksize:                 4096
Device Size                101.68 GB (26655744 blocks)
Filesystem Size:           101.68 GB (26655744 blocks)
Journals:                  3
Resource Groups:           407
Locking Protocol:          "lock_dlm"
Lock Table:                "a_cluster:savemeta1"
UUID:                      18A9A6D9-1934-F25B-353B-FCA20DEDBE30

[root@a1 ~]# gfs2_edit savemeta /dev/savemeta/savemeta0 /tmp/meta.000
There are 106622976 blocks of 1024 bytes in the destination device.
Reading resource groups...Done. File system size: 101.700G

106622976 metadata blocks (100%) processed, 

Metadata saved to file /tmp/meta.000.
[root@a1 ~]# gfs2_edit restoremeta /tmp/meta.000 /dev/savemeta/savemeta1
File system size: 106361069 (0x656f0ed) blocks, aka 101.444GB
There are 106622976 blocks of 1024 bytes in the destination device.

106622976 metadata blocks (100%) processed, 
File /tmp/meta.000 restore successful.
[root@a1 ~]# fsck.gfs2 -vn /dev/savemeta/savemeta1
Initializing fsck
Initializing lists...
jid=0: Looking at journal...
Journal #1 ("journal0") is corrupt.
Not fixing it due to the -n option.
jid=0: Failed
jid=0: journal not cleared.
[root@a1 ~]# 

  
Actual results:
Journal is corrupted

Expected results:
Journal is OK

Additional info:

Comment 1 Andrew Price 2011-04-27 18:43:36 UTC

I've reproduced this using all RHEL5 gfs2-utils packages after version 0.1.62-20.el5. It seems to be a regression which appeared in the large patchset relating to bug #455300.

The good news is that the regression has been fixed in upstream gfs2-utils master (commit d8c8af9612e2e00b8d8739a3d72ca9669a593878). The less good news is that the commit doesn't apply cleanly to the RHEL5 package, so it's not a straightforward cherry-pick.

Comment 2 Robert Peterson 2011-05-05 14:26:09 UTC

Reassigning to Andy, since he's done the initial research.
Thanks, Andy.

Comment 4 Andrew Price 2011-05-05 16:01:01 UTC

Created attachment 497136 [details]
Commit d8c8af9612e2e00b8d8739a3d72ca9669a593878, for reference

Applying this part of commit d8c8af9612e2e00b8d8739a3d72ca9669a593878 to savemeta.c seems to (minimally) resolve the issue:

@@ -324,7 +324,9 @@ static void save_indirect_blocks(int out_fd, osi_list_t *cur_list,
                if (height != hgt) { /* If not at max height */
                        nbh = bread(&sbd, indir_block);
                        osi_list_add_prev(&nbh->b_altlist, cur_list);
-                       brelse(nbh);
+                       /* The buffer_head needs to be queued ahead, so
+                          don't release it!
+                          brelse(nbh);*/
                }
        } /* for all data on the indirect block */
 }

However, my familiarity with the gfs2_edit code is still a little sketchy, so I'd like to spend more time on it to make sure I understand what's going on and to check whether other hunks from that diff are also needed.

Comment 5 Andrew Price 2011-05-07 16:31:39 UTC

Created attachment 497550 [details]
Reproducer script

Comment 6 Andrew Price 2011-05-09 18:14:48 UTC

Created attachment 497883 [details]
Patch to leave indirect blocks queued for saving

save_indirect_blocks() currently releases buffer_heads which should be left queued to save off later. This patch leaves them on the queue to avoid corruption.

After applying this patch, the reproducer script exits successfully.

Comment 8 Andrew Price 2011-05-09 22:16:33 UTC

Have pushed the patch into the RHEL57 branch of the cluster git tree. Setting status to POST until it's rolled into a package.

Comment 9 Andrew Price 2011-05-26 13:30:14 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Indirect blocks were being released prematurely from a gfs2_edit savemeta queue. As a result, some metadata was not saved, and the incomplete metadata set did not pass fsck when restored with gfs2_edit restoremeta. The fix leaves the required blocks on the queue so that they are saved with the rest of the metadata. Saving the metadata of a consistent file system now produces a complete metadata set that passes fsck when restored.

Comment 10 Abhijith Das 2011-06-02 16:02:04 UTC

http://download.devel.redhat.com/brewroot/packages/gfs2-utils/0.1.62/31.el5/

Comment 11 Martin Juricek 2011-06-03 07:04:23 UTC

Verified with version gfs2-utils-0.1.62-31.el5, kernel 2.6.18-262.el5


[root@a1 ~]# rpm -q gfs2-utils
gfs2-utils-0.1.62-31.el5
[root@a1 ~]# mkfs.gfs2 -p lock_dlm -j 3 -J 32 -b 1024 -t a_cluster:test1 /dev/vg1/lv1 -O
Device:                    /dev/vg1/lv1
Blocksize:                 1024
Device Size                100.00 GB (104857600 blocks)
Filesystem Size:           100.00 GB (104857599 blocks)
Journals:                  3
Resource Groups:           400
Locking Protocol:          "lock_dlm"
Lock Table:                "a_cluster:test1"
UUID:                      AA83D03F-84F0-0B3F-D3FE-C2B1EBEC9BB9

[root@a1 ~]# mkfs.gfs2 -p lock_dlm -j 3 -J 32 -b 4096 -t a_cluster:test2 /dev/vg1/lv2 -O
Device:                    /dev/vg1/lv2
Blocksize:                 4096
Device Size                100.00 GB (26214400 blocks)
Filesystem Size:           100.00 GB (26214398 blocks)
Journals:                  3
Resource Groups:           400
Locking Protocol:          "lock_dlm"
Lock Table:                "a_cluster:test2"
UUID:                      552E690C-D3A4-9A77-7CEF-CBEC006183A5

[root@a1 ~]# gfs2_edit savemeta /dev/vg1/lv1 /tmp/meta.1
There are 104857600 blocks of 1024 bytes in the destination device.
Reading resource groups...Done. File system size: 100.0G

104857600 metadata blocks (100%) processed, 

Metadata saved to file /tmp/meta.1.
[root@a1 ~]# gfs2_edit restoremeta /tmp/meta.1 /dev/vg1/lv2
File system size: 104595522 (0x63c0042) blocks, aka 99.768GB
There are 104857600 blocks of 1024 bytes in the destination device.

104857600 metadata blocks (100%) processed, 
File /tmp/meta.1 restore successful.

[root@a1 ~]# fsck.gfs2 -n /dev/vg1/lv2
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Pass1 complete      
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete      
Starting pass3
Pass3 complete      
Starting pass4
Pass4 complete      
Starting pass5
Pass5 complete      
gfs2_fsck complete

Comment 12 errata-xmlrpc 2011-07-21 11:10:20 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1042.html