Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1834456

Summary: fsck.gfs2 can not fix partially added journals from a failed gfs2_jadd
Product: Red Hat Enterprise Linux 8
Reporter: Nate Straz <nstraz>
Component: gfs2-utils
Assignee: gfs2-maint
Status: CLOSED WONTFIX
QA Contact: cluster-qe <cluster-qe>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 8.3
CC: adas, cluster-maint, cluster-qe, gfs2-maint, rhandlin
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1833141
Environment:
Last Closed: 2021-11-11 07:27:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1833141, 1837640    
Bug Blocks:    

Description Nate Straz 2020-05-11 18:17:42 UTC
+++ This bug was initially created as a clone of Bug #1833141 +++

Description of problem:

If we run out of space on the file system while gfs2_jadd is trying to create new journals, it exits, leaving the partially created journal files behind and the gfs2 meta filesystem still mounted.


Version-Release number of selected component (if applicable):
gfs2-utils-3.2.0-7.el8.x86_64

How reproducible:
easily

Steps to Reproduce:
1. gfs2_jadd too many large journals on a small file system

Actual results:
[root@host-028 ~]# lvcreate -L 1g -n brawl0 brawl
WARNING: gfs2 signature detected on /dev/brawl/brawl0 at offset 65536. Wipe it? [y/n]: y
  Wiping gfs2 signature on /dev/brawl/brawl0.
  Logical volume "brawl0" created.
[root@host-028 ~]# mkfs.gfs2 -p lock_nolock -j 1 -J 128 -O /dev/brawl/brawl0
/dev/brawl/brawl0 is a symbolic link to /dev/dm-2
This will destroy any data on /dev/dm-2
Discarding device contents (may take a while on large devices): Done
Adding journals: Done
Building resource groups: Done
Creating quota file: Done
Writing superblock and syncing: Done
Device:                    /dev/brawl/brawl0
Block size:                4096
Device size:               1.00 GB (262144 blocks)
Filesystem size:           1.00 GB (262142 blocks)
Journals:                  1
Journal size:              128MB
Resource groups:           5
Locking protocol:          "lock_nolock"
Lock table:                ""
UUID:                      914a1553-6647-4df2-8721-3ffc30f56d20
[root@host-028 ~]# mount /dev/brawl/brawl0 /mnt/brawl
[root@host-028 ~]# df /mnt/brawl
Filesystem               1K-blocks   Used Available Use% Mounted on
/dev/mapper/brawl-brawl0   1048400 132404    915996  13% /mnt/brawl
[root@host-028 ~]# gfs2_jadd -j 10 -J 128 /mnt/brawl
add_j: No space left on device
[root@host-028 ~]# df
Filesystem                      1K-blocks    Used Available Use% Mounted on
devtmpfs                           965440       0    965440   0% /dev
tmpfs                              982048   51696    930352   6% /dev/shm
tmpfs                              982048   16892    965156   2% /run
tmpfs                              982048       0    982048   0% /sys/fs/cgroup
/dev/mapper/rhel_host--028-root   6486016 4849696   1636320  75% /
/dev/vda1                         1038336  320660    717676  31% /boot
tmpfs                              196352       0    196352   0% /run/user/0
/dev/mapper/brawl-brawl0          1048400 1048272       128 100% /mnt/brawl
[root@host-028 ~]# mount
...
/dev/mapper/brawl-brawl0 on /mnt/brawl type gfs2 (rw,relatime,seclabel,localflocks)
/mnt/brawl on /tmp/.gfs2meta.eSuC5g type gfs2 (rw,relatime,seclabel,meta,localflocks)
[root@host-028 ~]# ls /tmp/.gfs2meta.eSuC5g/ -l
total 120516
-rw-------. 1 root root         8 May  7 16:17 inum
drwx------. 2 root root      3864 May  7 16:19 jindex
-rw-------. 1 root root 123129856 May  7 16:19 new_inode
drwx------. 2 root root      3864 May  7 16:19 per_node
-rw-------. 1 root root       176 May  7 16:17 quota
-rw-------. 1 root root       480 May  7 16:17 rindex
-rw-------. 1 root root        24 May  7 16:17 statfs
[root@host-028 ~]# ls /tmp/.gfs2meta.eSuC5g/jindex -l
total 919376
-rw-------. 1 root root 134217728 May  7 16:17 journal0
-rw-------. 1 root root 134217728 May  7 16:18 journal1
-rw-------. 1 root root 134217728 May  7 16:18 journal2
-rw-------. 1 root root 134217728 May  7 16:19 journal3
-rw-------. 1 root root 134217728 May  7 16:19 journal4
-rw-------. 1 root root 134217728 May  7 16:19 journal5
-rw-------. 1 root root 134217728 May  7 16:19 journal6


Expected results:
gfs2_jadd should unmount the gfs2 meta filesystem and clean up any journals it was unable to create completely (or not create them in the first place).
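Until gfs2_jadd does this itself, the leftover meta mount can be found and unmounted by hand. A minimal sketch (`list_meta_mounts` is a hypothetical helper; the awk field positions assume `mount` output formatted like the transcript above):

```shell
# Hypothetical helper: read `mount`-style lines on stdin and print the
# mount points of gfs2 meta mounts (type gfs2 with the "meta" option),
# e.g. the leftover /tmp/.gfs2meta.* mount, so they can be unmounted.
list_meta_mounts() {
  awk '$5 == "gfs2" && $6 ~ /(^|[(,])meta([,)]|$)/ { print $3 }'
}

# Usage sketch (unmounting requires root):
#   mount | list_meta_mounts | while read -r mp; do umount "$mp"; done
```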

Additional info:

--- Additional comment from Nate Straz on 2020-05-07 21:32:06 UTC ---

fsck.gfs2 cannot fix these file systems either.

[root@host-028 ~]# fsck.gfs2 -y /dev/brawl/brawl0
Initializing fsck
Validating resource group index.
Level 1 resource group check: Checking if all rgrp and rindex values are good.
(level 1 passed)
File system journal "journal7" is missing or corrupt: pass1 will try to recreate it.

Journal recovery complete.
Starting pass1
Invalid or missing journal7 system inode (is 'free', should be 'inode').
Rebuilding system file "journal7"
get_file_buf
[root@host-028 ~]# echo $?
1

--- Additional comment from Abhijith Das on 2020-05-11 03:20:17 UTC ---

If gfs2_jadd runs out of disk space while adding journals, it does
not exit gracefully. It partially does its job and bails out when
it hits -ENOSPC. This leaves the metafs mounted and most likely a
corrupted filesystem that even fsck.gfs2 can't fix.

This patch adds a pre-check that ensures that the journals requested
will fit in the available space before proceeding. Note that this is
not foolproof because gfs2_jadd operates on a mounted filesystem.
While it is required that the filesystem be idle (and mounted on only
one node) while gfs2_jadd is being run, there is nothing stopping a
user from having some I/O process competing with gfs2_jadd for disk
blocks and consequently crashing it.

This patch also does some cleanup of data structures when gfs2_jadd
exits due to errors.
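The pre-check described above could look roughly like this (a sketch for illustration, not the actual gfs2_jadd code; the function name and the comparison against `df`-style free KiB are assumptions):

```shell
# Sketch of the described pre-check (hypothetical helper): refuse to
# proceed unless the requested journals fit in the free space currently
# reported for the mounted filesystem.
journals_fit() {
  local free_kib=$1 njournals=$2 jsize_mib=$3
  # Total space the new journals need, in KiB (1 MiB = 1024 KiB).
  local need_kib=$(( njournals * jsize_mib * 1024 ))
  [ "$free_kib" -ge "$need_kib" ]
}

# With the numbers from the transcript above: 10 journals of 128 MiB need
# 1310720 KiB, but only 915996 KiB were free, so the check would fail
# before any journal is created.
```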

Comment 3 RHEL Program Management 2021-11-11 07:27:09 UTC
After evaluating this issue, we have no plans to address it further or fix it in an upcoming release, so it is being closed. If plans change and this issue will be fixed in an upcoming release, the bug can be reopened.