Bug 1833141
| Summary: | gfs2_jadd doesn't clean up if it runs out of space | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Nate Straz <nstraz> | |
| Component: | gfs2-utils | Assignee: | Abhijith Das <adas> | |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> | |
| Severity: | unspecified | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 8.3 | CC: | adas, cluster-maint, gfs2-maint, rhandlin | |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ | |
| Target Release: | 8.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | gfs2-utils-3.2.0-9.el8 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1834456 1837640 (view as bug list) | Environment: | ||
| Last Closed: | 2020-11-04 02:01:07 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1834456, 1837640 | |||
| Attachments: | ||||
Description
Nate Straz
2020-05-07 21:31:02 UTC
fsck.gfs2 can not fix these file systems either.

[root@host-028 ~]# fsck.gfs2 -y /dev/brawl/brawl0
Initializing fsck
Validating resource group index.
Level 1 resource group check: Checking if all rgrp and rindex values are good.
(level 1 passed)
File system journal "journal7" is missing or corrupt: pass1 will try to recreate it.
Journal recovery complete.
Starting pass1
Invalid or missing journal7 system inode (is 'free', should be 'inode').
Rebuilding system file "journal7"
get_file_buf
[root@host-028 ~]# echo $?
1

Created attachment 1687127 [details]
gfs2_jadd-out-of-space-issues-and-other-fixes
If gfs2_jadd runs out of disk space while adding journals, it does
not exit gracefully. It partially does its job and bails out when
it hits -ENOSPC. This leaves the metafs mounted and most likely a
corrupted filesystem that even fsck.gfs2 can't fix.
This patch adds a pre-check that ensures that the journals requested
will fit in the available space before proceeding. Note that this is
not foolproof because gfs2_jadd operates on a mounted filesystem.
While it is required that the filesystem be idle (and mounted on only
one node) while gfs2_jadd is being run, there is nothing stopping a
user from having some I/O process competing with gfs2_jadd for disk
blocks and consequently crashing it.
This patch also does some cleanup of data structures when gfs2_jadd
exits due to errors.
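
For illustration, here is a minimal sketch of the kind of pre-check the patch description refers to. It is not the code from the attachment: the function name journals_fit and its parameters are hypothetical, and how gfs2_jadd actually derives the per-journal block count and the available space is not shown in this report.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not taken from main_jadd.c.
 * Returns 0 if njournals journals of blks_per_journal blocks each fit
 * into avail_blks, and -1 otherwise. The total requirement is stored
 * in *required either way.
 */
int journals_fit(uint64_t njournals, uint64_t blks_per_journal,
                 uint64_t avail_blks, uint64_t *required)
{
	/* Guard against overflow in the multiplication below. */
	if (blks_per_journal != 0 &&
	    njournals > UINT64_MAX / blks_per_journal) {
		*required = UINT64_MAX;
		return -1;
	}

	*required = njournals * blks_per_journal;
	if (*required > avail_blks) {
		fprintf(stderr,
		        "Required space  : %llu blks (%llu blks per journal)\n"
		        "Available space : %llu blks\n",
		        (unsigned long long)*required,
		        (unsigned long long)blks_per_journal,
		        (unsigned long long)avail_blks);
		return -1;
	}
	return 0;
}
```

With the figures from the verification run later in this report (5 journals, 33093 blocks per journal, 100745 blocks available), the check fails because 5 * 33093 = 165465 blocks are required. As the patch description notes, a check like this can still be defeated by other I/O on the mounted filesystem between the check and the writes.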
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=28472231

Here's a scratch build with the above patch.

(In reply to Abhijith Das from comment #2)
> Created attachment 1687127 [details]
> gfs2_jadd-out-of-space-issues-and-other-fixes
> Note that this is
> not foolproof because gfs2_jadd operates on a mounted filesystem.

I think it would be better to add error handling to the write() and close() calls, as currently write() errors cause an exit() without cleanup, and the close()es aren't checked at all. It probably needs fsync()s too, to make sure all error cases get flagged up. At the least it should unmount the metafs before bailing out in all cases.

Nate, could you open a separate bz for the fsck.gfs2 failure?

Created attachment 1687379 [details]
Revised patch - moved error handling fixes to another patch
Created attachment 1687380 [details]
First bash at error handling fixes
Andy, when you get a chance, could you go over this (and the previous) patch and let me know if it looks ok to you?
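
As a rough illustration of the error handling suggested in the review above (checking write(), adding fsync(), and checking close()), a sketch might look like the following. The helper names write_all and sync_and_close are hypothetical and are not taken from the attached patches; only the POSIX calls themselves are real.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write exactly len bytes to fd, retrying on
 * short writes and EINTR. Returns 0 on success, -1 on failure with
 * errno set by write().
 */
int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;	/* e.g. ENOSPC: let the caller clean up */
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical helper: flush fd to disk and close it. The descriptor
 * is closed even if fsync() fails, and a failure from either call is
 * reported as -1.
 */
int sync_and_close(int fd)
{
	int ret = 0;

	if (fsync(fd) != 0)
		ret = -1;
	if (close(fd) != 0)
		ret = -1;
	return ret;
}
```

Returning an error to the caller instead of calling exit() at the point of failure is what leaves room for cleanup such as unmounting the metafs.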
(In reply to Abhijith Das from comment #6)
> Created attachment 1687380 [details]
> First bash at error handling fixes

Just some minor style/maintainability nits - using "close" as a label can make it difficult to search for close() calls later, and returning errno where the caller isn't expecting an errno value can be confusing... returning -1 will probably be safer in those cases. Other than that it looks like a good improvement, thanks Abhi.

Created attachment 1687481 [details]
Revised error handling cleanup patch

This version has Andy's suggested fixes.

Here's a build with this patch and the previous bugfix patch: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=28499297

I've pushed the patches upstream (commits deb620675, bb22cafab) with a slight tweak to the second patch to clear up these warnings:
main_jadd.c:518:12: warning: unused variable ‘blk_addr’ [-Wunused-variable]
518 | uint64_t blk_addr = 0;
| ^~~~~~~~
main_jadd.c:618:19: warning: too many arguments for format [-Wformat-extra-args]
618 | fprintf(stderr, "%s: not a mounted gfs2 file system\n",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Let me know when you're happy for them to be added to the RHEL package (move the bug to POST) and please make any further fixes on top of the master branch. Thanks!
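
The two style points raised in the review (avoid "close" as a label name, and return -1 rather than a raw errno value) can be sketched roughly as below. This is an illustration only: write_marker is a hypothetical function, not code from main_jadd.c, and the label name out_close is an assumption about what a searchable name might look like.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical example: write text to path and flush it to disk.
 * Returns 0 on success and -1 on any failure. It never returns a raw
 * errno value, so callers only need to test for non-zero.
 */
int write_marker(const char *path, const char *text)
{
	int ret = -1;
	size_t len = strlen(text);
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd < 0) {
		perror(path);
		return -1;
	}
	/* A short write is treated as a failure in this sketch. */
	if (write(fd, text, len) != (ssize_t)len) {
		perror("write");
		goto out_close;
	}
	if (fsync(fd) != 0) {
		perror("fsync");
		goto out_close;
	}
	ret = 0;

out_close:	/* not named "close", so searching for close() calls stays easy */
	if (close(fd) != 0) {
		perror("close");
		ret = -1;
	}
	return ret;
}
```

The same single-exit pattern could be extended with a further label for unmounting the metafs, so that every error path passes through the cleanup.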
Covscan found some minor issues in gfs2_jadd so the RHEL8 gfs2-utils build didn't pass gating. I've sent an addendum patch upstream to fix it up and I'll integrate it once it's pushed.

(In reply to Andrew Price from comment #10)
> Covscan found some minor issues in gfs2_jadd so the RHEL8 gfs2-utils build
> didn't pass gating.

Correction: the covscan warnings didn't prevent the build from passing gating. Since the issues it found are minor and not actual bugs I'm pushing this along.

https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1221688

Verified with gfs2-utils-3.2.0-9.el8.x86_64

SCENARIO - [jadd_no_space]
Creating 1G LV jadded on host-140
Creating file system on /dev/fsck/jadded with options '-p lock_nolock -j 1 -J 128' on host-140
It appears to contain an existing filesystem (gfs2)
/dev/fsck/jadded is a symbolic link to /dev/dm-2
This will destroy any data on /dev/dm-2
Discarding device contents (may take a while on large devices): Done
Adding journals: Done
Building resource groups: Done
Creating quota file: Done
Writing superblock and syncing: Done
Device: /dev/fsck/jadded
Block size: 4096
Device size: 1.00 GB (262144 blocks)
Filesystem size: 1.00 GB (262142 blocks)
Journals: 1
Journal size: 128MB
Resource groups: 5
Locking protocol: "lock_nolock"
Lock table: ""
UUID: 10b06c03-8a46-42d5-ab3c-f25be8a8ecb8
Mounting gfs2 /dev/fsck/jadded on host-140 with opts ''
Filling some space
Try to add more journals than there is space
Failed to add journals: No space left on device
Insufficient space on the device to add 5 128MB journals (1MB QC size)
Required space : 165465 blks (33093 blks per journal)
Available space : 100745 blks
Good, no gfs2meta mounts found
Unmounting /mnt/fsck on host-140
Removing LV jadded on host-140

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (gfs2-utils bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4550