Bug 1698858 - mkfs.gfs2 can max out iovec limit with wide enough alignment holes
Summary: mkfs.gfs2 can max out iovec limit with wide enough alignment holes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: gfs2-utils
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.0
Assignee: Andrew Price
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-11 11:40 UTC by Andrew Price
Modified: 2020-11-14 12:02 UTC
CC List: 6 users

Fixed In Version: gfs2-utils-3.2.0-5.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-05 22:17:06 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2019:3567  0        None      None    None     2019-11-05 22:17:07 UTC

Description Andrew Price 2019-04-11 11:40:08 UTC
Reported by Barry Marson in a SAS test environment:

NAME            ALIGNMENT MIN-IO  OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE   RA WSAME
vg_sas-lsaswork         0 131072 6291456     512     512    1           128 8192   32M

[root@bills _scripts]# mkfs -t gfs2 -O -j 8 -r 256 -t afc_cluster:work /dev/mapper/vg_sas-lsaswork
It appears to contain an existing filesystem (xfs)
This will destroy any data on /dev/dm-60
Discarding device contents (may take a while on large devices): Done
Adding journals: [1/8]Zeroing write failed at block 17
Failed to create journals

With this topology we're getting EINVAL from pwritev() because the iovec array required to zero the hole between the superblock and the first rgrp exceeds the IOV_MAX limit. The write should either be chunked into batches of at most IOV_MAX iovecs or done using a larger zeroed buffer so that the vector can be smaller.
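
A minimal sketch of the first approach mentioned above (batching into IOV_MAX-sized pwritev() calls); this is not the patch that was merged upstream, and the helper name pwritev_all() is illustrative. For scale, assuming the default 4096-byte block size and one iovec per block, a 6291456-byte alignment hole would need 1536 entries, beyond the usual Linux IOV_MAX of 1024.

/* pwritev_all() is a hypothetical helper, not the actual gfs2-utils fix. */
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Write the whole iovec array at 'offset', issuing pwritev() in batches of
 * at most IOV_MAX entries and advancing past short writes.
 * Returns 0 on success, -1 on error with errno set.
 */
static int pwritev_all(int fd, struct iovec *iov, int iovcnt, off_t offset)
{
    while (iovcnt > 0) {
        int batch = iovcnt > IOV_MAX ? IOV_MAX : iovcnt;
        ssize_t n = pwritev(fd, iov, batch, offset);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        offset += n;

        /* Drop iovec entries that were written completely... */
        while (iovcnt > 0 && (size_t)n >= iov->iov_len) {
            n -= (ssize_t)iov->iov_len;
            iov++;
            iovcnt--;
        }
        /* ...and trim the one that was only partially written. */
        if (iovcnt > 0 && n > 0) {
            iov->iov_base = (char *)iov->iov_base + n;
            iov->iov_len -= (size_t)n;
        }
    }
    return 0;
}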

Comment 1 Andrew Price 2019-04-11 11:53:21 UTC
Impact:
- Can cause mkfs.gfs2 to fail with a vague error message on some striped devices
- Workaround available: mkfs.gfs2 -o align=0, but this could have a (small, workload-dependent) performance impact

Simple test:

$ mkfs.gfs2 -O -o test_topology=0:512:131072:6291456:512 -p lock_nolock testvol 
This will destroy any data on testvol
Adding journals: [1/1]Zeroing write failed at block 17
Failed to create journals

(Will be added to the upstream test suite)

Comment 2 Andrew Price 2019-04-25 12:48:45 UTC
Patch posted upstream: https://www.redhat.com/archives/cluster-devel/2019-April/msg00040.html

Comment 5 Justin Payne 2019-10-01 23:59:00 UTC
Verified in gfs2-utils-3.2.0-5:

[root@p8-224-node4 ~]# rpm -q gfs2-utils
gfs2-utils-3.2.0-5.el8.ppc64le
[root@p8-224-node4 ~]# mkfs.gfs2 -O -o test_topology=0:512:131072:6291456:512 -p lock_nolock -t test:test /dev/sdb1
This will destroy any data on /dev/sdb1
Discarding device contents (may take a while on large devices): Done
Adding journals: Done 
Building resource groups: Done     
Creating quota file: Done
Writing superblock and syncing: Done
Device:                    /dev/sdb1
Block size:                4096
Device size:               99.97 GB (26206204 blocks)
Filesystem size:           99.97 GB (26206176 blocks)
Journals:                  1
Journal size:              128MB
Resource groups:           398
Locking protocol:          "lock_nolock"
Lock table:                "test:test"
UUID:                      3cce2766-8549-4598-88c9-6562331b7a9b

Comment 7 errata-xmlrpc 2019-11-05 22:17:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3567

