Bug 1149912
Summary: | ENOSPC after expanding LVG and XFS file system | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Harold Miller <hamiller>
Component: | xfsprogs | Assignee: | Eric Sandeen <esandeen>
Status: | CLOSED DUPLICATE | QA Contact: | Filesystem QE <fs-qe>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 7.0 | CC: | lvaz, pasteur, skippy
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2014-10-08 22:39:41 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Harold Miller
2014-10-06 22:04:04 UTC
Actual results are available in the case; I can copy them over if need be.

This is probably a dup of Bug 1115201 - [xfs] can't create inodes in newly added space after xfs_growfs, which could have a z-stream request if needed. Does unmounting & re-mounting the fs solve it?

In the other bug it was also mentioned that "mount -o remount,inode64" might resolve it; I'd need to verify that.

As the original reporter of this issue (by way of an access.redhat.com support issue), I can confirm that inode64 *does not* resolve the issue. inode64 deals with file systems greater than 1TB. The file systems that triggered this error for me started out at 1GB and were grown to 4GB. I'm at a loss as to how inode exhaustion or 64-bit inode allocation could be the culprit.

Did you test it? The workaround succeeds here:

    ...
    cp: cannot create directory `mnt/brick/10/': No space left on device
    cp: cannot create directory `mnt/brick/10': No space left on device
    dd: opening `mnt/brick/10.dat': No space left on device
    [root@bp-05 bz1149912]# df -h mnt/
    Filesystem                     Size  Used Avail Use% Mounted on
    /mnt/test2/bz1149912/testfile  3.0G  2.4G  680M  78% /mnt/test2/bz1149912/mnt
    [root@bp-05 bz1149912]# touch mnt/mynewfile
    touch: cannot touch `mnt/mynewfile': No space left on device
    [root@bp-05 bz1149912]# mount -o remount,inode64 mnt
    [root@bp-05 bz1149912]# touch mnt/mynewfile
    [root@bp-05 bz1149912]#

The problem is explained in the commit which fixes things:

    commit 9de67c3ba9ea961ba420573d56479d09d33a7587
    Author: Eric Sandeen <sandeen>
    Date:   Thu Jul 24 20:51:54 2014 +1000

        xfs: allow inode allocations in post-growfs disk space

        Today, if we perform an xfs_growfs which adds allocation groups,
        mp->m_maxagi is not properly updated when the growfs is complete.
        Therefore inodes will continue to be allocated only in the AGs
        which existed prior to the growfs, and the new space won't be
        utilized.

        This is because of this path in xfs_growfs_data_private():

        xfs_growfs_data_private
            xfs_initialize_perag(mp, nagcount, &nagimax);
                if (mp->m_flags & XFS_MOUNT_32BITINODES)
                    index = xfs_set_inode32(mp);
                else
                    index = xfs_set_inode64(mp);

                if (maxagi)
                    *maxagi = index;

        where xfs_set_inode* iterates over the (old) agcount in
        mp->m_sb.sb_agblocks, which has not yet been updated in the
        growfs path. So "index" will be returned based on the old
        agcount, not the new one, and new AGs are not available for
        inode allocation.

        Fix this by explicitly passing the proper AG count (which
        xfs_initialize_perag() already has) down another level, so that
        xfs_set_inode* can make the proper decision about acceptable
        AGs for inode allocation in the potentially newly-added AGs.

        This has been broken since 3.7, when these two xfs_set_inode*
        functions were added in commit 2d2194f. Prior to that, we
        looped over "agcount" not sb_agblocks in these calculations.

        Signed-off-by: Eric Sandeen <sandeen>
        Reviewed-by: Brian Foster <bfoster>
        Signed-off-by: Dave Chinner <david>

and the reason "mount -o remount,inode64" works around the problem is that it explicitly re-sets the mp->m_maxagi variable:

    case Opt_inode64:
        mp->m_maxagi = xfs_set_inode64(mp, sbp->sb_agcount);
        break;

Yes, it's unexpected that remounting with inode64 (which is already the default in any case) on a small filesystem changes behavior, but that's why we call it "a bug." ;)

The remount trick should be a workaround for existing RHEL7.0 kernels, and the root cause of the bug will be resolved in RHEL7.1. If there's a need for the root cause fix in RHEL7.0, support can request a z-stream update.
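For readers who want to see the behavior without a gluster setup, here is a minimal loopback sketch modeled on the test above. The image path, mount point, and sizes are illustrative assumptions, not taken from the case, and on a kernel that already carries the upstream fix the post-growfs touch simply succeeds.

    # Hypothetical loopback reproducer: make a small XFS, fill it,
    # grow it, and check whether new files can be created.
    truncate -s 4G /var/tmp/xfsgrow.img
    mkfs.xfs -f -d size=1g /var/tmp/xfsgrow.img
    mkdir -p /mnt/xfsgrow
    mount -o loop /var/tmp/xfsgrow.img /mnt/xfsgrow

    # Fill the original 1G so the pre-growfs AGs are out of space.
    dd if=/dev/zero of=/mnt/xfsgrow/filler bs=1M || true

    # Grow the filesystem into the rest of the backing file.
    xfs_growfs /mnt/xfsgrow
    df -h /mnt/xfsgrow

    # On an affected RHEL7.0 kernel this may still fail with ENOSPC,
    # because inode allocation is still limited to the old AGs.
    touch /mnt/xfsgrow/newfile

    # Workaround discussed in this bug: remount with inode64 so
    # mp->m_maxagi is recomputed over all AGs.
    mount -o remount,inode64 /mnt/xfsgrow
    touch /mnt/xfsgrow/newfile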
In the meantime, I think this bug should be dup'd to Bug 1115201 - [xfs] can't create inodes in newly added space after xfs_growfs. Thanks, -Eric

I do not have access to BZ 1115201, so could not benefit from any discussion therein. Thank you very much for the thorough explanation. That clears up a lot! I can confirm that the following sequence worked:

    # lvextend ...
    # xfs_growfs ...
    # mount -o remount,inode64 ...

Using this process, I was able to extend the volume, grow the filesystem, and write without interruption. (A spelled-out version of this sequence, with hypothetical device names, is sketched at the end of this report.) What is still not entirely clear to me are the implications of remounting a filesystem that may have active I/O. Can I remount a filesystem while write operations are underway without negative consequences?

Yep, sorry the other BZ is private; I wish we didn't default so many bugs to private. The remount path in do_remount_sb() does:

    shrink_dcache_sb(sb);
    sync_filesystem(sb);

so I guess it has some side-effects, and may take some time due to the syncing, but I don't think there should be any overly negative consequences. I assume that your 1G->3G gluster brick testing is only *for* testing, and in general I don't expect that people will hit this bug too often; nonetheless it's slated to be fixed in the next release, at the latest.

*** This bug has been marked as a duplicate of bug 1115201 ***
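To close the loop on the workaround, here is the three-step sequence from the comments spelled out with hypothetical names (volume group vg_data, logical volume lv_brick, mount point /mnt/brick); these names and the extension size are placeholders for illustration and do not come from the case.

    # Grow the LV, grow XFS into the new space, then remount with
    # inode64 on affected RHEL7.0 kernels so inode allocation can use
    # the newly added AGs.
    lvextend -L +3G /dev/vg_data/lv_brick
    xfs_growfs /mnt/brick
    mount -o remount,inode64 /mnt/brick

On RHEL7.1 and later kernels that carry the upstream fix, the remount step should not be needed.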