Bug 1790740 - btrfs: file system full on a single disk
Summary: btrfs: file system full on a single disk
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 31
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-01-14 04:39 UTC by Christian Kujau
Modified: 2020-02-21 08:13 UTC
CC List: 20 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-21 08:13:56 UTC
Type: Bug
Embargoed:



Description Christian Kujau 2020-01-14 04:39:53 UTC
This is basically a copy of the following thread on linux-btrfs; Chris Murphy suggested opening a bug report for it:

 > file system full on a single disk?
 > https://lore.kernel.org/linux-btrfs/alpine.DEB.2.21.99999.375.2001131400390.21037@trent.utfs.org/

The first post from the thread:

      -----

I realize that this comes up every now and then but always for slightly 
more complicated setups, or so I thought:


============================================================
# df -h /
Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/luks-root  825G  389G     0 100% /

# btrfs filesystem show /
Label: 'root'  uuid: 75a6d93a-5a5c-48e0-a237-007b2e812477
        Total devices 1 FS bytes used 388.00GiB
        devid    1 size 824.40GiB used 395.02GiB path /dev/mapper/luks-root

# blockdev --getsize64 /dev/mapper/luks-root | awk '{print $1/1024^3, "GB"}'
824.398 GB

# btrfs filesystem df /
Data, single: total=388.01GiB, used=387.44GiB
System, single: total=4.00MiB, used=64.00KiB
Metadata, single: total=2.01GiB, used=1.57GiB
GlobalReserve, single: total=512.00MiB, used=80.00KiB
============================================================


This is on a Fedora 31 (5.4.8-200.fc31.x86_64) workstation. Where did the 
other 436 GB go? Or, why are only 395 GB allocated from the 824 GB device?

I'm running a --full-balance now and it's progressing, slowly. I've seen 
tricks on the interwebs to temporarily add a ramdisk, run another balance, 
remove the ramdisk again - but that seems hackish.

Isn't there a way to prevent this from happening? (Apart from better 
monitoring, so I can run the balance at an earlier stage next time).


Thanks,
Christian.


# btrfs filesystem usage -T /
Overall:
    Device size:                 824.40GiB
    Device allocated:            395.02GiB
    Device unallocated:          429.38GiB
    Device missing:                  0.00B
    Used:                        388.00GiB
    Free (estimated):            435.94GiB      (min: 435.94GiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              512.00MiB      (used: 0.00B)

                         Data      Metadata System              
Id Path                  single    single   single   Unallocated
-- --------------------- --------- -------- -------- -----------
 1 /dev/mapper/luks-root 393.01GiB  2.01GiB  4.00MiB   429.38GiB
-- --------------------- --------- -------- -------- -----------
   Total                 393.01GiB  2.01GiB  4.00MiB   429.38GiB
   Used                  386.45GiB  1.55GiB 64.00KiB
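
On the "better monitoring" question above: a minimal sketch of such a check might look like the following, using 'btrfs filesystem usage -b' for byte-exact output. The mount point and the 10 GiB threshold are arbitrary placeholders, not values from this report.

============================================================
#!/bin/bash
# Hypothetical check: warn when unallocated space on a btrfs filesystem
# drops below a threshold (placeholder values; needs root for full output).
mountpoint="/"
threshold=$((10 * 1024 * 1024 * 1024))   # 10 GiB, in bytes

# 'btrfs filesystem usage -b' prints raw byte counts; grab "Device unallocated".
unallocated=$(btrfs filesystem usage -b "$mountpoint" | awk '/Device unallocated:/ {print $3}')

if [ "$unallocated" -lt "$threshold" ]; then
    echo "WARNING: only $unallocated bytes unallocated on $mountpoint; consider a filtered balance" >&2
fi
============================================================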

Comment 1 Chris Murphy 2020-01-14 05:53:01 UTC
The file system is still healthy and not at risk, but this is a real bug, and if you hit ENOSPC it'd be inconvenient.

Two manifestations of this (you can see one or the other, or both):

1. ENOSPC, which is fixed by this patchset, still under testing upstream:
Introduce per-profile available space array to avoid over-confident can_overcommit()
https://patchwork.kernel.org/project/linux-btrfs/list/?series=223921

2. 'df' shows 100% used, which is fixed by this patchset, also still under testing:
btrfs: super: Make btrfs_statfs() work with metadata over-commiting
https://patchwork.kernel.org/patch/11293419/

Short-term workaround (pick one):

1. Revert to kernel 5.3.18; this problem doesn't show up there.

2.
sudo btrfs balance start -dlimit=2 /mountpoint/
sudo mount -o remount,metadata_ratio=1 /mountpoint/

The first command will rebalance two data block groups (should only take a few seconds). Full balance is not recommended.

As a matter of convenience, you can optionally add metadata_ratio=1 to fstab until this gets fixed. It won't hurt anything if you forget to remove it later, but ideally you want to be using the default mount options. See 'man 5 btrfs' for more info on metadata_ratio.
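
For reference, an fstab line carrying that option might look like the one below. The device path is taken from this report; the mount options and remaining fields are generic placeholders, not the reporter's actual entry.

# /etc/fstab (hypothetical example)
/dev/mapper/luks-root  /  btrfs  defaults,metadata_ratio=1  0  0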

Comment 2 Chris Murphy 2020-01-30 22:20:12 UTC
OK, so nix workaround #2 above; it's not reliably working, and it really just papers over the actual problem, which is misreporting by statfs.

https://lore.kernel.org/linux-btrfs/16325152.4fYaUy9WYm@merkaba/T/#m1f536b213f988e53bcc6a4ef27119328308bae24
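
One way to look at the raw statfs() figures that df consumes (the numbers the patch above corrects) is coreutils' stat with its --file-system mode; the mount point here is just an example:

stat -f /    # prints the filesystem's statfs() block counts: total, free, available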

Comment 3 Chris Murphy 2020-02-02 00:56:00 UTC
This is the latest patch.
https://patchwork.kernel.org/patch/11359995/

Comment 4 Chris Murphy 2020-02-03 19:14:15 UTC
Commit d55966c427 is in linux-next, and is headed to stable for 5.4 and 5.5.

https://lore.kernel.org/stable/20200203182949.GD2654@twin.jikos.cz/T/#u

Comment 5 Chris Murphy 2020-02-06 05:46:52 UTC
This is now in 5.5.2 and 5.4.18.

Comment 6 Christian Kujau 2020-02-08 23:32:10 UTC
Great, thanks for the updates! I've been running 5.5.0-0.rc6.git3.1.fc32.x86_64 for some time now (and without "metadata_ratio=1" for a few days), and the issue has not reappeared.

Comment 7 Chris Murphy 2020-02-21 08:13:56 UTC
fc31 has 5.5.5 in updates-testing
https://bodhi.fedoraproject.org/updates/FEDORA-2020-cf2eacc932

fc30 has 5.4.21 in updates-testing
https://bodhi.fedoraproject.org/updates/FEDORA-2020-e05afa496a

Both contain the fix for this bug, as do their current stable updates (kernel-5.4.20-200.fc31 and kernel-5.4.19-100.fc30).
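
For anyone who wants to test before these land in stable, the usual updates-testing route is roughly the following (generic Fedora workflow, not commands taken from this bug):

sudo dnf --enablerepo=updates-testing upgrade kernel   # then reboot
uname -r                                               # confirm the running kernel (5.5.5+ on fc31, 5.4.21+ on fc30)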

