Bug 1170905 - System freeze and WARNING: at fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty [NEEDINFO]
Summary: System freeze and WARNING: at fs/btrfs/delayed-inode.c:1410 btrfs_assert_dela...
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 20
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: fedora-kernel-btrfs
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-12-05 05:08 UTC by Bernardo Donadio
Modified: 2015-06-29 23:44 UTC (History)
11 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-06-29 23:44:06 UTC
Type: Bug
Embargoed:
kernel-team: needinfo?


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Linux Kernel 89271 0 None None None Never

Description Bernardo Donadio 2014-12-05 05:08:15 UTC
Description of problem:
The system is having frequent and short freezes (like 2 seconds in general), and kernel assertions errors related to btrfs are being spilled on the journal.

Version-Release number of selected component (if applicable):
3.17.3-200.fc20._x86_64


How reproducible: always, at least on my system.


Steps to Reproduce:
1. Have a btrfs partition.
2. Do some snapshotting and rollback.
3. Profit.


Additional info:
Those assertions errors are being written in the kernel log:

Dez 01 19:15:12 darwin.donadio.be kernel: ------------[ cut here ]------------
Dez 01 19:15:12 darwin.donadio.be kernel: WARNING: CPU: 3 PID: 494 at fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty+0x34/0x40 [btrfs]()
Dez 01 19:15:12 darwin.donadio.be kernel: Modules linked in: ufs hfsplus hfs minix vfat msdos fat jfs xfs libcrc32c reiserfs rfcomm fuse xt_CHECKSUM iptable_mangle tun bridge stp llc ip6table_filter ip6_tables ebtable_nat ebtables cfg80211 bnep usblp btusb bluetooth rfkill uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core snd_usb_audio v4l2_common videodev snd_usbmidi_lib snd_rawmidi media joydev kvm_amd kvm serio_raw k10temp edac_core snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic edac_mce_amd snd_hda_intel snd_hda_controller sp5100_tco snd_hda_codec snd_seq i2c_piix4 snd_hwdep snd_seq_device snd_pcm snd_timer snd soundcore shpchp acpi_cpufreq binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc btrfs xor raid6_pq r8169 mii radeon i2c_algo_bit drm_kms_helper ttm drm
Dez 01 19:15:12 darwin.donadio.be kernel: CPU: 3 PID: 494 Comm: btrfs-transacti Tainted: G        W      3.17.3-200.fc20.x86_64 #1
Dez 01 19:15:12 darwin.donadio.be kernel: Hardware name: ECS A890GXM-A/A890GXM-A, BIOS 080015  03/24/2010
Dez 01 19:15:12 darwin.donadio.be kernel:  0000000000000000 00000000314b2122 ffff8802213e3db8 ffffffff81728acc
Dez 01 19:15:12 darwin.donadio.be kernel:  0000000000000000 ffff8802213e3df0 ffffffff81094e6d ffff8801e6d78000
Dez 01 19:15:12 darwin.donadio.be kernel:  ffff88022101e800 ffff8802097d9f00 0000000000000000 0000000000000000
Dez 01 19:15:12 darwin.donadio.be kernel: Call Trace:
Dez 01 19:15:12 darwin.donadio.be kernel:  [<ffffffff81728acc>] dump_stack+0x45/0x56
Dez 01 19:15:12 darwin.donadio.be kernel:  [<ffffffff81094e6d>] warn_slowpath_common+0x7d/0xa0
Dez 01 19:15:12 darwin.donadio.be kernel:  [<ffffffff81094f9a>] warn_slowpath_null+0x1a/0x20
Dez 01 19:15:12 darwin.donadio.be kernel:  [<ffffffffa02cc674>] btrfs_assert_delayed_root_empty+0x34/0x40 [btrfs]
Dez 01 19:15:12 darwin.donadio.be kernel:  [<ffffffffa02771b2>] btrfs_commit_transaction+0x3a2/0x9c0 [btrfs]
Dez 01 19:15:12 darwin.donadio.be kernel:  [<ffffffffa0272c45>] transaction_kthread+0x1c5/0x250 [btrfs]
Dez 01 19:15:12 darwin.donadio.be kernel:  [<ffffffffa0272a80>] ? btrfs_cleanup_transaction+0x550/0x550 [btrfs]
Dez 01 19:15:12 darwin.donadio.be kernel:  [<ffffffff810b2498>] kthread+0xd8/0xf0
Dez 01 19:15:12 darwin.donadio.be kernel:  [<ffffffff810b23c0>] ? kthread_create_on_node+0x190/0x190
Dez 01 19:15:12 darwin.donadio.be kernel:  [<ffffffff817300fc>] ret_from_fork+0x7c/0xb0
Dez 01 19:15:12 darwin.donadio.be kernel:  [<ffffffff810b23c0>] ? kthread_create_on_node+0x190/0x190
Dez 01 19:15:12 darwin.donadio.be kernel: ---[ end trace 81e909e70c8984e0 ]---
Dez 01 19:15:42 darwin.donadio.be kernel: ------------[ cut here ]------------
Dez 01 19:15:42 darwin.donadio.be kernel: WARNING: CPU: 2 PID: 494 at fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty+0x34/0x40 [btrfs]()
Dez 01 19:15:42 darwin.donadio.be kernel: Modules linked in: ufs hfsplus hfs minix vfat msdos fat jfs xfs libcrc32c reiserfs rfcomm fuse xt_CHECKSUM iptable_mangle tun bridge stp llc ip6table_filter ip6_tables ebtable_nat ebtables cfg80211 bnep usblp btusb bluetooth rfkill uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core snd_usb_audio v4l2_common videodev snd_usbmidi_lib snd_rawmidi media joydev kvm_amd kvm serio_raw k10temp edac_core snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic edac_mce_amd snd_hda_intel snd_hda_controller sp5100_tco snd_hda_codec snd_seq i2c_piix4 snd_hwdep snd_seq_device snd_pcm snd_timer snd soundcore shpchp acpi_cpufreq binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc btrfs xor raid6_pq r8169 mii radeon i2c_algo_bit drm_kms_helper ttm drm
Dez 01 19:15:42 darwin.donadio.be kernel: CPU: 2 PID: 494 Comm: btrfs-transacti Tainted: G        W      3.17.3-200.fc20.x86_64 #1
Dez 01 19:15:42 darwin.donadio.be kernel: Hardware name: ECS A890GXM-A/A890GXM-A, BIOS 080015  03/24/2010
Dez 01 19:15:42 darwin.donadio.be kernel:  0000000000000000 00000000314b2122 ffff8802213e3db8 ffffffff81728acc
Dez 01 19:15:42 darwin.donadio.be kernel:  0000000000000000 ffff8802213e3df0 ffffffff81094e6d ffff8801eb8fc140
Dez 01 19:15:42 darwin.donadio.be kernel:  ffff88022101e800 ffff88009da84b40 0000000000000000 0000000000000000
Dez 01 19:15:42 darwin.donadio.be kernel: Call Trace:
Dez 01 19:15:42 darwin.donadio.be kernel:  [<ffffffff81728acc>] dump_stack+0x45/0x56
Dez 01 19:15:42 darwin.donadio.be kernel:  [<ffffffff81094e6d>] warn_slowpath_common+0x7d/0xa0
Dez 01 19:15:42 darwin.donadio.be kernel:  [<ffffffff81094f9a>] warn_slowpath_null+0x1a/0x20
Dez 01 19:15:42 darwin.donadio.be kernel:  [<ffffffffa02cc674>] btrfs_assert_delayed_root_empty+0x34/0x40 [btrfs]
Dez 01 19:15:42 darwin.donadio.be kernel:  [<ffffffffa02771b2>] btrfs_commit_transaction+0x3a2/0x9c0 [btrfs]
Dez 01 19:15:42 darwin.donadio.be kernel:  [<ffffffffa0272c45>] transaction_kthread+0x1c5/0x250 [btrfs]
Dez 01 19:15:42 darwin.donadio.be kernel:  [<ffffffffa0272a80>] ? btrfs_cleanup_transaction+0x550/0x550 [btrfs]
Dez 01 19:15:42 darwin.donadio.be kernel:  [<ffffffff810b2498>] kthread+0xd8/0xf0
Dez 01 19:15:42 darwin.donadio.be kernel:  [<ffffffff810b23c0>] ? kthread_create_on_node+0x190/0x190
Dez 01 19:15:42 darwin.donadio.be kernel:  [<ffffffff817300fc>] ret_from_fork+0x7c/0xb0
Dez 01 19:15:42 darwin.donadio.be kernel:  [<ffffffff810b23c0>] ? kthread_create_on_node+0x190/0x190
Dez 01 19:15:42 darwin.donadio.be kernel: ---[ end trace 81e909e70c8984e1 ]---

Comment 1 Bernardo Donadio 2014-12-05 05:57:27 UTC
Two different people in the btrfs mailing list are also having the same issue. So, it's confirmed.

http://comments.gmane.org/gmane.comp.file-systems.btrfs/39719

http://comments.gmane.org/gmane.comp.file-systems.btrfs/40999

Comment 3 Christian Klomp 2014-12-06 20:52:03 UTC
Same thing happened to me twice today.
https://retrace.fedoraproject.org/faf/reports/499651/
In my case it seemed to be triggered (both times) by a nouveau bug that on its turn is caused by an F21 guest in VirtualBox with 3d acceleration on, but that's most likely unrelated since it is not even stored on a partition with btrfs.
Besides the VirtualBox/nouveau thing it hasn't happened before so I can't say anything about reproducibility.
Also never seen this behaviour before today.

Comment 4 Bernardo Donadio 2014-12-06 21:12:32 UTC
I think this is probably unrelated. There's no nouveau here, just radeon.

Comment 5 Christian Klomp 2014-12-13 17:47:56 UTC
Right, it is unrelated.
I can reproduce the btrfs problem pretty consistently by (too quickly?) issuing the sysrq REISUB sequence.
Then on the subsequent boot the error message shows up from time to time during that boot.
The next one it is gone.

Comment 6 gHcAgree 2014-12-16 08:29:49 UTC
(In reply to Christian Klomp from comment #5)
> Right, it is unrelated.
> I can reproduce the btrfs problem pretty consistently by (too quickly?)
> issuing the sysrq REISUB sequence.
> Then on the subsequent boot the error message shows up from time to time
> during that boot.
> The next one it is gone.

Hi Christian,

I wonder what is your btrfs "like"?
data/meta profile? running snapshots? running heavy r/w ops?
Or anything special to mention?

Comment 7 gHcAgree 2014-12-17 05:40:57 UTC
(In reply to Bernardo Donadio from comment #4)
> I think this is probably unrelated. There's no nouveau here, just radeon.

Hi Bernardo,

Could you please share more info about the bug, like some simple scripts if possible?

Comment 8 Christian Klomp 2014-12-17 20:15:25 UTC
Hi.
In my case all occasions seem to be the result of too quickly usuing REISUB (like around or a little less than 5 seconds between each command).
My personal machine has a few snapshots but one of my VMs doesn't and it behaves the same so I don't think snapshots are involved here.
My personal machine is an update to date F20 and my VM's run F20 and F21, both fully updated.
On my personal machine I wasn't creating a snapshot (last one is from some months ago) or generation high io when the problem arose.
I did however reset my machine via REISUB because it was completely unresponsive due a  nouveau crash and ctrl-alt-del wasn't effective anymore.
On my VM's (default btrfs configuration from anaconda) the systems were just idling on the desktop and the problem is triggered the same way through REISUB.

The following output is from the F21 VM:
# btrfs filesystem df /
Data, single: total=11.88GiB, used=3.93GiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=260.28MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=96.00MiB, used=0.00B

Comment 9 Ben 2014-12-21 13:38:30 UTC
Coming from https://bugzilla.kernel.org/show_bug.cgi?id=89271

Oh boy -- btrfs is giving me _really_ a hard time! Using btrfs as / and ZFS for /srv on trusty (3.13) with btrfs/zfs auto snapshots. 

A week ago btrfs presented me "disk full" with 32GB of 128GB in use w/ desktop lockup: Metadata full. Probably due auto snapshots. Even with the options from the btrfs Wiki I had to delete snapshots before I was able to balance & resolve the issue.

Updated to utopic kernel 3.16.0-29 hoping to avoid these problems. After a week: Sudden kernel panic. After reboot btrfs keeps spilling the stack below.

I downgraded back to 3.13 for the rescue. 

As newbie to both, btrfs and ZFS, it's really sad to see btrfs still needs a substantial amount of work to achieve even roughly the stability of ZFS. Not to talk about features and ease of use i.e. regarding snapshots, disk usage, RAIDZ and daily operation. 

Anything easy I can do to assist you tracking this error down?

------

CPU: 5 PID: 347 Comm: btrfs-transacti Tainted: P        W  OE 3.16.0-29-generic #39-Ubuntu
Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7596/785GM-E51 (MS-7596), BIOS V2.12 02/18/2011
 0000000000000009 ffff8800cf8ebda0 ffffffff817623e0 0000000000000000
 ffff8800cf8ebdd8 ffffffff8106dd2d ffff8802fcda4460 ffff88030eb55000
 ffff8802e8f30000 0000000000000000 0000000000000000 ffff8800cf8ebde8
Call Trace:
 [<ffffffff817623e0>] dump_stack+0x45/0x56
 [<ffffffff8106dd2d>] warn_slowpath_common+0x7d/0xa0
 [<ffffffff8106de0a>] warn_slowpath_null+0x1a/0x20
 [<ffffffffc046da04>] btrfs_assert_delayed_root_empty+0x34/0x40 [btrfs]
 [<ffffffffc041adf2>] btrfs_commit_transaction.part.21+0x422/0x9c0 [btrfs]
 [<ffffffffc0419f4b>] btrfs_commit_transaction+0x3b/0x70 [btrfs]
 [<ffffffffc0415d85>] transaction_kthread+0x1b5/0x240 [btrfs]
 [<ffffffffc0415bd0>] ? btrfs_cleanup_transaction+0x560/0x560 [btrfs]
 [<ffffffff810912c2>] kthread+0xd2/0xf0
 [<ffffffff810911f0>] ? kthread_create_on_node+0x1c0/0x1c0
 [<ffffffff8176aa7c>] ret_from_fork+0x7c/0xb0
 [<ffffffff810911f0>] ? kthread_create_on_node+0x1c0/0x1c0
---[ end trace d141c620a3b26396 ]---

Comment 10 Tim Hughes 2014-12-31 11:20:24 UTC
Not seeing the freezes so this may be different but I am seeing a very similar trace and so are many others.


https://retrace.fedoraproject.org/faf/reports/464310/

Comment 11 Christian Klomp 2015-02-18 14:55:02 UTC
I've had some other encounters with this, one involved a (very carefully i.e. slowly executed) REISUB that created a boot failure.

[  382.546181] BTRFS: detected SSD devices, enabling SSD mode
[  382.546276] BTRFS: bad tree block start 0 46663204864
[  382.546287] BTRFS: failed to read log tree
[  382.570288] BTRFS: open_ctree failed

To solve this I had to zero the log root tree.

To elaborate again I also do not see any freezes but have the many traces which definitely isn't good.
It also still happens on 3.18.

Comment 12 Fedora Kernel Team 2015-02-24 16:23:41 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.18.7-100.fc20.  Please test this kernel update (or newer) and let us know if you issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 21, and are still experiencing this issue, please change the version to Fedora 21.

If you experience different issues, please open a new bug report for those.

Comment 13 Christian Klomp 2015-03-21 13:31:03 UTC
I can't reproduce this anymore on 3.18 (3.18.9-200.fc21.x86_64 to be precise).
The problem https://retrace.fedoraproject.org/faf/problems/470261/ states that kernel-3.18.5-101.fc20 and kernel-3.18.3-201.fc21 are 'Probable fixes' and there do not seem to be reports after these.
So unless the original bug reporter can elaborate on this problem I'll root for marking it fixed.

Comment 14 Fedora End Of Life 2015-05-29 13:27:46 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 15 Fedora End Of Life 2015-06-29 23:44:06 UTC
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.