Bug 1028750 - [abrt] list_del corruption. next->prev should be ffff88017cf31958, but was ffff88021f5dddb8
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 20
Hardware: x86_64 Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL: https://retrace.fedoraproject.org/faf...
Whiteboard: abrt_hash:d9df4ead1a29007614601351a91...
Reported: 2013-11-10 07:00 EST by orti1980
Modified: 2014-01-06 13:50 EST

Doc Type: Bug Fix
Last Closed: 2014-01-06 13:50:19 EST


Attachments
File: dmesg (69.62 KB, text/plain), 2013-11-10 07:00 EST, orti1980
Patch from the Linux git repository (1.09 KB, patch), 2013-12-19 15:47 EST, Pavel Roskin
New patch suggested by btrfs-linux mailing list (491 bytes, patch), 2013-12-23 05:34 EST, Geert Jansen

Description orti1980 2013-11-10 07:00:20 EST
Additional info:
reporter:       libreport-2.1.9
list_del corruption. next->prev should be ffff88017cf31958, but was ffff88021f5dddb8
Modules linked in: bnep bluetooth rfkill snd_hda_codec_hdmi kvm_amd kvm crc32_pclmul crc32c_intel mxm_wmi ghash_clmulni_intel snd_hda_codec_realtek microcode serio_raw fam15h_power k10temp edac_core edac_mce_amd r8169 snd_hda_intel mii snd_hda_codec sp5100_tco i2c_piix4 snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd soundcore wmi shpchp acpi_cpufreq mperf nfsd auth_rpcgss nfs_acl lockd sunrpc ata_generic pata_acpi btrfs libcrc32c xor zlib_deflate raid6_pq radeon i2c_algo_bit drm_kms_helper ttm pata_atiixp drm i2c_core
CPU: 6 PID: 232 Comm: btrfs-transacti Not tainted 3.11.6-300.fc20.x86_64 #1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./990FX Professional, BIOS L1.95A 07/04/2013
 0000000000000009 ffff88021f5ddce8 ffffffff8164894b ffff88021f5ddd30
 ffff88021f5ddd20 ffffffff8106715d ffff880220a5b000 ffff88021f5dddc8
 ffff88017cf31a00 ffff88017ce36ea8 ffff88017cf31958 ffff88021f5ddd80
Call Trace:
 [<ffffffff8164894b>] dump_stack+0x45/0x56
 [<ffffffff8106715d>] warn_slowpath_common+0x7d/0xa0
 [<ffffffff810671cc>] warn_slowpath_fmt+0x4c/0x50
 [<ffffffff81310f42>] __list_del_entry+0x82/0xd0
 [<ffffffffa025fe4e>] btrfs_run_ordered_operations+0xce/0x2a0 [btrfs]
 [<ffffffffa02471eb>] btrfs_flush_all_pending_stuffs+0x3b/0x40 [btrfs]
 [<ffffffffa0247e4f>] btrfs_commit_transaction+0x20f/0x950 [btrfs]
 [<ffffffffa023f72d>] transaction_kthread+0x18d/0x220 [btrfs]
 [<ffffffffa023f5a0>] ? verify_parent_transid+0x150/0x150 [btrfs]
 [<ffffffff81088650>] kthread+0xc0/0xd0
 [<ffffffff81088590>] ? insert_kthread_work+0x40/0x40
 [<ffffffff81657aac>] ret_from_fork+0x7c/0xb0
 [<ffffffff81088590>] ? insert_kthread_work+0x40/0x40
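
For readers unfamiliar with the message: the warning comes from the kernel's debug check on doubly linked list removal (`__list_del_entry` in the trace above). Before unlinking an entry, the check verifies that both neighbours still point back at it; if they do not, it warns and leaves the list untouched. What follows is a minimal userspace sketch of that idea, not the kernel source. `struct node`, `node_init`, `node_add`, `checked_del` and `demo_corrupted_delete` are hypothetical names chosen for illustration.

```c
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's struct list_head. */
struct node {
    struct node *next;
    struct node *prev;
};

/* Make a list head point at itself, i.e. an empty circular list. */
static void node_init(struct node *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head. */
static void node_add(struct node *entry, struct node *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/*
 * Unlink entry only if both neighbours still point back at it.
 * Returns 0 on success, -1 if the list looks corrupted (nothing is changed).
 */
static int checked_del(struct node *entry)
{
    struct node *prev = entry->prev;
    struct node *next = entry->next;

    if (prev->next != entry) {
        fprintf(stderr,
                "list_del corruption. prev->next should be %p, but was %p\n",
                (void *)entry, (void *)prev->next);
        return -1;
    }
    if (next->prev != entry) {
        fprintf(stderr,
                "list_del corruption. next->prev should be %p, but was %p\n",
                (void *)entry, (void *)next->prev);
        return -1;
    }
    next->prev = prev;
    prev->next = next;
    return 0;
}

/*
 * Build head <-> a <-> b, overwrite b's back pointer, then try to delete a.
 * Returns the result of the checked delete.
 */
static int demo_corrupted_delete(void)
{
    struct node head, a, b, stray;

    node_init(&head);
    node_init(&stray);
    node_add(&a, &head);   /* head <-> a */
    node_add(&b, &a);      /* head <-> a <-> b */
    b.prev = &stray;       /* simulates an unsynchronised update to b */
    return checked_del(&a);
}
```

Calling `demo_corrupted_delete()` from a `main` function prints a "next->prev should be ..., but was ..." line in the same form as the report and returns -1. In the report above, the message therefore means that the entry following the one being removed no longer pointed back at it.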
Comment 1 orti1980 2013-11-10 07:00:33 EST
Created attachment 822051
File: dmesg
Comment 2 Pavel Roskin 2013-12-18 13:47:51 EST
I believe it's fixed in 931aa87791af46640a46b11fa503a119e36943ec.
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=931aa87791af46640a46b11fa503a119e36943ec

The fix is in Linux 3.13-rc3, but not in 3.12.5.  There is no "cc: stable" in the commit.  I think the patch should be backported.
Comment 3 Geert Jansen 2013-12-19 12:54:44 EST
What is the process to get this backported? My server is crashing about once a week with this error.
Comment 4 Pavel Roskin 2013-12-19 15:45:28 EST
(In reply to Geert Jansen from comment #3)
The official backporting requires the appropriate access.  If you just want to recompile the kernel, here's the outline of the process (sorry, no time for detailed instructions).

yumdownloader --source kernel
rpm -i kernel*.src.rpm
save the patch from the link
put it in ~/rpmbuild/SOURCES
list it in ~/rpmbuild/SPECS/kernel*.spec
increment the kernel revision in ~/rpmbuild/SPECS/kernel*.spec
rebuild the kernel package with "rpmbuild -ba" or (safer but slower) with mock
install the recompiled kernel
reboot and make sure the new kernel is being loaded by grub
enjoy the result
watch for kernel upgrades and don't reboot into unfixed kernels
Comment 5 Pavel Roskin 2013-12-19 15:47:41 EST
Created attachment 839241
Patch from the Linux git repository
Comment 6 Josh Boyer 2013-12-20 09:02:09 EST
Did anyone actually test that patch on top of a 3.11 kernel?  The upstream commit in comment #2 says it fixes an error that was introduced with commit b02441999efcc6152b87cd58e7970bb7843f76cf "Btrfs: don't wait for the completion of all the ordered extents".  That referenced commit is in 3.13-rc3 as well.  So the patch was fixing something that supposedly is only broken in 3.13, and that broken commit wasn't brought back to 3.11.y or 3.12.y.  I'm not sure this patch will fix anything.

Josef?
Comment 7 Geert Jansen 2013-12-23 03:32:53 EST
The patch does not apply to 3.12. The function btrfs_wait_all_ordered_extents has been renamed to btrfs_wait_ordered_roots and has gained an extra "nr" argument, so I have no idea whether this patch still fixes the issue.

I will ask on linux-btrfs@vger.kernel.org.
Comment 8 Geert Jansen 2013-12-23 04:41:22 EST
Posted the question upstream:

http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg29914.html
Comment 9 Geert Jansen 2013-12-23 05:33:15 EST
As pointed out in the mailing list, a different patch was provided by Josef Bacik a few days ago. The patch is here:

http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg29917.html

I've attached the patch to this bugzilla. It applies cleanly on the 3.12.5 kernel. I'm building it now on my system and will see if it resolves the issue.
Comment 10 Geert Jansen 2013-12-23 05:34:52 EST
Created attachment 840745
New patch suggested by btrfs-linux mailing list
Comment 11 Josh Boyer 2014-01-06 08:09:32 EST
Did your test of the patch work?
Comment 12 Geert Jansen 2014-01-06 08:25:07 EST
I have been using kernel 3.12.6-300.fc20 from updates-testing for 11 days now, and no crashes so far.
Comment 13 Josh Boyer 2014-01-06 13:50:19 EST
OK, 3.12.6 contains:

commit 486d1e163be2d32150a053c7ac3fc853ba6fd998
Author: Josef Bacik <jbacik@fusionio.com>
Date:   Mon Oct 28 09:13:25 2013 -0400

    Btrfs: take ordered root lock when removing ordered operations inode
    
    commit 93858769172c4e3678917810e9d5de360eb991cc upstream.

which is the patch that was suggested.  That's already in stable updates, so closing this out.  Thanks much!
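
The commit title above names the general remedy: hold the lock that protects a shared list while removing an entry from it. Below is a minimal userspace sketch of that pattern, assuming POSIX threads. It is not the btrfs patch, and every name in it (`struct shared_list`, `shared_list_add`, `shared_list_del`, `shared_list_len`) is hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from btrfs. */
struct node {
    struct node *next;
    struct node *prev;
};

struct shared_list {
    pthread_mutex_t lock;   /* protects head and every linked node */
    struct node head;
};

static void shared_list_init(struct shared_list *l)
{
    pthread_mutex_init(&l->lock, NULL);
    l->head.next = &l->head;
    l->head.prev = &l->head;
}

/* Append entry at the tail while holding the lock. */
static void shared_list_add(struct shared_list *l, struct node *entry)
{
    pthread_mutex_lock(&l->lock);
    entry->prev = l->head.prev;
    entry->next = &l->head;
    l->head.prev->next = entry;
    l->head.prev = entry;
    pthread_mutex_unlock(&l->lock);
}

/*
 * Unlink entry while holding the lock, so no other thread can see or
 * modify the neighbour pointers halfway through the update.
 */
static void shared_list_del(struct shared_list *l, struct node *entry)
{
    pthread_mutex_lock(&l->lock);
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = entry;    /* leave entry as an empty list of its own */
    entry->prev = entry;
    pthread_mutex_unlock(&l->lock);
}

/* Count the linked entries while holding the lock. */
static int shared_list_len(struct shared_list *l)
{
    int n = 0;

    pthread_mutex_lock(&l->lock);
    for (struct node *p = l->head.next; p != &l->head; p = p->next)
        n++;
    pthread_mutex_unlock(&l->lock);
    return n;
}
```

If one code path removes entries without taking the lock while another path walks or splices the same list, the neighbour pointers can disagree, which is the kind of inconsistency the list_del warning reports.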
