Bug 1028750

Summary: [abrt] list_del corruption. next->prev should be ffff88017cf31958, but was ffff88021f5dddb8
Product: [Fedora] Fedora
Reporter: orti1980 <orti1980>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 20
CC: gansalmon, geertj, itamar, jonathan, josef, kernel-maint, madhu.chinakonda, plroskin
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
URL: https://retrace.fedoraproject.org/faf/reports/bthash/cda01281ff7d8b768acb4472d0a201c9282ac075
Whiteboard: abrt_hash:d9df4ead1a29007614601351a91ac19789351de6
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-06 18:50:19 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments (description, flags):
File: dmesg (flags: none)
Patch from the Linux git repository (flags: none)
New patch suggested by btrfs-linux mailing list (flags: none)

Description orti1980 2013-11-10 12:00:20 UTC
Additional info:
reporter:       libreport-2.1.9
list_del corruption. next->prev should be ffff88017cf31958, but was ffff88021f5dddb8
Modules linked in: bnep bluetooth rfkill snd_hda_codec_hdmi kvm_amd kvm crc32_pclmul crc32c_intel mxm_wmi ghash_clmulni_intel snd_hda_codec_realtek microcode serio_raw fam15h_power k10temp edac_core edac_mce_amd r8169 snd_hda_intel mii snd_hda_codec sp5100_tco i2c_piix4 snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd soundcore wmi shpchp acpi_cpufreq mperf nfsd auth_rpcgss nfs_acl lockd sunrpc ata_generic pata_acpi btrfs libcrc32c xor zlib_deflate raid6_pq radeon i2c_algo_bit drm_kms_helper ttm pata_atiixp drm i2c_core
CPU: 6 PID: 232 Comm: btrfs-transacti Not tainted 3.11.6-300.fc20.x86_64 #1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./990FX Professional, BIOS L1.95A 07/04/2013
 0000000000000009 ffff88021f5ddce8 ffffffff8164894b ffff88021f5ddd30
 ffff88021f5ddd20 ffffffff8106715d ffff880220a5b000 ffff88021f5dddc8
 ffff88017cf31a00 ffff88017ce36ea8 ffff88017cf31958 ffff88021f5ddd80
Call Trace:
 [<ffffffff8164894b>] dump_stack+0x45/0x56
 [<ffffffff8106715d>] warn_slowpath_common+0x7d/0xa0
 [<ffffffff810671cc>] warn_slowpath_fmt+0x4c/0x50
 [<ffffffff81310f42>] __list_del_entry+0x82/0xd0
 [<ffffffffa025fe4e>] btrfs_run_ordered_operations+0xce/0x2a0 [btrfs]
 [<ffffffffa02471eb>] btrfs_flush_all_pending_stuffs+0x3b/0x40 [btrfs]
 [<ffffffffa0247e4f>] btrfs_commit_transaction+0x20f/0x950 [btrfs]
 [<ffffffffa023f72d>] transaction_kthread+0x18d/0x220 [btrfs]
 [<ffffffffa023f5a0>] ? verify_parent_transid+0x150/0x150 [btrfs]
 [<ffffffff81088650>] kthread+0xc0/0xd0
 [<ffffffff81088590>] ? insert_kthread_work+0x40/0x40
 [<ffffffff81657aac>] ret_from_fork+0x7c/0xb0
 [<ffffffff81088590>] ? insert_kthread_work+0x40/0x40
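
For context on the warning itself: the trace shows it being raised from __list_del_entry, which refuses to unlink an entry whose neighbours no longer point back at it. What follows is a minimal userspace sketch of that kind of check, not the kernel source. The names checked_list_del, list_init and list_add_tail are chosen here for illustration; only the message format is copied from the report above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Make head an empty list (it points at itself in both directions). */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Link entry in just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/*
 * Unlink entry only if both neighbours still point back at it.
 * Returns false and prints a diagnostic when they do not, leaving
 * the list untouched.
 */
static bool checked_list_del(struct list_head *entry)
{
    assert(entry != NULL);
    struct list_head *prev = entry->prev;
    struct list_head *next = entry->next;

    if (prev->next != entry) {
        fprintf(stderr,
                "list_del corruption. prev->next should be %p, but was %p\n",
                (void *)entry, (void *)prev->next);
        return false;
    }
    if (next->prev != entry) {
        fprintf(stderr,
                "list_del corruption. next->prev should be %p, but was %p\n",
                (void *)entry, (void *)next->prev);
        return false;
    }

    next->prev = prev;
    prev->next = next;
    list_init(entry);   /* leave the removed entry self-linked */
    return true;
}
```

In this sketch the second branch fires when something else has re-linked the entry's successor in the meantime, which is the situation the "next->prev should be ..., but was ..." line describes.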

Comment 1 orti1980 2013-11-10 12:00:33 UTC
Created attachment 822051
File: dmesg

Comment 2 Pavel Roskin 2013-12-18 18:47:51 UTC
I believe it's fixed in 931aa87791af46640a46b11fa503a119e36943ec.
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=931aa87791af46640a46b11fa503a119e36943ec

The fix is in Linux 3.13-rc3, but not in 3.12.5.  There is no "cc: stable" in the commit.  I think the patch should be backported.

Comment 3 Geert Jansen 2013-12-19 17:54:44 UTC
What is the process to get this backported? My server is crashing about once a week with this error.

Comment 4 Pavel Roskin 2013-12-19 20:45:28 UTC
(In reply to Geert Jansen from comment #3)
An official backport has to be done by someone with the appropriate access.  If you just want to recompile the kernel yourself, here's an outline of the process (sorry, no time for detailed instructions).

yumdownloader --source kernel
rpm -i kernel*.src.rpm
save the patch from the link in comment #2
put it in ~/rpmbuild/SOURCES
list it in ~/rpmbuild/SPECS/kernel*.spec
increment the kernel revision in ~/rpmbuild/SPECS/kernel*.spec
rebuild the kernel package with "rpmbuild -ba" or (safer but slower) with mock
install the recompiled kernel
reboot and make sure the new kernel is being loaded by grub
enjoy the result
watch for kernel upgrades and don't reboot into an unfixed kernel

Comment 5 Pavel Roskin 2013-12-19 20:47:41 UTC
Created attachment 839241
Patch from the Linux git repository

Comment 6 Josh Boyer 2013-12-20 14:02:09 UTC
Did anyone actually test that patch on top of a 3.11 kernel?  The upstream commit in comment #2 says it fixes an error that was introduced with commit b02441999efcc6152b87cd58e7970bb7843f76cf "Btrfs: don't wait for the completion of all the ordered extents".  That referenced commit is in 3.13-rc3 as well.  So the patch was fixing something that supposedly is only broken in 3.13, and that broken commit wasn't brought back to 3.11.y or 3.12.y.  I'm not sure this patch will fix anything.

Josef?

Comment 7 Geert Jansen 2013-12-23 08:32:53 UTC
The patch does not apply to 3.12. The function btrfs_wait_all_ordered_extents has been renamed to btrfs_wait_ordered_roots and has gained an extra "nr" argument. So I have no idea whether this patch still fixes the issue.

I will ask on the linux-btrfs mailing list.

Comment 8 Geert Jansen 2013-12-23 09:41:22 UTC
Posted the question upstream:

http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg29914.html

Comment 9 Geert Jansen 2013-12-23 10:33:15 UTC
As pointed out in the mailing list, a different patch was provided by Josef Bacik a few days ago. The patch is here:

http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg29917.html

I've attached the patch to this bugzilla. It applies cleanly on the 3.12.5 kernel. I'm building it now on my system and will see if it resolves the issue.

Comment 10 Geert Jansen 2013-12-23 10:34:52 UTC
Created attachment 840745
New patch suggested by btrfs-linux mailing list

Comment 11 Josh Boyer 2014-01-06 13:09:32 UTC
Did your test of the patch work?

Comment 12 Geert Jansen 2014-01-06 13:25:07 UTC
I have been using kernel 3.12.6-300.fc20 from updates-testing for 11 days now, and no crashes so far.

Comment 13 Josh Boyer 2014-01-06 18:50:19 UTC
OK, 3.12.6 contains:

commit 486d1e163be2d32150a053c7ac3fc853ba6fd998
Author: Josef Bacik <jbacik>
Date:   Mon Oct 28 09:13:25 2013 -0400

    Btrfs: take ordered root lock when removing ordered operations inode
    
    commit 93858769172c4e3678917810e9d5de360eb991cc upstream.

which is the patch that was suggested.  That's already in stable updates, so closing this out.  Thanks much!
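
The commit title quoted above points at a list removal that ran without the lock protecting the list. A rough userspace sketch of that general pattern follows; it is not the btrfs patch. The names ordered_ops, ordered_lock, queue_ordered, remove_ordered and count_ordered are hypothetical, and a pthread mutex stands in for whatever lock the kernel code actually uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical shared list and the single lock that guards it. */
static struct list_head ordered_ops = { &ordered_ops, &ordered_ops };
static pthread_mutex_t ordered_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add entry at the tail; every writer takes ordered_lock first. */
static void queue_ordered(struct list_head *entry)
{
    assert(entry != NULL);
    pthread_mutex_lock(&ordered_lock);
    entry->prev = ordered_ops.prev;
    entry->next = &ordered_ops;
    ordered_ops.prev->next = entry;
    ordered_ops.prev = entry;
    pthread_mutex_unlock(&ordered_lock);
}

/* Unlink entry under the same lock and leave it self-linked. */
static void remove_ordered(struct list_head *entry)
{
    assert(entry != NULL);
    pthread_mutex_lock(&ordered_lock);
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = entry;
    entry->prev = entry;
    pthread_mutex_unlock(&ordered_lock);
}

/* Count entries while holding the lock, so the walk sees a stable list. */
static size_t count_ordered(void)
{
    size_t n = 0;

    pthread_mutex_lock(&ordered_lock);
    for (struct list_head *p = ordered_ops.next; p != &ordered_ops; p = p->next)
        n++;
    pthread_mutex_unlock(&ordered_lock);
    return n;
}
```

If any one of these paths touched the list without taking the lock, a concurrent unlink could leave a neighbour pointing somewhere unexpected, which is the condition the list_del check reports.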