Bug 1301220

Summary: [abrt] WARNING: CPU: 4 PID: 1191 at drivers/md/raid5.c:4246 break_stripe_batch_list+0x1a9/0x250 [raid456]() [raid456]
Product: [Fedora] Fedora Reporter: Brian <bugzilla-redhat>
Component: kernel Assignee: fedora-kernel-raid
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 23 CC: bugzilla-redhat, extras-qa, gansalmon, itamar, jonathan, kernel-maint, labbott, madhu.chinakonda, mchehab
Target Milestone: --- Flags: bugzilla-redhat: needinfo-
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
URL: https://retrace.fedoraproject.org/faf/reports/bthash/5857f2cd15c43af419e6962bfcc11806dc073ded
Whiteboard: abrt_hash:d0f4f9cfcf46183648ff065569980002ffa2efe4;VARIANT_ID=server;
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-10-03 11:50:21 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description    Flags
File: dmesg    none

Description Brian 2016-01-22 22:05:28 UTC
Description of problem:
This also happened with kernel 4.3.3-300.fc23.x86_64.

15 virtual machines run on this array.  There seems to be no specific action that causes the crash.

Additional info:
reporter:       libreport-2.6.3
WARNING: CPU: 4 PID: 1191 at drivers/md/raid5.c:4246 break_stripe_batch_list+0x1a9/0x250 [raid456]()
Modules linked in: rpcsec_gss_krb5 bluetooth vhost_net vhost macvtap macvlan xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun 8021q garp mrp cfg80211 rfkill ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_filter ebtable_broute bridge stp llc ebtables ip6table_security ip6table_raw ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_filter ip6_tables iptable_security iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle w83795 xfs libcrc32c btrfs raid456 kvm_amd async_raid6_recov kvm async_memcpy async_pq async_xor async_tx xor crct10dif_pclmul crc32_pclmul crc32c_intel amd64_edac_mod joydev edac_core raid6_pq sp5100_tco fam15h_power k10temp edac_mce_amd i2c_piix4 shpchp tpm_tis
 tpm acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc raid1 mgag200 i2c_algo_bit drm_kms_helper ttm drm e1000 serio_raw e1000e sata_sil24 mpt2sas raid_class ptp scsi_transport_sas pps_core
CPU: 4 PID: 1191 Comm: md124_raid6 Not tainted 4.2.8-300.fc23.x86_64 #1
Hardware name: Supermicro H8SGL/H8SGL, BIOS 3.5        11/25/2013
 0000000000000000 00000000a427c98a ffff880bdd1f3a98 ffffffff817738ca
 0000000000000000 0000000000000000 ffff880bdd1f3ad8 ffffffff8109e4c6
 0000000000000010 0000000000000000 ffff8817e87a48f0 ffff8817c8092500
Call Trace:
 [<ffffffff817738ca>] dump_stack+0x45/0x57
 [<ffffffff8109e4c6>] warn_slowpath_common+0x86/0xc0
 [<ffffffff8109e5fa>] warn_slowpath_null+0x1a/0x20
 [<ffffffffa04a61b9>] break_stripe_batch_list+0x1a9/0x250 [raid456]
 [<ffffffffa04af9b9>] handle_stripe+0x9b9/0x2550 [raid456]
 [<ffffffff81779d1e>] ? _raw_spin_unlock_irqrestore+0xe/0x10
 [<ffffffffa04b16e6>] handle_active_stripes.isra.45+0x196/0x4b0 [raid456]
 [<ffffffffa04b1e7f>] raid5d+0x47f/0x660 [raid456]
 [<ffffffff815e3909>] md_thread+0x139/0x150
 [<ffffffff810df9a0>] ? wake_atomic_t_function+0x70/0x70
 [<ffffffff815e37d0>] ? find_pers+0x80/0x80
 [<ffffffff810bc8c8>] kthread+0xd8/0xf0
 [<ffffffff810bc7f0>] ? kthread_worker_fn+0x160/0x160
 [<ffffffff8177a69f>] ret_from_fork+0x3f/0x70
 [<ffffffff810bc7f0>] ? kthread_worker_fn+0x160/0x160

Comment 1 Brian 2016-01-22 22:05:38 UTC
Created attachment 1117353
File: dmesg

Comment 2 Brian 2016-01-23 01:44:19 UTC
This happened last week, Jan 15 09:09:32, with kernel 4.2.8-300.

 kernel: ------------[ cut here ]------------
 kernel: WARNING: CPU: 4 PID: 1191 at drivers/md/raid5.c:4246 break_stripe_batch_list+0x1a9/0x250 [raid456]()
 kernel: Modules linked in: rpcsec_gss_krb5 bluetooth vhost_net vhost macvtap macvlan xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun 8021q garp mrp cfg80211 rfkill ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_filter ebtable_broute bridge stp llc ebtables ip6table_security ip6table_raw ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_filter ip6_tables iptable_security iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle w83795 xfs libcrc32c btrfs raid456 kvm_amd async_raid6_recov kvm async_memcpy async_pq async_xor async_tx xor crct10dif_pclmul crc32_pclmul crc32c_intel amd64_edac_mod joydev edac_core raid6_pq sp5100_tco fam15h_power k10temp edac_mce_amd i2c_piix4 shpchp tpm_tis
 kernel:  tpm acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc raid1 mgag200 i2c_algo_bit drm_kms_helper ttm drm e1000 serio_raw e1000e sata_sil24 mpt2sas raid_class ptp scsi_transport_sas pps_core
 kernel: CPU: 4 PID: 1191 Comm: md124_raid6 Not tainted 4.2.8-300.fc23.x86_64 #1
 kernel: Hardware name: Supermicro H8SGL/H8SGL, BIOS 3.5        11/25/2013
 kernel:  0000000000000000 00000000a427c98a ffff880bdd1f3a98 ffffffff817738ca
 kernel:  0000000000000000 0000000000000000 ffff880bdd1f3ad8 ffffffff8109e4c6
 kernel:  0000000000000010 0000000000000000 ffff8817e87a48f0 ffff8817c8092500
 kernel: Call Trace:
 kernel:  [<ffffffff817738ca>] dump_stack+0x45/0x57
 kernel:  [<ffffffff8109e4c6>] warn_slowpath_common+0x86/0xc0
 kernel:  [<ffffffff8109e5fa>] warn_slowpath_null+0x1a/0x20
 kernel:  [<ffffffffa04a61b9>] break_stripe_batch_list+0x1a9/0x250 [raid456]
 kernel:  [<ffffffffa04af9b9>] handle_stripe+0x9b9/0x2550 [raid456]
 kernel:  [<ffffffff81779d1e>] ? _raw_spin_unlock_irqrestore+0xe/0x10
 kernel:  [<ffffffffa04b16e6>] handle_active_stripes.isra.45+0x196/0x4b0 [raid456]
 kernel:  [<ffffffffa04b1e7f>] raid5d+0x47f/0x660 [raid456]
 kernel:  [<ffffffff815e3909>] md_thread+0x139/0x150
 kernel:  [<ffffffff810df9a0>] ? wake_atomic_t_function+0x70/0x70
 kernel:  [<ffffffff815e37d0>] ? find_pers+0x80/0x80
 kernel:  [<ffffffff810bc8c8>] kthread+0xd8/0xf0
 kernel:  [<ffffffff810bc7f0>] ? kthread_worker_fn+0x160/0x160
 kernel:  [<ffffffff8177a69f>] ret_from_fork+0x3f/0x70
 kernel:  [<ffffffff810bc7f0>] ? kthread_worker_fn+0x160/0x160
 kernel: ---[ end trace fed71451b49ee7b6 ]---

Comment 3 Brian 2016-02-05 23:23:32 UTC
Another Friday, another crash.

This time kernel-4.3.3-303.fc23.x86_64

Related to bug 1258153?

I also set /sys/block/md124/md/stripe_cache_size to 16384 on boot, as noted in bug 1258153, comment 1.
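
For anyone wanting to check or change the same setting, here is a minimal sketch of reading and writing the stripe cache size through sysfs. The path layout (/sys/block/<array>/md/stripe_cache_size) is the one quoted above; the function names and the `sysfs_root` parameter are hypothetical, added only so the helpers can be pointed at a scratch directory for testing.

```python
from pathlib import Path


def stripe_cache_size(md_name: str, sysfs_root: str = "/sys/block") -> int:
    """Return the current stripe_cache_size of an md array such as 'md124'."""
    attr = Path(sysfs_root) / md_name / "md" / "stripe_cache_size"
    return int(attr.read_text().strip())


def set_stripe_cache_size(md_name: str, entries: int,
                          sysfs_root: str = "/sys/block") -> None:
    """Write a new stripe_cache_size (number of cache entries) for the array."""
    attr = Path(sysfs_root) / md_name / "md" / "stripe_cache_size"
    attr.write_text(f"{entries}\n")
```

On a real system the write needs root, and the value does not persist across reboots, which is why it has to be reapplied at boot.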


Feb  5 15:18:11 vh0 kernel: ------------[ cut here ]------------
Feb  5 15:18:11 vh0 kernel: WARNING: CPU: 11 PID: 1177 at drivers/md/raid5.c:4240 break_stripe_batch_list+0x1a9/0x250 [raid456]()
Feb  5 15:18:11 vh0 kernel: Modules linked in: vhost_net vhost macvtap macvlan xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun 8021q garp mrp cfg80211 rfkill ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_security iptable_raw w83795 xfs libcrc32c kvm_amd kvm btrfs crct10dif_pclmul crc32_pclmul raid456 crc32c_intel async_raid6_recov async_memcpy async_pq async_xor xor async_tx joydev raid6_pq amd64_edac_mod edac_mce_amd fam15h_power sp5100_tco edac_core k10temp shpchp i2c_piix4 tpm_tis tpm acpi_cpufreq nfsd
Feb  5 15:18:11 vh0 kernel: auth_rpcgss nfs_acl lockd grace sunrpc raid1 mgag200 i2c_algo_bit drm_kms_helper ttm drm e1000 serio_raw e1000e mpt2sas sata_sil24 raid_class ptp scsi_transport_sas pps_core fjes
Feb  5 15:18:11 vh0 kernel: CPU: 11 PID: 1177 Comm: md124_raid6 Not tainted 4.3.3-303.fc23.x86_64 #1
Feb  5 15:18:11 vh0 kernel: Hardware name: Supermicro H8SGL/H8SGL, BIOS 3.5        11/25/2013
Feb  5 15:18:11 vh0 kernel: 0000000000000000 000000004b473b69 ffff88003661bad8 ffffffff813a625f
Feb  5 15:18:11 vh0 kernel: 0000000000000000 ffff88003661bb10 ffffffff810a07c2 0000000000000000
Feb  5 15:18:11 vh0 kernel: ffff8817cb9fc8f0 ffff8817dc89bd50 ffff8817dc89bcc8 ffff8817cb9fc8f0
Feb  5 15:18:11 vh0 kernel: Call Trace:
Feb  5 15:18:11 vh0 kernel: [<ffffffff813a625f>] dump_stack+0x44/0x55
Feb  5 15:18:11 vh0 kernel: [<ffffffff810a07c2>] warn_slowpath_common+0x82/0xc0
Feb  5 15:18:11 vh0 kernel: [<ffffffff810a090a>] warn_slowpath_null+0x1a/0x20
Feb  5 15:18:11 vh0 kernel: [<ffffffffa038c0d9>] break_stripe_batch_list+0x1a9/0x250 [raid456]
Feb  5 15:18:11 vh0 kernel: [<ffffffffa03959e4>] handle_stripe+0xa44/0x2640 [raid456]
Feb  5 15:18:11 vh0 kernel: [<ffffffffa0059e61>] ? _scsih_qcmd+0x281/0x7c0 [mpt2sas]
Feb  5 15:18:11 vh0 kernel: [<ffffffffa039776d>] handle_active_stripes.isra.44+0x18d/0x4a0 [raid456]
Feb  5 15:18:11 vh0 kernel: [<ffffffffa038b87d>] ? do_release_stripe+0x8d/0x170 [raid456]
Feb  5 15:18:11 vh0 kernel: [<ffffffff815fbd96>] ? bitmap_daemon_work+0x1c6/0x350
Feb  5 15:18:11 vh0 kernel: [<ffffffffa038b975>] ? __release_stripe+0x15/0x20 [raid456]
Feb  5 15:18:11 vh0 kernel: [<ffffffffa0397efc>] raid5d+0x47c/0x710 [raid456]
Feb  5 15:18:11 vh0 kernel: [<ffffffff811086be>] ? try_to_del_timer_sync+0x5e/0x90
Feb  5 15:18:11 vh0 kernel: [<ffffffff81108460>] ? trace_event_raw_event_tick_stop+0xf0/0xf0
Feb  5 15:18:11 vh0 kernel: [<ffffffff815ed089>] md_thread+0x139/0x150
Feb  5 15:18:11 vh0 kernel: [<ffffffff810e2370>] ? wake_atomic_t_function+0x70/0x70
Feb  5 15:18:11 vh0 kernel: [<ffffffff815ecf50>] ? find_pers+0x70/0x70
Feb  5 15:18:11 vh0 kernel: [<ffffffff810bede8>] kthread+0xd8/0xf0
Feb  5 15:18:11 vh0 kernel: [<ffffffff810bed10>] ? kthread_worker_fn+0x160/0x160
Feb  5 15:18:11 vh0 kernel: [<ffffffff81781adf>] ret_from_fork+0x3f/0x70
Feb  5 15:18:11 vh0 kernel: [<ffffffff810bed10>] ? kthread_worker_fn+0x160/0x160
Feb  5 15:18:11 vh0 kernel: ---[ end trace 537dd668d3493211 ]---
Feb  5 15:18:12 vh0 abrt-dump-journal-oops: abrt-dump-journal-oops: Found oopses: 1
Feb  5 15:18:12 vh0 abrt-dump-journal-oops: abrt-dump-journal-oops: Creating problem directories
Feb  5 15:18:13 vh0 abrt-server: Looking for kernel package
Feb  5 15:18:13 vh0 abrt-server: Kernel package kernel-core-4.3.3-303.fc23.x86_64 found
Feb  5 15:18:13 vh0 abrt-dump-journal-oops: Reported 1 kernel oopses to Abrt

Comment 4 Brian 2016-02-08 15:21:34 UTC
Crashed again in kernel-4.3.4-300.fc23.x86_64, after 2 days

Feb  7 22:46:39 vh0 kernel: ------------[ cut here ]------------
Feb  7 22:46:39 vh0 kernel: WARNING: CPU: 1 PID: 1184 at drivers/md/raid5.c:4240 break_stripe_batch_list+0x1a9/0x250 [raid456]()
Feb  7 22:46:39 vh0 kernel: Modules linked in: vhost_net vhost macvtap macvlan xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun 8021q garp mrp cfg80211 rfkill ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_broute bridge stp llc ebtable_filter ebtable_nat ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw ip6table_mangle ip6table_security ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw iptable_mangle iptable_security w83795 xfs libcrc32c kvm_amd kvm raid456 btrfs async_raid6_recov crct10dif_pclmul crc32_pclmul async_memcpy async_pq crc32c_intel async_xor xor async_tx joydev raid6_pq sp5100_tco amd64_edac_mod edac_mce_amd i2c_piix4 k10temp fam15h_power shpchp edac_core acpi_cpufreq tpm_tis tpm nfsd
Feb  7 22:46:39 vh0 kernel: auth_rpcgss nfs_acl lockd grace sunrpc raid1 mgag200 i2c_algo_bit drm_kms_helper ttm drm serio_raw e1000 e1000e mpt2sas sata_sil24 raid_class ptp scsi_transport_sas pps_core fjes
Feb  7 22:46:39 vh0 kernel: CPU: 1 PID: 1184 Comm: md124_raid6 Not tainted 4.3.4-300.fc23.x86_64 #1
Feb  7 22:46:39 vh0 kernel: Hardware name: Supermicro H8SGL/H8SGL, BIOS 3.5        11/25/2013
Feb  7 22:46:39 vh0 kernel: 0000000000000000 00000000c0bdb046 ffff8817e175fad8 ffffffff813a625f
Feb  7 22:46:39 vh0 kernel: 0000000000000000 ffff8817e175fb10 ffffffff810a07c2 0000000000000000
Feb  7 22:46:39 vh0 kernel: ffff8814d8e28c28 ffff8817d0acd5a0 ffff8817d0acd518 ffff8817d442d518
Feb  7 22:46:39 vh0 kernel: Call Trace:
Feb  7 22:46:39 vh0 kernel: [<ffffffff813a625f>] dump_stack+0x44/0x55
Feb  7 22:46:39 vh0 kernel: [<ffffffff810a07c2>] warn_slowpath_common+0x82/0xc0
Feb  7 22:46:39 vh0 kernel: [<ffffffff810a090a>] warn_slowpath_null+0x1a/0x20
Feb  7 22:46:39 vh0 kernel: [<ffffffffa03340d9>] break_stripe_batch_list+0x1a9/0x250 [raid456]
Feb  7 22:46:39 vh0 kernel: [<ffffffffa033d9e4>] handle_stripe+0xa44/0x2640 [raid456]
Feb  7 22:46:39 vh0 kernel: [<ffffffffa005ce61>] ? _scsih_qcmd+0x281/0x7c0 [mpt2sas]
Feb  7 22:46:39 vh0 kernel: [<ffffffffa033f76d>] handle_active_stripes.isra.44+0x18d/0x4a0 [raid456]
Feb  7 22:46:39 vh0 kernel: [<ffffffffa033387d>] ? do_release_stripe+0x8d/0x170 [raid456]
Feb  7 22:46:39 vh0 kernel: [<ffffffff815fbdf6>] ? bitmap_daemon_work+0x1c6/0x350
Feb  7 22:46:39 vh0 kernel: [<ffffffffa0333975>] ? __release_stripe+0x15/0x20 [raid456]
Feb  7 22:46:39 vh0 kernel: [<ffffffffa033fefc>] raid5d+0x47c/0x710 [raid456]
Feb  7 22:46:39 vh0 kernel: [<ffffffff811086be>] ? try_to_del_timer_sync+0x5e/0x90
Feb  7 22:46:39 vh0 kernel: [<ffffffff81108460>] ? trace_event_raw_event_tick_stop+0xf0/0xf0
Feb  7 22:46:39 vh0 kernel: [<ffffffff815ed0e9>] md_thread+0x139/0x150
Feb  7 22:46:39 vh0 kernel: [<ffffffff810e2370>] ? wake_atomic_t_function+0x70/0x70
Feb  7 22:46:39 vh0 kernel: [<ffffffff815ecfb0>] ? find_pers+0x70/0x70
Feb  7 22:46:39 vh0 kernel: [<ffffffff810bede8>] kthread+0xd8/0xf0
Feb  7 22:46:39 vh0 kernel: [<ffffffff810bed10>] ? kthread_worker_fn+0x160/0x160
Feb  7 22:46:39 vh0 kernel: [<ffffffff81781b9f>] ret_from_fork+0x3f/0x70
Feb  7 22:46:39 vh0 kernel: [<ffffffff810bed10>] ? kthread_worker_fn+0x160/0x160
Feb  7 22:46:39 vh0 kernel: ---[ end trace 9d45f5089741982a ]---

Comment 5 Brian 2016-02-14 20:51:12 UTC
Happy Valentine's day to me, with a trip to the office to hard reset the server... again.

Is there anything else I can do to help diagnose this?  I ask because I'm currently compiling a 4.0.8 kernel for F23 (per advice from bug 1258153), and once I switch to it I'll be unable to help further.  At some point, though, I'd like to be confident that I can use future kernels again.

This is a RAID 6, and the stripe_cache_size was set to 16384 via udev rules at boot.  It runs anywhere between 3 and 10 days before this happens.


Feb 14 07:31:47 vh0 kernel: ------------[ cut here ]------------
Feb 14 07:31:47 vh0 kernel: WARNING: CPU: 2 PID: 1230 at drivers/md/raid5.c:4240 break_stripe_batch_list+0x1a9/0x250 [raid456]()
Feb 14 07:31:47 vh0 kernel: Modules linked in: xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_security ip6table_filter ip6_tables iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_security vhost_net vhost macvtap macvlan tun 8021q garp mrp cfg80211 rfkill bridge stp llc w83795 xfs libcrc32c kvm_amd btrfs kvm raid456 crct10dif_pclmul crc32_pclmul async_raid6_recov crc32c_intel async_memcpy async_pq async_xor xor async_tx raid6_pq joydev amd64_edac_mod sp5100_tco acpi_cpufreq edac_mce_amd shpchp k10temp fam15h_power i2c_piix4 edac_core tpm_tis tpm nfsd
Feb 14 07:31:47 vh0 kernel: auth_rpcgss nfs_acl lockd grace sunrpc raid1 mgag200 i2c_algo_bit drm_kms_helper ttm drm e1000 serio_raw e1000e sata_sil24 mpt2sas ptp raid_class scsi_transport_sas pps_core fjes [last unloaded: iptable_raw]
Feb 14 07:31:47 vh0 kernel: CPU: 2 PID: 1230 Comm: md124_raid6 Not tainted 4.3.5-300.fc23.x86_64 #1
Feb 14 07:31:47 vh0 kernel: Hardware name: Supermicro H8SGL/H8SGL, BIOS 3.5        11/25/2013
Feb 14 07:31:47 vh0 kernel: 0000000000000000 00000000b32e7713 ffff880be70dfad8 ffffffff813a643f
Feb 14 07:31:47 vh0 kernel: 0000000000000000 ffff880be70dfb10 ffffffff810a07d2 0000000000000000
Feb 14 07:31:47 vh0 kernel: ffff8817ddb18000 ffff8817c9412500 ffff8817c9412478 ffff8817dbfe48f0
Feb 14 07:31:47 vh0 kernel: Call Trace:
Feb 14 07:31:47 vh0 kernel: [<ffffffff813a643f>] dump_stack+0x44/0x55
Feb 14 07:31:47 vh0 kernel: [<ffffffff810a07d2>] warn_slowpath_common+0x82/0xc0
Feb 14 07:31:47 vh0 kernel: [<ffffffff810a091a>] warn_slowpath_null+0x1a/0x20
Feb 14 07:31:47 vh0 kernel: [<ffffffffa03650d9>] break_stripe_batch_list+0x1a9/0x250 [raid456]
Feb 14 07:31:47 vh0 kernel: [<ffffffffa036e9e4>] handle_stripe+0xa44/0x2640 [raid456]
Feb 14 07:31:47 vh0 kernel: [<ffffffff810c9dd7>] ? try_to_wake_up+0x47/0x350
Feb 14 07:31:47 vh0 kernel: [<ffffffffa037076d>] handle_active_stripes.isra.44+0x18d/0x4a0 [raid456]
Feb 14 07:31:47 vh0 kernel: [<ffffffffa0370efc>] raid5d+0x47c/0x710 [raid456]
Feb 14 07:31:47 vh0 kernel: [<ffffffff811086ce>] ? try_to_del_timer_sync+0x5e/0x90
Feb 14 07:31:47 vh0 kernel: [<ffffffff815ed479>] md_thread+0x139/0x150
Feb 14 07:31:47 vh0 kernel: [<ffffffff810e2380>] ? wake_atomic_t_function+0x70/0x70
Feb 14 07:31:47 vh0 kernel: [<ffffffff815ed340>] ? find_pers+0x70/0x70
Feb 14 07:31:47 vh0 kernel: [<ffffffff810bedf8>] kthread+0xd8/0xf0
Feb 14 07:31:47 vh0 kernel: [<ffffffff810bed20>] ? kthread_worker_fn+0x160/0x160
Feb 14 07:31:47 vh0 kernel: [<ffffffff8178219f>] ret_from_fork+0x3f/0x70
Feb 14 07:31:47 vh0 kernel: [<ffffffff810bed20>] ? kthread_worker_fn+0x160/0x160
Feb 14 07:31:47 vh0 kernel: ---[ end trace 8ebf5228f7cce4cf ]---
Feb 14 07:31:48 vh0 abrt-dump-journal-oops: abrt-dump-journal-oops: Found oopses: 1
Feb 14 07:31:48 vh0 abrt-dump-journal-oops: abrt-dump-journal-oops: Creating problem directories
Feb 14 07:31:48 vh0 abrt-server: Deleting problem directory oops-2016-02-14-07:31:48-1742-0 (dup of oops-2016-01-15-09:09:33-1582-0)
Feb 14 07:31:48 vh0 dbus[1690]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Feb 14 07:31:49 vh0 dbus[1690]: [system] Successfully activated service 'org.freedesktop.problems'
Feb 14 07:31:49 vh0 abrt-dump-journal-oops: Reported 1 kernel oopses to Abrt
Feb 14 07:31:51 vh0 abrt-server: This problem has already been reported.

Comment 6 Brian 2016-02-14 21:06:00 UTC
Do I need to reinitialize the entire array to upgrade the metadata?

https://bbs.archlinux.org/viewtopic.php?id=205801

Comment 7 Brian 2016-02-19 01:24:09 UTC
The metadata on this array is actually 1.2, and thus comment 6 is irrelevant.

I'm still running 4.3.5-300.fc23.x86_64 with:
# cat /etc/udev/rules.d/99-md-raid6-tuning.rules 
SUBSYSTEM=="block", KERNEL=="md*", ACTION=="change", TEST=="md/stripe_cache_size", ATTR{md/stripe_cache_size}="8192"

So far with the 8192 value, it's at 4 days uptime, but after the next crash, I'm reverting to a 4.0 kernel.  Previously the mean uptime was about 7 days with 16384.



# mdadm --detail /dev/md123
/dev/md123:
        Version : 1.2
  Creation Time : Thu Mar 10 08:13:56 2011
     Raid Level : raid6
     Array Size : 5859787776 (5588.33 GiB 6000.42 GB)
  Used Dev Size : 976631296 (931.39 GiB 1000.07 GB)
   Raid Devices : 8
  Total Devices : 8
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Thu Feb 18 20:21:48 2016
          State : clean 
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : hq2.advancedopen.net:aos1
           UUID : 35a4aca9:e895efd5:4d6dc573:354ba5c3
         Events : 5651279

    Number   Major   Minor   RaidDevice State
      15       8      112        0      active sync   /dev/sdh
      13       8       64        1      active sync   /dev/sde
      14       8       80        2      active sync   /dev/sdf
      10       8      128        3      active sync   /dev/sdi
      11       8      144        4      active sync   /dev/sdj
       9       8      160        5      active sync   /dev/sdk
       8       8      176        6      active sync   /dev/sdl
      12       8       96        7      active sync   /dev/sdg
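
To put the two stripe_cache_size values in perspective, here is a rough sketch of the memory the stripe cache pins, using the relation given in the kernel's md documentation (page size × number of member devices × stripe_cache_size). The 4096-byte page size is an assumption (the usual x86_64 default); the device count of 8 comes from the mdadm output above. The function name is made up for illustration.

```python
def stripe_cache_bytes(entries: int, raid_devices: int,
                       page_size: int = 4096) -> int:
    """Approximate stripe cache memory: one page per device per cache entry.

    page_size defaults to 4096 bytes, an assumption for x86_64.
    """
    return page_size * raid_devices * entries


MIB = 1024 * 1024

for entries in (8192, 16384):
    used = stripe_cache_bytes(entries, raid_devices=8)
    print(f"stripe_cache_size={entries}: about {used // MIB} MiB")
```

Under those assumptions, 16384 entries come to about 512 MiB and 8192 entries to about 256 MiB for this 8-device array.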

Comment 8 Brian 2016-04-13 00:50:53 UTC
Well, the crash took a LOT longer to happen with stripe_cache_size of 8192, but it happened again today.  Anything from anybody? I can't be the only one suffering this.

Apr 12 15:03:29 vh0 kernel: ------------[ cut here ]------------
Apr 12 15:03:29 vh0 kernel: WARNING: CPU: 4 PID: 1349 at drivers/md/raid5.c:4240 break_stripe_batch_list+0x1a9/0x250 [raid456]()
Apr 12 15:03:29 vh0 kernel: Modules linked in: bluetooth vhost_net vhost macvtap macvlan xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun 8021q garp mrp cfg80211 rfkill ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_broute bridge stp llc ebtable_filter ebtable_nat ebtables ip6table_mangle ip6table_raw ip6table_security ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_filter ip6_tables iptable_mangle iptable_raw iptable_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx btrfs w83795 xor raid6_pq xfs libcrc32c kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel joydev amd64_edac_mod edac_mce_amd edac_core sp5100_tco acpi_cpufreq k10temp fam15h_power i2c_piix4 tpm_tis shpchp
Apr 12 15:03:29 vh0 kernel: tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc raid1 mgag200 i2c_algo_bit drm_kms_helper ttm drm e1000 serio_raw e1000e sata_sil24 mpt2sas raid_class ptp scsi_transport_sas pps_core fjes
Apr 12 15:03:29 vh0 kernel: CPU: 4 PID: 1349 Comm: md123_raid6 Not tainted 4.3.5-300.fc23.x86_64 #1
Apr 12 15:03:29 vh0 kernel: Hardware name: Supermicro H8SGL/H8SGL, BIOS 3.5        11/25/2013
Apr 12 15:03:29 vh0 kernel: 0000000000000000 000000002c6177e2 ffff880be73dfad8 ffffffff813a643f
Apr 12 15:03:29 vh0 kernel: 0000000000000000 ffff880be73dfb10 ffffffff810a07d2 0000000000000000
Apr 12 15:03:29 vh0 kernel: ffff8817d1e16140 ffff8802fb143d50 ffff8802fb143cc8 ffff88162b5c0c28
Apr 12 15:03:29 vh0 kernel: Call Trace:
Apr 12 15:03:29 vh0 kernel: [<ffffffff813a643f>] dump_stack+0x44/0x55
Apr 12 15:03:29 vh0 kernel: [<ffffffff810a07d2>] warn_slowpath_common+0x82/0xc0
Apr 12 15:03:29 vh0 kernel: [<ffffffff810a091a>] warn_slowpath_null+0x1a/0x20
Apr 12 15:03:29 vh0 kernel: [<ffffffffa05860d9>] break_stripe_batch_list+0x1a9/0x250 [raid456]
Apr 12 15:03:29 vh0 kernel: [<ffffffffa058f9e4>] handle_stripe+0xa44/0x2640 [raid456]
Apr 12 15:03:29 vh0 kernel: [<ffffffff810d8684>] ? set_next_entity+0xa4/0x880
Apr 12 15:03:29 vh0 kernel: [<ffffffffa059176d>] handle_active_stripes.isra.44+0x18d/0x4a0 [raid456]
Apr 12 15:03:29 vh0 kernel: [<ffffffffa0591efc>] raid5d+0x47c/0x710 [raid456]
Apr 12 15:03:29 vh0 kernel: [<ffffffff811086ce>] ? try_to_del_timer_sync+0x5e/0x90
Apr 12 15:03:29 vh0 kernel: [<ffffffff815ed479>] md_thread+0x139/0x150
Apr 12 15:03:29 vh0 kernel: [<ffffffff810e2380>] ? wake_atomic_t_function+0x70/0x70
Apr 12 15:03:29 vh0 kernel: [<ffffffff815ed340>] ? find_pers+0x70/0x70
Apr 12 15:03:29 vh0 kernel: [<ffffffff810bedf8>] kthread+0xd8/0xf0
Apr 12 15:03:29 vh0 kernel: [<ffffffff810bed20>] ? kthread_worker_fn+0x160/0x160
Apr 12 15:03:29 vh0 kernel: [<ffffffff8178219f>] ret_from_fork+0x3f/0x70
Apr 12 15:03:29 vh0 kernel: [<ffffffff810bed20>] ? kthread_worker_fn+0x160/0x160
Apr 12 15:03:29 vh0 kernel: ---[ end trace a3997021533c32c3 ]---
Apr 12 15:03:30 vh0 abrt-dump-journal-oops: abrt-dump-journal-oops: Found oopses: 1
Apr 12 15:03:30 vh0 abrt-dump-journal-oops: abrt-dump-journal-oops: Creating problem directories
Apr 12 15:03:30 vh0 abrt-server: Deleting problem directory oops-2016-04-12-15:03:30-1362-0 (dup of oops-2016-01-15-09:09:33-1582-0)
Apr 12 15:03:31 vh0 dbus[1289]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Apr 12 15:03:31 vh0 dbus[1289]: [system] Successfully activated service 'org.freedesktop.problems'
Apr 12 15:03:31 vh0 abrt-dump-journal-oops: Reported 1 kernel oopses to Abrt
Apr 12 15:03:32 vh0 abrt-server: This problem has already been reported.
Apr 12 15:03:32 vh0 abrt-server: https://retrace.fedoraproject.org/faf/reports/980973/

Comment 9 Brian 2016-04-13 01:24:43 UTC
Discussion:
https://bugzilla.kernel.org/show_bug.cgi?id=108741

Patch:
http://thread.gmane.org/87r3fkjttq.fsf@notabene.neil.brown.name

Mainlined here:
https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.4.7


But 4.4.7 isn't built yet in Koji or in F23 updates...

Laura, help?

Comment 10 Laura Abbott 2016-04-13 02:35:39 UTC
Koji was down for scheduled maintenance for part of the afternoon so I couldn't get the build out. It's building now. In the future, please give us at least 24 hours after a stable release is available to get it building (4.4.7 only came out today)

Comment 11 Brian 2016-04-13 02:49:50 UTC
In my indescribable glee that I had found a patch, I failed to note the day of the kernel release.  You are absolutely correct.  Apologies for any implied deficiencies on your part.  Thank you for your snappy response.  I rest well tonight knowing a fix is coming.

Comment 12 Laura Abbott 2016-04-13 16:59:35 UTC
https://bodhi.fedoraproject.org/updates/kernel-4.4.7-300.fc23 now available in bodhi for testing. Please give karma as appropriate.

Comment 13 Laura Abbott 2016-09-23 19:30:57 UTC
*********** MASS BUG UPDATE **************
 
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 23 kernel bugs.
 
Fedora 23 has now been rebased to 4.7.4-100.fc23.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 24 or 25, and are still experiencing this issue, please change the version to Fedora 24 or 25.
 
If you experience different issues, please open a new bug report for those.

Comment 14 Brian 2016-10-03 11:50:21 UTC
Confirming that kernel > 4.4.7-300 fixes this.
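
For others landing here, a small sketch of checking whether a given kernel release string is at or past the first fixed version. It assumes the fix first shipped in 4.4.7, as the ChangeLog link in comment 9 indicates; the function names are hypothetical, and comparing upstream version numbers says nothing about distribution kernels that backport the patch to an older base.

```python
import re

# Assumption: first stable release carrying the fix, per comment 9.
FIXED_IN = (4, 4, 7)


def kernel_version(release: str) -> tuple:
    """Extract (major, minor, patch) from a string like '4.4.7-300.fc23.x86_64'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"no x.y.z version found in {release!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(release: str) -> bool:
    """True if the release is at or beyond the assumed first fixed version."""
    return kernel_version(release) >= FIXED_IN
```

On a running system the release string can be taken from `platform.release()` or `uname -r`.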