Version-Release number of selected component:
kernel-PAE-core-4.13.12-200.fc26

Additional info:
reporter: libreport-2.9.1
cmdline: BOOT_IMAGE=/vmlinuz-4.13.12-200.fc26.i686+PAE root=UUID=30a9af7c-df05-4249-a2ad-b920bcbd4f45 ro rd.md=0 rd.lvm=0 rd.dm=0 rd.luks=0 vconsole.font=latarcyrheb-sun16 vconsole.keymap=de rhgb acpi_backlight=vendor acpi_osi=Linux resume=/dev/sda6 quiet LANG=en_US.UTF-8
crash_function: __do_softirq
kernel: 4.13.12-200.fc26.i686+PAE
kernel_tainted_short: GDW
runlevel: N 5
type: Kerneloops

Truncated backtrace:
WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:2821 rcu_process_callbacks+0x436/0x460
Modules linked in: fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ccm ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables sunrpc coretemp kvm_intel kvm uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core irqbypass videodev iTCO_wdt iTCO_vendor_support snd_hda_codec_via arc4 media snd_hda_codec_generic ath9k snd_hda_intel ath9k_common ath9k_hw snd_hda_codec joydev lpc_ich snd_hda_core mac80211 snd_hwdep snd_seq snd_seq_device snd_pcm ath asus_laptop cfg80211 sparse_keymap snd_timer rfkill tpm_tis input_polldev tpm_tis_core snd tpm soundcore acpi_cpufreq dm_multipath i915 serio_raw i2c_algo_bit drm_kms_helper atl1e drm video
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G D W 4.13.12-200.fc26.i686+PAE #1
Hardware name: ASUSTeK Computer Inc. P50IJ /P50IJ , BIOS 203 12/04/2009
task: dd2d2280 task.stack: dd2ca000
EIP: rcu_process_callbacks+0x436/0x460
EFLAGS: 00010002 CPU: 0
EAX: 00000000 EBX: f77ddb40 ECX: 00000002 EDX: 00000001
ESI: f77ddb60 EDI: dd2f0940 EBP: f70c7fc8 ESP: f70c7f98
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 80050033 CR2: b7f01e87 CR3: 1d498000 CR4: 000406f0
Call Trace:
 <SOFTIRQ>
 __do_softirq+0xb1/0x260
 ? takeover_tasklets+0x1b0/0x1b0
 do_softirq_own_stack+0x24/0x30
 </SOFTIRQ>
 irq_exit+0xbd/0xd0
 smp_apic_timer_interrupt+0x38/0x50
 apic_timer_interrupt+0x39/0x40
EIP: cpuidle_enter_state+0x144/0x360
EFLAGS: 00000246 CPU: 0
EAX: 00000000 EBX: dd335e30 ECX: 3e349739 EDX: 00000000
ESI: 3e349739 EDI: 000000ca EBP: dd2cbf24 ESP: dd2cbef0
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
 ? trace_event_raw_event_sched_kthread_stop_ret+0x7b/0xa0
 cpuidle_enter+0x14/0x20
 call_cpuidle+0x21/0x40
 do_idle+0x174/0x1d0
 cpu_startup_entry+0x65/0x70
 rest_init+0x9c/0xa0
 start_kernel+0x404/0x41d
 i386_start_kernel+0x94/0x98
 startup_32_smp+0x16b/0x16d
Code: 04 2f dd 0f 8f d8 fd ff ff 8b 15 60 04 2f dd 89 53 64 e9 ca fd ff ff 8d b6 00 00 00 00 0f ff e9 29 fc ff ff 0f ff e9 04 fd ff ff <0f> ff e9 e7 fd ff ff 8b 55 dc 89 f0 e8 e9 df 6d 00 e9 53 fc ff
Created attachment 1354507 [details] File: backtrace
Created attachment 1354508 [details] File: cpuinfo
Created attachment 1354509 [details] File: dmesg
Created attachment 1354510 [details] File: kernel_tainted_long
Created attachment 1354511 [details] File: not-reportable
Created attachment 1354512 [details] File: proc_modules
Created attachment 1354513 [details] File: suspend_stats
We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. The kernel moves very fast, so bugs may get fixed as part of a kernel update. Due to this, we are doing a mass bug update across all of the Fedora 26 kernel bugs. Fedora 26 has now been rebased to 4.15.4-200.fc26. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 27 and are still experiencing this issue, please change the version to Fedora 27. If you experience different issues, please open a new bug report for those.
I'm seeing something very similar on 4.15.4-300:

[Wed Feb 28 14:03:57 2018] WARNING: CPU: 6 PID: 0 at kernel/rcu/tree.c:2792 rcu_process_callbacks+0x4cb/0x4e0

and:

[Wed Feb 28 14:03:57 2018] Call Trace:
[Wed Feb 28 14:03:57 2018] <IRQ>
[Wed Feb 28 14:03:57 2018] __do_softirq+0xe7/0x2cb
[Wed Feb 28 14:03:57 2018] irq_exit+0xf1/0x100
[Wed Feb 28 14:03:57 2018] smp_apic_timer_interrupt+0x6c/0x120
[Wed Feb 28 14:03:57 2018] apic_timer_interrupt+0xa2/0xb0
[Wed Feb 28 14:03:57 2018] </IRQ>

This happens when trying to use a set of disks behind an eSATA port multiplier. After this, disconnecting the disks doesn't produce any dmesg output, sync hangs, etc., and a restart seems to be the only thing that gets things back to normal. I didn't see this a week ago, on 4.15.3, or any time previously, though it may be unrelated to the kernel update.

I can't change the Fedora version BTW; someone else will need to do that if necessary.
Obviously not statistically significant yet, but I booted into 4.15.3 and didn't see the same error when connecting/mounting etc. the eSATA box. Before that I saw it each of the three times I tried to do the same on 4.15.4.
Two more data points: 4.15.6: same failure seen when using eSATA disks behind multiplier. 4.15.3: worked fine. So currently 100% of 4 attempts on >= 4.15.4 have failed as above, 100% of 2 attempts on 4.15.3 (since first seeing this issue) have *not* failed. Starting to look more and more like a kernel regression - what's the best way of dealing with this issue so it doesn't languish in RHBZ?
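As a rough check on how far these six attempts go towards statistical significance, here is a minimal sketch of a one-sided Fisher exact test on the counts quoted above (4 failures out of 4 on >= 4.15.4, 0 failures out of 2 on 4.15.3). The function name is hypothetical, and the test assumes the attempts are independent, which may not hold for repeated tries on the same hardware.

```python
from math import comb


def fisher_one_sided(fail_new, ok_new, fail_old, ok_old):
    """Probability of at least `fail_new` failures landing on the newer
    kernels if the kernel version had no effect (hypergeometric tail)."""
    total = fail_new + ok_new + fail_old + ok_old
    failures = fail_new + fail_old
    new_attempts = fail_new + ok_new
    denominator = comb(total, new_attempts)
    tail = 0
    for k in range(fail_new, min(failures, new_attempts) + 1):
        # k failures among the new-kernel attempts, the rest successes
        tail += comb(failures, k) * comb(total - failures, new_attempts - k)
    return tail / denominator


# Counts from this thread: 4/4 failed on >= 4.15.4, 0/2 failed on 4.15.3.
p = fisher_one_sided(fail_new=4, ok_new=0, fail_old=0, ok_old=2)
print(round(p, 4))  # 0.0667, i.e. 1/15
```

With so few attempts the result stays above the usual 0.05 threshold, so the pattern is suggestive rather than conclusive on its own.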
Upstream bug (the patch has apparently been accepted, presumably into master, but was not in the stable trees as of 3 days ago): https://bugzilla.kernel.org/show_bug.cgi?id=198861
From upstream: > Kernels 4.15.10 and 4.14.27 include patch "scsi: core: Avoid that ATA error handling can trigger a kernel hang or oops". So 4.15.10 should do the trick.
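To see whether a given kernel falls between the versions named in this bug, a small sketch like the following may help. It only covers the 4.15.y series, using 4.15.4 as the first release where the warning was reported here and 4.15.10 as the first fixed release. The 4.14.y series is left out because this thread only names the fixed version (4.14.27), not the first affected one. The function names are hypothetical.

```python
import re

# Versions taken from the comments in this bug, not from the kernel changelog.
FIRST_REPORTED = (4, 15, 4)   # first 4.15.y release where the warning was seen here
FIRST_FIXED = (4, 15, 10)     # first 4.15.y release said to include the patch


def parse_release(release):
    """Turn a release string such as '4.15.4-300.fc27.x86_64' into (4, 15, 4)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def in_reported_range(release):
    """True if the release is at or after 4.15.4 and before 4.15.10."""
    return FIRST_REPORTED <= parse_release(release) < FIRST_FIXED


print(in_reported_range("4.15.4-300.fc27.x86_64"))  # True
print(in_reported_range("4.15.10-300.fc27.x86_64"))  # False
```

The string to check can be taken from `uname -r`. Note that the original 4.13.12 report falls outside this range, so this check says nothing about that earlier warning.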
Created attachment 1409466 [details] traceback Looks like I have the same problem. I have attached my traceback to the bug. I'll test out 4.15.10 from updates-testing and see if it addresses my issue.
Does it make a difference that I am not using a PAE kernel (I use x86_64)? I tried 4.15.10 from updates-testing and still experience a freeze; curiously, though, I don't see a traceback now.

In fact, I have a weird issue: the traceback and the ata errors don't always show up in the journal. For example, with 4.13.9 I see ata errors in the journal like this:

ata5.00: exception Emask 0x11 SAct 0x7ff7ffff SErr 0x400000 action 0x6 frozen
ata5.00: irq_stat 0x48000008, interface fatal error
ata5: SError: { Handshk }
ata5.00: failed command: WRITE FPDMA QUEUED
ata5.00: cmd 61/58:00:60:72:4c/05:00:17:00:00/40 tag 0 ncq dma 700416 out
ata5.00: status: { DRDY }
ata5.00: failed command: WRITE FPDMA QUEUED
ata5.00: cmd 61/a8:08:b8:77:4c/02:00:17:00:00/40 tag 1 ncq dma 348160 out
ata5.00: status: { DRDY }

but no traceback. However, for 4.15+ kernels up to 4.15.9 it's the opposite: I do not see the ata errors, but I do see the traceback I attached above. On upgrading to 4.15.10, I see the above ata errors again, but the traceback is missing. The freezes are a constant through all these kernels, though.
Suvayu, no, this isn't limited to PAE kernels. The backtrace is the result of a bug introduced into the kernel in 4.15.4 (see the upstream bug), which shouldn't happen and has been fixed in 4.15.10+. The ATA errors are (very likely) the result of a poor-quality link (or failing/buggy hardware), and aren't (or are very unlikely to be) a kernel bug. The freezes are probably related to the faulty/failing disk hardware/link.

In general reply to this bug: 4.15.10 seems to have fixed this issue. I saw some link resets (normal for this crap eSATA box) but no backtrace, and didn't end up in a state that required a reboot.
Stephen, sorry about my late response. Thank you for your comments, they are reassuring. If it's alright, I would like to ask a follow-up question. The old drive on my system is not mounted at a critical point. In fact, I boot without it and mount it when I need some dump space. My system freezes happen both when I'm using it and when I'm not (as in, unmounted, or mounted but with no process accessing files in the partition). When I'm using it, the freeze will happen, it's just a matter of time; when I'm not, it's quite random. Also, for a disk-related freeze where the partition is non-critical, I would expect the process accessing files in that partition to freeze and go into "uninterruptible sleep", not the whole system to freeze instantaneously. Do you think this points to other problems besides my disks? I am having graphics issues (kernel support is incomplete), so I boot with nomodeset. All critical components in my system are brand new.
Suvayu, I suggest running a long self-test on your drive and examining the report carefully. There may be a firmware update for the drive that resolves the problem. Remember that, on a PC, a drive can freeze the whole system via the controller if it malfunctions, or even when a hidden or badly documented option is in use. The kernel is not always able to recognize such behaviour. Please be careful and make sure that the drive itself is working well.
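One way to run such a self-test is with smartctl from smartmontools; this is a sketch under the assumption that smartmontools is installed and the commands are run as root. The helper name and the `/dev/sdX` device path are placeholders, not something taken from this bug.

```python
import subprocess


def smartctl_commands(device):
    """Build the smartctl invocations for a long self-test on `device`."""
    return {
        # Starts the extended self-test; it runs in the background on the drive.
        "start": ["smartctl", "-t", "long", device],
        # Shows the self-test log once the test has had time to finish.
        "log": ["smartctl", "-l", "selftest", device],
        # Prints the full SMART report (attributes, error log, self-test log).
        "report": ["smartctl", "-a", device],
    }


def run_step(step, device):
    """Run one of the steps above and return smartctl's exit status."""
    return subprocess.run(smartctl_commands(device)[step], check=False).returncode


# Hypothetical usage, replacing /dev/sdX with the real device node:
#   run_step("start", "/dev/sdX")   # then wait for the estimated duration
#   run_step("log", "/dev/sdX")
```

The long test can take hours on a large drive, so the log step only makes sense after the duration that smartctl prints when the test starts.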
*********** MASS BUG UPDATE ************** We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs. Fedora 27 has now been rebased to 4.17.7-100.fc27. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 28 and are still experiencing this issue, please change the version to Fedora 28. If you experience different issues, please open a new bug report for those.
This is fixed; I can't close it.
Thanks for the update.