Bug 1514734
| Summary: | [abrt] kernel-PAE-core: __do_softirq(): WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:2821 rcu_process_callbacks+0x436/0x460 | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Claude Frantz <Claude.Frantz> |
| Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 27 | CC: | airlied, bskeggs, Claude.Frantz, ewk, fatkasuvayu, fedora2021q2, hdegoede, ichavero, itamar, jarodwilson, jeremy, jforbes, jglisse, john.j5live, jonathan, josef, kernel-maint, labbott, linville, mchehab, mjg59, oggust, steved |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | i686 | | |
| OS: | Unspecified | | |
| Whiteboard: | abrt_hash:e295f92f80cdf8992a61fd5079554cada79237c9; | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-07-30 13:41:50 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1489998 | | |
Description
Claude Frantz
2017-11-18 07:06:08 UTC
Created attachment 1354507 [details]
File: backtrace
Created attachment 1354508 [details]
File: cpuinfo
Created attachment 1354509 [details]
File: dmesg
Created attachment 1354510 [details]
File: kernel_tainted_long
Created attachment 1354511 [details]
File: not-reportable
Created attachment 1354512 [details]
File: proc_modules
Created attachment 1354513 [details]
File: suspend_stats
We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. The kernel moves very fast, so bugs may get fixed as part of a kernel update. Due to this, we are doing a mass bug update across all of the Fedora 26 kernel bugs.

Fedora 26 has now been rebased to 4.15.4-200.fc26. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 27, and are still experiencing this issue, please change the version to Fedora 27.

If you experience different issues, please open a new bug report for those.

I'm seeing something very similar on 4.15.4-300:

[Wed Feb 28 14:03:57 2018] WARNING: CPU: 6 PID: 0 at kernel/rcu/tree.c:2792 rcu_process_callbacks+0x4cb/0x4e0

and:

[Wed Feb 28 14:03:57 2018] Call Trace:
[Wed Feb 28 14:03:57 2018] <IRQ>
[Wed Feb 28 14:03:57 2018] __do_softirq+0xe7/0x2cb
[Wed Feb 28 14:03:57 2018] irq_exit+0xf1/0x100
[Wed Feb 28 14:03:57 2018] smp_apic_timer_interrupt+0x6c/0x120
[Wed Feb 28 14:03:57 2018] apic_timer_interrupt+0xa2/0xb0
[Wed Feb 28 14:03:57 2018] </IRQ>

This happens when trying to use a set of disks behind an eSATA port multiplier. After this, disconnecting the disks doesn't produce any dmesg output, sync hangs, etc., and a restart seems to be the only thing that gets things back to normal.

I didn't see this a week ago, on 4.15.3, or any time previously, though it may be unrelated to the kernel update.

I can't change the Fedora version BTW; someone else will need to do that if necessary.

Obviously not statistically significant yet, but I booted into 4.15.3 and didn't see the same error when connecting/mounting etc. the eSATA box. Before that I saw it each of the three times I tried to do the same on 4.15.4.

Two more data points:
4.15.6: same failure seen when using eSATA disks behind multiplier.
4.15.3: worked fine.

So currently 100% of 4 attempts on >= 4.15.4 have failed as above, 100% of 2 attempts on 4.15.3 (since first seeing this issue) have *not* failed. Starting to look more and more like a kernel regression - what's the best way of dealing with this issue so it doesn't languish in RHBZ?

Upstream bug (patch already accepted, presumably to master, but apparently not in stable trees as of 3 days ago): https://bugzilla.kernel.org/show_bug.cgi?id=198861

From upstream:
> Kernels 4.15.10 and 4.14.27 include patch "scsi: core: Avoid that ATA error handling can trigger a kernel hang or oops".
So 4.15.10 should do the trick.
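For readers who want to check whether a given kernel falls in the range discussed so far (failures reported from 4.15.4 onward, upstream fix in 4.15.10), here is a minimal Python sketch. The names `parse_kernel_version` and `in_affected_range` are hypothetical, and the range covers the 4.15.x series only: the thread gives a fixed version for 4.14.x (4.14.27) but not the version where the problem first appeared there, so that series is deliberately left out.

```python
import re

# Bounds taken from this thread: failures were reported from 4.15.4 onward,
# and upstream says 4.15.10 carries the fix. Only 4.15.x is covered here.
FIRST_AFFECTED = (4, 15, 4)
FIRST_FIXED = (4, 15, 10)


def parse_kernel_version(release):
    """Extract (major, minor, patch) from a string such as '4.15.4-300.fc27.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def in_affected_range(release):
    """Return True if the release lies in the half-open range [4.15.4, 4.15.10)."""
    return FIRST_AFFECTED <= parse_kernel_version(release) < FIRST_FIXED


# Versions mentioned in the thread, checked against the assumed range.
for release in ("4.15.3", "4.15.4-300", "4.15.6", "4.15.10"):
    print(release, in_affected_range(release))
```

Comparing integer tuples rather than strings matters here: as plain strings, "4.15.10" would sort before "4.15.4".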
Created attachment 1409466 [details]
traceback
Looks like I have the same problem. I have attached my traceback to the bug. I'll test out 4.15.10 from updates-testing and see if it addresses my issue.
Does it make a difference that I am not using a PAE kernel (I use x86_64)?

I tried 4.15.10 from updates-testing. I still experience a freeze; curiously, though, I don't see a traceback now.
In fact, I have a weird issue. The traceback and the ata errors don't always show up in the journal. For example with 4.13.9, I see ata errors in the journal like this:
ata5.00: exception Emask 0x11 SAct 0x7ff7ffff SErr 0x400000 action 0x6 frozen
ata5.00: irq_stat 0x48000008, interface fatal error
ata5: SError: { Handshk }
ata5.00: failed command: WRITE FPDMA QUEUED
ata5.00: cmd 61/58:00:60:72:4c/05:00:17:00:00/40 tag 0 ncq dma 700416 out
ata5.00: status: { DRDY }
ata5.00: failed command: WRITE FPDMA QUEUED
ata5.00: cmd 61/a8:08:b8:77:4c/02:00:17:00:00/40 tag 1 ncq dma 348160 out
ata5.00: status: { DRDY }
but no traceback. However, for 4.15+ kernels up to 4.15.9 it's the opposite: I do not see the ata errors, but I do see the traceback I attached above. On upgrading to 4.15.10, I see the above ata errors again, but the traceback is missing. The freezes are a constant through all these kernels, though.
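To compare which of the two symptoms a given boot actually logged, a saved dmesg or journal dump can be scanned for both signatures. The sketch below is illustrative only: the function name `classify_log` and the two regular expressions are assumptions modeled on the log lines quoted in this thread, not an official tool or a complete description of the kernel's message formats.

```python
import re

# Patterns modeled on the lines quoted in this thread (assumed, not exhaustive).
RCU_WARNING = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at kernel/rcu/tree\.c:\d+ rcu_process_callbacks"
)
ATA_EXCEPTION = re.compile(r"ata\d+(\.\d+)?: exception Emask 0x[0-9a-f]+")


def classify_log(text):
    """Count RCU warnings and ATA exception lines in a log dump."""
    counts = {"rcu_warning": 0, "ata_exception": 0}
    for line in text.splitlines():
        if RCU_WARNING.search(line):
            counts["rcu_warning"] += 1
        if ATA_EXCEPTION.search(line):
            counts["ata_exception"] += 1
    return counts


# Sample built from lines that appear in this bug report.
sample = """\
[Wed Feb 28 14:03:57 2018] WARNING: CPU: 6 PID: 0 at kernel/rcu/tree.c:2792 rcu_process_callbacks+0x4cb/0x4e0
ata5.00: exception Emask 0x11 SAct 0x7ff7ffff SErr 0x400000 action 0x6 frozen
ata5.00: failed command: WRITE FPDMA QUEUED
"""
print(classify_log(sample))
```

Running this over one dump per kernel version would give a small table of which symptom shows up where, which is easier to compare than reading the journals by eye.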
Suvayu, this isn't limited to PAE kernels, no.

The backtrace is a result of a bug introduced into the kernel in 4.15.4 (see the upstream bug), which shouldn't happen, and has been fixed in 4.15.10+.

The ATA errors are (very likely) a result of a poor-quality link (or failing/buggy hardware), and aren't (or are very unlikely to be) a kernel bug. The freezes are probably related to the faulty/failing disk hardware/link.

....

In general reply to this bug, 4.15.10 seems to have fixed this issue - I saw some link resets (normal for this crap eSATA box) but no backtrace, and didn't end up in a state that required a reboot.

Stephen, sorry about my late response. Thank you for your comments, they are reassuring. If it's alright, I would like to ask a follow-up question.

The old drive on my system is not mounted at a critical point. In fact, I boot without it and mount it when I need some dump space. My system freezes happen both when I'm using it and when I'm not (as in, unmounted, or mounted but with no process accessing files in the partition). When I'm using it, the freeze will happen, it's just a matter of time, but when I'm not, it's quite random. Also, for a disk-related freeze where the partition is non-critical, I would expect the process accessing files in that partition to freeze and go to "uninterruptible sleep", not instantaneously freeze the whole system.

Do you think this points to other problems besides my disks? I am having graphics issues (kernel support is incomplete), so I boot with nomodeset. All critical components in my system are brand new.

Suvayu, I suggest running a long self-test on your drive and carefully examining the report. Perhaps there is a firmware update for the drive that resolves the problem. Remember that, on a PC, a malfunctioning drive can freeze the whole system via the controller, as can a hidden or badly documented option. The kernel is not always able to recognize such behaviours. Please be careful and ensure that the drive itself is working well.

*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 27 kernel bugs.

Fedora 27 has now been rebased to 4.17.7-100.fc27. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 28, and are still experiencing this issue, please change the version to Fedora 28.

If you experience different issues, please open a new bug report for those.

This is fixed; I can't close it.

Thanks for the update.