| Summary: | kernel BUG at kernel/timer.c:844! | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Albert Strasheim <fullung> | ||||||
| Component: | kernel | Assignee: | Lukáš Czerner <lczerner> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | 15 | CC: | fullung, gansalmon, itamar, jonathan, kernel-maint, lczerner, madhu.chinakonda | ||||||
| Target Milestone: | --- | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | kernel-2.6.38.8-32.fc15 | Doc Type: | Bug Fix | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2011-06-23 08:48:05 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Attachments: |
Description
Albert Strasheim
2011-04-26 12:32:48 UTC
void add_timer(struct timer_list *timer)
{
        BUG_ON(timer_pending(timer));
        mod_timer(timer, timer->expires);
}
Happens on CPU 19, meaning there are lots of processors in that machine...
Lukas, looks like one for you :)
-Eric

(In reply to comment #0)
> How reproducible:
>
> Sometimes when doing lots of ext4 operations.

Any particular type of "lots of ext4 operations"? I have done a lot of testing/benchmarks with lazyinit running and never saw this, so maybe it is workload related? Could you provide more information about the box you're using? Have you been able to reproduce it reliably, or did you see this on different machines?
Thanks!
-Lukas

Created attachment 495255 [details]
Patch to use schedule_timeout_interruptible()
I *think* I found the problem. add_timer(&eli->li_timer) in ext4_lazyinit_thread() might be called again before the previous timer has expired, which triggers the BUG_ON(). It could easily be solved by using mod_timer() instead, but I have a better solution which uses schedule_timeout_interruptible() and simplifies things a lot.
Would you be willing to test this quick & dirty patch, since I am not able to reproduce the problem?
Thanks in advance!
-Lukas
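
To illustrate the failure mode described in the comment above, here is a minimal userspace sketch. It is not kernel code and not the attached patch: the struct and all model_* functions are hypothetical stand-ins that only mimic the pending check in add_timer() and the re-arming behaviour of mod_timer(). Instead of crashing, the model returns -1 where the kernel would hit the BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct timer_list. */
struct model_timer {
        bool pending;           /* true while the timer is armed and has not fired */
        unsigned long expires;  /* expiry time in model "jiffies" */
};

/* Mimics timer_pending(): is the timer currently armed? */
static bool model_timer_pending(const struct model_timer *t)
{
        return t->pending;
}

/*
 * Mimics mod_timer(): (re)arms the timer whether or not it is pending.
 * Returns 1 if the timer was already pending, 0 otherwise.
 */
static int model_mod_timer(struct model_timer *t, unsigned long expires)
{
        int was_pending = t->pending ? 1 : 0;

        t->expires = expires;
        t->pending = true;
        return was_pending;
}

/*
 * Mimics add_timer(), but reports the error instead of crashing:
 * returns -1 where the kernel would trigger BUG_ON(timer_pending(timer)).
 */
static int model_add_timer(struct model_timer *t)
{
        if (model_timer_pending(t))
                return -1;
        model_mod_timer(t, t->expires);
        return 0;
}

/* Models the timer firing: afterwards it is no longer pending. */
static void model_timer_fire(struct model_timer *t)
{
        t->pending = false;
}
```

In this model, calling model_add_timer() twice without an intervening model_timer_fire() returns -1 on the second call, which corresponds to the crash in this report, while model_mod_timer() simply re-arms the timer. The approach Lukas proposes avoids the timer altogether by sleeping in the thread.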

(In reply to comment #4)
> Created attachment 495255 [details]
> Patch to use schedule_timeout_interruptible()
>

Have you posted this upstream to get some feedback?

(In reply to comment #5)
> (In reply to comment #4)
> > Created attachment 495255 [details]
> > Patch to use schedule_timeout_interruptible()
> >
> Have you posted this upstream to get some feedback?

Of course not. This is not upstream material; the purpose was just to verify that this really caused the problem, since I am not able to reproduce it. Now I am testing the final version of the patch and I will send it upstream soon, as it should go in regardless of whether it fixes this problem or not.

Created attachment 496229 [details]
ext4: Use schedule_timeout_interruptible() for waiting in lazyinit thread

The patch has been posted upstream. I would really appreciate it if you could test whether this fixes the problem, or whether we have a problem elsewhere. I do not have enough information to actually reproduce it.

http://marc.info/?l=linux-ext4&m=130433537707670&w=2
http://marc.info/?l=linux-ext4&m=130433537807674&w=2

Thanks!
-Lukas

Will test this tomorrow. I haven't been able to reliably reproduce this crash with the original kernel yet. My test program that caused this once basically created a bunch of multi-gigabyte files in /dev/shm, ran mkfs.ext4 with lazy_itable_init=1 on them, then loopback mounted them, created some directories containing big files (~100 MB), did various operations (read, mmap, etc.), and then unmounted the volumes.

Hi Albert,
what is the license of your test program? Can you share it so I can better see what the load is and try to reproduce it?
Thanks!
-Lukas

Hello
I haven't been able to reproduce this BUG, even without the patch. It seems that triggering the necessary sequence of events to see the bug is quite tricky. I'll try to do some more testing to come up with a reproducible test, but if you think you've fixed it, that's good enough for me.
Regards
Albert

Hi Albert,
I believe that this particular bug has been fixed with that patch, but it would have been better to be able to confirm that. But never mind, thanks a lot for the report and for the testing!
Thanks!
-Lukas

Lukas has this one well in hand, assigning over to him...
Thanks,
-Eric

This issue has been fixed with upstream commit

4ed5c033c11b33149d993734a6a8de1016e8f03f ext4: Use schedule_timeout_interruptible() for waiting in lazyinit thread

but there are more useful commits in this area:

e1290b3e62c496ade19939ce036f35bb69306820 ext4: Remove unnecessary wait_event ext4_run_lazyinit_thread()
51ce65115642b77040f5582b8d2fc8815ac450f9 ext4: fix the mount option "init_itable=n" to work as expected for n=0
1bb933fb1fa8e4cb337a0d5dfd2ff4c0dc2073e8 ext4: fix possible use-after-free in ext4_remove_li_request()

though I am not sure what the proper course of action is for Fedora... Chuck?
Thanks!
-Lukas

(In reply to comment #14)
> This issue has been fixed with upstream commit
>
> 4ed5c033c11b33149d993734a6a8de1016e8f03f ext4: Use
> schedule_timeout_interruptible() for waiting in lazyinit thread
>

This is in 2.6.38.8

> but there are more useful commits in this area:
>
> e1290b3e62c496ade19939ce036f35bb69306820 ext4: Remove unnecessary wait_event
> ext4_run_lazyinit_thread()
> 51ce65115642b77040f5582b8d2fc8815ac450f9 ext4: fix the mount option
> "init_itable=n" to work as expected for n=0
> 1bb933fb1fa8e4cb337a0d5dfd2ff4c0dc2073e8 ext4: fix possible use-after-free in
> ext4_remove_li_request()
>

That last one is also in 2.6.38.8

I'll just close this now - those other two patches don't look as critical.