Login
[x]
Log in using an account from:
Fedora Account System
Red Hat Associate
Red Hat Customer
Or login using a Red Hat Bugzilla account
Forgot Password
Login:
Hide Forgot
Create an Account
Red Hat Bugzilla – Attachment 915342 Details for
Bug 716520
cfq-iosched: CFQ can get GPF at cfq_free_io_context()
[?]
New
Simple Search
Advanced Search
My Links
Browse
Requests
Reports
Current State
Search
Tabular reports
Graphical reports
Duplicates
Other Reports
User Changes
Plotly Reports
Bug Status
Bug Severity
Non-Defaults
|
Product Dashboard
Help
Page Help!
Bug Writing Guidelines
What's new
Browser Support Policy
5.0.4.rh83 Release notes
FAQ
Guides index
User guide
Web Services
Contact
Legal
This site requires JavaScript to be enabled to function correctly, please enable it.
Comment
comment_3840344.txt (text/plain), 123.46 KB, created by
Vivek Goyal
on 2011-06-24 18:32:40 UTC
(
hide
)
Description:
Comment
Filename:
MIME Type:
Creator:
Vivek Goyal
Created:
2011-06-24 18:32:40 UTC
Size:
123.46 KB
patch
obsolete
>+++ This bug was initially created as a clone of Bug #577968 +++ > >Created attachment 403363 >dmesg output for 2.6.34-0.16.rc2.git0.fc14.x86_64 with a fault > >Actual results: > > >Expected results: >dmesg output for 2.6.34-0.16.rc2.git0.fc14.x86_64 with a fault > >Description of problem: > >What looks like a side effect of running /etc/cron.daily a general protection fault: 0000 [#1] SMP was registered triggered by "Pid: 5613, comm: ld-linux-x86-64" and with >RIP: 0010:[<ffffffff81226d77>] [<ffffffff81226d77>] cfq_free_io_context+0x18/0x34 >All that ended with "Fixing recursive fault but reboot is needed!". >Full dmesg output attached > >Version-Release number of selected component (if applicable): >kernel-2.6.34-0.16.rc2.git0.fc14.x86_64 > >How reproducible: >So far this was quite a surprise > >--- Additional comment from jmoyer@redhat.com on 2010-03-30 16:10:42 EDT --- > >/usr/src/debug/kernel-2.6.33/linux-2.6.33.x86_64/block/cfq-iosched.c:2410 >ffffffff81226d6b: 48 8b 5f 78 mov 0x78(%rdi),%rbx <======= >cfq_free_io_context(): >/usr/src/debug/kernel-2.6.33/linux-2.6.33.x86_64/block/cfq-iosched.c:2475 >ffffffff81226d6f: 49 89 fc mov %rdi,%r12 >__call_for_each_cic(): >/usr/src/debug/kernel-2.6.33/linux-2.6.33.x86_64/block/cfq-iosched.c:2410 >ffffffff81226d72: 48 85 db test %rbx,%rbx >ffffffff81226d75: 74 17 je ffffffff81226d8e <cfq_free_io_context+0x2f> >ffffffff81226d77: 48 8b 03 mov (%rbx),%rax <=========== >/usr/src/debug/kernel-2.6.33/linux-2.6.33.x86_64/block/cfq-iosched.c:2411 >ffffffff81226d7a: 48 8d 73 a8 lea -0x58(%rbx),%rsi >ffffffff81226d7e: 4c 89 e7 mov %r12,%rdi >/usr/src/debug/kernel-2.6.33/linux-2.6.33.x86_64/block/cfq-iosched.c:2410 >ffffffff81226d81: 0f 18 08 prefetcht0 (%rax) >/usr/src/debug/kernel-2.6.33/linux-2.6.33.x86_64/block/cfq-iosched.c:2411 >ffffffff81226d84: e8 53 ff ff ff callq ffffffff81226cdc <cic_free_func> > >rbx is ioc->cic_list (as cic_list is at offset 0x78 from the start of struct io_context), which the register dump shows is 0x6b6b6b6b6b6b6b6b. So, the io context has already been freed. > >Kerneloops.org shows that this bug has manifested as early as 2.6.30.8, so I don't think this is a new problem. Though 2.6.34 seems to be triggerring it more easily. > >--- Additional comment from jens.axboe@oracle.com on 2010-03-31 04:20:09 EDT --- > >OK, that's not good. Probably some tricky RCU sneakiness, I'll see if I can reproduce something like that here. > >--- Additional comment from jens.axboe@oracle.com on 2010-03-31 14:35:23 EDT --- > >Any chance someone has a full trace of this? > >--- Additional comment from jmoyer@redhat.com on 2010-03-31 14:41:58 EDT --- > >The full trace was in the attachment: > >general protection fault: 0000 [#1] SMP >last sysfs file: /sys/devices/virtual/block/dm-9/uevent >CPU 0 >Modules linked in: autofs4 w83627hf hwmon_vid sunrpc ipv6 ipt_LOG iptable_nat nf_nat ext2 uinput e100 ppdev edac_core edac_mce_amd microcode snd_mpu401 ns558 i2c_viapro parport_pc parport snd_via82xx gameport snd_ac97_codec ac97_bus snd_seq snd_pcm floppy snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd k8temp soundcore hwmon skge mii shpchp ata_generic pata_acpi firewire_ohci pata_via firewire_core crc_itu_t sata_via sata_promise radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: freq_table] > >Pid: 5613, comm: ld-linux-x86-64 Not tainted 2.6.34-0.16.rc2.git0.fc14.x86_64 #1 SK8V/To Be Filled By O.E.M. >RIP: 0010:[<ffffffff81226d77>] [<ffffffff81226d77>] cfq_free_io_context+0x18/0x34 >RSP: 0018:ffff880016713e28 EFLAGS: 00010202 >RAX: 00000001002c0f33 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000 >RDX: 00000001002c0f36 RSI: ffff880002dd0bc0 RDI: 0000000000000296 >RBP: ffff880016713e38 R08: ffff880015374680 R09: ffff880016bac4e0 >R10: 0000000000000046 R11: ffff880016713d48 R12: ffff880015374680 >R13: ffff88001665cfc8 R14: 0000000000000000 R15: ffff88001f5a8000 >FS: 00007f2122d0e7a0(0000) GS:ffff880002c00000(0000) knlGS:0000000000000000 >CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >CR2: 00007feac7253000 CR3: 000000001b77c000 CR4: 00000000000006f0 >DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >Process ld-linux-x86-64 (pid: 5613, threadinfo ffff880016712000, task ffff88001665c900) >Stack: > ffff880015374680 ffff88001665c900 ffff880016713e68 ffffffff8121f358 ><0> ffffffff8121f323 ffff88001665c900 ffff88001665cfc8 ffff880015374680 ><0> ffff880016713ea8 ffffffff8121f464 ffffffff8121f40f ffffffff8107dee4 >Call Trace: > [<ffffffff8121f358>] put_io_context+0x69/0x9c > [<ffffffff8121f323>] ? put_io_context+0x34/0x9c > [<ffffffff8121f464>] exit_io_context+0xa0/0xac > [<ffffffff8121f40f>] ? exit_io_context+0x4b/0xac > [<ffffffff8107dee4>] ? trace_hardirqs_on+0xd/0xf > [<ffffffff81054775>] do_exit+0x79d/0x7d1 > [<ffffffff81129fb7>] ? path_put+0x22/0x27 > [<ffffffff8105482d>] do_group_exit+0x84/0xb0 > [<ffffffff81054870>] sys_exit_group+0x17/0x1b > [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b >Code: c6 93 6d 22 81 e8 3d a3 e8 ff 5b 41 5c 41 5d 41 5e c9 c3 55 48 89 e5 41 54 53 0f 1f 44 00 00 48 8b 5f 78 49 89 fc 48 85 db 74 17 <48> 8b 03 48 8d 73 a8 4c 89 e7 0f 18 08 e8 53 ff ff ff 48 8b 1b >RIP [<ffffffff81226d77>] cfq_free_io_context+0x18/0x34 > RSP <ffff880016713e28> >---[ end trace 1a4cf5320fdfc46c ]--- >Fixing recursive fault but reboot is needed! > >Is that what you're looking for? > >--- Additional comment from jens.axboe@oracle.com on 2010-03-31 14:56:24 EDT --- > >Thanks Jeff, dunno how I missed that there was an attachment. > >So next question, seems this was triggered by 'readahead'. That's the RH readahead thing, correct? Does it use io priorities? Or CLONE_IO? > >--- Additional comment from jmoyer@redhat.com on 2010-03-31 15:18:19 EDT --- > >I'm not convinced readahead is the problem. I think the reporter just said that as that was the last thing in the log before the GPF. The task associated with the failure is ld-linux-x86-64, which may be related to the prelink cron job. prelink cron job does this: > >renice +19 -p $$ >/dev/null 2>&1 > >but I don't see any changing of io priorities. The prelink binary doesn't seem to use CLONE_IO. Not sure I want to dig around in ld-linux-x86_64. ;-) > >--- Additional comment from michal@harddata.com on 2010-03-31 15:24:59 EDT --- > >(In reply to comment #5) > >> So next question, seems this was triggered by 'readahead'. > >By timiing this happened while running 'cron.daily' and apparently in a >scheduler and dmesg output suggests that 'readahead' may be involved so the >above looked like a reasonable "working hypothesis" but I do not know if this is >really proven. > >> That's the RH readahead thing, correct? > >This is from a "rawhide" installation for Fedora. > >--- Additional comment from jens.axboe@oracle.com on 2010-03-31 15:39:26 EDT --- > >Jeff, I would not want you to dive into ld-linux-x86_64 either :-). > >So what else is being run by cron.daily on that system? > >Oh, and a kernel config from this kernel would help, just in case it depends on some special RCU setting (or something like that). > >I'll see if I can reproduce locally, from a look over the ioc exit/free path I didn't spot anything. I'll let something run over the next few days once I have an idea what that something should be. thu/fri/mon are holidays here, so don't expect anything before next week. > >--- Additional comment from jmoyer@redhat.com on 2010-03-31 16:17:09 EDT --- > >Created attachment 403822 >kernel config for x86_64 > >My system (F12) shows the following in cron.daily: > cups logrotate mlocate.cron readahead.cron makewhatis.cron prelink tmpwatch > >--- Additional comment from jmoyer@redhat.com on 2010-03-31 16:18:16 EDT --- > >Thanks for looking into this, Jens! I'll try to spend some time on it as well when I next come up for air. > >--- Additional comment from jmoyer@redhat.com on 2010-03-31 17:15:48 EDT --- > >FYI, you can also look here for backtrace information: > >http://kerneloops.org/guilty.php?guilty=cfq_free_io_context&version=2.6.34-rc&start=2228224&end=2260991&class=oops > >--- Additional comment from michal@harddata.com on 2010-04-11 16:29:37 EDT --- > >(In reply to comment #9) > >> My system (F12) shows the following in cron.daily: >> cups logrotate mlocate.cron readahead.cron makewhatis.cron prelink tmpwatch > >Just got that again with 2.6.34-0.28.rc3.git3.fc14.x86_64 kernel. "Call Trace" and "Code" do not really differ from those in attachment (id=403363) but this time one can find "Process gzip (pid: 5083, threadinfo ffff880013416000, task ffff880007f08000)" instead of ld-linux-x86-64 from the previous instance. > >Timestamps in logs locate that oops squarely between "starting makewhatis.cron" >and "finished makewhatis.cron" with ample margins on both sides. > >--- Additional comment from michal@harddata.com on 2010-04-19 11:40:25 EDT --- > >I have the next one with 2.6.34-0.28.rc3.git3.fc14.x86_64. This time makewhatis.cron is gone but timing implicates man-db.cron and a triggering process is gzip again. > >--- Additional comment from michal@harddata.com on 2010-04-28 19:43:32 EDT --- > >man-db.cron seems to be rather good in triggering this bug. Recently I see these recursive faults quite regularly and always when man-db.cron script is triggered by cron. The current ones with kernel-2.6.34-0.38.rc5.git0.fc14.x86_64. > >--- Additional comment from michal@harddata.com on 2010-07-14 12:52:27 EDT --- > >I was away for quite a while and not running rawhide. Upon return I got pretty quickly something familiar but this time from 2.6.35-... kernel. Here we go: > >general protection fault: 0000 [#1] SMP >last sysfs file: /sys/devices/pci0000:00/0000:00:0f.0/host0/target0:0:0/0:0:0:0/block/sda/sda11/uevent >CPU 0 >Modules linked in: autofs4 w83627hf hwmon_vid sunrpc ipv6 ipt_LOG iptable_nat nf_nat ext2 uinput snd_via82xx microcode snd_ac97_codec ac97_bus snd_seq e100 snd_pcm snd_timer snd_page_alloc ppdev edac_core edac_mce_amd snd_mpu401 k8temp snd_mpu401_uart snd_rawmidi parport_pc snd_seq_device parport i2c_viapro snd soundcore ns558 gameport skge mii shpchp ata_generic pata_acpi firewire_ohci firewire_core sata_promise pata_via sata_via crc_itu_t radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: mperf] > >Pid: 28772, comm: gzip Not tainted 2.6.35-0.31.rc4.git4.fc14.x86_64 #1 SK8V/To Be Filled By O.E.M. >RIP: 0010:[<ffffffff81228a7c>] [<ffffffff81228a7c>] __call_for_each_cic+0x21/0x3f >RSP: 0018:ffff880011537e28 EFLAGS: 00010202 >RAX: 00000001001fd43f RBX: 6b6b6b6b6b6b6b6b RCX: ffff88001bbdef48 >RDX: 00000001001fd442 RSI: ffff880002dd0d70 RDI: 0000000000000286 >RBP: ffff880011537e48 R08: 0000000000000286 R09: ffff880011537d98 >R10: ffff88001b0d1028 R11: ffff880011537df8 R12: ffff88001b0d1000 >R13: ffffffff81228ac8 R14: 0000000000000001 R15: ffff88001f5c0000 >FS: 00007f5d74174720(0000) GS:ffff880002c00000(0000) knlGS:00000000f7782720 >CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >CR2: 00007f5d73a88470 CR3: 0000000001a43000 CR4: 00000000000006f0 >DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >Process gzip (pid: 28772, threadinfo ffff880011536000, task ffff88001b6f0000) >Stack: > 0000000000000001 ffff88001b0d1000 ffff88001b6f0000 ffff88001b6f0690 ><0> ffff880011537e58 ffffffff81228aaf ffff880011537e78 ffffffff8121fc25 ><0> ffff880011537e78 ffff88001b0d1000 ffff880011537ea8 ffffffff8121fce5 >Call Trace: > [<ffffffff81228aaf>] cfq_free_io_context+0x15/0x17 > [<ffffffff8121fc25>] put_io_context+0x41/0x5e > [<ffffffff8121fce5>] exit_io_context+0x6c/0x74 > [<ffffffff8105416a>] do_exit+0x7a9/0x7d0 > [<ffffffff81496aae>] ? lockdep_sys_exit_thunk+0x35/0x67 > [<ffffffff81054427>] do_group_exit+0x88/0xb6 > [<ffffffff8105446c>] sys_exit_group+0x17/0x1b > [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b >Code: ed 00 00 41 59 48 98 5b c9 c3 55 48 89 e5 41 55 41 54 53 48 83 ec 08 0f 1f 44 00 00 48 8b 5f 78 49 89 fc 49 89 f5 48 85 db 74 15 <48> 8b 03 48 8d 73 b0 4c 89 e7 0f 18 08 41 ff d5 48 8b 1b eb e6 >RIP [<ffffffff81228a7c>] __call_for_each_cic+0x21/0x3f > RSP <ffff880011537e28> >---[ end trace 8e807d826e855c8f ]--- >Fixing recursive fault but reboot is needed! > >If this is really a new bug and/or full dmesg output is desired then please let me know. It seems to be triggered by the same man-db.cron though. > >--- Additional comment from triage@lists.fedoraproject.org on 2010-07-30 07:13:05 EDT --- > > >This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle. >Changing version to '14'. > >More information and reason for this action is here: >http://fedoraproject.org/wiki/BugZappers/HouseKeeping > >--- Additional comment from michal@harddata.com on 2010-08-05 16:05:18 EDT --- > >(In reply to comment #16) > >> Changing version to '14'. > >Should this be left as '14'? I was just hit but one of those on >a rawhide kernel 2.6.35-0.57.rc6.git5.fc15.x86_64. That trace is basically the same as the one from comment 15 for 2.6.35-0.31.rc4.git4.fc14.x86_64. > >--- Additional comment from cebbert@redhat.com on 2010-08-06 16:15:35 EDT --- > >*** Bug 585659 has been marked as a duplicate of this bug. *** > >--- Additional comment from cebbert@redhat.com on 2010-08-11 08:09:36 EDT --- > >block/cfq-iosched.c:2492: > >static void >__call_for_each_cic(struct io_context *ioc, > void (*func)(struct io_context *, struct cfq_io_context *)) >{ > struct cfq_io_context *cic; > struct hlist_node *n; > > hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list) > func(ioc, cic); >} > >#define hlist_for_each_entry_rcu(tpos, pos, head, member) \ > for (pos = rcu_dereference_raw((head)->first); \ > pos && ({ prefetch(pos->next); 1; }) && \ > ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \ > pos = rcu_dereference_raw(pos->next)) > >We've made at least one trip through the loop (i.e. head->first was valid) and now we've hit a case where ->next points to freed memory. > >--- Additional comment from cebbert@redhat.com on 2010-08-18 08:22:30 EDT --- > >*** Bug 624979 has been marked as a duplicate of this bug. *** > >--- Additional comment from cebbert@redhat.com on 2010-08-18 08:25:01 EDT --- > >Attachment 439347 has a trace from kernel 2.6.35.2-9 > >--- Additional comment from michal@harddata.com on 2010-09-07 15:02:57 EDT --- > >Created attachment 445756 >backtrace for 2.6.36-0.16.rc3.git0.fc15.x86_64 kernel > >Just to record that this general protection fault shows up in 2.6.36... kernels as well. > >--- Additional comment from michal@harddata.com on 2010-09-15 16:33:47 EDT --- > >So far 2.6.36-0.16.rc3.git0.fc15.x86_64 was the last one with which I have seen it. I cannot vouch that this oops will not return but this was a pretty regular event up to that time. Maybe it is really gone? At least on my box. > >--- Additional comment from jmoyer@redhat.com on 2010-09-15 16:49:41 EDT --- > >I haven't seen anything go by that should fix this, so I'd be surprised. Thanks for the update, though! > >--- Additional comment from yaneti@declera.com on 2010-09-16 08:41:55 EDT --- > >Created attachment 447730 >crash picture > >I've just had a very similar crash with kernel-2.6.36-0.21.rc4.git1.fc15.x86_64 >Only managed to capture the attached picture. >This is the first time ever that I've seen this. >Maybe there was some memory pressure at that point. Evolution grows a lot. > >Please ignore the taint, its something in networking that has been going on and of for a number of kernel releases without any effect on the system as far as I can tell. > >--- Additional comment from jmoyer@redhat.com on 2010-09-17 16:34:58 EDT --- > >Staring at the code isn't bringing me any close to enlightenment. Is there any way someone who can reproduce this could catch a crashdump for me? > >--- Additional comment from michal@harddata.com on 2010-09-17 18:07:23 EDT --- > >(In reply to comment #26) >> Is there any >> way someone who can reproduce this could catch a crashdump for me? > >Anywhere some detailed description explaining how to get what you would like to see? > >I have also this problem that, as noted in comment 23, I stopped seeing that while earlier running makewhatis.cron was a really good way to trigger such crash. OTOH what Chuck Ebbert wrote in comment 19 looked really convincing. >Only a good question how to test that pos->next points to something already freed. > >--- Additional comment from jmoyer@redhat.com on 2010-09-20 07:51:07 EDT --- > >You can install the 'kexec-tools' package. Modify /etc/grub.conf, appending a "crashkernel=256M" to the kernel command line. Then, chkconfig kdump on and reboot. When your system comes back up, it should build an initrd for kdump. You can check to see if kdump loaded properly with 'service kdump status'. That should report that kdump is operational. > >When the kernel next panics, your system will boot into the kdump kernel and store the contents of memory into a directory under /var/crash/. I'd need the vmcore file that gets saved, and the exact kernel version you are running to then debug the problem further. > >Chuck's comment means that there were at least two cic's active, which means that you had at least 2 processes doing I/O. Also, that list poison means exactly that there is a race between two list removal code paths. I've been trying to reproduce this by running the cron.daily jobs every minute, and then every hour (two separate tests). I also tried to keep the system busy with I/O from multiple processes over the weekend and was not able to reproduce the crash. > >--- Additional comment from michal@harddata.com on 2010-09-20 18:18:30 EDT --- > >(In reply to comment #28) >> When your system comes back up, it should build an initrd for kdump. > >I wish for that to be this simple. Unfortunately mkdumprd is utterly broken >and it gets itself into such infinite loop: >.... >+ cd .. >+ '[' '!' -f modalias ']' >+ '[' / = /sys/devices ']' >+ cd .. >+ '[' '!' -f modalias ']' >+ '[' / = /sys/devices ']' >+ cd .. >.... >It gets also stuck without -d option. Sigh! > >> When the kernel next panics, > >That does not appear to be as easy as in the past where that was a quite regular event. > >> I've been >> trying to reproduce this by running the cron.daily jobs every minute, and then >> every hour (two separate tests). > >There is no makewhatis.cron anymore which seemed particularly good in triggering that crash; mandb does not look like a suitable substitute. > >Anyway, I reinstalled and rebooted into kernel-2.6.36-0.16.rc3.git0.fc15.x86_64, as it was crashing like that, but so far no luck. > >I will try to see if I will get that crash again but, like I said, the last time it happened on September 7th on 11:12 and 11:14. Both times "comm" was dbus-uuidgen. I have the following in logs: > >Sep 7 11:11:54 dyna0 kernel: RPC: Registered udp transport module. >Sep 7 11:11:54 dyna0 kernel: RPC: Registered tcp transport module. >Sep 7 11:11:54 dyna0 kernel: RPC: Registered tcp NFSv4.1 backchannel transport >module. >Sep 7 11:11:56 dyna0 kernel: general protection fault: 0000 [#1] SMP >Sep 7 11:11:56 dyna0 kernel: last sysfs file: /sys/module/mperf/refcnt > >and later > >Sep 7 11:12:32 dyna0 kernel: readahead-collector: sorting >Sep 7 11:12:33 dyna0 kernel: readahead-collector: finished >Sep 7 11:14:13 dyna0 abrt: Kerneloops: Reported 1 kernel oopses to Abrt > >and that created another directory in /var/spool/abrt/ but no other details in logs. > >It appears that user-space changes will make tracing that much harder (or maybe even hide it completely). > >--- Additional comment from michal@harddata.com on 2010-09-20 19:48:20 EDT --- > >(In reply to comment #29) >> (In reply to comment #28) >> Unfortunately mkdumprd is utterly broken > >See bug 635893. > >--- Additional comment from michal@harddata.com on 2010-09-21 16:56:54 EDT --- > >(In reply to comment #28) > >> You can check to see if kdump loaded properly with 'service kdump status'. >> That should report that kdump is operational. > >There is a problem. 'service kdump status' reports "Kdump is operational" and I have reproduced this crash using 2.6.36-0.16.rc3.git0.fc15.x86_64 kernel. Moreover in cmdline I have 'crashkernel=256M' and also /boot/initrd-2.6.36-0.16.rc3.git0.fc15.x86_64kdump.img as created yesterday after fixing mkdumprd to actually run. > >> When the kernel next panics, your system will boot into the kdump kernel and >> store the contents of memory into a directory under /var/crash/. > >dmesg ends up with: > >RIP [<ffffffff81232c36>] __call_for_each_cic+0x21/0x3f > RSP <ffff880017899e28> >---[ end trace 438ad5fa6ce9222e ]--- >Fixing recursive fault but reboot is needed! > >but /var/crash/ is empty and no vmcore anywhere in sight. Now what? >Should initrd-2.6.36-0.16.rc3.git0.fc15.x86_64kdump.img be used instead of a corresponding initramfs while booting? > >--- Additional comment from jmoyer@redhat.com on 2010-09-21 17:12:57 EDT --- > >Sorry, I forgot that fedora doesn't set panic_on_oops by default. Add the following line to /etc/sysctl.conf: > >kernel.panic_on_oops = 1 > >Then, next time you boot, that will be set. You can set it at runtime via: > >sysctl -w kernel.panic_on_oops=1 > >I'm pretty sure that will work for a GPF. Again, sorry about this. > >--- Additional comment from michal@harddata.com on 2010-09-21 18:11:26 EDT --- > >(In reply to comment #32) >> Add the following line to /etc/sysctl.conf: >> >> kernel.panic_on_oops = 1 > >OK. Will do. Now I will have to wait for the next crash. Unfortunately >it does not happen on demand, > >I am using 2.6.36-0.16.rc3.git0.fc15.x86_64 because I know that at least with this one this was happening. > >With the last crash a trigger program was reported as "Pid: 20242, comm: gzip" >and by timestamps this happened by the end of man-db.cron run performed by cron.daily. Running that "by hand" so far does not seem to have the same effects. > >--- Additional comment from michal@harddata.com on 2010-09-24 21:34:33 EDT --- > >Created attachment 449551 >dmesg from _2.6.36-0.16.rc3.git0.fc15.x86_64 with crash at _call_for_each_cic+0x21/0x3f > >It looks like I got it this time. By timing it appears that this happened in the middle of a series of updates as perfromed by yum and messing up /var/lib/rpm/ as a side effect. I am not exactly sure what happened as the whole machine just died cutting off a local console and a network as well and also killing X server for a good measure. abrt generated backtrace claims: > >last sysfs file: /sys/devices/virtual/block/dm-9/uevent >Pid: 7124, comm: hardlink > >if that can be relied upon under circumstances. > >A produced vmcore is 282M. Looks like something not exactly of a "bugzilla size". > >--- Additional comment from michal@harddata.com on 2010-09-25 17:24:58 EDT --- > >Scratch comment 23. I got hit by this bug while running 2.6.36-0.26.rc5.git4.fc15.x86_64. I do not have anything in /var/crash from that as in the meantime 'kexec-tools' update broke /sbin/mkdumprd again making it impossible to boot with 'kdump' service active. GPF happened when I was in the process of tracing and fixing mkdumprd. > >Hopefuly data from 2.6.36-0.16.rc3.git0.fc15.x86_64 will be good enough. I have no reliable method of reproduction. > >--- Additional comment from michal@harddata.com on 2010-09-25 17:35:09 EDT --- > >Added to comment 35. > >Timestamps indicate that crash happened again while running man-db.cron from cron.daily and in 'backtrace' I have 'Pid: 10062, comm: gzip'. OTOH in /var/log/messages I can find this: > >abrtd: New crash /var/spool/abrt/kerneloops-1285446849-2141-1, processing >abrtd: Registered Action plugin 'RunApp' >abrtd: RunApp('/var/spool/abrt/kerneloops-1285446849-2141-1','test x"`cat component`" = x"xorg-x11-server-Xorg" && cp /var/log/Xorg.0.log > >and X server is indeed gone. Not sure how to interpret that. > >--- Additional comment from pmatilai@redhat.com on 2010-09-30 02:51:18 EDT --- > >I just got this too on kernel 2.6.35.4-28.fc14.x86_64, and based on log timestamps it occurred while running man-db.cron here too. > >general protection fault: 0000 [#1] SMP >last sysfs file: /sys/devices/virtual/block/dm-1/dm/name >CPU 0 >Modules linked in: fuse rfcomm sco bnep l2cap nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc cpufreq_ondemand powernow_k8 freq_table mperf nf_conntrack_netbios_ns ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 uinput snd_via82xx gameport snd_via82xx_modem snd_ac97_codec ac97_bus snd_seq snd_pcm ppdev parport_pc snd_timer shpchp snd_mpu401_uart snd_rawmidi snd_seq_device btusb bluetooth amd64_edac_mod parport rfkill snd microcode soundcore skge edac_core i2c_viapro snd_page_alloc k8temp edac_mce_amd ata_generic pata_acpi firewire_ohci pata_via firewire_core sata_via sata_promise crc_itu_t radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan] > >Pid: 29167, comm: gzip Not tainted 2.6.35.4-28.fc14.x86_64 #1 A8V Deluxe/To Be Filled By O.E.M. >RIP: 0010:[<ffffffff8122d6b8>] [<ffffffff8122d6b8>] __call_for_each_cic+0x21/0x3f >RSP: 0018:ffff88000c017e28 EFLAGS: 00010202 >RAX: 00000001014dbb1d RBX: 6b6b6b6b6b6b6b6b RCX: ffff880035acea98 >RDX: 00000001014dbb20 RSI: ffff8800047d0da0 RDI: 0000000000000286 >RBP: ffff88000c017e48 R08: 0000000000000286 R09: ffff88000c017d98 >R10: ffff88003a563df8 R11: ffff88000c017df8 R12: ffff88003a563dd0 >R13: ffffffff8122d704 R14: 0000000000000001 R15: ffff88003e680000 >FS: 00007f0c4abb5720(0000) GS:ffff880004600000(0000) knlGS:00000000f77cc850 >CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >CR2: 00007f0c4a4be770 CR3: 0000000001a43000 CR4: 00000000000006f0 >DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >Process gzip (pid: 29167, threadinfo ffff88000c016000, task ffff88001d3c8000) > 0000000000000001 ffff88003a563dd0 ffff88001d3c8000 ffff88001d3c8688 ><0> ffff88000c017e58 ffffffff8122d6eb ffff88000c017e78 ffffffff81224861 ><0> ffff88000c017e78 ffff88003a563dd0 ffff88000c017ea8 ffffffff81224921 >Call Trace: > [<ffffffff8122d6eb>] cfq_free_io_context+0x15/0x17 > [<ffffffff81224861>] put_io_context+0x41/0x5e > [<ffffffff81224921>] exit_io_context+0x6c/0x74 > [<ffffffff810551b7>] do_exit+0x7a9/0x7d0 > [<ffffffff8149d437>] ? lockdep_sys_exit_thunk+0x35/0x67 > [<ffffffff81055474>] do_group_exit+0x88/0xb6 > [<ffffffff810554b9>] sys_exit_group+0x17/0x1b > [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b >Code: ed 00 00 41 59 48 98 5b c9 c3 55 48 89 e5 41 55 41 54 53 48 83 ec 08 0f 1f 44 00 00 48 8b 5f 78 49 89 fc 49 89 f5 48 85 db 74 15 <48> 8b 03 48 8d 73 b0 4c 89 e7 0f 18 08 41 ff d5 48 8b 1b eb e6 >RIP [<ffffffff8122d6b8>] __call_for_each_cic+0x21/0x3f > RSP <ffff88000c017e28> >---[ end trace 48092b8495a64693 ]--- >Fixing recursive fault but reboot is needed! > >--- Additional comment from cebbert@redhat.com on 2010-09-30 17:35:35 EDT --- > >*** Bug 637068 has been marked as a duplicate of this bug. *** > >--- Additional comment from cebbert@redhat.com on 2010-09-30 17:36:25 EDT --- > >*** Bug 638677 has been marked as a duplicate of this bug. *** > >--- Additional comment from michal@harddata.com on 2010-09-30 21:51:47 EDT --- > >Anybody interested in vmcore produced (see comment 34 and also comment 28 for a procedure description)? Please contact me directly if yes. It took a considerable effort to produce that one. Due to a size of this file it is for me not a good candidate for a bugzilla attachment. A corresponding dmesg output is in attachment 449551. > >--- Additional comment from jmoyer@redhat.com on 2010-10-04 09:26:25 EDT --- > >I sent email privately; I am definitely interested in looking at that crash. > >--- Additional comment from jreiser@bitwagon.com on 2010-11-04 20:26:21 EDT --- > >*** Bug 520620 has been marked as a duplicate of this bug. *** > >--- Additional comment from jfrieben@hotmail.com on 2011-02-12 10:30:22 EST --- > >I do not see this issue for kernel-2.6.35.11-83.fc14.x86_64 anymore. This bug should be safe to close now. > >--- Additional comment from jmoyer@redhat.com on 2011-02-14 09:34:28 EST --- > >(In reply to comment #43) >> I do not see this issue for kernel-2.6.35.11-83.fc14.x86_64 anymore. This bug >> should be safe to close now. > >I'm glad it's not crashing for you anymore. I got a similar report from Michal. I would like to keep this bug open, though, as I believe there is still a race in the code, it's just not tickled for you anymore. > >--- Additional comment from pebolle@tiscali.nl on 2011-04-08 15:39:17 EDT --- > >0) Still occurs on vanilla v2.6.38.2 (otherwise standard Rawhide). Not a Fedora blessed kernel, but should be enough to convince people this bug can still be triggered. > >1) From /var/log/messages: > >Apr 6 11:46:40 t41 kernel: [ 573.045271] BUG: unable to handle kernel paging request at 6b6b6b6b >Apr 6 11:46:40 t41 kernel: [ 573.045411] IP: [<c05e10ae>] __call_for_each_cic+0x24/0x3a >Apr 6 11:46:40 t41 kernel: [ 573.045522] *pde = 00000000 >Apr 6 11:46:40 t41 kernel: [ 573.045581] Oops: 0000 [#1] SMP >Apr 6 11:46:40 t41 kernel: [ 573.045650] last sysfs file: /sys/devices/pci0000:00/0000:00:1f.1/host1/target1:0:0/1:0:0:0/block/sr0/uevent >Apr 6 11:46:40 t41 kernel: [ 573.045805] Modules linked in: sco bnep l2cap bluetooth fuse cpufreq_ondemand acpi_cpufreq mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntrack_netbios_ns snd_intel8x0m arc4 snd_intel8x0 ath5k snd_ac97_codec ath ac97_bus mac80211 snd_seq snd_seq_device snd_pcm ppdev cfg80211 thinkpad_acpi microcode snd_timer parport_pc rfkill e1000 iTCO_wdt snd_page_alloc joydev pcspkr i2c_i801 iTCO_vendor_support snd soundcore parport uinput ipv6 yenta_socket video radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan] >Apr 6 11:46:40 t41 kernel: [ 573.046032] >Apr 6 11:46:40 t41 kernel: [ 573.046032] Pid: 2111, comm: cdrom_id Not tainted 2.6.38.2-0.local0.fc16.i686 #1 IBM / >Apr 6 11:46:40 t41 kernel: [ 573.046032] EIP: 0060:[<c05e10ae>] EFLAGS: 00010202 CPU: 0 >Apr 6 11:46:40 t41 kernel: [ 573.046032] EIP is at __call_for_each_cic+0x24/0x3a >Apr 6 11:46:40 t41 kernel: [ 573.046032] EAX: 00000001 EBX: 6b6b6b6b ECX: 00000246 EDX: c0a9a6b0 >Apr 6 11:46:40 t41 kernel: [ 573.046032] ESI: edd3a180 EDI: c05e13c1 EBP: ed793f18 ESP: ed793f0c >Apr 6 11:46:40 t41 kernel: [ 573.046032] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 >Apr 6 11:46:40 t41 kernel: [ 573.046032] Process cdrom_id (pid: 2111, ti=ed792000 task=edf0bf90 task.ti=ed792000) >Apr 6 11:46:40 t41 kernel: [ 573.046032] Stack: >Apr 6 11:46:40 t41 kernel: [ 573.046032] edd3a180 edf0bf90 edf0c368 ed793f20 c05e10d6 ed793f2c c05d691a edd3a180 >Apr 6 11:46:40 t41 kernel: [ 573.046032] ed793f40 c05d69aa edf0bf90 00000012 ed793f74 ed793f8c c04405d6 00000085 >Apr 6 11:46:40 t41 kernel: [ 573.046032] 00000000 00000000 edf0bf90 ed799e28 edf0c19c edf0bf88 edf0c19c 00000001 >Apr 6 11:46:40 t41 kernel: [ 573.046032] Call Trace: >Apr 6 11:46:40 t41 kernel: [ 573.046032] [<c05e10d6>] cfq_free_io_context+0x12/0x14 >Apr 6 11:46:40 t41 kernel: [ 573.046032] [<c05d691a>] put_io_context+0x34/0x5c >Apr 6 11:46:40 t41 kernel: [ 573.046032] [<c05d69aa>] exit_io_context+0x68/0x6d >Apr 6 11:46:40 t41 kernel: [ 573.046032] [<c04405d6>] do_exit+0x609/0x62c >Apr 6 11:46:40 t41 kernel: [ 573.046032] [<c0440810>] do_group_exit+0x63/0x86 >Apr 6 11:46:40 t41 kernel: [ 573.046032] [<c044084b>] __wake_up_parent+0x0/0x2b >Apr 6 11:46:40 t41 kernel: [ 573.046032] [<c04030df>] sysenter_do_call+0x12/0x38 >Apr 6 11:46:40 t41 kernel: [ 573.046032] Code: e8 d8 41 e8 ff 5d c3 55 89 e5 57 56 53 3e 8d 74 26 00 8b 58 4c 89 c6 89 d7 e8 bf 22 e7 ff 85 c0 74 05 e8 bc ff ff ff 85 db 74 11 <8b> 03 0f 18 00 90 8d 53 d8 89 f0 ff d7 8b 1b eb dd 5b 5e 5f 5d >Apr 6 11:46:40 t41 kernel: [ 573.046032] EIP: [<c05e10ae>] __call_for_each_cic+0x24/0x3a SS:ESP 0068:ed793f0c >Apr 6 11:46:40 t41 kernel: [ 573.046032] CR2: 000000006b6b6b6b >Apr 6 11:46:40 t41 kernel: [ 573.074456] ---[ end trace 7ef6dd0aa5934c5e ]--- >Apr 6 11:46:40 t41 kernel: [ 573.074468] Fixing recursive fault but reboot is needed! > >--- Additional comment from pebolle@tiscali.nl on 2011-04-08 16:00:22 EDT --- > >0) And this one is from vanilla v2.6.39-rc2. Again, not Fedora blessed, but it might still be interesting: > >Apr 8 09:25:34 t41 kernel: [ 1917.490095] BUG: unable to handle kernel paging request at 6b6b6b6b >Apr 8 09:25:34 t41 kernel: [ 1917.490781] IP: [<c05e84c6>] __call_for_each_cic+0x24/0x3a >Apr 8 09:25:34 t41 kernel: [ 1917.491020] *pde = 00000000 >Apr 8 09:25:34 t41 kernel: [ 1917.491020] Oops: 0000 [#1] SMP >Apr 8 09:25:34 t41 kernel: [ 1917.491020] last sysfs file: /sys/devices/pci0000:00/0000:00:1f.1/host0/target0:0:0/0:0:0:0/block/sda/uevent >Apr 8 09:25:34 t41 kernel: [ 1917.491020] Modules linked in: bnep bluetooth fuse cpufreq_ondemand acpi_cpufreq mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter nf_conntrack_netbios_ns nf_conntrack_broadcast ip6_tables snd_intel8x0 snd_intel8x0m arc4 snd_ac97_codec ac97_bus ath5k snd_seq ath mac80211 snd_seq_device ppdev snd_pcm microcode thinkpad_acpi parport_pc cfg80211 parport e1000 snd_timer rfkill snd iTCO_wdt joydev iTCO_vendor_support i2c_i801 soundcore pcspkr snd_page_alloc uinput ipv6 yenta_socket video radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan] >Apr 8 09:25:34 t41 kernel: [ 1917.491020] >Apr 8 09:25:34 t41 kernel: [ 1917.491020] Pid: 15233, comm: mandb Not tainted 2.6.39-0.rc2.git0.0.local0.fc16.i686 #1 IBM / >Apr 8 09:25:34 t41 kernel: [ 1917.491020] EIP: 0060:[<c05e84c6>] EFLAGS: 00010202 CPU: 0 >Apr 8 09:25:34 t41 kernel: [ 1917.491020] EIP is at __call_for_each_cic+0x24/0x3a >Apr 8 09:25:34 t41 kernel: [ 1917.491020] EAX: 00000001 EBX: 6b6b6b6b ECX: 00000246 EDX: c0acaa98 >Apr 8 09:25:34 t41 kernel: [ 1917.491020] ESI: eae6fc80 EDI: c05e87d9 EBP: e546bf18 ESP: e546bf0c >Apr 8 09:25:34 t41 kernel: [ 1917.491020] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 >Apr 8 09:25:34 t41 kernel: [ 1917.491020] Process mandb (pid: 15233, ti=e546a000 task=e5ffe9f0 task.ti=e546a000) >Apr 8 09:25:34 t41 kernel: [ 1917.491020] Stack: >Apr 8 09:25:34 t41 kernel: [ 1917.491020] eae6fc80 e5ffe9f0 e5ffedc0 e546bf20 c05e84ee e546bf2c c05ddbae eae6fc80 >Apr 8 09:25:34 t41 kernel: [ 1917.491020] e546bf40 c05ddc3e e5ffe9f0 00000012 e546bf74 e546bf8c c0440821 eaa97a28 >Apr 8 09:25:34 t41 kernel: [ 1917.491020] e5fb8e08 e546bf70 c04f49ea 00000001 e5ffebf4 e5ffe9e8 e5ffebf4 00000001 >Apr 8 09:25:34 t41 kernel: [ 1917.491020] Call Trace: >Apr 8 09:25:34 t41 kernel: [ 1917.491020] [<c05e84ee>] cfq_free_io_context+0x12/0x14 >Apr 8 09:25:34 t41 kernel: [ 1917.491020] [<c05ddbae>] put_io_context+0x34/0x5c >Apr 8 09:25:34 t41 kernel: [ 1917.491020] [<c05ddc3e>] exit_io_context+0x68/0x6d >Apr 8 09:25:34 t41 kernel: [ 1917.491020] [<c0440821>] do_exit+0x634/0x657 >Apr 8 09:25:34 t41 kernel: [ 1917.491020] [<c04f49ea>] ? fsnotify_modify+0x64/0x6c >Apr 8 09:25:34 t41 kernel: [ 1917.491020] [<c0440a5b>] do_group_exit+0x63/0x86 >Apr 8 09:25:34 t41 kernel: [ 1917.491020] [<c0440a96>] sys_exit_group+0x18/0x18 >Apr 8 09:25:34 t41 kernel: [ 1917.491020] [<c082045f>] sysenter_do_call+0x12/0x38 >Apr 8 09:25:34 t41 kernel: [ 1917.491020] Code: e8 5c db e7 ff 5d c3 55 89 e5 57 56 53 3e 8d 74 26 00 8b 58 4c 89 c6 89 d7 e8 df b1 e6 ff 85 c0 74 05 e8 bc ff ff ff 85 db 74 11 <8b> 03 0f 18 00 90 8d 53 d8 89 f0 ff d7 8b 1b eb dd 5b 5e 5f 5d >Apr 8 09:25:34 t41 kernel: [ 1917.491020] EIP: [<c05e84c6>] __call_for_each_cic+0x24/0x3a SS:ESP 0068:e546bf0c >Apr 8 09:25:34 t41 kernel: [ 1917.491020] CR2: 000000006b6b6b6b >Apr 8 09:25:34 t41 kernel: [ 1917.575977] ---[ end trace 6895ae03d9c0f56d ]--- >Apr 8 09:25:34 t41 kernel: [ 1917.575982] Fixing recursive fault but reboot is needed! > >1) A quick look at the output of "objdump -l -d block/cfq-iosched.o" suggests this fault is triggered by "prefetch(pos->next)" in the hlist_for_each_entry_rcu macro in include/linux/rculist.h > >[...]block/cfq-iosched.c:2594 > 1840: e8 fc ff ff ff call 1841 <__call_for_each_cic+0x13> > 1845: 85 c0 test %eax,%eax > 1847: 74 05 je 184e <__call_for_each_cic+0x20> > 1849: e8 bc ff ff ff call 180a <rcu_read_lock_held> > 184e: 85 db test %ebx,%ebx > 1850: 74 11 je 1863 <__call_for_each_cic+0x35> >prefetch(): >[...]arch/x86/include/asm/processor.h:833 > 1852: 8b 03 mov (%ebx),%eax > 1854: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi > >(This was already hinted at in an earlier comment.) > >--- Additional comment from cebbert@redhat.com on 2011-04-13 21:44:59 EDT --- > >We could "fix" this by making deadline the default IO scheduler :/ I switched to that on all my systems long ago... > >--- Additional comment from jmoyer@redhat.com on 2011-04-14 09:22:34 EDT --- > >(In reply to comment #47) >> We could "fix" this by making deadline the default IO scheduler :/ I switched >> to that on all my systems long ago... > >I believe that's an entirely different discussion, Chuck. Deadline has its own set of issues when run on a S-ATA rotational disk (which, I might add, are very common). I understand the frustration surrounding this issue, and apologize that it hasn't been addressed in a timely fashion. > >--- Additional comment from cebbert@redhat.com on 2011-04-17 23:02:07 EDT --- > >(In reply to comment #48) >> (In reply to comment #47) >> > We could "fix" this by making deadline the default IO scheduler :/ I switched >> > to that on all my systems long ago... >> >> I believe that's an entirely different discussion, Chuck. Deadline has its own >> set of issues when run on a S-ATA rotational disk (which, I might add, are very >> common). > >What issues? I'm certainly not getting any oopses by running with deadline. > >--- Additional comment from jmoyer@redhat.com on 2011-04-18 09:12:54 EDT --- > >Background writeout can completely starve readers with deadline on SATA disks. For example, if you start a dd process writing to a file, then startup firefox or some other application that is not in cache, firefox may not start until the dd has completed. > >For those people experiencing this oops, switching to deadline is a good workaround until it is fixed. I don't think that many people are hitting this, though. > >--- Additional comment from pebolle@tiscali.nl on 2011-04-18 10:40:45 EDT --- > >(In reply to comment #50) >> I don't think that many people are hitting this, >> though. > >And, as far as I know, no one has been able to reliably reproduce this. (I tried a few things, but without success. Ie, without even once seeing this oops when I actually wanted it to occur.) > >--- Additional comment from michal@harddata.com on 2011-04-18 11:00:34 EDT --- > >(In reply to comment #51) >> And, as far as I know, no one has been able to reliably reproduce this. > >There were a long series of kernels last year where I was hitting that quite regularly. Jeffrey Moyer got from me a vmcore file produced in one of those incidents. Still it some moment it stopped and I did not see a single instance ever since. No idea why. > >--- Additional comment from vgoyal@redhat.com on 2011-04-18 11:04:33 EDT --- > >(In reply to comment #46) >> >> 1) A quick look at the output of "objdump -l -d block/cfq-iosched.o" suggests >> this fault is triggered by "prefetch(pos->next)" in the >> hlist_for_each_entry_rcu macro in include/linux/rculist.h >> >> [...]block/cfq-iosched.c:2594 >> 1840: e8 fc ff ff ff call 1841 <__call_for_each_cic+0x13> >> 1845: 85 c0 test %eax,%eax >> 1847: 74 05 je 184e <__call_for_each_cic+0x20> >> 1849: e8 bc ff ff ff call 180a <rcu_read_lock_held> >> 184e: 85 db test %ebx,%ebx >> 1850: 74 11 je 1863 <__call_for_each_cic+0x35> >> prefetch(): >> [...]arch/x86/include/asm/processor.h:833 >> 1852: 8b 03 mov (%ebx),%eax >> 1854: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi >> >> (This was already hinted at in an earlier comment.) >I am little confused between two. IIUC, previous comment had hinted pos->next >is a valid pointer but the object pos->next is pointing to has been freed. And >this comment seems to be suggesting that pos->next itself is 6b6b6b.. and we >are freed object already. > >Anyway, so the bottom line seems to be that ioc->cic_list has a cic which >has already been freed. > >I have looked at the code closely and can't find anything wrong yet. There seem >to be two paths/places where cic's are removed from ioc->cic_list. > >static void cic_free_func(struct io_context *ioc, struct cfq_io_context *cic) >{ > unsigned long flags; > unsigned long dead_key = (unsigned long) cic->key; > > BUG_ON(!(dead_key & CIC_DEAD_KEY)); > > spin_lock_irqsave(&ioc->lock, flags); > radix_tree_delete(&ioc->radix_root, dead_key >> CIC_DEAD_INDEX_SHIFT); > hlist_del_rcu(&cic->cic_list); > spin_unlock_irqrestore(&ioc->lock, flags); > > cfq_cic_free(cic); >} > >static void >cfq_drop_dead_cic(struct cfq_data *cfqd, struct io_context *ioc, > struct cfq_io_context *cic) >{ > unsigned long flags; > > WARN_ON(!list_empty(&cic->queue_list)); > BUG_ON(cic->key != cfqd_dead_key(cfqd)); > > spin_lock_irqsave(&ioc->lock, flags); > > BUG_ON(ioc->ioc_data == cic); > > radix_tree_delete(&ioc->radix_root, cfqd->cic_index); > hlist_del_rcu(&cic->cic_list); > spin_unlock_irqrestore(&ioc->lock, flags); > > cfq_cic_free(cic); >} > >In both the places we take ioc->lock to remove the cic from list and then >call cfq_cic_free() which frees up a cic only after rcu time has elapsed. > >static void cfq_cic_free(struct cfq_io_context *cic) >{ > call_rcu(&cic->rcu_head, cfq_cic_free_rcu); >} > > >Given above, I am still wondering how can we reach a situation where we get to >a cic which has been freed but still on the ioc->cic_list. > >Also a question. Does anybody know how this poisoning scheme works. I mean >these poison bytes of 6b6b6b.. are written on whole slab object whenever a >object is freed? > >--- Additional comment from jmoyer@redhat.com on 2011-04-18 11:15:30 EDT --- > >When I was working on it, my guess was that we were getting into the drop_dead_cic case, and that this was racy (no clue why we keep the cic on the rbtree at all... I think it would be safe to remove it from there). I couldn't prove it, though. > >--- Additional comment from pebolle@tiscali.nl on 2011-04-18 11:36:30 EDT --- > >(In reply to comment #53) >0) One annoyance with the code in question is that there are three members called cic_list which are all (slightly) different. I found that rather confusing ... > >> Given above, I am still wondering how can we reach a situation where we get to >> a cic which has been freed but still on the ioc->cic_list. > >1) This is all RCU stuff, which I'm rather clueless about. > >Anyhow, my hunch is that the code here runs with readers (code just walking the hlist here) and writers (code manipulating that same list) working with different copies of that list at the same time (which is what RCU seems to be all about). Normally it shouldn't matter if a reader has an outdated view of an hlist. But here it apparently does matter to some readers (and so maybe an hlist doesn't really fit the job here). But my hunch didn't help me while looking at the code until, well, I just gave up ... > >> Also a question. Does anybody know how this poisoning scheme works. I mean >> these poison bytes of 6b6b6b.. are written on whole slab object whenever a >> object is freed? > >2) I'm afraid I don't. > >--- Additional comment from jlayton@redhat.com on 2011-04-18 11:47:43 EDT --- > >(In reply to comment #53) >> >> Also a question. Does anybody know how this poisoning scheme works. I mean >> these poison bytes of 6b6b6b.. are written on whole slab object whenever a >> object is freed? >> > >Correct: > >#define POISON_FREE 0x6b /* for use-after-free poisoning */ > >...mainly this happens in cache_free_debugcheck for slab allocator. I'm a little less familiar with slub though, but I'm pretty sure it uses the same preprocessor constant. > >--- Additional comment from vgoyal@redhat.com on 2011-04-18 14:10:29 EDT --- > >(In reply to comment #55) >> (In reply to comment #53) >> 0) One annoyance with the code in question is that there are three members >> called cic_list which are all (slightly) different. I found that rather >> confusing ... >> >> > Given above, I am still wondering how can we reach a situation where we get to >> > a cic which has been freed but still on the ioc->cic_list. >> >> 1) This is all RCU stuff, which I'm rather clueless about. >> >> Anyhow, my hunch is that the code here runs with readers (code just walking the >> hlist here) and writers (code manipulating that same list) working with >> different copies of that list at the same time (which is what RCU seems to be >> all about). Normally it shouldn't matter if a reader has an outdated view of an >> hlist. But here it apparently does matter to some readers (and so maybe an >> hlist doesn't really fit the job here). But my hunch didn't help me while >> looking at the code until, well, I just gave up ... > >I have checked that readers are always running with rcu read lock held. That will make sure that even if a writer has removed a cic from ioc->cic_list >it will not free the object till we drop the rcu read lock (until and unless rcu has stopped working :-)) > >- Because we are doing call_rcu() before freeing the cic, we should never run into a situation where reader gets the reference to the freed object as long as we have taken the read lock. (That's how I understand it). > >Now tricky bit is why we are running into the freed object despite the fact that we are holding rcu read lock in cfq_free_io_context() path. > >--- Additional comment from vgoyal@redhat.com on 2011-04-18 14:17:21 EDT --- > >(In reply to comment #54) >> When I was working on it, my guess was that we were getting into the >> drop_dead_cic case, and that this was racy (no clue why we keep the cic on the >> rbtree at all... I think it would be safe to remove it from there). I couldn't >> prove it, though. > >Jeff, > >I guess there are two reasons. > >- ioc->cic_list and radix tree is protected by ioc->ioc_lock. May be we are trying to avoid taking that lock when process is exiting and just mark cic dead. > >- If we drop it from rb-tree then we will lose track of old/dead cic which needs to be dropped. By keeping it in tree, it pops up in our search and we can drop it hence no need to look into separate list. > >Do you recall details that why do you think drop_dead_cic is racy. I do see that we are taking ioc->lock before we modify list and rb-tree and freeing of object is done under call_rcu(). So that sounds good atleast on the surface. > >--- Additional comment from vgoyal@redhat.com on 2011-04-18 14:19:20 EDT --- > >Paul, is this problem reproducible consistently for you. One of the big stumbling blocks here is that we can't reproduce it consistently to dive deeper. > >Is there any specific config option enabled like all the preemption stuff in your config. > >--- Additional comment from jmoyer@redhat.com on 2011-04-18 14:40:43 EDT --- > >(In reply to comment #58) >> (In reply to comment #54) >> > When I was working on it, my guess was that we were getting into the >> > drop_dead_cic case, and that this was racy (no clue why we keep the cic on the >> > rbtree at all... I think it would be safe to remove it from there). I couldn't >> > prove it, though. >> >> Jeff, >> >> I guess there are two reasons. >> >> - ioc->cic_list and radix tree is protected by ioc->ioc_lock. May be we are >> trying to avoid taking that lock when process is exiting and just mark cic >> dead. > >Could be. I'd have to check to see if taking the lock could cause problems. > >> - If we drop it from rb-tree then we will lose track of old/dead cic which >> needs to be dropped. By keeping it in tree, it pops up in our search and we can >> drop it hence no need to look into separate list. > >It should be removed anyway when the grace period expires, no? My memory is pretty fuzzy on this, and I don't have time to dig back in right now, sorry. > >> Do you recall details that why do you think drop_dead_cic is racy. I do see >> that we are taking ioc->lock before we modify list and rb-tree and freeing of >> object is done under call_rcu(). So that sounds good atleast on the surface. > >Fuzzy memory. As you know, I was also trying to perform io_context tracking throughout the I/O stack, so some of my memories may be from kernels with my patches applied. > >--- Additional comment from pebolle@tiscali.nl on 2011-04-18 14:43:49 EDT --- > >(In reply to comment #59) >> Paul, is this problem reproducible consistently for you. One of the big >> stumbling blocks here is that we can't reproduce it consistently to dive >> deeper. > >No, see comment #51. > >(The context of the oopses in comment #45 and comment #46 - which isn't clear from the comments themselves - could be said to be something like "a period with a lot of (relatively small?) IO". I seem to remember that comment #46 was triggered by installing a new kernel rpm. But, again, I haven't been able to reproduce this oops.) > >--- Additional comment from pebolle@tiscali.nl on 2011-04-22 16:28:29 EDT --- > >For what it's worth, shutting down (not Fedora distributed) vanilla v2.6.39-rc3 just gave me this: > >Apr 22 21:57:21 t41 kernel: [253346.725182] BUG: unable to handle kernel paging request at 6b6b6b6b >Apr 22 21:57:21 t41 kernel: [253346.726034] IP: [<c05e850a>] __call_for_each_cic+0x24/0x3a >Apr 22 21:57:21 t41 kernel: [253346.726034] *pde = 00000000 >Apr 22 21:57:21 t41 kernel: [253346.726034] Oops: 0000 [#1] SMP >Apr 22 21:57:21 t41 kernel: [253346.726034] last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A03:00/device:01/PNP0C09:00/PNP0C0A:00/power_supply/ >BAT0/energy_full >Apr 22 21:57:21 t41 kernel: [253346.726034] Modules linked in: bnep bluetooth fuse cpufreq_ondemand acpi_cpufreq mperf ip6t_REJECT nf_conntrack_ipv6 nf_ >defrag_ipv6 ip6table_filter nf_conntrack_netbios_ns nf_conntrack_broadcast ip6_tables snd_intel8x0 snd_intel8x0m snd_ac97_codec ac97_bus snd_seq arc4 sn >d_seq_device ath5k ath mac80211 ppdev snd_pcm microcode thinkpad_acpi cfg80211 parport_pc snd_timer snd e1000 parport rfkill iTCO_wdt soundcore iTCO_ven >dor_support snd_page_alloc pcspkr joydev i2c_i801 uinput ipv6 yenta_socket video radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scs >i_wait_scan] >Apr 22 21:57:21 t41 kernel: [253346.726034] >Apr 22 21:57:21 t41 kernel: [253346.726034] Pid: 1405, comm: krb5-auth-dialo Tainted: G W 2.6.39-0.rc3.git0.0.local0.fc16.i686 #1 IBM / > >Apr 22 21:57:21 t41 kernel: [253346.726034] EIP: 0060:[<c05e850a>] EFLAGS: 00010202 CPU: 0 >Apr 22 21:57:21 t41 kernel: [253346.726034] EIP is at __call_for_each_cic+0x24/0x3a >Apr 22 21:57:21 t41 kernel: [253346.726034] EAX: 00000001 EBX: 6b6b6b6b ECX: 00000246 EDX: c0acaa98 >Apr 22 21:57:21 t41 kernel: [253346.726034] ESI: c802f200 EDI: c05e881d EBP: eef91dec ESP: eef91de0 >Apr 22 21:57:21 t41 kernel: [253346.726034] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 >Apr 22 21:57:21 t41 kernel: [253346.726034] Process krb5-auth-dialo (pid: 1405, ti=eef90000 task=ee043f90 task.ti=eef90000) >Apr 22 21:57:21 t41 kernel: [253346.726034] Stack: >Apr 22 21:57:21 t41 kernel: [253346.726034] c802f200 ee043f90 ee044360 eef91df4 c05e8532 eef91e00 c05ddbf2 c802f200 >Apr 22 21:57:21 t41 kernel: [253346.726034] eef91e14 c05ddc82 ee043f90 00000000 eef91e48 eef91e60 c04407dd eef91e24 >Apr 22 21:57:21 t41 kernel: [253346.726034] c0465bac eef91e38 c045af59 c0bb46b8 ee044194 ee043f88 ee044194 00000001 >Apr 22 21:57:21 t41 kernel: [253346.726034] Call Trace: >Apr 22 21:57:21 t41 kernel: [253346.726034] [<c05e8532>] cfq_free_io_context+0x12/0x14 >Apr 22 21:57:21 t41 kernel: [253346.726034] [<c05ddbf2>] put_io_context+0x34/0x5c >Apr 22 21:57:21 t41 kernel: [253346.726034] [<c05ddc82>] exit_io_context+0x68/0x6d >Apr 22 21:57:21 t41 kernel: [253346.726034] [<c04407dd>] do_exit+0x634/0x657 >Apr 22 21:57:21 t41 kernel: [253346.726034] [<c0465bac>] ? trace_hardirqs_off+0xb/0xd >Apr 22 21:57:21 t41 kernel: [253346.726034] [<c045af59>] ? local_clock+0x32/0x49 >Apr 22 21:57:21 t41 kernel: [253346.726034] [<c0440a17>] do_group_exit+0x63/0x86 >Apr 22 21:57:21 t41 kernel: [253346.726034] [<c044c600>] get_signal_to_deliver+0x3a9/0x3c0 >Apr 22 21:57:21 t41 kernel: [253346.726034] [<c0402615>] do_signal+0x5b/0x4f0 >Apr 22 21:57:21 t41 kernel: [253346.726034] [<c0466087>] ? lock_release_holdtime.part.8+0x5d/0x63 >Apr 22 21:57:21 t41 kernel: [253346.726034] [<c046889e>] ? lock_release+0x166/0x16d >Apr 22 21:57:21 t41 kernel: [253346.726034] [<c081a503>] ? _raw_spin_unlock+0x22/0x25 >Apr 22 21:57:21 t41 kernel: [253346.726034] [<c05008ac>] ? set_close_on_exec+0xa3/0xaa >Apr 22 21:57:21 t41 kernel: [253346.726034] [<c04fce1f>] ? path_put+0x1a/0x1d >Apr 22 21:57:21 t41 kernel: [253346.726034] [<c0402d3f>] do_notify_resume+0x26/0x6f >Apr 22 21:57:21 t41 kernel: [253346.726034] [<c081a868>] work_notifysig+0x13/0x1b >Apr 22 21:57:21 t41 kernel: [253346.726034] Code: e8 d0 da e7 ff 5d c3 55 89 e5 57 56 53 3e 8d 74 26 00 8b 58 4c 89 c6 89 d7 e8 53 b1 e6 ff 85 c0 74 05 e8 bc ff ff ff 85 db 74 11 <8b> 03 0f 18 00 90 8d 53 d8 89 f0 ff d7 8b 1b eb dd 5b 5e 5f 5d >Apr 22 21:57:21 t41 kernel: [253346.726034] EIP: [<c05e850a>] __call_for_each_cic+0x24/0x3a SS:ESP 0068:eef91de0 >Apr 22 21:57:21 t41 kernel: [253346.726034] CR2: 000000006b6b6b6b > >Just slightly different from the Oopses I previously submitted. > >--- Additional comment from cebbert@redhat.com on 2011-04-23 02:18:48 EDT --- > >__call_for_each_cic() has been removed by this commit in 2.6.39: > >http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=5f45c69589b7d2953584e6cd0b31e35dbe960ad0 > >--- Additional comment from pebolle@tiscali.nl on 2011-04-29 15:51:01 EDT --- > >(In reply to comment #63) >> __call_for_each_cic() has been removed by this commit in 2.6.39: >> >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=5f45c69589b7d2953584e6cd0b31e35dbe960ad0 > >0) This Oops (generated with non-Fedroa blessed vanilla 2.6.39-rc5) suggests that commit (which is included in 2.6.39-rc5, if my memory is correct) was not enough to beat mandb into submission: > >BUG: unable to handle kernel paging request at 6b6b6b6b >IP: [<c05e859d>] call_for_each_cic+0x29/0x44 >*pde = 00000000 >Oops: 0000 [#1] SMP >last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A03:00/device:01/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/energy_full >Modules linked in: bnep bluetooth fuse cpufreq_ondemand acpi_cpufreq mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_netbios_ns ip6table_filter nf_conntrack_broadcast ip6_tables snd_intel8x0m snd_intel8x0 arc4 snd_ac97_codec ath5k ac97_bus snd_seq ath snd_seq_device mac80211 snd_pcm ppdev thinkpad_acpi cfg80211 microcode snd_timer parport_pc e1000 snd rfkill parport iTCO_wdt soundcore snd_page_alloc iTCO_vendor_support joydev i2c_i801 pcspkr uinput ipv6 yenta_socket video radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan] > >Pid: 11397, comm: mandb Not tainted 2.6.39-0.rc5.git0.0.local.fc16.i686 #1 IBM / >EIP: 0060:[<c05e859d>] EFLAGS: 00010202 CPU: 0 >EIP is at call_for_each_cic+0x29/0x44 >EAX: 00000001 EBX: 6b6b6b6b ECX: 00000246 EDX: c0ad4a98 >ESI: ecfd0980 EDI: c05e88a4 EBP: c670bf18 ESP: c670bf0c > DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 >Process mandb (pid: 11397, ti=c670a000 task=d40769f0 task.ti=c670a000) >Stack: > ecfd0980 d40769f0 d4076dc0 c670bf20 c05e85ca c670bf2c c05ddc86 ecfd0980 > c670bf40 c05ddd16 d40769f0 00000012 c670bf74 c670bf8c c0440813 f2a182e8 > d401ac08 c670bf70 c04f48da 00000001 d4076bf4 d40769e8 d4076bf4 00000001 >Call Trace: > [<c05e85ca>] cfq_free_io_context+0x12/0x14 > [<c05ddc86>] put_io_context+0x34/0x5c > [<c05ddd16>] exit_io_context+0x68/0x6d > [<c0440813>] do_exit+0x63e/0x661 > [<c04f48da>] ? fsnotify_modify+0x64/0x6c > [<c0440a4d>] do_group_exit+0x63/0x86 > [<c0440a88>] sys_exit_group+0x18/0x18 > [<c082551f>] sysenter_do_call+0x12/0x38 >Code: 5d c3 55 89 e5 57 56 53 3e 8d 74 26 00 89 c6 89 d7 e8 3a ec ff ff 8b 5e 4c e8 00 b1 e6 ff 85 c0 74 05 e8 b7 ff ff ff 85 db 74 11 <8b> 03 0f 18 00 90 8d 53 d8 89 f0 ff d7 8b 1b eb dd e8 49 ec ff >EIP: [<c05e859d>] call_for_each_cic+0x29/0x44 SS:ESP 0068:c670bf0c >CR2: 000000006b6b6b6b >---[ end trace 70dea08ef41fda48 ]--- >Fixing recursive fault but reboot is needed! >BUG: unable to handle kernel paging request at 6b6b6b6b >IP: [<c05e859d>] call_for_each_cic+0x29/0x44 >*pde = 00000000 >Oops: 0000 [#2] SMP >last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A03:00/device:01/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/energy_full >Modules linked in: bnep bluetooth fuse cpufreq_ondemand acpi_cpufreq mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_netbios_ns ip6table_filter nf_conntrack_broadcast ip6_tables snd_intel8x0m snd_intel8x0 arc4 snd_ac97_codec ath5k ac97_bus snd_seq ath snd_seq_device mac80211 snd_pcm ppdev thinkpad_acpi cfg80211 microcode snd_timer parport_pc e1000 snd rfkill parport iTCO_wdt soundcore snd_page_alloc iTCO_vendor_support joydev i2c_i801 pcspkr uinput ipv6 yenta_socket video radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan] > >Pid: 8732, comm: mandb Tainted: G D 2.6.39-0.rc5.git0.0.local.fc16.i686 #1 IBM / >EIP: 0060:[<c05e859d>] EFLAGS: 00010202 CPU: 0 >EIP is at call_for_each_cic+0x29/0x44 >EAX: 00000000 EBX: 6b6b6b6b ECX: 00000286 EDX: 025d2b76 >ESI: ecfd0700 EDI: c05e88a4 EBP: d88a1f18 ESP: d88a1f0c > DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 >Process mandb (pid: 8732, ti=d88a0000 task=f2df69f0 task.ti=d88a0000) >Stack: > ecfd0700 f2df69f0 f2df6dc0 d88a1f20 c05e85ca d88a1f2c c05ddc86 ecfd0700 > d88a1f40 c05ddd16 f2df69f0 00000012 d88a1f74 d88a1f8c c0440813 ef3c6000 > cb778508 d88a1f70 c04f48da 00000001 f2df6bf4 f2df69e8 f2df6bf4 00000001 >Call Trace: > [<c05e85ca>] cfq_free_io_context+0x12/0x14 > [<c05ddc86>] put_io_context+0x34/0x5c > [<c05ddd16>] exit_io_context+0x68/0x6d > [<c0440813>] do_exit+0x63e/0x661 > [<c04f48da>] ? fsnotify_modify+0x64/0x6c > [<c04fcd68>] ? path_put+0x1a/0x1d > [<c0440a4d>] do_group_exit+0x63/0x86 > [<c0440a88>] sys_exit_group+0x18/0x18 > [<c082551f>] sysenter_do_call+0x12/0x38 >Code: 5d c3 55 89 e5 57 56 53 3e 8d 74 26 00 89 c6 89 d7 e8 3a ec ff ff 8b 5e 4c e8 00 b1 e6 ff 85 c0 74 05 e8 b7 ff ff ff 85 db 74 11 <8b> 03 0f 18 00 90 8d 53 d8 89 f0 ff d7 8b 1b eb dd e8 49 ec ff >EIP: [<c05e859d>] call_for_each_cic+0x29/0x44 SS:ESP 0068:d88a1f0c >CR2: 000000006b6b6b6b >---[ end trace 70dea08ef41fda49 ]--- >Fixing recursive fault but reboot is needed! >BUG: unable to handle kernel paging request at 6b6b6b6b >IP: [<c05e859d>] call_for_each_cic+0x29/0x44 >*pde = 00000000 >Oops: 0000 [#3] SMP >last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A03:00/device:01/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/energy_full >Modules linked in: bnep bluetooth fuse cpufreq_ondemand acpi_cpufreq mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_netbios_ns ip6table_filter nf_conntrack_broadcast ip6_tables snd_intel8x0m snd_intel8x0 arc4 snd_ac97_codec ath5k ac97_bus snd_seq ath snd_seq_device mac80211 snd_pcm ppdev thinkpad_acpi cfg80211 microcode snd_timer parport_pc e1000 snd rfkill parport iTCO_wdt soundcore snd_page_alloc iTCO_vendor_support joydev i2c_i801 pcspkr uinput ipv6 yenta_socket video radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan] > >Pid: 14230, comm: mandb Tainted: G D 2.6.39-0.rc5.git0.0.local.fc16.i686 #1 IBM / >EIP: 0060:[<c05e859d>] EFLAGS: 00010202 CPU: 0 >EIP is at call_for_each_cic+0x29/0x44 >EAX: 00000000 EBX: 6b6b6b6b ECX: 00000286 EDX: 025dbc75 >ESI: ecfd0b80 EDI: c05e88a4 EBP: da0bbf18 ESP: da0bbf0c > DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 >Process mandb (pid: 14230, ti=da0ba000 task=ebb30000 task.ti=da0ba000) >Stack: > ecfd0b80 ebb30000 ebb303d0 da0bbf20 c05e85ca da0bbf2c c05ddc86 ecfd0b80 > da0bbf40 c05ddd16 ebb30000 00000012 da0bbf74 da0bbf8c c0440813 ef3c68b8 > ca4eef08 da0bbf70 c04f48da 00000001 ebb30204 ebb2fff8 ebb30204 00000001 >Call Trace: > [<c05e85ca>] cfq_free_io_context+0x12/0x14 > [<c05ddc86>] put_io_context+0x34/0x5c > [<c05ddd16>] exit_io_context+0x68/0x6d > [<c0440813>] do_exit+0x63e/0x661 > [<c04f48da>] ? fsnotify_modify+0x64/0x6c > [<c04fcd68>] ? path_put+0x1a/0x1d > [<c0440a4d>] do_group_exit+0x63/0x86 > [<c0440a88>] sys_exit_group+0x18/0x18 > [<c082551f>] sysenter_do_call+0x12/0x38 >Code: 5d c3 55 89 e5 57 56 53 3e 8d 74 26 00 89 c6 89 d7 e8 3a ec ff ff 8b 5e 4c e8 00 b1 e6 ff 85 c0 74 05 e8 b7 ff ff ff 85 db 74 11 <8b> 03 0f 18 00 90 8d 53 d8 89 f0 ff d7 8b 1b eb dd e8 49 ec ff >EIP: [<c05e859d>] call_for_each_cic+0x29/0x44 SS:ESP 0068:da0bbf0c >CR2: 000000006b6b6b6b >---[ end trace 70dea08ef41fda4a ]--- >Fixing recursive fault but reboot is needed! > >--- Additional comment from pebolle@tiscali.nl on 2011-05-01 06:42:00 EDT --- > >0) I just hit (basically) the Oops of comment #64 again, once again triggered by mandb. > >1) I won't paste the backtrace, that should be noise by now. But we might be able to make this reproducible, as the events seem to be: >- install a few dozen packages that have been updated for Rawhide; >- reboot, because something low level has been updated so one might as wel just reboot; >- wait until (I guess) cron fires of mandb (after, maybe, 15 minutes). > >2) Would there be a way to force mandb to do what it does - including its side effects, ie generate troublesome disk activity somehow - in order to trigger this manually? > >--- Additional comment from jreiser@bitwagon.com on 2011-05-01 11:04:32 EDT --- > >(In reply to comment #65) >> ... the events seem to be: >> - install a few dozen packages that have been updated for Rawhide; >> - reboot, because something low level has been updated so one might as wel just >> reboot; >> - wait until (I guess) cron fires of mandb (after, maybe, 15 minutes). > >Please post the list of packages that were updated. On my system the list appears in /var/log/messages: >----- >Apr 30 11:12:23 f14-64 yum[3593]: Updated: firefox-3.6.17-1.fc14.x86_64 >----- > >If it is cron (or anacron ?) that runs mandb, then the mechanism is by invoking a shell script. So find that shell script (look soon after reboot), and post the script. > >--- Additional comment from vgoyal@redhat.com on 2011-05-02 10:15:04 EDT --- > >(In reply to comment #65) >> 0) I just hit (basically) the Oops of comment #64 again, once again triggered >> by mandb. >> >> 1) I won't paste the backtrace, that should be noise by now. But we might be >> able to make this reproducible, as the events seem to be: >> - install a few dozen packages that have been updated for Rawhide; >> - reboot, because something low level has been updated so one might as wel just >> reboot; >> - wait until (I guess) cron fires of mandb (after, maybe, 15 minutes). >> >> 2) Would there be a way to force mandb to do what it does - including its side >> effects, ie generate troublesome disk activity somehow - in order to trigger >> this manually? > >Bolle, is it possible to enable kdump on this box and capture the vmcore whenever this happens again. > >Also can you please keep the kernel debug info compiled (CONFIG_DEBUG_INFO=y) so that I can open the vmcore using "crash" and try to get some more information out. > >--- Additional comment from cebbert@redhat.com on 2011-05-02 21:13:29 EDT --- > >(In reply to comment #64) >> (In reply to comment #63) >> > __call_for_each_cic() has been removed by this commit in 2.6.39: >> > >> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=5f45c69589b7d2953584e6cd0b31e35dbe960ad0 >> >> 0) This Oops (generated with non-Fedroa blessed vanilla 2.6.39-rc5) suggests >> that commit (which is included in 2.6.39-rc5, if my memory is correct) was not >> enough to beat mandb into submission: >> > >Confirmed, that commit is in -rc5 > >--- Additional comment from vgoyal@redhat.com on 2011-05-03 09:01:19 EDT --- > >(In reply to comment #68) >> (In reply to comment #64) >> > (In reply to comment #63) >> > > __call_for_each_cic() has been removed by this commit in 2.6.39: >> > > >> > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=5f45c69589b7d2953584e6cd0b31e35dbe960ad0 >> > >> > 0) This Oops (generated with non-Fedroa blessed vanilla 2.6.39-rc5) suggests >> > that commit (which is included in 2.6.39-rc5, if my memory is correct) was not >> > enough to beat mandb into submission: >> > >> >> Confirmed, that commit is in -rc5 > >That commit was put to fix a different race and is not going to help with this problem. > >--- Additional comment from pebolle@tiscali.nl on 2011-05-07 14:38:20 EDT --- > >(In reply to comment #66) > >The trick turned out to be to (basically) run > /etc/cron.daily/man-db.cron > >with > OPTS="-c" > >(See also /etc/sysconfig/man-db.) > >Running that command triggered this Oops (within a few minutes). So, finally, it looks like I can trigger this manually. Progress, of sorts ... > >--- Additional comment from pebolle@tiscali.nl on 2011-05-07 14:45:14 EDT --- > >(In reply to comment #67) >> Bolle, > >0) Please feel free to use my first name (Paul). But in bugzilla, one isn't even impolite when not using the other commenter's name at all. > >> is it possible to enable kdump on this box and capture the vmcore >> whenever this happens again. >> >> Also can you please keep the kernel debug info compiled (CONFIG_DEBUG_INFO=y) >> so that I can open the vmcore using "crash" and try to get some more >> information out. > >1) I'm about to recompile (vanilla) v2.6.39-rc6, using a (not too old) Rawhide i686 .config. Does it have all the needed CONFIG_*s? > >2) Is Documentation/kdump/kdump.txt the best introduction to this (rather intimidating) stuff? > >--- Additional comment from pebolle@tiscali.nl on 2011-05-08 15:40:20 EDT --- > >(In reply to comment #71) >> 2) Is Documentation/kdump/kdump.txt the best introduction to this (rather >> intimidating) stuff? > >OK, I'm in danger of ending up shaving the yak: kexec-tools is installed, kdump is apparently happy when loading a "capture kernel", the "system kernel" is set to panic on oops, but somehow refuses to boot into the "capture kernel" and simply (seems to) hang(s)... What should I try next to solve that issue? > >--- Additional comment from pebolle@tiscali.nl on 2011-05-09 08:49:51 EDT --- > >(In reply to comment #72) >> [...] >> simply (seems to) hang(s)... What should I try next to solve that issue? > >Attaching a serial console learned me that the the "capture kernel" actually is being booted and also learned me that last thing this "capture kernel" prints is >> Trying to unpack rootfs image as initramfs... > >Any suggestions what I might be doing wrong? > >--- Additional comment from vgoyal@redhat.com on 2011-05-09 09:20:36 EDT --- > >(In reply to comment #73) >> (In reply to comment #72) >> > [...] >> > simply (seems to) hang(s)... What should I try next to solve that issue? >> >> Attaching a serial console learned me that the the "capture kernel" actually is >> being booted and also learned me that last thing this "capture kernel" prints >> is >> > Trying to unpack rootfs image as initramfs... >> >> Any suggestions what I might be doing wrong? > >How much memory are you reserving? Try increasing the amount of memory reserved. Also you need not to wait for it to crash. First you can manually crash the system using "echo c > /proc/sysrq-trigger". > >--- Additional comment from michal@harddata.com on 2011-05-09 09:39:48 EDT --- > >(In reply to comment #72) > >> ... but somehow refuses to boot into the "capture kernel" and >> simply (seems to) hang(s)... What should I try next to solve that issue? > >It is not clear from your description but you are possibly running into >bug 635893 and/or bug 654431. A workaround was to boot with a regular kernel, >by hook and by crook do a job of /sbin/mkdumprd for a kdump one and reboot >into a desired kernel. In an absence of a needed initrd /sbin/mkdumprd runs >in a boot sequence and this gets stuck. > >--- Additional comment from pebolle@tiscali.nl on 2011-05-10 04:55:55 EDT --- > >(In reply to comment #74) >> How much memory are you reserving? Try increasing the amount of memory >> reserved. Also you need not to wait for it to crash. > >0) I was reserving (the suggested amount of) 64M. > >> First you can manually >> crash the system using "echo c > /proc/sysrq-trigger". > >1) You're not just interested in the vmcore at the moment of the Oops (upgraded to a panic, so kexec kicks in)? > >(In reply to comment #75) >> In an absence of a needed initrd /sbin/mkdumprd runs >> in a boot sequence and this gets stuck. > >2) This doesn't seem to be the issue. At the boot of the "system kernel" /etc/rc.d/init.d/kdump did generate a initrd for for the "capture kernel". > >3) My trouble apparently was that I was using the initramfs for the "system kernel" while I should have been using the initramfs (called initrd) generated by kdump for the "capture kernel". However using that initramfs doesn't help much since kdump doesn't generate an initramfs compatible with my setup (ie, root on ext3 on LVM2). So the kexec reboot now stopped at the Busybox prompt of a session without rootfs and without the means, as far as I could discover, to mount a rootfs from a LVM2 logical volume. > >4) I filed bug #703369 for that last issue, because it seems if I had a initramfs.conf kdump would invoke dracut, and dracut should be able to generate an initramfs compatible with my setup (which essentially is a Fedora default setup, isn't it?). > >--- Additional comment from vgoyal@redhat.com on 2011-05-10 08:16:35 EDT --- > >(In reply to comment #76) >> (In reply to comment #74) >> > How much memory are you reserving? Try increasing the amount of memory >> > reserved. Also you need not to wait for it to crash. >> >> 0) I was reserving (the suggested amount of) 64M. > >Depending on kernel and initramfs size it can take much more than that. I would suggest reserving 256M to be safe. > >> >> > First you can manually >> > crash the system using "echo c > /proc/sysrq-trigger". >> >> 1) You're not just interested in the vmcore at the moment of the Oops (upgraded >> to a panic, so kexec kicks in)? > >I suggested above just to make sure that you are able to capture dump when you crash the system manually. If that works, then we are reasonably sure that it will work when CFQ crashes. So yes, I am interested in the vmcore when CFQ actually crashes. > >> >> (In reply to comment #75) >> > In an absence of a needed initrd /sbin/mkdumprd runs >> > in a boot sequence and this gets stuck. >> >> 2) This doesn't seem to be the issue. At the boot of the "system kernel" >> /etc/rc.d/init.d/kdump did generate a initrd for for the "capture kernel". >> >> 3) My trouble apparently was that I was using the initramfs for the "system >> kernel" while I should have been using the initramfs (called initrd) generated >> by kdump for the "capture kernel". > >It is not mandatory. One can use dracut generated initramfs also. Just that it is much bigger than kdump generated initrd and does not have some automatic kdump known options for saving dump. > >> However using that initramfs doesn't help >> much since kdump doesn't generate an initramfs compatible with my setup (ie, >> root on ext3 on LVM2). So the kexec reboot now stopped at the Busybox prompt of >> a session without rootfs and without the means, as far as I could discover, to >> mount a rootfs from a LVM2 logical volume. > >Is your network up. Can you scp the vmcore over network to a different machine from busybox prompt. > >So root is not mounted because some driver or dm modules are not packed in initrd? You can compile these in your kernel and hence you don't have to depend on kdump initrd packing those in. > >> >> 4) I filed bug #703369 for that last issue, because it seems if I had a >> initramfs.conf kdump would invoke dracut, and dracut should be able to generate >> an initramfs compatible with my setup (which essentially is a Fedora default >> setup, isn't it?). > >If you want, you can disable kdump service entirely and rely on loading kdump kernel manually. Reserve 256M of memory and you can use dracut generated >initramfs for command line below. > >kdump -p /boot/vmlinuz --initrd=/boot/initrd --append="`cat /proc/cmdline` maxcpus=1, irqpoll, reset_devices" > >--- Additional comment from vgoyal@redhat.com on 2011-05-10 08:24:04 EDT --- > >Paul, > >Can you please upload your config file for the kernel you are able to recreate the crash. I will use the same config on upstream kernel and boot on fedora and try to reproduce the issue. > >--- Additional comment from pebolle@tiscali.nl on 2011-05-10 15:18:12 EDT --- > >Created attachment 498131 >my v2.6.39-rc6 vanilla config > >(In reply to comment #78) >> Can you please upload your config file for the kernel you are able to recreate >> the crash. I will use the same config on upstream kernel and boot on fedora and >> try to reproduce the issue. > >This is based on the first appearance of v2.6.39-rc6 in Fedora's kernel.git, but only "rpmbuilt" as a vanilla kernel, so it should be close to what Fedora might be shipping by now. > >--- Additional comment from pebolle@tiscali.nl on 2011-05-11 06:25:15 EDT --- > >0) Finally managed to create a vmcore dump. crashkernel=256M helped. It also feels like using a serial console is the only (reliable) way to use kexec (see https://lkml.org/lkml/2011/4/12/210 for a similar report). Perhaps throttling the output to 115200 baud stops something, somewhere from being racey. But, this is just a feeling. > >1) Now I have a (2048 - 256 =) 1792 MB core file. Not practical to upload. > >2) First impressions with gdb: > >Reading symbols from /home/[...]/Rpmbuild/BUILD/kernel-2.6.38.fc16/linux-2.6.38.i686/vmlinux...done. >[New LWP 14366] >#0 prefetch (x=Cannot access memory at address 0x6b6b6b6b >) > at /home/[...]/Rpmbuild/BUILD/kernel-2.6.38.fc16/linux-2.6.38.i686/arch/x86/include/asm/processor.h:833 >833 alternative_input(BASE_PREFETCH, >(gdb) bt 9 >#0 prefetch (x=Cannot access memory at address 0x6b6b6b6b >) > at /home/peb/Rpmbuild/BUILD/kernel-2.6.38.fc16/linux-2.6.38.i686/arch/x86/include/asm/processor.h:833 >#1 call_for_each_cic (ioc=0xeeb60500, func=0xc05e88f8 <cic_free_func>) > at block/cfq-iosched.c:2596 >#2 0xc05e861e in cfq_free_io_context (ioc=<optimized out>) > at block/cfq-iosched.c:2659 >#3 0xc05ddcda in cfq_dtor (ioc=0xeeb60500) at block/blk-ioc.c:26 >#4 put_io_context (ioc=0xeeb60500) at block/blk-ioc.c:43 >#5 0xc05ddd6a in exit_io_context (task=0xed1469f0) at block/blk-ioc.c:80 >#6 0xc044083f in do_exit (code=<optimized out>) at kernel/exit.c:1044 >#7 0xc0440a79 in do_group_exit (exit_code=0) at kernel/exit.c:1106 >#8 0xc0440ab4 in sys_exit_group (error_code=0) at kernel/exit.c:1117 >(More stack frames follow...) >(gdb) f 3 >#3 0xc05ddcda in cfq_dtor (ioc=0xeeb60500) at block/blk-ioc.c:26 >26 cic->dtor(ioc); >(gdb) l >21 if (!hlist_empty(&ioc->cic_list)) { >22 struct cfq_io_context *cic; >23 >24 cic = list_entry(ioc->cic_list.first, struct cfq_io_context, >25 cic_list); >26 cic->dtor(ioc); >27 } >28 } >29 >30 /* >(gdb) print ioc->cic_list >$1 = {first = 0x0} >(gdb) > >3) So apparently something set ioc->cic_list.first to NULL after the hlist_empty() test was run but before cic->dtor() / cfq_free_io_context() was called. Is that possible? > >4) Nit pick: list_entry() should actually be hlist_entry() here. > >--- Additional comment from vgoyal@redhat.com on 2011-05-11 09:52:18 EDT --- > >(In reply to comment #80) >> 0) Finally managed to create a vmcore dump. crashkernel=256M helped. It also >> feels like using a serial console is the only (reliable) way to use kexec (see >> https://lkml.org/lkml/2011/4/12/210 for a similar report). Perhaps throttling >> the output to 115200 baud stops something, somewhere from being racey. But, >> this is just a feeling. >> >> 1) Now I have a (2048 - 256 =) 1792 MB core file. Not practical to upload. > >You can filter it out using mkdumpfile utitliy and just extract kernel pages and discard user space pages and i think cache pages too. That might bring it to reasonable size to upload. > >--- Additional comment from vgoyal@redhat.com on 2011-05-11 10:01:18 EDT --- > >(In reply to comment #80) > >> (gdb) bt 9 >> #0 prefetch (x=Cannot access memory at address 0x6b6b6b6b >> ) >> at >> /home/peb/Rpmbuild/BUILD/kernel-2.6.38.fc16/linux-2.6.38.i686/arch/x86/include/asm/processor.h:833 >> #1 call_for_each_cic (ioc=0xeeb60500, func=0xc05e88f8 <cic_free_func>) >> at block/cfq-iosched.c:2596 >> #2 0xc05e861e in cfq_free_io_context (ioc=<optimized out>) >> at block/cfq-iosched.c:2659 >> #3 0xc05ddcda in cfq_dtor (ioc=0xeeb60500) at block/blk-ioc.c:26 >> #4 put_io_context (ioc=0xeeb60500) at block/blk-ioc.c:43 >> #5 0xc05ddd6a in exit_io_context (task=0xed1469f0) at block/blk-ioc.c:80 >> #6 0xc044083f in do_exit (code=<optimized out>) at kernel/exit.c:1044 >> #7 0xc0440a79 in do_group_exit (exit_code=0) at kernel/exit.c:1106 >> #8 0xc0440ab4 in sys_exit_group (error_code=0) at kernel/exit.c:1117 >> (More stack frames follow...) >> (gdb) f 3 >> #3 0xc05ddcda in cfq_dtor (ioc=0xeeb60500) at block/blk-ioc.c:26 >> 26 cic->dtor(ioc); >> (gdb) l >> 21 if (!hlist_empty(&ioc->cic_list)) { >> 22 struct cfq_io_context *cic; >> 23 >> 24 cic = list_entry(ioc->cic_list.first, struct >> cfq_io_context, >> 25 >> cic_list); >> 26 cic->dtor(ioc); >> 27 } >> 28 } >> 29 >> 30 /* >> (gdb) print ioc->cic_list >> $1 = {first = 0x0} >> (gdb) >> >> 3) So apparently something set ioc->cic_list.first to NULL after the >> hlist_empty() test was run but before cic->dtor() / cfq_free_io_context() was >> called. Is that possible? > >I don't think that happened. After extracting cic pointer you successfully >called into cfq_free_io_context(). I think what happened that there was >atleast 1 element on the cic_list and then cfq_free_io_context() released >that element hence you see ioc->cic_list = null now. > >--- Additional comment from vgoyal@redhat.com on 2011-05-13 08:42:39 EDT --- > >Paul, > >Planning to update this vmcore (after filtering) and vmlinux? I am curious to have a look and see if I can spot something. > >--- Additional comment from pebolle@tiscali.nl on 2011-05-15 14:24:04 EDT --- > >(In reply to comment #83) >> Planning to update this vmcore (after filtering) and vmlinux? I am curious to >> have a look and see if I can spot something. > >0) Well, after getting sidetracked, I managed to reproduce this under v2.6.39-rc7. The vmcore file is still (almost) 200M after butchering it makedumpfile, which is still rather big to upload. > >1) Perhaps you can contact me directly (ie, with bypassing bugzilla.redhat.com) to discuss possibilities here. > >--- Additional comment from vgoyal@redhat.com on 2011-05-17 13:31:41 EDT --- > >Thanks for the vmcore paul. Some observations after looking at it. > >- Crash happened when we tried to do pos->next where pos=0x6b6b6b6b6b. > >- ioc->cic_list is empty at the time of crash. So it looks like when we > entered the loop, atleast one element was on the list and later we > fetched next element which turned out to be freed object. > > pos = rcu_dereference_raw(hlist_next_rcu(pos) > >- One conspiracy theory is that somehow the cic we removed from the ioc->cic_list, got freed before rcu read lock was released. That way >cic object got poisoned and we fetched a poisoned pointer. Now I don't >know if that's possible or not. > >- I tried hard looking for pointer to recently freed object but looks like > we have lost that. > >- Dave Anderson helped looking at the crash trying to find anything else. We > did "bt -F" and we see that leftover on the stack there seems to be pointer > to cic->rcu_head field. We are assuming that that's the pointer to the > recently freed cic. > >- 0xf6ab7460 seems to be the last cic freed (guess from the stack lefovers). > >- "kmem -S cfq_io_context" verifies that this is a free object in the cache. > >- struct cfq_io_context 0xf6ab7460 shows that object is poisoned. > >So at this point of time I am not sure if we are looking at the right object >and whether it is the most recent free object. If yes, then question will >come did we free it up while we were in rcu_read_lock(). > >Paul, as you can reproduce the problem, I guess I will write a small patch >to print recently freed objects and see what we are freeing when. > >--- Additional comment from vgoyal@redhat.com on 2011-05-17 14:58:02 EDT --- > >Created attachment 499426 >CFQ cic debug patch > >a debug patch to get some more info out of system during crash. > >--- Additional comment from vgoyal@redhat.com on 2011-05-17 14:58:48 EDT --- > >Paul, can you please apply above patch and try to reproduce the issue again. Do attach the serial console and capture the console output. > >--- Additional comment from vgoyal@redhat.com on 2011-05-17 15:11:12 EDT --- > >Also, keep kdump still configured. A vmcore gives whole lot of information about various kmem caches and object states. > >--- Additional comment from pebolle@tiscali.nl on 2011-05-18 03:34:41 EDT --- > >(In reply to comment #85) >> Some observations after looking at it. > >Two minor (lazy) questions, if that's allowed. > >> - [..] We >> did "bt -F" and we see that leftover on the stack there seems to be pointer >> to cic->rcu_head field. > >What's "bt -F"? > >> - "kmem -S cfq_io_context" verifies that this is a free object in the cache. > >What's kmem? > >--- Additional comment from vgoyal@redhat.com on 2011-05-18 05:47:27 EDT --- > >These are "crash" commands. So I used "crash vmlinux dumpfile" to open the dump and not gdb. > >You can use "help bt" to get more details, but in summary, crash travels through >stack contents and if there are any address found from any of slab caches, it prints the name of the slab symbolically. > >In this case, we saw following. > > #3 [ec05be50] do_page_fault at c08228ba > [RA: c08205d4 SP: ec05be54 FP: ec05becc SIZE: 124] > ec05be54: 00000007 f699bf90 restore_all_notrace [cfq_io_context] > ec05be64: 0000005c 00000000 00000008 f699bf90 > ec05be74: 00000000 trace_hardirqs_on_thunk+12 runqueues init_task > ec05be84: 00000000 restore_all_notrace f4d43764 00000080 > ec05be94: 00000080 rcu_sched_state [cfq_io_context] ec05bec8 > ec05bea4: 00000046 ec05beb8 trace_hardirqs_off_caller+46 6b6b6b6b > ec05beb4: [blkdev_ioc] ec05bf18 6b6b6b6b [blkdev_ioc] > ec05bec4: do_page_fault ec05bf18 error_code+108 > >There seem to be two addresses in one of the stack frames which seem to >be coming from cfq_io_context slab. > >Then we did "bt -f" to get raw addresses. > > #3 [ec05be50] do_page_fault at c08228ba > [RA: c08205d4 SP: ec05be54 FP: ec05becc SIZE: 124] > ec05be54: 00000007 f699bf90 c081f8ed f6ab7498 > ec05be64: 0000005c 00000000 00000008 f699bf90 > ec05be74: 00000000 c05f3ef4 c0d017c0 c0ac5fe0 > ec05be84: 00000000 c081f8ed f4d43764 00000080 > ec05be94: 00000080 c0ad9e80 f6ab7498 ec05bec8 > ec05bea4: 00000046 ec05beb8 c0465c5d 6b6b6b6b > ec05beb4: ee9deb00 ec05bf18 6b6b6b6b ee9deb00 > ec05bec4: c082273f ec05bf18 c08205d4 > >And the address seems to be 0xf6ab7498 at both the instances. At this point of time it only tells that this address points to with-in some object in cfq_io_context slab. > >kmem command shows kernel memory state. -s and -S show slab state. Then we did following. > >crash> kmem -s 0xf6ab7498 >CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE >f696b580 cfq_io_context 64 26 36 1 4k > SLAB MEMORY NODE TOTAL ALLOCATED FREE > f58ac6e0 f6ab7000 0 36 26 10 > FREE / [ALLOCATED] > f6ab7460 (cpu 0 cache) > >This tells that 0xf6ab7498 was pointing inside one of the cfq_io_context object and beginning address of the object is 0xf6ab7460, which has been freed. it can be further verified using kmem -S. > >crash> kmem -S cfq_io_context >CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE >f696b580 cfq_io_context 64 26 36 1 4k >CPU 0 SLAB: > SLAB MEMORY NODE TOTAL ALLOCATED FREE > f58ac6e0 f6ab7000 0 36 26 10 > FREE / [ALLOCATED] > [f6ab7000] > [f6ab7070] > f6ab70e0 (cpu 0 cache) > [f6ab7150] > [f6ab71c0] > [f6ab7230] > [f6ab72a0] > [f6ab7310] > [f6ab7380] > [f6ab73f0] > f6ab7460 (cpu 0 cache) > f6ab74d0 (cpu 0 cache) >.... >.... >.... > >Notice above address does not have [] around it. That means this object has been freed and returned to the slub. > >So at this point we only know that we found a cic pointer on stack which has been freed. We don't know if that's the latest cic we freed under RCU. Hence >the patch with debug message might verify/deny that. > >--- Additional comment from pebolle@tiscali.nl on 2011-05-18 16:06:18 EDT --- > >0) Interim report (since I, well, just forgot to enable panic on oops). > >1) I captured 6M+ via minicom over 115200 baud. mandb therefore took about 20 min! This part shows that we're on the right track, doesn't it? (This will probably end up a bit mangled.) > >[...] >debug: Will decouple cic=0xef3f1380, Next pointer is (null) >debug: cic=0xef3f1380 decoupled. Next pointer is (null) >debug: Will decouple cic=0xef3f1380, Next pointer is (null) >debug: cic=0xef3f1380 decoupled. Next pointer is (null) >debug: freeing up cic=ef3f1380 >cic=ef3f1380 freed under while in RCU >------------[ cut here ]------------ >kernel BUG at block/cfq-iosched.c:2606! >invalid opcode: 0000 [#1] SMP >last sysfs file: /sys/firmware/memmap/8/type >Modules linked in: cpufreq_ondemand acpi_cpufreq mperf bnep bluetooth ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast snd_intel8x0m arc4 snd_intel8x0 ath5k snd_ac97_codec ath ac97_bus snd_seq mac80211 snd_seq_device cfg80211 e1000 thinkpad_acpi snd_pcm ppdev rfkill iTCO_wdt snd_timer parport_pc iTCO_vendor_support snd i2c_i801 snd_page_alloc parport soundcore pcspkr joydev microcode uinput ipv6 yenta_socket video radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan] > >Pid: 12275, comm: mandb Not tainted 2.6.39-0.rc7.git0.0.local1.fc16.i686 #1 IBM / >EIP: 0060:[<c05e86c6>] EFLAGS: 00010296 CPU: 0 >EIP is at call_for_each_cic+0x3b/0xa8 >EAX: 00000038 EBX: ef3f13a8 ECX: 00000086 EDX: 00000000 >ESI: ef3f1380 EDI: ef7baf00 EBP: eca75f18 ESP: eca75ef8 > DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 >Process mandb (pid: 12275, ti=eca74000 task=ef693f90 task.ti=eca74000) >Stack: > c09a8b73 ef3f1380 00000000 c05e8a1f 6b6b6b6b ef7baf00 ef693f90 ef694360 > eca75f20 c05e8745 eca75f2c c05ddd86 ef7baf00 eca75f40 c05dde16 ef693f90 > 00000012 eca75f74 eca75f8c c04408ab efbe65dc efbe6580 00000009 ef693f90 >Call Trace: > [<c05e8a1f>] ? cfqq_process_refs+0x18/0x18 > [<c05e8745>] cfq_free_io_context+0x12/0x14 > [<c05ddd86>] put_io_context+0x34/0x5c > [<c05dde16>] exit_io_context+0x68/0x6d > [<c04408ab>] do_exit+0x63e/0x661 > [<c0440ae5>] do_group_exit+0x63/0x86 > [<c0440b20>] sys_exit_group+0x18/0x18 > [<c082571f>] sysenter_do_call+0x12/0x38 >Code: 89 55 ec e8 1f ec ff ff 8b 5f 4c eb 22 e8 ba ff ff ff 81 7d f0 6b 6b 6b 6b 75 12 89 74 24 04 c7 04 24 73 8b 9a c0 e8 51 e1 22 00 <0f> 0b 8b 1b e8 c1 b0 e6 ff 85 c0 74 05 e8 8f ff ff ff 85 db 74 >EIP: [<c05e86c6>] call_for_each_cic+0x3b/0xa8 SS:ESP 0068:eca75ef8 >---[ end trace be23eecaac4d4ddf ]--- >Fixing recursive fault but reboot is needed! >[...] > >--- Additional comment from vgoyal@redhat.com on 2011-05-18 16:47:13 EDT --- > >This one looks interesting. The only question I have is that why are we decoupling same cic multiple times. Is it that cic was decoupled, allocated again and then put again on the list or something like that. > >Can you provide either the whole of the console log or atleast last 100-200 lines or so. > >--- Additional comment from pebolle@tiscali.nl on 2011-05-18 17:06:38 EDT --- > >(In reply to comment #92) >> This one looks interesting. The only question I have is that why are we >> decoupling same cic multiple times. Is it that cic was decoupled, allocated >> again and then put again on the list or something like that. >> >> Can you provide either the whole of the console log or atleast last 100-200 >> lines or so. > >0) Well, that cic was used a lot: >$ grep "freeing up cic=ef3f1380" call_for_each_cic.out | wc -l >7367 > >1) Perhaps we should try another way of printk debugging this. > >[2) Vivek, I just uploaded the (xz compressed) vmlinux and (ditto) dumpfile of a similar crash, now with panic_on_oops enabled. That one should have its log embedded in the dumpfile.] > >--- Additional comment from vgoyal@redhat.com on 2011-05-18 19:01:09 EDT --- > >Thanks Paul. Ok, call_for_each_cic() is called by 4 functions. > >- exit single io context >- decouple and free cic >- change ioprio >- change group > >On my system I see call to exit_io_context first and then call to freeing up the cic. So I see cic decouple message twice and then cic is freed. > >In your logs i see that call_for_each_cic() has been called 3 times on each cic. I am wondering what is that 3rd call. Is it change of ioprio or change of group etc. > >Can you please try another debug patch. In this patch I am also identifying the caller of the function. > >--- Additional comment from vgoyal@redhat.com on 2011-05-18 19:03:07 EDT --- > >Created attachment 499716 >CFQ cic debug patch V2 > >Another debug patch for CFQ. It also identifies the caller of the call_for_each_cic function. It also puts a printk for drop_dead_cic() which is only other path where cic is freed. > >--- Additional comment from pebolle@tiscali.nl on 2011-05-19 16:34:07 EDT --- > >0) Vivek, I ran into trouble trying your new debug patch since, for some reason, my "capture kernel" now loops while printing things like this: > >irq 11: nobody cared (try booting with the "irqpoll" option) >Pid: 205, comm: modprobe Not tainted 2.6.39-0.rc7.git0.0.local.fc16.i686 #1 >Call Trace: > [<d08167d4>] ? printk+0x2d/0x2f > [<d0818025>] __report_bad_irq+0x40/0x9f > [<d04930b2>] note_interrupt+0x122/0x188 > [<d0491b1f>] handle_irq_event_percpu+0x171/0x18c > [<d04689b2>] ? lock_release+0x166/0x16d > [<d0491b6b>] handle_irq_event+0x31/0x4a > [<d0493553>] ? unmask_irq+0x23/0x23 > [<d04935a8>] handle_level_irq+0x55/0x73 > <IRQ> [<d0404040>] ? do_IRQ+0x3c/0x92 > [<d0465c5d>] ? trace_hardirqs_off_caller+0x2e/0x86 > [<d0825c75>] ? common_interrupt+0x35/0x3c > [<d044007b>] ? mm_update_next_owner+0xad/0x18a > [<d044007b>] ? mm_update_next_owner+0xad/0x18a > [<d04600e0>] ? clocksource_watchdog_work+0x16/0x3d > [<d0441a6e>] ? arch_local_irq_enable+0x5/0xb > [<d044272e>] ? __do_softirq+0x68/0x185 > [<d04426c6>] ? __tasklet_hi_schedule_first+0x30/0x30 > <IRQ> [<d0442a03>] ? irq_exit+0x4f/0x8b > [<d0404082>] ? do_IRQ+0x7e/0x92 > [<d0465c5d>] ? trace_hardirqs_off_caller+0x2e/0x86 > [<d0825c75>] ? common_interrupt+0x35/0x3c > [<d081007b>] ? powernow_cpu_init+0x51/0x5d0 > [<d04600e0>] ? clocksource_watchdog_work+0x16/0x3d > [<d046dfd1>] ? arch_local_irq_restore+0x5/0xb > [<d081f625>] ? _raw_spin_unlock_irqrestore+0x44/0x48 > [<d0492b00>] ? __setup_irq+0x27c/0x324 > [<d0492c1b>] ? request_threaded_irq+0x73/0xd9 > [<e07c048f>] ? cb_writel+0x59/0x59 [yenta_socket] > [<d0492c5a>] ? request_threaded_irq+0xb2/0xd9 > [<e07c16e3>] ? yenta_probe_cb_irq+0x46/0x121 [yenta_socket] > [<e07c1d54>] ? ti12xx_override+0x371/0x4b4 [yenta_socket] > [<e07c28c4>] ? yenta_probe+0x2c9/0x568 [yenta_socket] > [<d06a524c>] ? pm_runtime_enable+0x60/0x68 > [<d0609080>] ? pci_device_probe+0x62/0xab > [<d069ef4a>] ? driver_probe_device+0x129/0x208 > [<d081e20d>] ? mutex_lock_nested+0x39/0x3e > [<d069f078>] ? __driver_attach+0x4f/0x6b > [<d069e139>] ? bus_for_each_dev+0x42/0x6b > [<d069eb8e>] ? driver_attach+0x1f/0x23 > [<d069f029>] ? driver_probe_device+0x208/0x208 > [<d069e801>] ? bus_add_driver+0xcd/0x214 > [<d069f4b6>] ? driver_register+0x84/0xe3 > [<d05f7b6b>] ? __raw_spin_lock_init+0x2d/0x4e > [<d0609829>] ? __pci_register_driver+0x4f/0xaa > [<e07a6017>] ? yenta_socket_init+0x17/0x1000 [yenta_socket] > [<d040118a>] ? do_one_initcall+0x8c/0x146 > [<e07a6000>] ? 0xe07a5fff > [<e07a6000>] ? 0xe07a5fff > [<d0472108>] ? sys_init_module+0x14fc/0x1704 > [<d082569f>] ? sysenter_do_call+0x12/0x38 >handlers: >[<d06f0464>] (usb_hcd_irq+0x0/0x95) >[<d06f0464>] (usb_hcd_irq+0x0/0x95) >[<d06f0464>] (usb_hcd_irq+0x0/0x95) >[<d06f0464>] (usb_hcd_irq+0x0/0x95) >[<e0904a79>] (radeon_driver_irq_handler_kms+0x0/0x19 [radeon]) >[<e07c17be>] (yenta_interrupt+0x0/0xb3 [yenta_socket]) >[<e07c048f>] (yenta_probe_handler+0x0/0x4c [yenta_socket]) >Disabling IRQ #11 > >1) According to the previous printed messages "irqpoll" already is set. > >2) Any suggestions what could be going wrong? > >--- Additional comment from vgoyal@redhat.com on 2011-05-19 16:55:24 EDT --- > >Paul, does not look like that it is related to my debug patch. I am surprised that it is looping for long time. Because generally if lots of interrupts come when nobody is handling the interrupts, kernel disables the interrupt line and moves on. So kernel should have done that. > >May be just try the test again. > >In the mean time, I can look at the console logs while you retrying capturing >the crash dump. > >--- Additional comment from pebolle@tiscali.nl on 2011-05-19 17:10:06 EDT --- > >Created attachment 499949 >last 100 lines (before unsuccesfull reboot in capture kernel) > >(In reply to comment #97) >> Paul, does not look like that it is related to my debug patch. > >Indeed. Not sure what I did to mess things up this way. > >> In the mean time, I can look at the console logs while you retrying capturing >> the crash dump. > >I've attached the last 100 lines printed by the (patched) "system kernel", before it rebooted into the "capture kernel". > >--- Additional comment from vgoyal@redhat.com on 2011-05-19 17:27:20 EDT --- > >Hmm..., I see lots of following message which i don't see on my system. > >debug: Will change cgroup cic=0xee226380, Next pointer is (null) >debug: cic=0xee226380 change cgroup done. Next pointer is (null) > >Have you got any type of automatic rules setup for cgroups. So that if a thread is launched it is moved in some cgroup or so (/etc/cgrules.conf?). > >Can you disable this cgroup related special behavior and see if problem is still reproducible? > >--- Additional comment from vgoyal@redhat.com on 2011-05-19 17:29:32 EDT --- > >you can disable cgconfig and cgred service on the system (if these are running). > >--- Additional comment from pebolle@tiscali.nl on 2011-05-19 17:50:23 EDT --- > >(In reply to comment #99) >> Have you got any type of automatic rules setup for cgroups. So that if a thread >> is launched it is moved in some cgroup or so (/etc/cgrules.conf?). >> >> Can you disable this cgroup related special behavior and see if problem is >> still reproducible? > >I know nothing of cgroup related stuff. I guess it should be configured as current Rawhide has pushed it onto people (like me) tracking Rawhide. But at this moment I can't check as the kernel is (one again) happily flooding all consoles with the "cic" debug messages I patched in. > >(In reply to comment #100) >> you can disable cgconfig and cgred service on the system (if these are >> running). > >Ditto. > >--- Additional comment from vgoyal@redhat.com on 2011-05-19 17:54:51 EDT --- > >you can just kill the mandb process and then kernel should stop filling the console? > >I enabled cgred and put a rule in cgrules.conf that put all root processes in /cgroup/blk/test1. Then I see changed cgroup message only few times as process >migrates to new group. But I don't see this constant string of messages of change cgroup as you are seeing. > >Now my "mandb -c" is running /cgroup/blk/test1/ group but no constant stream of cgroup changed messages. > >Can you try capturing say 10 seconds of blktrace on your root disk (sda I guess?) > >- start mandb >- blktrace -d /dev/sda >- after 10 sec press Ctrl-C >- blkparse -i sda > blkparse-output > >And upload this blkparse-output file which contains the trace message. You might have to install the blktrace package though. > >--- Additional comment from vgoyal@redhat.com on 2011-05-19 18:05:21 EDT --- > >Though we are changing cgroups frequently in your setup looking at the cgroup change code, it looks harmless in this context. We just release the cfqq associated with the cic and don't try to free cic as such. > >Still I am curious what's happening in your setup which causes all these messages and may be that could explain the issue. > >--- Additional comment from pebolle@tiscali.nl on 2011-05-19 18:06:57 EDT --- > >0) Things settled down. For once mandb did not manage to trigger this issue. The joy of race conditions? > >1) /etc/cgrules.conf is entirely commented out (its default state, I suppose). > >2) If chkconfig is to be believed neither cgconfig nor cgred should run (but maybe systemd is interfering with this, dunno...). > >--- Additional comment from pebolle@tiscali.nl on 2011-05-19 18:13:52 EDT --- > >(In reply to comment #102) >> you can just kill the mandb process and then kernel should stop filling the >> console? > >My comment #104 collided with your comment #102 and comment #103. Anyway, I hope to give this (ie, your blktrace stuff) another try tomorrow. I'm a bit fed up now with seeing this laptop booting, crashing, rebooting and crashing again without any apparent progress .... > >--- Additional comment from pebolle@tiscali.nl on 2011-05-20 03:38:48 EDT --- > >Created attachment 500006 >16+ seconds of blktrace output > >(In reply to comment #102) >> Can you try capturing say 10 seconds of blktrace on your root disk (sda I >> guess?) > >16+ seconds of blktrace output (while running mandb - which eventually oopsed the box, a single hard disk, single CPU system). > >--- Additional comment from vgoyal@redhat.com on 2011-05-20 14:27:44 EDT --- > >Ok, nothing suspicious in blktrace. There are no "changed cgroup" messages. So i think these are newly allocated cic's where ioc->cgroup_changed = 1 but there is no associated cfqq hence blktrace does not carry "chaged_cgroup" message. > >Why are we changing cgroup of mandb threads so frequently, still beats me. I guess that's a different problem we can analyze later. > >Back to the core issue of freeing up object while in RCU. I am really running out of ideas that how to debug this. I am hoping that Paul McKenney gives us some tips how to prove/disprove that it is an RCU issue or not. > >BTW, out of curiosity, is this issue also reproducible on stock fedora kernel for you? Is it reproducible if you just trim down your config to bare minimum. Just trying to figure out if it is some CONFIG option dependent or not. > >--- Additional comment from pebolle@tiscali.nl on 2011-05-20 15:04:01 EDT --- > >(In reply to comment #95) >> CFQ cic debug patch V2 >> >> Another debug patch for CFQ. It also identifies the caller of the >> call_for_each_cic function. It also puts a printk for drop_dead_cic() which is >> only other path where cic is freed. > >0) I've finally uploaded a (xz compressed) vmlinux and (ditto) dumpfile of >this crash (both using the V2 debug patch). > >(1) I've been running into trouble with kexec again and again for the last 48 hours. To aid my memory, it seems I need to add these options to the "system kernel" to get things to work: > >crashkernel=256M console=ttyS0,115200n8 rd_NO_PLYMOUTH nousb nomodeset yenta_socket.foo=1 ath5k.foo=1 3 > >- "nomodeset" (and no console=tty0) for radeon related bugs (of the "capture kernel"); >- yenta_socket.foo=1 and ath5k.foo=1 are tricks to make sure those modules won't get loaded (for IRQ=11 related bugs of the "capture kernel"; >- nousb for ehci relted bugs (of the "capture kernel"); >- rd_NO_PLYMOUTH to be able to log into the "system kernel" over serial line. > >This is just what I ended up with after beating kexec into (apparent) submission. Not sure what's really needed and what's just cargo cult.) > >--- Additional comment from pebolle@tiscali.nl on 2011-05-20 15:06:38 EDT --- > >(In reply to comment #107) >> BTW, out of curiosity, is this issue also reproducible on stock fedora kernel >> for you? Is it reproducible if you just trim down your config to bare minimum. >> Just trying to figure out if it is some CONFIG option dependent or not. > >Could also be a gcc related issue, I guess. > >What is the latest i686 kernel now actually shipped on the Rawhide mirrors? > >--- Additional comment from pebolle@tiscali.nl on 2011-05-21 17:57:57 EDT --- > >(In reply to comment #109) >> Could also be a gcc related issue, I guess. > >I upgraded gcc and friends (from 4.6.0-6) to 4.6.0-7, recompiled v2.6.39-rc7 and my favourite testcase (mandb -c) Oops'ed twice. My next comment should prove it's not gcc related. > >--- Additional comment from pebolle@tiscali.nl on 2011-05-21 18:10:27 EDT --- > >(In reply to comment #107) >> BTW, out of curiosity, is this issue also reproducible on stock fedora kernel >> for you? Is it reproducible if you just trim down your config to bare minimum. >> Just trying to figure out if it is some CONFIG option dependent or not. > >Ha! > >I installed kernel-2.6.39-0 (as pre-compiled for i686 Rawhide). I also rebuilt that rpm locally (using gcc-4.6.0-7) and rebuilt that rpm in its vanilla variant (using rpmbuild's "--with vanilla" option). I'm unable to generate this Oops under v2.6.39 (in any of the three variants I used)! > >This suggests things got fixed between v.2.6.39-rc7 and v2.6.39. > >I hope to answer Paul McKenney's message to LKML (https://lkml.org/lkml/2011/5/21/151) shortly to find out whether things really got fixed in v2.6.39 or whether the window of opportunity for this Oops just got too small for my attempts at reproducing it. > >--- Additional comment from pebolle@tiscali.nl on 2011-05-22 08:15:43 EDT --- > >(In reply to comment #111) >> I installed kernel-2.6.39-0 (as pre-compiled for i686 Rawhide). I also rebuilt >> that rpm locally (using gcc-4.6.0-7) and rebuilt that rpm in its vanilla >> variant (using rpmbuild's "--with vanilla" option). I'm unable to generate this >> Oops under v2.6.39 (in any of the three variants I used)! >> >> This suggests things got fixed between v.2.6.39-rc7 and v2.6.39. > >0) It turns out the Fedora kernel specfile uses rather different config files for release candidates and final releases: a lot of debug options will be unset for final releases. In this case I guess I was fooled by this: > >diff -U 0 /boot/config-2.6.39-0.rc7.git0.0.local.fc16.i686 /boot/config-2.6.39-0.local.fc16.i686 | grep -e RCU >-CONFIG_PROVE_RCU=y >-# CONFIG_PROVE_RCU_REPEATEDLY is not set > >1) I battled a bit with the specfile to get it to not delete config.debug, but I have to admit defeat. I'll have to figure out another way to use the debug config options. > >(2) For those people maintaining kernel.spec: > >Are these lines correct: > %if !%{debugbuildsenabled} > rm -f kernel-%{version}-*debug.config > %endif > >? > >It seems their effect is that both the information shipped in config-debug and config-nodebug is dropped. But this stuff is all rather confusing ....) > >--- Additional comment from pebolle@tiscali.nl on 2011-05-22 15:20:47 EDT --- > >(In reply to comment #112) >> 0) It turns out the Fedora kernel specfile uses rather different config files >> for release candidates and final releases: a lot of debug options will be unset >> for final releases. In this case I guess I was fooled by this: >> >> diff -U 0 /boot/config-2.6.39-0.rc7.git0.0.local.fc16.i686 >> /boot/config-2.6.39-0.local.fc16.i686 | grep -e RCU >> -CONFIG_PROVE_RCU=y >> -# CONFIG_PROVE_RCU_REPEATEDLY is not set > >And, indeed, when I rebuilt and booted v2.6.39 with a .config file more in line with what it used for the release candidates (eg, with CONFIG_PROVE_RCU=y) I ran into this issue the first time I ran my precious little testing tool (mandb -c). > >--- Additional comment from vgoyal@redhat.com on 2011-05-23 10:11:17 EDT --- > >(In reply to comment #113) >> (In reply to comment #112) >> > 0) It turns out the Fedora kernel specfile uses rather different config files >> > for release candidates and final releases: a lot of debug options will be unset >> > for final releases. In this case I guess I was fooled by this: >> > >> > diff -U 0 /boot/config-2.6.39-0.rc7.git0.0.local.fc16.i686 >> > /boot/config-2.6.39-0.local.fc16.i686 | grep -e RCU >> > -CONFIG_PROVE_RCU=y >> > -# CONFIG_PROVE_RCU_REPEATEDLY is not set >> >> And, indeed, when I rebuilt and booted v2.6.39 with a .config file more in line >> with what it used for the release candidates (eg, with CONFIG_PROVE_RCU=y) I >> ran into this issue the first time I ran my precious little testing tool (mandb >> -c). > >PaulB, > >So if you take fedora final release file and just enable CONFIG_PROVE_RCU=y, then you can reproduce the issue? Or more config options are involved too. > >--- Additional comment from vgoyal@redhat.com on 2011-05-23 10:25:43 EDT --- > >(In reply to comment #103) >> Though we are changing cgroups frequently in your setup looking at the cgroup >> change code, it looks harmless in this context. We just release the cfqq >> associated with the cic and don't try to free cic as such. >> >> Still I am curious what's happening in your setup which causes all these >> messages and may be that could explain the issue. > >Ok, I found out why you are seeing so many cgroup changed messages. We are not >initializing ioc->cgroup_changed field during ioc creation and CFQ thinks that >cgroup has changed. > >I am going to post a patch and that should remove cgroup changed messages from >the logs. Though I think that this is completely orthogonal and has nothing to >do with original problem. > >--- Additional comment from vgoyal@redhat.com on 2011-05-23 10:28:36 EDT --- > >Created attachment 500449 >Fix frequent cgroup changed messages visible in debug logs > >PaulB was seeing lots of cgroup changed messages when we enabled debug. This patch removes those unnecessary extra calls to call_for_each_cic(changed_cgroup). > >--- Additional comment from vgoyal@redhat.com on 2011-05-23 11:58:48 EDT --- > >PaulB, would you like to re-run those tests with some config options enabled as suggested by PaulM. Also it would be good to run the script to capture the >rcu traces. > >CONFIG_PROVE_RCU = y >CONFIG_PREEMPT = y >CONFIG_DEBUG_OBJECTS_RCU_HEAD = y >CONFIG_RCU_TRACE = y > >--- Additional comment from pebolle@tiscali.nl on 2011-05-23 16:53:05 EDT --- > >(In reply to comment #114) >> So if you take fedora final release file and just enable CONFIG_PROVE_RCU=y, >> then you can reproduce the issue? Or more config options are involved too. > >0) The Fedora final release I ran was kernel-2.6.39-0.fc16.i686.rpm. The first version of that release that I managed to built with a config resembling release candidates (but as a vanilla kernel) was kernel-vanilla-2.6.39-0.local2.fc16.i686. > >1) This horror shows the differences in their respective configs: > >diff -U 0 <(rpm2cpio Download/kernel-2.6.39-0.fc16.i686.rpm | cpio --quiet -i --to-stdout ./boot/config-2.6.39-0.fc16.i686) /boot/config-2.6.39-0.local2.fc16.i686 > >2) That generates this output (of which "+CONFIG_PROVE_RCU=y" seems to be the most important change, but most of it is entirely new for me anyway): > >--- /dev/fd/63 2011-05-23 22:48:15.071686130 +0200 >+++ /boot/config-2.6.39-0.local2.fc16.i686 2011-05-22 19:42:32.000000000 +0200 >@@ -3,2 +3,2 @@ >-# Linux/i386 2.6.39-0.fc16.i686 Kernel Configuration >-# Thu May 19 20:07:58 2011 >+# Linux/i386 2.6.39-0.local2.fc16.i686 Kernel Configuration >+# Sun May 22 17:51:58 2011 >@@ -178,0 +179 @@ >+CONFIG_PERF_USE_VMALLOC=y >@@ -185 +186 @@ >-# CONFIG_DEBUG_PERF_USE_VMALLOC is not set >+CONFIG_DEBUG_PERF_USE_VMALLOC=y >@@ -229 +230 @@ >-# CONFIG_MODULE_FORCE_UNLOAD is not set >+CONFIG_MODULE_FORCE_UNLOAD=y >@@ -258 +259 @@ >-CONFIG_INLINE_SPIN_UNLOCK=y >+# CONFIG_INLINE_SPIN_UNLOCK is not set >@@ -260 +261 @@ >-CONFIG_INLINE_SPIN_UNLOCK_IRQ=y >+# CONFIG_INLINE_SPIN_UNLOCK_IRQ is not set >@@ -267 +268 @@ >-CONFIG_INLINE_READ_UNLOCK=y >+# CONFIG_INLINE_READ_UNLOCK is not set >@@ -269 +270 @@ >-CONFIG_INLINE_READ_UNLOCK_IRQ=y >+# CONFIG_INLINE_READ_UNLOCK_IRQ is not set >@@ -276 +277 @@ >-CONFIG_INLINE_WRITE_UNLOCK=y >+# CONFIG_INLINE_WRITE_UNLOCK is not set >@@ -278 +279 @@ >-CONFIG_INLINE_WRITE_UNLOCK_IRQ=y >+# CONFIG_INLINE_WRITE_UNLOCK_IRQ is not set >@@ -280 +281 @@ >-CONFIG_MUTEX_SPIN_ON_OWNER=y >+# CONFIG_MUTEX_SPIN_ON_OWNER is not set >@@ -415 +416 @@ >-CONFIG_SPLIT_PTLOCK_CPUS=4 >+CONFIG_SPLIT_PTLOCK_CPUS=999999 >@@ -501 +502,2 @@ >-# CONFIG_ACPI_DEBUG is not set >+CONFIG_ACPI_DEBUG=y >+# CONFIG_ACPI_DEBUG_FUNC_TRACE is not set >@@ -596 +597,0 @@ >-CONFIG_PCI_MSI_DEFAULT_ON=y >@@ -1189 +1190 @@ >-# CONFIG_CAN_DEBUG_DEVICES is not set >+CONFIG_CAN_DEBUG_DEVICES=y >@@ -1323 +1324 @@ >-# CONFIG_CEPH_LIB_PRETTYDEBUG is not set >+CONFIG_CEPH_LIB_PRETTYDEBUG=y >@@ -1520 +1521 @@ >-# CONFIG_DRBD_FAULT_INJECTION is not set >+CONFIG_DRBD_FAULT_INJECTION=y >@@ -2115 +2116 @@ >-# CONFIG_ATH_DEBUG is not set >+CONFIG_ATH_DEBUG=y >@@ -2131 +2132 @@ >-# CONFIG_CARL9170_DEBUGFS is not set >+CONFIG_CARL9170_DEBUGFS=y >@@ -2180 +2181 @@ >-# CONFIG_IWLWIFI_DEVICE_TRACING is not set >+CONFIG_IWLWIFI_DEVICE_TRACING=y >@@ -2662 +2662,0 @@ >-CONFIG_INPUT_APPLEIR=m >@@ -2695 +2694,0 @@ >-CONFIG_CRASH=m >@@ -4522 +4521,2 @@ >-# CONFIG_DMADEVICES_DEBUG is not set >+CONFIG_DMADEVICES_DEBUG=y >+CONFIG_DMADEVICES_VDEBUG=y >@@ -4721 +4721 @@ >-# CONFIG_EXT4_DEBUG is not set >+CONFIG_EXT4_DEBUG=y >@@ -4726 +4726 @@ >-# CONFIG_JBD2_DEBUG is not set >+CONFIG_JBD2_DEBUG=y >@@ -4766 +4766 @@ >-# CONFIG_QUOTA_DEBUG is not set >+CONFIG_QUOTA_DEBUG=y >@@ -4817 +4816,0 @@ >-CONFIG_TMPFS_XATTR=y >@@ -5020,2 +5019,8 @@ >-# CONFIG_DEBUG_OBJECTS is not set >-# CONFIG_SLUB_DEBUG_ON is not set >+CONFIG_DEBUG_OBJECTS=y >+# CONFIG_DEBUG_OBJECTS_SELFTEST is not set >+CONFIG_DEBUG_OBJECTS_FREE=y >+CONFIG_DEBUG_OBJECTS_TIMERS=y >+CONFIG_DEBUG_OBJECTS_WORK=y >+CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER=y >+CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1 >+CONFIG_SLUB_DEBUG_ON=y >@@ -5024 +5029,2 @@ >-# CONFIG_DEBUG_RT_MUTEXES is not set >+CONFIG_DEBUG_RT_MUTEXES=y >+CONFIG_DEBUG_PI_LIST=y >@@ -5026,4 +5032,6 @@ >-# CONFIG_DEBUG_SPINLOCK is not set >-# CONFIG_DEBUG_MUTEXES is not set >-# CONFIG_DEBUG_LOCK_ALLOC is not set >-# CONFIG_PROVE_LOCKING is not set >+CONFIG_DEBUG_SPINLOCK=y >+CONFIG_DEBUG_MUTEXES=y >+CONFIG_DEBUG_LOCK_ALLOC=y >+CONFIG_PROVE_LOCKING=y >+CONFIG_PROVE_RCU=y >+# CONFIG_PROVE_RCU_REPEATEDLY is not set >@@ -5031 +5039,4 @@ >-# CONFIG_LOCK_STAT is not set >+CONFIG_LOCKDEP=y >+CONFIG_LOCK_STAT=y >+# CONFIG_DEBUG_LOCKDEP is not set >+CONFIG_TRACE_IRQFLAGS=y >@@ -5040 +5051 @@ >-# CONFIG_DEBUG_VM is not set >+CONFIG_DEBUG_VM=y >@@ -5042 +5053 @@ >-# CONFIG_DEBUG_WRITECOUNT is not set >+CONFIG_DEBUG_WRITECOUNT=y >@@ -5045,4 +5056,4 @@ >-# CONFIG_TEST_LIST_SORT is not set >-# CONFIG_DEBUG_SG is not set >-# CONFIG_DEBUG_NOTIFIERS is not set >-# CONFIG_DEBUG_CREDENTIALS is not set >+CONFIG_TEST_LIST_SORT=y >+CONFIG_DEBUG_SG=y >+CONFIG_DEBUG_NOTIFIERS=y >+CONFIG_DEBUG_CREDENTIALS=y >@@ -5059,2 +5070,8 @@ >-# CONFIG_CPU_NOTIFIER_ERROR_INJECT is not set >-# CONFIG_FAULT_INJECTION is not set >+CONFIG_CPU_NOTIFIER_ERROR_INJECT=m >+CONFIG_FAULT_INJECTION=y >+CONFIG_FAILSLAB=y >+CONFIG_FAIL_PAGE_ALLOC=y >+CONFIG_FAIL_MAKE_REQUEST=y >+CONFIG_FAIL_IO_TIMEOUT=y >+CONFIG_FAULT_INJECTION_DEBUG_FS=y >+CONFIG_FAULT_INJECTION_STACKTRACE_FILTER=y >@@ -5062 +5079 @@ >-# CONFIG_SYSCTL_SYSCALL_CHECK is not set >+CONFIG_SYSCTL_SYSCALL_CHECK=y >@@ -5100 +5117,2 @@ >-# CONFIG_MMIOTRACE is not set >+CONFIG_MMIOTRACE=y >+# CONFIG_MMIOTRACE_TEST is not set >@@ -5106 +5124 @@ >-# CONFIG_DMA_API_DEBUG is not set >+CONFIG_DMA_API_DEBUG=y >@@ -5125,3 +5143,3 @@ >-# CONFIG_DEBUG_STACK_USAGE is not set >-# CONFIG_DEBUG_PER_CPU_MAPS is not set >-# CONFIG_X86_PTDUMP is not set >+CONFIG_DEBUG_STACK_USAGE=y >+CONFIG_DEBUG_PER_CPU_MAPS=y >+CONFIG_X86_PTDUMP=y >@@ -5130 +5148 @@ >-# CONFIG_DEBUG_SET_MODULE_RONX is not set >+CONFIG_DEBUG_SET_MODULE_RONX=y >@@ -5195,0 +5214 @@ >+CONFIG_CRYPTO_FIPS=y >@@ -5210 +5229 @@ >-CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y >+# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set >@@ -5373,0 +5393 @@ >+CONFIG_CPUMASK_OFFSTACK=y >@@ -5378 +5397,0 @@ >-CONFIG_PCI_DEFAULT_USE_CRS=y > >--- Additional comment from pebolle@tiscali.nl on 2011-05-23 17:18:32 EDT --- > >(In reply to comment #117) >> PaulB, would you like to re-run those tests with some config options enabled as >> suggested by PaulM. Also it would be good to run the script to capture the >> rcu traces. >> >> CONFIG_PROVE_RCU = y >> CONFIG_PREEMPT = y >> CONFIG_DEBUG_OBJECTS_RCU_HEAD = y >> CONFIG_RCU_TRACE = y > >0) The first of Paul E. McKenney's suggestions I tried was CONFIG_DEBUG_OBJECTS_RCU_HEAD=y (since Fedora relase candidates already ship with CONFIG_PROVE_RCU=y). CONFIG_DEBUG_OBJECTS_RCU_HEAD has some dependencies, so I basically ended up with: > > $ grep -e PREEMPT -e RCU /boot/config-2.6.39-0.local3.fc16.i686 | grep -v "^#" > CONFIG_TREE_PREEMPT_RCU=y > CONFIG_PREEMPT_RCU=y > CONFIG_RCU_FANOUT=32 > CONFIG_PREEMPT_NOTIFIERS=y > CONFIG_PREEMPT=y > CONFIG_DEBUG_OBJECTS_RCU_HEAD=y > CONFIG_DEBUG_PREEMPT=y > CONFIG_PROVE_RCU=y > CONFIG_SPARSE_RCU_POINTER=y > >(This doesn't yet use CONFIG_RCU_TRACE. Remember that one of the Fedora defaults seems to be CONFIG_PREEMPT_VOLUNTARY=y.) > >1) Using that config I seem to be unable to trigger this Oops. Does that make sense? > >--- Additional comment from vgoyal@redhat.com on 2011-05-24 09:14:43 EDT --- > >(In reply to comment #118) >> (In reply to comment #114) >> > So if you take fedora final release file and just enable CONFIG_PROVE_RCU=y, >> > then you can reproduce the issue? Or more config options are involved too. >> >> 0) The Fedora final release I ran was kernel-2.6.39-0.fc16.i686.rpm. The first >> version of that release that I managed to built with a config resembling >> release candidates (but as a vanilla kernel) was >> kernel-vanilla-2.6.39-0.local2.fc16.i686. >> >> 1) This horror shows the differences in their respective configs: >> >> diff -U 0 <(rpm2cpio Download/kernel-2.6.39-0.fc16.i686.rpm | cpio --quiet -i >> --to-stdout ./boot/config-2.6.39-0.fc16.i686) >> /boot/config-2.6.39-0.local2.fc16.i686 >> >> 2) That generates this output (of which "+CONFIG_PROVE_RCU=y" seems to be the >> most important change, but most of it is entirely new for me anyway): >> > >Ok, lots of debug options seem to have been turned off. May be we can take fedora final release config and turn on CONFIG_PROVE_RCU=y and see if you can reproduce the issue or not. > >--- Additional comment from pebolle@tiscali.nl on 2011-05-24 18:12:39 EDT --- > >(In reply to comment #120) >> Ok, lots of debug options seem to have been turned off. May be we can take >> fedora final release config and turn on CONFIG_PROVE_RCU=y and see if you can >> reproduce the issue or not. > >0) If I take a Fedora finale release (kernel-2.6.39-0.fc16.i686.rpm) and use CONFIG_PROVE_RCU=y, I end up with these differences (of which a few are due to the fact I rpmbuilt a vanilla kernel package): > >--- /dev/fd/63 2011-05-25 00:05:59.211662288 +0200 >+++ /boot/config-2.6.39-0.local4.fc16.i686 2011-05-24 22:13:54.000000000 +0200 >@@ -3,2 +3,2 @@ >-# Linux/i386 2.6.39-0.fc16.i686 Kernel Configuration >-# Thu May 19 20:07:58 2011 >+# Linux/i386 2.6.39-0.local4.fc16.i686 Kernel Configuration >+# Tue May 24 20:15:33 2011 >@@ -258 +258 @@ >-CONFIG_INLINE_SPIN_UNLOCK=y >+# CONFIG_INLINE_SPIN_UNLOCK is not set >@@ -260 +260 @@ >-CONFIG_INLINE_SPIN_UNLOCK_IRQ=y >+# CONFIG_INLINE_SPIN_UNLOCK_IRQ is not set >@@ -267 +267 @@ >-CONFIG_INLINE_READ_UNLOCK=y >+# CONFIG_INLINE_READ_UNLOCK is not set >@@ -269 +269 @@ >-CONFIG_INLINE_READ_UNLOCK_IRQ=y >+# CONFIG_INLINE_READ_UNLOCK_IRQ is not set >@@ -276 +276 @@ >-CONFIG_INLINE_WRITE_UNLOCK=y >+# CONFIG_INLINE_WRITE_UNLOCK is not set >@@ -278 +278 @@ >-CONFIG_INLINE_WRITE_UNLOCK_IRQ=y >+# CONFIG_INLINE_WRITE_UNLOCK_IRQ is not set >@@ -280 +280 @@ >-CONFIG_MUTEX_SPIN_ON_OWNER=y >+# CONFIG_MUTEX_SPIN_ON_OWNER is not set >@@ -415 +415 @@ >-CONFIG_SPLIT_PTLOCK_CPUS=4 >+CONFIG_SPLIT_PTLOCK_CPUS=999999 >@@ -596 +595,0 @@ >-CONFIG_PCI_MSI_DEFAULT_ON=y >@@ -2662 +2660,0 @@ >-CONFIG_INPUT_APPLEIR=m >@@ -2695 +2692,0 @@ >-CONFIG_CRASH=m >@@ -4817 +4813,0 @@ >-CONFIG_TMPFS_XATTR=y >@@ -5026,4 +5022,6 @@ >-# CONFIG_DEBUG_SPINLOCK is not set >-# CONFIG_DEBUG_MUTEXES is not set >-# CONFIG_DEBUG_LOCK_ALLOC is not set >-# CONFIG_PROVE_LOCKING is not set >+CONFIG_DEBUG_SPINLOCK=y >+CONFIG_DEBUG_MUTEXES=y >+CONFIG_DEBUG_LOCK_ALLOC=y >+CONFIG_PROVE_LOCKING=y >+CONFIG_PROVE_RCU=y >+# CONFIG_PROVE_RCU_REPEATEDLY is not set >@@ -5030,0 +5029 @@ >+CONFIG_LOCKDEP=y >@@ -5031,0 +5031,2 @@ >+# CONFIG_DEBUG_LOCKDEP is not set >+CONFIG_TRACE_IRQFLAGS=y >@@ -5378 +5378,0 @@ >-CONFIG_PCI_DEFAULT_USE_CRS=y > >1) Using that Fedora final release with CONFIG_PROVE_RCU=y (and its dependencies) this issue does not seem to show up (ie, I'm unable to trigger this BUG). > >--- Additional comment from vgoyal@redhat.com on 2011-05-25 16:13:04 EDT --- > >Ok, so enabling CONFIG_PROVE_RCU and associated dependencies did not reproduce the isue. I guess time to try paul's debug patch for detecting blocking in rcu critical section along with rcu traces. > >If time permits, it might be a good idea to try to play with rest of the config options and try to narrow down enabling which config option really causes this behavior. > >--- Additional comment from paulmck@linux.vnet.ibm.com on 2011-05-25 16:32:34 EDT --- > >My patch requires CONFIG_PROVE_RCU... So maybe best to follow up on Jens's suggestion. Alternatively, capture the CPU and context-switch number at the beginning of the RCU read-side critical section, then verify that there is no change in either quantity at the end of the RCU read-side critical section. > >--- Additional comment from vgoyal@redhat.com on 2011-05-25 17:00:26 EDT --- > >(In reply to comment #123) >> My patch requires CONFIG_PROVE_RCU... So maybe best to follow up on Jens's >> suggestion. > >PaulB did try Jens patch of disabling cooperating queue logic. But that did not help. > >> Alternatively, capture the CPU and context-switch number at the >> beginning of the RCU read-side critical section, then verify that there is no >> change in either quantity at the end of the RCU read-side critical section. > >He has got a single cpu machine. So cpu number is not going to change. > >context-switch number is interesting data. Where it is stored. I might just write a quick patch to print that info ask PaulB to try that. > >--- Additional comment from pebolle@tiscali.nl on 2011-05-26 05:17:09 EDT --- > >(In reply to comment #123) >> My patch requires CONFIG_PROVE_RCU... > >This doesn't ring a bell. What patch are you referring to here? > >--- Additional comment from vgoyal@redhat.com on 2011-05-26 09:32:08 EDT --- > >(In reply to comment #125) >> (In reply to comment #123) >> > My patch requires CONFIG_PROVE_RCU... >> >> This doesn't ring a bell. What patch are you referring to here? > >I guess following. > >https://lkml.org/lkml/2011/5/24/362 > >--- Additional comment from paulmck@linux.vnet.ibm.com on 2011-05-26 09:58:44 EDT --- > >Further testing got me this patch replacing https://lkml.org/lkml/2011/5/24/362: > >rcu: Restore checks for blocking in RCU read-side critical sections > >Long ago, using TREE_RCU with PREEMPT would result in "scheduling >while atomic" diagnostics if you blocked in an RCU read-side critical >section. However, PREEMPT now implies TREE_PREEMPT_RCU, which defeats >this diagnostic. This commit therefore adds a replacement diagnostic >based on PROVE_RCU. > >Because rcu_lockdep_assert() and lockdep_rcu_dereference() are now being >used for things that have nothing to do with rcu_dereference(), rename >lockdep_rcu_dereference() to lockdep_rcu_suspicious() and add a third >argument that is a string indicating what is suspicious. This third >argument is passed in from a new third argument to rcu_lockdep_assert(). >Update all calls to rcu_lockdep_assert() to add an informative third >argument. > >Finally, add a pair of rcu_lockdep_assert() calls from within >rcu_note_context_switch(), one complaining if a context switch occurs >in an RCU-bh read-side critical section and another complaining if a >context switch occurs in an RCU-sched read-side critical section. >These are present only if the PROVE_RCU kernel parameter is enabled. > >Again, you must enable PROVE_RCU to see these new diagnostics. But you >are enabling PROVE_RCU to check out new RCU uses in any case, aren't you? > >Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > >diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h >index 4aef1dd..bfa66e7 100644 >--- a/include/linux/lockdep.h >+++ b/include/linux/lockdep.h >@@ -545,7 +545,7 @@ do { \ > #endif > > #ifdef CONFIG_PROVE_RCU >-extern void lockdep_rcu_dereference(const char *file, const int line); >+void lockdep_rcu_suspicious(const char *file, const int line, const char *s); > #endif > > #endif /* __LINUX_LOCKDEP_H */ >diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h >index 58b13f1..fb2933d 100644 >--- a/include/linux/rcupdate.h >+++ b/include/linux/rcupdate.h >@@ -297,19 +297,20 @@ extern int rcu_my_thread_group_empty(void); > /** > * rcu_lockdep_assert - emit lockdep splat if specified condition not met > * @c: condition to check >+ * @s: informative message > */ >-#define rcu_lockdep_assert(c) \ >+#define rcu_lockdep_assert(c, s) \ > do { \ > static bool __warned; \ > if (debug_lockdep_rcu_enabled() && !__warned && !(c)) { \ > __warned = true; \ >- lockdep_rcu_dereference(__FILE__, __LINE__); \ >+ lockdep_rcu_suspicious(__FILE__, __LINE__, s); \ > } \ > } while (0) > > #else /* #ifdef CONFIG_PROVE_RCU */ > >-#define rcu_lockdep_assert(c) do { } while (0) >+#define rcu_lockdep_assert(c, s) do { } while (0) > > #endif /* #else #ifdef CONFIG_PROVE_RCU */ > >@@ -338,14 +339,16 @@ extern int rcu_my_thread_group_empty(void); > #define __rcu_dereference_check(p, c, space) \ > ({ \ > typeof(*p) *_________p1 = (typeof(*p)*__force )ACCESS_ONCE(p); \ >- rcu_lockdep_assert(c); \ >+ rcu_lockdep_assert(c, "suspicious rcu_dereference_check()" \ >+ " usage"); \ > rcu_dereference_sparse(p, space); \ > smp_read_barrier_depends(); \ > ((typeof(*p) __force __kernel *)(_________p1)); \ > }) > #define __rcu_dereference_protected(p, c, space) \ > ({ \ >- rcu_lockdep_assert(c); \ >+ rcu_lockdep_assert(c, "suspicious rcu_dereference_protected()" \ >+ " usage"); \ > rcu_dereference_sparse(p, space); \ > ((typeof(*p) __force __kernel *)(p)); \ > }) >@@ -359,7 +362,9 @@ extern int rcu_my_thread_group_empty(void); > #define __rcu_dereference_index_check(p, c) \ > ({ \ > typeof(p) _________p1 = ACCESS_ONCE(p); \ >- rcu_lockdep_assert(c); \ >+ rcu_lockdep_assert(c, \ >+ "suspicious rcu_dereference_index_check()" \ >+ " usage"); \ > smp_read_barrier_depends(); \ > (_________p1); \ > }) >diff --git a/kernel/lockdep.c b/kernel/lockdep.c >index 53a6895..9789028 100644 >--- a/kernel/lockdep.c >+++ b/kernel/lockdep.c >@@ -3798,7 +3798,7 @@ void lockdep_sys_exit(void) > } > } > >-void lockdep_rcu_dereference(const char *file, const int line) >+void lockdep_rcu_suspicious(const char *file, const int line, const char *s) > { > struct task_struct *curr = current; > >@@ -3807,9 +3807,10 @@ void lockdep_rcu_dereference(const char *file, const int line) > return; > #endif /* #ifdef CONFIG_PROVE_RCU_REPEATEDLY */ > /* Note: the following can be executed concurrently, so be careful. */ >- printk("\n===================================================\n"); >- printk( "[ INFO: suspicious rcu_dereference_check() usage. ]\n"); >- printk( "---------------------------------------------------\n"); >+ printk("\n===============================\n"); >+ printk( "[ INFO: suspicious RCU usage. ]\n"); >+ printk( "-------------------------------\n"); >+ printk( "%s\n", s); > printk("%s:%d invoked rcu_dereference_check() without protection!\n", > file, line); > printk("\nother info that might help us debug this:\n\n"); >@@ -3818,4 +3819,4 @@ void lockdep_rcu_dereference(const char *file, const int line) > printk("\nstack backtrace:\n"); > dump_stack(); > } >-EXPORT_SYMBOL_GPL(lockdep_rcu_dereference); >+EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious); >diff --git a/kernel/pid.c b/kernel/pid.c >index 57a8346..a7577b3 100644 >--- a/kernel/pid.c >+++ b/kernel/pid.c >@@ -419,7 +419,9 @@ EXPORT_SYMBOL(pid_task); > */ > struct task_struct *find_task_by_pid_ns(pid_t nr, struct pid_namespace *ns) > { >- rcu_lockdep_assert(rcu_read_lock_held()); >+ rcu_lockdep_assert(rcu_read_lock_held(), >+ "find_task_by_pid_ns() needs rcu_read_lock()" >+ " protection"); > return pid_task(find_pid_ns(nr, ns), PIDTYPE_PID); > } > >diff --git a/kernel/rcutree.c b/kernel/rcutree.c >index 88547c8..e043b4d 100644 >--- a/kernel/rcutree.c >+++ b/kernel/rcutree.c >@@ -153,6 +153,12 @@ void rcu_bh_qs(int cpu) > */ > void rcu_note_context_switch(int cpu) > { >+ rcu_lockdep_assert(!rcu_read_lock_bh_held(), >+ "Illegal context switch in RCU-bh" >+ " read-side critical section"); >+ rcu_lockdep_assert(!rcu_read_lock_sched_held(), >+ "Illegal context switch in RCU-sched" >+ " read-side critical section"); > rcu_sched_qs(cpu); > rcu_preempt_note_context_switch(cpu); > } > >--- Additional comment from pebolle@tiscali.nl on 2011-05-27 03:31:01 EDT --- > >(In reply to comment #127) >> Further testing got me this patch [...]: >> >> rcu: Restore checks for blocking in RCU read-side critical sections >> >> [...] > >0) I've applied that patch to (mostly vanilla) v2.6.39. Specifically, I applied commit c15d76f26712bd5228aa0c6af7a7e7c492a812c9, from one of the branches of (one of?) your git repositor(y|ies) on git.kernel.org. Is that the correct commit? > >1) None of the messages from that patch show up in my console or in my logs, even though I did manage (on the first try) to hit this Oops. Should I perhaps turn a knob here or there to get things to work? Or are we apparently looking in the wrong direction? > >--- Additional comment from cebbert@redhat.com on 2011-06-24 14:17:41 EDT --- > >Upstream commit: > >http://git.kernel.org/linus/ab4bd22d3cce6977dc039664cc2d052e3147d662
You cannot view the attachment while viewing its details because your browser does not support IFRAMEs.
View the attachment on a separate page
.
View Attachment As Raw
Actions:
View
Attachments on
bug 716520
: 915342