From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020623 Debian/1.0.0-0.woody.1

Description of problem:
My system boots and is running from a r/o dasd, after having booted from initrd. I make a CMS-formatted dasd visible to the kernel with

echo "add device range=2000" > /proc/dasd/devices

I use a userland tool, cmsfscp, to read a file from the DASD (similar to mtools for reading DOS filesystems under Linux). I detach the dasd with

echo "set device range=2000 off" > /proc/dasd/devices

and the kernel crashes:

06:10:44 CPU: 0 Not tainted
06:10:44 Process bootfs.sh (pid: 84, task: 07c3a000, ksp: 07c3bf10)
06:10:44 Krnl PSW : 07080000 8002c748
06:10:44 __run_task_queue [kernel] 0x90 (2.4.21-4.EL)
06:10:44 Krnl GPRS: 00000000 00000000 00000000 00000000
06:10:44            00b8c15c 07c3ba00 00000000 00000000
06:10:44            00001000 00020dfc 07c3a000 07c3ba00
06:10:44            00080000 8002c6c0 07c3ba10 07c3b998
06:10:44 Krnl ACRS: 4001f860 00000000 00000000 00000000
06:10:44            00000000 00000000 00000000 00000000
06:10:44            00000000 00000000 00000000 00000000
06:10:44            00000000 00000000 00000000 00000000
06:10:44 Krnl Code: d2 03 10 08 d0 04 a7 84 00 03 0d e3 19 cb a7 74 ff f0 58 40
06:10:44 [<00067258>] block_sync_page [kernel] 0x48 (0x7c3ba48)
06:10:44 [<00043a02>] ___wait_on_page [kernel] 0xe6 (0x7c3baa8)
06:10:44 [<00044cde>] do_generic_file_read [kernel] 0x4c2 (0x7c3bb20)
06:10:44 [<0004557e>] generic_file_new_read [kernel] 0x92 (0x7c3bba8)
06:10:44 [<000456e0>] generic_file_read [kernel] 0x20 (0x7c3bc28)
06:10:44 [<0006cc80>] kernel_read [kernel] 0x74 (0x7c3bc88)
06:10:44 [<0006d128>] prepare_binprm [kernel] 0xfc (0x7c3bcf0)
06:10:44 [<0006d746>] do_execve [kernel] 0xd6 (0x7c3bd50)
06:10:44 [<00017888>] sys_execve [kernel] 0x74 (0x7c3bee8)
06:10:44 [<00014f92>] sys_execve_glue [kernel] 0xc (0x7c3bf48)

The problem is as follows:
The dasd driver calls blk_cleanup_queue() while the request_queue struct is still queued on the disk task queue, tq_disk, via its plug_tq member. Some time later, run_task_queue() walks the tq_disk linked list and hits a null pointer, because blk_cleanup_queue() has zeroed out the request_queue struct. I don't see this problem with a standard kernel.org 2.4.21 kernel.

I applied the following patch to drivers/s390/block/dasd.c. It reports when the device is still queued on tq_disk and then calls run_task_queue(); with it applied, the kernel no longer crashes.

--- drivers/s390/block/dasd.c.ori	2003-11-10 11:55:18.000000000 +0000
+++ drivers/s390/block/dasd.c	2003-11-10 11:55:51.000000000 +0000
@@ -4292,6 +4292,12 @@
                 max_sectors[major][minor + i] = 0;
         }
         if (device->request_queue) {
+                if (device->request_queue->plug_tq.sync) {
+                        printk("dasd_disable_blkdev(): Device %d:%d on tq_disk (entry %p), running queue\n", major, minor, &device->request_queue->plug_tq);
+                        run_task_queue(&tq_disk);
+                        if (device->request_queue->plug_tq.sync)
+                                printk("dasd.c: Ugh, still on tq_disk. Bye!!\n");
+                }
                 blk_cleanup_queue (device->request_queue);
                 kfree(device->request_queue);
                 device->request_queue = NULL;

The following paragraph is just my theory as to the cause. I see that Red Hat has changes at the end of ll_rw_blk.c:__make_request() which set q->plugged = 0, effectively duplicating what would normally happen when tq_disk is processed. But __make_request() has already called q->plug_device_fn(), so the request_queue is already queued on tq_disk. If your changes mean we can now leave __make_request() with no work queued for my dasd device, then nothing stops me from detaching the device before anyone calls run_task_queue(&tq_disk), which results in the oops above.

Version-Release number of selected component (if applicable):
kernel-2.4.21-4.EL

How reproducible:
Always

Steps to Reproduce:
1. See description section.
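To make the failure mode described above concrete, here is a minimal user-space sketch in C. It is only a model, not kernel code: list_node, model_task, model_queue, task_of and every model_* function are hypothetical stand-ins for list_head, tq_struct, struct request_queue, queue_task() and run_task_queue(), and the teardown functions merely mimic the zeroing done by blk_cleanup_queue(). It shows that clearing a structure whose embedded plug_tq is still linked leaves a NULL link in the task queue, whereas draining the queue first (as the diagnostic patch does) leaves it intact.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of the tq_disk hazard; not kernel code. */

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_node { struct list_node *next, *prev; };

static void node_init(struct list_node *h) { h->next = h->prev = h; }

static void node_add_tail(struct list_node *n, struct list_node *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void node_del(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = NULL;
}

/* Hypothetical stand-in for a 2.4 tq_struct. */
struct model_task {
    struct list_node list;      /* link on the task queue */
    unsigned long sync;         /* nonzero while the task is queued */
    void (*routine)(void *);
    void *data;
};

/* Hypothetical stand-in for struct request_queue, with plug_tq embedded. */
struct model_queue {
    int plugged;
    struct model_task plug_tq;
};

/* Recover the task from its embedded list node. */
#define task_of(ptr) \
    ((struct model_task *)((char *)(ptr) - offsetof(struct model_task, list)))

/* Like queue_task(): link the task only if it is not already queued. */
static void model_queue_task(struct model_task *t, struct list_node *tq)
{
    if (t->sync)
        return;
    t->sync = 1;
    node_add_tail(&t->list, tq);
}

/* Like run_task_queue(): unlink and run every queued task in order. */
static void model_run_task_queue(struct list_node *tq)
{
    while (tq->next != tq) {
        struct model_task *t = task_of(tq->next);
        node_del(&t->list);
        t->sync = 0;
        if (t->routine)
            t->routine(t->data);
    }
}

/* Unplug routine that draining the task queue runs for this queue. */
static void model_unplug(void *data) { ((struct model_queue *)data)->plugged = 0; }

/* Plugging the queue links its embedded plug_tq onto the task queue. */
static void model_plug(struct model_queue *q, struct list_node *tq)
{
    q->plugged = 1;
    q->plug_tq.routine = model_unplug;
    q->plug_tq.data = q;
    model_queue_task(&q->plug_tq, tq);
}

/* Walk the task queue without following a NULL link: returns 1 if intact,
 * 0 if a zeroed entry (left by clearing a still-queued struct) is found. */
static int model_tq_intact(const struct list_node *tq)
{
    for (const struct list_node *n = tq->next; n != tq; n = n->next)
        if (n == NULL || n->next == NULL)
            return 0;
    return 1;
}

/* Unsafe teardown: clear the queue while plug_tq may still be linked,
 * analogous to blk_cleanup_queue() zeroing the request_queue. */
static void model_cleanup_unsafe(struct model_queue *q) { memset(q, 0, sizeof(*q)); }

/* Safer teardown: if plug_tq is still queued, drain the task queue first
 * (the approach of the diagnostic patch), and only then clear the struct. */
static void model_cleanup_safe(struct model_queue *q, struct list_node *tq)
{
    if (q->plug_tq.sync)
        model_run_task_queue(tq);
    assert(!q->plug_tq.sync);   /* plug_tq is no longer reachable from tq */
    memset(q, 0, sizeof(*q));
}
```

For example, after model_plug(&q, &tq), calling model_cleanup_unsafe(&q) makes model_tq_intact(&tq) return 0, while model_cleanup_safe(&q, &tq) leaves tq empty and intact. Setting q.plugged = 0 by hand, as in the theory above, does not unlink plug_tq: its sync flag stays set. Like the patch, the model's safe path drains the whole queue, and in the real kernel draining tq_disk is not by itself a guarantee that all I/O has finished.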
This code is rather broken: run_task_queue() is NO guarantee that all I/O has finished, etc. It sounds like "set device ... off" isn't supportable.
Well, in principle set...off should be no harder than scsi remove-single-device, should it? run_task_queue() may well not be the right way to handle this, but the dasd driver has gone through the motions of canceling outstanding requests first. I wasn't proposing my patch as a proper fix, just as evidence of what was causing the crash. Maybe the dasd driver didn't cancel outstanding requests properly.
So, where are those outstanding requests coming from? I suspect this might be one of those cases where it's better not to do the thing that hurts.
OK, this has been open and not touched forever. Closing. If this needs to be fixed still, please reopen with additional information.