Description of problem:

In vfree(), we have the following:

    if (in_interrupt()) {
        printk("vfree(): sleeping in interrupt!! \n");
    #ifdef __i386__
        show_stack(NULL);
    #endif
    }

I assume the intention here is to catch the faulty kernel module(s) and fix the issue. So we got one here:

    Aug 24 15:40:42 elonxdl585 kernel: vfree(): sleeping in interrupt!!
    Aug 24 15:40:42 elonxdl585 kernel: ca347e80 f8bc7d0c 00000000 ffffffff f8bcef9a f8bd7000 00004000 00000000
    Aug 24 15:40:42 elonxdl585 kernel: f8bc7d0c 00000010 f8bc7a65 00000000 00000000 001c0e7d f8bc7da0 f8bd4080
    Aug 24 15:40:42 elonxdl585 kernel: c037e100 f8bc7dd2 00000000 ffffffef 00000000 c0134626 00000000 00000001
    Aug 24 15:40:42 elonxdl585 kernel: Call Trace: [<f8bc7d0c>] _lock_fdc [floppy] 0xac (0xca347e84)
    Aug 24 15:40:42 elonxdl585 kernel: [<f8bcef9a>] floppy_release_irq_and_dma [floppy] 0x21a (0xca347e90)
    Aug 24 15:40:42 elonxdl585 kernel: [<f8bc7d0c>] _lock_fdc [floppy] 0xac (0xca347ea0)
    Aug 24 15:40:42 elonxdl585 kernel: [<f8bc7a65>] set_dor [floppy] 0x165 (0xca347ea8)
    Aug 24 15:40:42 elonxdl585 kernel: [<f8bc7da0>] motor_off_callback [floppy] 0x0 (0xca347eb8)
    Aug 24 15:40:42 elonxdl585 kernel: [<f8bd4080>] motor_off_timer [floppy] 0x0 (0xca347ebc)
    Aug 24 15:40:42 elonxdl585 kernel: [<f8bc7dd2>] motor_off_callback [floppy] 0x32 (0xca347ec4)
    Aug 24 15:40:42 elonxdl585 kernel: [<c0134626>] __run_timers [kernel] 0xb6 (0xca347ed4)
    Aug 24 15:40:42 elonxdl585 kernel: [<c0134342>] timer_bh [kernel] 0x62 (0xca347f00)
    Aug 24 15:40:42 elonxdl585 kernel: [<c012efb5>] bh_action [kernel] 0x55 (0xca347f14)
    Aug 24 15:40:42 elonxdl585 kernel: [<c012ee57>] tasklet_hi_action [kernel] 0x67 (0xca347f1c)
    Aug 24 15:40:42 elonxdl585 kernel: [<c012ebe5>] do_softirq [kernel] 0x105 (0xca347f30)
    Aug 24 15:40:42 elonxdl585 kernel: [<c010db48>] do_IRQ [kernel] 0x148 (0xca347f50)
    Aug 24 15:40:42 elonxdl585 kernel: [<c010da00>] do_IRQ [kernel] 0x0 (0xca347f74)
    Aug 24 15:40:42 elonxdl585 kernel: [<c0109100>] default_idle [kernel] 0x0 (0xca347f7c)
    Aug 24 15:40:42 elonxdl585 kernel: [<c0109100>] default_idle [kernel] 0x0 (0xca347f90)
    Aug 24 15:40:42 elonxdl585 kernel: [<c0109129>] default_idle [kernel] 0x29 (0xca347fa4)
    Aug 24 15:40:42 elonxdl585 kernel: [<c01091c2>] cpu_idle [kernel] 0x42 (0xca347fb0)
    Aug 24 15:40:42 elonxdl585 kernel: [<c01287a3>] printk [kernel] 0x143 (0xca347fcc)

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Since this happens in the timer bottom-half code, working around it with IRQ or wait-queue handling looks messy. Why don't we just change the vmalloc() inside fd_dma_mem_alloc() into a kmalloc()? The DMA buffer size is either 32K or 64K. The current code path is fd_dma_mem_alloc() --> fd_routine[vdma_mem_alloc] --> vmalloc().
Oops, I spoke too soon. After digging into the code more, it is messier than I thought. It seems that the default memory allocation method is indeed kmalloc(); it only switches to vmalloc() when not enough memory is available (can_use_virtual_dma is 2):

    static int can_use_virtual_dma=2;
    /* =======
     * can use virtual DMA:
     * 0 = use of virtual DMA disallowed by config
     * 1 = use of virtual DMA prescribed by config
     * 2 = no virtual DMA preference configured. By default try hard DMA,
     * but fall back on virtual DMA when not enough memory available
     */

    static int use_virtual_dma;
    /* =======
     * use virtual DMA

    "drivers/block/floppy.c" 4522L, 116862C

Two options come to mind:

1. Fix the callback function. If it finds itself in interrupt context while trying to floppy_release_irq_and_dma(), defer the vfree() until it can be done safely (say, on the next floppy_open()?).

2. Disallow vmalloc. This effectively prevents the floppy device from being opened when lowmem is tight.
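To make the trade-off in option 2 concrete, here is a minimal user-space sketch of the policy described in the quoted can_use_virtual_dma comment. It is not the code in drivers/block/floppy.c: the enum names and choose_dma_mode() are hypothetical, and the two "ok" flags simply stand in for whether the contiguous (hard DMA) allocation and the vmalloc() allocation would succeed.

```c
#include <assert.h>

/* Hypothetical names modelling the three values documented in the
 * quoted comment; these enums do not exist in floppy.c. */
enum dma_pref {
    VDMA_DISALLOWED = 0, /* use of virtual DMA disallowed by config */
    VDMA_PRESCRIBED = 1, /* use of virtual DMA prescribed by config */
    VDMA_NO_PREF    = 2  /* try hard DMA, fall back on virtual DMA */
};

enum dma_mode { DMA_NONE, DMA_HARD, DMA_VIRTUAL };

/* Decide which kind of buffer the driver would end up with.
 * hard_alloc_ok: the contiguous allocation would succeed (assumption).
 * virt_alloc_ok: the vmalloc()-style allocation would succeed (assumption).
 * DMA_NONE means no buffer could be obtained, so the open would fail. */
enum dma_mode choose_dma_mode(enum dma_pref pref,
                              int hard_alloc_ok, int virt_alloc_ok)
{
    assert(pref >= VDMA_DISALLOWED && pref <= VDMA_NO_PREF);

    if (pref == VDMA_PRESCRIBED)
        return virt_alloc_ok ? DMA_VIRTUAL : DMA_NONE;
    if (hard_alloc_ok)
        return DMA_HARD;
    if (pref == VDMA_NO_PREF && virt_alloc_ok)
        return DMA_VIRTUAL;
    return DMA_NONE;
}
```

With VDMA_NO_PREF and a failed contiguous allocation the sketch returns DMA_VIRTUAL, which is the path that later reaches vfree(). With VDMA_DISALLOWED the same situation returns DMA_NONE, which is the "cannot open the floppy when lowmem is tight" cost of option 2.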
Found a global variable called floppy_track_buffer. It seems to be a natural fit for option #1 in the comment above.
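A rough user-space sketch of how option 1 (deferring the free) might look. This is only an illustration, not a patch: sim_in_interrupt stands in for the kernel's in_interrupt(), sim_vfree() stands in for vfree(), and deferred_track_buffer is a hypothetical slot playing the role a global such as floppy_track_buffer could play. A real driver would also need locking around the slot, which is omitted here.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for in_interrupt(); a plain flag so this runs in user space. */
int sim_in_interrupt = 0;

/* Hypothetical slot for a buffer whose release had to be postponed. */
void *deferred_track_buffer = NULL;

/* Counts completed frees so the behaviour can be observed. */
int frees_done = 0;

/* Stand-in for vfree(): must only be called from process context. */
void sim_vfree(void *p)
{
    assert(!sim_in_interrupt);
    free(p);
    frees_done++;
}

/* Release path (may run from the timer callback): free immediately in
 * process context, otherwise park the pointer for later. */
void release_track_buffer(void *buf)
{
    if (buf == NULL)
        return;
    if (sim_in_interrupt) {
        /* This sketch supports at most one pending buffer. */
        assert(deferred_track_buffer == NULL);
        deferred_track_buffer = buf;
        return;
    }
    sim_vfree(buf);
}

/* Open path (process context): finish any postponed release first. */
void flush_deferred_free(void)
{
    void *p = deferred_track_buffer;

    if (p != NULL) {
        deferred_track_buffer = NULL;
        sim_vfree(p);
    }
}
```

The idea is that the release routine never calls the free function from interrupt context, and something that is known to run in process context (the next open, in the suggestion above) calls flush_deferred_free() before allocating a new buffer.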
Any updates on this bug? I just saw the same problem:

    Sep 7 10:01:57 cust kernel: vfree(): sleeping in interrupt!!
    Sep 7 10:01:57 cust kernel: cd503e80 f905cd0c 00000000 ffffffff f9063f9a f9049000 00004000 00000000
    Sep 7 10:01:58 cust kernel: f905cd0c 00000010 f905ca65 00000000 00000000 001c1bc5 f905cda0 f9069060
    Sep 7 10:01:58 cust kernel: c0391900 f905cdd2 00000000 ffffffef 00000000 c0135396 00000000 0000003f
    Sep 7 10:01:58 cust kernel: Call Trace: [<f905cd0c>] _lock_fdc [floppy] 0xac (0xcd503e84)
    Sep 7 10:01:58 cust kernel: [<f9063f9a>] floppy_release_irq_and_dma [floppy] 0x21a (0xcd503e90)
    Sep 7 10:01:58 cust kernel: [<f905cd0c>] _lock_fdc [floppy] 0xac (0xcd503ea0)
    Sep 7 10:01:58 cust kernel: [<f905ca65>] set_dor [floppy] 0x165 (0xcd503ea8)
    Sep 7 10:01:58 cust kernel: [<f905cda0>] motor_off_callback [floppy] 0x0 (0xcd503eb8)
    Sep 7 10:01:58 cust kernel: [<f9069060>] motor_off_timer [floppy] 0x0 (0xcd503ebc)
    Sep 7 10:01:58 cust kernel: [<f905cdd2>] motor_off_callback [floppy] 0x32 (0xcd503ec4)
    Sep 7 10:01:58 cust kernel: [<c0135396>] __run_timers [kernel] 0xb6 (0xcd503ed4)
    Sep 7 10:01:58 cust kernel: [<c01350b2>] timer_bh [kernel] 0x62 (0xcd503f00)
    Sep 7 10:01:58 cust kernel: [<c012fd25>] bh_action [kernel] 0x55 (0xcd503f14)
    Sep 7 10:01:58 cust kernel: [<c012fbc7>] tasklet_hi_action [kernel] 0x67 (0xcd503f1c)
    Sep 7 10:01:58 cust kernel: [<c012f955>] do_softirq [kernel] 0x105 (0xcd503f30)
    Sep 7 10:01:58 cust kernel: [<c010df38>] do_IRQ [kernel] 0x148 (0xcd503f50)
    Sep 7 10:01:58 cust kernel: [<c010ddf0>] do_IRQ [kernel] 0x0 (0xcd503f74)
    Sep 7 10:01:58 cust kernel: [<c0109100>] default_idle [kernel] 0x0 (0xcd503f7c)
    Sep 7 10:01:58 cust kernel: [<c0109100>] default_idle [kernel] 0x0 (0xcd503f90)
    Sep 7 10:01:58 cust kernel: [<c0109129>] default_idle [kernel] 0x29 (0xcd503fa4)
    Sep 7 10:01:58 cust kernel: [<c01091c2>] cpu_idle [kernel] 0x42 (0xcd503fb0)
    Sep 7 10:01:58 cust kernel: [<c01294f3>] printk [kernel] 0x153 (0xcd503fcc)

Is there a fix or workaround for this problem?
There was not enough time to resolve technical issues with the patch, and RHEL3 is now closed.