Description of problem:
Kernel 2.6.23.1-10 panics at random times, though always within 24 hours of boot. The panic comes without warning, but it does leave messages in the log file. A cold boot is required to restart.

Version-Release number of selected component (if applicable):
2.6.23.1-10

How reproducible:
Usually, though the timing is random.

Steps to Reproduce:
1. Install the kernel; it will randomly crash and hang with a panic.

Actual results:
It crashes!

Expected results:
It's not supposed to crash.

Additional info:
Here's what shows up in the /var/log/messages file:
---
Oct 21 06:27:01 weather kernel: BUG: unable to handle kernel paging request at virtual address fffd3f08
Oct 21 06:27:01 weather kernel: printing eip:
Oct 21 06:27:01 weather kernel: c04622db
Oct 21 06:27:01 weather kernel: *pde = 00004067
Oct 21 06:27:01 weather kernel: *pte = 00000000
Oct 21 06:27:01 weather kernel: Oops: 0000 [#1]
Oct 21 06:27:01 weather kernel: SMP
Oct 21 06:27:01 weather kernel: Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_multipath video output sbs battery ac ipv6 kvm_intel kvm snd_hda_intel snd_emu10$
Oct 21 06:27:01 weather kernel: CPU: 2
Oct 21 06:27:01 weather kernel: EIP: 0060:[<c04622db>] Not tainted VLI
Oct 21 06:27:01 weather kernel: EFLAGS: 00210282 (2.6.23.1-10.fc7 #1)
Oct 21 06:27:01 weather kernel: EIP is at sync_page+0x27/0x41
Oct 21 06:27:01 weather kernel: eax: 8001006d ebx: cc6f8dfc ecx: c2be0fa0 edx: fffd3ed0
Oct 21 06:27:01 weather kernel: esi: cc6f8dfc edi: c300d78c ebp: c04622b4 esp: cc6f8de0
Oct 21 06:27:01 weather kernel: ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Oct 21 06:27:01 weather kernel: Process eg.k (pid: 12370, ti=cc6f8000 task=eef51230 task.ti=cc6f8000)
Oct 21 06:27:01 weather kernel: Stack: c061b742 cc6f8dfc c2be0fa0 cc6f8e18 00000014 c04622a6 00000002 c2be0fa0
Oct 21 06:27:01 weather kernel: 00000000 00000001 eef51230 c043d3e6 c300d790 c300d790 f7fd3ee0 f7fd3ed0
Oct 21 06:27:01 weather kernel: c0462372 00000000 f7fd3e28 f7fd3e28 f1686540 c04640f5 c0466247 00000044
Oct 21 06:27:01 weather kernel: Call Trace:
Oct 21 06:27:01 weather kernel: [<c061b742>] __wait_on_bit_lock+0x2a/0x52
Oct 21 06:27:01 weather kernel: [<c04622a6>] __lock_page+0x58/0x5e
Oct 21 06:27:01 weather kernel: [<c043d3e6>] wake_bit_function+0x0/0x3c
Oct 21 06:27:01 weather kernel: [<c0462372>] find_lock_page+0x5a/0x90
Oct 21 06:27:01 weather kernel: [<c04640f5>] filemap_fault+0x9b/0x383
Oct 21 06:27:01 weather kernel: [<c0466247>] __alloc_pages+0x64/0x2a2
Oct 21 06:27:01 weather kernel: [<c046c2a4>] __do_fault+0x59/0x394
Oct 21 06:27:01 weather kernel: [<c046e926>] handle_mm_fault+0x3a0/0x78b
Oct 21 06:27:01 weather kernel: [<c0470f84>] vma_merge+0x18a/0x19a
Oct 21 06:27:01 weather kernel: [<c0471791>] mmap_region+0x31c/0x3d8
Oct 21 06:27:01 weather kernel: [<c061dda4>] do_page_fault+0x26a/0x5ef
Oct 21 06:27:01 weather kernel: [<c0458f7e>] audit_syscall_exit+0x2aa/0x2c6
Oct 21 06:27:01 weather kernel: [<c04f51d8>] copy_from_user+0x32/0x5e
Oct 21 06:27:01 weather kernel: [<c061db3a>] do_page_fault+0x0/0x5ef
Oct 21 06:27:01 weather kernel: [<c061c822>] error_code+0x72/0x78
Oct 21 06:27:01 weather kernel: [<c0610000>] attach_one_algo+0x46/0x64
Oct 21 06:27:01 weather kernel: =======================
Oct 21 06:27:01 weather kernel: Code: 00 31 c0 c3 89 c1 0f ae f0 89 f6 8b 50 10 8b 00 66 85 c0 79 07 ba 40 fd 6f c0 eb 0f 8b 01 84 c0 78 1b f6 c2 01 75 16 85 d2 74 12 <8b> 42 38$
Oct 21 06:27:01 weather kernel: EIP: [<c04622db>] sync_page+0x27/0x41 SS:ESP 0068:cc6f8de0
Oct 21 06:27:01 weather kernel: BUG: unable to handle kernel paging request at virtual address 37343731
Oct 21 06:27:01 weather kernel: printing eip:
Oct 21 06:27:01 weather kernel: 37343731
Oct 21 06:27:01 weather kernel: *pde = 00000000
Oct 21 06:27:01 weather kernel: Oops: 0000 [#2]
Oct 21 06:27:01 weather kernel: SMP
Oct 21 06:27:01 weather kernel: Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_multipath video output sbs battery ac ipv6 kvm_intel kvm snd_hda_intel snd_emu10$
Oct 21 06:27:01 weather kernel: CPU: 3
Oct 21 06:27:01 weather kernel: EIP: 0060:[<37343731>] Tainted: G D VLI
Oct 21 06:27:01 weather kernel: EFLAGS: 00210002 (2.6.23.1-10.fc7 #1)
Oct 21 06:27:01 weather kernel: EIP is at 0x37343731
Oct 21 06:27:01 weather kernel: eax: cc6f8e04 ebx: cc6f8e04 ecx: 00000000 edx: 00000003
Oct 21 06:27:01 weather kernel: esi: 57465220 edi: 00000001 ebp: d81e1e38 esp: d81e1e18
Oct 21 06:27:01 weather kernel: ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Oct 21 06:27:01 weather kernel: Process cat (pid: 12402, ti=d81e1000 task=e9fa7840 task.ti=d81e1000)
Oct 21 06:27:01 weather kernel: Stack: c04244c6 d81e1e68 00000003 c300d78c 50454b5f c300d78c d81e1e68 00000001
Oct 21 06:27:01 weather kernel: d81e1e5c c04264a0 00000000 d81e1e68 00000003 00200282 c300d78c 00000000
Oct 21 06:27:01 weather kernel: e6378494 f6b801c0 c043d396 d81e1e68 c2be0fa0 00000000 c2be0fa0 c046c5b0
Oct 21 06:27:01 weather kernel: Call Trace:
Oct 21 06:27:01 weather kernel: [<c04244c6>] __wake_up_common+0x32/0x55
Oct 21 06:27:01 weather kernel: [<c04264a0>] __wake_up+0x32/0x43
Oct 21 06:27:01 weather kernel: [<c043d396>] __wake_up_bit+0x2e/0x33
Oct 21 06:27:01 weather kernel: [<c046c5b0>] __do_fault+0x365/0x394
Oct 21 06:27:01 weather kernel: [<c046e926>] handle_mm_fault+0x3a0/0x78b
Oct 21 06:27:01 weather kernel: [<c0470f84>] vma_merge+0x18a/0x19a
Oct 21 06:27:01 weather kernel: [<c0471791>] mmap_region+0x31c/0x3d8
Oct 21 06:27:01 weather kernel: [<c061dda4>] do_page_fault+0x26a/0x5ef
Oct 21 06:27:01 weather kernel: [<c0458f7e>] audit_syscall_exit+0x2aa/0x2c6
Oct 21 06:27:01 weather kernel: [<c04f51d8>] copy_from_user+0x32/0x5e
Oct 21 06:27:01 weather kernel: [<c061db3a>] do_page_fault+0x0/0x5ef
Oct 21 06:27:01 weather kernel: [<c061c822>] error_code+0x72/0x78
Oct 21 06:27:01 weather kernel: [<c0610000>] attach_one_algo+0x46/0x64
Oct 21 06:27:01 weather kernel: =======================
Oct 21 06:27:01 weather kernel: Code: Bad EIP value.
Oct 21 06:27:01 weather kernel: EIP: [<37343731>] 0x37343731 SS:ESP 0068:d81e1e18

No error messages precede this.

I am running Fedora 7. D'oh! Hardware: 4 GB of RAM, a Seagate 750 GB hard drive, and a Q6700 processor with the latest BIOS from ASUS.

uname -a:
Linux weather.admin.niu.edu 2.6.23.1-10.fc7 #1 SMP Thu Oct 18 13:37:14 EDT 2007 i686 i686 i386 GNU/Linux
c10622b4 <sync_page>:
c10622b4:  89 c1               mov    %eax,%ecx
c10622b6:  f0 83 04 24 00      lock addl $0x0,(%esp)
c10622bb:  8b 50 10            mov    0x10(%eax),%edx
c10622be:  8b 00               mov    (%eax),%eax
c10622c0:  66 85 c0            test   %ax,%ax
c10622c3:  79 07               jns    c10622cc <sync_page+0x18>
c10622c5:  ba 40 fd 2f c1      mov    $0xc12ffd40,%edx
                   c10622c6: R_386_32 swapper_space
c10622ca:  eb 0f               jmp    c10622db <sync_page+0x27>
c10622cc:  8b 01               mov    (%ecx),%eax
c10622ce:  84 c0               test   %al,%al
c10622d0:  78 1b               js     c10622ed <sync_page+0x39>
c10622d2:  f6 c2 01            test   $0x1,%dl
c10622d5:  75 16               jne    c10622ed <sync_page+0x39>
c10622d7:  85 d2               test   %edx,%edx
c10622d9:  74 12               je     c10622ed <sync_page+0x39>
c10622db:  8b 42 38            mov    0x38(%edx),%eax
c10622de:  85 c0               test   %eax,%eax
c10622e0:  74 0b               je     c10622ed <sync_page+0x39>
c10622e2:  8b 50 08            mov    0x8(%eax),%edx
c10622e5:  85 d2               test   %edx,%edx
c10622e7:  74 04               je     c10622ed <sync_page+0x39>
c10622e9:  89 c8               mov    %ecx,%eax
c10622eb:  ff d2               call   *%edx
c10622ed:  e8 0e 93 1b 00      call   c121b600 <io_schedule>
                   c10622ee: R_386_PC32 io_schedule
c10622f2:  31 c0               xor    %eax,%eax
c10622f4:  c3                  ret

(To find the matching address in this listing, subtract 0x400000 from the reported address and add 0x1000000.)
static int sync_page(void *word)
{
	struct address_space *mapping;
	struct page *page;

	page = container_of((unsigned long *)word, struct page, flags);

	smp_mb();
	mapping = page_mapping(page);
	if (mapping && mapping->a_ops && mapping->a_ops->sync_page)
		mapping->a_ops->sync_page(page);
	io_schedule();
	return 0;
}

The faulting instruction (sync_page+0x27) is the load of mapping->a_ops, so the mapping for page <c2be0fa0> is gone. Is there anything unusual about what the running program is doing? What kind of filesystem is its data on?
Not that I am aware of. It's a program called "McIDAS", which I use to make weather maps. It kicks off a script every 60 seconds, so there are lots of xinetd messages about it in the log file. Here's my /var/log/messages file for this week so far. I tried to strip out the McIDAS messages, but I don't have access to xemacs right now, and pico wasn't up to the job.

http://weather.niu.edu/crap
This continues with the 2.6.23.1-21 kernel as well. This did *not* happen under 2.6.23.1-8. What changed between -8 and -10?
(In reply to comment #4)
> This continues with the 2.6.23.1-21 kernel as well. This did *not* happen
> under 2.6.23.1-8. What changed between -8 and -10?

Very little, and nothing that should cause this. Is the program memory-mapping files on local ext3 filesystems?
Yes.

-rw-rw-r-- 1 (username) users 203886592 2007-11-04 11:50 ldm.pq
-rw-rw-r-- 1 (username) users  30834688 2007-11-04 11:51 pqsurf.pq

It uses the LDM weather data manager; a Google search for "LDM memory mapping" gives you an idea of how it works. Basically, it shoves an hour's worth of weather data into those two files: pqsurf.pq holds just the current hourly airport readings, and ldm.pq holds everything else, from satellite and radar binary data to weather balloon data.
*** This bug has been marked as a duplicate of 367141 ***