Bug 344821 - Kernel 2.6.23.1-10 panics randomly after install
Status: CLOSED DUPLICATE of bug 367141
Product: Fedora
Classification: Fedora
Component: kernel
Version: 7
Hardware: i686 Linux
Priority: low
Severity: high
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2007-10-21 22:09 EDT by Gilbert Sebenste
Modified: 2008-01-15 14:56 EST (History)
Doc Type: Bug Fix
Last Closed: 2008-01-15 14:56:01 EST
Description Gilbert Sebenste 2007-10-21 22:09:52 EDT
Description of problem: Kernel 2.6.23.1-10 panics at random times, but
within 24 hours of boot. The panic comes without warning, but it does
leave messages in the log file. A cold boot is required to restart.


Version-Release number of selected component (if applicable): 2.6.23.1-10


How reproducible: usually, but randomly


Steps to Reproduce:
1. After installation of the kernel, it will randomly crash and hang
with a panic.

  
Actual results: It crashes!


Expected results: It's not supposed to crash.


Additional info: Here's what pops up in the /var/log/messages file:

---
Oct 21 06:27:01 weather kernel: BUG: unable to handle kernel paging
request at virtual address fffd3f08
Oct 21 06:27:01 weather kernel:  printing eip:
Oct 21 06:27:01 weather kernel: c04622db
Oct 21 06:27:01 weather kernel: *pde = 00004067
Oct 21 06:27:01 weather kernel: *pte = 00000000
Oct 21 06:27:01 weather kernel: Oops: 0000 [#1]
Oct 21 06:27:01 weather kernel: SMP
Oct 21 06:27:01 weather kernel: Modules linked in: autofs4 hidp rfcomm l2cap
bluetooth sunrpc dm_multipath video output sbs battery ac ipv6 kvm_intel kvm
snd_hda_intel snd_emu10$
Oct 21 06:27:01 weather kernel: CPU:    2
Oct 21 06:27:01 weather kernel: EIP:    0060:[<c04622db>]    Not tainted VLI
Oct 21 06:27:01 weather kernel: EFLAGS: 00210282   (2.6.23.1-10.fc7 #1)
Oct 21 06:27:01 weather kernel: EIP is at sync_page+0x27/0x41
Oct 21 06:27:01 weather kernel: eax: 8001006d   ebx: cc6f8dfc   ecx:
c2be0fa0   edx: fffd3ed0
Oct 21 06:27:01 weather kernel: esi: cc6f8dfc   edi: c300d78c   ebp:
c04622b4   esp: cc6f8de0
Oct 21 06:27:01 weather kernel: ds: 007b   es: 007b   fs: 00d8  gs: 0033 ss:
0068
Oct 21 06:27:01 weather kernel: Process eg.k (pid: 12370, ti=cc6f8000
task=eef51230 task.ti=cc6f8000)
Oct 21 06:27:01 weather kernel: Stack: c061b742 cc6f8dfc c2be0fa0 cc6f8e18
00000014 c04622a6 00000002 c2be0fa0
Oct 21 06:27:01 weather kernel:        00000000 00000001 eef51230 c043d3e6
c300d790 c300d790 f7fd3ee0 f7fd3ed0
Oct 21 06:27:01 weather kernel:        c0462372 00000000 f7fd3e28 f7fd3e28
f1686540 c04640f5 c0466247 00000044
Oct 21 06:27:01 weather kernel: Call Trace:
Oct 21 06:27:01 weather kernel:  [<c061b742>] __wait_on_bit_lock+0x2a/0x52
Oct 21 06:27:01 weather kernel:  [<c04622a6>] __lock_page+0x58/0x5e
Oct 21 06:27:01 weather kernel:  [<c043d3e6>] wake_bit_function+0x0/0x3c
Oct 21 06:27:01 weather kernel:  [<c0462372>] find_lock_page+0x5a/0x90
Oct 21 06:27:01 weather kernel:  [<c04640f5>] filemap_fault+0x9b/0x383
Oct 21 06:27:01 weather kernel:  [<c0466247>] __alloc_pages+0x64/0x2a2
Oct 21 06:27:01 weather kernel:  [<c046c2a4>] __do_fault+0x59/0x394
Oct 21 06:27:01 weather kernel:  [<c046e926>] handle_mm_fault+0x3a0/0x78b
Oct 21 06:27:01 weather kernel:  [<c0470f84>] vma_merge+0x18a/0x19a
Oct 21 06:27:01 weather kernel:  [<c0471791>] mmap_region+0x31c/0x3d8
Oct 21 06:27:01 weather kernel:  [<c061dda4>] do_page_fault+0x26a/0x5ef
Oct 21 06:27:01 weather kernel:  [<c0458f7e>] audit_syscall_exit+0x2aa/0x2c6
Oct 21 06:27:01 weather kernel:  [<c04f51d8>] copy_from_user+0x32/0x5e
Oct 21 06:27:01 weather kernel:  [<c061db3a>] do_page_fault+0x0/0x5ef
Oct 21 06:27:01 weather kernel:  [<c061c822>] error_code+0x72/0x78
Oct 21 06:27:01 weather kernel:  [<c0610000>] attach_one_algo+0x46/0x64
Oct 21 06:27:01 weather kernel:  =======================
Oct 21 06:27:01 weather kernel: Code: 00 31 c0 c3 89 c1 0f ae f0 89 f6 8b 50
10 8b 00 66 85 c0 79 07 ba 40 fd 6f c0 eb 0f 8b 01 84 c0 78 1b f6 c2 01 75
16 85 d2 74 12 <8b> 42 38$
Oct 21 06:27:01 weather kernel: EIP: [<c04622db>] sync_page+0x27/0x41 SS:ESP
0068:cc6f8de0
Oct 21 06:27:01 weather kernel: BUG: unable to handle kernel paging request
at virtual address 37343731
Oct 21 06:27:01 weather kernel:  printing eip:
Oct 21 06:27:01 weather kernel: 37343731
Oct 21 06:27:01 weather kernel: *pde = 00000000
Oct 21 06:27:01 weather kernel: Oops: 0000 [#2]
Oct 21 06:27:01 weather kernel: SMP
Oct 21 06:27:01 weather kernel: Modules linked in: autofs4 hidp rfcomm l2cap
bluetooth sunrpc dm_multipath video output sbs battery ac ipv6 kvm_intel kvm
snd_hda_intel snd_emu10$
Oct 21 06:27:01 weather kernel: CPU:    3
Oct 21 06:27:01 weather kernel: EIP:    0060:[<37343731>]    Tainted: G D
VLI
Oct 21 06:27:01 weather kernel: EFLAGS: 00210002   (2.6.23.1-10.fc7 #1)
Oct 21 06:27:01 weather kernel: EIP is at 0x37343731
Oct 21 06:27:01 weather kernel: eax: cc6f8e04   ebx: cc6f8e04   ecx:
00000000   edx: 00000003
Oct 21 06:27:01 weather kernel: esi: 57465220   edi: 00000001   ebp:
d81e1e38   esp: d81e1e18
Oct 21 06:27:01 weather kernel: ds: 007b   es: 007b   fs: 00d8  gs: 0033 ss:
0068
Oct 21 06:27:01 weather kernel: Process cat (pid: 12402, ti=d81e1000
task=e9fa7840 task.ti=d81e1000)
Oct 21 06:27:01 weather kernel: Stack: c04244c6 d81e1e68 00000003 c300d78c
50454b5f c300d78c d81e1e68 00000001
Oct 21 06:27:01 weather kernel:        d81e1e5c c04264a0 00000000 d81e1e68
00000003 00200282 c300d78c 00000000
Oct 21 06:27:01 weather kernel:        e6378494 f6b801c0 c043d396 d81e1e68
c2be0fa0 00000000 c2be0fa0 c046c5b0
Oct 21 06:27:01 weather kernel: Call Trace:
Oct 21 06:27:01 weather kernel:  [<c04244c6>] __wake_up_common+0x32/0x55
Oct 21 06:27:01 weather kernel:  [<c04264a0>] __wake_up+0x32/0x43
Oct 21 06:27:01 weather kernel:  [<c043d396>] __wake_up_bit+0x2e/0x33
Oct 21 06:27:01 weather kernel:  [<c046c5b0>] __do_fault+0x365/0x394
Oct 21 06:27:01 weather kernel:  [<c046e926>] handle_mm_fault+0x3a0/0x78b
Oct 21 06:27:01 weather kernel:  [<c0470f84>] vma_merge+0x18a/0x19a
Oct 21 06:27:01 weather kernel:  [<c0471791>] mmap_region+0x31c/0x3d8
Oct 21 06:27:01 weather kernel:  [<c061dda4>] do_page_fault+0x26a/0x5ef
Oct 21 06:27:01 weather kernel:  [<c0458f7e>] audit_syscall_exit+0x2aa/0x2c6
Oct 21 06:27:01 weather kernel:  [<c04f51d8>] copy_from_user+0x32/0x5e
Oct 21 06:27:01 weather kernel:  [<c061db3a>] do_page_fault+0x0/0x5ef
Oct 21 06:27:01 weather kernel:  [<c061c822>] error_code+0x72/0x78
Oct 21 06:27:01 weather kernel:  [<c0610000>] attach_one_algo+0x46/0x64
Oct 21 06:27:01 weather kernel:  =======================
Oct 21 06:27:01 weather kernel: Code:  Bad EIP value.
Oct 21 06:27:01 weather kernel: EIP: [<37343731>] 0x37343731 SS:ESP
0068:d81e1e18

No error messages precede this.

I am running Fedora 7. D'oh! 4 GB of RAM, Seagate 750 GB hard drive,
Q6700 processor with latest BIOS from ASUS.

uname -a:

Linux weather.admin.niu.edu 2.6.23.1-10.fc7 #1 SMP Thu Oct 18 13:37:14 EDT 2007
i686 i686 i386 GNU/Linux
Comment 1 Chuck Ebbert 2007-10-23 14:05:49 EDT
c10622b4 <sync_page>:
c10622b4:       89 c1                   mov    %eax,%ecx
c10622b6:       f0 83 04 24 00          lock addl $0x0,(%esp)
c10622bb:       8b 50 10                mov    0x10(%eax),%edx
c10622be:       8b 00                   mov    (%eax),%eax
c10622c0:       66 85 c0                test   %ax,%ax
c10622c3:       79 07                   jns    c10622cc <sync_page+0x18>
c10622c5:       ba 40 fd 2f c1          mov    $0xc12ffd40,%edx
                        c10622c6: R_386_32      swapper_space
c10622ca:       eb 0f                   jmp    c10622db <sync_page+0x27>
c10622cc:       8b 01                   mov    (%ecx),%eax
c10622ce:       84 c0                   test   %al,%al
c10622d0:       78 1b                   js     c10622ed <sync_page+0x39>
c10622d2:       f6 c2 01                test   $0x1,%dl
c10622d5:       75 16                   jne    c10622ed <sync_page+0x39>
c10622d7:       85 d2                   test   %edx,%edx
c10622d9:       74 12                   je     c10622ed <sync_page+0x39>
c10622db:       8b 42 38                mov    0x38(%edx),%eax
c10622de:       85 c0                   test   %eax,%eax
c10622e0:       74 0b                   je     c10622ed <sync_page+0x39>
c10622e2:       8b 50 08                mov    0x8(%eax),%edx
c10622e5:       85 d2                   test   %edx,%edx
c10622e7:       74 04                   je     c10622ed <sync_page+0x39>
c10622e9:       89 c8                   mov    %ecx,%eax
c10622eb:       ff d2                   call   *%edx
c10622ed:       e8 0e 93 1b 00          call   c121b600 <io_schedule>
                        c10622ee: R_386_PC32    io_schedule
c10622f2:       31 c0                   xor    %eax,%eax
c10622f4:       c3                      ret

(To find the matching address in this disassembly, subtract 0x400000 from the
reported address and then add 0x1000000. For example, the reported EIP
c04622db becomes c10622db.)
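
The address adjustment in the note above can be checked with a few lines of
Python. This is only a sketch of the arithmetic; the function name is made up
for illustration, and the two constants are taken straight from the note.

```python
# Constants from the note: subtract 0x400000, then add 0x1000000.
SUBTRACT = 0x400000
ADD = 0x1000000


def to_disassembly_address(reported):
    """Map an address from the oops to the address used in the disassembly.

    Hypothetical helper; the result is masked to 32 bits because this is
    an i686 kernel.
    """
    return (reported - SUBTRACT + ADD) & 0xFFFFFFFF


# EIP reported in the first oops (sync_page+0x27).
print(hex(to_disassembly_address(0xC04622DB)))
```

The result lands on the `mov 0x38(%edx),%eax` line in the listing, which is
the instruction at sync_page+0x27.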
Comment 2 Chuck Ebbert 2007-10-23 14:28:55 EDT
static int sync_page(void *word)
{
        struct address_space *mapping;
        struct page *page;

        page = container_of((unsigned long *)word, struct page, flags);
        smp_mb();
        mapping = page_mapping(page);
        if (mapping && mapping->a_ops && mapping->a_ops->sync_page)
                mapping->a_ops->sync_page(page);
        io_schedule();
        return 0;
}

The mapping for page <c2be0fa0> is gone.
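
As a rough cross-check against the first oops: the faulting instruction at
sync_page+0x27 is `mov 0x38(%edx),%eax`, so the address it touched should be
edx plus 0x38. Reading the disassembly against the source, edx looks like it
holds `mapping` and the load looks like the read of `mapping->a_ops`; that
pairing is an inference from the listing, not something the oops states. The
sketch below only verifies the arithmetic, using register values copied from
the oops.

```python
MASK32 = 0xFFFFFFFF  # i686: addresses are 32 bits wide


def effective_address(base, displacement):
    """Address touched by an instruction of the form `mov disp(%reg), ...`."""
    return (base + displacement) & MASK32


# Values copied from the first oops.
edx = 0xFFFD3ED0            # assumed to be the `mapping` pointer
fault_address = 0xFFFD3F08  # "unable to handle kernel paging request at ..."

# Displacement from the disassembly: mov 0x38(%edx),%eax
matches = effective_address(edx, 0x38) == fault_address
print(matches)
```

If the check holds, the fault happened while dereferencing the pointer in edx,
which fits a stale or bogus mapping pointer for that page.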

Is there anything unusual about what the running program is doing? What kind of
filesystem is its data on?
Comment 3 Gilbert Sebenste 2007-10-23 14:42:20 EDT
Not that I am aware of. It's a program called "McIDAS", which I use to make
weather maps. It kicks off a script every 60 seconds, so there are lots of
xinetd messages about it in the log file. Here's my /var/log/messages
file so far this week. I attempted to take out the McIDAS messages, but I don't
have access to xemacs right now, and pico wasn't cutting it.

http://weather.niu.edu/crap
Comment 4 Gilbert Sebenste 2007-11-05 10:48:43 EST
This continues with the 2.6.23.1-21 kernel as well. This did *not* happen
under 2.6.23.1-8. What changed between -8 and -10?
Comment 5 Chuck Ebbert 2007-11-05 15:00:09 EST
(In reply to comment #4)
> This continues with the 2.6.23.1-21 kernel as well. This did *not* happen
> under 2.6.23.1-8. What changed between -8 and -10?

Very little, and nothing that should cause this.

Is the program memory-mapping files on local ext3 filesystems?
Comment 6 Gilbert Sebenste 2007-11-05 15:22:41 EST
Yes.

-rw-rw-r-- 1 (username) users 203886592 2007-11-04 11:50 ldm.pq
-rw-rw-r-- 1 (username) users  30834688 2007-11-04 11:51 pqsurf.pq

It uses the LDM weather data manager; a Google search for "LDM Memory
Mapping" gives you an idea of how it works.

Basically, it shoves an hour's worth of weather data into those two files:
"pqsurf.pq" holds just the current hourly airport readings, and ldm.pq
contains everything else, from satellite and radar binary data to weather
balloon data, etc.
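
For readers unfamiliar with the access pattern, here is a minimal sketch of
writing into a file through a shared memory mapping. It is not LDM's actual
code; the function names, file size, and payload are made up for
illustration. On Linux, the first touch of each mapped page raises a page
fault that the kernel services through its file-mapping fault path, which is
where filemap_fault shows up in the traces above.

```python
import mmap
import os
import tempfile


def write_through_mapping(path, offset, payload):
    """Write payload into an existing file via a shared memory mapping."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; ACCESS_WRITE writes through to it.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
            m[offset:offset + len(payload)] = payload
            m.flush()


def demo():
    """Create a small scratch file, write via mmap, read back normally."""
    fd, path = tempfile.mkstemp(suffix=".pq")  # hypothetical queue file
    try:
        os.ftruncate(fd, 4096 * 4)  # four pages of zeros
        os.close(fd)
        write_through_mapping(path, 4096, b"METAR")
        with open(path, "rb") as f:
            f.seek(4096)
            return f.read(5)
    finally:
        os.unlink(path)
```

Calling `demo()` returns the bytes that were written through the mapping,
read back with an ordinary file read.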
Comment 7 Gilbert Sebenste 2008-01-15 14:56:01 EST

*** This bug has been marked as a duplicate of 367141 ***
