Bug 147656
Summary:          Known race in __wait_on_buffer()
Product:          Red Hat Enterprise Linux 2.1
Component:        kernel
Version:          2.1
Hardware:         All
OS:               Linux
Status:           CLOSED DEFERRED
Severity:         high
Priority:         medium
Reporter:         Joel Becker <joel.becker>
Assignee:         Larry Woodman <lwoodman>
QA Contact:       Brian Brock <bbrock>
CC:               galdu, peterm, riel, tao, van.okamura, villapla
Doc Type:         Bug Fix
Last Closed:      2006-09-13 20:36:19 UTC
Bug Blocks:       143573
Description
Joel Becker
2005-02-10 03:08:28 UTC
Created attachment 110910 [details]
Patch for the wait_on_buffer race.
Does this have a corresponding Issue Tracker ID number?

Issue Tracker 65672 created.

I built an AS2.1 kernel that includes the missed wakeup patch. Please try it out and see if it fixes this problem. It's located here:
http://people.redhat.com/~lwoodman/.for_oracle/
Thanks, Larry

Larry, when you say "missed wakeup patch," do you mean the patch I posted in this bug, or do you mean something else? I'm currently querying the customer to see if the environment is still available.

Yes Joel. Larry
I also moved the src.rpm to this location:
>>>http://people.redhat.com/~lwoodman/.for_oracle/
The patch is number 99999 in the kernel spec file.
Larry
The customer ran the kernel and found it to be unstable: at about 21:00, all Oracle processes crashed. There were a lot of UNDERRUN messages in /var/log/messages. Hopefully I can get the full /var/log/messages to attach here. The customer has gone back to my kernel (e.57 + this patch), which has been working for them.

Created attachment 113442 [details]
messages file from the customer's first node. Contains the boots using the Oracle.62 kernel
This is the messages file from the customer's first node, tsintao. The logs of
boots with the Oracle.62 kernel are here. The qla driver is reporting UNDERRUN
conditions. The customer says that Oracle crashed quickly with this kernel
after the UNDERRUN messages started appearing.
Created attachment 113443 [details]
/var/log/messages from the customer's other node, running the Oracle.62 kernel
This is the other node in the customer's Oracle RAC. Note the qla driver
messages appear here as well.
Joel, the official AS2.1-U7 kernel is located in:
>>>http://people.redhat.com/~lwoodman/AS2.1/
It does not contain the wait_on_buffer patch, but I need to know whether this
kernel fixes the stability issues you experienced with the previous kernel
you tried.
Larry
I have similar problems on a DL580 G2 with Oracle RAC and a VA7410. We are running kernel 2.4.9-e.49enterprise. This is the output of ps and sysrq tasks for one of the processes:

ps:
  000 D oracle 22168 1 0 76 0 - 351004 wait_o Jan18 ? 00:04:57 oracledss1 (LOCAL=NO)

sysrq tasks:
  oracle D 00000000 2400 22168 1 21284 22162 (NOTLB)
  Call Trace:
  [<c012101b>] do_softirq [kernel] 0x7b (0xcdca1d48)
  [<c0148376>] __wait_on_buffer [kernel] 0x76 (0xcdca1d68)
  [<c0149429>] fsync_inode_buffers [kernel] 0x129 (0xcdca1dac)
  [<f8837e8f>] journal_get_write_access_Rsmp_a7d05437 [jbd] 0x3f (0xcdca1dcc)
  [<f88383f1>] journal_dirty_metadata_Rsmp_08cf9292 [jbd] 0x61 (0xcdca1dd8)
  [<f884bb07>] ext3_do_update_inode [ext3] 0x287 (0xcdca1df0)
  [<f884bb4a>] ext3_do_update_inode [ext3] 0x2ca (0xcdca1df4)
  [<f8837e8f>] journal_get_write_access_Rsmp_a7d05437 [jbd] 0x3f (0xcdca1dfc)
  [<f884beb5>] ext3_mark_iloc_dirty [ext3] 0x25 (0xcdca1e24)
  [<f884bec6>] ext3_mark_iloc_dirty [ext3] 0x36 (0xcdca1e2c)
  [<f883dcc7>] __jbd_kmalloc [jbd] 0x27 (0xcdca1e34)
  [<f8838931>] journal_stop_Rsmp_5858b5e4 [jbd] 0x1a1 (0xcdca1e5c)
  [<f884c069>] ext3_dirty_inode [ext3] 0xa9 (0xcdca1e7c)
  [<c012d367>] vmtruncate [kernel] 0x1f7 (0xcdca1e90)
  [<c015c7aa>] __mark_inode_dirty [kernel] 0x2a (0xcdca1ea0)
  [<c015e85d>] inode_setattr [kernel] 0xcd (0xcdca1eb4)
  [<f884bddf>] ext3_setattr [ext3] 0x22f (0xcdca1ed0)
  [<c013a087>] deactivate_page [kernel] 0x17 (0xcdca1ef0)
  [<c015e961>] notify_change [kernel] 0x81 (0xcdca1f04)
  [<c015eaba>] notify_change [kernel] 0x1da (0xcdca1f08)
  [<c014582d>] do_truncate [kernel] 0x6d (0xcdca1f34)
  [<c015b3bc>] dput [kernel] 0x1c (0xcdca1f40)
  [<f8847a14>] ext3_release_file [ext3] 0x14 (0xcdca1f58)
  [<c01480e8>] __fput [kernel] 0x68 (0xcdca1f64)
  [<f8847b0e>] ext3_sync_file [ext3] 0x4e (0xcdca1f88)
  [<c014892d>] sys_fsync [kernel] 0x5d (0xcdca1f98)
  [<c01073e3>] system_call [kernel] 0x33 (0xcdca1fc0)

Is there any kernel update solving the problem? We can update to the latest kernel, 2.4.9-e.65, if you need more tests. Thanks.

Created attachment 124045 [details]
hang processes and sysrq output
This is the information about oracle hang processes after doing a rman restore.
This situation occurs every time we do a restore and the database is up in the
same node.