Bug 199973 - assert in mm/rmap.c:85 anon_vma_prepare()
Summary: assert in mm/rmap.c:85 anon_vma_prepare()
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Robert Peterson
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2006-07-24 18:04 UTC by Scott Cannata
Modified: 2007-11-30 22:07 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-09-19 19:12:12 UTC
Target Upstream Version:
Embargoed:


Attachments
messages file at time of assert (3.94 MB, text/plain)
2006-07-26 21:29 UTC, Scott Cannata

Description Scott Cannata 2006-07-24 18:04:12 UTC
Description of problem:

On systems with a GFS-mounted filesystem, cman, and fencing, we had a node fence.
Immediately before the fence, in /var/log/messages on the node, we see the
following assert message -

sleeping function called from invalid context at mm/rmap.c:85
in_atomic():0[expected: 0], irqs_disabled():1
 <ffffffff8012f95c>{__might_sleep+173}
 <ffffffff80169cbd>{anon_vma_prepare+37}
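
For reference, here is a minimal sketch of the check that emits this message (a
paraphrase of __might_sleep() in kernel/sched.c, not the exact RHEL4 source): it
warns when a function that may block is entered with interrupts disabled or in
atomic context, then dumps the stack.

/* Hedged sketch of the debug check -- the real code also checks
 * system_state and rate-limits the warning. */
#include <linux/kernel.h>	/* printk(), dump_stack() */
#include <linux/hardirq.h>	/* in_atomic() */
#include <asm/system.h>		/* irqs_disabled() on this kernel */

void __might_sleep(char *file, int line)
{
	if (in_atomic() || irqs_disabled()) {
		printk(KERN_ERR "Debug: sleeping function called from invalid"
		       " context at %s:%d\n", file, line);
		printk("in_atomic():%d[expected: 0], irqs_disabled():%d\n",
		       in_atomic(), irqs_disabled());
		dump_stack();
	}
}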



Version-Release number of selected component (if applicable):

2.6.9-34

How reproducible:

Seen one time so far.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Kiersten (Kerri) Anderson 2006-07-25 12:50:21 UTC
Can we get the logs from each of the nodes in the cluster?  Since the node was
fenced, I assume that there is no other information available for that machine,
like process listings, memory usage, etc.

Comment 2 Scott Cannata 2006-07-26 21:29:18 UTC
Created attachment 133108 [details]
messages file at time of assert

The messages.1 file shows the console messages at the time of the assert. I am
gathering the other messages files as well.

Comment 3 Scott Cannata 2006-07-26 21:32:18 UTC
Looking at these logs, I've seen this rmap assert occur on some nodes with
different thread stacks. Can you give me any insight into this from your
perspective? Thanks.

Comment 4 Scott Cannata 2006-07-26 21:54:05 UTC
See bug #172944 and the /var/log/messages files (gz/tar) that I appended for each
node in the cluster.

Node2 fenced node1. Right before the fence event, we had this rmap assert
on node1. The date was Jul 13, 19:35.

Note: I just noticed that this rmap.c assert also appears at other times in the
logs on this cluster, with different thread stacks leading to the assert. Those
have our crosswalk module in the stack, so I am investigating them here. Thanks.

Comment 5 Scott Cannata 2006-07-27 05:14:06 UTC
I just noticed that the rmap assert did not have the complete stack in the
messages file because the fence occurred immediately afterward; I only saw the
two frames when I filed the bug.

Also, this assert is a debug message only and does not panic the system.

We have seen this message before in the logs, and the node is not typically
fenced afterward, so the assert message needs to be debugged but may have
nothing to do with the cause of the node fencing.

The complete stack comes from a system call (ioctl) into our crosswalk kernel
module. I will look at it as well. Any insights you Red Hat folks have are
welcome. Thanks.

Full stack appended:

Jul  3 05:30:20 igrid03 kernel: in_atomic():0[expected: 0], irqs_disabled():1
Jul  3 05:30:20 igrid03 kernel:
Jul  3 05:30:20 igrid03 kernel: Call Trace:
 <ffffffff8012f95c>{__might_sleep+173}
 <ffffffff80169cbd>{anon_vma_prepare+37}
 <ffffffff80164498>{do_wp_page+321}
 <ffffffff80165483>{handle_mm_fault+1107}
 <ffffffff801de551>{__up_read+16}
 <ffffffff80120dbe>{do_page_fault+518}
 <ffffffff80130ab4>{default_wake_function+0}
 <ffffffff8015974f>{__pagevec_free+39}
 <ffffffff802e88ae>{fn_hash_lookup+224}
 <ffffffff8010fc35>{error_exit+0}
 <ffffffff801e0062>{copy_user_generic_c+8}
 <ffffffffa01c32f3>{:cwalk_igrid:cwalk_ioctl+182}
 <ffffffff80185ed5>{sys_ioctl+853}
 <ffffffff8010f19a>{system_call+126}
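
For context, here is a hypothetical sketch (not the actual cwalk_igrid code,
names and locking are made up) of the kind of ioctl path that trips this check:
copying to or from user space while interrupts are disabled can fault, and the
fault path (do_wp_page -> anon_vma_prepare) may need to sleep, which is what
__might_sleep() warns about.

/* Hypothetical illustration only.
 * copy_to_user() may fault; handling the fault can reach
 * do_wp_page() -> anon_vma_prepare(), which may sleep, so doing the
 * copy with interrupts disabled triggers the mm/rmap.c:85 warning
 * even though nothing actually crashes. */
#include <linux/spinlock.h>
#include <linux/errno.h>
#include <asm/uaccess.h>

static spinlock_t example_lock = SPIN_LOCK_UNLOCKED;	/* hypothetical lock */

struct example_state {
	int value;
};

static int example_ioctl_copy(unsigned long arg)
{
	struct example_state state = { .value = 42 };
	unsigned long flags;
	int ret = 0;

	spin_lock_irqsave(&example_lock, flags);	/* irqs now disabled */
	if (copy_to_user((void __user *)arg, &state, sizeof(state)))
		ret = -EFAULT;
	spin_unlock_irqrestore(&example_lock, flags);
	return ret;
}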




Comment 6 Robert Peterson 2006-08-04 19:22:06 UTC
I didn't see any information in the logs to indicate there was a
problem with Cluster Suite.  The messages in comment #5 may indicate
that irqs were disabled when the sleep happened, which might be why
the heartbeat messages were not passed from the fenced node to the other
nodes, thus causing it to be fenced.  That's just a theory.
Obviously, one of the other nodes did not see the heartbeat messages,
which could also be caused by hardware problems.  I don't think we can
determine the cause with the information provided.

Perhaps you can recreate the problem with fencing disabled (i.e.
temporarily use manual fencing), and once the system stops responding,
use the sysrq key or echo "t" > /proc/sysrq-trigger on the fenced
node, then add the resulting output here as an attachment.


Comment 7 Robert Peterson 2006-09-19 19:12:12 UTC
I have found no information that leads me to believe this problem
is related to cluster suite or GFS.  If we get more information,
feel free to reopen the bug.


