Bug 137132 - stuck gfs_inoded
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs
Version: 3
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: Kiersten (Kerri) Anderson
QA Contact: GFS Bugs
Blocks: 137219
Reported: 2004-10-25 18:37 EDT by Erling Nygaard
Modified: 2010-01-11 22:00 EST
Doc Type: Bug Fix
Last Closed: 2006-05-08 11:36:38 EDT
Description Erling Nygaard 2004-10-25 18:37:26 EDT
Description of problem:

We had an incident today in which one of our
GFS cluster lock servers became unresponsive
from a GFS file-serving perspective.  The
most obvious symptom was the 'gfs_inoded'
process rising to the top of 'top' and
remaining pegged on one CPU continuously (output
below).  We moved all active clients off
this node (our cluster serves NFS), yet the
process remained pegged.  We finally had to
force-fence the node to clear the condition.

I'm afraid I don't have much more to offer
for further troubleshooting, as we needed
to restore this node to normal operation ASAP.
                      
top:
14:03:16 up 22 days,  9:43, 3 users, load average: 18.49, 18.30, 14.79
Tasks: 156 total,   4 running, 152 sleeping,   0 stopped,   0 zombie
Cpu(s):   1.4% user,  87.1% system,   0.0% nice,  11.5% idle
Mem:   4139248k total,  2356516k used,  1782732k free, 1720k buffers
Swap:  2097136k total, 0k used,  2097136k free,  1257572k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 3634 root      19   0     0    0    0 R 99.9  0.0   2680:22 gfs_inoded
 3954 root      18   0     0    0    0 D 34.0  0.0 991:55.94 nfsd
 3956 root      18   0     0    0    0 D 29.7  0.0 993:01.92 nfsd
 3952 root      16   0     0    0    0 D 27.1  0.0 993:25.76 nfsd
 3950 root      16   0     0    0    0 D 23.1  0.0 993:18.39 nfsd
 3953 root      16   0     0    0    0 D 23.1  0.0 989:50.97 nfsd
 3955 root      15   0     0    0    0 D 21.8  0.0 991:20.70 nfsd
 3949 root      15   0     0    0    0 D 21.1  0.0 994:29.69 nfsd
 3951 root      19   0     0    0    0 D 20.5  0.0 991:07.39 nfsd
 2542 root      19   0  221m 221m  640 S 14.9  5.5   6072:17 lock_gulmd
 3608 root      19   0     0    0    0 D  9.6  0.0  30:22.92 gfs_glockd
13991 root      18   0   928  928  668 R  8.3  0.0   0:00.31 top
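
For anyone triaging a similar snapshot, here is a minimal sketch of how the pegged task could be picked out of top output automatically. It is not part of the original report: the function name pegged_tasks and the 90% threshold are assumptions for illustration, and the column order is taken from the header line shown above.

```python
# Hypothetical helper: flag tasks in a top(1) snapshot that are running
# (state R) at or above a CPU threshold. The 90% default is an arbitrary
# assumption, not a value from the report.

def pegged_tasks(top_rows, min_cpu=90.0):
    """Return (pid, command, cpu) for rows in state R with %CPU >= min_cpu."""
    hits = []
    for row in top_rows:
        fields = row.split()
        # Expected columns:
        # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # skip the header and the summary lines
        state, cpu = fields[7], float(fields[8])
        if state == "R" and cpu >= min_cpu:
            hits.append((int(fields[0]), fields[11], cpu))
    return hits


# Three rows copied from the snapshot above, plus the header line.
snapshot = [
    "PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND",
    "3634 root      19   0     0    0    0 R 99.9  0.0  2680:22 gfs_inoded",
    "3954 root      18   0     0    0    0 D 34.0  0.0 991:55.94 nfsd",
    "13991 root     18   0   928  928  668 R  8.3  0.0   0:00.31 top",
]
print(pegged_tasks(snapshot))
```

With these rows only gfs_inoded is reported; the nfsd threads are in state D (uninterruptible sleep) and are skipped by the state check.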



Version-Release number of selected component (if applicable):
Kernel: SuSE 2.4.21-138-smp
GFS: GFS-smp-5.2.1-19.2.0

How reproducible:

Steps to Reproduce:
1. start GFS
2. ?
3. stuck gfs_inoded
  
Actual results:
stuck gfs_inoded

Expected results:
not stuck gfs_inoded

Additional info:
This has happened several times at the customer site.
Letting the node sit for 12 hours did not bring
gfs_inoded back to "normal" CPU usage.
4GB of memory on nodes

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 3.06GHz
stepping : 5
cpu MHz : 3055.007
cache size : 512 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 6094.84

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 9
cpu MHz : 2791.080
cache size : 512 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 5570.56
Comment 1 Corey Marthaler 2004-10-25 18:52:45 EDT
Step 2, not listed above, was light I/O running against 3 to 5
filesystems on each of GFS[2,3,4].
 
The I/O was genesis, accordion, and iogen/doio. 
 
./genesis -S 12345 -n 75 -L flock -w /veridata1/sistinatest/ 
 
./accordion -L flock -e 1024 -w /veridata1sistinatest/ -t -m 10000 
-S 54321 accd1 accd2 accd3 accd4 accd5 accd6 accd7 accd8 accd9 
accd10 
 
./iogen -f buffered -m sequential -s read,write,readv,writev -t 1b 
-T 40000b 40000b:/veridata4/rwrevbuflarge1 | ./doio -n 10 -avk & 
Comment 2 Erling Nygaard 2004-10-25 19:16:35 EDT
No, this is a different problem; it did not occur on the system we are
running tests on. So this load listing does not apply to this problem.
Comment 3 Kiersten (Kerri) Anderson 2005-07-18 16:20:41 EDT
I don't think we have seen this since then.  Any updates from anyone?  I
would be inclined to close it as unreproducible.
Comment 4 Kiersten (Kerri) Anderson 2005-10-11 17:49:10 EDT
I haven't seen this one reported since then, and there is no reference to
the customer, so I'm not sure how we can proceed.  Any ideas?
Comment 5 Kiersten (Kerri) Anderson 2006-05-08 11:36:38 EDT
Closing this one since we have no way to determine what was happening and no
further information.
