Bug 73802 - After a random interval, sar reports 4.2 billion inode handlers allocated
Summary: After a random interval, sar reports 4.2 billion inode handlers allocated
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: sysstat
Version: 7.3
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Charlie Bennett
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2002-09-11 03:43 UTC by Need Real Name
Modified: 2007-04-18 16:46 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-01-13 15:24:48 UTC
Embargoed:



Description Need Real Name 2002-09-11 03:43:32 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)

Description of problem:
While investigating intermittent crashes on our Red Hat 7.3 system, we noticed
that sar was reporting strange numbers for inode-sz (billions). The dentunusd
column also occasionally went very large (4.2 billion).

Once the problem occurs, the system continues to report a very large number of
inode handlers in use. The only way to clear the problem is a reboot. (Note:
the problem has also appeared on one 7.2 system.)

Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
1. Wait for a day or so.
2. Run sar -v.

Actual Results:
12:16:00         1481      1358      1.30 4294906841         0      0.00         0      0.00         1      0.10
12:21:00         2404      1375      1.31 4294907022         0      0.00         0      0.00         3      0.29
12:26:00         2422      1302      1.24 4294907045         0      0.00         0      0.00         2      0.20
12:31:00         2509      1375      1.31 4294907127         0      0.00         0      0.00         2      0.20
12:36:00         2510      1375      1.31 4294907127         0      0.00         0      0.00         2      0.20
12:41:00         2510      1373      1.31 4294907127         0      0.00         0      0.00         2      0.20
12:46:00         2512      1390      1.33 4294907123         0      0.00         0
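
(For scale: 4294906841 = 2^32 - 60455, so the inode-sz column reads like a
small negative count printed as an unsigned 32-bit integer; comment 12 below
works through the same wraparound.)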

Comment 1 Stephen Tweedie 2002-09-11 11:46:16 UTC
When this occurs, what does

  cat /proc/slabinfo

and

  cat /proc/sys/fs/inode-nr

show?

Comment 2 Stephen Tweedie 2002-09-11 11:47:10 UTC
And is this an SMP machine?  What drivers are in use, and what filesystems?

Comment 3 Need Real Name 2002-09-12 07:46:35 UTC
slabinfo - version: 1.1 (SMP)
kmem_cache            64     64    244    4    4    1 :  252  126
ip_fib_hash           14    113     32    1    1    1 :  252  126
journal_head         198    702     48    6    9    1 :  252  126
revoke_table           4    253     12    1    1    1 :  252  126
revoke_record        226    226     32    2    2    1 :  252  126
clip_arp_cache         0      0    128    0    0    1 :  252  126
ip_mrt_cache           0      0     96    0    0    1 :  252  126
tcp_tw_bucket         90     90    128    3    3    1 :  252  126
tcp_bind_bucket      330    452     32    4    4    1 :  252  126
tcp_open_request      80     80     96    2    2    1 :  252  126
inet_peer_cache      236    236     64    4    4    1 :  252  126
ip_dst_cache        1314   1440    192   72   72    1 :  252  126
arp_cache            180    180    128    6    6    1 :  252  126
blkdev_requests     7168   7200     96  180  180    1 :  252  126
dnotify_cache          0      0     20    0    0    1 :  252  126
file_lock_cache      168    168     92    4    4    1 :  252  126
fasync_cache           0      0     16    0    0    1 :  252  126
uid_cache             13    226     32    2    2    1 :  252  126
skbuff_head_cache    666    792    160   33   33    1 :  252  126
sock                 270    270   1280   90   90    1 :   60   30
sigqueue             261    261    132    9    9    1 :  252  126
kiobuf                 0      0     64    0    0    1 :  252  126
cdev_cache           177    177     64    3    3    1 :  252  126
bdev_cache             6     59     64    1    1    1 :  252  126
mnt_cache             14     59     64    1    1    1 :  252  126
inode_cache         3240   3240    480  405  405    1 :  124   62
dentry_cache        3459   3960    128  132  132    1 :  252  126
dquot                  0      0    128    0    0    1 :  252  126
filp                4220   4230    128  141  141    1 :  252  126
names_cache           14     14   4096   14   14    1 :   60   30
buffer_head       219980 229440     96 5736 5736    1 :  252  126
mm_struct            365    504    160   21   21    1 :  252  126
vm_area_struct      5348   9080     96  225  227    1 :  252  126
fs_cache             366    590     64   10   10    1 :  252  126
files_cache          176    297    416   33   33    1 :  124   62
signal_act           171    171   1312   57   57    1 :   60   30
size-131072(DMA)       0      0 131072    0    0   32 :    0    0
size-131072            0      0 131072    0    0   32 :    0    0
size-65536(DMA)        0      0  65536    0    0   16 :    0    0
size-65536             0      0  65536    0    0   16 :    0    0
size-32768(DMA)        0      0  32768    0    0    8 :    0    0
size-32768             3      3  32768    3    3    8 :    0    0
size-16384(DMA)        0      0  16384    0    0    4 :    0    0
size-16384             3      5  16384    3    5    4 :    0    0
size-8192(DMA)         0      0   8192    0    0    2 :    0    0
size-8192              1      5   8192    1    5    2 :    0    0
size-4096(DMA)         0      0   4096    0    0    1 :   60   30
size-4096            133    163   4096  133  163    1 :   60   30
size-2048(DMA)         0      0   2048    0    0    1 :   60   30
size-2048            216    246   2048  108  123    1 :   60   30
size-1024(DMA)         0      0   1024    0    0    1 :  124   62
size-1024            248    248   1024   62   62    1 :  124   62
size-512(DMA)          0      0    512    0    0    1 :  124   62
size-512             264    264    512   33   33    1 :  124   62
size-256(DMA)          0      0    256    0    0    1 :  252  126
size-256             495    495    256   33   33    1 :  252  126
size-128(DMA)          0      0    128    0    0    1 :  252  126
size-128             810    810    128   27   27    1 :  252  126
size-64(DMA)           0      0     64    0    0    1 :  252  126
size-64              528    826     64   14   14    1 :  252  126
size-32(DMA)           0      0     32    0    0    1 :  252  126
size-32              680   1808     32   16   16    1 :  252  126


cat /proc/sys/fs/inode-nr
3206 66203


Yes, all our machines are SMP (Dell 2400, 2500, 2600) with various RAID
controllers.


/dev/sda1                 /                       ext3    defaults        1 1
/dev/sdb1           /amanda                 ext3    defaults        1 2
none                    /dev/pts                devpts  gid=5,mode=620  0 0
/dev/sdc1              /opt                    ext3    defaults        1 2
none                    /proc                   proc    defaults        0 0
none                    /dev/shm                tmpfs   defaults        0 0
/dev/sda3              /tmp                    ext3    defaults        1 2
/dev/sda2               swap                    swap    defaults        0 0
/dev/cdrom              /mnt/cdrom              iso9660 noauto,owner,kudzu,ro 0 0
/dev/fd0                /mnt/floppy             auto    noauto,owner,kudzu 0 0



Module                  Size  Used by    Not tainted
autofs                 12612   0 (autoclean) (unused)
eepro100               21040   2
st                     29940   0
ext3                   70944   4
jbd                    53792   4 [ext3]
aic7xxx               128672   0
aacraid                28212   5
sd_mod                 12992  10
scsi_mod              113284   4 [st aic7xxx aacraid sd_mod]




Comment 4 Stephen Tweedie 2002-09-12 12:14:17 UTC
Are you sure that the above /proc displays show what happens after this problem
manifests?  Because it looks just fine, and if sar is showing bogus results in
this case, it looks more like a sar problem, not a kernel one.

Comment 5 Need Real Name 2002-09-12 23:21:10 UTC
Yes, the sar results were from a system with billions of inode handlers in use
(or so it seemed); I've just confirmed this. The version of sar on the machine
is the latest from the sar site (although the versions from Red Hat have the
same symptom).



Comment 6 Stephen Tweedie 2002-09-13 16:36:22 UTC
I was asking about the /proc display, not the sar output --- I need to know
whether the /proc output you included was from a period where sar was showing
the wrong stats or not.  This is needed to deduce whether it's the kernel's
accounting in /proc, or just sar's parsing of that accounting, which is at fault.

Comment 7 Need Real Name 2002-11-03 13:12:32 UTC
05:08:46 AM dentunusd   file-sz  %file-sz  inode-sz  super-sz %super-sz  dquot-sz %dquot-sz  rtsig-sz %rtsig-sz
05:08:47 AM      9712       240      2.93 4294944476         0      0.00         0      0.00         0      0.00
05:08:48 AM      9712       240      2.93 4294944476         0      0.00         0      0.00         0      0.00

[root@root]# cat /proc/sys/fs/inode-state 
12425   35246   0       0       0       0       0
[root@root]# cat /proc/sys/fs/inode-nr    
12470   35246
[root@root]#

Comment 8 Stephen Tweedie 2002-11-14 11:51:18 UTC
The /proc values there are just fine.

However, the sysstat data collection code is not:

   /* Open /proc/sys/fs/inode-state file */
   if ((ktfp = fopen(FINODE_STATE, "r")) == NULL)
      file_stats.inode_used = 0;
   else {
      fscanf(ktfp, "%u %u %*d %*d %*d %*d %*d\n",
	     &(file_stats.inode_used), &parm);
      fclose(ktfp);
      file_stats.inode_used -= parm;
   }

Whoops, it's not passing enough arguments to fscanf, so the fscanf is going to
be writing to random memory locations depending on what's on the stack.

Assigning to the sysstat component, as that is clearly broken.  It remains to be
seen whether it will actually fix this particular problem, of course.

Comment 9 Mike A. Harris 2002-11-22 05:03:46 UTC
sct:  He states that the problem can only be fixed by a reboot.  Looks
like we've got both a kernel bug and a sysstat bug, no?


Comment 10 Stephen Tweedie 2002-11-29 17:11:02 UTC
There's no evidence so far of a kernel bug.  The /proc data dumped so far looks
fine, and the sysstat proc parsing is *clearly* totally broken, so we definitely
need to fix the fscanf code before we can consider any possible remaining kernel
fault.

Comment 11 Stephen Tweedie 2003-01-11 10:35:30 UTC
Hmm, my mistake --- the scanf is using "%*" to discard the unneeded results.
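
For reference, a minimal standalone sketch (not the actual sysstat source) of
why the two-argument call is legal: a "%*" conversion parses and discards a
field without consuming a pointer argument, so only the two plain "%u"
conversions need arguments.

   /* Sketch only: read the first two fields of /proc/sys/fs/inode-state,
      discarding the remaining five via assignment suppression ("%*u"). */
   #include <stdio.h>

   int main(void)
   {
      unsigned int nr_inodes, nr_free;
      FILE *fp = fopen("/proc/sys/fs/inode-state", "r");

      if (fp == NULL)
         return 1;
      /* fscanf returns the number of *assigned* conversions: 2 here. */
      if (fscanf(fp, "%u %u %*u %*u %*u %*u %*u", &nr_inodes, &nr_free) == 2)
         printf("allocated=%u free=%u\n", nr_inodes, nr_free);
      fclose(fp);
      return 0;
   }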

Comment 12 Need Real Name 2003-03-13 06:03:17 UTC
See bug 78749.


I see the following dependency: cron.daily starts at 04:02, and one of its
jobs runs slocate. If I run slocate by itself at any time, the situation
repeats.
My theory: the inode count is an 'unsigned int'; the kernel does a relative
subtraction that goes negative and wraps around.
Logs:
03:50:00           56      1526      2,91 4294960156
04:00:00           53      1521      2,90 4294960149
04:10:00        76332      1515      2,89     72485
04:20:00        55176      1524      2,91     48818
04:30:00        38970      1507      2,87     31907
04:40:00        28199      1528      2,91     21951
04:50:00        19912      1533      2,92     14304
05:00:00        13942      1554      2,96      9131
05:10:00         9957      1525      2,91      4944
05:20:00         7130      1515      2,89      1922
05:21:00         7132      1519      2,90      1922
05:22:00         6019      1519      2,90       787
05:23:00         6020      1519      2,90       788
05:24:00         6019      1508      2,88       795
05:25:00         6022      1521      2,90       791
05:26:00         6023      1521      2,90       791
05:27:00         5144      1514      2,89 4294967088
05:28:00         5148      1517      2,89 4294967088
05:29:00         5149      1518      2,90 4294967088
05:30:00         5150      1518      2,90 4294967089
05:40:00         3931      1526      2,91 4294965485
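
The arithmetic fits this theory. sysstat keeps inode_used in an unsigned int
and computes it as nr_inodes - nr_free (the "inode_used -= parm" line quoted
in comment 8). If the kernel's free count briefly exceeds the allocated count,
the subtraction wraps instead of going negative: with the comment 7 values,
12425 - 35246 = -22821, which stored as 32-bit unsigned is 4294944475,
essentially the 4294944476 sar printed (the counters moved slightly between
samples). A minimal sketch of the wraparound, using those values:

   /* Sketch only: demonstrates the unsigned wraparound behind the
      ~4.29 billion inode-sz figures; values taken from comment 7. */
   #include <stdio.h>

   int main(void)
   {
      unsigned int nr_inodes = 12425;  /* first field of inode-state */
      unsigned int nr_free = 35246;    /* second field of inode-state */
      unsigned int used = nr_inodes;

      used -= nr_free;                 /* same as "inode_used -= parm" */
      printf("used = %u\n", used);     /* prints 4294944475, not -22821 */
      return 0;
   }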

Comment 13 Nils Philippsen 2004-01-13 15:24:48 UTC
Closing (no activity, 7.3 EOL). If the problem still shows (on current
versions of RHEL or Fedora Core), please reopen.

