Bug 73802
Summary: | After an random interval sar reports 4.2 Billion Inode handlers allocated | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Need Real Name <richard.eames> |
Component: | sysstat | Assignee: | Charlie Bennett <ccb> |
Status: | CLOSED WORKSFORME | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 7.3 | CC: | alz27, binand, nphilipp, rjwalsh, sct |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2004-01-13 15:24:48 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Need Real Name
2002-09-11 03:43:32 UTC
When this occurs, what does cat /proc/slabinfo and cat /proc/sys/fs/inode-nr show? And is this an SMP machine? What drivers are in use, and what filesystems? slabinfo - version: 1.1 (SMP) kmem_cache 64 64 244 4 4 1 : 252 126 ip_fib_hash 14 113 32 1 1 1 : 252 126 journal_head 198 702 48 6 9 1 : 252 126 revoke_table 4 253 12 1 1 1 : 252 126 revoke_record 226 226 32 2 2 1 : 252 126 clip_arp_cache 0 0 128 0 0 1 : 252 126 ip_mrt_cache 0 0 96 0 0 1 : 252 126 tcp_tw_bucket 90 90 128 3 3 1 : 252 126 tcp_bind_bucket 330 452 32 4 4 1 : 252 126 tcp_open_request 80 80 96 2 2 1 : 252 126 inet_peer_cache 236 236 64 4 4 1 : 252 126 ip_dst_cache 1314 1440 192 72 72 1 : 252 126 arp_cache 180 180 128 6 6 1 : 252 126 blkdev_requests 7168 7200 96 180 180 1 : 252 126 dnotify_cache 0 0 20 0 0 1 : 252 126 file_lock_cache 168 168 92 4 4 1 : 252 126 fasync_cache 0 0 16 0 0 1 : 252 126 uid_cache 13 226 32 2 2 1 : 252 126 skbuff_head_cache 666 792 160 33 33 1 : 252 126 sock 270 270 1280 90 90 1 : 60 30 sigqueue 261 261 132 9 9 1 : 252 126 kiobuf 0 0 64 0 0 1 : 252 126 cdev_cache 177 177 64 3 3 1 : 252 126 bdev_cache 6 59 64 1 1 1 : 252 126 mnt_cache 14 59 64 1 1 1 : 252 126 inode_cache 3240 3240 480 405 405 1 : 124 62 dentry_cache 3459 3960 128 132 132 1 : 252 126 dquot 0 0 128 0 0 1 : 252 126 filp 4220 4230 128 141 141 1 : 252 126 names_cache 14 14 4096 14 14 1 : 60 30 buffer_head 219980 229440 96 5736 5736 1 : 252 126 mm_struct 365 504 160 21 21 1 : 252 126 vm_area_struct 5348 9080 96 225 227 1 : 252 126 fs_cache 366 590 64 10 10 1 : 252 126 files_cache 176 297 416 33 33 1 : 124 62 signal_act 171 171 1312 57 57 1 : 60 30 size-131072(DMA) 0 0 131072 0 0 32 : 0 0 size-131072 0 0 131072 0 0 32 : 0 0 size-65536(DMA) 0 0 65536 0 0 16 : 0 0 size-65536 0 0 65536 0 0 16 : 0 0 size-32768(DMA) 0 0 32768 0 0 8 : 0 0 size-32768 3 3 32768 3 3 8 : 0 0 size-16384(DMA) 0 0 16384 0 0 4 : 0 0 size-16384 3 5 16384 3 5 4 : 0 0 size-8192(DMA) 0 0 8192 0 0 2 : 0 0 size-8192 1 5 8192 1 5 2 : 0 0 size-4096(DMA) 0 0 4096 0 0 1 : 60 30 size-4096 133 163 4096 133 163 1 : 60 30 size-2048(DMA) 0 0 2048 0 0 1 : 60 30 size-2048 216 246 2048 108 123 1 : 60 30 size-1024(DMA) 0 0 1024 0 0 1 : 124 62 size-1024 248 248 1024 62 62 1 : 124 62 size-512(DMA) 0 0 512 0 0 1 : 124 62 size-512 264 264 512 33 33 1 : 124 62 size-256(DMA) 0 0 256 0 0 1 : 252 126 size-256 495 495 256 33 33 1 : 252 126 size-128(DMA) 0 0 128 0 0 1 : 252 126 size-128 810 810 128 27 27 1 : 252 126 size-64(DMA) 0 0 64 0 0 1 : 252 126 size-64 528 826 64 14 14 1 : 252 126 size-32(DMA) 0 0 32 0 0 1 : 252 126 size-32 680 1808 32 16 16 1 : 252 126 cat /proc/sys/fs/inode-nr 3206 66203 Yes all our machines are SMP (DELL 2400, 2500 2600) with various RAID controllers. /dev/sda1 / ext3 defaults 1 1 /dev/sdb1 /amanda ext3 defaults 1 2 none /dev/pts devpts gid=5,mode=620 0 0 /dev/sdc1 /opt ext3 defaults 1 2 none /proc proc defaults 0 0 none /dev/shm tmpfs defaults 0 0 /dev/sda3 /tmp ext3 defaults 1 2 /dev/sda2 swap swap defaults 0 0 /dev/cdrom /mnt/cdrom iso9660 noauto,owner,kudzu,ro 0 0 /dev/fd0 /mnt/floppy auto noauto,owner,kudzu 0 0 Module Size Used by Not tainted autofs 12612 0 (autoclean) (unused) eepro100 21040 2 st 29940 0 ext3 70944 4 jbd 53792 4 [ext3] aic7xxx 128672 0 aacraid 28212 5 sd_mod 12992 10 scsi_mod 113284 4 [st aic7xxx aacraid sd_mod] Are you sure that the above /proc displays show what happens after this problem manifests? Because it looks just fine, and if sar is showing bogus results in this case, it looks more like a sar problem, not a kernel one. Yes the SAR results were from a system with billions of inode handlers in use (or so it seems). I've just confirmed this. The version of SAR on the machine is the latest from the sar site (although the versions from Redhat have the same symptom) I was asking about the /proc display, not the sar output --- I need to know whether the /proc output you included was from a period where sar was showing the wrong stats or not. This is needed to deduce whether it's the kernel's accounting in /proc, or just sar's parsing of that accounting, which is at fault. 05:08:46 AM dentunusd file-sz %file-sz inode-sz super-sz %super-sz dquot-sz %dquot-sz rtsig-sz %rtsig-sz 05:08:47 AM 9712 240 2.93 4294944476 0 0.00 0 0.00 0 0.00 05:08:48 AM 9712 240 2.93 4294944476 0 0.00 0 0.00 0 0.00 [root@root]# cat /proc/sys/fs/inode-state 12425 35246 0 0 0 0 0 [root@root]# cat /proc/sys/fs/inode-nr 12470 35246 [root@root]# The /proc values there are just fine. However, the sysstat data collection code is not: /* Open /proc/sys/fs/inode-state file */ if ((ktfp = fopen(FINODE_STATE, "r")) == NULL) file_stats.inode_used = 0; else { fscanf(ktfp, "%u %u %*d %*d %*d %*d %*d\n", &(file_stats.inode_used), &parm); fclose(ktfp); file_stats.inode_used -= parm; } Whoops, it's not passing enough arguments to fscanf, so the fscanf is going to be writing to random memory locations depending on what's on the stack. Assigning to the sysstat component, as that is clearly broken. It remains to be seen whether it will actually fix this particular problem, of course. sct: He states that the problem can only be fixed by a reboot. Looks like we've got a kernel bug and a sysstat bug no? There's no evidence so far of a kernel bug. The /proc data dumped so far looks fine, and the sysstat proc parsing is *clearly* totally broken, so we definitely need to fix the fscanf code before we can consider any possible remaining kernel fault. Hmm, my mistake --- the scanf is using "%*" to discard the unneeded results. See bug 78749. I see next dependent: In 04:02 begin work cron.daily. In this part exist slocate. If i use only slocate in any time, situation repeat. My things: inode is 'unsigned int'. kernel use relative 'minus' and go to negative side. Logs: 03:50:00 56 1526 2,91 4294960156 04:00:00 53 1521 2,90 4294960149 04:10:00 76332 1515 2,89 72485 04:20:00 55176 1524 2,91 48818 04:30:00 38970 1507 2,87 31907 04:40:00 28199 1528 2,91 21951 04:50:00 19912 1533 2,92 14304 05:00:00 13942 1554 2,96 9131 05:10:00 9957 1525 2,91 4944 05:20:00 7130 1515 2,89 1922 05:21:00 7132 1519 2,90 1922 05:22:00 6019 1519 2,90 787 05:23:00 6020 1519 2,90 788 05:24:00 6019 1508 2,88 795 05:25:00 6022 1521 2,90 791 05:26:00 6023 1521 2,90 791 05:27:00 5144 1514 2,89 4294967088 05:28:00 5148 1517 2,89 4294967088 05:29:00 5149 1518 2,90 4294967088 05:30:00 5150 1518 2,90 4294967089 05:40:00 3931 1526 2,91 4294965485 Closing (no activity, 7.3 EOL). If the problem still shows (on current versions of RHEL or Fedora Core), please reopen. |