Description of problem:
Customer is seeing a SEGV in var_hrswrun() in host/hr_swrun.c due to a failure to check the fgets() return code.

Version-Release number of selected component (if applicable):
net-snmp-5.3.2.2-5.el5

How reproducible:
Not easily; it requires the queried process to terminate between snmpd opening and reading its status file.
-------------------------------------------------------------------------------
Core was generated by `/usr/sbin/snmpd -LS 5d -Lf /dev/null -p /var/run/snmpd.pid -a'.
Program terminated with signal 11, Segmentation fault.
[New process 7428]
[New process 7430]
#0  var_hrswrun (vp=0x7fff2954a070, name=<value optimized out>,
    length=<value optimized out>, exact=<value optimized out>,
    var_len=<value optimized out>, write_method=<value optimized out>)
    at host/hr_swrun.c:1152
1152            while (*cp != ' ')
-------------------------------------------------------------------------------

Here is a part of var_hrswrun() to review.

-------------------------------------------------------------------------------
 461 u_char *
 462 var_hrswrun(struct variable * vp,
 463             oid * name,
 464             size_t * length,
 465             int exact, size_t * var_len, WriteMethod ** write_method)
 466 {
<snip>
1143 #elif defined(linux)
1144         sprintf(string, "/proc/%d/stat", pid);
1145         if ((fp = fopen(string, "r")) == NULL) {
1146             long_return = 0;
1147             return (u_char *) & long_return;
1148         }
1149         fgets(buf, sizeof(buf), fp);
1150         cp = buf;
1151         for (i = 0; i < 23; ++i) {      /* skip 23 fields */
1152             while (*cp != ' ')
1153                 ++cp;
1154             ++cp;
1155         }
1156         long_return = atoi(cp) * (getpagesize() / 1024);        /* rss */
1157         fclose(fp);
-------------------------------------------------------------------------------
Although it is possible to fix just this single error, it is but one example of a systemic failure to check return codes in a number of other reads of /proc/<pid> files. I am currently working through the set and preparing a composite patch.
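
For illustration only, here is a minimal sketch of what a checked read of /proc/<pid>/stat might look like. This is not the attached patch. The helper names skip_fields() and read_rss_kb() are hypothetical and do not exist in net-snmp. The idea is that when fgets() returns NULL the buffer contents cannot be trusted, so the field scan has to be skipped, and the scan itself should stop at the terminating NUL rather than run off the end of the buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of net-snmp): skip `nfields`
 * space-separated fields in a NUL-terminated string.  Returns a pointer
 * to the start of the following field, or NULL if the string ends first.
 */
static const char *
skip_fields(const char *cp, int nfields)
{
    int i;

    for (i = 0; i < nfields; ++i) {
        cp = strchr(cp, ' ');   /* stops at the NUL, never past it */
        if (cp == NULL)
            return NULL;
        ++cp;
    }
    return cp;
}

/*
 * Hypothetical helper (not part of net-snmp): return the RSS of `pid`
 * in kB, or 0 if the process has gone away or the line is malformed.
 */
static long
read_rss_kb(int pid)
{
    char        path[64];
    char        buf[1024];
    const char *cp;
    FILE       *fp;
    long        rss = 0;

    snprintf(path, sizeof(path), "/proc/%d/stat", pid);
    if ((fp = fopen(path, "r")) == NULL)
        return 0;

    /* The process may exit between fopen() and fgets(); check the read. */
    if (fgets(buf, sizeof(buf), fp) != NULL) {
        cp = skip_fields(buf, 23);      /* skip 23 fields, as the original does */
        if (cp != NULL)
            rss = atol(cp) * (getpagesize() / 1024);
    }
    fclose(fp);
    return rss;
}
```

A caller would use it as long_return = read_rss_kb(pid); and treat 0 as "process gone". Like the original code, this sketch assumes the command name in the second field contains no spaces; handling that case is outside the scope of this bug.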
Created attachment 483247
patch for reading pid-based /proc files
The upstream report with a single-instance fix for this /proc race condition is at
http://sourceforge.net/tracker/index.php?func=detail&aid=1774612&group_id=12694&atid=312694

The patch I've just attached here fixes all the /proc/<pid>/? read blocks.
I've fixed it upstream in net-snmp-5.4 and 5.5 branches, SVN rev. 20115.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
The snmpd daemon did not properly check for errors when reading from the /proc file system. This could result in a daemon crash when gathering information for hrSWRunTable for a process which exits. The updated snmpd daemon properly checks for such error cases and does not crash when populating hrSWRunTable.
*** Bug 705967 has been marked as a duplicate of this bug. ***
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1076.html