From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050909 Fedora/1.0.6-1.2.fc4 Firefox/1.0.6

Description of problem:
When net-snmp is built with USE_PROC_CMDLINE defined, it opens /proc to read the entries and then reads each process's /proc/PID/cmdline file. Eventually it will encounter a process that stops between the opendir of /proc and the read of /proc/PID/cmdline. Attaching strace to snmpd shows:

...
getdents64(13, /* 1 entries */, 1024) = 32
open("/proc/16336/cmdline", O_RDONLY) = 14
read(14, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 511) = 40
close(14) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---

I'll attach a patch that does not have this problem.

Version-Release number of selected component (if applicable):
net-snmp-5.0.9-2.30E.15

How reproducible:
Always

Steps to Reproduce:
1. Build net-snmp with USE_PROC_CMDLINE.
2. Have some proc entries in the snmpd configuration.
3. Restart the snmpd service.

Actual Results: Eventual snmpd SIGSEGV.

Expected Results: Processes are monitored.

Additional info:
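To illustrate the kind of defensive handling the scan needs, here is a minimal Python sketch (not the net-snmp C code, and not the attached patch). It assumes two failure modes: the cmdline file disappears after the directory listing, or the read returns only NUL bytes as in the strace above. The function names parse_cmdline, read_cmdline, and count_matching, and the proc_root parameter, are made up for this example.

```python
import os


def parse_cmdline(raw):
    """Split raw /proc/PID/cmdline bytes into arguments.

    Returns an empty list when the buffer is empty or holds only NUL
    bytes, so callers never index into a missing argv[0].
    """
    return [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]


def read_cmdline(pid, proc_root="/proc"):
    """Read one process's argument list, tolerating a vanished process."""
    path = os.path.join(proc_root, str(pid), "cmdline")
    try:
        with open(path, "rb") as handle:
            raw = handle.read()
    except OSError:
        # The process went away (or became unreadable) after the
        # directory listing was taken; treat it as "no command line".
        return []
    return parse_cmdline(raw)


def count_matching(name, proc_root="/proc"):
    """Count processes whose argv[0] equals name."""
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "meminfo"
        argv = read_cmdline(entry, proc_root)
        if argv and argv[0] == name:
            count += 1
    return count
```

The point of the sketch is only that every step after the directory listing can come back empty, and the caller has to check for that before touching argv[0].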
Created attachment 119124 proposed alternative newproc patch
Created attachment 119125 patch that enables use of the USE_PROC_CMDLINE code; requires the spec file, which will be attached next
Created attachment 119126 spec file using the 2 previous patches
So what you are saying is that the newproc patch doesn't work for you, and you're fixing the old method of monitoring processes? Which process do you want to monitor? I've configured net-snmp to monitor the httpd process and it works fine. How "eventual" is the SIGSEGV you're seeing?
As an example, for Oracle I need to check that there is only one ora_pmon_SID process. The newproc patch uses the Name: field of /proc/PID/status, which gives "oracle", whereas /proc/PID/cmdline gives me the ora_pmon_SID I need. For other processes, cmdline often has the full path, which is handy. So I prefer (need) the USE_PROC_CMDLINE version, which has this SIGSEGV problem.

To try to get it to occur quicker:

## max 1 ssh listener
proc /usr/sbin/sshd 1 1
## max 15 ssh connections
proc sshd 15 0

It should happen within, say, 10 minutes of there being no one sshing into the server running the agent. It happens every time.
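The difference between the two name sources can be sketched in a few lines of Python. This is only an illustration working on sample text, not the agent's code. The helper names name_from_status and name_from_cmdline are hypothetical, and the sample values mirror the Oracle case described above (ORCL is a made-up SID).

```python
def name_from_status(status_text):
    """Return the value of the 'Name:' line from /proc/PID/status text."""
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            return line.split(":", 1)[1].strip()
    return None


def name_from_cmdline(raw):
    """Return argv[0] from raw /proc/PID/cmdline bytes, or None if empty."""
    parts = [part for part in raw.split(b"\0") if part]
    return parts[0].decode(errors="replace") if parts else None


# Sample data shaped like the Oracle example: the status file reports the
# executable name, while cmdline carries the per-instance process title.
sample_status = "Name:\toracle\nState:\tS (sleeping)\n"
sample_cmdline = b"ora_pmon_ORCL\0"

status_name = name_from_status(sample_status)     # "oracle"
cmdline_name = name_from_cmdline(sample_cmdline)  # "ora_pmon_ORCL"
```

With the status-based name, every Oracle background process looks the same, so a rule limiting one specific process to a single instance can't be expressed. The cmdline-based name keeps them distinguishable.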
OK, I let it run for some time:

# time watch -d 'snmpwalk -On -v2c -c public localhost .1.3.6.1.4.1.2021.2.1'
real 20m5.184s
user 0m12.197s
sys 0m3.797s

but I still don't get any segfault. What is strange, though, is that even with your config additions I get the line

.1.3.6.1.4.1.2021.2.1.101.1 = STRING: Too few /usr/sbin/sshd running (# = 0)

which is not exactly correct, because I see one process running (I'm ssh'd into that RHEL3 machine). Is it another bug, or do I just have some misconfiguration?
I only get to actually see the segfault by attaching strace; otherwise it just stops working. As I suspect the segfault only occurs when another process stops between the /proc open and the read from /proc/PID/cmdline, it might depend on what else is running. Having something that fires off processes and stops them again might help trigger it. I'll try to think of something suitable.

Other differences in our setups:
- I hit it with snmptable hostname prTable rather than a walk (easier to read), but I can't see that that would affect it.
- I use a remote host and ssh in to do the service snmpd restart (and attach the strace), then log out (which should change the sshd count to 0).
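One possible way to generate that kind of process churn is a small script that keeps starting and reaping very short-lived processes while snmpd is being polled. This is an untested idea for provoking the race, not a confirmed reproducer. The function name churn and its parameters are invented for this sketch.

```python
import subprocess
import sys
import time


def churn(iterations, pause=0.0):
    """Start and reap short-lived child processes, one after another.

    Each child exits almost immediately, so PIDs keep appearing in and
    vanishing from /proc while another program is scanning it.
    Returns the number of children that were spawned.
    """
    spawned = 0
    for _ in range(iterations):
        # A child interpreter that does nothing and exits at once.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        spawned += 1
        if pause:
            time.sleep(pause)
    return spawned


if __name__ == "__main__":
    # Run this in one terminal while polling prTable from another.
    print(churn(1000, pause=0.01))
```

Whether this widens the window enough to hit the crash will depend on timing, so it may need to run for a while alongside the polling.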
This bug is filed against RHEL 3, which is in the maintenance phase. During the maintenance phase, only security errata and select mission-critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed. For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/ If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.