Bug 170875 - perl script using net-snmp hangs on newer kernels
Summary: perl script using net-snmp hangs on newer kernels
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: net-snmp
Version: 3.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Radek Vokál
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-10-14 20:48 UTC by Marc Wiartrowski
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: net-snmp-5.0.9-2.30E.18
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-11-08 15:09:52 UTC
Target Upstream Version:
Embargoed:


Attachments
text tcpdump of last few snmp gets (41.28 KB, text/plain)
2005-10-14 20:51 UTC, Marc Wiartrowski
binary tcpdump of that last few snmp gets (2.22 KB, application/octet-stream)
2005-10-14 20:53 UTC, Marc Wiartrowski
Last debug lines from perl script (62.60 KB, text/plain)
2005-10-14 21:01 UTC, Marc Wiartrowski
snmp perl script (4.23 KB, application/octet-stream)
2005-10-19 13:33 UTC, Marc Wiartrowski

Description Marc Wiartrowski 2005-10-14 20:48:46 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7

Description of problem:
We have been running a piece of Perl code that loops through thousands of
IP addresses and does a few SNMP get queries on each.  It works fine with the
Update 2 kernel '2.4.21-15.EL' and a locally compiled version of net-snmp 5.2.1.

After upgrading to Update 5 with kernel 2.4.21-32.0.1.EL, the script runs
fine for many IPs, but then it appears that upon receiving a response
with an incorrect UDP checksum the Perl script simply hangs.

In testing, if we use the Update 5 kernel with the Red Hat RPMs
net-snmp-*-5.0.9-2.30E.19, the script still eventually hangs.  But if
we go back to the Update 2 kernel and keep the RPM net-snmp, it works
just fine every time.  So the cause appears to be something that changed
between the two kernels.

The hang does not occur on the same IP address each time, either.

tcpdump and net-snmp debug attachments to follow.

Snippet of the Perl code:

while (my ($mac, $data) = each %{$device}) {
   print "Polling: $mac ($data->{ip})\n";
   $SNMP::debugging = 3;   # enable debug output from the SNMP Perl module
   # Timeout is given in microseconds (0.1 s here)
   my $sess2 = new SNMP::Session(
      DestHost => $data->{ip}, Community => 'public',
      UseSprintValue => 1, UseLongNames => 1, UseNumeric => 1,
      Timeout => 100000, Retries => 2);
   print "Made Connection\n";
   # Four OIDs (each with instance 0) fetched in a single get request
   my $vars = new SNMP::VarList(
      ['.1.3.6.1.4.1.1782.2.3.4.3', 0], ['.1.3.6.1.4.1.1782.2.3.5.1', 0],
      ['.1.3.6.1.4.1.1782.2.3.3.8', 0], ['.1.3.6.1.4.1.1782.2.3.3.4', 0]);
   print "Set OIDs\n";
   my @val = $sess2->get($vars);   # synchronous get
   print "Did get\n";
   if ($sess2->{ErrorStr}) {
      print ". . . SNMP Error: $sess2->{ErrorStr}\n";
      next;
   } else {
      # Do stuff with returned results
   }
}
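
For a polling loop like the one above, one possible way to keep a single stuck query from stalling the whole run is to do each poll in a child process and give up on it after a deadline. The following is only a sketch under that assumption and has not been tested against this bug. `run_with_deadline` and its arguments are hypothetical names, and the code reference passed in stands in for the session setup and `get` call.

```perl
use strict;
use warnings;
use IO::Select;
use POSIX qw(_exit);

# Hypothetical helper: run $work in a child process and return the string
# it produced, or undef if it does not finish within $seconds.
sub run_with_deadline {
    my ($seconds, $work) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: do the (possibly hanging) work, then report back once.
        close $reader;
        my $out = $work->();
        print {$writer} (defined $out ? $out : '');
        close $writer;
        _exit(0);                  # skip END blocks inherited from the parent
    }

    # Parent: wait for the child's output, but only up to the deadline.
    close $writer;
    my $result;
    if (IO::Select->new($reader)->can_read($seconds)) {
        local $/;                  # slurp everything the child wrote
        $result = <$reader>;
    } else {
        kill 'KILL', $pid;         # child is stuck; give up on it
    }
    close $reader;
    waitpid($pid, 0);              # reap the child in either case
    return $result;
}
```

A call such as `run_with_deadline(5, sub { ... })` would then wrap the per-host work. A separate process is used here rather than `alarm` because Perl defers signal handlers until the current operation returns, so an alarm may not interrupt a call that is blocked inside the library. Forking once per host has a real cost over hundreds of thousands of addresses, so batching several hosts per child may be more practical.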



Version-Release number of selected component (if applicable):
kernel-2.4.21-32.0.1.EL net-snmp-5.0.9-2.30E.19

How reproducible:
Always

Steps to Reproduce:
1. Run the Perl net-snmp script on the Update 5 kernel with any net-snmp version


Additional info:

Comment 1 Marc Wiartrowski 2005-10-14 20:51:25 UTC
Created attachment 119999 [details]
text tcpdump of last few snmp gets

Comment 2 Marc Wiartrowski 2005-10-14 20:53:06 UTC
Created attachment 120001 [details]
binary tcpdump of that last few snmp gets

Comment 3 Marc Wiartrowski 2005-10-14 21:01:12 UTC
Created attachment 120003 [details]
Last debug lines from perl script

Comment 4 Suzanne Hillman 2005-10-17 14:59:54 UTC
Could you check if this still happens on the Update 6 kernel, please?

Comment 5 Marc Wiartrowski 2005-10-17 17:36:04 UTC
Just tried kernel 2.4.21-37.EL and it still locked up.

Comment 6 Radek Vokál 2005-10-18 11:40:07 UTC
May I have the whole Perl script so I can rerun the test? I could not reproduce
the hang on the -32 kernel, but I think it only appears after many runs.

Comment 7 Marc Wiartrowski 2005-10-19 13:33:00 UTC
Created attachment 120165 [details]
snmp perl script

OK.  This is a trimmed-down version of the script; I just ran it
on 2.4.21-37.EL and it locked up.  It basically cycles through ~250k
IPs doing the SNMP query.

I believe it broke somewhere between the Update 2 and Update 4 kernels.
We can try to narrow it down if you like.

Comment 8 Radek Vokál 2005-10-21 12:47:41 UTC
Should I pass some arguments to your script to make it run? Or how do you test
it? (I thought I could load it in snmpd.conf with `perl do`, but this doesn't work.)

Comment 9 Marc Wiartrowski 2005-10-21 13:20:21 UTC
It's not a script that runs through snmpd; it's a command-line script, actually
run through cron once a day, that does SNMP gets.

The script takes 2 parameters: --comm for the SNMP community string and
--cust for the customer name.  The customer name tells the script which
table in a database to connect to in order to get the 250k+ IP addresses
that it then queries over SNMP.
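
A minimal sketch of how the two parameters described above might be parsed with Getopt::Long. The attached script's actual option handling is not shown in this report, so `parse_args` and the usage text are assumptions for illustration only.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse --comm and --cust from a list of arguments
# and return them as a hash reference; dies if either one is missing.
sub parse_args {
    my @argv = @_;
    my ($comm, $cust);
    GetOptionsFromArray(
        \@argv,
        'comm=s' => \$comm,   # SNMP community string
        'cust=s' => \$cust,   # customer name, selects the database table
    ) or die "usage: $0 --comm <community> --cust <customer>\n";
    die "both --comm and --cust are required\n"
        unless defined $comm && defined $cust;
    return { comm => $comm, cust => $cust };
}
```

In a script this would be called as `my $opt = parse_args(@ARGV);` before any database or SNMP work starts.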

I am not sure how you would run the script without modifying it, as it needs
to connect to a database to get the IP addresses.  And even if I could send
you the database, you wouldn't be able to query the IP addresses, as they
are on a private network.

The script takes many hours to run through the 250,000+ IP addresses when
it works on the Update 2 kernel.  With a newer kernel it will run for an
hour or so and make it through several thousand IP addresses before it hangs.

If there is something I can help with or run, please let me know.
 

Comment 10 Radek Vokál 2006-11-08 13:58:59 UTC
Sorry, I only got back to this issue now. Is this still a problem? I've tried your
script, but I probably need to probe more machines.

Comment 11 Marc Wiartrowski 2006-11-08 14:22:04 UTC
In the time frame of Update 5 we went back to the Update 2 kernel.
Currently our problem appears to have been fixed in Updates 6 and 7, as we
are running them just fine.  (We have not gone to Update 8 anywhere
yet.)

Comment 12 Radek Vokál 2006-11-08 15:09:52 UTC
Thanks for testing. 

