Bug 860308 - condor SEGFAULT after upgrade while using custom hostname
Summary: condor SEGFAULT after upgrade while using custom hostname
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: 2.2
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: 2.3
Target Release: ---
Assignee: Timothy St. Clair
QA Contact: Martin Bukatovic
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-09-25 14:20 UTC by Martin Bukatovic
Modified: 2013-03-06 18:46 UTC

Fixed In Version: condor-7.8.5-0.1
Doc Type: Bug Fix
Doc Text:
Cause: Statically configuring /etc/hosts which may have IPv6 entries.
Consequence: Schedd will crash trying to forward resolve entries based on the CVE fix.
Fix: Properly handle IPv6 addresses in /etc/hosts and static configurations.
Result: Condor starts normally.
Clone Of:
Environment:
Last Closed: 2013-03-06 18:46:50 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Condor | 3224 | 0 | None | None | None | 2012-09-25 19:43:36 UTC
Red Hat Bugzilla | 853945 | 0 | urgent | CLOSED | RHHAv2 collector daemon SEGFAULT | 2021-02-22 00:41:40 UTC
Red Hat Product Errata | RHSA-2013:0564 | 0 | normal | SHIPPED_LIVE | Low: Red Hat Enterprise MRG Grid 2.3 security update | 2013-03-06 23:37:09 UTC

Internal Links: 853945

Description Martin Bukatovic 2012-09-25 14:20:04 UTC
Description of problem:

Condor was upgraded from condor-7.6.5-0.14.el6.i686 to condor-7.6.5-0.22.el6.i686.
After condor restart, both condor_master and condor_collector daemons crashed.

The affected machine has a custom hostname (rhel-6-i386.virtualdomain) with a
proper entry in /etc/hosts:

~~~                                                                             
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4  
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6  
                                                                                
# local pool                                                                    
192.168.122.7   rhel-5-i386.virtualdomain   rhel-5-i386                         
192.168.122.198 rhel-5-x86_64.virtualdomain rhel-5-x86_64                       
192.168.122.231 rhel-6-i386.virtualdomain   rhel-6-i386                         
192.168.122.111 rhel-6-x86_64.virtualdomain rhel-5-x86_64                       
192.168.122.187 rhel-5-x86_64_duo.virtualdomain rhel-5-x86_64_duo               
~~~                                                                             
                                                                                
Default personal condor configuration was used.                                 
                                                                                
The master log shows (using 'ALL_DEBUG=D_FULLDEBUG'):

~~~                                                                             
09/25/12 13:29:02 IPVERIFY: checking rhel-6-i386.virtualdomain against e77aa8c0 
09/25/12 13:29:02 IPVERIFY: matched e77aa8c0 to e77aa8c0                        
09/25/12 13:29:02 IPVERIFY: ip found is 1                                       
09/25/12 13:29:02 IPVERIFY: checking rhel-6-i386 against e77aa8c0               
09/25/12 13:29:02 IPVERIFY: comparing 100007f to e77aa8c0                       
09/25/12 13:29:02 IPVERIFY: ip found is 0                                       
09/25/12 13:29:02 WARNING: forward resolution of localhost4 doesn't match e77aa8c0!
Stack dump for process 5237 at timestamp 1348572542 (24 frames)                 
condor_master(dprintf_dump_stack+0x44)[0x810cfb4]                               
condor_master[0x8144a87]                                                        
[0x743400]                                                                      
/lib/libc.so.6(_IO_vfprintf+0x38fe)[0xe7835e]                                   
/lib/libc.so.6(__vsnprintf_chk+0xd4)[0xf2e104]                                  
condor_master(vprintf_length+0x38)[0x81295f8]                                   
condor_master(vsprintf_realloc+0x4b)[0x812964b]                                 
condor_master[0x810da2d]                                                        
condor_master(_condor_dprintf_va+0x318)[0x810ef18]                              
condor_master(dprintf+0x20)[0x81380d0]                                          
condor_master(_Z18verify_name_has_ipPc7in_addr+0x2c)[0x80cc03c]                 
condor_master(_ZN8IpVerify6VerifyE12DCpermissionPK11sockaddr_inPKcP8MyStringS7_+0x6bd)[0x80ce69d]
condor_master(_ZN6SecMan6VerifyE12DCpermissionPK11sockaddr_inPKcP8MyStringS7_+0x3d)[0x80e174d]
condor_master(_ZN10DaemonCore6VerifyEPKc12DCpermissionPK11sockaddr_inS1_+0x71)[0x80a3871]
condor_master(_ZN10DaemonCore9HandleReqEP6StreamS1_+0xcb7)[0x80b1c87]           
condor_master(_ZN10DaemonCore22HandleReqSocketHandlerEP6Stream+0x5f)[0x80b47bf] 
condor_master(_ZN10DaemonCore24CallSocketHandler_workerEibP6Stream+0x5af)[0x80b4f6f]
condor_master(_ZN10DaemonCore35CallSocketHandler_worker_demarshallEPv+0x2d)[0x80b503d]
condor_master(_ZN13CondorThreads8pool_addEPFvPvES0_PiPKc+0x57)[0x81429f7]       
condor_master(_ZN10DaemonCore17CallSocketHandlerERib+0x107)[0x80aae17]          
condor_master(_ZN10DaemonCore6DriverEv+0x1f6d)[0x80af99d]                       
condor_master(main+0x1432)[0x809e4b2]                                           
/lib/libc.so.6(__libc_start_main+0xe6)[0xe4ace6]                                
condor_master[0x8092761]                                                        
~~~ 

and condor_collector log shows the same problem:                                
                                                                                
~~~                                                                             
09/25/12 13:29:02 IPVERIFY: checking rhel-6-i386.virtualdomain against e77aa8c0 
09/25/12 13:29:02 IPVERIFY: matched e77aa8c0 to e77aa8c0                        
09/25/12 13:29:02 IPVERIFY: ip found is 1                                       
09/25/12 13:29:02 IPVERIFY: checking rhel-6-i386 against e77aa8c0               
09/25/12 13:29:02 IPVERIFY: comparing 100007f to e77aa8c0                       
09/25/12 13:29:02 IPVERIFY: ip found is 0                                       
09/25/12 13:29:02 WARNING: forward resolution of localhost4 doesn't match e77aa8c0!
Stack dump for process 5239 at timestamp 1348572542 (24 frames)                 
condor_collector(dprintf_dump_stack+0x44)[0x8125834]                            
condor_collector[0x8164a07]                                                     
[0xb25400]                                                                      
/lib/libc.so.6(_IO_vfprintf+0x38fe)[0x58a35e]                                   
/lib/libc.so.6(__vsnprintf_chk+0xd4)[0x640104]                                  
condor_collector(vprintf_length+0x38)[0x8141b78]                                
condor_collector(vsprintf_realloc+0x4b)[0x8141bcb]                              
condor_collector[0x81262ad]                                                     
condor_collector(_condor_dprintf_va+0x318)[0x8127798]                           
condor_collector(dprintf+0x20)[0x8156520]                                       
condor_collector(_Z18verify_name_has_ipPc7in_addr+0x2c)[0x80de9cc]              
condor_collector(_ZN8IpVerify6VerifyE12DCpermissionPK11sockaddr_inPKcP8MyStringS7_+0x6bd)[0x80e102d]
condor_collector(_ZN6SecMan6VerifyE12DCpermissionPK11sockaddr_inPKcP8MyStringS7_+0x3d)[0x80f3ccd]
condor_collector(_ZN10DaemonCore6VerifyEPKc12DCpermissionPK11sockaddr_inS1_+0x71)[0x80b7891]
condor_collector(_ZN10DaemonCore9HandleReqEP6StreamS1_+0xcb7)[0x80c5ca7]        
condor_collector(_ZN10DaemonCore22HandleReqSocketHandlerEP6Stream+0x5f)[0x80c87df]
condor_collector(_ZN10DaemonCore24CallSocketHandler_workerEibP6Stream+0x5af)[0x80c8f8f]
condor_collector(_ZN10DaemonCore35CallSocketHandler_worker_demarshallEPv+0x2d)[0x80c905d]
condor_collector(_ZN13CondorThreads8pool_addEPFvPvES0_PiPKc+0x57)[0x8162977]    
condor_collector(_ZN10DaemonCore17CallSocketHandlerERib+0x107)[0x80bee37]       
condor_collector(_ZN10DaemonCore6DriverEv+0x1f6d)[0x80c39bd]                    
condor_collector(main+0x1432)[0x80b24d2]                                        
/lib/libc.so.6(__libc_start_main+0xe6)[0x55cce6]                                
condor_collector[0x809c7c1]                                                     
~~~                                                                             
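
Note on reading the IPVERIFY lines: the hex values appear to be IPv4 addresses printed as raw 32-bit integers on a little-endian host. This is an assumption, but under it e77aa8c0 decodes to 192.168.122.231, which is the address of rhel-6-i386.virtualdomain in the /etc/hosts file above, and 100007f decodes to 127.0.0.1. A minimal Python sketch of the decoding (the helper name is hypothetical, not part of condor):

```python
import socket
import struct


def decode_ipverify_hex(value):
    """Decode a hex value from an IPVERIFY log line into dotted-quad form.

    Assumption: the log prints the in_addr as a host-order integer on a
    little-endian machine, so the least significant byte is the first octet.
    """
    raw = struct.pack("<I", int(value, 16))  # restore the in-memory byte order
    return socket.inet_ntoa(raw)


for logged in ("e77aa8c0", "100007f"):
    print(logged, "->", decode_ipverify_hex(logged))
```

Read this way, the log compares the address of the connecting peer (192.168.122.231) against the loopback address that the localhost aliases resolve to.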
                                                                                
Version-Release number of selected component (if applicable):

[root@rhel-6-i386 condor]# rpm -qa | grep condor
condor-classads-7.6.5-0.22.el6.i686
python-condorutils-1.5-4.el6.noarch
condor-7.6.5-0.22.el6.i686

How reproducible:

I failed to reproduce the problem on a newly installed clean machine.

Steps to Reproduce:

Don't know.
  
Actual results:

Some condor daemons crash.

Expected results:

Condor should not crash.

Additional info:

On one hand, I don't know how to reproduce the issue on a newly installed clean
machine. On the other hand, I was able to reproduce it on my other, older
virtual machines: the rhel-6-x86_64 one as well as the RHEL 5 nodes (both i386
and x86_64) when using the /etc/hosts file from RHEL 6. All of these machines
were installed 2 months ago, and I use them for testing purposes.

Quick fix:                                                                      
                                                                                
When you replace the following lines in /etc/hosts (default on RHEL 6):
                                                                                
~~~                                                                             
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4  
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6  
~~~                                                                             
                                                                                
with these:                                                                     
                                                                                
~~~                                                                             
127.0.0.1   localhost                                                           
::1         localhost                                                           
~~~                                                                             
                                                                                
the problem doesn't occur.
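
To see which names trigger the "forward resolution ... doesn't match" warning, the check can be approximated offline. The following is only a sketch: it parses /etc/hosts text and tests whether a name maps to an expected address. It is not condor's implementation, and the function names are hypothetical.

```python
HOSTS_SAMPLE = """\
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

# local pool
192.168.122.231 rhel-6-i386.virtualdomain   rhel-6-i386
"""


def parse_hosts(text):
    """Map every name and alias in /etc/hosts-style text to its addresses."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name, []).append(address)
    return table


def forward_matches(table, name, expected):
    """Return True if `name` forward-resolves to `expected` in the table."""
    return expected in table.get(name, [])


table = parse_hosts(HOSTS_SAMPLE)
for name in ("rhel-6-i386.virtualdomain", "localhost4"):
    print(name, forward_matches(table, name, "192.168.122.231"))
```

With the default RHEL 6 loopback lines, localhost4 maps only to 127.0.0.1, so it cannot match 192.168.122.231. That is consistent with the warning logged just before the crash.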

Related problems:

https://bugzilla.redhat.com/show_bug.cgi?id=853945
https://lists.cs.wisc.edu/archive/condor-users/2012-August/msg00086.shtml

Comment 1 Martin Bukatovic 2012-09-25 14:34:29 UTC
Interesting note:

When I don't use the ALL_DEBUG=D_FULLDEBUG option, I get a slightly
different stack trace in the condor master log:

~~~
09/25/12 16:30:55 WARNING: forward resolution of localhost4 doesn't match e77aa8c0!
Stack dump for process 6657 at timestamp 1348583455 (19 frames)
condor_master(dprintf_dump_stack+0x44)[0x810cfb4]
condor_master[0x8144a87]
[0x2c8400]
/lib/libc.so.6(__nss_hostname_digits_dots+0x39)[0x549239]
/lib/libc.so.6(gethostbyname+0x9a)[0x54e3ba]
condor_master(_Z18verify_name_has_ipPc7in_addr+0x34)[0x80cc044]
condor_master(_ZN8IpVerify6VerifyE12DCpermissionPK11sockaddr_inPKcP8MyStringS7_+0x6bd)[0x80ce69d]
condor_master(_ZN6SecMan6VerifyE12DCpermissionPK11sockaddr_inPKcP8MyStringS7_+0x3d)[0x80e174d]
condor_master(_ZN10DaemonCore6VerifyEPKc12DCpermissionPK11sockaddr_inS1_+0x71)[0x80a3871]
condor_master(_ZN10DaemonCore9HandleReqEP6StreamS1_+0xcb7)[0x80b1c87]
condor_master(_ZN10DaemonCore22HandleReqSocketHandlerEP6Stream+0x5f)[0x80b47bf]
condor_master(_ZN10DaemonCore24CallSocketHandler_workerEibP6Stream+0x5af)[0x80b4f6f]
condor_master(_ZN10DaemonCore35CallSocketHandler_worker_demarshallEPv+0x2d)[0x80b503d]
condor_master(_ZN13CondorThreads8pool_addEPFvPvES0_PiPKc+0x57)[0x81429f7]
condor_master(_ZN10DaemonCore17CallSocketHandlerERib+0x107)[0x80aae17]
condor_master(_ZN10DaemonCore6DriverEv+0x1f6d)[0x80af99d]
condor_master(main+0x1432)[0x809e4b2]
/lib/libc.so.6(__libc_start_main+0xe6)[0x467ce6]
condor_master[0x8092761]
~~~

Note that this trace is the same as in the 2 other reports I linked to above.

Comment 2 Martin Bukatovic 2012-09-25 15:05:04 UTC
When run without ALL_DEBUG=D_FULLDEBUG, condor seems to crash because
gethostbyname is called with an invalid string, as can be seen in the following
excerpt from the ltrace log:

~~~
6699 16:48:01.259591   write(10, "09/25/12 16:48:01 WARNING: forward resolution of localhost4 doesn't match e77aa8c0!\n", 84) = 84
6699 16:48:01.259728   fflush(0x9e49510)                                 = 0
6699 16:48:01.259827   fclose(0x9e49510)                                 = 0
6699 16:48:01.259953   umask(022)                                        = 022
6699 16:48:01.260061   sigprocmask(2, 0xbfb6d1dc, NULL)                  = 0
6699 16:48:01.260215   gethostbyname("\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\
377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377
\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\37
7\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\3
77\377\377\377\377\377\377\377\377\377\377"... <unfinished ...>
6699 16:48:01.260337 --- SIGSEGV (Segmentation fault) ---
~~~

Log was generated using: ltrace -tt -n 2 -f -s 120 -o condor_ltrace condor_master
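
The \377 sequences in the ltrace output are octal escapes for the byte 0xFF, so the argument passed to gethostbyname is a run of 0xFF bytes rather than a hostname. A small sketch for turning such an excerpt back into raw bytes (the helper name is hypothetical, and it handles only octal escapes, not ones like \n):

```python
import re

OCTAL_ESCAPE = re.compile(r"(\\[0-7]{1,3})")


def decode_ltrace_octal(escaped):
    """Convert ltrace-style text with \\NNN octal escapes into bytes."""
    out = bytearray()
    for part in OCTAL_ESCAPE.split(escaped):
        if OCTAL_ESCAPE.fullmatch(part):
            out.append(int(part[1:], 8))  # e.g. "\\377" -> 0o377 -> 0xFF
        else:
            out.extend(part.encode("ascii"))  # plain characters pass through
    return bytes(out)


sample = r"\377\377\377\377"
print(decode_ltrace_octal(sample).hex())
```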

Comment 6 Martin Bukatovic 2012-09-25 18:53:10 UTC
I was able to reproduce the problem on a fresh virtual machine using the
following steps:

Steps to Reproduce:                                                             
                                                                                
1) install fresh rhel 6.3                                                       
2) change hostname to 'rhel-6-x86_64.virtualdomain'                             
   edit /etc/sysconfig/network                                                  
3) add the following lines to /etc/hosts:

   # local pool                                                                 
   192.168.122.7   rhel-5-i386.virtualdomain   rhel-5-i386                      
   192.168.122.198 rhel-5-x86_64.virtualdomain rhel-5-x86_64                    
   192.168.122.231 rhel-6-i386.virtualdomain   rhel-6-i386                      
   192.168.122.169 rhel-6-x86_64.virtualdomain rhel-6-x86_64                    

   where 192.168.122.169 is the global IPv4 address of the machine
4) reboot machine (for hostname to be updated)                                  
5) install these packages from MRG 2.1:
   condor-7.6.5-0.14.el6.x86_64.rpm                                             
   condor-classads-7.6.5-0.14.el6.x86_64.rpm                                    
   condor-debuginfo-7.6.5-0.14.el6.x86_64.rpm                                   
6) start condor if it's not already running, run condor_status to see that it's working, stop condor
7) upgrade to MRG 2.2 (just do yum upgrade)
8) start condor, try condor_status, see logs

Comment 17 errata-xmlrpc 2013-03-06 18:46:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0564.html

