Bug 476810

Summary: Long real server names cause segfault in lvsd
Product: Red Hat Enterprise Linux 5
Reporter: Tim Steneker <tsteneker>
Component: piranha
Assignee: Marek Grac <mgrac>
Status: CLOSED DUPLICATE
QA Contact: Cluster QE <mspqa-list>
Severity: high
Priority: low
Version: 5.2
CC: cluster-maint
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-02-10 16:31:01 UTC

Description Tim Steneker 2008-12-17 09:50:43 UTC
Description of problem:
When a long real server name (roughly 28 characters or more) is used in the piranha configuration and pulse is then started, lvsd crashes with a segfault:

kernel: lvsd[2140]: segfault at ffffffffffffffd0 rip 000000314ec785a0 rsp 00007fff63d99558 error 4

Version-Release number of selected component (if applicable):

Program Version:        lvs 1.38
Built:                  17/Dec/2008
A component of:         piranha-0.8.4-7

Output of uname -a:

Linux lb01.domainname.local 2.6.18-92.1.18.el5.centos.plus #1 SMP Wed Nov 26 07:28:20 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:

Use a long real server name in lvs.cf, e.g. "www.www0001.domainname.local" and then start pulse using "/etc/init.d/pulse start" in an x86_64 environment.

Steps to Reproduce:
1. Create a basic piranha configuration with a real server which has a long name like "www.www0001.domainname.local".
2. Start Pulse using "/etc/init.d/pulse start"
3. Watch the messages log in "/var/log/messages"

Make sure lvsd is being run in daemon mode, as this triggers the syslog function (where the problem probably lies), instead of printing log messages to the display (which works correctly).
  
Actual results:

Dec 17 10:21:22 lb01 pulse[2137]: STARTING PULSE AS MASTER
Dec 17 10:21:40 lb01 pulse[2137]: partner dead: activating lvs
Dec 17 10:21:40 lb01 lvs[2140]: starting virtual service www.domainname.net: 80
Dec 17 10:21:40 lb01 avahi-daemon[3832]: Registering new address record for 10.36.125.202 on eth0.
Dec 17 10:21:40 lb01 avahi-daemon[3832]: Withdrawing address record for 10.36.125.202 on eth0.
Dec 17 10:21:40 lb01 kernel: lvsd[2140]: segfault at ffffffffffffffd0 rip 000000314ec785a0 rsp 00007fff63d99558 error 4
Dec 17 10:21:40 lb01 nanny[2154]: starting LVS client monitor for 10.36.125.202:80
Dec 17 10:21:45 lb01 pulse[2142]: gratuitous lvs arps finished
Dec 17 10:22:08 lb01 pulse[2137]: Terminating due to signal 15

Expected results:

Dec 17 10:24:36 lb01 pulse[2812]: STARTING PULSE AS MASTER
Dec 17 10:24:54 lb01 pulse[2812]: partner dead: activating lvs
Dec 17 10:24:54 lb01 lvs[2826]: starting virtual service www.domainname.net active: 80
Dec 17 10:24:54 lb01 avahi-daemon[3832]: Registering new address record for 10.0.8.1 on eth1.
Dec 17 10:24:54 lb01 avahi-daemon[3832]: Withdrawing address record for 10.0.8.1 on eth1.
Dec 17 10:24:54 lb01 lvs[2826]: create_monitor for www.domainname.net/www.www0001.domainname.local running as pid 2838
Dec 17 10:24:54 lb01 nanny[2835]: starting LVS client monitor for 10.36.125.202:80
Dec 17 10:24:59 lb01 pulse[2828]: gratuitous lvs arps finished

Additional info:

The problem seems to be in "piranha-0.8.4/util.c", specifically in the "doSyslog" function. As soon as a log message is longer than 80 characters, this function reallocates its buffer, and that somehow causes a segfault. A quick fix that allocates more bytes initially solved the problem for us, but the proper, more structural solution would of course be to make the reallocation path work correctly.
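
For illustration only, here is a minimal sketch of a format-and-grow loop of the kind described above. It is not the piranha source: the function names format_log_message and vformat_log_message are hypothetical, and the 80-byte starting size is only chosen to mirror the threshold mentioned in this report. One well-known way such a loop can crash on x86_64 while appearing to work on 32-bit x86 is passing the same va_list to vsnprintf a second time after the buffer has been enlarged; whether doSyslog actually does that is an assumption, not something confirmed here. The sketch takes a fresh va_copy for every attempt and relies on the C99 return value of vsnprintf.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from piranha): format into a heap buffer that
 * starts at 80 bytes and grows when the message does not fit.
 * Returns a malloc'ed string the caller must free, or NULL on failure. */
char *vformat_log_message(const char *fmt, va_list args)
{
    size_t size = 80;   /* assumed initial size, mirrors the reported threshold */
    char *buf = malloc(size);

    if (buf == NULL)
        return NULL;

    for (;;) {
        va_list attempt;
        int needed;
        char *bigger;

        /* vsnprintf consumes its va_list, so each attempt gets its own copy. */
        va_copy(attempt, args);
        needed = vsnprintf(buf, size, fmt, attempt);
        va_end(attempt);

        if (needed < 0) {           /* encoding or output error */
            free(buf);
            return NULL;
        }
        if ((size_t)needed < size)  /* message and terminating NUL fit */
            return buf;

        /* C99 vsnprintf reports the length it needs; grow to exactly that. */
        size = (size_t)needed + 1;
        bigger = realloc(buf, size);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
    }
}

/* Variadic convenience wrapper around vformat_log_message. */
char *format_log_message(const char *fmt, ...)
{
    va_list args;
    char *msg;

    assert(fmt != NULL);
    va_start(args, fmt);
    msg = vformat_log_message(fmt, args);
    va_end(args);
    return msg;
}
```

With the names from this report, the create_monitor message for "www.domainname.net" and "www.www0001.domainname.local" comes to more than 80 characters, so a call such as format_log_message("create_monitor for %s/%s running as pid %d", ...) goes through the reallocation path once. The returned string could then be handed to syslog(LOG_INFO, "%s", msg) and freed.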

Comment 1 Marek Grac 2009-01-07 12:33:32 UTC
Unable to reproduce. Can you send me your lvs configuration file?

Comment 2 Tim Steneker 2009-01-12 08:22:55 UTC
That's a little hard since the servers we have tested this on are already in production. The most important thing is that the real server name is long, e.g. "www.www0001.domainname.local".

Have you tested this on a 64-bit platform? We know the problem does not occur on 32-bit platforms (in our test setup), so if you tested this on a 32-bit platform, not seeing the crash is expected.

Please let me know, otherwise we will need to make a new test setup for this.

Comment 3 Marek Grac 2009-01-20 17:58:41 UTC
A test setup would be very welcome. I tried it on my 64-bit machines with 5.3.

Comment 4 Marek Grac 2009-02-10 16:31:01 UTC
It is very likely that this bug is a duplicate of #446802 (segfault if syslog message is longer than 80 characters), which was resolved in 5.3. IMHO that is why I was not able to reproduce it: it had already been fixed. Closing as duplicate. If you see the same problem with 5.3, please open a new bug.

*** This bug has been marked as a duplicate of bug 446802 ***