From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; fi; rv:1.8) Gecko/20051111 Firefox/1.5

Description of problem:
Initially pulse starts normally. However, when it activates LVS and creates a monitor for the virtual service, both nanny and lvsd segfault. Startup then continues, and eventually pulse prints a message that the gratuitous LVS ARPs have finished.

kernel: nanny[10515]: segfault at 0000000000000000 rip 0000003188e6fd20 rsp 0000007fbffef0c8 error 4
kernel: lvsd[10510]: segfault at 0000000000000480 rip 000000000040314f rsp 0000007fbffff970 error 4

These two segfaults leave the LVS routing table only half-configured: the virtual service is present, but no real servers are listed. The real servers can be added manually with ipvsadm, after which everything works (until pulse is restarted and resets the table).

This happens straight out of the box on two identical Dell PowerEdge 1425 servers (one primary, one backup), essentially rendering the whole service useless. If all real servers are disabled, for instance using Piranha, no segfaults occur and the services start up normally. As soon as at least one real server is enabled, the problem recurs.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Add a real server using, for instance, Piranha.
2. /sbin/service pulse start

Additional info:

CURRENT LVS ROUTING TABLE
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.58.12.246:80 wlc

CURRENT LVS PROCESSES
root 10507 0.0 0.0 5176 548 ? Ss 12:01 0:00 pulse
root 10510 0.0 0.0 0 0 ? Zs 12:01 0:00 [lvsd] <defunct>

pulse[10507]: STARTING PULSE AS MASTER
pulse: pulse startup succeeded
pulse[10507]: partner dead: activating lvs
lvs[10510]: starting virtual service rek active: 80
nanny[10515]: starting LVS client monitor for 10.58.12.246:80
kernel: nanny[10515]: segfault at 0000000000000000 rip 0000003188e6fd20 rsp 0000007fbffef0c8 error 4
lvs[10510]: create_monitor for rek/rek2 running as pid 10515
kernel: lvsd[10510]: segfault at 0000000000000480 rip 000000000040314f rsp 0000007fbffff970 error 4
lada pulse[10512]: gratuitous lvs arps finished

The machine has four network interfaces bonded into one external and one internal bonding interface. I have tried running pulse without any bonded interfaces, but this did not solve the problem.
All available updates have been downloaded and installed from Red Hat Network.
Created attachment 123037: lvs.cf
lvs.cf is identical on both servers. The segfault persists even if the backup service is disabled in this file.
Created attachment 123038: Output of lsmod
Created attachment 123039: Output of lspci -vv
Created attachment 123040: Syslog
Could you start pulse with 'pulse -nv' instead of 'service pulse start' and capture output until lvsd dies?
Created attachment 123103: Output of pulse -nv
The segfaults are visible in /var/log/messages.
There is a bug in nanny that triggers a series of unfortunate events:
1. nanny segfaults if regular expression matching is enabled and there is no "expect" string to match against.
2. lvsd segfaults along with nanny, leaving the other nannies alive.
3. pulse doesn't monitor lvsd, so it is unaware of the problem.
4. The system is left half-configured, without failing over.
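To illustrate point 1, here is a minimal Python sketch of the guard that the empty-expect case needs. It is not nanny's code (nanny is written in C); the function name response_ok and its parameters are hypothetical, treating an empty expect string as "pass" is only one plausible choice, and modelling plain (non-regex) mode as a substring test is an assumption.

```python
import re
from typing import Optional


def response_ok(response: str, expect: Optional[str], use_regex: bool) -> bool:
    """Decide whether a real server's reply passes the monitor check.

    Hypothetical sketch: it mirrors the logic only, not nanny's C code.
    """
    # Guard: with no expect string there is nothing to match against.
    # Returning early avoids using a missing pattern, which is the
    # situation that crashed nanny in this report.
    if not expect:
        return True
    if use_regex:
        # Regex mode: the expect string is a pattern searched in the reply.
        return re.search(expect, response) is not None
    # Plain mode, assumed here to be a simple substring test.
    return expect in response
```

With regex matching on and an empty expect string, response_ok("HTTP/1.0 200 OK", "", True) simply returns True instead of failing.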
Quick fix:
1. Disable regular expression matching in "virtual servers -> monitoring scripts" if "expect" is blank.
2. Because the "edit monitoring scripts" page strips escape characters on "accept" (a separate bug), newline characters cannot be specified there. So either replace "rnrn" with "\r\n\r\n" in /etc/sysconfig/ha/lvs.cf, or recreate the virtual server without making changes on the "edit monitoring scripts" page. The defaults work fine with HTTP.
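The escaping problem in step 2 can be sketched in Python: dropping every backslash turns "\r\n\r\n" into "rnrn", and the manual fix puts the escapes back. This is a sketch under stated assumptions: strip_backslashes only models the reported page behaviour, repair_send_line is a hypothetical helper, and it assumes the damaged value sits at the end of a send = "..." line in lvs.cf. Back up lvs.cf before editing it by any means.

```python
def strip_backslashes(value: str) -> str:
    # Models the reported "accept" bug: every backslash is dropped,
    # so the escaped CRLF pairs collapse into the letters "rnrn".
    return value.replace("\\", "")


def repair_send_line(line: str) -> str:
    """Restore the escaped CRLF pairs at the end of a send line.

    Hypothetical helper; assumes a line such as
        send = "GET / HTTP/1.0rnrn"
    and leaves every other line untouched.
    """
    newline = "\n" if line.endswith("\n") else ""
    body = line.rstrip("\n")
    damaged_tail = 'rnrn"'
    if body.lstrip().startswith("send") and body.endswith(damaged_tail):
        # Replace the trailing letters with literal backslash escapes.
        body = body[: -len(damaged_tail)] + '\\r\\n\\r\\n"'
    return body + newline
```

Restricting the replacement to the end of send lines avoids rewriting an unrelated "rnrn" elsewhere in the file.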
Problem solved! HTTP is only for testing purposes, so the defaults wouldn't have applied to the custom applications we use in production. Thank you for the quick fix! /Kim
Fixed in 0.8.3:
Fixed nanny and lvsd segfaults.

To test:
1. Enable regex matching and leave the expect string empty; nanny shouldn't segfault.
2. Start lvsd and kill a nanny; lvsd should exit gracefully, terminating the other nannies.
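Test step 2 describes a supervisor that, once any child monitor dies, terminates and reaps the remaining ones. The Python sketch below shows that pattern under stated assumptions: sleeping Python child processes stand in for nannies, the names start_monitors and supervise are hypothetical, and it polls with Popen.poll() where a C daemon like lvsd would more likely react to SIGCHLD and waitpid().

```python
import subprocess
import sys
import time
from typing import List


def start_monitors(count: int) -> List[subprocess.Popen]:
    # Stand-in "nannies": child Python processes that just sleep.
    return [
        subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
        for _ in range(count)
    ]


def supervise(children: List[subprocess.Popen], poll_interval: float = 0.1) -> None:
    """Wait until any child exits, then terminate and reap the others."""
    # Block while every child is still running.
    while all(child.poll() is None for child in children):
        time.sleep(poll_interval)
    # One child has died: send SIGTERM to the survivors.
    for child in children:
        if child.poll() is None:
            child.terminate()
    # Reap everything so no <defunct> entries are left behind.
    for child in children:
        child.wait()
```

Usage: start three monitors, kill one (kids[0].kill()), then call supervise(kids); afterwards every child has a return code. Reaping all children matters here because the report shows lvsd itself lingering as a <defunct> process.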
An advisory has been issued which should help resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2006-0538.html