Bug 16316

Summary: Problems with multiple services with lvs.cf
Product: [Retired] Red Hat High Availability Server
Component: piranha
Version: 1.0
Hardware: i386
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Need Real Name <ciofina>
Assignee: Phil Copeland <copeland>
QA Contact: Phil Copeland <copeland>
CC: ciofina, keith.moore
Doc Type: Bug Fix
Last Closed: 2000-08-18 03:35:01 UTC
Attachments:
- The lvs executables you requested:)
- my production lvs.cf

Description Red Hat Bugzilla 2000-08-16 07:49:20 UTC
I use RH6.2 with piranha 0.4.16-3 and kernel 2.2.16-4.
Everything with HA goes well if I use only one virtual service (www in
lvs.cf). If I try to add a second service on the same IP address, lvs
starts only the first one.
To work around the problem, for the moment I put only one service in the
lvs.cf file and activate the other by hand (with ipvsadm) once pulse has
finished starting nanny and the rest.
As you can see in the lvs.cf file I also have a backup server, but if the
primary server goes down, the backup server will start only the first
service.

Can anyone help me?

Thanks in advance:)

My lvs.cf file is like this:
primary = 1.2.3.4
service = lvs
rsh_command = rsh
backup_active = 1
backup = 1.2.3.5
heartbeat = 1
heartbeat_port = 539
keepalive = 6
deadtime = 18
network = direct
virtual www {
     active = 1
     address = 1.2.3.2 eth0:1
     port = 80
     send = "GET / HTTP/1.0\r\n\r\n"
     expect = "HTTP"
     load_monitor = rup
     scheduler = wlc
     protocol = tcp
     timeout = 6
     reentry = 15
     server a.b.it {
         address = 1.2.3.11
         active = 1
         weight = 1
     }
     server b.b.it {
         address = 1.2.3.12
         active = 1
         weight = 1
     }
     server c.b.it {
         address = 1.2.3.13
         active = 1
         weight = 1
     }
}
virtual wwws {
     active = 1
     address = 1.2.3.2 eth0:1
     port = 443
     send = "GET / HTTP/1.0\r\n\r\n"
     expect = "HTTP"
     load_monitor = rup
     scheduler = wlc
     protocol = tcp
     timeout = 6
     reentry = 15
     server a.b.it {
         address = 1.2.3.11
         active = 1
         weight = 1
     }
     server b.b.it {
         address = 1.2.3.12
         active = 1
         weight = 1
     }
     server c.b.it {
         address = 1.2.3.13
         active = 1
         weight = 1
     }
}

Comment 1 Red Hat Bugzilla 2000-08-16 18:19:45 UTC
It is actually a requirement for piranha LVS that each virtual service have a
unique virtual address and virtual device designation. This should be documented
(let me know if it's not). Part of the reason for this is that services are
monitored, removed, and added individually.



Comment 2 Red Hat Bugzilla 2000-08-16 18:21:42 UTC
Additional note: Virtual IP addresses are not the same as the real IP address of
the computer -- they are independent. Multiple IP services CAN run on the same
computer, but each must have a unique virtual IP address. It was not clear from
your posting whether this was understood.

Comment 3 Red Hat Bugzilla 2000-08-16 23:06:50 UTC
Problem reopened -- it was closed accidentally


Comment 4 Red Hat Bugzilla 2000-08-16 23:09:33 UTC
Problem being investigated in another entry.


*** This bug has been marked as a duplicate of 16399 ***

Comment 5 Red Hat Bugzilla 2000-08-16 23:21:05 UTC
OK, so it's not a duplicate.


Comment 6 Red Hat Bugzilla 2000-08-16 23:23:21 UTC
As you can see below, there are no problems with having multiple services with
the same IP but different ports (this has to be allowed, since it's very common
to have both port 80 and 443 on the same site).  I have been using this type of
configuration since Red Hat 6.0.

The only difference between the code I am running and the latest code is that a
bugfix for lvs.c regarding persistence has been applied:

IP Virtual Server version 0.9.14 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
TCP  198.245.191.240:https wlc persistent 20
  -> renpwwwbkprd5.renp-dmz.com:https Masq    1      0          0
TCP  198.245.191.240:www wlc persistent 20
  -> renpwwwbkprd5.renp-dmz.com:www Masq    1      0          0                 

-- Keith Moore

Comment 7 Red Hat Bugzilla 2000-08-16 23:34:32 UTC
Are you using Apache on the real servers?  Some servers do not respond to
standard GET requests on port 443 (IIS for one).  If this is the case, the SSL
server will never be brought up, since it's not getting a proper response.
Try blanking out the SEND/EXPECT.  Also, make sure you are using the latest
software, 0.4.16-3; prior versions had problems with 443.

What is the output of ipvsadm -ln?  This will show if the virtual server is
being enabled but none of the real servers.
 
-- Keith Moore

Comment 8 Red Hat Bugzilla 2000-08-16 23:50:25 UTC
In addition to the comment made by Keith Moore, also apply this patch he
created and see if it has any impact on your problem (you never mentioned
whether you are using persistence or not):

--- lvs.c.old	Wed Aug 16 19:05:14 2000
+++ lvs.c	Wed Aug 16 19:08:48 2000
@@ -465,6 +465,7 @@
                         int * numClientsPtr) {
     int i;
     char * argv[40];
+    char wrkBuf[10];
     char ** arg = argv;
     char virtAddress[50];
     int oldNumClients;
@@ -521,7 +522,8 @@
 
     if (vserver->persistent > 0 ) {       
         *arg++ = (char *) "-p";
-        (void) sprintf(*arg++, "%d", vserver->persistent);
+        (void) sprintf(wrkBuf, "%d", vserver->persistent);
+	*arg++ = wrkBuf;
 
         if (vserver->pmask.s_addr) {
             pmask = inet_ntoa(vserver->pmask);
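
For context, a standalone sketch of the bug class the patch fixes (not the
actual lvs.c; names simplified): the pre-patch code called sprintf() through
an uninitialized pointer slot in the argv array, which writes the digits to a
random address, corrupting memory and typically killing the process --
consistent with lvs going defunct.

/* sketch.c -- bug class only; simplified from lvs.c */
#include <stdio.h>

int main(void) {
    char *argv[40];
    char **arg = argv;
    char wrkBuf[10];
    int persistent = 900;          /* the value from the reporter's config */

    *arg++ = (char *) "-p";

    /* Pre-patch (broken): *arg is an uninitialized pointer, so this
     * call scribbles over whatever address happens to be in the slot:
     *     sprintf(*arg++, "%d", persistent);
     */

    /* Post-patch (fixed): format into a real buffer, then store the
     * buffer's address in the argv slot. */
    sprintf(wrkBuf, "%d", persistent);
    *arg++ = wrkBuf;

    printf("%s %s\n", argv[0], argv[1]);
    return 0;
}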


Comment 9 Red Hat Bugzilla 2000-08-17 00:10:36 UTC
I cut and pasted your config into my test machine and it works fine; however,
I had to remove the send/expect for port 443, since I'm using NT/IIS, which
doesn't respond to GET requests on port 443 (this is probably the case with
Netscape also).

With the send/expect it would not bring up the https real-servers.

-- Keith Moore

Comment 10 Red Hat Bugzilla 2000-08-17 08:03:26 UTC
1. ipvsadm -l doesn't return the 443 service.
2. I tried also without persistence, but the problem persists.
3. I'm using Apache for www and Stronghold (Apache-based) for https. I tried
also without persistence and without the send/expect string (although the
secure server doesn't require a password), and the problem persists.
4. I didn't try to apply the patch for persistence, but I believe it doesn't
affect my case.
5. I didn't try to change the IP addresses (I will try this today). But, in
any case, I tried other services (like ftp) on a different IP and all went
well.

Below you will find a piece of my log file from when pulse starts the services
on the primary server. As you can see, lvs tries to start the virtual service
"strong" on port 443, but after that there is nothing else about the second
service (the IP addresses are real!).

Thanks again:)

Aug 17 07:37:58 haa pulse[2227]: STARTING PULSE AS MASTER
Aug 17 07:37:58 haa pulse: pulse startup succeeded
Aug 17 07:38:16 haa pulse[2227]: partner dead: activating lvs
Aug 17 07:38:16 haa lvs: running command  "/usr/sbin/ipvsadm" "-C"
Aug 17 07:38:16 haa lvs[2233]: starting virtual service www active: 80
Aug 17 07:38:16 haa lvs[2233]: running command  "/usr/sbin/ipvsadm" "-A" "-t"
"193.70.29.2:80" "-s" "wlc"
Aug 17 07:38:16 haa lvs[2233]: running command  "/usr/sbin/nanny" "-c" "-h"
"193.70.29.18" "-p" "80" "-s" "GET / HTTP/1.0\r\n\r\n" "-x" "HTTP" "-a" "15"
"-I" "/usr/sbin/ipvsadm" "-t" "6" "-w" "1" "-V" "193.70.29.2" "-M" "g" "-U"
"rup"
Aug 17 07:38:16 haa lvs[2233]: create_monitor for www/hidx.teta.it running as
pid 2240
Aug 17 07:38:16 haa lvs[2233]: running command  "/usr/sbin/nanny" "-c" "-h"
"193.70.29.20" "-p" "80" "-s" "GET / HTTP/1.0\r\n\r\n" "-x" "HTTP" "-a" "15"
"-I" "/usr/sbin/ipvsadm" "-t" "6" "-w" "1" "-V" "193.70.29.2" "-M" "g" "-U"
"rup"
Aug 17 07:38:16 haa pulse[2242]: running command  "/sbin/ifconfig" "eth0:1"
"193.70.29.2" "up"
Aug 17 07:38:16 haa pulse[2239]: running command  "/usr/sbin/send_arp" "-i"
"eth0" "193.70.29.2" "0804000000EB" "193.70.29.63" "ffffffffffff"
Aug 17 07:38:16 haa lvs[2233]: create_monitor for www/hidb.teta.it running as
pid 2241
Aug 17 07:38:16 haa lvs[2233]: running command  "/usr/sbin/nanny" "-c" "-h"
"193.70.29.19" "-p" "80" "-s" "GET / HTTP/1.0\r\n\r\n" "-x" "HTTP" "-a" "15"
"-I" "/usr/sbin/ipvsadm" "-t" "6" "-w" "1" "-V" "193.70.29.2" "-M" "g" "-U"
"rup"
Aug 17 07:38:16 haa lvs[2233]: create_monitor for www/hida.teta.it running as
pid 2244
Aug 17 07:38:16 haa lvs[2233]: starting virtual service strong active: 443
Aug 17 07:38:16 haa nanny[2240]: starting LVS client monitor for 193.70.29.2:80
Aug 17 07:38:16 haa nanny[2240]: making 193.70.29.18:80 available
Aug 17 07:38:16 haa nanny[2240]: running command  "/usr/sbin/ipvsadm" "-a" "-t"
"193.70.29.2:80" "-r" "193.70.29.18" "-g" "-w" "1"
Aug 17 07:38:16 haa nanny[2241]: starting LVS client monitor for 193.70.29.2:80
Aug 17 07:38:16 haa nanny[2241]: making 193.70.29.20:80 available
Aug 17 07:38:16 haa nanny[2241]: running command  "/usr/sbin/ipvsadm" "-a" "-t"
"193.70.29.2:80" "-r" "193.70.29.20" "-g" "-w" "1"
Aug 17 07:38:16 haa nanny[2244]: starting LVS client monitor for 193.70.29.2:80
Aug 17 07:38:16 haa nanny[2244]: making 193.70.29.19:80 available
Aug 17 07:38:16 haa nanny[2244]: running command  "/usr/sbin/ipvsadm" "-a" "-t"
"193.70.29.2:80" "-r" "193.70.29.19" "-g" "-w" "1"
Aug 17 07:38:16 haa nanny[2241]: running command  "rup" "193.70.29.20"
Aug 17 07:38:16 haa nanny[2240]: running command  "rup" "193.70.29.18"
Aug 17 07:38:16 haa nanny[2244]: running command  "rup" "193.70.29.19"
Aug 17 07:38:21 haa pulse[2235]: gratuitous lvs arps finished

Comment 11 Red Hat Bugzilla 2000-08-17 11:20:43 UTC
This is the same log I got when I was having the persistence problem.  Is lvs
defunct after startup?

-- Keith Moore

Comment 12 Red Hat Bugzilla 2000-08-17 11:28:11 UTC
If lvs is defunct, there will be a core file dumped by lvs.  Please attach the
core file to this bug report (Make sure you mark it as binary).

In case you don't know, the easiest way to find the proper core file is:

find / -name core -exec file {} \;

Look for the one created by lvs.  You may need to kill the nannies to allow lvs
to finish dumping its core.

I've tried several things to duplicate your problem, without success, so I need
to rely on your system to get the information.

-- Keith Moore
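
If a core does turn up, reading it against the lvs binary with gdb is the
standard next step (general practice, not piranha-specific):

gdb /usr/sbin/lvs ./core
(gdb) bt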

Comment 13 Red Hat Bugzilla 2000-08-17 14:32:47 UTC
Yes! lvs (with the lvs.cf that contains two services) is defunct (in fact I
must kill the nannies only when I start pulse with that lvs.cf). I cannot find
any core file on the system, even after killing the nanny processes.

Do you know why lvs doesn't produce the core file?
Do you know a way to verify this?
Is it possible I have a broken lvs executable (probably a stupid question)?
To pre-empt Murphy:) here is the sum result for lvs: 15938 31
Let me know how to produce the core file, and thank you very much for your
cooperation.

Comment 14 Red Hat Bugzilla 2000-08-17 14:58:03 UTC
Ok, not getting a core file is moderately annoying; I've never had that
problem.  If lvs is going defunct there is a bug.  (The patch above fixes one
of them.)  It's probably not a corrupt binary; that would act differently.

Try running the following (As root):

/usr/sbin/lvs --nodeamon --nofork -c /etc/lvs.cf

The normal log will come to the screen, and it should dump a core file in your
current directory if it dies.  Once again, you may have to kill lvs and the
nanny processes.  The defunct entry appears because lvs has died but pulse has
not yet reaped it, so the kernel keeps the PID around, but... defunct; the
nanny children keep running on their own.

If I can get a core, I can probably find the problem in about 2 minutes.

-- Keith Moore
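
As background on the defunct state, a minimal standalone sketch (not piranha
code): a child that exits and is never reaped by wait() stays in the process
table marked <defunct>, which is what lvs looks like under pulse here.

/* zombie.c -- minimal sketch of how a <defunct> entry arises */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid < 0)
        return 1;              /* fork failed */
    if (pid == 0)
        _exit(0);              /* child exits immediately */
    /* Parent never calls wait(), so the kernel keeps the child's
     * PID in the process table, marked <defunct>. */
    printf("child %d should now show as <defunct> in ps\n", (int) pid);
    sleep(30);                 /* window in which to observe it */
    return 0;
}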


Comment 15 Red Hat Bugzilla 2000-08-17 15:29:47 UTC
Probably the problem is elsewhere:(:(:(
Now (if I start lvs from the console as you described) all goes well, and the
https service starts too!!!

On the other hand, if I start the services via pulse (in that case the IP
address does come up) I get the same bad results. ?????

Comment 16 Red Hat Bugzilla 2000-08-17 15:41:21 UTC
Ok, I know what kind of bug that sounds like; I'm looking through that section
of the code now.

-- Keith

Comment 17 Red Hat Bugzilla 2000-08-17 15:57:24 UTC
Please attach your lvs binary.  (Be sure to mark it as binary.)

-- Keith Moore


Comment 18 Red Hat Bugzilla 2000-08-17 16:10:41 UTC
Created attachment 2597 [details]
The lvs executables you requested:)

Comment 19 Red Hat Bugzilla 2000-08-17 16:23:43 UTC
Ah, I just realized that the config you posted and your real one aren't the
same (IPs changed to protect the innocent, I'm sure).  With the posted one, and
your executable, everything still works fine for me.  Please attach (not
cut-and-paste) your config.  Piranha is a bit touchy about the config, and any
minor difference could totally break my test.

This is a configuration-related issue, probably having to do with using a
specific set of options. I'm not sure how much testing goes into the direct
mode of LVS.

-- Keith Moore

Comment 20 Red Hat Bugzilla 2000-08-17 16:47:37 UTC
Created attachment 2598 [details]
my production lvs.cf

Comment 21 Red Hat Bugzilla 2000-08-17 16:57:21 UTC
That did it.  It IS the same bug: in your production file you have persistent =
900 for the https service, and this triggers the persistence bug.  I was able
to duplicate it immediately.

Apply the above patch and it solves the problem.  If you don't have the setup
to apply and rebuild, I'm sure we can get you an updated RPM.

-- Keith Moore
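
For reference, a sketch of the trigger (abbreviated from the production
config; only the persistent line differs from the lvs.cf posted above):

virtual wwws {
     active = 1
     address = 1.2.3.2 eth0:1
     port = 443
     persistent = 900
     ...
}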

Comment 22 Red Hat Bugzilla 2000-08-17 17:02:01 UTC
I believe I also tried removing the persistence instruction in some of the many
tries I have done. In any case, if you can send me an updated RPM I will try
it. Thank you very much:):):):)

Comment 23 Red Hat Bugzilla 2000-08-17 18:54:23 UTC
Can you tell us if it is now solved? Can we close this bugzilla entry?



Comment 24 Red Hat Bugzilla 2000-08-18 00:43:28 UTC
Last known status:

Problem was reproduced by Keith Moore and corrected with the persistence patch.
Keith sent ciofina an updated RPM. Also, Red Hat posted a new source RPM
containing the latest patches.

We are waiting to hear if this problem has been solved and can be closed.



Comment 25 Red Hat Bugzilla 2000-08-18 03:34:59 UTC
The problem was solved. Thank you very much:):):)
Sorry for the delay, but when you sent the new binary it was night here:) (I am
in Europe).