Bug 15909 - lvs component fails to create ipvsadm entries
Summary: lvs component fails to create ipvsadm entries
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat High Availability Server
Classification: Retired
Component: ipvsadm
Version: 1.0
Hardware: i386
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Phil Copeland
QA Contact: Phil Copeland
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2000-08-10 08:21 UTC by David D.W. Downey
Modified: 2005-10-31 22:00 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2000-08-14 18:22:58 UTC
Embargoed:



Description Red Hat Bugzilla 2000-08-10 08:21:17 UTC
Case:        I define a virtual server www.xyz.com with an IP of xxx.xxx.xxx (FQIP) and a real server entry of 192.168.1.11. When I start the lvs
services, lvs is supposed to create an ipvsadm rule and enable that rule.

Problem: the rule never gets created, and therefore never gets activated.

Current workaround:    manually create the ipvsadm rule and enable it by hand.

Problem: a hand-created ipvsadm rule will not be managed by lvs.

Comment 1 Red Hat Bugzilla 2000-08-10 18:52:50 UTC
As far as we can tell, this should work fine. Piranha certainly will create and
modify the ipvsadm rules as needed, so we need to know more in order to
investigate this. Certainly, as you point out, if lvs doesn't create the
rule it will have problems maintaining it. Could you update us on the following:

1. What are the ipchains and ipvsadm rules you are trying to use (or expecting
   to result)? Sample commands, ipvsadm list output, etc.

2. Can you include a copy of your lvs.cf config file?

3. A simple diagram of your network setup, indicating the IP and virtual IP
   addresses and NIC interfaces?


Thanks



Comment 2 Red Hat Bugzilla 2000-08-13 07:09:31 UTC
I shouldn't have to create any ipvsadm rules. Defining the real nodes involved
should create the rules. (Piranha should have the function to look and see what
the name of the virtual server is, like www.qixo.com, then look at the real
nodes that make up that virtual server, and create the rules and enable them.)

For example, 216.200.192.106 is the virtual server www.qixo.com;
it is made up of the real nodes 192.168.1.11 and 192.168.1.10.

ipchains -A forward -s 192.168.1.0/24 -d 0.0.0.0/0 -j MASQ should already have
been applied, since the admin should already know that he needs MASQ enabled for
his servers.

Piranha should create these rules and put them in place:
ipvsadm -A -t 216.200.192.106:80 -s rr
ipvsadm -a -t 216.200.192.106:80 -r 192.168.1.11 -m
ipvsadm -a -t 216.200.192.106:80 -r 192.168.1.10 -m

Some part of the LVS clustering software should then be monitoring, via something
along the lines of the following, to add/remove servers as they go up and down:

Grep lvs.cf for the real node IPs and pass that information to a script that
tests for a known response from whatever services are defined as being handled
by the real nodes (www, for instance). If there is no response, it removes the
non-responding server's ipvsadm rule; if there IS a response, it runs ipvsadm -L
and greps for the name or IP of the real node. If the node is there, it doesn't
re-add it, it just tests the next one; if it is NOT there, it adds the rule.
All of this gets checked every 10 to 15 seconds. It needs to be started/stopped
from the script that starts/stops the lvs daemon, and it monitors continuously.
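
(Illustration only: a rough sketch of the kind of monitor loop described above.
The addresses, port, and expect string are taken from the lvs.cf below; the use
of nc for the probe is an assumption, and this is not piranha's actual
monitoring code.)

#!/bin/sh
# Rough sketch only -- the loop described above, not piranha's monitor.
# VIP/port/real servers/expect string come from the lvs.cf below;
# nc is assumed to be installed for the service probe.
VIP=216.200.192.106
PORT=80
REALS="192.168.1.11 192.168.1.10"

while true; do
    for RS in $REALS; do
        # Probe the real server the same way lvs.cf does (send/expect)
        if printf 'GET / HTTP/1.0\r\n\r\n' | nc -w 5 $RS $PORT | grep -q QIXO; then
            # Server answers: add its ipvsadm entry if it is not already listed
            ipvsadm -L -n | grep -q "$RS:$PORT" || \
                ipvsadm -a -t $VIP:$PORT -r $RS -m
        else
            # No answer: remove its ipvsadm entry if it is present
            ipvsadm -L -n | grep -q "$RS:$PORT" && \
                ipvsadm -d -t $VIP:$PORT -r $RS
        fi
    done
    sleep 15
done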

At this juncture the above actions are NOT done.
Here is a copy of my lvs.cf file as it stands now.



CURRENT LVS.CF FILE

#
# Set up timeout values for the LVS
# =================================
ipchains -M -S 7200 10 160
#
# Start setting up routing for LVS/HA
# ===================================
ipvsadm -A -t 216.200.192.111:80 -s rr
# RE-ENABLE .12 WHEN DEVEL IS DONE!
[root@vs-00 /root]# less /etc/lvs.cf
primary = 216.200.192.100
service = lvs
rsh_command = rsh
backup_active = 1
backup = 216.200.192.101
heartbeat = 1
heartbeat_port = 539
keepalive = 6
deadtime = 12
network = nat
nat_router = 192.168.1.22 eth1:0
virtual 216.200.192.106.qixo.com {
     active = 1
     address = 216.200.192.106 eth0:1
     port = 80
     send = "GET / HTTP/1.0\r\n\r\n"
     expect = "QIXO"
     load_monitor = ruptime
     scheduler = rr
     protocol = tcp
     timeout = 6
     reentry = 15
     server ws-01 {
         address = 192.168.1.11
         active = 1
         weight = 1
     }
     server ws-02 {
         address = 192.168.1.10
         active = 1
         weight = 1
     }
}
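
(For reference: given that virtual block, with NAT routing, round-robin
scheduling, port 80, and two real servers, the entries piranha is expected to
install are the same ones listed earlier in this comment:

ipvsadm -A -t 216.200.192.106:80 -s rr
ipvsadm -a -t 216.200.192.106:80 -r 192.168.1.11 -m
ipvsadm -a -t 216.200.192.106:80 -r 192.168.1.10 -m
)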



SYSTEMS LAYOUT

             VIRT_IP
                |
       ====================
       |                  |
       0                  0
      LVS Node1         LVS Node2
       |                  |
       ====================
        |                |
      Real Node1        Real Node2





Comment 3 Red Hat Bugzilla 2000-08-14 15:05:56 UTC
> I shouldn't have to create any ipvsadm rules. Defining the real nodes involved 
> should create the rules. 

I thought I said this.

This is why I asked for more information; to determine what's wrong in your
situation.


> Some part of the LVS clustering software should then be monitoring, via
> something along the lines of the following, to add/remove servers as they
> go up and down:

This is what the product does.


> Grep lvs.cf for the real node IPs and pass that information to a script that
> tests for a known response from whatever services are defined as being handled
> by the real nodes (www, for instance). If there is no response, it removes the
> non-responding server's ipvsadm rule; if there IS a response, it runs
> ipvsadm -L and greps for the name or IP of the real node. If the node is
> there, it doesn't re-add it, it just tests the next one; if it is NOT there,
> it adds the rule.

Again, this is what the product does.


> At this juncture the above actions are NOT done.

OK, this is why we need to look at your situation a bit.


> Here is a copy of my lvs.cf file as it stands now.


GREAT. We'll look at it. The diagram helps a little too. Could you supply one
more piece of information? Your diagram does not indicate all the non-virtual
IP addresses being used, nor their interfaces. In order to recreate your problem
in our lab, we could use that information. If it helps, there are simple block
diagrams in the HA Server Installation Guide that you could clone. lvs.cf does
not show all the IP addresses involved in a setup (for example, your ipvs rules
reference an IP address not shown). Thanks.




Comment 4 Red Hat Bugzilla 2000-08-14 18:22:41 UTC
Keith, the setup is very simple. It is exactly like what is in your manual.

The front-end nodes have FQIPs on eth0, with eth1 carrying the 192.168.1.x IPs. Piranha configures and enables eth0:0 as the floating FQIP that the world sees
as the cluster IP. Piranha configures and enables eth1:0 as the NAT device, as shown in the lvs.cf file.
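
(Illustration only: under the layout just described, the aliases piranha brings
up would look roughly like this; the netmasks are assumptions, and the addresses
are taken from the lvs.cf above.)

# Floating cluster address (FQIP) on the public side
ifconfig eth0:0 216.200.192.106 netmask 255.255.255.0 up
# NAT router address on the private side
ifconfig eth1:0 192.168.1.22 netmask 255.255.255.0 up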

Now, the situation has changed with regard to this product. There is no way you can say the product does what it's supposed to. It DOES create the
needed eth aliases, DOES enable them, and DOES maintain them. We CAN send information back and forth from the front-end FQIP to the server farm
in the back, and we DO get responses, BUT this is ONLY after we create the ipvsadm rules BY HAND and put them in place. WE should not be doing
this; PIRANHA is supposed to be doing this. This is most DEFINITELY broken in the software! I've called and spoken with Q about this issue; we went
through the configuration on the telephone according to the manual. (He walked through his copy of the manual with me, to the point that we
were reading off page numbers to make sure we were in the same section!)

Simply put, folks, this software is broken. Piranha does NOT create, maintain, or modify the needed ipvsadm rules that would make this product work as
advertised. At this point it does NOT work as advertised! To make matters worse, when we called the Durham office to get this definite
bug fixed, we were told that we would have to pay for a development contract to get this to work.

ERR? Why should WE have to pay for an additional contract to fix a problem with code in the original product? A bug that should NOT have been in the
code, and that should have been working in the original product. That makes entirely NO sense! Why should WE be charged for fixing a bug that is core to
the product working correctly and as advertised?? You mean we are going to be charged to fix something wrong with your product? When I asked
what the basis was for the charge, I was told that it was because the support contract that comes with the product is for installation, configuration, and
administration only, and NOT for fixing something at the code level. Keith, this is a problem at the code level that should NOT have been there in the
FIRST place! And it definitely affects the configuration and administration points of the contract, since neither Red Hat nor I can rectify the problem if
the code is broken! The code is most DEFINITELY broken.


Next, when the conversation moved to refund territory, we were told by Chris that management was going to keep $500 of our money for technical
support already rendered! Why? The technical support was for nothing more than reporting to you via telephone that there was a possible bug in the
software and having it verified that there WAS in fact a bug! ***This IS a bug!*** There is no way we will allow a $500 charge!

I have left numerous messages with various folks involved with this, like Nathan Thomas, Q, yourself, Kim Lynch, and others. This is rapidly starting to
feel like WE are being made to pay for the RIGHT to have bugs fixed that should not have existed in the first place, since the whole LVS structure
hinges on this code working correctly. Right now there is no controlling entity in any way, shape, or form that handles nodes coming in or out of the
server pool. All additions, first-time entries, and removals of dead machines are having to be handled by a human. Right now the only thing the product
DOES do correctly is rotate the FQIP for the virtual server between the front-end nodes.

Needless to say, neither my CTO, CEO, nor I are happy in th

Comment 5 Red Hat Bugzilla 2000-08-14 21:22:32 UTC
The problem you are reporting is unique to your situation. This is not a known
bug with the product. In fact, it is a fundamental part of the product to
perform ipvsadm calls. Bugs are always possible -- this could be a unique ipvs
situation, but it needs to be investigated, and that requires time and
cooperation.

After several communications, involving both support and myself, it has become
apparent that there are more issues being brought into this situation than just
a problem report, and that bugzilla is not the best forum to resolve them.
Certainly I am not in a position to respond to dissatisfaction over the refund.

This problem has been moved to Red Hat support.


Comment 6 Red Hat Bugzilla 2000-08-14 21:24:55 UTC
Additional information: Using the posted lvs.cf file, the problem was not
reproducible in the lab and the system responded correctly.


Comment 7 Red Hat Bugzilla 2000-08-15 04:37:27 UTC
Keith, I do not see how we can be the only ones out here with this problem.
Marking the problem resolved does NOT make the problem go away, though it does
make it appear that a single customer is having a problem with this, and
therefore that it is not a bug, and therefore a face-saving solution.

This problem is NOT resolved, whether it is marked as such or not. We ARE working
with Red Hat to resolve this issue once key members return from the LWE. (We can
discuss this at LinuxWorld if you will be attending.)

Also, I was not stating that you had anything to do with the refund. I brought
that out into the open due to the lack of response we received from Red Hat on
these issues. Since playing phone tag was getting nowhere, a public announcement
in bugzilla regarding the problem was necessitated. HOWEVER, the problem has been
resolved to both parties' satisfaction at this point, even if the underlying
issue is not yet resolved.

I do, however, believe that a resolution will be forthcoming, though we take
exception to the early closing of this bug before we, as a team, have had a
chance to work on it. If the lvs.cf file given to you worked in your labs, then
it should have worked equally well on our systems. I do pose the possibility
that perhaps there is something in the hardware of a Dell 2450 server that may
or may not cause this issue, since that has not been addressed, nor was the
question even posed as to what hardware we were using. In fact, no generic
troubleshooting questions were asked other than those that you posed to me in
this forum.

At this juncture I will leave off further comments regarding this issue until
such time as we can work on it after the LWE. A working relationship has been
established from which to solve this puzzle, due to an earlier discussion.

I will not argue the closing of this issue, other than to publicly state that
the original problem has not thus far been solved, but steps have been taken by
BOTH sides to ensure this becomes the case.



Comment 8 Red Hat Bugzilla 2000-08-15 17:41:22 UTC
Again, this entry is closed because there are several non-technical support
issues involved with this customer. These will not be elaborated on here. Since
official phone support is involved and bugzilla is a casual support vehicle
(there is no obligation by Red Hat to respond to postings here), there will not
be further activity logged on this bugzilla entry. This is also not a proper
forum for debate.





