Case: I define a virtual server www.xyz.com with an IP of xxx.xxx.xxx (FQIP) and a real server entry of 192.168.1.11. When I start the lvs services, lvs is supposed to create an ipvsadm rule and enable that rule. Problem: The rule does not get created and thus is never activated. Current solution: Manually create the ipvsadm rule and enable it by hand. Problem: The ipvsadm rule will not be managed by lvs.
As far as we can tell, this should work fine. Piranha certainly will create and modify the ipvsadm rules as needed, so we need to know more in order to investigate this. Certainly, as you point out, if lvs doesn't create the rule it will have problems maintaining it. Could you update us on the following:

1. What are the ipchains and ipvsadm rules you are trying to use (or expecting to result)? Sample commands, ipvsadm list output, etc.
2. Can you include a copy of your lvs.cf config file?
3. A simple diagram of your network setup, with indications of the IP and virtual IP addresses and NIC interfaces?

Thanks
I shouldn't have to create any ipvsadm rules. Defining the real nodes involved should create the rules. (Piranha should have the function to look up the name of the virtual server, like www.qixo.com, then look at the real nodes that make up that virtual server, and create the rules and enable them.)

For example, 216.200.192.106 is the virtual server www.qixo.com, which is made up of real nodes 192.168.1.11 and 192.168.1.10.

    ipchains -A forward -s 192.168.1.0/24 -d 0.0.0.0/0 -j MASQ

should already have been applied, since the admin should already know that he needs MASQ enabled for his servers.

Piranha should create these rules and put them in place:

    ipvsadm -A -t 216.200.192.106:80 -s rr
    ipvsadm -a -t 216.200.192.106:80 -r 192.168.1.11 -m
    ipvsadm -a -t 216.200.192.106:80 -r 192.168.1.10 -m

Some part of the lvs clustering software should then be monitoring, via something along the lines of the following, to add and remove servers as they go up and down:

Grep lvs.cf for the real node IPs and pass that information to a script that tests for a known response from whatever services are defined as being handled by the real nodes (like www, for instance). If there is no response, it removes the non-responding server's ipvsadm rule. If there IS a response, it runs ipvsadm -L and greps for the name or IP of the real node. If it is there, it doesn't re-add it; it just tests the next one. If it's NOT there, it adds the rule. All of this gets checked every 10 to 15 seconds. This needs to be started and stopped from the script that starts/stops the lvs daemon, and it monitors continuously.

At this juncture the above actions are NOT done.

Here is the copy of my lvs.cf file as it stands now.

CURRENT LVS.CF FILE

#
# Set up timeout values for the LVS
# =================================
ipchains -M -S 7200 10 160
#
# Start setting up routing for LVS/HA
# ===================================
ipvsadm -A -t 216.200.192.111:80 -s rr
# RE-ENABLE .12 WHEN DEVEL IS DONE!
[root@vs-00 /root]# less /etc/lvs.cf
primary = 216.200.192.100
service = lvs
rsh_command = rsh
backup_active = 1
backup = 216.200.192.101
heartbeat = 1
heartbeat_port = 539
keepalive = 6
deadtime = 12
network = nat
nat_router = 192.168.1.22 eth1:0
virtual 216.200.192.106.qixo.com {
    active = 1
    address = 216.200.192.106 eth0:1
    port = 80
    send = "GET / HTTP/1.0\r\n\r\n"
    expect = "QIXO"
    load_monitor = ruptime
    scheduler = rr
    protocol = tcp
    timeout = 6
    reentry = 15
    server ws-01 {
        address = 192.168.1.11
        active = 1
        weight = 1
    }
    server ws-02 {
        address = 192.168.1.10
        active = 1
        weight = 1
    }
}

SYSTEMS LAYOUT

            VIRT_IP
               |
      ====================
      |                  |
      0                  0
  LVS Node1          LVS Node2
      |                  |
      ====================
      |                  |
  Real Node1         Real Node2
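For reference, the rules the reporter expects for the virtual server in this lvs.cf can be written out mechanically. The following is a minimal sketch only, not Piranha's code: `expected_rules` is a hypothetical helper name, it only prints the commands (a dry run) rather than executing them, and it assumes NAT forwarding (`-m`), as in the hand-written rules quoted in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of Piranha): print, but do not run, the
# ipvsadm commands expected for one NAT virtual server.
# Usage: expected_rules VIP PORT SCHEDULER REAL_IP...
expected_rules() {
    vip=$1
    port=$2
    sched=$3
    shift 3
    # One "add virtual service" rule for the virtual address and port.
    echo "ipvsadm -A -t ${vip}:${port} -s ${sched}"
    # One "add real server" rule per real node, using masquerading (-m).
    for real in "$@"; do
        echo "ipvsadm -a -t ${vip}:${port} -r ${real} -m"
    done
}

# Values taken from the lvs.cf above.
expected_rules 216.200.192.106 80 rr 192.168.1.11 192.168.1.10
```

Comparing this output with the output of `ipvsadm -L -n` on the active LVS node is one way to see which rules, if any, were actually installed.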
> I shouldn't have to create any ipvsadm rules. Defining the real nodes involved
> should create the rules.

I thought I said this. This is why I asked for more information: to determine what's wrong in your situation.

> some part of the lvs clustering software should then be monitoring via
> somethng along the lines of the following to add remove servers as
> they go up and down..

This is what the product does.

> grep lvs.cf for the real node IPs, pass that information to a script
> that tests for known response from whatever servies are defined as being
> handled by the real nodes (like www for instance). If no response removes
> the non responding server's ipvsadm rule, if there IS a response runs
> ipvsadm -L and greps for the name or IP of the real node. if there it
> doesn't re-add it, it just tests the next one. if it;s NOT there it adds
> the rule.

Again, this is what the product does.

> At this juncture the above actions are NOT done.

OK, this is why we need to look at your situation a bit.

> here is the copy of my lvs.cf file as it stands now.

GREAT. We'll look at it. The diagram helps a little too.

Could you supply one more piece of information? Your diagram does not indicate all the non-virtual IP addresses being used, nor their interfaces. In order to recreate your problem in our lab, we could use that information. If it helps, there are simple block diagrams in the HA Server Installation Guide that you could clone. lvs.cf does not show all the IP addresses involved in a setup (for example, your ipvs rules reference an IP address not shown).

Thanks.
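The check-and-reconcile behaviour described in the quoted text can be sketched as below. This is an illustration under stated assumptions, not how Piranha's daemons are implemented: `decide_action` and `reconcile` are hypothetical names, the health result and table presence are passed in as plain words rather than probed, and the commands are printed instead of executed. A real monitor would obtain the health result from a service probe and the presence from `ipvsadm -L -n`.

```shell
#!/bin/sh
# Hypothetical sketch (not Piranha code) of the per-server decision.
# $1 = "up" or "down"        (result of the service check)
# $2 = "present" or "absent" (whether the server is in the ipvsadm table)
decide_action() {
    case "$1:$2" in
        down:present) echo remove ;;  # dead server still in the pool
        up:absent)    echo add ;;     # live server missing from the pool
        *)            echo none ;;    # table already matches reality
    esac
}

# Usage: reconcile VIP:PORT IP,HEALTH,PRESENCE...
# Prints (dry run) the ipvsadm command needed for each real server, if any.
reconcile() {
    service=$1
    shift
    for entry in "$@"; do
        ip=${entry%%,*}
        rest=${entry#*,}
        health=${rest%%,*}
        presence=${rest#*,}
        case "$(decide_action "$health" "$presence")" in
            add)    echo "ipvsadm -a -t ${service} -r ${ip} -m" ;;
            remove) echo "ipvsadm -d -t ${service} -r ${ip}" ;;
        esac
    done
}

# Example: ws-01 answers but is missing; ws-02 is listed but not answering.
reconcile 216.200.192.106:80 192.168.1.11,up,absent 192.168.1.10,down,present
```

Running such a pass every 10 to 15 seconds, as the reporter suggests, would keep the table in step with the real servers; the interval and the probe itself are left out of this sketch.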
Keith the setup is very simple. It is exactly like what is in your manual. Front end nodes have FQIPs on eth0 with eth1 being the 192.168.1.x IPs. Piranha configures and enables eth0:0 as the floating FQIP that the world sees as the cluster IP. Piranha configures and enables eth1:0 as the NAT device as shown in the lvs.cf file.

Now, the situation has changed in regards to this product. There is no way you can say the product does what it's supposed to. It DOES create the needed eth aliases and DOES enable them and DOES maintain them. We CAN send information back and forth from the front end FQIP to the server farm in the back and we DO get responses BUT this is ONLY after we HAND create the ipvsadm rules to do this and put them in place. WE should not be doing this, PIRANHA is supposed to be doing this. This is most DEFINITELY broken in the software!

I've called and spoken with Q about this issue, we went thru on the telephone and configured this according to the manual. (He had one that he walked through with me (manual that is) to the point that we were reading off page numbers to make sure we were in the same section of the manual!)

Simply put folks, this software is broken. Piranha does NOT create, maintain, or modify the needed ipvsadm rules that will make this product work as advertised. At this current point it does NOT work as advertised!

To make matters worse when we called in to the Durham office to get this definite bug fixed, we were told that we would have to pay for a development contract to get this to work. ERR? Why should WE have to pay for an additional contract to fix a problem with code in the original product. A bug that should have NOT been in the code and should have been working in the original product. That makes entirely NO sense! Why should WE be charged for fixing a bug that is core to the product working correctly and as advertised?? You mean we are going to be charged to fix something wrong with your product?
When I asked what the basis was for the charge, I was told that it was because the support contract that comes with the product is for installation, configuration, and administration only and NOT for fixing something at the code level. Keith, this is a problem at the code level that should NOT have been there in the FIRST place! And this definitely affects the configuration and administration points of the contract, since neither Red Hat nor I can rectify the problem if the code is broken! The code is most DEFINITELY broken.

Next, when the conversation moved to refund territory, we were told by Chris that management was going to keep $500 of our money for technical support already rendered! Why? The technical support was for nothing more than reporting to you via telephone that there was a possible bug in the software and having it verified that there WAS in fact a bug! ***This IS a bug!*** There is no way we will allow a $500 charge!

I have left numerous messages with various folks involved with this, like Nathan Thomas, Q, yourself, Kim Lynch, and others. This is rapidly starting to feel like WE are being made to pay for the RIGHT to have bugs fixed that should have been working in the first place, since the whole LVS structure hinges on this code working correctly.

Right now there is no controlling entity in any way, shape, or form that handles nodes coming in or out of the server pool. All additions, first time entries, and removals of dead machines are having to be handled by a human. Right now the only thing the product DOES do correctly is to rotate the FQIP for the virtual server between the front end nodes. Needless to say, neither my CTO, CEO, nor I are happy in th
The problem you are reporting is unique to your situation. This is not a known bug with the product. In fact, it is a fundamental part of the product to perform ipvsadm calls. Bugs are always possible -- this could be a unique ipvs situation, but it needs to be investigated and that requires time and cooperation. After several communications, involving both support and myself, it has become apparent that there are more issues being brought into this situation than just a problem report, and that bugzilla is not the best forum to resolve them. Certainly I am not in a position to respond to refund dissatisfaction. This problem has been moved to Red Hat support.
Additional information: Using the posted lvs.cf file, the problem was not reproducible in the lab and the system responded correctly.
Keith, I do not see how we can be the only ones out here with this problem. Marking the problem resolved does NOT make the problem go away, though it does make it appear that a single customer is having a problem with this, and therefore that it is not a bug, and therefore a face-saving solution. This problem is NOT resolved whether it is marked as such or not. We ARE working with Red Hat to resolve this issue once key members return from the LWE. (We can discuss this at LinuxWorld if you will be attending.)

Also, I was not stating that you had anything to do with refund stuff. I brought that out into the open due to the lack of response we received from Red Hat on these issues. Since playing phone tag was getting nowhere, a public announcement in bugzilla regarding the problem was necessitated. HOWEVER, the problem has been resolved to both parties' satisfaction at this point, even if the underlying issue is not resolved as of yet. I do however believe that a resolution will be forthcoming, though we do take exception to the early closing of this bug before we, as a team, have had a chance to work on it.

If the lvs.cf file given to you worked in your labs then it should have worked equally well on our systems. I do however pose the possibility that there is something in the hardware of a Dell 2450 server that may cause this issue, since that has not been addressed, nor has the question even been posed as to what hardware we were using. No generic troubleshooting questions were asked, in fact, other than those that you posed to me in this forum.

At this juncture I will leave off further comments regarding this issue until such time as we can work on this after the LWE. A working relationship has been established from which to solve this puzzle due to an earlier discussion. I will not argue the closing of this issue other than to publicly state that the original problem has not thus far been solved, but steps have been taken by BOTH sides to ensure this becomes the case.
Again, this entry is closed because there are several, non-technical support issues involved with this customer. These will not be elaborated on here. Since official phone support is involved and bugzilla is a casual support vehicle (there is no obligation by Red Hat to respond to postings here), there will not be further activity logged on this bugzilla entry. This is also not a proper forum for debate.