Hide Forgot
Description of problem: As per the doc bug on load balancer, "That mirror list round robins the order of CDS instances returned", this is the reason I see this issue only for alternate yum tries after yum clean all. yum is having issues fetching from the other url when the primary url is active. I think this will cause a lot of load on a single CDS node, risking the load balancing. [root@dhcp201-140 ~]# yum clean all Loaded plugins: rhui-lb Cleaning up Everything [root@dhcp201-140 ~]# yum install zsh Loaded plugins: rhui-lb https://dhcp201-131.englab.pnq.redhat.com/pulp/repos//content/dist/rhel/rhui/server-6/releases/6Server/x86_64/os/repodata/repomd.xml: [Errno 14] problem making ssl connection Trying other mirror. rhui-rhel-server-6-releases | 3.7 kB 00:00 rhui-rhel-server-6-releases/primary_db | 2.9 MB 00:00 Setting up Install Process Resolving Dependencies --> Running transaction check ---> Package zsh.x86_64 0:4.3.10-4.1.el6 set to be updated --> Finished Dependency Resolution Dependencies Resolved ================================================================================================================================================================================= Package Arch Version Repository Size ================================================================================================================================================================================= Installing: zsh x86_64 4.3.10-4.1.el6 rhui-rhel-server-6-releases 2.1 M Transaction Summary ================================================================================================================================================================================= Install 1 Package(s) Upgrade 0 Package(s) Total download size: 2.1 M Installed size: 4.8 M Is this ok [y/N]: n Exiting on user Command Complete! [root@dhcp201-140 ~]# yum clean all Loaded plugins: rhui-lb Cleaning up Everything [root@dhcp201-140 ~]# yum install zsh Loaded plugins: rhui-lb rhui-rhel-server-6-releases | 3.7 kB 00:00 rhui-rhel-server-6-releases/primary_db | 2.9 MB 00:00 Setting up Install Process Resolving Dependencies --> Running transaction check ---> Package zsh.x86_64 0:4.3.10-4.1.el6 set to be updated --> Finished Dependency Resolution Dependencies Resolved ================================================================================================================================================================================= Package Arch Version Repository Size ================================================================================================================================================================================= Installing: zsh x86_64 4.3.10-4.1.el6 rhui-rhel-server-6-releases 2.1 M Transaction Summary ================================================================================================================================================================================= Install 1 Package(s) Upgrade 0 Package(s) Total download size: 2.1 M Installed size: 4.8 M Is this ok [y/N]: n Exiting on user Command Complete! Version-Release number of selected component (if applicable): [root@dhcp201-137 noarch]# rpm -qav | grep -ie pulp -ie rh-rhui-tools pulp-common-0.0.204-1.el6.noarch pulp-client-0.0.204-1.el6.noarch rh-rhui-tools-2.0.37-1.el6.noarch pulp-0.0.204-1.el6.noarch How reproducible: Steps to Reproduce: 1.yum clean all 2. yum install zsh 3. yum clean all 4. yum install zsh Actual results: Throws error saying 'problem making ssl connection' when using the other url from the mirrorlist. Expected results: Shouldn't be facing ssl connection issue on consecutive attempt of installation. Additional info: 1) It works perfectly for the rhel5 clients, verified also from the /var/log/httpd/ssl_request.log file on both the machines. 2) Also on the primary load balancer node if httpd service is shutdown, the other node then serves requests.
There's a lot missing from this bug report. Again, instead of just dumping a bunch of console output, please describe your environment. There is really not much for me to go on in this. Have you tried to make a yum connection directly to both of the CDS instances to see if they are both working? How about wget? For that matter, what is it that you're trying to test? A normal working environment or a situation where one of the CDS instances has failed? I also have questions on the report contents itself: "As per the doc bug on load balancer, "That mirror list round robins the order of CDS instances returned", this is the reason I see this issue only for alternate yum tries after yum clean all." I don't understand. Are you saying one CDS is down? "yum is having issues fetching from the other url when the primary url is active." What do you mean by "when the primary url is active"? What does active mean? Is the server for the primary URL up or down? "I think this will cause a lot of load on a single CDS node, risking the load balancing."" So are you saying the load balancing isn't evenly distributing the load? "2) Also on the primary load balancer node if httpd service is shutdown, the other node then serves requests." This makes it sound like you were trying to test a downed CDS, in which case of course there will be a lot of load on a single CDS. But again, I don't have a clue what it is you're trying to do so I'm left guessing.
My mistake, I think didn't use the right terminology. CDS1 = primary load balancer = primary url = here dhcp201-136 CDS2 = other url = here dhcp201-131 active mean = apache/httpd is running I am not trying downed cds, but just to rule out some issue with CDS2/dhcp201-131 node , I shutdown the apache service on CDS1/dhcp201-136 and now able to fetch stuff from CDS2/dhcp201-131 without any "problem making ssl connection" issue. This behaviour is weired. I have two CDS nodes. >>>>> Have you tried to make a yum connection directly to both of the CDS instances to see if they are both working? How about wget? Yes, yum connection directly to both of the CDS instances works. >>>>> For that matter, what is it that you're trying to test? A normal working environment or a situation where one of the CDS instances has failed? A normal working environment, both CDS nodes running httpd service. The issue is that the requests are being sent only to a single CDS node in the case of rhel6 and fedora clients, this works perfectly fine in the case of rhel5. I believe that "mirror list round robins the order of CDS instances returned", this happens I guess in the case of rhel6 and fedora clients , but for some reason when the CDS node is not primary load balancer , it throws an error saying "Problem making ssl connection".(NOTE: While a direct yum connection request to this node is perfect) Trying to say that whenever the other url(non primary load balancer url ) is being used or lets say here being returned, its having "problem making ssl connection" ( Both the CDS nodes are running the apache services) ########################################################################## Below, here I think the negative testing approach may explain things more clearly, which is being a bit tough because of load balancing and round robin stuff. negative testing => Below ouput is when, trying to download without entitlement certs. =============================================================================== [root@dhcp201-140 ~]# yum install zsh Loaded plugins: rhui-lb https://dhcp201-131.englab.pnq.redhat.com/pulp/repos//content/dist/rhel/rhui/server-6/releases/6Server/x86_64/os/repodata/repomd.xml: [Errno 14] problem making ssl connection Trying other mirror. https://dhcp201-136.englab.pnq.redhat.com/pulp/repos//content/dist/rhel/rhui/server-6/releases/6Server/x86_64/os/repodata/repomd.xml: [Errno 14] HTTP Error 401 : https://dhcp201-136.englab.pnq.redhat.com/pulp/repos//content/dist/rhel/rhui/server-6/releases/6Server/x86_64/os/repodata/repomd.xml Trying other mirror. Error: Cannot retrieve repository metadata (repomd.xml) for repository: rhui-rhel-server-6-releases. Please verify its path and try again =============================================================================== FRom the above output we can see that the error here it is again , a) problem making ssl connection , for https://dhcp201-131.englab.pnq.redhat.com which is the CDS2 node or other url I am referring. b) The error should have been the same in both the cases(for both the url) as Authorization is Required. ############################################################################### >>>>>> So are you saying the load balancing isn't evenly distributing the load? yes, in the case of only rhel 6 and fedora and not rhel5.
Ok, I think I get what you're saying: - You have two CDS instances running (we'll call one primary and one secondary). - Yum is configured to use the load balancer on the primary. - Every other request results in a "problem making ssl connection" error. This lines up to only requests that get balanced to the secondary CDS instance. - This issue is not seen on RHEL 5, only RHEL 6 clients. RHEL 5 clients connecting to the primary CDS as the load balancer can retrieve content from both CDS instances without this error. - You can use yum on the RHEL 6 client to connect directly to the secondary CDS correctly. Keep in mind that when testing this, the ID of the repository can _not_ start with "rhui-", otherwise the RHUI load balancing plugin will kick in and affect the destination URL. The safest bet to elliminate all of this would be to use curl directly against both CDS instances to retrieve repomd.xml. - If you shut down the primary and the RHEL 6 client fails over to using the secondary as the load balancer, then subsequent yum requests to the secondary CDS actually succeed. Is all of that correct? What version of Fedora did you also test this on? If it turns out to be an issue in the yum plugin, I'd guess it's a version thing (since Fedora and RHEL 6 yum versions are probably closer than they are to RHEL 5).
I have to change the bugs name. Fedora is F14 >>>>>>If you shut down the primary and the RHEL 6 client fails over to using the secondary as the load balancer, then subsequent yum requests to the secondary CDS actually succeed. 1) FAILOVER is working fine, but the issue is with LOAD BALANCING feature. 2) Always, all the requests go to the CDS1 node in the case of RHEL6 and F14 clients, when both the CDS apache servers are running. 3) I see no log updates in ssl_request.log for CDS2 node. 4) As, this works perfectly for RHEL5, may be the issue is with yum plugin, not sure though. 5) Also while connecting directly to the secondary CDS it works, I try this way by disabling the plugin in rhui-lb.conf file, and using the below url. baseurl=https://dhcp201-131.englab.pnq.redhat.com/pulp/repos/content/dist/rhel/rhui/server-6/releases/$releasever/$basearch/os
So it's not that the load balancer isn't balancing the load across CDS 1 and CDS 2, it's that when the client attempts to go to CDS 2 for content it's getting this error? The LB on CDS 1 is successfully rotating the returned mirror list, but in cases where the client is getting CDS 2 as the first in the list it fails with the problem making ssl connection error? So it's not really a load balancing bug, since the load is still being balanced. It's that the calls to CDS 2, when accessed through the load balancing functionality, fail. But only in that case; directly to CDS 2 works correctly?
Removing needinfo; Kedar confirmed the last comment in chat.
Also observed with pulp 211 and rhui 2.0.40
May be related to bug , https://bugzilla.redhat.com/show_bug.cgi?id=716951 Also, If nss and curl rpms are upgraded to => nss 3.12.9-9 => curl-7.19.7-26 then the actual load-balancing functionality is working fine. (the comment 5 issue not observed after upgrading to above nss and curl rpms)
I was able to reproduce this using the upgraded versions of NSS and curl. I think this has something to do with NSS locally caching the certificates. I have two CDS instances set up, "cds-1" and "cds-2". I can connect to cds-1 fine. When I try to use curl to go to cds-2, I get this: [root@ip-10-195-79-175 yum.repos.d]# curl --cacert /etc/pki/entitlement/ca.crt --cert /etc/pki/entitlement/product/content.crt --key /etc/pki/entitlement/key.pem -v https://cds-2/pulp/repos//content/dist/rhel/rhui/server/5Server/x86_64/rhui/1.2/os/repodata/repomd.xml * About to connect() to cds-2 port 443 (#0) * Trying 10.83.106.3... connected * Connected to cds-2 (10.83.106.3) port 443 (#0) * Initializing NSS with certpath: /etc/pki/nssdb * CAfile: /etc/pki/entitlement/ca.crt CApath: none * NSS error -8054 * Closing connection #0 * SSL connect error curl: (35) SSL connect error The NSS error -8054 has to do with certificate revocation lists. The cert isn't revoked per se, but I think NSS is getting confused. My initial thought was that it was in the server-side SSL certs, but in this case both have unique serial numbers. I think this is a client-side thing.
I lied, it's not about CRLs. The description of -8054 is: "You are attempting to import a cert with the same issuer/serial as an existing cert, but that is not the same cert." That's probably what's happening; I'm trying to use the same client-side cert as was used previously and it's barfing on the import.
Kedar - Can you check the serial numbers of all of your CDS instances' SSL certificates and your client-side entitlement certificate? You can find it in the `openssl x509 -text -in <cert` view. I suspect the issue may be that there's a duplicate serial number in there somewhere and NSS is viewing that as a security breach (rightfully so).
commit 1248a82a1235415934e2f6b3bc52930603fb834e Author: Jay Dobies <jason.dobies> Date: Tue Jul 26 09:16:42 2011 -0400 720665 - Default the serial number to 100. This should alleviate any potential overlap in serial numbers if the same CA was used to sign the CDS SSL certificates. rhui-2.0/tools/src/rhui/tools/launcher.py
[root@ip-10-91-17-24 build]# openssl x509 -in cds1i-950da0f4.crt -noout -serial serial=08 [root@ip-10-91-17-24 build]# openssl x509 -in cds2i-970da0f6.crt -noout -serial serial=08 [root@ip-10-91-17-24 build]# openssl x509 -in rhuai-910da0f0.crt -noout -serial serial=08 And also a single CA was being used for all SSL certs. We will try to use different serial numbers now and proceed with the testing.
After using different serial numbers for CDS nodes, the load balancing stuff is working fine for rhel6 nodes [root@ip-10-172-38-137 ~]# rpm -qav | grep -ie "^curl" -ie "nss-3" nss-3.12.7-2.el6.x86_64 curl-7.19.7-16.el6.x86_64 Need to still verify on fedora clients.
LOAD BALANCING stuff is working fine. CDS2 ssl_request_log 10.88.30.207 - /CN=Red Hat Update Infrastructure [28/Jul/2011:14:12:36 -0400] "GET /pulp/repos//content/dist/rhel/rhui/server-6/releases/6Server/x86_64/os/repodata/repomd.xml HTTP/1.1" 200 3828 10.88.30.207 - /CN=Red Hat Update Infrastructure [28/Jul/2011:14:12:36 -0400] "GET /pulp/repos//content/dist/rhel/rhui/server-6/releases/6Server/x86_64/os/repodata/7e94eeabef95726e82318ff8b73520540d2747feb3c0425130fcafdc560340c1-primary.sqlite.bz2 HTTP/1.1" 200 3002152 10.88.30.207 - /CN=Red Hat Update Infrastructure [28/Jul/2011:14:12:43 -0400] "GET /pulp/repos//content/dist/rhel/rhui/server-6/releases/6Server/x86_64/os/Packages/zsh-4.3.10-4.1.el6.x86_64.rpm HTTP/1.1" 200 2240608 CDS1 ssl_request log primary load balancer 10.88.30.207 - - [28/Jul/2011:14:12:35 -0400] "GET /pulp/mirror/?members HTTP/1.1" 200 58 10.88.30.207 - - [28/Jul/2011:14:12:35 -0400] "GET /pulp/mirror/content/dist/rhel/rhui/server-6/releases/6Server/x86_64/os HTTP/1.1" 200 216 10.88.30.207 - - [28/Jul/2011:14:13:21 -0400] "GET /pulp/mirror/?members HTTP/1.1" 200 58 10.88.30.207 - - [28/Jul/2011:14:13:21 -0400] "GET /pulp/mirror/content/dist/rhel/rhui/server-6/releases/6Server/x86_64/os HTTP/1.1" 200 216 10.88.30.207 - /CN=Red Hat Update Infrastructure [28/Jul/2011:14:13:21 -0400] "GET /pulp/repos//content/dist/rhel/rhui/server-6/releases/6Server/x86_64/os/repodata/repomd.xml HTTP/1.1" 200 3828 10.88.30.207 - /CN=Red Hat Update Infrastructure [28/Jul/2011:14:13:21 -0400] "GET /pulp/repos//content/dist/rhel/rhui/server-6/releases/6Server/x86_64/os/repodata/7e94eeabef95726e82318ff8b73520540d2747feb3c0425130fcafdc560340c1-primary.sqlite.bz2 HTTP/1.1" 200 3002152 10.88.30.207 - /CN=Red Hat Update Infrastructure [28/Jul/2011:14:13:27 -0400] "GET /pulp/repos//content/dist/rhel/rhui/server-6/releases/6Server/x86_64/os/Packages/zsh-4.3.10-4.1.el6.x86_64.rpm HTTP/1.1" 200 2240608 fed14 cds1 ssl_request log fed14/os/repodata/repomd.xml HTTP/1.1" 200 2693 10.192.198.80 - "" [29/Jul/2011:03:33:32 -0400] "GET /pulp/repos//baseos/x86_64/fed14/os/repodata/primary.sqlite.bz2 HTTP/1.1" 200 8497 10.192.198.80 - "" [29/Jul/2011:03:33:34 -0400] "GET /pulp/repos//baseos/x86_64/fed14/os/zsh-4.3.10-5.fc14.x86_64.rpm HTTP/1.1" 200 2240844 10.86.222.208 - - [29/Jul/2011:03:58:33 -0400] "GET /pulp/mirror/?members HTTP/1.1" 200 58 10.86.222.208 - "" [29/Jul/2011:03:58:34 -0400] "GET /pulp/repos//content/dist/rhel/rhui/server/5Server/x86_64/os/repodata/repomd.xml HTTP/1.1" 401 498 10.86.222.208 - - [29/Jul/2011:04:58:33 -0400] "GET /pulp/mirror/?members HTTP/1.1" 200 58 10.86.222.208 - "" [29/Jul/2011:04:58:34 -0400] "GET /pulp/repos//content/dist/rhel/rhui/server/5Server/x86_64/os/repodata/repomd.xml HTTP/1.1" 401 498 cds2 ssl_request log 10.192.198.80 - "" [29/Jul/2011:03:33:04 -0400] "GET /pulp/repos//baseos/x86_64/fed14/os/repodata/repomd.xml HTTP/1.1" 200 2693 10.192.198.80 - "" [29/Jul/2011:03:33:05 -0400] "GET /pulp/repos//baseos/x86_64/fed14/os/repodata/primary.sqlite.bz2 HTTP/1.1" 200 8497 10.192.198.80 - "" [29/Jul/2011:03:33:16 -0400] "GET /pulp/repos//baseos/x86_64/fed14/os/zsh-4.3.10-5.fc14.x86_64.rpm HTTP/1.1" 200 2240844 10.86.222.208 - "" [29/Jul/2011:03:58:34 -0400] "GET /pulp/repos//content/dist/rhel/rhui/server/5Server/x86_64/os/repodata/repomd.xml HTTP/1.1" 401 497 10.86.222.208 - "" [29/Jul/2011:04:58:34 -0400] "GET /pulp/repos//content/dist/rhel/rhui/server/5Server/x86_64/os/repodata/repomd.xml HTTP/1.1" 401 497
moving to release pending
closing out, product released