Bug 1158023
| Summary: | DNS Failure of first DNS server causes engine lockup: java.io.EOFException: SSL peer shut down incorrectly | | |
|---|---|---|---|
| Product: | [Retired] oVirt | Reporter: | Daniel Helgenberger <daniel.helgenberger> |
| Component: | ovirt-engine-core | Assignee: | Eli Mesika <emesika> |
| Status: | CLOSED WONTFIX | QA Contact: | Pavel Stehlik <pstehlik> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.5 | CC: | alonbl, bugs, ecohen, gklein, iheim, lsurette, oourfali, rbalakri, s.kieske, yeylon |
| Target Milestone: | --- | | |
| Target Release: | 3.6.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | infra | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-03-10 09:30:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
This is a known issue with Java name resolution; see for example [1]. To overcome it, a custom resolver and a custom host verifier would have to be implemented. This is not an engine bug but a Java one.

[1] http://www.rexconsulting.net/tip-java-does-not-honor-dns-ttl-recommendation-in-enterprise-environment.html

(In reply to Alon Bar-Lev from comment #1)
> This is a known issue with Java name resolution; see for example [1].

Thanks, Alon, for pointing out Java and this document (I have to admit I was expecting something like this).

As your source states, this is a 'feature' rather than a bug in Java. In turn, it should affect JBoss as a whole (enterprise environment). I opened an RFE for this custom resolver [1]. I would think this matter is rather severe; I wonder why nobody else has reported something like this yet.

[1] BZ1158487

(In reply to Daniel Helgenberger from comment #2)
> (In reply to Alon Bar-Lev from comment #1)
> > This is a known issue with Java name resolution; see for example [1].
>
> Thanks, Alon, for pointing out Java and this document (I have to admit I was expecting something like this).
>
> As your source states, this is a 'feature' rather than a bug in Java. In turn, it should affect JBoss as a whole (enterprise environment). I opened an RFE for this custom resolver [1]. I would think this matter is rather severe; I wonder why nobody else has reported something like this yet.
>
> [1] BZ1158487

I just worked very hard on a different component [1] to enable dynamic resolution... Java has very poor support for this in its native base classes; not sure why, as it is a recent technology.

[1] http://gerrit.ovirt.org/gitweb?p=ovirt-engine-extension-aaa-ldap.git;a=blob;f=README.unboundid-ldapsdk;hb=HEAD

(In reply to Alon Bar-Lev from comment #3)
> Java has very poor support for this in its native base classes; not sure why, as it is a recent technology.

I think we need to ask Oracle or alternatively move JBoss to DjangoBoss or RailsBoss ;)

Back to the subject, reading your gerrit issue:
> The UnboundID LDAP SDK for Java provides the RoundRobinDNSServerSet to provide some remedy, it does that by duplicating functionality of the basic ServerSets, it includes support for fail over, random, round robin modes.

If I am not mistaken, this functionality should do for this case? In particular, failover mimics the OS behavior.

(In reply to Daniel Helgenberger from comment #4)
> > The UnboundID LDAP SDK for Java provides the RoundRobinDNSServerSet to provide some remedy, it does that by duplicating functionality of the basic ServerSets, it includes support for fail over, random, round robin modes.
>
> If I am not mistaken, this functionality should do for this case? In particular, failover mimics the OS behavior.

If dynamic DNS processing is required, the application should use the JNDI DNS provider instead of letting Java do this automatically. This is possible only in some cases, as Java has classes, especially JNDI LDAP, that cannot be used with a custom resolver implementation.

(In reply to Alon Bar-Lev from comment #5)
> If dynamic DNS processing is required, the application should use the JNDI DNS provider instead of letting Java do this automatically. This is possible only in some cases, as Java has classes, especially JNDI LDAP, that cannot be used with a custom resolver implementation.

Sorry, I cannot quite follow here. If I get it right, the goal would be to have one DNS resolver/proxy function to cover all use cases (LDAP SRV records, host names, ...), which is a very sensible approach. I am no developer and can only provide ideas to the extent of my knowledge. That said, I would solve the problem by calling the nslookup or dig binaries from Java, which should not be very expensive but is quite dirty, to say the least, and may create other issues (like recreating your 'own' DNS cache).

Dynamic processing is only required to the extent used by the Engine. I never gave this much thought, really. I prefer a clean setup where hosts are resolvable in DNS, so I put some name servers in the engine's resolv.conf. I guess the Engine did a lookup when I registered the hosts, which was fine, and now looks up hosts by hostname on startup?

A real dynamic setup, as I understand it, involves short DNS TTLs (Google, Facebook, EC2-style load balancing). Of course, this is not applicable to oVirt, and most likely never will be. Hosts (and DNS records, for that matter) in production will have (semi-)static IPs most of the time. For example, I just put my hosts in /etc/hosts to work around this issue. Maybe adding host/IP pairs to some file under /var/lib could fix the immediate issue? Whatever you come up with will surely be much better, and I am happy to test this if needed.

*** Bug 1158487 has been marked as a duplicate of this bug. ***

Per the comments above, this is a Java issue and not an engine one, and we don't plan to work around it in the engine code. Closing as WONTFIX.
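For readers wondering what the JNDI DNS provider suggestion from comment #5 could look like in practice, here is a minimal sketch. It is not engine code: the class `JndiDnsLookup` and its methods are hypothetical names, and the timeout values are arbitrary. It assumes the DNS provider bundled with the JDK (`com.sun.jndi.dns.DnsContextFactory`), which accepts a space-separated list of `dns://` URLs and moves on to the next server when one does not answer.

```java
import java.util.Hashtable;
import java.util.List;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

/** Hypothetical helper for illustration; not part of ovirt-engine. */
final class JndiDnsLookup {

    /** Builds the JNDI environment for the JDK's DNS provider. */
    static Hashtable<String, String> environment(List<String> nameServers) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");

        // The provider URL is a space-separated list of dns:// URLs,
        // one per name server, in the order they should be tried.
        StringBuilder urls = new StringBuilder();
        for (String server : nameServers) {
            if (urls.length() > 0) {
                urls.append(' ');
            }
            urls.append("dns://").append(server);
        }
        env.put(Context.PROVIDER_URL, urls.toString());

        // Assumed values: initial timeout in milliseconds and retry count.
        env.put("com.sun.jndi.dns.timeout.initial", "1000");
        env.put("com.sun.jndi.dns.timeout.retries", "2");
        return env;
    }

    /** Returns the first A record for the host, or null if there is none. */
    static String lookupA(String host, List<String> nameServers) throws NamingException {
        DirContext ctx = new InitialDirContext(environment(nameServers));
        try {
            Attributes attrs = ctx.getAttributes(host, new String[] {"A"});
            Attribute a = attrs.get("A");
            return a == null ? null : (String) a.get();
        } finally {
            ctx.close();
        }
    }
}
```

A caller would pass the name servers explicitly, for example `JndiDnsLookup.lookupA("host1.example.com", List.of("192.0.2.1", "192.0.2.2"))`, where the host name and addresses are placeholders. As comment #5 notes, this only helps where the application performs the lookup itself; classes that resolve names internally are not affected by it.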
Created attachment 951353 [details]
Sample engine.log: SSL peer shut down incorrectly

Description of problem:
If the Engine fails to look up hosts via the first DNS server, host names are no longer resolved. The engine is effectively locked up, endlessly trying to fence and to connect to hosts. The logs fill with 'java.io.EOFException: SSL peer shut down incorrectly'.

Version-Release number of selected component (if applicable):
3.5
Engine Host: CentOS 6.5

How reproducible:
Always

Steps to Reproduce:
On the engine host:
1. Make sure DNS is working and hosts are looked up correctly from the engine
2. Shut down the Engine
3. Optional: Clear the DNS cache
4. Edit /etc/resolv.conf and add an invalid NS as the first entry (to simulate failed DNS)
5. Test with nslookup; it should come up with ';; connection timed out; trying next origin' for the first NS
6. Start the engine
7. Watch engine.log

Actual results:
The Engine is locked up. It tries to fence the hosts. The WebUI is inaccessible.

Expected results:
The second / third DNS entries in resolv.conf are used.

Additional info:
I ran into this issue while my main internal DNS server underwent maintenance and the engine host was restarted because of a failure of the storage appliance at the same time (unrelated). Note that a secondary NS entry was present and working perfectly the whole time.
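The expected behavior above (fall through to the second and third `nameserver` entries when the first one fails) can be sketched as follows. This only illustrates the failover logic and is not how the engine or the JVM resolves names: `NameServerFailover` and the `query` callback are hypothetical, and the sketch simplifies by treating both an exception and a `null` answer as "try the next server".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;

/** Hypothetical illustration of ordered name-server failover. */
final class NameServerFailover {

    /** Extracts the nameserver entries from resolv.conf lines, in file order. */
    static List<String> parseNameServers(List<String> resolvConfLines) {
        List<String> servers = new ArrayList<>();
        for (String raw : resolvConfLines) {
            String line = raw.trim();
            // Skip blank lines and comments (resolv.conf allows '#' and ';').
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) {
                continue;
            }
            String[] parts = line.split("\\s+");
            if (parts.length >= 2 && parts[0].equals("nameserver")) {
                servers.add(parts[1]);
            }
        }
        return servers;
    }

    /**
     * Asks each server in order and returns the first answer.
     * The query callback takes (server, host) and returns an address;
     * it is a stand-in for a real DNS query and may throw on timeout.
     */
    static Optional<String> resolve(String host, List<String> servers,
                                    BiFunction<String, String, String> query) {
        for (String server : servers) {
            try {
                String address = query.apply(server, host);
                if (address != null) {
                    return Optional.of(address);
                }
            } catch (RuntimeException e) {
                // Simplification: any failure means "try the next server".
            }
        }
        return Optional.empty();
    }
}
```

With a `query` callback that times out for the first server and answers for the second, `resolve` returns the second server's answer, which is the behavior the reproduction steps expect.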