Bug 879904
Summary: | engine: engine fails to connect to DB and cannot be started with NPE | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Dafna Ron <dron> | ||||||||||||
Component: | ovirt-engine | Assignee: | Juan Hernández <juan.hernandez> | ||||||||||||
Status: | CLOSED ERRATA | QA Contact: | Pavel Stehlik <pstehlik> | ||||||||||||
Severity: | urgent | Docs Contact: | |||||||||||||
Priority: | low | ||||||||||||||
Version: | 3.1.0 | CC: | acathrow, bazulay, bdagan, bhubbard, byount, cpelland, fharshav, iheim, jkt, jlivings, jraju, juan.hernandez, lpeer, lyarwood, ngupta, Rhev-m-bugs, yeylon, yzaslavs | ||||||||||||
Target Milestone: | --- | ||||||||||||||
Target Release: | 3.3.0 | ||||||||||||||
Hardware: | x86_64 | ||||||||||||||
OS: | Linux | ||||||||||||||
Whiteboard: | infra | ||||||||||||||
Fixed In Version: | is1 | Doc Type: | Bug Fix | ||||||||||||
Doc Text: |
The engine performed reverse DNS lookups of IP addresses when it failed to connect to the database. This includes looking up its own name and delaying the rest of the operations, including the deployment of the connection pool, which raises a null pointer exception in the administration and user portals. This update modifies the engine so that it locates the data source in a loop and retries until it is successful .
|
Story Points: | --- | ||||||||||||
Clone Of: | Environment: | ||||||||||||||
Last Closed: | 2014-01-21 17:31:34 UTC | Type: | Bug | ||||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||||
Documentation: | --- | CRM: | |||||||||||||
Verified Versions: | Category: | --- | |||||||||||||
oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |||||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||
Embargoed: | |||||||||||||||
Attachments: |
|
was this a local postgres or a remote postgres? (In reply to comment #1) > was this a local postgres or a remote postgres? local Barak had the same issue on his setup but we are not sure why its happening. its not a dns issue. Kublin thinks its a security issue this time from error we found in jboss log: 2012-11-26 14:31:59,130 ERROR [org.jboss.as.controller.management-operation] (ServerService Thread Pool -- 16) JBAS014613: Operation ("enable") failed - address: ([ ("subsystem" => "datasources"), ("data-source" => "ENGINEDataSource") ]) - failure description: "JBAS010436: Failed to create DataSource instance for [{ \"operation\" => \"enable\", \"address\" => [ (\"subsystem\" => \"datasources\"), (\"data-source\" => \"ENGINEDataSource\") ], \"persistent\" => true }] reason: IJ010073: Invalid <security> configuration" 2012-11-26 14:31:59,400 INFO [org.jboss.as.connector.subsystems.datasources] (ServerService Thread Pool -- 16) JBAS010404: Deploying non-JDBC-compliant driver class org.postgresql.Driver (version 8.4) 2012-11-26 14:31:59,465 INFO [org.jboss.as.naming] (MSC service thread 1-4) JBAS011802: Starting Naming Service I am attaching all the logs from Barak's setup. Created attachment 651942 [details]
logs
Juan - do you have any idea? We are suspecting eap 6.0.1 The problem described in comment #3 we already had, see bug 868659, and it has already been solved. The initial problem is different, and I don't think it is related to 6.0.1. My guess is that the fact that the first DNS server is not responding means that all DNS operations are delayed. By default the resolver waits 5 seconds for a response and retry two times, so it is 10 seconds before it tries with the second nameserver. This is time enough to delay the deployment of the database connection pool so that the application starts before it is deployed, thus the NPE. However DNS should not be used to resolve the address of the database, as it is localhost for local database. I would check the "files" is the first option in /etc/nsswitch.conf for the hosts database and that there is a "localhost" entry in the /etc/hosts file. If this is the problem we could maybe use 127.0.0.1 in the database connection pool configuration, to avoid DNS operations in any case. The problem of Barak is solved in the way as described at bug https://bugzilla.redhat.com/show_bug.cgi?id=868659. Dafna, please use workaround described by Juan Dafna, did you check if replacing "localhost" with "127.0.0.1" in the "/etc/sysconfig/ovirt-engine file resolved the issue? It is in the ENGINE_DB_URL variable, its value will probably be this: ENGINE_DB_URL=jdbc:postgresql://localhost:5432/engine I am suggesting to change it to this: ENGINE_DB_URL=jdbc:postgresql://127.0.0.1:5432/engine If that solves the issue (with the DNS configuration still broken) then the problem is that we start the engine before the data source is deployed, and there are two things we can do to improve it: 1. Resolve the database host name to an IP address during setup and store the IP address instead of the host name. This is perfectly fine for local installations, but is not wise for remote databases as the system will start to fail if the database IP address changes. 2. Modify the code so that we wait till the database connection pool is deployed before starting to use it. This is much more reliable in my opinion. I tested with 127.0.0.1 and we are still getting NPE when engine starts (attaching logs again): ENGINE_DB_URL=jdbc:postgresql://127.0.0.1:5432/engine 2012-12-16 17:54:41,472 ERROR [org.ovirt.engine.core.dal.dbbroker.DbFacadeLocator] (MSC service thread 1-3) Unable to locate DbFacade instance: java.lang.RuntimeException: Datasource is not defined at org.ovirt.engine.core.dal.dbbroker.DbFacadeLocator.<clinit>(DbFacadeLocator.java:34) [engine-dal.jar:] at org.ovirt.engine.core.dal.dbbroker.DbFacade.getInstance(DbFacade.java:209) [engine-dal.jar:] at org.ovirt.engine.core.bll.Backend.checkDBConnectivity(Backend.java:124) [engine-bll.jar:] at org.ovirt.engine.core.bll.Backend.create(Backend.java:118) [engine-bll.jar:] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.7.0_09-icedtea] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) [rt.jar:1.7.0_09-icedtea] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_09-icedtea] at java.lang.reflect.Method.invoke(Method.java:601) [rt.jar:1.7.0_09-icedtea] at org.jboss.as.ee.component.ManagedReferenceLifecycleMethodInterceptorFactory$ManagedReferenceLifecycleMethodInterceptor.processInvocation(ManagedReferenceLifecycleMethodInterceptorFactory.java:130) [jboss-as-ee.jar:7.1.2.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.as.ee.component.ManagedReferenceFieldInjectionInterceptorFactory$ManagedReferenceFieldInjectionInterceptor.processInvocation(ManagedReferenceFieldInjectionInterceptorFactory.java:112) [jboss-as-ee.jar:7.1.2.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.as.ee.component.ManagedReferenceInterceptorFactory$ManagedReferenceInterceptor.processInvocation(ManagedReferenceInterceptorFactory.java:95) [jboss-as-ee.jar:7.1.2.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.as.ee.component.ManagedReferenceInterceptorFactory$ManagedReferenceInterceptor.processInvocation(ManagedReferenceInterceptorFactory.java:95) [jboss-as-ee.jar:7.1.2.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee.jar:7.1.2.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInOurTx(CMTTxInterceptor.java:227) [jboss-as-ejb3.jar:7.1.2.Final-redhat-1] at org.jboss.as.ejb3.tx.CMTTxInterceptor.requiresNew(CMTTxInterceptor.java:332) [jboss-as-ejb3.jar:7.1.2.Final-redhat-1] at org.jboss.as.ejb3.tx.SingletonLifecycleCMTTxInterceptor.processInvocation(SingletonLifecycleCMTTxInterceptor.java:57) [jboss-as-ejb3.jar:7.1.2.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3.jar:7.1.2.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45) [jboss-as-ee.jar:7.1.2.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.as.ee.component.BasicComponent.constructComponentInstance(BasicComponent.java:161) [jboss-as-ee.jar:7.1.2.Final-redhat-1] at org.jboss.as.ee.component.BasicComponent.createInstance(BasicComponent.java:85) [jboss-as-ee.jar:7.1.2.Final-redhat-1] at org.jboss.as.ejb3.component.singleton.SingletonComponent.getComponentInstance(SingletonComponent.java:119) [jboss-as-ejb3.jar:7.1.2.Final-redhat-1] at org.jboss.as.ejb3.component.singleton.SingletonComponent.start(SingletonComponent.java:134) [jboss-as-ejb3.jar:7.1.2.Final-redhat-1] at org.jboss.as.ee.component.ComponentStartService.start(ComponentStartService.java:44) [jboss-as-ee.jar:7.1.2.Final-redhat-1] at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1811) at org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1746) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.7.0_09-icedtea] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.7.0_09-icedtea] at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_09-icedtea] Created attachment 664346 [details]
new logs
Thanks for the information Dafna. Then only reason for this that I can imagine is that the database is doing a reverse DNS lookup of of the IP address of the client. Can you check that Dafna? Maybe take a dump of the traffic to the DNS servers while you start the engine? Or try manually the following command: psql --host 127.0.0.1 --user engine --password engine Does this command take longer when the DNS configuration is wrong? Anyhow, I think this will need a modification in the code to make the engine wait till the datasource is deployed before trying to use it. Created attachment 664982 [details]
tcpdump
I did not see any effect when running psql --host 127.0.0.1 --user engine --password engine
I ran tcpdump on port 53, qa2 file is when we give a dead dns and qa1 is when we have a functional dns.
Checking the provided dumps I see that there are reverse lookups of the IP address of the machine that are failing to contact the first server, thus delaying things. I checked in the machine what process makes those lookups, and it is the ovirt-engine service, not the database. So this means that the application server is looking up its own host name and probably delaying the rest of the operations, including the deployment of the connection pool. I prepared the following patch to modify the engine so that it will locate the data source in a loop, retrying till it is successful: http://gerrit.ovirt.org/10189 With this patch applied I verified that the behaviour when the DNS is slow is different. It prints this new warning and then retries the operation: 2012-12-18 12:32:13,851 WARN [org.ovirt.engine.core.dal.dbbroker.DbFacadeLocator] (MSC service thread 1-3) The datasource can't be located. This probably means that DNS is not working correctly or is slow, please check it. Will try again in a few seconds. After a few seconds, when the data source is deployed, the application continues as usual. Created attachment 666974 [details]
engine log file from rhs-c issue
I'm seeing the same dbfacade error on my red hat storage console instance (ovirt based). Has been working for months until I applied maintenance tonight ... that'll teach me ;o) I have restored the change (I abandoned it some time ago). moved the data source lookup loop out of the static initializer and added a time limit. The upstream change has been merged, so it will be in 3.3 after the rebase. *** Bug 972678 has been marked as a duplicate of this bug. *** *** Bug 975445 has been marked as a duplicate of this bug. *** This bug is currently attached to errata RHEA-2013:15231. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag. Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information: * Cause: What actions or circumstances cause this bug to present. * Consequence: What happens when the bug presents. * Fix: What was done to fix the bug. * Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore') Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug. For further details on the Cause, Consequence, Fix, Result format please refer to: https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes Thanks in advance. The first part of the doc text is not accurate. Initially we thought that this was caused by a reverse DNS lookup, but we don't have proof of that. What actually happens is that the relative deployment order of the application and the data source aren't guaranteed by the application server, so in some situations the data source may be deployed after the application is deployed and executed. Thus the application has to wait till the data source is deployed. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2014-0038.html |
Created attachment 651461 [details] logs Description of problem: I added a domain to my setup and changed resolved.com so that the first server in the list is a server that's currently down. engine failed to start with NPE which is also shown in the webadmin and user portal when we try to log in. Version-Release number of selected component (if applicable): si24.4 How reproducible: 100% Steps to Reproduce: 1. add a domain to your setup using rhevm-manage-domains 2. change resolve.conf so that the first server on the list is a dns server which is down 3. restart ovirt engine Actual results: engine fails to start with NPE which is also shown in webadmin and user portal Expected results: we need to jump to the the next server on the list Additional info: logs 2012-11-25 11:56:56,368 ERROR [org.ovirt.engine.core.utils.ejb.EJBUtilsStrategy] (ServerService Thread Pool -- 37) Error looking up resource DATA_SOURCE 2012-11-25 11:56:56,376 ERROR [org.ovirt.engine.core.dal.dbbroker.DbFacadeLocator] (ServerService Thread Pool -- 37) Unable to locate DbFacade instance: java.lang.RuntimeException: Datasource is not defined at org.ovirt.engine.core.dal.dbbroker.DbFacadeLocator.<clinit>(DbFacadeLocator.java:34) [engine-dal.jar:] at org.ovirt.engine.core.dal.dbbroker.DbFacade.getInstance(DbFacade.java:209) [engine-dal.jar:] at org.ovirt.engine.core.bll.Backend.checkDBConnectivity(Backend.java:124) [engine-bll.jar:] at org.ovirt.engine.core.bll.Backend.create(Backend.java:118) [engine-bll.jar:] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.7.0_09-icedtea] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) [rt.jar:1.7.0_09-icedtea] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_09-icedtea] at java.lang.reflect.Method.invoke(Method.java:601) [rt.jar:1.7.0_09-icedtea] at org.jboss.as.ee.component.ManagedReferenceLifecycleMethodInterceptorFactory$ManagedReferenceLifecycleMethodInterceptor.processInvocation(ManagedReferenceLifecycleMethodInterceptorFactory.java:130) [jboss-as-ee.jar:7.1.3.Final-redhat-3] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.as.ee.component.ManagedReferenceFieldInjectionInterceptorFactory$ManagedReferenceFieldInjectionInterceptor.processInvocation(ManagedReferenceFieldInjectionInterceptorFactory.java:112) [jboss-as-ee.jar:7.1.3.Final-redhat-3] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.as.ee.component.ManagedReferenceInterceptorFactory$ManagedReferenceInterceptor.processInvocation(ManagedReferenceInterceptorFactory.java:95) [jboss-as-ee.jar:7.1.3.Final-redhat-3] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.as.ee.component.ManagedReferenceInterceptorFactory$ManagedReferenceInterceptor.processInvocation(ManagedReferenceInterceptorFactory.java:95) [jboss-as-ee.jar:7.1.3.Final-redhat-3] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee.jar:7.1.3.Final-redhat-3] :