Description of problem:
In my environment, I have two Active Directory domain controllers providing LDAP and DNS. The engine is aware of both DNS servers. One DC is located within oVirt; the other is outside, on bare metal. If either DC is powered off, the Web UI becomes unresponsive. Local login stalls for about 5 minutes. Login then appears to complete, but all containers are empty. Once the DC is restored, all functions return.

Version-Release number of selected component (if applicable):
3.3.1

How reproducible:
Every time.

Steps to Reproduce:
1. Power off a domain controller.
2. Attempt to log in with the local admin account on the Web UI.

Actual results:
Sluggish UI with no guests listed in containers.

Expected results:
Access to virtual machines.

Additional info:
I'm willing to provide any logs that are needed; I just don't know where to start.
Can you provide the relevant engine log?
This has happened several times. Would you like historical logs, or just the most recent one from around the time it occurred?
The most recent one would be sufficient.
Created attachment 851683: Most recent crash. Took 12 hours to get back up.
Comment on attachment 851683: This is the requested engine.log.
Also of note, these crashes occurred during host migrations (two occurred during this 12-hour period). This caused several guests to be listed as "UNKNOWN", and I had to edit their entries in the PostgreSQL database to get them back online.
I was able to reproduce this in my environment. The issue seems to be that the engine can no longer resolve the host on which the database is installed. The root cause of all the exceptions appears to be:

    java.sql.SQLException: javax.resource.ResourceException: IJ000453: Unable to get managed connection for java:/ENGINEDataSource

I was able to fix it by specifying the actual IP address of the database host instead of its FQDN in /etc/ovirt-engine/engine.conf.d/10-setup-database.conf:

    ENGINE_DB_HOST="<ip of db host>"
    ENGINE_DB_PORT="5432"
    ....
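To check whether an installation is exposed to this, one can read the database setup file and see whether ENGINE_DB_HOST is an IP literal or a name that needs DNS. The following is a minimal Python sketch, assuming the file consists of simple KEY="value" lines like those in the workaround; the helper names parse_engine_conf, is_ip_literal and check_db_host are hypothetical and not part of oVirt.

```python
import ipaddress
import socket


def parse_engine_conf(text):
    """Parse simple KEY="value" lines (assumed format; no escapes or multi-line values)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def is_ip_literal(host):
    """True if host is an IPv4/IPv6 address rather than a name needing DNS."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def check_db_host(settings):
    """Report whether connecting to the configured DB host depends on name resolution."""
    host = settings.get("ENGINE_DB_HOST", "")
    if not host:
        return "ENGINE_DB_HOST is not set"
    if is_ip_literal(host):
        return f"{host} is an IP literal; no DNS lookup needed"
    try:
        # getaddrinfo goes through the system resolver (resolv.conf, hosts file, ...).
        addrs = {info[4][0] for info in socket.getaddrinfo(host, None)}
    except socket.gaierror as exc:
        return f"cannot resolve {host}: {exc}"
    return f"{host} resolves to {sorted(addrs)}"
```

For example, pass it the contents of /etc/ovirt-engine/engine.conf.d/10-setup-database.conf. A hostname that only resolves while a particular DC is up points at the same DNS dependency described in this bug.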
"Domain" is such a loaded term; changing the bug summary.
I also looked at the logs, and you also have connection errors to the hosts, again due to the DNS issues. Could you also share your resolv.conf file?
If both DNS servers are listed in /etc/resolv.conf and both can resolve the database host, it should be OK (alternatively, use the workaround suggested in comment #7). Ryan, can you please verify that?
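The first of those conditions (both DNS servers listed) can be checked mechanically. Below is a minimal Python sketch, assuming the glibc conventions documented in resolv.conf(5): lines starting with '#' or ';' are comments, and at most three (MAXNS) nameserver entries are used, queried in the listed order. The function names nameservers and check_resolv are hypothetical.

```python
MAXNS = 3  # glibc uses at most three nameserver entries (see resolv.conf(5))


def nameservers(resolv_text):
    """Return nameserver addresses in the order they appear in resolv.conf content."""
    servers = []
    for raw in resolv_text.splitlines():
        line = raw.strip()
        # Comment lines start with '#' or ';'.
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def check_resolv(resolv_text):
    """Return (servers the resolver will use, list of warnings)."""
    servers = nameservers(resolv_text)
    used = servers[:MAXNS]
    warnings = []
    if len(used) < 2:
        warnings.append("fewer than two nameservers: losing one DC loses DNS")
    if len(servers) > MAXNS:
        warnings.append(f"entries beyond the first {MAXNS} are ignored: {servers[MAXNS:]}")
    return used, warnings
```

Whether each listed server can actually resolve the database host still has to be checked per server, for example with `dig @<dns server> <db host FQDN>`.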
The system in question is a production cluster. I'll schedule an outage to test it as soon as possible. I was already planning to add a bare-metal ADS as a backup, so I'll run the test as part of that work.
Reducing urgency, as this looks like a system/environmental issue. Ryan, when do you think you'll be able to get back to us with the answers?