Description of problem:
oo-diagnostics hangs with the '--abortok' option if the DNS service is down.

Version-Release number of selected component (if applicable):
rubygem-openshift-origin-common-1.17.2.5-1.el6op.noarch

How reproducible:
always

Steps to Reproduce:
1. Stop the DNS service
   # /etc/init.d/named stop
2. Run oo-diagnostics
   # oo-diagnostics --abortok -v

Actual results:
oo-diagnostics hangs when checking node profiles via MCollective.

Output:
INFO: loading list of installed packages
INFO: OpenShift broker installed.
INFO: OpenShift node installed.
INFO: Loading the broker rails environment.
INFO: running: prereq_dns_server_available
INFO: checking that the first server in /etc/resolv.conf responds
FAIL: prereq_dns_server_available
      192.168.59.193 doesn't appear to respond to DNS requests.
      This command:
        host -W 1 example.com. 192.168.59.193
      should have connected to your primary nameserver. Instead, it returned:
        ;; connection timed out; trying next origin
        ;; connection timed out; no servers could be reached
      Please check the following to resolve this issue:
      * Does /etc/resolv.conf have your correct nameserver?
      * Is your nameserver running?
      * Is the firewall on your nameserver open (udp:53)?
      * Can you connect to your nameserver?
      Many OpenShift functions fail without working DNS resolution.
INFO: running: prereq_domain_resolves
INFO: checking that we can resolve our application domain
FAIL: prereq_domain_resolves
      Application domain does not appear to resolve under current nameserver configuration.
      This command:
        host -W 5 -t NS 'ose20-xiama.com.cn'
      should have returned the nameserver(s) for ose20-xiama.com.cn. Instead, it returned:
        Host ose20-xiama.com.cn not found: 3(NXDOMAIN)
      Please check the following to resolve this issue:
      * Is CLOUD_DOMAIN=ose20-xiama.com.cn in broker.conf correct?
      * Does /etc/resolv.conf have the right nameserver(s)?
      * Is your OpenShift domain nameserver running?
      * Is the firewall on your nameserver open (udp:53)?
      * Does your nameserver respond to queries via dig/host?
      Many OpenShift functions may fail without application DNS.
INFO: running: test_enterprise_rpms
INFO: Checking that all OpenShift RPMs are actually from OpenShift Enterprise
INFO: running: test_selinux_policy_rpm
INFO: rpm selinux-policy installed with at least version 3.7.19-195.el6_4.4
INFO: running: test_selinux_enabled
INFO: running: test_broker_cache_permissions
INFO: broker application cache permissions appear fine
INFO: running: test_node_profiles_districts_from_broker
INFO: checking node profiles via MCollective

Expected results:
It should continue with the remaining tests.

Additional info:
If the host DNS server is down (requiring --abortok), you should expect problems. However, I recall that there are other scenarios (this is just one) where mcollective can't reach activemq and simply hangs, and I agree that oo-diagnostics should not do that. Thanks for filing the bug so we remember to look into this and ensure that some kind of timeout is always enforced.

I can't remember exactly when I've seen this before, but here are some scenarios to test:
1. ActiveMQ host doesn't resolve (like this scenario)
2. Port 61613 is not open on ActiveMQ host
3. ActiveMQ is not started

For that matter, we may encounter the same kinds of problems when mongodb is unreachable.
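To illustrate what "some kind of timeout enforced" could look like, here is a minimal Ruby sketch. It is not taken from oo-diagnostics: the helper name run_check_with_deadline and the 0.2 second deadline are made up for the example, and the sleep stands in for a check that blocks on an unreachable service.

```ruby
require 'timeout'

# Hypothetical helper (not part of oo-diagnostics): runs one check with a
# deadline and returns [:ok, value] or [:timed_out, nil] instead of blocking.
def run_check_with_deadline(name, seconds)
  [:ok, Timeout.timeout(seconds) { yield }]
rescue Timeout::Error
  warn "FAIL: #{name} gave no response within #{seconds}s; continuing"
  [:timed_out, nil]
end

# The sleep is a stand-in for a check that never returns.
status, _value = run_check_with_deadline('test_node_profiles_districts_from_broker', 0.2) { sleep 5 }
puts status
```

One caveat: Timeout.timeout cannot always interrupt code that is blocked inside a native call, so a real fix may also need timeouts set on the underlying connections.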
Actually... I think mcollective does time out normally, and it may be mongodb that's hanging (looking up districts). Check both.
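One way to check both services is a bounded TCP probe. The Ruby sketch below is only an illustration: the helper name tcp_reachable? is hypothetical, and 127.0.0.1 is an assumption for an all-in-one host. Ports 61613 (ActiveMQ) and 27017 (mongodb) are the ones that appear in this report.

```ruby
require 'socket'

# Hypothetical helper: true if a TCP connection opens within `seconds`.
# Refused, timed-out and unresolvable targets all count as unreachable.
def tcp_reachable?(host, port, seconds = 2)
  Socket.tcp(host, port, connect_timeout: seconds) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Hosts are assumptions; adjust them to where each service actually runs.
targets = {
  'ActiveMQ' => ['127.0.0.1', 61613],
  'MongoDB'  => ['127.0.0.1', 27017]
}

targets.each do |name, (host, port)|
  puts "#{name} #{host}:#{port} reachable: #{tcp_reachable?(host, port)}"
end
```

Note that connect_timeout only bounds the connection attempt. Name resolution can still block when a hostname is used and DNS is down, which is why the sketch uses IP addresses.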
Checked on puddle [2.0.4/2014-02-06.1].

oo-diagnostics hangs only when the activemq service is unreachable, e.g. DNS is down, activemq is down, or port 61613 is blocked.

Scenario one: Stop activemq or block port 61613. oo-diagnostics hangs.
1. Stop activemq, or block port 61613
   # /etc/init.d/activemq stop
   # iptables -A INPUT -p tcp --dport 61613 -j DROP
   # iptables -A OUTPUT -p tcp --dport 61613 -j DROP
2. Run oo-diagnostics
   # oo-diagnostics --abortok -v

Output:
INFO: loading list of installed packages
INFO: OpenShift broker installed.
INFO: OpenShift node installed.
INFO: Loading the broker rails environment.
INFO: running: prereq_dns_server_available
INFO: checking that the first server in /etc/resolv.conf responds
INFO: running: prereq_domain_resolves
INFO: checking that we can resolve our application domain
INFO: running: test_enterprise_rpms
INFO: Checking that all OpenShift RPMs are actually from OpenShift Enterprise
INFO: running: test_selinux_policy_rpm
INFO: rpm selinux-policy installed with at least version 3.7.19-195.el6_4.4
INFO: running: test_selinux_enabled
INFO: running: test_broker_cache_permissions
INFO: broker application cache permissions appear fine
INFO: running: test_node_profiles_districts_from_broker
INFO: checking node profiles via MCollective

Scenario two: Stop mongodb. oo-diagnostics runs normally.
1. Stop mongodb
   # /etc/init.d/mongod stop
2. Run oo-diagnostics
   # oo-diagnostics --abortok -v

Output:
INFO: loading list of installed packages
INFO: OpenShift broker installed.
INFO: OpenShift node installed.
INFO: Loading the broker rails environment.
MOPED: Retrying connection to primary for replica set <Moped::Cluster nodes=[<Moped::Node resolved_address="127.0.0.1:27017">]>
MOPED: Retrying connection to primary for replica set <Moped::Cluster nodes
<--snip-->
FAIL: rescue in load_broker_rails_env
      Broker application failed to load. This is often a gem dependency problem.
      Updating rubygem RPMs and restarting openshift-broker to regenerate
      the broker Gemfile.lock may fix the problem.
      The actual error encountered was:
      #<Moped::Errors::ConnectionFailure: Could not connect to a primary node for replica set <Moped::Cluster nodes=[<Moped::Node resolved_address="127.0.0.1:27017">]>>
      *** THIS PROBLEM NEEDS TO BE RESOLVED FOR THE BROKER TO WORK. DISABLING BROKER TESTS. ***
INFO: running: prereq_dns_server_available
INFO: checking that the first server in /etc/resolv.conf responds
INFO: running: prereq_domain_resolves
INFO: checking that we can resolve our application domain
<--snip-->

Scenario three: Stop the mcollective service. oo-diagnostics runs normally.
1. Stop mcollective
   # /etc/init.d/ruby193-mcollective stop
2. Run oo-diagnostics
   # oo-diagnostics --abortok -v

Output:
[root@broker ~]# oo-diagnostics -o -v
INFO: loading list of installed packages
INFO: OpenShift broker installed.
INFO: OpenShift node installed.
INFO: Loading the broker rails environment.
INFO: running: prereq_dns_server_available
INFO: checking that the first server in /etc/resolv.conf responds
INFO: running: prereq_domain_resolves
INFO: checking that we can resolve our application domain
INFO: running: test_enterprise_rpms
INFO: Checking that all OpenShift RPMs are actually from OpenShift Enterprise
INFO: running: test_selinux_policy_rpm
INFO: rpm selinux-policy installed with at least version 3.7.19-195.el6_4.4
INFO: running: test_selinux_enabled
INFO: running: test_broker_cache_permissions
INFO: broker application cache permissions appear fine
INFO: running: test_node_profiles_districts_from_broker
INFO: checking node profiles via MCollective
No request sent, we did not discover any nodes.
FAIL: test_node_profiles_districts_from_broker
      No node hosts found. Please install some,
      or ensure the existing ones respond to 'oo-mco ping'.
      OpenShift cannot host gears without at least one node host responding.
INFO: skipping test_node_profiles_districts_from_broker
INFO: running: test_broker_accept_scripts
INFO: running oo-accept-broker
INFO: oo-accept-broker ran without error:
<--snip-->
INFO: running: test_broker_passenger_ps
INFO: checking the broker application process tree
INFO: running: test_for_nonrpm_rubygems
INFO: checking for presence of gem-installed rubygems
INFO: looking in /opt/rh/ruby193/root/usr/local/share/gems/specifications/*.gemspec /opt/rh/ruby193/root/usr/share/gems/specifications/*.gemspec
INFO: running: test_for_multiple_gem_versions
INFO: checking for presence of gem-installed rubygems
INFO: running: test_node_httpd_error_log
INFO: running: test_node_containerization_plugin
INFO: running: test_node_mco_log
INFO: running: test_pam_openshift
INFO: running: test_services_enabled
INFO: checking that required services are running now
FAIL: test_services_enabled
      The following service(s) are not currently started:
        ruby193-mcollective
      These services are required for OpenShift functionality.
INFO: checking that required services are enabled at boot
INFO: running: test_missing_iptables_config
<--snip--> test_auth_conf_files
INFO: running: test_broker_certificate
WARN: test_broker_certificate
      Using a self-signed certificate for the broker
INFO: running: test_abrt_addon_python
INFO: running: test_node_frontend_clash
INFO: running: test_yum_configuration
4 WARNINGS
5 ERRORS
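When re-running these scenarios, a hang could be told apart from a normal run by putting the command under a deadline. This is a rough Ruby sketch under stated assumptions: the helper name run_with_deadline and the deadline values are hypothetical, and `sleep 5` stands in for a hung oo-diagnostics run so the example works on any host.

```ruby
require 'timeout'

# Hypothetical helper: runs a command and returns :finished if it exits
# within `seconds`, or :hung after terminating it.
def run_with_deadline(command, seconds)
  pid = Process.spawn(*command, out: File::NULL, err: File::NULL)
  Timeout.timeout(seconds) { Process.wait(pid) }
  :finished
rescue Timeout::Error
  Process.kill('TERM', pid) # stop the process that did not finish in time
  Process.wait(pid)         # reap it so no zombie is left behind
  :hung
end

# Stand-in for a hung run; on a broker host the command could instead be
# %w[oo-diagnostics --abortok -v] with a deadline of a few minutes.
puts run_with_deadline(%w[sleep 5], 0.5)
```

The helper discards the command's output, so it only answers whether the run completed. The FAIL and WARN details still have to come from a normal run.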
Consolidating all "hangs on activemq down" bugs. *** This bug has been marked as a duplicate of bug 1122872 ***