Bug 1821487
Summary: [RFE] Increase the time for dig check while computing score calculations.
Product: Red Hat Enterprise Virtualization Manager
Reporter: Siddhant Rao <sirao>
Component: ovirt-hosted-engine-ha
Assignee: Asaf Rachmani <arachman>
Status: CLOSED ERRATA
QA Contact: Nikolai Sednev <nsednev>
Severity: medium
Priority: low
Version: 4.3.8
CC: arachman, lsurette, mavital, michal.skrivanek, mtessun, rdlugyhe, sbonazzo
Target Milestone: ovirt-4.4.1
Keywords: FutureFeature, Triaged
Target Release: ---
Flags: lsvaty: testing_plan_complete-
Hardware: x86_64
OS: Linux
Fixed In Version: ovirt-hosted-engine-ha-2.4.3
Doc Type: Enhancement
Doc Text: Previously, network tests timed out after 2 seconds. The current release increases the timeout period from 2 seconds to 5 seconds. This reduces unnecessary timeouts when the network tests require more than 2 seconds to pass.
Story Points: ---
Last Closed: 2020-08-04 13:27:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: Integration
Cloudforms Team: ---
Description
Siddhant Rao
2020-04-06 23:00:59 UTC
AFAICT timeout parameter under network_test should do the job. It's a shared config for tcp, dns, and ping tests, but that should be all right... let's change the default to 5 in any case. Shouldn't hurt and it's going to be more resilient in general.

Please provide reproduction steps.

(In reply to Nikolai Sednev from comment #4)
> Please provide reproduction steps.

Steps to Reproduce:
1. Deploy hosted-engine
2. Add another host that can run the HE vm
3. Shutdown the network of the host
4. Check /var/log/ovirt-hosted-engine-ha/broker.log on the host

(In reply to Michal Skrivanek from comment #3)
> let's change the default to 5 in any case. shouldn't hurt and it's going to
> be more resilient in general

It depends. Slow DNS might also affect other areas (or are we relying on IP addresses only?). So increasing the timeout to 5 by default seems fine, but in case we rely on DNS (esp. for monitoring actions and such), we must ensure that the DNS is fast - needing 2+ seconds to answer a query is simply a very slow DNS. A typical answer arrives within milliseconds:

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-17.P2.el8_0.1 <<>> +tries=1 +time=1 some.server.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62989
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;some.server.net. IN A

;; ANSWER SECTION:
some.server.net. 3400 IN CNAME some-other.server.net.
some-other.server.net. 3400 IN A <ipv4>

;; Query time: 24 msec
;; SERVER: <ip>#53(<ip>)
;; WHEN: Thu May 28 14:38:04 CEST 2020
;; MSG SIZE rcvd: 82

So while I am fine with increasing the timeout (I would rather prefer that to be configurable - so we could even add a warning to that specific line that DNS must not be "too slow"). In case we do not use DNS at all in RHV for recurring tasks (which I doubt, as I expect that a DNS entry in RHV is updated in case DNS changes).
As such I see these kinds of extremely slow DNS queries as an issue for the overall stability - as it might affect several monitoring timeouts in a negative way.

(In reply to Martin Tessun from comment #6)
> So while I am fine with increasing the timeout (I would rather prefer that
> to be configurable - so we could even add a warning to that specific line
> that DNS must not be "too slow").

This can be tracked in a separate BZ.

> In case we do not use DNS at all in RHV for recurring tasks (which I doubt,
> as I expect that a DNS entry in RHV is updated in case DNS changes). As such
> I see these kinds of extremely slow DNS queries as an issue for the overall
> stability - as it might affect several monitoring timeouts in a negative way.

I'm not sure that the DNS used by the hosted engine test is also the DNS used to resolve local addresses. The connectivity check here was meant to check global connectivity, usually pointing to the gateway.

Works for me on these components:
Software Version:4.4.1.2-0.10.el8ev
rhvm-appliance-4.4-20200604.0.el8ev.x86_64
ovirt-hosted-engine-setup-2.4.4-1.el8ev.noarch
ovirt-hosted-engine-ha-2.4.3-1.el8ev.noarch
Linux 4.18.0-193.9.1.el8_2.x86_64 #1 SMP Sun Jun 14 15:03:05 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.2 (Ootpa)

less /usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/submonitors/network.py

self._tests = {
    'ping': self._ping,
    'dns': self._dns,
    'tcp': self._tcp,
    'none': self._none,
}
self._addr = options.get('addr')
self._timeout = str(options.get('timeout', 5))
self._total = options.get('count', 5)
self._delay = options.get('delay', 0.5)
self._network_test = options.get('network_test', 'ping')
if not self._network_test:
    self._network_test = 'ping'
if self._network_test not in self._tests:
    raise Exception(
        "{t}: invalid network test".format(
            t=self._network_test
        )
    )

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (RHV RHEL Host (ovirt-host) 4.4), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:3246
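
For readers who want to see how the new default behaves, the option handling from the network.py excerpt in this report can be sketched as standalone Python. This is only an illustration: the names VALID_TESTS, parse_network_options and build_dig_command are hypothetical, and the idea that the DNS test hands the shared timeout to dig as +time is an assumption based on the dig invocation quoted in the thread, not something confirmed from the package source.

```python
# Hypothetical helper names; only the option keys and their defaults are taken
# from the network.py excerpt quoted in this report.
VALID_TESTS = ('ping', 'dns', 'tcp', 'none')


def parse_network_options(options):
    """Apply the same defaults as the quoted excerpt to a plain dict."""
    # An empty or missing network_test falls back to 'ping', as in the excerpt.
    network_test = options.get('network_test', 'ping') or 'ping'
    if network_test not in VALID_TESTS:
        raise ValueError("{t}: invalid network test".format(t=network_test))
    return {
        'addr': options.get('addr'),
        # Shared by the tcp, dns and ping tests; the default was 2 before this fix.
        'timeout': str(options.get('timeout', 5)),
        'count': options.get('count', 5),
        'delay': options.get('delay', 0.5),
        'network_test': network_test,
    }


def build_dig_command(parsed, name):
    """ASSUMPTION: one dig attempt, bounded by the shared timeout in seconds."""
    return ['dig', '+tries=1', '+time=' + parsed['timeout'], name]


if __name__ == '__main__':
    parsed = parse_network_options({'network_test': 'dns'})
    print(' '.join(build_dig_command(parsed, 'some.server.net')))
```

With no explicit timeout the sketch builds a dig command bounded at 5 seconds, and passing 'timeout': 2 reproduces the old limit for comparison.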