Bug 1299232
| Summary: | Hosts are stuck in 'installing' | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Nelly Credi <ncredi> | ||||||
| Component: | Host-Deploy | Assignee: | Moti Asayag <masayag> | ||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Meni Yakove <myakove> | ||||||
| Severity: | urgent | Docs Contact: | |||||||
| Priority: | urgent | ||||||||
| Version: | 3.6.3 | CC: | alex.boyd, bugs, dgilbert, didi, khakimi, masayag, mlipchuk, ncredi, ngoldin, oourfali, pkliczew, pmatyas, sasundar, sbonazzo, ylavi | ||||||
| Target Milestone: | ovirt-3.6.3 | Keywords: | AutomationBlocker, Regression | ||||||
| Target Release: | 3.6.3 | Flags: | didi: needinfo-, rule-engine: ovirt-3.6.z+, rule-engine: blocker+, ylavi: planning_ack+, masayag: devel_ack+, rule-engine: testing_ack+ | ||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2016-02-18 11:14:15 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | Network | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
|
||||||||
|
Description
Nelly Credi
2016-01-17 16:04:08 UTC
Correction: in both cases a host reinstall was required after the engine restart.

Created attachment 1115753 [details]
engine logs
adding two more details:
the following exception appeared in host deploy logs:
2016-01-17 15:20:30 DEBUG otopi.plugins.otopi.packagers.dnfpackager dnfpackager._boot:178 Cannot initialize minidnf
Traceback (most recent call last):
File "/tmp/ovirt-qS7Cl5Alvp/otopi-plugins/otopi/packagers/dnfpackager.py", line 165, in _boot
constants.PackEnv.DNF_DISABLED_PLUGINS
File "/tmp/ovirt-qS7Cl5Alvp/otopi-plugins/otopi/packagers/dnfpackager.py", line 75, in _getMiniDNF
from otopi import minidnf
File "/tmp/ovirt-qS7Cl5Alvp/pythonlib/otopi/minidnf.py", line 31, in <module>
import dnf
ImportError: No module named dnf
as far as we can tell, first time this happened on GE was 12/01/2016:
https://rhev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/GE-builder/945/consoleFull
engine is on el 6.7, logs are attached above.
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Didi, can you take a look at this? dnf plugin is not supposed to be enabled on el6.

(In reply to Sandro Bonazzola from comment #5)
> Didi, can you take a look at this? dnf plugin is not supposed to be enabled
> on el6.

Maybe it's just a "try to import and use if there", but better to check.

Please provide the vdsm log.

(In reply to Sandro Bonazzola from comment #6)
> (In reply to Sandro Bonazzola from comment #5)
> > Didi, can you take a look at this? dnf plugin is not supposed to be enabled
> > on el6.
>
> maybe it's just a "try to import and use if there" but better to check

It is, and it can be ignored. If you only have dnf and not yum, you'll see the same error about yum. If both are missing, you'll later see the otopi packager fail. See e.g. bug 1297835.

(In reply to Nadav Goldin from comment #3)
> adding two more details:
>
> the following exception appeared in host deploy logs:
> 2016-01-17 15:20:30 DEBUG otopi.plugins.otopi.packagers.dnfpackager
> dnfpackager._boot:178 Cannot initialize minidnf
> Traceback (most recent call last):
> File "/tmp/ovirt-qS7Cl5Alvp/otopi-plugins/otopi/packagers/dnfpackager.py",
> line 165, in _boot
> constants.PackEnv.DNF_DISABLED_PLUGINS
> File "/tmp/ovirt-qS7Cl5Alvp/otopi-plugins/otopi/packagers/dnfpackager.py",
> line 75, in _getMiniDNF
> from otopi import minidnf
> File "/tmp/ovirt-qS7Cl5Alvp/pythonlib/otopi/minidnf.py", line 31, in
> <module>
> import dnf
> ImportError: No module named dnf

This isn't the cause of any failure. If 'dnf' is available on the server, it will be used instead of 'yum' for managing packages; otherwise 'yum' will be used.

> as far as we can tell, first time this happened on GE was 12/01/2016:
> https://rhev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/GE-builder/945/consoleFull
>
> engine is on el 6.7, logs are attached above.

The logs contain a StackOverflowError:

Caused by: java.lang.StackOverflowError
at java.lang.Throwable.toString(Throwable.java:480) [rt.jar:1.7.0_91]
at java.lang.String.valueOf(String.java:2849) [rt.jar:1.7.0_91]
at java.lang.StringBuilder.append(StringBuilder.java:128) [rt.jar:1.7.0_91]
at org.jboss.logmanager.formatters.Formatters$14.renderCause(Formatters.java:823) [jboss-logmanager.jar:1.5.4.Final-redhat-1]
at org.jboss.logmanager.formatters.Formatters$14.renderCause(Formatters.java:841) [jboss-logmanager.jar:1.5.4.Final-redhat-1]
at org.jboss.logmanager.formatters.Formatters$14.renderCause(Formatters.java:841) [jboss-logmanager.jar:1.5.4.Final-redhat-1]

The root cause of the stack overflow, which leaves the engine stuck, is not clear. Can we get the vdsm.log of the installed hosts for further debugging?

I also had a host stuck in installing. The error I noticed in the install log was:

'libvirt: Network Filter Driver error : Network filter not found: no nwfilter with matching name 'vdsm-no-mac-spoofing''

This appeared on a failed attempt to add/install a host to my rhev-m, but the bigger problem was that the host stayed stuck in installing until I rebooted the rhev-m box.

(In reply to Dr. David Alan Gilbert from comment #10)
> I also had a host stuck in installing; the error I noticed in the install
> log was:
>
> 'libvirt: Network Filter Driver error : Network filter not found: no
> nwfilter with matching name 'vdsm-no-mac-spoofing'' on a failed adding a
> host/installing a host to my rhev-m
>
> but the bigger problem was that it stuck in installing until I rebooted the
> rhev-m box.

Could you attach the engine.log and server.log from the rhevm server?

Created attachment 1116065 [details]
vdsm log
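
For readers unfamiliar with the "try to import and use if there" behaviour discussed above, here is a minimal sketch of that pattern. It is not otopi's actual code: `pick_packager` and its `candidates` parameter are hypothetical names used only for illustration. It shows why an `ImportError` for `dnf` on an el6 host is expected and harmless as long as another packager can be imported.

```python
import importlib


def pick_packager(candidates=("dnf", "yum")):
    """Return the name of the first importable packager module, or None.

    Hypothetical helper (not part of otopi): each candidate is tried in
    order, and a failed import is treated as "not available on this host".
    """
    for name in candidates:
        try:
            importlib.import_module(name)
        except ImportError:
            # Expected on hosts that lack this packager (e.g. no dnf on el6).
            # A real tool would log this at DEBUG level and keep going.
            continue
        return name
    # Only this case is a real problem: no packager could be imported.
    return None
```

Calling `pick_packager()` on a host with only yum installed would skip `dnf` and return `"yum"`; a `None` result corresponds to the case mentioned above where both are missing and the packager fails later.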
The exact reproducer for this bug is to halt vdsm after host-deploy has finished. This produces an exception on the engine side that isn't handled, which causes the host installation action to fail with the described result.

*** Bug 1299961 has been marked as a duplicate of this bug. ***

It didn't make it in time for 3.6.2, so pushing to 3.6.3.

*** Bug 1301377 has been marked as a duplicate of this bug. ***

Moti - when will this be on MODIFIED?

(In reply to Oved Ourfali from comment #17)
> Moti - when will this be on MODIFIED?

It was merged to 3.6 today, so it can be moved to MODIFIED.

I'm seeing very similar issues with a fresh install on RHEL 7 (hosted engine on RHEL 7 too). New hosts get stuck at installing, and a restart of the hosted engine fixes it. I'm also unable to add a second host to the hosted engine because the installer gets stuck, although hosted-engine status shows the host in maintenance?
--== Host 2 status ==--
Status up-to-date : True
Hostname : omxakt01.oam.eeint.co.uk
Host ID : 2
Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score : 0
stopped : False
Local maintenance : True
crc32 : d476ad4a
Host timestamp : 40127

We see the same issue with oVirt 4.0 on RHEL 7.2: the first host is stuck in the installing state, but the other 2 hosts installed as expected and reached the up state. After a restart of ovirt-engine the host became non-operational and we could remove it.

NOTES:
- The first host, which was stuck in the installing state, also lost its IP a day later.
- After a network restart we got the host IP back.
- The hosts and the engine are RHEL 7.2.
|
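
The reproducer given earlier in this report (halting vdsm once host-deploy has finished, which triggers an unhandled exception on the engine side) can be illustrated with a small sketch. This is not ovirt-engine code and it does not describe the actual fix, which this report does not detail. All names here (`Host`, `HostStatus`, `install_unguarded`, `install_guarded`) are hypothetical; the point is only to show how an unhandled exception after the status is set can leave a host in 'installing'.

```python
from enum import Enum


class HostStatus(Enum):
    # Hypothetical status values, for illustration only.
    MAINTENANCE = "maintenance"
    INSTALLING = "installing"
    INSTALL_FAILED = "install_failed"
    UP = "up"


class Host:
    def __init__(self, name):
        self.name = name
        self.status = HostStatus.MAINTENANCE


def install_unguarded(host, deploy, post_deploy_check):
    """If post_deploy_check raises, the status is never updated again."""
    host.status = HostStatus.INSTALLING
    deploy(host)
    post_deploy_check(host)  # e.g. fails because vdsm was halted
    host.status = HostStatus.UP


def install_guarded(host, deploy, post_deploy_check):
    """Same flow, but any failure moves the host out of INSTALLING."""
    host.status = HostStatus.INSTALLING
    try:
        deploy(host)
        post_deploy_check(host)
    except Exception:
        host.status = HostStatus.INSTALL_FAILED
        raise
    host.status = HostStatus.UP
```

With a `post_deploy_check` that raises, `install_unguarded` leaves the host in `INSTALLING` until something else resets it, which matches the symptom reported here, while `install_guarded` records the failure before re-raising.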