Bug 1412906

Summary: ovirt-engine can't install legacy RHV-H in 3.6 Compatibility Mode
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: ovirt-host-deploy
Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED ERRATA
QA Contact: Pavol Brilla <pbrilla>
Severity: low
Priority: high
Docs Contact:
Version: 4.0.6
CC: bazulay, dougsland, gklein, gveitmic, lsurette, mgoldboi, obockows, pbrilla, rbalakri, Rhev-m-bugs, srevivo, trichard, ykaul, ylavi
Target Milestone: ovirt-4.1.1
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, Red Hat Virtualization Manager could not install legacy RHEV-H when in 3.6 Compatibility Mode. This is now fixed.
Story Points: ---
Clone Of:
Clones: 1420865 (view as bug list)
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1420865, 1425756

Attachments:
engine install log (flags: none)
host deploy logs for rhev-h (flags: none)

Description Germano Veit Michel 2017-01-13 05:42:45 UTC
Description of problem:

Installation of legacy RHV-H in a DC with 3.6 Compatibility Mode fails.

2017-01-13 15:24:43,310 ERROR [org.ovirt.engine.core.uutils.ssh.SSHDialog] (org.ovirt.thread.pool-6-thread-23) [51a24a3b] SSH error running command root.24.212:'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x &&  "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True': Command returned failure code 2 during SSH session 'root.24.212'


2017-01-13 15:24:43,891 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [51a24a3b] Correlation ID: 51a24a3b, Call Stack: null, Custom Event ID: -1, Message: Failed to install Host Host-2. Internal error: [Errno 30] Read-only file system: 'os.devnull'.

Version-Release number of selected component (if applicable):
ovirt-engine-4.0.6.3-0.1.el7ev.noarch
rhev-hypervisor7-7.2-20160920.1

How reproducible:
100%

Steps to Reproduce:
1. Try to add a new host to a DC with 3.6 compatibility level.

Actual results:
Fails

Expected results:
Succeeds

Comment 3 Sandro Bonazzola 2017-01-17 09:25:08 UTC
Can you please provide host-deploy logs or maybe a full sos report?
Just to understand where this is failing.

Comment 4 Germano Veit Michel 2017-01-18 05:11:25 UTC
(In reply to Sandro Bonazzola from comment #3)
> Can you please provide host-deploy logs or maybe a full sos report?
> Just to understand where this is failing.

Isn't the host just failing to run those commands in comment #0?

I will reproduce it again and attach the logs shortly.

Comment 5 Germano Veit Michel 2017-01-18 06:47:04 UTC
Created attachment 1242036 [details]
engine install log

There are not really many logs as the failure is very early.

It's failing to run this on the host, so the deploy does not even start generating logs.

umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x &&  "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True

with:

Read-only file system: 'os.devnull'.

https://docs.python.org/3/library/os.html#os.devnull

So is it complaining that /dev/null is read-only? Doesn't make much sense to me.
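For reference, os.devnull is just the constant string '/dev/null' on POSIX. Notably, the engine event in comment #0 quotes the failing path as the literal string 'os.devnull' rather than '/dev/null', so one plausible reading is that something tried to create a file literally named "os.devnull" on a read-only mount (the legacy RHEV-H root filesystem is read-only). A minimal Python sketch, with hypothetical paths, showing how errno 30 (EROFS) produces exactly this kind of message:

import errno
import os

# os.devnull is just a constant string on every platform:
print(os.devnull)          # '/dev/null' on Linux

# Hypothetical repro: any write on a filesystem mounted read-only
# fails with errno 30 (EROFS), and Python quotes the offending path
# in the message, e.g.:
#   OSError: [Errno 30] Read-only file system: '/mnt/ro/os.devnull'
try:
    open("/mnt/ro/os.devnull", "w")
except (IOError, OSError) as e:    # IOError kept for Python 2 hosts
    print("errno=%s (EROFS=%s): %s" % (e.errno, errno.EROFS, e))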

Comment 6 Sandro Bonazzola 2017-02-02 10:41:34 UTC
Simone, can you please check if this is related to bug #1414265?

Comment 7 Simone Tiraboschi 2017-02-02 16:24:45 UTC
I tried adding a host deployed with rhev-hypervisor7-7.2-20160920.1 to an engine 4.1 on a cluster with compatibility level 3.6, and it failed due to:

2017-02-02 16:07:22 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-71Zg8vz4ZC/pythonlib/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/tmp/ovirt-71Zg8vz4ZC/otopi-plugins/otopi/core/transaction.py", line 93, in _main_end
    self._mainTransaction.commit()
  File "/tmp/ovirt-71Zg8vz4ZC/pythonlib/otopi/transaction.py", line 148, in commit
    element.commit()
  File "/tmp/ovirt-71Zg8vz4ZC/otopi-plugins/ovirt-host-deploy/kernel/kernel.py", line 80, in commit
    self._parent._build_grubby_command(),
  File "/tmp/ovirt-71Zg8vz4ZC/pythonlib/otopi/plugin.py", line 931, in execute
    command=args[0],
RuntimeError: Command '/sbin/grubby' failed to execute
2017-02-02 16:07:22 ERROR otopi.context context._executeMethod:151 Failed to execute stage 'Transaction commit': Command '/sbin/grubby' failed to execute

Comment 8 Simone Tiraboschi 2017-02-02 16:48:08 UTC
Created attachment 1247156 [details]
host deploy logs for rhev-h

Comment 9 Simone Tiraboschi 2017-02-02 16:54:44 UTC
On my RHEV-H host it fails with:
[root@c72he20170202h4 admin]# /sbin/grubby --update-kernel DEFAULT
error opening /boot/grub/grub.cfg for read: No such file or directory
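So grubby has no bootloader configuration to edit on this image. As a minimal sketch of the fix direction (function name and the grub2 path are hypothetical, this is not the actual ovirt-host-deploy patch): probe for a configuration grubby can parse before invoking it, and skip the kernel-argument step on images that ship none instead of failing the whole deploy:

import os
import subprocess

# Paths grubby might edit; /boot/grub/grub.cfg is taken from the error
# above, /boot/grub2/grub.cfg is an assumed alternative.
GRUB_CONFIGS = ("/boot/grub/grub.cfg", "/boot/grub2/grub.cfg")

def update_default_kernel(extra_args=()):
    if not any(os.path.isfile(p) for p in GRUB_CONFIGS):
        return False  # nothing for grubby to edit; skip quietly
    subprocess.check_call(
        ["/sbin/grubby", "--update-kernel", "DEFAULT"] + list(extra_args)
    )
    return True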

Comment 10 Simone Tiraboschi 2017-02-02 17:45:21 UTC
Once we fix that, the engine is able to add the host, but it fails configuring the management bridge.

The issue is:
2017-02-02 18:43:30,966+01 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler3) [33c9782b] HostName = c72he20170202h4.localdomain
2017-02-02 18:43:30,966+01 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler3) [33c9782b] Failed in 'GetCapabilitiesVDS' method, for vds: 'c72he20170202h4.localdomain'; host: 'c72he20170202h4.localdomain': Required SwitchType is not reported.
2017-02-02 18:43:30,966+01 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler3) [33c9782b] Command 'GetCapabilitiesVDSCommand(HostName = c72he20170202h4.localdomain, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='b5ce63a8-9386-4a2f-8cef-53f41f1bcaa9', vds='Host[c72he20170202h4.localdomain,b5ce63a8-9386-4a2f-8cef-53f41f1bcaa9]'})' execution failed: Required SwitchType is not reported.
2017-02-02 18:43:30,966+01 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler3) [33c9782b] Failure to refresh host 'c72he20170202h4.localdomain' runtime info: Required SwitchType is not reported.
2017-02-02 18:43:30,971+01 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler3) [33c9782b] Failed to refresh VDS , vds = 'c72he20170202h4.localdomain' : 'b5ce63a8-9386-4a2f-8cef-53f41f1bcaa9', error = 'Required SwitchType is not reported.', continuing.
2017-02-02 18:43:30,971+01 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler3) [33c9782b] Exception: java.lang.IllegalStateException: Required SwitchType is not reported.
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder.getSwitchType(VdsBrokerObjectsBuilder.java:2143) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder.addHostNetworksAndUpdateInterfaces(VdsBrokerObjectsBuilder.java:1744) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder.updateNetworkData(VdsBrokerObjectsBuilder.java:1697) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder.updateVDSDynamicData(VdsBrokerObjectsBuilder.java:853) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand.executeVdsBrokerCommand(GetCapabilitiesVDSCommand.java:17) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:111) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:73) [vdsbroker.jar:]
        at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.DefaultVdsCommandExecutor.execute(DefaultVdsCommandExecutor.java:14) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:407) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:674) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.beforeFirstRefreshTreatment(HostMonitoring.java:628) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:129) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.refresh(HostMonitoring.java:85) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:274) [vdsbroker.jar:]
        at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown Source) [:1.8.0_121]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_121]
        at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_121]
        at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_121]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_121]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_121]
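The engine here requires each network in the getVdsCapabilities reply to carry a switch type, which the 3.6-era vdsm (4.17.x) on this image does not report. A hedged Python sketch of the mismatch, with hypothetical capability fragments (the real check lives in VdsBrokerObjectsBuilder.getSwitchType, in Java):

# Illustrative capabilities fragments, not real vdsm output.
old_vdsm_network = {"bridged": True, "addr": "10.0.0.5"}  # 3.6-era vdsm
new_vdsm_network = dict(old_vdsm_network, switchType="legacy")

def get_switch_type(net):
    # Paraphrase of the engine-side check that throws above: a reply
    # without the key is treated as fatal rather than defaulted.
    if "switchType" not in net:
        raise RuntimeError("Required SwitchType is not reported.")
    return net["switchType"]

print(get_switch_type(new_vdsm_network))  # 'legacy'
# get_switch_type(old_vdsm_network) raises, matching the log above.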

Comment 11 Sandro Bonazzola 2017-02-03 06:34:39 UTC
Moving to ovirt-host-deploy since the issue is caused by a call to grubby.
Dropping the dependency on bug #1414265 since it is not related to this issue.

Comment 12 Simone Tiraboschi 2017-02-03 10:22:44 UTC
rhev-hypervisor7-7.3-20170118.0.iso comes with vdsm 4.17.37-1.el7ev and, with patch https://gerrit.ovirt.org/#/c/71608/ , the host goes up.

Comment 14 Sandro Bonazzola 2017-02-14 13:00:23 UTC
*** Bug 1384060 has been marked as a duplicate of this bug. ***

Comment 16 Pavol Brilla 2017-04-11 10:20:54 UTC
In case of 2 clusters:
Default = 4.1 compatibility
old = 3.6 compatibility

Adding RHEVH as a classic host: changing the cluster to "old" is no problem, host is UP.

Registration of the node through the TUI of RHEVH:
Node is registered on the engine.
Approve: not possible to change the cluster to "old".

Error:
FQDN_of_host:
Cannot edit Host. Host parameters cannot be modified while Host is operational.
Please switch Host to Maintenance mode first.

Workaround:
Approve -> host will FAIL to be added to the 4.1 cluster -> Maintenance -> Edit host (put correct 3.6 cluster "old") -> Activate -> Host is UP
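For reference, the same workaround can be scripted; a hedged sketch using the Python SDK (ovirtsdk4), where the engine URL, credentials, and host/cluster names are hypothetical, and assuming the host's cluster can be changed via a host update while in Maintenance:

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Hypothetical connection details.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,
)
hosts_service = connection.system_service().hosts_service()
host = hosts_service.list(search='name=FQDN_of_host')[0]
host_service = hosts_service.host_service(host.id)

host_service.deactivate()  # switch to Maintenance first, per the error above
host_service.update(types.Host(cluster=types.Cluster(name='old')))  # 3.6 cluster
host_service.activate()
connection.close()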

Comment 17 Yaniv Lavi 2017-04-12 11:18:08 UTC
(In reply to Pavol Brilla from comment #16)
> In case of 2 clusters:
> Default = 4.1 compatibility
> old = 3.6 compatibility
>
> Registration of the node through the TUI of RHEVH:
> Node is registered on the engine.
> Approve: not possible to change the cluster to "old".
>
> Error:
> FQDN_of_host:
> Cannot edit Host. Host parameters cannot be modified while Host is
> operational.
> Please switch Host to Maintenance mode first.
>
> Workaround:
> Approve -> host will FAIL to be added to the 4.1 cluster -> Maintenance ->
> Edit host (put correct 3.6 cluster "old") -> Activate -> Host is UP

Please open a new bug for this flow. Moving back to ON_QA to verify the flow described in this bug.