Created attachment 771172 [details] Example engine.log Description of problem: RHEL host installation appears to end in failure before the host reboots successfully due to SSHDialog throwing an IOException when it sees anything on stderr. In my case tar throws warning messages about timestamps into stderr causing the apparent failure : engine.log 2013-07-09 10:41:08,734 ERROR [org.ovirt.engine.core.utils.ssh.SSHDialog] (pool-3-thread-34) SSH stderr during command root.fab.redhat.com:'umask 0077; MYTMP="$(mktemp -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; rm -fr "${MYTMP}" && mkdir "${MYTMP}" && tar -C "${MYTMP}" -x && "${MYTMP}"/setup DIALOG/dialect=str:machine DIALOG/customization=bool:True': stderr: tar: ./pythonlib/otopi/base.pyo: time stamp 2013-05-05 06:48:37 is 3082194.915133752 s in the future tar: ./pythonlib/otopi/common.py: time stamp 2013-05-05 06:48:34 is 3082191.914750444 s in the future tar: ./pythonlib/otopi/transaction.py: time stamp 2013-05-05 06:48:34 is 3082191.914506707 s in the future tar: ./pythonlib/otopi/services.pyc: time stamp 2013-05-05 06:48:37 is 3082194.914384842 s in the future tar: ./pythonlib/otopi/context.pyo: time stamp 2013-05-05 06:48:37 is 3082194.914283362 s in the future tar: ./pythonlib/otopi/command.pyo: time stamp 2013-05-05 06:48:37 is 3082194.914190198 s in the future tar: ./pythonlib/otopi/services.pyo: time stamp 2013-05-05 06:48:37 is 3082194.914108643 s in the future tar: ./pythonlib/otopi/base.py: time stamp 2013-05-05 06:48:34 is 3082191.914025291 s in the future tar: ./pythonlib/otopi/dialog.pyo: time stamp 2013-05-05 06:48:37 is 3082194.913922812 s in the future tar: ./pythonlib/otopi/__main__.py: time stamp 2013-05-05 06:48:34 is 3082191.9138297 s in the future 2013-07-09 10:41:08,734 ERROR [org.ovirt.engine.core.utils.ssh.SSHDialog] (pool-3-thread-34) SSH error running command root.fab.redhat.com:'umask 0077; MYTMP="$(mktemp -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; rm -fr "${MYTMP}" && mkdir "${MYTMP}" && tar -C "${MYTMP}" -x && "${MYTMP}"/setup DIALOG/dialect=str:machine DIALOG/customization=bool:True': java.io.IOException: Error messages during execution at org.ovirt.engine.core.utils.ssh.SSHDialog.executeCommand(SSHDialog.java:315) [engine-utils.jar:] at org.ovirt.engine.core.bll.VdsDeploy.execute(VdsDeploy.java:916) [engine-bll.jar:] at org.ovirt.engine.core.bll.InstallVdsCommand.executeCommand(InstallVdsCommand.java:169) [engine-bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1042) [engine-bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1127) [engine-bll.jar:] at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1513) [engine-bll.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:166) [engine-utils.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:108) [engine-utils.jar:] at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1145) [engine-bll.jar:] at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:315) [engine-bll.jar:] at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:361) [engine-bll.jar:] at org.ovirt.engine.core.bll.Backend.runInternalAction(Backend.java:566) [engine-bll.jar:] ovirt-engine/backend/manager/modules/utils/src/main/java/org/ovirt/engine/core/utils/ssh/SSHDialog.java 307 try { 308 _client.executeCommand( 309 command, 310 new SequenceInputStream(Collections.enumeration(stdinList)), 311 poutStdout, 312 stderr 313 ); 314 if (stderr.size() > 0) { 315 throw new IOException("Error messages during execution"); 316 } 317 } 318 finally { 319 if (stderr.size() > 0) { 320 log.error( 321 String.format( 322 "SSH stderr during command %1$s:'%2$s': stderr: %3$s", 323 _client.getDisplayHost(), 324 command, 325 new String(stderr.toByteArray(), "UTF-8") 326 ) 327 ); 328 } 329 } Version-Release number of selected component (if applicable): rhevm-backend-3.2.0-11.30.el6ev.noarch ovirt-host-deploy-1.0.0-2.el6ev.noarch How reproducible: Always. Steps to Reproduce: - Ensure stderr is used by one of more commands called by SSH. Actual results: IOException thrown when stderr is seen. Expected results: IOException is not thrown and stderr is logged in engine.log Additional info: http://en.wikipedia.org/wiki/Standard_streams#Standard_error_.28stderr.29 "Standard error is another output stream typically used by programs to output error messages or diagnostics."
It is by design. stderr should be empty in any of the commands we use via the dialog. You can argue that we need to ignore tar errors, but I am unsure this is right.
Found that gnu tar can be instructed to suppress these timestamp warnings. There can be easy workaround in your environment by inserting the following into vdc_options at BootstrapCommand. Value --- umask 0077; MYTMP="$(mktemp -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; rm -fr "${MYTMP}" && mkdir "${MYTMP}" && tar --warning=no-timestamp -C "${MYTMP}" -x && "${MYTMP}"/setup DIALOG/dialect=str:machine DIALOG/customization=bool:True --- But please remember to remove this when upgrading product. Or just for now have clocks omre or less synced. Thanks!
(In reply to Alon Bar-Lev from comment #1) > It is by design. > > stderr should be empty in any of the commands we use via the dialog. That might be fine for now but if the dialog is used elsewhere throwing an exception based on stderr output like this might be an issue. Why can't we check the return code? SSH should pass this back from the host to the client after all. > You can argue that we need to ignore tar errors, but I am unsure this is > right. Well these are warnings but I take your point. We should still log these somewhere. (In reply to Alon Bar-Lev from comment #2) > Found that gnu tar can be instructed to suppress these timestamp warnings. > > There can be easy workaround in your environment by inserting the following > into vdc_options at BootstrapCommand. > > Value > --- > umask 0077; MYTMP="$(mktemp -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX > \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; rm > -fr "${MYTMP}" && mkdir "${MYTMP}" && tar --warning=no-timestamp -C > "${MYTMP}" -x && "${MYTMP}"/setup DIALOG/dialect=str:machine > DIALOG/customization=bool:True > --- > > But please remember to remove this when upgrading product. > > Or just for now have clocks omre or less synced. > > Thanks! Yup I'll document this workaround for now. Lee
(In reply to Lee Yarwood from comment #3) > (In reply to Alon Bar-Lev from comment #1) > > It is by design. > > > > stderr should be empty in any of the commands we use via the dialog. > > That might be fine for now but if the dialog is used elsewhere throwing an > exception based on stderr output like this might be an issue. > > Why can't we check the return code? SSH should pass this back from the host > to the client after all. Because the complex command sequence may result in errors that we do not expect and not get these via the return code. So I prefer to be safe and make sure stderr does not tell us that there was unexpected error.