+++ This bug was initially created as a clone of Bug #1119225 +++

Description of problem:

The "Execution of setup failed" message is misleading if only the NFS restart fails. Setup has in fact completed; only NFS hit an issue. The message should state this more clearly: setup is done, but there were some issues. I have no idea what happened; it was a clean install. In any case, this BZ is about the failure message produced by engine-setup.

~~~
[ INFO  ] Restarting nfs services
[ ERROR ] Failed to execute stage 'Closing up': Command '/sbin/service' failed to execute
[ INFO  ] Stage: Clean up
          Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-setup-20140714115528-ytp8fx.log
[ INFO  ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20140714115734-setup.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Execution of setup failed

--

2014-07-14 11:57:34 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:866 execute-output: ('/sbin/initctl', 'status', 'nfs') stderr:
initctl: Unknown job: nfs
2014-07-14 11:57:34 DEBUG otopi.plugins.otopi.services.rhel plugin.executeRaw:785 execute: ('/sbin/service', 'nfs', 'start'), executable='None', cwd='None', env=None
2014-07-14 11:57:34 DEBUG otopi.plugins.otopi.services.rhel plugin.executeRaw:803 execute-result: ('/sbin/service', 'nfs', 'start'), rc=1
2014-07-14 11:57:34 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:861 execute-output: ('/sbin/service', 'nfs', 'start') stdout:
Starting NFS services:                                     [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]
Starting NFS daemon:                                       [FAILED]
2014-07-14 11:57:34 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:866 execute-output: ('/sbin/service', 'nfs', 'start') stderr:
2014-07-14 11:57:34 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/otopi/context.py", line 142, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/system/nfs.py", line 276, in _closeup
    state=state,
  File "/usr/share/otopi/plugins/otopi/services/rhel.py", line 188, in state
    'start' if state else 'stop'
  File "/usr/share/otopi/plugins/otopi/services/rhel.py", line 96, in _executeServiceCommand
    raiseOnError=raiseOnError
  File "/usr/lib/python2.6/site-packages/otopi/plugin.py", line 871, in execute
    command=args[0],
RuntimeError: Command '/sbin/service' failed to execute
2014-07-14 11:57:34 ERROR otopi.context context._executeMethod:161 Failed to execute stage 'Closing up': Command '/sbin/service' failed to execute

--

[root@ovirt ~]# service nfs status
rpc.svcgssd is stopped
rpc.mountd (pid 32015) is running...
nfsd dead but subsys locked
rpc.rquotad (pid 32011) is running...

--

[root@ovirt ~]# service nfs restart
Shutting down NFS daemon:                                  [FAILED]
Shutting down NFS mountd:                                  [  OK  ]
Shutting down NFS quotas:                                  [  OK  ]
Shutting down NFS services:                                [  OK  ]
Starting NFS services:                                     [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting RPC idmapd:                                       [  OK  ]

--

Jul 14 11:57:32 localhost kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Jul 14 11:57:34 localhost kernel: RPC: Registered named UNIX socket transport module.
Jul 14 11:57:34 localhost kernel: RPC: Registered udp transport module.
Jul 14 11:57:34 localhost kernel: RPC: Registered tcp transport module.
Jul 14 11:57:34 localhost kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
Jul 14 11:57:34 localhost kernel: Installing knfsd (copyright (C) 1996 okir.de).
Jul 14 11:57:34 localhost rpc.mountd[32015]: Version 1.2.3 starting
Jul 14 11:57:34 localhost kernel: lockd_up: makesock failed, error=-98
Jul 14 11:57:34 localhost kernel: nfsd: last server has exited, flushing export cache
Jul 14 11:57:34 localhost rpc.nfsd[32019]: error starting threads: errno 98 (Address already in use)
Jul 14 12:01:49 localhost ntpd[5672]: 0.0.0.0 0613 03 spike_detect +44.616753 s
Jul 14 12:04:54 localhost rpc.mountd[32015]: Caught signal 15, un-registering and exiting.
Jul 14 12:04:55 localhost rpc.mountd[32425]: Version 1.2.3 starting
Jul 14 12:04:55 localhost kernel: NFSD: Using /var/lib/nfs/v4recovery
~~~

Version-Release number of selected component (if applicable):
ovirt-engine-setup-3.5.0-0.0.master.20140629172257.git0b16ed7.el6.noarch

How reproducible:
Unknown; intermittent.

Steps to Reproduce:
1. No idea; it just happened during a clean install.
2.
3.

Actual results:
An NFS issue produced a message that could make an admin think the whole setup failed.

Expected results:
A message stating that setup completed but something went wrong with NFS.

Additional info:
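For triage it helps to pull the actual failing command out of the otopi debug log rather than relying on the generic "Execution of setup failed" line. This is a minimal standalone sketch, not part of engine-setup; the regex is an assumption based only on the `execute-result:` lines quoted above:

```python
import re

# Matches otopi executeRaw result lines such as:
#   ... plugin.executeRaw:803 execute-result: ('/sbin/service', 'nfs', 'start'), rc=1
RESULT_RE = re.compile(r"execute-result: \((?P<cmd>[^)]*)\), rc=(?P<rc>\d+)")

def failed_commands(log_lines):
    """Return (command, rc) tuples for every executed command with rc != 0."""
    failures = []
    for line in log_lines:
        m = RESULT_RE.search(line)
        if m and int(m.group("rc")) != 0:
            failures.append((m.group("cmd"), int(m.group("rc"))))
    return failures

# Example using a line copied from the log above:
sample = [
    "2014-07-14 11:57:34 DEBUG otopi.plugins.otopi.services.rhel "
    "plugin.executeRaw:803 execute-result: "
    "('/sbin/service', 'nfs', 'start'), rc=1",
]
print(failed_commands(sample))  # -> [("'/sbin/service', 'nfs', 'start'", 1)]
```

Running this over the full setup log would point straight at `service nfs start` as the only failure, which is the distinction the error message itself fails to make.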
This is an intermittent issue we hit when deploying RHEV. We do not recover from this failure, so when it occurs the RHCI deployment is left in a failed state. The progress bars do not reflect the failure; they remain 'stuck' indefinitely. In Step 5B "Installation Progress":
- The RHEV progress bar will remain stuck at 95.5%.
- The backend task will keep looping, checking the RHEV data center for it to come up.
If this error is seen, a user may recover from the problem as follows:
1) Determine the IP address of the RHEV engine.
2) SSH as root to the RHEV engine, using the password entered in the deployment.
3) Run "engine-setup" manually.
4) Wait for the next puppet run to be invoked on the RHEV engine, which will complete the configuration of the data center.
5) The deployment will pick back up and continue executing.
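The manual part of the steps above (determine the engine address, SSH in, re-run engine-setup) can be sketched as a small helper. This is a hypothetical convenience script, not shipped with RHCI; `ENGINE_IP` is a placeholder you must replace with the address found in step 1:

```shell
#!/bin/sh
# Hypothetical recovery helper for the stuck-at-95.5% state.
# ENGINE_IP is an assumption -- substitute your engine's address.
ENGINE_IP="${ENGINE_IP:-192.0.2.10}"

recovery_cmd() {
    # Steps 2-3: SSH to the engine as root and re-run engine-setup.
    # You will be prompted for the password entered in the deployment.
    echo "ssh root@${1} engine-setup"
}

# Print the command for review instead of executing it blindly.
recovery_cmd "$ENGINE_IP"
# After engine-setup completes, wait for the next puppet run on the
# engine (step 4); the deployment then resumes on its own (step 5).
```

The script only prints the command so an operator can confirm the target host before connecting; nothing here is automated in the product itself.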
Assigning to Julie for review. Julie - looks like what we need for this is a short troubleshooting section that outlines the above issue and how to resolve it using the procedure in comment #2.
Verified.
Documentation now available at: https://access.redhat.com/documentation/en/quickstart-cloud-installer/version-1.0/quickstart-cloud-installer-guide/