Bug 1057119
| Field | Value |
|---|---|
| Summary | [vdsm] cannot add host into setup because vdsmd does not run correctly |
| Product | [Retired] oVirt |
| Component | vdsm |
| Status | CLOSED DUPLICATE |
| Severity | urgent |
| Priority | unspecified |
| Version | unspecified |
| Reporter | Jiri Belka <jbelka> |
| Assignee | Dan Kenigsberg <danken> |
| QA Contact | Pavel Stehlik <pstehlik> |
| CC | abaron, acathrow, bazulay, gklein, iheim, jbelka, jonas, mgoldboi, sbonazzo, yeylon |
| Target Milestone | --- |
| Target Release | 3.4.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | infra |
| Doc Type | Bug Fix |
| Story Points | --- |
| Type | Bug |
| Regression | --- |
| oVirt Team | --- |
| Last Closed | 2014-01-27 09:19:49 UTC |
| Attachments | 854381 (/var/log files), 854635 (message log), 855965 (supervdsm.log), 855966 (upgrade.log) |
# cat /proc/cmdline
root=live:LABEL=Root ro rootfstype=auto rootflags=ro RD_NO_LVM crashkernel=128M elevator=deadline max_loop=256 rd_NO_LUKS rd_NO_MD rd_NO_DM edd=off enforcing=0

This looks like a duplicate of bug 1055153: vdsm fails to start because it cannot write to its log, which is owned by root.

-rw-r--r--. 1 root root 0 2014-01-23 13:23 vdsm.log

Is it an ovirt-hosted-engine deployment, too?

I too seem to have bumped into this bug. Fixing the ownership on the log file in my case does not seem to help, however; the log remains empty even with the correct owner set.

drwxr-xr-x.  3 vdsm kvm   4096 Jan 23 21:46 .
drwxr-xr-x. 13 root root  4096 Jan 23 22:51 ..
drwxr-xr-x.  2 vdsm kvm   4096 Jan 23 21:30 backup
-rw-r--r--.  1 root root     0 Jan 23 21:46 metadata.log
-rw-r--r--.  1 root root 15183 Jan 23 23:02 supervdsm.log
-rw-r--r--.  1 vdsm kvm    352 Jan 23 21:46 upgrade.log
-rw-r--r--.  1 vdsm kvm      0 Jan 23 22:21 vdsm.log

FWIW my machine also has multiple NICs, 4 to be precise.

If by 'ovirt-hosted-engine' you mean that the oVirt ISO was used, the answer is yes. I have even tried the smoking-fresh ISO released tonight:
http://lists.ovirt.org/pipermail/node-devel/2014-January/000589.html

Created attachment 854635 [details]
message log when shit hits the fan
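The diagnosis above (a root-owned log file that the vdsm user cannot append to) is easy to confirm before the respawn loop kicks in. A minimal sketch, assuming the vdsm user and the log paths shown in this report:

```sh
# Check that the vdsm user can append to its log files; Python's
# logging.FileHandler opens them in append mode, so a root-owned
# 0644 file fails with EACCES. Run as root.
for f in /var/log/vdsm/vdsm.log /var/log/vdsm/metadata.log; do
    su -s /bin/sh -c "test -w '$f'" vdsm \
        || echo "$f is not writable by vdsm (expected owner vdsm:kvm)"
done
```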
(In reply to Jonas Israelsson from comment #3)
> FWIW my machine also has multiple NICs, 4 to be precise.

Danken, this one has multiple NICs too; it can't be a coincidence. Something about having multiple NICs _and_ broken vdsm packaging (not owning vdsm.log, and not creating it with the right ownership and mode bits) causes vdsmd not to start correctly. Jonas has 644 on vdsm.log; I assume it should be 664 for the kvm group to be able to log there.

Ownership and file permissions make no difference (vdsm:kvm; I even tried 777, for example). No, this is a normal ovirt-node (hypervisor).

# cat /etc/ovirt-node-iso-release
oVirt Node Hypervisor release 3.0.3 (0.999.201401221941draft.el6)

I also played around with the permissions (but never mentioned it); as Jiri said, 777 makes no difference. The 644 on my vdsm.log was set by me. I just copied the permissions of the other log files, so I'm not 100% sure that is actually the default.

Setting the target release to the current version for consideration and review. Please do not push non-RFE bugs to an undefined target release, so that bugs can be reviewed for relevancy, fix, closure, etc.

Jonas, would you share your supervdsm.log and upgrade.log? Could you run

su - vdsm -s /bin/bash /usr/share/vdsm/vdsm

so we can see why vdsm dies so quickly?

[root@ft vdsm]# su - vdsm -s /bin/bash /usr/share/vdsm/vdsm
/usr/share/vdsm/vdsm: line 11: import: command not found
/usr/share/vdsm/vdsm: line 12: import: command not found
/usr/share/vdsm/vdsm: line 13: import: command not found
/usr/share/vdsm/vdsm: line 14: import: command not found
/usr/share/vdsm/vdsm: line 15: import: command not found
/usr/share/vdsm/vdsm: line 16: import: command not found
/usr/share/vdsm/vdsm: line 17: import: command not found
/usr/share/vdsm/vdsm: line 18: import: command not found
/usr/share/vdsm/vdsm: line 19: import: command not found
/usr/share/vdsm/vdsm: line 20: import: command not found
/usr/share/vdsm/vdsm: line 21: import: command not found
/usr/share/vdsm/vdsm: line 22: import: command not found
/usr/share/vdsm/vdsm: line 23: from: command not found
/usr/share/vdsm/vdsm: line 25: from: command not found
/usr/share/vdsm/vdsm: line 26: from: command not found
/usr/share/vdsm/vdsm: line 27: from: command not found
/usr/share/vdsm/vdsm: line 28: from: command not found
/usr/share/vdsm/vdsm: line 30: from: command not found
/usr/share/vdsm/vdsm: line 31: from: command not found
/usr/share/vdsm/vdsm: line 33: import: command not found
/usr/share/vdsm/vdsm: line 34: import: command not found
/usr/share/vdsm/vdsm: line 41: syntax error near unexpected token `loggerConfFile'
/usr/share/vdsm/vdsm: line 41: `loggerConfFile = constants.P_VDSM_CONF + 'logger.conf''

Created attachment 855965 [details]
Supervdsm-log as requested
Created attachment 855966 [details]
upgrade.log
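A note on the "import: command not found" output above: su handed the script directly to bash, and bash treats Python source as shell (the shebang line is just a comment to it), so every import statement fails as an unknown shell command. A sketch of an invocation that reaches the Python interpreter instead; using plain `python` as the interpreter path is an assumption for this el6-based node:

```sh
# The script was being parsed by bash, not Python; run it through the
# interpreter explicitly so the real startup error surfaces.
su - vdsm -s /bin/bash -c 'python /usr/share/vdsm/vdsm'
```

Jonas's next comment reaches the same point by making the vdsm account su-able and letting the script's shebang pick the interpreter, which produces the genuine traceback.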
If I make the vdsm account su-able and run /usr/share/vdsm/vdsm, I get:
[root@ft admin]# su - vdsm
-bash-4.1$ /usr/share/vdsm/vdsm
Traceback (most recent call last):
File "/usr/share/vdsm/vdsm", line 208, in <module>
run(**argDict)
File "/usr/share/vdsm/vdsm", line 82, in run
lconfig.fileConfig(loggerConfFile)
File "/usr/lib64/python2.6/logging/config.py", line 84, in fileConfig
File "/usr/lib64/python2.6/logging/config.py", line 162, in _install_handlers
File "/usr/lib64/python2.6/logging/handlers.py", line 377, in __init__
File "/usr/lib64/python2.6/logging/__init__.py", line 827, in __init__
File "/usr/lib64/python2.6/logging/__init__.py", line 846, in _open
IOError: [Errno 13] Permission denied: '/var/log/vdsm/metadata.log'
-bash-4.1$ ll /var/log/vdsm/
total 24
drwxr-xr-x. 2 vdsm kvm 4096 2014-01-23 21:30 backup
-rw-r--r--. 1 root root 0 2014-01-23 21:46 metadata.log
-rw-r--r--. 1 root root 15183 2014-01-23 23:02 supervdsm.log
-rw-r--r--. 1 vdsm kvm 352 2014-01-23 21:46 upgrade.log
-rw-r--r--. 1 vdsm kvm 0 2014-01-23 22:21 vdsm.log
So I changed it to:
-bash-4.1$ ll /var/log/vdsm
total 64
drwxr-xr-x. 2 vdsm kvm 4096 2014-01-23 21:30 backup
-rw-r--r--. 1 vdsm kvm 0 2014-01-23 21:46 metadata.log
-rw-r--r--. 1 vdsm kvm 1161 2014-01-27 08:59 mom.log
-rw-r--r--. 1 root root 16885 2014-01-27 08:59 supervdsm.log
-rw-r--r--. 1 vdsm kvm 352 2014-01-23 21:46 upgrade.log
-rw-r--r--. 1 vdsm kvm 24701 2014-01-27 08:59 vdsm.log
And now it seems vdsm is able to start. I'll try to re-register the node now...
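For reference, a sketch of the fix Jonas applied, with owner and group values taken from the corrected listing above (supervdsm.log is deliberately left root-owned, since supervdsmd runs as root):

```sh
# Restore the ownership vdsm expects on its log files, then restart
# the daemon. Run as root.
chown vdsm:kvm /var/log/vdsm/vdsm.log /var/log/vdsm/metadata.log
/etc/init.d/vdsmd restart
```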
That does indeed seem to make the node able to complete its installation; it is now registered to the engine.

Thanks for your report, Jonas. It seems to be bug 1055153 all over again.

*** This bug has been marked as a duplicate of bug 1055153 ***
Created attachment 854381 [details]
/var/log files

Description of problem:

2014-01-23 14:23:23,046 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-40) [7f11097c] java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
2014-01-23 14:23:23,047 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-40) [7f11097c] Command PollVDSCommand(HostName = dell-r210ii-13.rhev.lab.eng.brq.redhat.com, HostId = 7e2ac33c-cb22-4cb0-8fab-764a73899381) execution failed. Exception: RuntimeException: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
2014-01-23 14:23:23,549 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-40) [7f11097c] java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
2014-01-23 14:23:23,552 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-40) [7f11097c] Command PollVDSCommand(HostName = dell-r210ii-13.rhev.lab.eng.brq.redhat.com, HostId = 7e2ac33c-cb22-4cb0-8fab-764a73899381) execution failed. Exception: RuntimeException: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
2014-01-23 14:23:24,054 ERROR [org.ovirt.engine.core.bll.InstallVdsCommand] (org.ovirt.thread.pool-6-thread-40) [7f11097c] Host installation failed for host 7e2ac33c-cb22-4cb0-8fab-764a73899381, dell-r210ii-13.rhev.lab.eng.brq.redhat.com.: org.ovirt.engine.core.bll.InstallVdsCommand$VdsInstallException: Network error during communication with the host
    at org.ovirt.engine.core.bll.InstallVdsCommand.configureManagementNetwork(InstallVdsCommand.java:278) [bll.jar:]
    at org.ovirt.engine.core.bll.InstallVdsCommand.installHost(InstallVdsCommand.java:203) [bll.jar:]
    at org.ovirt.engine.core.bll.InstallVdsCommand.executeCommand(InstallVdsCommand.java:105) [bll.jar:]
    at org.ovirt.engine.core.bll.ApproveVdsCommand.executeCommand(ApproveVdsCommand.java:45) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1114) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1199) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1875) [bll.jar:]
    at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:174) [utils.jar:]
    at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:116) [utils.jar:]
    at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1219) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:351) [bll.jar:]
    at org.ovirt.engine.core.bll.MultipleActionsRunner.executeValidatedCommand(MultipleActionsRunner.java:179) [bll.jar:]
    at org.ovirt.engine.core.bll.MultipleActionsRunner.runCommands(MultipleActionsRunner.java:151) [bll.jar:]
    at org.ovirt.engine.core.bll.MultipleActionsRunner$2.run(MultipleActionsRunner.java:160) [bll.jar:]
    at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:97) [utils.jar:]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_51]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [rt.jar:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_51]
    at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_51]

# /etc/init.d/vdsmd status
VDS daemon is not running, and its watchdog is running

# ls -ltr /var/log/vdsm
total 16
drwxr-xr-x. 2 vdsm kvm  4096 2014-01-23 13:05 backup
-rw-r--r--. 1 root root 7774 2014-01-23 13:23 supervdsm.log
-rw-r--r--. 1 root root    0 2014-01-23 13:23 vdsm.log
-rw-r--r--. 1 root root    0 2014-01-23 13:23 metadata.log
-rw-r--r--. 1 vdsm kvm   352 2014-01-23 13:23 upgrade.log

# tail /var/log/messages
Jan 23 13:23:32 dell-r210ii-13 python: error in accessing vdsm log file
Jan 23 13:23:32 dell-r210ii-13 respawn: slave '/usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid' died too quickly, respawning slave
Jan 23 13:23:32 dell-r210ii-13 python: error in accessing vdsm log file
Jan 23 13:23:32 dell-r210ii-13 respawn: slave '/usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid' died too quickly, respawning slave
Jan 23 13:23:32 dell-r210ii-13 python: error in accessing vdsm log file
Jan 23 13:23:32 dell-r210ii-13 respawn: slave '/usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid' died too quickly, respawning slave
Jan 23 13:23:32 dell-r210ii-13 python: error in accessing vdsm log file
Jan 23 13:23:32 dell-r210ii-13 respawn: slave '/usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid' died too quickly, respawning slave
Jan 23 13:23:33 dell-r210ii-13 python: error in accessing vdsm log file
Jan 23 13:23:33 dell-r210ii-13 respawn: slave '/usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid' died too quickly for more than 30 seconds, master sleeping for 900 seconds

# getenforce
Permissive

# iptables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:54321
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED
ACCEPT     icmp --  0.0.0.0/0            0.0.0.0/0
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:16514
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:22
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           multiport dports 5634:6166
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           multiport dports 49152:49216
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpt:161
REJECT     all  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
REJECT     all  --  0.0.0.0/0            0.0.0.0/0           PHYSDEV match ! --physdev-is-bridged reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

# netstat -putna
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address                Foreign Address              State        PID/Program name
tcp        0      0 0.0.0.0:111                  0.0.0.0:*                    LISTEN       4125/rpcbind
tcp        0      0 0.0.0.0:22                   0.0.0.0:*                    LISTEN       12584/sshd
tcp        0      0 0.0.0.0:662                  0.0.0.0:*                    LISTEN       4157/rpc.statd
tcp        0      0 0.0.0.0:16514                0.0.0.0:*                    LISTEN       14447/libvirtd
tcp        0      0 10.34.62.205:22              10.34.131.48:53085           ESTABLISHED  16280/sshd
tcp        0      0 :::111                       :::*                         LISTEN       4125/rpcbind
tcp        0      0 :::22                        :::*                         LISTEN       12584/sshd
tcp        0      0 :::662                       :::*                         LISTEN       4157/rpc.statd
udp        0      0 0.0.0.0:111                  0.0.0.0:*                                 4125/rpcbind
udp        0      0 10.34.62.205:123             0.0.0.0:*                                 12606/ntpd
udp        0      0 127.0.0.1:123                0.0.0.0:*                                 12606/ntpd
udp        0      0 0.0.0.0:123                  0.0.0.0:*                                 12606/ntpd
udp        0      0 0.0.0.0:908                  0.0.0.0:*                                 4125/rpcbind
udp        0      0 0.0.0.0:662                  0.0.0.0:*                                 4157/rpc.statd
udp        0      0 0.0.0.0:941                  0.0.0.0:*                                 4157/rpc.statd
udp        0      0 0.0.0.0:68                   0.0.0.0:*                                 13376/dhclient
udp        0      0 :::111                       :::*                                      4125/rpcbind
udp        0      0 2620:52:0:223c:d6ae:52ff:123 :::*                                      12606/ntpd
udp        0      0 fe80::d6ae:52ff:fec7:1a3:123 :::*                                      12606/ntpd
udp        0      0 fe80::d6ae:52ff:fec7:1a3:123 :::*                                      12606/ntpd
udp        0      0 ::1:123                      :::*                                      12606/ntpd
udp        0      0 :::123                       :::*                                      12606/ntpd
udp        0      0 :::908                       :::*                                      4125/rpcbind
udp        0      0 :::662                       :::*                                      4157/rpc.statd

Version-Release number of selected component (if applicable):
# cat /etc/ovirt-node-iso-release
oVirt Node Hypervisor release 3.0.3 (0.999.201401221941draft.el6)

How reproducible:
100%

Steps to Reproduce:
1. Add the host into the setup.

Actual results:
vdsmd does not run correctly.

Expected results:
Working :)
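Tying the diagnostics in this description together: iptables already accepts vdsm's port 54321, yet netstat shows nothing listening on it, because vdsmd dies on its root-owned log files before it can bind. A short triage sketch under those assumptions, with paths and port taken from this report:

```sh
# Triage in one pass: daemon status, listening port, log ownership.
/etc/init.d/vdsmd status
netstat -tln | grep ':54321 ' || echo "vdsm port 54321 not listening"
ls -l /var/log/vdsm/vdsm.log /var/log/vdsm/metadata.log \
    | awk '$3 != "vdsm" { print $NF " owned by " $3 " (expected vdsm)" }'
```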