Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1057119

Summary: [vdsm] cannot add host into setup because vdsmd does not run correctly
Product: [Retired] oVirt
Component: vdsm
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Status: CLOSED DUPLICATE
Severity: urgent
Priority: unspecified
Target Release: 3.4.0
Whiteboard: infra
Doc Type: Bug Fix
Type: Bug
Reporter: Jiri Belka <jbelka>
Assignee: Dan Kenigsberg <danken>
QA Contact: Pavel Stehlik <pstehlik>
CC: abaron, acathrow, bazulay, gklein, iheim, jbelka, jonas, mgoldboi, sbonazzo, yeylon
Last Closed: 2014-01-27 09:19:49 UTC
Attachments (flags: none):
  /var/log files
  message log when shit hits the fan
  Supervdsm-log as requested
  upgrade.log

Description Jiri Belka 2014-01-23 13:36:00 UTC
Created attachment 854381 [details]
/var/log files

Description of problem:

2014-01-23 14:23:23,046 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-40) [7f11097c] java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
2014-01-23 14:23:23,047 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-40) [7f11097c] Command PollVDSCommand(HostName = dell-r210ii-13.rhev.lab.eng.brq.redhat.com, HostId = 7e2ac33c-cb22-4cb0-8fab-764a73899381) execution failed. Exception: RuntimeException: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
2014-01-23 14:23:23,549 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-40) [7f11097c] java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
2014-01-23 14:23:23,552 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-40) [7f11097c] Command PollVDSCommand(HostName = dell-r210ii-13.rhev.lab.eng.brq.redhat.com, HostId = 7e2ac33c-cb22-4cb0-8fab-764a73899381) execution failed. Exception: RuntimeException: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
2014-01-23 14:23:24,054 ERROR [org.ovirt.engine.core.bll.InstallVdsCommand] (org.ovirt.thread.pool-6-thread-40) [7f11097c] Host installation failed for host 7e2ac33c-cb22-4cb0-8fab-764a73899381, dell-r210ii-13.rhev.lab.eng.brq.redhat.com.: org.ovirt.engine.core.bll.InstallVdsCommand$VdsInstallException: Network error during communication with the host
        at org.ovirt.engine.core.bll.InstallVdsCommand.configureManagementNetwork(InstallVdsCommand.java:278) [bll.jar:]
        at org.ovirt.engine.core.bll.InstallVdsCommand.installHost(InstallVdsCommand.java:203) [bll.jar:]
        at org.ovirt.engine.core.bll.InstallVdsCommand.executeCommand(InstallVdsCommand.java:105) [bll.jar:]
        at org.ovirt.engine.core.bll.ApproveVdsCommand.executeCommand(ApproveVdsCommand.java:45) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1114) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1199) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1875) [bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:174) [utils.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:116) [utils.jar:]
        at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1219) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:351) [bll.jar:]
        at org.ovirt.engine.core.bll.MultipleActionsRunner.executeValidatedCommand(MultipleActionsRunner.java:179) [bll.jar:]
        at org.ovirt.engine.core.bll.MultipleActionsRunner.runCommands(MultipleActionsRunner.java:151) [bll.jar:]
        at org.ovirt.engine.core.bll.MultipleActionsRunner$2.run(MultipleActionsRunner.java:160) [bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:97) [utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_51]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [rt.jar:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_51]
        at java.lang.Thread.run(Thread.java:744) [rt.jar:1.7.0_51]


# /etc/init.d/vdsmd status
VDS daemon is not running, and its watchdog is running

# ls -ltr /var/log/vdsm
total 16
drwxr-xr-x. 2 vdsm kvm  4096 2014-01-23 13:05 backup
-rw-r--r--. 1 root root 7774 2014-01-23 13:23 supervdsm.log
-rw-r--r--. 1 root root    0 2014-01-23 13:23 vdsm.log
-rw-r--r--. 1 root root    0 2014-01-23 13:23 metadata.log
-rw-r--r--. 1 vdsm kvm   352 2014-01-23 13:23 upgrade.log

# tail /var/log/messages 
Jan 23 13:23:32 dell-r210ii-13 python: error in accessing vdsm log file
Jan 23 13:23:32 dell-r210ii-13 respawn: slave '/usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid' died too quickly, respawning slave
Jan 23 13:23:32 dell-r210ii-13 python: error in accessing vdsm log file
Jan 23 13:23:32 dell-r210ii-13 respawn: slave '/usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid' died too quickly, respawning slave
Jan 23 13:23:32 dell-r210ii-13 python: error in accessing vdsm log file
Jan 23 13:23:32 dell-r210ii-13 respawn: slave '/usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid' died too quickly, respawning slave
Jan 23 13:23:32 dell-r210ii-13 python: error in accessing vdsm log file
Jan 23 13:23:32 dell-r210ii-13 respawn: slave '/usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid' died too quickly, respawning slave
Jan 23 13:23:33 dell-r210ii-13 python: error in accessing vdsm log file
Jan 23 13:23:33 dell-r210ii-13 respawn: slave '/usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid' died too quickly for more than 30 seconds, master sleeping for 900 seconds
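
For context, those messages describe vdsm's respawn watchdog: it restarts the daemon whenever it exits, and once the rapid deaths have gone on for more than 30 seconds it backs off for 900 seconds. A rough sketch of that behaviour (not the actual respawn script; $slave stands for the vdsm command line):

  first_death=0
  while true; do
      $slave                       # run vdsm until it exits
      now=$(date +%s)
      [ $first_death -eq 0 ] && first_death=$now
      if [ $((now - first_death)) -gt 30 ]; then
          echo "died too quickly for more than 30 seconds"
          sleep 900                # master backs off, as in the log
          first_death=0
      else
          echo "died too quickly, respawning slave"
      fi
  done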

# getenforce 
Permissive

# iptables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:54321 
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED 
ACCEPT     icmp --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:16514 
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           tcp dpt:22 
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           multiport dports 5634:6166 
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           multiport dports 49152:49216 
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0           udp dpt:161 
REJECT     all  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-host-prohibited 

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
REJECT     all  --  0.0.0.0/0            0.0.0.0/0           PHYSDEV match ! --physdev-is-bridged reject-with icmp-host-prohibited 

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

# netstat -putna
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name   
tcp        0      0 0.0.0.0:111                 0.0.0.0:*                   LISTEN      4125/rpcbind        
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      12584/sshd          
tcp        0      0 0.0.0.0:662                 0.0.0.0:*                   LISTEN      4157/rpc.statd      
tcp        0      0 0.0.0.0:16514               0.0.0.0:*                   LISTEN      14447/libvirtd      
tcp        0      0 10.34.62.205:22             10.34.131.48:53085          ESTABLISHED 16280/sshd          
tcp        0      0 :::111                      :::*                        LISTEN      4125/rpcbind        
tcp        0      0 :::22                       :::*                        LISTEN      12584/sshd          
tcp        0      0 :::662                      :::*                        LISTEN      4157/rpc.statd      
udp        0      0 0.0.0.0:111                 0.0.0.0:*                               4125/rpcbind        
udp        0      0 10.34.62.205:123            0.0.0.0:*                               12606/ntpd          
udp        0      0 127.0.0.1:123               0.0.0.0:*                               12606/ntpd          
udp        0      0 0.0.0.0:123                 0.0.0.0:*                               12606/ntpd          
udp        0      0 0.0.0.0:908                 0.0.0.0:*                               4125/rpcbind        
udp        0      0 0.0.0.0:662                 0.0.0.0:*                               4157/rpc.statd      
udp        0      0 0.0.0.0:941                 0.0.0.0:*                               4157/rpc.statd      
udp        0      0 0.0.0.0:68                  0.0.0.0:*                               13376/dhclient      
udp        0      0 :::111                      :::*                                    4125/rpcbind        
udp        0      0 2620:52:0:223c:d6ae:52ff:123 :::*                                    12606/ntpd          
udp        0      0 fe80::d6ae:52ff:fec7:1a3:123 :::*                                    12606/ntpd          
udp        0      0 fe80::d6ae:52ff:fec7:1a3:123 :::*                                    12606/ntpd          
udp        0      0 ::1:123                     :::*                                    12606/ntpd          
udp        0      0 :::123                      :::*                                    12606/ntpd          
udp        0      0 :::908                      :::*                                    4125/rpcbind        
udp        0      0 :::662                      :::*                                    4157/rpc.statd

Version-Release number of selected component (if applicable):
# cat /etc/ovirt-node-iso-release 
oVirt Node Hypervisor release 3.0.3 (0.999.201401221941draft.el6)

How reproducible:
100%

Steps to Reproduce:
1. Add the host into the setup (register/approve the node in the engine).

Actual results:
vdsmd fails to start; the respawn wrapper keeps restarting it, and host installation fails with "Network error during communication with the host".

Expected results:
vdsmd runs and the host is added to the setup :)

Additional info:

Comment 1 Jiri Belka 2014-01-23 13:36:47 UTC
# cat /proc/cmdline 
root=live:LABEL=Root ro rootfstype=auto rootflags=ro RD_NO_LVM  crashkernel=128M elevator=deadline max_loop=256 rd_NO_LUKS rd_NO_MD rd_NO_DM edd=off enforcing=0

Comment 2 Dan Kenigsberg 2014-01-23 19:15:32 UTC
This seems like a dup of bug 1055153: Vdsm fails to start because it cannot write to its log, which is owned by root.

-rw-r--r--. 1 root root    0 2014-01-23 13:23 vdsm.log

Is it an ovirt-hosted-engine deployment, too?
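
For anyone hitting the same symptom, a minimal workaround sketch based on that listing (assuming vdsm:kvm is the intended ownership, as on upgrade.log; comment 13 below confirms metadata.log needs the same fix):

  # chown vdsm:kvm /var/log/vdsm/*.log   # fix all root-owned logs, not just vdsm.log
  # /etc/init.d/vdsmd restart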

Comment 3 Jonas Israelsson 2014-01-23 23:22:14 UTC
I too seem to have bumped into this bug. However, fixing the ownership of the log file does not seem to help in my case; the log remains empty even with the correct owner set.

drwxr-xr-x.  3 vdsm kvm   4096 Jan 23 21:46 .
drwxr-xr-x. 13 root root  4096 Jan 23 22:51 ..
drwxr-xr-x.  2 vdsm kvm   4096 Jan 23 21:30 backup
-rw-r--r--.  1 root root     0 Jan 23 21:46 metadata.log
-rw-r--r--.  1 root root 15183 Jan 23 23:02 supervdsm.log
-rw-r--r--.  1 vdsm kvm    352 Jan 23 21:46 upgrade.log
-rw-r--r--.  1 vdsm kvm      0 Jan 23 22:21 vdsm.log


FWIW, my machine also has multiple NICs, 4 to be precise.

If 'ovirt-hosted-engine' means that I used the oVirt ISO, the answer is yes. I've even tried the fresh ISO released tonight:

http://lists.ovirt.org/pipermail/node-devel/2014-January/000589.html

Comment 4 Jonas Israelsson 2014-01-23 23:23:30 UTC
Created attachment 854635 [details]
message log when shit hits the fan

Comment 5 Sandro Bonazzola 2014-01-24 07:05:21 UTC
(In reply to Jonas Israelsson from comment #3)

> FWIW, my machine also has multiple NICs, 4 to be precise.

Danken, this one has multiple NICs too; it can't be a coincidence. Something about having multiple NICs _and_ broken vdsm packaging (not owning vdsm.log, and not creating it with the right ownership and mode bits) causes vdsmd not to start correctly.

Jonas has 644 on vdsm.log. I assume it should be 664 for the kvm group to be able to log there.
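
A quick way to test that write-access hypothesis on the node (a sketch; -s is needed because the vdsm user normally has no login shell):

  # su - vdsm -s /bin/sh -c 'echo probe >> /var/log/vdsm/vdsm.log'

If that fails with "Permission denied", the daemon cannot open its log either.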

Comment 6 Jiri Belka 2014-01-24 10:04:47 UTC
Ownership and file permissions make no difference (vdsm:kvm; I even tried 777, for example). And no, this is a normal ovirt-node (hypervisor), not a hosted-engine deployment.

# cat /etc/ovirt-node-iso-release 
oVirt Node Hypervisor release 3.0.3 (0.999.201401221941draft.el6)

Comment 7 Jonas Israelsson 2014-01-25 08:16:25 UTC
I also played around with the permissions (but never mentioned it); as Jiri said, 777 makes no difference.
The 644 on my vdsm.log was set by me: I just copied the permissions of the other log files, so I'm not 100% sure it is actually the default.

Comment 8 Itamar Heim 2014-01-26 08:10:21 UTC
Setting the target release to the current version for consideration and review. Please do not push non-RFE bugs to an undefined target release, so that bugs can be reviewed for relevancy, fix, closure, etc.

Comment 9 Dan Kenigsberg 2014-01-27 01:34:11 UTC
Jonas, would you share your supervdsm.log and upgrade.log?

Could you do

  su - vdsm -s /bin/bash
  /usr/share/vdsm/vdsm

so we can see why vdsm dies so quickly?

Comment 10 Jonas Israelsson 2014-01-27 08:50:07 UTC
[root@ft vdsm]# su - vdsm -s /bin/bash /usr/share/vdsm/vdsm
/usr/share/vdsm/vdsm: line 11: import: command not found
/usr/share/vdsm/vdsm: line 12: import: command not found
/usr/share/vdsm/vdsm: line 13: import: command not found
/usr/share/vdsm/vdsm: line 14: import: command not found
/usr/share/vdsm/vdsm: line 15: import: command not found
/usr/share/vdsm/vdsm: line 16: import: command not found
/usr/share/vdsm/vdsm: line 17: import: command not found
/usr/share/vdsm/vdsm: line 18: import: command not found
/usr/share/vdsm/vdsm: line 19: import: command not found
/usr/share/vdsm/vdsm: line 20: import: command not found
/usr/share/vdsm/vdsm: line 21: import: command not found
/usr/share/vdsm/vdsm: line 22: import: command not found
/usr/share/vdsm/vdsm: line 23: from: command not found
/usr/share/vdsm/vdsm: line 25: from: command not found
/usr/share/vdsm/vdsm: line 26: from: command not found
/usr/share/vdsm/vdsm: line 27: from: command not found
/usr/share/vdsm/vdsm: line 28: from: command not found
/usr/share/vdsm/vdsm: line 30: from: command not found
/usr/share/vdsm/vdsm: line 31: from: command not found
/usr/share/vdsm/vdsm: line 33: import: command not found
/usr/share/vdsm/vdsm: line 34: import: command not found
/usr/share/vdsm/vdsm: line 41: syntax error near unexpected token `loggerConfFile'
/usr/share/vdsm/vdsm: line 41: `loggerConfFile = constants.P_VDSM_CONF + 'logger.conf''
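
Those errors come from bash, not vdsm: passing the script as an argument to su makes /bin/bash execute the Python source as a shell script, so its #! line is ignored and bash trips over the import statements. A sketch of an invocation that lets the interpreter run it instead:

  # su - vdsm -s /bin/bash -c '/usr/share/vdsm/vdsm'   # shebang is honoured here

(or spell out the interpreter with 'python /usr/share/vdsm/vdsm'; comment 13 below gets past this by running the script from an interactive shell).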

Comment 11 Jonas Israelsson 2014-01-27 08:51:00 UTC
Created attachment 855965 [details]
Supervdsm-log as requested

Comment 12 Jonas Israelsson 2014-01-27 08:51:35 UTC
Created attachment 855966 [details]
upgrade.log

Comment 13 Jonas Israelsson 2014-01-27 09:01:15 UTC
If I make vdsm "su-able" and run /usr/share/vdsm/vdsm, I get:

[root@ft admin]# su - vdsm
-bash-4.1$ /usr/share/vdsm/vdsm
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm", line 208, in <module>
    run(**argDict)
  File "/usr/share/vdsm/vdsm", line 82, in run
    lconfig.fileConfig(loggerConfFile)
  File "/usr/lib64/python2.6/logging/config.py", line 84, in fileConfig
  File "/usr/lib64/python2.6/logging/config.py", line 162, in _install_handlers
  File "/usr/lib64/python2.6/logging/handlers.py", line 377, in __init__
  File "/usr/lib64/python2.6/logging/__init__.py", line 827, in __init__
  File "/usr/lib64/python2.6/logging/__init__.py", line 846, in _open
IOError: [Errno 13] Permission denied: '/var/log/vdsm/metadata.log'

-bash-4.1$ ll /var/log/vdsm/
total 24
drwxr-xr-x. 2 vdsm kvm   4096 2014-01-23 21:30 backup
-rw-r--r--. 1 root root     0 2014-01-23 21:46 metadata.log
-rw-r--r--. 1 root root 15183 2014-01-23 23:02 supervdsm.log
-rw-r--r--. 1 vdsm kvm    352 2014-01-23 21:46 upgrade.log
-rw-r--r--. 1 vdsm kvm      0 2014-01-23 22:21 vdsm.log

So I changed the ownership to:

-bash-4.1$ ll /var/log/vdsm
total 64
drwxr-xr-x. 2 vdsm kvm   4096 2014-01-23 21:30 backup
-rw-r--r--. 1 vdsm kvm      0 2014-01-23 21:46 metadata.log
-rw-r--r--. 1 vdsm kvm   1161 2014-01-27 08:59 mom.log
-rw-r--r--. 1 root root 16885 2014-01-27 08:59 supervdsm.log
-rw-r--r--. 1 vdsm kvm    352 2014-01-23 21:46 upgrade.log
-rw-r--r--. 1 vdsm kvm  24701 2014-01-27 08:59 vdsm.log

And now it seems vdsm is able to start. I'll try to re-register the node now...
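
The traceback above matches stock Python logging behaviour: logging.config.fileConfig() instantiates every configured handler up front, and a FileHandler opens its file immediately, so a single unwritable log file kills vdsm before it can log anything. The failure can be reproduced in one line (a sketch, assuming metadata.log is still root-owned and you run as vdsm):

  -bash-4.1$ python -c "import logging; logging.FileHandler('/var/log/vdsm/metadata.log')"
  IOError: [Errno 13] Permission denied: '/var/log/vdsm/metadata.log'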

Comment 14 Jonas Israelsson 2014-01-27 09:07:17 UTC
That indeed seems to make the node able to complete its installation; it is now registered to the engine.

Comment 15 Dan Kenigsberg 2014-01-27 09:19:49 UTC
Thanks for your report, Jonas. It seems to be bug 1055153 all over again.

*** This bug has been marked as a duplicate of bug 1055153 ***