Bug 1254185
Summary: | Failed to deploy HE over RHEVH6.7 | [ ERROR ] Failed to execute stage 'Misc configuration': Failed to persist network configuration. Error code: "16" message: "Unexpected exception"
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Nikolai Sednev <nsednev>
Component: | vdsm | Assignee: | Yeela Kaplan <ykaplan>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Nikolai Sednev <nsednev>
Severity: | urgent | Docs Contact: |
Priority: | urgent
Version: | 3.5.4 | CC: | bazulay, bkorren, bmcclain, danken, dkenigsb, dougsland, ecohen, fdeutsch, gklein, ibarkan, istein, lpeer, lsurette, nsednev, oourfali, pstehlik, sbonazzo, ycui, yeylon, ylavi
Target Milestone: | --- | Keywords: | Triaged, Unconfirmed, ZStream
Target Release: | 3.5.4
Hardware: | x86_64
OS: | Linux
Whiteboard: | infra
Fixed In Version: | vdsm-4.16.26 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2015-09-06 17:09:35 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: |
Bug Depends On: |
Bug Blocks: | 1250199
Attachments: |
Description
Nikolai Sednev
2015-08-17 11:46:09 UTC
Created attachment 1063780 [details]
alma04.log
Created attachment 1063782 [details]
ovirt-hosted-engine-setup-20150817110321-g67var.log
We have supervdsm running twice, and writing twice into the same log file:

MainThread::DEBUG::2015-08-17 08:08:51,518::supervdsmServer::422::SuperVdsm.Server::(main) Making sure I'm root - SuperVdsm
MainThread::DEBUG::2015-08-17 08:08:51,518::supervdsmServer::422::SuperVdsm.Server::(main) Making sure I'm root - SuperVdsm

MainThread::ERROR::2015-08-17 08:08:51,519::supervdsmServer::468::SuperVdsm.Server::(main) Could not start Super Vdsm
Traceback (most recent call last):
  File "/usr/share/vdsm/supervdsmServer", line 448, in main
    server = manager.get_server()
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 467, in get_server
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 136, in __init__
  File "/usr/lib64/python2.6/multiprocessing/connection.py", line 106, in __init__
  File "/usr/lib64/python2.6/multiprocessing/connection.py", line 227, in __init__
  File "<string>", line 1, in bind
error: [Errno 98] Address already in use

Hi Dan,

(In reply to Dan Kenigsberg from comment #3)
> We have supervdsm running twice, and writing twice into the same log file
> [traceback quoted in full above]

I was able to reproduce this report just once. I noticed that during the install of RHEL 6.7 (the HE VM) for the oVirt Engine setup, the VNC application closed unexpectedly and the vdsm service was down. I have tried to reproduce this again to provide full logs (as I could reproduce it in the first attempt), but I am still not able to reproduce it after several tries. What I saw was two supervdsm processes, one of them defunct, and 'Address already in use' in vdsm.log.

Hi Nikolai,

Are you able to easily reproduce your report?

Thanks!

Hi Dan,

As I said previously, I am still not able to reproduce the report, but here is some data that might help you.

After RHEV-H installation and networking settings via TUI
=================================================================

[root@node ~]# cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor release 7.1 (20150813.0.el7ev)

# rpm -qa | grep -i vdsm
vdsm-xmlrpc-4.16.24-2.el7ev.noarch
vdsm-python-4.16.24-2.el7ev.noarch
vdsm-cli-4.16.24-2.el7ev.noarch
vdsm-jsonrpc-4.16.24-2.el7ev.noarch
vdsm-hook-vhostmd-4.16.24-2.el7ev.noarch
vdsm-4.16.24-2.el7ev.x86_64
vdsm-reg-4.16.24-2.el7ev.noarch
vdsm-python-zombiereaper-4.16.24-2.el7ev.noarch
vdsm-yajsonrpc-4.16.24-2.el7ev.noarch
vdsm-hook-ethtool-options-4.16.24-2.el7ev.noarch
ovirt-node-plugin-vdsm-0.2.0-26.el7ev.noarch

# ps aux | grep -i vdsm
root 12245 0.1 0.5 621828 59220 ? S<sl 22:35 0:00 /usr/bin/python /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock
root 16969 0.0 0.0 10632 932 pts/0 S+ 22:47 0:00 grep --color=auto -i vdsm

# service vdsmd status
Redirecting to /bin/systemctl status vdsmd.service
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: inactive (dead) since Mon 2015-08-17 22:47:31 UTC; 31s ago
  Process: 16936 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 16121 ExecStart=/usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsm (code=exited, status=0/SUCCESS)
  Process: 15033 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 16121 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/vdsmd.service

Aug 17 22:36:08 localhost python[16121]: DIGEST-MD5 parse_server_challenge()
Aug 17 22:36:08 localhost python[16121]: DIGEST-MD5 ask_user_info()
Aug 17 22:36:08 localhost python[16121]: DIGEST-MD5 client step 2
Aug 17 22:36:08 localhost python[16121]: DIGEST-MD5 ask_user_info()
Aug 17 22:36:08 localhost python[16121]: DIGEST-MD5 make_client_response()
Aug 17 22:36:08 localhost python[16121]: DIGEST-MD5 client step 3
Aug 17 22:47:23 node.localdomain systemd[1]: Stopping Virtual Desktop Server Manager...
Aug 17 22:47:30 node.localdomain python[16121]: DIGEST-MD5 client mech dispose
Aug 17 22:47:30 node.localdomain python[16121]: DIGEST-MD5 common mech dispose
Aug 17 22:47:30 node.localdomain vdsmd_init_common.sh[16936]: vdsm: Running run_final_hooks
Aug 17 22:47:31 node.localdomain systemd[1]: Stopped Virtual Desktop Server Manager.

Setting Hosted Engine via TUI
====================================
- Provided RHEL-6.7 iso for the HE installer

In the middle of the initial questions (figure 1) for setting up HE, I see that vdsm has started and a defunct supervdsm process has been created.
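The "error: [Errno 98] Address already in use" traceback above is what the second supervdsmServer instance hits when it races the first for the same Unix socket. A minimal sketch of that failure mode, assuming a hypothetical socket path in place of /var/run/vdsm/svdsm.sock (supervdsm goes through multiprocessing's manager, which binds a Unix socket in essentially the same way):

```python
import errno
import os
import socket
import tempfile

# Hypothetical stand-in for supervdsm's svdsm.sock path.
sock_path = os.path.join(tempfile.mkdtemp(), "svdsm.sock")

first = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
first.bind(sock_path)   # the first supervdsm instance binds fine
first.listen(1)

second = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    # a second instance racing on the same socket file fails here,
    # matching "error: [Errno 98] Address already in use"
    second.bind(sock_path)
except OSError as e:
    print(e.errno == errno.EADDRINUSE)  # True
finally:
    second.close()
    first.close()
```

The socket file persists on disk after the first bind, so any second process trying the same path fails immediately with EADDRINUSE rather than queueing behind the first.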
# /bin/systemctl status -l vdsmd.service
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Mon 2015-08-17 22:57:37 UTC; 3min 3s ago

# ps aux | grep -i vdsm
root 17331 1.0 0.5 626520 60564 ? S<sl 22:57 0:00 /usr/bin/python /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock
vdsm 17514 1.8 0.3 1515680 45864 ? S<sl 22:57 0:01 /usr/bin/python /usr/share/vdsm/vdsm
vdsm 17616 0.0 0.0 382596 1092 ? S<l 22:57 0:00 /usr/libexec/ioprocess --read-pipe-fd 39 --write-pipe-fd 38 --max-threads 10 --max-queued-requests 10
qemu 17624 0.0 0.0 0 0 ? Z< 22:57 0:00 [supervdsmServer] <defunct>
vdsm 17641 0.0 0.0 382596 3140 ? S<l 22:57 0:00 /usr/libexec/ioprocess --read-pipe-fd 47 --write-pipe-fd 46 --max-threads 10 --max-queued-requests 10
root 17714 0.0 0.0 10632 932 pts/0 R+ 22:58 0:00 grep --color=auto -i vdsm

Later (figure 2), after completing the Hosted Engine setup, I see a new "Start vdsmd". When the hosted-engine setup is complete and asks to log into the system via VNC, this is the status of vdsm and ps:

# ps aux | grep -i vdsm
root 17977 0.5 0.5 710656 61748 ? S<sl 23:01 0:00 /usr/bin/python /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock
vdsm 18171 3.0 0.4 1550224 57120 ? S<sl 23:02 0:04 /usr/bin/python /usr/share/vdsm/vdsm
qemu 18687 0.0 0.0 0 0 ? Z< 23:02 0:00 [supervdsmServer] <defunct>
vdsm 18688 0.0 0.0 382596 3200 ? S<l 23:02 0:00 /usr/libexec/ioprocess --read-pipe-fd 56 --write-pipe-fd 55 --max-threads 10 --max-queued-requests 10
qemu 19136 7.2 0.2 4724024 34852 ? Sl 23:03 0:02 /usr/libexec/qemu-kvm -name HostedEngine -S -machine rhel6.5.0,accel=kvm,usb=off -cpu SandyBridge -m 4096 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 9612e98d-6125-4f89-ac78-01661a6cd4ec -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.1-20150813.0.el7ev,serial=8103812E-150B-4725-B1E2-BA9C9B191C81_52:54:00:c6:56:b6,uuid=9612e98d-6125-4f89-ac78-01661a6cd4ec -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/HostedEngine.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2015-08-17T23:03:52,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-reboot -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/data/ovirt-hosted-engine-setup/rhel67.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -drive file=/var/run/vdsm/storage/353cf67b-514b-416a-acf6-cb75b6a3c739/cbc09241-a8db-4eda-b164-798a2906029e/6e625010-ee57-46df-a9f6-dc491c4d5d08,if=none,id=drive-virtio-disk0,format=raw,serial=cbc09241-a8db-4eda-b164-798a2906029e,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:3a:be:7d,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/9612e98d-6125-4f89-ac78-01661a6cd4ec.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/9612e98d-6125-4f89-ac78-01661a6cd4ec.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 -vnc 0:0,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -msg timestamp=on
root 19227 0.0 0.0 10632 932 pts/0 S+ 23:04 0:00 grep --color=auto -i vdsm

# service vdsmd status -l
Redirecting to /bin/systemctl status -l vdsmd.service
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Mon 2015-08-17 23:02:08 UTC; 2min 9s ago
 Main PID: 18171 (vdsm)
   CGroup: /system.slice/vdsmd.service
           ├─18171 /usr/bin/python /usr/share/vdsm/vdsm
           └─18688 /usr/libexec/ioprocess --read-pipe-fd 56 --write-pipe-fd 55 --max-threads 10 --max-queued-requests 10

Aug 17 23:03:52 node.localdomain python[18171]: DIGEST-MD5 ask_user_info()
Aug 17 23:03:52 node.localdomain python[18171]: DIGEST-MD5 make_client_response()
Aug 17 23:03:52 node.localdomain python[18171]: DIGEST-MD5 client step 2
Aug 17 23:03:52 node.localdomain python[18171]: DIGEST-MD5 parse_server_challenge()
Aug 17 23:03:52 node.localdomain python[18171]: DIGEST-MD5 ask_user_info()
Aug 17 23:03:52 node.localdomain python[18171]: DIGEST-MD5 make_client_response()
Aug 17 23:03:52 node.localdomain python[18171]: DIGEST-MD5 client step 3
Aug 17 23:03:52 node.localdomain vdsm[18171]: vdsm vds ERROR unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 1136, in wrapper
  File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 383, in vmSetTicket
  File "/usr/share/vdsm/API.py", line 642, in setTicket
  File "/usr/share/vdsm/virt/vm.py", line 4847, in setTicket
AttributeError: 'NoneType' object has no attribute 'XMLDesc'
Aug 17 23:03:53 node.localdomain vdsm[18171]: vdsm vds ERROR unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 1136, in wrapper
  File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 383, in vmSetTicket
  File "/usr/share/vdsm/API.py", line 642, in setTicket
  File "/usr/share/vdsm/virt/vm.py", line 4847, in setTicket
AttributeError: 'NoneType' object has no attribute 'XMLDesc'
Aug 17 23:03:54 node.localdomain vdsm[18171]: vdsm vm.Vm WARNING vmId=`9612e98d-6125-4f89-ac78-01661a6cd4ec`::_readPauseCode unsupported by libvirt vm

I am attaching the logs, but keep in mind/ignore that after adding the HE to oVirt Engine 3.6 it failed because of a missing package, ovirt-vmconsole. I will do new tests in RHEV-M 3.5 too.

Hi Nikolai,

Just to double-check, could you please also provide the RHEV-M version you used during your report?

Thanks!

Created attachment 1064103 [details]
logs Aug 17
Created attachment 1064115 [details]
figure1
Created attachment 1064116 [details]
figure2
(In reply to Douglas Schilling Landgraf from comment #5)
> Hi Nikolai,
> Are you able to easily reproduce your report?
> Thanks!

I wasn't trying to reproduce the issue; it happened during a clean deployment and I filed a bug with all the logs. The flow was as follows:

[root@alma04 ~]# hosted-engine --deploy
[ INFO ] Stage: Initializing
[ INFO ] Generating a temporary VNC password.
[ INFO ] Stage: Environment setup
          Continuing will configure this host for serving as hypervisor and create a VM where you have to install oVirt Engine afterwards.
          Are you sure you want to continue? (Yes, No)[Yes]:
          Configuration files: []
          Log file: /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20150817110321-g67var.log
          Version: otopi-1.3.2 (otopi-1.3.2-1.el6ev)
          It has been detected that this program is executed through an SSH connection without using screen. Continuing with the installation may lead to broken installation if the network connection fails. It is highly recommended to abort the installation and run it inside a screen session using command "screen".
          Do you want to continue anyway? (Yes, No)[No]: yes
[ INFO ] Hardware supports virtualization
[ INFO ] Stage: Environment packages setup
[ INFO ] Stage: Programs detection
[ INFO ] Stage: Environment setup
[ INFO ] Generating libvirt-spice certificates
[ INFO ] Stage: Environment customization

          --== STORAGE CONFIGURATION ==--

          During customization use CTRL-D to abort.
          Please specify the storage you would like to use (iscsi, nfs3, nfs4)[nfs3]:
          Please specify the full shared storage connection path to use (example: host:/path): 10.35.160.108:/Virt/nsednev_HE_3_5_z_1
[ INFO ] Installing on first host
          Please provide storage domain name. [hosted_storage]:
          Local storage datacenter name is an internal name and currently will not be shown in engine's admin UI.
          Please enter local datacenter name [hosted_datacenter]:

          --== SYSTEM CONFIGURATION ==--

          --== NETWORK CONFIGURATION ==--

          Please indicate a nic to set rhevm bridge on: (em1, em2, p1p1, p1p2) [em1]: p1p1
          iptables was detected on your computer, do you wish setup to configure it? (Yes, No)[Yes]:
          Please indicate a pingable gateway IP address [10.35.117.254]:

          --== VM CONFIGURATION ==--

          Please specify the device to boot the VM from (cdrom, disk, pxe) [cdrom]: pxe
          Please specify an alias for the Hosted Engine image [hosted_engine]:
          The following CPU types are supported by this host:
          - model_SandyBridge: Intel SandyBridge Family
          - model_Westmere: Intel Westmere Family
          - model_Nehalem: Intel Nehalem Family
          - model_Penryn: Intel Penryn Family
          - model_Conroe: Intel Conroe Family
          Please specify the CPU type to be used by the VM [model_SandyBridge]:
          Please specify the number of virtual CPUs for the VM [Defaults to minimum requirement: 2]:
          Please specify the disk size of the VM in GB [Defaults to minimum requirement: 25]:
          You may specify a unicast MAC address for the VM or accept a randomly generated default [00:16:3e:39:e4:af]: 00:16:3E:7C:CC:CC
          Please specify the memory size of the VM in MB [Defaults to minimum requirement: 4096]:
          Please specify the console type you would like to use to connect to the VM (vnc, spice) [vnc]:

          --== HOSTED ENGINE CONFIGURATION ==--

          Enter the name which will be used to identify this host inside the Administrator Portal [hosted_engine_1]:
          Enter 'admin@internal' user password that will be used for accessing the Administrator Portal:
          Confirm 'admin@internal' user password:
          Please provide the FQDN for the engine you would like to use. This needs to match the FQDN that you will use for the engine installation within the VM.
          Note: This will be the FQDN of the VM you are now going to create, it should not point to the base host or to any other existing machine.
          Engine FQDN: nsednev-he-3.qa.lab.tlv.redhat.com
          Please provide the name of the SMTP server through which we will send notifications [localhost]:
          Please provide the TCP port number of the SMTP server [25]:
          Please provide the email address from which notifications will be sent [root@localhost]:
          Please provide a comma-separated list of email addresses which will get notifications [root@localhost]:
[ INFO ] Stage: Setup validation

          --== CONFIGURATION PREVIEW ==--

          Bridge interface              : p1p1
          Engine FQDN                   : nsednev-he-3.qa.lab.tlv.redhat.com
          Bridge name                   : rhevm
          SSH daemon port               : 22
          Firewall manager              : iptables
          Gateway address               : 10.35.117.254
          Host name for web application : hosted_engine_1
          Host ID                       : 1
          Image alias                   : hosted_engine
          Image size GB                 : 25
          Storage connection            : 10.35.160.108:/Virt/nsednev_HE_3_5_z_1
          Console type                  : vnc
          Memory size MB                : 4096
          MAC address                   : 00:16:3E:7C:CC:CC
          Boot type                     : pxe
          Number of CPUs                : 2
          CPU Type                      : model_SandyBridge

          Please confirm installation settings (Yes, No)[Yes]:
[ INFO ] Stage: Transaction setup
[ INFO ] Stage: Misc configuration
[ INFO ] Stage: Package installation
[ INFO ] Stage: Misc configuration
[ INFO ] Configuring libvirt
[ INFO ] Configuring VDSM
[ INFO ] Starting vdsmd
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Waiting for VDSM hardware info
[ INFO ] Configuring the management bridge
[ ERROR ] Failed to execute stage 'Misc configuration': Failed to persist network configuration. Error code: "16" message: "Unexpected exception"
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20150817110729.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination

[root@alma04 ~]# cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20150813.0.el6ev)

(In reply to Douglas Schilling Landgraf from comment #7)
> Hi Nikolai,
> Just to double-check, could you please also provide the RHEV-M version you used during your report?
> Thanks!

vt16.6.
Please provide the vdsm and he-setup logs for comment 11.

(In reply to Fabian Deutsch from comment #13)
> Please provide the vdsm and he-setup logs for comment 11.

All logs were already provided to the bug: within the tar file alma04.log you will find vdsm.log and others, while the he-setup log was provided separately as ovirt-hosted-engine-setup-20150817110321-g67var.log. Please see the "Attachments" section of this bug report.

vdsm.log shows:

Thread-16::DEBUG::2015-08-17 11:07:23,878::BindingXMLRPC::1133::vds::(wrapper) client [127.0.0.1]::call setupNetworks with ({'rhevm': {'nic': 'p1p1', 'bootproto': 'dhcp', 'blockingdhcp': True}}, {}, {'connectivityCheck': False}) {}
Thread-16::DEBUG::2015-08-17 11:07:29,705::BindingXMLRPC::1140::vds::(wrapper) return setupNetworks with {'status': {'message': 'Done', 'code': 0}}
Detector thread::DEBUG::2015-08-17 11:07:29,722::protocoldetector::187::vds.MultiProtocolAcceptor::(_add_connection) Adding connection from 127.0.0.1:35476
Detector thread::DEBUG::2015-08-17 11:07:29,736::protocoldetector::201::vds.MultiProtocolAcceptor::(_remove_connection) Connection removed from 127.0.0.1:35476
Detector thread::DEBUG::2015-08-17 11:07:29,736::protocoldetector::247::vds.MultiProtocolAcceptor::(_handle_connection_read) Detected protocol xml from 127.0.0.1:35476
Detector thread::DEBUG::2015-08-17 11:07:29,736::BindingXMLRPC::1173::XmlDetector::(handleSocket) xml over http detected from ('127.0.0.1', 35476)
Thread-17::DEBUG::2015-08-17 11:07:29,776::BindingXMLRPC::1133::vds::(wrapper) client [127.0.0.1]::call setSafeNetworkConfig with () {}
Thread-17::ERROR::2015-08-17 11:07:29,776::BindingXMLRPC::1152::vds::(wrapper) unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 1136, in wrapper
  File "/usr/share/vdsm/rpc/BindingXMLRPC.py", line 562, in setSafeNetworkConfig
  File "/usr/share/vdsm/API.py", line 1541, in setSafeNetworkConfig
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
  File "<string>", line 2, in setSafeNetworkConfig
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 722, in _callmethod
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 709, in _connect
  File "/usr/lib64/python2.6/multiprocessing/connection.py", line 143, in Client
  File "/usr/lib64/python2.6/multiprocessing/connection.py", line 270, in SocketClient
  File "<string>", line 1, in connect
error: [Errno 2] No such file or directory

supervdsm shows:

MainProcess|Thread-16::INFO::2015-08-17 11:07:29,479::netconfpersistence::76::root::(setNetwork) Adding network rhevm({'nic': 'p1p1', 'bootproto': 'dhcp', 'blockingdhcp': True})
MainProcess|Thread-16::DEBUG::2015-08-17 11:07:29,479::utils::739::root::(execCmd) /sbin/ip route show to 0.0.0.0/0 table all (cwd None)
MainProcess|Thread-16::DEBUG::2015-08-17 11:07:29,484::utils::759::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::INFO::2015-08-17 11:07:29,492::netconfpersistence::166::root::(_clearDisk) Clearing /var/run/vdsm/netconf/nets/ and /var/run/vdsm/netconf/bonds/
MainProcess|Thread-16::DEBUG::2015-08-17 11:07:29,492::netconfpersistence::174::root::(_clearDisk) No existent config to clear.
MainProcess|Thread-16::INFO::2015-08-17 11:07:29,493::netconfpersistence::190::root::(save) Saved new config RunningConfig({'rhevm': {'nic': 'p1p1', 'bootproto': 'dhcp', 'blockingdhcp': True}}, {}) to /var/run/vdsm/netconf/nets/ and /var/run/vdsm/netconf/bonds/
MainProcess|Thread-16::DEBUG::2015-08-17 11:07:29,493::utils::739::root::(execCmd) /usr/libexec/vdsm/hooks/after_network_setup/30_ethtool_options (cwd None)
MainProcess|Thread-16::DEBUG::2015-08-17 11:07:29,704::utils::759::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::INFO::2015-08-17 11:07:29,704::hooks::100::root::(_runHooksDir)
MainProcess|Thread-16::DEBUG::2015-08-17 11:07:29,705::supervdsmServer::109::SuperVdsm.ServerCallback::(wrapper) return setupNetworks with None

This looks like a protocol error between supervdsmd and vdsmd, or some sort of race condition. Moving to the vdsm / network vertical.

Ido, could this be related to your latest network changes?

Hardly. The fact that there are multiple instances of supervdsm needs to be resolved first.

Should this issue be resolved by the following commit? https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=commit;h=c6f3e6eba71c1166281f4ef7e9be1ad7000f4e77 Please properly assign and ack the bug.

Please disregard comment #22; it is not related to this bug.

(In reply to Barak Korren from comment #21)
> should this issue be resolved by the following commit?
> https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=commit;h=c6f3e6eba71c1166281f4ef7e9be1ad7000f4e77
> Please properly assign and ack the bug.

Not at all - the patch is unrelated to this bug, and this bug is still under deliberation.
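The "error: [Errno 2] No such file or directory" in the setSafeNetworkConfig traceback comes from multiprocessing's SocketClient calling connect() on a Unix socket whose file no longer exists, i.e. supervdsm's socket was already gone when vdsm called back into it. A minimal sketch of that error, assuming a hypothetical socket path in place of the real /var/run/vdsm/svdsm.sock:

```python
import errno
import os
import socket
import tempfile

# Hypothetical stand-in for supervdsm's socket file after the daemon
# died and its socket was removed; this path is guaranteed not to exist.
missing_path = os.path.join(tempfile.mkdtemp(), "svdsm.sock")

client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    # With no socket file present, connect() fails with errno 2 (ENOENT),
    # matching "error: [Errno 2] No such file or directory" above.
    client.connect(missing_path)
except OSError as e:
    print(e.errno == errno.ENOENT)  # True
finally:
    client.close()
```

Note the distinction: a missing socket file gives ENOENT, while a present file with no listener behind it would give ECONNREFUSED instead, so the errno pins down that the socket file itself had vanished.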
When setSafeNetworkConfig is sent from vdsm to supervdsm, nothing shows in the logs - for more than an hour:

MainProcess|Thread-16::DEBUG::2015-08-19 08:48:51,050::supervdsmServer::109::SuperVdsm.ServerCallback::(wrapper) return setupNetworks with None
MainThread::DEBUG::2015-08-19 10:04:15,180::supervdsmServer::462::SuperVdsm.Server::(main) Terminated normally

I don't know what caused supervdsm to die, and what kept it silent for so long.

(In reply to Nikolai Sednev from comment #23)
> Please disregard comment #22, not related to this bug.

So why did you move this bug to hosted-engine?

(In reply to Dan Kenigsberg from comment #25)
> I don't know what caused supervdsm to die, and what kept it silent for so long.

Nikolai, is this bug reproducible?

It might have happened automatically; I didn't try to change anything. Regarding reproduction, I tried only once and got this behavior.

(In reply to Sandro Bonazzola from comment #26)
> So why did you move this bug to hosted-engine? [...] Nikolai, is this bug reproducible?

Sandro, it's reproducible without any problem:
[ ERROR ] Failed to execute stage 'Misc configuration': Failed to persist network configuration. Error code: "16" message: "Unexpected exception"

Hello, all. I've looked around for the double-supervdsm issue and I have a pretty good idea why it happens.
First of all, we should note that in boot.log, vdsm fails to start with the following error:

---------cut here---------
Loading the softdog kernel module: [ OK ]
Starting wdmd: [ OK ]
Starting ovirt-post: [ OK ]
Starting sanlock: [ OK ]
Starting virt-who: [ OK ]
Starting ovirt-cim: [ OK ]
Starting ovirt-firstboot: [ OK ]
Starting RHN Managed Xen Domains: [ OK ]
supervdsm start [ OK ]
cannot start vdsm, operation is locked [FAILED]
---------cut here---------

This log suggests that vdsmd was started before sysvinit got to starting it, and indeed vdsmd is up when the system starts; since sysvinit did not start it, someone else did. I've looked around, and apparently 'ovirt-post' starts vdsmd in the background, so by the time sysvinit gets to it, the vdsmd start command fails due to a lock. The first start happens as part of the on-boot hook of ovirt-node:

---------cut here---------
[root@localhost libexec]# cat /usr/libexec/ovirt-node/hooks/on-boot/90-start-vdsm
#[...]
service vdsmd start &

exit 0
---------cut here---------

The second instance of supervdsm is spawned because:
* when the ovirt-node hook starts vdsmd, vdsmd checks whether supervdsm is up (it's not) and starts it;
* then sysvinit gets to starting supervdsm, and there is no lock (unlike the vdsm service), so with a little luck the service can be started twice if the timing is right.

I think we should evaluate whether it is ovirt-node's responsibility to start vdsmd, and if it is, the ovirt-post stage should be sequenced after the vdsm/supervdsm services are started. The current start order is (alphabetical):

---------cut here---------
lrwxrwxrwx. 1 root root 20 2015-08-20 14:10 S98ovirt-post -> ../init.d/ovirt-post
lrwxrwxrwx. 1 root root 17 2015-08-20 14:10 S98sanlock -> ../init.d/sanlock
lrwxrwxrwx. 1 root root 18 2015-08-20 14:10 S98virt-who -> ../init.d/virt-who
lrwxrwxrwx. 1 root root 11 2015-08-14 03:13 S99local -> ../rc.local
lrwxrwxrwx. 1 root root 19 2015-08-20 14:10 S99ovirt-cim -> ../init.d/ovirt-cim
lrwxrwxrwx. 1 root root 25 2015-08-14 03:15 S99ovirt-firstboot -> ../init.d/ovirt-firstboot
lrwxrwxrwx. 1 root root 33 2015-08-14 03:14 S99rhn-virtualization-host -> ../init.d/rhn-virtualization-host
lrwxrwxrwx. 1 root root 20 2015-08-14 03:15 S99supervdsmd -> ../init.d/supervdsmd
lrwxrwxrwx. 1 root root 15 2015-08-14 03:15 S99vdsmd -> ../init.d/vdsmd
---------cut here---------

Hi Dima,

(In reply to Dima Kuznetsov from comment #29)
> Hello, all. I've looked around for the double-supervdsm issue and I have a pretty good idea why it happens.
> [full analysis quoted above]

Thanks for your input. I could reproduce the problem right now, and I don't see the message "cannot start vdsm, operation is locked" in /var/log/boot.log from this fresh install, although your logic seems good and this error might be triggered by different scenarios. I will do some tests/debugging now.

(In reply to Douglas Schilling Landgraf from comment #30)
> Thanks for your input, I could reproduce the problem right now [...] I will do some tests/debugging now.

Unfortunately, that was not exactly the reproducer; my storage was not able to communicate, so I got a different error.

@Nikolai, I still cannot see the error you shared below.
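The asymmetry Dima describes — vdsmd's init script refuses a second start ("cannot start vdsm, operation is locked") while supervdsmd's has no such lock — is a classic double-start race. A minimal sketch of the kind of guard that would close it, using flock on a hypothetical lock file (the real init scripts are shell, but the mechanism is the same; the path and helper name here are illustrative, not taken from vdsm):

```python
import fcntl
import os
import tempfile

# Hypothetical lock file; a real sysvinit script would use something
# like /var/lock/subsys/supervdsmd instead.
lock_path = os.path.join(tempfile.mkdtemp(), "supervdsmd.lock")

def try_start(lock_file):
    """Return True if this caller won the startup lock,
    False if another 'service start' already holds it."""
    try:
        # LOCK_NB makes the losing contender fail fast instead of queueing
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False

# The first start (e.g. the ovirt-node on-boot hook) takes the lock...
first = open(lock_path, "w")
print(try_start(first))   # True

# ...so the racing sysvinit start is refused instead of spawning a twin.
second = open(lock_path, "w")
print(try_start(second))  # False
```

Because flock locks belong to the open file description, two independent opens conflict even within one process, which is what makes this a reliable mutual-exclusion primitive for "has someone already started this service?" checks.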
I noticed you triggered the HE manually via console in comment #11 (in a fresh install of RHEV-H). Did you see this error when triggering the HE via the TUI too?

"""
[ ERROR ] Failed to execute stage 'Misc configuration': Failed to persist network configuration. Error code: "16" message: "Unexpected exception"
[ INFO ] Stage: Clean
"""

Virt QE also helped trying to reproduce the issue, with the error from the bug description, through the Hosted-Engine TUI; so far we haven't reproduced it.

Steps:
1. TUI clean-installed rhevh 6.7 on a local-disk machine.
2. DHCP network setup via the RHEVH TUI, em1. Note: not a bond or VLAN network, because that has another bug that reproduces 100%: Bug 1254523.
3. Set root password in the RHEVH -> RHEVM TUI.
4. Start hosted engine setup in the Hosted Engine TUI using the rhevm-appliance ova.
5. Use default options for the HE setup process.
6. HE setup completes successfully; we do not encounter the issue from the bug description.

Not sure what exact action triggers this bug, so not sure of the exact reproducer here. Thanks.

(In reply to Douglas Schilling Landgraf from comment #31)
> @Nikolai, I still cannot see the error you shared below. I noticed you triggered the HE manually via console in comment #11 (in a fresh install of rhev-h). Did you see this error triggering the HE via TUI too?

I'm not using the TUI, only CLI-based deployment.

(In reply to Ying Cui from comment #32)
> Virt QE also helped to try to reproduce the issue through Hosted-Engine TUI [steps quoted above]

Please follow the original reproduction steps.
Deploy using CLI.
Deploy using NFS.
Deploy using PXE.

(In reply to Nikolai Sednev from comment #34)
> Please follow the original reproduction steps. Deploy using CLI. Deploy using NFS. Deploy using PXE.

The only supported path is via the TUI. Please try to reproduce via the TUI asap.
The deployment via CLI passed OK at the first RHEVH6.7 with the fix https://gerrit.ovirt.org/#/c/45202/ , thanks to Yeela's efforts. (In reply to Nikolai Sednev from comment #36) > The deployment via CLI passed OK at the first RHEVH6.7 with the fix > https://gerrit.ovirt.org/#/c/45202/ , thanks to Yeela's efforts. Please recreate with TUI. Yaniv, why CLI based HE-deployment not blocked in RHEVHs if the only way of deploying it is via TUI? (In reply to Nikolai Sednev from comment #39) > Yaniv, why CLI based HE-deployment not blocked in RHEVHs if the only way of > deploying it is via TUI? It's not supported users go to shell and type hosted-engine --deploy. We created a flow via TUI with all adjusts to trigger the hosted-engine --deploy. (In reply to Douglas Schilling Landgraf from comment #40) > (In reply to Nikolai Sednev from comment #39) > > Yaniv, why CLI based HE-deployment not blocked in RHEVHs if the only way of > > deploying it is via TUI? > > It's not supported users go to shell and type hosted-engine --deploy. We > created a flow via TUI with all adjusts to trigger the hosted-engine > --deploy. I didn't saw this error happening if deployment triggered by the TUI. (In reply to Nikolai Sednev from comment #41) > (In reply to Douglas Schilling Landgraf from comment #40) > > (In reply to Nikolai Sednev from comment #39) > > > Yaniv, why CLI based HE-deployment not blocked in RHEVHs if the only way of > > > deploying it is via TUI? > > > > It's not supported users go to shell and type hosted-engine --deploy. We > > created a flow via TUI with all adjusts to trigger the hosted-engine > > --deploy. > > I didn't saw this error happening if deployment triggered by the TUI. Then this is not a real bug. 
(In reply to Yaniv Dary from comment #42)
> (In reply to Nikolai Sednev from comment #41)
> > (In reply to Douglas Schilling Landgraf from comment #40)
> > > (In reply to Nikolai Sednev from comment #39)
> > > > Yaniv, why CLI based HE-deployment not blocked in RHEVHs if the only way of
> > > > deploying it is via TUI?
> > >
> > > It's not supported users go to shell and type hosted-engine --deploy. We
> > > created a flow via TUI with all adjusts to trigger the hosted-engine
> > > --deploy.
> >
> > I didn't saw this error happening if deployment triggered by the TUI.
>
> Then this is not a real bug.

I've added a second host via TUI; it got added to the HE engine and then I saw it come up as active. After a few seconds the host became non-operational and was reported by the engine as:

2015-Aug-25, 09:56
Host hosted_engine_2 moved to Non-Operational state because interfaces which are down are needed by required networks in the current cluster: 'p1p1 (rhevm)'.

I've checked that the host was UP by logging in to it over ssh and continuously pinging it; it was UP. I activated the host within the engine via the web UI; it became active for a few seconds and then again was set by the engine to non-operational with the same error, while the host was UP and not even a single ping was lost. After some time I reactivated the host within the engine again and it stayed UP.

I've checked that I still have several running vdsmd and supervdsm processes on both hosts. It might be that when using the TUI we're not hitting the race, but the race is there, and Yeela's patch will resolve the multiple-vdsm-instances issue.

I've retried adding the second host after another reprovision and got really weird results:

[ INFO ] Still waiting for VDSM host to become operational...
The host hosted_engine_2 is in non-operational state.
Please try to activate it via the engine webadmin UI.
Retry checking host status or ignore this and continue (Retry, Ignore)[Retry]?
The host hosted_engine_2 is in non-operational state.
Please try to activate it via the engine webadmin UI.
Retry checking host status or ignore this and continue (Retry, Ignore)[Retry]?
The host hosted_engine_2 is in non-operational state.
Please try to activate it via the engine webadmin UI.
Retry checking host status or ignore this and continue (Retry, Ignore)[Retry]?

I saw some errors within the ovirt-ha-broker:

Thread-7::ERROR::2015-08-25 15:47:34,494::notifications::30::ovirt_hosted_engine_ha.broker.notifications.Notifications::(send_email) [Errno 111] Connection refused
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 21, in send_email
  File "/usr/lib64/python2.6/smtplib.py", line 244, in __init__
  File "/usr/lib64/python2.6/smtplib.py", line 300, in connect
  File "/usr/lib64/python2.6/smtplib.py", line 278, in _get_socket
  File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
error: [Errno 111] Connection refused
Thread-67::INFO::2015-08-25 15:47:41,950::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-67::INFO::2015-08-25 15:47:41,955::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-68::INFO::2015-08-25 15:47:41,956::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-68::INFO::2015-08-25 15:47:41,957::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-69::INFO::2015-08-25 15:47:41,958::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-69::INFO::2015-08-25 15:47:41,959::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-10::INFO::2015-08-25 15:47:41,968::mem_free::53::mem_free.MemFree::(action) memFree: 31278

(In reply to Nikolai Sednev from comment #43)
> (In reply to
Yaniv Dary from comment #42) > > (In reply to Nikolai Sednev from comment #41) > > > (In reply to Douglas Schilling Landgraf from comment #40) > > > > (In reply to Nikolai Sednev from comment #39) > > > > > Yaniv, why CLI based HE-deployment not blocked in RHEVHs if the only way of > > > > > deploying it is via TUI? > > > > > > > > It's not supported users go to shell and type hosted-engine --deploy. We > > > > created a flow via TUI with all adjusts to trigger the hosted-engine > > > > --deploy. > > > > > > I didn't saw this error happening if deployment triggered by the TUI. > > > > Then this is not a real bug. > > I've added a second host via TUI, got it added to the HE engine and then saw > it came to active. > After few seconds host became non-operational and was reported by the engine > as: > > 2015-Aug-25, 09:56 > > Host hosted_engine_2 moved to Non-Operational state because interfaces which > are down are needed by required networks in the current cluster: 'p1p1 > (rhevm)'. > > I've checked that host is UP by logging in to it over ssh and continuously > pinging it, it was UP. I've activated the host within the engine via WEBUI, > it bacame active for a few seconds and then again was set by the engine in > to not-operational with the same error, while host was UP and not even a > single ping was lost. ping against ip or fqdn? in both sides? > After some time I reactivated the host within the engine again and it stayed > UP. > > I've checked that I still have several running vdsmds and supervdsm > processes on both hosts, might be that using TUI we're not hitting the race, > but it's there and Yeela's patch will resolve multiple vdsm instances issue. Please provide the vdsm, ovirt-hosted-engine-ha, ovirt-hosted-engine-setup and engine logs (and even access to machine). If it's a different report we probably need a different bug as it might create a confusion. 
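For context on the race discussed above: the "error: [Errno 98] Address already in use" traceback from comment 3 is the generic symptom of a second daemon trying to bind a Unix socket that a first instance already owns, which is exactly what happens when supervdsm is started twice. A minimal sketch of that failure mode (the socket path here is illustrative, not vdsm's real startup code):

```python
import errno
import os
import socket
import tempfile

def bind_unix_socket(path):
    # Create, bind, and listen on a Unix-domain stream socket, the same
    # low-level steps supervdsm's multiprocessing listener performs for
    # its control socket (svdsm.sock).
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.bind(path)
    s.listen(1)
    return s

# Illustrative path; the real daemon is started with
# --sockfile /var/run/vdsm/svdsm.sock.
sock_path = os.path.join(tempfile.mkdtemp(), "svdsm.sock")

first = bind_unix_socket(sock_path)   # first daemon instance owns the socket

try:
    bind_unix_socket(sock_path)       # second instance races on the same path
    second_bind_errno = None
except OSError as e:
    second_bind_errno = e.errno       # EADDRINUSE (98 on Linux)
finally:
    first.close()
```

The second bind fails immediately, which is why the duplicate supervdsm logs "Could not start Super Vdsm" while both instances keep writing into the same log file.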
(In reply to Douglas Schilling Landgraf from comment #45) > (In reply to Nikolai Sednev from comment #43) > > (In reply to Yaniv Dary from comment #42) > > > (In reply to Nikolai Sednev from comment #41) > > > > (In reply to Douglas Schilling Landgraf from comment #40) > > > > > (In reply to Nikolai Sednev from comment #39) > > > > > > Yaniv, why CLI based HE-deployment not blocked in RHEVHs if the only way of > > > > > > deploying it is via TUI? > > > > > > > > > > It's not supported users go to shell and type hosted-engine --deploy. We > > > > > created a flow via TUI with all adjusts to trigger the hosted-engine > > > > > --deploy. > > > > > > > > I didn't saw this error happening if deployment triggered by the TUI. > > > > > > Then this is not a real bug. > > > > I've added a second host via TUI, got it added to the HE engine and then saw > > it came to active. > > After few seconds host became non-operational and was reported by the engine > > as: > > > > 2015-Aug-25, 09:56 > > > > Host hosted_engine_2 moved to Non-Operational state because interfaces which > > are down are needed by required networks in the current cluster: 'p1p1 > > (rhevm)'. > > > > I've checked that host is UP by logging in to it over ssh and continuously > > pinging it, it was UP. I've activated the host within the engine via WEBUI, > > it bacame active for a few seconds and then again was set by the engine in > > to not-operational with the same error, while host was UP and not even a > > single ping was lost. > > ping against ip or fqdn? in both sides? > > > After some time I reactivated the host within the engine again and it stayed > > UP. > > > > I've checked that I still have several running vdsmds and supervdsm > > processes on both hosts, might be that using TUI we're not hitting the race, > > but it's there and Yeela's patch will resolve multiple vdsm instances issue. 
>
> Please provide the vdsm, ovirt-hosted-engine-ha, ovirt-hosted-engine-setup
> and engine logs (and even access to machine). If it's a different report we
> probably need a different bug as it might create a confusion.

Ping was against both IP and FQDN, and ping is ICMP: if the packet is sent and received, then both sides are getting the packet and replying. Regarding access to the machines, please contact me via IRC for more details. Regarding more logs, I'll supply them shortly.

A second clean RHEVH6.7 addition attempt has failed during its initial deployment; it did not even get to adding the host to the engine. Logs are attached (named "logs").

Created attachment 1067163 [details]
logs
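A side note on the ovirt-ha-broker traceback quoted above: "[Errno 111] Connection refused" out of smtplib just means that nothing was listening on the SMTP host/port the broker's notifier was configured with; it is separate from the network-persistence failure this bug tracks. A minimal reproduction (the ephemeral-port trick is only an illustrative way to pick a port with no listener):

```python
import errno
import smtplib
import socket

# Grab an ephemeral port and release it, so that (very likely)
# nothing is listening there when we connect below.
probe = socket.socket()
probe.bind(("127.0.0.1", 0))
unused_port = probe.getsockname()[1]
probe.close()

# The broker's send_email constructs smtplib.SMTP(host, port), which
# connects during __init__; with no SMTP daemon the connect is refused.
try:
    smtplib.SMTP("127.0.0.1", unused_port, timeout=5)
    connect_errno = None
except OSError as e:
    connect_errno = e.errno           # ECONNREFUSED (111 on Linux)
```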
Bug is already merged, and it is 3.5 only. Pavel - can you QE-ack?

Works for me on these components:
vdsm-4.16.26-1.el6ev.x86_64
sanlock-2.8-2.el6_5.x86_64
libvirt-client-0.10.2-54.el6.x86_64
mom-0.4.1-5.el6ev.noarch
qemu-kvm-rhev-0.12.1.2-2.479.el6.x86_64
ovirt-hosted-engine-setup-1.2.5.3-1.el6ev.noarch
ovirt-hosted-engine-ha-1.2.6-3.el6ev.noarch
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20150828.0.el6ev)

Engine:
rhevm-3.5.4.2-1.3.el6ev.noarch
rhevm-setup-3.5.4.2-1.3.el6ev.noarch
rhevm-guest-agent-common-1.0.10-2.el6ev.noarch
Linux version 2.6.32-573.3.1.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) ) #1 SMP Mon Aug 10 09:44:54 EDT 2015

I've tested the deployment using CLI and TUI; the original bug symptoms weren't reproduced. Also there was no race, and only one instance each of vdsmd and supervdsm was running, as desired:

45652 vdsm   0 -20 2587m  54m  10m S  4.6  0.2  0:47.46 vdsm
46262 vdsm  20   0  144m  14m 2580 S  0.7  0.0  0:03.63 ovirt-ha-agent
46319 vdsm  20   0  724m  18m 2824 S  0.3  0.1  0:08.74 ovirt-ha-broker

# ps aux | grep vdsm
root  45450  0.0  0.0  11304   820 ?  S<  06:05  0:00 /bin/bash -e /usr/share/vdsm/respawn --minlifetime 10 --daemon --masterpid /var/run/vdsm/supervdsm_respawn.pid /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock --pidfile /var/run/vdsm/supervdsmd.pid
root  45452  0.1  0.1 774232 65200 ?  S<l 06:05  0:01 /usr/bin/python /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock --pidfile /var/run/vdsm/supervdsmd.pid
vdsm  45649  0.0  0.0   9200   804 ?  S<  06:06  0:00 /bin/bash -e /usr/share/vdsm/respawn --minlifetime 10 --daemon --masterpid /var/run/vdsm/respawn.pid /usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid
vdsm  45652  4.0  0.1 2649108 55924 ? S<l 06:06  0:35 /usr/bin/python /usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid
vdsm  45721  0.0  0.0 394568  1420 ?  S<l 06:06  0:00 /usr/libexec/ioprocess --read-pipe-fd 44 --write-pipe-fd 42 --max-threads 10 --max-queued-requests 10
vdsm  45728  0.0  0.0 470348  3540 ?  S<l 06:06  0:00 /usr/libexec/ioprocess --read-pipe-fd 52 --write-pipe-fd 51 --max-threads 10 --max-queued-requests 10
vdsm  46262  0.3  0.0 148152 14636 ?  S   06:07  0:02 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
vdsm  46306  0.0  0.0 394568  3444 ?  S<l 06:07  0:00 /usr/libexec/ioprocess --read-pipe-fd 63 --write-pipe-fd 62 --max-threads 10 --max-queued-requests 10
vdsm  46319  0.7  0.0 742328 18420 ?  Sl  06:07  0:06 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker

I've tested on NFS3 and with two hosts.

RHEV 3.5.4 Released. closing current release.
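The single-instance check performed above (grepping `ps` output while ignoring the respawn wrapper shells) can be expressed as a small helper. The helper and the abbreviated sample lines below are purely illustrative, not part of vdsm:

```python
def count_daemon_instances(ps_lines, daemon):
    # Count real daemon processes, skipping the /usr/share/vdsm/respawn
    # wrapper shells that merely supervise them.
    return sum(
        1
        for line in ps_lines
        if daemon in line and "respawn" not in line
    )

# Abbreviated from the `ps aux | grep vdsm` output in the verification above.
ps_lines = [
    "root  45450  /bin/bash -e /usr/share/vdsm/respawn ... /usr/share/vdsm/supervdsmServer",
    "root  45452  /usr/bin/python /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock",
    "vdsm  45649  /bin/bash -e /usr/share/vdsm/respawn ... /usr/share/vdsm/vdsm",
    "vdsm  45652  /usr/bin/python /usr/share/vdsm/vdsm --pidfile /var/run/vdsm/vdsmd.pid",
]

supervdsm_count = count_daemon_instances(ps_lines, "supervdsmServer")
vdsm_count = count_daemon_instances(ps_lines, "/usr/share/vdsm/vdsm ")
```

With the fixed build, both counts come out to one, which is the "no race, single instance" condition used to verify this bug.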