Bug 1254888

Summary: live migration Hosted Engine VM failed
Product: Red Hat Enterprise Virtualization Manager
Component: vdsm
Version: 3.5.4
Status: CLOSED NOTABUG
Severity: high
Priority: medium
Reporter: Ying Cui <ycui>
Assignee: Francesco Romani <fromani>
QA Contact: Ilanit Stein <istein>
CC: alukiano, bazulay, cshao, dougsland, ecohen, fdeutsch, gklein, huiwa, lpeer, lsurette, mgoldboi, nsednev, ofrenkel, rbarry, sbonazzo, ycui, yeylon, ylavi
Target Milestone: ---
Target Release: 3.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: virt
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-08-23 11:17:46 UTC
Bug Blocks: 1250199
Attachments:
- journalctl
- rhevh_var_log
- sosreport from hypervisor
- engine.log
- screenshot for migration

Description Ying Cui 2015-08-19 07:57:42 UTC
Description of problem:
Hosted-Engine VM migration failed between two Hosts.

Version-Release number of selected component (if applicable):
# rpm -qa ovirt-node ovirt-hosted-engine-setup ovirt-hosted-engine-ha vdsm kernel ovirt-node-plugin-hosted-engine
ovirt-hosted-engine-setup-1.2.5.3-1.el7ev.noarch
vdsm-4.16.24-2.el7ev.x86_64
ovirt-hosted-engine-ha-1.2.6-2.el7ev.noarch
ovirt-node-plugin-hosted-engine-0.2.0-18.0.el7ev.noarch
kernel-3.10.0-229.11.1.el7.x86_64
ovirt-node-3.2.3-18.el7.noarch
# cat /etc/rhev-hypervisor-release 
Red Hat Enterprise Virtualization Hypervisor release 7.1 (20150813.0.el7ev)

How reproducible:
Tested this scenario 4 times on the same machines, repeating step 1 to step 7 each time.
Three times the HE VM migrated successfully; one time the migration failed.
Failure rate around 25%.


Steps to Reproduce:
1. Precondition: all RHEV-H hosts are clean installations; SSH and network are already set up.
2. Set up HE on the first RHEV-H successfully.
    - nfs storage
    - em1 
3. Set up additional HE hosts on the second and third RHEV-H successfully.
4. All of the above RHEV-H hosts are UP in Hosted Engine.
5. login to engine webadmin portal
6. Navigate to "Virtual Machine"
7. Select hosted-engine VM, then click "Migrate" button, then choose a RHEV-H host to migrate the HE VM.

Actual results:
Migration failed with the error "Could not connect to peer host".

Expected results:
Live migration of the Hosted Engine VM succeeds on RHEV-H.

Note: not sure whether this is a RHEV-H-specific issue or whether it can be reproduced on RHEL as well.

Comment 1 Ying Cui 2015-08-19 07:59:40 UTC
Additional info:

# from /var/log/message
<snip>
....
Aug 19 04:49:26 rhevhtest-1 journal: metadata not found: Requested metadata element is not present
Aug 19 04:49:26 rhevhtest-1 journal: Forwarding to syslog missed 6 messages.
Aug 19 04:49:27 rhevhtest-1 journal: vdsm vm.Vm ERROR vmId=`c5312b39-316c-4906-bc20-df2a472bda1f`::Error initiating connection
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 106, in _setupVdsConnection
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1224, in __call__
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1578, in __request
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1264, in request
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1292, in single_request
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1439, in send_content
  File "/usr/lib64/python2.7/httplib.py", line 969, in endheaders
  File "/usr/lib64/python2.7/httplib.py", line 829, in _send_output
  File "/usr/lib64/python2.7/httplib.py", line 791, in send
  File "/usr/share/vdsm/kaxmlrpclib.py", line 151, in connect
  File "/usr/lib/python2.7/site-packages/vdsm/SecureXMLRPCServer.py", line 92, in connect
  File "/usr/lib64/python2.7/socket.py", line 553, in create_connection
gaierror: [Errno -2] Name or service not known
Aug 19 04:49:27 rhevhtest-1 journal: vdsm vm.Vm ERROR vmId=`c5312b39-316c-4906-bc20-df2a472bda1f`::'progress'
Aug 19 04:49:27 rhevhtest-1 journal: vdsm vm.Vm ERROR vmId=`c5312b39-316c-4906-bc20-df2a472bda1f`::Failed to destroy remote VM
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 164, in _recover
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1224, in __call__
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1578, in __request
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1264, in request
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1292, in single_request
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1439, in send_content
  File "/usr/lib64/python2.7/httplib.py", line 969, in endheaders
  File "/usr/lib64/python2.7/httplib.py", line 829, in _send_output
  File "/usr/lib64/python2.7/httplib.py", line 791, in send
  File "/usr/share/vdsm/kaxmlrpclib.py", line 151, in connect
  File "/usr/lib/python2.7/site-packages/vdsm/SecureXMLRPCServer.py", line 92, in connect
  File "/usr/lib64/python2.7/socket.py", line 553, in create_connection
gaierror: [Errno -2] Name or service not known
Aug 19 04:49:27 rhevhtest-1 journal: vdsm vm.Vm ERROR vmId=`c5312b39-316c-4906-bc20-df2a472bda1f`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 231, in run
  File "/usr/share/vdsm/virt/migration.py", line 120, in _setupRemoteMachineParams
  File "/usr/share/vdsm/virt/vm.py", line 2837, in getStats
  File "/usr/share/vdsm/virt/vm.py", line 2883, in _getRunningVmStats
KeyError: 'progress'
</snip>
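
Reading the traceback: _setupVdsConnection fails because socket.create_connection cannot resolve the destination hostname, and the later "KeyError: 'progress'" looks like a follow-on symptom, since the stats dict presumably only gains a 'progress' key once migration data transfer actually starts. A minimal sketch of the cascade (illustrative only, not vdsm source code):

import socket

try:
    # vdsm's XML-RPC client bottoms out in socket.create_connection;
    # an unresolvable destination hostname raises exactly this error.
    socket.create_connection(("unresolvable-peer.example", 54321), timeout=5)
except socket.gaierror as err:
    print(err)  # [Errno -2] Name or service not known

# The migration never reached the copy phase, so the VM stats dict was
# never given a 'progress' key; reading it raises the secondary error.
stats = {"status": "Migration Source"}  # hypothetical minimal stats dict
try:
    progress = stats["progress"]
except KeyError as err:
    print("secondary failure: KeyError: %s" % err)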

# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : rhevhtest-1.redhat.com
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 2400
Local maintenance                  : False
Host timestamp                     : 12309
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=12309 (Wed Aug 19 06:06:05 2015)
        host-id=1
        score=2400
        maintenance=False
        state=EngineUp


--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : rhevhtest-2.redhat.com
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 2400
Local maintenance                  : False
Host timestamp                     : 7451
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=7451 (Wed Aug 19 06:06:01 2015)
        host-id=2
        score=2400
        maintenance=False
        state=EngineDown


--== Host 3 status ==--

Status up-to-date                  : True
Hostname                           : rhevhtest-3.redhat.com
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 2400
Local maintenance                  : False
Host timestamp                     : 6407
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=6407 (Wed Aug 19 06:06:09 2015)
        host-id=3
        score=2400
        maintenance=False
        state=EngineDown

# systemctl status ovirt-ha-agent.service ovirt-ha-broker.service vdsmd.service vdsm-network.service supervdsmd.service
ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled)
   Active: active (running) since Wed 2015-08-19 07:34:17 UTC; 10min ago
  Process: 1824 ExecStart=/usr/lib/systemd/systemd-ovirt-ha-agent start (code=exited, status=0/SUCCESS)
 Main PID: 1944 (ovirt-ha-agent)
   CGroup: /system.slice/ovirt-ha-agent.service
           └─1944 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent

Aug 19 07:34:17 localhost systemd-ovirt-ha-agent[1824]: Starting ovirt-ha-agent: [  OK  ]
Aug 19 07:34:17 localhost systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.

ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled)
   Active: active (running) since Wed 2015-08-19 07:34:17 UTC; 10min ago
  Process: 1178 ExecStart=/usr/lib/systemd/systemd-ovirt-ha-broker start (code=exited, status=0/SUCCESS)
 Main PID: 1822 (ovirt-ha-broker)
   CGroup: /system.slice/ovirt-ha-broker.service
           └─1822 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker

Aug 19 07:34:14 localhost systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
Aug 19 07:34:17 localhost systemd-ovirt-ha-broker[1178]: Starting ovirt-ha-broker: [  OK  ]
Aug 19 07:34:17 localhost systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
Aug 19 07:35:17 rhevhtest-1.redhat.com ovirt-ha-broker[1822]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request...c350d0c1'
                                                              Traceback (most recent call last):
                                                                File "/usr/lib/python2.7/site-packages/ovirt_host...
Aug 19 07:35:20 rhevhtest-1.redhat.com ovirt-ha-broker[1822]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request...c350d0c1'
                                                              Traceback (most recent call last):
                                                                File "/usr/lib/python2.7/site-packages/ovirt_host...
Aug 19 07:35:20 rhevhtest-1.redhat.com ovirt-ha-broker[1822]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request...c350d0c1'
                                                              Traceback (most recent call last):
                                                                File "/usr/lib/python2.7/site-packages/ovirt_host...
Aug 19 07:38:16 rhevhtest-1.redhat.com ovirt-ha-broker[1822]: ovirt-ha-broker cpu_load_no_engine.EngineHealth ERROR Failed to read vm stats: [Errno 2] No such file...c/0/stat'
Aug 19 07:40:58 rhevhtest-1.redhat.com ovirt-ha-broker[1822]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.notifications.Notifications ERROR [Errno 111] Connection refused
                                                              Traceback (most recent call last):
                                                                File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 21, in send_email
Aug 19 07:41:08 rhevhtest-1.redhat.com ovirt-ha-broker[1822]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.notifications.Notifications ERROR [Errno 111] Connection refused
                                                              Traceback (most recent call last):
                                                                File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 21, in send_email
Aug 19 07:41:19 rhevhtest-1.redhat.com ovirt-ha-broker[1822]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.notifications.Notifications ERROR [Errno 111] Connection refused
                                                              Traceback (most recent call last):
                                                                File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/notifications.py", line 21, in send_email

vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Wed 2015-08-19 07:35:05 UTC; 10min ago
  Process: 17114 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 17232 (vdsm)
   CGroup: /system.slice/vdsmd.service
           ├─17232 /usr/bin/python /usr/share/vdsm/vdsm
           ├─17454 /usr/libexec/ioprocess --read-pipe-fd 57 --write-pipe-fd 56 --max-threads 10 --max-queued-requests 10
           ├─18272 /usr/libexec/ioprocess --read-pipe-fd 46 --write-pipe-fd 45 --max-threads 10 --max-queued-requests 10
           └─18386 /usr/libexec/ioprocess --read-pipe-fd 59 --write-pipe-fd 57 --max-threads 10 --max-queued-requests 10

Aug 19 07:37:30 rhevhtest-1.redhat.com python[17232]: DIGEST-MD5 ask_user_info()
Aug 19 07:37:30 rhevhtest-1.redhat.com python[17232]: DIGEST-MD5 make_client_response()
Aug 19 07:37:30 rhevhtest-1.redhat.com python[17232]: DIGEST-MD5 client step 3
Aug 19 07:37:30 rhevhtest-1.redhat.com vdsm[17232]: vdsm vm.Vm WARNING vmId=`c5312b39-316c-4906-bc20-df2a472bda1f`::Unknown type found, device: '{'device': 'unix'...'}}' found
Aug 19 07:37:30 rhevhtest-1.redhat.com vdsm[17232]: vdsm vm.Vm WARNING vmId=`c5312b39-316c-4906-bc20-df2a472bda1f`::Unknown type found, device: '{'device': 'unix'...'}}' found
Aug 19 07:37:30 rhevhtest-1.redhat.com vdsm[17232]: vdsm vm.Vm ERROR vmId=`c5312b39-316c-4906-bc20-df2a472bda1f`::Alias not found for device type graphics during ...ation host
Aug 19 07:39:19 rhevhtest-1.redhat.com vdsm[17232]: vdsm vm.Vm ERROR vmId=`c5312b39-316c-4906-bc20-df2a472bda1f`::Error initiating connection
                                                    Traceback (most recent call last):
                                                      File "/usr/share/vdsm/virt/migration.py", line 106, in _setupVdsConnection...
Aug 19 07:39:19 rhevhtest-1.redhat.com vdsm[17232]: vdsm vm.Vm ERROR vmId=`c5312b39-316c-4906-bc20-df2a472bda1f`::'progress'
Aug 19 07:39:19 rhevhtest-1.redhat.com vdsm[17232]: vdsm vm.Vm ERROR vmId=`c5312b39-316c-4906-bc20-df2a472bda1f`::Failed to destroy remote VM
                                                    Traceback (most recent call last):
                                                      File "/usr/share/vdsm/virt/migration.py", line 164, in _recover...
Aug 19 07:39:19 rhevhtest-1.redhat.com vdsm[17232]: vdsm vm.Vm ERROR vmId=`c5312b39-316c-4906-bc20-df2a472bda1f`::Failed to migrate
                                                    Traceback (most recent call last):
                                                      File "/usr/share/vdsm/virt/migration.py", line 231, in run...

vdsm-network.service - Virtual Desktop Server Manager network restoration
   Loaded: loaded (/usr/lib/systemd/system/vdsm-network.service; enabled)
   Active: active (exited) since Wed 2015-08-19 07:34:59 UTC; 10min ago
  Process: 16482 ExecStart=/usr/bin/vdsm-tool restore-nets (code=exited, status=0/SUCCESS)
  Process: 16367 ExecStartPre=/usr/bin/vdsm-tool --vvverbose --append --logfile=/var/log/vdsm/upgrade.log upgrade-3.0.0-networks (code=exited, status=0/SUCCESS)
  Process: 14532 ExecStartPre=/usr/bin/vdsm-tool --vvverbose --append --logfile=/var/log/vdsm/upgrade.log upgrade-unified-persistence (code=exited, status=0/SUCCESS)
 Main PID: 16482 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/vdsm-network.service
           └─17070 /sbin/dhclient -H rhevhtest-1 -1 -q -lf /var/lib/dhclient/dhclient--rhevm.lease -pf /var/run/dhclient-rhevm.pid rhevm

Aug 19 07:34:50 rhevhtest-1.redhat.com python[16494]: DIGEST-MD5 client step 2
Aug 19 07:34:50 rhevhtest-1.redhat.com python[16494]: DIGEST-MD5 ask_user_info()
Aug 19 07:34:50 rhevhtest-1.redhat.com python[16494]: DIGEST-MD5 make_client_response()
Aug 19 07:34:50 rhevhtest-1.redhat.com python[16494]: DIGEST-MD5 client step 3
Aug 19 07:34:56 rhevhtest-1.redhat.com dhclient[16993]: DHCPREQUEST on rhevm to 255.255.255.255 port 67 (xid=0x5f4e9b1a)
Aug 19 07:34:56 rhevhtest-1.redhat.com dhclient[16993]: DHCPACK from 10.66.11.253 (xid=0x5f4e9b1a)
Aug 19 07:34:58 rhevhtest-1.redhat.com dhclient[16993]: bound to 10.66.11.167 -- renewal in 36332 seconds.
Aug 19 07:34:59 rhevhtest-1.redhat.com python[16494]: DIGEST-MD5 client mech dispose
Aug 19 07:34:59 rhevhtest-1.redhat.com python[16494]: DIGEST-MD5 common mech dispose
Aug 19 07:34:59 rhevhtest-1.redhat.com systemd[1]: Started Virtual Desktop Server Manager network restoration.

supervdsmd.service - "Auxiliary vdsm service for running helper functions as root"
   Loaded: loaded (/usr/lib/systemd/system/supervdsmd.service; static)
   Active: active (running) since Wed 2015-08-19 07:34:47 UTC; 10min ago
 Main PID: 14533 (supervdsmServer)
   CGroup: /system.slice/supervdsmd.service
           └─14533 /usr/bin/python /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock

Aug 19 07:34:47 rhevhtest-1.redhat.com systemd[1]: Started "Auxiliary vdsm service for running helper functions as root".
Aug 19 07:34:47 rhevhtest-1.redhat.com python[14533]: DIGEST-MD5 client step 2
Aug 19 07:34:47 rhevhtest-1.redhat.com python[14533]: DIGEST-MD5 parse_server_challenge()
Aug 19 07:34:47 rhevhtest-1.redhat.com python[14533]: DIGEST-MD5 ask_user_info()
Aug 19 07:34:47 rhevhtest-1.redhat.com python[14533]: DIGEST-MD5 client step 2
Aug 19 07:34:47 rhevhtest-1.redhat.com python[14533]: DIGEST-MD5 ask_user_info()
Aug 19 07:34:47 rhevhtest-1.redhat.com python[14533]: DIGEST-MD5 make_client_response()
Aug 19 07:34:47 rhevhtest-1.redhat.com python[14533]: DIGEST-MD5 client step 3
Hint: Some lines were ellipsized, use -l to show in full.

Comment 2 Ying Cui 2015-08-19 08:00:38 UTC
Created attachment 1064650 [details]
journalctl

Comment 3 Ying Cui 2015-08-19 08:01:21 UTC
Created attachment 1064651 [details]
rhevh_var_log

Comment 4 Ying Cui 2015-08-19 08:03:16 UTC
Created attachment 1064652 [details]
sosreport from hypervisor

Comment 5 Ying Cui 2015-08-19 08:05:19 UTC
Created attachment 1064655 [details]
engine.log

Comment 6 Ying Cui 2015-08-19 08:10:43 UTC
Created attachment 1064669 [details]
screenshot for migration

Comment 7 Sandro Bonazzola 2015-08-19 12:41:07 UTC
Doesn't seem integration related; moving to virt for now.

Comment 8 Omer Frenkel 2015-08-20 06:32:42 UTC
Does it fail between specific hosts, or is it failing randomly?

Comment 9 Ying Cui 2015-08-20 08:43:14 UTC
(In reply to Omer Frenkel from comment #8)
> does it fail between specific hosts? or its randomly failing?

It fails between any two hosts.

See the screenshot in attachment 1064669 [details].
There are 3 RHEV-H hosts with the HE VM:

rhevh-1 is running the HE VM
tried to migrate the HE VM to rhevh-2: failed
tried to migrate the HE VM to rhevh-3: failed

Comment 10 Francesco Romani 2015-08-20 13:06:57 UTC
These errors from /var/log/messages, already quoted in comment 1 (the tracebacks ending in "gaierror: [Errno -2] Name or service not known" and "KeyError: 'progress'"), are low-level network errors.

Looking in /var/log/messages, it seems there are name-resolution errors from unrelated services as well. E.g.:

Aug 19 06:21:38 rhevhtest-1 python: Error in communication with subscription manager, trying to recover:
(this repeats many times after)

Can the hosts ping each other (e.g. rhevhtest-1 to rhevhtest-2)?

Any selinux denials in sight? (probably not, but need to ask anyway...)
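
Beyond plain ping, a quick check of what migration setup actually needs (resolving the peer's hostname and then reaching its vdsm port) could look like this; a hypothetical diagnostic sketch, assuming the default vdsm XML-RPC port 54321:

import socket

VDSM_PORT = 54321  # default vdsm XML-RPC port (assumed here)

def check_peer(hostname):
    # Step 1: name resolution; this is what raised the gaierror above.
    try:
        addr = socket.getaddrinfo(hostname, VDSM_PORT)[0][4]
    except socket.gaierror as err:
        return "%s: cannot resolve (%s)" % (hostname, err)
    # Step 2: TCP reachability of the vdsm port, which ping does not test.
    try:
        sock = socket.create_connection(addr[:2], timeout=5)
        sock.close()
        return "%s: %s:%s reachable" % (hostname, addr[0], addr[1])
    except socket.error as err:
        return "%s: resolves to %s but unreachable (%s)" % (hostname, addr[0], err)

for peer in ("rhevhtest-2.redhat.com", "rhevhtest-3.redhat.com"):
    print(check_peer(peer))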

Comment 11 Ying Cui 2015-08-21 04:38:10 UTC
> Can the hosts ping each other (e.g. rhevtest-1 to rhevtest-2)?

Yes, they can ping each other. You can see this in the hosted-engine --vm-status info as well.

And I have one environment that can reproduce this issue. It will be kept until next Monday, Aug 25, so you can ssh to the hosts to check.

rhevh-test1 with HEVM: 10.66.11.167 root password: redhat
rhevh-test2          : 10.66.10.107 root password: redhat
rhevh-test3          : 10.66.65.29  root password: redhat

HEVM : rhevm-appliance-20150727.0-1.x86_64.rhevm.ova
       10.66.10.105  root password: redhat

# login 10.66.11.167

[root@rhevhtest-1 admin]# ping 10.66.10.107
PING 10.66.10.107 (10.66.10.107) 56(84) bytes of data.
64 bytes from 10.66.10.107: icmp_seq=1 ttl=64 time=0.574 ms
64 bytes from 10.66.10.107: icmp_seq=2 ttl=64 time=0.306 ms
64 bytes from 10.66.10.107: icmp_seq=3 ttl=64 time=0.324 ms

--- 10.66.10.107 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.306/0.401/0.574/0.123 ms

[root@rhevhtest-1 admin]# ping 10.66.65.29
PING 10.66.65.29 (10.66.65.29) 56(84) bytes of data.
64 bytes from 10.66.65.29: icmp_seq=1 ttl=61 time=0.491 ms
64 bytes from 10.66.65.29: icmp_seq=2 ttl=61 time=0.392 ms
64 bytes from 10.66.65.29: icmp_seq=3 ttl=61 time=0.491 ms

--- 10.66.65.29 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.392/0.458/0.491/0.046 ms

[root@rhevhtest-1 admin]# ping 10.66.10.105
PING 10.66.10.105 (10.66.10.105) 56(84) bytes of data.
64 bytes from 10.66.10.105: icmp_seq=1 ttl=64 time=0.186 ms
64 bytes from 10.66.10.105: icmp_seq=2 ttl=64 time=0.165 ms
64 bytes from 10.66.10.105: icmp_seq=3 ttl=64 time=0.154 ms

--- 10.66.10.105 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.154/0.168/0.186/0.017 ms

> Any selinux denials in sight? (probably not, but need to ask anyway...)

Ran setenforce 0 on all RHEV-H hosts, then tried migrating again; the HE VM still cannot be migrated successfully, with the same error in the logs and in the RHEV-M portal UI. Thanks.

Comment 17 Omer Frenkel 2015-08-23 10:56:30 UTC
This seems to be a network configuration issue, so closing.
Please re-open if needed.

Comment 18 Nikolai Sednev 2015-08-23 11:11:24 UTC
Please try reproducing the issue following the steps in section 4.4 of https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.3/html/Installation_Guide/Configuring_the_Self-Hosted_Engine.html

You should not use 
127.0.0.1               localhost.localdomain localhost localhost.localdomain 

You should only use an environment with reserved FQDNs for your hosts and the engine.
You should reserve a MAC address for your engine in the DHCP server, so it will always receive the same IP.
You should update the DNS server records so that your FQDNs are resolvable.
Once you have a properly configured environment, proceed with the HE deployment following the user manual linked above or one of mine: https://mojo.redhat.com/docs/DOC-1013769
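
(If updating DNS is not an option, static hosts entries of this shape on every host would also keep the FQDNs resolvable; the addresses below are the ones from comment 11, and the engine FQDN is a placeholder since it is not given in this bug:)

10.66.11.167   rhevhtest-1.redhat.com
10.66.10.107   rhevhtest-2.redhat.com
10.66.65.29    rhevhtest-3.redhat.com
10.66.10.105   <engine-fqdn>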

It looks like you have an unresolvable host FQDN, hence migration to that host fails.
This does not seem like a bug to me.

Comment 19 Omer Frenkel 2015-08-23 11:17:46 UTC
Nikolai reopened this by mistake.

Comment 20 Ying Cui 2015-11-23 12:30:03 UTC
Replying to comment 18: we already took care of the reserved-FQDN issues 3 months ago. Just cleaning up this needinfo.