Bug 1562787 - Node 0 SHE deployment fails over tagged interface.
Summary: Node 0 SHE deployment fails over tagged interface.
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: 2.2.14
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Ido Rosenzwig
QA Contact: Meni Yakove
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-04-02 13:48 UTC by Nikolai Sednev
Modified: 2018-04-03 15:55 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-04-03 15:55:16 UTC
oVirt Team: Network
Embargoed:
ratamir: blocker?
nsednev: planning_ack?
nsednev: devel_ack?
nsednev: testing_ack?


Attachments
sosreport from puma18 (9.94 MB, application/x-xz), 2018-04-02 13:48 UTC, Nikolai Sednev
sosreport from the engine (9.34 MB, application/x-xz), 2018-04-02 13:48 UTC, Nikolai Sednev

Description Nikolai Sednev 2018-04-02 13:48:03 UTC
Created attachment 1416268 [details]
sosreport from puma18

Description of problem:
Node 0 SHE (self-hosted engine) deployment fails over a VLAN-tagged interface when using the CLI.

[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_vms": [{"affinity_labels": [], "applications": [], "bios": {"boot_menu": {"enabled": false}}, "cdroms": [], "cluster": {"href": "/ovirt-engine/api/clusters/1748a294-3676-11e8-b3bd-00163eeeeee1", "id": "1748a294-3676-11e8-b3bd-00163eeeeee1"}, "cpu": {"architecture": "x86_64", "topology": {"cores": 1, "sockets": 4, "threads": 1}}, "cpu_profile": {"href": "/ovirt-engine/api/cpuprofiles/58ca604e-01a7-003f-01de-000000000250", "id": "58ca604e-01a7-003f-01de-000000000250"}, "cpu_shares": 0, "creation_time": "2018-04-02 16:06:05.791000+03:00", "delete_protected": false, "disk_attachments": [], "display": {"address": "127.0.0.1", "allow_override": false, "copy_paste_enabled": true, "disconnect_action": "LOCK_SCREEN", "file_transfer_enabled": true, "monitors": 1, "port": 5900, "single_qxl_pci": false, "smartcard_enabled": false, "type": "vnc"}, "graphics_consoles": [], "high_availability": {"enabled": false, "priority": 0}, "host": {"href": "/ovirt-engine/api/hosts/043bc7e9-5580-48d8-8953-6d84a33ed596", "id": "043bc7e9-5580-48d8-8953-6d84a33ed596"}, "host_devices": [], "href": "/ovirt-engine/api/vms/b537b175-8545-426d-8571-d6652d56d2ef", "id": "b537b175-8545-426d-8571-d6652d56d2ef", "io": {"threads": 0}, "katello_errata": [], "large_icon": {"href": "/ovirt-engine/api/icons/e57746a0-a95b-019c-4355-27b4eac77170", "id": "e57746a0-a95b-019c-4355-27b4eac77170"}, "memory": 17179869184, "memory_policy": {"guaranteed": 17179869184, "max": 17179869184}, "migration": {"auto_converge": "inherit", "compressed": "inherit"}, "migration_downtime": -1, "name": "external-HostedEngineLocal", "next_run_configuration_exists": false, "nics": [], "numa_nodes": [], "numa_tune_mode": "interleave", "origin": "external", "original_template": {"href": "/ovirt-engine/api/templates/00000000-0000-0000-0000-000000000000", "id": "00000000-0000-0000-0000-000000000000"}, "os": {"boot": {"devices": ["hd"]}, "type": "other"}, "permissions": [], 
"placement_policy": {"affinity": "migratable"}, "quota": {"id": "2996557c-3676-11e8-8971-00163eeeeee1"}, "reported_devices": [], "run_once": false, "sessions": [], "small_icon": {"href": "/ovirt-engine/api/icons/5ba0b8a7-51c6-5ef5-7ed0-495a62737d13", "id": "5ba0b8a7-51c6-5ef5-7ed0-495a62737d13"}, "snapshots": [], "sso": {"methods": [{"id": "guest_agent"}]}, "start_paused": false, "stateless": false, "statistics": [], "status": "unknown", "storage_error_resume_behaviour": "auto_resume", "tags": [], "template": {"href": "/ovirt-engine/api/templates/00000000-0000-0000-0000-000000000000", "id": "00000000-0000-0000-0000-000000000000"}, "time_zone": {"name": "Etc/GMT"}, "type": "desktop", "usb": {"enabled": false}, "watchdogs": []}]}, "attempts": 24, "changed": false}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage: Clean up
[ INFO  ] Cleaning temporary resources
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] changed: [localhost]
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180402162515.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180402155548-4n0to5.log
You have new mail in /var/spool/mail/root


Version-Release number of selected component (if applicable):
rhvm-appliance-4.2-20180401.0.el7.noarch.rpm
ovirt-hosted-engine-ha-2.2.9-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.15-1.el7ev.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

How reproducible:
100%

Steps to Reproduce:
1. Deploy SHE over a VLAN-tagged interface.

Actual results:
Deployment fails.

Expected results:
Deployment should succeed.

Additional info:
Logs from host and engine attached.

Comment 1 Nikolai Sednev 2018-04-02 13:48:41 UTC
Created attachment 1416269 [details]
sosreport from the engine

Comment 2 Dan Kenigsberg 2018-04-02 14:11:51 UTC
Which ovirt-engine was tested here?
Do you have a clue why it failed?
Why did you place it on the Network team?

Comment 3 Nikolai Sednev 2018-04-02 14:27:28 UTC
(In reply to Dan Kenigsberg from comment #2)
> Which ovirt-engine was tested here?
Engine inside the appliance:
ovirt-engine-setup-base-4.2.2.6-0.1.el7.noarch
> Do you have a clue why it failed?
Probably a network-related issue, as it happens only when deploying over a tagged interface and not over an untagged one. I tried the same deployment over NFS with both tagged and untagged interfaces, and the untagged one passed just fine.
> Why did you place it on the Network team?
Please see my previous answer.

Comment 4 Dan Kenigsberg 2018-04-02 16:02:48 UTC
Nikolai,

[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook

suggests where things broke. Can you attach the ansible logs? Can you find out which ansible task failed?
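
One way to answer the last question from the console output alone is to pair each "fatal" line with the most recent TASK header above it. The following is a minimal sketch, assuming the "[ INFO  ] TASK [...]" and "[ ERROR ] fatal: [...]" line shapes seen in this report; the function name failed_tasks is hypothetical and not part of ovirt-hosted-engine-setup.

```python
import re

# Line shapes assumed from the console output pasted in this report.
TASK_RE = re.compile(r"^\[ INFO\s*\] TASK \[(?P<name>.+)\]\s*$")
FATAL_RE = re.compile(r"^\[ ERROR\s*\] fatal: \[(?P<host>[^\]]+)\]")


def failed_tasks(lines):
    """Return (task_name, host) for every fatal line in the output.

    Each fatal line is attributed to the most recent TASK header seen
    before it; task_name is None if no header preceded the failure.
    """
    current_task = None
    failures = []
    for line in lines:
        task = TASK_RE.match(line)
        if task:
            current_task = task.group("name")
            continue
        fatal = FATAL_RE.match(line)
        if fatal:
            failures.append((current_task, fatal.group("host")))
    return failures


if __name__ == "__main__":
    import sys

    # Usage: python failed_tasks.py < setup-console-output.txt
    for name, host in failed_tasks(sys.stdin):
        print(f"{host}: {name}")
```

This only reads the console transcript; the per-playbook ansible logs under /var/log/ovirt-hosted-engine-setup remain the authoritative source.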

Comment 5 Nikolai Sednev 2018-04-03 12:13:52 UTC
AFAIK it should be ovirt-hosted-engine-setup-ansible-initial_clean-20180402155647-74b5jx.log, which is inside the /var/log/ovirt-hosted-engine-setup directory.

In the vdsm log I can clearly see a problem establishing SPM and connecting to the storage:
2018-04-02 16:38:13,834+0300 INFO  (jsonrpc/7) [vdsm.api] FINISH getAllTasksInfo error=Not SPM: () from=::1,47774, task_id=b14decc1-564a-4b4d-b39d-388e9410095b (api:50)
2018-04-02 16:38:13,906+0300 ERROR (jsonrpc/7) [storage.TaskManager.Task] (Task='b14decc1-564a-4b4d-b39d-388e9410095b') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in getAllTasksInfo
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2218, in getAllTasksInfo
    raise se.SpmStatusError()
SpmStatusError: Not SPM: ()
2018-04-02 16:38:13,906+0300 INFO  (jsonrpc/7) [storage.TaskManager.Task] (Task='b14decc1-564a-4b4d-b39d-388e9410095b') aborting: Task is aborted: 'Not SPM: ()' - code 654 (task:1181)
2018-04-02 16:38:13,907+0300 ERROR (jsonrpc/7) [storage.Dispatcher] FINISH getAllTasksInfo error=Not SPM: () (dispatcher:82)
2018-04-02 16:38:13,907+0300 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.getAllTasksInfo failed (error 654) in 0.08 seconds (__init__:573)
2018-04-02 16:38:13,948+0300 INFO  (jsonrpc/0) [vdsm.api] START getAllTasksStatuses(spUUID=None, options=None) from=::1,47774, task_id=44b90186-a560-436d-a709-6b656b651beb (api:46)
2018-04-02 16:38:13,948+0300 INFO  (jsonrpc/0) [vdsm.api] FINISH getAllTasksStatuses error=Not SPM: () from=::1,47774, task_id=44b90186-a560-436d-a709-6b656b651beb (api:50)
2018-04-02 16:38:13,948+0300 ERROR (jsonrpc/0) [storage.TaskManager.Task] (Task='44b90186-a560-436d-a709-6b656b651beb') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in getAllTasksStatuses
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2178, in getAllTasksStatuses
    raise se.SpmStatusError()
SpmStatusError: Not SPM: ()
2018-04-02 16:38:13,949+0300 INFO  (jsonrpc/0) [storage.TaskManager.Task] (Task='44b90186-a560-436d-a709-6b656b651beb') aborting: Task is aborted: 'Not SPM: ()' - code 654 (task:1181)
2018-04-02 16:38:13,949+0300 ERROR (jsonrpc/0) [storage.Dispatcher] FINISH getAllTasksStatuses error=Not SPM: () (dispatcher:82)
2018-04-02 16:38:13,949+0300 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Host.getAllTasksStatuses failed (error 654) in 0.00 seconds (__init__:573)
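
To see how often each call fails in a longer vdsm.log, the "RPC call ... failed (error N)" lines can be tallied. This is a rough sketch that assumes only the line shape visible in the excerpt above; the function name rpc_failures is hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "RPC call Host.getAllTasksInfo failed (error 654)".
RPC_FAIL_RE = re.compile(
    r"RPC call (?P<method>\S+) failed \(error (?P<code>\d+)\)"
)


def rpc_failures(lines):
    """Count failed JSON-RPC calls per (method, error code) pair."""
    counts = Counter()
    for line in lines:
        match = RPC_FAIL_RE.search(line)
        if match:
            counts[(match.group("method"), int(match.group("code")))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python rpc_failures.py < vdsm.log
    for (method, code), n in rpc_failures(sys.stdin).most_common():
        print(f"{n:6d}  {method}  error {code}")
```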

Comment 6 Nikolai Sednev 2018-04-03 14:22:57 UTC
I'm still investigating the issue; it looks like some network disruption occurred during deployment over the tagged VLAN. Checking the infrastructure...

Comment 7 Nikolai Sednev 2018-04-03 15:01:34 UTC
Reproduced again. Network infrastructure is not involved in the storage path here, as the host connects to the SAN over direct FC links.

[ INFO  ] changed: [localhost]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Obtain SSO token using username/password credentials]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Check for the local bootstrap VM]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Make the engine aware that the external VM is stopped]
[ INFO  ] TASK [Wait for the local bootstrap VM to be down at engine eyes]
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_vms": [{"affinity_labels": [], "applications": [], "bios": {"boot_menu": {"enabled": false}}, "cdroms": [], "cluster": {"href": "/ovirt-engine/api/clusters/8ad5d0f2-374c-11e8-b5a6-00163eeeeee1", "id": "8ad5d0f2-374c-11e8-b5a6-00163eeeeee1"}, "cpu": {"architecture": "x86_64", "topology": {"cores": 1, "sockets": 4, "threads": 1}}, "cpu_profile": {"href": "/ovirt-engine/api/cpuprofiles/58ca604e-01a7-003f-01de-000000000250", "id": "58ca604e-01a7-003f-01de-000000000250"}, "cpu_shares": 0, "creation_time": "2018-04-03 17:41:22.135000+03:00", "delete_protected": false, "disk_attachments": [], "display": {"address": "127.0.0.1", "allow_override": false, "copy_paste_enabled": true, "disconnect_action": "LOCK_SCREEN", "file_transfer_enabled": true, "monitors": 1, "port": 5900, "single_qxl_pci": false, "smartcard_enabled": false, "type": "vnc"}, "graphics_consoles": [], "high_availability": {"enabled": false, "priority": 0}, "host": {"href": "/ovirt-engine/api/hosts/ac45012e-8613-489a-8e91-be32743125b6", "id": "ac45012e-8613-489a-8e91-be32743125b6"}, "host_devices": [], "href": "/ovirt-engine/api/vms/b9cd99f2-f703-4dd9-94b0-b5aa29eecbc1", "id": "b9cd99f2-f703-4dd9-94b0-b5aa29eecbc1", "io": {"threads": 0}, "katello_errata": [], "large_icon": {"href": "/ovirt-engine/api/icons/21b0241c-e1eb-c9e8-42ae-7e01aca5ea1d", "id": "21b0241c-e1eb-c9e8-42ae-7e01aca5ea1d"}, "memory": 17179869184, "memory_policy": {"guaranteed": 17179869184, "max": 17179869184}, "migration": {"auto_converge": "inherit", "compressed": "inherit"}, "migration_downtime": -1, "name": "external-HostedEngineLocal", "next_run_configuration_exists": false, "nics": [], "numa_nodes": [], "numa_tune_mode": "interleave", "origin": "external", "original_template": {"href": "/ovirt-engine/api/templates/00000000-0000-0000-0000-000000000000", "id": "00000000-0000-0000-0000-000000000000"}, "os": {"boot": {"devices": ["hd"]}, "type": "other"}, "permissions": [], 
"placement_policy": {"affinity": "migratable"}, "quota": {"id": "9cd0b006-374c-11e8-b0f0-00163eeeeee1"}, "reported_devices": [], "run_once": false, "sessions": [], "small_icon": {"href": "/ovirt-engine/api/icons/8f625bad-06e9-b023-40bd-bbef504d58a1", "id": "8f625bad-06e9-b023-40bd-bbef504d58a1"}, "snapshots": [], "sso": {"methods": [{"id": "guest_agent"}]}, "start_paused": false, "stateless": false, "statistics": [], "status": "unknown", "storage_error_resume_behaviour": "auto_resume", "tags": [], "template": {"href": "/ovirt-engine/api/templates/00000000-0000-0000-0000-000000000000", "id": "00000000-0000-0000-0000-000000000000"}, "time_zone": {"name": "Etc/GMT"}, "type": "desktop", "usb": {"enabled": false}, "watchdogs": []}]}, "attempts": 24, "changed": false}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage: Clean up
[ INFO  ] Cleaning temporary resources
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] changed: [localhost]
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180403175704.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180403172943-sse552.log
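
The fatal line above carries the full task result as JSON after "FAILED! => ". A minimal sketch for pulling out the retry count and the state the engine reported for each VM is shown below. It assumes the payload is valid JSON (a line break between tokens, as in the paste above, is harmless), and the function name vm_states is hypothetical.

```python
import json


def vm_states(fatal_text):
    """Extract (attempts, [(vm_name, status), ...]) from an ansible fatal line.

    fatal_text is the complete "[ ERROR ] fatal: ..." entry, including any
    continuation line if the paste wrapped the JSON payload.
    """
    _, marker, payload = fatal_text.partition("FAILED! => ")
    if not marker:
        raise ValueError("no ansible failure payload found")
    result = json.loads(payload)
    vms = result.get("ansible_facts", {}).get("ovirt_vms", [])
    states = [(vm.get("name"), vm.get("status")) for vm in vms]
    return result.get("attempts"), states
```

Applied to the entry above, this would report 24 attempts with external-HostedEngineLocal in status "unknown", i.e. the engine never saw the bootstrap VM as down.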

Comment 8 Nikolai Sednev 2018-04-03 15:55:16 UTC
Environmental issue:
DNS had not been properly updated, so it pointed the engine to an IP address on a host interface in the untagged VLAN. That VLAN was not used during deployment and was not reachable from the tagged VLAN from which the deployment was initiated.

Moving to closed.
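
A pre-deployment check for this kind of misconfiguration could compare what DNS returns for the host name against the subnet of the tagged VLAN. The sketch below is illustrative only: the host name and subnet in the usage section are placeholders, since the real values are not given in this report, and the function names are hypothetical.

```python
import ipaddress
import socket


def resolve(fqdn):
    """Return the sorted set of addresses the system resolver gives for fqdn."""
    infos = socket.getaddrinfo(fqdn, None)
    # info[4][0] is the address string; drop any IPv6 zone suffix ("%eth0").
    return sorted({info[4][0].split("%")[0] for info in infos})


def addresses_outside(addresses, network):
    """Return the addresses that do not belong to the given subnet."""
    net = ipaddress.ip_network(network, strict=False)
    return [a for a in addresses if ipaddress.ip_address(a) not in net]


if __name__ == "__main__":
    # Placeholder values: substitute the host FQDN and the tagged VLAN subnet.
    fqdn = "host.example.com"
    tagged_subnet = "192.0.2.0/24"
    stray = addresses_outside(resolve(fqdn), tagged_subnet)
    if stray:
        print(f"{fqdn} resolves outside {tagged_subnet}: {', '.join(stray)}")
    else:
        print(f"all addresses for {fqdn} are inside {tagged_subnet}")
```

A non-empty result means DNS points at an address outside the VLAN used for deployment, which matches the situation described in this comment.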

