Bug 1590948 - Failed to execute ovirt-ova-query playbook (exit code 4)
Summary: Failed to execute ovirt-ova-query playbook (exit code 4)
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Infra
Version: 4.2.3.5
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ondra Machacek
QA Contact: Pavel Stehlik
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-13 17:35 UTC by Kyle Stapp
Modified: 2018-06-29 14:45 UTC
CC List: 5 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-06-21 13:22:06 UTC
oVirt Team: Infra
Embargoed:


Attachments
Log files of the attempts to fix (388.42 KB, application/x-gzip)
2018-06-13 17:35 UTC, Kyle Stapp
Logs (Including ovf files from OVA). (2.65 MB, application/x-tar)
2018-06-14 02:10 UTC, Kyle Stapp
Logs (Including ovf files from OVA)--zipped. (397.70 KB, application/x-gzip)
2018-06-14 02:18 UTC, Kyle Stapp
Logs (Including ovf files from OVA) -- zipped and working (397.70 KB, application/x-gzip)
2018-06-14 02:19 UTC, Kyle Stapp

Description Kyle Stapp 2018-06-13 17:35:57 UTC
Created attachment 1451001 [details]
Log files of the attempts to fix

Description of problem:
oVirt cannot seem to import its own exported OVA files.

Version-Release number of selected component (if applicable):
4.2.3.5

How reproducible:
Always

Steps to Reproduce:
1. Export an OVA using the web GUI.
2. Transfer it to another instance of oVirt.
3. Using the web GUI, try to import the OVA: point it to the file and select the VM to load.

Actual results:
The import fails, with a message like 'Importing VM Windows7 to Cluster Default'.

Expected results:
Successful import

Additional info:
I worked through 3 different iterations of things here.

Specific Root Problem:
The OVF provides a UUID for the <rasd:Parent> and <rasd:template> elements, which causes a failure when parsing the values because virt-v2v's conversion codebase expects an integer there.  I don't know what the right answer is or who is wrong, just that virt-v2v doesn't like it.  This is the 1-original log directory attached in the tarball.

There are 3 directories of logs (named 1-, 2-, 3-) included here.  Each has the OVF file that was packed into the OVA (with the same disk each time) that caused the error.  I incrementally fixed different parts of the OVF to get farther in the import process, but ultimately was still left unable to import oVirt's OWN OVA file into another oVirt instance.

In the second try (2-removedparent), I removed the Parent and Template elements altogether.  It then fell down because the oVirt-exported OVA contains a syntactical error according to the spec I found: it uses 'disk/<uuid>' instead of '/disk/<uuid>' in the <rasd:HostResource> element.  I manually fixed this and then tried again.
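
For reference, the manual fix was roughly the following; the file names are from my setup and "disk1" stands in for the actual disk file name(s) inside the OVA:

    # unpack the OVA, patch the HostResource paths, repack with the original disk image(s)
    tar xvf Windows7.ova
    sed -i 's|<rasd:HostResource>disk/|<rasd:HostResource>/disk/|g' vm.ovf
    tar cvf Windows7-fixed.ova vm.ovf disk1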

In the third attempt (3-fixeddisk) it fell down because of a VMDK image descriptor problem.

[root@server srv]# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)

[root@server srv]# rpm -qa | grep vdsm
vdsm-jsonrpc-4.20.27.1-1.el7.centos.noarch
vdsm-hook-vhostmd-4.20.27.1-1.el7.centos.noarch
vdsm-common-4.20.27.1-1.el7.centos.noarch
vdsm-network-4.20.27.1-1.el7.centos.x86_64
vdsm-4.20.27.1-1.el7.centos.x86_64
vdsm-yajsonrpc-4.20.27.1-1.el7.centos.noarch
vdsm-python-4.20.27.1-1.el7.centos.noarch
vdsm-hook-fcoe-4.20.27.1-1.el7.centos.noarch
vdsm-api-4.20.27.1-1.el7.centos.noarch
vdsm-client-4.20.27.1-1.el7.centos.noarch
vdsm-hook-openstacknet-4.20.27.1-1.el7.centos.noarch
vdsm-hook-ethtool-options-4.20.27.1-1.el7.centos.noarch
vdsm-hook-vfio-mdev-4.20.27.1-1.el7.centos.noarch
vdsm-hook-vmfex-dev-4.20.27.1-1.el7.centos.noarch
vdsm-http-4.20.27.1-1.el7.centos.noarch

[root@server srv]# rpm -qa | grep virt
libvirt-python-4.3.0-1.el7.x86_64
libvirt-daemon-driver-storage-logical-4.3.0-1.el7.x86_64
libvirt-daemon-driver-interface-4.3.0-1.el7.x86_64
libvirt-bash-completion-4.3.0-1.el7.x86_64
ovirt-host-dependencies-4.2.2-2.el7.centos.x86_64
cockpit-ovirt-dashboard-0.11.24-1.el7.centos.noarch
libvirt-libs-4.3.0-1.el7.x86_64
virt-manager-common-1.4.3-3.el7.noarch
libvirt-daemon-driver-storage-4.3.0-1.el7.x86_64
ovirt-setup-lib-1.1.4-1.el7.centos.noarch
libvirt-daemon-4.3.0-1.el7.x86_64
ovirt-vmconsole-1.0.5-4.el7.centos.noarch
libvirt-daemon-driver-nwfilter-4.3.0-1.el7.x86_64
libvirt-daemon-driver-storage-rbd-4.3.0-1.el7.x86_64
libvirt-daemon-driver-storage-disk-4.3.0-1.el7.x86_64
libvirt-daemon-driver-secret-4.3.0-1.el7.x86_64
libvirt-lock-sanlock-4.3.0-1.el7.x86_64
ovirt-host-deploy-1.7.3-1.el7.centos.noarch
virt-v2v-1.36.10-6.el7_5.2.x86_64
ovirt-imageio-daemon-1.3.1.2-0.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7.noarch
fence-virt-0.3.2-13.el7.x86_64
ovirt-host-4.2.2-2.el7.centos.x86_64
libvirt-daemon-driver-storage-core-4.3.0-1.el7.x86_64
ovirt-vmconsole-host-1.0.5-4.el7.centos.noarch
libvirt-daemon-driver-storage-scsi-4.3.0-1.el7.x86_64
libvirt-daemon-driver-storage-iscsi-4.3.0-1.el7.x86_64
libvirt-daemon-driver-nodedev-4.3.0-1.el7.x86_64
python-ovirt-engine-sdk4-4.2.6-2.el7.centos.x86_64
ovirt-hosted-engine-ha-2.2.11-1.el7.centos.noarch
ovirt-engine-appliance-4.2-20180504.1.el7.centos.noarch
ovirt-release42-4.2.3.1-1.el7.noarch
libvirt-daemon-driver-qemu-4.3.0-1.el7.x86_64
libvirt-daemon-config-nwfilter-4.3.0-1.el7.x86_64
libvirt-daemon-config-network-4.3.0-1.el7.x86_64

Comment 1 Kyle Stapp 2018-06-13 18:00:48 UTC
Please note...I've run the ovf through xmllint --format to get nicer readable xml.

Comment 2 Arik 2018-06-13 21:49:18 UTC
(In reply to Kyle Stapp from comment #1)
> Please note...I've run the ovf through xmllint --format to get nicer
> readable xml.

I can't find the OVF in the attached tar file; did you maybe forget to include it?

That's interesting - we failed to parse the OVF and therefore fell back to the old mechanism of querying external (non-oVirt) OVAs and trying to import it using virt-v2v.

It would be great to have that OVA.
If it's too big or you can't attach it for any other reason, it would also be great to have at least the OVF ('tar xvf /srv/Windows7.ova vm.ovf' should do the trick).

Comment 3 Kyle Stapp 2018-06-13 23:41:47 UTC
Apologies, you are right; I forgot to add them.  I'll add them later tonight, around 11 pm EST.

Comment 4 Kyle Stapp 2018-06-14 02:10:26 UTC
Created attachment 1451079 [details]
Logs (Including ovf files from OVA).

Comment 5 Kyle Stapp 2018-06-14 02:13:01 UTC
There, I've included the OVF files this time; sorry about the mistake before.  I can't share the OS images themselves as they contain company-proprietary material.  The OVFs are in the respective directories, along with the manual edits I made in 2- and 3- to try to get the import to work.

Again, the ovirt_ova_query_ansible logs are more or less empty too, which seems bizarre to me and leaves me with nothing to read :(.

Comment 6 Kyle Stapp 2018-06-14 02:18:26 UTC
Created attachment 1451080 [details]
Logs (Including ovf files from OVA)--zipped.

Comment 7 Kyle Stapp 2018-06-14 02:19:28 UTC
Created attachment 1451081 [details]
Logs (Including ovf files from OVA) -- zipped and working

Comment 8 Arik 2018-06-14 07:24:50 UTC
The good news is that the OVF configuration is perfectly fine. After removing the newlines from vm.ovf and tar-ing it - the query works properly.
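
Roughly what I did to verify (file names are illustrative): take the formatted vm.ovf from the attachment, strip the newlines that xmllint --format added, and pack it back into a tar for the query:

    tr -d '\n' < vm.ovf > vm.stripped.ovf   # undo the xmllint --format line breaks
    mv vm.stripped.ovf vm.ovf
    tar cvf test.ova vm.ovf                 # pack the OVF back into a tar/OVA for the query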

The bad news is that this seems like a general issue in invoking ansible playbooks on that host (de000d1d-ceb6-45a5-bb7e-214ad7516043):
Ansible playbook command has exited with value: 4

Therefore, moving to infra.

Comment 9 Martin Perina 2018-06-14 11:25:35 UTC
(In reply to Arik from comment #8)
> The good news is that the OVF configuration is perfectly fine. After
> removing the newlines from vm.ovf and tar-ing it - the query works properly.
> 
> The bad news is that this seems like a general issue in invoking ansible
> playbooks on that host (de000d1d-ceb6-45a5-bb7e-214ad7516043):
> Ansible playbook command has exited with value: 4
> 
> Therefore, moving to infra.

According to [1], exit code 4 means a parser error, so are we sure we don't have a code bug in the playbook/role? Is it possible to execute the flow from the command line to see the debug output of the execution?
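
As a rough sketch only (I don't remember the exact playbook name or the extra variables the engine passes, so treat the path below as an assumption), something along these lines on the engine machine should show the verbose Ansible output:

    # host address and playbook path are illustrative; the trailing comma makes an inline inventory
    ansible-playbook -vvv -i myhost.example.com, /usr/share/ovirt-engine/playbooks/ovirt-ova-query.yml
    # plus whatever extra variables the engine passes (they should be visible in engine.log)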



[1] https://github.com/ansible/ansible/issues/19720

Comment 10 Kyle Stapp 2018-06-14 11:38:42 UTC
If you tell me the exact command structure to invoke it, I will happily do that.

Comment 11 Kyle Stapp 2018-06-14 11:49:33 UTC
I looked at the other bug.  It might have to do with firewall rules we have in place, if the trigger is a 'host unreachable' error.  I'll check when I get into work in about an hour.
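
The sort of thing I plan to check on the host (our lockdown rules are custom, so the exact zones/ports differ):

    ss -tlnp | grep sshd        # confirm which port sshd is actually listening on
    firewall-cmd --list-all     # see whether ssh/22 is allowed in the active zone
    iptables -L -n | grep 22    # and whether our extra lockdown rules block it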


Thanks for the help.

Comment 13 Kyle Stapp 2018-06-14 13:09:58 UTC
Wow, so the problem was the firewall rules / sshd port.  We have additional lockdown rules that we add after oVirt is installed, and among other things we change the sshd port.

I was unaware that Ansible was involved in reading the OVA on the local host.

Could we get better error messages around the oVirt Ansible integration?  Could a more explicit error message from Ansible itself be written to one of our logs, or to engine.log?

I didn't realise import needed sshd to be on port 22.
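
For the record, the revert on my side was roughly the following (the exact change depends on your own lockdown rules; ours had moved sshd off 22 and tightened firewalld):

    sed -i 's/^Port .*/Port 22/' /etc/ssh/sshd_config   # assumes an explicit Port line exists
    systemctl restart sshd
    firewall-cmd --permanent --add-service=ssh
    firewall-cmd --reload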

Out of curiosity, does the Ansible code path here respect the OVEHOSTED_NETWORK/sshdPort setting?  I did not use it in this case, but I know details can fall through the cracks.

Comment 14 Kyle Stapp 2018-06-14 13:11:19 UTC
Why is the ovirt-query-ova-ansible log file empty instead of full of the ansible error message?

Comment 15 Kyle Stapp 2018-06-14 13:21:17 UTC
We also might want to look at fixing our OVA output to be closer to the spec.  I don't know the proper place to open a new bug, so I'll leave that to others :).  At least specifically for the 'disk/' --> '/disk/' issue in the <rasd:HostResource> element.

Comment 16 Arik 2018-06-14 13:34:11 UTC
(In reply to Kyle Stapp from comment #13)
> Wow, so the problem was the firewall rules / sshd port.  We have additional
> lockdown rules that we add after oVirt is installed, and among other things
> we change the sshd port.
> 
> I was unaware that Ansible was involved in reading the OVA on the local host.
> 
> Could we get better error messages around the oVirt Ansible integration?
> Could a more explicit error message from Ansible itself be written to one of
> our logs, or to engine.log?

It will be better in 4.2.4: the 'load' operation will fail when the Ansible playbook for querying the OVA cannot be executed, so we won't try to import it as a VMware OVA, nor will we miss the error I mentioned in comment 8.

> 
> I didn't realise import needed sshd to be on port 22.

It's not only the import flow that needs it.  I wonder how you installed your host(s): didn't you use host-deploy?  Or did you set the firewall rules after the hosts were installed?  Anyway, both host-deploy and the check for updates should now work for you.

Kyle, could you please share this knowledge with the users-list on that thread?

Comment 17 Kyle Stapp 2018-06-14 13:50:30 UTC
I indeed set up the firewall rules AFTER host-deploy.  I think I did that after testing with OVEHOSTED_NETWORK/sshdPort set to an exotic port number, when some things broke.  But I might have misattributed the breakage to ssh when it was really the cruft I describe below.

While installing and reinstalling over and over, trying to get an automated workflow that uses Ansible to configure our host and automatically deploy oVirt along with our own software, I found that the oVirt uninstall was leaving cruft around that caused spurious failures in later re-installs.  I eventually found https://www.ovirt.org/documentation/how-to/hosted-engine/#recoving-from-failed-install and built a foolproof purge that seems to get me back to a clean state.

I use the following Ansible snippet to do a full purge that seems to allow a correct installation following a borked one (I pulled it from my code base, so one or two references don't directly apply to oVirt, but most do):

    - name: Clean Old Install
      #This attempts to remove all old cruft from previous install attempts
      #The reason we include the ovirt packages is so that they can be reinstalled
      #at potentially newer versions
      block:
        - name: Detect existing cleanup script
          shell: which ovirt-hosted-engine-cleanup | cat
          register: ohes_cleanup
        - name: Debug ohes_cleanup.stdout
          debug:
            var: ohes_cleanup.stdout
        - name: Run Ovirt's Hosted Engine Cleanup Script
          shell: ovirt-hosted-engine-cleanup -q
          when: ohes_cleanup.stdout != ""
        - name: Clean old packages
          package:
            name: "{{item}}"
            state: absent
          with_items:
            - "*vdsm*"
            - "*ovirt*"
            - "*libvirt*"
            - "*cockpit*"
        - name: Remove old configs etc
          shell: "rm -rf /etc/{{item}}"
          args:
            warn: False
          with_items:
            - "/etc/*ovirt*"
            - "/etc/*vdsm*"
            - "/etc/libvirt/qemu/HostedEngine*"
            - "/etc/*libvirt*"
            - "/etc/pki/vdsm"
            - "/etc/pki/keystore"
            - "/etc/ovirt-hosted-engine"
            - "/var/lib/libvirt/"
            - "/var/lib/vdsm/"
            - "/var/lib/ovirt-hosted-engine-*"
            - "/var/log/ovirt-hosted-engine-setup/"
            - "/var/cache/libvirt/"
        - name: Clean old repo files
          shell: "rm -rf /etc/yum.repos.d/{{item}}"
          args:
            warn: False
          with_items:
            - "ovirt*"
            - "virt*"
        - name: Remove old firewalld rules
          file:
            path: "/etc/firewalld/direct.xml"
            state: absent
        - name: Clear firewall rules
          shell: "iptables --flush"
          args:
            warn: False
          ignore_errors: yes
        - name: clean interface configs
          shell: "rm -rf /etc/sysconfig/network-scripts/{{nic_device_name}}* /etc/sysconfig/network-scripts/ifcfg-dummy_0* /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt“
          args:
            warn: False
        - name: clean network stuff
          shell: "{{item}}"
          args:
            warn: False
          with_items:
            - "brctl delbr ovirtmgmt | cat"
            - "ip link del ovirtmgmt | cat"
            - "ip link del dummy0 | cat"
            - "ip link del virbr0 | cat"
            - "ip link del virbr0-nic | cat"
            - "ip link del dummy_0 | cat"
            - 'ip link del \;vdsmdummy\; | cat'
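
For completeness, I run that block from a standalone playbook against the host, roughly like this (the playbook file name and NIC name are just from my setup):

    ansible-playbook -i ovirt-host.example.com, cleanup.yml -e nic_device_name=em1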

Comment 18 Kyle Stapp 2018-06-14 15:49:38 UTC
This can be closed.  But please open a new one for:

* A better Ansible error message, in some log, that makes it clear when it cannot connect to the host
* Fixing the 'disk/' -> '/disk/' value in the <rasd:HostResource> element

