Bug 1506280 - engine-setup fails: Failed to execute stage 'Setup validation': 'list' object has no attribute 'splitlines'
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Setup.Engine
Version: 4.1.5.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.2.0
Target Release: 4.2.0
Assignee: Yedidyah Bar David
QA Contact: Lucie Leistnerova
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-10-25 14:55 UTC by Mikhail Khoroshev
Modified: 2017-12-20 11:37 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-12-20 11:37:13 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-4.2+
lsvaty: testing_ack+


Attachments
Ovirt-setup log, ovirt web portal screenshot. (53.94 KB, application/zip)
2017-10-25 14:55 UTC, Mikhail Khoroshev


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 83518 0 master MERGED packaging: setup: Fix dbvalidations 2020-09-10 14:08:02 UTC
oVirt gerrit 83544 0 master MERGED packaging: setup: Clean up fkvalidator.sh errors 2020-09-10 14:08:02 UTC

Description Mikhail Khoroshev 2017-10-25 14:55:09 UTC
Created attachment 1343303
Ovirt-setup log, ovirt web portal screenshot.

Description of problem:
The engine-setup script fails with the error: Failed to execute stage 'Setup validation': 'list' object has no attribute 'splitlines'

Version-Release number of selected component (if applicable):
ovirt-vmconsole-1.0.4-1.el7.centos.noarch
ovirt-setup-lib-1.1.4-1.el7.centos.noarch
ovirt-node-ng-image-update-placeholder-4.1.6-1.el7.centos.noarch
ovirt-release41-4.1.6-1.el7.centos.noarch
ovirt-vmconsole-host-1.0.4-1.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7.centos.noarch
ovirt-host-deploy-1.6.6-1.el7.centos.noarch
ovirt-hosted-engine-ha-2.1.5-1.el7.centos.noarch
ovirt-release-host-node-4.1.6-1.el7.centos.noarch
ovirt-imageio-common-1.0.0-1.el7.noarch
ovirt-node-ng-nodectl-4.1.4-0.20170919.0.el7.noarch
ovirt-imageio-daemon-1.0.0-1.el7.noarch
ovirt-hosted-engine-setup-2.1.3.8-1.el7.centos.noarch
ovirt-node-ng-image-update-4.1.6-1.el7.centos.noarch
vdsm-4.19.31-1.el7.centos.x86_64

How reproducible:
Every time.

Steps to Reproduce:
1. Install the OS (4.1.5.2-1-el7.centos) on 3 nodes using the ovirt-node-ng-installer ISO image.
2. Deploy a HostedEngine cluster on the 3 nodes, based on Gluster storage, via the web portal of any of the above nodes.
3. Create some VMs and start them.
4. Upgrade all nodes to version 4.1.6-1 via the oVirt web portal.
5. Put the cluster into GlobalMaintenance from any of the above nodes.
6. Yum update all packages in the HostedEngine VM.
7. Run engine-setup on the HostedEngine VM and accept all queries.

Actual results:
engine-setup stops with an error: 'Setup validation': 'list' object has no attribute 'splitlines'

Expected results:
engine-setup completes normally and the oVirt Engine version is updated to 4.1.6-1

Additional info:
Ovirt engine-setup log in attachment.

Comment 1 Yedidyah Bar David 2017-10-25 15:13:26 UTC
Seems like a bug in the fix for bug 1261335.

The fix for that bug was meant to show a nicer error message under some conditions.

The bug is in:

    stderrLines = stderr.splitlines()

stderr in this case is already a list and should not be split further.
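
A minimal sketch of the kind of guard that avoids this error, assuming the command helper may hand back stderr either as one string or as an already-split list of lines. The helper name as_lines is hypothetical and is not taken from the actual patch:

```python
def as_lines(output):
    """Return output as a list of lines.

    Accepts either a single string or an already-split list of strings.
    Hypothetical helper for illustration, not the code from the real fix.
    """
    if isinstance(output, str):
        return output.splitlines()
    return list(output)


# stderr as it appears in this report: already a list of lines.
stderr = ['Constraint violation found in  job_subject_entity (job_id) |1']

# Calling .splitlines() directly on the list raises the reported error.
try:
    stderr.splitlines()
except AttributeError as e:
    error_text = str(e)

# The guard leaves a list untouched and splits a plain string.
stderr_lines = as_lines(stderr)
```

Whether the real code should normalize the value like this or simply drop the splitlines() call depends on what the caller actually returns, which this report does not show.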

To see the actual problem causing setup to fail, you can check the setup log. You can (also) see there:

    Constraint violation found in  job_subject_entity (job_id) |1

So you can check the contents of the table 'job_subject_entity' to try and find out the problem, or attach it here. You can see that with:

    su - postgres -c "psql engine -c 'select * from job_subject_entity;'"

Comment 2 Mikhail Khoroshev 2017-10-25 15:26:09 UTC
engine=# select * from job_subject_entity;
                job_id                |              entity_id               | entity_type 
--------------------------------------+--------------------------------------+-------------
 5001a696-ef62-48ec-987b-c21ca951b7c7 | 59a9d4f5-06ad-4873-be89-4f9c1119fb52 | VM
 9e6a184f-103d-45cf-9ac1-3b66994abdab | 6e6ee144-5f9e-46aa-a135-56e7eadba743 | VM
(2 rows)

Comment 3 Yedidyah Bar David 2017-10-26 05:35:22 UTC
Can you please check also:

    select * from job;

You should see a matching line in job for each line in job_subject_entity. That's the only constraint I see in the sources. If they all match, perhaps it was a temporary state and you can try engine-setup again. If not, it might be interesting/useful to know how this happened, but I guess it should be safe to remove the offending line from job_subject_entity. If you do want to investigate, you can start by searching for the IDs in the engine logs.
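
The matching check described above can be sketched as a query for rows in job_subject_entity whose job_id has no counterpart in job. This is only an illustration: it uses an in-memory SQLite database as a stand-in for the PostgreSQL engine database, models just the columns needed, and leaves the job table empty on purpose:

```python
import sqlite3

# In-memory SQLite stand-in for the engine database (the real one is
# PostgreSQL); the schema is reduced to what the check needs.
conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE job (job_id TEXT PRIMARY KEY);
    CREATE TABLE job_subject_entity (
        job_id TEXT, entity_id TEXT, entity_type TEXT
    );
    INSERT INTO job_subject_entity VALUES
        ('5001a696-ef62-48ec-987b-c21ca951b7c7',
         '59a9d4f5-06ad-4873-be89-4f9c1119fb52', 'VM'),
        ('9e6a184f-103d-45cf-9ac1-3b66994abdab',
         '6e6ee144-5f9e-46aa-a135-56e7eadba743', 'VM');
""")

# Rows in job_subject_entity whose job_id has no matching row in job.
orphans = conn.execute("""
    SELECT jse.job_id, jse.entity_id
    FROM job_subject_entity AS jse
    LEFT JOIN job AS j ON j.job_id = jse.job_id
    WHERE j.job_id IS NULL
    ORDER BY jse.job_id
""").fetchall()

for job_id, entity_id in orphans:
    print(job_id, entity_id)
```

The same LEFT JOIN ... IS NULL query shape should also work in psql against the real tables, but that has not been verified here.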

Comment 4 Mikhail Khoroshev 2017-10-26 06:59:00 UTC
The job table is empty.

engine=# select * from job;
 job_id | action_type | description | status | owner_id | visible | start_time | end_time | last_update_time | correlation_id | is_external | is_auto_cleared | engine_session_seq_id 
--------+-------------+-------------+--------+----------+---------+------------+----------+------------------+----------------+-------------+-----------------+-----------------------
(0 rows)

The above VMs (ID 59a9d4f5-06ad-4873-be89-4f9c1119fb52, ID 6e6ee144-5f9e-46aa-a135-56e7eadba743) are mostly in a down state. The last operation performed on them was removing the additional HDDs via the oVirt web portal.
Do I need to export them to backup storage and remove them from the cluster before clearing job_subject_entity?

Comment 5 Mikhail Khoroshev 2017-10-26 09:14:35 UTC
The offending lines have been removed from the job_subject_entity table and engine-setup completed successfully. Most probably it was a VM filesystem crash caused by a hardware failure during some task operation.
Many thanks for your help!

Comment 6 Yedidyah Bar David 2017-10-26 09:54:08 UTC
Thanks for the report :-)

Keeping the bug open for fixing the error message. Also noticed a related bug while looking at the setup log:

/usr/share/ovirt-engine/setup/dbutils/fkvalidator.sh: line 89: exit: Constraint: numeric argument required

Relevant code is:

        if [ "${exit_code}" = "0" -a -z "${fix_it}" ]; then
                exit_code="$(echo "${res}" | sed -n '2p')"
        fi
        exit ${exit_code}

So exit_code here is probably "Constraint violation found in..." instead of a number.
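
The failure mode can be modelled in Python. This is a sketch only: it mimics taking the second line of the output (sed -n '2p') as the exit code and, as an assumption of mine rather than the behavior of the merged fix, falls back to a generic failure code when that line is not a valid number. The function name is hypothetical:

```python
def exit_code_from_output(res, default=1):
    """Pick an exit code from the second line of res.

    Models the shell snippet above; returns `default` when the second
    line is missing, not an integer, or outside the 0-255 range that a
    shell exit status can hold. Hypothetical helper for illustration.
    """
    lines = res.splitlines()
    if len(lines) < 2:
        return default
    try:
        code = int(lines[1].strip())
    except ValueError:
        # e.g. "Constraint violation found in ..." instead of a number
        return default
    return code if 0 <= code <= 255 else default


# Two violation lines: the second line is text, so the fallback is used.
two_violations = (
    "Constraint violation found in  async_tasks_entities (async_task_id) |1\n"
    "Constraint violation found in  job_subject_entity (job_id) |1\n"
)
code = exit_code_from_output(two_violations)
```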

Comment 7 Yedidyah Bar David 2017-11-02 15:08:16 UTC
Reusing current bug also to fix the other issue.

To reproduce, you have to have at least 2 violations in different tables. Did this:

alter table job_subject_entity DISABLE TRIGGER ALL;
insert into job_subject_entity values ('5001a696-ef62-48ec-987b-c21ca951b7c7', '59a9d4f5-06ad-4873-be89-4f9c1119fb52', 'VM');
insert into job_subject_entity values ('5001a696-ef62-48ec-987b-c21ca951b7c8', '59a9d4f5-06ad-4873-be89-4f9c1119fb53', 'VM');
alter table job_subject_entity ENABLE TRIGGER ALL;

alter table async_tasks_entities DISABLE TRIGGER ALL;
insert into async_tasks_entities values ('5001a696-ef62-48ec-987b-c21ca951b7c9', '59a9d4f5-06ad-4873-be89-4f9c1119fb54', 'VM');
alter table async_tasks_entities ENABLE TRIGGER ALL;

With that, engine-setup fails as in the attached log, see comment 6.

Comment 8 Yedidyah Bar David 2017-11-14 12:33:28 UTC
Note to QE:

Reproduction/Verification flow:

1. Set up an engine

2. Cause at least two different tables to have entries with invalid foreign keys.
No idea how to cause this to happen using "normal" means, and almost certainly there aren't any - and if there are, it's most likely a bug in the engine or in postgresql (or both). I personally did this using comment 7.

3. Update setup packages to the version you want to verify

4. Run engine-setup

With a broken version, it will fail with the message in comment 0, and setup log will also have a line like in comment 6.

With a fixed version, it will fail with a nicer message:

[ERROR] Failed to execute stage 'Setup validation': Failed checking Engine database: an exception occurred while validating the Engine database, please check the logs for getting more info:
Constraint violation found in  async_tasks_entities (async_task_id) |1

And in setup log you should see something like this:

2017-11-09 14:14:59,641+0200 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.upgrade.dbvalidations plugin.execute:926 execute-output: ['/usr/share/ovirt-engine/setup/dbutils/validatedb.sh', '--user=engine', '--host=localhost', '--port=5432', '--database=engine', '--log=/var/log/ovirt-engine/setup/ovirt-engine-setup-20171109141455-0j0cn5.log'] stderr:
Constraint violation found in  async_tasks_entities (async_task_id) |1
Constraint violation found in  job_subject_entity (job_id) |1
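
When verifying, it may help to pull the table and column names out of these stderr lines. A small sketch, assuming the line format shown above; the regular expression and function name are illustrative and not part of engine-setup, and the value after the "|" is kept as an opaque string because its meaning is not stated in this report:

```python
import re

# Matches lines such as:
#   Constraint violation found in  job_subject_entity (job_id) |1
VIOLATION_RE = re.compile(
    r'^Constraint violation found in\s+(?P<table>\S+)\s+'
    r'\((?P<column>[^)]+)\)\s*\|(?P<tail>.*)$'
)


def parse_violations(lines):
    """Return (table, column, tail) tuples for matching lines.

    Lines that do not match the expected format are skipped.
    """
    found = []
    for line in lines:
        m = VIOLATION_RE.match(line.strip())
        if m:
            found.append(
                (m.group('table'), m.group('column'), m.group('tail').strip())
            )
    return found


stderr = [
    'Constraint violation found in  async_tasks_entities (async_task_id) |1',
    'Constraint violation found in  job_subject_entity (job_id) |1',
]
violations = parse_violations(stderr)
for table, column, tail in violations:
    print(table, column, tail)
```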

Comment 9 Lucie Leistnerova 2017-12-14 13:56:00 UTC
engine-setup failed with the nicer message and log contains details, according to steps in Comment 8

verified in ovirt-engine-setup-4.2.0.2-0.1.el7.noarch

Comment 10 Sandro Bonazzola 2017-12-20 11:37:13 UTC
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be
resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

