Bug 1467690 - Endless error message in engine log: Error during log command: org.ovirt.engine.core.bll.storage.disk.SparsifyImageCommand. Exception null
Summary: Endless error message in engine log: Error during log command: org.ovirt.eng...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.1.3.4
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: ovirt-4.2.0
: ---
Assignee: Shmuel Melamud
QA Contact: Vitalii Yerys
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-07-04 15:30 UTC by Israel Pinto
Modified: 2017-10-02 08:32 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-02 08:32:34 UTC
oVirt Team: Virt
Embargoed:
tjelinek: ovirt-4.2?
mtessun: planning_ack+
ipinto: devel_ack?
mavital: testing_ack+


Attachments (Terms of Use)
engine.log (710.84 KB, application/x-xz)
2017-07-04 15:30 UTC, Israel Pinto
no flags Details

Description Israel Pinto 2017-07-04 15:30:43 UTC
Created attachment 1294291 [details]
engine.log

Description of problem:
I see this error message on engine log after trying to execute sparsify on vm disk. The sparsify action failed.
From engine log:
2017-07-04 17:35:23,068+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (DefaultQuartzScheduler1) [disks_syncAction_b7735a81-3304-47d6] Failed invoking callback end method 'onFailed' for command '7ba228dd-89f9-4b33-a35f-453d1c2e0cbf' with exception 'null', the callback is marked for end method retries
2017-07-04 17:35:23,076+03 ERROR [org.ovirt.engine.core.bll.storage.disk.SparsifyImageCommand] (DefaultQuartzScheduler1) [disks_syncAction_0bbfabfd-d8ca-46d4] Ending command 'org.ovirt.engine.core.bll.storage.disk.SparsifyImageCommand' with failure.
2017-07-04 17:35:23,082+03 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (DefaultQuartzScheduler1) [disks_syncAction_0bbfabfd-d8ca-46d4] transaction rolled back
2017-07-04 17:35:23,089+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase] (DefaultQuartzScheduler1) [disks_syncAction_0bbfabfd-d8ca-46d4] Error during log command: org.ovirt.engine.core.bll.storage.disk.SparsifyImageCommand. Exception null

The message appear endlessly on the log and don't stop.

Some background: 
The Night before the hosts in the setup were not reachable since they are  
nested hosts and the nested setup were down since storage problem in lab.
I restore the setup and activate the hosts, but i see in the engine log the DB query failed (could cause the problem i see after restoring the setup)
From engine log:
2017-07-04 03:17:03,125+03 ERROR [org.ovirt.engine.core.bll.job.ExecutionHandler] (DefaultQuartzScheduler4) [disks_syncAction_b7735a81-3304-47d6] Exception: org.springframework.dao.DataIntegrityViolationException: CallableStatementCallback; SQL [{call insertstep(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}]; ERROR: insert or update on table "step" violates foreign key constraint "fk_step_job"
  Detail: Key (job_id)=(38cbfcfa-fab8-4e52-91b2-af05c3924083) is not present in table "job".
  Where: SQL statement "INSERT INTO step (
        step_id,
        parent_step_id,
        job_id,
        step_type,
        description,
        step_number,
        status,
        progress,
        start_time,
        end_time,
        correlation_id,
        external_id,
        external_system_type,
        is_external
        )
    VALUES (
        v_step_id,
        v_parent_step_id,
        v_job_id,
        v_step_type,
        v_description,
        v_step_number,
        v_status,
        v_progress,
        v_start_time,
        v_end_time,
        v_correlation_id,
        v_external_id,
        v_external_system_type,
        v_is_external
        )"
PL/pgSQL function insertstep(uuid,uuid,uuid,character varying,text,integer,character varying,smallint,timestamp with time zone,timestamp with time zone,character varying,uuid,character varying,boolean) line 3 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: insert or update on table "step" violates foreign key constraint "fk_step_job"


Version-Release number of selected component (if applicable):
Red Hat Virtualization Manager Version: 4.1.3.4-0.1.el7

Host:
OS Version:RHEL - 7.4 - 11.el7
Kernel Version:3.10.0 - 663.el7.x86_64
KVM Version:2.9.0 - 10.el7
LIBVIRT Version:libvirt-3.2.0-4.el7
VDSM Version:vdsm-4.19.18-1.el7ev
SPICE Version:0.12.8 - 2.el7


Steps to Reproduce:
1. After activate host 
2. Run sparsify on vm disk 


Additional info:
Attaching engine log

Comment 1 Allon Mureinik 2017-07-05 06:51:17 UTC
The error looks similar to the error in bug 1464348. A similar approach may do the trick.

Comment 2 Yaniv Kaul 2017-08-06 07:44:43 UTC
Is this on track to 4.1.5? Otherwise, please defer.

Comment 3 Tomas Jelinek 2017-09-04 13:30:52 UTC
Some conclusions:

- if you cut the line between engine and host during the sparsify, the command fails
- if the failing of the command fails by itself, it will be retried again by the infrastructure
- this will be repeated indefinitely

But, we have not succeeded to simulate this situation - the failing of the command always succeeded (hence not having the infinite errors). It had to be something very specific in the environment or hit due to some other bug.

Considering this and the fact it is basically harmless (you will have many errors in log but nothing actually breaks), retargeting to 4.2.

Comment 4 Tomas Jelinek 2017-09-14 08:59:12 UTC
@Israel: we were not able to simulate this - do you happen to have some reproduction environment when this actually happens?

Comment 5 Israel Pinto 2017-09-18 06:39:46 UTC
The problem were solve with update of the kernel version, 
See https://bugzilla.redhat.com/show_bug.cgi?id=1469134#c9
We can close this BZ, I don't think it will be reproduce, since the problem was with kernal see: https://bugzilla.redhat.com/show_bug.cgi?id=1448312

Comment 6 Tomas Jelinek 2017-10-02 08:32:34 UTC
thank you Israel, closing.


Note You need to log in before you can comment on or make changes to this bug.