Bug 1467690 - Endless error message in engine log: Error during log command: org.ovirt.engine.core.bll.storage.disk.SparsifyImageCommand. Exception null
Endless error message in engine log: Error during log command: org.ovirt.eng...
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt (Show other bugs)
4.1.3.4
Unspecified Unspecified
medium Severity high (vote)
: ovirt-4.2.0
: ---
Assigned To: Shmuel Melamud
Vitalii Yerys
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2017-07-04 11:30 EDT by Israel Pinto
Modified: 2017-10-02 04:32 EDT (History)
8 users (show)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-10-02 04:32:34 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
tjelinek: ovirt‑4.2?
mtessun: planning_ack+
ipinto: devel_ack?
mavital: testing_ack+


Attachments (Terms of Use)
engine.log (710.84 KB, application/x-xz)
2017-07-04 11:30 EDT, Israel Pinto
no flags Details

  None (edit)
Description Israel Pinto 2017-07-04 11:30:43 EDT
Created attachment 1294291 [details]
engine.log

Description of problem:
I see this error message on engine log after trying to execute sparsify on vm disk. The sparsify action failed.
From engine log:
2017-07-04 17:35:23,068+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (DefaultQuartzScheduler1) [disks_syncAction_b7735a81-3304-47d6] Failed invoking callback end method 'onFailed' for command '7ba228dd-89f9-4b33-a35f-453d1c2e0cbf' with exception 'null', the callback is marked for end method retries
2017-07-04 17:35:23,076+03 ERROR [org.ovirt.engine.core.bll.storage.disk.SparsifyImageCommand] (DefaultQuartzScheduler1) [disks_syncAction_0bbfabfd-d8ca-46d4] Ending command 'org.ovirt.engine.core.bll.storage.disk.SparsifyImageCommand' with failure.
2017-07-04 17:35:23,082+03 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (DefaultQuartzScheduler1) [disks_syncAction_0bbfabfd-d8ca-46d4] transaction rolled back
2017-07-04 17:35:23,089+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogableBase] (DefaultQuartzScheduler1) [disks_syncAction_0bbfabfd-d8ca-46d4] Error during log command: org.ovirt.engine.core.bll.storage.disk.SparsifyImageCommand. Exception null

The message appear endlessly on the log and don't stop.

Some background: 
The Night before the hosts in the setup were not reachable since they are  
nested hosts and the nested setup were down since storage problem in lab.
I restore the setup and activate the hosts, but i see in the engine log the DB query failed (could cause the problem i see after restoring the setup)
From engine log:
2017-07-04 03:17:03,125+03 ERROR [org.ovirt.engine.core.bll.job.ExecutionHandler] (DefaultQuartzScheduler4) [disks_syncAction_b7735a81-3304-47d6] Exception: org.springframework.dao.DataIntegrityViolationException: CallableStatementCallback; SQL [{call insertstep(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}]; ERROR: insert or update on table "step" violates foreign key constraint "fk_step_job"
  Detail: Key (job_id)=(38cbfcfa-fab8-4e52-91b2-af05c3924083) is not present in table "job".
  Where: SQL statement "INSERT INTO step (
        step_id,
        parent_step_id,
        job_id,
        step_type,
        description,
        step_number,
        status,
        progress,
        start_time,
        end_time,
        correlation_id,
        external_id,
        external_system_type,
        is_external
        )
    VALUES (
        v_step_id,
        v_parent_step_id,
        v_job_id,
        v_step_type,
        v_description,
        v_step_number,
        v_status,
        v_progress,
        v_start_time,
        v_end_time,
        v_correlation_id,
        v_external_id,
        v_external_system_type,
        v_is_external
        )"
PL/pgSQL function insertstep(uuid,uuid,uuid,character varying,text,integer,character varying,smallint,timestamp with time zone,timestamp with time zone,character varying,uuid,character varying,boolean) line 3 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: insert or update on table "step" violates foreign key constraint "fk_step_job"


Version-Release number of selected component (if applicable):
Red Hat Virtualization Manager Version: 4.1.3.4-0.1.el7

Host:
OS Version:RHEL - 7.4 - 11.el7
Kernel Version:3.10.0 - 663.el7.x86_64
KVM Version:2.9.0 - 10.el7
LIBVIRT Version:libvirt-3.2.0-4.el7
VDSM Version:vdsm-4.19.18-1.el7ev
SPICE Version:0.12.8 - 2.el7


Steps to Reproduce:
1. After activate host 
2. Run sparsify on vm disk 


Additional info:
Attaching engine log
Comment 1 Allon Mureinik 2017-07-05 02:51:17 EDT
The error looks similar to the error in bug 1464348. A similar approach may do the trick.
Comment 2 Yaniv Kaul 2017-08-06 03:44:43 EDT
Is this on track to 4.1.5? Otherwise, please defer.
Comment 3 Tomas Jelinek 2017-09-04 09:30:52 EDT
Some conclusions:

- if you cut the line between engine and host during the sparsify, the command fails
- if the failing of the command fails by itself, it will be retried again by the infrastructure
- this will be repeated indefinitely

But, we have not succeeded to simulate this situation - the failing of the command always succeeded (hence not having the infinite errors). It had to be something very specific in the environment or hit due to some other bug.

Considering this and the fact it is basically harmless (you will have many errors in log but nothing actually breaks), retargeting to 4.2.
Comment 4 Tomas Jelinek 2017-09-14 04:59:12 EDT
@Israel: we were not able to simulate this - do you happen to have some reproduction environment when this actually happens?
Comment 5 Israel Pinto 2017-09-18 02:39:46 EDT
The problem were solve with update of the kernel version, 
See https://bugzilla.redhat.com/show_bug.cgi?id=1469134#c9
We can close this BZ, I don't think it will be reproduce, since the problem was with kernal see: https://bugzilla.redhat.com/show_bug.cgi?id=1448312
Comment 6 Tomas Jelinek 2017-10-02 04:32:34 EDT
thank you Israel, closing.

Note You need to log in before you can comment on or make changes to this bug.