Bug 1248453

Summary: [engine-backend] Gluster domain deactivation ends with a failure while the domain is master
Product: [oVirt] vdsm
Component: General
Version: 4.17.0
Hardware: x86_64
OS: Unspecified
Type: Bug
Severity: high
Priority: unspecified
Status: CLOSED WORKSFORME
Whiteboard: storage
oVirt Team: Storage
Reporter: Elad <ebenahar>
Assignee: Ala Hino <ahino>
QA Contact: Elad <ebenahar>
CC: acanan, amureini, bazulay, bugs, ecohen, gklein, laravot, lsurette, mgoldboi, nsoffer, rbalakri, tnisan, ycui, yeylon, ylavi
Target Milestone: ovirt-3.6.2
Target Release: 4.17.17
Doc Type: Bug Fix
Flags: amureini: needinfo+
       amureini: ovirt-3.6.z?
       tnisan: ovirt-4.0.0?
       rule-engine: planning_ack?
       rule-engine: devel_ack?
       rule-engine: testing_ack?
Last Closed: 2016-01-13 11:50:50 UTC

Description Elad 2015-07-30 10:37:01 UTC
Description of problem:
Had 2 storage domains in the DC: one Gluster domain as master and another data domain.
I deactivated the Gluster domain, and the operation ended with a failure in vdsm.


Version-Release number of selected component (if applicable):

vdsm-cli-4.17.0-1201.git7ba0684.el7.noarch
vdsm-yajsonrpc-4.17.0-1201.git7ba0684.el7.noarch
vdsm-infra-4.17.0-1201.git7ba0684.el7.noarch
vdsm-4.17.0-1201.git7ba0684.el7.noarch
vdsm-xmlrpc-4.17.0-1201.git7ba0684.el7.noarch
vdsm-jsonrpc-4.17.0-1201.git7ba0684.el7.noarch
vdsm-python-4.17.0-1201.git7ba0684.el7.noarch
glusterfs-libs-3.7.2-3.el7.x86_64
glusterfs-api-3.7.2-3.el7.x86_64
glusterfs-server-3.7.2-3.el7.x86_64
glusterfs-client-xlators-3.7.2-3.el7.x86_64
glusterfs-fuse-3.7.2-3.el7.x86_64
glusterfs-cli-3.7.2-3.el7.x86_64
glusterfs-geo-replication-3.7.2-3.el7.x86_64
glusterfs-3.7.2-3.el7.x86_64
libvirt-daemon-1.2.8-16.el7_1.3.x86_64
qemu-kvm-ev-2.1.2-23.el7_1.3.1.x86_64
selinux-policy-3.13.1-23.el7_1.8.noarch

ovirt-engine-3.6.0-0.0.master.20150627185750.git6f063c1.el6.noarch



How reproducible:
Always

Steps to Reproduce:
1. Have a DC with 2 storage domains: 1 Gluster as master and another data domain
2. Deactivate the Gluster domain

Actual results:

Gluster master domain deactivation failure in vdsm:
 
Thread-1547::ERROR::2015-07-30 08:31:11,883::task::866::Storage.TaskManager.Task::(_setError) Task=`a5505f7f-1eea-49e7-a02b-644d8e754b04`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1237, in deactivateStorageDomain
    pool.deactivateSD(sdUUID, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1091, in deactivateSD
    self.masterMigrate(sdUUID, newMsdUUID, masterVersion)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 810, in masterMigrate
    exclude=('./lost+found',))
  File "/usr/share/vdsm/storage/fileUtils.py", line 68, in tarCopy
    raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)
TarCopyFailed: (1, 0, '',
'/usr/bin/tar: ./vms: time stamp 2015-07-30 14:59:44 is 23312.863365733 s in the future
/usr/bin/tar: ./tasks/31002a70-459d-418a-9b03-8aabb36b15c5/31002a70-459d-418a-9b03-8aabb36b15c5.task: time stamp 2015-07-30 11:43:41 is 11549.857334236 s in the future
/usr/bin/tar: ./tasks/31002a70-459d-418a-9b03-8aabb36b15c5/31002a70-459d-418a-9b03-8aabb36b15c5.job.0: time stamp 2015-07-30 11:43:41 is 11549.854094166 s in the future
/usr/bin/tar: ./tasks/31002a70-459d-418a-9b03-8aabb36b15c5/31002a70-459d-418a-9b03-8aabb36b15c5.result: time stamp 2015-07-30 11:43:41 is 11549.850335046 s in the future
/usr/bin/tar: ./tasks/31002a70-459d-418a-9b03-8aabb36b15c5: time stamp 2015-07-30 11:43:41 is 11549.849210573 s in the future
/usr/bin/tar: ./tasks/5a514e3a-2c8b-4aee-98a4-00192f879d21/5a514e3a-2c8b-4aee-98a4-00192f879d21.task: time stamp 2015-07-30 11:43:42 is 11550.844568364 s in the future
/usr/bin/tar: ./tasks/5a514e3a-2c8b-4aee-98a4-00192f879d21/5a514e3a-2c8b-4aee-98a4-00192f879d21.result: time stamp 2015-07-30 11:43:42 is 11550.840811022 s in the future
/usr/bin/tar: ./tasks/5a514e3a-2c8b-4aee-98a4-00192f879d21/5a514e3a-2c8b-4aee-98a4-00192f879d21.job.0: time stamp 2015-07-30 11:43:42 is 11550.834173155 s in the future
/usr/bin/tar: ./tasks/5a514e3a-2c8b-4aee-98a4-00192f879d21: time stamp 2015-07-30 11:43:42 is 11550.833540006 s in the future
/usr/bin/tar: ./tasks: time stamp 2015-07-30 11:43:42 is 11550.832915777 s in the future
/usr/bin/tar: .: time stamp 2015-07-30 14:59:44 is 23312.832168412 s in the future')
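
For context, a minimal sketch of the failing code path, reconstructed from the frame names in the traceback (the real fileUtils.tarCopy may differ in details): two tar processes are piped together, and a non-zero exit from either side raises TarCopyFailed. Note that only the extracting tar's stderr is captured here, so the exception can surface harmless extract-side warnings (like the timestamp ones above) even when it was the creating tar that exited non-zero, as the (1, 0, ...) return codes in the log suggest.

import subprocess


class TarCopyFailed(RuntimeError):
    """Raised when either side of the tar pipeline exits non-zero."""


def tar_copy(src, dst, exclude=()):
    exclude_args = ["--exclude=%s" % path for path in exclude]
    # "tar cf -" streams an archive of src to stdout ...
    tsrc = subprocess.Popen(["tar", "cf", "-"] + exclude_args +
                            ["-C", src, "."],
                            stdout=subprocess.PIPE)
    # ... and "tar xf -" unpacks it from stdin into dst. Only this
    # side's stdout/stderr are captured.
    tdst = subprocess.Popen(["tar", "xf", "-"], cwd=dst,
                            stdin=tsrc.stdout,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    tsrc.stdout.close()
    out, err = tdst.communicate()
    tsrc.wait()
    if tsrc.returncode != 0 or tdst.returncode != 0:
        # err holds the extractor's stderr, e.g. the "time stamp ...
        # in the future" warnings seen above, even if the failure
        # actually came from the create side.
        raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)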

Engine.log:

2015-07-30 08:31:12,577 ERROR [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (org.ovirt.thread.pool-8-thread-40) [12325ce] Command 'org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand' failed: VdcBLLException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSErrorException: IRSGenericException: IRSErrorException: Failed to DeactivateStorageDomainVDS, error = Error in storage domain action: (u'sdUUID=95fe6a28-a292-4753-8b84-fd268141a79a, spUUID=10843014-2739-477b-b446-2383aec4730e, msdUUID=3551f61a-3476-4698-bafa-fc8fa2e25661, masterVersion=5',), code = 350 (Failed with error StorageDomainActionError and code 350)


Expected results:
Deactivating the Gluster domain while it is the master should succeed, and reconstruct master should work.

Additional info: 
sosreport from engine: http://file.tlv.redhat.com/ebenahar/sosreport-RHEL6.7Server-20150730083705.tar.xz

sosreport from the hypervisor: http://file.tlv.redhat.com/ebenahar/sosreport-green-vdsb.qa.lab.tlv.redhat.com-20150730083706.tar.xz

Comment 1 Red Hat Bugzilla Rules Engine 2015-10-19 10:53:54 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED status, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 2 Sandro Bonazzola 2015-10-26 12:43:00 UTC
This is an automated message. oVirt 3.6.0 RC3 has been released, and GA is targeted for next week, Nov 4th 2015.
Please review this bug and, if it is not a blocker, please postpone it to a later release.
All bugs not postponed by the GA release will be automatically re-targeted to:
- 3.6.1 if severity >= high
- 4.0 if severity < high

Comment 3 Ala Hino 2016-01-13 11:50:50 UTC
Both Elad and I failed to reproduce.

Comment 4 Liron Aravot 2016-03-22 12:34:58 UTC
This bug happens because of unsynced clocks; we can pass -m to tar to solve that issue. We shouldn't fail to migrate the master because of that.

Allon/Nir - what's your take on that?
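
In terms of the tarCopy sketch above, the proposed change amounts to one extra tar flag on the extract side (a sketch only; -m/--touch is standard GNU tar, but the exact command line vdsm builds is an assumption):

    # Proposed: "m" (--touch) makes the extracting tar set each file's
    # mtime to the extraction time instead of restoring the archived
    # mtime, so out-of-sync host clocks can no longer trigger the
    # "time stamp ... in the future" warnings on stderr.
    tdst = subprocess.Popen(["tar", "xmf", "-"], cwd=dst,
                            stdin=tsrc.stdout,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)

If preserving file mtimes on the destination domain matters, GNU tar also offers --warning=no-timestamp (tar >= 1.23), which suppresses only the warning while keeping the archived timestamps.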

Comment 5 Allon Mureinik 2016-03-22 13:08:16 UTC
(In reply to Liron Aravot from comment #4)
> This bug happens because of unsynced clocks; we can pass -m to tar to solve
> that issue. We shouldn't fail to migrate the master because of that.
> 
> Allon/Nir - what's your take on that?
Sounds reasonable to me.
Nir?

Comment 6 Nir Soffer 2016-03-22 17:28:08 UTC
(In reply to Liron Aravot from comment #4)
> This bug happens because of unsynced clocks; we can pass -m to tar to solve
> that issue. We shouldn't fail to migrate the master because of that.
> 
> Allon/Nir - what's your take on that?

Did tar fail because the timestamps are in the future? Maybe we don't handle
tar's exit code properly?

Anyway, I don't think we should care about the timestamps, so if using -m avoids
this issue, why not.

Comment 7 Liron Aravot 2016-03-23 13:57:02 UTC
After checking: tar isn't failing because of the timestamps; it only logs those warnings to stderr, which is misleading (and already confused people when this bug was investigated). We should avoid those timestamp warnings when performing the operation.
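
For reference, that check can be reproduced standalone (not vdsm code; assumes GNU tar on a POSIX host): give a file an mtime in the future, pipe it through a create/extract tar pair, and observe that the warning goes to stderr while both processes still exit 0.

import os
import subprocess
import tempfile
import time

# One file whose mtime is an hour ahead, mimicking a domain written
# by a host whose clock runs fast.
src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()
path = os.path.join(src, "vms")
open(path, "w").close()
future = time.time() + 3600
os.utime(path, (future, future))

tsrc = subprocess.Popen(["tar", "cf", "-", "-C", src, "."],
                        stdout=subprocess.PIPE)
tdst = subprocess.Popen(["tar", "xf", "-"], cwd=dst,
                        stdin=tsrc.stdout, stderr=subprocess.PIPE)
tsrc.stdout.close()
_, err = tdst.communicate()
tsrc.wait()

# Expected: err contains "time stamp ... s in the future", yet both
# return codes are 0 - the warning alone does not fail tar.
print("create rc=%r, extract rc=%r" % (tsrc.returncode, tdst.returncode))
print(err)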