Bug 1248453 - [engine-backend] Gluster domain deactivation ends with a failure while the domain is master
Status: CLOSED WORKSFORME
Product: vdsm
Classification: oVirt
Component: General
Version: 4.17.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-3.6.2
Target Release: 4.17.17
Assigned To: Ala Hino
QA Contact: Elad
Whiteboard: storage
Depends On:
Blocks:
Reported: 2015-07-30 06:37 EDT by Elad
Modified: 2016-03-23 09:57 EDT (History)
CC List: 15 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-01-13 06:50:50 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
amureini: needinfo+
amureini: ovirt-3.6.z?
tnisan: ovirt-4.0.0?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments: None
Description Elad 2015-07-30 06:37:01 EDT
Description of problem:
Had 2 storage domains in the DC: one Gluster domain as master and another data domain.
I deactivated the Gluster domain and the operation ended with a failure in vdsm.


Version-Release number of selected component (if applicable):

vdsm-cli-4.17.0-1201.git7ba0684.el7.noarch
vdsm-yajsonrpc-4.17.0-1201.git7ba0684.el7.noarch
vdsm-infra-4.17.0-1201.git7ba0684.el7.noarch
vdsm-4.17.0-1201.git7ba0684.el7.noarch
vdsm-xmlrpc-4.17.0-1201.git7ba0684.el7.noarch
vdsm-jsonrpc-4.17.0-1201.git7ba0684.el7.noarch
vdsm-python-4.17.0-1201.git7ba0684.el7.noarch
glusterfs-libs-3.7.2-3.el7.x86_64
glusterfs-api-3.7.2-3.el7.x86_64
glusterfs-server-3.7.2-3.el7.x86_64
glusterfs-client-xlators-3.7.2-3.el7.x86_64
glusterfs-fuse-3.7.2-3.el7.x86_64
glusterfs-cli-3.7.2-3.el7.x86_64
glusterfs-geo-replication-3.7.2-3.el7.x86_64
glusterfs-3.7.2-3.el7.x86_64
libvirt-daemon-1.2.8-16.el7_1.3.x86_64
qemu-kvm-ev-2.1.2-23.el7_1.3.1.x86_64
selinux-policy-3.13.1-23.el7_1.8.noarch

ovirt-engine-3.6.0-0.0.master.20150627185750.git6f063c1.el6.noarch



How reproducible:
Always

Steps to Reproduce:
1. Have a DC with 2 storage domains: 1 Gluster as master and another data domain
2. Deactivate the Gluster domain

Actual results:

Gluster master domain deactivation failure in vdsm:
 
Thread-1547::ERROR::2015-07-30 08:31:11,883::task::866::Storage.TaskManager.Task::(_setError) Task=`a5505f7f-1eea-49e7-a02b-644d8e754b04`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1237, in deactivateStorageDomain
    pool.deactivateSD(sdUUID, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1091, in deactivateSD
    self.masterMigrate(sdUUID, newMsdUUID, masterVersion)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 810, in masterMigrate
    exclude=('./lost+found',))
  File "/usr/share/vdsm/storage/fileUtils.py", line 68, in tarCopy
    raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)
TarCopyFailed: (1, 0, '', '/usr/bin/tar: ./vms: time stamp 2015-07-30 14:59:44 is 23312.863365733 s in the future\n/usr/bin/tar: ./tasks/31002a70-459d-418a-9b03-8aabb36b15c5/31002a70-459d-418a-9b03-8aabb36b15c5.ta
sk: time stamp 2015-07-30 11:43:41 is 11549.857334236 s in the future\n/usr/bin/tar: ./tasks/31002a70-459d-418a-9b03-8aabb36b15c5/31002a70-459d-418a-9b03-8aabb36b15c5.job.0: time stamp 2015-07-30 11:43:41 is 11549
.854094166 s in the future\n/usr/bin/tar: ./tasks/31002a70-459d-418a-9b03-8aabb36b15c5/31002a70-459d-418a-9b03-8aabb36b15c5.result: time stamp 2015-07-30 11:43:41 is 11549.850335046 s in the future\n/usr/bin/tar: 
./tasks/31002a70-459d-418a-9b03-8aabb36b15c5: time stamp 2015-07-30 11:43:41 is 11549.849210573 s in the future\n/usr/bin/tar: ./tasks/5a514e3a-2c8b-4aee-98a4-00192f879d21/5a514e3a-2c8b-4aee-98a4-00192f879d21.task
: time stamp 2015-07-30 11:43:42 is 11550.844568364 s in the future\n/usr/bin/tar: ./tasks/5a514e3a-2c8b-4aee-98a4-00192f879d21/5a514e3a-2c8b-4aee-98a4-00192f879d21.result: time stamp 2015-07-30 11:43:42 is 11550.
840811022 s in the future\n/usr/bin/tar: ./tasks/5a514e3a-2c8b-4aee-98a4-00192f879d21/5a514e3a-2c8b-4aee-98a4-00192f879d21.job.0: time stamp 2015-07-30 11:43:42 is 11550.834173155 s in the future\n/usr/bin/tar: ./
tasks/5a514e3a-2c8b-4aee-98a4-00192f879d21: time stamp 2015-07-30 11:43:42 is 11550.833540006 s in the future\n/usr/bin/tar: ./tasks: time stamp 2015-07-30 11:43:42 is 11550.832915777 s in the future\n/usr/bin/tar
: .: time stamp 2015-07-30 14:59:44 is 23312.832168412 s in the future\n')
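
For context, the failing call is fileUtils.tarCopy which, per the traceback, pipes a creating tar into an extracting tar and raises TarCopyFailed with both return codes. Below is a minimal, self-contained sketch of that pattern; it is illustrative only, not the actual vdsm source (only the TarCopyFailed name and argument order are taken from the traceback):

import subprocess


class TarCopyFailed(Exception):
    """Mirrors the traceback's exception: (src rc, dst rc, out, err)."""


def tar_copy(src, dst, exclude=()):
    exclude_args = ["--exclude=%s" % path for path in exclude]
    # Archive the source directory to stdout...
    tsrc = subprocess.Popen(
        ["tar", "cf", "-"] + exclude_args + ["-C", src, "."],
        stdout=subprocess.PIPE)
    # ...and extract it into the destination, reading from the pipe.
    tdst = subprocess.Popen(
        ["tar", "xf", "-", "-C", dst],
        stdin=tsrc.stdout,
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    tsrc.stdout.close()  # let tdst see EOF once tsrc exits
    out, err = tdst.communicate()
    tsrc.wait()
    # Either side exiting non-zero fails the copy. Note that in the log
    # above the tuple is (1, 0, ...): the creating tar exited 1 while the
    # extracting tar exited 0, suggesting the timestamp warnings captured
    # on stderr are not themselves the failure.
    if tsrc.returncode != 0 or tdst.returncode != 0:
        raise TarCopyFailed(tsrc.returncode, tdst.returncode, out, err)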

Engine.log:

2015-07-30 08:31:12,577 ERROR [org.ovirt.engine.core.bll.storage.DeactivateStorageDomainCommand] (org.ovirt.thread.pool-8-thread-40) [12325ce] Command 'org.ovirt.engine.core.bll.storage.DeactivateStorageDomainComm
and' failed: VdcBLLException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSErrorException: IRSGenericException: IRSErrorException: Failed to DeactivateStorageDomainVDS, error = Error in storage domain action: (u's
dUUID=95fe6a28-a292-4753-8b84-fd268141a79a, spUUID=10843014-2739-477b-b446-2383aec4730e, msdUUID=3551f61a-3476-4698-bafa-fc8fa2e25661, masterVersion=5',), code = 350 (Failed with error StorageDomainActionError and
 code 350)


Expected results:
Deactivating the Gluster domain while it is the master should succeed, and master reconstruction should work.

Additional info: 
sosreport from engine: http://file.tlv.redhat.com/ebenahar/sosreport-RHEL6.7Server-20150730083705.tar.xz

sosreport from the hypervisor: http://file.tlv.redhat.com/ebenahar/sosreport-green-vdsb.qa.lab.tlv.redhat.com-20150730083706.tar.xz
Comment 1 Red Hat Bugzilla Rules Engine 2015-10-19 06:53:54 EDT
Target release should be set once a package build is known to fix an issue. Since this bug is not in the MODIFIED state, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
Comment 2 Sandro Bonazzola 2015-10-26 08:43:00 EDT
This is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted for next week, Nov 4th 2015.
Please review this bug and, if it is not a blocker, postpone it to a later release.
All bugs not postponed by the GA release will be automatically re-targeted to:

- 3.6.1 if severity >= high
- 4.0 if severity < high
Comment 3 Ala Hino 2016-01-13 06:50:50 EST
Both Elad and I failed to reproduce.
Comment 4 Liron Aravot 2016-03-22 08:34:58 EDT
This bug happens because of unsynced clocks; we can use -m on tar to solve that issue. We shouldn't fail to migrate the master because of that.

Allon/Nir - what's your take on that?
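
To make the suggestion concrete, here is a sketch of where -m would go, relative to the tar-pipe sketch in the description (illustrative, not an actual patch). GNU tar's -m (--touch) tells the extracting tar not to restore member mtimes, so it never compares them against the local clock:

import subprocess

# Drop-in replacement for the extract-side Popen call in the tar_copy
# sketch above; the only change is adding -m.
def extract_side(dst, stdin_pipe):
    # -m (--touch): don't restore member mtimes on extraction, so an
    # unsynced clock can no longer produce "time stamp ... in the
    # future" noise on stderr.
    return subprocess.Popen(
        ["tar", "xf", "-", "-m", "-C", dst],
        stdin=stdin_pipe,
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)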
Comment 5 Allon Mureinik 2016-03-22 09:08:16 EDT
(In reply to Liron Aravot from comment #4)
> This bug happens because of unsynced clocks; we can use -m on tar to solve
> that issue. We shouldn't fail to migrate the master because of that.
> 
> Allon/Nir - what's your take on that?
Sounds reasonable to me.
Nir?
Comment 6 Nir Soffer 2016-03-22 13:28:08 EDT
(In reply to Liron Aravot from comment #4)
> This bug happens because of unsynced clocks; we can use -m on tar to solve
> that issue. We shouldn't fail to migrate the master because of that.
> 
> Allon/Nir - what's your take on that?

Did tar fail because the timestamps are in the future? Maybe we don't handle
tar's exit code properly?

Anyway, I don't think we should care about the timestamps, so if using -m
avoids this issue, why not.
Comment 7 Liron Aravot 2016-03-23 09:57:02 EDT
After checking: tar isn't failing because of the timestamps; it logs those warnings to stderr, which is misleading (and already confused people when this bug was investigated). We should avoid those timestamp warnings when performing the operation.
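
For the record, the misleading stderr noise is easy to reproduce in isolation. A small self-contained demo (the temp paths, the "vms" file name, and the one-hour offset are made up for illustration): extracting a member whose mtime is in the future makes GNU tar warn on stderr yet still exit 0, and adding -m as suggested above silences it.

import os
import subprocess
import tempfile
import time

src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()
path = os.path.join(src, "vms")
open(path, "w").close()
future = time.time() + 3600
os.utime(path, (future, future))  # push the mtime one hour ahead

tsrc = subprocess.Popen(["tar", "cf", "-", "-C", src, "."],
                        stdout=subprocess.PIPE)
tdst = subprocess.Popen(["tar", "xf", "-", "-C", dst],
                        stdin=tsrc.stdout, stderr=subprocess.PIPE)
tsrc.stdout.close()
_, err = tdst.communicate()
tsrc.wait()
print("src rc:", tsrc.returncode)  # expected: 0
print("dst rc:", tdst.returncode)  # expected: 0, despite the warning
print(err.decode())  # expected: tar: ./vms: time stamp ... in the future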
