Description of problem:

I have one testing oVirt cluster with one GlusterFS storage domain (distributed replicated) and 3 hypervisor nodes. It all works fine, but when I try to add another GlusterFS data storage domain (replica 3), I am not able to attach it to oVirt; it fails with the following error:

2015-07-29 10:05:55,194 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-8-thread-32) [751c5f25] IrsBroker::Failed::AttachStorageDomainVDS due to: IRSErrorException: IRSGenericException: IRSErrorException: Failed to AttachStorageDomainVDS, error = Cannot acquire host id: (u'd0e76dd4-c34a-456e-b7f6-02dc173a3cc1', SanlockException(90, 'Sanlock lockspace add failure', 'Message too long')), code = 661

Version-Release number of selected component (if applicable):
oVirt: 3.5.2.1-1.el7.centos
VDSM: vdsm-4.16.20-0.el7.centos
GlusterFS: glusterfs-3.6.3-1.el7

[root@stor1 ~]# gluster volume info 3TB

Volume Name: 3TB
Type: Replicate
Volume ID: 78d1f376-178d-4b01-90c0-5dac90b50a6c
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: stor1:/bricks/b/vol2
Brick2: stor2:/bricks/b/vol2
Brick3: stor3:/bricks/b/vol2
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
user.cifs: enable
nfs.disable: off
[root@stor1 ~]#

Actual results:
The storage domain fails to attach with the error "Cannot acquire host id".

Expected results:
The storage domain should be added without any error.

Additional info:
1. Engine logs: http://paste.ubuntu.com/11971901/
2. Sanlock log (HV1): http://paste.ubuntu.com/11971916/
3. VDSM logs (HV1): http://paste.ubuntu.com/11971926/
4. Sanlock log (HV2): http://paste.ubuntu.com/11971950/
5. VDSM logs (HV2): http://paste.ubuntu.com/11971955/
6. /var/log/messages (HV1): http://paste.ubuntu.com/11971967/
7. /var/log/messages (HV2): http://paste.ubuntu.com/11971977/

Thanks,
Punit
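The "Options Reconfigured" list in the volume info above corresponds to the options oVirt expects on a virt store (notably ownership by the vdsm user and kvm group). A hedged sketch of how those options would be applied with the gluster CLI, assuming the volume name "3TB" from this report:

```shell
# Sketch only: applying the key options shown in "gluster volume info 3TB".
# UID/GID 36 are the conventional vdsm user and kvm group on oVirt hosts.
gluster volume set 3TB storage.owner-uid 36
gluster volume set 3TB storage.owner-gid 36
gluster volume set 3TB cluster.quorum-type auto
gluster volume set 3TB cluster.server-quorum-type server
gluster volume set 3TB network.remote-dio enable
```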
(In reply to punit from comment #0)

According to the sanlock logs, sanlock cannot update the delta lease on the gluster domain. In this case, failing to acquire a lock is expected.

Please attach the glusterfs logs for this volume to this bug. The logs should be found at /var/log/glusterfs/rhev_data_center*<gluster server>_<volume name>.log.

Sahina, can you get someone to look at this?
David, can you look in sanlock logs and confirm that this is a gluster issue?
Correction for glusterfs logs - the logs are found at: /var/log/glusterfs/rhev-data-center-mnt-glusterSD-<server>:_<volume>.log
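The corrected path follows the glusterfs client convention of naming the log file after the mount point, with path separators replaced by hyphens. A small sketch of that mapping, assuming oVirt's standard mount point under /rhev/data-center/mnt/glusterSD and the server/volume names from this bug:

```python
def gluster_client_log(mount_point: str) -> str:
    # The glusterfs FUSE client derives its log file name from the
    # mount point, replacing '/' with '-' (assumption based on the
    # path quoted in this comment).
    name = mount_point.strip("/").replace("/", "-")
    return "/var/log/glusterfs/" + name + ".log"

print(gluster_client_log("/rhev/data-center/mnt/glusterSD/stor1:_3TB"))
# → /var/log/glusterfs/rhev-data-center-mnt-glusterSD-stor1:_3TB.log
```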
Yes, sanlock gets i/o errors 103 (ECONNABORTED) and 107 (ENOTCONN) from storage.
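The error numbers reported by sanlock, as well as the 90 inside the SanlockException from comment 0, are plain Linux errno values and can be decoded with Python's standard errno module:

```python
import errno
import os

# 90 appears in SanlockException in comment 0 ('Message too long');
# 103 and 107 are the storage i/o errors sanlock reports here.
for code in (90, 103, 107):
    print(code, errno.errorcode[code], os.strerror(code))
# 90  -> EMSGSIZE, 103 -> ECONNABORTED, 107 -> ENOTCONN (on Linux)
```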
Hi Nir,

Here are the logs from both hypervisor nodes:
http://paste.ubuntu.com/12004403/
http://paste.ubuntu.com/12004410/
http://paste.ubuntu.com/12004788/
http://paste.ubuntu.com/12004825/
As I said in comment 1, someone from gluster should check these logs. Adding back lost needinfo for Sahina.
Changing category to "sd-gluster" since this is not a sanlock issue.
Ravi, can you look at this?
Could someone from the oVirt team explain the steps that are carried out from a gluster POV when "adding another glusterfs datastorage (replica 3)" is performed?

I'm assuming 'adding a datastorage' involves the following steps:
1. Forming a storage pool of stor{1..3} using `gluster peer probe`
2. Creating a replica 3 volume and starting it
3. FUSE-mounting the volume on *all* the hypervisors

- Is this correct?
- At what point is the adding deemed successful? Will it fail if the FUSE mount is unmounted for some reason? From the logs given in comment #5, I see that the volume 3TB is being mounted and unmounted multiple times.
- Does sanlock come into play on a volume just created and having no VM images yet?
(In reply to Ravishankar N from comment #9)

> I'm assuming 'adding a datastorage' involves the following steps.
> 1. Forming a storage pool of stor{1..3} using `gluster peer probe`
> 2. Creating a replica 3 volume and starting it

I don't know about this; we don't have any information about it in this bug. punit, please confirm the steps above.

> 3. FUSE Mounting the volume on *all* the hypervisors.

Right - and then:

4. Sanlock tries to acquire a host id on *all* hosts. This includes writing to each host's block in the "<dom_uuid>/dom_md/ids" file, and reading the other hosts' blocks. According to the sanlock log (see comment 4), sanlock gets ECONNABORTED and ENOTCONN from storage at this point.

> - At what point is the adding deemed successful?

When sanlock can acquire the host id. Before acquiring the host id, a host is not allowed to touch the shared storage.

> Will it fail if the FUSE mount is unmounted for some reason?

If it was unmounted before sanlock acquired the host id, it will fail. If it fails after that, the storage domain will become non-operational later, when storage domain monitoring fails to read from storage.

> From the logs given in comment #5, I see that the volume 3TB is being
> mounted and unmounted multiple times.
> - Does sanlock come into play on a volume just created and having no VM
> images yet?

Yes, as described in step 4.
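The host-id acquisition described in step 4 can be pictured with a schematic sketch: each host owns one fixed-size block in the "<dom_uuid>/dom_md/ids" file and renews its delta lease by rewriting that block while reading the others. The block size, count, and payload below are illustrative only, not sanlock's real on-disk format:

```python
import os
import tempfile

BLOCK_SIZE = 512   # sanlock works in sector-sized blocks (illustrative)
MAX_HOSTS = 250    # illustrative upper bound on host ids

def write_host_block(path: str, host_id: int, payload: bytes) -> None:
    # Host N's block lives at offset (N - 1) * BLOCK_SIZE in the ids file.
    with open(path, "r+b") as f:
        f.seek((host_id - 1) * BLOCK_SIZE)
        f.write(payload.ljust(BLOCK_SIZE, b"\0"))

def read_host_block(path: str, host_id: int) -> bytes:
    with open(path, "rb") as f:
        f.seek((host_id - 1) * BLOCK_SIZE)
        return f.read(BLOCK_SIZE)

# Create an empty ids file and let host 2 claim its slot.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * BLOCK_SIZE * MAX_HOSTS)
    ids_path = tmp.name

write_host_block(ids_path, 2, b"host2-lease")
print(read_host_block(ids_path, 2)[:11])
os.unlink(ids_path)
```

In the failure reported here, it is these reads and writes through the FUSE mount that fail with connection errors, so the host id is never acquired.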
punit, we need full vdsm logs showing the entire flow, starting from the point when you try to create a gluster storage domain until it fails. And we need the full logs attached to this bug; I cannot download the logs via the links you posted - it seems that downloading requires registration on that site.

Please also check and answer Ravishankar's questions from comment 9.
Hi,

Logs are already attached, and it's a free website - no need to register...

Additional info:
1. Engine logs: http://paste.ubuntu.com/11971901/
2. Sanlock log (HV1): http://paste.ubuntu.com/11971916/
3. VDSM logs (HV1): http://paste.ubuntu.com/11971926/
4. Sanlock log (HV2): http://paste.ubuntu.com/11971950/
5. VDSM logs (HV2): http://paste.ubuntu.com/11971955/
6. /var/log/messages (HV1): http://paste.ubuntu.com/11971967/
7. /var/log/messages (HV2): http://paste.ubuntu.com/11971977/

The logs from both hypervisor nodes:
http://paste.ubuntu.com/12004403/
http://paste.ubuntu.com/12004410/
http://paste.ubuntu.com/12004788/
http://paste.ubuntu.com/12004825/
(In reply to punit from comment #12)

Punit, I cannot download the log from that site - you don't hit the issue since you have an account there, but I don't:
http://paste.ubuntu.com/11971926/plain/

Also, I need *full* vdsm logs. Please check comment 11 again.

If we do not get the requested info, we will have to close this bug.
Adding back needinfo for ravishankar for comment 10.
Hi,

Please try the following URL: http://ur1.ca/npig8

Also, if you open that URL it will open http://paste.ubuntu.com/11971926/ instead of http://paste.ubuntu.com/11971926/plain/. I don't have an Ubuntu account, but I can easily see the logs without the /plain/ suffix in the URL...

As I cannot reproduce the issue again, if you can check the existing logs that's OK; otherwise you may consider closing this bug.

Thanks,
punit
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED state, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
(In reply to punit from comment #15)
> Please try with the following url :-
> ...

These URLs do not work for me. I need full vdsm logs on my machine to investigate this.

Closing for now; please reopen when you can attach full logs to this bug.
Removing the need-info in my name as the bug is closed.