Bug 1249851
| Summary: | Error attaching glusterfs storage domain: "Cannot acquire host id" | ||
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | punit <hypunit> |
| Component: | General | Assignee: | Nir Soffer <nsoffer> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Elad <ebenahar> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.5.2.1 | CC: | acanan, amureini, bugs, ecohen, gklein, hypunit, lsurette, nsoffer, ravishankar, rbalakri, sabose, teigland, yeylon, ylavi |
| Target Milestone: | ovirt-3.6.2 | Flags: | ylavi: ovirt-3.6.z? ylavi: planning_ack? ylavi: devel_ack? ylavi: testing_ack? |
| Target Release: | 3.6.2 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | storage | ||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2015-12-22 20:22:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
punit
2015-08-04 00:56:01 UTC
(In reply to punit from comment #0)
According to the sanlock logs, sanlock cannot update the delta lease on the gluster domain. In this case, failing to acquire a lock is expected.

Please attach the glusterfs logs for this volume to this bug. The logs should be found at /var/log/glusterfs/rhev_data_center*<gluster server>_<volume name>.log.

Sahina, can you get someone to look at this?

David, can you look at the sanlock logs and confirm that this is a gluster issue?

Correction for the glusterfs logs: they are found at /var/log/glusterfs/rhev-data-center-mnt-glusterSD-<server>:_<volume>.log

Yes, sanlock gets I/O errors 103 (ECONNABORTED) and 107 (ENOTCONN) from storage.

Hi Nir,
The logs from both hypervisor nodes are here:
http://paste.ubuntu.com/12004403/
http://paste.ubuntu.com/12004410/
http://paste.ubuntu.com/12004788/
http://paste.ubuntu.com/12004825/

As I said in comment 1, someone from gluster should check these logs. Adding back lost needinfo for Sahina.

Changing category to "sd-gluster" since this is not a sanlock issue.

Ravi, can you look at this?

Could someone from the oVirt team explain the steps that are carried out from a gluster POV when "but when i try to add another glusterfs datastorage (replicateX3),i am not able to add the datastore to ovirt and it failed " is performed?
I'm assuming 'adding a datastorage' involves the following steps.
1. Forming a storage pool of stor{1..3} using `gluster peer probe`
2. Creating a replica 3 volume and starting it
3. FUSE Mounting the volume on *all* the hypervisors.
- Is this correct?
- At what point is the adding deemed successful? Will it fail if the FUSE mount is unmounted for some reason? From the logs given in comment #5, I see that the volume 3TB is being mounted and unmounted multiple times.
- Does sanlock come into play on a volume just created and having no VM images yet?
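
To check the mount/unmount activity mentioned above, the FUSE mount logs first have to be located on each hypervisor. The following is a minimal sketch, assuming the corrected log path pattern given earlier in this thread (`rhev-data-center-mnt-glusterSD-<server>:_<volume>.log`). The function name `find_gluster_mount_logs` is hypothetical, and the trailing `*` used to pick up rotated copies is an assumption about the local log rotation setup, not something stated in this bug.

```python
import glob
import os


def find_gluster_mount_logs(server, volume, log_dir="/var/log/glusterfs"):
    """Return glusterfs FUSE mount logs for one server/volume pair.

    The file name pattern is the one quoted in this bug:
    rhev-data-center-mnt-glusterSD-<server>:_<volume>.log
    """
    # Escape the inputs so characters such as '[' are matched literally.
    name = "rhev-data-center-mnt-glusterSD-{}:_{}.log*".format(
        glob.escape(server), glob.escape(volume)
    )
    # Trailing '*' (assumption): also match rotated copies of the log.
    return sorted(glob.glob(os.path.join(glob.escape(log_dir), name)))


if __name__ == "__main__":
    # "stor1" and "3TB" are the server prefix and volume name from this thread.
    for path in find_gluster_mount_logs("stor1", "3TB"):
        print(path)
```

The files it returns are the ones to attach to the bug in full, rather than pasting excerpts.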
(In reply to Ravishankar N from comment #9)
> Could someone from the ovirt team can explain the steps that are carried out
> from a gluster POV when "but when i try to add another glusterfs datastorage
> (replicateX3),i am not able to add the datastore to ovirt and it failed " is
> performed?
>
> I'm assuming 'adding a datastorage' involves the following steps.
> 1. Forming a storage pool of stor{1..3} using `gluster peer probe`
> 2. Creating a replica 3 volume and starting it

I don't know about this; we don't have any information about it in the bug. punit, please confirm the steps above.

> 3. FUSE Mounting the volume on *all* the hypervisors.

Right - and then:

4. Sanlock tries to acquire a host id on *all* hosts

This includes writing to each host's block in the "<dom_uuid>/dom_md/ids" file, and reading the other hosts' blocks. According to the sanlock log (see comment 4), sanlock gets ECONNABORTED and ENOTCONN from storage at this point.

> - At what point is the adding deemed successful?

When sanlock can acquire the host id. Before acquiring the host id, a host is not allowed to touch the shared storage.

> Will it fail if the FUSE
> mount is unmounted for some reason?

If it was unmounted before sanlock acquired the host id, it will fail. If it fails after that, the storage domain will become non-operational later, when storage domain monitoring fails to read from storage.

> From the logs given in comment #5, I see
> that the volume 3TB is being mounted and unmounted multiple times.
> - Does sanlock come into play on a volume just created and having no VM
> images yet?

Yes, as described in step 4.

punit, we need full vdsm logs, showing the entire flow starting from the point when you try to create a gluster storage domain, until it fails. We also need the full logs attached to this bug. I cannot download the logs via the links you posted; it seems that downloading requires registration on that site.

Please also check and answer Ravishankar's questions from comment 9.
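
For reference, the two errno values sanlock reported from storage (103 and 107) can be decoded with Python's standard `errno` and `os` modules. This is only a small sketch for reading such log lines; the helper name `describe_errno` is hypothetical, and the numeric values are platform-specific (the numbers quoted in this bug are the Linux ones).

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    # errno.errorcode maps numeric values to names such as 'ENOTCONN'.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # The two I/O error codes reported in the sanlock logs in this bug.
    for code in (103, 107):
        name, message = describe_errno(code)
        print(code, name, message)
```

Both codes are connection errors rather than lock conflicts, which fits the reading in this thread that the problem lies on the storage side and not in sanlock itself.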
Hi,
The logs are already attached, and it's a free website, no need to register...

Additional info:
1. Engine logs :- http://paste.ubuntu.com/11971901/
2. Sanlock lock (HV1) :- http://paste.ubuntu.com/11971916/
3. VDSM Logs (HV1) :- http://paste.ubuntu.com/11971926/
4. Sanlock lock (HV2) :- http://paste.ubuntu.com/11971950/
5. VDSM Logs (HV2) :- http://paste.ubuntu.com/11971955/
6. Var Messages (HV1) :- http://paste.ubuntu.com/11971967/
7. Var Messages (HV2) :- http://paste.ubuntu.com/11971977/

The logs from both hypervisor nodes are here:
http://paste.ubuntu.com/12004403/
http://paste.ubuntu.com/12004410/
http://paste.ubuntu.com/12004788/
http://paste.ubuntu.com/12004825/

(In reply to punit from comment #12)
Punit, I cannot download the log from that site - you don't have an issue since you have an account there, but I don't.
http://paste.ubuntu.com/11971926/plain/

Also, I need *full* vdsm logs. Please check comment 11 again.

If we do not get the requested info, we will have to close this bug.

Adding back needinfo for ravishankar for comment 10.

Hi,
Please try the following URL :- http://ur1.ca/npig8

Also, this URL opens: http://paste.ubuntu.com/11971926/ instead of http://paste.ubuntu.com/11971926/plain/

I don't have an Ubuntu account, but I can easily see the logs without the "plain" postfix in the URL.

I cannot reproduce the logs again, so if you can check these logs, fine; otherwise you may consider closing this bug.

Thanks,
punit

Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Adding back needinfo for ravishankar for comment 10.

(In reply to punit from comment #15)
> Please try with the following url :-
> ...

These URLs do not work for me. I need full vdsm logs on my machine to investigate this.
Closing for now; please reopen when you can attach full logs to this bug.

Removing the need-info in my name as the bug is closed.