Bug 1133436
| Field | Value |
|---|---|
| Summary | Creation of files on mount failing with "Cannot lock - Bad file descriptor", "Cannot unlock - No such file or directory", "Cannot write to file3 - Transport endpoint is not connected" |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Reporter | spandura |
| Component | replicate |
| Assignee | Krutika Dhananjay <kdhananj> |
| Status | CLOSED WORKSFORME |
| QA Contact | storage-qa-internal <storage-qa-internal> |
| Severity | high |
| Priority | low |
| Version | rhgs-3.0 |
| CC | nchilaka, ravishankar, rhs-bugs, sankarshan, smohan |
| Keywords | ZStream |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | Bug Fix |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Last Closed | 2017-11-28 09:55:39 UTC |
Description (spandura, 2014-08-25 07:16:22 UTC)
SOS Reports: http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/1133436/

Setup:
- Client1: dj
- Client2: king
- BRICK1: rhs-client11
- BRICK2: rhs-client12
- BRICK3: rhs-client13
- BRICK4: rhs-client14
- MGMT_NODE: mia

Created attachment 930350 [details]: Scripts required to execute the case
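The attached scripts are not reproduced in this bug. As a rough illustration only, the failing sequence (create, lock, write, unlock on the fuse mount) might look like the sketch below; the function name and the `MOUNT` variable are hypothetical, and `MOUNT` defaults to a local temp dir so the sketch runs anywhere. The real test lives in attachment 930350.

```shell
#!/bin/sh
# Hypothetical reproducer sketch: point MOUNT at the fuse mount to exercise
# the real code path; it defaults to a local directory so the sketch runs.
MOUNT=${MOUNT:-$(mktemp -d)}

lock_write_unlock() {
    # $1 = file path, $2 = data to write
    exec 9>"$1"   || { echo "Cannot create $1"; return 1; }
    flock -x 9    || { echo "Cannot lock"; return 1; }
    echo "$2" >&9 || { echo "Cannot write to $1"; return 1; }
    flock -u 9    || { echo "Cannot unlock"; return 1; }
    exec 9>&-     # close the descriptor
    echo ok
}

lock_write_unlock "$MOUNT/file3" "hello"
```

On a healthy local filesystem each step succeeds; on the affected mount the report says the lock, unlock, and write steps fail with EBADF, ENOENT, and ENOTCONN respectively.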
With Shwetha's help we were able to re-create the issue every time, within a matter of minutes. The interesting finding is that fsyncs are taking very long, on the order of minutes. We need to find out why that is the case.
Here are some numbers:
```
root@rhs-client11 [Aug-28-2014- 9:53:05] >grep -i fsync info.txt
# columns: %-latency  avg-latency     min-latency     max-latency     calls  fop
92.94  26650150.15 us   1522857.00 us  48115604.00 us  104  FSYNC
97.80  35502796.12 us  12597704.00 us  48115604.00 us   34  FSYNC
86.48  28516896.79 us    370091.00 us  56845990.00 us   71  FSYNC
86.98  42081453.95 us  20249447.00 us  56845990.00 us   21  FSYNC
89.59  19004670.68 us   5092117.00 us  38607219.00 us  109  FSYNC
86.36  23177027.56 us  10103503.00 us  31206852.00 us   32  FSYNC
55.51  37115151.02 us    913235.00 us  85055222.00 us   64  FSYNC
71.77  59096848.44 us  50639193.00 us  85055222.00 us   16  FSYNC
```
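Each row above is a `gluster volume profile` line (assumed layout: %-latency, then average, minimum, and maximum latency in microseconds, call count, and FOP name). A small helper, not part of the attached scripts, can collapse the per-brick FSYNC rows into one call-weighted average:

```shell
# Sketch: compute a call-weighted average FSYNC latency from profile rows.
# Assumes the column layout described above: avg is field 2, the call
# count is the second-to-last field, and the FOP name is the last field.
weighted_fsync_avg_us() {
    awk '$NF == "FSYNC" { sum += $2 * $(NF-1); calls += $(NF-1) }
         END { if (calls) printf "%.2f\n", sum / calls }'
}

# Example with simple synthetic rows (the rows from info.txt work the same):
printf '%s\n' \
    "50.00  100.00 us  10.00 us  200.00 us  2  FSYNC" \
    "50.00  200.00 us  10.00 us  300.00 us  2  FSYNC" |
    weighted_fsync_avg_us
```

Fed the eight FSYNC rows from info.txt, this yields an average in the tens of seconds per fsync, consistent with the "order of minutes" observation above.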
Next steps are to figure out why fsyncs are taking so long:
1) Some of the disks failed after the runs, so I am not sure yet whether the issue is with the hardware or with gluster.
2) Once we get the hardware in order, we want to strace the brick process and compare the time spent in each syscall to find the bottleneck.
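Step 2 above would translate to roughly the following commands; this is a sketch only, not commands actually run for this bug: the volume name `VOLNAME` and the way the brick PID is located are placeholder assumptions.

```shell
# 1) Re-collect per-FOP latency numbers after the suspect disks are replaced.
gluster volume profile VOLNAME start
# ... run the workload from the attached scripts ...
gluster volume profile VOLNAME info > info.txt
grep -i fsync info.txt

# 2) Strace the brick process to time the sync-related syscalls.
#    -T prints time spent in each syscall; -f follows brick threads.
BRICK_PID=$(pgrep -f glusterfsd | head -n1)   # assumes a single brick process
strace -f -T -e trace=fsync,fdatasync -p "$BRICK_PID" -o brick-strace.log
```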
Pranith
(In reply to Pranith Kumar K from comment #4)

> 1) Some of the disks failed after the runs so now I am not sure if the issue is with hardware or gluster.

Took some servers from Kaleb and tried the test there. None of the issues reported in this bug were observed on that setup.

> 2) Once we get hardware in order. We want to strace the brick process to see the relative time between each syscall to figure out what is the bottle neck.

It continued to fail on the setup provided by Shwetha. Interestingly, we observed the following messages in /var/log/messages, which point to some ata8 hardware issue. We are debugging what it can be with Humble's help.

```
sas: ata8: end_device-6:1: cmd error handler
sas: ata8: end_device-6:1: dev error handler
ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
ata8.00: failed command: SET FEATURES
ata8.00: cmd ef/05:3c:00:00:00/00:00:00:00:00/40 tag 0
ata8.00: status: { ERR }
ata8.00: error: { ABRT }
ata8: hard resetting link
ata8.00: configured for UDMA/133
ata8: EH complete
```

Will update the bug as soon as we isolate the problem.
Nag,

Given that this bug was raised four releases back on AFR v1, and neither dev nor QE has seen this issue in the subsequent dev/testing phases, can we go ahead and close it as WORKSFORME?

I haven't seen this specific issue in recent times, hence we can close it and open a new bug with the latest information if we hit it again.

Closing based on comment #10.