Bug 1462506
Summary: | [Stress]: rmdirs from multiple FUSE mounts cause disconnects leading to Geo-Rep worker crash | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Ambarish <asoman>
Component: | disperse | Assignee: | Ashish Pandey <aspandey>
Status: | CLOSED WORKSFORME | QA Contact: | Nag Pavan Chilakam <nchilaka>
Severity: | high | Docs Contact: |
Priority: | medium | |
Version: | rhgs-3.3 | CC: | amukherj, kiyer, nchilaka, rhinduja, rhs-bugs, rkavunga, storage-qa-internal
Target Milestone: | --- | Keywords: | ZStream
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2019-10-16 09:40:06 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Ambarish
2017-06-18 08:45:04 UTC
I see my brick logs flooded with this:

/bricks1/A1
[2017-06-17 06:28:09.142084] E [MSGID: 122034] [ec-common.c:506:ec_child_select] 0-butcher-disperse-2: Insufficient available children for this request (have 0, need 4)
[2017-06-17 06:28:09.142113] E [MSGID: 122037] [ec-common.c:1914:ec_update_size_version_done] 0-butcher-disperse-2: Failed to update version and size [Input/output error]

The geo-replication worker crashed because, when it gets an I/O error from the mount point, the worker stops and restarts the process to retry the fop. That I/O error was generated by the ec xlator, which claimed that one entire subvolume was down. The catch here is that the client log [1] shows no disconnects after the initial successful connection to all the bricks. Even the brick log [2] shows that the connection was still available when ec complained that bricks were unavailable; the brick only got disconnected as part of the unmount from the client. So I suspect ec's accounting of connected subvolumes could be wrong :)

[1]: sosreport-gqas015.sbu.lab.eng.bos.redhat.com-20170619015917/var/log/glusterfs/geo-replication-slaves/29f18f45-c822-4c0e-84ef-737e128e0368:gqas006.sbu.lab.eng.bos.redhat.com.%2Fbricks8%2FA1.butcher.gluster.com
[2]: sosreport-gqas015.sbu.lab.eng.bos.redhat.com-20170619015917/var/log/glusterfs/geo-replication-slaves/bricks9-A1.log

Is this bug seen in the latest releases? If not, can we please close this?
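For background on the "have 0, need 4" error above: in a 4+2 disperse volume, ec tracks which bricks it currently considers connected and fails a fop with EIO when fewer children are up than data fragments are needed. The minimal sketch below illustrates that kind of check; it is not the actual RHGS source. count_up_children() and ec_can_serve() are hypothetical stand-ins for the logic in ec_child_select(), and the bitmask is an assumption about how connected children are represented.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: fragments required per request on a 4+2 volume,
 * matching the "need 4" in the log above. */
#define EC_FRAGMENTS_NEEDED 4

/* Count the bricks marked up in the bitmask. */
static int count_up_children(uint32_t up_mask)
{
    int count = 0;
    while (up_mask) {
        count += up_mask & 1;
        up_mask >>= 1;
    }
    return count;
}

/* Sketch of the availability check: if fewer children are up than
 * fragments are needed, the fop fails and the client sees EIO,
 * which is what crashed the geo-rep worker here. */
static int ec_can_serve(uint32_t up_mask, int needed)
{
    int have = count_up_children(up_mask);
    if (have < needed) {
        fprintf(stderr,
                "Insufficient available children for this request "
                "(have %d, need %d)\n", have, needed);
        return 0;
    }
    return 1;
}

int main(void)
{
    /* All six bricks of a 4+2 volume connected: mask 0b111111. */
    ec_can_serve(0x3F, EC_FRAGMENTS_NEEDED);

    /* The suspected bug: the mask reports zero children even though
     * the client log shows no disconnects, so every fop gets EIO. */
    ec_can_serve(0x00, EC_FRAGMENTS_NEEDED);
    return 0;
}
```

The point of the sketch is that ec fails requests based on its internal up-mask rather than the live transport state, so a bookkeeping error in that mask would produce exactly the symptoms described: EIO from the mount while the client and brick logs show no disconnect.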