Bug 1462181
Summary: | [Stress]: Rebalance fails when Geo Rep is in progress | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Ambarish <asoman>
Component: | rpc | Assignee: | Milind Changire <mchangir>
Status: | CLOSED WONTFIX | QA Contact: | Rahul Hinduja <rhinduja>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | rhgs-3.3 | CC: | amukherj, mchangir, nchilaka, rallan, rgowdapp, rhinduja, rhs-bugs, sheggodu, storage-qa-internal
Target Milestone: | --- | Keywords: | ZStream
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | rpc-3.4.0? | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2019-06-13 09:10:40 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Ambarish 2017-06-16 11:42:28 UTC
From the logs, I see failures while fixing the layout on / and a heal failure before that:

[2017-06-16 11:07:50.852073] I [MSGID: 109028] [dht-rebalance.c:4717:gf_defrag_status_get] 0-glusterfs: Files migrated: 0, size: 0, lookups: 0, failures: 0, skipped: 0
[2017-06-16 11:08:05.034660] I [MSGID: 109063] [dht-layout.c:713:dht_layout_normalize] 0-testvol-dht: Found anomalies in / (gfid = 00000000-0000-0000-0000-000000000001). Holes=1 overlaps=0
[2017-06-16 11:08:05.034692] W [MSGID: 109005] [dht-selfheal.c:2111:dht_selfheal_directory] 0-testvol-dht: Directory selfheal failed: 1 subvolumes down.Not fixing. path = /, gfid =
[2017-06-16 11:08:05.034734] I [MSGID: 108031] [afr-common.c:2264:afr_local_discovery_cbk] 0-testvol-replicate-2: selecting local read_child testvol-client-5
[2017-06-16 11:08:05.034831] I [MSGID: 108006] [afr-common.c:4854:afr_local_init] 0-testvol-replicate-1: no subvolumes up
[2017-06-16 11:08:05.034865] W [MSGID: 109075] [dht-diskusage.c:44:dht_du_info_cbk] 0-testvol-dht: failed to get disk info from testvol-replicate-1 [Transport endpoint is not connected]
[2017-06-16 11:08:05.035468] I [dht-rebalance.c:4211:gf_defrag_start_crawl] 0-testvol-dht: gf_defrag_start_crawl using commit hash 3390955361
[2017-06-16 11:08:05.035548] I [MSGID: 108006] [afr-common.c:4854:afr_local_init] 0-testvol-replicate-1: no subvolumes up
[2017-06-16 11:08:05.042751] I [MSGID: 109081] [dht-common.c:4258:dht_setxattr] 0-testvol-dht: fixing the layout of /
[2017-06-16 11:08:05.042778] W [MSGID: 109016] [dht-selfheal.c:1738:dht_fix_layout_of_directory] 0-testvol-dht: Layout fix failed: 1 subvolume(s) are down. Skipping fix layout.
[2017-06-16 11:08:05.043029] E [MSGID: 109026] [dht-rebalance.c:4253:gf_defrag_start_crawl] 0-testvol-dht: fix layout on / failed
[2017-06-16 11:08:05.043176] I [MSGID: 109028] [dht-rebalance.c:4713:gf_defrag_status_get] 0-testvol-dht: Rebalance is failed. Time taken is 25.00 secs
[2017-06-16 11:08:05.043188] I [MSGID: 109028] [dht-rebalance.c:4717:gf_defrag_status_get] 0-testvol-dht: Files migrated: 0, size: 0, lookups: 0, failures: 1, skipped: 0

Milind, can you take a look at the logs and identify the cause of the disconnection?

What's the latest on this bug? Can we confirm whether it is still valid in the latest releases? Considering this bug is quite old, we should try to take it to closure.

Requesting re-validation of the BZ from Nag; see comment #16.

Closing - if it happens again or we have more information, please re-open.
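For anyone re-validating this, a minimal sketch that scans a rebalance log for the failure signatures quoted in the description (the down-subvolume, skipped fix-layout, and disconnection messages). The message IDs come from the excerpt above; the default log path is an assumption for illustration, not something stated in this bug.

```python
#!/usr/bin/env python3
"""Sketch: flag the rebalance-failure signatures seen in this bug."""

import sys

# Signatures quoted in the description above:
#   109016 - "Layout fix failed: N subvolume(s) are down. Skipping fix layout."
#   109026 - "fix layout on / failed"
#   108006 - "no subvolumes up"
SIGNATURES = (
    "MSGID: 109016",
    "MSGID: 109026",
    "MSGID: 108006",
    "Transport endpoint is not connected",
)


def scan(path):
    """Return (line number, line) pairs that match any known signature."""
    hits = []
    with open(path, errors="replace") as fh:
        for lineno, line in enumerate(fh, 1):
            if any(sig in line for sig in SIGNATURES):
                hits.append((lineno, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Hypothetical default path; pass the actual rebalance log as argv[1].
    log = sys.argv[1] if len(sys.argv) > 1 else "/var/log/glusterfs/testvol-rebalance.log"
    for lineno, line in scan(log):
        print(f"{lineno}: {line}")
```

This only surfaces the same markers seen above; the underlying question from the comments, namely why the client lost its connection to testvol-replicate-1 while geo-rep was running, still needs the rpc/client logs.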