| Field | Value |
|---|---|
| Summary | inconsistent profile info data when a brick goes down and comes up in a distributed replicated system |
| Product | [Community] GlusterFS |
| Component | replicate |
| Version | mainline |
| Status | CLOSED WONTFIX |
| Severity | medium |
| Priority | medium |
| Hardware | x86_64 |
| OS | Linux |
| Reporter | M S Vishwanath Bhat <vbhat> |
| Assignee | Pranith Kumar K <pkarampu> |
| CC | gluster-bugs, mzywusko, sgowda |
| Doc Type | Bug Fix |
Description
M S Vishwanath Bhat
2011-03-31 07:38:12 UTC
During overnight heavy I/O tests, when a brick goes down and comes back up in a distributed replicated system, the profile info data is inconsistent. After the brick comes back up, the first `profile info` run lists the data properly, but from the second run onwards the data displayed for that particular brick is always the same, even though heavy I/O is still going on against that brick. I am attaching the profile results file and also archiving the logs.

From the server log, we see a lot of READDIR failures once the server is back online:

    [2011-03-30 12:45:50.361394] I [server3_1-fops.c:590:server_readdir_cbk] 0-hosdu-server: 23: READDIR 0 (1) ==> 0 (No such file or directory)
    [2011-03-30 12:45:50.926563] I [server3_1-fops.c:590:server_readdir_cbk] 0-hosdu-server: 634: READDIR 149 (3145732) ==> 0 (No such file or directory)

From the client logs, there appear to be a lot of split-brains, which have led to failures on the brick:

    [2011-03-30 13:29:22.18475] I [afr-common.c:716:afr_lookup_done] 0-hosdu-replicate-1: background meta-data data entry self-heal triggered. path: /fileop_L1_33/fileop_L1_33_L2_28/fileop_dir_33_28_19
    [2011-03-30 13:29:22.18868] I [client3_1-fops.c:1300:client3_1_entrylk_cbk] 0-hosdu-client-3: remote operation failed: No such file or directory
    [2011-03-30 13:29:22.19445] I [client3_1-fops.c:1225:client3_1_inodelk_cbk] 0-hosdu-client-3: remote operation failed: No such file or directory
    [2011-03-30 13:29:22.20004] I [client3_1-fops.c:1300:client3_1_entrylk_cbk] 0-hosdu-client-3: remote operation failed: No such file or directory
    [2011-03-30 13:29:22.20204] I [afr-self-heal-common.c:1532:afr_self_heal_completion_cbk] 0-hosdu-replicate-1: background meta-data data entry self-heal completed on /fileop_L1_33/fileop_L1_33_L2_28/fileop_dir_33_28_19
    [2011-03-30 13:29:22.29031] I [client3_1-fops.c:2127:client3_1_opendir_cbk] 0-hosdu-client-3: remote operation failed: No such file or directory
    [2011-03-30 13:29:22.29190] W [client3_1-fops.c:5037:client3_1_readdir] 0-hosdu-client-3: (36144048): failed to get fd ctx. EBADFD
    [2011-03-30 13:29:22.29215] W [client3_1-fops.c:5102:client3_1_readdir] 0-hosdu-client-3: failed to send the fop: File descriptor in bad state
    [2011-03-30 13:29:22.29482] I [afr-dir-read.c:171:afr_examine_dir_readdir_cbk] 0-hosdu-replicate-1: entry self-heal triggered. path: /fileop_L1_33/fileop_L1_33_L2_28/fileop_dir_33_28_19, reason: checksums of directory differ, forced merge option set
    [2011-03-30 13:29:22.29856] I [client3_1-fops.c:1300:client3_1_entrylk_cbk] 0-hosdu-client-3: remote operation failed: No such file or directory
    [2011-03-30 13:29:22.30586] W [afr-common.c:110:afr_set_split_brain] (-->/usr/local/lib/glusterfs/3.2.0qa5/xlator/cluster/replicate.so(afr_sh_post_nonblocking_entry_cbk+0xab) [0x7fbba92b34dc] (-->/usr/local/lib/glusterfs/3.2.0qa5/xlator/cluster/replicate.so(afr_sh_entry_done+0xf1) [0x7fbba92aa6eb] (-->/usr/local/lib/glusterfs/3.2.0qa5/xlator/cluster/replicate.so(afr_self_heal_completion_cbk+0xcc) [0x7fbba92a68ac]))) 0-hosdu-replicate-1: invalid argument: inode
    [2011-03-30 13:29:22.30617] I [afr-self-heal-common.c:1532:afr_self_heal_completion_cbk] 0-hosdu-replicate-1: background entry self-heal completed on /fileop_L1_33/fileop_L1_33_L2_28/fileop_dir_33_28_19

As a result, the brick does not receive any I/O, since AFR sends requests only to the other brick in the pair.

Closing the bug.
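For reference, a minimal monitoring sketch (not part of the original report) of one way the symptom could be detected: it polls `gluster volume profile <volname> info` periodically and warns when a brick's profile section stops changing between polls, which is the stale-data behaviour described above. The volume name `hosdu` is taken from the log excerpts; the polling interval and the assumption that each brick's section in the CLI output begins with a `Brick:` header are illustrative, not taken from the report.

```python
#!/usr/bin/env python3
# Sketch: flag bricks whose `profile info` output stops changing between polls.
# Assumptions: volume name "hosdu" (from the logs above), 60 s poll interval,
# and per-brick sections introduced by a "Brick:" header in the CLI output.
import hashlib
import subprocess
import time

VOLNAME = "hosdu"   # taken from the log excerpts; adjust for your setup
INTERVAL = 60       # seconds between samples (arbitrary choice)

def profile_snapshot(volname):
    """Return the raw `profile info` output, keyed by brick section header."""
    out = subprocess.run(
        ["gluster", "volume", "profile", volname, "info"],
        capture_output=True, text=True, check=True,
    ).stdout
    sections = out.split("Brick:")
    return {s.splitlines()[0].strip(): s for s in sections[1:] if s.strip()}

previous = {}
while True:
    current = profile_snapshot(VOLNAME)
    for brick, text in current.items():
        digest = hashlib.sha256(text.encode()).hexdigest()
        if previous.get(brick) == digest:
            # Counters identical to the last poll despite ongoing I/O:
            # matches the inconsistency reported for the rejoined brick.
            print(f"WARNING: profile info for {brick} unchanged since last poll")
        previous[brick] = digest
    time.sleep(INTERVAL)
```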