Bug 1741899
Summary: | The volume of occupied space in the bricks of a GlusterFS volume (3-node replica) differs across nodes and healing does not fix it | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Sergey Pleshkov <s.pleshkov> |
Component: | replicate | Assignee: | bugs <bugs> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 5 | CC: | amukherj, bugs, kdhananj, pkarampu, ravishankar, s.pleshkov |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-10-25 05:05:21 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Sergey Pleshkov
2019-08-16 11:49:02 UTC
Before the replace-brick I had a split-brain event, but after the nodes came back up it was healed automatically.

> gluster volume replace-brick TST lsy-gl-0(1,2,3):/diskForData/tst lsy-gl-0(1,2,3):/diskForTestData/tst commit force
I assume you ran the replace-brick command thrice, once for each brick. Did you wait for heal count to be zero after each replace-brick? If not, you can end up with incomplete heals.
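Waiting for the heal count to reach zero can be scripted rather than eyeballed. A minimal sketch (the helper name is mine, not a Gluster tool) that sums the pending-entry counts from `gluster volume heal <vol> info` output; the demonstration feeds it captured output instead of calling gluster:

```shell
# Hypothetical helper (not a Gluster tool): sum the "Number of entries"
# counts from `gluster volume heal <vol> info` output read on stdin.
count_heal_entries() {
    awk -F': ' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# On a real cluster one could poll until the queue is empty, e.g.:
#   while [ "$(gluster volume heal TST info | count_heal_entries)" -ne 0 ]; do
#       sleep 60
#   done

# Demonstration on captured heal-info output:
sample='Brick lsy-gl-01:/diskForTestData/tst
Status: Connected
Number of entries: 3
Brick lsy-gl-02:/diskForTestData/tst
Status: Connected
Number of entries: 0'
printf '%s\n' "$sample" | count_heal_entries    # prints 3
```

Only when this total stays at zero (ideally across a few polls, since entries can appear in batches) is it reasonably safe to issue the next replace-brick.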
Hello. The replace-brick commands were executed sequentially on all nodes, with a 12-24 hour pause between them. The heal count was zero every time.

Could you check if there is actual missing data on lsy-gl-03? You might need to compute the checksum of each brick individually. https://github.com/gluster/glusterfs/blob/master/tests/utils/arequal-checksum.c can be used for that.

# gcc tests/utils/arequal-checksum.c -o arequal-checksum

On each brick:

# ./arequal-checksum -p /diskForTestData/tst -i .glusterfs

(See ./arequal-checksum --help for details.)

[root@LSY-GL-03 host]# ./arequal-checksum -p /diskForTestData/tst -i .glusterfs

Entry counts
Regular files   : 359953
Directories     : 13244
Symbolic links  : 511
Other           : 0
Total           : 373708

Metadata checksums
Regular files   : 800d132fc8dbd2d3
Directories     : 2a067038668ee0
Symbolic links  : 9edfcc852
Other           : 3e9

Checksums
Regular files   : 523a264a8cb047533c6d72eee606bf2
Directories     : 4f697d5629707031
Symbolic links  : 173f1e2800747538
Other           : 0
Total           : 9aa921a4bd429a8

[root@LSY-GL-02 host]# ./arequal-checksum -p /diskForTestData/tst -i .glusterfs

Entry counts
Regular files   : 359215
Directories     : 13244
Symbolic links  : 511
Other           : 0
Total           : 372970

Metadata checksums
Regular files   : 8098f54e92802273
Directories     : 2a067038668ee0
Symbolic links  : 9edfcc852
Other           : 3e9

Checksums
Regular files   : d992a16c2b695ebaef21668a320a96ac
Directories     : 52134d6004145c08
Symbolic links  : 173f1e2800747538
Other           : 0
Total           : 739f94ae1d03e126

[root@LSY-GL-01 host]# ./arequal-checksum -p /diskForTestData/tst -i .glusterfs

Entry counts
Regular files   : 359215
Directories     : 13244
Symbolic links  : 511
Other           : 0
Total           : 372970

Metadata checksums
Regular files   : 812d17da8db2d6f3
Directories     : 2a067038668ee0
Symbolic links  : 9edfcc852
Other           : 3e9

Checksums
Regular files   : b980694e409c76a1df19442db9576bc1
Directories     : 26433d161d1e130e
Symbolic links  : 173f1e2800747538
Other           : 0
Total           : 57e50e5de4a17b56

[root@LSY-GL-03 host]# gluster volume heal TST info
Brick lsy-gl-01:/diskForTestData/tst
Status: Connected
Number of entries: 0

Brick lsy-gl-02:/diskForTestData/tst
Status: Connected
Number of entries: 0

Brick lsy-gl-03:/diskForTestData/tst
Status: Connected
Number of entries: 0

I compared the folder contents and found this strange anomaly in the sizes of folders and files on the bricks.

lsy-gl-02:

/diskForTestData/tst/.shard/.remove_me:
total 48K
0    .
48K  ..
0    1b69424e-47ca-44b9-b475-9f073956fd10
0    5459b172-600a-4464-8fcd-8e987a62fb37

/diskForTestData/tst/smb_conf:
total 44K
4.0K .
0    ..
8.0K failover-dns.conf
4.0K ganesha.conf
4.0K krb5.conf
4.0K mnt-gvol.mount
4.0K mnt-prod.mount
4.0K mnt-tst.mount
0    mount-restart-scripts
4.0K resolv.conf
4.0K resolv.dnsmasq
4.0K smb.conf
0    user.map

lsy-gl-03:

/diskForTestData/tst/.shard/.remove_me:
total 32K
0    .
32K  ..
0    1b69424e-47ca-44b9-b475-9f073956fd10
0    5459b172-600a-4464-8fcd-8e987a62fb37

/diskForTestData/tst/smb_conf:
total 80K
4.0K .
0    ..
8.0K failover-dns.conf
8.0K ganesha.conf
8.0K krb5.conf
8.0K mnt-gvol.mount
8.0K mnt-prod.mount
8.0K mnt-tst.mount
0    mount-restart-scripts
8.0K resolv.conf
8.0K resolv.dnsmasq
8.0K smb.conf
4.0K user.map

I also found differences in the .glusterfs folder, like this:

lsy-gl-02:

/diskForTestData/tst/.glusterfs/e5/25:
total 80K
0    .
12K  ..
44K  e5250ec5-b28e-4015-a3b3-8c9287b961ef
8.0K e525238c-3ee1-4581-941f-29b50a2159f9
8.0K e5254136-413b-4008-aa2a-871e22fd0e89
8.0K e5257805-e240-401a-a71b-c39718095b9a

lsy-gl-03:

/diskForTestData/tst/.glusterfs/e5/25:
total 65M
0    .
12K  ..
44K  e5250ec5-b28e-4015-a3b3-8c9287b961ef
8.0K e525238c-3ee1-4581-941f-29b50a2159f9
8.0K e5254136-413b-4008-aa2a-871e22fd0e89
8.0K e5257805-e240-401a-a71b-c39718095b9a
65M  e525b876-7fd1-46ba-93fa-293e27db983c

Looks like the discrepancy is due to the number of files (738, to be specific) amongst the bricks. The directories and symlinks and their checksums match on all 3 bricks. The only fix I can think of is to find out (manually) which are the files that differ in size and forcefully trigger a heal on them.
You could go through the "Hack: How to trigger heal on *any* file/directory" section of my blog post: https://ravispeaks.wordpress.com/2019/05/14/gluster-afr-the-complete-guide-part-3/

Also note that sharding is currently supported only for the single-writer use case, typically as a backing store for oVirt (https://github.com/gluster/glusterfs/issues/290).

(In reply to Ravishankar N from comment #8)
> Looks like the discrepancy is due to the no. of files (738 to be specific)
> amongst the bricks. The directories and symlinks and their checksums match
> on all 3 bricks. The only fix I can think of is to find out (manually) which
> are the files that differ in size and forcefully trigger a heal on them. You
> could go through "Hack: How to trigger heal on *any* file/directory" section
> of my blog-post
> https://ravispeaks.wordpress.com/2019/05/14/gluster-afr-the-complete-guide-part-3/

Hello

Is there any proven way to compare files/folders on two nodes of a Gluster cluster to find the files that differ? I tried using "rsync -rin", but it turned out to be ineffective for comparison (it selects all files indiscriminately).

(In reply to Sergey Pleshkov from comment #10)
> Hello
>
> Is there any proven way to compare files / folders on two nodes of a gluster
> to find different files?

To compare just the directory structure (to find what files are missing), maybe you could run `diff <(ssh root@lsy-gl-01 ls -R /diskForTestData/tst) <(ssh root@lsy-gl-02 ls -R /diskForTestData/tst)` etc. after setting up password-less ssh. You would need to ignore the contents of .glusterfs though.
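The `diff <(ssh …)` idea can be extended to catch same-name files whose sizes differ, not just missing entries, by comparing `find` listings that skip `.glusterfs`. A sketch assuming GNU find; `list_sizes` is a hypothetical helper of mine, demonstrated here on two throwaway local directories rather than real bricks:

```shell
# Hypothetical helper: print every regular file under a brick as
# "relative-path size", skipping the .glusterfs metadata tree, so two
# listings can be diffed to spot size mismatches as well as missing files.
list_sizes() {
    find "$1" -path "$1/.glusterfs" -prune -o -type f -printf '%P %s\n' | sort
}

# On the real nodes (after setting up password-less ssh) this would be
# something like:
#   diff <(list_sizes /diskForTestData/tst) <(ssh root@lsy-gl-03 ...)

# Local demonstration with two toy "bricks":
b1=$(mktemp -d); b2=$(mktemp -d)
printf 'aaaa' > "$b1/f.bin"    # 4 bytes on brick 1
printf 'aa'   > "$b2/f.bin"    # 2 bytes on brick 2
a=$(mktemp); b=$(mktemp)
list_sizes "$b1" > "$a"; list_sizes "$b2" > "$b"
diff "$a" "$b" || true         # shows "f.bin 4" vs "f.bin 2"
rm -rf "$b1" "$b2" "$a" "$b"
```

A line appearing with different sizes on each side is a candidate for a forced heal; a line present on only one side is a missing file.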
Hello

Once again I executed the command to replace the brick on lsy-gl-03 (as a simple way to restore the consistency of the files on lsy-gl-03):

gluster volume replace-brick TST lsy-gl-03:/diskForData/tst lsy-gl-03:/diskForTestData/tst-fix commit force

As far as I know, it must synchronize all files from the live nodes (lsy-gl-01, lsy-gl-02). But as a result, I again got a discrepancy between the actual sizes on disk (df -h):

[root@LSY-GL-02 host]# df -h
Filesystem       Size  Used  Avail  Use%  Mounted on
LSY-GL-02:/TST   500G  115G  385G   23%   /mnt/tst
/dev/sdc1        500G  110G  390G   22%   /diskForTestData

[root@LSY-GL-03 host]# df -h
Filesystem       Size  Used  Avail  Use%  Mounted on
/dev/sdc1        500G  107G  394G   22%   /diskForTestData
LSY-GL-03:/TST   500G  115G  385G   23%   /mnt/tst

I found the differing files by running a diff command (/diskForTestData/tst is a symlink to /diskForTestData/tst-fix):

[root@LSY-GL-02 ~]# diff <(ls -Ra /diskForTestData/tst/lsy-tst/) <(ssh host@lsy-gl-03 sudo ls -Ra /diskForTestData/tst/lsy-tst)
1c1
< /diskForTestData/tst/lsy-tst/:
---
> /diskForTestData/tst/lsy-tst:
357638a357639,357643
> 00b0d046-1e1c-4088-bb67-527513bd432d.1
> 00b0d046-1e1c-4088-bb67-527513bd432d.2
> 00b0d046-1e1c-4088-bb67-527513bd432d.3
> 00b0d046-1e1c-4088-bb67-527513bd432d.4
> 00b0d046-1e1c-4088-bb67-527513bd432d.5
357644a357650,357652
> 0339fa08-fb52-4f9f-bbc1-998a88bad3a9.1
> 0339fa08-fb52-4f9f-bbc1-998a88bad3a9.2
> 0339fa08-fb52-4f9f-bbc1-998a88bad3a9.3
357652a357661,357663
.....

I also found the reason why the arequal-checksum command shows many more regular files on lsy-gl-03: it is the folder /diskForTestData/tst/lsy-tst/.shard and the files in it.
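The extra entries in the diff output above are shard files named `<gfid>.<n>`, so the diff itself can be post-processed to tally how many extra shards each base file has on the larger brick. A small sketch (the helper name is mine), shown on a captured slice of that diff output:

```shell
# Hypothetical helper: from diff output lines like "> <gfid>.<n>",
# count the number of extra shards per base-file GFID.
count_extra_shards() {
    sed -n 's/^> \([0-9a-f-]\{36\}\)\.[0-9]\{1,\}$/\1/p' | sort | uniq -c
}

# Demonstration on a slice of the diff output from this report;
# prints one "count GFID" line per base file:
sample='> 00b0d046-1e1c-4088-bb67-527513bd432d.1
> 00b0d046-1e1c-4088-bb67-527513bd432d.2
> 00b0d046-1e1c-4088-bb67-527513bd432d.3
> 0339fa08-fb52-4f9f-bbc1-998a88bad3a9.1'
printf '%s\n' "$sample" | count_extra_shards
```

The GFIDs with the highest counts point at the base files whose shard sets diverge most between the bricks.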
But on lsy-gl-03 it has a size of about 70 GB, while on lsy-gl-01 and lsy-gl-02 it is 58 GB:

[root@LSY-GL-03 .shard]# du -sh /diskForTestData/tst/lsy-tst/.shard/
70G  /diskForTestData/tst/lsy-tst/.shard/
[root@LSY-GL-02 host]# du -sh /diskForTestData/tst/lsy-tst/.shard/
58G  /diskForTestData/tst/lsy-tst/.shard/

I also have a folder /diskForTestData/tst/.shard with identical files (hardlinks, I think).

What should I do in this situation? Copy the .shard files from lsy-gl-03 to lsy-gl-01 and lsy-gl-02? The heal status counts zero entries:

[root@LSY-GL-03 tst]# gluster volume heal TST info
Brick lsy-gl-01:/diskForTestData/tst
Status: Connected
Number of entries: 0

Brick lsy-gl-02:/diskForTestData/tst
Status: Connected
Number of entries: 0

Brick lsy-gl-03:/diskForTestData/tst-fix
Status: Connected
Number of entries: 0

Adding the sharding maintainer Krutika to the bug for any possible advice on comment #12.

(In reply to Ravishankar N from comment #13)
> Adding the Sharding maintainer Krutika to the bug for any possible advice on
> comment #12.

Copying the shards across bricks from the backend is not a good idea. A parallel operation on the file while the copy is going on can lead to inconsistencies.

Ravi, it seems like the main issue is replication inconsistency after a replace-brick. Any heal-related errors in the logs? I see cluster.favorite-child-policy set in volume-info. Would it be an issue here?

(As an aside, network.ping-timeout is set to 5s and that's really low. @Sergey, you should probably set it to a higher value, say 30s or more.)

-Krutika

cluster.favorite-child-policy should not cause any problems w.r.t. missing files. Perhaps Sergey can check for errors in glustershd.log. FWIW, I did try out a replace-brick with the volume options being the same as this one (and having files > shard size) and the heals were successful. This was on glusterfs-5.5.

Errors for what period?
Since the last brick replacement on lsy-gl-03 (August 28) there have been no errors in glfsheal-TST.log, only informational messages.

Prior to this, all bricks were replaced sequentially (moved to a separate disk on each node); there were likewise no heal errors in glfsheal-TST.log on any node. After that, the problem with the size of the raw data was observed.

In glustershd.log on lsy-gl-03 there are error messages, but not about the TST volume. On lsy-gl-01 and lsy-gl-02, glustershd.log contains many informational messages about self-heal operations from when the replace-brick process was running.

Hello

Are there any other suggestions or tips for this situation? Re-create the TST volume and copy all the data? Find all the file names that are broken into shards and run a forced heal on them?

Also, why do the sizes of the .shard folders differ between nodes?

By the way, how much overhead should Gluster files take in a volume? Right now I see a difference of 5 GB between nodes 1 and 2. Is this normal, or should it be less?
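To answer Ravi's "errors for what period?" question systematically, the relevant lines can be pulled out of glustershd.log by date and severity: Gluster log lines begin with a bracketed timestamp followed by a severity letter (E for error, W for warning, I for informational). A sketch with a made-up helper, demonstrated on fabricated sample lines rather than the real log:

```shell
# Hypothetical helper: keep only warning/error lines from a Gluster log
# for a given date. $1 = log file, $2 = date prefix (e.g. 2019-08-28).
grep_heal_errors() {
    awk -v d="$2" '$0 ~ ("^\\[" d) && (/ E / || / W /)' "$1"
}

# Demonstration on fabricated sample lines (not from the real log):
f=$(mktemp)
cat > "$f" <<'EOF'
[2019-08-28 10:15:02.1] E [MSGID: 108006] 0-TST-replicate-0: sample error
[2019-08-28 10:16:00.2] I [MSGID: 108026] sample informational line
[2019-08-29 09:00:00.3] E [MSGID: 108006] error from a later day
EOF
grep_heal_errors "$f" 2019-08-28   # prints only the first line
rm -f "$f"
```

Running this per node for the dates around each replace-brick would narrow down whether any heal actually failed during the window when the discrepancy appeared.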