Description of problem:

I have a gluster volume on 3 nodes (replicate) with the following configuration:

[root@LSY-GL-0(1,2,3) /]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)

[root@LSY-GL-02 host]# gluster volume info TST

Volume Name: TST
Type: Replicate
Volume ID: a96c7b8c-61ec-4a4d-b47e-b445faf6c39b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: lsy-gl-01:/diskForTestData/tst
Brick2: lsy-gl-02:/diskForTestData/tst
Brick3: lsy-gl-03:/diskForTestData/tst
Options Reconfigured:
cluster.favorite-child-policy: size
features.shard-block-size: 64MB
features.shard: on
performance.io-thread-count: 24
client.event-threads: 24
server.event-threads: 24
server.allow-insecure: on
network.ping-timeout: 5
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.heal-timeout: 120

Recently this volume was moved to another disk with the command

gluster volume replace-brick TST lsy-gl-0(1,2,3):/diskForData/tst lsy-gl-0(1,2,3):/diskForTestData/tst commit force

run sequentially on each node, starting with lsy-gl-03; all nodes were online the whole time. Now I have this state:

[root@LSY-GL-02 host]# gluster volume status TST detail
Status of volume: TST
------------------------------------------------------------------------------
Brick                : Brick lsy-gl-01:/diskForTestData/tst
TCP Port             : 49154
RDMA Port            : 0
Online               : Y
Pid                  : 7555
File System          : xfs
Device               : /dev/sdc1
Mount Options        : rw,seclabel,relatime,attr2,inode64,noquota
Inode Size           : 512
Disk Space Free      : 399.9GB
Total Disk Space     : 499.8GB
Inode Count          : 262143488
Free Inodes          : 261684925
------------------------------------------------------------------------------
Brick                : Brick lsy-gl-02:/diskForTestData/tst
TCP Port             : 49154
RDMA Port            : 0
Online               : Y
Pid                  : 25732
File System          : xfs
Device               : /dev/sdc1
Mount Options        : rw,seclabel,relatime,attr2,inode64,noquota
Inode Size           : 512
Disk Space Free      : 399.9GB
Total Disk Space     : 499.8GB
Inode Count          : 262143488
Free Inodes          : 261684925
------------------------------------------------------------------------------
Brick                : Brick lsy-gl-03:/diskForTestData/tst
TCP Port             : 49154
RDMA Port            : 0
Online               : Y
Pid                  : 25243
File System          : xfs
Device               : /dev/sdc1
Mount Options        : rw,seclabel,relatime,attr2,inode64,noquota
Inode Size           : 512
Disk Space Free      : 357.6GB
Total Disk Space     : 499.8GB
Inode Count          : 262143488
Free Inodes          : 261684112

[root@LSY-GL-02 host]# gluster volume heal TST full
Launching heal operation to perform full self heal on volume TST has been successful
Use heal info commands to check status.
[root@LSY-GL-02 host]# gluster volume heal TST info
Brick lsy-gl-01:/diskForTestData/tst
Status: Connected
Number of entries: 0

Brick lsy-gl-02:/diskForTestData/tst
Status: Connected
Number of entries: 0

Brick lsy-gl-03:/diskForTestData/tst
Status: Connected
Number of entries: 0

[root@LSY-GL-01 /]# df -Th
Filesystem      Type            Size  Used  Avail  Use%  Mounted on
LSY-GL-01:/TST  fuse.glusterfs  500G  148G  353G   30%   /mnt/tst
/dev/sdc1       xfs             500G  100G  400G   20%   /diskForTestData

[root@LSY-GL-02 host]# df -Th
Filesystem      Type            Size  Used  Avail  Use%  Mounted on
LSY-GL-02:/TST  fuse.glusterfs  500G  148G  353G   30%   /mnt/tst
/dev/sdc1       xfs             500G  100G  400G   20%   /diskForTestData

[root@LSY-GL-03 host]# df -Th
Filesystem      Type            Size  Used  Avail  Use%  Mounted on
/dev/sdc1       xfs             500G  143G  358G   29%   /diskForTestData
LSY-GL-03:/TST  fuse.glusterfs  500G  148G  353G   30%   /mnt/tst

Version-Release number of selected component (if applicable):

[root@LSY-GL-0(1,2,3) /]# rpm -qa | grep gluster*
glusterfs-libs-5.5-1.el7.x86_64
glusterfs-fuse-5.5-1.el7.x86_64
glusterfs-client-xlators-5.5-1.el7.x86_64
centos-release-gluster5-1.0-1.el7.centos.noarch
glusterfs-api-5.5-1.el7.x86_64
glusterfs-cli-5.5-1.el7.x86_64
nfs-ganesha-gluster-2.7.1-1.el7.x86_64
glusterfs-5.5-1.el7.x86_64
glusterfs-server-5.5-1.el7.x86_64

How reproducible:
Not confirmed yet; I will test it again soon and add a comment.

Steps to Reproduce:
1.
2.
3.

Actual results:
The used space of the bricks on lsy-gl-01 and lsy-gl-02 differs from the used space of the brick on lsy-gl-03. A full heal did not fix this.

Expected results:
What should I do to fix it?

Additional info:
Before the replace-brick there was a split-brain event, but after the nodes came back up it was healed automatically.
> gluster volume replace-brick TST lsy-gl-0(1,2,3):/diskForData/tst lsy-gl-0(1,2,3):/diskForTestData/tst commit force

I assume you ran the replace-brick command thrice, once for each brick. Did you wait for the heal count to be zero after each replace-brick? If not, you can end up with incomplete heals.
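The pending heal count can be checked from any node between the replace-brick operations, for example (this just pulls the per-brick counters out of the standard heal-info output):

# wait until every brick reports zero pending entries before replacing the next brick
gluster volume heal TST info | grep "Number of entries"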
Hello. The replace-brick commands were executed sequentially on all nodes, with a 12-24 hour pause between them. The heal count was zero every time.
Could you check if there is actual missing data on lsy-gl-03? You might need to compute the checksum of each brick individually. https://github.com/gluster/glusterfs/blob/master/tests/utils/arequal-checksum.c can be used for that.

# gcc tests/utils/arequal-checksum.c -o arequal-checksum

On each brick:

# ./arequal-checksum -p /diskForTestData/tst -i .glusterfs

(See ./arequal-checksum --help for details).
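If it helps, a small loop like the one below can collect the checksums from all three bricks in one go. This is only a sketch: it assumes the compiled arequal-checksum binary has been copied to /root on each node and that password-less ssh is set up.

for h in lsy-gl-01 lsy-gl-02 lsy-gl-03; do
    echo "== $h =="
    # run the checksum tool against the brick, skipping the .glusterfs directory
    ssh root@$h "/root/arequal-checksum -p /diskForTestData/tst -i .glusterfs"
done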
[root@LSY-GL-03 host]# ./arequal-checksum -p /diskForTestData/tst -i .glusterfs

Entry counts
Regular files   : 359953
Directories     : 13244
Symbolic links  : 511
Other           : 0
Total           : 373708

Metadata checksums
Regular files   : 800d132fc8dbd2d3
Directories     : 2a067038668ee0
Symbolic links  : 9edfcc852
Other           : 3e9

Checksums
Regular files   : 523a264a8cb047533c6d72eee606bf2
Directories     : 4f697d5629707031
Symbolic links  : 173f1e2800747538
Other           : 0
Total           : 9aa921a4bd429a8

[root@LSY-GL-02 host]# ./arequal-checksum -p /diskForTestData/tst -i .glusterfs

Entry counts
Regular files   : 359215
Directories     : 13244
Symbolic links  : 511
Other           : 0
Total           : 372970

Metadata checksums
Regular files   : 8098f54e92802273
Directories     : 2a067038668ee0
Symbolic links  : 9edfcc852
Other           : 3e9

Checksums
Regular files   : d992a16c2b695ebaef21668a320a96ac
Directories     : 52134d6004145c08
Symbolic links  : 173f1e2800747538
Other           : 0
Total           : 739f94ae1d03e126

[root@LSY-GL-01 host]# ./arequal-checksum -p /diskForTestData/tst -i .glusterfs

Entry counts
Regular files   : 359215
Directories     : 13244
Symbolic links  : 511
Other           : 0
Total           : 372970

Metadata checksums
Regular files   : 812d17da8db2d6f3
Directories     : 2a067038668ee0
Symbolic links  : 9edfcc852
Other           : 3e9

Checksums
Regular files   : b980694e409c76a1df19442db9576bc1
Directories     : 26433d161d1e130e
Symbolic links  : 173f1e2800747538
Other           : 0
Total           : 57e50e5de4a17b56
[root@LSY-GL-03 host]# gluster volume heal TST info
Brick lsy-gl-01:/diskForTestData/tst
Status: Connected
Number of entries: 0

Brick lsy-gl-02:/diskForTestData/tst
Status: Connected
Number of entries: 0

Brick lsy-gl-03:/diskForTestData/tst
Status: Connected
Number of entries: 0
I compared the folder contents and found this strange anomaly in the sizes of folders and files on the bricks.

lsy-gl-02

/diskForTestData/tst/.shard/.remove_me:
total 48K
0    .
48K  ..
0    1b69424e-47ca-44b9-b475-9f073956fd10
0    5459b172-600a-4464-8fcd-8e987a62fb37

/diskForTestData/tst/smb_conf:
total 44K
4.0K .
0    ..
8.0K failover-dns.conf
4.0K ganesha.conf
4.0K krb5.conf
4.0K mnt-gvol.mount
4.0K mnt-prod.mount
4.0K mnt-tst.mount
0    mount-restart-scripts
4.0K resolv.conf
4.0K resolv.dnsmasq
4.0K smb.conf
0    user.map

lsy-gl-03

/diskForTestData/tst/.shard/.remove_me:
total 32K
0    .
32K  ..
0    1b69424e-47ca-44b9-b475-9f073956fd10
0    5459b172-600a-4464-8fcd-8e987a62fb37

/diskForTestData/tst/smb_conf:
total 80K
4.0K .
0    ..
8.0K failover-dns.conf
8.0K ganesha.conf
8.0K krb5.conf
8.0K mnt-gvol.mount
8.0K mnt-prod.mount
8.0K mnt-tst.mount
0    mount-restart-scripts
8.0K resolv.conf
8.0K resolv.dnsmasq
8.0K smb.conf
4.0K user.map

I also found differences in the .glusterfs folder, like this:

lsy-gl-02

/diskForTestData/tst/.glusterfs/e5/25:
total 80K
0    .
12K  ..
44K  e5250ec5-b28e-4015-a3b3-8c9287b961ef
8.0K e525238c-3ee1-4581-941f-29b50a2159f9
8.0K e5254136-413b-4008-aa2a-871e22fd0e89
8.0K e5257805-e240-401a-a71b-c39718095b9a

lsy-gl-03

/diskForTestData/tst/.glusterfs/e5/25:
total 65M
0    .
12K  ..
44K  e5250ec5-b28e-4015-a3b3-8c9287b961ef
8.0K e525238c-3ee1-4581-941f-29b50a2159f9
8.0K e5254136-413b-4008-aa2a-871e22fd0e89
8.0K e5257805-e240-401a-a71b-c39718095b9a
65M  e525b876-7fd1-46ba-93fa-293e27db983c
Looks like the discrepancy is due to the difference in the number of regular files (738, to be specific) amongst the bricks. The directories and symlinks and their checksums match on all 3 bricks. The only fix I can think of is to find out (manually) which files differ in size and forcefully trigger a heal on them. You could go through the "Hack: How to trigger heal on *any* file/directory" section of my blog post https://ravispeaks.wordpress.com/2019/05/14/gluster-afr-the-complete-guide-part-3/
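One way to locate the differing files is to dump each regular file's relative path and size from a brick (pruning .glusterfs) and diff the two lists. A rough sketch, assuming GNU find and password-less ssh between the nodes:

# compare path + size of every regular file on two bricks, ignoring .glusterfs
diff \
  <(ssh root@lsy-gl-02 "cd /diskForTestData/tst && find . -path ./.glusterfs -prune -o -type f -printf '%P %s\n'" | sort) \
  <(ssh root@lsy-gl-03 "cd /diskForTestData/tst && find . -path ./.glusterfs -prune -o -type f -printf '%P %s\n'" | sort)

Files that are missing on one brick, or present with a different size, will show up in the diff output.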
Also note that sharding is currently supported only for the single-writer use case, typically as a backing store for oVirt. (https://github.com/gluster/glusterfs/issues/290)
(In reply to Ravishankar N from comment #8)
> Looks like the discrepancy is due to the difference in the number of regular
> files (738, to be specific) amongst the bricks. The directories and symlinks
> and their checksums match on all 3 bricks. The only fix I can think of is to
> find out (manually) which files differ in size and forcefully trigger a heal
> on them. You could go through the "Hack: How to trigger heal on *any*
> file/directory" section of my blog post
> https://ravispeaks.wordpress.com/2019/05/14/gluster-afr-the-complete-guide-part-3/

Hello,

Is there a proven way to compare files/folders on two gluster nodes to find the files that differ?
I tried using "rsync -rin" but it turned out to be ineffective for comparison (it lists practically all files).
(In reply to Sergey Pleshkov from comment #10)
> Hello,
>
> Is there a proven way to compare files/folders on two gluster nodes to find
> the files that differ?
> I tried using "rsync -rin" but it turned out to be ineffective for comparison
> (it lists practically all files).

To compare just the directory structure (to find what files are missing), maybe you could run

diff <(ssh root@lsy-gl-01 ls -R /diskForTestData/tst) <(ssh root@lsy-gl-02 ls -R /diskForTestData/tst)

etc. after setting up password-less ssh. You would need to ignore the contents of .glusterfs though.
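If the .glusterfs contents get in the way, a variant that prunes that directory on both sides could look like this (again only a sketch, with the same password-less ssh assumption):

# list everything except the .glusterfs subtree on both bricks and diff the lists
diff \
  <(ssh root@lsy-gl-01 "cd /diskForTestData/tst && find . -path ./.glusterfs -prune -o -print" | sort) \
  <(ssh root@lsy-gl-02 "cd /diskForTestData/tst && find . -path ./.glusterfs -prune -o -print" | sort)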
Hello,

Once again I executed the replace-brick command for lsy-gl-03 (as a simple way to make the files on lsy-gl-03 identical to the other nodes again):

gluster volume replace-brick TST lsy-gl-03:/diskForData/tst lsy-gl-03:/diskForTestData/tst-fix commit force

As far as I know, it should synchronize all files from the live nodes (lsy-gl-01, lsy-gl-02). But as a result I again got a discrepancy between the actual sizes on disk (df -h):

[root@LSY-GL-02 host]# df -h
Filesystem      Size  Used  Avail  Use%  Mounted on
LSY-GL-02:/TST  500G  115G  385G   23%   /mnt/tst
/dev/sdc1       500G  110G  390G   22%   /diskForTestData

[root@LSY-GL-03 host]# df -h
Filesystem      Size  Used  Avail  Use%  Mounted on
/dev/sdc1       500G  107G  394G   22%   /diskForTestData
LSY-GL-03:/TST  500G  115G  385G   23%   /mnt/tst

I found the differing files by running a diff command (/diskForTestData/tst is a symlink to /diskForTestData/tst-fix):

[root@LSY-GL-02 ~]# diff <(ls -Ra /diskForTestData/tst/lsy-tst/) <(ssh host@lsy-gl-03 sudo ls -Ra /diskForTestData/tst/lsy-tst)
1c1
< /diskForTestData/tst/lsy-tst/:
---
> /diskForTestData/tst/lsy-tst:
357638a357639,357643
> 00b0d046-1e1c-4088-bb67-527513bd432d.1
> 00b0d046-1e1c-4088-bb67-527513bd432d.2
> 00b0d046-1e1c-4088-bb67-527513bd432d.3
> 00b0d046-1e1c-4088-bb67-527513bd432d.4
> 00b0d046-1e1c-4088-bb67-527513bd432d.5
357644a357650,357652
> 0339fa08-fb52-4f9f-bbc1-998a88bad3a9.1
> 0339fa08-fb52-4f9f-bbc1-998a88bad3a9.2
> 0339fa08-fb52-4f9f-bbc1-998a88bad3a9.3
357652a357661,357663
.....

I also found the reason why arequal-checksum reports many more regular files on lsy-gl-03: it is the folder /diskForTestData/tst/lsy-tst/.shard and the files in it. On lsy-gl-03 it is about 70 GB, but on lsy-gl-01 and lsy-gl-02 it is about 58 GB:

[root@LSY-GL-03 .shard]# du -sh /diskForTestData/tst/lsy-tst/.shard/
70G     /diskForTestData/tst/lsy-tst/.shard/
[root@LSY-GL-02 host]# du -sh /diskForTestData/tst/lsy-tst/.shard/
58G     /diskForTestData/tst/lsy-tst/.shard/

I also have the folder /diskForTestData/tst/.shard with identical files (hardlinks, I think).

What should I do in this situation? Copy the .shard files from lsy-gl-03 to lsy-gl-01 and lsy-gl-02?

The heal count is zero:

[root@LSY-GL-03 tst]# gluster volume heal TST info
Brick lsy-gl-01:/diskForTestData/tst
Status: Connected
Number of entries: 0

Brick lsy-gl-02:/diskForTestData/tst
Status: Connected
Number of entries: 0

Brick lsy-gl-03:/diskForTestData/tst-fix
Status: Connected
Number of entries: 0
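In case it helps the analysis, here is how I can map one of the extra shard names back to its base file on a brick. This is only a sketch, based on my understanding that a shard is named after the GFID of its base file and that the brick keeps a GFID hard link under .glusterfs; the GFID below is just the prefix of one of the extra shards from the diff above.

# a shard is named <GFID-of-base-file>.<block-number>; for regular files the brick
# also keeps a hard link to the base file at .glusterfs/<xx>/<yy>/<GFID>
GFID=00b0d046-1e1c-4088-bb67-527513bd432d   # prefix of one of the extra shards above
BRICK=/diskForTestData/tst
# find the real path of the base file on this brick (excluding the .glusterfs hard link)
find "$BRICK" -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" -not -path '*/.glusterfs/*'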
Adding the Sharding maintainer Krutika to the bug for any possible advice on comment #12.
(In reply to Ravishankar N from comment #13)
> Adding the Sharding maintainer Krutika to the bug for any possible advice on
> comment #12.

Copying the shards across bricks from the backend is not a good idea. A parallel operation on the file while the copy is going on can lead to inconsistencies.

Ravi,

Seems like the main issue is replication inconsistency after a replace-brick. Any heal-related errors in the logs? I see cluster.favorite-child-policy set in volume-info. Would it be an issue here?

(As an aside, network.ping-timeout is set to 5s and that's really low. @Sergey, you should probably set it to a higher value, say 30s or more.)

-Krutika
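P.S. In case it's useful, the timeout is just a volume option, for example with the suggested 30s value:

# raise the ping timeout from 5s to 30s on the TST volume
gluster volume set TST network.ping-timeout 30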
cluster.favorite-child-policy should not cause any problems w.r.t. missing files. Perhaps Sergey can check for errors in glustershd.log. FWIW, I did try out a replace-brick with the same volume options as this one (and with files larger than the shard size) and the heals were successful. This was on glusterfs-5.5.
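To narrow the search, the error-level lines can be pulled out of the self-heal daemon log, e.g. (a sketch assuming the default log location; ' E ' is the error log-level marker in the message format):

# show only error-level lines from the self-heal daemon log on each node
grep ' E ' /var/log/glusterfs/glustershd.log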
For what period should I check for errors?

Since the last brick replacement (August 28) on lsy-gl-03 there are no errors in glfsheal-TST.log, only informational messages. Prior to this, all bricks were replaced sequentially (moved to a separate disk on each node), and there were also no heal errors in glfsheal-TST.log on any node. It was after that that the problem with the raw data size was seen.

In glustershd.log on lsy-gl-03 there are error messages, but not about the TST volume.
On lsy-gl-01 and lsy-gl-02, glustershd.log contains many informational messages about self-heal operations from when the replace-brick process was running.
Hello,

Are there any other suggestions or tips for what I should do in this situation?

Re-create the TST volume and copy all the data back? Find the names of all the files that are broken into shards and run a forced heal on them?

Also, why do the sizes of the .shard folders on the nodes differ?

By the way, how much overhead should Gluster's own files take in a volume? Right now I see a difference of about 5 GB between nodes 1 and 2. Is this normal, or should it be less?