Description of problem
======================

When I enable bitrot detection on a disperse or arbiter volume and try to break a file by directly editing it on one (randomly selected) brick, scrub ondemand doesn't catch the problem.

Version-Release
===============

# rpm -qa | grep gluster | sort
glusterfs-3.8.4-52.el7rhgs.x86_64
glusterfs-api-3.8.4-52.el7rhgs.x86_64
glusterfs-cli-3.8.4-52.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-52.el7rhgs.x86_64
glusterfs-events-3.8.4-52.el7rhgs.x86_64
glusterfs-fuse-3.8.4-52.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-52.el7rhgs.x86_64
glusterfs-libs-3.8.4-52.el7rhgs.x86_64
glusterfs-rdma-3.8.4-52.el7rhgs.x86_64
glusterfs-server-3.8.4-52.el7rhgs.x86_64
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.3.x86_64
python-gluster-3.8.4-52.el7rhgs.noarch
tendrl-gluster-integration-1.5.4-3.el7rhgs.noarch
vdsm-gluster-4.17.33-1.2.el7rhgs.x86_64

# uname -a
Linux mbukatov-usm1-gl1.example.com 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 13 10:46:25 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

How reproducible
================

2/2 (once for each volume type)

Steps to Reproduce
==================

1. Install GlusterFS and configure a trusted storage pool on 6 machines.
2. Create either a disperse or an arbiter volume (see example configuration below).
3. Enable bitrot detection on the volume:
   gluster volume bitrot VOLNAME enable
4. On a client machine (with the glusterfs volume mounted), create a test file with simple content like this:

```
cd /mnt/VOLNAME
echo "this is a test file" > glusternative_bitrot.testfile
```

5. On some storage machine, run on-demand scrubbing for bitrot detection on the volume:
   gluster volume bitrot VOLNAME scrub ondemand
6. Create a bitrot problem for the test file: locate a brick which stores the test file and edit the file directly on that brick, changing its content. For example:
   [root@usm1-gl6 ~]# vim /mnt/brick_VOLNAME_3/3/glusternative_bitrot.testfile
7. Rerun the scrub on the volume:
   gluster volume bitrot VOLNAME scrub ondemand

The configuration I used for step #2 is available as gdeploy config files here:

* https://github.com/usmqe/usmqe-setup/blob/c5501bd719f6bb5c0f322268a556e673c0a08b6e/gdeploy_config/volume_beta_arbiter_2_plus_1x2.create.conf
* https://github.com/usmqe/usmqe-setup/blob/c5501bd719f6bb5c0f322268a556e673c0a08b6e/gdeploy_config/volume_gama_disperse_4_plus_2x2.create.conf

Actual results
==============

After step 4, I verified that the bitrot translator calculated the hash (aka signature) by looking at the file on one of the bricks:

```
[root@mbukatov-usm1-gl5 ~]# getfattr -d -m . -e hex /mnt/brick_gama_disperse_1/1/glusternative_bitrot.testfile | grep bit-rot
getfattr: Removing leading '/' from absolute path names
trusted.bit-rot.signature=0x010200000000000000652f43b5db36036e530507441221fbc4bb98754922ad8e1f3a0169bf7520aff7
trusted.bit-rot.version=0x02000000000000005a15a39600005f61
```

So far, so good (the hash is there as expected).

After step 5, which is the 1st scrub ondemand run, I see no problems detected:

```
[root@mbukatov-usm1-gl5 ~]# gluster volume bitrot volume_gama_disperse_4_plus_2x2 scrub ondemand
volume bitrot: success
```

But after step 7, when I run the scrub for the 2nd time, I see:

```
[root@mbukatov-usm1-gl5 ~]# gluster volume bitrot volume_gama_disperse_4_plus_2x2 scrub ondemand
volume bitrot: success
```

This is not expected.
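As a side note, the per-brick xattr check shown above can be repeated on any storage node with a small loop. This is only a sketch that assumes the brick mount layout used in this setup (bricks mounted at /mnt/brick_*/<N>/) and the test file name from step 4; adjust the paths for other layouts:

```
# Sketch: dump the bit-rot xattrs of every on-brick copy of the test file
# found on this storage node (assumes bricks are mounted under /mnt/brick_*/<N>/).
for brickdir in /mnt/brick_*/*/; do
    f="${brickdir}glusternative_bitrot.testfile"
    if [ -e "$f" ]; then
        echo "== $f"
        getfattr -d -m . -e hex "$f" 2>/dev/null | grep bit-rot
    fi
done
```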
Expected results
================

After step 7, when I run the scrub for the 2nd time, the scrub reports an error found during verification.

Additional information
======================

I'm using a trusted storage pool made of 6 virtual machines:

```
[root@mbukatov-usm1-gl5 ~]# gluster pool list
UUID                                    Hostname                        State
d0062787-0c7f-4fe6-802d-9f5fd32c3f1e    mbukatov-usm1-gl1.example.com   Connected
45c18000-f754-442d-89d9-0d7632399dd7    mbukatov-usm1-gl2.example.com   Connected
7a6b64b5-2f4d-4e72-8d54-afe70b8411d9    mbukatov-usm1-gl3.example.com   Connected
674775c9-9f2b-406c-a0d0-280d397a1b02    mbukatov-usm1-gl4.example.com   Connected
6310036a-5d7d-483c-9f21-74ea9fc40b01    mbukatov-usm1-gl6.example.com   Connected
7f886288-d706-48ef-bf03-f710057870c5    localhost                       Connected
```
Adding blocker flag. If my understanding of the bitrot feature is incorrect, link to the documentation along with an explanation and remove the blocker flag.
(In reply to Martin Bukatovic from comment #3)
> I'm going to retry with a distributed volume (which I haven't tried yet, as
> this is not part of the volume configurations we test with).

I retested with a distrep 6x2 volume [1] and I see the same problem.

[1] https://github.com/usmqe/usmqe-setup/blob/master/gdeploy_config/volume_alpha_distrep_6x2.create.conf
Hi Martin,

'gluster volume bitrot <volname> scrub ondemand' reports a success --> that is supposed to be interpreted as: "Triggering the scrubber was a success on <volname>".

Whether the scrubber was able to detect any problems or not is supposed to be checked with the command "gluster volume bitrot <volname> scrub status".

After step 7, when we run the mentioned command, are we seeing the GFID of the file that was corrupted, under the corresponding node details, along with 'error count' set to 1? If yes, the scrubber functionality is working as expected. Whether we need to fix the docs for this can be discussed further.
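In other words, the intended flow is roughly this (a sketch; VOLNAME is a placeholder for the actual volume name):

```
# Trigger the scrubber; 'volume bitrot: success' here only confirms the trigger worked.
gluster volume bitrot VOLNAME scrub ondemand

# Check the actual scrub results: per-node details, 'error count', and the
# GFIDs of any corrupted files.
gluster volume bitrot VOLNAME scrub status
```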
(In reply to Sweta Anandpara from comment #7)
> 'gluster volume bitrot <volname> scrub ondemand' reports a success --> that
> is supposed to be interpreted as: "Triggering the scrubber was a success on
> <volname>"
>
> Whether the scrubber was able to detect any problems or not is supposed to
> be checked with the command "gluster volume bitrot <volname> scrub status".
>
> After step 7, when we run the mentioned command, are we seeing the GFID of
> the file that was corrupted, under the corresponding node details, along
> with 'error count' set to 1? If yes, the scrubber functionality is working
> as expected.

I have retried with the arbiter volume (volume_beta_arbiter_2_plus_1x2) and can see the error being detected in the output of `scrub status`, so it works as expected. I'm sorry for the confusion.

I'm going to create follow-up BZs for docs or other components after additional checking.
Update: I was using the upstream documentation when drafting the test case, which doesn't seem to contain the description you kindly provided in comment 7:

* http://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#bitrot-detection
* http://docs.gluster.org/en/latest/release-notes/3.9.0/#on-demand-scrubbing-for-bitrot-detection

I created this upstream issue to get this fixed:
https://github.com/gluster/glusterdocs/issues/303
The downstream documentation describes the feature correctly:

https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/ch15s02

There is no need to create a downstream doc BZ.