Bug 1516484 - [bitrot] scrub doesn't catch file manually changed on one of bricks for disperse or arbiter volumes
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: bitrot
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Kotresh HR
QA Contact: Sweta Anandpara
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-11-22 17:43 UTC by Martin Bukatovic
Modified: 2017-11-25 16:57 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-25 15:48:59 UTC
Embargoed:




Links
Red Hat Bugzilla 1517463 (low, CLOSED): [bitrot] scrub ondemand reports it's start as success without additional detail (last updated 2021-02-22 00:41:40 UTC)

Internal Links: 1517463

Description Martin Bukatovic 2017-11-22 17:43:03 UTC
Description of problem
======================

When I enable bitrot detection on a disperse or arbiter volume and try to
break a file by directly editing it on one (randomly selected) brick,
scrub ondemand doesn't catch the problem.

Version-Release
===============

# rpm -qa | grep gluster | sort
glusterfs-3.8.4-52.el7rhgs.x86_64
glusterfs-api-3.8.4-52.el7rhgs.x86_64
glusterfs-cli-3.8.4-52.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-52.el7rhgs.x86_64
glusterfs-events-3.8.4-52.el7rhgs.x86_64
glusterfs-fuse-3.8.4-52.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-52.el7rhgs.x86_64
glusterfs-libs-3.8.4-52.el7rhgs.x86_64
glusterfs-rdma-3.8.4-52.el7rhgs.x86_64
glusterfs-server-3.8.4-52.el7rhgs.x86_64
gluster-nagios-addons-0.2.9-1.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.2.0-14.el7_4.3.x86_64
python-gluster-3.8.4-52.el7rhgs.noarch
tendrl-gluster-integration-1.5.4-3.el7rhgs.noarch
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch

# uname -a
Linux mbukatov-usm1-gl1.example.com 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 13 10:46:25 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

How reproducible
================

2/2 (one for each volume type)

Steps to Reproduce
==================

1. Install GlusterFS and configure a trusted storage pool on 6 machines

2. Create either a disperse or an arbiter volume (see the example configs below)

3. Enable bitrot detection on the volume:
   gluster volume bitrot VOLNAME enable

4. On the client machine (with the glusterfs volume mounted), create a test file
   with simple content like this:

   ```
   cd /mnt/VOLNAME
   echo "this is a test file" > glusternative_bitrot.testfile
   ```

5. On one of the storage machines, run on-demand scrubbing for bitrot detection
   on the volume:

   gluster volume bitrot VOLNAME scrub ondemand

6. Create a bitrot problem for the test file.

   Locate a brick which stores the test file and edit the file directly
   on the brick, changing its content (a non-interactive alternative is
   sketched after the config links below). For example:

   [root@usm1-gl6 ~]# vim /mnt/brick_VOLNAME_3/3/glusternative_bitrot.testfile

7. Rerun scrub on the volume:

   gluster volume bitrot VOLNAME scrub ondemand

The configuration I used for step 2 is available as gdeploy config files here:

* https://github.com/usmqe/usmqe-setup/blob/c5501bd719f6bb5c0f322268a556e673c0a08b6e/gdeploy_config/volume_beta_arbiter_2_plus_1x2.create.conf
* https://github.com/usmqe/usmqe-setup/blob/c5501bd719f6bb5c0f322268a556e673c0a08b6e/gdeploy_config/volume_gama_disperse_4_plus_2x2.create.conf
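
As a non-interactive alternative to the vim edit in step 6, appending a few bytes to the file directly on the brick works just as well. This is only a sketch, reusing the brick path from the example in step 6:

```
# Run on the storage node that hosts the chosen brick.
# Writing to the brick path bypasses GlusterFS, so the stored
# bit-rot signature is not updated to match the new content.
echo "manual corruption" >> /mnt/brick_VOLNAME_3/3/glusternative_bitrot.testfile
```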

Actual results
==============

After step 4, I verified that the bitrot translator calculated the hash (aka
signature) by looking at the file on one of the bricks:

```
[root@mbukatov-usm1-gl5 ~]# getfattr  -d  -m  .  -e  hex /mnt/brick_gama_disperse_1/1/glusternative_bitrot.testfile | grep bit-rot
getfattr: Removing leading '/' from absolute path names
trusted.bit-rot.signature=0x010200000000000000652f43b5db36036e530507441221fbc4bb98754922ad8e1f3a0169bf7520aff7
trusted.bit-rot.version=0x02000000000000005a15a39600005f61
```

So far, so good (the hash is there as expected).

After step 5, which is the 1st scrub ondemand run, I see no problems detected:

```
[root@mbukatov-usm1-gl5 ~]# gluster volume bitrot volume_gama_disperse_4_plus_2x2 scrub ondemand
volume bitrot: success
```

But after step 7, when I run scrub for the 2nd time, I see:

```
[root@mbukatov-usm1-gl5 ~]# gluster volume bitrot volume_gama_disperse_4_plus_2x2 scrub ondemand
volume bitrot: success
```

This is not expected.

Expected results
================

After step 7, when I run scrub for the 2nd time, I expect scrub to report an error
during verification.

Additional information
======================

I'm using a trusted storage pool made of 6 virtual machines:

```
[root@mbukatov-usm1-gl5 ~]# gluster pool list
UUID					Hostname                        State
d0062787-0c7f-4fe6-802d-9f5fd32c3f1e	mbukatov-usm1-gl1.example.com   Connected 
45c18000-f754-442d-89d9-0d7632399dd7	mbukatov-usm1-gl2.example.com	Connected 
7a6b64b5-2f4d-4e72-8d54-afe70b8411d9	mbukatov-usm1-gl3.example.com	Connected 
674775c9-9f2b-406c-a0d0-280d397a1b02	mbukatov-usm1-gl4.example.com	Connected 
6310036a-5d7d-483c-9f21-74ea9fc40b01	mbukatov-usm1-gl6.example.com	Connected 
7f886288-d706-48ef-bf03-f710057870c5	localhost                       Connected 
```

Comment 4 Martin Bukatovic 2017-11-22 17:57:33 UTC
Adding blocker flag. If my understanding of the bitrot feature is incorrect, link to
the documentation along with an explanation and remove the blocker flag.

Comment 6 Martin Bukatovic 2017-11-23 09:26:48 UTC
(In reply to Martin Bukatovic from comment #3)
> I'm going to retry with distributed volume (which I haven't tried yet, as
> this is not part of the volume configurations we test with).

I retested with distrep 6x2 volume[1] and I see the same problem.

[1] https://github.com/usmqe/usmqe-setup/blob/master/gdeploy_config/volume_alpha_distrep_6x2.create.conf

Comment 7 Sweta Anandpara 2017-11-24 04:43:44 UTC
Hi Martin,

'gluster volume bitrot <volname> scrub ondemand' reports a success --> that is supposed to be interpreted as: "Triggering the scrubber was a success on <volname>"

Whether the scrubber was able to detect any problems or not is supposed to be checked with the command - "gluster volume bitrot <volname> scrub status".

After step 7, when we run the mentioned command, are we seeing the GFID of the file that was corrupted under the corresponding node details, along with 'error count' set to 1? If yes, scrubber functionality is working as expected.

Whether we need to fix the docs for this can be discussed further.
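
For reference, the check described above boils down to running the status command on one of the storage nodes and inspecting the per-node details; this is only a sketch based on the explanation in this comment (exact output fields depend on the glusterfs version):

```
# Per comment 7: look for the corrupted file's GFID under the
# corresponding node details and for 'error count' set to 1.
gluster volume bitrot VOLNAME scrub status
```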

Comment 8 Martin Bukatovic 2017-11-25 15:48:59 UTC
(In reply to Sweta Anandpara from comment #7)
> 'gluster volume bitrot <volname> scrub ondemand' reports a success --> that
> is supposed to be interpreted as: "Triggering the scrubber was a success on
> <volname>"
> 
> Whether the scrubber was able to detect any problems or not is supposed to
> be checked with the command - "gluster volume bitrot <volname> scrub status".
> 
> After step7, when we run the mentioned command, are we seeing the GFID of
> the file that was corrupted, under the corresponding node details, along
> with 'error count' set to 1? If yes, scrubber functionality is working as
> expected.

I have retried with arbiter volume (volume_beta_arbiter_2_plus_1x2) and can
see the error being detected in the output of `scrub status`.

So it works as expected. I'm sorry for the confusion.

I'm going to create follow-up BZs for docs or other
components after additional checking.

Comment 9 Martin Bukatovic 2017-11-25 16:16:18 UTC
Update: I was using upstream documentation when drafting the test case, which
doesn't seem to contain the description you kindly provided in comment 7:

* http://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#bitrot-detection
* http://docs.gluster.org/en/latest/release-notes/3.9.0/#on-demand-scrubbing-for-bitrot-detection

I created this upstream issue to get this fixed: https://github.com/gluster/glusterdocs/issues/303

Comment 10 Martin Bukatovic 2017-11-25 16:25:27 UTC
Downstream documentation describes the feature correctly:

https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/ch15s02

so there is no need to create a downstream doc BZ.

