Bug 1118442
Summary: | arequal-checksum of mount after gluster compilation mismatches with subsequent arequal-checksum after heal. | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | spandura |
Component: | replicate | Assignee: | Pranith Kumar K <pkarampu> |
Status: | CLOSED NOTABUG | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | rhgs-3.0 | CC: | rhs-bugs, storage-qa-internal |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2014-07-18 05:10:44 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
spandura
2014-07-10 17:52:00 UTC
Shwetha, these are some initial observations I made. Need your inputs about how the script works.

> :: [ 10:25:11 ] :: Total Number of files and directories in the volume : 3554

Seems like the expected number of entries in the volume is 3554.

> :: [ 10:26:31 ] :: arequal checksum of before
> Entry counts
> Regular files : 3100
> Directories : 386
> Symbolic links : 67
> Other : 0
> Total : 3553

> :: [ 10:26:31 ] :: arequal checksum of after
> Entry counts
> Regular files : 3101
> Directories : 386
> Symbolic links : 67
> Other : 0
> Total : 3554

According to these numbers, the number of entries is correct only after the self-heal. The question we need to ask now is why the count is one less before the self-heal started.

I see one more log:

:: [ 10:26:39 ] :: Checking if there are additional entries under /gluster-mount after to before
:: [ FAIL ] :: Additional entries found under /gluster-mount after to before
:: [ 10:26:39 ] :: Listing all the Additional entries found under /gluster-mount after to before
+/gluster-mount/rhsauto048.lab.eng.blr.redhat.com_gluster-mount_collect_entries_after.log
:: [ 10:26:47 ] :: Total Number of files and directories in the volume : 3555

I am not sure how the test script is implemented, but it seems the test is also creating files on the mount point. Although the check above says there are additional files, is it possible that the additional file is created by the test run itself?

I see the following in the glusterd logs:

rhsauto035-2014070823081404860924/var/log/glusterfs/etc-glusterfs-glusterd.vol.log:[2014-07-08 10:25:30.488695] E [glusterd-syncop.c:1204:gd_stage_op_phase] 0-management: Staging of operation 'Volume Heal' failed on localhost : Command not supported. Please use "gluster volume heal healtest info" and logs to find the heal information.

Are we executing any "info healed"/"info heal-failed" commands?

Looking at 'rhsauto035-2014070823081404860924/var/log/glusterfs/glfsheal-healtest.log', successive executions of 'heal info' seem to be happening 16 minutes apart. How does the script monitor the progress of self-heals?

[2014-07-08 10:09:57.244794] W [client-rpc-fops.c:2758:client3_3_lookup_cbk] 0-healtest-client-0: remote operation failed: No such file or directory. Path: bfc2e706-8d1f-4ac0-ac06-ac418aa15176 (bfc2e706-8d1f-4ac0-ac06-ac418aa15176)
[2014-07-08 10:09:57.245535] I [afr-self-heal-common.c:2147:afr_sh_post_nb_entrylk_missing_entry_sh_cbk] 0-healtest-replicate-0: Non blocking entrylks failed.
[2014-07-08 10:25:25.198457] I [dht-shared.c:334:dht_init_regex] 0-healtest-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2014-07-08 10:25:25.211309] I [glfs-master.c:93:notify] 0-gfapi: New graph 72687361-7574-6f30-3335-2e6c61622e65 (0) coming up
[2014-07-08 10:25:25.211358] I [client.c:2280:notify] 0-healtest-client-0: parent translators are ready, attempting connect on transport

Pranith.

Pranith,

The number of entries on the mount, i.e. the output

:: [ 10:26:47 ] :: Total Number of files and directories in the volume : 3555

is calculated using the command:

number_of_entries=$(find "${MOUNT_POINT_CLIENT}" | wc -l)

1) We do execute "healed" and "heal-failed" to check that they output "Command Not Supported".

2) "heal info" is executed multiple times:
a. Before bringing the brick online, we execute "heal info" to get the summary, i.e. the number of entries to heal.
b. Once the brick is brought online, we monitor the "indices/xattrop/" directory of each brick in the volume until it becomes zero or the monitor timeout period of 30 minutes is reached (a minimal sketch of such a monitoring loop is included at the end of this report).
c. We once again execute "heal info" to check for "Number of entries : 0" from all the bricks, so that there are no pending self-heals.

Shwetha and I observed that one of the files is being created on the mount in between the runs, which could cause the arequal mismatch. Shwetha made changes to the script and started a run just to make sure the issue is not caused by this. Will resume debugging after that run, based on the results.

The automation script was creating an extra file on the mount after the compilation. Fixed the script to create the file in a separate directory rather than on the mount point itself. The arequal-checksum after compilation now matches the subsequent arequal-checksum after heal. Moving the bug to "CLOSED" state.

Link to the job which ran the test case: https://beaker.engineering.redhat.com/jobs/697049
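For reference, a minimal sketch of the xattrop-index monitoring described in step 2b above. The volume name "healtest" and the mount point come from the logs; the brick paths, the 60-second poll interval, and the full ".glusterfs/indices/xattrop" path under each brick are assumptions made for illustration and are not taken from the actual test harness.

    #!/bin/bash
    # Hedged sketch of step 2b: poll each brick's index directory until it is
    # empty or a 30-minute timeout expires, then confirm pending-heal counts
    # via 'heal info' (step 2c). VOLNAME and BRICKS are illustrative placeholders.
    VOLNAME=healtest
    BRICKS="/rhs/brick1 /rhs/brick2"   # hypothetical brick paths
    TIMEOUT=1800                       # 30 minutes, as described above
    elapsed=0
    while [ "$elapsed" -lt "$TIMEOUT" ]; do
        pending=0
        for b in $BRICKS; do
            # Count index entries; depending on the glusterfs version a base
            # "xattrop-<gfid>" file may also need to be excluded from the count.
            n=$(find "$b/.glusterfs/indices/xattrop" -mindepth 1 | wc -l)
            pending=$((pending + n))
        done
        [ "$pending" -eq 0 ] && break
        sleep 60
        elapsed=$((elapsed + 60))
    done
    # Step 2c: every brick should now report "Number of entries: 0".
    gluster volume heal "$VOLNAME" info | grep "Number of entries"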
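Similarly, a minimal sketch of the before/after arequal comparison that the closing comment refers to, with the output files deliberately written outside the mount point so that the comparison itself does not add entries (the cause of this bug). The mount point /gluster-mount is taken from the logs above; the /tmp file names are hypothetical and the exact arequal-checksum invocation may differ between versions, so treat this as an assumption rather than the test's actual code.

    #!/bin/bash
    # Hedged sketch: capture arequal-checksum output before the brick is brought
    # back and again after self-heal completes, then diff the two. Writing the
    # logs to /tmp (not the mount) avoids the extra-entry mismatch seen here.
    MOUNT=/gluster-mount
    arequal-checksum "$MOUNT" > /tmp/arequal_before.log
    # ... bring the brick online and wait for self-heal to finish (see loop above) ...
    arequal-checksum "$MOUNT" > /tmp/arequal_after.log
    diff /tmp/arequal_before.log /tmp/arequal_after.log && echo "arequal checksums match"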