Bug 1233518

Summary: [Backup]: Glusterfind session(s) created before starting the volume results in 'changelog not available' error, eventually
Product: [Community] GlusterFS Reporter: Kotresh HR <khiremat>
Component: glusterfindAssignee: Kotresh HR <khiremat>
Status: CLOSED CURRENTRELEASE QA Contact: bugs <bugs>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.7.0CC: avishwan, bugs, khiremat, mchangir, sanandpa
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.7.3 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1232729 Environment:
Last Closed: 2015-07-30 09:51:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1228598, 1232729    
Bug Blocks: 1223636    

Description Kotresh HR 2015-06-19 05:38:01 UTC
+++ This bug was initially created as a clone of Bug #1232729 +++

+++ This bug was initially created as a clone of Bug #1228598 +++

Description of problem:

Have a 4+2 disperse volume 'disperse' and create glusterfind sessions for the same. Mount it over nfs/fuse and create a couple of files/dirs. Execute the glusterfind pre command it fails with 'Historical changelogs not available' on both the nodes of the cluster


[root@dhcp43-140 ~]# 
[root@dhcp43-140 ~]# gluster v status dispersevol
Status of volume: dispersevol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.43.140:/rhs/thinbrick1/disperse 49188     0          Y       21579
Brick 10.70.43.140:/rhs/thinbrick2/disperse 49189     0          Y       21599
Brick 10.70.42.75:/rhs/thinbrick2/disperse  49184     0          Y       17892
Brick 10.70.42.75:/rhs/thinbrick1/disperse  49185     0          Y       17912
Brick 10.70.43.140:/rhs/thinbrick3/disperse 49190     0          Y       21619
Brick 10.70.42.75:/rhs/thinbrick3/disperse  49186     0          Y       17932
NFS Server on localhost                     2049      0          Y       21640
NFS Server on 10.70.42.75                   2049      0          Y       17953
 
Task Status of Volume dispersevol
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp43-140 ~]# mkdir /mnt/disperse
[root@dhcp43-140 ~]# mount -t glusterfs 10.70.43.140:/disperse /mnt/disperse
Mount failed. Please check the log file for more details.
[root@dhcp43-140 ~]# mount -t glusterfs 10.70.43.140:/dispersevol /mnt/disperse
[root@dhcp43-140 ~]# cd /mnt/disperse
[root@dhcp43-140 disperse]# ls -a
.  ..  .trashcan
[root@dhcp43-140 disperse]# 
[root@dhcp43-140 disperse]# 
[root@dhcp43-140 disperse]# 
[root@dhcp43-140 disperse]# echo "what a day.. hmph" > test1
[root@dhcp43-140 disperse]# mkdir dir2
[root@dhcp43-140 disperse]# ln test1 test1_ln
[root@dhcp43-140 disperse]# echo "what a day.. hmph" > test2
[root@dhcp43-140 disperse]# ln -s test2 test2_sln
[root@dhcp43-140 disperse]# cd
[root@dhcp43-140 ~]# 
[root@dhcp43-140 ~]# 
[root@dhcp43-140 ~]# 
[root@dhcp43-140 ~]# gluster v info dispersevol
 
Volume Name: dispersevol
Type: Disperse
Volume ID: e70dccdf-9f39-494a-a166-aa142049de07
Status: Created
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.43.140:/rhs/thinbrick1/disperse
Brick2: 10.70.43.140:/rhs/thinbrick2/disperse
Brick3: 10.70.42.75:/rhs/thinbrick2/disperse
Brick4: 10.70.42.75:/rhs/thinbrick1/disperse
Brick5: 10.70.43.140:/rhs/thinbrick3/disperse
Brick6: 10.70.42.75:/rhs/thinbrick3/disperse
Options Reconfigured:
changelog.changelog: on
storage.build-pgfid: on
performance.readdir-ahead: on
[root@dhcp43-140 ~]# gluster v start dispersevol
volume start: dispersevol: success
[root@dhcp43-140 ~]# 
[root@dhcp43-140 ~]# glusterfind list
SESSION                   VOLUME                    SESSION TIME             
---------------------------------------------------------------------------
testvol_sess              testvol                   2015-06-04 23:38:02      
cross32                   cross3vol                 2015-06-04 22:46:01      
sessd2                    dispersevol               2015-06-05 19:44:31      
sessd3                    dispersevol               2015-06-05 19:44:39      
sessd1                    dispersevol               2015-06-05 19:44:23      
cross31                   cross3vol                 2015-06-05 00:00:28      
sessc                     cross3vol                 2015-06-05 15:44:47      
[root@dhcp43-140 ~]# 
[root@dhcp43-140 ~]# glusterfind pre sessd1 dispersevol outd1.txt --regenerate-outfile
10.70.43.140 - pre failed: /rhs/thinbrick2/disperse Historical Changelogs not available: [Errno 2] No such file or directory

10.70.43.140 - pre failed: /rhs/thinbrick1/disperse Historical Changelogs not available: [Errno 2] No such file or directory

10.70.43.140 - pre failed: /rhs/thinbrick3/disperse Historical Changelogs not available: [Errno 2] No such file or directory

10.70.42.75 - pre failed: /rhs/thinbrick2/disperse Historical Changelogs not available: [Errno 2] No such file or directory

10.70.42.75 - pre failed: /rhs/thinbrick1/disperse Historical Changelogs not available: [Errno 2] No such file or directory

10.70.42.75 - pre failed: /rhs/thinbrick3/disperse Historical Changelogs not available: [Errno 2] No such file or directory

Generated output file /root/outd1.txt
[root@dhcp43-140 ~]#

***********************************************************88
This is what is seen in the backend brick logs:


[2015-06-05 14:32:15.422293] I [rpcsvc.c:2213:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2015-06-05 14:32:15.422417] W [options.c:936:xl_opt_validate] 0-dispersevol-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
[2015-06-05 14:32:15.425732] E [changelog-helpers.c:634:htime_open] 0-dispersevol-changelog: Error extracting HTIME_CURRENT: No data available.
[2015-06-05 14:32:15.427484] E [ctr-helper.c:250:extract_ctr_options] 0-gfdbdatastore: CTR Xlator is disabled.
...
...
...
[2015-06-05 14:42:29.646309] I [server-handshake.c:585:server_setvolume] 0-dispersevol-server: accepted client from dhcp43-140.lab.eng.blr.redhat.com-21847-2015/06/05-14:42:29:558126-dispersevol-client-0-0-0 (version: 3.7.0)
[2015-06-05 14:42:53.867949] E [posix-helpers.c:1088:posix_handle_pair] 0-dispersevol-posix: /rhs/thinbrick1/disperse/dir2: key:glusterfs.inodelk-dom-count flags: 1 length:23 error:Operation not supported
[2015-06-05 14:42:53.868063] E [posix.c:1391:posix_mkdir] 0-dispersevol-posix: setting xattrs on /rhs/thinbrick1/disperse/dir2 failed (Operation not supported)
[2015-06-05 14:43:19.746591] E [posix-helpers.c:1088:posix_handle_pair] 0-dispersevol-posix: /rhs/thinbrick1/disperse/test2_sln: key:glusterfs.inodelk-dom-count flags: 1 length:23 error:Operation not supported
[2015-06-05 14:43:19.746643] E [posix.c:1897:posix_symlink] 0-dispersevol-posix: setting xattrs on /rhs/thinbrick1/disperse/test2_sln failed (Operation not supported)
[2015-06-05 14:44:27.716197] I [server-handshake.c:585:server_setvolume] 0-dispersevol-server: accepted client from dhcp42-75.lab.eng.blr.redhat.com-18098-2015/06/05-14:44:27:456748-dispersevol-client-0-0-0 (version: 3.7.0)
[2015-06-05 14:44:56.663817] W [socket.c:642:__socket_rwv] 0-dispersevol-changelog: readv on /var/run/gluster/.0ea84248fa0f63dc12f2a29bd7f49b5921945.sock failed (No data available)

*********************************************************



Version-Release number of selected component (if applicable):


How reproducible: 1:1

RCA:
---

This is nothing to do with disperse volume. It should happen with any other
volume if it is done in following way.

1. gluster volume is created first
2. Immediately glusterfind session is created which updates cuurent time as
   say 't1' as start time for changelogs available time.
   Even though, the changelog is enabled (marked 'on' in vol file), backend  
   .glusterfs and actual changelog files gets created during volume start.
3. gluster volume is started.
   The changelog and HITIME.TSTAMP gets created during volume start and the
   TSTAMP is current time say 't1+n'

  If above is the case, then glustefind pre is requesting History API with
  start time as 't1' where as the changelog is actually available from 't1+s'
  which will always fail.

Solution:
--------
With the following patches which is already merged, the glusterfind session creation will fail unless volume is oneline. So that would fix the above issue


As mentioned in the RCA, changing the title of this bug.

Also, the patch mentioned above fixes the issue only when the status of the volume is in 'stopped' state. The issue still persists if the volume status is 'created'.

Comment 1 Anand Avati 2015-06-19 05:56:00 UTC
REVIEW: http://review.gluster.org/11322 (tools/glusterfind: Fail glusterfind creation if volume is offline) posted (#1) for review on release-3.7 by Kotresh HR (khiremat)

Comment 2 Anand Avati 2015-06-19 08:10:10 UTC
REVIEW: http://review.gluster.org/11322 (tools/glusterfind: Fail glusterfind creation if volume is offline) posted (#2) for review on release-3.7 by Kotresh HR (khiremat)

Comment 3 Anand Avati 2015-06-24 06:16:46 UTC
COMMIT: http://review.gluster.org/11322 committed in release-3.7 by Venky Shankar (vshankar) 
------
commit 0f6e81313048fb2be9b25765a45c612912fea123
Author: Kotresh HR <khiremat>
Date:   Wed Jun 17 17:05:22 2015 +0530

    tools/glusterfind: Fail glusterfind creation if volume is offline
    
    Following two fixes are done.
    
    1. Fail glusterfind session creation if volume is not online
       even before session directories are created. This avoids
       'glusterfind list' to pick the session directories and show
       status as 'Session Corrupted'.
    
    2. Check of '!Started' instead of wether the volume is 'Stopped'.
       It covers all the cases.
    
    BUG: 1233518
    Change-Id: Ie01a87500578b9cc3eb72aabd0f24f632fbee58f
    Reviewed-on: http://review.gluster.org/11278
    Signed-off-by: Kotresh HR <khiremat>
    Reviewed-on: http://review.gluster.org/11322
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Aravinda VK <avishwan>

Comment 4 Kaushal 2015-07-30 09:51:58 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.3, please open a new bug report.

glusterfs-3.7.3 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12078
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user