Bug 1270123 - Data Tiering: Database locks observed on tiered volumes on continous writes to a file
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: tiering
Version: 3.7.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Assigned To: Joseph Elwin Fernandes
bugs@gluster.org
Duplicates: 1265399
Depends On: 1240577 1271729
Blocks: 1240569 1260923 1265399 1267242 glusterfs-3.7.6
 
Reported: 2015-10-08 22:39 EDT by Joseph Elwin Fernandes
Modified: 2016-06-19 20:01 EDT
CC: 6 users

Fixed In Version: glusterfs-3.7.6
Doc Type: Bug Fix
Clone Of: 1240577
Last Closed: 2015-11-17 00:59:41 EST
Type: Bug


Attachments: none
Description Joseph Elwin Fernandes 2015-10-08 22:39:13 EDT
+++ This bug was initially created as a clone of Bug #1240577 +++

Description of problem:
=======================
When a file is being continuously modified or written to, database locks are observed, as below:

[2015-07-07 12:46:00.025876] E [MSGID: 101106] [gfdb_sqlite3.c:694:gf_sqlite3_find_recently_changed_files] 0-sqlite3: Failed preparing statment select GF_FILE_TB.GF_ID, (select group_concat( GF_PID || ',' || FNAME || ',' || FPATH || ',' || W_DEL_FLAG ||',' || LINK_UPDATE , '::') from GF_FLINK_TB where GF_FILE_TB.GF_ID = GF_FLINK_TB.GF_ID)  from GF_FILE_TB where ((GF_FILE_TB.W_SEC * 1000000 + GF_FILE_TB.W_MSEC) >= ? ) OR ((GF_FILE_TB.W_READ_SEC * 1000000 + GF_FILE_TB.W_READ_MSEC) >= ?) : database is locked



Due to this, a file that is being continuously written to is demoted unnecessarily.
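The "database is locked" failure above can be reproduced outside Gluster with two plain SQLite connections contending for the write lock. The sketch below uses Python's stdlib sqlite3 module (not Gluster's C code); the table name and columns merely mirror GF_FILE_TB for illustration.

```python
import os
import sqlite3
import tempfile

# A throwaway on-disk database: in-memory DBs are private to one
# connection, so a file is needed to share state between two of them.
path = os.path.join(tempfile.mkdtemp(), "ctr.db")

# isolation_level=None -> autocommit, so an explicit BEGIN works as written.
writer = sqlite3.connect(path, timeout=0, isolation_level=None)
writer.execute("CREATE TABLE gf_file_tb (gf_id TEXT, w_sec INTEGER)")
writer.execute("BEGIN IMMEDIATE")  # take the write lock and hold it
writer.execute("INSERT INTO gf_file_tb VALUES ('uuid-1', 0)")

# Second connection with a zero busy-timeout: no retries, fail fast,
# like a query arriving while the CTR is mid-update.
reader = sqlite3.connect(path, timeout=0, isolation_level=None)
try:
    reader.execute("BEGIN IMMEDIATE")  # also wants the write lock
    locked = False
except sqlite3.OperationalError as err:
    locked = "locked" in str(err)  # sqlite reports "database is locked"

writer.execute("ROLLBACK")  # release the lock
print(locked)
```

With the default rollback journal (the pre-3.7 situation on RHEL 6.7) the second writer fails immediately; WAL mode would at least let readers proceed alongside one writer.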


Version-Release number of selected component (if applicable):
==========================================================
[root@nchilaka-tier01 ~]# gluster --version
glusterfs 3.7.1 built on Jul  2 2015 21:01:51
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@nchilaka-tier01 ~]# rpm -qa|grep gluster
gluster-nagios-common-0.2.0-1.el6rhs.noarch
vdsm-gluster-4.16.20-1.2.el6rhs.noarch
glusterfs-client-xlators-3.7.1-7.el6rhs.x86_64
glusterfs-server-3.7.1-7.el6rhs.x86_64
nfs-ganesha-gluster-2.2.0-3.el6rhs.x86_64
python-gluster-3.7.1-6.el6rhs.x86_64
glusterfs-3.7.1-7.el6rhs.x86_64
glusterfs-api-3.7.1-7.el6rhs.x86_64
glusterfs-cli-3.7.1-7.el6rhs.x86_64
glusterfs-geo-replication-3.7.1-7.el6rhs.x86_64
glusterfs-rdma-3.7.1-7.el6rhs.x86_64
gluster-nagios-addons-0.2.4-2.el6rhs.x86_64
glusterfs-libs-3.7.1-7.el6rhs.x86_64
glusterfs-fuse-3.7.1-7.el6rhs.x86_64
glusterfs-ganesha-3.7.1-7.el6rhs.x86_64






Steps to Reproduce:
==================
1. Create a tiered volume.
2. Set the tier volume options.
3. Create a file and keep appending lines to it in a loop, as below:
   `for i in {0..1000000}; do echo "hello world" >> file1; done`
4. Check tier.log; database lock messages will be seen.



sos report server rhsqe-repo.lab.eng.blr.redhat.com:/home/repo/sosreports/bug.1240569

--- Additional comment from Joseph Elwin Fernandes on 2015-07-07 05:58:29 EDT ---

This happens because, in WAL mode, opening a new db connection is an expensive operation: it tries to take a lock on the WAL file (even if only for a short time). The migration process opens a new connection per brick per promotion/demotion cycle, which is a bad scheme.

Solution:
1) Create a new connection only once, in tier_init, per brick; keep the connection alive and use it for every promotion/demotion cycle.
2) Enable pooling (Pooling=True in the connection string). When the connection is established with pooling enabled, there is no locking of the WAL file, because existing connections are reused internally by sqlite.

http://dev.yorhel.nl/doc/sqlaccess
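Fix (1) above — one long-lived connection per brick — can be sketched as follows. This is illustrative Python (stdlib sqlite3), not the gfdb C API; the class name `TierBrickDB` and the simplified query are assumptions standing in for gf_sqlite3_find_recently_changed_files().

```python
import sqlite3

class TierBrickDB:
    """One long-lived connection per brick, opened once at tier init
    and reused for every promotion/demotion cycle (fix 1 above)."""

    def __init__(self, db_path):
        # Opened once; reusing this connection avoids re-locking the
        # WAL file on every migration cycle.
        self.conn = sqlite3.connect(db_path)
        # WAL needs sqlite >= 3.7; older versions (e.g. 3.6.2 on
        # RHEL 6.7) keep the default rollback journal instead.
        mode = self.conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        self.wal = (mode.lower() == "wal")

    def recently_changed(self, since_usec):
        # Simplified stand-in for the query from the error log above.
        return self.conn.execute(
            "SELECT gf_id FROM gf_file_tb "
            "WHERE (w_sec * 1000000 + w_msec) >= ?",
            (since_usec,)).fetchall()
```

Every promotion/demotion cycle then calls `recently_changed()` on the same object instead of paying the connect-and-lock cost each time.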

--- Additional comment from Vijay Bellur on 2015-09-19 12:09:28 EDT ---

REVIEW: http://review.gluster.org/12191 (tier/ctr: Solution for db locks for tier migrator and ctr using sqlite version less than 3.7 i.e rhel 6.7) posted (#2) for review on master by Joseph Fernandes

--- Additional comment from Vijay Bellur on 2015-09-19 14:04:56 EDT ---

REVIEW: http://review.gluster.org/12191 (tier/ctr: Solution for db locks for tier migrator and ctr using sqlite version less than 3.7 i.e rhel 6.7) posted (#3) for review on master by Joseph Fernandes
Comment 1 Vivek Agarwal 2015-10-09 02:43:34 EDT
*** Bug 1265399 has been marked as a duplicate of this bug. ***
Comment 2 Vijay Bellur 2015-10-09 19:59:08 EDT
COMMIT: http://review.gluster.org/12325 committed in release-3.7 by Dan Lambright (dlambrig@redhat.com) 
------
commit 489f02879afd940d201d092720dbf13b2922b134
Author: Joseph Fernandes <josferna@redhat.com>
Date:   Fri Sep 18 19:57:54 2015 +0530

    tier/ctr: Solution for db locks for tier migrator and ctr using sqlite version less than 3.7 i.e rhel 6.7
    
    Problem: On RHEL 6.7 we have sqlite version 3.6.2, which doesn't support
    the WAL journaling mode, as that mode is only available in sqlite 3.7 and above.
    As a result we cannot have two processes concurrently accessing sqlite without
    running into db locks. WAL is also needed for performance on the CTR side.
    
    Solution: Use the CTR db connection for doing queries when WAL mode is
    absent, i.e. the tier migrator sends sync_op ipc calls to CTR, which in turn
    does the query and creates/updates the query file suggested by the tier migrator.
    
    Pending: This solution stops the db locks, but performance is still an issue for CTR.
    We are developing an in-Memory Transaction Log (iMeTaL) which will boost CTR
    performance by doing in-memory updates on the IO path and later flushing the
    updates to the db in a batch/segment flush.
    
    Master patch: http://review.gluster.org/#/c/12191
    >> Change-Id: Ie3149643ded159234b5cc6aa6cf93b9022c2f124
    >> BUG: 1240577
    >> Signed-off-by: Joseph Fernandes <josferna@redhat.com>
    >> Signed-off-by: Dan Lambright <dlambrig@redhat.com>
    >> Signed-off-by: Joseph Fernandes <josferna@redhat.com>
    >> Reviewed-on: http://review.gluster.org/12191
    >> Tested-by: Gluster Build System <jenkins@build.gluster.com>
    >> Reviewed-by: Luis Pabon <lpabon@redhat.com>
    Signed-off-by: Joseph Fernandes <josferna@redhat.com>
    
    Change-Id: Ie8c7a7e9566244c104531b579126bb57fbc6e32b
    BUG: 1270123
    Reviewed-on: http://review.gluster.org/12325
    Tested-by: NetBSD Build System <jenkins@build.gluster.org>
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Dan Lambright <dlambrig@redhat.com>
    Tested-by: Dan Lambright <dlambrig@redhat.com>
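The iMeTaL idea from the commit's Pending note — buffer heat updates in memory on the IO path, then flush them to sqlite in one batch transaction — could be sketched like this. The class name, batch size, and schema are hypothetical; this is a Python illustration of the batching pattern, not Gluster's implementation.

```python
import sqlite3

class InMemoryTxLog:
    """Sketch of an in-memory transaction log: cheap dict updates on
    the IO path, one batched sqlite transaction per segment flush."""

    def __init__(self, conn, batch_size=100):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = {}  # gf_id -> latest write time (seconds)

    def record_write(self, gf_id, w_sec):
        self.pending[gf_id] = w_sec  # no db access on the IO path
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # One transaction for the whole segment instead of one per IO,
        # so the write lock is taken once per batch.
        with self.conn:
            self.conn.executemany(
                "INSERT OR REPLACE INTO gf_file_tb (gf_id, w_sec) "
                "VALUES (?, ?)",
                self.pending.items())
        self.pending.clear()
```

Repeated writes to the same file collapse into a single row update at flush time, which is exactly the continuous-append workload from the reproduction steps.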
Comment 3 Raghavendra Talur 2015-11-17 00:59:41 EST
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.6, please open a new bug report.

glusterfs-3.7.6 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/gluster-users/2015-November/024359.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
