Bug 1240569
Summary: | Data Tiering: Database locks observed on tiered volumes on continuous writes to a file | |||
---|---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka> | |
Component: | tier | Assignee: | Joseph Elwin Fernandes <josferna> | |
Status: | CLOSED NOTABUG | QA Contact: | Nag Pavan Chilakam <nchilaka> | |
Severity: | urgent | Docs Contact: | ||
Priority: | urgent | |||
Version: | rhgs-3.1 | CC: | dlambrig, ndevos, rcyriac, rhs-bugs, sankarshan, storage-qa-internal | |
Target Milestone: | --- | Keywords: | ZStream | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | tier-migration | |||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1256591 1267242 | Environment: | |
Last Closed: | 2016-05-25 13:21:37 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1256591, 1265399, 1267242, 1270123 | |||
Bug Blocks: |
Description
Nag Pavan Chilakam
2015-07-07 09:08:02 UTC
sos reports are at /home/repo/sosreports/bug.1240569 on the sos report server: rhsqe-repo.lab.eng.blr.redhat.com:/home/repo/sosreports/bug.1240569

The reason this happens is that in WAL mode, opening a new db connection is an expensive operation, as it will try to take a lock on the WAL file (even though only for a short time). The migration process opens a new connection per brick per promotion/demotion cycle. This is a bad scheme.

Solution:
1) Create a new connection only in tier_init per brick, keep the connection alive, and use it for every promotion/demotion.
2) Enable pooling (Pooling=True in the connection string). When the connection is established, there isn't any locking of the WAL file, because existing connections are reused internally by sqlite.

Raised upstream bug 1240577.

On further analysis, found that the "Database locked" error was not appearing at db init. It was happening in the query path of the tier migrator.

The message "E [MSGID: 101106] [gfdb_sqlite3.c:779:gf_sqlite3_find_unchanged_for_time] 0-sqlite3: Failed preparing statment select GF_FILE_TB.GF_ID, (select group_concat( GF_PID || ',' || FNAME || ',' || FPATH || ',' || W_DEL_FLAG ||',' || LINK_UPDATE , '::') from GF_FLINK_TB where GF_FILE_TB.GF_ID = GF_FLINK_TB.GF_ID) from GF_FILE_TB where ((GF_FILE_TB.W_SEC * 1000000 + GF_FILE_TB.W_MSEC) <= ? ) OR ((GF_FILE_TB.W_READ_SEC * 1000000 + GF_FILE_TB.W_READ_MSEC) <= ?) : database is locked" repeated 2 times between [2015-07-10 08:55:40.438914] and [2015-07-10 08:57:30.559193]

Another observation was that the database was not operating in the WAL journal mode. It was running in the regular journal mode! The regular journal mode has issues with concurrent access.
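The first proposed solution (one connection per brick, opened once and reused for every promotion/demotion cycle) can be sketched roughly as follows. This is only an illustration in Python's standard sqlite3 module, not the gluster C code; the class name BrickConnections, its methods, and the busy timeout value are all hypothetical.

```python
import sqlite3


class BrickConnections:
    """Hypothetical cache: one long-lived SQLite connection per brick database."""

    def __init__(self, busy_timeout_s=5.0):
        # Assumption: wait up to this many seconds on a locked database
        # instead of failing immediately with "database is locked".
        self._busy_timeout_s = busy_timeout_s
        self._conns = {}

    def get(self, db_path):
        # Open the connection once (the role tier_init plays in the proposal)
        # and hand back the same object on every later cycle.
        conn = self._conns.get(db_path)
        if conn is None:
            conn = sqlite3.connect(db_path, timeout=self._busy_timeout_s)
            self._conns[db_path] = conn
        return conn

    def close_all(self):
        # Close every cached connection, e.g. when the migrator shuts down.
        for conn in self._conns.values():
            conn.close()
        self._conns.clear()
```

Each promotion/demotion cycle would then call get() with the brick's database path rather than opening a fresh connection.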
[root@rhgs1 .glusterfs]# ls
00 6e b7 changelogs d1.db d1.db-journal d5 health_check indices landfill
[root@rhgs1 .glusterfs]# pwd
/home/disk/d1/.glusterfs
[root@rhgs1 .glusterfs]#

Looks like sqlite 3.6.20 (which comes with RHEL 6.7) does not have WAL journal mode:

[root@rhgs1 ~]# sqlite3 joseph.db;
SQLite version 3.6.20
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> PRAGMA journal_mode=WAL;
delete
sqlite>

This problem doesn't happen on the latest sqlite 3.8, tested on fedora21:

[root@fedora1 ~]# sqlite3 joseph.db
SQLite version 3.8.10.2 2015-05-20 18:17:19
Enter ".help" for usage hints.
sqlite> PRAGMA journal_mode=WAL;
wal
sqlite>

We would require a higher version of sqlite3 on RHGS 3.x, one that supports WAL journal mode.
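The same check can be scripted. As the sessions above show, the pragma does not raise an error when WAL is unavailable; it just reports the mode that is actually in effect, so the return value has to be inspected. WAL was introduced in SQLite 3.7.0. The sketch below uses Python's standard sqlite3 module; the function name try_enable_wal and the file name example.db are made up for illustration.

```python
import os
import sqlite3
import tempfile


def try_enable_wal(db_path):
    """Ask SQLite for WAL and return the journal mode actually in effect."""
    conn = sqlite3.connect(db_path)
    try:
        # The pragma returns the resulting mode rather than an error when
        # WAL cannot be enabled, so the caller must compare it with "wal".
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    finally:
        conn.close()
    return mode.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        mode = try_enable_wal(os.path.join(tmp, "example.db"))
        print("SQLite library version:", sqlite3.sqlite_version)
        print("journal mode in effect:", mode)
```

A caller that depends on WAL could refuse to start, or log a warning, whenever the returned mode is anything other than "wal".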