Bug 1540376
| Summary: | Tiered volume performance degrades badly after a volume stop/start or system restart. | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Jeff Byers <jbyers> |
| Component: | tiering | Assignee: | bugs <bugs> |
| Status: | CLOSED WONTFIX | QA Contact: | bugs <bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | mainline | CC: | bugs, jbyers, vnosov |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-11-02 08:13:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jeff Byers
2018-01-30 23:18:48 UTC
This problem appears to be related to the sqlite3 DB files that are used for the tiering file access counters, stored on each hot and cold tier brick in .glusterfs/<volname>.db. When the tier is first created, these DB files do not exist; they are created, and everything works fine.

On a stop/start or service restart, the .db files are already present, albeit empty, since I don't have cluster.write-freq-threshold or cluster.read-freq-threshold set, so features.record-counters is off and nothing should be going into the DB.

I've found that if I delete these .db files after the volume stop, but before the volume start, the tiering performance is normal, not degraded. Of course, all of the history in these DB files is lost, and I'm not sure what other ramifications deleting them might have.

When I did have one of the freq-threshold settings set, I did see a record get added to the file, so the sqlite3 DB is working to some degree. The sqlite3 version I have installed is sqlite-3.6.20-1.el6_7.2.x86_64.

The problem was simple: the sqlite3 DB connection parameters were only being set on a newly created DB, not when there was an existing DB. Apparently the sqlite3 default connection parameters are not ideal.
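The underlying SQLite behavior can be demonstrated with any binding; below is a minimal sketch using Python's stdlib `sqlite3` module (the file name is illustrative, and the PRAGMA shown is just one example of a tuning parameter). Per-connection settings such as `PRAGMA synchronous` silently revert to the library default whenever the file is reopened, which is why an existing `.db` ends up running with untuned parameters:

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# First open: the DB file is created and tuned.
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA synchronous = OFF")
print(conn.execute("PRAGMA synchronous").fetchone()[0])  # 0 (OFF)
conn.close()

# Second open (file already exists): synchronous has silently reverted
# to the compiled-in default (normally 2, i.e. FULL) -- the earlier
# tuning did not persist in the file.
conn = sqlite3.connect(db_path)
print(conn.execute("PRAGMA synchronous").fetchone()[0])
conn.close()
```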
```diff
diff -up glusterfs-3.7.18/libglusterfs/src/gfdb/gfdb_sqlite3.c-orig glusterfs-3.7.18/libglusterfs/src/gfdb/gfdb_sqlite3.c
--- glusterfs-3.7.18/libglusterfs/src/gfdb/gfdb_sqlite3.c-orig  2018-02-01 13:14:19.000000000 -0800
+++ glusterfs-3.7.18/libglusterfs/src/gfdb/gfdb_sqlite3.c       2018-02-01 13:31:24.000000000 -0800
@@ -449,9 +449,11 @@ gf_sqlite3_init (dict_t *args, void **db
-        /* If the file exist we skip the config part
-         * and creation of the schema */
-        if (is_dbfile_exist)
-                goto db_exists;
-
         /*Apply sqlite3 params to database*/
         ret = apply_sql_params_db (sql_conn, args);
@@ -462,6 +464,12 @@ gf_sqlite3_init (dict_t *args, void **db
                 goto out;
         }

+        /* If the file exist we skip the config part
+         * and creation of the schema */
+        if (is_dbfile_exist)
+                goto db_exists;
+
         /*Create the schema if NOT present*/
         ret = create_filetable (sql_conn->sqlite3_db_conn);
         if (ret) {
```

Release 3.12 has been EOLd and this bug was still found to be in the NEW state, so the version is being moved to mainline to triage it and take appropriate action.

Patch https://review.gluster.org/#/c/glusterfs/+/21331/ removes tier functionality from GlusterFS; https://bugzilla.redhat.com/show_bug.cgi?id=1642807 is the tracking bug for this. The recommendation is to convert your tier volume to a regular volume (either replicate, ec, or plain distribute) with the "tier detach" command before upgrading, and to use backend caching features such as dm-cache to get better performance and functionality.
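The patch above reorders gf_sqlite3_init() so that apply_sql_params_db() runs on every open, and only the schema creation is skipped for a pre-existing file. The same pattern, sketched in Python with the stdlib `sqlite3` module (the function name, table name, and PRAGMA values here are illustrative, not GlusterFS's actual settings):

```python
import os
import sqlite3

def open_counter_db(path):
    """Open (or create) a counter DB, applying per-connection tuning
    unconditionally -- mirroring the patched gf_sqlite3_init() ordering."""
    existed = os.path.exists(path)
    conn = sqlite3.connect(path)
    # Per-connection parameters do not persist in the file, so they
    # must be applied on every open, not only when the DB is first
    # created (the original bug skipped them for an existing file).
    conn.execute("PRAGMA synchronous = OFF")
    conn.execute("PRAGMA cache_size = 4096")
    # Schema creation is the only step that is safe to skip when the
    # file already exists.
    if not existed:
        conn.execute(
            "CREATE TABLE gf_file (gf_id TEXT PRIMARY KEY)")
        conn.commit()
    return conn
```

With this ordering, a volume stop/start reopens the existing `.db` with the same tuned parameters as a freshly created one, instead of falling back to the defaults.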