Bug 1557551 - quota crawler fails w/ TLS enabled
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: quota
Version: rhgs-3.3
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Sanoj Unnikrishnan
QA Contact: Vinayak Papnoi
Depends On: 1575858
Blocks: 1503137
 
Reported: 2018-03-16 20:55 UTC by John Strunk
Modified: 2023-09-14 04:25 UTC
CC: 6 users

Fixed In Version: glusterfs-3.12.2-9
Doc Type: Bug Fix
Doc Text:
Previously, while quota was enabled on a volume, the quota used values were not updated in the list command until a lookup was done from the client mount point. Due to this, the file size was reported inaccurately even after performing the crawl operation. With this fix, the crawl operation looks up all files and reports the accurate quota usage.
Clones: 1575858
Last Closed: 2018-09-04 06:44:14 UTC




Links:
Red Hat Product Errata RHSA-2018:2607 (Last Updated: 2018-09-04 06:45:42 UTC)

Description John Strunk 2018-03-16 20:55:08 UTC
Description of problem:
When quota is enabled on a volume that has TLS enabled, the crawler is unable to mount the volume and subsequently exits, leaving a stale mount point behind.

Version-Release number of selected component (if applicable):
glusterfs-server-3.8.4-54.el7rhgs.x86_64


How reproducible:
always


Steps to Reproduce:
1. Create a 1x3 volume with both management and data TLS enabled (see the command sketch after these steps)
--> place TLS keys (using a common CA)
--> touch secure_access
--> start glusterd
--> create volume
--> set client.ssl and server.ssl
2. sudo gluster vol quota supervol01 enable
3. Note presence of stale mountpoint from quota
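
For reference, a sketch of the command sequence for steps 1-2 (volume, host, and brick names are taken from this report; the /etc/ssl/glusterfs.* and /var/lib/glusterd/secure-access paths are the usual GlusterFS TLS locations and are assumptions here):

# on every node: TLS key, certificate, and common CA bundle, plus management TLS
#   /etc/ssl/glusterfs.key  /etc/ssl/glusterfs.pem  /etc/ssl/glusterfs.ca
sudo touch /var/lib/glusterd/secure-access
sudo systemctl start glusterd

# on one node: create the 1x3 volume, enable I/O TLS, start it, enable quota
sudo gluster volume create supervol01 replica 3 node3:/bricks/supervol01/brick node4:/bricks/supervol01/brick node5:/bricks/supervol01/brick
sudo gluster volume set supervol01 client.ssl on
sudo gluster volume set supervol01 server.ssl on
sudo gluster volume start supervol01
sudo gluster volume quota supervol01 enable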

Actual results:

$ grep supervol01-brick /proc/mounts 
localhost:client_per_brick/supervol01.client.node3.bricks-supervol01-brick.vol /run/gluster/tmp/mntxqfB0P fuse.glusterfs rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072 0 0
$ ls /run/gluster/tmp/mntxqfB0P
ls: cannot access /run/gluster/tmp/mntxqfB0P: Transport endpoint is not connected

From glusterd.log:
[2018-03-16 20:40:07.615411] I [MSGID: 106567] [glusterd-svc-mgmt.c:197:glusterd_svc_start] 0-management: Starting quotad service
[2018-03-16 20:40:14.861880] W [MSGID: 106033] [glusterd-quota.c:331:_glusterd_quota_initiate_fs_crawl] 0-management: chdir /var/run/gluster/tmp/mntxqfB0P failed [Transport endpoint is not connected]
[2018-03-16 20:41:43.716241] E [socket.c:2631:socket_poller] 0-socket.management: socket_poller 127.0.0.1:1020 failed (Input/output error)

From brick log:
[2018-03-16 20:40:11.757728] I [glusterfsd-mgmt.c:54:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2018-03-16 20:40:11.761844] I [MSGID: 101173] [graph.c:269:gf_add_cmdline_options] 0-supervol01-posix: adding option 'glusterd-uuid' for volume 'supervol01-posix' with value 'fed60a35-4c4a-4a31-8ed4-c261561b1196'
[2018-03-16 20:40:11.761860] I [MSGID: 115034] [server.c:406:_check_for_auth_option] 0-supervol01-io-stats: skip format check for non-addr auth option auth.login./bricks/supervol01/brick.allow
[2018-03-16 20:40:11.761876] I [MSGID: 115034] [server.c:406:_check_for_auth_option] 0-supervol01-io-stats: skip format check for non-addr auth option auth.login.e8f666df-0e75-458f-96ed-098c418ea43f.password
[2018-03-16 20:40:11.761930] I [addr.c:55:compare_addr_and_update] 0-/bricks/supervol01/brick: allowed = "*", received addr = "127.0.0.1"
[2018-03-16 20:40:11.761936] I [login.c:34:gf_auth] 0-auth/login: connecting user name: node3
[2018-03-16 20:40:11.761943] I [addr.c:55:compare_addr_and_update] 0-/bricks/supervol01/brick: allowed = "*", received addr = "192.168.121.199"
[2018-03-16 20:40:11.761947] I [login.c:34:gf_auth] 0-auth/login: connecting user name: node4
[2018-03-16 20:40:11.761954] I [addr.c:55:compare_addr_and_update] 0-/bricks/supervol01/brick: allowed = "*", received addr = "192.168.121.70"
[2018-03-16 20:40:11.761957] I [login.c:34:gf_auth] 0-auth/login: connecting user name: node5
[2018-03-16 20:40:11.762049] I [MSGID: 121037] [changetimerecorder.c:1978:reconfigure] 0-supervol01-changetimerecorder: set
[2018-03-16 20:40:11.762124] I [MSGID: 0] [gfdb_sqlite3.c:1398:gf_sqlite3_set_pragma] 0-sqlite3: Value set on DB wal_autocheckpoint : 25000
[2018-03-16 20:40:11.762581] I [MSGID: 0] [gfdb_sqlite3.c:1398:gf_sqlite3_set_pragma] 0-sqlite3: Value set on DB cache_size : 12500
[2018-03-16 20:40:11.762721] I [socket.c:4242:socket_init] 0-supervol01-quota: SSL support for glusterd is ENABLED
[2018-03-16 20:40:11.762782] E [socket.c:4320:socket_init] 0-supervol01-quota: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
[2018-03-16 20:40:11.763036] W [socket.c:3911:reconfigure] 0-supervol01-quota: disabling non-blocking IO
[2018-03-16 20:40:11.763416] I [MSGID: 101190] [event-epoll.c:602:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-03-16 20:40:11.777503] I [addr.c:55:compare_addr_and_update] 0-/bricks/supervol01/brick: allowed = "*", received addr = "127.0.0.1"
[2018-03-16 20:40:11.777516] I [login.c:34:gf_auth] 0-auth/login: connecting user name: node3
[2018-03-16 20:40:11.777522] I [MSGID: 115029] [server-handshake.c:778:server_setvolume] 0-supervol01-server: accepted client from node3-2802-2018/03/16-20:40:07:620796-supervol01-client-0-0-0 (version: 3.8.4)
[2018-03-16 20:40:14.860687] E [socket.c:358:ssl_setup_connection] 0-tcp.supervol01-server: SSL connect error (client: 127.0.0.1:1015) (server: 127.0.0.1:49154)
[2018-03-16 20:40:14.860715] E [socket.c:202:ssl_dump_error_stack] 0-tcp.supervol01-server:   error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number
[2018-03-16 20:40:14.860727] E [socket.c:2510:socket_poller] 0-tcp.supervol01-server: server setup failed
...
(and every 4 seconds in the log until the mount is removed manually)
...
[2018-03-16 20:50:08.266288] E [socket.c:358:ssl_setup_connection] 0-tcp.supervol01-server: SSL connect error (client: 127.0.0.1:1018) (server: 127.0.0.1:49154)
[2018-03-16 20:50:08.266337] E [socket.c:202:ssl_dump_error_stack] 0-tcp.supervol01-server:   error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number
[2018-03-16 20:50:08.266351] E [socket.c:2510:socket_poller] 0-tcp.supervol01-server: server setup failed
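
As a workaround, the stale crawler mount can be cleaned up manually; a lazy unmount of the temporary mount point shown above is one way to do it:

sudo umount -l /run/gluster/tmp/mntxqfB0P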




Expected results:
The quota crawl should be able to mount and unmount successfully on volumes that use TLS.


Additional info:
$ sudo gluster vol info supervol01
 
Volume Name: supervol01
Type: Replicate
Volume ID: 25e8bbd7-1634-4030-91d7-5a2d922d680b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node3:/bricks/supervol01/brick
Brick2: node4:/bricks/supervol01/brick
Brick3: node5:/bricks/supervol01/brick
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
client.ssl: on
server.ssl: on
performance.cache-refresh-timeout: 60
performance.cache-size: 134217728
performance.nl-cache: on
performance.md-cache-timeout: 300
transport.address-family: inet
nfs.disable: on
auto-delete: enable
cluster.enable-shared-storage: enable

Comment 6 Sanoj Unnikrishnan 2018-04-17 06:19:03 UTC
Quota uses a per-brick volfile to do the per-brick crawl (earlier this used to be a volume-level volfile). The function used for this is glusterd_generate_client_per_brick_volfile.

The volfile generated by this function does not contain
"option transport.socket.ssl-enabled on"

The fix is to correct the volfile generation so that this option is included when client.ssl is enabled on the volume.
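
For illustration, the per-brick client volfile needs the SSL option on its protocol/client translator when client.ssl is enabled; a rough sketch of the relevant fragment follows (translator and option names use the standard GlusterFS volfile format, but the exact generated layout is an assumption):

volume supervol01-client-0
    type protocol/client
    option remote-host node3
    option remote-subvolume /bricks/supervol01/brick
    option transport-type socket
    option transport.socket.ssl-enabled on    # missing from the generated per-brick volfile before the fix
end-volume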

Comment 14 errata-xmlrpc 2018-09-04 06:44:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607

Comment 15 Red Hat Bugzilla 2023-09-14 04:25:40 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

