Bug 1557551

Summary: quota crawler fails w/ TLS enabled
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: John Strunk <jstrunk>
Component: quota
Assignee: Sanoj Unnikrishnan <sunnikri>
Status: CLOSED ERRATA
QA Contact: Vinayak Papnoi <vpapnoi>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rhgs-3.3
CC: rhs-bugs, sheggodu, srmukher, storage-qa-internal, sunnikri, vpapnoi
Target Milestone: ---
Target Release: RHGS 3.4.0
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.12.2-9
Doc Type: Bug Fix
Doc Text:
Previously, when quota was enabled on a volume, the quota usage values reported by the list command were not updated until a lookup was performed from the client mount point. As a result, file sizes were reported inaccurately even after the crawl operation completed. With this fix, the crawl operation looks up all files and reports the quota usage accurately.
Story Points: ---
Clone Of:
: 1575858 (view as bug list)
Environment:
Last Closed: 2018-09-04 06:44:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1575858    
Bug Blocks: 1503137    

Description John Strunk 2018-03-16 20:55:08 UTC
Description of problem:
When quota is enabled on a volume that has TLS enabled, the crawler is unable to mount the volume and exits, leaving a stale mount point behind.

Version-Release number of selected component (if applicable):
glusterfs-server-3.8.4-54.el7rhgs.x86_64


How reproducible:
always


Steps to Reproduce:
1. Create a 1x3 volume with both management and data TLS enabled (a command-level sketch follows these steps)
--> place tls keys (using common CA)
--> touch secure_access
--> start glusterd
--> create volume
--> set client.ssl and server.ssl
2. sudo gluster vol quota supervol01 enable
3. Note presence of stale mountpoint from quota
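
A minimal command sequence for steps 1 and 2, assuming the standard GlusterFS TLS file locations (/etc/ssl/glusterfs.pem, glusterfs.key, glusterfs.ca) and the node/brick names from this report; exact paths and options may differ per deployment:

# on every node: place the TLS key/certificate signed by the common CA
$ sudo cp glusterfs.pem glusterfs.key glusterfs.ca /etc/ssl/
# enable management-path TLS and restart glusterd
$ sudo touch /var/lib/glusterd/secure-access
$ sudo systemctl restart glusterd
# create the 1x3 volume, enable data-path TLS, then start it
$ sudo gluster volume create supervol01 replica 3 node3:/bricks/supervol01/brick node4:/bricks/supervol01/brick node5:/bricks/supervol01/brick
$ sudo gluster volume set supervol01 client.ssl on
$ sudo gluster volume set supervol01 server.ssl on
$ sudo gluster volume start supervol01
# enabling quota triggers the per-brick crawl that leaves the stale mount
$ sudo gluster vol quota supervol01 enable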

Actual results:

$ grep supervol01-brick /proc/mounts 
localhost:client_per_brick/supervol01.client.node3.bricks-supervol01-brick.vol /run/gluster/tmp/mntxqfB0P fuse.glusterfs rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072 0 0
$ ls /run/gluster/tmp/mntxqfB0P
ls: cannot access /run/gluster/tmp/mntxqfB0P: Transport endpoint is not connected
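
The stale auxiliary mount can be cleared manually with a lazy unmount (mount path taken from the output above; the temporary directory name will differ per run):

$ sudo umount -l /run/gluster/tmp/mntxqfB0P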

From glusterd.log:
[2018-03-16 20:40:07.615411] I [MSGID: 106567] [glusterd-svc-mgmt.c:197:glusterd_svc_start] 0-management: Starting quotad service
[2018-03-16 20:40:14.861880] W [MSGID: 106033] [glusterd-quota.c:331:_glusterd_quota_initiate_fs_crawl] 0-management: chdir /var/run/gluster/tmp/mntxqfB0P failed [Transport endpoint is not connected]
[2018-03-16 20:41:43.716241] E [socket.c:2631:socket_poller] 0-socket.management: socket_poller 127.0.0.1:1020 failed (Input/output error)

From brick log:
[2018-03-16 20:40:11.757728] I [glusterfsd-mgmt.c:54:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2018-03-16 20:40:11.761844] I [MSGID: 101173] [graph.c:269:gf_add_cmdline_options] 0-supervol01-posix: adding option 'glusterd-uuid' for volume 'supervol01-posix' with value 'fed60a35-4c4a-4a31-8ed4-c261561b1196'
[2018-03-16 20:40:11.761860] I [MSGID: 115034] [server.c:406:_check_for_auth_option] 0-supervol01-io-stats: skip format check for non-addr auth option auth.login./bricks/supervol01/brick.allow
[2018-03-16 20:40:11.761876] I [MSGID: 115034] [server.c:406:_check_for_auth_option] 0-supervol01-io-stats: skip format check for non-addr auth option auth.login.e8f666df-0e75-458f-96ed-098c418ea43f.password
[2018-03-16 20:40:11.761930] I [addr.c:55:compare_addr_and_update] 0-/bricks/supervol01/brick: allowed = "*", received addr = "127.0.0.1"
[2018-03-16 20:40:11.761936] I [login.c:34:gf_auth] 0-auth/login: connecting user name: node3
[2018-03-16 20:40:11.761943] I [addr.c:55:compare_addr_and_update] 0-/bricks/supervol01/brick: allowed = "*", received addr = "192.168.121.199"
[2018-03-16 20:40:11.761947] I [login.c:34:gf_auth] 0-auth/login: connecting user name: node4
[2018-03-16 20:40:11.761954] I [addr.c:55:compare_addr_and_update] 0-/bricks/supervol01/brick: allowed = "*", received addr = "192.168.121.70"
[2018-03-16 20:40:11.761957] I [login.c:34:gf_auth] 0-auth/login: connecting user name: node5
[2018-03-16 20:40:11.762049] I [MSGID: 121037] [changetimerecorder.c:1978:reconfigure] 0-supervol01-changetimerecorder: set
[2018-03-16 20:40:11.762124] I [MSGID: 0] [gfdb_sqlite3.c:1398:gf_sqlite3_set_pragma] 0-sqlite3: Value set on DB wal_autocheckpoint : 25000
[2018-03-16 20:40:11.762581] I [MSGID: 0] [gfdb_sqlite3.c:1398:gf_sqlite3_set_pragma] 0-sqlite3: Value set on DB cache_size : 12500
[2018-03-16 20:40:11.762721] I [socket.c:4242:socket_init] 0-supervol01-quota: SSL support for glusterd is ENABLED
[2018-03-16 20:40:11.762782] E [socket.c:4320:socket_init] 0-supervol01-quota: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
[2018-03-16 20:40:11.763036] W [socket.c:3911:reconfigure] 0-supervol01-quota: disabling non-blocking IO
[2018-03-16 20:40:11.763416] I [MSGID: 101190] [event-epoll.c:602:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-03-16 20:40:11.777503] I [addr.c:55:compare_addr_and_update] 0-/bricks/supervol01/brick: allowed = "*", received addr = "127.0.0.1"
[2018-03-16 20:40:11.777516] I [login.c:34:gf_auth] 0-auth/login: connecting user name: node3
[2018-03-16 20:40:11.777522] I [MSGID: 115029] [server-handshake.c:778:server_setvolume] 0-supervol01-server: accepted client from node3-2802-2018/03/16-20:40:07:620796-supervol01-client-0-0-0 (version: 3.8.4)
[2018-03-16 20:40:14.860687] E [socket.c:358:ssl_setup_connection] 0-tcp.supervol01-server: SSL connect error (client: 127.0.0.1:1015) (server: 127.0.0.1:49154)
[2018-03-16 20:40:14.860715] E [socket.c:202:ssl_dump_error_stack] 0-tcp.supervol01-server:   error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number
[2018-03-16 20:40:14.860727] E [socket.c:2510:socket_poller] 0-tcp.supervol01-server: server setup failed
...
(the same SSL connect errors repeat every 4 seconds in the log until the mount is removed manually)
...
[2018-03-16 20:50:08.266288] E [socket.c:358:ssl_setup_connection] 0-tcp.supervol01-server: SSL connect error (client: 127.0.0.1:1018) (server: 127.0.0.1:49154)
[2018-03-16 20:50:08.266337] E [socket.c:202:ssl_dump_error_stack] 0-tcp.supervol01-server:   error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number
[2018-03-16 20:50:08.266351] E [socket.c:2510:socket_poller] 0-tcp.supervol01-server: server setup failed




Expected results:
The quota crawl should be able to mount and unmount its auxiliary mount on volumes that use TLS.


Additional info:
$ sudo gluster vol info supervol01
 
Volume Name: supervol01
Type: Replicate
Volume ID: 25e8bbd7-1634-4030-91d7-5a2d922d680b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node3:/bricks/supervol01/brick
Brick2: node4:/bricks/supervol01/brick
Brick3: node5:/bricks/supervol01/brick
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
client.ssl: on
server.ssl: on
performance.cache-refresh-timeout: 60
performance.cache-size: 134217728
performance.nl-cache: on
performance.md-cache-timeout: 300
transport.address-family: inet
nfs.disable: on
auto-delete: enable
cluster.enable-shared-storage: enable

Comment 6 Sanoj Unnikrishnan 2018-04-17 06:19:03 UTC
Quota uses a per-brick volfile to do a per-brick crawl (earlier this used to be a volume-level volfile).
The function used for this is glusterd_generate_client_per_brick_volfile.

The volfile generated in this function does not contain
"option transport.socket.ssl-enabled on"
so on a TLS-enabled volume the auxiliary mount attempts a plaintext connection to the SSL-enabled brick, which produces the "wrong version number" handshake errors seen above.

The fix is to correct the volfile generation so that the SSL option is included when TLS is enabled on the volume.
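
For illustration, a sketch of the protocol/client fragment that the generated per-brick volfile (e.g. supervol01.client.node3.bricks-supervol01-brick.vol from the mount output above) would need to carry; the subvolume name and option values shown here are illustrative, the key point being the last option line that the generator currently omits:

volume supervol01-client-0
    type protocol/client
    option remote-host node3
    option remote-subvolume /bricks/supervol01/brick
    option transport-type tcp
    option transport.socket.ssl-enabled on
end-volume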

Comment 14 errata-xmlrpc 2018-09-04 06:44:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607

Comment 15 Red Hat Bugzilla 2023-09-14 04:25:40 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days