Bug 1409584 - seeing remote operation failed [Invalid argument] wrt inode locking in my systemic setup
Summary: seeing remote operation failed [Invalid argument] wrt inode locking in my sys...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Pranith Kumar K
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-01-02 14:32 UTC by Nag Pavan Chilakam
Modified: 2018-12-05 07:53 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-10 07:24:20 UTC
Target Upstream Version:



Description Nag Pavan Chilakam 2017-01-02 14:32:13 UTC
Description of problem:
========================
I am seeing the error messages below, which fail with [Invalid argument]:
The message "W [MSGID: 114031] [client-rpc-fops.c:1830:client3_3_fxattrop_cbk] 0-sysvol-client-3: remote operation failed" repeated 2 times between [2016-12-31 22:42:37.957679] and [2016-12-31 22:42:37.959194]
[2016-12-31 22:42:37.969111] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 0-sysvol-client-3: Connected to sysvol-client-3, attached to remote volume '/rhs/brick1/sysvol'.
[2016-12-31 22:42:37.969145] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 0-sysvol-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2016-12-31 22:42:37.969171] I [MSGID: 114042] [client-handshake.c:1053:client_post_handshake] 0-sysvol-client-3: 3 fds open - Delaying child_up until they are re-opened
[2016-12-31 22:42:37.971081] E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-sysvol-client-3: remote operation failed [Invalid argument]
[2016-12-31 22:42:37.971125] E [MSGID: 108010] [afr-lk-common.c:677:afr_unlock_inodelk_cbk] 0-sysvol-replicate-1: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume sysvol-client-3 with lock owner 107260aee47f0000 [Invalid argument]
[2016-12-31 22:42:37.971671] E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-sysvol-client-3: remote operation failed [Invalid argument]
[2016-12-31 22:42:37.971718] E [MSGID: 108010] [afr-lk-common.c:677:afr_unlock_inodelk_cbk] 0-sysvol-replicate-1: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume sysvol-client-3 with lock owner 107260aee47f0000 [Invalid argument]
[2016-12-31 22:42:37.973572] E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-sysvol-client-3: remote operation failed [Invalid argument]
[2016-12-31 22:42:37.973608] E [MSGID: 108010] [afr-lk-common.c:677:afr_unlock_inodelk_cbk] 0-sysvol-replicate-1: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume sysvol-client-3 with lock owner 107260aee47f0000 [Invalid argument]
[2016-12-31 22:42:37.974395] I [MSGID: 114060] [client-handshake.c:817:client3_3_reopendir_cbk] 0-sysvol-client-3: reopendir on <gfid:91f6b86c-8f2e-4bb9-82eb-893cfac75e75> succeeded (fd = 0)
[2016-12-31 22:42:37.975745] I [MSGID: 114060] [client-handshake.c:817:client3_3_reopendir_cbk] 0-sysvol-client-3: reopendir on <gfid:974d33c5-9a23-4462-9a5b-2c6e1f75fa83> succeeded (fd = 1)
[2016-12-31 22:42:37.976187] I [MSGID: 114041] [client-handshake.c:675:client_child_up_reopen_done] 0-sysvol-client-3: last fd open'd/lock-self-heal'd - notifying CHILD-UP




2b7517dff22. sources=[0]  sinks=1 
[2016-12-31 23:19:43.703506] E [MSGID: 108008] [afr-transaction.c:2602:afr_write_txn_refresh_done] 0-sysvol-replicate-3: Failing SETXATTR on gfid d0f39ca6-ec00-46bf-8eca-02b7517dff22: split-brain observed.
[2016-12-31 23:19:43.704941] E [MSGID: 114031] [client-rpc-fops.c:1550:client3_3_inodelk_cbk] 0-sysvol-client-3: remote operation failed [Invalid argument]
[2016-12-31 23:21:31.913428] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-sysvol-client-2: server 10.70.35.156:49154 has not responded in the last 42 seconds, disconnecting.
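
To see which translator and callback report [Invalid argument] most often in a log like the one above, the lines can be tallied with a small script. This is only a rough sketch: the regular expression assumes the line layout seen in the excerpts here ([timestamp] LEVEL [MSGID: n] [file:line:function] translator: message), and the names LOG_LINE and summarize_invalid_argument are made up for illustration, not part of any Gluster tool.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpts in this report:
# [timestamp] LEVEL [MSGID: n] [file:line:function] translator: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) \[MSGID: (?P<msgid>\d+)\] "
    r"\[(?P<src>[^\]]+)\] (?P<xlator>\S+): (?P<msg>.*)$"
)


def summarize_invalid_argument(lines):
    """Count lines ending in [Invalid argument], keyed by (translator, callback)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match["msg"].endswith("[Invalid argument]"):
            # src looks like "client-rpc-fops.c:1601:client3_3_finodelk_cbk"
            callback = match["src"].rsplit(":", 1)[-1]
            counts[(match["xlator"], callback)] += 1
    return counts


# Three lines copied from the log excerpt in this report.
sample = [
    "[2016-12-31 22:42:37.969171] I [MSGID: 114042] [client-handshake.c:1053:client_post_handshake] 0-sysvol-client-3: 3 fds open - Delaying child_up until they are re-opened",
    "[2016-12-31 22:42:37.971081] E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-sysvol-client-3: remote operation failed [Invalid argument]",
    "[2016-12-31 22:42:37.971125] E [MSGID: 108010] [afr-lk-common.c:677:afr_unlock_inodelk_cbk] 0-sysvol-replicate-1: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume sysvol-client-3 with lock owner 107260aee47f0000 [Invalid argument]",
]

if __name__ == "__main__":
    for key, count in summarize_invalid_argument(sample).items():
        print(key, count)
```

Lines that do not fit the assumed layout (for example the "The message ... repeated N times" summaries, or fragments cut off by the paste) are skipped rather than guessed at.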



On my systemic setup, I am creating directories at the same paths simultaneously from 3 different clients.
Each client used a different server IP to mount the volume over the FUSE protocol.
Also, each client was dumping sosreports into the volume mount every 5 minutes in a screen session, along with top output being appended to a file every minute.
The directory creations were run by different users.
Eg:
client1(el 7.2) was running the dir-creation using pavan@rhs-client23
client2(el 6.7) as root@rhs-client24
client3(el 7.3) as cli21@rhs-client21
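
For anyone trying a similar workload, the same-path creation can be approximated as follows. This is a hypothetical sketch, not the script used on this setup: the tree shape (depth and breadth), the dir0, dir1, ... names, and the functions create_same_paths and run_clients are all assumptions, and the three clients are simulated with threads on one machine instead of three separate FUSE mounts.

```python
import os
import tempfile
import threading


def create_same_paths(root, depth=2, breadth=3):
    """Walk a fixed tree under root, trying to mkdir every path.

    Returns how many mkdir calls this caller won; paths already created
    by a competing caller raise FileExistsError and are skipped.
    """
    created = 0
    level = [root]
    for _ in range(depth):
        next_level = []
        for parent in level:
            for i in range(breadth):
                path = os.path.join(parent, f"dir{i}")
                try:
                    os.mkdir(path)
                    created += 1
                except FileExistsError:
                    pass  # another "client" created this path first
                next_level.append(path)
        level = next_level
    return created


def run_clients(root, clients=3):
    """Run the same creation walk from several threads at once."""
    results = [0] * clients

    def worker(index):
        results[index] = create_same_paths(root)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # On a real setup, root would be a directory on each client's own
    # mount of the volume; a temporary directory stands in for it here.
    with tempfile.TemporaryDirectory() as root:
        print(run_clients(root))
```

On the real setup each client would run the creation walk against its own mount point under a different user, so the competing mkdir calls reach the bricks through different servers.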


Note: these logs are from client1, i.e. rhs-client23.
Also note, however, that I am still able to access the mount.




sosreports available at 
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/3.2_logs/systemic_testing_logs/regression_cycle/same_dir_create_clients/rhs-client23.lab.eng.blr.redhat.com/

test execution details available at https://docs.google.com/spreadsheets/d/1iP5Mi1TewBFVh8HTmlcBm9072Bgsbgkr3CLcGmawDys/edit#gid=632186609

Version-Release number of selected component (if applicable):
============
3.8.4-10






Other BZs for reference (raised for issues on the same setup):
1409472 - brick crashed on systemic setup
1397907 - seeing frequent kernel hangs when doing operations both on fuse client and gluster nodes on replica volumes [NEEDINFO]
1409568 - seeing socket disconnects and transport endpoint not connected frequently on systemic setup 
1409572 - In fuse mount logs:seeing input/output error with split-brain observed logs and failing GETXATTR and STAT
1409580 - seeing stale file handle errors in fuse mount logs in systemic testing
1409583 - seeing RPC status error messages and timeouts due to RPC (rpc-clnt.c:200:call_bail)

Comment 2 Nag Pavan Chilakam 2017-01-04 07:12:30 UTC
Client sosreports were copied to the repo server with: scp -r /var/tmp/$HOSTNAME qe@rhsqe-repo:/var/www/html/sosreports/nchilaka/3.2_logs/systemic_testing_logs/regression_cycle/same_dir_create_clients/

Comment 4 Pranith Kumar K 2017-02-10 07:24:20 UTC
This log message is important for debugging stale locks, so we won't be fixing it.

