| Summary: | CPU usage 100% on client and brick... | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Amar Tumballi <amarts> |
| Component: | replicate | Assignee: | Pranith Kumar K <pkarampu> |
| Status: | CLOSED WORKSFORME | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.1.3 | CC: | gluster-bugs, masta, vraman |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | --- | Mount Type: | fuse |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |
Description

---
**Aleks3Y**, 2011-03-17 11:23:45 UTC
Created attachment 455: CPU usage was around 100-150%
Created attachment 456: on the client, CPU usage was 60-120%

User CPU usage is 100% on both the brick and the client machine while backing up data with a piece of software called 'r1soft', on a volume configured as follows:

```
gluster> volume info

Volume Name: Volume01
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: brick01:/exp1
Brick2: brick02:/exp2
Options Reconfigured:
performance.flush-behind: on
auth.reject: 127.0.0.2
auth.allow: 192.168.254.*,127.0.0.1,172.16.48.74
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: WARNING
```

The same info is posted at:
http://pastebin.com/97PweKFw
http://pastebin.com/pzm5G24e
http://pastebin.com/7LaBRT4F

Client log (the same few AFR lock messages repeat continuously, many times per millisecond; a representative excerpt):

```
[2011-03-16 09:41:25.941350] D [afr-lk-common.c:410:transaction_lk_op] Volume01-replicate-0: lk op is for a transaction
[2011-03-16 09:41:25.941509] D [afr-lk-common.c:410:transaction_lk_op] Volume01-replicate-0: lk op is for a transaction
[2011-03-16 09:41:25.941630] D [afr-lk-common.c:410:transaction_lk_op] Volume01-replicate-0: lk op is for a transaction
[2011-03-16 09:41:25.941760] D [afr-lk-common.c:410:transaction_lk_op] Volume01-replicate-0: lk op is for a transaction
[2011-03-16 09:41:25.941782] D [afr-transaction.c:973:afr_post_nonblocking_inodelk_cbk] Volume01-replicate-0: Non blocking inodelks failed. Proceeding to blocking
[2011-03-16 09:41:25.942150] D [afr-lk-common.c:987:afr_lock_blocking] Volume01-replicate-0: we're done locking
[2011-03-16 09:41:25.942176] D [afr-transaction.c:953:afr_post_blocking_inodelk_cbk] Volume01-replicate-0: Blocking inodelks done. Proceeding to FOP
[2011-03-16 09:41:25.942284] D [afr-transaction.c:979:afr_post_nonblocking_inodelk_cbk] Volume01-replicate-0: Non blocking inodelks done. Proceeding to FOP
[... the same transaction_lk_op / inodelk cycle repeats many times per millisecond; trimmed for readability ...]
```

Server log:

```
[2011-03-16 05:35:24.708136] D [inodelk.c:293:__inode_unlock_lock] Volume01-locks: Matching lock found for unlock
[2011-03-16 05:35:24.708253] D [inodelk.c:293:__inode_unlock_lock] Volume01-locks: Matching lock found for unlock
[2011-03-16 05:35:24.708344] D [inodelk.c:293:__inode_unlock_lock] Volume01-locks: Matching lock found for unlock
[2011-03-16 05:35:24.708821] D [inodelk.c:293:__inode_unlock_lock] Volume01-locks: Matching lock found for unlock
[Repeats another 10,000s of times - removed due to comment size limit]
```

---
**Aleks3Y**

Also, the client averaged about 20 Mbit/s incoming on the public interface and around 70 Mbit/s outgoing to the gluster bricks. I've since moved to NFS and am now getting ~300 Mbit/s incoming and outgoing to one brick, with the bricks then syncing over a cross-connect. CPU usage is still high, but I think that is due to the syncing. Guys, I can provide you with graphs if you need them.

---
**Pranith Kumar K**

Hi, according to the glusterdumps provided in the attachments, the brick that produced glusterdump.1179 does not hold any locks for some reason. I tried to look through the logs, but the client and server logs have different timestamps. Would it be possible to zip and post the log files of the two bricks and the clients? I am also a bit confused by the second comment: could you please clarify why the data is going to only one brick instead of both bricks in the replica pair? Do post the graphs as well.

Pranith

---
**Aleks3Y**

Created attachment 460

It's getting worse and worse. No errors are showing in the logs at all, but CPU usage is around 300%.

About the previous comment: I was able to get only 20 Mbit/s with the gluster native client versus 300 Mbit/s with NFS.

Is there any way we can figure this out? I've put it into production, and if it keeps working like this, I'll be asked to remove it :(

---
**Pranith Kumar K**

(In reply to comment #7) Could you please tell me why the traffic/data is going to only one brick instead of both bricks?

Pranith

---
**Aleks3Y**

Because I'm using NFS right now.
As I said, when I'm using the native client, CPU usage is 100% on the client server and the maximum speed is 20 Mbit/s.

I'm using backup software on the client...

---
**Pranith Kumar K**

(In reply to comment #9) I am not sure I understand. When you have the bricks

```
Brick1: brick01:/exp1
Brick2: brick02:/exp2
```

in a replica pair and mount the volume with fuse or NFS, data should go to both brick01:/exp1 and brick02:/exp2. Why is it going to only one of these two bricks?

Pranith

---
**Aleks3Y**

Yes, with fuse it goes to both servers: in netstat I can see three active connections, two to the main brick and one to the second. When I'm using NFS, it goes to only one, and the bricks then sync with each other over the local connection. I was sure that's how it is supposed to work.

---
**Pranith Kumar K**

(In reply to comment #11) When you configure the volume as replicate, syncing between the two bricks brick01:/exp1 and brick02:/exp2 is taken care of; you don't need to sync them again explicitly over another local connection. How are you achieving the sync between these two bricks over the local connection? Using some other software? I also don't understand how you managed to mount only one brick with NFS. What command did you use to do this? Could you paste the output of "mount" from the machine where you mounted the volume as fuse and as NFS?

Pranith

---
**Pranith Kumar K**

(In reply to comment #12) There has been no response on this bug from you. It has been a long time, and we haven't seen similar problems from anyone else. Are you still hitting this issue?

---
**Aleks3Y**

Unfortunately, we stopped using it :(

---
**Pranith Kumar K**

(In reply to comment #13) We don't have enough information to debug this issue, and no one else has reported a similar one. It is working fine for us, so I'm closing this for now. Feel free to reopen if you hit it again.
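For anyone triaging a report like this, the churn in the client log can be quantified by stripping the timestamps and counting how often each log site fires. Below is a minimal sketch: the sample log written to `/tmp/client-sample.log` is fabricated here to mirror the excerpt above, and in practice you would point the pipeline at the real client log (by default under `/var/log/glusterfs/` on the client) instead.

```shell
# Fabricated sample mirroring the client log pattern from this report:
# timestamps vary, but the same message sites repeat.
cat > /tmp/client-sample.log <<'EOF'
[2011-03-16 09:41:25.941350] D [afr-lk-common.c:410:transaction_lk_op] Volume01-replicate-0: lk op is for a transaction
[2011-03-16 09:41:25.941509] D [afr-lk-common.c:410:transaction_lk_op] Volume01-replicate-0: lk op is for a transaction
[2011-03-16 09:41:25.941782] D [afr-transaction.c:973:afr_post_nonblocking_inodelk_cbk] Volume01-replicate-0: Non blocking inodelks failed. Proceeding to blocking
[2011-03-16 09:41:25.942284] D [afr-transaction.c:979:afr_post_nonblocking_inodelk_cbk] Volume01-replicate-0: Non blocking inodelks done. Proceeding to FOP
EOF

# Strip the leading "[timestamp] " prefix, then count occurrences of each
# distinct message, most frequent first. A lock/unlock storm shows up as a
# handful of messages with very large counts.
sed -E 's/^\[[^]]+\] //' /tmp/client-sample.log | sort | uniq -c | sort -rn
```

On the sample above, the `transaction_lk_op` line comes out on top with a count of 2; on the real log from this bug, the same pipeline would show how many thousands of times per second the AFR lock path was being exercised.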
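The reporter's observation of three established connections (two to the main brick, one to the second) can be checked per brick host with `netstat` output and a small awk script. The sketch below runs against a fabricated netstat-style sample so it is self-contained; the host names, local ports, and brick port numbers are illustrative, not taken from the report.

```shell
# Fabricated netstat-like sample resembling the reporter's fuse mount:
# fields are proto, recv-q, send-q, local address, remote address, state.
cat > /tmp/netstat-sample.txt <<'EOF'
tcp 0 0 client:1021 brick01:24007 ESTABLISHED
tcp 0 0 client:1020 brick01:24009 ESTABLISHED
tcp 0 0 client:1019 brick02:24009 ESTABLISHED
EOF

# Count established connections per remote brick host by splitting the
# remote address on ":" and tallying the host part.
awk '$6 == "ESTABLISHED" {split($5, count_key, ":"); count[count_key[1]]++}
     END {for (h in count) print h, count[h]}' /tmp/netstat-sample.txt | sort
```

On a live client you would feed this from `netstat -tn` (or `ss -tn`, adjusting the field numbers) and compare the per-brick counts against what a replica-2 fuse mount is expected to hold open.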