| Summary: | Freeze of clients by intensive parallel writes | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Alex Aster <alrond> |
| Component: | locks | Assignee: | Pranith Kumar K <pkarampu> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | low | | |
| Version: | 3.2.2 | CC: | gluster-bugs |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-06-13 07:17:21 UTC | Type: | --- |
| Regression: | --- | Mount Type: | fuse |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |

Description
Alex Aster
2011-08-28 20:37:10 UTC
(In reply to comment #0)
> Hello, I have a very serious problem.
> After some time, some of my client applications froze.
> The folders/files the clients were working on are then frozen for all other
> users and for commands like ls, touch...
> kill -9 doesn't help. Only a server restart or kill -9 of the mounted glusterfs
> process helps.
>
> OS: Ubuntu Server 11.04, 64-bit
> Type: 2 replicated content servers; each server has an 8 TB RAID-6 with ext4
> (with user_xattr).
> GlusterFS: 3.2.3
>
> Mounting via fstab:
> serv4:/vol-content /content glusterfs auto,noatime,nodiratime,nosuid,noexec,rw,allow_other,default_permissions,max_read=131072,_netdev 0 0
>
> I already had this problem with 3.1.3 on XFS and with 3.2.2 on ext4.
>
> I cannot always reproduce this situation; it involves a lot of parallel
> reads/writes from the clients' PHP scripts.
> Currently the FS works, because I have moved these clients to local disk.
>
> I have tried a clean configuration and a tuned one, with the same results.
>
> Current configuration:
>
> Volume Name: vol-content
> Type: Replicate
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: serv3:/media/content
> Brick2: serv4:/media/content
> Options Reconfigured:
> diagnostics.dump-fd-stats: no
> diagnostics.latency-measurement: no
> diagnostics.client-log-level: INFO
> diagnostics.brick-log-level: INFO
> performance.io-cache: on
> auth.allow: 192.168.0.*
> performance.io-thread-count: 64
> performance.write-behind-window-size: 1GB
>
> Specifically, after the upgrade to 3.2.3 I enabled debug mode and moved my
> clients to Gluster. After a few minutes half of the clients had become zombies.
>
> For debugging I enabled:
> gluster volume set vol-content diagnostics.brick-log-level DEBUG
> gluster volume set vol-content diagnostics.client-log-level DEBUG
> gluster volume set vol-content diagnostics.latency-measurement yes
> gluster volume set vol-content diagnostics.dump-fd-stats yes
>
> I have both logs, server and bricks, but I don't know how to find the
> problem, because the logs are 400 MB for just those 2 hours and contain no
> "error" words.

Could you please zip and attach the logs to the bug? I don't see any problem with the configuration you have at the moment.

I am sending you the logs via email because they contain some sensitive information.

13:42 - GlusterFS upgraded to 3.2.3 and debug options added to the config
up to 15:01 - some scripts were started manually without problems
ca. 15:12 - autostart of all scripts. After one or two minutes some scripts had already frozen.
15:22-23 - I stopped everything

I can always reproduce this situation for additional tests.

I forgot to write:
Test folder is "/vol-rest/userupload_cluster/"

(In reply to comment #3)
> I forgot to write:
> Test folder is "/vol-rest/userupload_cluster/"

hi Alex,
I don't seem to have gotten any mail with the logs. Could you re-send it? Sorry for the inconvenience.
Pranith.

I sent two emails on 29th August.
Aug 29 04:20:11 ubuntu postfix/smtp[25161]: 58647181566: to=<pranithk>, relay=east.smtp.exch024.serverdata.net[206.225.164.180]:25, delay=121, delays=107/0.02/0.6/13, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 63A03153)
Aug 29 04:23:00 ubuntu postfix/smtp[26680]: 2B480181566: to=<pranithk>, relay=east.smtp.exch024.serverdata.net[206.225.164.180]:25, delay=101, delays=88/0.01/0.34/13, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 9335E64)

And repeated today:

Sep 19 13:15:39 ubuntu postfix/smtp[2371]: E2D28180F41: to=<pranithk>, relay=east.smtp.exch024.serverdata.net[206.225.164.180]:25, delay=129, delays=110/0.01/0.57/18, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 8F7EE24C)
Sep 19 13:18:41 ubuntu postfix/smtp[2703]: 32BAC180F41: to=<pranithk>, relay=east.smtp.exch024.serverdata.net[206.225.164.180]:25, delay=105, delays=92/0.01/0.46/13, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 924B3D1)

(In reply to comment #5)
> I sent two emails on 29th August.
>
> Aug 29 04:20:11 ubuntu postfix/smtp[25161]: 58647181566: to=<pranithk>, relay=east.smtp.exch024.serverdata.net[206.225.164.180]:25, delay=121, delays=107/0.02/0.6/13, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 63A03153)
> Aug 29 04:23:00 ubuntu postfix/smtp[26680]: 2B480181566: to=<pranithk>, relay=east.smtp.exch024.serverdata.net[206.225.164.180]:25, delay=101, delays=88/0.01/0.34/13, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 9335E64)
>
> And repeated today:
>
> Sep 19 13:15:39 ubuntu postfix/smtp[2371]: E2D28180F41: to=<pranithk>, relay=east.smtp.exch024.serverdata.net[206.225.164.180]:25, delay=129, delays=110/0.01/0.57/18, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 8F7EE24C)
> Sep 19 13:18:41 ubuntu postfix/smtp[2703]: 32BAC180F41: to=<pranithk>, relay=east.smtp.exch024.serverdata.net[206.225.164.180]:25, delay=105, delays=92/0.01/0.46/13, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 924B3D1)

hi Alex,
I went through the media-content logs,
did not find anything wrong with them. The other log does not extract:

pranith @ ~/Downloads/3487/content
21:54:11 :) $ gunzip content.log.gz

gzip: content.log.gz: unexpected end of file

pranith @ ~/Downloads/3487/content
21:54:17 :( $ ls -l !$
ls -l content.log.gz
-rw-r--r-- 1 pranith pranith 6957684 2011-09-21 21:40 content.log.gz

Could you re-upload that log? If there are 2 bricks and 1 client, there should be 3 logs: 2 (bricks) media-content logs and 1 (mount) content log.
Do you hang out on IRC? If yes, what is your nick?

Please feel free to re-open with the necessary logs.
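
The `unexpected end of file` error from gunzip suggests that content.log.gz arrived truncated. As a rough sketch only, assuming Python 3 is available on the reporter's machine, an archive could be checked for completeness before it is attached or mailed. The function name `gzip_is_complete` and the demo file names are hypothetical and are not part of GlusterFS or of this bug.

```python
import gzip
import os
import tempfile
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` decompresses to the end.

    The file is read in chunks and the output discarded, so a large
    log archive is never held in memory as a whole.
    """
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: stream ended before the end-of-stream marker (truncated).
        # OSError / zlib.error: not a gzip file, bad CRC, or corrupt data.
        return False
    return True


if __name__ == "__main__":
    # Demo with throwaway files: one intact archive, one cut in half.
    workdir = tempfile.mkdtemp()
    good = os.path.join(workdir, "good.log.gz")
    bad = os.path.join(workdir, "truncated.log.gz")

    with gzip.open(good, "wb") as fh:
        fh.write(b"sample log line\n" * 10000)

    with open(good, "rb") as src:
        data = src.read()
    with open(bad, "wb") as dst:
        dst.write(data[: len(data) // 2])

    print("good:", gzip_is_complete(good))
    print("truncated:", gzip_is_complete(bad))
```

Running the same check on each of the three expected archives (two brick logs and one mount log) before sending them would catch a partial file on the sender's side.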