| Summary: | geo-rep passive sessions went to faulty state while creating 10K files and doing metadata operations in loop. | | |
|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Vijaykumar Koppad <vkoppad> |
| Component: | geo-replication | Assignee: | Bug Updates Notification Mailing List <rhs-bugs> |
| Status: | CLOSED NOTABUG | QA Contact: | storage-qa-internal <storage-qa-internal> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 2.1 | CC: | aavati, avishwan, csaba, david.macdonald, ndevos |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-12-24 09:58:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description

Vijaykumar Koppad 2013-11-22 06:36:13 UTC

This message caused the brick process to stop:

```
[2013-11-22 04:53:22.139406] M [posix-helpers.c:1309:posix_health_check_thread_proc] 0-master-posix: still alive! -> SIGTERM
```

The health check failed because of this (and similar) errors:

```
[2013-11-22 04:53:19.528109] E [posix.c:616:posix_opendir] 0-master-posix: opendir failed on /bricks/master_brick2/: No such file or directory
```

This suggests that the directory for this brick does not exist. Could it be that the brick process was started before the directory /bricks/master_brick2/ was available (is /bricks a mountpoint)? Alternatively, if the directory was removed or renamed while the brick process was running, I think stopping the brick process is the correct behavior. Could you post the output of 'gluster volume info master' to verify the bricks?

```
# gluster volume info master

Volume Name: master
Type: Distributed-Replicate
Volume ID: 94f27837-1db2-457e-9a61-08a44d84c7ef
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.43.0:/bricks/master_brick1
Brick2: 10.70.43.29:/bricks/master_brick2
Brick3: 10.70.43.40:/bricks/master_brick3
Brick4: 10.70.43.53:/bricks/master_brick4
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
```

It looks like the brick process received a SIGTERM from the posix health check, which indicates a problem with the underlying storage layer. Geo-replication sessions going to the faulty state is expected behavior in this scenario.