Red Hat Bugzilla – Bug 1299752
One of the bricks on hot tier doesn't heal after node failure/recovery
Last modified: 2016-09-17 10:18:57 EDT
Description of problem:
On a tiered volume whose hot tier is replicated, one of the bricks does not heal after node failure and recovery. The node became unresponsive and was recovered with a hard reboot. CPU utilization of the glusterfsd process serving the affected node's hot tier brick is at 200%, and many error messages appear in the brick and tier logs.
We have not yet determined what triggered this issue or how the system ended up in this state.
Volume Name: reg-test-cycle1
Volume ID: f4b57f6b-f54b-4e46-834e-1a3ee2718a57
Number of Bricks: 20
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 4 x 2 = 8
Brick7: 10.70.43.141:/rhs/brick6/leg2 --> source
Brick8: 10.70.42.45:/rhs/brick6/leg2 --> sink
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
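The brick counts in the volume info above can be cross-checked with a short calculation. The Python sketch below is illustrative only: the helper names are hypothetical, and it assumes (as GlusterFS does for distributed-replicate layouts) that consecutive bricks in listing order form one replica set, with hot tier bricks numbered before cold tier bricks. Under that assumption, Brick7 and Brick8 form the fourth hot tier replica pair, which matches the source/sink pair noted above.

```python
# Hypothetical helpers modelling the brick layout of reg-test-cycle1.
# Assumption: hot tier bricks are numbered first, and consecutive bricks
# in listing order form one replica set.

def hot_tier_bricks(distribute: int, replica: int) -> int:
    """Bricks in a distributed-replicate tier: distribute x replica."""
    return distribute * replica

def cold_tier_bricks(distribute: int, data: int, redundancy: int) -> int:
    """Bricks in a distributed-disperse tier: distribute x (data + redundancy)."""
    return distribute * (data + redundancy)

def replica_sets(first_brick: int, distribute: int, replica: int) -> list[tuple[int, ...]]:
    """Group brick numbers into replica sets of `replica` consecutive bricks."""
    sets = []
    brick = first_brick
    for _ in range(distribute):
        sets.append(tuple(range(brick, brick + replica)))
        brick += replica
    return sets

hot = hot_tier_bricks(4, 2)        # 4 x 2 = 8
cold = cold_tier_bricks(2, 4, 2)   # 2 x (4 + 2) = 12
print(hot, cold, hot + cold)       # 8 12 20
print(replica_sets(1, 4, 2))       # [(1, 2), (3, 4), (5, 6), (7, 8)]
```

The total of 20 agrees with the "Number of Bricks: 20" line for the whole volume.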
Version-Release number of selected component (if applicable):
How reproducible:
Unable to determine whether this is reproducible.
Steps to Reproduce:
No exact steps to reproduce this issue are known.
Actual results:
The glusterfsd process consumes 200% CPU, and heal does not happen on the hot tier brick of one of the nodes.
Expected results:
No brick failure, no high CPU consumption, and no heal failure.
Additional info:
sosreport on the affected node seems to hang, so I'll upload the necessary logs manually from that node.