Created attachment 1294660
heketi topology file
Description of problem:
Originally there were two separate gluster clusters, each with three nodes (one in each AZ) and replica 3 volumes. The gluster clusters were built on OCP and deployed with heketi. When large numbers of messages are pushed to the MQ queue, the queue files (MQ objects) get damaged. On the theory that insufficient resources on the gluster nodes were causing the damage, the architecture was changed to a single gluster cluster with 6 nodes (still replica 3 volumes). Running the same load test against the new layout produces the same issue: MQ objects are still being damaged.
Version-Release number of selected component (if applicable):
gluster 3.8.4
IBM MQ v9
CNS and heketi versions have been requested.
How reproducible:
Happens with regularity during load testing.
Steps to Reproduce:
Routine writing of large numbers of messages to the MQ queue seems to cause the damage. I've requested more quantitative information on the MQ message volume that seems to trigger the problem.
Actual results:
MQ files are being 'damaged'. Have asked for additional detail on what that means.
Expected results:
Would expect MQ to be able to write messages to the gluster volume(s) without damaging the queue files.
Additional info:
I have requested new MQ logs.
These errors were posted to the case:
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1h 25m 16 {kubelet ip-10-98-60-24.eu-west-1.compute.internal} Warning FailedMount MountVolume.SetUp failed for volume "kubernetes.io/secret/353351d2-5d9c-11e7-85a9-0a879126be0e-default-token-qo8l4" (spec.Name: "default-token-qo8l4") pod "353351d2-5d9c-11e7-85a9-0a879126be0e" (UID: "353351d2-5d9c-11e7-85a9-0a879126be0e") with: Get https://internal-paperboyprj-techtest-master-1067458796.eu-west-1.elb.amazonaws.com:8443/api/v1/namespaces/storage-utif/secrets/default-token-qo8l4: net/http: TLS handshake timeout
1h 4m 23 {kubelet ip-10-98-60-24.eu-west-1.compute.internal} Warning FailedMount MountVolume.SetUp failed for volume "kubernetes.io/secret/353351d2-5d9c-11e7-85a9-0a879126be0e-default-token-qo8l4" (spec.Name: "default-token-qo8l4") pod "353351d2-5d9c-11e7-85a9-0a879126be0e" (UID: "353351d2-5d9c-11e7-85a9-0a879126be0e") with: Get https://internal-paperboyprj-techtest-master-1067458796.eu-west-1.elb.amazonaws.com:8443/api/v1/namespaces/storage-utif/secrets/default-token-qo8l4: EOF
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
20h 32m 14 {kubelet ip-10-98-62-152.eu-west-1.compute.internal} Warning FailedMount MountVolume.SetUp failed for volume "kubernetes.io/secret/3569b8d3-5d9c-11e7-85a9-0a879126be0e-default-token-qo8l4" (spec.Name: "default-token-qo8l4") pod "3569b8d3-5d9c-11e7-85a9-0a879126be0e" (UID: "3569b8d3-5d9c-11e7-85a9-0a879126be0e") with: Get https://internal-paperboyprj-techtest-master-1067458796.eu-west-1.elb.amazonaws.com:8443/api/v1/namespaces/storage-utif/secrets/default-token-qo8l4: net/http: TLS handshake timeout
20h 22m 30 {kubelet ip-10-98-62-152.eu-west-1.compute.internal} Warning FailedMount MountVolume.SetUp failed for volume "kubernetes.io/secret/3569b8d3-5d9c-11e7-85a9-0a879126be0e-default-token-qo8l4" (spec.Name: "default-token-qo8l4") pod "3569b8d3-5d9c-11e7-85a9-0a879126be0e" (UID: "3569b8d3-5d9c-11e7-85a9-0a879126be0e") with: Get https://internal-paperboyprj-techtest-master-1067458796.eu-west-1.elb.amazonaws.com:8443/api/v1/namespaces/storage-utif/secrets/default-token-qo8l4: EOF
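The events above are all FailedMount warnings for the default-token secret volume, split between "TLS handshake timeout" and "EOF" errors. To make it easier to compare counts per node once more output is collected, here is a minimal sketch of a tally script. It assumes the column layout shown above (FirstSeen, LastSeen, Count, From, Type, Reason, Message, with SubobjectPath empty) and that each event sits on one line; the function name `tally_failed_mounts` is hypothetical and not part of any tool.

```python
import re
from collections import Counter

# Assumed layout: FirstSeen LastSeen Count {kubelet NODE} Type Reason Message
# (SubobjectPath is empty in the pasted output, so it is not matched here).
EVENT_RE = re.compile(
    r"^(\S+)\s+(\S+)\s+(\d+)\s+\{kubelet (\S+)\}\s+(\S+)\s+(\S+)\s+(.*)$"
)
# The error text is whatever follows the request URL at the end of the message.
ERROR_RE = re.compile(r"with: Get \S+: (.*)$")


def tally_failed_mounts(lines):
    """Sum the Count column of FailedMount events per (node, error) pair."""
    totals = Counter()
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if not match or match.group(6) != "FailedMount":
            continue  # skip headers, separators, and other event reasons
        count = int(match.group(3))
        node = match.group(4)
        error = ERROR_RE.search(match.group(7))
        totals[(node, error.group(1) if error else "unknown")] += count
    return totals
```

Passing the saved describe output line by line (for example, an open file handle) returns a Counter keyed by node and error string, which can then be sorted or printed as needed.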