Bug 1467958 - [GSS] MQ objects damaged on pushing loads of messages
Status: CLOSED NOTABUG
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: CNS-deployment
Version: 3.2
Hardware: x86_64 Linux
Priority: urgent
Severity: urgent
Assigned To: Michael Adam
QA Contact: Anoop
Reported: 2017-07-05 11:49 EDT by Cal Calhoun
Modified: 2017-10-16 07:22 EDT
CC: 16 users

Last Closed: 2017-07-18 17:41:56 EDT
Type: Bug

Attachments
sosreport for node 10.98.60.24 (96 bytes, text/plain), 2017-07-05 16:20 EDT, Cal Calhoun
sosreport for node 10.98.62.148 (96 bytes, text/plain), 2017-07-05 16:21 EDT, Cal Calhoun
sosreport for node 10.98.62.152 (96 bytes, text/plain), 2017-07-05 16:22 EDT, Cal Calhoun

Description Cal Calhoun 2017-07-05 11:49:29 EDT
Created attachment 1294660 [details]
heketi topology file

Description of problem:

Originally there were two separate gluster clusters, each with three nodes (one per AZ) serving replica 3 volumes.  The gluster clusters were built on OCP and deployed with heketi.  When large numbers of messages are pushed to the MQ queue, the queue files (MQ objects) get damaged.  On the theory that insufficient resources on the gluster nodes were causing the damage, the architecture was changed to a single gluster cluster with six nodes (still replica 3 volumes).  Performing the same load test still shows the same MQ object damage.
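
Since the volumes are replica 3, one data point we do not yet have in the case is whether self-heal backlogs build up on the bricks during the load test.  The following is a minimal sketch (not from the case data), assuming shell access to a gluster node or pod where the gluster CLI is available; it only wraps standard gluster commands and invents no case-specific names.

import subprocess

def run(cmd):
    # Run a gluster CLI command and return its stdout as text.
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

def main():
    # Discover all volumes served by this trusted storage pool.
    volumes = run(["gluster", "volume", "list"]).split()
    for vol in volumes:
        print("=== %s ===" % vol)
        # Replica count and brick layout (a replica 3 volume shows
        # "Number of Bricks: 1 x 3 = 3" or similar).
        print(run(["gluster", "volume", "info", vol]))
        # Pending self-heal entries; a growing backlog during the MQ load
        # test would point at replication falling behind or bricks dropping out.
        print(run(["gluster", "volume", "heal", vol, "info"]))

if __name__ == "__main__":
    main()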

Version-Release number of selected component (if applicable):

gluster 3.8.4
IBM MQ v9

Have requested cns and heketi versions.

How reproducible:

Happens with regularity during load testing.

Steps to Reproduce:

Routine writing of large numbers of messages to the MQ queue appears to cause the damage.  I've requested more quantitative information on the message volume that triggers the problem.  A load of roughly that shape could be approximated with a small generator such as the sketch below.
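
The sketch is purely illustrative and is not the customer's test harness; the queue manager, channel, connection string and queue name are placeholders, not values from this case, and it assumes the pymqi client library is installed.

import pymqi

QMGR = "QM1"                   # placeholder queue manager name
CHANNEL = "DEV.APP.SVRCONN"    # placeholder client channel
CONN_INFO = "mq-host(1414)"    # placeholder host(port)
QUEUE = "DEV.QUEUE.1"          # placeholder queue name
N_MESSAGES = 100_000           # "large numbers of messages"

def main():
    qmgr = pymqi.connect(QMGR, CHANNEL, CONN_INFO)
    queue = pymqi.Queue(qmgr, QUEUE)
    try:
        payload = b"x" * 1024  # 1 KiB test message
        for _ in range(N_MESSAGES):
            queue.put(payload)
    finally:
        queue.close()
        qmgr.disconnect()

if __name__ == "__main__":
    main()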

Actual results:

MQ files are being 'damaged'.  Have asked for additional detail on what that means.

Expected results:

Would expect MQ to be able to write messages to the gluster volume(s) without damaging the queue files.

Additional info:

I have requested new MQ logs.

These errors were posted to the case:

Tolerations:    <none>
Events:
  FirstSeen     LastSeen        Count   From                                                    SubobjectPath   Type            Reason          Message
  ---------     --------        -----   ----                                                    -------------   --------        ------          -------
  1h            25m             16      {kubelet ip-10-98-60-24.eu-west-1.compute.internal}                     Warning         FailedMount     MountVolume.SetUp failed for volume "kubernetes.io/secret/353351d2-5d9c-11e7-85a9-0a879126be0e-default-token-qo8l4" (spec.Name: "default-token-qo8l4") pod "353351d2-5d9c-11e7-85a9-0a879126be0e" (UID: "353351d2-5d9c-11e7-85a9-0a879126be0e") with: Get https://internal-paperboyprj-techtest-master-1067458796.eu-west-1.elb.amazonaws.com:8443/api/v1/namespaces/storage-utif/secrets/default-token-qo8l4: net/http: TLS handshake timeout
  1h            4m              23      {kubelet ip-10-98-60-24.eu-west-1.compute.internal}                     Warning         FailedMount     MountVolume.SetUp failed for volume "kubernetes.io/secret/353351d2-5d9c-11e7-85a9-0a879126be0e-default-token-qo8l4" (spec.Name: "default-token-qo8l4") pod "353351d2-5d9c-11e7-85a9-0a879126be0e" (UID: "353351d2-5d9c-11e7-85a9-0a879126be0e") with: Get https://internal-paperboyprj-techtest-master-1067458796.eu-west-1.elb.amazonaws.com:8443/api/v1/namespaces/storage-utif/secrets/default-token-qo8l4: EOF

Tolerations:    <none>
Events:
  FirstSeen     LastSeen        Count   From                                                    SubobjectPath   Type            Reason          Message
  ---------     --------        -----   ----                                                    -------------   --------        ------          -------
  20h           32m             14      {kubelet ip-10-98-62-152.eu-west-1.compute.internal}                    Warning         FailedMount     MountVolume.SetUp failed for volume "kubernetes.io/secret/3569b8d3-5d9c-11e7-85a9-0a879126be0e-default-token-qo8l4" (spec.Name: "default-token-qo8l4") pod "3569b8d3-5d9c-11e7-85a9-0a879126be0e" (UID: "3569b8d3-5d9c-11e7-85a9-0a879126be0e") with: Get https://internal-paperboyprj-techtest-master-1067458796.eu-west-1.elb.amazonaws.com:8443/api/v1/namespaces/storage-utif/secrets/default-token-qo8l4: net/http: TLS handshake timeout
  20h           22m             30      {kubelet ip-10-98-62-152.eu-west-1.compute.internal}                    Warning         FailedMount     MountVolume.SetUp failed for volume "kubernetes.io/secret/3569b8d3-5d9c-11e7-85a9-0a879126be0e-default-token-qo8l4" (spec.Name: "default-token-qo8l4") pod "3569b8d3-5d9c-11e7-85a9-0a879126be0e" (UID: "3569b8d3-5d9c-11e7-85a9-0a879126be0e") with: Get https://internal-paperboyprj-techtest-master-1067458796.eu-west-1.elb.amazonaws.com:8443/api/v1/namespaces/storage-utif/secrets/default-token-qo8l4: EOF
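
Note that the FailedMount events above are about the kubelet fetching the default service-account token secret from the OpenShift master, not about the gluster mount itself, so it is worth confirming basic API-server reachability from the affected nodes.  The snippet below is an illustrative check only; the host and port are copied from the events above, and the timeout value is an assumption.

import socket
import ssl
import time

HOST = "internal-paperboyprj-techtest-master-1067458796.eu-west-1.elb.amazonaws.com"
PORT = 8443
TIMEOUT = 10  # seconds; assumed, roughly the order of the kubelet's handshake timeout

def main():
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # only measuring handshake latency
    ctx.verify_mode = ssl.CERT_NONE
    start = time.monotonic()
    try:
        with socket.create_connection((HOST, PORT), timeout=TIMEOUT) as sock:
            with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
                elapsed = time.monotonic() - start
                print("handshake ok in %.2fs, protocol %s" % (elapsed, tls.version()))
    except (OSError, ssl.SSLError) as exc:
        print("handshake failed after %.2fs: %s" % (time.monotonic() - start, exc))

if __name__ == "__main__":
    main()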
Comment 5 Cal Calhoun 2017-07-05 12:24:48 EDT
Version Information:

  cns-deploy-3.1.0-14.el7rhgs.x86_64
  heketi-3.1.0-14.el7rhgs.x86_64
  heketi-client-3.1.0-14.el7rhgs.x86_64
Comment 7 Cal Calhoun 2017-07-05 16:15:45 EDT
@Vijay: I'll attach the three that I have and ask for the others.
Comment 8 Cal Calhoun 2017-07-05 16:20 EDT
Created attachment 1294737 [details]
sosreport for node 10.98.60.24
Comment 9 Cal Calhoun 2017-07-05 16:21 EDT
Created attachment 1294738 [details]
sosreport for node 10.98.62.148
Comment 10 Cal Calhoun 2017-07-05 16:22 EDT
Created attachment 1294739 [details]
sosreport for node 10.98.62.152
