Bug 1601874
| Summary: | While creating, deleting pvs in loop and running gluster volume heal in gluster pods 2 of the 4 gluster pods are in 0/1 state | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | vinutha <vinug> |
| Component: | Containers | Assignee: | Antonio Murdaca <amurdaca> |
| Status: | CLOSED ERRATA | QA Contact: | RamaKasturi <knarra> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.9.0 | CC: | amukherj, aos-bugs, clichybi, hchiramm, jokerman, knarra, mmccomas, mpatel, pprakash, rhs-bugs, rtalur, sankarshan, sarumuga, vbellur, vinug |
| Target Milestone: | --- | | |
| Target Release: | 3.9.z | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-04-09 14:20:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1605158 | | |
Description
vinutha
2018-07-17 11:45:04 UTC
There was some discussion in the call regarding pod limits for a node.

Referring to https://access.redhat.com/documentation/en-us/openshift_container_platform/3.9/html-single/scaling_and_performance_guide/#scaling-performance-current-cluster-limits:

Max number of pods per node = min(250, number of CPU cores * 10)

Considering that the CPU count on these machines is 32, we have pod limit = min(250, 32 * 10) = 250.

This holds true only if:
a. all pods have their CPU resource defined as 100 millicpus
b. the host machines (which might be VMs) have 32 dedicated CPU cores

```
[root@dhcp46-119 ~]# nproc
32
[root@dhcp46-119 ~]# lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('
CPU(s):                32
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             8
```

The above comments are based on the nproc and lscpu output. I would need data to confirm that these 32 cores are dedicated to the VM. Please attach the data from the hypervisor to prove the same.

(In reply to Raghavendra Talur from comment #7)
> There was some discussion in the call regarding pod limits for a node.
>
> Referring to
> https://access.redhat.com/documentation/en-us/openshift_container_platform/3.9/html-single/scaling_and_performance_guide/#scaling-performance-current-cluster-limits
>
> Max number of pods per node = min(250, number of CPU cores * 10)
>
> Considering that the CPU count on these machines is 32, we have
>
> pod limit = min(250, 32 * 10) = 250
>
> This holds true only if
> a. all pods have their CPU resource defined as 100 millicpus
> b. the host machines (which might be VMs) have 32 dedicated CPU cores

c. the default pod limits have not been overwritten in node-config

Is there any way you can test this out with the newer docker referenced in https://bugzilla.redhat.com/show_bug.cgi?id=1560428#c48?

Can you try docker-1.13.1-94, which we believe has the fixes? Placing needinfo on kasturi to try this out and get back. Removing needinfo on vinutha, who is no longer with Red Hat.

Below are the test steps I ran to verify this bug:

1) Created 100 mongodb pods, each with a 1 GB PVC and 1 GB of RAM allocated.
2) Started a loop running the volume heal info command from all the gluster pods present in the system (3 pods were present).
3) From another terminal, started running load on the mongodb pods.
4) From another terminal, started a loop of PVC create and PVC delete for 300 PVCs. (A rough sketch of the loops used is shown below.)

After the 300 PVCs were created and deleted, I did not see any gluster pod going into the 0/1 state; all pods were up and running and no gluster core was generated on any of the nodes.
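For reference, the heal-info and PVC churn loops from steps 2 and 4 were along these lines. This is a minimal sketch, not the exact script used in the test run: the glusterfs namespace, the glusterfs=storage-pod pod label, the pvc-template.yaml file, its CLAIM_NAME parameter, and the claim names are all illustrative assumptions.

```bash
#!/bin/bash
# Rough sketch of the verification loops. Namespace, pod label, template file
# and parameter names below are illustrative assumptions, not the exact ones used.

NS_GLUSTER=glusterfs            # assumed namespace of the gluster pods
PVC_TEMPLATE=pvc-template.yaml  # assumed PVC template exposing a CLAIM_NAME parameter

# Step 2: keep running `gluster volume heal <vol> info` from every gluster pod.
heal_info_loop() {
  while true; do
    for pod in $(oc get pods -n "$NS_GLUSTER" -l glusterfs=storage-pod -o name); do
      pod=${pod#pod/}
      for vol in $(oc exec -n "$NS_GLUSTER" "$pod" -- gluster volume list); do
        oc exec -n "$NS_GLUSTER" "$pod" -- gluster volume heal "$vol" info
      done
    done
    sleep 10
  done
}

# Step 4: create and delete 300 PVCs in a loop.
pvc_churn_loop() {
  for i in $(seq 1 300); do
    oc process -f "$PVC_TEMPLATE" -p CLAIM_NAME="test-claim-$i" | oc create -f -
    # Wait for the claim to be bound before deleting it again.
    until [ "$(oc get pvc "test-claim-$i" -o jsonpath='{.status.phase}')" = "Bound" ]; do
      sleep 2
    done
    oc delete pvc "test-claim-$i"
  done
}

heal_info_loop &
pvc_churn_loop
kill %1

# Afterwards, confirm that no gluster pod dropped to 0/1 READY.
oc get pods -n "$NS_GLUSTER" -o wide
```

The background heal-info loop and the foreground PVC churn mirror the reported reproduction scenario; the final oc get pods check corresponds to verifying that none of the gluster pods drops back to the 0/1 READY state.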
However, on the mongodb side I did see an issue similar to the one below; I will be raising a separate bug for that:

```
2019-03-30 19:03:40:170 12580 sec: 276 operations; 0.1 current ops/sec; est completion in 9 hours 10 minutes [READ: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [UPDATE-FAILED: Count=1, Max=30015487, Min=29999104, Avg=30007296, 90=30015487, 99=30015487, 99.9=30015487, 99.99=30015487] [READ-MODIFY-WRITE: Count=1, Max=60030975, Min=59998208, Avg=60014592, 90=60030975, 99=60030975, 99.9=60030975, 99.99=60030975] [READ-FAILED: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [UPDATE: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0]
2019-03-30 19:03:50:170 12590 sec: 276 operations; 0 current ops/sec; est completion in 9 hours 10 minutes [READ: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [UPDATE-FAILED: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [READ-MODIFY-WRITE: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [READ-FAILED: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [UPDATE: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0]
2019-03-30 19:04:00:170 12600 sec: 276 operations; 0 current ops/sec; est completion in 9 hours 10 minutes [READ: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [UPDATE-FAILED: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [READ-MODIFY-WRITE: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [READ-FAILED: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [UPDATE: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0]
com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primary}.
Client view of cluster state is {type=UNKNOWN, servers=[{address=:27017:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketException: :27017: invalid IPv6 address}, caused by {java.net.UnknownHostException: :27017: invalid IPv6 address}}]
2019-03-30 19:04:10:170 12610 sec: 276 operations; 0 current ops/sec; est completion in 9 hours 11 minutes [READ: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [UPDATE-FAILED: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [READ-MODIFY-WRITE: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [READ-FAILED: Count=1, Max=30015487, Min=29999104, Avg=30007296, 90=30015487, 99=30015487, 99.9=30015487, 99.99=30015487] [UPDATE: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0]
2019-03-30 19:04:20:170 12620 sec: 276 operations; 0 current ops/sec; est completion in 9 hours 11 minutes [READ: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [UPDATE-FAILED: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [READ-MODIFY-WRITE: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [READ-FAILED: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [UPDATE: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0]
2019-03-30 19:04:30:170 12630 sec: 276 operations; 0 current ops/sec; est completion in 9 hours 12 minutes [READ: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [UPDATE-FAILED: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [READ-MODIFY-WRITE: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [READ-FAILED: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [UPDATE: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0]
com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches PrimaryServerSelector.
Client view of cluster state is {type=UNKNOWN, servers=[{address=:27017:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketException: :27017: invalid IPv6 address}, caused by {java.net.UnknownHostException: :27017: invalid IPv6 address}}]
2019-03-30 19:04:40:170 12640 sec: 277 operations; 0.1 current ops/sec; est completion in 9 hours 9 minutes [READ: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [UPDATE-FAILED: Count=1, Max=30015487, Min=29999104, Avg=30007296, 90=30015487, 99=30015487, 99.9=30015487, 99.99=30015487] [READ-MODIFY-WRITE: Count=1, Max=60030975, Min=59998208, Avg=60014592, 90=60030975, 99=60030975, 99.9=60030975, 99.99=60030975] [READ-FAILED: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [UPDATE: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0]
2019-03-30 19:04:50:170 12650 sec: 277 operations; 0 current ops/sec; est completion in 9 hours 10 minutes [READ: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [UPDATE-FAILED: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [READ-MODIFY-WRITE: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [READ-FAILED: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [UPDATE: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0]
2019-03-30 19:05:00:170 12660 sec: 277 operations; 0 current ops/sec; est completion in 9 hours 10 minutes [READ: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [UPDATE-FAILED: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [READ-MODIFY-WRITE: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [READ-FAILED: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0] [UPDATE: Count=0, Max=0, Min=9223372036854775807, Avg=�, 90=0, 99=0, 99.9=0, 99.99=0]
com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primary}.
Client view of cluster state is {type=UNKNOWN, servers=[{address=:27017:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketException: :27017: invalid IPv6 address}, caused by {java.net.UnknownHostException: :27017: invalid IPv6 address}}]
```

Since the actual issue reported in this bug is not seen, moving the bug to the verified state.

oc version:
```
[root@dhcp47-89 ~]# oc version
oc v3.9.71
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://dhcp47-89.lab.eng.blr.redhat.com:8443
openshift v3.9.71
kubernetes v1.9.1+a0ce1bc657
```

```
sh-4.2# rpm -qa | grep glusterfs
glusterfs-libs-3.12.2-32.el7rhgs.x86_64
glusterfs-3.12.2-32.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-32.el7rhgs.x86_64
glusterfs-server-3.12.2-32.el7rhgs.x86_64
glusterfs-api-3.12.2-32.el7rhgs.x86_64
glusterfs-cli-3.12.2-32.el7rhgs.x86_64
glusterfs-fuse-3.12.2-32.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-32.el7rhgs.x86_64
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0619