Description of problem:

When trying to scale the cloudforms pod, the errors below are seen in the pod and it does not scale:

oc get pods
NAME                 READY   STATUS    RESTARTS   AGE
cloudforms-3-03cdq   1/1     Running   28         2d
cloudforms-3-p61zz   0/1     Running   358        2d
memcached-1-6ahiy    1/1     Running   2          2d
postgresql-1-ndp3c   1/1     Running   1          2d

[root@ansible-m1 ~]# os rsh cloudforms-3-03cdq

[----] I, [2017-01-27T12:44:33.654925 #1722:e61130] INFO -- : <AutomationEngine> MiqAeEvent.build_evm_event >> event=<"evm_worker_start"> inputs=<{:event_details=>"Worker started: ID [2982], PID [16020], GUID [574155ca-e48e-11e6-ae92-4ec637fe7c9c]", :type=>"MiqReportingWorker", "MiqEvent::miq_event"=>3036, :miq_event_id=>3036, "EventStream::event_stream"=>3036, :event_stream_id=>3036}>
[----] I, [2017-01-27T12:44:33.666973 #15989:e61130] INFO -- : MIQ(MiqGenericWorker#log_status) [Generic Worker] Worker ID [2978], PID [15989], GUID [56fb61a0-e48e-11e6-ae92-4ec637fe7c9c], Last Heartbeat [2017-01-27 12:44:32 UTC], Process Info: Memory Usage [287612928], Memory Size [589725696], Proportional Set Size: [154088000], Memory % [2.31], CPU Time [68.0], CPU % [0.0], Priority [30]
[----] E, [2017-01-27T12:44:33.667414 #15989:e61130] ERROR -- : MIQ(MiqGenericWorker::Runner) ID [2978] PID [15989] GUID [56fb61a0-e48e-11e6-ae92-4ec637fe7c9c] Error heartbeating to MiqServer because DRb::DRbConnError: druby://127.0.0.1:41860 - #<Errno::ECONNREFUSED: Connection refused - connect(2) for "127.0.0.1" port 41860> Worker exiting.

The 2nd appliance starts but fails to seek the first one. Also, cfme has the same GUID...

sh-4.2# cat GUID
347e7bc4-e463-11e6-a02b-2a9c7f6c0ea5
sh-4.2#

Version-Release number of selected component (if applicable):
Cfme 4.3
OCP 4.0

How reproducible:

Steps to Reproduce:
1. Deploy cloudforms on openshift using the article below: https://access.redhat.com/documentation/en/red-hat-cloudforms/4.2/single/installing-red-hat-cloudforms-on-openshift-container-platform/#prerequisites
2. Once the pods are running, scale: oc scale --replicas=2 rc cloudforms
3. The new pod never reaches READY 1/1.

Actual results:
The second pod never becomes ready; the 2nd appliance starts but fails to seek the first one.

Expected results:
It should have started successfully.

Additional info:
A patch that supports PetSet was merged: https://github.com/ManageIQ/manageiq-pods/commit/67925e980bf3501eb199e1e5698591588206e078
As this will be supported in CF 4.5 on OCP 3.5, we will have to move to StatefulSet, so I am not moving this bug to POST yet.
The move to StatefulSet was completed and released downstream. It was included in the template released in build 5.8.0.12. Therefore I am moving this bug to ON_QA.
Verified. Set up OCP 3.5 and podified CFME (5.8.0.12), then ran:
oc scale statefulset cloudforms --replicas=2
oc get pods
Result: two cloudforms pods were seen (cloudforms-0 and cloudforms-1).
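For anyone repeating this verification, a minimal sketch of a readiness check on the `oc get pods` listing is shown here. The helper name `not_ready_pods` is hypothetical (it is not part of `oc`), and it assumes the default column layout seen in the description, with a header line and READY as the second column in `ready/total` form.

```shell
# Hypothetical helper: reads `oc get pods` output on stdin and prints the
# name of every pod whose READY column (e.g. 0/1) is not fully ready.
# Assumes the default layout: one header line, NAME in column 1, READY in column 2.
not_ready_pods() {
  awk 'NR > 1 {
         split($2, ready, "/")          # ready[1] = ready containers, ready[2] = total
         if (ready[1] != ready[2]) print $1
       }'
}
```

Usage, assuming a logged-in `oc` session: `oc get pods | not_ready_pods`. Empty output means every listed pod reports all of its containers ready; in the listing from the original description it would print only cloudforms-3-p61zz.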
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1366