Bug 2213196
| Summary: | The 'foreman-proxy' service on Satellite fails with error "malloc_consolidate(): unaligned fastbin chunk detected" while provisioning a host | ||
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Jayant Bhatia <jbhatia> |
| Component: | Foreman Proxy | Assignee: | satellite6-bugs <satellite6-bugs> |
| Status: | CLOSED ERRATA | QA Contact: | Lukas Pramuk <lpramuk> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 6.12.4 | CC: | ahumbe, alsouza, arahaman, aruzicka, bart.demeester, hyu, lpramuk, osousa, pmoravec, rlavi, saydas, sshtein |
| Target Milestone: | 6.12.5 | Keywords: | PrioBumpGSS, Triaged |
| Target Release: | Unused | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | rubygem-gssapi-1.3.1-1 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-08-24 18:24:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Jayant Bhatia
2023-06-07 12:11:28 UTC
Ok, For all the support cases, the realm usage in system deployment seems to be a key point. Even the end-user who confirmed that upgrading to 6.13 no longer triggers the issue had realm enabled and used during deployment. 100% Reproducer steps: * Install Satellite 6.12 on RHEL 8.8 * Set up it with some kickstart content and provisioning features. * Enable Realm feature and select Realm Capsule in a host group ( https://access.redhat.com/documentation/en-us/red_hat_satellite/6.12/html/installing_satellite_server_in_a_connected_network_environment/configuring_external_authentication_satellite#External_Authentication_for_Provisioned_Hosts_satellite ) * Use that Hostgroup to deploy a system Actual Results: 2023-06-09T14:48:12 [I|app|8cec5277] Started POST "/hosts" for 10.74.24.4 at 2023-06-09 14:48:12 +0530 2023-06-09T14:48:12 [I|app|8cec5277] Processing by HostsController#create as */* 2023-06-09T14:48:12 [I|app|8cec5277] Parameters: {"utf8"=>"✓", "authenticity_token"=>"QjOBBGNhrwvSjm2UOA1wz4tmRbzvsnXgcRm0gyIyXD9QnwXsGCjxbPBrHCGFx75OClCjn5vOi7vo8HWdvoASbg==", "host"=>{"name"=>"celia-helgert", "organization_id"=>"1", "location_id"=>"2", "hostgroup_id"=>"1", "content_facet_attributes"=>{"lifecycle_environment_id"=>"1", "content_view_id"=>"6", "content_source_id"=>"1", "kickstart_repository_id"=>"70"}, "managed"=>"true", "progress_report_id"=>"[FILTERED]", "type"=>"Host::Managed", "interfaces_attributes"=>{"0"=>{"_destroy"=>"0", "type"=>"Nic::Managed", "identifier"=>"", "name"=>"celia-helgert", "domain_id"=>"2", "subnet_id"=>"1", "ip"=>"192.168.233.156", "ip6"=>"", "managed"=>"1", "primary"=>"1", "provision"=>"1", "execution"=>"1", "virtual"=>"0", "tag"=>"", "attached_to"=>"", "compute_attributes"=>{"type"=>"VirtualVmxnet3", "network"=>"dvportgroup-847"}}}, "compute_attributes"=>{"cpus"=>"1", "corespersocket"=>"1", "memory_mb"=>"4096", "firmware"=>"bios", "cluster"=>"SYSLAB", "resource_pool"=>"Resources", "path"=>"/Datacenters/PRIME/vm/SYSMGMT/saydas", "guest_id"=>"rhel8_64Guest", "hardware_version"=>"Default", "memoryHotAddEnabled"=>"1", "cpuHotAddEnabled"=>"1", "add_cdrom"=>"0", "boot_order"=>["network", "disk"], "start"=>"1", "annotation"=>"", "scsi_controllers"=>"{\"scsiControllers\":[{\"type\":\"VirtualLsiLogicController\",\"key\":1000}],\"volumes\":[{\"thin\":true,\"name\":\"Hard disk\",\"mode\":\"persistent\",\"controllerKey\":1000,\"sizeGb\":10,\"datastore\":\"Arcade\",\"storagePod\":null}]}"}, "architecture_id"=>"1", "operatingsystem_id"=>"2", "provision_method"=>"build", "build"=>"1", "medium_id"=>"", "ptable_id"=>"136", "pxe_loader"=>"PXELinux BIOS", "disk"=>"", "root_pass"=>"[FILTERED]", "is_owned_by"=>"4-Users", "enabled"=>"1", "comment"=>"", "overwrite"=>"false"}, "media_selector"=>"synced_content"} 2023-06-09T14:48:12 [I|app|8cec5277] Add realm entry for new host host.example.com ==> /var/log/foreman-proxy/proxy.log <== 2023-06-09T14:48:12 8cec5277 [I] Started POST /realm/EXAMPLE.COM/ ==> /var/log/messages <== Jun 9 14:48:12 satellite612 smart-proxy[24297]: /usr/share/foreman-proxy/modules/realm_freeipa/provider.rb:46: warning: already initialized constant XMLRPC::Config::ENABLE_NIL_PARSER Jun 9 14:48:12 satellite612 smart-proxy[24297]: /usr/share/gems/gems/xmlrpc-0.3.0/lib/xmlrpc/config.rb:22: warning: previous definition of ENABLE_NIL_PARSER was here ==> /var/log/foreman-proxy/proxy.log <== 2023-06-09T14:48:12 8cec5277 [I] Finished POST /realm/EXAMPLE.COM/ with 200 (468.61 ms) ==> /var/log/foreman/production.log <== 2023-06-09T14:48:12 [I|app|8cec5277] Adding Compute instance for host.example.com ==> /var/log/messages <== Jun 9 14:48:12 satellite612 smart-proxy[24297]: X.X.X.X - - [09/Jun/2023:14:48:12 IST] "POST /realm/EXAMPLE.COM/ HTTP/1.1" 200 633 Jun 9 14:48:12 satellite612 smart-proxy[24297]: - -> /realm/EXAMPLE.COM/ ==> /var/log/foreman-proxy/proxy.log <== 2023-06-09T14:48:14 8cec5277 [I] Started GET /tftp/serverName 2023-06-09T14:48:14 8cec5277 [I] Finished GET /tftp/serverName with 200 (0.93 ms) 2023-06-09T14:48:14 8cec5277 [I] Started GET /tftp/serverName 2023-06-09T14:48:14 8cec5277 [I] Finished GET /tftp/serverName with 200 (0.81 ms) ==> /var/log/foreman/production.log <== 2023-06-09T14:48:14 [I|app|8cec5277] Create DHCP reservation host.example.com for host.example.com-00:50:56:b4:11:5d/192.168.233.156 ==> /var/log/foreman-proxy/proxy.log <== 2023-06-09T14:48:14 8cec5277 [I] Started POST /dhcp/192.168.233.0 2023-06-09T14:48:14 8cec5277 [I] Next-server option not IPv4, trying to resolve 'satellite612.example.com' 2023-06-09T14:48:14 8cec5277 [I] Finished POST /dhcp/192.168.233.0 with 200 (24.34 ms) ==> /var/log/foreman/production.log <== 2023-06-09T14:48:14 [I|app|8cec5277] Add DNS A record for host.example.com/192.168.233.156 ==> /var/log/foreman-proxy/proxy.log <== 2023-06-09T14:48:14 8cec5277 [I] Started POST /dns/ 2023-06-09T14:48:14 8cec5277 [I] Finished POST /dns/ with 200 (14.28 ms) ==> /var/log/foreman/production.log <== 2023-06-09T14:48:14 [I|app|8cec5277] Add DNS PTR record for 192.168.233.156/host.example.com ==> /var/log/foreman-proxy/proxy.log <== 2023-06-09T14:48:14 8cec5277 [I] Started POST /dns/ ==> /var/log/messages <== Jun 9 14:48:14 satellite612 smart-proxy[24297]: X.X.X.X - - [09/Jun/2023:14:48:14 IST] "GET /tftp/serverName HTTP/1.1" 200 53 Jun 9 14:48:14 satellite612 smart-proxy[24297]: - -> /tftp/serverName Jun 9 14:48:14 satellite612 smart-proxy[24297]: X.X.X.X - - [09/Jun/2023:14:48:14 IST] "GET /tftp/serverName HTTP/1.1" 200 53 Jun 9 14:48:14 satellite612 smart-proxy[24297]: - -> /tftp/serverName Jun 9 14:48:14 satellite612 smart-proxy[24297]: X.X.X.X - - [09/Jun/2023:14:48:14 IST] "POST /dhcp/192.168.233.0 HTTP/1.1" 200 0 Jun 9 14:48:14 satellite612 smart-proxy[24297]: - -> /dhcp/192.168.233.0 Jun 9 14:48:14 satellite612 named[1067]: client @0x7fce3c0a25b0 127.0.0.1#48349/key rndc-key: updating zone 'sat.EXAMPLE.COM/IN': adding an RR at 'host.example.com' A 192.168.233.156 Jun 9 14:48:14 satellite612 smart-proxy[24297]: X.X.X.X - - [09/Jun/2023:14:48:14 IST] "POST /dns/ HTTP/1.1" 200 0 Jun 9 14:48:14 satellite612 smart-proxy[24297]: - -> /dns/ Jun 9 14:48:14 satellite612 named[1067]: client @0x7fce3c0a25b0 127.0.0.1#55857/key rndc-key: updating zone '233.168.192.in-addr.arpa/IN': adding an RR at '156.233.168.192.in-addr.arpa' PTR host.example.com. ==> /var/log/foreman-proxy/proxy.log <== 2023-06-09T14:48:14 8cec5277 [I] Finished POST /dns/ with 200 (15.93 ms) ==> /var/log/foreman/production.log <== 2023-06-09T14:48:14 [I|app|3e5ca2c8] Started GET "/tasks/d111af69-aa42-46ea-8417-7a18c0293d46" for 10.74.24.4 at 2023-06-09 14:48:14 +0530 2023-06-09T14:48:14 [I|app|8cec5277] Deploying TFTP PXELinux configuration for host.example.com 2023-06-09T14:48:14 [I|app|3e5ca2c8] Processing by TasksController#show as */* 2023-06-09T14:48:14 [I|app|3e5ca2c8] Parameters: {"id"=>"d111af69-aa42-46ea-8417-7a18c0293d46"} ==> /var/log/foreman-proxy/proxy.log <== 2023-06-09T14:48:14 8cec5277 [I] Started POST /tftp/PXELinux/00:50:56:b4:11:5d 2023-06-09T14:48:14 8cec5277 [I] Finished POST /tftp/PXELinux/00:50:56:b4:11:5d with 200 (2.08 ms) ==> /var/log/foreman/production.log <== 2023-06-09T14:48:14 [I|app|3e5ca2c8] Rendered tasks/_list.html.erb (Duration: 5.0ms | Allocations: 6717) 2023-06-09T14:48:14 [I|app|3e5ca2c8] Completed 200 OK in 13ms (Views: 5.4ms | ActiveRecord: 4.1ms | Allocations: 8504) 2023-06-09T14:48:14 [I|app|8cec5277] Deploying TFTP PXEGrub2 configuration for host.example.com ==> /var/log/foreman-proxy/proxy.log <== 2023-06-09T14:48:14 8cec5277 [I] Started POST /tftp/PXEGrub2/00:50:56:b4:11:5d ==> /var/log/messages <== Jun 9 14:48:14 satellite612 smart-proxy[24297]: X.X.X.X - - [09/Jun/2023:14:48:14 IST] "POST /dns/ HTTP/1.1" 200 0 Jun 9 14:48:14 satellite612 smart-proxy[24297]: - -> /dns/ Jun 9 14:48:14 satellite612 smart-proxy[24297]: X.X.X.X - - [09/Jun/2023:14:48:14 IST] "POST /tftp/PXELinux/00:50:56:b4:11:5d HTTP/1.1" 200 0 Jun 9 14:48:14 satellite612 smart-proxy[24297]: - -> /tftp/PXELinux/00:50:56:b4:11:5d Jun 9 14:48:14 satellite612 smart-proxy[24297]: malloc_consolidate(): unaligned fastbin chunk detected Jun 9 14:48:14 satellite612 systemd[1]: Created slice system-systemd\x2dcoredump.slice. Jun 9 14:48:14 satellite612 systemd[1]: Started Process Core Dump (PID 24415/UID 0). And then the host submission fails as foreman-proxy is killed. We got into this issue after an upgrade to Satellite 6.12 (See RH case 03533178) and RHEL8.8 Working situation (pre-upgrade) : - RHEL8.6 - Satellite 6.11 Provisioning at that moment worked without issues using a REALM. After upgrade we have following: - RHEL8.8 - Satellite 6.12 The same provision now fails with the symptoms described in this Bugzilla Please treat this with the highest priority as per company policy we prefer to stay one version behind the latest release for stability reasons. With absolute certainty, I confirm that this fixes the issue
# pwd
/usr/share/gems/gems/gssapi-1.2.0
# git diff
diff --git a/lib/gssapi/lib_gssapi.rb b/lib/gssapi/lib_gssapi.rb
index 0e61dad..e877801 100644
--- a/lib/gssapi/lib_gssapi.rb
+++ b/lib/gssapi/lib_gssapi.rb
@@ -179,11 +179,13 @@ module GSSAPI
class GssCtxIdT < GssPointer
def self.release_ptr(context_ptr)
min_stat = FFI::MemoryPointer.new :OM_uint32
- maj_stat = LibGSSAPI.gss_delete_sec_context(min_stat, context_ptr, LibGSSAPI::GSS_C_NO_BUFFER)
+ ptr_p = FFI::MemoryPointer.new :pointer
+ ctx_ptr = ptr_p.write_pointer(context_ptr)
+ maj_stat = LibGSSAPI.gss_delete_sec_context(min_stat, ctx_ptr, LibGSSAPI::GSS_C_NO_BUFFER)
end
def self.gss_c_no_context
- self.new(GSSAPI::LibGSSAPI::GSS_C_NO_CONTEXT)
+ GssPointer.new(GSSAPI::LibGSSAPI::GSS_C_NO_CONTEXT)
end
end
i.e. the patch https://github.com/zenchild/gssapi/commit/6514d2474be31e1b97fe02f0a2c302463c8a0c88 that resides in rubygem-gssapi 1.3.1 shipped with Satellite 6.13
I applied the same patch on my 6.12 reproducer and tested three concurrent builds. So far , no error or no crash of services whatsoever.
VERIFIED. @Satellite 6.12.5 Snap2 rubygem-xmlrpc-0.3.2-1.el8sat.noarch by manual reproducer described in comment#8 and comment#10: 1) Set up Satellite with kickstart content and provisioning features 2) Enable Realm feature and select Realm Capsule in a Host Group ( https://access.redhat.com/documentation/en-us/red_hat_satellite/6.12/html/installing_satellite_server_in_a_connected_network_environment/configuring_external_authentication_satellite#External_Authentication_for_Provisioned_Hosts_satellite ) 3) Use that Host Group to deploy a system and check the log FIX: # hammer host create --name client --hostgroup baremetal --location-id 2 --organization-id 1 --provision-method build --build true --compute-attributes="start=1" --mac 56:6f:6d:f5:00:0e Host created. # grep smart-proxy /var/log/messages ... Aug 22 16:49:13 localhost smart-proxy[67392]: <SAT_IP> - - [22/Aug/2023:16:49:13 EDT] "POST /tftp/iPXE/56:6f:6d:f5:00:0e HTTP/1.1" 200 0 Aug 22 16:49:13 localhost smart-proxy[67392]: - -> /tftp/iPXE/56:6f:6d:f5:00:0e Aug 22 16:49:13 localhost smart-proxy[67392]: <SAT_IP> - - [22/Aug/2023:16:49:13 EDT] "POST /tftp/fetch_boot_file HTTP/1.1" 200 0 Aug 22 16:49:13 localhost smart-proxy[67392]: - -> /tftp/fetch_boot_file >>> foreman-proxy no longer core dumps vs. REPRO: # hammer host create --name client --hostgroup baremetal --location-id 2 --organization-id 1 --provision-method build --build true --compute-attributes="start=1" --mac 56:6f:6d:f5:00:10 Could not create the host: Deploy TFTP PXELinux config for client.example.com task failed with the following error: ERF12-5472 [ProxyAPI::ProxyException]: Unable to set TFTP boot entry for 56:6f:6d:f5:00:10 ([RestClient::ServerBrokeConnection]: Server broke connection) for Capsule https://<SAT_FQDN>:9090/tftp Failed to perform rollback on Create Reverse IPv4 DNS record for client.example.com - ERF12-1261 [ProxyAPI::ProxyException]: Unable to delete DNS entry ([Errno::ECONNREFUSED]: Failed to open TCP connection to <SAT_FQDN>:9090 (Connection refused - connect(2) for "<SAT_FQDN>" port 9090)) for Capsule https://<SAT_FQDN>:9090/dns Failed to perform rollback on Create IPv4 DNS record for client.example.com - ERF12-1261 [ProxyAPI::ProxyException]: Unable to delete DNS entry ([Errno::ECONNREFUSED]: Failed to open TCP connection to <SAT_FQDN>:9090 (Connection refused - connect(2) for "<SAT_FQDN>" port 9090)) for Capsule https://<SAT_FQDN>:9090/dns Failed to perform rollback on Create DHCP Settings for client.example.com - ERF12-0635 [ProxyAPI::ProxyException]: Unable to delete DHCP entry for 56:6f:6d:f5:00:10 ([Errno::ECONNREFUSED]: Failed to open TCP connection to <SAT_FQDN>:9090 (Connection refused - connect(2) for "<SAT_FQDN>" port 9090)) for Capsule https://<SAT_FQDN>:9090/dhcp Failed to perform rollback on DHCP conflicts removal for client.example.com - ERF12-4395 [ProxyAPI::ProxyException]: Unable to retrieve DHCP entry for 56:6f:6d:f5:00:10 ([Errno::ECONNREFUSED]: Failed to open TCP connection to <SAT_FQDN>:9090 (Connection refused - connect(2) for "<SAT_FQDN>" port 9090)) for Capsule https://<SAT_FQDN>:9090/dhcp # grep smart-proxy /var/log/messages Aug 22 16:48:24 localhost smart-proxy[66678]: /usr/share/foreman-proxy/modules/realm_freeipa/provider.rb:46: warning: already initialized constant XMLRPC::Config::ENABLE_NIL_PARSER Aug 22 16:48:24 localhost smart-proxy[66678]: /usr/share/gems/gems/xmlrpc-0.3.0/lib/xmlrpc/config.rb:22: warning: previous definition of ENABLE_NIL_PARSER was here Aug 22 16:48:24 localhost smart-proxy[66678]: <SAT_IP> - - [22/Aug/2023:16:48:24 EDT] "POST /realm/EXAMPLE.COM/ HTTP/1.1" 200 586 Aug 22 16:48:24 localhost smart-proxy[66678]: - -> /realm/EXAMPLE.COM/ ... Aug 22 16:48:25 localhost smart-proxy[66678]: <SAT_IP> - - [22/Aug/2023:16:48:25 EDT] "GET /unattended/templateServer HTTP/1.1" 200 67 Aug 22 16:48:25 localhost smart-proxy[66678]: - -> /unattended/templateServer Aug 22 16:48:25 localhost smart-proxy[66678]: malloc(): unaligned fastbin chunk detected Aug 22 16:48:25 localhost systemd-coredump[67435]: Resource limits disable core dumping for process 66678 (smart-proxy). Aug 22 16:48:25 localhost systemd-coredump[67435]: Process 66678 (smart-proxy) of user 990 dumped core. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Satellite 6.12.5 Async Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:4749 The fix is *NOT* in rubygem-xmlrpc-0.3.2-1 but in rubygem-gssapi-1.3.1-1 The bug is *NOT* fixed in 6.12.5 (which is shipped with rubygem-xmlrpc-0.3.2-1 but with rubygem-gssapi-1.2.0-8 and /usr/share/gems/gems/gssapi-1.2.0/lib/gssapi/lib_gssapi.rb does *miss* the patch. The bug is really fixed in 6.13, I double-checked that. So for potential Satellite 6.12 customers: please upgrade to 6.13 (or higher). |