Description of problem:
The upstream HTTPClientConfig struct [1] has 8 fields, whereas the operator's local copy [2] has only 5.

Component version: Prometheus Operator 4.9

Actual results:
When a configuration that uses one of the fields missing from the operator's struct is applied to the alertmanager-main secret, prometheus-operator fails to parse it because of an unknown field (follow_redirects):

level=error ts=2022-01-14T08:10:38.69780826Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="Sync \"openshift-monitoring/main\" failed: provision alertmanager configuration: base config from Secret could not be parsed: yaml: unmarshal errors:\n line 4: field follow_redirects not found in type alertmanager.httpClientConfig\n line 116: field follow_redirects not found in type alertmanager.httpClientConfig"

Expected results:
No error, because both structs should have the same fields.

Additional info:
The reason why there are two different versions of the same struct seems to be explained in [3][4][5]. The upstream version used by the operator is 0.29.0 [6].

[1] https://github.com/prometheus/common/blob/v0.29.0/config/http_config.go#L158-L179
[2] https://github.com/openshift/prometheus-operator/blob/release-4.9/pkg/alertmanager/types.go#L178-L184
[3] https://github.com/prometheus/alertmanager/issues/1985
[4] https://github.com/prometheus/alertmanager/pull/1804#issuecomment-482038079
[5] https://github.com/prometheus/alertmanager/blob/4017d1a478e546909a2b4b4173b6aab3e7dd2dbe/config/config.go#L53-L58
[6] https://github.com/openshift/prometheus-operator/blob/release-4.9/go.mod#L25
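For illustration, here is a minimal Go sketch of the failure mode. It is not the operator's actual code: the trimmed-down httpClientConfig below is a hypothetical stand-in, and it assumes the operator decodes the base configuration with strict YAML unmarshalling (gopkg.in/yaml.v2 UnmarshalStrict), which the "field ... not found in type" wording of the error suggests.

package main

import (
	"fmt"

	"gopkg.in/yaml.v2"
)

// Hypothetical local copy of the upstream struct that lacks the
// follow_redirects field added in newer prometheus/common releases.
type httpClientConfig struct {
	BearerToken     string `yaml:"bearer_token,omitempty"`
	BearerTokenFile string `yaml:"bearer_token_file,omitempty"`
	ProxyURL        string `yaml:"proxy_url,omitempty"`
}

func main() {
	doc := []byte("follow_redirects: false\n")

	var cfg httpClientConfig
	// UnmarshalStrict rejects keys that have no matching struct field;
	// plain Unmarshal would silently drop them instead.
	if err := yaml.UnmarshalStrict(doc, &cfg); err != nil {
		fmt.Println(err)
		// yaml: unmarshal errors:
		//   line 1: field follow_redirects not found in type main.httpClientConfig
	}
}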
This issue should be resolved in 4.10 via https://github.com/prometheus-operator/prometheus-operator/pull/4333/
Tested with 4.10.0-0.nightly-2022-01-17-182202, following the steps in https://docs.openshift.com/container-platform/4.9/monitoring/managing-alerts.html#applying-custom-alertmanager-configuration_managing-alerts: dump the current Alertmanager configuration to alertmanager.yaml and edit it to include http_config.follow_redirects.

1. Extract the current configuration:
# oc -n openshift-monitoring get secret alertmanager-main --template='{{ index .data "alertmanager.yaml" }}' | base64 --decode > alertmanager.yaml

2. Edit alertmanager.yaml to include http_config.follow_redirects:
****************************
global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: "false"
inhibit_rules:
- equal:
  - namespace
  - alertname
  source_matchers:
  - severity = critical
  target_matchers:
  - severity =~ warning|info
- equal:
  - namespace
  - alertname
  source_matchers:
  - severity = warning
  target_matchers:
  - severity = info
receivers:
- name: Default
- name: Watchdog
- name: Critical
- name: webhook
  webhook_configs:
  - send_resolved: "true"
    http_config:
      follow_redirects: "true"
    url: http://gems-agent.gemcloud-system:8041/alert
route:
  group_by:
  - namespace
  group_interval: 5m
  group_wait: 30s
  receiver: Default
  repeat_interval: 12h
  routes:
  - matchers:
    - alertname = Watchdog
    receiver: Watchdog
  - matchers:
    - severity = critical
    receiver: Critical
  - receiver: webhook
    match:
      severity: critical
****************************

3. Apply the new configuration; this step reports no error:
# oc -n openshift-monitoring create secret generic alertmanager-main --from-file=alertmanager.yaml --dry-run=client -o=yaml | oc -n openshift-monitoring replace secret --filename=-
secret/alertmanager-main replaced

4. The new config file is loaded:
# oc -n openshift-monitoring exec -c alertmanager alertmanager-main-0 -- cat /etc/alertmanager/config/alertmanager.yaml
****************************
global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: "false"
inhibit_rules:
- equal:
  - namespace
  - alertname
  source_matchers:
  - severity = critical
  target_matchers:
  - severity =~ warning|info
- equal:
  - namespace
  - alertname
  source_matchers:
  - severity = warning
  target_matchers:
  - severity = info
receivers:
- name: Default
- name: Watchdog
- name: Critical
- name: webhook
  webhook_configs:
  - send_resolved: "true"
    http_config:
      follow_redirects: "true"
    url: http://gems-agent.gemcloud-system:8041/alert
route:
  group_by:
  - namespace
  group_interval: 5m
  group_wait: 30s
  receiver: Default
  repeat_interval: 12h
  routes:
  - matchers:
    - alertname = Watchdog
    receiver: Watchdog
  - matchers:
    - severity = critical
    receiver: Critical
  - receiver: webhook
    match:
      severity: critical
****************************

The same output is returned by:
# oc -n openshift-monitoring get secret alertmanager-main -o jsonpath="{.data.alertmanager\.yaml}" | base64 -d

However, there are errors in the alertmanager container:
# oc -n openshift-monitoring logs -c alertmanager alertmanager-main-0 | tail
level=info ts=2022-01-18T13:16:46.567Z caller=coordinator.go:113 component=configuration msg="Loading configuration file" file=/etc/alertmanager/config/alertmanager.yaml
level=error ts=2022-01-18T13:16:46.567Z caller=coordinator.go:118 component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config/alertmanager.yaml err="yaml: unmarshal errors:\n line 4: cannot unmarshal !!str `false` into bool\n line 26: cannot unmarshal !!str `true` into bool\n line 28: cannot unmarshal !!str `true` into bool"
(the same info/error pair repeats every 5 seconds, through 2022-01-18T13:17:06.567Z)
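The "cannot unmarshal !!str ... into bool" errors come from the quoted values: in YAML, "false" and "true" written in quotes are strings (!!str), not booleans, so they cannot be decoded into a Go bool field. A minimal gopkg.in/yaml.v2 sketch reproduces this; the httpConfig struct below is hypothetical, mirroring the shape of the follow_redirects option.

package main

import (
	"fmt"

	"gopkg.in/yaml.v2"
)

// Hypothetical struct with a plain bool field, like follow_redirects
// and send_resolved in the Alertmanager configuration types.
type httpConfig struct {
	FollowRedirects bool `yaml:"follow_redirects"`
}

func main() {
	var cfg httpConfig
	// A quoted value is a YAML string, so decoding it into bool fails.
	err := yaml.Unmarshal([]byte(`follow_redirects: "false"`), &cfg)
	fmt.Println(err)
	// yaml: unmarshal errors:
	//   line 1: cannot unmarshal !!str `false` into bool
}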
Continuing from Comment 4: the change does not show up in ${alertmanager_route}/#/status; it still shows the default configuration.
I think the issue here is that the boolean values are quoted and therefore evaluated as strings rather than bools. I checked the following with amtool and it validates:

global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: false
inhibit_rules:
- equal:
  - namespace
  - alertname
  source_matchers:
  - severity = critical
  target_matchers:
  - severity =~ warning|info
- equal:
  - namespace
  - alertname
  source_matchers:
  - severity = warning
  target_matchers:
  - severity = info
receivers:
- name: Default
- name: Watchdog
- name: Critical
- name: webhook
  webhook_configs:
  - send_resolved: true
    http_config:
      follow_redirects: true
    url: http://gems-agent.gemcloud-system:8041/alert
route:
  group_by:
  - namespace
  group_interval: 5m
  group_wait: 30s
  receiver: Default
  repeat_interval: 12h
  routes:
  - matchers:
    - alertname = Watchdog
    receiver: Watchdog
  - matchers:
    - severity = critical
    receiver: Critical
  - receiver: webhook
    match:
      severity: critical
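For a programmatic equivalent of the amtool check, here is a sketch that validates a trimmed-down configuration with the Load function of the github.com/prometheus/alertmanager/config package (an assumption on my part: this is essentially what amtool check-config does for a file; the snippet inside is shortened and is not the full configuration from this comment).

package main

import (
	"fmt"
	"log"

	"github.com/prometheus/alertmanager/config"
)

func main() {
	// Load parses and validates an Alertmanager configuration string.
	// With the booleans unquoted, parsing succeeds.
	cfg, err := config.Load(`
global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: false
route:
  receiver: Default
receivers:
- name: Default
`)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("configuration is valid, %d receiver(s)\n", len(cfg.Receivers))
}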
Updated the configuration as in Comment 6: no error in the alertmanager container, and the configuration shows as loaded on the ${alertmanager_route}/#/status page.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056