Bug 909973 - [abrt] pacemaker-1.1.8-3.fc18: crm_abort: Process /usr/libexec/pacemaker/cib was killed by signal 6 (SIGABRT)
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: pacemaker
Version: 18
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Andrew Beekhof
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: abrt_hash:a9b789c361528e4adaf80cf18e0...
Depends On:
Blocks:
Reported: 2013-02-11 14:59 UTC by gabicr
Modified: 2014-02-05 23:01 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-05 23:01:53 UTC
Type: ---
Embargoed:


Attachments
File: backtrace (9.49 KB, text/plain), 2013-02-11 14:59 UTC, gabicr
File: cgroup (164 bytes, text/plain), 2013-02-11 14:59 UTC, gabicr
File: core_backtrace (1.25 KB, text/plain), 2013-02-11 14:59 UTC, gabicr
File: dso_list (2.90 KB, text/plain), 2013-02-11 14:59 UTC, gabicr
File: environ (361 bytes, text/plain), 2013-02-11 14:59 UTC, gabicr
File: limits (1.29 KB, text/plain), 2013-02-11 14:59 UTC, gabicr
File: maps (16.90 KB, text/plain), 2013-02-11 14:59 UTC, gabicr
File: open_fds (599 bytes, text/plain), 2013-02-11 14:59 UTC, gabicr
File: proc_pid_status (890 bytes, text/plain), 2013-02-11 15:00 UTC, gabicr
File: var_log_messages (10.76 KB, text/plain), 2013-02-11 15:00 UTC, gabicr

Description gabicr 2013-02-11 14:59:38 UTC
Description of problem:
pcs cluster stop --all

Version-Release number of selected component:
pacemaker-1.1.8-3.fc18

Additional info:
backtrace_rating: 4
cmdline:        /usr/libexec/pacemaker/cib
crash_function: crm_abort
executable:     /usr/libexec/pacemaker/cib
kernel:         3.7.5-201.fc18.x86_64
remote_result:  NOTFOUND
uid:            991

Truncated backtrace:
Thread no. 1 (4 frames)
 #2 crm_abort at utils.c:1061
 #3 write_cib_contents at io.c:711
 #4 crm_trigger_dispatch at mainloop.c:105
 #10 cib_init at main.c:551

Comment 1 gabicr 2013-02-11 14:59:41 UTC
Created attachment 696124
File: backtrace

Comment 2 gabicr 2013-02-11 14:59:43 UTC
Created attachment 696125
File: cgroup

Comment 3 gabicr 2013-02-11 14:59:45 UTC
Created attachment 696126
File: core_backtrace

Comment 4 gabicr 2013-02-11 14:59:50 UTC
Created attachment 696127
File: dso_list

Comment 5 gabicr 2013-02-11 14:59:52 UTC
Created attachment 696128
File: environ

Comment 6 gabicr 2013-02-11 14:59:54 UTC
Created attachment 696129
File: limits

Comment 7 gabicr 2013-02-11 14:59:56 UTC
Created attachment 696130
File: maps

Comment 8 gabicr 2013-02-11 14:59:58 UTC
Created attachment 696131
File: open_fds

Comment 9 gabicr 2013-02-11 15:00:00 UTC
Created attachment 696132
File: proc_pid_status

Comment 10 gabicr 2013-02-11 15:00:10 UTC
Created attachment 696133
File: var_log_messages

Comment 11 Andrew Beekhof 2013-02-12 04:19:27 UTC
Logical assertion is failing:
 
    crm_debug("Wrote digest %s to disk", digest);
    tmp_cib = retrieveCib(tmp1, tmp2, FALSE);
    CRM_ASSERT(tmp_cib != NULL);

Confirmed by the logs:

Feb  5 16:16:28 nod1 cib[17385]:    error: crm_abort: write_cib_contents: Triggered fatal assert at io.c:711 : tmp_cib != NULL

Stack variables:

        tmp_cib = 0x0
        tmp1 = 0x1832500 "/var/lib/pacemaker/cib/cib.3hMlhw"
        tmp2 = 0x1832530 "/var/lib/pacemaker/cib/cib.ObiXcZ"

Cause is also in the logs:

Feb  5 16:16:28 nod1 cib[17385]:    error: validate_cib_digest: Digest comparision failed: expected f186c27d612ffbd8183dd66d340a5260 (/var/lib/pacemaker/cib/cib.ObiXcZ), calculated d68a94e6e24e58e44fa36592c317a589
Feb  5 16:16:28 nod1 cib[17385]:    error: retrieveCib: Checksum of /var/lib/pacemaker/cib/cib.3hMlhw failed!  Configuration contents ignored!
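
For anyone comparing several such reports, the expected and calculated digests can be pulled out of the validate_cib_digest line mechanically. The following is only a sketch: the helper name parse_digest_failure and the regular expression are hypothetical, fitted to the single log line quoted above, and may not match messages from other Pacemaker versions.

```python
import re

# Hypothetical pattern, fitted to the one log line quoted in this report.
# It accepts both "comparision" (as logged) and "comparison".
DIGEST_LINE = re.compile(
    r"validate_cib_digest: Digest comparisi?on failed: "
    r"expected (?P<expected>[0-9a-f]{32}) \((?P<path>[^)]+)\), "
    r"calculated (?P<calculated>[0-9a-f]{32})"
)


def parse_digest_failure(line):
    """Return (expected, digest_file, calculated), or None if the line does not match."""
    match = DIGEST_LINE.search(line)
    if match is None:
        return None
    return match.group("expected"), match.group("path"), match.group("calculated")


line = (
    "Feb  5 16:16:28 nod1 cib[17385]:    error: validate_cib_digest: "
    "Digest comparision failed: expected f186c27d612ffbd8183dd66d340a5260 "
    "(/var/lib/pacemaker/cib/cib.ObiXcZ), calculated d68a94e6e24e58e44fa36592c317a589"
)
print(parse_digest_failure(line))
```

The path in parentheses is the digest file (tmp2 in the stack variables), not the configuration file that was hashed.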

However, the usual culprit is unlikely in this code path:

Feb  5 16:16:28 nod1 cib[17385]:    error: retrieveCib: Usually this is caused by manual changes, please refer to http://clusterlabs.org/wiki/FAQ#cib_changes_detected



Has this cluster ever functioned or is this the first time the error has been observed?
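
The failure mode described in this comment (write a file and its digest, read it back, get nothing because the digest does not match) can be illustrated with a small sketch. This is not Pacemaker's code: file_digest, retrieve_checked and demo are hypothetical names, and the sketch assumes an MD5 digest over the raw file bytes, which is a simplification. The 32-hex-digit values in the log are consistent with MD5, but the sketch does not try to reproduce the exact input Pacemaker hashes.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """MD5 hex digest of the raw file bytes (an assumption made for this sketch)."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def retrieve_checked(config_path, digest_path):
    """Return the file contents, or None when the stored digest does not match."""
    with open(digest_path, "r") as handle:
        expected = handle.read().strip()
    if expected != file_digest(config_path):
        return None  # analogous to retrieveCib() returning NULL in the backtrace
    with open(config_path, "rb") as handle:
        return handle.read()


def demo():
    """Write a file and its digest, then read it back before and after a change."""
    with tempfile.TemporaryDirectory() as tmp:
        config_path = os.path.join(tmp, "cib.xml")
        digest_path = os.path.join(tmp, "cib.xml.sig")
        with open(config_path, "wb") as handle:
            handle.write(b"<cib epoch=\"1\"/>\n")
        with open(digest_path, "w") as handle:
            handle.write(file_digest(config_path))
        before = retrieve_checked(config_path, digest_path)
        # Change the file without updating the stored digest.
        with open(config_path, "wb") as handle:
            handle.write(b"<cib epoch=\"2\"/>\n")
        after = retrieve_checked(config_path, digest_path)
        return before, after


before, after = demo()
print(before is not None, after is None)
```

A caller that asserts on a non-None result, as write_cib_contents does with CRM_ASSERT(tmp_cib != NULL), aborts in the second case.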

Comment 12 gabicr 2013-02-13 14:34:55 UTC
Hello!


I also got


Feb 13 16:31:53 nod1 lrmd[989]:   notice: operation_finished: WebFS_start_0:19855 [ modprobe: FATAL: Module scsi_hostadapter not found. ]
Feb 13 16:31:54 nod1 crmd[992]:   notice: process_lrm_event: LRM operation WebFS_start_0 (call=146, rc=0, cib-update=208, confirmed=true) ok



The cluster ran when only one resource, ClusterIP (ocf::heartbeat:IPaddr2), was defined.

Then I tried to define a resource for the Squid proxy, following
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/index.html#_adding_a_resource

Comment 13 gabicr 2013-02-13 14:40:01 UTC
pcs resource create Proxy ocf:heartbeat:Squid params squid_exe="/usr/sbin/squid" squid_conf="/etc/squid/squid.conf" squid_pidfile="/var/run/squid.pid" squid_port="3128"


crm_verify -L -V
pcs resource a
pcs resource
pcs property set no-quorum-policy=ignore
crm_verify -L -V
pcs constraint colocation add ClusterIP Proxy
pcs constraint
pcs status
ls -la /etc/squid/
pcs constraint order ClusterIP then Proxy
pcs constraint
pcs status
crm_simulate -sL
histo
cat /etc/drbd.d/global_common.conf
drbdadm -- --overwrite-data-of-peer primary data
cat /proc/drbd
mkfs.ext4 /dev/drbd1
mkdir /var/spool/squid1
mount /dev/drbd1 /var/spool/squid1
pcs cluster cib drbd_cfg
pcs -f drbd_cfg resource create Proxydata ocf:linbit:drbd drbd_resource=data op monitor interval=60s
pcs -f drbd_cfg resource master ProxydataClone Proxydata master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
pcs -f drbd_cfg resource show
pcs cluster push cib drbd_cfg
pcs status
pcs cluster cib fs_cfg
pcs -f fs_cfg resource create WebFS Filesystem device="/dev/drbd/by-res/data" directory="/var/spool/squid1" fs_type="ext4"
pcs -f fs_cfg constraint colocation add WebFS ProxydataClone INFINITY with-rsc-role=Master
pcs -f fs_cfg constraint order promote ProxydataClone then start WebFS
pcs -f fs_cfg constraint colocation add Proxy WebFS INFINITY
pcs -f fs_cfg constraint order WebFS then Proxy
pcs -f fs_cfg constraint
pcs -f fs_cfg resource show
pcs cluster push cib fs_cfg
mount /dev/drbd1 /var/spool/squid1
pcs status
pcs cluster sync
pcs -f fs_cfg constraint
pcs cluster
pcs cluster stop --all
pcs cluster start --all
pcs cluster
pcs status
vi /etc/squid/squid.conf
pcs cluster stop --all
vi /etc/squid/squid.conf
pcs cluster start --all
pcs status
tail -f /var/log/messages
pcs resource
ping 10.125.20.22
ping 10.125.20.21
ping 10.125.20.23







Comment 14 Fedora End Of Life 2013-12-21 11:19:45 UTC
This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 15 Andrew Beekhof 2014-01-10 02:00:35 UTC
Many apologies, somehow I lost track of this bug.
You don't appear to be doing anything unusual, so it is very strange that you are (or were) experiencing this.

Sometimes it can be caused by stray whitespace in the configuration, but I can't imagine how that would result from the commands you posted.

Were you able to make progress? If not, we can move this bug to a later version of Fedora and discuss it there.
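
As a rough illustration of the whitespace remark, a digest taken over text changes completely when a single space is added. The sketch below hashes plain strings with MD5. The helper md5_hex and the sample XML are made up for the example, and it makes no claim about which parts of the configuration Pacemaker actually feeds into its digest.

```python
import hashlib


def md5_hex(text):
    """MD5 hex digest of a text string, encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Two made-up configuration snippets that differ only by one stray space.
original = '<cib admin_epoch="0"/>\n'
with_stray_space = '<cib admin_epoch="0"/> \n'

print(md5_hex(original) == md5_hex(with_stray_space))  # prints False
```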

Comment 16 Fedora End Of Life 2014-02-05 23:01:53 UTC
Fedora 18 changed to end-of-life (EOL) status on 2014-01-14. Fedora 18 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

