PROBLEM:
We rebooted both the primary and standby PostgreSQL nodes. After the reboot, EDB Failover Manager (EFM) failed to start on the standby node.
PRIMARY NODE – 10.20.30.40
STANDBY NODE – 10.20.30.41
[root@STANDBY efm-3.9]# systemctl start edb-efm-3.9
Job for edb-efm-3.9.service failed because the control process exited with error code. See "systemctl status edb-efm-3.9.service" and "journalctl -xe" for details.
-- Check the log
cat /var/log/efm-3.9/efm.log
at com.enterprisedb.efm.nodes.EfmAgent.run(EfmAgent.java:211)
at com.enterprisedb.efm.main.ServiceCommand.main(ServiceCommand.java:111)
2021-06-28 11:14:10 com.enterprisedb.efm.nodes.EfmAgent run ERROR: Exception starting service
java.lang.SecurityException: authentication failed
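To detect this failure from a script instead of reading the log by hand, a small check like the one below may help. It is only a sketch: the function name `efm_log_has_auth_failure` is hypothetical. It looks for the exact exception text shown above, and by default it reads the log path used on this standby.

```shell
# Hypothetical helper (not part of EFM): succeed if the EFM log contains
# the "authentication failed" SecurityException seen on the standby.
# Defaults to the efm-3.9 log path used in this note.
efm_log_has_auth_failure() {
    local log_file="${1:-/var/log/efm-3.9/efm.log}"
    # -q: report via exit status only; -F: match the text literally
    grep -qF 'java.lang.SecurityException: authentication failed' "$log_file"
}

# Example on the standby:
# efm_log_has_auth_failure && echo "Agent rejected; check the allowed node list on the primary"
```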
SOLUTION:
Check the cluster status on the primary node:
[root@PRIMARY ~]# /usr/edb/efm-3.9/bin/efm cluster-status efm
Cluster Status: efm
Agent Type  Address              Agent  DB       VIP
-----------------------------------------------------------------------
Master      10.20.30.40          UP     UP
Allowed node host list:
10.20.30.40
Membership coordinator: 10.20.30.40
Standby priority host list:
(List is empty.)
Promote Status:
DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info
---------------------------------------------------------------------------
Master      10.20.30.40          3/80000D0
No standby databases were found.
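The standby (10.20.30.41) is missing from the allowed node host list, and no standby database is reported. EFM only lets an agent join the cluster if its address is on the allowed node host list, so a rejected join is consistent with the "authentication failed" error on the standby. So let's add the standby to the allowed node list.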
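This check can also be scripted. The sketch below assumes the `cluster-status` layout shown above, where the allowed hosts are printed on the single line after `Allowed node host list:`. The function name `efm_node_allowed` is hypothetical.

```shell
# Hypothetical helper (not part of EFM): read `efm cluster-status` output on
# stdin and succeed if IP appears on the "Allowed node host list" line.
# Assumes the hosts are listed on the one line following that header.
efm_node_allowed() {
    local ip="$1"
    awk -v ip="$ip" '
        # Line after the header: compare each field exactly, then stop.
        found { for (i = 1; i <= NF; i++) if ($i == ip) hit = 1; exit }
        /Allowed node host list:/ { found = 1 }
        END { exit !hit }
    '
}

# Example on the primary:
# /usr/edb/efm-3.9/bin/efm cluster-status efm | efm_node_allowed 10.20.30.41 \
#     || echo "10.20.30.41 is not on the allowed node host list"
```

Matching whole fields, rather than substrings, keeps `10.20.30.4` from matching `10.20.30.40`.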
Run allow-node on the primary:
[root@PRIMARY ~]# /usr/edb/efm-3.9/bin/efm allow-node efm 10.20.30.41
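On the primary, the allow-node step can be wrapped so that it also confirms the result. This is a minimal sketch. `efm_allow_and_verify`, `EFM_BIN` and `CLUSTER` are hypothetical names. The check assumes the allowed hosts appear on the line after `Allowed node host list:` in the `cluster-status` output. It also relies on `grep -A1`, which GNU and BSD grep support but POSIX does not require.

```shell
# Hypothetical wrapper (not part of EFM). EFM_BIN and CLUSTER default to
# the values used in this note; override them for other installs.
EFM_BIN="${EFM_BIN:-/usr/edb/efm-3.9/bin/efm}"
CLUSTER="${CLUSTER:-efm}"

efm_allow_and_verify() {
    local ip="$1"
    # Add the node to the allowed list; stop if the command fails.
    "$EFM_BIN" allow-node "$CLUSTER" "$ip" || return 1
    # Confirm the address now shows up on the line after the header.
    "$EFM_BIN" cluster-status "$CLUSTER" \
        | grep -A1 -F 'Allowed node host list:' \
        | grep -qwF "$ip"
}

# Example on the primary:
# efm_allow_and_verify 10.20.30.41 && echo "standby allowed; start EFM on the standby"
```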
Start the EFM service on the standby server:
[root@STANDBY efm-3.9]# systemctl start edb-efm-3.9
This time the service started successfully. Check the cluster status again:
[root@STANDBY ~]# /usr/edb/efm-3.9/bin/efm cluster-status efm
Cluster Status: efm
Agent Type  Address              Agent  DB       VIP
-----------------------------------------------------------------------
Master      10.20.30.40          UP     UP
Standby     10.20.30.41          UP     UP
Allowed node host list:
10.20.30.40 10.20.30.41
Membership coordinator: 10.20.30.40
Standby priority host list:
10.20.30.41
Promote Status:
DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info
---------------------------------------------------------------------------
Master      10.20.30.40          3/80001B0
Standby     10.20.30.41          3/80001B0          3/80001B0
Standby database(s) in sync with master. It is safe to promote.
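To confirm the recovery from a script, you can check the output above for two things: the standby's agent row with Agent and DB both `UP`, and the in-sync message. This sketch assumes the column order shown above. `efm_standby_healthy` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of EFM): read `efm cluster-status` output on
# stdin; succeed only if the given standby's agent row shows Agent and DB
# both UP and EFM reports the standby database(s) in sync with the master.
efm_standby_healthy() {
    local ip="$1"
    awk -v ip="$ip" '
        # Agent table row: Type, Address, Agent, DB (column order assumed).
        $1 == "Standby" && $2 == ip && $3 == "UP" && $4 == "UP" { up = 1 }
        /Standby database\(s\) in sync with master/ { insync = 1 }
        END { exit !(up && insync) }
    '
}

# Example on either node:
# /usr/edb/efm-3.9/bin/efm cluster-status efm | efm_standby_healthy 10.20.30.41 \
#     && echo "standby is up and in sync"
```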