In this article we will explain how to do switchover manually . Switchover means, change of role between master and slave. postgres doesn’t provide any inbuilt command to do switchover . We will use the pg_ctl promote command with some additional steps to do the switchover.
Note – There are few third party tools like EDB FAILOVER MANAGER(EFM) and REPMGR , which can be used to manage the replication(like switchover, failover).
MASTER – > 10.20.30.6
SLAVE – > 10.20.30.7
POSTGRES VERSION – > 14
1. Check whether replication process and the sync status:
-- Run on master server(10.20.30.6 )
ostgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid | 13132
usesysid | 10
usename | postgres
application_name | walreceiver
client_addr | 10.20.30.6
client_hostname |
client_port | 18178
backend_start | 2022-07-11 16:44:53.917819+03
backend_xmin |
state | streaming
sent_lsn | 0/4652FA0
write_lsn | 0/4652FA0
flush_lsn | 0/4652FA0
replay_lsn | 0/4652FA0
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async
reply_time | 2022-07-12 17:00:50.440717+03
postgres=# SELECT pg_wal_lsn_diff(sent_lsn, replay_lsn) from pg_stat_replication;
-[ RECORD 1 ]---+--
pg_wal_lsn_diff | 0
-- Run on SLAVE server(10.20.30.7)
postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid | 10229
status | streaming
receive_start_lsn | 0/4000000
receive_start_tli | 2
written_lsn | 0/4652FA0
flushed_lsn | 0/4652FA0
received_tli | 2
last_msg_send_time | 2022-07-12 17:00:32.22551+03
last_msg_receipt_time | 2022-07-12 17:01:30.559348+03
latest_end_lsn | 0/4652FA0
latest_end_time | 2022-07-11 16:50:36.695865+03
slot_name |
sender_host | 10.20.30.7
sender_port | 5444
conninfo | user=postgres password=******** channel_binding=prefer dbname=replication host=10.20.30.7 port=5444 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres target_session_attrs=any
postgres=# select pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();
-[ RECORD 1 ]-----------------+-----------------------------
pg_last_wal_receive_lsn | 0/4652FA0
pg_last_wal_replay_lsn | 0/4652FA0
pg_last_xact_replay_timestamp | 2022-07-11 16:45:29.86949+03
Now we will proceed with switchover
2. Shutdown the MASTER: [ ON MASTER SERVER 10.20.30.6 ]
-bash-4.2$ /usr/pgsql-14/bin/pg_ctl stop -D /oracle/pg_data
waiting for server to shut down.... done
server stopped
3. Promote slave as new master:[ RUN ON SLAVE SERVER 10.20.30.7 ]
[postgres]$ /usr/pgsql-14/bin/pg_ctl promote -D /oracle/pg_data
waiting for server to promote.... done
server promoted
-- CHECK THE LOG FILE:
Is the server running on that host and accepting TCP/IP connections?
2022-07-12 17:03:43.890 +03 [10225] LOG: received promote request
2022-07-12 17:03:43.890 +03 [10225] LOG: redo done at 0/5000028 system usage: CPU: user: 0.39 s, system: 0.66 s, elapsed: 87473.32 s
2022-07-12 17:03:43.890 +03 [10225] LOG: last completed transaction was at log time 2022-07-11 16:45:29.86949+03
2022-07-12 17:03:43.906 +03 [10225] LOG: selected new timeline ID: 3
2022-07-12 17:03:43.997 +03 [10225] LOG: archive recovery complete
2022-07-12 17:03:44.020 +03 [10223] LOG: database system is ready to accept connections
We can see it has been opened with read , write with a new timeline id. But replication is broken. i.e currently we have only master server, no slave server.
So for the switchover to successful, we need to make old master as slave
4. On OLD MASTER( i.e NEW SLAVE 10.20.30.6)
-- Create a standby.signal file(empty)
touch /oracle/pg_data/standby.signal
-- Update the postgres.auto.conf file with NEW MASTER SERVER DETAILS
-bash-4.2$ cat postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=postgres password=postgres channel_binding=prefer host=10.20.30.6 port=5444 sslmode=prefer sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres target_session_attrs=any'
wal_log_hints = 'on'
5 . START NEW SLAVE ( 10.20.30.6):
/usr/pgsql-14/bin/pg_ctl start -D /oracle/pg_data
6. Check whether master-slave replication is working fine or not:
-- ON MASTER:
postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid | 6965
usesysid | 10
usename | postgres
application_name | walreceiver
client_addr | 10.20.30.7
client_hostname |
client_port | 51244
backend_start | 2022-07-13 10:29:03.01259+03
backend_xmin |
state | streaming
sent_lsn | 0/9600C20
write_lsn | 0/9600C20
flush_lsn | 0/9600C20
replay_lsn | 0/9600C20
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async
reply_time | 2022-07-13 10:31:10.636569+03
-- ON SLAVE:
postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid | 30164
status | streaming
receive_start_lsn | 0/5000000
receive_start_tli | 3
written_lsn | 0/8000060
flushed_lsn | 0/8000060
received_tli | 3
last_msg_send_time | 2022-07-13 10:30:32.937076+03
last_msg_receipt_time | 2022-07-13 10:29:33.399633+03
latest_end_lsn | 0/8000060
latest_end_time | 2022-07-13 10:30:02.798967+03
slot_name |
sender_host | 10.20.30.6
sender_port | 5444
conninfo | user=postgres password=postgres channel_binding=prefer host=10.20.30.6 port=5444 sslmode=prefer sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres target_session_attrs=any
You can do some data changes on MASTER, and check whether they are getting replicated to SLAVE or not.
-- ON MASTER:
postgres=# create table test101 as select * from test100;
SELECT 88065
-- ON SLAVE:
postgres=# select count(*) from test101;
-[ RECORD 1 ]
count | 88065
We have switched over the roles of MASTER and SLAVE with few simple steps.