POSTGRES ARCHITECTURE & INTERNALS:
1. How connection is established in postgres?
The supervisor process is called postmaster and listens at a specified TCP/IP port for incoming connections. Whenever a connection request comes, it spawns a new backend process. Those backend processes communicate with each other and with other processes of the instance using semaphores and shared memory to ensure data integrity throughout concurrent data access.
Once a connection is established, the client process can send a query to the backend process it’s connected to. The backend process parses the query, creates an execution plan, executes the plan, and returns the retrieved rows to the client by transmitting them over the established connection.
2. Explain the postgres architecture ?
3. What are some important background processes in postgres?
BGWriter -> It writes dirty/modified buffers to disk
WalWriter- > WAL buffers are written out to disk at every transaction commit. The default value is -1.
SysLogger:Error Reporting and Logging
Stats Collector -> It collects and reports information about database activities. The permanent statistics are stored in pg_catalog schema in the global subdirectory.
Archiver ->. Setting up the database in Archive mode means to capture the WAL data of each segment file once it is filled and save that data somewhere before the segment file is recycled for reuse.
4. What are the memory components in postgres?
- shared_buffer ->
- − Sets the number of shared memory buffers used by the database server
- − Each buffer is 8K bytes
- − Minimum value must be 16 and at least 2 x max_connections
- − Default setting is managed by dynatune
- − 6% – 25% of available memory is a good general guideline
- − You may find better results keeping the setting relatively low and using the operating system cache more instead
- wal_buffer ->
- Number of disk-page buffers allocated in shared memory for WAL data
- − Each buffer is 8K bytes
- − Needs to be only large enough to hold the amount of WAL data created by a typical transaction since the WAL data is flushed out to disk upon every transaction commit
- − Minimum allowed value is 4
- − Default setting is -1 (auto-tuned)
- clog buffer
Private Memory used by each server process:
- Temp_buffer -> for temp table operations.
- work_mem ->
- − Amount of memory in KB to be used by internal sorts and hash tables beforeswitching to temporary disk files
- − Minimum allowed value is 64 KB
- − It is set in KB and the default is managed by dynatune
- − Increasing the work_mem often helps in faster sorting
- − work_mem settings can also be changed on a per session basis
- maintenaince_work_mem ->
− Maximum memory in KB to be used in maintenance operations such as
VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY − Minimum allowed value is 1024 KB
− It is set in KB and the default is managed by dynatune
− Performance for vacuuming and restoring database dumps can be improved
by increasing this value
- autovacuum_work_mem ->
− Maximum amount of memory to be used by each autovacuum worker process
− Default value is -1, indicates that maintenance_work_mem to be used instead
5. When wal writer write data to wal segement?
6. When bgwriter writes data to disk?
7. What are some wal related parameter in postgrs.conf file?
max_wal_size – > It is the total size of wal segment(soft limit). If the this size is filled, then checkpoint will occurs.
max_wal_sender = 10 ( maximum number of wal sender process)
8. What is checkpoint? When checkpoint happens in postgres?
A checkpoint is a point in the transaction log sequence at which all data files have been updated to reflect the information in the log. All data files will be flushed to disk.
- CREATE DATABASE,
- pg_ctl stop|restart,
- When we issue checkpoint command manually.
For periodic checkpoints below parameter play important role.
- checkpoint_timeout = 5min
- max_wal_size = 1GB
A checkpoint is begun every checkpoint_timeout seconds, or if max_wal_size is about to be exceeded, whichever comes first. The default settings are 5 minutes and 1 GB, respectivel
With these (default) values, PostgreSQL will trigger a
CHECKPOINT every 5 minutes, or after the WAL grows to about 1GB on disk
9. Lets say your work_mem is 4MB. However your query is doing heavy sorting, for which work_mem required might be more than 4MB. So will the query execution be successful?
Yes it will be successful , Because it utilized the work_mem fully, it started using temp files.
10. What are the different phases of statement processing?
11. Which parameters controls the behaviour of BGWriter?
bgwriter_delay – size of sleep delay when number of processed buffers exceeded.
bgwriter_lru_maxpages – number of processed buffers after bgwriter delays.
bgwriter_lru_multiplier – multiplier used by bgwriter to calculate how many buffers need to be cleaned out in the next round.
These settings used to make bgwriter more or less aggressive – lower values of maxpages and multiplier will make bgwriter lazier, and higher maxpages and multiplier with low delays will make bgwriter more diligent.
12. Does postgres support direct i/o.?
Postgres doesnt support direct i/o.
13. How authentication happens in postgres?
pg_hba.conf file is used defining host based authentication methods. In this file, we define the details like host,database, ip address, user and method.
Different types of methods in pg_hba.conf file are .
- Trust: for this option users can connect to the database without specifying a password. When using this option one should be cautious.
- Reject: This option rejects a connection to a database(s) for a user for a particular record in the file.
- Password: this option prompts the user for a password before connecting to the database. When this method is specified the password is not encrypted between the client and the database.
- Md5: this option prompts the user for a password before connecting to the database. When this method is specified the client is required to supply a double-MD5-hashed password for authentication.
- Ident – Obtain the operating system user name of the client by contacting the ident server on the client and check if it matches the requested database user name. Ident authentication can only be used on TCP/IP connections. When specified for local connections, peer authentication will be used instead.
If you are doing any changes to pg_hba.conf file, then you need to reload /restart the cluster for the changes to take place.
14. What is the significance of pg_ident.conf file?
Just like we have os authenticated db users in oracle. Here in postgres also we have similar concept. We can provide mapping of os user and postgres db user inside pg_ident.conf file.
15. What is this wal_level parameter , different values of wal_level?
- Wal_level determines how much information is written to the WAL
- The default value is replica, adds logging required for WAL archiving as well as information required to run read-only queries on a standby server
- The value logical is used to add information required for logical decoding
- The value minimal, removes all logging except the information required to recover from a crash or immediate shutdown
- This parameter can only be set at server start
16. What is visibility map in postgres?
Every heap relation(i.e table/index) have a visibility map associated with them.
Every visibility map has 2 bits per page.
The first bit, if set, indicates that the page is all-visible( means those pages need not to be vacuumed)
The second bit, if set means, all tuples on this page has been frozen. ( no need to vacuum)
Note – > Visiblity map bits are set by VACUUM operation. And if data is modified ,bits will be cleared.
This condition helps in index only scan.
17. What is free space mapping(FSM) in postgres?
Each table/index has a Free space mapping file. It keeps information about which pages are free.
The VACUUM process also updates the Free Space Map and using VACUUM FULL we can recover those free spaces.
18. What is initial fork?
19. What is TOAST?
20. What happens in the background during vacuuming process?
21. What is transaction id wraparound? How to overcome it.
22. What is the significance of search_path in postgres?
23. What is multi version concurrency control( MVCC)?
MVCC helps in currency control by maintaining multiple copies of same tuple, so that read and write operations same tuple doesnt impacts. Update in postgres means new row will be inserted and old row will be invalided( we can say delete followed by insert). And these old tuples can be visible to the other transactions depending upon the isolation level.
24. What are the advantages and disadvantages of MVCC?
Advantage is read consistency. Because of mvcc readers are writers dont block each other. If a a row has been modified by a user , but not committed . And if another user try to access that row, Then he will get the old committed data.
Disadvantage is , it causes bloating. Also it needs more data to keep multiple version of the data. If your database doing lot of DML activities, then postgres need to keep all the old transaction records also(updated or deleted) . And maintenance activity like vaccum need to be done remove the dead tuples.
25. Difference between pg_log, pg_clog, pg_xlog?
26. What is ctid?
It is similar to row_id in oracle.
The physical location of the row version within its table. Note that although the
ctid can be used to locate the row version very quickly, a row’s
ctid will change if it is updated or moved by
27. What is oid?
Every row in postgres will have a object identifier called oid.
28. difference between oid and relfilenode?
29. Difference between postgres.conf and postgres.auto.conf file?
postgres.conf is the configuration file of the postgres cluster. But when we do any config changes using alter system command, then those parameters are added in postgres.auto.conf file.
When postgres starts , it will first read postgres.conf and then it will read postgres.auto.conf file.
30. What is timeline in postgres?
31. what is full page write in postgres?
full_page_write parameter is set to on ( default value).
When this parameter is on, the PostgreSQL server writes the entire content of each disk page to WAL during the first modification of that page after a checkpoint. This is needed because dirty write that is in process during an operating system crash might be only partially completed, leading to an on-disk page that contains a mix of old and new data. The row-level change data normally stored in WAL will not be enough to completely restore such a page during post-crash recovery. Storing the full page image guarantees that the page can be correctly restored.
32. What is xmin and xmax in postgres?
xmin and xmax are in row header.
When a row is inserted, the value of
xmin is set equal to the transaction id that performed the INSERT command, while
xmax is null.
When a row is deleted, the
xmax value of the current version is labeled with the transaction id that performed DELETE.
The new row will have xmin same as that of xman of previous version.
1. What is the default port in postgres?
Default is 5432.
2. What is the default block size in postgres? Can we set different block size?
3. What is a tablespace?
A tablespace is used to map a logical name to a physical location on disk. In the simplest word, we can understand tablespace as a location on the disk where all database objects like table and indexes are stored.
It is also important to note that the name of the tablespace must not start with pg_, as these are reserved for the system tablespaces.
Tablespace list can be found using below command:
postgres=# select * from pg_tablespace;
4. What are the tablespaces created by default after installing postgres cluster?
Pg_global – > PGDATA/global – > used for cluster wide table and system catalog
Pg_default – > PGDATA/base directory – > it stores databases and relations
Whenever any user creates any table or index, it will be created under pg_default and this is the default_tablespace setting.
However you can change the default_tablespace setting using below one.
alter system set default_tablespace=ts_postgres;
5. What is the use of temporary tablespace?
It is used to store temporary objects like temporary table etc. But If we don’t set temp_tablespace parameter explicitly then , temporary objects will be created under pg_default tablespace.
postgres=# SELECT name, setting FROM pg_settings where name=’temp_tablespaces’;
6.What are the default databases created after setting up postgres cluster?
postgres=# select datname from pg_database;
7.What is the significance of template0 and template1 databases? What the difference between these two?
Whenever we create a new database , it creates is using the template1 database. And if you modify anything in template1 database like creating new extension in template1 database, then these changes will be reflected to other created databases.
Template0 – > No changes are allowed to this database. If you messed up your template1 database , then you can revert them by createing a new one from template0.
In the below screenshot, you can see for template1 , datallowconn is true, But template0 it is false.
8. What are common database object names in postgres compare to industry terms.
9. Difference between user and schema in postgres?
If you are aware of oracle db, then you know that both user and schema are same . But in postgres both are different.
A schema is a collection of database objects. But a user is used to connect to postgres cluster. A single user can be used to connect to any database in the cluster( provider permission is give). However schema is local to that particular database.
Below diagram should clear your doubts
10. Difference between role and user in postgres? Can we convert a role to user?
Role and user are almost same in postgres, only difference is , a role cannot login , But a user can. We can say like a user is role with login privilege.
And yes we can convert the role to a user.
alter role <role_name> nologin;
11. Difference mode to stop a postgres cluster using pg_ctl?
Postgres has 3 shutdown modes.
smart. – > It is the normal mode . i.e shutdown will wait for all existing connection to be terminated And it might take a lot of time. And no new connections will be allowed during this.
fast -> It terminates all existing sessions and performs checkpoint . It is comparably quick. But if there is lot of trnsactions that need to written to disk, then checkpoint might take lot of time.
immediate – > It will abort the instance. I.e it will terminate all existing session and shutdown the instance without performing checkpoint. It is the quickest . But upon instance startup , recovery will happen.
So if you have lot of pending transactions in your database, then the best way is perform checkpoint manually and then stop the server.
postgress#CHECKPOINT; – > Run as super user
enterprisedb$ pg_ctl stop -m fast
12. Can you give comparison of shutdown modes between oracle and postgres?
13. Can we stop a particular database in postgres cluster?
No we cannot stop/start a particular database in postgres. We can only shutdown the postgres cluster.
14. What are foreign data wrappers? What is its use?
15. what is the use of share_preload_libraries in postgres.conf file?
16. what different types of streaming replications are present in postgres? And which parameters control that.
18.What is random page cost?
19.Difference between explain and explain analyze in postgres?
Explain – > Generates query plan by calculating the cost
Explain analyze -> It will execute the query and provides query statistics of the executed query. This gives more accurate plan details. Please be careful while running insert,update,delete like DML commands with explain analyze, as it will run the query and cause data changes.
2. What are the different datatypes in postgres?
2. what is the maximum file size of table or index in postgres? Can we increase that ?
Max size is 1GB. If a table size is big, then it can spread across multiple files.
4. Is there a way in which we can rebuild/reorg a table online to release free space?
As we know vacuum full is used to rebuild table and it releases free space to operating system. However this method, puts an exclusive lock on the table.
So we can use pg_repack extension to rebuild a table online.
4. How pg_repack works internally?
4. How can i check the version of postgres?
4. What is the latest version of postgres in market?
5. While dropping a postgres database i am getting error? What might be the issue?
If you want to drop a database , then you need to fire the command, after connecting to a different database in the same postgres cluster with superuser privilege.
5. What are some key difference between oracle and postgres?
5. Between postgres and nosql database like mongodb , which one is better?
5. What is table partitioning in postgres? What are the advantages?
5. How you monitor long running queries in postgres?
5. How to kill a session in postgres?
5. What is an extension in postgres? Which extensions you have used ?
5. What are the popular tools for managing backup and recovery in postgres?
edb bart , barman etc
5. How can we use connection pooling in postgres?
pgbouncer is used for connection pooling
5. Which utility is used to upgrade postgres cluster?
pg_upgrade utility is used to upgrade postgres version.
5. How can we encrypt specific columns in postgres?
pgcrypto extension can be used
5. What is the use of pgbench utility?
5. What is difference between pg_cancel_backend vs pg_terminate_backend?
5. Do we have synonyms in postgres?
5. How can we convert a non partitioned table to partitioned table in postgres?
5. How can we convert a non partitioned table to partitioned table in postgres?
5. Can we create invisible index in postgres?
5. Suppose we have some views using a table and if drop that table, what will be the impact on those views?
5. Difference between open source installation and edb installation?
1. What is vacuuming in postgres?
Whenever tuples are deleted or becomes obsolete due to update, then they are not removed physicallly. These tuples are known as dead tuples. So vacuuming can be used to clear those dead tuples and release the space.
2. Difference between vacuum and vacuum full?
- Vacuum without full option, will reclaim the space and re-use it. However that free space is not returned to operating system.
- This will not put any exclusive lock on the system. So impact on the normal read write of the database.
- We can run this in parallel also. called as parallel vacuuming
- Vacuum with full option, will reclaim more space and release the space to operating system.
- However we need additional storage , as it will rewrite the entire data to new disk file without any extra space.
- Also it will put exlusive lock on the table for which vaccuming is in progress.
- This method is much slower.
3. What do you mean by vacuum freeze? And when we use it?
4. What is auto vaccuming in postgres?
Autovacuum triggers when dead tuples in a table reaches 20 percent of. table and it analyze the table when it reaches 10 percent of a table.
Below four parameters control the behaviour of the autovacumming.
autovacuum_vacuum_scale_factor = 0.2; autovacuum_analyze_scale_factor = 0.1; autovacuum_vacuum_threshold (integer)50 autovacuum_analyze_threshold (integer)50
At table level: ALTER TABLE table_name SET (autovacuum_enabled = false);
5. How can you check whether the database is primary or standby(replication).
method 1 :
We can check using below query. If the output is t means true i.e it is standby(slave server). If false means its primary or master.
You can run below query.
select * from pg_stat_replication;
5. I have a master and slave setup with streaming replication. Now I want to break the replication and open the standby server for some testing purpose and once the testing is done, the server again need to be a standby of that primary with replication enabled. How can i do that?
5. How pg_rewind works?
5. How can i do switchover in postgres without efm?
In postgres there is nothing called switchover, we can only do failover. but with efm we can do switchover using efm promote command.
5.Difference between standby.signal and recovery.signal?
recovery.signal: tells PostgreSQL to enter normal archive recovery . ( example – Point in time recovery).
standby.signal: tells PostgreSQL to enter standby mode .
5. In streaming replication which additional processes run on primary and standby?
5. Explain different types of replications methods available in postgres?
5. What is the significance of the parameter synchronous_commit?
5. Suppose A is primary and B is standby ? Now failover happened and then A becomes standby and B becomes primary. Now i want to make the A as primary again and B as standby . How can i achieve that?
5. Explain the architecture of EFM?
You need one master, one slave and one witness server.
5. What are some common reasons which cause streaming replication to fail.
5. What are the changes related to recovery.conf happened in postgres 12.
From postgres 12 onwards, the parameters of recovery.conf are now part of postgres.conf file. Instead of recovery.conf file, only two files are present, standby.signal and recovery.signal file( note both are empty files).
5. When we do failover, what happens in the timeline.
Timeline gets changed. i.e new timeline id is selected.
5. Like snapshot standby in oracle, can we use slave server in postgres as snapshot standby and revert it back to slave server once testing is done?
5.Is there any mandatory parameter for pg_rewind to work?
5.Is replication possible between different versions of postgres?
BACKUP & RECOVERY:
1. Explain different backup methods in postgres?
Reference – https://www.tutorialdba.com/p/postgresql-backup-recovery-overview.html
5. difference between pg_dump and pg_dumpall?
Both are logical backup utilities.
pg_dump -> Used for taking backup of databases, schema,table .backup format is like sql, tar file or directory
pg_dumpall -> Used for taking backup of complete cluster. backup format is sql .
5. Can i take backup of complete cluster using pg_dump?
5. What is the use of -globals options in pg_dumpall?
1. What pg_basebackup does internally?
1. Explain how postgres does crash recovery?
In case of crash of postgres instance, when the instance starts again, it will reply the wal file and recover the instance.
1. Can we take incremental backup using postgres? and how?
You can use third party tools like BART and barman.
1. Which important parameter is used for point in time recovery ?
1. What do you mean by continuous archiving in postgres?
17.Can I take export of only users using pg_dump?
1. How can i rebuild all the indexes of a database in one command ?
As a super user run the below command.
REINDEX DATABASE <DB_NAME>;
2. What happens internally when we rebuild indexes?
3. Different type of indexes present in postgres ?
4. Does create index puts lock on the table? If yes , then any workaround to avoid this locking?
Yes create index command puts exclusive lock on the table . So you can add CONCURRENTLY keyword, which will avoid the exclusive lock on the table.
5. What is index only scan in postgres?
PATCH & UPGRADE:
1. What is major version upgrade and minor version upgrade in postgres?
2. Explain how you do a major upgrade in postgres?
3. What is the user of link option in pg_upgrade?
4. What are the advantages and disadvantages of using link option?
4. Difference between explain and explain analyze?
4. How do you find which objects are bloated in db?