Postgres Tips and Tricks By Lloyd Albin 5 1 2013

Postgres Tips and Tricks By Lloyd Albin 5/1/2013, 6/11/2013

Why does my query run slow SELECT max(l 1. upload_timestamp) AS Last. Upload. Timestamp, lcase(l 1. network) as network, l 1. filename, count(l 1. file_id) AS Number. Uploads FROM lab_upload_log AS l 1 GROUP BY lcase(l 1. network), l 1. filename DISTINCT and GROUP BY give the same results, but internally DISTINCT is faster. SELECT DISTINCT max(l 1. upload_timestamp) AS Last. Upload. Timestamp, lcase(l 1. network) as network, l 1. filename, count(l 1. file_id) AS Number. Uploads FROM lab_upload_log AS l 1 GROUP BY lcase(l 1. network), l 1. filename

Why does my query run slow SELECT max(l 1. upload_timestamp) AS Last. Upload. Timestamp, lcase(l 1. network) as network, l 1. filename, count(l 1. file_id) AS Number. Uploads FROM lab_upload_log AS l 1 GROUP BY lcase(l 1. network), l 1. filename CREATE INDEX lab_upload_log_idx ON lab_data_ops. lab_upload_log USING btree (network, filename); Create a functional index for the field CREATE INDEX lab_upload_log_idx 2 ON lab_data_ops. lab_upload_log USING btree (lower(network), filename); Create an index for the field if you are going to use the field in a WHERE clause or a GROUP BY clause.

Why does my query run slow SELECT ucase(final_lab_upload_log. lab) AS lab, count(final_lab_upload_log. filename) AS "# Unique Files", max(x. Num. Uploads) AS "Total # Uploads“ FROM final_lab_upload_log LEFT JOIN ( SELECT count(lab_upload_log. filename) AS Num. Uploads, ucase(lab_upload_log. lab) AS lab FROM lab_upload_log WHERE lcase(lab_upload_log. network)='vtn‘ AND lab_upload_log. upload_timestamp > (curdate() - 30) GROUP BY ucase(lab_upload_log. lab)) AS x ON x. lab = final_lab_upload_log. lab WHERE final_lab_upload_log. Last. Upload. Timestamp > (curdate() - 30) GROUP BY ucase(final_lab_upload_log. lab) Remove doubling of the ucase. The doubling stops the index from being used. SELECT ucase(final_lab_upload_log. lab) AS lab, … GROUP BY ucase(final_lab_upload_log. lab)

How much will this really benefit me? The previous queries when used in production run in: 5 seconds (individually) But in reality, we have three of these types of queries and so they take over: 15 seconds combined After the changes on the three previous slides: 30 -100 milliseconds per each individual query <200 milliseconds for the 3 queries together

How to generate this table Lab Upload Log Today Lab Last 7 Days # Unique Files Total # Files Uploaded Last 30 Days # Unique Files Total # Files Uploaded FH 2 # Unique Files Total # Files Uploaded 2 SA Total 56 198 HV 55 202 10 0 0 10 54 54 12 12 307 312 The better way to do this would be to write a single query to This table is generated by 3 separate queries and then merged together via javascript. generate this table. SELECT a. lab, c. u_files, c. files, b. u_files, b. files, a. u_files, a. files FROM a LEFT JOIN b ON a. lab = b. lab LEFT JOIN c ON a. lab = b. lab

Reduce Code Don’t write the same query in two different places. This causes you to have to maintain twice as much code and makes it so that when people are updating the code, that they may only catch one instance of the code. A good example of this problem is when you create a user defined query in atlas on a folder and then create a module with the same query and you attach the module to other folders. The user defined query should be removed and the module attached to that folder.

How much is the savings worth

Where did my NULL’s go? Source Table SELECT * FROM table WHERE v != 398 v t 100 v t 500 100 v t 398 100 500 SELECT * FROM table WHERE t != ‘ 398’ null SELECT * FROM table WHERE v != 398 OR v IS NULL null v t 100 500 null When you do comparisons in Postgres NULL’s are automatically removed unless you specifically ask for them.

The normal way to get current job DISTINCT ON with ORDER BY and SELECT d. * FROM ( GROUP BY with MIN or MAX can give SELECT b. emplid, the same results, but internally b. empl_rcd, DISTINCT ON is faster and is easier to b. effdt, read the code. max(b. effseq) AS effseq FROM ( SELECT ps_job. emplid, ps_job. empl_rcd, max(ps_job. effdt) AS effdt FROM finance_feeds. ps_job GROUP BY ps_job. emplid, ps_job. empl_rcd ) a LEFT JOIN finance_feeds. ps_job b ON a. emplid: : text = b. emplid: : text AND a. empl_rcd = b. empl_rcd AND a. effdt = b. effdt GROUP BY b. emplid, b. empl_rcd, b. effdt ) c LEFT JOIN finance_feeds. ps_job d ON c. emplid: : text = d. emplid: : text AND c. empl_rcd = d. empl_rcd AND c. effdt = d. effdt AND c. effseq = d. effseq WHERE d. empl_status: : text <> 'T' : : text ; -- 833 rows returned (execution time: 32 ms; total time: 93 ms ) -- 14, 719 rows in finance_feeds. ps_job

Fast way to get current job SELECT * FROM ( SELECT DISTINCT ON (emplid, empl_rcd) * FROM finance_feeds. ps_job ORDER BY emplid, empl_rcd, effdt DESC, effseq DESC ) a WHERE a. empl_status: : text <> 'T' : : text; -- 833 rows returned (execution time: 31 ms; total time: 31 ms) -- 14, 719 rows in finance_feeds. ps_job Much simpler code using DISTINCT ON with ORDER BY instead of GROUP BY with MAX.

Query Plan Difference Original Query DISTINCT ON Query Lab. Key does not support DISTINCT ON at this time. https: //www. labkey. org/wiki/home/Docu mentation/page. view ? name=labkey. Sql Query plan is almost ½ the speed and only scans the table once instead of three times.

How to compare two queries There is a great command called EXCEPT. This will compare the results of two queries and tell you what is in the first query that is not in the second query. SELECT * FROM view_a EXCEPT SELECT * FROM view_b SELECT * FROM (SELECT * FROM view_a)) a EXCEPT SELECT * FROM (SELECT * FROM view_b)) b This will show you all lines in view_a that are not in view_b. To find out all lines on view_b that are not in view_a, reverse the two queries. If you are comparing two complex query statements, wrap them in a simple SELECT statement so that the EXCEPT will not get confused.

How to compare two queries There is a great command called EXCEPT. This will compare the results of two queries and tell you what is in the first query that is not in the second query. SELECT * FROM ( SELECT 1 UNION SELECT 2 ) a EXCEPT SELECT * FROM ( SELECT 1 UNION SELECT 3 ) b ? Column? 2

How to compare two queries Normally you will also want to reverse the two queries so that you can check the results going the other direction. This way you have two sets of results, what is extra in each query. SELECT * FROM ( SELECT 1 UNION SELECT 3 ) b EXCEPT SELECT * FROM ( SELECT 1 UNION SELECT 2 ) a ? Column? 3

How to compare two queries If you want only the records that match, then you want to use INTERSECT. SELECT * FROM ( SELECT 1 UNION SELECT 3 ) b INTERSECT SELECT * FROM ( SELECT 1 UNION SELECT 2 ) a ? Column? 1

Finding the slow line in your query EXPLAIN will show you the query plan, and this by itself is helpful, but even more helpful is the EXPLAIN (ANALYZE, BUFFERS) which compares the query plan to what actually happened when the query was run. Also use http: //explain. depesz. com/ Nested Loop Left Join (cost=305819. 49. . 449850. 69 rows=1 width=572) (actual time=1871. 328. . 9512784. 289 rows=10983 loops=1) Filter: ((NOT (hashed Sub. Plan 42)) AND ((Sub. Plan 38) IS NOT NULL)) Rows Removed by Filter: 74 Buffers: shared hit=2412568803 read=99 1) 2) Estimate rows=1 vers Actual rows=10983. When you have big difference between these numbers, this is a sign of a problem. This can be caused by not having enough statistics, not having an index, etc. Actual time=1871. 328. . 9912784. 289. This means that this row started 1. 87 seconds into the query and took the difference of the two times, 2. 64 hours, to complete

Number of months between dates date_trunc(‘month’, date) -- First day of the month date + ‘ 1 day’: : interval -- Converts last day of the month to the first day of next month age(date, date) -- The difference between two timestamps as an interval date_part(‘year’, date/interval) -- returns only the year portion of the date/interval SELECT (date_part('year', age(max(b. earnenddate): : timestamp + interval '1 day', date_trunc('month', min(b. earnenddate): : timestamp))*12 + date_part('month', age(max(b. earnenddate): : timestamp + interval '1 day', date_trunc('month', min(b. earnenddate): : timestamp))): : integer AS elapsed_months FROM ( SELECT '06/30/2012' AS earnenddate UNION SELECT '12/31/2013' AS earnenddate elapsed_months ) b; 19

Creating a Table from a View CREATE TABLE schema. table AS SELECT * FROM schema. view; This allows you to create a table without having to look up all the field names and types to first generate a table and then fill it with the results of the view. SELECT * INTO schema. table FROM schema. view; The Postgre. SQL usage of SELECT INTO to represent table creation is historical. It is best to use CREATE TABLE AS for this purpose in new code.

Update a Table from a View BEGIN; TRUNCATE schema. table; INSERT INTO schema. table SELECT * FROM schema. view; COMMIT; This allows you to take the results of a view and append them to an existing table. You may wish to TRUNCATE schema. table before adding the new data.

What order is my data in? When you don’t use an ORDER BY clause, your data is in physical order of insert and update. CREATE TEMP TABLE test ( key SERIAL, test BOOLEAN ); key test SELECT * FROM test; f 2 f 3 INSERT INTO test VALUES (1, FALSE); INSERT INTO test VALUES (2, FALSE); INSERT INTO test VALUES (3, FALSE); 1 f deleted key test 1 f 2 f 3 f

What order is my data in? UPDATE test SET test = TRUE WHERE key = 2; deleted key test 1 2 f 3 f 2 X f t SELECT * FROM test; key test 1 f 3 f 2 t

ORDER BY field Normally you use ORDER BY field_name, but you can also use ORDER BY field_number. I have found this to sometimes be useful when unioning one or more sets of data together. SELECT 'test 5' AS test 1 UNION ALL SELECT 'test 2' AS test 2 ORDER BY 1 test 2 test 5 Notes: ORDER BY test 1 will work. ORDER BY test 2 will not work. UNION ALL gives you all rows from each SELECT and runs much faster. UNION only gives you DISTINCT rows between the two tables and ordered

TRUNCATE vers DELETE TRUNCATE is normally the best way to go because it removes all the data within the table(s) quickly and by specifying more than one table, deals automatically with foreign key dependencies. Delete can take a long time depending on the foreign key dependencies, etc. TRUNCATE is not MVCC-save, so after truncation, the table will appear empty to all concurrent transactions, even if they are using a snapshot taken before the truncation occurred. DELETE does not have this issue. RESTART IDENTITY Automatically restart sequences owned by columns of the truncated table(s). test 1 test 2 test 5

pg_stat_activity 9. 1 SELECT * FROM pg_stat_activity; datid datname procpid usesysid username application_name 487377201 main 18133 10 postgres EMS SQL Manager for Postgre. SQL client_addr client_port backend_start xact_start 10. 6. 106. 202 64241 2013 -06 -24 15: 10: 13. 139305 - 15: 22: 32. 10078107 07 07 waiting current_query False query_start SELECT * FROM pg_stat_activity This will show you what queries that you currently have running on a server. As user Postgres, you will see all queries running on a server. If there is no query running, you will see <IDLE>.

Postgres Tips & Tricks POSTGRES 9. 2+

pg_stat_activity 9. 2+ SELECT * FROM pg_stat_activity; datid datname procpid usesysid username application_name 487377201 main 18133 10 postgres EMS SQL Manager for Postgre. SQL client_addr client_port backend_start xact_start 10. 6. 106. 202 64241 2013 -06 -24 15: 10: 13. 139305 - 15: 22: 32. 10078107 07 07 waiting current_query False query_start <idle>: SELECT * FROM pg_stat_activity This will show you what queries that you currently have running on a server. As user Postgres, you will see all queries running on a server. They will also say <idle> or <active>. The <idle> connections show you the last query executed.

Killing your own backend’s SELECT pg_cancel_backend(pid); This cancels your current command leaves your connection open for your next command. If you are in the middle of a transaction, the transaction will be aborted once you try and COMMIT your transaction. It will also complain about every line failing until you try and COMMIT. SELECT pg_terminate_backend(pid); This cancels your current command closes your connection. If you are in the middle of a transaction, the transaction will be aborted instantly. If all the backends you see are gone and you are still getting the open connections when trying to drop your database, contact a dba.

Postgres Tips & Tricks POSTGRES 9. 3+

Materialized Views CREATE MATERIALIZED VIEW schema. materialized_view AS SELECT * FROM schema. table; SELECT * FROM schema. materialized_view; REFRESH MATERIALIZED VIEW schema. materialized_view; This is basically a simple melding of a TABLE and VIEW into a single entity. When you create the MATERIALIZED VIEW is populates the underlying TABLE. Every time you use the MATERIALIZED VIEW it returns you the data in the TABLE. To update the data in the TABLE, you need to run the REFRESH MATERIALIZED VIEW command.

Foreign Data Tables CREATE EXTENSION postgres_fdw; CREATE SERVER db_main FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'db. scharp. org', dbname 'main', port '5432'); CREATE USER MAPPING FOR postgres SERVER db_main OPTIONS (user 'webservices', password 'password'); CREATE FOREIGN TABLE ist. webservices_token ( "time" TIMESTAMP WITHOUT TIME ZONE DEFAULT now() NOT NULL, token TEXT NOT NULL The Postgres DBA’s should take care of ) the EXTENSION, SERVER and USER SERVER db_main; MAPPING’s. The developer can then create the FOREIGN TABLES. For each SELECT * FROM ist. webservices_token; user that wants to use the FOREIGN TABLE, there must be a USER MAPPING created by a DBA.