#4 seems like an overzealous generalisation. JOINs get a lot of crap.
Yes, it's no secret that large joins on huge amounts of data can be very intensive. It does not follow, however, that they should simply be avoided categorically. That's too much of a blanket statement for me.
For instance, joins are often used in situations where there is a numeric type column that refers to a very small table of enumerated values that have a textual or other translation, and there is a need to produce the latter in a single query. There's nothing wrong with that join from a performance standpoint, even for very large values of n.
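To make the lookup-table case concrete, here's a minimal sketch (using SQLite via Python for portability; the table and column names are hypothetical). A large table stores a small integer code, and a tiny enumeration table translates it to a label — the join key is a table the planner can keep entirely in memory, so the translation adds negligible cost per row:

```python
import sqlite3

# Hypothetical schema: a large "orders" table stores a small integer
# status code; a tiny "statuses" lookup table maps codes to labels.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE statuses (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
    INSERT INTO statuses VALUES (1, 'pending'), (2, 'shipped'), (3, 'cancelled');
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status_id INTEGER NOT NULL);
""")
conn.executemany("INSERT INTO orders (status_id) VALUES (?)",
                 [((i % 3) + 1,) for i in range(10_000)])

# Translating code -> label in a single query via the join.
rows = conn.execute("""
    SELECT s.label, COUNT(*)
    FROM orders o
    JOIN statuses s ON s.id = o.status_id
    GROUP BY s.label
    ORDER BY s.label
""").fetchall()
print(rows)
```

The same shape works identically in PostgreSQL; this is the kind of join that stays cheap no matter how big the fact table gets.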
JOINs rightly get a lot of crap - they are massive performance bottlenecks if you are working with large tables.
A single convenient counter-example doesn't detract from that.
I'll add to this by noting the post does say "For smaller tables it doesn’t matter but as tables get bigger avoid joining when you can" which is sound advice.
I'd say joining scales up to perhaps a million or two rows unless you have a lot of RAM (you can see the join spill to disk in the EXPLAIN ANALYZE output). I often work with tables in the 10-30 million row range, so my perspective is probably a little slanted towards the negative.
It's not particularly bad advice. Perhaps for someone who hasn't worked with databases for long, it's dangerous to be told "don't worry about normality." But denormalizing certain groups of tables can certainly yield a significant performance gain.
Yes, you're right. I agree that there are times that denormalizing makes sense. My point was that the item said:
* Throw away the normal forms you learned in school
* Denormalize [...] whenever it makes a query faster.
I think that's bad advice as stated. As you imply, you have to know why the rules are there to know when to break them intelligently. I didn't get that message from the item as posted, rather that the author thought that NF had little value, which probably doesn't belong in a list of database tips and tricks.
Been unplugged all day so haven't been around to respond. This is a completely fair criticism - the wording in my post is indeed too blunt. Without understanding the normal forms, making smart decisions about denormalization is going to be very difficult.
It's bad advice because people denormalize the wrong tables. They'll actually slow queries down by denormalizing a table with a few rows into a table with many rows.
I think "don't worry about normality" is a ridiculously bad statement. But perhaps "selectively denormalize when you run out of other optimizations" would be a little more reasonable.
One approach is to make all your denormalizations populated by triggers -- that way you can still think about the DB in "clean" terms in your code when modifying data, but get your denormalized tables for queries, too.
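A minimal sketch of that trigger approach (SQLite via Python for a self-contained demo; in PostgreSQL you'd write the trigger body as a PL/pgSQL function instead, and the table names here are hypothetical). Application code only writes normalized line items; the trigger keeps a denormalized per-order total current for fast reads:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE line_items (order_id INTEGER, amount REAL);
    -- Denormalized running total per order, kept current by a trigger.
    CREATE TABLE order_totals (order_id INTEGER PRIMARY KEY, total REAL);
    CREATE TRIGGER line_items_ai AFTER INSERT ON line_items
    BEGIN
        INSERT OR IGNORE INTO order_totals (order_id, total)
        VALUES (NEW.order_id, 0);
        UPDATE order_totals SET total = total + NEW.amount
        WHERE order_id = NEW.order_id;
    END;
""")

# Application code touches only the normalized table...
conn.executemany("INSERT INTO line_items VALUES (?, ?)",
                 [(1, 10.0), (1, 2.5), (2, 7.0)])

# ...while queries read the precomputed total without aggregating.
total = conn.execute(
    "SELECT total FROM order_totals WHERE order_id = 1").fetchone()[0]
print(total)
```

The write path stays "clean" from the application's point of view, and the read path never has to sum the detail rows.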
The difference is in a matter of degree. I agree that denormalisation can be beneficial - even necessary - in certain scenarios, particularly when other design optimisations are exhausted. But that's a far cry from "the normal forms you learned about in school are an utter waste of time that will throw your database performance in the gutter."
Agreed it was too strong of a statement, but denormalization really is something you do all the time in big databases. It is not some targeted last resort technique.
If you have 30 million rows in a table you absolutely cannot do joins so you bring everything in that you commonly need.
Because if you're going to do that - why not drop the SQL and the ORM altogether? Most shops at that level aren't doing analysis on their production SQL anyway - they dump it into a data warehouse. So why not do your CRUD on NoSQL and dump it somewhere people can run SQL on it?
Because often that's going too far. Quite often you still need transactions, established technologies, some structured data, and so on. Sometimes a regular SQL database with some denormalised tweaks is all you need.
There is definitely a place for NoSQL. However, many projects simply don't need to bite this off. They could start with SQL, from there you can decide how rigid/structured you want to be... I definitely agree that this comes with baggage; but if you're prepared to be pragmatic it's often got a lot of upside.
SQL Server + NHibernate does handle the CRUD stuff, custom fields and DDD-based domain logic very well.
NoSQL (CouchDB in our case) doesn't handle transactions, locking and business rules effectively. However it excels at providing extremely fast views on schemaless data such as our core domain model plus custom fields.
Good post, but most of those are SQL-specific, and not really unique to Postgres, right? But yeah, these are very crucial points that you see a lot of people not implementing!
A lot of generic SQL stuff in there for sure. But, last I worked with MySQL (it has admittedly been a few years) the indexing options were severely limited when compared with PostgreSQL. Last I used it you could not index an expression or qualify an index with a where clause for example. Also at that time MySQL could only make use of one index per query.
All of those deficiencies may have since been eliminated.
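For anyone unfamiliar with the two index features mentioned, here's a quick sketch. The syntax shown is essentially the same in PostgreSQL (`CREATE INDEX ... ON table (expression)` and `CREATE INDEX ... WHERE predicate`); the demo runs on SQLite via Python just to be self-contained, and the schema is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, active INTEGER)")

# Expression index: case-insensitive lookups on lower(email) can use it.
conn.execute("CREATE INDEX users_email_lower ON users (lower(email))")

# Partial index: only rows matching the WHERE clause are indexed, which
# keeps the index small when queries only ever touch active users.
conn.execute("CREATE INDEX users_active ON users (id) WHERE active = 1")

conn.execute("INSERT INTO users VALUES (1, 'Ada@Example.com', 1)")
row = conn.execute(
    "SELECT id FROM users WHERE lower(email) = ?",
    ("ada@example.com",)).fetchone()
print(row)
```

Without the expression index, a query filtering on `lower(email)` can't use a plain index on `email` at all - which is exactly the limitation being described.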
So glad to see PostgreSQL bubbling up these days as it's been my DB of choice for many years. I'd love to see a more portable version and of course something like MySQL's group_concat(). If you are game for trying it out be sure to check out the tools at sqlmanager.net (unfortunately they don't offer Linux versions anymore).
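For what it's worth, newer PostgreSQL versions (9.0+) do have an equivalent in string_agg(expr, delimiter). Here's what the group_concat() pattern looks like, sketched on SQLite via Python (which has group_concat natively; the schema is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (post_id INTEGER, tag TEXT);
    INSERT INTO tags VALUES (1, 'sql'), (1, 'postgres'), (2, 'nosql');
""")

# Collapse each post's tags into one comma-separated string.
# In PostgreSQL 9.0+ the same query uses string_agg(tag, ',').
rows = conn.execute("""
    SELECT post_id, group_concat(tag, ',')
    FROM tags
    GROUP BY post_id
    ORDER BY post_id
""").fetchall()
print(rows)
```

Note that neither group_concat nor string_agg guarantees element order unless you ask for it (PostgreSQL accepts an ORDER BY inside the aggregate call).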
I used to be a huge MySQL fan until a coworker forced us to use PostgreSQL for a big project. At the start I wasn't happy about it but have since completely changed sides. It is an amazing piece of software with some exceptional code and engineering behind it. When you are faced with high concurrency (especially if you have nontrivial write volume) PostgreSQL is a beast.
We peak at 1000s of transactions a second on an OLTP database that is over 100GB in size and PostgreSQL handles it like a champ.
Any nice reference about pgsql and query optimization? Any book recommendation?
As an old MySQL guy, insofar as I do databases at all, the advice re subqueries (#4) was unusual... :-)
(I looked at trying pgsql for a hobby project a few years back and found it lacked support for different character sets for different tables, etc. Is that [still] so?)
If you are looking for a book on PostgreSQL, the Douglas book http://www.amazon.com/PostgreSQL-Developers-Library-Korry-Do... is a pretty good introduction to most of the topics specific to Postgres. It includes a sample implementation of a Postgres-accessing script in most of the languages commonly used for the task, which can be a bit repetitive (Perl, Python, PHP, Java, C: same tune, different instrument), but that does make it a fairly good reference.
Thought I would add that the PostgreSQL documentation is really, really good for learning. It is a bit dense but overall very well written, includes nontrivial examples, and you can really get a solid dive into a topic reading it.