Wednesday, February 8, 2017

Work Around Transaction Support in an RDBMS for Bulk Loads - Just Say No

Recently, some of my contacts have been struggling with bulk loading into conventional RDBMS products like Oracle, SQL Server and MySQL. They were surprised by the issue.

My reply to them is that this is a standard issue. RDBMS products by their nature guarantee transactions. This is very expensive.

They do a lot of logging to guarantee transactions. For example, a standard technique in Oracle to add a column to a large table is to create a copy of the table with the additional column using NOLOGGING. Afterwards you add back in constraints, indexes, … .

The standard joke in Oracle is that it first does redo and then an undo and then actually does something. For those familiar with Oracle knobs, undo has significance.

Also, there is a lot of context switching going on between the RDBMS and the external interface. Heck, even within Oracle, PL/SQL implemented BULK COLLECT to reduce all this context switching between PL/SQL and SQL. (Bulk Processing with BULK COLLECT and FORALL by Steven Feuerstein)

So, the trick is to identify the knobs that minimize logging and context switching.

However, if you want to be true heretic, consider asking the question of why use an RDBMS if you don’t need transaction support?

For further details on this type of stuff, check out the following blog posts

  1. SQL Can Be Slow -- Why Do People Doubt This?
  2. Data Warehousing and SQL -- Tread Carefully by S. Lott
  3. NoSQL Database Doesn’t Mean No Schema by Steven F. Lott