A really good article on the tradeoffs between state based vs migration based schema transformation is titled Critiquing two different approaches to delivering databases: Migrations vs state by Alex Yates on 6/18/2015. The blog Database Schema Migration by S. Lott on 9/27/2016 ends with "It's all procedural migration. I'm not [sure] declarative ("state") tools can be trusted beyond discerning the changes and suggesting a possible migration."
Let me start by examining the statement of the problem: state based vs migration based schema transformation. The problem statement implies that the solution is an either or type of thing. Why not both?
As opposed to speaking in generalities, let me pick a specific problem which is often encountered in real systems which demonstrates the core issue. Also, let me pick a particular tool set to execute on this particular problem.
The specific problem involves
- Adding some domain table and its associated data.
- Adding a not null / mandatory column to some table that already has data.
- Creating a foreign key between the domain table and the new mandatory column.
- SomeTable exists with data.
- A new domain table (SomeDomain) gets created.
- The new domain table (SomeDomain) gets populated with data.
- A new nullable / not mandatory column SomeDomainUuid is added to SomeTable.
- Column SomeDomainUuid in SomeTable gets populated with data.
- Column SomeDomainUuid in SomeTable is made not null / mandatory.
- A foreign key is created between the two tables.
- The new domain table (SomeDomain) gets populated with data.
- Column SomeDomainUuid in SomeTable gets populated with data.
INSERT INTO dbo.SomeDomain VALUES (newid(), 'Fred'), (newid(), 'Barney');The second step is to write a script (Update_dbo_Deal_AdvertiserTypeUuid.sql) to populate SomeDomainUuid column in SomeTable.
UPDATE [dbo].[SomeTabe] SET [SomeDomainUuid] = (SELECT SomeDomainUuid FROM [dbo].[SomeDomain] WHERE Name = 'Fred');The last preparation step is to write a script (Script.PostDeployment.sql) to run the above 2 scripts after the state based change has happened.
:r ".\Populate_dbo_SomeDomain.sql" go :r ".\Update_dbo_Deal_AdvertiserTypeUuid.sql" goNow that the desired end state has been specified as well as the data manipulation scripts written, it is time to modify a database. The Microsoft terminology for this is "publishing the database project". There will be an issue because the state changes will be made and then the Script.PostDeployment.sql script will be run. In between the state changes and the script being run, there will need to be data in the SomeDomainUuid column in the table SomeTable. This issue is addressed by using the GenerateSmartDefaults option when publishing the database project.
Let's summarize what this combination of state based and migration based schema transformation has allowed us to do. We were able to take 7 steps and reduce it down to 2 steps. These 2 steps couldn't be automated anyways because they involved data. These are the pros. The con is that have to be familiar with the framework and select the GenerateSmartDefaults option out of the 60 plus available options.
In conclusion, a purely state based approach can't work because of data. There is no way for the software to know how to do data migrations. Only humans know the correct way to do data migrations. In our example, there is no way for the software to know whether or not the new column SomeDomainUuid is to be initially populated with "Fred" or "Barney". This is a long winded and nice way of saying that a purely state based database schema migration can't be trusted. However, the combination of state based and migration based can truly improve productivity.
This comment has been removed by the author.
ReplyDeleteThanks for your blog post. I totally agree with you!
ReplyDeleteI didn't discuss the various hybrid solutions in my original blog post. With that post I was trying to clarify the debate and explain the pros and cons of each pure approach. There are lots of people who have only experienced one or the other.
I totally agree that in most (not all) cases it is best to take a bit from both sides to get the best solution. Even in the dark days before database source control most people did dev on some dev database and got it into a working 'state', then they created a 'migration' script either manually or using a diff tool. This migration would then be run on each of the testing/staging and eventually production databases.
Now there are a bunch of more sophisticated options. For SQL Server for example a few options worth lookking at include the SSDT refactor log and pre/post deploy scripts, Redgate SQL Source Control 'migrations' and ReadyRoll 'programmable objects'.
Hybrid FTW!
A database is a collection of information it could be any information like student record, Employee record, book record and much more that is organized so that it can be easily accessed, managed and updated. Data is organized into rows, columns, and tables, and it is indexed to make it easier to find relevant information. Database project help
ReplyDelete