2012/11/30

Automatic DB migration for Java web apps with Liquibase

Introduction


In an agile development scenario, new versions are released and deployed frequently, and the database schema changes just as often.

A mechanism should be in place to deal with these database changes. In Ruby on Rails, you get one out of the box and it works great. But in Java web apps, you have to find a solution and plug it into your own projects.

We will implement an automatic database update mechanism for Java web apps trying to meet the following goals:
  • It should not interfere during development
  • It should be easy to generate database updates during development
  • It should be easy to test database updates during development
  • At production, database updates should be performed automatically
Liquibase is a good tool to deal with database migrations:
  • Open-source
  • Database agnostic: can update most popular SQL databases
  • Integration: available as command line tool, maven plugin, ant task
  • Flexibility: handles SQL schema updates, custom updates via a Java class, and even system command updates
  • Automatic: can be integrated for automatic updates as a Spring bean or as a servlet listener

Our App Before Liquibase


If we use Hibernate and the hibernate3-maven-plugin, our database schema is automatically kept up-to-date during development: hibernate3-maven-plugin extracts schema info from JPA annotations and the Hibernate configuration.

We will use a project based on the AppFuse framework, with flavours for Spring MVC, Struts2, Tapestry, JSF, but this is applicable to any java web app based on any framework.

From the AppFuse quickstart page, I copy the maven command to generate an initial Spring MVC app from AppFuse archetypes:

mvn archetype:generate -B \
 -DarchetypeGroupId=org.appfuse.archetypes \
 -DarchetypeArtifactId=appfuse-basic-spring-archetype \
 -DarchetypeVersion=2.2-SNAPSHOT \
 -DgroupId=com.mycompany \
 -DartifactId=migration \
 -DarchetypeRepository=http://oss.sonatype.org/content/repositories/appfuse

By inspecting the project's pom.xml, we can see the database schema is kept up-to-date during development by generating drop and create DDL commands, extracted from the JPA annotations in our model classes.

<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>hibernate3-maven-plugin</artifactId>
    <version>2.2</version>
    <configuration>
        <components>
            <component>
                <name>hbm2ddl</name>
                <implementation>annotationconfiguration</implementation>
                <!-- Use 'jpaconfiguration' if you're using JPA. -->
                <!--<implementation>jpaconfiguration</implementation>-->
            </component>
        </components>
        <componentProperties>
            <drop>true</drop>
            <jdk5>true</jdk5>
            <propertyfile>target/classes/jdbc.properties</propertyfile>
            <skip>${skipTests}</skip>
        </componentProperties>
    </configuration>
    <executions>
        <execution>
            <phase>process-test-resources</phase>
            <goals>
                <goal>hbm2ddl</goal>
            </goals>
        </execution>
    </executions>
    <dependencies>
    ... <!-- jdbc driver here -->
    </dependencies>
</plugin>

This is fine during development, because we do not need to worry about database migrations yet. By running mvn process-test-resources or mvn jetty:run, we can see that the tables are dropped and recreated each time we run our web app with jetty:

> mvn jetty:run
...
[INFO] [hibernate3:hbm2ddl {execution: default}]
[INFO] Configuration XML file loaded: file:.../migration/src/main/resources/hibernate.cfg.xml
[INFO] Configuration XML file loaded: file:.../migration/src/main/resources/hibernate.cfg.xml
[INFO] Configuration Properties file loaded: ...\migration\target\classes\jdbc.properties
alter table user_role drop foreign key FK143BF46A9B523CC9;
alter table user_role drop foreign key FK143BF46A407D00A9;
drop table if exists app_user;
drop table if exists role;
drop table if exists user_role;
create table app_user (id bigint not null auto_increment, account_expired bit not null, account_locked bit not null, address varchar(150), city varchar(50), country varchar(100), postal_code varchar(15), province varchar(100), credentials_expired bit not null, email varchar(255) not null unique, account_enabled bit, first_name varchar(50) not null, last_name varchar(50) not null, password varchar(255) not null, password_hint varchar(255), phone_number varchar(255), signup_date date, username varchar(50) not null unique, version integer, website varchar(255), primary key (id)) ENGINE=InnoDB;
create table role (id bigint not null auto_increment, description varchar(64), name varchar(20), primary key (id)) ENGINE=InnoDB;
create table user_role (user_id bigint not null, role_id bigint not null, primary key (user_id, role_id)) ENGINE=InnoDB;
alter table user_role add index FK143BF46A9B523CC9 (role_id), add constraint FK143BF46A9B523CC9 foreign key (role_id) references role (id);
alter table user_role add index FK143BF46A407D00A9 (user_id), add constraint FK143BF46A407D00A9 foreign key (user_id) references app_user (id);
...

Managing db updates in our project


Our development workflow could be like this one:


We will have two main Liquibase usage scenarios:
  • Liquibase at build-time: will generate all db changelogs
  • Liquibase at run-time: will automatically update the server schema as needed on deployment, including generation of the first database version for an empty schema

Liquibase at build-time

We will be evolving our app and when we have something to commit and push to our project's global repo, we can then generate the database migrations, if any.

We will add to our maven project the liquibase plugin and needed executions in order to:
  • generate database diff changelogs at any time when we want to consolidate our model updates
  • generate database production data dumps as changelogs at any time to consolidate app preloaded db data
  • exercise liquibase db migrations at any time for rapid testing (jetty:run)

Liquibase at runtime

The first time we deploy the app, it will contain the initial db changelog for an empty schema. Liquibase will generate all the db tables and populate them with the initial database data (default users, user roles, any lookup tables...). On subsequent deployments, our app will contain additional db changelogs to bring the server database schema up-to-date.


Integrating the db update at app startup


Liquibase can update the db automatically at runtime: it reads the change sets registered in a changelog file and checks each one against a table in our schema called DATABASECHANGELOG, which records the change sets already applied. Liquibase creates this table automatically if it does not exist.

We can implement the automatic db update either with a Spring bean or with a Servlet listener.

We add the Liquibase library as a dependency:
<dependency>
    <groupId>org.liquibase</groupId>
    <artifactId>liquibase-core</artifactId>
    <version>2.0.5</version>
</dependency>

And configure the Liquibase Spring bean in spring's applicationContext-resources.xml configuration file:

<bean id="liquibase" class="liquibase.integration.spring.SpringLiquibase">
    <property name="dataSource" ref="dataSource" />
    <property name="changeLog" value="classpath:db/db.changelog.xml" />
    <property name="defaultSchema" value="${db.name}" />
</bean>
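
For apps that do not use Spring, the servlet listener alternative mentioned above could be sketched in web.xml roughly as follows. This is not taken from the sample project: the JNDI name jdbc/migrationDS is hypothetical, and the listener looks up its data source through JNDI rather than from a Spring context.

```xml
<!-- web.xml fragment: run Liquibase at webapp startup without Spring -->
<context-param>
    <param-name>liquibase.changelog</param-name>
    <param-value>db/db.changelog.xml</param-value>
</context-param>
<context-param>
    <!-- hypothetical JNDI name; it must match a data source defined in your container -->
    <param-name>liquibase.datasource</param-name>
    <param-value>java:comp/env/jdbc/migrationDS</param-value>
</context-param>
<context-param>
    <!-- treat a failed update as an error instead of only logging it -->
    <param-name>liquibase.onerror.fail</param-name>
    <param-value>true</param-value>
</context-param>
<listener>
    <listener-class>liquibase.integration.servlet.LiquibaseServletListener</listener-class>
</listener>
```

The rest of this post sticks to the Spring bean, since the AppFuse app already has a Spring-managed dataSource.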


Working on our app during development: evolving our model


During development, we will possibly be making many changes to our app. We do not want to spend time on db migrations for now. We just evolve our model, annotate it with JPA and, when doing rapid testing with jetty:run, generate the db from scratch.

Liquibase maintains a table with applied change sets to our database. Based on the contents of that table, on startup our app will run liquibase to apply any missing migration to our db.

In order to avoid:
  • liquibase trying to create db tables already created by hibernate3 maven plugin
  • liquibase changeset version conflicts (because of changes applied to existing changelogs, for instance)
we need Maven to perform these tasks:
  • Drop all tables from our schema, so DATABASECHANGELOG is deleted
  • Let Hibernate generate our db up-to-date based on our annotations
  • Make liquibase mark the db as up-to-date by updating DATABASECHANGELOG, without applying any db migrations
We can accomplish this by adding to our pom.xml:

<plugin>
    <groupId>org.liquibase</groupId>
    <artifactId>liquibase-maven-plugin</artifactId>
    <version>2.0.5</version>
    <configuration>
        <skip>${skipTests}</skip>
        <propertyFile>target/classes/liquibase.properties</propertyFile>
        <changeLogFile>target/classes/db/db.changelog.xml</changeLogFile>
    </configuration>
    <executions>
        <!-- drop db before generating schema with hbm2ddl to avoid any 
            inconsistencies between changelog files and DATABASECHANGELOG table -->
        <execution>
            <id>drop-db</id>
            <phase>process-resources</phase>
            <goals>
                <goal>dropAll</goal>
            </goals>
            <configuration>
                <propertyFile>target/classes/liquibase.properties</propertyFile>
                <changeLogFile>target/classes/db/db.changelog.xml</changeLogFile>
            </configuration>
        </execution>
        <!-- mark db up-to-date in the  DATABASECHANGELOG table after generating 
            schema with hbm2ddl so that no migration is executed -->
        <execution>
            <id>mark-db-up-to-date</id>
            <phase>test-compile</phase>
            <goals>
                <goal>changelogSync</goal>
            </goals>
        </execution>
    </executions>
</plugin>

Our liquibase.properties config file:

driver=${jdbc.driverClassName}
url=${jdbc.url}
username=${jdbc.username}
password=${jdbc.password}

Our initial empty liquibase changelog file (db.changelog.xml):

<databaseChangeLog>
</databaseChangeLog>
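
The snippet above is abbreviated. A complete file would normally also carry the XML declaration and the Liquibase namespace; a minimal sketch, assuming the 2.0 schema that matches the liquibase-core 2.0.5 dependency used in this post:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- empty changelog; the schema version (2.0) is assumed from the 2.0.5 dependency -->
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-2.0.xsd">
</databaseChangeLog>
```

The other changelog listings in this post omit these attributes for brevity.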


Generating db diffs to consolidate our model


And now we want to generate our db migration so we can consolidate our updated model. We will want our app ready to run a db update when deployed.

To generate our db diff, this is what we'll do:
  • Generate a mydb_prev schema, based on the current liquibase registered changelogs.
  • Generate a mydb schema based on our JPA annotations
  • Compute the db diff between these schemas and generate the db changelogs
We'll do this in a specific maven profile (db-diff), so we can activate it at any time to generate the db changelogs.

<plugin>
    <groupId>org.liquibase</groupId>
    <artifactId>liquibase-maven-plugin</artifactId>
    <version>${liquibase.version}</version>
    <configuration>
        <propertyFile>target/classes/liquibase-diff.properties</propertyFile>
        <changeLogFile>target/classes/db/db.changelog.xml</changeLogFile>
        <diffChangeLogFile>src/main/resources/db/db-${timestamp}.changelog.xml</diffChangeLogFile>
        <logging>info</logging>
    </configuration>
    <executions>
        <execution>
            <id>generate-db-prev</id>
            <phase>process-resources</phase>
            <goals>
                <goal>update</goal>
            </goals>
            <configuration>
                <dropFirst>true</dropFirst>
            </configuration>
        </execution>
        <execution>
            <id>generate-db-diff</id>
            <phase>process-test-resources</phase>
            <goals>
                <goal>diff</goal>
            </goals>
        </execution>
    </executions>
    <dependencies>
        <dependency>
            <!-- jdbc driver here -->
        </dependency>
    </dependencies>
</plugin>

I omit here the buildnumber-maven-plugin config from our pom.xml. It runs at the validate phase and sets the ${timestamp} property, so that we can generate unique changelog filenames based on the current timestamp.
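
For reference, that omitted configuration might look roughly like this. It is a sketch under assumptions, not the project's actual config: the plugin version and the date pattern are my guesses, the latter chosen to match filenames such as db-20121120_120949.changelog.xml.

```xml
<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>buildnumber-maven-plugin</artifactId>
    <!-- version is an assumption; use whichever release your build already pins -->
    <version>1.1</version>
    <executions>
        <execution>
            <phase>validate</phase>
            <goals>
                <goal>create-timestamp</goal>
            </goals>
            <configuration>
                <!-- SimpleDateFormat pattern, assumed from the changelog filenames -->
                <timestampFormat>yyyyMMdd_HHmmss</timestampFormat>
                <!-- exposes the value as ${timestamp} to the rest of the build -->
                <timestampPropertyName>timestamp</timestampPropertyName>
            </configuration>
        </execution>
    </executions>
</plugin>
```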

Our liquibase-diff.properties config file:

driver=${jdbc.driverClassName}
url=${jdbc.url.prev}
username=${jdbc.username}
password=${jdbc.password}
referenceDriver=${jdbc.driverClassName}
referenceUrl=${jdbc.url}
referenceUsername=${jdbc.username}
referencePassword=${jdbc.password}

We can now run maven like this:
mvn process-test-resources -Pdb-diff

and voilà: Liquibase generates the changelog file for the initial version of our db, since we have run the diff against an empty changelog file:

<databaseChangeLog>
    <changeSet author="jgarcia (generated)" id="1354207484885-1">
        <createTable tableName="app_user">
            <column autoIncrement="true" name="id" type="BIGINT">
                <constraints nullable="false" primaryKey="true"/>
            </column>
            <column name="account_expired" type="BIT">
                <constraints nullable="false"/>
            </column>
            <column name="account_locked" type="BIT">
                <constraints nullable="false"/>
            </column>
            <column name="address" type="VARCHAR(150)"/>
            <column name="city" type="VARCHAR(50)"/>
            <column name="country" type="VARCHAR(100)"/>
            ...
        </createTable>
    </changeSet>
    <changeSet author="jgarcia (generated)" id="1354207484885-2">
        <createTable tableName="role">
    ...

We can now include this file in our initially empty db.changelog.xml file like this:

<databaseChangeLog>
    <include file="db/db-20121120_120949.changelog.xml" />
</databaseChangeLog>


Generating preloaded db data if any


Many web apps will have a set of data preloaded in the db: initial set of internal user accounts, available user roles, list of applicable taxes, ... whatever.

These can also be defined as changesets so that Liquibase can update the database for us when running our app.

The Liquibase Maven plugin won't be of much help for generating data changesets, as it does not include a goal for this. Instead, since the plugin bundles Liquibase itself, we'll call the Liquibase main Java class directly, as if we were using it from the command line. We'll do it with the exec-maven-plugin in a db-data Maven profile, so that we can generate the preloaded db data at any time:

<plugin>
    <groupId>org.codehaus.mojo</groupId>  
    <artifactId>exec-maven-plugin</artifactId>  
    <version>1.2.1</version>
    <executions>
        <execution>
            <phase>process-resources</phase>
            <goals>
                <goal>java</goal>
            </goals>
            <configuration>
                <mainClass>liquibase.integration.commandline.Main</mainClass>
                <includePluginDependencies>true</includePluginDependencies>
                <arguments>  
                    <argument>--driver=${jdbc.driverClassName}</argument>
                    <argument>--changeLogFile=src/main/resources/db/db-data-${timestamp}.changelog.xml</argument>
                    <argument>--url=${jdbc.url}</argument>
                    <argument>--username=${jdbc.username}</argument>
                    <argument>--password=${jdbc.password}</argument>
                    <argument>--diffTypes=data</argument>
                    <argument>--logLevel=info</argument>
                    <argument>generateChangeLog</argument>
                </arguments>
            </configuration>
        </execution>
    </executions>
    <dependencies>
        ...
        <!-- jdbc driver -->
        <!-- liquibase plugin -->
        ...
    </dependencies>
</plugin>
...
<properties>
    <!-- avoid generating db schema + inserting db-unit -->
    <skipTests>true</skipTests>
</properties>

In AppFuse, the profile prod feeds the database with production data instead of test data. We use this profile to regenerate the db and populate it with production data.
After this, we can use the db-data profile to generate our changelog for the initial db data.

After running the maven commands:
mvn test-compile -Pprod
mvn process-resources -Pdb-data

we obtain this file from liquibase:

<databaseChangeLog>
    <changeSet author="jgarcia (generated)" id="1354214520109-1">
        <insert tableName="user_role">
            <column name="user_id" valueNumeric="2"/>
            <column name="role_id" valueNumeric="1"/>
        </insert>
        ...
    </changeSet>
    <changeSet author="jgarcia (generated)" id="1354214520109-2">
        <insert tableName="role">
            <column name="id" valueNumeric="1"/>
            <column name="description" value="Administrator role (can edit Users)"/>
            <column name="name" value="ROLE_ADMIN"/>
        </insert>
        ...
    </changeSet>
    <changeSet author="jgarcia (generated)" id="1354214520109-3">
        <insert tableName="app_user">
            <column name="id" valueNumeric="1"/>
            <column name="account_expired" valueBoolean="false"/>
            <column name="country" value="US"/>
            <column name="postal_code" value="80210"/>
            ...

This file needs to be edited by hand, for two reasons:
  • Liquibase dumps all the data; it does not perform a diff here. You will have to keep only the data added since the previous db version.
  • The data is not ordered with regard to referential integrity, so rows may be inserted before the rows they reference.
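
As an illustration of the cleanup, the generated inserts can be reordered so that referenced rows come first. This is a hand-edited sketch, not Liquibase output; it reuses the rows from the dump above and merges them into a single changeset whose id follows the file's timestamp.

```xml
<databaseChangeLog>
    <changeSet author="jgarcia (generated)" id="20121128_170043-data">
        <!-- parent rows first: roles and users -->
        <insert tableName="role">
            <column name="id" valueNumeric="1"/>
            <column name="description" value="Administrator role (can edit Users)"/>
            <column name="name" value="ROLE_ADMIN"/>
        </insert>
        <!-- ... remaining role and app_user inserts ... -->

        <!-- child rows last: user_role references both role and app_user -->
        <insert tableName="user_role">
            <column name="user_id" valueNumeric="2"/>
            <column name="role_id" valueNumeric="1"/>
        </insert>
        <!-- ... remaining user_role inserts ... -->
    </changeSet>
</databaseChangeLog>
```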

Once this file has been cleaned-up, we can include it as well in our main db.changelog.xml file:

<databaseChangeLog>
    <include file="db/db-20121120_120949.changelog.xml" />
    <include file="db/db-data-20121128_170043.changelog.xml" />
</databaseChangeLog>


Exercising the automatic db migration during rapid testing


Our app is now ready to perform all registered db migrations when deployed on a server. However, it would also be nice to exercise this during development, when we launch jetty:run for rapid testing of our unpackaged app.

For this purpose, we add a profile that performs these steps:
  • drops all tables from our schema
  • skips db schema generation from our JPA annotations
  • skips feeding db with unit test data
We can add these in a db-test profile:
<plugin>
    <groupId>org.liquibase</groupId>
    <artifactId>liquibase-maven-plugin</artifactId>
    <version>2.0.5</version>
    <executions>
        <execution>
            <id>drop-db</id>
            <phase>process-resources</phase>
            <goals>
                <goal>dropAll</goal>
            </goals>
            <configuration>
                <propertyFile>target/classes/liquibase.properties</propertyFile>
                <skip>false</skip>
            </configuration>
        </execution>
    </executions>
</plugin>
...
<properties>
    <skipTests>true</skipTests>
</properties>

We can now exercise our db migration to check it works fine:

> mvn jetty:run -Pdb-test
...
[INFO] [liquibase:dropAll {execution: drop-db}]
...
[INFO] [hibernate3:hbm2ddl {execution: default}]
[INFO] skipping hibernate3 execution
...
[INFO] Started Jetty Server
...
2012-11-30 13:25:19.341:INFO:/:Initializing Spring root WebApplicationContext
INFO 30/11/12 13:25:liquibase: Successfully acquired change log lock
INFO 30/11/12 13:25:liquibase: Reading from `migration`.`DATABASECHANGELOG`
INFO 30/11/12 13:25:liquibase: Reading from `migration`.`DATABASECHANGELOG`
INFO 30/11/12 13:25:liquibase: ChangeSet db/db-20121120_120949.changelog.xml::20121120_120949::jgarcia (generated) ran successfully in 921ms
INFO 30/11/12 13:25:liquibase: ChangeSet db/db-data-20121128_170043.changelog.xml::20121128_170043-data::jgarcia (generated) ran successfully in 37ms
INFO 30/11/12 13:25:liquibase: Successfully released change log lock

The logs show Liquibase has updated our db successfully.


Liquibase pitfalls


Your db schema will usually be specified in your jdbc connection parameters.
Liquibase automatically inserts schema references in the generated changelogs, for indexes and foreign keys, like this:
  • baseTableSchemaName="migration"
  • referencedTableSchemaName="migration"
You had better remove these, or you will run into trouble if you set a different schema name in your jdbc configuration file.
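
A before/after sketch of that cleanup, using one of the user_role foreign keys from the hbm2ddl output earlier in this post. The exact attribute order in a generated file may differ; only the two schema attributes are removed.

```xml
<!-- as generated: tied to the schema the diff was run against -->
<addForeignKeyConstraint baseColumnNames="role_id"
        baseTableName="user_role" baseTableSchemaName="migration"
        constraintName="FK143BF46A9B523CC9"
        referencedColumnNames="id"
        referencedTableName="role" referencedTableSchemaName="migration"/>

<!-- after cleanup: the schema comes from the jdbc connection instead -->
<addForeignKeyConstraint baseColumnNames="role_id"
        baseTableName="user_role"
        constraintName="FK143BF46A9B523CC9"
        referencedColumnNames="id"
        referencedTableName="role"/>
```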

For a full list of the maven liquibase plugin goals and params, you can run this command:

mvn liquibase:help

It is up-to-date, as opposed to the liquibase site documentation.

Liquibase validates db change sets by comparing some attributes of the change sets present in the changelog file against those registered in the DATABASECHANGELOG table as applied change sets. It compares:
  • the full path of the file that contains the change set
  • the MD5 checksum of the change set
If either of these differs, Liquibase no longer recognizes the change set as the one already applied: a different path makes it try to re-apply the change set, and a different checksum makes validation fail. If you want to avoid this, you can clear the corresponding fields in the db table and Liquibase will refill them with the actual values.

To avoid differences in filename caused by different paths (relative vs absolute path, path updates, etc), you can set the logicalFilePath attribute of the <databaseChangeLog> element in each Liquibase file. There is no parameter to omit the path of changelog files in an applied changeset.
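
For example, a generated changelog could pin its logical path like this (a sketch; the value is simply the path used in the include elements of this post):

```xml
<!-- file: src/main/resources/db/db-20121120_120949.changelog.xml -->
<databaseChangeLog logicalFilePath="db/db-20121120_120949.changelog.xml">
    <!-- change sets ... -->
</databaseChangeLog>
```

With this in place, DATABASECHANGELOG records the logical path regardless of where the file is loaded from.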


Liquibase best practices


It is cleaner to have a single db.changelog.xml file that includes the generated db changelog files, so changelogs are grouped together:

<databaseChangeLog>
    <include file="db/db-20121120_120949.changelog.xml" />
    <include file="db/db-data-20121128_170043.changelog.xml" />
    <include file="db/db-20121129_093229.changelog.xml" />
</databaseChangeLog>

I also like to consolidate a set of related changesets from a changelog file into a single changeset. Instead of having many changesets, each one creating a table, an index, a referential integrity constraint, etc, I tend to group many of these updates into a single changeset.

Liquibase autogenerates a numeric id to identify each changeset. I prefer to assign it a timestamp, as it gives more info and they still appear ordered.
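
Both habits combined might look like the sketch below: one changeset, identified by a timestamp, that creates two of the AppFuse tables and a constraint between them. The column definitions are taken from the hbm2ddl output earlier in this post; the grouping itself is hand-written, not generated.

```xml
<databaseChangeLog>
    <!-- a timestamp id instead of the generated numeric id -->
    <changeSet author="jgarcia" id="20121120_120949">
        <createTable tableName="role">
            <column autoIncrement="true" name="id" type="BIGINT">
                <constraints nullable="false" primaryKey="true"/>
            </column>
            <column name="description" type="VARCHAR(64)"/>
            <column name="name" type="VARCHAR(20)"/>
        </createTable>
        <createTable tableName="user_role">
            <column name="user_id" type="BIGINT">
                <constraints nullable="false"/>
            </column>
            <column name="role_id" type="BIGINT">
                <constraints nullable="false"/>
            </column>
        </createTable>
        <addPrimaryKey tableName="user_role" columnNames="user_id, role_id"/>
        <addForeignKeyConstraint baseColumnNames="role_id"
                baseTableName="user_role"
                constraintName="FK143BF46A9B523CC9"
                referencedColumnNames="id"
                referencedTableName="role"/>
    </changeSet>
</databaseChangeLog>
```

One trade-off to keep in mind: on databases where DDL is not transactional, such as MySQL, a grouped changeset that fails halfway leaves its earlier statements applied.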

Test, test, test.


Sources


Sources can be found here.

2012/11/07

About Me

I have been working as a developer for more than 20 years. I enjoy software development.

In these years I have worked with a variety of technologies: C, C++, ObjectStore and Poet at the beginning of my professional career. Later on, Java and the incipient servlets, Oracle, MySql. Then Struts, Hibernate, Lucene ... And lately, I am working with Java, Spring, Spring Security, Hibernate Search, Struts 2, Apache CXF, Bootstrap and jQuery, to name a few.

Same goes for Engineering practices: from cascade lifecycle to spiral to agile.

And the tools: make, ant, maven, jenkins, ... CVS, visual source safe, subversion, mercurial, git ...

Lately I am into experimenting with Ruby on Rails, Grails, MongoDB. There are some great online courses around about these!

I am interested in technologies that make it possible to build better applications: meeting user expectations, and with beautiful and maintainable code.

I am a committer of AppFuse: an open-source java web framework. Among other things, I have contributed with:
  • better i18n support
  • upgrade to Hibernate 4
  • re-implement the full-text search service with Hibernate Search + Lucene
