UPDATE: This article is based on extensions written for tsqlunit. Those extensions are no longer supported. However, if you are still using SQL Server 2000 you might want to check them out, as tSQLt is not compatible with SQL Server 2000. Aside from these points, this article is still a worthwhile introduction to database Test Driven Development.
Test Driven Development (TDD) is known to be a successful method of object oriented development. In database development, however, TDD practices are not widespread, and development teams struggle with applying TDD principles to the SQL language. This is a problem because it leads to poorly tested code. In turn, not having the appropriate test cases makes it difficult to improve your existing database design. Not implementing TDD practices in the database leads, over time, to a decaying architecture and can hinder the evolution of the overall application. This article discusses ways to enable software developers to use TDD to develop in databases.
TDD is based on the following principles:
1. develop only what you need
2. build software so that it is testable
3. build software so that it is flexible
4. build a regression test suite that provides the confidence to improve design by refactoring
5. automate the tedious parts of testing
6. run tests frequently to provide constant feedback
The key step in test driven development is to write the test first. Then write only enough code to make it pass. This accomplishes two things: first, it means no matter what you write, you’ll have test coverage; second, you’ll be forced to write your code in a testable manner.
TDD is being used in many object oriented programming environments. But can it successfully be applied to database development?
Sebastian and I first met on a project where we would put this to the test. The project was an ETL (extract-transform-load) application whose implementation was almost entirely in MS SQL Server 2000’s T-SQL language. When we first began to work on the project, the ETL application suffered from substantial quality and performance problems.
A single tester was assigned to the project to create test scenarios, execute them and validate them by hand. In the meantime, five developers were working on correcting defects and improving the application's performance. However, as each fix was made, previously working components broke. Many of these defects were not discovered until days or weeks after the change was made. The project was in a state of chaos. We were not even sure if we would be able to complete it.
We realized that the source of the issue was the long feedback loop between making a code change and seeing its impact on the system. Therefore, we decided to give automated testing a try. We built a custom database testing framework which was used to write unit and integration tests. Developers began writing test cases first, before making their code change. They specified the inputs, execution steps and expected results as an automated test case. They ran the test to make sure it failed first. Then they changed the code and executed the test to make sure it now passed.
Over time, a large testing suite was created which could be executed at any time to see if we were making successful changes. This provided instant feedback on the effects of our code. It also allowed us to make performance enhancements knowing that the overall functionality had not been changed. TDD had been a powerful force in bringing success to this project.
What to Test
Databases include a variety of objects which you must construct. They implement the data integrity, the application logic and even security. Your application may also include seed data that is deployed. It is clear that programmed logic such as stored procedures and functions should be tested. However, you should also test query statements (including views), constraints (such as foreign keys, unique constraints, defaults, etc.), security, performance and scalability. The data itself may also be tested.
The following examples are implemented using T-SQL Unit plus a few additional procedures we have written to make testing easier.
A Simple Example
Suppose we need to routinely generate a list of numbers. We may want to create a function that returns a table of integers with a given number of rows. We may want to use the following syntax to call the function:
SELECT no FROM dbo.FnNum(5)
And expect that this returns the numbers from 1 to 5.
We begin by writing a test case. In this example, as in many test cases, we will need to compare our expected results with the actual results returned. We accomplish this here by using two tables (expected and actual) and a comparison procedure, tsu_AssertTablesAreEqual. Here are the steps that the test case performs:
1. Create tables to store the expected and actual results.
2. Execute the function, storing the results in the “actual” table.
3. Populate the “expected” table with the results we would expect to see.
4. Use a table comparison procedure, which compares the “expected” and “actual” results and reports any differences.
5. If there were unexpected results reported, signal to T-SQL Unit that the test case has failed.
6. Clean up the tables we created.
And here is the code for the unit test:
CREATE PROCEDURE dbo.ut_FnNum AS
BEGIN
    IF OBJECT_ID('actual')   IS NOT NULL DROP TABLE actual
    IF OBJECT_ID('expected') IS NOT NULL DROP TABLE expected

    SELECT no INTO actual FROM dbo.FnNum(5)

    SELECT no INTO expected
    FROM (SELECT 1 no UNION ALL
          SELECT 2 no UNION ALL
          SELECT 3 no UNION ALL
          SELECT 4 no UNION ALL
          SELECT 5 no) tmp

    EXEC dbo.tsu_AssertTablesAreEqual 'expected', 'actual', 'A.no = B.no',
        @@PROCID, 'Output of FnNum did not match expectation.'

    DROP TABLE actual
    DROP TABLE expected
END
We would then create the FnNum function. Here’s a simple way to write the function:
CREATE FUNCTION dbo.FnNum( @n INT )
RETURNS @num TABLE (no BIGINT IDENTITY(1,1))
AS
BEGIN
    DECLARE @c BIGINT;
    SET @c = 1;
    WHILE @c <= @n
    BEGIN
        INSERT INTO @num DEFAULT VALUES;
        SET @c = @c + 1;
    END
    RETURN;
END
A SQL expert may conceive of a better performing way to write this function. Using the test case, a developer can performance tune the function and make sure that the behavior has not changed.
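As one illustration of such tuning, here is a set-based rewrite of the function as an inline table-valued function. This is a sketch under stated assumptions: the name FnNumSetBased is hypothetical, and the two-digit cross join limits it to 100 rows (more digit tables could be joined in for larger ranges). The test case above would be pointed at this version to confirm the behavior is unchanged.

```sql
-- Hypothetical set-based alternative to dbo.FnNum. SQL Server 2000 has no
-- ROW_NUMBER(), so a cross join over a digits list is one common approach.
-- Limited to 100 rows by the two digit tables used here.
CREATE FUNCTION dbo.FnNumSetBased( @n INT )
RETURNS TABLE
AS
RETURN
(
    SELECT tens.d * 10 + ones.d + 1 AS no
    FROM (SELECT 0 d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
          UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
          UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) ones
    CROSS JOIN
         (SELECT 0 d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
          UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
          UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) tens
    WHERE tens.d * 10 + ones.d + 1 <= @n
)
```

The set-based form avoids the row-by-row INSERT loop, which is typically where the performance gain comes from.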
Clearly, even for a simple function there may be other tests that you would write. For example, should the function raise a specific error if a negative number is passed? How about 0? And so on. These specifications would all warrant new test cases to be created.
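For instance, a test for the zero case might look like the following sketch. The procedure name is hypothetical, and it assumes the framework treats a raised error as a test failure, which is one common convention.

```sql
-- Hypothetical additional test: FnNum(0) should return an empty result set.
-- Failure is signaled by raising an error, assuming the test runner
-- records errors raised inside a test procedure as failures.
CREATE PROCEDURE dbo.ut_FnNum_Zero AS
BEGIN
    DECLARE @cnt INT
    SELECT @cnt = COUNT(*) FROM dbo.FnNum(0)
    IF @cnt <> 0
        RAISERROR('FnNum(0) should return no rows.', 16, 1)
END
```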
One problem often encountered in writing tests for database procedures is the tight interconnections within the schema. Foreign keys and other constraints can make it difficult for a procedure to create, modify or remove data in a table. In fact, that is the point of these constraints; we do not want to write procedures that violate the integrity of our data.
However, this can make writing unit tests that exercise a single behavior difficult. If the procedure we are testing is designed to create records in one or two tables, it will be very labor intensive to create a test if these tables have relationships with a large web of other tables.
One common approach to this problem is to create enough ‘background’ data in the testing database to satisfy the constraints of all our tests. This data often gets in the way however and can become increasingly difficult to maintain as the database schema grows.
A more robust approach for unit testing is to isolate the tables under test by removing their constraints. This must be done with a certain amount of care however. First, there may be some tests that you’re writing that are designed to test those constraints. Second, you will want to be sure that those constraints are reapplied when you’re finished. Third, you may want to preserve the existing data in the table, or test with an empty or known set of data.
One way to achieve this in a reusable fashion is to have a procedure that renames the table and creates a table with the same column definition but without the constraints. By wrapping the test case in a transaction which is rolled back after the test, the original schema and data can be preserved.
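A much-simplified sketch of what such a procedure might look like follows. The name FakeTableSketch is hypothetical (the real helper used later in this article is tsu_FakeTable, whose internals are not shown here), and the assumptions are noted in the comments.

```sql
-- Simplified sketch of the rename-and-recreate approach.
-- Assumes the test runs inside a transaction that is rolled back afterwards,
-- which undoes both the rename and the substitute table.
CREATE PROCEDURE dbo.FakeTableSketch
    @TableName SYSNAME
AS
BEGIN
    DECLARE @NewName SYSNAME
    SET @NewName = @TableName + '_orig'

    -- Move the real table (and its data) out of the way.
    EXEC sp_rename @TableName, @NewName

    -- Recreate an empty table with the same columns via SELECT INTO,
    -- which copies column definitions (including NOT NULL and IDENTITY)
    -- but not primary keys, foreign keys, defaults or check constraints.
    EXEC ('SELECT * INTO ' + @TableName + ' FROM ' + @NewName + ' WHERE 1 = 0')
END
```

Note that because SELECT INTO still copies NOT NULL and IDENTITY properties, a production-quality version would need to decide whether to strip those as well.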
As an example, let’s suppose that Sebastian dropped his cell phone and the “1” key broke off.
Let’s create a function that returns the phone numbers in his directory that he can dial. The PhoneNumber table references the Person table. Our function does not care about the data that is in our Person table, so we will use a fake to help us test.
Here is the schema of the PhoneNumber table:
CREATE TABLE dbo.PhoneNumber (
    PersId      int REFERENCES dbo.Person,
    PhoneNumber varchar(10)
)
Our unit test fakes out the PhoneNumber table with a new table matching the same columns. The tsu_FakeTable procedure takes care of the details of renaming the table and creating the schema for a new PhoneNumber table without the constraints.
CREATE PROCEDURE dbo.ut_GetPhonesWithoutDigit AS
BEGIN
    IF OBJECT_ID('actual')   IS NOT NULL DROP TABLE actual
    IF OBJECT_ID('expected') IS NOT NULL DROP TABLE expected

    EXEC dbo.tsu_FakeTable 'PhoneNumber'

    INSERT INTO PhoneNumber(PersId, PhoneNumber) VALUES(1, '6205552233')
    INSERT INTO PhoneNumber(PersId, PhoneNumber) VALUES(2, '6105552233')
    INSERT INTO PhoneNumber(PersId, PhoneNumber) VALUES(3, '6205152233')
    INSERT INTO PhoneNumber(PersId, PhoneNumber) VALUES(4, '6205552213')
    INSERT INTO PhoneNumber(PersId, PhoneNumber) VALUES(5, '6205552234')
    INSERT INTO PhoneNumber(PersId, PhoneNumber) VALUES(8, '6205552235')
    INSERT INTO PhoneNumber(PersId, PhoneNumber) VALUES(12,'6205552236')

    SELECT * INTO actual FROM dbo.GetPhonesWithoutDigit(1)

    SELECT PersId, PhoneNumber INTO expected
    FROM (SELECT 1 ,'6205552233' UNION ALL
          SELECT 5 ,'6205552234' UNION ALL
          SELECT 8 ,'6205552235' UNION ALL
          SELECT 12,'6205552236') expected (PersId, PhoneNumber);

    EXEC dbo.tsu_AssertTablesAreEqual 'expected', 'actual', 'A.PersId = B.PersId',
        @@PROCID, 'Output of GetPhonesWithoutDigit did not match expectation.'
END
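The article does not show the function under test itself, but a minimal implementation satisfying this test might look like the following sketch, using CHARINDEX to reject any number containing the broken digit.

```sql
-- Hypothetical implementation of the function under test: returns all phone
-- numbers that do not contain the given digit anywhere in the string.
CREATE FUNCTION dbo.GetPhonesWithoutDigit( @digit CHAR(1) )
RETURNS TABLE
AS
RETURN
(
    SELECT PersId, PhoneNumber
    FROM dbo.PhoneNumber
    WHERE CHARINDEX(@digit, PhoneNumber) = 0  -- 0 means the digit is absent
)
```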
Another issue that we discovered early on in test driving our database code was that changing the schema broke our tests. The most obvious way of creating test data in a table is to use an INSERT statement. Let’s suppose we wrote a test case that inserted data into a table that would then be used by a procedure that we’re testing. Some time later, the table schema needs to change. For example, we add a column with a NOT NULL constraint. Suddenly, our existing test case fails because we can no longer insert the row using the old INSERT statement. For a single test case, this is no big deal: we just fix the INSERT statement and move on. However, if we have many of these statements, then we have built a rather fragile test suite. The column may not be important to these tests, but it is still causing them to fail.
One way we resolved this issue was to use stub records for our tests. Imagine a procedure that, given the name of a table and a value for its primary key, inserts a record into that table. It does this by looking at the table’s schema and creating default values for the columns that require a value, such as those with NOT NULL constraints and foreign keys. The test case writer could then issue an UPDATE statement against the table using the primary key value that they specified.
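A heavily simplified sketch of that idea is shown below. The name StubRecordSketch and the extra @PkColumn parameter are inventions for illustration; the real tsu_StubRecord used later handles more type families, identity columns and foreign keys, none of which are covered here.

```sql
-- Simplified sketch of a stub record generator for SQL Server 2000.
-- Reads syscolumns/systypes to find the NOT NULL columns and builds a
-- dynamic INSERT with a crude default per type family. Does not handle
-- IDENTITY columns or foreign keys.
CREATE PROCEDURE dbo.StubRecordSketch
    @TableName SYSNAME,
    @PkColumn  SYSNAME,   -- primary key column, passed in for simplicity
    @PkValue   INT
AS
BEGIN
    DECLARE @cols VARCHAR(4000), @vals VARCHAR(4000)

    SELECT @cols = ISNULL(@cols + ', ', '') + c.name,
           @vals = ISNULL(@vals + ', ', '') +
               CASE WHEN c.name = @PkColumn  THEN CAST(@PkValue AS VARCHAR(12))
                    WHEN t.name LIKE '%char%'   THEN ''''''      -- empty string
                    WHEN t.name = 'datetime'    THEN 'GETDATE()'
                    WHEN t.name LIKE '%binary%' THEN '0x00'
                    ELSE '0'
               END
    FROM syscolumns c
    JOIN systypes   t ON c.xusertype = t.xusertype
    WHERE c.id = OBJECT_ID(@TableName)
      AND c.isnullable = 0

    EXEC ('INSERT INTO ' + @TableName + ' (' + @cols + ') VALUES (' + @vals + ')')
END
```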
The following code illustrates the use of a stub record procedure in a test case. For this example, suppose we want to test an UPDATE trigger on the Person table. Every time a column in this table is updated, it should write an entry to the PersonLog table. Let us also suppose that the Person table has many columns. Most of these columns are not relevant to this particular test case; we simply want to make sure the logging works. If we had to write INSERT statements in each of our test cases, and then we decided to change some of the Person table columns, we would have a lot of work to do to fix these tests.
First the schema of the Person and PersonLog tables:
CREATE TABLE dbo.Person (
    Id        int NOT NULL PRIMARY KEY,
    Name      varchar(20) NOT NULL,
    BirthDate DATETIME NOT NULL,
    Gender    CHAR(1) NOT NULL,
    IsHealthy BIT NOT NULL,
    IsAlien   BIT NOT NULL,
    Height    NUMERIC(10,2) NOT NULL,
    Weight    NUMERIC(20,2) NOT NULL,
    COL01 VARBINARY(100) NOT NULL, COL02 VARBINARY(100) NOT NULL,
    COL03 VARBINARY(100) NOT NULL, COL04 VARBINARY(100) NOT NULL,
    COL05 VARBINARY(100) NOT NULL, COL06 VARBINARY(100) NOT NULL,
    COL07 VARBINARY(100) NOT NULL, COL08 VARBINARY(100) NOT NULL,
    COL09 VARBINARY(100) NOT NULL, COL10 VARBINARY(100) NOT NULL,
    COL11 VARBINARY(100) NOT NULL, COL12 VARBINARY(100) NOT NULL,
    COL13 VARBINARY(100) NOT NULL, COL14 VARBINARY(100) NOT NULL,
    COL15 VARBINARY(100) NOT NULL, COL16 VARBINARY(100) NOT NULL,
    COL17 VARBINARY(100) NOT NULL, COL18 VARBINARY(100) NOT NULL,
    COL19 VARBINARY(100) NOT NULL, COL20 VARBINARY(100) NOT NULL
)
GO
CREATE TABLE dbo.PersonLog (
    Id       int NOT NULL,
    LogDTime DATETIME NOT NULL
)
GO

And now the unit test:

CREATE PROCEDURE dbo.ut_PersLogTrigger AS
BEGIN
    --SETUP-------------------------------------------
    IF OBJECT_ID('actual')   IS NOT NULL DROP TABLE actual
    IF OBJECT_ID('expected') IS NOT NULL DROP TABLE expected
    DECLARE @TestStartTime DATETIME
    EXEC dbo.tsu_FakeTable 'PersonLog';
    --INPUTS------------------------------------------
    EXEC dbo.tsu_StubRecord 'Person',1
    EXEC dbo.tsu_StubRecord 'Person',2
    EXEC dbo.tsu_StubRecord 'Person',5
    EXEC dbo.tsu_StubRecord 'Person',6
    --EXECUTE-----------------------------------------
    SET @TestStartTime = GETDATE();
    UPDATE Person SET IsAlien = 1 WHERE Id = 1;
    UPDATE Person SET IsAlien = 1 WHERE Id = 5;
    UPDATE Person SET Weight  = 0 WHERE Id = 5;
    --VALIDATE----------------------------------------
    SELECT Id,
           CASE WHEN LogDTime BETWEEN @TestStartTime AND GETDATE()
                THEN 'Time in range'
                ELSE 'Time out of range'
           END DateCorrect
    INTO actual
    FROM dbo.PersonLog

    SELECT Id, DateCorrect INTO expected
    FROM (SELECT 1, 'Time in range' UNION ALL
          SELECT 5, 'Time in range' UNION ALL
          SELECT 5, 'Time in range') expected(Id, DateCorrect);

    EXEC dbo.tsu_AssertTablesAreEqual 'expected', 'actual', 'A.Id = B.Id',
        @@PROCID, 'Contents of PersonLog did not match expectation.'
    --CLEANUP-----------------------------------------
END
The stub record technique gave us the ability to create a suite of robust tests. These tests could survive schema changes that did not relate to the actual procedure under test. However, further reflection on this technique hints that its need indicates a code smell. Consider the parallel in the object oriented environment of a table to a class. Its columns would be analogous to the data members of the class. A record would be likened to an instance of the class, an object. If it is difficult to construct an object when writing a unit test, this may indicate that the class is too complex, or that it needs a better constructor method. If tests are constantly breaking when we modify data members of our class, then it indicates that our encapsulation may be faulty.
The same argument can be made for creating a record in a database table. If we have difficulty maintaining our tests because our columns change, then it may indicate an inefficient design. There are two possibilities: either we should have a robust way to create a record or our table is too complicated for the procedures we want to write. One potential solution is to provide a stored procedure for every table, which is used to insert data into the table. In addition to testability, this provides an abstraction layer between your business logic and schema. It may also support a better security model.
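A sketch of such a dedicated insert procedure for the Person table follows. The name PersonInsert and the chosen default values are illustrative assumptions, and the COL01 through COL20 columns are omitted for brevity (in practice they would receive parameter defaults as well).

```sql
-- Hypothetical per-table insert procedure. Parameter defaults let callers
-- (including test cases) supply only the columns they care about.
-- COL01..COL20 omitted for brevity; a real version would default them too.
CREATE PROCEDURE dbo.PersonInsert
    @Id        INT,
    @Name      VARCHAR(20)   = 'unnamed',
    @BirthDate DATETIME      = '19000101',
    @Gender    CHAR(1)       = 'U',
    @IsHealthy BIT           = 1,
    @IsAlien   BIT           = 0,
    @Height    NUMERIC(10,2) = 0,
    @Weight    NUMERIC(20,2) = 0
AS
BEGIN
    INSERT INTO dbo.Person
        (Id, Name, BirthDate, Gender, IsHealthy, IsAlien, Height, Weight)
    VALUES
        (@Id, @Name, @BirthDate, @Gender, @IsHealthy, @IsAlien, @Height, @Weight)
END
```

A test case could then call EXEC dbo.PersonInsert @Id = 1 and remain unaffected by most later column additions, since new columns would simply gain new defaulted parameters.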