Supporting a Database Refactoring with Tests and Views
As we develop a database application, we often realize that decisions made early on regarding normalization need to be changed. A recent post on the Agile Databases mailing list illustrates this need.
Let’s suppose we have an application that stores the office hours that employees make free for questions. The initial design allowed employees to enter their office hours in free form text and was stored in the following schema.
dbo.Employee |
---|
id INT |
firstName VARCHAR(50) |
lastName VARCHAR(50) |
officeHours VARCHAR(4000) |
Records may be stored as follows:
id | firstName | lastName | officeHours |
---|---|---|---|
101 | George | Franklin | Mon before noon, Tue btw 4 and 5 |
102 | Sarah | McGuire | MWF 11-12 |
103 | Jim | Young | Thur 3-4 and sometimes on Wednesday at 12 |
This information is displayed as a summary report, showing the employee’s name and their office hours. The client would like to keep this summary report, but also would like a calendar feature that displays the office hours. Clearly this table design does not support the calendar feature very well. It would be difficult to parse the officeHours column into a meaningful calendar format.
The decision was made to normalize the office hours information, forcing employees to enter office hours in a structured format. The new schema is as follows:
dbo.Employee |
---|
id INT |
firstName VARCHAR(50) |
lastName VARCHAR(50) |
dbo.AvailableHours |
---|
id INT |
employeeId INT CONSTRAINT FOREIGN KEY REFERENCES Employee (id) |
day VARCHAR(9) |
startTime VARCHAR(8) |
endTime VARCHAR(8) |
Our first step in implementing this refactoring should be to ensure that there are tests in place. To start, the SELECT statement that retrieves the summary report will be encapsulated in a view. This will allow us to write test cases for it.
CREATE VIEW dbo.AvailableHoursSummary AS SELECT firstName, lastName, officeHours FROM dbo.Employee;
The test case is simple:
CREATE PROCEDURE OfficeHoursTests.[test_AvailableHoursSummary_Lists_Hours_For_Each_Employee] AS BEGIN EXEC tSQLt.fakeTable 'dbo', 'Employee'; INSERT INTO dbo.Employee (id, firstName, lastName, officeHours) VALUES (1, 'George', 'Franklin', 'Mon before noon, Tue btw 4 and 5'); INSERT INTO dbo.Employee (id, firstName, lastName, officeHours) VALUES (2, 'Sarah', 'McGuire'. 'MWF 11-12'); CREATE TABLE OfficeHoursTests.expected ( firstName VARCHAR(50), lastName VARCHAR(50), officeHours VARCHAR(MAX) ); INSERT INTO OfficeHoursTests.expected (firstName, lastName, officeHours) VALUES ('George', 'Franklin', 'Mon before noon, Tue btw 4 and 5'); INSERT INTO OfficeHoursTests.expected (firstName, lastName, officeHours) VALUES ('Sarah', 'McGuire', 'MWF 11-12); SELECT firstName, lastName, officeHours INTO OfficeHoursTests.actual FROM dbo.AvailableHoursSummary EXEC tSQLt.assertEqualsTable 'OfficeHoursTests.expected', 'OfficeHoursTests.actual'; END; GO
Our next step is to create the new AvailableHours table as described above. We’ll preserve the officeHours column in the Employee table while we make the refactoring. We will also likely preserve the column longer so that data can be migrated. Notice that automatically converting the data into the new schema is not practical. Either employees will need to update their office hours or the administrator will need to do this manually for them. In the mean time, we can continue with the refactoring.
We’ll modify the previous test case to use the AvailableHours table.
CREATE PROCEDURE OfficeHoursTests.[test_AvailableHoursSummary_Lists_Hours_For_Each_Employee] AS BEGIN EXEC tSQLt.fakeTable 'dbo', 'Employee'; EXEC tSQLt.fakeTable 'dbo', 'AvailableHours'; INSERT INTO dbo.Employee (id, firstName, lastName) VALUES (1, 'George', 'Franklin'); INSERT INTO dbo.AvailableHours (id, employeeId, day, startTime, endTime) VALUES (1, 1, 'Monday', '1:00 PM', '2:00 PM'); INSERT INTO dbo.AvailableHours (id, employeeId, day, startTime, endTime) VALUES (2, 1, 'Tuesday', '4:00 PM', '5:00 PM'); INSERT INTO dbo.Employee (id, firstName, lastName) VALUES (2, 'Sarah', 'McGuire'); INSERT INTO dbo.AvailableHours (id, employeeId, day, startTime, endTime) VALUES (3, 2, 'Monday', '3:00 PM', '4:00 PM'); INSERT INTO dbo.AvailableHours (id, employeeId, day, startTime, endTime) VALUES (4, 2, 'Wednesday', '10:00 AM', '11:00 AM'); INSERT INTO dbo.AvailableHours (id, employeeId, day, startTime, endTime) VALUES (5, 2, 'Friday', '1:00 PM', '3:00 PM'); CREATE TABLE OfficeHoursTests.expected ( firstName VARCHAR(50), lastName VARCHAR(50), officeHours VARCHAR(MAX) ); INSERT INTO OfficeHoursTests.expected (firstName, lastName, officeHours) VALUES ('George', 'Franklin', 'Monday 1:00 PM-2:00 PM; Tuesday 4:00 PM-5:00 PM;'); INSERT INTO OfficeHoursTests.expected (firstName, lastName, officeHours) VALUES ('Sarah', 'McGuire', 'Monday 3:00 PM-4:00 PM; Wednesday 10:00 AM-11:00 AM; Friday 1:00 PM-3:00 PM;'); SELECT firstName, lastName, officeHours INTO OfficeHoursTests.actual FROM dbo.AvailableHoursSummary EXEC tSQLt.assertEqualsTable 'OfficeHoursTests.expected', 'OfficeHoursTests.actual'; END; GO
Now we need a way to modify the view to summarize the office hours data. We could do this using a complicated set of UNION clauses or PIVOT. However, this has the limitation that we must predefine how many rows from AvailableHours will be summarized. Instead, we will use a user defined function which will translate the office hours data into a single string given an employee’s id.
Here is the function and the change to the view:
CREATE FUNCTION OfficeHours.ConsolidateAvailableHours (@employeeId INT) RETURNS VARCHAR(MAX) AS BEGIN DECLARE @consolidatedHours VARCHAR(MAX); SET @consolidatedHours = ''; DECLARE @day VARCHAR(15), @startTime VARCHAR(15), @endTime VARCHAR(15); DECLARE hours CURSOR LOCAL FAST_FORWARD FOR SELECT day, startTime, endTime FROM OfficeHours.AvailableHours WHERE employeeId = @employeeId ORDER BY id; OPEN hours; FETCH NEXT FROM hours INTO @day, @startTime, @endTime; WHILE @@FETCH_STATUS = 0 BEGIN SET @consolidatedHours = @consolidatedHours + @day + ' ' + @startTime + '-' + @endTime + '; '; FETCH NEXT FROM hours INTO @day, @startTime, @endTime; END; CLOSE hours; DEALLOCATE hours; RETURN RTRIM(@consolidatedHours); END; GO CREATE VIEW OfficeHours.AvailableHoursSummary AS SELECT firstName, lastName, OfficeHours.ConsolidateAvailableHours(id) AS officeHours FROM OfficeHours.Employee GO
Now that we’ve implemented the new database structure, we are free to begin the calendar feature. We’ve demonstrated that a change to the database structure does not have to halt development and can be appropriately unit tested. The code referencing the old structure can still function until the data has been fully migrated. If, in the mean time, both sources of data need to be considered, the function could be updated to query from the Employee.officeHours field and the AvailableHours table.