Sql Table Refactoring Solution

Steve Smith posted an interesting SQL Table Refactoring Challenge on his blog. I am prepared to go out on a limb and share my solution for how to make a table more efficient.

The first step I would take is to change the columns of the table a little bit.

  • Change CountryCode to a char(2) since they are all 2-character ISO standards. The varchar does limit itself based on the size of the data, but has an overhead of 2 bytes. This will also prevent anything longer from being entered into the table.
  • If you’re feeling very ambitious you can make a CountryId column that is of type smallint and uses the id of the country instead of the 2-characters. This will require an update to all of the existing data so I would say this isn’t worth the effort.
  • AboveFold can be changed to a tinyint. Note: these values are 0 to 255 so we will need to remap the values in the database.
  • A smallint can be used for the Clicks column, since it doesn’t need to get very large.
  • Period can change to a date. This will take up 3 bytes which is much smaller than the 8 for the datetime.

As an extra note I believe that int is required to get up to 50,000 for the tables which require it. If we could drop the limit to around 30000 we could use a smallint for those.

Once I have that table created I create 2 duplicate tables (duplicated schema not all of the data). [lq_ActivityLogLoad] and [lq_ActivityLogLoadNext] At any given time we will write to one of these two tables and not the huge table.

Then I create a job which will switch to which of these two tables we are writing. After switching the job will load all of this data directly into the large table using an Upsert (Update or Insert). After loading the data from the table it can empty out that load table and wait a minute and perform this action over again. Doing this loading pattern will make it so the writes will be on these small tables which aren’t being read from.

I would change the primary key to be based on the ID and put a unique non-clustered index on the columns currently used for the primary key preserving the safety from the unique constraint.

Here is the table I created for this.

CREATE TABLE [dbo].[lq_ActivityLog](
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[PlacementID] [int] NOT NULL,
[CreativeID] [int] NOT NULL,
[PublisherID] [int] NOT NULL,
[CountryCode] [char](2) NOT NULL,
[RequestedZoneID] [int] NOT NULL,
[AboveFold] [tinyint] NOT NULL,
[Period] [date] NOT NULL,
[Clicks] [smallint] NOT NULL,
[Impressions] [int] NOT NULL,
CONSTRAINT [PK_lq_ActivityLog] PRIMARY KEY CLUSTERED
(
[ID] ASC
)
 

Keep Binaries in Source Control

No, not all of them. Just the ones required for your build process. Doing this will make your life a lot easier. You want to reference all of your assemblies locally and keep them in source control. You also want all of your testing executables kept there. If you have anything extra your build process depends on make sure you include that as well. A great example is a post I did about building a library which uses OpenAccess on a build server. The build server of course doesn’t have OpenAccess installed. That would be crazy.

You get a lot of great benefits out of keeping these files local and in source control. Since libraries change infrequently, they will not require much source control storage space. This also means that when updating from source control you will rarely have to wait for an update. If you do have to wait you probably needed it anyway, and would have just been waiting on an Internet download instead.

Build Server Benefits

If you have a build server this will allow you to build your application without having to install your tools and libraries on the server. When the server gets the latest version of your project from your source control server, it will already have the current versions of all of your tools and libraries. This keeps things simple and easy to work with.

If you don’t go this route you will have to install everything on the build server and make sure that all of the paths still match. Ever dealt with the Program Files (x86) folder?

New Computer Benefits

Have a new developer joining the project or just a fresh install of Windows on your computer? Great! Now just go and find the 10 different libraries and tools that are required to build this project. Once you have those make sure that they’re in the correct folders… Wouldn’t that be annoying?

If you keep those tools in source control and always reference with relative paths, no one will have to hunt for all of the tools required for the build. If you were referencing libraries in their install locations you would have all kinds of problems if something installed in a bad place. For example I’ve seen libraries try to install in a user-specific directory. Yes, that is a worst case example, but it can happen.

Keep everything relative and you can move your code anywhere. I want to be able to copy everything in source control onto a freshly installed copy of Windows and have it build. If I can do that, I know the project has a good structure.

Old Versions of the Project

Ever want to go back to an old version of your project and look at something? I bet you had some trouble building if you’ve changed the versions of any libraries you’re using. If the specific version you need is with that code you never have to worry. At any snapshot in time the repository has the correct build scripts, libraries, tools, utilities, and source code. You’re all set to go back to any point in the history of the application and take a look.

You might be saying, “what if I have changed versions of my IDE?” It shouldn’t matter. If you’ve got build scripts in place you can use those to get that app running.

Some people try to keep track of which versions of which tools are used for which release of an application, but if you’ve got them in source control you don’t have to worry about it. Just get the version you want, and the required files come with it.

Now that’s what I call a clean, self-contained build.

Agile Story Estimation

One challenging part of software development is estimating the amount of time that tasks will take. Why is this a challenge? There are plenty of reasons: sometimes you’re dealing with legacy code, sometimes there are a great deal of unknowns, and sometimes you aren’t doing the whole story and can’t estimate the other work. There are plenty of other reasons why estimating stories is difficult, so how do you resolve the difficulties and get things estimated.

 

Improving Estimates

I find it is best to bring the whole group together to make estimates. With a group, you’re less likely to forget about different aspects of the project. You’ll have everyone there who will be doing the tasks. Someone there may know more about the legacy system or a certain part of it than you do.The group should help prevent you from making any unrealistic estimates.

Another great way of improving your story estimates is to do the estimating shortly before you’re going to be doing the work. The best estimates you can make can only be done immediately before doing the task. Any earlier and you don’t know exactly how the system will be architected. Estimate any earlier and the system may have become easier or harder to change before you start the task.

To help build group consensus on the estimates you want to limit the number of options available. With too many numbers it will be very hard to get everyone to agree. You also want to have less options as the numbers get larger. The reason being that larger estimates are less accurate anyway, so you really just want a magnitude for the estimate. Is this a day-long task? More?

Find some nice system to limit the numbers. A great way that I’ve seen used in the past is Fibonacci numbers only.

1, 2, 3, 5, 8, 13, 21…

That set allows for some accuracy at the bottom and some good general choices up higher. 8 is great because it is a full-day.

My favorite set of numbers for estimating is using the powers of 2.

1, 2, 4, 8, 16, 32…

I’ll explain later why I like this set of numbers.

Another easy set to use is Prime Numbers (with 1 included).

1, 2, 3, 5, 7, 11, 13

Whatever set you choose make sure that the group knows which ones you’re using and stick with it.

Implementing Estimates

Once you select your strategy for estimating you’ll need to implement it. Estimating should be done close to when the task will be done, so I recommend estimating on each Monday all of the stories for the week.

Bring everyone together and make sure they understand how the estimating will be done. You want to start by reading one of the stories. Let the team ask any questions about the story. Once everyone understands the task you get to estimate.

Make sure that everyone selects their estimate secretly this is important so that no one’s estimate is influenced by others’ estimates. I’ve seen a few interesting ways of doing this.

The cheapest approach is to have each person hold up a number of fingers equal to his estimate, and then all estimators reveal their estimates at the same time. This is kind of like a game of rock-paper-scissors. We can call this approach Rock-Paper-Estimate.

Another easy approach is to have everyone write down the number and reveal at the same time. You could use paper for this or little whiteboards.

A very cool trick that was brought to my attention recently came from someone handing me a deck of cards from the Visual Studio 2010 launch event in Las Vegas. Planning Poker. It is basically a deck of cards with estimates on the cards. There are 4 different colors, so 4 people could be estimating at any given time with a single deck.

IMG_0359

IMG_0360

Whatever method you use, make sure you work for the group's agreement and try to keep things fast-paced.

Agile Resources - The Agile Actors

Agile development has a few roles that people will be playing. Those roles define certain responsibilities that the parties involved will need to take on. Each role serves an integral role in the development process.

Product Owner

This is the representative of the customer’s interests. The product owner is responsible for knowing the details of all the work that needs to be done. When a developer has a question about how something should work in the system the product owner is the one with the answer.

The big picture is maintained by the product owner, who will be there to remind everyone of the big picture as the project is being worked on. This role is vital to the development without a project. If you’ve ever worked on a project where the product owner was brought in at the beginning once and once at the end I am sure you know that things didn’t go well.

Knowing where the project is going and making sure it isn’t deviating from what is important cannot be stressed enough, and these are vital for the product owner.

Team Lead

Agile teams work very fluidly. During development, most team members are the same. The team lead has some extra responsibilities including: helping to resolve blocking and impeding issues, guiding stand ups, guiding retrospectives, guiding iteration planning sessions. I like the term “team lead”, because this person isn’t really the boss. This role is here to help guide the project and the progress. The leader is the person making sure that the agile process stays on the tracks.

The leader doesn’t make decisions. The leader helps keep everyone moving by clearing the path and making sure everyone sticks together.

Team Member

These are the people creating the application. The team lead is also one of these people. They will be breaking stories into tasks, estimating the size and difficulty of the stories, and working on the stories which create the application. If something needs to be fixed, created, or modified, these are the people who will be working on it.

Fluidity is the word to describe the development team. Each member of an agile team must wear multiple hats. The developers will be designing, developing, and testing the application throughout the process.

Agile Resources - Development Process Recurring Steps

Different processes are used to create software applications. Agile software development is about responding and embracing change. It is about working closely with the customer instead of the customer just saying what they want up front. Agile teams follow processes during development, but they’re flexible, quick processes which allow the continue to continue making advances.

Most agile processes involve some sort of iterative development. Iterating allows for quick changes in the allocation of developer resources, which means responding to changes when the response is needed. The following is a list of each iterating piece of the development process grouped by how often the step occurs.

Months

Releases – if you’re working on an application end users will need to update when you release then your releases will likely be done in terms of months. Releases will include collections of features as well as bug fixes. Planning a release means selecting the features you’ll want to complete within the release time period. This is not specifics, but is only a large scale decision of what features will be worked on most of the iteration. (This plan needs to be flexible as it will likely change.)

Weeks

Iterations – an iteration usually lasts one or two weeks and is the primary development cycle for the application. During an iteration planning meeting, the team and the product owner will collaboratively decide which stories will be completed within the iteration. Everyone will decide and commit to achieving these goals. Iterations short, planning session as well as a short retrospective at the end. This communication keeps the team on the same page and allows the process to change as needed.

Minor Releases – if your application allows for it, which means that your application doesn’t require an update on everyone’s computer, an update should be released with every iteration. Keeping releases small makes them a lot easier and less likely to cause problems.

Days

Standing Meeting – during this meeting no one should sit down. It should take only a few minutes, so there is no need to be sitting. This is a great time for everyone to let others know of their progress as well as their plans for the rest of the day. This is a great time to inform everyone of any issues as well as to get questions answered which have quick answers. These help keep everyone in the loop, so no members are left out.

Hours

Change Pair Programmers – I believe highly in pair programming. From what I’ve seen, it produces much higher quality code. The system works best when pairs are changed in terms of hours. The pair needs to work together long enough to avoid the costs of context switching and often enough that knowledge and technique is spread thoroughly in the application and the team.

Minutes

Unit Tests – I think everyone should be unit testing. These should be written all the time and you should be writing your next unit test within minutes of your previous one. These should be written quickly and easily. The challenge of unit testing is getting to a point where the testing is easy. Once writing tests is easy, you’re testing “the right way”.

Code Commits – commit your changes frequently. A lot can happen to a code base in an hour. Keep checking in frequently. It shouldn’t be so often that it impairs your work, but you really only want to be impacting a handful of files. Long periods between commits mean they’re large and difficult to manage. If you have to write more than one thing in the description of your commit, you’re probably committing too much. (The Single Responsibility Principle should apply to code commits as well.)