It is one of the more common questions over on the SSIS Forums. How can I check if a record exists and if so, how can I check (quickly) if it has changed. Jamie Thomson has a blog post that outlines this, and I’ve made it a sticky post on the forums. However, if you need to check many, if not all of the columns in a table for changes, Jamie’s blogged method might be a bit laborious. This is where the Konesan’s Checksum transformation comes in. Read on…
The idea behind using a checksum is that you calculate it when you first see the record and then you store that checksum along with the record into the destination table. Later we can compare just the checksum value to see if this record has changed instead of physically checking every field. This concept is featured in Joy Mundy/Warren Thornthwaite’s book, The Microsoft Data Warehouse Toolkit, chapter 6.
Next, let’s set the stage by creating our table.
CREATE TABLE [dbo].[forums_checksum](
[column1] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[column2] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[column3] [int] NULL,
[hash] [int] NULL
) ON [PRIMARY]
Let’s use a flat file as our source that looks something like this (and I’ll only work with two rows for simplicity’s sake):
Now, inside SSIS, setup your flat file connection and set the delimiter correctly (semi-colon in this case). I’ll assume that you already know how to do this. Also, I should point out that “column1″ is the “key” column of the table for this exercise.
Here’s the package assembled in its entirety and after executing for the first time. The two records above are NEW and hence go down the insert path.
(NOTE – for each image below and elsewhere on this site, you can click on it to get a more readable image)
Selecting out of the [forums_checksum] table will show you the stored checksum: (select * from forums_checksum)
KEY1 XXXX 123 -1338461000
KEY2 ZZZZ 456 -546099347
Now, change the column3 value to 333 for the “KEY1″ record and rerun the package:
Note here now that we did not perform an insert at all. Instead our two records went down the “records that exist” path coming out of the lookup. Also note that we only updated one (1) record. The other record went into the row counter for illustration purposes. You could just not have anything coming out of the conditional split transformation, but I always like to capture all rows if I can.
Selecting from the table again will show our update:
KEY1 XXXX 333 995021855
KEY2 ZZZZ 456 -546099347
And that’s about it. Pretty simple, really. The only thing you’ll have to watch out for if you have millions of rows (perhaps) is the buffer to cache all of the key fields (plus the checksum field) in the lookup.
Below you will see how I’ve configured each of the main components of my package:
Second screen for the lookup:
OLE DB Command:
OLE DB Command parameter mappings: