Feb 202007
 

Generating surrogate keys is no sweat in SSIS. Though it does require a custom script to be written. (You might be able to use the Row Number Transformation)

Below is a screen shot of the control flow for a package I’ve built. To make this work (and I’m assuming that you’ll be interested in starting with a key value greater than what’s in the table already) you’ll need two tasks on the control flow: Execute SQL Task & Data Flow.

SurrogateKeys01

You can see the Execute SQL task below. It’s pretty simple. Set the ResultSet to “Single row”, type in the query, return it to a variable, User::MaxKey. Here’s how that works:

SurrogateKeys02

Here you can see the initial setup. Pretty straightforward. The SQLStatement is simply:

select maxkey = case
when max(clientkey) is null then 0
else max(clientkey)
end
from clientdim

SurrogateKeys03

There are no parameter mappings to speak of, so the next thing we’re interested in will be the Result Set section. Set the Result Name to 0 (zero) and the Variable Name to which ever variable you’ve defined. In this case, I’ve used User::MaxKey. This variable, whatever you call it, should be of integer type.

Next, in the data flow you should be able to perform whatever you need to do. When you determine where you want to introduce the surrogate key into the pipeline, add a script component. It should be set to transformation.

SurrogateKeys04

Here you can see the rest of the pipeline in my data flow. I use a row count transformation to simply count the rows going to the destination table. This is not a requirement. Edit the Script Component.

SurrogateKeys05

Here I just want to illustrate that I did not chose to include any input columns. We don’t operate on any input columns from the data flow in this situation, so leave them blank.

SurrogateKeys06

Inthis screen shot, notice that I’ve added one output column, ClientKey. It is a four-byte signed integer. This will be the name of the added column to the data flow.

SurrogateKeys07

In this screen shot the one thing I’m interested in is the fact that we’ve added the variable, MaxKey, to the list of ReadOnlyVariables. The next step involves the script itself, so click on that “Design Script…” button. The script follows:

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain
Inherits UserComponent
Private NextKey As Int32 = 0

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

Dim MaximumKey As Int32 = Me.Variables.MaxKey ‘ Grab value of MaxKey which was passed in

‘ NextKey will always be zero when we start the package.
‘ This will set up the counter accordingly
If (NextKey = 0) Then
‘ Use MaximumKey +1 here because we already have data
‘ and we need to start with the next available key
NextKey = MaximumKey + 1
Else
‘ Use NextKey +1 here because we are now relying on
‘ our counter within this script task.
NextKey = NextKey + 1
End If

Row.ClientKey = NextKey ‘ Assign NextKey to our ClientKey field on our data row
End Sub

End Class

EDIT: For those of you looking for the C# version of this script for SSIS 2008, here you go:

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    private int NextKey = 0;

     public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        int MaxKey = Variables.MaxClientKey;

        if (NextKey == 0)
        {
            NextKey = MaxKey + 1;
        }
        else
        {
            NextKey = NextKey + 1;
        }

        Row.ClientKey = NextKey;

    }

}

And that’s it. Close the script and the designer for the script component. In the destination (or the next component after the script transformation) you’ll be able to use the new column, ClientKey in my case.

  14 Responses to “Generating Surrogate Keys”

  1. Phil,

    It can be hard to generate a key counter in ssis and the row number transformation makes live a little easier.

    But why not use an identity column in the destination?

  2. It’s not really hard as I’ve illustrated.

    The issue (and why I hate identity columns) is that the data is more portable when you generate your own surrogate keys. When using identity columns and have referential integrity enforced, moving data from one landscape to another generally breaks that integrity. This is especially important in fact/dimension modeling. When using your own counter as I’ve talked about (I’m not the pioneer in this concept, by the way) it is quite simple and accurate to copy the data from one landscape to another without breaking the integrity.

  3. Hi Phil,

    I wonder if the red arrow contents new dim data to be inserted. And It might comes from the Checksum transformation.

    In your scenario, every new record will be assigned a new surrogate key after it has gone thru the Script component, will it?

    Are you sure that the select statment always return the maxValue exactly when the data haven’t arrived to the destinestion yet.

    thank you for sharing experience,

    khanhmy

  4. Hi Phil,

    Generating surrogate key is very very cool!

    Thanks alot,

  5. Khanhmy,
    The red arrow in this case is coming off of a lookup transformation and indicates NEW records that do not exist in the destination.

    Every NEW record will pass through the script transformation and hence obtain a new, unique surrogate key.

    I am positive that the select statement will always return the maxValue correctly, because it happens before the dataflow. The select statement is needed to see what the current maximum surrogate key is in the table before we start processing records. No other process touches the destination table (in my case) so yes, it is correct in its operation.

  6. Hi Phil,

    I’ve already inserted new rows into DimTable using “generating surrogate key”. Everything runs well. :) :P :D

    I’ve read the topic: “Checking to see if a record exists and if so update else insert” many times.

    Unfortunely, in my case, existing data needs update as historical data and then all changed rows are inserted as current data.

    Would you give me some suggestions not to route notEqualChecksum output to staging table?

    In my scenario, the notEqualChecksum output first goes straight to the SQLCommand for update after that to staging destination.

    For inserting changed rows, I must get the MaxKey and generate surrogate keys again.

    I have not much changed data to store ouside the buffer. So, It seem I’m going on the long way. :(

    Hope you can help me! Thank you!

    khanhmy

  7. Thanks – this is a great solution and tutorial.

    The function should have been in the product to start with.
    You have used to opportunity to present a handy introduction to the data transformation script component.

  8. The solution is very good. Thanks for it.
    But you mentioned that Row Number Transformation task can also be used. I don’t know how. If I want to use incremental data extraction I have to pass the biggest surrogate key (already ditributed to the dimension table) to the Row Number Transformation’s Seed property. But it is not exposed through Expression so I can’t change its value dynamicly. Maybe I miss something or the task is quite inflexible.

  9. We are running into issues when generating surrogate keys in Sql Server 2008. the code works great with SQl server 2005.

    The surrogate key does not get passed to the package.

    Any ideas????

  10. Hi,

    This seems to be a good solution. But Can we have some custom component to generate incremental value so that the logic can be reused, instead of using SQLTask and script component in each packages.

  11. I have the same problem as AS. In SSIS 2008, I can’t get the key from the script into the OLE DB Destination task. When I put a dataviewer between the Script Transformation task and the OLE DB Destination task, it doesn’t show anything.

    Thanks,
    Bob

  12. Good blog post. I certainly love this website. Stick with it!

  13. Thanks for news , I like it. And in general, I guess the stuff is cool .I am not sure that site already helped me with what I look for because my blog is about flowers in Ukraine, but this is good start. For sure it is good start and I shall get back here often . Please do not stop. Cannot get enough of your content!

  14. You are so cool! I don’t believe I’ve read a single thing like this
    before. So good to discover someone with a few genuine thoughts on
    this subject. Seriously.. thanks for starting this up.
    This site is one thing that is required on the internet, someone
    with some originality!

 Leave a Reply

(required)

(required)

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

css.php