Sep 042009
 

There are many, many resources out on the ‘Net regarding SSIS and the Lookup component and what each of its cache modes are and how to implement them in your own package. This is going to be a technical post, for those of you interested in what each cache mode does behind the scenes.

For this post, use the following schema and data:

create table fact_sales 
(id int identity(1,1),
 sales_rep_id int,
 sales_dollars decimal(18,2)
)

create table dim_sales_rep 
( id int identity(1,1),
  first_name varchar(30),
  last_name varchar(50)
  )
  
insert into fact_sales (sales_rep_id, sales_dollars) values (1,120.99);
insert into fact_sales (sales_rep_id, sales_dollars) values (2,24.87);
insert into fact_sales (sales_rep_id, sales_dollars) values (3,98.11);
insert into fact_sales (sales_rep_id, sales_dollars) values (4,70.64);
insert into fact_sales (sales_rep_id, sales_dollars) values (4,114.19);
insert into fact_sales (sales_rep_id, sales_dollars) values (4,37.00);
insert into fact_sales (sales_rep_id, sales_dollars) values (5,401.50);

insert into dim_sales_rep (first_name, last_name) values ('John','Doe');
insert into dim_sales_rep (first_name, last_name) values ('Jane','Doe');
insert into dim_sales_rep (first_name, last_name) values ('Larry','White');
insert into dim_sales_rep (first_name, last_name) values ('Carrie','Green');
insert into dim_sales_rep (first_name, last_name) values ('Adam','Smith');

FULL Cache Mode
First, it is always advisable to build a query for the lookup, instead of choosing a table in the Table/View drop-down. The primary reason is so that you can limit the resultset to only the columns needed to perform the lookup as well as return any columns needed downstream, and to have the ability to add a WHERE clause if needed.

The full cache mode will run the specified query (or its own depending on how you assigned the lookup table) and attempt to cache all of the results. It will execute this query very early on in the package execution to ensure that the first set of rows coming out of the source(s) are cached. If SSIS runs out of memory on the machine though, the data flow will fail as the lookup component will not spool its memory overflow to disk. Be cautious of this fact. Once the data is cached, the lookup component will not go back to the database to retrieve its records, so long as the data flow is not restarted. (In SQL Server 2008, you can now reuse lookup caches.)

Using SQL Profiler, you can see that only one database call is made:

declare @p1 int
set @p1=1
exec sp_prepare @p1 output,NULL,N'select sales_rep_id, sales_dollars
 from fact_sales',1
select @p1
go
exec sp_execute 1
go
SET NO_BROWSETABLE ON
go
declare @p1 int
set @p1=1
exec sp_prepare @p1 output,NULL,N'select id, first_name, last_name
from dim_sales_rep',1
select @p1
go
exec sp_execute 1
go
exec sp_unprepare 1
go
exec sp_unprepare 1
go

PARTIAL Cache Mode
Partial cache mode will not execute a query immediately at package execution. Instead, it will wait until its first input row arrives. Once the row arrives, whatever lookup value (in this case, sales_rep_id) is being passed in, will get substituted for a parameter, and then SSIS will send the query to the database for retrieval. At this point, all of the data returned will be cached for future lookups. If a new sales_rep_id is encountered, then the query will have to be re-executed, and the new resultset will get added to the lookup cache.

In other words, in the above data, if my source is “select sales_rep_id, sales_dollars from fact_sales”, we should have five database calls made by the lookup component. Even though for sales_rep_id = 4 we have three entries, in partial cache mode the first time we retrieve the lookup records for sales_rep_id = 4, the results will be cached, allowing future occurrences of sales_rep_id = 4 to be retrieved from cache.

This is illustrated in the SQL Profiler data:

exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',1
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',2
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',3
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',4
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',5
go
exec sp_unprepare 1
go


In the above data, you can see at the end each sales_rep_id being passed in. Note that we only have one line for sales_rep_id = 4. That’s because the remaining two records were bounced against the lookup cache, avoiding a trip to the database.

NO Cache Mode
Using the NO Cache Mode will essentially tell SSIS that you want each incoming row (from fact_sales in this case) to be bounced against the database. Since we have seven fact_sales rows, we will see seven calls to the database – MOST of the time. It is important to note that even though we are telling the lookup component to avoid caching rows, it will keep the last match in memory and use it for the next comparison. If the next comparison’s key value matches the value still in memory, a database call is avoided, and the value is carried forward.

In our example data above, if we sort by sales_rep_id, we will still only have five calls to the database because after we lookup our first value of sales_rep_id = 4, it will be reused for the subsequent lookups for sales_rep_id = 4. If we sort our data by sales_dollars, we will have six database calls, because only two sales_rep_id = 4 records are together and hence the first lookup is only used once.

Here is a simple table illustrating each no cache example mentioned above:

SALES_REP_ID, SALES_DOLLARS, LOOKUP DATABASE CALL Y or N
1       120.99    Y
2       24.87     Y
3       98.11     Y
4       70.64     Y
4       114.19    N
4       37.00     N
5       401.50    Y


SALES_REP_ID, SALES_DOLLARS, LOOKUP DATABASE CALL Y or N
2      24.87      Y
4      37.00      Y
4      70.64      N
3      98.11      Y
4      114.19     Y
1      120.99     Y
5      401.50     Y

The SQL Profiler data for the second example above is here:

exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',2
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',4
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',3
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',4
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',1
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',5
go

  29 Responses to “SSIS – Lookup Cache Modes – Full, Partial, None”

  1. Hi, thanks for this great article.
    What would your recommended lookup caching strategy if I’m looking to do Upserts on a database.
    My dataflow would probably be a lookup that decides if a row exists, and then if it isn’t the rows are redirected to a OLEDB (fast load) insert component, if it exists, the rows go to an OLEDB command component that performs update statements.
    It is possible that the data flowing through the data flow have duplicate Ids..
    how can I configure the lookup to ensure it detects not only duplicates on disk at the start of import?

    thanks heaps.

  2. Thats great article!

    thanks a lot :)

  3. Great article! Very helpful, indeed.

  4. Helpful!

  5. Excellent article, very helpful..

  6. Nice one

  7. Nice article.

  8. Excellent way of articulating the differences

  9. Good one,
    very useful info for my current implementation.

  10. Hello ? excellent website you’ve produced. We loved scanning this posting. I did want to write the remark in order to tell you just how the design of this site is very great looking. I used to be a graphic designer, now I am a duplicate editor. I’ve always enjoyed working with computers and ‘m trying to learn code in my free time.

  11. hello friends, how are you I wish you the best conversations

  12. This web site is really a walk-via for all the data you wished about this and didn’t know who to ask. Glimpse right here, and you’ll positively uncover it.

  13. An impressive share, I just given this onto a colleague who was doing a little analysis on this. And he in fact bought me breakfast because I found it for him.. smile. So let me reword that: Thnx for the treat! But yeah Thnkx for spending the time to discuss this, I feel strongly about it and love reading more on this topic. If possible, as you become knowledgeable, would you mind updating your blog with more information? It is highly helpful for me. Big thumb up for this blog post!

  14. Very good article with straight forword information.

  15. Hello,

    I just wrote an blog post about the importance of creating a covering index when using the Lookup component in No Cached or Partial Cache modes.

    See: http://www.sqlbadpractices.com/ssis-non-cached-lookups-without-a-covering-index/

  16. Hi,
    I am very new to this technology .The article is good.It is more useful if you could show some snapshot of the lookup properties as well
    Thanks,
    Nidhin

  17. Laughter would be bereaved if snobbery died.

  18. Nice Article !!

    Regards
    Naveen

  19. Really Great I understand a lot :-)

  20. Good article and easy to understand.

  21. Awesome explanation, thank you so much!

  22. the article was short and easy to understand.
    thanks

  23. I could ask a lot of questions about this information, but you have laid everything out on the table here to the point it is easy to grasp. I agree with much of your article.

  24. I like material like this. This is a great article and I really enjoyed reading it. You have an original style that makes your ideas stand out from other writers.

  25. Zune and iPod: Most people compare the Zune to the Touch, but after seeing how slim and surprisingly small and light it is, I consider it to be a rather unique hybrid that combines qualities of both the Touch and the Nano. It’s very colorful and lovely OLED screen is slightly smaller than the touch screen, but the player itself feels quite a bit smaller and lighter. It weighs about 2/3 as much, and is noticeably smaller in width and height, while being just a hair thicker.

  26. Very nice,simple and informative

  27. Most helpful!

  28. Great Article .Please share some knowledge about CCM.

 Leave a Reply

(required)

(required)

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

css.php