<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>SSIS Talk &#187; SSIS Advanced Techniques</title>
	<atom:link href="http://www.ssistalk.com/category/ssis/ssis-advanced-techniques/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ssistalk.com</link>
	<description>Random thoughts and experiences with SSIS, by Phil Brammer</description>
	<lastBuildDate>Wed, 01 Feb 2012 12:48:29 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>SSIS &#8211; Lookup Cache Modes &#8211; Full, Partial, None</title>
		<link>http://www.ssistalk.com/2009/09/04/ssis-lookup-cache-modes-full-partial-none/</link>
		<comments>http://www.ssistalk.com/2009/09/04/ssis-lookup-cache-modes-full-partial-none/#comments</comments>
		<pubDate>Fri, 04 Sep 2009 14:32:44 +0000</pubDate>
		<dc:creator>Phil Brammer</dc:creator>
				<category><![CDATA[SSIS]]></category>
		<category><![CDATA[SSIS Advanced Techniques]]></category>
		<category><![CDATA[SSIS Data flow]]></category>

		<guid isPermaLink="false">http://www.ssistalk.com/?p=162</guid>
		<description><![CDATA[There are many, many resources out on the &#8216;Net regarding SSIS and the Lookup component, covering what each of its cache modes is and how to implement them in your own package.  This is going to be a technical post, for those of you interested in what each cache mode does behind the scenes.

For [...]]]></description>
			<content:encoded><![CDATA[<p>There are many, many resources out on the &#8216;Net regarding SSIS and the Lookup component, covering what each of its cache modes is and how to implement them in your own package.  This is going to be a technical post, for those of you interested in what each cache mode does behind the scenes.</p>
<p><span id="more-162"></span></p>
<p>For this post, use the following schema and data:<br />
<code>
<pre>
create table fact_sales
(id int identity(1,1),
 sales_rep_id int,
 sales_dollars decimal(18,2)
)

create table dim_sales_rep
( id int identity(1,1),
  first_name varchar(30),
  last_name varchar(50)
  )

insert into fact_sales (sales_rep_id, sales_dollars) values (1,120.99);
insert into fact_sales (sales_rep_id, sales_dollars) values (2,24.87);
insert into fact_sales (sales_rep_id, sales_dollars) values (3,98.11);
insert into fact_sales (sales_rep_id, sales_dollars) values (4,70.64);
insert into fact_sales (sales_rep_id, sales_dollars) values (4,114.19);
insert into fact_sales (sales_rep_id, sales_dollars) values (4,37.00);
insert into fact_sales (sales_rep_id, sales_dollars) values (5,401.50);

insert into dim_sales_rep (first_name, last_name) values ('John','Doe');
insert into dim_sales_rep (first_name, last_name) values ('Jane','Doe');
insert into dim_sales_rep (first_name, last_name) values ('Larry','White');
insert into dim_sales_rep (first_name, last_name) values ('Carrie','Green');
insert into dim_sales_rep (first_name, last_name) values ('Adam','Smith');
</pre>
<p></code></p>
<p><strong>FULL Cache Mode</strong><br />
First, it is always advisable to build a query for the lookup instead of choosing a table in the Table/View drop-down.  The primary reason is that a query lets you limit the result set to only the columns needed to perform the lookup plus any columns needed downstream, and lets you add a WHERE clause if needed.</p>
<p>The full cache mode will run the specified query (or its own, depending on how you assigned the lookup table) and attempt to cache all of the results.  It will execute this query very early on in the package execution to ensure that the reference data is cached before the first set of rows comes out of the source(s).  If SSIS runs out of memory on the machine though, the data flow will fail, as the lookup component will not spool its memory overflow to disk.  Be cautious of this fact.  Once the data is cached, the lookup component will not go back to the database to retrieve its records, so long as the data flow is not restarted.  (In SQL Server 2008, you can now reuse lookup caches.)</p>
<p>Using SQL Profiler, you can see that the lookup makes only one database call (the second query below; the first query is the data flow source reading fact_sales):<br />
<code>
<pre>
declare @p1 int
set @p1=1
exec sp_prepare @p1 output,NULL,N'select sales_rep_id, sales_dollars
 from fact_sales',1
select @p1
go
exec sp_execute 1
go
SET NO_BROWSETABLE ON
go
declare @p1 int
set @p1=1
exec sp_prepare @p1 output,NULL,N'select id, first_name, last_name
from dim_sales_rep',1
select @p1
go
exec sp_execute 1
go
exec sp_unprepare 1
go
exec sp_unprepare 1
go
</pre>
<p></code></p>
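<p>To make the full cache behavior concrete, here is a minimal sketch in Python rather than SSIS.  It is only a simulation under stated assumptions: the dictionary and list stand in for the dim_sales_rep and fact_sales tables above, and the function name full_cache_lookup is hypothetical.  It shows one up-front reference query followed by purely in-memory matches.</p>

```python
# Hypothetical in-memory stand-ins for the two tables defined earlier.
DIM_SALES_REP = {
    1: ("John", "Doe"),
    2: ("Jane", "Doe"),
    3: ("Larry", "White"),
    4: ("Carrie", "Green"),
    5: ("Adam", "Smith"),
}
FACT_SALES = [
    (1, 120.99), (2, 24.87), (3, 98.11), (4, 70.64),
    (4, 114.19), (4, 37.00), (5, 401.50),
]


def full_cache_lookup(rows, reference):
    """Simulate full cache mode: load everything once, then match in memory."""
    db_calls = 1                # the single up-front reference query
    cache = dict(reference)     # the whole reference set is held in memory
    matched = [(rep_id, dollars, cache.get(rep_id)) for rep_id, dollars in rows]
    return matched, db_calls


matched, db_calls = full_cache_lookup(FACT_SALES, DIM_SALES_REP)
print(db_calls)      # 1
print(matched[3])    # (4, 70.64, ('Carrie', 'Green'))
```

<p>The sketch does not model the memory pressure described above; in a real package the cached reference set has to fit in memory.</p>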
<p><strong>PARTIAL Cache Mode</strong><br />
Partial cache mode will not execute a query immediately at package execution.  Instead, it will wait until its first input row arrives.  Once the row arrives, whatever lookup value is being passed in (in this case, sales_rep_id) will be substituted for a parameter, and SSIS will send the query to the database for retrieval.  At this point, all of the data returned will be cached for future lookups.  If a new sales_rep_id is encountered, the query will have to be re-executed, and the new result set will be added to the lookup cache.</p>
<p>In other words, in the above data, if my source is &#8220;select sales_rep_id, sales_dollars from fact_sales&#8221;, we should have five database calls made by the lookup component.  Even though for sales_rep_id = 4 we have three entries, in partial cache mode the first time we retrieve the lookup records for sales_rep_id = 4, the results will be cached, allowing future occurrences of sales_rep_id = 4 to be retrieved from cache.</p>
<p>This is illustrated in the SQL Profiler data:<br />
<code>
<pre>
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',1
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',2
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',3
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',4
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',5
go
exec sp_unprepare 1
go</pre>
<p></code><br />
In the above data, you can see each sales_rep_id being passed in as the last argument of each call.  Note that we only have one call for sales_rep_id = 4.  That&#8217;s because the remaining two records were bounced against the lookup cache, avoiding a trip to the database.</p>
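<p>The same behavior can be sketched as a small Python simulation.  This is not how SSIS is implemented internally; the function name partial_cache_lookup and the in-memory dictionary standing in for dim_sales_rep are assumptions for illustration, and the sketch ignores any memory limit on the cache.  It only counts which key values would trigger a database call.</p>

```python
# Hypothetical stand-in for dim_sales_rep, keyed by id.
DIM_SALES_REP = {
    1: ("John", "Doe"),
    2: ("Jane", "Doe"),
    3: ("Larry", "White"),
    4: ("Carrie", "Green"),
    5: ("Adam", "Smith"),
}
# sales_rep_id values in the order they arrive from fact_sales.
INCOMING_REP_IDS = [1, 2, 3, 4, 4, 4, 5]


def partial_cache_lookup(rep_ids, reference):
    """Simulate partial cache mode: query on first sight of a key, then reuse."""
    cache = {}
    queried = []                      # key values that caused a database call
    for rep_id in rep_ids:
        if rep_id not in cache:
            queried.append(rep_id)    # one parameterised query per new key
            cache[rep_id] = reference.get(rep_id)
    return queried


queried = partial_cache_lookup(INCOMING_REP_IDS, DIM_SALES_REP)
print(queried)    # [1, 2, 3, 4, 5]
```

<p>Seven incoming rows produce five calls, matching the five sp_executesql statements in the trace.</p>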
<p><strong>NO Cache Mode</strong><br />
Using the NO Cache Mode will essentially tell SSIS that you want each incoming row (from fact_sales in this case) to be bounced against the database.  Since we have seven fact_sales rows, we will see seven calls to the database &#8211; MOST of the time.  It is important to note that even though we are telling the lookup component to avoid caching rows, it will keep the last match in memory and use it for the next comparison.  If the next comparison&#8217;s key value matches the value still in memory, a database call is avoided, and the value is carried forward.  </p>
<p>In our example data above, if we sort by sales_rep_id, we will still only have five calls to the database, because after we look up our first value of sales_rep_id = 4, it will be reused for the subsequent lookups for sales_rep_id = 4.  If we sort our data by sales_dollars, we will have six database calls, because only two sales_rep_id = 4 records are together and hence the first sales_rep_id = 4 lookup is reused only once.</p>
<p>Here are two simple tables, one for each no cache example mentioned above (sorted by sales_rep_id first, then by sales_dollars):<br />
<code>
<pre>SALES_REP_ID  SALES_DOLLARS  LOOKUP DATABASE CALL (Y or N)
1             120.99         Y
2             24.87          Y
3             98.11          Y
4             70.64          Y
4             114.19         N
4             37.00          N
5             401.50         Y
</pre>
<p></code><br />
<code>
<pre>SALES_REP_ID  SALES_DOLLARS  LOOKUP DATABASE CALL (Y or N)
2             24.87          Y
4             37.00          Y
4             70.64          N
3             98.11          Y
4             114.19         Y
1             120.99         Y
5             401.50         Y
</pre>
<p></code></p>
<p>The SQL Profiler data for the second example above is here:<br />
<code>
<pre>exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',2
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',4
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',3
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',4
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',1
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',5
go</pre>
<p></code></p>
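<p>Both no cache examples can be reproduced with a short Python simulation.  As before, this is a sketch under assumptions and not SSIS itself: the function name no_cache_lookup and the in-memory stand-ins for the two tables are hypothetical.  The only state kept between rows is the last key and its match, as described above.</p>

```python
# Hypothetical stand-ins for dim_sales_rep and fact_sales.
DIM_SALES_REP = {
    1: ("John", "Doe"),
    2: ("Jane", "Doe"),
    3: ("Larry", "White"),
    4: ("Carrie", "Green"),
    5: ("Adam", "Smith"),
}
FACT_SALES = [
    (1, 120.99), (2, 24.87), (3, 98.11), (4, 70.64),
    (4, 114.19), (4, 37.00), (5, 401.50),
]


def no_cache_lookup(rows, reference):
    """Simulate no cache mode: only the most recent match is remembered."""
    calls = []              # key values that caused a database call
    last_key = object()     # sentinel that never equals a real key
    last_value = None
    for rep_id, _dollars in rows:
        if rep_id != last_key:
            calls.append(rep_id)
            last_key, last_value = rep_id, reference.get(rep_id)
        # otherwise last_value is carried forward with no database call
    return calls


by_rep_id = no_cache_lookup(sorted(FACT_SALES, key=lambda r: r[0]), DIM_SALES_REP)
by_dollars = no_cache_lookup(sorted(FACT_SALES, key=lambda r: r[1]), DIM_SALES_REP)
print(by_rep_id)     # [1, 2, 3, 4, 5]
print(by_dollars)    # [2, 4, 3, 4, 1, 5]
```

<p>The second list matches the order of the parameter values in the trace above: six calls, with sales_rep_id = 4 looked up twice.</p>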
]]></content:encoded>
			<wfw:commentRss>http://www.ssistalk.com/2009/09/04/ssis-lookup-cache-modes-full-partial-none/feed/</wfw:commentRss>
		<slash:comments>21</slash:comments>
		</item>
		<item>
		<title>ETL World Record &#8211; 1 terabyte loaded in 30 minutes</title>
		<link>http://www.ssistalk.com/2009/03/15/etl-world-record-1-terabyte-loaded-in-30-minutes/</link>
		<comments>http://www.ssistalk.com/2009/03/15/etl-world-record-1-terabyte-loaded-in-30-minutes/#comments</comments>
		<pubDate>Mon, 16 Mar 2009 03:45:02 +0000</pubDate>
		<dc:creator>Phil Brammer</dc:creator>
				<category><![CDATA[SSIS]]></category>
		<category><![CDATA[SSIS Advanced Techniques]]></category>
		<category><![CDATA[SSIS Data flow]]></category>

		<guid isPermaLink="false">http://www.ssistalk.com/?p=127</guid>
		<description><![CDATA[I know, this isn&#8217;t &#8220;breaking&#8221; news or anything, but what is new is the white paper detailing how Microsoft was able to achieve this record-breaking speed using SSIS.  Check it out below, as it&#8217;s a very interesting read, and it may help generate some new ideas for your implementations.
http://blogs.msdn.com/sqlperf/archive/2009/03/03/an-etl-world-record-revealed-finally.aspx
]]></description>
			<content:encoded><![CDATA[<p>I know, this isn&#8217;t &#8220;breaking&#8221; news or anything, but what is new is the white paper detailing how Microsoft was able to achieve this record-breaking speed using SSIS.  Check it out below, as it&#8217;s a very interesting read, and it may help generate some new ideas for your implementations.</p>
<p><a href="http://blogs.msdn.com/sqlperf/archive/2009/03/03/an-etl-world-record-revealed-finally.aspx">http://blogs.msdn.com/sqlperf/archive/2009/03/03/an-etl-world-record-revealed-finally.aspx</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.ssistalk.com/2009/03/15/etl-world-record-1-terabyte-loaded-in-30-minutes/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>SSIS &#8211; Using the API</title>
		<link>http://www.ssistalk.com/2008/12/31/ssis-using-the-api/</link>
		<comments>http://www.ssistalk.com/2008/12/31/ssis-using-the-api/#comments</comments>
		<pubDate>Wed, 31 Dec 2008 23:35:05 +0000</pubDate>
		<dc:creator>Phil Brammer</dc:creator>
				<category><![CDATA[SSIS Advanced Techniques]]></category>

		<guid isPermaLink="false">http://www.ssistalk.com/?p=104</guid>
		<description><![CDATA[From the SSIS development team, Matt Masson has posted a few blog posts on how to use the SSIS API.  The posts reference the 2008 version of SSIS, but modifying them for 2005 requires only a simple change in most cases &#8211; Upgrading custom SSIS 2005 components to 2008.
You can find the [...]]]></description>
			<content:encoded><![CDATA[<p>From the SSIS development team, Matt Masson has posted a few blog posts on how to use the SSIS API.  The posts reference the 2008 version of SSIS, but modifying them for 2005 requires only a simple change in most cases &#8211; <a href="http://blogs.msdn.com/mattm/archive/2007/06/05/katmai-custom-components-and-upgrade.aspx">Upgrading custom SSIS 2005 components to 2008</a>.</p>
<p>You can find the blog posts here: <a href="http://blogs.msdn.com/mattm/archive/2008/12/30/samples-for-creating-ssis-packages-programmatically.aspx">http://blogs.msdn.com/mattm/archive/2008/12/30/samples-for-creating-ssis-packages-programmatically.aspx</a></p>
<p>Also, there is a post on using a new API framework for SSIS 2008, titled EzAPI: <a href="http://blogs.msdn.com/mattm/archive/2008/12/30/ezapi-alternative-package-creation-api.aspx">http://blogs.msdn.com/mattm/archive/2008/12/30/ezapi-alternative-package-creation-api.aspx</a></p>
<p>I haven&#8217;t looked into the EzAPI yet, but it certainly sounds interesting.</p>
<p>Let me know what you think about the new posts and if you&#8217;d like to see anything else from the dev team, and I&#8217;ll pass it along.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ssistalk.com/2008/12/31/ssis-using-the-api/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>SSIS &#8211; Dynamic substrings based on control table</title>
		<link>http://www.ssistalk.com/2007/04/06/ssis-dynamic-substrings-based-on-control-table/</link>
		<comments>http://www.ssistalk.com/2007/04/06/ssis-dynamic-substrings-based-on-control-table/#comments</comments>
		<pubDate>Fri, 06 Apr 2007 20:49:38 +0000</pubDate>
		<dc:creator>Phil Brammer</dc:creator>
				<category><![CDATA[SSIS Advanced Techniques]]></category>

		<guid isPermaLink="false">http://www.ssistalk.com/2007/04/06/ssis-dynamic-substrings-based-on-control-table/</guid>
		<description><![CDATA[Last night a user posted to the SSIS Forum a situation where he needed to be able to dynamically substring one field based on the substring rules contained in a table.  So I put together a package that does just this.  Before we go there though, I just want to mention that there [...]]]></description>
			<content:encoded><![CDATA[<p>Last night a user <a href="http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1441283&amp;SiteID=1" target="_blank">posted</a> to the <a href="http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=80&amp;SiteID=1" target="_blank">SSIS Forum</a> a situation where he needed to be able to dynamically substring one field based on the substring rules contained in a table.  So I put together a package that does just this.  Before we go there though, I just want to mention that there are many ways, programmatically of course, to tackle this problem.  The example below strictly follows my interpretation of Bill&#8217;s challenge.  There is a better way by using the split() function, but nevertheless here&#8217;s the example using substring().</p>
<p><span id="more-47"></span> First, here&#8217;s the setup:</p>
<p><code>CREATE TABLE [dbo].[segmentControl](<br />
[delim] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,<br />
[Seg1Start] [int] NULL,<br />
[Seg1Len] [int] NULL,<br />
[Seg2Start] [int] NULL,<br />
[Seg2Len] [int] NULL,<br />
[Seg3Start] [int] NULL,<br />
[Seg3Len] [int] NULL,<br />
[Seg4Start] [int] NULL,<br />
[Seg4Len] [int] NULL,<br />
[Seg5Start] [int] NULL,<br />
[Seg5Len] [int] NULL<br />
) ON [PRIMARY]</code></p>
<p><code>insert into segmentControl<br />
values (':',1,3,5,3,9,2,12,3,16,3)</code></p>
<p><code>CREATE TABLE [dbo].[segmentTest](<br />
[org] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL<br />
) ON [PRIMARY]</code></p>
<p><code>insert into segmentTest<br />
values ('aaa:aaa:ab:213:azz')<br />
insert into segmentTest<br />
values ('aaa:aab:cc:983:zza')</code></p>
<hr />
<p>That sets up our tables.  segmentControl contains the rules for the substring function for each segment, up to five.  Bill mentioned that five was the most he&#8217;d have, so I&#8217;ve followed suit.  There is a delimiter in that table as well, but it&#8217;s not as important, because the data we&#8217;re working on (segmentTest table, org column) contains the same delimiter.  There is no real reason to use it for the purposes of splitting the field up.</p>
<p>Next, we&#8217;ll select the delimiter from the segmentControl table for use later (in this case, there will only ever be one row in that table).  Create a package-scoped variable, delimiterChar, and make it a string type.  Next, add an Execute SQL task to the control flow.  The SQL will simply be &#8220;select delim from segmentControl&#8221;.  As usual, click on the photos for the larger version.</p>
<p><a href="http://www.ssistalk.com/wp-content/uploads/2007/04/ssis_segment3.jpg" title="SSIS - Dynamic substring 01"><img src="http://www.ssistalk.com/wp-content/uploads/2007/04/ssis_segment3.jpg" alt="SSIS - Dynamic substring 01" height="382" width="408" /></a></p>
<p>Next, add a data flow.  It should look like this:</p>
<p><a href="http://www.ssistalk.com/wp-content/uploads/2007/04/ssis_segment1.jpg" title="SSIS - Dynamic substring 02"><img src="http://www.ssistalk.com/wp-content/uploads/2007/04/ssis_segment1.jpg" alt="SSIS - Dynamic substring 02" /></a></p>
<p>Inside the data flow, add an OLE DB source and hook it up to segmentTest table.  Hook that up to a derived column transformation and add six fields: Segment1, Segment2, &#8230;, Segment5, delimiterChar.  The setup is below:</p>
<p><a href="http://www.ssistalk.com/wp-content/uploads/2007/04/ssis_segment5.jpg" title="SSIS - Dynamic substring 03"><img src="http://www.ssistalk.com/wp-content/uploads/2007/04/ssis_segment5.jpg" alt="SSIS - Dynamic substring 03" height="392" width="420" /></a></p>
<p>From here, go into a lookup component.  Join on the delimiter fields, and return all fields from the segmentControl table except for its delim column.</p>
<p><a href="http://www.ssistalk.com/wp-content/uploads/2007/04/ssis_segment7.jpg" title="SSIS - Dynamic substring 04"><img src="http://www.ssistalk.com/wp-content/uploads/2007/04/ssis_segment7.jpg" alt="SSIS - Dynamic substring 04" /></a></p>
<p>Coming out of the lookup component, go into a script component.  Select all fields in the input screen.  Edit the script, and use the script linked at the bottom of this page.  Coming out of the script component, you can hook it up to whatever you wish.  In this example, I used a Row Count component so that I could simply use a data viewer to see the results.</p>
<p><a href="http://www.ssistalk.com/wp-content/uploads/2007/04/ssis_segment9.jpg" title="SSIS - Dynamic substring 05"><img src="http://www.ssistalk.com/wp-content/uploads/2007/04/ssis_segment9.jpg" alt="SSIS - Dynamic substring 05" height="314" width="438" /></a></p>
<p>And that&#8217;s about it.  The most complex part of this is the script, and credit goes out to the SSIS forum user, jaegd, for posting the script that I used as the foundation for this problem.</p>
<p>Script source: <a href="http://www.ssistalk.com/wp-content/uploads/2007/04/dynamicsubstring.txt" title="dynamicsubstring.txt">dynamicsubstring.txt</a></p>
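<p>For readers who just want the substring logic without opening the package, here is a minimal sketch in Python.  It is not the linked script; the names CONTROL_ROW and split_segments are hypothetical, and the sketch assumes the start positions in segmentControl are 1-based, which is what the sample row and data above imply.  It also shows the split() alternative mentioned at the top of the post.</p>

```python
# The single segmentControl row from the setup: delimiter, then
# (start, length) pairs for segments 1 through 5.
CONTROL_ROW = (":", 1, 3, 5, 3, 9, 2, 12, 3, 16, 3)
ORG_VALUES = ["aaa:aaa:ab:213:azz", "aaa:aab:cc:983:zza"]


def split_segments(value, control_row):
    """Cut value into segments using 1-based (start, length) pairs."""
    positions = control_row[1:]                 # drop the delimiter column
    pairs = zip(positions[0::2], positions[1::2])
    return [value[start - 1:start - 1 + length] for start, length in pairs]


for org in ORG_VALUES:
    print(split_segments(org, CONTROL_ROW))
# ['aaa', 'aaa', 'ab', '213', 'azz']
# ['aaa', 'aab', 'cc', '983', 'zza']

# The split() alternative gives the same result for this sample data.
print(ORG_VALUES[0].split(CONTROL_ROW[0]))
# ['aaa', 'aaa', 'ab', '213', 'azz']
```

<p>The control-table approach keeps working when segments are fixed-width with no delimiter at all, while split() is simpler whenever the delimiter is reliable.</p>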
]]></content:encoded>
			<wfw:commentRss>http://www.ssistalk.com/2007/04/06/ssis-dynamic-substrings-based-on-control-table/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

