“Why does my data flow take so long to execute when I use an OLE DB Command component?
“I need to update values in another table for every row in my data flow.”

These are common questions I see when dealing with “slow” performance of an SSIS package. Sometimes though, the developer of the package does not even understand which component is taking so long to process say, thousands of records in their data flow, so it is understandable why this component may proliferate in many shops. (Sometimes it is needed, of course, but you need to always ask the question – do I need the updates to happen RIGHT NOW?)

If you have a need to update data in another table, while still pushing data from a source to a destination, the OLE DB Command is often used because of the thought that it is part of the data flow, so why not use it.

Read on for more….

The trouble with the OLE DB Command component is that it executes whatever command you have provided for each and every row that passes through it. The data flow is designed to operate on buffers of data, which contain by default around 10,000 rows. If you are simply moving data from one table to another, this concept works really well and quite fast too, I might add.

However, if you need to update data, should you use the OLE DB Command component in that data flow? Are there other options? The answers of course are “maybe” and “yes.”

If you simply need to update data in a table and you don’t care when it happens, then I would recommend not using an OLE DB Command component at all in your data flow. Instead, I recommend you build a staging table on your destination database server, and populate that table with your “update” data. Then, in the control flow, after the data flow successfully executes, you can issue an Execute SQL Task to perform a batch update.

This recommendation is built on the fact that using an OLE DB Command component ensures that your update operations are done row-by-row, rather than in a batch operation.

Why? How much faster is the batch operation compared to the OLE DB Command operation? Well, the answer depends a bit, but in my simple test below, the time to update 5,000 records took 15 minutes using the OLE DB Command component and 4 seconds (yup – SECONDS) to populate a staging table and then perform a batch update.

Here is the setup I used:

CREATE TABLE numbers (
  number int not null
)
GO

insert into numbers (number)
select number from
(
select row_number() over (order by s1.name) as number
  from sysobjects s1
 cross join sysobjects s2
 cross join sysobjects s3
 cross join sysobjects s4
 cross join sysobjects s5
) as numbers
where number between 1 and 1000000;
GO

select number
  from numbers
 where number < 5003;
GO

/*
number
1
2
3
4
5
...
4998
4999
5000
5001
5002
*/

Here is a picture of the first SSIS package.

OLE DB Command Data Flow setup

I used the following SQL for the OLE DB Source component:

SELECT number
  FROM numbers
 WHERE number < 5001
 ORDER BY number DESC

I used the following expression in the Derived Column component to create a new column, new_number:

number + 1

I used the following SQL in the OLE DB Command component:

update numbers
  set number = ?
 where number = ?

In the OLE DB Command "Column Mappings" tab, I mapped the incoming column, new_number, to the Param_0 parameter, and the incoming column, number, to the parameter, Param_1.

I then executed the package. Time to execute: 15 minutes, 43 seconds.

OLE DB Command Data Flow setup

Next I tried the batch update approach. Use the same setup as above, except replace the OLE DB Command component with an OLE DB Destination, and push the two data flow columns to a staging table.

Here is the data flow layout:
OLE DB Command Data Flow setup

Here is the staging table DDL:

CREATE TABLE staging_numbers (
  orig_number int not null,
  new_number int not null
)

Map the data flow column, number, to the orig_number field in the staging table. Map the data flow column, new_number, to the new_number field in the staging table, as shown below:
OLE DB Command Data Flow setup

Then, in the control flow, hook up an Execute SQL Task to the new Data Flow task, so that the Execute SQL Task executes after the Data Flow, as shown below:
OLE DB Command Data Flow setup

The Execute SQL Task SQL is below:

update numbers
  set number = staging_numbers.new_number
from numbers, staging_numbers
where numbers.number = staging_numbers.orig_number

Upon executing the package, the execution time shrank to under five seconds:
OLE DB Command Data Flow setup

So, the moral of this story is that whenever you think you need to use an OLE DB Command component, ask yourself, "Do I *really* need to use this component, or should I use a different approach?"