Sep 172009
 

When using a Merge or Merge Join component in a data flow, your incoming data is required to be sorted. While it may be easy to drop a Sort component into your data flow, it may make more sense to perform the sort in your source query (if you are using a source such as an OLE DB Source component).

If you decide to use an ORDER BY in your source and want to tell the SSIS Data Flow that your data is sorted, follow the below steps:

First, ensure that your data is being sorted with the ORDER BY clause. Make note of which column(s) are in the sort and their positions.

Second, open the Advanced Editor for the OLE DB Source component by right-clicking on the OLE DB Source and selecting “Show Advanced Editor…”

Then, click on the “Input and Output Properties” tab.

Then highlight the “OLE DB Source Output Properties” branch and set the “IsSorted” property to True.

IsSorted = True

Next, expand the “OLE DB Source Output Properties” branch. From here, select the first column under “Output Columns” that corresponds to the first column in your ORDER BY clause. Set its “SortKeyPosition” property to the number 1.

SortKeyPosition

Repeat the last step for each column in your ORDER BY clause, advancing the SortKeyPosition by one each time. (1,2,3,…)

That’s it. Now the SSIS Data Flow will understand and acknowledge that the incoming data is sorted and will not require a Sort Component between the source and a Merge Join component, for instance.

  15 Responses to “SSIS – Avoiding the Sort Components”

  1. Hi Phil,

    Great post and a very important area to highlight. As I see it the big problem with the SORT component is that the bigger the sort the slower it gets – and the performance degradation isn’t linear it’s n log n. Invariably users don’t find this out till they hit the full production volume by which time the Solution has been built and tested. The existence of a SORT inside a dataflow is definitely a “smell” for me.

    How I wish the SSIS team would buy NSORT for the product…

    Cheers, James

  2. Thanks for the post, Phil. I agree that sorting on the database side is faster and always preferred. Unfortunately you still have to resort to the Sort component when using flat files as a data source. The one bonus is that the Sort component also lets you remove duplicate values in your sort key. So, if you are trying to do a ‘SELECT DISTINCT’ from a flat file, Sort is your friend.

  3. Ryan,
    That is not true – you can still use the above techniques for flat files. The IsSorted property is available for flat files as well.

  4. Phil,
    The IsSorted is available for flat files, but that is only useful if the data on the flat file is sorted, right?
    what if the flat file data is not sorted? then we can’t specify on ORDER BY, and so we have to use the Sort component.
    Correct me if I’m wrong?
    thanks.

  5. Guna,
    Yes, that’s correct. Only set the IsSorted property to true if your source is, well, sorted. Actually, to be more specific, use this property only if your SortKeyPosition columns are specified as the source is actually sorted. In other words if the source is sorted on the first column in the file, but you need to use the Merge Join component and joining on the second column in the source file, you should not use the IsSorted property, if the file is only sorted by the first column.

    If your flat file or database source is not sorted, then setting IsSorted to True can have negative effects when trying to use components like the Merge or Merge Join.

  6. Hi Phil/Guna,

    Since i am very new to SSIS i need some clarifications.
    Correct me if i am wrong as per the above discussion setting the IsSortProperty to true indicates Merge Join component that the data is Sorted form where as the data internally would not be sorted ???

    I have used two OLE DB Source wher i have set the IsSorted Property to true and also i have set sort Key Position,the rows coming out of the Merge join here is 4 which is Incorrect.
    But If use the Sort tranfomation after this OLE DB Source i am getting 13 rows which is the Proper result.

    So can u please tell me the difference between the IsSortedProperty and the Sort Tranfomation.

    Thank u in advance.

    Santhosh Hiriyanna

  7. This article was helpful, thanks..

  8. Its Really helpful.

  9. thanks..it’sreally helpful

  10. it’s helpful…thanks

  11. Thank you!!!! Life saver on the shared knowledge! ~ Paul

  12. Thanks a lot mann….very very useful comment.

  13. [...] inputs must be sorted. Ideally, this sort can be pushed into the source query. If the data isn’t already sorted (i.e. no indexes), the cost of the sort might outweigh the [...]

  14. good job,Its Really helpful :)

  15. Santhosh,

    I beleive the oledb source didn’t have the sorted data but you are cheating the merge transform by providing the is sorted option true and setting the sort key position to 1. thats why the merge transform unknowingly cheated you by giving wrong result.

    *check your oledb source data is sorted or not.

 Leave a Reply

(required)

(required)

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

css.php