Mar 072007
 

Jamie Thomson and I briefly discussed an issue I was having today with respect to marking sources as sorted (IsSorted=True) and then to later have the IsSorted property changed back to “False” as a result of a derived column transformation. Read on for more…

Here’s the setup:

IsSorted Data Flow Layout

Basically, I have an OLE DB Source which is sorted via a SQL query. The flat file is also sorted, but has trailing spaces in the fields that I’m interested in using as a key. The idea here is that I need to “Merge Join” the two sources together, but do not want the expense of a “Sort” inside the package. The cost of performing the sort has already been done upfront, so why perform a sort inside SSIS? Next, I cannot reliably use a “Merge Join” because the key fields won’t match due to trailing spaces coming from the Flat File Source, so I have to trim them.

With that said, both sources have their IsSorted property set to true:

IsSorted Property

The Flat File source is setup similarly. The key fields under OLE DB Source Output -> Output Columns have their SortKeyPosition set accordingly:

IsSorted Property 2

From here, I could hook up both sources to the merge join no problem. However, as stated before, I can’t reliably do that because of the trailing spaces on the key field from the flat file source. Adding the derived column to perform a TRIM() on the key flat file field, will result in the merge join not working any more as a result of the IsSorted property is now “False.”

IsSorted Data Flow Layout with Merge Join

In talking with Jamie, he states that this is logical. Because I am REPLACING the key field with its trimmed version, SSIS *MUST* assume that the value has changed and hence is no longer sorted.

So, how do I get around this? Well, two ways really. I could take my flat file processing out and perform that in another data flow to trim the fields and store the results in a second, flat (or raw) file (or even a staging table). From there, I could just read in that second file and go straight into the merge join. The second option I have is to include the “Sort Transformation” back into the dataflow, between the derived column and the merge join transformations. I didn’t want to do this because of buffer reasons, so from my perspective, I may go with the approach of splitting the flat file processing off and into its own data flow.

Thanks, Jamie, for your feedback today.

  5 Responses to “Derived Columns and IsSorted=True”

  1. No worries Phil.

    I have an issue with this actually. If you know that the output is still sorted by the sort keys on the input then it would help to be able to specify, via the advanced editor, that the output is sorted.

    Unfortunately you cannot do this because the derived column component is synchronous.

    Its not a bug, more of an anomoly. And we’re stuck with it I’m afraid.

    You do have another option. Instead of replacing the original column add the derived value as a new column. The output will still be sorted although te new column will not be part of the Sort Key. In many cases though that won’t be a problem. It depends on your scenario.

    -Jamie

  2. Hello.
    My name is Daeseong Han and new SQL Server MVP, Korean.

    Here is my new trick about this problem.

    http://www.sqlleader.com/mboard.asp?exec=view&strBoardID=DSHan&intSeq=1033

    If you need to login my web site(sqlleader.com), use id and password – test / test.

    – D.S Han

  3. Good post. In all these discussions the thing that seems missing is the realisation that simplicity can go further still. Why ‘click’ to annouce you’ve arrived when bluetooth/GPS/other can do it automatically.

  4. Inforgraphic – Smartphone Influencing Factors

  5. The Only Way Is Essex S08E11 PDTV XviD-TM

 Leave a Reply

(required)

(required)

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

css.php