Jamie Thomson and I briefly discussed an issue I was having today with respect to marking sources as sorted (IsSorted=True) and then to later have the IsSorted property changed back to “False” as a result of a derived column transformation. Read on for more…
Here’s the setup:
Basically, I have an OLE DB Source which is sorted via a SQL query. The flat file is also sorted, but has trailing spaces in the fields that I’m interested in using as a key. The idea here is that I need to “Merge Join” the two sources together, but do not want the expense of a “Sort” inside the package. The cost of performing the sort has already been done upfront, so why perform a sort inside SSIS? Next, I cannot reliably use a “Merge Join” because the key fields won’t match due to trailing spaces coming from the Flat File Source, so I have to trim them.
With that said, both sources have their IsSorted property set to true:
The Flat File source is setup similarly. The key fields under OLE DB Source Output -> Output Columns have their SortKeyPosition set accordingly:
From here, I could hook up both sources to the merge join no problem. However, as stated before, I can’t reliably do that because of the trailing spaces on the key field from the flat file source. Adding the derived column to perform a TRIM() on the key flat file field, will result in the merge join not working any more as a result of the IsSorted property is now “False.”
In talking with Jamie, he states that this is logical. Because I am REPLACING the key field with its trimmed version, SSIS *MUST* assume that the value has changed and hence is no longer sorted.
So, how do I get around this? Well, two ways really. I could take my flat file processing out and perform that in another data flow to trim the fields and store the results in a second, flat (or raw) file (or even a staging table). From there, I could just read in that second file and go straight into the merge join. The second option I have is to include the “Sort Transformation” back into the dataflow, between the derived column and the merge join transformations. I didn’t want to do this because of buffer reasons, so from my perspective, I may go with the approach of splitting the flat file processing off and into its own data flow.
Thanks, Jamie, for your feedback today.