Archive for April, 2009

Here I would like to show you a solution for following problem.
One source row coming from Excel contains several values within one column. Each one of these values should be reflected in a separate row in the data destination.
So the basic problem is that there we have a “list column” whose contents should be result in separate rows.

To make the problem more complex: The number of values in the “list column” is variable and there is a second column which contains corresponding values.

Excel Source

Excel Sheet

Excel Sheet

solution: SSIS Script Component

To implement a solution for the problem we use the most adequate item – the Script Component.

  1. We start with a data flow task in our package and add an OLE DB Excel Source component (that returns the source data above).
    We can use following SQL statement:

    SELECT
    F1,F2,F3
    FROM
    [sheet1$B1:D5]

    The imported columns can be named as KEY, PERSON, DEPARTMENT.
    Excel Source

    Excel Source

  2. Then we add a Derived Column Component to the data flow as well.
    Derived Column

    Derived Column

    In order to replace the CRs with commas we replace the imported columns with following formula:

    REPLACE(REPLACE(REPLACE(REPLACE(PERSON,"\n",",")," ",""),",,",","),",,",",")
    REPLACE(REPLACE(REPLACE(REPLACE(DEPARTMENT,"\n",",")," ",""),",,",","),",,",",")

    Derived Column Settings

    Derived Column Settings

  3. As result we receive following list of elements:

    • – -
    • row 1
    • PERSON: Mr. Wayne,Mr. Kent,Mr. Parker
    • DEPARTMENT: Heros/Batman,Heros/Superman,Heros/Spiderman
    • – -
    • row 2
    • PERSON: Mr. Kirk,Ms. Janeway
    • DEPARTMENT: StarTrek/Classic,StartTrek/Voyager
    • – -
    • row 3
    • PERSON: “Mr. Wrong”
    • DEPARTMENT: Wrong entry 1,Wrong entry 2
    • – -
  4. Now we add a Script Component to the data flow.
    When we are prompted to select the type of the component we choose Transformation.

    Script Component

    Script Component

  5. Then we right-click on the Script Component and choose Edit. In the Script Transformation Editor dialog box we select all three columns to make them available as input columns. Now these column are available in the Script code.
    Script Component Columns

    Script Component Columns

  6. Since the tranformation is going to produce more output rows than it receives input rows we have to mark the transformation as asynchronous. This is done by setting the SynchronousInputID property of the output to None.
    Script Component Settings

    Script Component Settings

  7. Next, we have to specify the output columns that the output buffer will contain. So we select the transformation’s output and click Add Column and set the properties (column name and data type).
    Script Component Input / Output rows

    Script Component Input / Output rows

  8. Now we can add the code by clicking on the Design Script button.
    Script Component - Design Script

    Script Component - Design Script

  9. Following Code is splitting our list of items into rows:

    ' Variables
    Dim i As Integer = 0
    Dim itemList_PERSON As String = Row.PERSON
    Dim itemList_DEPARTMENT As String = Row.DEPARTMENT
    Dim delimiter As String = ','
    If Not (String.IsNullOrEmpty(itemList_PERSON) Or String.IsNullOrEmpty(itemList_DEPARTMENT)) Then
    Dim inputListArray_Wer() As String = itemList_PERSON.Split(New String() {delimiter}, StringSplitOptions.RemoveEmptyEntries)
    Dim inputListArray_Abt() As String = itemList_DEPARTMENT.Split(New String() {delimiter}, StringSplitOptions.RemoveEmptyEntries)
    If (inputListArray_Wer.Length = inputListArray_Abt.Length) Then
    ' new rows
    For Each item As String In inputListArray_Wer
    With Output0Buffer
    .AddRow()
    .KEY = CInt(Row.KEY)
    .PERSON = item
    .DEPARTMENT = inputListArray_Abt(i).ToString()
    End With
    ' increase counter
    i = i + 1
    Next
    Else
    ' Filtered Rows
    With Output1Buffer
    .AddRow()
    .KEY = CInt(Row.KEY)
    .PERSON = Row.PERSON
    .DEPARTMENT = Row.DEPARTMENT
    End With
    End If
    End If

    The code shows that there are two outputs. Whenever a record set cannot be split or the columns PERSON and DEPARTMENT have a different number of itemss the rows is redirected to the second output (here Output1).
  10. In order to receive the output from the Script Component we add a Union All Component. Additionally we add two Data Viewer Components, since we want to see the resulting outputs from the Script Component.
    Union ALL

    Union ALL

  11. Running the SSIS package will bring the following result.
    Data Flow completed

    Data Flow completed

  12. As we can see, the first DataViewer Component show us the valid rows produced by our Script Component.
    Data Viewer valid rows

    Data Viewer valid rows

  13. The second DataViewer Component show us the invalid rows produced by our Script Component.
    Data Viewer invalid rows

    Data Viewer invalid rows

download
Of course I provide you the solution as download.

problem
If I have a column with several values (separated by CR) which I need separated into rows then following solution can help:

possible solution: T-SQL
One possible solution using T-SQL is following statement.
The function fn_Split returns all values in a list and the replace function converts the CR (carriage returns) to commas.


SELECT
[Key]
,[b.Value]
,[c.Value]
FROM
[dbo.TableExcel] AS a
CROSS APPLY
fn_Split(REPLACE(a.Persons, CHAR(10)+CHAR(13),','), ',') AS b
CROSS APPLY
fn_Split(REPLACE(a.Departments, CHAR(10)+CHAR(13),','), ',') AS c

The corresponding fn_Split function:

CREATE FUNCTION [dbo].[fn_Split](@text nvarchar(max), @delimiter char(1) = ' ')
RETURNS @Strings TABLE (
position int IDENTITY PRIMARY KEY,
value nvarchar(max)
)
AS
BEGIN
DECLARE @index int
SET @index = -1
WHILE (LEN(@text) > 0)
BEGIN
SET @index = CHARINDEX(@delimiter , @text)
IF (@index = 0) AND (LEN(@text) > 0)
BEGIN
INSERT INTO @Strings VALUES (@text)
BREAK
END
IF (@index > 1)
BEGIN
INSERT INTO @Strings
VALUES (LEFT(@text, @index - 1))
SET @text = RIGHT(@text, (LEN(@text) - @index))
END
ELSE
SET @text = RIGHT(@text, (LEN(@text) - @index))
END
RETURN
END

When Excel data is not coming into SSIS right, then do the following:

  1. use option “IMEX=1” in ConnectionString (Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\..\MyExcel.xls;Extended Properties="Excel 8.0;HDR=Yes;IMEX=1")
  2. modify registry setting “TypeGuessRows=8” to a much higher value
  3. use a sql statement to retrieve data (e.g. “SELECT * FROM [sheet1$A5:M2000]“)

NOTE: The setting IMEX=1 tells the Excel driver to use Import mode. This mode reads the registry setting “ImportMixedTypes=Text” which forces mixed data to be converted to text. To achieve a more reliable column type recognition, the registry setting “TypeGuessRows=8” should be increased. By default the ISAM driver analyzes  the first eight rows and determines the column datatypes from this sampling. If the first analysed rows contain only numeric values, then setting IMEX=1 will not convert this column datatype to Text – it will remain numeric.

ATTENTION: The IMEX setting has some other modes:

  • 0 is Export mode
  • 1 is Import mode
  • 2 is Linked mode (full update capabilities)

Registry Setting Location:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\X.X\Engines\Excel

supplement
If you experience following error the above described solution could fix it.
1. An OLE DB error has occurred. Error code: 0x80040E21.
2. Failed to retrieve long data for column “XYZ”.
3. There was an error with output column “XYZ” (55488) on output “Excel Source Output” (109). The column status returned was: “DBSTATUS_UNAVAILABLE”.
4. The “output column “XYZ” (55488)” failed because error code 0xC0209071 occurred, and the error row disposition on “output column “XYZ” (55488)” specifies failure on error. An error occurred on the specified object of the specified component.
5. [DTS.Pipeline] Error: The PrimeOutput method on component “Info Source” (101) returned error code 0xC0209029. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing.
6. [DTS.Pipeline] Error: Thread “SourceThread0″ has exited with error code 0xC0047038.