Often ETL design is realized using set-based data processing (SQL-Statements). Especially developers with a database professional background are induced to extract, load and transform (ETL) data using Stored Procedures. On the other side there are advantages of record-based data processing.

The following slides give an overview and recommendation about usage scenarios. There are some tips which should be considered when comparing set-based operations and record-based operations.

T-SQL-Statements: Advantages

  • SQL will outperform data-flows (if data is transferred table-to-table on the same server)
  • Using transactions is simple
  • SQL (Stored Procedures) can be debugged
  • Set-based operation vs. record-based operation (seems to be more performing)
  • SQL tools support statement generation
  • Stored Procedures are supported by internal Optimization Engine (execution plan)
  • DB experts can design data processing tasks
  • Optimized algorithm for Sorting / Grouping statements (e.g. GROUP BY, DISTINCT, ORDER BY)
Data-Flows: Advantages
  • Data from heterogeneous sources (e.g. non-SQL-sources)

  • Data transfer machine-to-machine
  • Data transforms are flexible, transparent
    (SSIS: buffers for heavy operations / T-SQL: temporary tables, cursors)
  • Broad transformation functionality (fuzzy logic, aggregations, derived columns, scripts)
  • Easier handling of „bad“ data (different data sinks, exception handling)
  • .Net Data providers as alternative to OLE DB Providers (e.g. mySAP .Net Data Provider)
  • Visual representation of work flow / data flow à better maintenance, understanding of data-flows
    (since they are a bit self-documenting)
  • Parallel execution / step-by-step execution (Checkpoints) of data-flows
  • Easier debugging using events („component <component name> (xxxx) wrote xxxx rows“)
  • Easier handling of different data types (conversion, …)
Performance Tips

Bulk operation vs. OLE DB Command Transformation
  • Use bulk updates instead of single statement updates
  • OLE DB Command with parametrized query usually slow (no bulk operation)
  • Load changed records to staging table and perform set-based update with a stored procedure
  • Prefer usage of set-based updates (stored procedures) over OLE DB Command

Load only changed rows: “Delta Detection”

  • If possible extract and load only a delta set

Eliminating Unnecessary Work

  • Data columns and rows need space within the buffer (performance)
  • Only extract and process needed data
  • Use optimized SQL Statements for data extraction (up-to-date statistics required)

Job dependencies

  • Dependencies among ETL jobs (e.g. job “B” cannot start while job “A” is not finished)
  • Master packages should be used with maximum degree of parallelism
  • Implement parallel execution (MaxEngineThreads, MaxConcurrentExecutables)

Asynchronous transformations (bottlenecks)

  • slowest parts within ETL process
  • Asynchronous operations (block data flow) vs. synchronous operations (row by row basis)
  • E.g. Sort Transformation and Aggregate Transformation (ok for small data sets)
  • Blocking transformations should be avoided wherever possible
  • Merge Join: delivered data should be pre-sorted (by stored procedures or select queries)
  • Prefer stored procedures to inline selects and ensure that join keys are indexed
    (then internal DB Optimizing Engine is used)

Slow database operations

  • Reason: concurrency, integrity checks, indexes, triggers
  • Data processing outside database (better performance)
  • Disable integrity checking (disable constraint) before load in target database
  • Disable DB triggers before load in target database
  • Drop indexes before load and recreate them after (e.g. SQL: drop index / create index)
  • Parallel bulk operations (if table is partitioned or there are no indexes)
    Note: parallel load attempts into same table/partition can cause locks (on data rows or indexes)

Cross-DB Joins

  • Slow down ETL extracts considerably
  • Use staging tables (schemas should be used to separate staging areas)

Transaction Protocol

  • Disable Transaction Protocol (not convenient for OLAP)

Using SQL Statement to retrieve data

  • Avoid using table or view access mode in OLE DB Source
    (otherwise a row set is opened for both to retrieve column meta data and the rows)
  • Use SQL Statement: 10times faster
    (SQL Statements use internal DB Optimizer, but SSIS cannot optimize statements)

Heavy data load operations

  • Use staging tables and SSIS Checkpoints (only available for Control Flow)
  • Avoid buffer swapping (slow) to disk (use performance counters to monitor)
  • Reduce columns covered by Lookup transformation to the minimal possible set (natural key and surrogate key)
  • Use Merge Join in preference to Lookup transformation when possible

Auditing and Logging

  • Use event log for exception and data quality handling
  • Use validations (e.g. no. of extracts equals no. of inserts)

Conclusion

  • Generally use record-based operations (ETL-Tools)

  • Intelligent usage of set-based operations (e.g. for bulk operations: table-to-table)
  • No data transformations within set-based-operations
  • Sorting / Aggregation statements should be done in set-based operations (at source extraction)
  • Staging tables should be used
  • Asynchronous operations (Sorting, Aggregating) should be avoided when possible
  • Consequently eliminate unnecessary work
  • Only extract needed data from sources
  • Try to pre-sort Merge Join input flows

Slides (Download)

The IsNumeric function has a very variable interpretation algorithm. In order to get only numeric values following statement should be used:


CASE WHEN (NUM_VALUE LIKE '%[0-9]%' AND ISNUMERIC(NUM_VALUE) = 1) THEN CAST(NUM_VALUE AS FLOAT) ELSE NULL END AS NUM_VALUE

How to add a Excel Export Link to a Report
Add an item (e.g. TextBox, Image, etc.) and open the properties dialog. Under the navigation tab use the setting Go to URL and add following comand.


=Globals!ReportServerUrl.ToString() + "/Pages/ReportViewer.aspx?"
+ System.Web.HttpUtility.UrlEncode("https://ReportServer/ReportFolder/Report.rdl")
+ "&rs:Command=Render&rs:Format=EXCEL&rc:OmitFormulas=true"
+ "&[Parameter_Name1]=" + Join(Parameters![Parameter_Name1].Value, "&[Parameter_Name1]=") --> Multivalue Parameter
+ "&[Parameter_Name2]=" + System.Web.HttpUtility.UrlEncode(Parameters![Parameter_Name2].Value) --> Normaler Paraemter

It is important to use the HttpUtility.UrlEncode() function in order to pass the correct values.
In the properties dialog of the report add a reference to the System.Web assembly.

There are two options executing SSIS packages:

- xp_cmdshell command (not recommended for security reasons)
- sp_start_job command

The main difference between both options is the execution method. The xp_cmdshell command is a synchronous call and the sp_start_job command is an asynchronous call.

xp_cmdshell command (synchronous)

  • enable xp_cmdshell mode in Surface Area Configuration Tool for SQL Server
  • use following procedure to execute SSIS package:

    DECLARE @returncode int
    EXEC @returncode = xp_cmdshell 'dtexec /f "PackageNameWithFullPath.dtsx"'

sp_start_job (asynchronous)

  • define job in SQL Server Agent with SSIS execution step
  • use following procedure to start SQL Server Agent job with SSIS execution step:

    [msdb].dbo.sp_start_job @job_name='JobName'
  • check execution status of job, since the call is asynchronous:

    SELECT
    [server],
    [start_execution_date],
    [stop_execution_date],
    [run_date],
    [run_duration],
    [run_status],
    [message]
    FROM
    MSDB.DBO.SYSJOBS Z
    INNER JOIN
    MSDB.DBO.SYSJOBACTIVITY A
    ON Z.JOB_ID = A.JOB_ID
    INNER JOIN
    (
    SELECT
    MAX(SESSION_ID) AS SESSION_ID
    FROM
    MSDB.DBO.SYSSESSIONS
    ) AS B
    ON A.SESSION_ID = B.SESSION_ID
    LEFT JOIN
    MSDB.DBO.SYSJOBHISTORY C
    ON A.JOB_HISTORY_ID = C.INSTANCE_ID
    WHERE
    Z.NAME = 'JobName'

There is following way to make the call of sp_start_job synchronous.

sp_start_job (synchronous)

  • define job in SQL Server Agent with SSIS execution step
  • use following procedure to start SQL Server Agent job with SSIS execution step:

    ALTER PROCEDURE [dbo].[AGENT_JOB_CHECK2]
    -- Add the parameters for the stored procedure here
    DECLARE @job_name nvarchar(100)='',
    DECLARE @maxwaitmins int = 5
    AS
    BEGIN
    set NOCOUNT ON;
    set XACT_ABORT ON;
    BEGIN TRY
    declare @running as int
    declare @seccount as int
    declare @maxseccount as int
    set @maxseccount = 60*@maxwaitmins
    set @seccount = 0
    set @running = 0
    declare @job_owner sysname
    declare @job_id UNIQUEIDENTIFIER
    set @job_owner = SUSER_SNAME()
    -- get job id
    select @job_id=job_id
    from msdb.dbo.sysjobs sj
    where sj.name=@job_name
    -- invalid job name then exit with an error
    if @job_id is null
    RAISERROR (N'Unknown job: %s.', 16, 1, @job_name)
    -- output from stored procedure xp_sqlagent_enum_jobs is captured in the following table
    declare @xp_results TABLE ( job_id UNIQUEIDENTIFIER NOT NULL,
    last_run_date INT NOT NULL,
    last_run_time INT NOT NULL,
    next_run_date INT NOT NULL,
    next_run_time INT NOT NULL,
    next_run_schedule_id INT NOT NULL,
    requested_to_run INT NOT NULL, -- BOOL
    request_source INT NOT NULL,
    request_source_id sysname COLLATE database_default NULL,
    running INT NOT NULL, -- BOOL
    current_step INT NOT NULL,
    current_retry_attempt INT NOT NULL,
    job_state INT NOT NULL)
    -- start the job
    declare @r as int
    exec @r = msdb..sp_start_job @job_name
    -- quit if unable to start
    if @r<>0
    RAISERROR (N'Could not start job: %s.', 16, 2, @job_name)
    -- start with an initial delay to allow the job to appear in the job list (maybe I am missing something ?)
    WAITFOR DELAY '0:0:10';
    set @seccount = 10
    -- check job run state
    insert into @xp_results
    execute master.dbo.xp_sqlagent_enum_jobs 1, @job_owner, @job_id
    set @running= (SELECT top 1 running from @xp_results)
    while @running<>0 and @seccount < @maxseccount
    begin
    WAITFOR DELAY '0:0:10';
    set @seccount = @seccount + 10
    delete from @xp_results
    insert into @xp_results
    execute master.dbo.xp_sqlagent_enum_jobs 1, @job_owner, @job_id
    set @running= (SELECT top 1 running from @xp_results)
    end
    -- result: query
    SELECT
    [server],
    [start_execution_date],
    [stop_execution_date],
    [run_date],
    [run_duration],
    [run_status], -- 0: failed, 1: success, null: running
    [message]
    FROM
    MSDB.DBO.SYSJOBS Z
    INNER JOIN
    MSDB.DBO.SYSJOBACTIVITY A
    ON Z.JOB_ID = A.JOB_ID
    INNER JOIN
    (
    SELECT
    MAX(SESSION_ID) AS SESSION_ID
    FROM
    MSDB.DBO.SYSSESSIONS
    ) AS B
    ON A.SESSION_ID = B.SESSION_ID
    LEFT JOIN
    MSDB.DBO.SYSJOBHISTORY C
    ON A.JOB_HISTORY_ID = C.INSTANCE_ID
    WHERE Z.NAME = @job_name
    -- result: not ok (=1) if still running
    --if @running <> 0
    --return 0
    --else
    --return 1
    END TRY
    BEGIN CATCH
    DECLARE
    @ErrorMessage NVARCHAR(4000),
    @ErrorNumber INT,
    @ErrorSeverity INT,
    @ErrorState INT,
    @ErrorLine INT,
    @ErrorProcedure NVARCHAR(200);
    SELECT
    @ErrorNumber = ERROR_NUMBER(),
    @ErrorSeverity = ERROR_SEVERITY(),
    @ErrorState = ERROR_STATE(),
    @ErrorLine = ERROR_LINE(),
    @ErrorProcedure = ISNULL(ERROR_PROCEDURE(), '-');
    SELECT @ErrorMessage =
    N'Error %d, Level %d, State %d, Procedure %s, Line %d, ' +
    'Message: '+ ERROR_MESSAGE();
    RAISERROR
    (
    @ErrorMessage,
    @ErrorSeverity,
    1,
    @ErrorNumber, -- original error number.
    @ErrorSeverity, -- original error severity.
    @ErrorState, -- original error state.
    @ErrorProcedure, -- original error procedure name.
    @ErrorLine -- original error line number.
    );
    END CATCH
    END

[Source: http://blog.boxedbits.com/archives/124]

Kurzanleitung für Mercedes W123: Lüftungsgitter ausbauen und wechseln

  • Halteklammern lösen: Dazu die Spreizer in der Mitte mit einem kleinen Schraubendreher durch drücken (siehe Bild 1). Diese fallen dann in den Kasten und müssen nach Abnahme des Gitters eingesammelt werden. Es sind bei beiden Seiten je vier Klammern zu lösen.
    W123 Luftgitter: Bild 1

    W123 Luftgitter: Bild 1

  • linken Scheibenwischer abmontieren: Dazu Kappe am Scheibenwischer hochklappen und dann die Schraube lösen. Den Scheibenwischer hochklappen und mit leichten Drehbewegungen abziehen. Achtung: die vorhandene Unterlegscheibe und Mutter aufheben. Die Gummimuffe (siehe großer Kreis) muss ebenfalls abgezogen werden (siehe Bild 2).
    W123 Luftgitter: Bild 2

    W123 Luftgitter: Bild 2

  • Abnahme der Lüftungsgitter: Das Lüftungsgitter mit leichten Hebelbewegungen abziehen und noch vorhandene Klammern einsammeln. Die durchgefallenen Spreizer einsammeln. Es müssen am Ende je Gitter vier Klammern und vier Spreizer vorhanden sein.
  • neue Lüftungsgitter einsetzen: Die neuen Lüftungsgitter einsetzen (müssen leicht einrasten) und je Gitter die vier Halteklammern einsetzen. In jede Halteklammer den Spreitzer einsetzen und eben drücken (etwas Kraft ist notwendig).
    W123 Luftgitter: Bild 3

    W123 Luftgitter: Bild 3

  • linken Scheibenwischer montieren: Zuerst die Gummimuffe einsetzen und dann den Scheibenwischer aufsetzen. Darauf wieder die Scheibe und die Mutter montieren.
    W123 Luftgitter: Bild 4

    W123 Luftgitter: Bild 4

In the following you find a compact catalog of steps which should be considered in a SQL Server environment.
It is a conclusion of information from different sources.

Check List
Installation & Patching

  • Are only required components installed? Additional components should be installed when required.
  • As Service Account a Local or Domain Account should be rather used than a System Account.
  • A separate account for each service should be used.
  • Service accounts should use standard privileges (no special). The privileges are applied per Group Membership (SQL Server supplied group accounts).
  • Service Accounts should be applied with SQL Server Configuration Manager.
  • Always the most recent updates should be used.
  • Do not apply automatic updates. Rather test them before on a test environment.

Configuration: Surface Area Reduction

  • Surface Area Configuration Tool: only required features should be enabled.
  • Only Required Services should be set to Auto startup. Others should be set to Manual startup. Also consider to set some Services to Disabled startup.
  • If not needed supported Network protocols should be disabled in Configuration Manager.
  • If not needed, SQL Server should not be exposed to all networks (e.g. the Internet).
  • For some Configurations it is easier if named instances use fixed ports, rather than dynamic ports.
  • Surface Area Configuration Tool: the xp_cmdshell should be disabled unless it is absolutely needed.
  • Surface Area Configuration Tool: COM components should be disabled once all COM components have been converted to SQLCLR.
  • Surface Area Configuration Tool: both mail procedures (database mail and SQL mail) should be disabled unless it required to send mails from SQL Server. Rather use database mail.
  • Surface Area Configuration Tool: standard policy should be enforced for extended procedure usage.
  • No system stored procedure should be removed by dropping it.
  • User or Administrator access should not be denied to the extended procedures by using DENY all.
  • Surface Area Configuration Tool: All exceptions to the above statements should be documented here.

SQL Server Agent

  • In order to apply special privileges to Jobs, Credentials should be used, instead of adjusting the privileges of SQL Server Agent service account.
  • In order to execute a Job with different Windows credentials, Proxy accounts should be used.

Security: Authentication (general)

  • Always Windows Authentication should be used. Mixed Authentication should be used only for Non-Windows accounts or incompatible applications.
  • The sa account should have a strong and known password.
  • If possible, the sa account should be renamed.
  • The SQL Server Management should not be done using the sa account. Rather use a known login and assign the sysadmin privilege to it.

Security: Authentication (administrative access)

  • Use administrative privileges only when needed to access SQL Server.
  • The number of administrators should be as low as possible.
  • Administrative logins should not be linked to the BuiltIn\Administrators group.

Security: Authorisation

  • Permissions on DB objects should be managed using Database roles or Windows groups.
  • Guest access should not be enabled.

Security: Scanner Tools

  • Use Microsoft Best Practices Analyser to identiy possible weak points in SQL Server 2005.
  • Use Microsoft Baseline Security Analyser to identify weak points in your operating System.

Security: Password Policy

  • Set up a strong password policy, including an expiration and a complexity policy. E.g. use the company password policy.
  • If using SQL logins, SQL Server should run on Windows Server 2003 OS. Also password policies should be mandatory.
  • Application should support the change procedure for SQL Login passwords.
  • New SQL Logins should have the property MUST_CHANGE_ON_FIRST_ACCESS on true.
  • Security on Database objects should be managed by using schemas.

Database Properties

  • Databases should have distinct owners. The sa user should not own all databases.
  • The number of owners per Database should be as low as possible.

Schema Properties

  • Similar SQL objects should be grouped into the same schema.
  • Schemas should not be owned only by dbo.
  • The number of schema owners should be minimized.

Catalog Views

  • Catalog views are secure by default, so no additional action is required to secure them.

Remote Data Source Execution

  • Instead of Remote server definitions rather Linked Servers should be used.
  • Ad hoc queries through Linked Servers should be disabled if not needed.

Execution Context

  • Rather use EXECUTE AS instead of SETUSER.
  • Rather use WITH NO REVERT/COOKIE instead of Application Roles.

Data Encryption

  • Information which is classified as secret (or confidential) should be encrypted.
  • Data encryption should be done with symmetric keys which are protected by using asymmetric keys or certificates.
  • Keys should be password-protected and the master key encryption should be removed for the most secure configuration.
  • The service master key, database master keys, and certificates should be backed-up by using the key-specific DDL statements.

Auditing

  • The amount of auditing data (detail level) should be project specific.
  • C2 auditing should be enabled only if explicitly needed.
  • DDL and specific server events should be audited by using trace events or event notifications.
  • DML must be audited by using trace events.
  • WMI should ber used to be alerted of emergency events.

Fazit
My recommendation is to use following Test Protocol in order to check the several steps.

Download: SQL Server: Check List Best Practices

Wireshark Logo

Wireshark Logo

Wireshark is the successor application of Ethereal. Wireshark is a free (GNU General Public License) network traffic analyser (packet sniffer).
This tool is very useful to test encrypted connections of own applications or to analyse active connections and data flows on the own desktop.

Here is an application screenshot:
The application layout consists of 3 parts (windows) – packet list, packet details and packet bytes.

Wireshark Application Screenshot

Wireshark Application Screenshot

A sample ICQ data flow:
This data flow contains the message: “Gratulation zum Motoradführerschein :)

ICQ Data Flow

ICQ Data Flow

Supported Systems:
Wireshark runs on Unix-like systems (e.g. Linux, Solaris, HP-UX, FreeBSD, NetBSD, OpenBSD and Mac OS X) and on Microsoft Windows.

[Source: Wireshark]

Linked SQL Server (applies to SQL Server 2005 / SQL Server 2008 / later)
It can happen that an application may need data stored in second database which is not located on the same instance of SQL Server. That instance could be on the same physical machine or it could be on another machine. There are many solutions to this scenario, but it depends on your requirements which solution you should choose. If there is no need to transport a huge amount of data between the two SQL Server instances then you should consider to register the second instance as Linked SQL Server into the first instance.

The advantage is that the Linked Server can provide seamlessly data access to the application. By the way, Linked Servers can be also used to register Non-SQL-Server instances.

Basic Architecture Example
The following illustration shows a basic architecture sample. As you can see the Client is only accessing the Application Server and the Application Server is only accessing the first SQL Server.

Linked SQL Server: Basic Architecture

Linked SQL Server: Basic Architecture

This means the Application Server is submitting only one query to the first SQL Server instance and gets the result back only from the the same SQL Server instance. The data retrieval from the second SQL Server instance is processed by the first SQL Server instance.
More detailed: If the application running on the Application Server submits a query to the frist SQL Server instance that needs to retrieve data from a database hosted on the second SQL Server instance, then a so called Distributed Query is fired. An Example for such a query:


SELECT SERVER2.[DB].[SCHEMA].[TABLE].[FIELD]
FROM SERVER2.[DB]

In this sample the SERVER2 (= Linked Server Registration name on SERVER1) identifies that I am starting a distributed query.

How to setup the Linked Server
You can register a Linked Server instance within Management Studio as the following illustration shows.

Linked SQL Server: Management Studio

Linked SQL Server: Management Studio

For a greater flexibility I recommend to use the stored procedure.

EXEC master.dbo.sp_addlinkedserver
@server = N'REGISTRATION_NAME',
@srvproduct=N'',
@provider=N'SQLNCLI',
@datasrc=N'SERVER\INSTANCE',
@provstr=N'',
@catalog = 'DATABASE_NAME'

Linked Server Options: Server Options
Following Server options are available:

  • Collation Compatible: is used to identify whether or not the Linked Server has the same collation as the Local Server, should be set only the to true if you know for sure that both instances have the same collation.
  • Data Access: is used allow data access on the linked server, should be true. This option can be used to disable a Linked Server temporally.
  • RPC: is used to allow remote procedures calls from the Linked Server.
  • RPC out: is used to allow remote procedures calls to the Linked Server (procedures which are defined on the second instance).
  • Use Remote Collation: is used to specify that the collation setting from the remote instance is used.
  • Collation Name: is to specify the collation setting of the Linked Server.
  • Connection Timeout: is used to specify a time the Local Server waits to obtain a connection to the Linked Server (Zero used the remote instance setting).
  • Query Timeout: is used to specify a time the Local Server waits to retrieve data from the Linked Server (Zero used the remote instance setting).

Linked Server Options: Security

  • Login mapping: is a way to associate a login on the Local Server, with a login on the Remote Server. There are two options – Impersonate login and SQL login.
  • Impersonate login: uses a local Windows login and uses it to connect to the Linked Server, by impersonating it. For this login delegation has to be enabled between the two Servers!
  • SQL login: is used by associating a local login with a remote login and password. The remote login needs to be a SQL Server Authenticated user on the remote server.

For logins which are not defined in the mapping you can specify the behavior for the connection to the linked server. There are four different options that are available by choosing the corresponding radio button.

  • Not be made: any users who are not added to the mapping list cannot connect to the Linked Server.
  • Be made without using a security context: can be used for Data Sources which do not require any authentication. E.g. text files.
  • Be made using Login’s current security context: the Windows account of the current login is used to connect to the Linked Server. The Local Server has to be able to impersonate the corresponding local account. This option is a simple way to specify that all Windows accounts are able to connect to the Linked Server, without mapping each login.
  • Be made with this security context: specifies that all logins connecting to the Linked Server are using a single remote login and password (SQL Server Authenticated login).

Encryption of Connection stream
SQL Server is able to encrypt the connection between Client and Server. Since SQL Server 2005 you do not necessarily need to use SSL certificates. However, it makes sense to have certificates, because the Client can not check the identity of the Server. By the way, if you use a certificate which was generated from the Server itself then it will not be trustful.

The Login packets are always encrypted since SQL Server 2005, if the Client supports it. Encrypted communication between Client and Server is supported since MDAC 2.6 and third party Client tools should be ckecked. It is important to know that encryption takes additional CPU time and that encrypted data streams cannot be compressed anymore.

The encryption option can be activated in the Configuration Manager (SQL Server Network Cobnfiguration). Choosing the Properties dialog for an Instance you can set the option “Force Encryption” to true.
With this option set to true the SQL Server instance only accepts encrypted connections. If you want to have both connection options (encrypted / non-encrypted) then you should not set this option to true.

If you do not choose a certificate (under certificate tab) then SQL Server 2005 (or later) generates a self-signed certificate. This means that the communication between Client and Server is encrypted and that spoofing of data is not possible anymore.

Encryption of Linked Server Connection Stream
To enable the encryption of a Linked Server connection stream you should use the provider string option “Encrypted=YES”. The following stored procedure includes this option.


EXEC master.dbo.sp_addlinkedserver
@server = N'REGISTRATION_NAME',
@srvproduct=N'',
@provider=N'SQLNCLI',
@datasrc=N'SERVER\INSTANCE',
@provstr=N'Encrypt=yes;',
@catalog = 'DATABASE_NAME'

The Linked SQL Server connection is now encrypted using a self-signed certificate or if available a Server certificate.

How to check encrypted connection for SQL Server
I recommend the tool Wireshark (former known as Ethereal) to check the packages and look into the data. Just run this tool, capture a stream and fire a SELECT statement.

Conclusion
Registering another SQL Server instance as Linked Server allows you to submit T-SQL statements on one SQL Server instance, which retrieves data from a second instance. Moreover it is possible to fire linked statements so that you can join data between several instances.

Manchmal ist es schwierig, jemanden den Geschmack von Wein zu beschreiben. Eine coole Suchmaschine Weinsuchmaschine hilft dabei: aromicon.

aromicon ist eine Geschmackssuchmaschine für Wein. Sie können Weine nach den Suchkriterien Wunscharomen, passende Speisen, Süße, Rebsorte, Tannine und Region anzeigen lassen. Die Suche erfolgt in tausenden von Weinen aus zahlreichen Onlineshops.

[Quelle: http://www.aromicon.com]

If you experience following error “Executed as user: MyDomain\SQLServer. The package execution failed. The step failed.”
then check that your Service User for the SQL Server Agent has full file permissions to all packages, config files and log files!
Moreover this service user should be member of the SQL Server DTS Group.

Importing Date values e.g. from Flat Files it can happen that the format of the date is different to the Host System date formatting of the regional settings. Especial the date setting for English (UK: day/month/year) and English (US: month/day/year) are often very problematic.

In SSIS a solution for this issue can be realized with a ScriptComponent as follows.

  1. Date column
    First of all you should import your Date column as nvarchar type. In this example let us assume we are going to import a Birthday column in Germany format on a host system with US-based regional setting.
  2. Script Component: Columns
    We specify as import column the BIRTHDAY_STR column (nvarchar) from the Flat File and as export column we specify a column of type date named BIRTHDAY_DATE.
  3. ScriptComponent: Code
    In the Script we are using the TryParse method which is stable and flexible enough. But in order to import the German date we have to create a culture object.


    Dim dt As DateTime
    Dim culture As Globalization.CultureInfo
    culture = Globalization.CultureInfo.CreateSpecificCulture("de-DE")
    Try
    If DateTime.TryParse(CStr(Row.BIRTHDAY_STR), culture, Globalization.DateTimeStyles.None, dt) Then
    Row.BIRTHDAY_DATE = dt
    Else
    Row.BIRTHDAY_DATE_IsNull = True
    End If
    Catch ex As Exception
    Row.BIRTHDAY_DATE_IsNull = True
    End Try

This example should help you to import the dates of different regional settings.

Traurig aber wahr: Michael Jackson ist verblasst.
Seine Musik begleitet uns aber weiter und momentan spiegelt sich dies stark in den Verkauscharts (z.B. bei Amazon) wieder.

Amazon Top 10 Bestsellerliste

Amazon Top 10 Bestsellerliste

[Quelle: Amazon.de]

Here I would like to show you a solution for following problem.
One source row coming from Excel contains several values within one column. Each one of these values should be reflected in a separate row in the data destination.
So the basic problem is that there we have a “list column” whose contents should be result in separate rows.

To make the problem more complex: The number of values in the “list column” is variable and there is a second column which contains corresponding values.

Excel Source

Excel Sheet

Excel Sheet

solution: SSIS Script Component

To implement a solution for the problem we use the most adequate item – the Script Component.

  1. We start with a data flow task in our package and add an OLE DB Excel Source component (that returns the source data above).
    We can use following SQL statement:

    SELECT
    F1,F2,F3
    FROM
    [sheet1$B1:D5]

    The imported columns can be named as KEY, PERSON, DEPARTMENT.
    Excel Source

    Excel Source

  2. Then we add a Derived Column Component to the data flow as well.
    Derived Column

    Derived Column

    In order to replace the CRs with commas we replace the imported columns with following formula:

    REPLACE(REPLACE(REPLACE(REPLACE(PERSON,"\n",",")," ",""),",,",","),",,",",")
    REPLACE(REPLACE(REPLACE(REPLACE(DEPARTMENT,"\n",",")," ",""),",,",","),",,",",")

    Derived Column Settings

    Derived Column Settings

  3. As result we receive following list of elements:

    • – -
    • row 1
    • PERSON: Mr. Wayne,Mr. Kent,Mr. Parker
    • DEPARTMENT: Heros/Batman,Heros/Superman,Heros/Spiderman
    • – -
    • row 2
    • PERSON: Mr. Kirk,Ms. Janeway
    • DEPARTMENT: StarTrek/Classic,StartTrek/Voyager
    • – -
    • row 3
    • PERSON: “Mr. Wrong”
    • DEPARTMENT: Wrong entry 1,Wrong entry 2
    • – -
  4. Now we add a Script Component to the data flow.
    When we are prompted to select the type of the component we choose Transformation.

    Script Component

    Script Component

  5. Then we right-click on the Script Component and choose Edit. In the Script Transformation Editor dialog box we select all three columns to make them available as input columns. Now these column are available in the Script code.
    Script Component Columns

    Script Component Columns

  6. Since the tranformation is going to produce more output rows than it receives input rows we have to mark the transformation as asynchronous. This is done by setting the SynchronousInputID property of the output to None.
    Script Component Settings

    Script Component Settings

  7. Next, we have to specify the output columns that the output buffer will contain. So we select the transformation’s output and click Add Column and set the properties (column name and data type).
    Script Component Input / Output rows

    Script Component Input / Output rows

  8. Now we can add the code by clicking on the Design Script button.
    Script Component - Design Script

    Script Component - Design Script

  9. Following Code is splitting our list of items into rows:

    ' Variables
    Dim i As Integer = 0
    Dim itemList_PERSON As String = Row.PERSON
    Dim itemList_DEPARTMENT As String = Row.DEPARTMENT
    Dim delimiter As String = ','
    If Not (String.IsNullOrEmpty(itemList_PERSON) Or String.IsNullOrEmpty(itemList_DEPARTMENT)) Then
    Dim inputListArray_Wer() As String = itemList_PERSON.Split(New String() {delimiter}, StringSplitOptions.RemoveEmptyEntries)
    Dim inputListArray_Abt() As String = itemList_DEPARTMENT.Split(New String() {delimiter}, StringSplitOptions.RemoveEmptyEntries)
    If (inputListArray_Wer.Length = inputListArray_Abt.Length) Then
    ' new rows
    For Each item As String In inputListArray_Wer
    With Output0Buffer
    .AddRow()
    .KEY = CInt(Row.KEY)
    .PERSON = item
    .DEPARTMENT = inputListArray_Abt(i).ToString()
    End With
    ' increase counter
    i = i + 1
    Next
    Else
    ' Filtered Rows
    With Output1Buffer
    .AddRow()
    .KEY = CInt(Row.KEY)
    .PERSON = Row.PERSON
    .DEPARTMENT = Row.DEPARTMENT
    End With
    End If
    End If

    The code shows that there are two outputs. Whenever a record set cannot be split or the columns PERSON and DEPARTMENT have a different number of itemss the rows is redirected to the second output (here Output1).
  10. In order to receive the output from the Script Component we add a Union All Component. Additionally we add two Data Viewer Components, since we want to see the resulting outputs from the Script Component.
    Union ALL

    Union ALL

  11. Running the SSIS package will bring the following result.
    Data Flow completed

    Data Flow completed

  12. As we can see, the first DataViewer Component show us the valid rows produced by our Script Component.
    Data Viewer valid rows

    Data Viewer valid rows

  13. The second DataViewer Component show us the invalid rows produced by our Script Component.
    Data Viewer invalid rows

    Data Viewer invalid rows

download
Of course I provide you the solution as download.

problem
If I have a column with several values (separated by CR) which I need separated into rows then following solution can help:

possible solution: T-SQL
One possible solution using T-SQL is following statement.
The function fn_Split returns all values in a list and the replace function converts the CR (carriage returns) to commas.


SELECT
[Key]
,[b.Value]
,[c.Value]
FROM
[dbo.TableExcel] AS a
CROSS APPLY
fn_Split(REPLACE(a.Persons, CHAR(10)+CHAR(13),','), ',') AS b
CROSS APPLY
fn_Split(REPLACE(a.Departments, CHAR(10)+CHAR(13),','), ',') AS c

The corresponding fn_Split function:

CREATE FUNCTION [dbo].[fn_Split](@text nvarchar(max), @delimiter char(1) = ' ')
RETURNS @Strings TABLE (
position int IDENTITY PRIMARY KEY,
value nvarchar(max)
)
AS
BEGIN
DECLARE @index int
SET @index = -1
WHILE (LEN(@text) > 0)
BEGIN
SET @index = CHARINDEX(@delimiter , @text)
IF (@index = 0) AND (LEN(@text) > 0)
BEGIN
INSERT INTO @Strings VALUES (@text)
BREAK
END
IF (@index > 1)
BEGIN
INSERT INTO @Strings
VALUES (LEFT(@text, @index - 1))
SET @text = RIGHT(@text, (LEN(@text) - @index))
END
ELSE
SET @text = RIGHT(@text, (LEN(@text) - @index))
END
RETURN
END

When Excel data is not coming into SSIS right, then do the following:

  1. use option “IMEX=1” in ConnectionString (Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\..\MyExcel.xls;Extended Properties="Excel 8.0;HDR=Yes;IMEX=1")
  2. modify registry setting “TypeGuessRows=8” to a much higher value
  3. use a sql statement to retrieve data (e.g. “SELECT * FROM [sheet1$A5:M2000]“)

NOTE: The setting IMEX=1 tells the Excel driver to use Import mode. This mode reads the registry setting “ImportMixedTypes=Text” which forces mixed data to be converted to text. To achieve a more reliable column type recognition, the registry setting “TypeGuessRows=8” should be increased. By default the ISAM driver analyzes  the first eight rows and determines the column datatypes from this sampling. If the first analysed rows contain only numeric values, then setting IMEX=1 will not convert this column datatype to Text – it will remain numeric.

ATTENTION: The IMEX setting has some other modes:

  • 0 is Export mode
  • 1 is Import mode
  • 2 is Linked mode (full update capabilities)

Registry Setting Location:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\X.X\Engines\Excel

supplement
If you experience following error the above described solution could fix it.
1. An OLE DB error has occurred. Error code: 0×80040E21.
2. Failed to retrieve long data for column “XYZ”.
3. There was an error with output column “XYZ” (55488) on output “Excel Source Output” (109). The column status returned was: “DBSTATUS_UNAVAILABLE”.
4. The “output column “XYZ” (55488)” failed because error code 0xC0209071 occurred, and the error row disposition on “output column “XYZ” (55488)” specifies failure on error. An error occurred on the specified object of the specified component.
5. [DTS.Pipeline] Error: The PrimeOutput method on component “Info Source” (101) returned error code 0xC0209029. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing.
6. [DTS.Pipeline] Error: Thread “SourceThread0″ has exited with error code 0xC0047038.

SQL Server: Version, Service Pack, Edition

Just run following command within Management Studio:
SELECT
@@VERSION,
SERVERPROPERTY('productversion'),
SERVERPROPERTY ('productlevel'),
SERVERPROPERTY ('edition')

As result you receive something like:

Microsoft SQL Server 2005 - ... Windows NT 5.2 (Build 3790: SP 2)
  9.00.4035.00
  SP3
  Standard Edition (64-bit)

Der Fehler

MSB4019: The imported project “C:\Program Files\MSBuild\Microsoft\VisualStudio\v8.0\WebApplications\Microsoft.WebApplication.targets” was not found.

konnte bei mir wie folgt behoben werden:

  1. Prüfen, ob sich unter C:\Program Files\MSBuild\Microsoft\VisualStudio\v9.0\WebApplications\Microsoft.WebApplication.targets ein File befindet.
  2. Wenn ja, dann dieses an die erwartete Stelle kopieren. Wenn nicht, dann dieses herunterladen (Google).

A very interesting tool: Phune.

Phun is a free game like 2D physics sandbox where you can play with physics like never before. The playful synergy of science and art is novel, and makes Phun as educational as it is entertaining.

[Source: http://www.phunland.com]

[via: Micha Weber]

How to disable caching in SQL Server Reporting Services for Visual Studio in Preview Mode?

SQL Server 2005 Service Pack 1 brings a new feature: caching data in BIDS (Business Intelligence Development Studio) Report Designer. Following objects are cached: all the data query, parameter value and credentials for previewing a report. Sometimes this feature can be annoying, if changes in preview mode should be visible immediately.

In order to disable the caching feature, following key has to be added to the config file ‘RSReportDesigner.config‘.

Key:
<Add Key="CacheDataForPreview" value="False" />

Location:
C:\Program Files\Microsoft Visual Studio 8\Common7\IDE\PrivateAssemblies\RSReportDesigner.config
To clear the cache you can also delete the *.data cache file which is located in the report RDL directory.

SSRS: Caching Files (Preview Mode)

SSRS: Caching Files (Preview Mode)

  • ReportPath="/CPA_DWH_ASP_REPORTS/MPCSurvey_Plausibility"
  • ReportServerUrl="http://53.71.217.43/ReportServer

Next Page »