IT


Often ETL design is realized using set-based data processing (SQL-Statements). Especially developers with a database professional background are induced to extract, load and transform (ETL) data using Stored Procedures. On the other side there are advantages of record-based data processing.

The following slides give an overview and recommendation about usage scenarios. There are some tips which should be considered when comparing set-based operations and record-based operations.

T-SQL-Statements: Advantages

  • SQL will outperform data-flows (if data is transferred table-to-table on the same server)
  • Using transactions is simple
  • SQL (Stored Procedures) can be debugged
  • Set-based operation vs. record-based operation (seems to be more performing)
  • SQL tools support statement generation
  • Stored Procedures are supported by internal Optimization Engine (execution plan)
  • DB experts can design data processing tasks
  • Optimized algorithm for Sorting / Grouping statements (e.g. GROUP BY, DISTINCT, ORDER BY)
Data-Flows: Advantages
  • Data from heterogeneous sources (e.g. non-SQL-sources)

  • Data transfer machine-to-machine
  • Data transforms are flexible, transparent
    (SSIS: buffers for heavy operations / T-SQL: temporary tables, cursors)
  • Broad transformation functionality (fuzzy logic, aggregations, derived columns, scripts)
  • Easier handling of „bad“ data (different data sinks, exception handling)
  • .Net Data providers as alternative to OLE DB Providers (e.g. mySAP .Net Data Provider)
  • Visual representation of work flow / data flow à better maintenance, understanding of data-flows
    (since they are a bit self-documenting)
  • Parallel execution / step-by-step execution (Checkpoints) of data-flows
  • Easier debugging using events („component <component name> (xxxx) wrote xxxx rows“)
  • Easier handling of different data types (conversion, …)
Performance Tips

Bulk operation vs. OLE DB Command Transformation
  • Use bulk updates instead of single statement updates
  • OLE DB Command with parametrized query usually slow (no bulk operation)
  • Load changed records to staging table and perform set-based update with a stored procedure
  • Prefer usage of set-based updates (stored procedures) over OLE DB Command

Load only changed rows: “Delta Detection”

  • If possible extract and load only a delta set

Eliminating Unnecessary Work

  • Data columns and rows need space within the buffer (performance)
  • Only extract and process needed data
  • Use optimized SQL Statements for data extraction (up-to-date statistics required)

Job dependencies

  • Dependencies among ETL jobs (e.g. job “B” cannot start while job “A” is not finished)
  • Master packages should be used with maximum degree of parallelism
  • Implement parallel execution (MaxEngineThreads, MaxConcurrentExecutables)

Asynchronous transformations (bottlenecks)

  • slowest parts within ETL process
  • Asynchronous operations (block data flow) vs. synchronous operations (row by row basis)
  • E.g. Sort Transformation and Aggregate Transformation (ok for small data sets)
  • Blocking transformations should be avoided wherever possible
  • Merge Join: delivered data should be pre-sorted (by stored procedures or select queries)
  • Prefer stored procedures to inline selects and ensure that join keys are indexed
    (then internal DB Optimizing Engine is used)

Slow database operations

  • Reason: concurrency, integrity checks, indexes, triggers
  • Data processing outside database (better performance)
  • Disable integrity checking (disable constraint) before load in target database
  • Disable DB triggers before load in target database
  • Drop indexes before load and recreate them after (e.g. SQL: drop index / create index)
  • Parallel bulk operations (if table is partitioned or there are no indexes)
    Note: parallel load attempts into same table/partition can cause locks (on data rows or indexes)

Cross-DB Joins

  • Slow down ETL extracts considerably
  • Use staging tables (schemas should be used to separate staging areas)

Transaction Protocol

  • Disable Transaction Protocol (not convenient for OLAP)

Using SQL Statement to retrieve data

  • Avoid using table or view access mode in OLE DB Source
    (otherwise a row set is opened for both to retrieve column meta data and the rows)
  • Use SQL Statement: 10times faster
    (SQL Statements use internal DB Optimizer, but SSIS cannot optimize statements)

Heavy data load operations

  • Use staging tables and SSIS Checkpoints (only available for Control Flow)
  • Avoid buffer swapping (slow) to disk (use performance counters to monitor)
  • Reduce columns covered by Lookup transformation to the minimal possible set (natural key and surrogate key)
  • Use Merge Join in preference to Lookup transformation when possible

Auditing and Logging

  • Use event log for exception and data quality handling
  • Use validations (e.g. no. of extracts equals no. of inserts)

Conclusion

  • Generally use record-based operations (ETL-Tools)

  • Intelligent usage of set-based operations (e.g. for bulk operations: table-to-table)
  • No data transformations within set-based-operations
  • Sorting / Aggregation statements should be done in set-based operations (at source extraction)
  • Staging tables should be used
  • Asynchronous operations (Sorting, Aggregating) should be avoided when possible
  • Consequently eliminate unnecessary work
  • Only extract needed data from sources
  • Try to pre-sort Merge Join input flows

Slides (Download)

There are two options executing SSIS packages:

- xp_cmdshell command (not recommended for security reasons)
- sp_start_job command

The main difference between both options is the execution method. The xp_cmdshell command is a synchronous call and the sp_start_job command is an asynchronous call.

xp_cmdshell command (synchronous)

  • enable xp_cmdshell mode in Surface Area Configuration Tool for SQL Server
  • use following procedure to execute SSIS package:

    DECLARE @returncode int
    EXEC @returncode = xp_cmdshell 'dtexec /f "PackageNameWithFullPath.dtsx"'

sp_start_job (asynchronous)

  • define job in SQL Server Agent with SSIS execution step
  • use following procedure to start SQL Server Agent job with SSIS execution step:

    [msdb].dbo.sp_start_job @job_name='JobName'
  • check execution status of job, since the call is asynchronous:

    SELECT
    [server],
    [start_execution_date],
    [stop_execution_date],
    [run_date],
    [run_duration],
    [run_status],
    [message]
    FROM
    MSDB.DBO.SYSJOBS Z
    INNER JOIN
    MSDB.DBO.SYSJOBACTIVITY A
    ON Z.JOB_ID = A.JOB_ID
    INNER JOIN
    (
    SELECT
    MAX(SESSION_ID) AS SESSION_ID
    FROM
    MSDB.DBO.SYSSESSIONS
    ) AS B
    ON A.SESSION_ID = B.SESSION_ID
    LEFT JOIN
    MSDB.DBO.SYSJOBHISTORY C
    ON A.JOB_HISTORY_ID = C.INSTANCE_ID
    WHERE
    Z.NAME = 'JobName'

There is following way to make the call of sp_start_job synchronous.

sp_start_job (synchronous)

  • define job in SQL Server Agent with SSIS execution step
  • use following procedure to start SQL Server Agent job with SSIS execution step:

    ALTER PROCEDURE [dbo].[AGENT_JOB_CHECK2]
    -- Add the parameters for the stored procedure here
    DECLARE @job_name nvarchar(100)='',
    DECLARE @maxwaitmins int = 5
    AS
    BEGIN
    set NOCOUNT ON;
    set XACT_ABORT ON;
    BEGIN TRY
    declare @running as int
    declare @seccount as int
    declare @maxseccount as int
    set @maxseccount = 60*@maxwaitmins
    set @seccount = 0
    set @running = 0
    declare @job_owner sysname
    declare @job_id UNIQUEIDENTIFIER
    set @job_owner = SUSER_SNAME()
    -- get job id
    select @job_id=job_id
    from msdb.dbo.sysjobs sj
    where sj.name=@job_name
    -- invalid job name then exit with an error
    if @job_id is null
    RAISERROR (N'Unknown job: %s.', 16, 1, @job_name)
    -- output from stored procedure xp_sqlagent_enum_jobs is captured in the following table
    declare @xp_results TABLE ( job_id UNIQUEIDENTIFIER NOT NULL,
    last_run_date INT NOT NULL,
    last_run_time INT NOT NULL,
    next_run_date INT NOT NULL,
    next_run_time INT NOT NULL,
    next_run_schedule_id INT NOT NULL,
    requested_to_run INT NOT NULL, -- BOOL
    request_source INT NOT NULL,
    request_source_id sysname COLLATE database_default NULL,
    running INT NOT NULL, -- BOOL
    current_step INT NOT NULL,
    current_retry_attempt INT NOT NULL,
    job_state INT NOT NULL)
    -- start the job
    declare @r as int
    exec @r = msdb..sp_start_job @job_name
    -- quit if unable to start
    if @r<>0
    RAISERROR (N'Could not start job: %s.', 16, 2, @job_name)
    -- start with an initial delay to allow the job to appear in the job list (maybe I am missing something ?)
    WAITFOR DELAY '0:0:10';
    set @seccount = 10
    -- check job run state
    insert into @xp_results
    execute master.dbo.xp_sqlagent_enum_jobs 1, @job_owner, @job_id
    set @running= (SELECT top 1 running from @xp_results)
    while @running<>0 and @seccount < @maxseccount
    begin
    WAITFOR DELAY '0:0:10';
    set @seccount = @seccount + 10
    delete from @xp_results
    insert into @xp_results
    execute master.dbo.xp_sqlagent_enum_jobs 1, @job_owner, @job_id
    set @running= (SELECT top 1 running from @xp_results)
    end
    -- result: query
    SELECT
    [server],
    [start_execution_date],
    [stop_execution_date],
    [run_date],
    [run_duration],
    [run_status], -- 0: failed, 1: success, null: running
    [message]
    FROM
    MSDB.DBO.SYSJOBS Z
    INNER JOIN
    MSDB.DBO.SYSJOBACTIVITY A
    ON Z.JOB_ID = A.JOB_ID
    INNER JOIN
    (
    SELECT
    MAX(SESSION_ID) AS SESSION_ID
    FROM
    MSDB.DBO.SYSSESSIONS
    ) AS B
    ON A.SESSION_ID = B.SESSION_ID
    LEFT JOIN
    MSDB.DBO.SYSJOBHISTORY C
    ON A.JOB_HISTORY_ID = C.INSTANCE_ID
    WHERE Z.NAME = @job_name
    -- result: not ok (=1) if still running
    --if @running <> 0
    --return 0
    --else
    --return 1
    END TRY
    BEGIN CATCH
    DECLARE
    @ErrorMessage NVARCHAR(4000),
    @ErrorNumber INT,
    @ErrorSeverity INT,
    @ErrorState INT,
    @ErrorLine INT,
    @ErrorProcedure NVARCHAR(200);
    SELECT
    @ErrorNumber = ERROR_NUMBER(),
    @ErrorSeverity = ERROR_SEVERITY(),
    @ErrorState = ERROR_STATE(),
    @ErrorLine = ERROR_LINE(),
    @ErrorProcedure = ISNULL(ERROR_PROCEDURE(), '-');
    SELECT @ErrorMessage =
    N'Error %d, Level %d, State %d, Procedure %s, Line %d, ' +
    'Message: '+ ERROR_MESSAGE();
    RAISERROR
    (
    @ErrorMessage,
    @ErrorSeverity,
    1,
    @ErrorNumber, -- original error number.
    @ErrorSeverity, -- original error severity.
    @ErrorState, -- original error state.
    @ErrorProcedure, -- original error procedure name.
    @ErrorLine -- original error line number.
    );
    END CATCH
    END

[Source: http://blog.boxedbits.com/archives/124]

A very interesting tool: Phune.

Phun is a free game like 2D physics sandbox where you can play with physics like never before. The playful synergy of science and art is novel, and makes Phun as educational as it is entertaining.

[Source: http://www.phunland.com]

[via: Micha Weber]

openproj.jpg Wer eine freie und kostengünstige Alternative zu Microsoft Project sucht, kann mit OpenProj fündig werden. Als Java-Anwendung unterstützt die Software Mac, Unix, Linux und Windows.

Außerdem steht OpenProj unter der Common Public Attribution License 1.0 (CPAL) und ist damit Open-Source.

Die Firma hinter OpenProj heißt Projity und stellt folgende Features heraus:

  • OpenProj basiert auf einer SaaS-Lösung (Software as a Service oder Project-On-Demand)
  • es handle es sich um einen vollwertigen Ersatz für Microsoft Project (Datei-Format ist überführbar!)
  • keine hohen Lizenzkosten
  • Gantt-Diagramme
  • Netzwerk-Diagramme (PERT Charts)
  • Projektstrukturpläne
  • Leistungswertanalysen

Also das klingt doch vielversprechend.

Wer kennt das nicht:

Da gibt es ein Programm oder eine ausführbare Datei, die gestartet werden sollen – doch man hat Bedenken. Schließlich möchte man sich ja nicht die Windows-Installation “versauen” oder gar Schadsoftware einfangen.

zwei Möglichkeiten gibt es da:

  • Entweder man hat eine virtuelle Maschine auf dem Rechner
  • oder man benutzt eine Art Sandbox (transient storage area).

(more…)

Falls im Sicherheitscenter weder Windows Defender noch Antivirus-Software erkannt wird, hilft folgendes:

  1. Start – Ausführen – “services.msc” starten und Dienst Windows-Verwaltungsinstrumentation stoppen
  2. unter C:\Windows\system32\wbem\repository den Ordner löschen oder umbenennen
  3. den WMI-Dienst wieder starten

Das sollte helfen.

Falls Putty verwendet wird und nur komische Zeichen im Midnight Commander dargestellt werden, dann einfach folgende Einstellung vornehmen:

  • Putty-Konfiguration
  • Window: Translation
  • Character Set Translation On Received Data
  • Passenden Zeichensatz auswählen (z.B. meistens UTF-8)

Vorher: Putty 01
Nachher: Putty 02

Eine kleine Auswahl an Empfehlungen:

Hier einfach mal eine Auswahl einiger Programme:

Wieder mal eine Empfehlung hier:

AVS DVD Player AVS DVD Player – ein relativ leistungsfähiger DVD Player, der zwar nicht an das kommerzielle PowerDVD herankommt, aber dennoch fast all meine DVDs abspielen kann.

(more…)


Link: sevenload.com


Link: sevenload.com

MeineLeude Logo

“Meine Leude” ist eine kostenlose Internetplattform zur Bildung von sozialen Netzwerken im deutschsprachigen Raum. Sie wurde im Oktober 2006 von Stefan Maischner entwickelt und liegt mittlerweile in der beta5 vor.

Auf jeden Fall ist MeineLeu.de ein absoluter Surftipp!

Eigentlich gefällt mir ja der OpenSource PDF Printer PDFCreator. Allerdings funktioniert dieser immer noch nicht unter Windows Vista.

PDF Creator

Als gute Alternative bietet sich der Bullzip PDF Printer an – leider kein OpenSource, aber Freeware.

Bullzip PDF

EditItemTemplate

Um ein erfolgreiches FindControl in einem EditItemTemplate umzusetzen, sollte folgender Code verwendet werden. In diesem Beispiel wird in einem GridView in einer editierbaren Zeile einer Textbox ein Wert (SelectedValue) aus einer DropDownList zugewiesen, sobald ein Button gedrückt wird.


protected void btnSet_Click(object sender, EventArgs e) {
TextBox txtPredecessorName = (TextBox)((Button)sender).Parent.FindControl("txtPredecessorName");
DropDownList ddKTPredecessors = (DropDownList)((Button)sender).Parent.FindControl("ddKTPredecessors");
txtPredecessorName.Text = ddKTPredecessors.SelectedValue;
}

Abhängig vom Medium kann eine unterschiedliche Farbwahl die Lesbarkeit verbessern. So gibt es auch für unsere Entwicklungswerkzeuge verschiedene Themes, die Farben und Schriftarten anpassen können.

kontrast

Visual Studio Themes:

Themes 1

Themes 2

Visual Studio Colors: Back vs. White

Textmate Theme

Farben von Visual Studio 2005 nach MS SQL Management Studio übertragen:

VSColorsToSQL

Meine Farben:
VS2005Settings.zip

[inspired by: Join the dark side of Visual Studio]

visionapp Remote Desktop ist ein Freeware-Tool, welches die Verwaltung von Remote Desktop Verbindungen angenehm erleichtert. Prädikat empfehlenswert!

visonApp

In order to start SSIS packages from ASP.NET web applications you can use a Web Service.

Following page shows how to implement this: Running packages programmatically on the server by using a Web service or remote component.
What you should know about running SSIS packages from ASP.NET is that there is a problem with the threads SSIS processes use.

Even Impersonation does not work successfully. The impersonation applies to the calling thread only, but SSIS creates additional threads in order to be able to perform multiple steps simultanously. Unfortunately, the impersonation context is not passed to these additional threads, so any data base access occurs under process user context, not under impersonated context.

So it is better to execute SSIS package outside of ASP.NET process, e.g. using DTEXEC or Agent Job (Agent proxies are convinient if you want to execute under specific credentials).

Important:
With its default settings for authentication and authorization, a Web service generally does not have sufficient permissions to access SQL Server or the file system to load and execute packages. You may have to assign appropriate permissions to the Web service by configuring its authentication and authorization settings in the web.config file and assigning database and file system permissions as appropriate.

Solution:

IIS 5.0: only solution seems to be assigning Administrator rights to the ASP.NET User

IIS 6.0: create new Application Pool with Administrator user context and set the Web Service
using this context in order to initiate SSIS packages

Here is an example Application Configuration for a SSIS & ASP.NET project:

SSIS ASP.NET

Following error messages can show that you’re running into the problem with the SSIS threads:
- package validation failed
- TaskHost validation failed
- cannot access file location
- component xyz validation failed

workaround: add in /boot/grub/menu.lst kernel parameter i8042.noloop

After installing IIS the ASP.NET setup has to be started by using following command:

C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\aspnet_regiis.exe -i

Next Page »