SSWUGtv
With Stephen Wynkoop
In this Special Edition, Leon Guzenda from Objectivity DB joins us to talk about Big Data, some real applications for it and the history behind this kind of massive data procession. Really interesting applications of this technology.
Watch the Show
Merge Feedback
We had a couple responses from the Merge conversation raised in the newsletter yesterday.
Tim Asks:
Quick question regarding your merge methodology. In your email, you said that it is most efficient to perform merges in "update", "delete", "insert" order. Wouldn’t it be faster to perform the delete first so you aren’t processing records you are going to delete anyway?
Editor:
Tim…that is a great question. In my experience, if you perform the update first, that is really all that matters. The insert and delete commands follow, but the order is less significant. The reason is that you wouldn’t update a record that was going to be deleted.
There are two ways to perform merges on sets. The first method is to have the set to be merged identify the action to be performed on each record (Insert, Update, Delete) by having a column with the necessary action. Then you simply perform the appropriate commands.
The second way assumes nothing regarding the two sets other than the fact that the new set is authoritative. In this second method, that is where the order of execution makes a difference. Here is an update command for rows existing in both sets.
UPDATE PermTable SET
Column-n = NewSet.Column-n
FROM NewSet
JOIN PermTable on NewSet.keyn… = PermTable.Keyn…
WHERE PermTable.Column-n != NewSet.Column-n
This update command will update all the rows matching in the new table based on the primary key having some column being updated that is different. The syntax here is representative, not pure SQL.
To delete rows no longer in the new set you can use an outer join. Any row already in the database no longer in the new set should therefore be deleted, since the new set is authoritative.
DELETE p
FROM PermTable p
LEFT OUTER JOIN NewSet N on p.Keyn… = n.Keyn…
WHERE N.Keyn… is null
Then you may insert the new rows using a similar technique…any row in the new set not already in the database is new and should therefore be inserted.
INSERT PermTable (col list)
SELECT N.Col List
FROM NewSet N
LEFT OUTER JOIN PermTable P on p.Keyn… = n.Keyn…
WHERE P.Keyn… is null
This technique works with any SQL Engine that supports outer joins.
Mike Writes:
I wanted to write a gotcha with the merge command. If you are building up a query to be executed and going to store in the nvarchar(max) variable, then each string you concatenate has to be nvarchar(max) or your string will be truncated to 4000 characters. In other words, you have to do something like set @nvarcharmaxvariable= cast(‘select * from ‘ as nvarchar(max)) + cast(‘mytable’ as nvarchar(max)). If the string your building is 4000 or less it’s a non issue. But what will happen, and at least in my case, you’ll test it on a couple of tables and everything will work fine. You then apply it to a wide table and wonder why it doesn’t work. Also, keep in mind if you print the string to the message panel only the first 4000 chars will be displayed even if you have successfully put more than 4000 chars in your nvarchar(max) variable.
Please drop me a note at btaylor@sswug.org if you have other things you’d like to share.
Cheers,
Ben
$$SWYNK$$
Featured Article(s)
There’s No Such Thing as Information Overload
The idea of information overload has been around for a long time, even pre-dating the Internet explosion of the 1990s. It seems quaint that anyone even imagined the term “information overload” pre-Internet, doesn’t it? As we progress through the second decade of the 21st Century though, information overload can seem more relevant than ever before. It can be difficult to the avalanche of information sources "out there" that we have to contend with. Couple that with the better technologies at our disposal for managing and sifting through all the data and information and it can seem overwhelming at times.
Featured White Paper(s)
All-At-Once Operations
Written by Itzik Ben-Gan with SolidQ
SQL supports a concept called all-at-onc… (read more)