Tuesday, February 15, 2011

Google Interview Question: "Explain a database in three sentences to your eight-year-old nephew."

[Reposted from my non-SQL blog.  Apologies to the handful of readers who follow both.]

From this list of Google interview questions, I found the question: "Explain a database in three sentences to your eight-year-old nephew."  Here are my attempts.

Attempt 1: Trying to Capture Third Normal Form
  1. On a sheet of notebook paper, make a list of all your video game systems, and number them 1 through whatever.
  2. On another sheet of paper, make a list of all your video games, also numbered 1 through whatever.
  3. For each video game, find the game system that it goes with on the first list, and write the number from the first list next to the game on the second list.
Attempt 2: Eight-year-olds Are Still Kinda Into Santa
  1. Santa has a list of every kid in the world, which includes the kid's name, address, age, and naughty/nice score.
  2. Santa has another list of every toy in the world-- how much it costs, what color it is, how much it weighs (for the sleigh).
  3. Santa has a special "many-to-many" list with just the kid-number and toy-number, to keep track of which kids want which toys, so it's really easy to figure out things like how many of each toy he needs to buy.
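Attempt 2's lists map straight onto tables, if the nephew ever wants to see it for real. A minimal sketch (all table and column names invented for illustration):

```sql
-- Santa's list of kids
create table kid (
    kid_id int primary key,
    name varchar(100),
    address varchar(200),
    age int,
    naughty_nice_score int
);

-- Santa's list of toys
create table toy (
    toy_id int primary key,
    cost money,
    color varchar(30),
    weight_lbs decimal(6,2)  -- for the sleigh
);

-- The special "many-to-many" list: which kids want which toys
create table kid_toy (
    kid_id int references kid(kid_id),
    toy_id int references toy(toy_id),
    primary key (kid_id, toy_id)
);

-- "How many of each toy do I need to buy?"
select toy_id, count(*) as kids_wanting_it
from kid_toy
group by toy_id;
```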
Attempt 3: They Learn This Stuff in the Womb, I Swear
  1. You know how you implemented MongoDB for the social networking site you wrote last year, after the metadata structure got sufficiently gnarly for a simple key/value store, and users were demanding the ability to search on any attribute?
  2. Well, when I was a kid, we had these things called Relational Database Management Systems, which were really good at enforcing relational integrity (which was important because the computers still depended on algorithms instead of magic like they do today) but you had to know a thing or two about relational data design, and the RDBMS got kind of slow when everyone on the planet started uploading holographic 3D videos of what they had for lunch every day.
  3. Now go get your Uncle S  a beer.

Friday, February 11, 2011

SQL Saturday "Action Items"

Kendra Little's "Cerealizable" Isolation Level
SQL Saturday #60 was a huge success!  As a volunteer, I'm proud to be part of a team that put on a great event.  As an attendee, I'm kind of blown away at the quality of the event, especially knowing that it was all donated time and money.  (None of the speakers were paid, and many actually flew to Cleveland in February on their own dimes!  You guys rock!)

The best thing about going to events like SQL Saturday and the PASS Summit is that you get to meet and hear all these fantastic speakers.  (The worst thing is, there's never time to meet and listen to all of them!)  This week's  new "discoveries" for me: Jeremiah Peschka, Aaron Bertrand (although I'm a huge fan of Aaron's "Bad Habits" series already), Kendra Little, and Tom LaRock.  All of these folks are wicked smaht and fun to listen to, and Kendra even draws groovy little cartoon drawrings like the one I shared here.

As usual, I went home with visions of new SQL tools and techniques dancing in my head.  Here's some things on my "to investigate" list:

  • Erin Stellato introduced me to ClearTrace!!! Die, die, read80trace!!!  Okay, that's not fair-- read80trace has helped me through many performance audits, but it's kind of unruly to use.  ClearTrace looks like the answer to a SQL Profiler-analyzer's dreams.  Must learn it.
  • Just about everyone made it clear that Twitter is not just for posting your breakfast intentions anymore.  Brent Ozar wrote a little book to help old folks like me catch up.  I actually joined Twitter last night, and I feel more connected already, at least to the SQL community.
  • Tom LaRock finally explained waitstats in a way that made sense.  I'm vowing to finally start looking at waitstats to diagnose performance issues.
  • Kendra Little got me excited about snapshot isolation level again.  I thought it was about the coolest thing ever when it came out, but then never had the time or load test facilities to try it out.  I'm still convinced this will provide huge performance boosts to clients of my former employer, where blocking and deadlocking reign.
  • Lastly, I'm going to get on the schedule of my local SQL user group to present on some SQL topic.  Giving SQL presentations was my favorite part of my last job, and seeing all these great presenters has me itching to get back on the other side of the lectern again.
That should be enough to keep me busy for a while.  

Wednesday, February 9, 2011

An Encryption Problem

Here's the scenario: we have a SQL Server database that until recently stored sensitive information in plaintext (yikes!).  And we have an Access 2003 database in which we built a custom little front-end, with linked tables to the SQL database.  The vendor recently encrypted some of the fields in the database (thanks!) and now we need to update the Access database to comply.

The documentation from the vendor provided a basic example of accessing the encrypted information:

OPEN SYMMETRIC KEY APPKEY DECRYPTION BY CERTIFICATE APPCERT
SELECT CONVERT(VARCHAR,DECRYPTBYKEY(CUSTOMER.SSN)) AS SSN FROM CUSTOMER

That should be a pretty easy change in most code-- just open the key, wrap the encrypted field in a couple of functions, and you're good to go.  But linked tables in Access present an interesting problem.   I'll spare you the laundry list of things I tried, but suffice it to say that lots of approaches almost worked, but nothing quite let me get the key open before calling DecryptByKey.

What saved me was a little function called DecryptByKeyAutoCert, which extracts the private key from the cert and uses it to decrypt the value, all in one fell swoop.  Without the need to open the key separately, I could create a SQL view, and use INSTEAD OF triggers for the updates.  With bated breath, I linked my new view in Access... and it worked!

In short, here's the SQL that makes it all possible:
create view my_customer as
select customerid, fname, lname, convert(varchar(12),DecryptByKeyAutoCert(cert_id('APPCERT'),null,customer.ssn)) as ssn
from customer
go

create trigger trg_my_customer_upd on my_customer
instead of update as
begin
     open symmetric key APPKEY decryption by certificate APPCERT
     update customer set fname=i.fname, lname=i.lname, ssn=EncryptByKey(key_guid('APPKEY'),i.ssn)
     from inserted i where i.customerid=customer.customerid
     close symmetric key APPKEY
end
go

create trigger trg_my_customer_ins on my_customer
instead of insert as
begin
     open symmetric key APPKEY decryption by certificate APPCERT
     insert into customer (customerid, fname, lname, ssn)
     select customerid, fname, lname, EncryptByKey(key_guid('APPKEY'),ssn)
     from inserted
     close symmetric key APPKEY
end
With this view and its triggers created, we can now issue simple SELECT, UPDATE, and INSERT queries against my_customer, treating the ssn column as a plaintext column.  The view decrypts the data as it comes into the resultset, and the triggers encrypt the data as it goes back into the database.  In short, the my_customer view now acts just like the original customer table.

All we had to do then was to delete the original linked table in Access, link to the new view, and rename the linked table to the original name.  No need to rewrite any other logic in our mini-application!

I simplified a few things here, and in our case the cert was created without a password, but it's pretty easy to add that in if you need to.  
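In case it helps anyone, here's roughly what the password version looks like (the password literal is a made-up placeholder): DecryptByKeyAutoCert takes the certificate password as its second argument, and the OPEN SYMMETRIC KEY statements in the triggers need a WITH PASSWORD clause.

```sql
-- In the view: pass the cert password as the second argument
select customerid, fname, lname,
       convert(varchar(12),
           DecryptByKeyAutoCert(cert_id('APPCERT'), N'CertP@ssw0rd', customer.ssn)) as ssn
from customer

-- In the triggers: supply the password when opening the key
open symmetric key APPKEY
    decryption by certificate APPCERT
    with password = 'CertP@ssw0rd'
```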

Tuesday, December 14, 2010

SQL Saturday: A Free Event for SQL Geeks

I'm not sure if anyone reads or subscribes to this blog, but if you do, here's an event to put on your calendar.

SQL Saturday is a free event in Brunswick, OH (just southwest of Cleveland), and will feature 36 sessions on SQL and SQL-related topics.  Speakers range from local professionals to national Microsoft MVPs who have written for popular SQL magazines and presented at national conferences like PASS.

http://www.sqlsaturday.com/60/schedule.aspx

Yours truly is on the organizing committee and is organizing the food and drink, so you know at least the eatin' will be good.  I'm pretty excited about the roster of sessions that we're lining up, and it should be a good geeky day.

Register here if you plan on coming, or if you're a kind soul who is able to volunteer or if you or your company are able to sponsor.

Thursday, August 19, 2010

Sorting Text Fields Numerically

Just a little tidbit that I came across this morning, involving sorting fields that have numeric data but in a text field.

In this instance, I'm using a cursor to loop through my tables and run sp_spaceused on them, and then put the results into a temporary table:
insert into #TableSizes exec sp_spaceused @objname = @name;
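For completeness, the loop itself looks something like this (simplified; the temp table's columns just need to line up with sp_spaceused's six output columns):

```sql
create table #TableSizes (
    [Table Name] varchar(128),
    [Number of Rows] varchar(20),
    [Reserved Space] varchar(20),
    [Data Space] varchar(20),
    [Index Size] varchar(20),
    [Unused Space] varchar(20)
);

declare @name sysname;
declare table_cursor cursor for
    select name from sys.tables;

open table_cursor;
fetch next from table_cursor into @name;
while @@FETCH_STATUS = 0
begin
    insert into #TableSizes exec sp_spaceused @objname = @name;
    fetch next from table_cursor into @name;
end;
close table_cursor;
deallocate table_cursor;
```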

The problem is, when I select from the temp table:

select [Table Name]
, [Number of Rows]
, [Reserved Space]
, [Data Space]
, [Index Size]
, [Unused Space]
from [#TableSizes]
order by [Reserved Space] desc

the Reserved Space column isn't sorted correctly. I end up with something like this:

Table Name    Number of Rows    Reserved Space
TableD        60                96 KB
TableB        587               952 KB
TableA        111151            8728 KB
TableC        6652908           717752 KB

Because Reserved Space is a varchar column, it sorts as text, based on the first character first. So 9's sort to the top, even small numbers like 96.

The correct solution is probably to trim off the " KB" from the end, cast to an integer, and sort by that:

order by cast(left([Reserved Space],len([Reserved Space])-3) as int) desc

which then also allows you to use SUM, etc. on the column.

I was in a hurry and didn't want to mess around with all the parentheses, and came up with a quick and dirty solution:

order by len([Reserved Space]) desc,[Reserved Space] desc

This sorts the longest strings (hence the biggest numbers) to the top of the list, and then sorts strings of equal length in descending text order, which for equal-length numeric strings matches numeric order.
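A quick way to see it in action, with made-up values shaped like the ones above:

```sql
select v.rs
from (values ('96 KB'), ('952 KB'), ('8728 KB'), ('717752 KB')) as v(rs)
order by len(v.rs) desc, v.rs desc;
-- returns 717752 KB, 8728 KB, 952 KB, 96 KB
```

Note the trick relies on every value having the same " KB" suffix, so the suffix adds the same length to each string.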

Not something you'll need to do every day, but I thought it was a neat solution.

Thursday, August 12, 2010

It's the Color of the Sky, See?


"I want to know where my data is, physically!" This was the repeated demand of the gentleman sitting next to me at last night's SQL SIG talk on SQL Azure-- Microsoft's new cloud-based RDBMS. (Of equal concern was, once his data was inevitably lost or compromised, "Who do I sue???") The pitch was so perfect, he could have been a plant-- those of us who were skeptical about data in the cloud, having just heard our prejudice given an angry voice, were suddenly defending it.

Still, it's easy to understand his frustration. In the cloud-heavy world of Netflix, Rhapsody, and Amazon Kindles, an increasing share of the media and information in our private lives isn't on our bookshelves-- it's out there somewhere. I used to buy CDs and DVDs obsessively, but now spend $9 a month to access a vastly larger collection online. Like the old chestnut about beer, you don't buy media anymore, you rent it.

Microsoft wants to turn your SQL Server into such a commodity, and they have a pretty good start on it. SQL Azure is part of the Windows Azure platform, which lets you host web applications in the cloud. What's appealing is that the interfaces for both managing and programming SQL Azure are the ones you're used to: you can connect SQL Management Studio and BIDS (at least the latest versions) to your Azure server and they behave exactly the same as they would with your real SQL Server, at least in most cases. Connecting to an Azure database with ADO.NET is transparent-- but for the connection string-- and you use the same Connection and Command objects.

In short, once you've set up your Azure server and database, you can almost forget that the data's in the cloud and not on your local server. Almost. Context switching ("USE Database2") and three-part naming ("select * from database2.dbo.mytable") don't work-- you have to stay in your own database. Connections have some new behaviors and errors that reflect connectivity issues and other throttles-- Azure kills transactions that run longer than 5 minutes, for example. But for smaller, non-complicated apps, it's pretty smooth.

So where is your data, physically? When you sign up for an Azure account, you pick a geographic location (currently pretty large regions) and Microsoft holds your database on a farm of servers across multiple locations, with replication handled automatically. Essentially your server becomes a black box to you-- your traffic will be routed through load balancers to the right place, and server resources can be added instantly as activity grows and shrinks.

How about performance? Here's where your application architecture may need to change. If you have a stateless web app that's already designed to live in a web farm nicely, you're probably set-- host your web application on Windows Azure servers, and set some affinity settings to make sure your app and database servers in the cloud have good connectivity. This is called Near Computing, because your business logic is close to the database.

If you have more of a two-tier app, you'll need to adjust to the increased latency of accessing data in the cloud, and to some of the new behaviors mentioned above. Depending on how chatty your app is with the database server, this isn't trivial. If you're already optimized for poor network latency (say, if your database server is across a WAN from the application), it may not be that much different.

There are also hybrid methods to sync your local SQL database with an Azure database, asynchronously, so you can migrate parts of your app to the cloud and leave others local where it makes sense. This sounds like a bit of a nightmare, but could be a good choice for broad applications that might benefit from moving certain functions into the cloud.

Certainly, if your application is entirely local, connected, and speed-sensitive (say, an inventory and routing app where the users are inside the warehouse and connected directly to the data at all times), moving to the cloud might not be a fit. But for bringing small apps online quickly, and particularly when your users are already geographically scattered, it's a tempting architecture.

Licensing plans are still fairly immature, and more complicated features like Service Broker haven't been implemented yet, but Microsoft is pretty excited about this, and they're certainly not the only ones with their head in the Cloud.

If I were building a new app from scratch today, I'd give Azure a good solid look-- and I might also have my lawyer figure out who to sue.

Wednesday, August 4, 2010

Summer Reading

Es Cue El is going on vacation to beautiful Toronto, Ontario for the weekend, so no blog posts unless Toronto is really boring.

Your weekend reading assignment: Aaron Bertrand's classic "Bad Habits To Kick" series. There are lots of gems in here, and most of them are things (like always terminating statements with semicolons) that you can easily add to your finger memory with a couple days' effort.

See you Monday!