Tuesday, December 14, 2010

SQL Saturday: A Free Event for SQL Geeks

I'm not sure if anyone reads or subscribes to this blog, but if you do, here's an event to put on your calendar.

SQL Saturday is a free event in Brunswick, OH (just Southwest of Cleveland), and will feature 36 sessions on SQL and SQL-related topics.  Speakers range from local professionals to national Microsoft MVP's who have written for popular SQL magazines and presented at national conferences like PASS.

http://www.sqlsaturday.com/60/schedule.aspx

Yours truly is on the organizing committee and is organizing the food and drink, so you know at least the eatin' will be good.  I'm pretty excited about the roster of sessions that we're lining up, and it should be a good geeky day.

Register here if you plan on coming, or if you're a kind soul who is able to volunteer or if you or your company are able to sponsor.

Thursday, August 19, 2010

Sorting Text Fields Numerically

Just a little tidbit that I came across this morning, involving sorting fields that have numeric data but in a text field.

In this instance, I'm using a cursor to loop through my tables and run sp_spaceused on them, and then put the results into a temporary table:
insert into #TableSizes exec sp_spaceused @objname = @name;

The problem is, when I select from the temp table:

select [Table Name]
, [Number of Rows]
, [Reserved Space]
, [Data Space]
, [Index Size]
, [Unused Space]
from [#TableSizes]
order by [Reserved Space] desc

the Reserved Space column isn't sorted correctly. I end up with something like this:






Table NameNumber of RowsReserved Space
TableD6096 KB
TableB587952 KB
TableA1111518728 KB
TableC6652908717752 KB

Because Reserved Space is a varchar column, it sorts as text, based on the first character first. So 9's sort to the top, even small numbers like 96.

The correct solution is probably to trim off the " KB" from the end, cast to an integer, and sort by that:

order by cast(left([Reserved Space],len([Reserved Space])-3) as int) desc

which then also allows you to use SUM, etc. on the column.

I was in a hurry and didn't want to mess around with all the parenthesis, and came up with a quick and dirty solution:

order by len([Reserved Space]) desc,[Reserved Space] desc

This sorts the longest strings (hence the biggest numbers) to the top of the list, and then sorts each string alphabetically from 9 to 0.

Not something you'll need to do every day, but I thought it was a neat solution.

Thursday, August 12, 2010

It's the Color of the Sky, See?


"I want to know where my data is, physically!" This was the repeated demand of the gentleman sitting next to me at last night's SQL SIG talk on SQL Azure-- Microsoft's new cloud-based RDBMS. (Of equal concern was, once his data was inevitably lost or compromised, "Who do I sue???") The pitch was so perfect, he could have been a plant-- those of us who were skeptical about data in the cloud, having just heard our prejudice given an angry voice, were suddenly defending it.

Still, it's easy to understand his frustration. In the cloud-heavy world of NetFlix, Rhapsody, and Amazon Kindles, an increasing percent of the media and information in our private lives isn't on our bookshelves-- it's out there somewhere. I used to buy CDs and DVDs obsessively, but now spend $9 a month to access a vastly larger collection online. Like the old chestnut about beer, you don't buy media anymore, you rent it.

Microsoft wants to turn your SQL Server into such a commodity, and they have a pretty good start on it. SQL Azure is part of the Windows Azure platform, that lets you host web applications in the cloud. What's appealing is that the interfaces for both managing and programming SQL Azure are the ones you're used to: you can connect SQL Management Studio and BIDS (at least the latest versions) to your Azure server and they behave exactly the same as they would with your real SQL Server, at least in most cases. Connecting to an Azure database with ADO.NET is transparent --but for the connection string-- and you use the same Connection and Command objects.

In short, once you've set up your Azure server and database, you can almost forget that the data's in the cloud and not on your local server. Almost. Context switching ("USE Database2") and three-part naming ("select * from database2.dbo.mytable") don't work-- you have to stay in your own database. Connections have some new behaviors and errors to respond to connectivity issues and other throttles-- Azure kills transactions that run longer than 5 minutes, for example. But for smaller, non-complicated apps, it's pretty smooth.

So where is your data, physically? When you sign up for an Azure account, you pick a geographic location (currently pretty large regions) and Microsoft holds your database on a farm of servers across multiple locations, with replication handled automatically. Essentially your server becomes a black box to you-- your traffic will be routed through load balancers to the right place, and server resources can be added instantly as activity grows and shrinks.

How about performance? Here's where your application architecture may need to change. If you have a stateless web app that's already designed to live in a web farm nicely, you're probably set-- host your web application on Windows Azure servers, and set some affinity settings to make sure your app and database servers in the cloud have good connectivity. This is called Near Computing, because your business logic is close to the database.

If you have more of a two-tier app, you'll need to adjust to the increased latency of accessing data in the cloud, and some of the new behaviors mentioned above. Depending on how chatty your app is with the database server, this isn't trivial. If you're already optimized for poor network latency (say, if your database server is across a wan from the application), it may not be that much different.

There's also some hybrid methods to sync your local SQL database with an Azure database, asynchronously, so you can migrate parts of your app to the cloud and leave others local where it makes sense. This sounds like a bit of a nightmare but might be a good choice for broad applications that might benefit from moving certain functions into the cloud.

Certainly, if your application is entirely local, connected, and speed-sensitive (say, an inventory and routing app where the users are inside the warehouse and connected directly to the data at all times), moving to the cloud might not be a fit. But for bringing small apps online quickly, and particularly when your users are already geographically scattered, it's a tempting architecture.

Licensing plans are still fairly immature, and more complicated features like service broker haven't been implemented yet, but Microsoft is pretty excited about this, and they're certainly not the only ones with their head in the Cloud.

If I were building a new app from scratch today, I'd give Azure a good solid look-- and I might also have my lawyer figure out who to sue.

Wednesday, August 4, 2010

Summer Reading

Es Cue El is going on vacation to beautiful Toronto, Ontario for the weekend, so no blog posts unless Toronto is really boring.

Your weekend reading assignment: Aaron Bertrand's classic "Bad Habits To Kick" series. There's lots of good gems in here, and most of them are things (like always terminating statements with semicolons) that you can easily add to your finger memory with a couple days' effort.

See you Monday!

Oracle XE

If you're looking to get your feet wet in the world of Oracle, then Oracle XE is for you.

Somewhat comparable to SQL Server Express Edition, Oracle 10g XE is the free version of Oracle-- free to develop, free to deploy, free to distribute. It's limited to one database per machine, 4 GB per database, 1 GB of memory, and one processor (you can install it on multi-processor machines but it will only use one).

Combined with Oracle Application Express (APEX) -- a free toolkit for building applications, including web-based user interfaces, reporting, security, and dashboards -- you can actually build little web-based database applications pretty quickly without much learning.

I think the hope is that smaller organizations will consider using it for internal applications and then grow into the more expensive versions. I'm not sure how likely that is, but I am working with a friend to build a small inventory/billing app for her small business, and I was going to use MS Access just because it's fast and familiar. After installing Oracle XE, I'm considering using it, although admittedly I'd be doing so just as an excuse to learn a new skillset.

Oracle has a reputation of being somewhat of a monster to install, and if you've ever read anything about the Universal Installer, you may think you need a PhD just to get it installed. Oracle XE and APEX are quick installs that ask you a handful of questions (install path, etc) and for the most part you can sail right through with the defaults. I had to change the default port and still don't quite know why, but even that was easy-- the installer told me that 8080 was in use, so I tried 8081, then 8082, until I found one that was open. That was the biggest glitch I ran into.

The other great thing about the XE/APEX is the tutorials available on both web sites. Oracle has created a "2 Day Developer Guide" that walks you through the tools, basic PL/SQL, and everything you need to know to get a quick and dirty app up. There's also some great tutorials.

I don't expect this will make anyone an expert at Oracle, but for SQL Server types like myself looking to teach themselves PL/SQL and maybe create a little sample app, you can at least get yourself some exposure to the world of Oracle without a lot of hassle.

Tuesday, August 3, 2010

Can a LEFT JOIN restrict your resultset?

A colleague came to me the other day with this question: "Isn't it true that a left join shouldn't restrict the number of records returned?"

This was one of those crisis-of-understanding moments: we're all taught that if you query a table and get 100 rows back, adding on a left join may add rows to your resultset, but should never remove any.

His query was something like this:
select od.*, p.ISBN from OrderDetails od left join Products p on od.ProductUid=p.UID where od.OrderDate='3/11/10' and p.Type in ('B','T')
He told me that when he ran this query:
select od.* from OrderDetails od where od.OrderDate='3/11/10'
he got 100 records back. But when we added on the left join, suddenly he only got 87 records. Since when, he wanted to know, could a left join do that? Not all the OrderDetail records had a ProductUid (some were for taxes, miscellaneous charges, etc), which was why he used the left join to begin with. And sure enough, those records were the ones that were missing from the results. Shouldn't the left join have returned those records, with no p.ISBN value?

The problem here is actually our old friend, Mr. NULL. The left join did return the OrderDetail records without a ProductUid. But the where clause filtered them out-- specifically, the phrase "p.Type in ('B','T')". When I told my friend that this was the problem, he protested. "But Product.Type is a required field!" It was defined as NOT NULL in the table.

So, how could p.Type be the problem, and when would it ever be null?

Remember, when a match isn't found in a left join, the fields in the right table (in this case, "p") are all null. So, after evaluating the left join, there still were 100 records. However, now when we come to evaluating "p.Type in ('B','T')", we run into the situation where p.Type is null (because no record was found in p).

Since"null=anything" always returns false (technically "unknown"), and because "in" evaluates to a series of "or"s in the query engine:

p.Type = 'B' or p.Type = 'T'

,we end up filtering out the very records that the left join was supposed to leave in.

There are three quick solutions to this:
select od.*, p.ISBN from OrderDetails od left join Products p on od.ProductUid=p.UID and p.Type in ('B','T') where od.OrderDate='3/11/10'
or
select od.*, p.ISBN from OrderDetails od left join Products p on od.ProductUid=p.UID where od.OrderDate='3/11/10' and (p.Type in ('B','T')or p.Type is null)
or
select od.*, p.ISBN from OrderDetails od left join Products p on od.ProductUid=p.UID where od.OrderDate='3/11/10' and isnull(p.Type,'') in ('B','T')
The first solution is probably preferable-- move the p.Type condition into the join expression. What we're really trying to join to is "products with a type of B or T"-- if none are found, then p.ISBN should be null in our resultset.

It's equivalent to:
select od.*, p.ISBN from OrderDetails od left join (select * from Products where p.Type in ('B','T')) p on od.ProductUid=p.UID where od.OrderDate='3/11/10'
Testing for "p.Type is null" in the where clause should accomplish the same thing, but consider the condition where Product.Type is a nullable field, and there are records where p.Type is null and p.ISBN is not. By testing for "p.Type is null" in the where clause, we'd be including records with no match from the left join, but we'd also be including ISBNs for products with a null Type, which isn't what we want.

This is particularly troublesome when the data you're working with almost always has a match in the right table. In this case, 99% of the OrderDetails records had a ProductUid, so the developer never had to worry about p.Type. When he found a rare data condition, it seemed like the rules of SQL had changed on him.

The bottom line is this: be careful when using where clauses on columns from a right table-- even when those columns are defined as NOT NULL in the database. For that matter, get in the habit of always asking yourself "what if this column were null?" when testing for conditions in a SQL statement.

Monday, August 2, 2010

CTEs and readability

I've been working lately on keeping my queries more readable. Here's a pattern I use a lot when I'm not thinking about readability.

Say I want to know the average number of unique items contained in a set of orders. This is a classic aggregate-of-an-aggregate problem. First you need to find how many unique items are in each order:

select count(distinct prodcode) from orderdetail group by orderid

Then you can take that set and get an average:

select avg(prodcount) from (select count(distinct prodcode) prodcount from orderdetail group by orderid) as x

I'm always tempted to do this because it just involves wrapping my first query in parentheses, adding a column alias, and giving the whole thing a table alias. Here's two problems with this:
  1. I used "x" as a table alias. Syntax requires me to give the CTE a name, but I don't really care what it is. Until I add another layer to the query. I've seen queries that are multiple pages long and full of "as x", "as x1", "as x2" etc.
  2. It's already starting to get a little difficult to tell which from clause applies to which tables, and which columns came from where. Scope is also difficult to infer-- if I use this query as part of something else and want to add a date range, where do I add it?
Consider this syntax instead:

with items_per_order as
(select count(distinct prodcode) as prodcount from orderdetail group by orderid)

select avg(prodcount) from items_per_order ipo

The query benefits from a better alias for the CTE, but also we've simplified the intent: the last line is a very simple query getting the average count from a set. Items_per_order encapsulates the logic of getting the item count, and yet it's right there in the query if I want to delve into it. Scope is also much clearer: orderdetail and its columns can be referenced from within the items_per_order CTE, but not outside of it. It's clear where items_per_order starts and stops.

If I continue down the road of building more logic into the query, I can add more CTEs:
with orders_denormalized as
(select o.orderid, s.name as salespersonname, c.name as customername, sc.code from orderheader o left join salesperson s on o.sid=s.id
left join customer c on o.custid=c.id
inner join statuscodes sc on o.statusid=sc.id),

items_per_order as
(select orderid, count(distinct prodcode) as prodcount from orderdetail group by orderid),

orders_with_itemcount as
(select o.orderid,o.salesersonname,o.customername,o.code,i.prodcount
from orders_denormalized o inner join items_per_order i on o.orderid=i.orderid )

select avg(prodcount) from orders_with_itemcount owi where customername='steve smith'

I have some distinct encapsulations here that help keep track of where things came from. The definition of each CTE is obvious, and the end query is simple and easy to read.

Performance should definitely be considered here; I'd want to see the query plans to see how well this all performs. But as a way to keep ad hoc encapsulations in order (as often happens in SSRS as you continue to hang logic on your query to build complex segment logic), this has made maintenance a whole lot easier for me.

Sunday, August 1, 2010

A quick gripe


My Eee netbook has a tiny little keyboard, and keys like page-up, page-down, and home are reached by pressing two keys: function-left arrow is the combination for "home".

On my ThinkPad, the ctrl and function keys are and always have been oddly reversed from other keyboards.

Consequently, the keystrokes for alt-home (which I use all the time while browsing), on the Eee PC are function-alt-left arrow. On the ThinkPad, doing the same thing with my fingers ends up with them pressing ctrl-alt-left arrow. I'll let you figure out what that does on your own computer. Let's just say it's annoying and I wish my fingers had two sets of presets like some fancy car seats, so I could program the same muscles to do different things on different keyboards.

Sunday Reading: SQL Server MVP Deep Dives


Today's recommendation is "SQL Server MVP Deep Dives," a collection of excellent articles by SQL MVP on a variety of topics.

I just got this this weekend but so far have learned about ten things that I can use right away. Here's what makes the book such a gem: the target audience for each chapter seems to be experienced DBAs who haven't worked in that particular topic before, but who need a good quick start with examples they can actually use. They're each about 10 pages long but have enough details to give you a solid understanding of the concepts and usually some good examples that you can use immediately.

There are some more conceptual chapters like "What does it mean to be a DBA?" and "Louis and Paul's 10 key relational database design ideas" that are good summaries of some things you probably already know (or should know) but make great touchstones. Others like "BI for the relational guy" solve that classic "I wish someone had just told me X before I started" scenario.

As an added bonus, the authors are donating all their royalties to War Child, an organization that helps war-affected children worldwide.

The ebook version is only $35 and is worth every penny.

Saturday, July 31, 2010

No-SQL Saturday: Gears

Saturdays are time to take a break from databases and look at other things that amuse me.

This week: non-circular gears.

This guy supposedly hand-makes these things out of paper:


There's a great series of videos on old Navy analog computers, which are basically complex gear systems: part 1, 2, 3, 4, 5, 6, 7.

This is probably the best explanation I've ever seen of how differential gears work:



And of course, no discussion of differentials would be complete without mention of the South-pointing Chariot, rumored to have been invented by the Chinese over 4500 years ago. South-facing chariots use a series of gears to keep a figure pointing south no matter which way the wagon turns. These days geeks can't resist building their own out of legos or whatnot, and in looking for links I came across a kit you can buy for $59 and I think I might just have to.

Lastly, here's some random gear-related links:

Spiromania!
Animated Engines
The Kentucky Do-Nothing Machine
The German Do-Nothing Machine
The Eames (yes, that Eames) Do-Nothing Machine

A wooden orrery!

Friday, July 30, 2010

Certifiable

I recently studied for (and passed!) my exams for MCITP Database Developer certification.

In the past, I've always glanced askance at certification. I've run into too many people in my career with a lot of letters after their names and not much sense or experience in their heads. Conversely, some of the smartest people I know in IT have no professional schooling or certification.

An old joke comes to mind:
Q: What do you call the person who graduates last in their class from medical school?
A: "Doctor"

Passing your boards and completing your residency doesn't make you a good doctor. Years of experience, ambition, passion, and curiosity generally do. But, would you take your sick child to see a doctor who "never really got around to getting licensed"?

Getting IT certification shows the world:
  • I care enough about this skillset to actually study for and pass the exams
  • At one point or another, I read and understood something about all the topics in this area, not just the ones I use every day.
Cynically, I'd say that most anyone can cram enough facts into their head to pass a test, and then never actually apply that knowledge and really learn it. But assuming you're not a complete phony and you actually do know your specialty pretty well, being forced to study for exams benefits you in a lot of ways, even if you forget a lot of the facts shortly after.

Here's what I got out of it:
  • I demoted a lot of "unknown unknowns" to "known unknowns." It's kind of like going through your toolbox and learning what some of the more obscure tools do. You still might not ever use your double offset ring spanner, but at least now you know you've got one and what it's for.
  • I got a good solid dose of humility. In my little work world, I'm the SQL guru. But really I use a subset of SQL's features, and there's always plenty to learn.
  • On the opposite side, I got acknowledgement that I do really know my stuff on the "known known" topics. I'm pretty good at writing complex queries, and tuning them, and those questions were a snap.
  • I was able to put a few things to work right away. I've just recently started using recursive CTE's, and lately I'm finding all kinds of things to do with CTE's. I knew about them, but until I started playing with them, they never entered my go-to lexicon.
  • I reaffirmed how much I like working in SQL. The Microsoft exams give you a "real" business scenario that you've been hired to implement, and then ask you about various aspects of the project. It was actually kind of fun digging into an interesting problem and thinking about how I'd go about tackling it.

Next on my plate: MCITP Developer 2008.

Thursday, July 29, 2010

Welcome, and License Plate Woes

Welcome to Es Cue El.

This blog will be a place to post random thoughts about SQL projects I'm working on, things like the No SQL movement that interests/scares/annoys/intrigues me, and just generally a place to geek out. Hopefully I pick up some SQL friends along the way.

A little about me. I work for a software company in Cleveland, OH, and have held various positions over the past couple of decades, most recently "technical consultant." Although I've wandered a bit from programming to support to consulting to product management, I've always been heavily involved in database work, whether it's writing SQL, tuning performance issues, building data access layers, etc.

I've been looking for a creative outlet for a while, and decided that blogging about something I love would probably be a good place to start. Maybe that's a little like a professional race car driver taking up painting to relieve the stress of racing, and then ending up painting nothing but cars and ovals. But hey, it's a start. And I'll try to throw in a non-database post from time to time.

Oh, about the license plate. I decided to go all geek and get an obscure techie license plate. I surveyed my SQL friends to help decide between some good candidates: "SP WHO2", "NOLOCK", "VARCHAR", "ES CUE EL", "SQL FTW" (that last one from a leet-spoken young'n). In the end I went with a slightly different spelling:



I still like it, but somehow didn't notice the Spanish "que" or realize how many people would read it as "ess kay ell" (in Spanish it translates roughly to "it that the"). I took German in school and somehow missed it. Anyway, it's on my car for the next two years now, so I guess I'm stuck explaining it a lot-- but that's kind of what I wanted in the first place. If I had my life to live over, I might have gone with ES CUE EL. C'est la vie.