Wednesday, October 10, 2012

Home on the (Date) Range

Here's an interesting problem that came across my desk this week.  We have, to generalize the data a little, a table containing various product discounts, each of which has a date range during which it applies.

Here's what the table looks like:
CREATE TABLE productdiscounts (startdate DATE,enddate DATE,keyval CHAR(3),discount int) 
INSERT INTO productdiscounts VALUES
('1/1/12','1/31/12','abc',1)
,('1/17/12','2/15/12','abc',8)
,('2/7/12','3/7/12','abc',4)
,('1/15/12','5/13/12','def',2)
,('4/17/12','4/28/12','def',16)
,('2/12/12','4/20/12','def',32)
The challenge was, we need to derive a set that has a row for each date range for which a different set of discounts applies.  So, for product abc, we have a $1 discount from 1/1/12 through 1/16/12, and then on 1/17/12, an additional discount of $8 takes effect.  On 2/1/12, the $1 discount goes out of effect, and then on 2/7/12, a new $4 discount takes effect.  And so on.

Basically, the set we need to create should look like this:
keyval startdate         enddate         discount
abc         2012-01-01   2012-01-16 1
abc         2012-01-17 2012-01-31 9
abc         2012-02-01 2012-02-06 8
abc         2012-02-07   2012-02-15 12
abc         2012-02-16 2012-03-07 4
def         2012-01-15 2012-02-11 2
def         2012-02-12 2012-04-16 34
def         2012-04-17 2012-04-20 50
def         2012-04-21   2012-04-28 18
def         2012-04-29 2012-05-13 2
So how do you take a set of staggering, overlapping date ranges, and come up with one continuous set of ranges, aggregating the discounts as you go?

Our old friend, the numbers table

My first instinct was to use a numbers table and compute the aggregates against that set, like this:

SELECT pd.keyval,d.dayval,SUM(pd.discount) FROM
(SELECT DATEADD(d,number,'1/1/12') AS dayval FROM master..spt_values WHERE type='p') d
INNER JOIN dbo.productdiscounts pd ON pd.startdate<=d.dayval AND pd.enddate>=d.dayval
GROUP BY pd.keyval,d.dayval
The derived table d gives us a set of days from 1/1/12 through sometime in 2017 (2048 days), and by joining to the product discounts, we can add up the discount in effect on each day.  The resultant set looks like this (I omitted the records for every single day):

keyval dayval totaldiscount
abc 2012-01-01 1
...
abc 2012-01-16 1
abc 2012-01-17 9
...
abc 2012-01-31 9
abc 2012-02-01 8
...
abc 2012-02-06 8
abc 2012-02-07 12
 ...
abc 2012-02-15 12
Now the question becomes, how do we find the starting and ending dates for each unique range?  It's not enough to just look for MIN(dayval) and MAX(dayval) for a given aggregate, since the total discount might go from 1 to 9 and then back to 1 again.  SQL 2012 has some window functions that would help with this, but our shop is barely on 2008.

We were pretty close to (ugh) walking this table with a cursor.  But then we did the math.  For two years of discounts, we'd have over 700 rows for each product, multiplied by the number of products.  Using a cursor on millions or rows wasn't a pleasant prospect.

A fresh approach

Looking at the data, we realized that the number of dates involved was fairly small.  Discounts tended to start and end at month end, so in a two year range, we had maybe 40 dates on which a discount started or ended.  And so that got me thinking.

Our desired date ranges would always start on either the start date of a particular discount, or on the day after the end date of a discount.  If a discount wasn't starting or ending, there'd be no change in the total discount that day.

With this in mind, we can create a CTE of "interesting dates":

WITH interestingdates AS (
SELECT DISTINCT keyval,startdate FROM dbo.productdiscounts
UNION SELECT DISTINCT keyval,DATEADD(d,1,enddate) FROM dbo.productdiscounts)
Note that the DISTINCT will eliminate any duplicate dates in the list.  Note also that the keyval (i.e. the product id) is also included, since a date is interesting only for the product affected by that discount.

Armed with our list of interesting dates by product, we can calculate the end date of each range, which is just the day before the next start date:

,interestingranges as
(SELECT a.keyval,a.startdate,
      (SELECT DATEADD(d,-1,MIN(startdate)) FROM interestingdates b WHERE b.keyval=a.keyval AND b.startdate>a.startdate) AS enddate
FROM interestingdates a)
Which leaves us in a pretty good spot.  Here's what the interestingranges CTE looks like:
keyval startdate         enddate
abc        2012-01-01 2012-01-16
abc        2012-01-17 2012-01-31
abc        2012-02-01 2012-02-06
abc        2012-02-07 2012-02-15
abc        2012-02-16 2012-03-07
abc        2012-03-08 NULL
def        2012-01-15 2012-02-11
def        2012-02-12 2012-04-16
def        2012-04-17 2012-04-20
def        2012-04-21 2012-04-28
def        2012-04-29 2012-05-13
def        2012-05-14 NULL
Notice that the last date range for each product is open-ended, since our calculation of the end date is based on the next start date.  These ranges might be handy; if not, it's easy to filter them out.

Having identified the interesting ranges, we just need to aggregate the discount amounts:

SELECT ir.keyval, ir.startdate, ir.enddate, SUM(pd.discount) AS discount
FROM interestingranges ir
INNER JOIN dbo.productdiscounts pd ON pd.startdate<= ir.startdate AND pd.enddate>=ir.enddate AND pd.keyval=ir.keyval
GROUP BY ir.keyval, ir.startdate, ir.enddate
Here's our completed aggregate:
keyval startdate         enddate         discount
abc         2012-01-01   2012-01-16 1
abc         2012-01-17 2012-01-31 9
abc         2012-02-01 2012-02-06 8
abc         2012-02-07   2012-02-15 12
abc         2012-02-16 2012-03-07 4
def         2012-01-15 2012-02-11 2
def         2012-02-12 2012-04-16 34
def         2012-04-17 2012-04-20 50
def         2012-04-21   2012-04-28 18
def         2012-04-29 2012-05-13 2
I  purposely chose discount amounts that were powers of 2, so it's easy to check which discounts are in effect on any day against the aggregate.  A $50 discount on def from 4/17 through 7/20, for instance has to be $32 + $16 + $2.   In reality, it's possible that two (or more) contiguous date ranges to result in the same aggregate-- for example, a $5 discount could expire, and two new discounts for $3 and $2 could take effect on the same day.  In our case, the aggregation was a little more involved, in such a way that the unique set of individual discounts still made each range unique, even if the total discount ended up the same.

The total script ends up looking like this:

--an interesting day is one where a promotion starts or the day after a promotion ends
;WITH interestingdates AS (
SELECT DISTINCT keyval,startdate FROM dbo.productdiscounts
UNION SELECT DISTINCT keyval,DATEADD(d,1,enddate) FROM dbo.productdiscounts
--an interesting range is just each interesting date and the next interesting date, by key
,interestingranges as
(SELECT a.keyval,a.startdate,
      (SELECT DATEADD(d,-1,MIN(startdate)) FROM interestingdates b WHERE b.keyval=a.keyval AND b.startdate>a.startdate) AS enddate
FROM interestingdates a
SELECT ir.keyval, ir.startdate, ir.enddate, SUM(pd.discount) AS discount
FROM interestingranges ir
INNER JOIN dbo.productdiscounts pd ON pd.startdate<= ir.startdate AND pd.enddate>=ir.enddate AND pd.keyval=ir.keyval
GROUP BY ir.keyval, ir.startdate, ir.enddate

Tuesday, December 13, 2011

T-SQL Tuesday #025 : Tricks of the Trade

This week's SQL Tuesday theme is "Tips and Tricks", which led me to start making an inventory of the tricks and tools I use almost daily as a SQL developer.

As an old(er) dog, many of the tricks I use are things I've learned over the past couple of decades, and as I document them, I'm finding that in many cases, there's new ways of doing them (DMO's, for example).  It's no excuse for not learning the new, better methods, but in my defense I'll just say that I have to support clients back to SQL 2000 (and occasionally even 7!) so I find that the old ways still work really well, even if newer versions introduce newer features.

First, some tweaks to the environment:

My first trick is the only one with a price tag.  I've been working with Redgate's SQL Developer Bundle all summer, and I can't imagine ever not using it for SQL development.  It's pricey, but I'd estimate that I work 50-100% faster with it.  If there's money in the budget, ask your boss to get you the best tools available.  I'll be happy to talk to your boss if you want a testimonial of how much productivity you can buy with these tools.

Next, color your connections!  SSMS will do this if you tell it to, but I prefer the coloring provided as part of the free SSMS Tools Pack.  Nothing reminds you that you're on the production server like a giant red bar at the edge of the query editor.  SSMS Tools Pack also automatically saves all your tabs periodically with a searchable history, gives you SQL snippets (similar to SQL Prompt's), helps analyze execution plans, and about ten other handy things, all free.

Finally, if you're staring at the query editor all day, why not try a soothing color scheme?  I really prefer a black background with light text.  Tomas Restrepo wrote a handy little tool for porting color themes between Visual Studio and SSMS.  I've found it handy to backup the following registry key after I set the colors I like:
HKEY_CURRENT_USER\Software\Microsoft\Microsoft SQL Server\90\Tools\Shell\FontAndColors
which makes it handy to switch between themes.  I use a dark background for development, and a white background for presentations.  There's lots of free themes out there at Studio Styl.es for you to try.  (Note: if you import a registry key with a new color scheme, you can implement the changes without starting SSMS by going to Tools/Options, touching Fonts and Colors, and hitting OK.)

Everyone's a DBA Sometimes:

Even though I'm usually in development mode, when things get slow, any database person will do.

Activity monitor will help you find blocking processes, but often when there's blocking, there's panic.  I keep this code in my snippets (or create a stored procedure and add it to SSMS's keyboard shortcuts):
SELECT * FROM master..sysprocesses WHERE blocked<>0 and blocked<>spid
SELECT * FROM master..sysprocesses WHERE blocked=0 and spid in
(select blocked from master..sysprocesses where blocked<>0 and spid<>blocked)
It immediately tells me if anyone's being blocked, and the head of the blocking chain.  It also works in every version of SQL.   DBCC INPUTBUFFER(spid) is usually not far behind.

Yes, you can do this from Activity Monitor and most tools have a way to find it relatively quickly, but having a snippet defined lets me run it with two keystrokes, immediately and repeatedly.

The other thing I setup in any environment is a snippet or stored procedure to gather information from a variety of application-specific logs table.  Being able to see the last ten errors from a few apps, the ETL processes, maybe even some realtime stats like number of sessions, is invaluable when things are going wrong.

Development Tricks

----------------------------------------------

I've come to rely on Redgate's SQL Seach (it's free!) for finding anything in the database.  If I don't have access to it, I usually have this in a snippet for searching the text of stored procedures, triggers, etc:
SELECT schema_name(SO.schema_id)+'.'+SO.NAME,SO.TYPE_DESC,SM.DEFINITION,SO.CREATE_DATE,SO.MODIFY_DATE
FROM SYS.SQL_MODULES SM
INNER JOIN SYS.OBJECTS SO ONSO.OBJECT_ID=SM.OBJECT_ID
WHERE SM.DEFINITION LIKE '%blah%'

----------------------------------------------


Before I started using numbers tables (sometimes called "tally tables"), I thought my life was complete.  Now I use them for all kinds of things, particularly populating data.  Anytime I need a set of integers, my go-to set is:
SELECT number FROM master..spt_values WHERE type='p'
It contains integers 0-2047, and if you need bigger, you can join it to itself to get 4 million, 8 billion, etc.

You can check out the above links for ideas about how to use numbers tables, but for example if I need a table of days for the next five years or so, you can whip one up with:
SELECT DATEADD(d,Number,'1/1/12') as MyDate, Number%7+1 as DayOfWeek
FROM (SELECT Number FROM master..spt_values WHERE type='p') numbers 

----------------------------------------------


Hardly a day goes by when I don't have to convert to or from a comma-delimited list, usually for passing sets around when table-valued parameters aren't an option.  FOR XML PATH is a quick way to get a list of values into a single, comma-delimited string:
DECLARE @mylist VARCHAR(MAX)
SET @mylist=(SELECT TOP 10 ','+name FROM sysobjects WHERE type='p' FOR XML PATH(''))
SET @mylist=STUFF(@mylist,1,1,'')
SELECT @mylist
 You can use a numbers table to quickly split the delimited string back into its components.  (Also see Jeff Moden's definitive piece on tally-table splitters):
SET @mylist=','+@mylist+','
;WITH n AS (SELECT number FROM master..spt_values WHERE type='p' AND number < LEN(@mylist))
SELECT SUBSTRING(@mylist,number+1,CHARINDEX(',',@mylist,number+1)-number-1) FROM n WHERE SUBSTRING(@mylist,number,1)=','
----------------------------------------------

Finally, hardly a day goes by when I don't need to get something random.  Although RAND() will get you a random number, that doesn't help if you want to generate a set of random numbers.  You can always use a loop or a cursor, but there's a better way.
SELECT ABS(CHECKSUM(NEWID()))%9+1 FROM MASTER.sys.columns
will return a list of random numbers between 1 and 9.  This can be particularly handy for generating test data, particularly when combined with TOP() and CROSS APPLY (more on this in another post).

Of course, NEWID() also comes in handy for getting a random selection of existing rows.
SELECT TOP 15 * FROM MyTable ORDER BY NEWID()
will return a random set of 15 records every time it's run.

Social Tricks

If you're reading this blog, you probably know the most important trick: be connected!  There's a huge network of wicked-smaht, generous SQL experts out there, and more free blogs and videos out there than you'll ever have time for.  Here's some of my favorites:
  • Brent Ozar PLF's Weekly News letter is the best thing about Monday, and usually points me to a dozen cool things, SQL and not.
  • There's probably a SQL PASS chapter near you.  And if there's not, they have Virtual Chapters.  These are great way to learn about SQL topics, meet great SQL people, network, and much much more.  Check yours out!
  • SQL Saturdays are a great, free alternative to the incomparable PASS Summit.  Some pretty phenomenal speakers donate their time and wisdom to these, and again they're a great chance to meet fellow SQL geeks.  (Hint: many presenters also generously upload their presentation materials to the SQL Saturday sites.)
  •  SQL Bits, SQL Server Cental (particularly their Stairway series), and SQL Server Team Blogs are all great spots to go read and watch tons more great SQL content.
  • And of course, if you're not on Twitter, get on!  Almost all of the SQL MVP's, presenters, and even the SQL development team follow #sqlserver and #sqlhelp.  Brent Ozar even wrote a free book specifically for SQL Twitter virgins, so you have no excuse.  A typical Twitter experience: I tweeted thanks for some SQL feature I liked, and a few minutes later, the Microsoft engineer who wrote it tweeted back, "You're welcome!" How cool is that?




Tuesday, December 6, 2011

Can Your Network Do This?

A couple of months ago, my wife and I took a road trip from Cleveland to Pittsburgh.  Stuck in bumper to bumper traffic, the woman behind me suddenly lept from her car, ran up behind my car, and took a picture of my license plate.  She hollered something like "I love SQL!" and disappeared back into traffic.

It took three days for a friend to send this back to me in a tweet:


Aside from the cosmic coolness of having attractive, nerd-appreciating women accosting me in traffic, it occurred to me that this was an indication of a pretty good professional network.  Can your network do this?  How many degrees of Kevin Bacon are you from a random stranger on the freeway?

One of my personal and professional goals this year was to do more networking.  I don't suppose I'm alone in the tech world in putting "networking" right up there with "root canal" in the list of things I'd rather not be doing or having done to me. Images come to mind of schmoozing and making uncomfortable small talk, and telling people we should do lunch, or take lunch, or whatever verb it is people do to lunch while networking.

As it turns out, networking can largely be whatever you want it to be, so long as you're connecting to other people.  Yes, I've "done" a couple of really uncomfortable lunches with people trying to sell me something, or see if I could sell something for them.  But I've also been writing this blog, tweeting, drinking beer with other SQL geeks, and attending some interesting (for me) conferences.  Yes, it's still an effort-- I'm an introvert, and as much fun as I have hanging out with SQL people, I'd often much rather just be at home with my wife and cats.  But really, it hasn't been a chore-- just a conscious effort to get more involved in the SQL community.

And you can do it, even if you have major social anxiety.

Here's some things that I've found good luck with:

  • Find a PASS chapter.  There are groups in most major cities, and if there isn't one near you, there are virtual chapters. It's a chance to hang out with other SQL people, learn some things, and just get a feel for what's going on in the SQL world, both at large and in your town.
  • Get involved with SQL Saturday, SQL Rally, or SQL Summit events.  Some of these are run on shoestring budgets and eager for any help you can give.  Even if your job is just handing out flyers or stuffing bags, being a volunteer gets you backstage access to presenters, MVPs, and others in the community. If you can't volunteer, at least try to attend.  
  • Blog something.  I should spend more time on this, and web presence in general, but having a blog is an opportunity to share your knowledge and get your name out there a little bit.  
  • Read other people's blogs.  SQL BitsSQL Server Cental (particularly their Stairway series), and SQL Server Team Blogs are all great places to find some great SQL blogs to start.
  • Tweet something.  The SQL community is hugely involved in Twitter, and that includes the guys and gals who write SQL Server.  I once tweeted "Thanks, Microsoft!" about some SQL feature that I really liked, and a few minutes later, the developer who wrote that feature tweeted back, "Glad you like it!".  How cool is that?  Follow #sqlserver and #sqlhelp to start, and read Brent Ozar's great introduction to Twitter for SQL folks.
  • Of course, get yourself LinkedIn.  It's not like Facebook, honest-- it's just a good way to keep in contact with people you've worked with.
Notice that some of these don't even require you to leave the living room.  Of course, you'll make more meaningful connections if you actually get out and talk to people.  You might even-- gasp-- make a friend or two.  But at a minimum, the goal is to stay connected to the community.  

One big epiphany I had this year was gleaned from the classic What Color Is Your Parachute: a big goal of networking is to increase the number of people you can call (or email) to get introduced to someone who can help you.  That's it.  These people don't have to be able to vouch for your work personally (although that helps), they just have to be able to introduce you.  

So, for example, I volunteered to help organize SQL Saturday with my local PASS chapter, which introduced me to dozen or so SQL people here in Cleveland.   Now, aside from my ability to order great sandwiches, they don't know much about my SQL skills.  But collectively they probably know someone in every SQL shop in town, or they know someone who does.  So now I not only have some cool SQL friends, but I have a little network of people, and when I find myself asking, "I wonder who I can talk to at XYZ Corp," chances are one of these people can hook me up with a name and an introduction.

Further out at the edges of my network are the presenters I've met and watched at SQL Saturday, bloggers I've read, and the aforementioned SQL developers who tweet or lurk on the same hash tags as I do.  I don't expect Tim Ford or Jeremiah Peschka or Brent Ozar know who I am... but I know them.  (And I know people who know them, bwuahaha.)  More importantly, I know they're the experts in their areas, and that they have a wealth of knowledge and materials that they generously share in talks and blogs.  They're also active twitterers (tweeters?).  If I had some question about DMO's, window functions in Denali, or how SANs work, I know where to go.  

As you're making your New Year's resolutions next year, think about what you can do to increase the size of your SQL network.  Also, think about becoming a nicer driver-- the person behind you might be closer than you think!

Friday, September 23, 2011

America's Next Top (ER) Model: Plowed!

I've been doing a lot of modeling lately, and it's starting to affect my personal life.  In today's article, I'll explore the personal story of how the stress of modeling found me getting plowed every day, and how I managed to stop.

A Day in the Life of a Model
As I mentioned, I'm do a lot of data modelling these days.  My team uses Visio for all of our ER diagrams.  It works pretty well, and with the help of a free plug-in from Orthogonal, we can generate our initial DDL scripts right from the diagram.

Here's where the stress of modeling starts:

I have a nice ER diagram, all organized in a visually pleasing way:


.. and then I go to move one of the entities a little bit, and that's when it happens:


A whole bunch of other entities scatter out of the way, often landing outside the page borders.  Rounding up each of these stray sheep typically results in a few more sheep wandering off in random directions, and it's a long, frustrating day on the ranch.

I'm not even sure how to begin Googling, because there isn't a good word for the behavior.  I tried things like "visio object repel" or "visio objects rearrange" or "visio sucks sucks sucks sucks er diagrams".  

Getting Plowed
In the end, I found it the old fashioned way: randomly clicking around until I found something that looked dodgy.  It turns out, the behavior is called "plowing," and as usual, Microsoft is trying to help you.  With plowing enabled, objects behave much like the girls did towards me in junior high school: they jump way out of the way as soon as you get anywhere near them.

What's happening above is something I'd dubbed a "cascading plow" (mmm, cascading plow...).  The entity I moved got close to another entity, which jumped ten feet in some random direction, where it in turn ended up perilously close to another entity, which jumped out of the way, and so on, and so on...  

From what I can tell, there's two ways to turn this behavior off:

Per Object
Right-click any entity and choose Format, Behavior, and go to the Placement tab.  Here you can set one of two properties: 
  • "Move other shapes way on drop", which you want to set to "Plow no shapes"
  • "Do not allow other shapes to move this shape away on drop.
Which one you set is kind of a world-view thing:

Do you stop Moe from poking everyone else in the eye, or do you teach everyone else to do the Nose Salute every time Moe comes at them?  I suppose you could do both, but really I don't think Larry should have to always be on the lookout, so I set the first option to "Plow no shapes."  (Nyuk nyuk nyuk)

With this enabled, you can now move your entity anywhere you like, and it won't push other entities around anymore.  Of course, you'll run the "risk" of being able to have entities touch or even --gasp!-- overlap, but you're in control of that.  Connectors (i.e. relationships) will continue to route themselves automatically, but I find Visio does this pretty well.

Of course, you'll have to set this property on each and every entity you add to your diagram.

Forever and Ever, Amen
I don't really want to have to set the behavior on every entity from now until forever.  Having fixed one of my existing ER diagrams, the Plow No Shapes behavior works perfectly, and I don't want to ever go back to bumper car mode.

You can't officially modify the default stencils that ship with Visio (a stencil is the official name for a category of like controls-- when you select File --> Shapes, the little green window-pane items are stencils).  So if you like, you can right-click Entity, and choose Add to My Shapes, and add it to your Favorites stencil or a new stencil.  

Me, I'm fine with all the other controls in the Entity Relationship stencil, which opens by default with any new "Database Model Diagram".  I just want to change the behavior of the standard Entity, forever.  If you want to do that, you'll want to open up the standard Entity Relationship stencil, which on my system is located in c:\Program Files\Microsoft Office\Visio11\1033\ENTITY_U.VSS.  

With the stencil opened for editing (either your own or the standard ENTITY_U one), you can now right-click the Entity object, and choose Edit Master --> Edit Master Shape.  Right-click the shape, Format, Behavior, Placement, Plow No Shapes, and save.

Now when you create new ER diagrams, the Entity object won't act like an ass anymore.  You can move your entities around any way you like, and nothing with jump (or be plowed) out of the way.  

It's made my modelling life a whole lot easier, although I still have trouble with the four-inch heels.



Sunday, April 24, 2011

Do You Know .NET?

I got an email from a friend recently, saying:


"As I search for my next situation, I stop when I see .NET as a requirement.
So...  what does it take to learn .NET?"
This turns out to be the same friend who inspired this blog post.  I guess he just has a knack for asking blog-worthy questions.

Like many people I know from my last job, this guy's what I'd call a technology generalist.  He's spent most of his career supporting, troubleshooting, tweaking, and business-analyzing software.  A lot of shops have people like this around: they know enough about a variety of technologies to move data around from spreadsheet to database, maybe a little scripting skills, enough SQL knowledge to write a query or ten.  They're often the people who respond to the constant stream of ad-hoc queries coming from senior leadership or finance.  

As a database specialist and sometimes-programmer, I've dealt with this question myself.  What does it mean to "know" .NET?   I once tried to explain to a sales team what .NET was, and it just about killed us all.  "Dot NET" as a marketing term has come to stand for as broad-- and as vague-- a set of technologies as "green" has: it may very well have a very specific meaning to the person using it, or it might just be that person's shorthand for "cutting edge" or "Microsoft".  

However, if ".NET" is listed further down in the job description, I understand it to mean three things:
  1. "We work primarily on MS Windows-based software"
  2. "We use C# (probably), VB.NET (maybe), or some other .NET programming language"
  3. "We're not using "old" technology like VB6, C++, Fortran, etc.


How much experience they expect you to have, and how much exposure to .NET languages the job actually has, can vary a lot.

Here's how I approach it, and my advice to my friend: if the primary focus of the job is .NET development, look elsewhere.  These jobs are for people who are computer programmers by trade.  It's a little like the difference between a General Practitioner and a brain surgeon.  A GP could operate on your brain in a pinch, and do a much better job than a non-doctor.  But you're still probably better off with a mediocre brain surgeon than an exceptional GP.


The two answers I want to be able to give (honestly) in an interview are the following:
  1. I've worked with/supported/extended/etc programs that were written using .NET technologies-- I think it was mostly ASP.NET for the Web/UI layer, C# for business logic, and SQL Server on the back-end.  
  2. I've done a little C# programming recreationally-- not enough to consider myself a programmer, but I wrote a little web app to store recipes in a database, and I'm familiar enough with what a C# program looks like-- if I looked through someone's else's code, I could probably work out what it's doing.
I want to establish right up front that I'm not a programmer, but that I have enough experience and professional curiosity to have dabbled in it a bit.  

At this point, the employer should have enough to tick (or not tick) that requirement off the list.  If they really need someone who can take over primary coding responsibilities, I'm not the guy, and I don't want to be.  If they just need someone who can maybe maintain and tweak a couple little C# utilities that the last guy wrote, as 10% of the job duties, I can probably do that.  It might take me ten times longer than a "real" programmer, but let's talk about how my skills fit the other 90% of the job.

So, how does someone like my friend get from "I get the idea that .NET is a framework" to being able to cover these two points?  Here's my advice:

For the first answer, ask questions about the software you're working on-- even if you're just doing tech support, or gathering requirements, or whatever.  Talk to the programmers and ask them what languages and technologies they're using, and why.  Don't be afraid to ask stupid questions: programmers have a reputation for being brusque, but at some point they have to explain what they do to their spouses and moms.  Ask them, "What exactly is Ruby on Rails anyway?" and the good ones should be able to give you some analogies at least, or tell you why it differs from other development languages.  

Also, as long as you're willing to assume the "dumb guy" role, most people get a big ego boost expounding on their area of expertise to a newbie.  It's why guys love explaining the finer points of football to an interested female (and why girls do so well asking, "Tell me again about the guys in the stripey outfits?")  Programmers love being the smartest guy in the room, and you can get a lot of knowledge by starting questions with, "I understand you're the X guru here..."

Ask these questions, all the time, about whatever web site or app you're supporting.  If you're no longer at the job, contact some former colleagues and ask them.  The goal isn't to learn all these technologies-- just to be able to talk about them a little.  Good sales guys do this all the time-- they may not know a rack from a pinion, but they want to be able to tell the average Joe why he wants one.

For the second answer, you'll have to work a little harder.  You're going to want to teach yourself-- or have someone teach you-- C#.  This seems to be the most common language in use out there, so it's probably the place to start, although VB.NET is a bit friendlier and will work just as well for this objective.  If you have some familiarity with any kind of computer programming-- maybe you took a Fortran class way back when-- and you have some familiarity with variables and parameters, you've got a head start.  If not, you'll be learning new concepts as well as a new language, so it might take longer.

Get yourself a copy of Microsoft Visual Studio Express here (it's free!) and pick up a beginner's C# book-- lots of people seem to like the Head First series, like this one on C#.  A lot of these are broken down into one or two hour lessons, and if you dedicate a few nights a week to studying, you can learn the basics in a month or two.  Alternatively, check with your local community college or continuing ed program.  Most offer introductory classes. As you work through each exercise, make up little variants to test yourself:  "What if I want my window to be red instead of blue?"  "What if I want to let the user choose the window color?"  

Make sure the book you use shows you how to interact with a database, and how to write a web application as well as a Windows/desktop one.  This will force you to learn a little SQL, and understand how IIS and web sites work.

From there, it's  kind of up to you.  Maybe you'll get the bug and want to take some more classes on computer programming, or you'll decide you want to write the next Angry Birds and make your millions $3 at a time.

Most likely, you'll learn a little bit about programming-- much of which you'll forget-- and maybe pick up some skills you can use at work.  I've found that the most useful thing to do, once you know a little of a language, is to challenge yourself to some business task: write a little application that looks through all the documents in folder for and reports how many times the word "mortgage" appears.  You'll have to search the web to learn how to loop through files in a directory, how to open up various file types, how to search.  These are the things a technology generalist often needs to be able to do with .NET: just complicated enough that you don't want to do it by hand, but simple enough to accomplish with 100 lines of code or less.

Remember, the goal here isn't to fake your way through a test, or to just learn enough "buzz words" to impress the interviewer (actually, that is the goal of the first answer).  The goal here is to understand enough about C# to be able to use it to do some basic tasks for you.  You want to be the GP who can do a sloppy brain surgery if you absolutely have to.  When the job calls on someone to write a little routine that sends an email if the stock goes over $40 a share, you want to have just enough skills to Google "C# get stock price" and "C# send email" and be able to cobble the examples together to make something that gets the job done.  Then you can tell the next interviewer, "I'm not a programmer, but once I had to write a little app to monitor the stock price and send an email, and it worked."

I can't stress this last point enough.  People go to school for years to learn professional programming techniques, and then often spend years honing these skills on the job.  And many of these people still might not describe themselves as really "knowing" .NET.  You're not trying to become the next rockstar programmer in a month.

But, there's a ton of little tasks on the job that can be solved with a few lines of the right code.  The powerful thing about .NET is that there are thousands of libraries out there to open spreadsheets, send emails, move files around, monitor RSS feeds, get the weather, and any other of a million little tasks, and there are millions of web sites and forum posts explaining how to use them.  A tech generalist has to be a little like a librarian: you might not know everything, but you know how to find out.  A business analyst, or tech support rep, or QA engineer who can cobble together a little .NET code when they have to is often a very valuable person to have on your team.

Now, go out and "learn" .NET!


Tuesday, March 22, 2011

Entity Framework and the Inheritance Problem

[tl;dr Don't use EF's default TPT mapping if you care about... anything]

A friend and I were having lunch, and I mentioned a recent session I'd attended at the last 24 Hours of PASS called "Entity Framework: Not as Evil as You May Think".  I talked about how much I hate ORM's and the hideous SQL that they generated.

As it turns out, my friend had recently started using Entity Framework for a project he was working on.  As it turns out, his was the kind of project that EF is great for: the database won't ever hold more than a few thousand records in a handful of tables, and the whole thing will run in SQL CE in a single-user database.  In short, performance and scalability aren't huge concerns-- he just wants a basic data store.

But he had a problem.  During the initial population of the database, they might insert 1000 or more records into a certain table.  This process was taking minutes, and he'd tracked it down to a single line of code that checked to see if the record was already in the database.  A little snooping revealed that EF was generating a SQL statement that was 37,000 characters long.  He suspected that the parse and compile time alone accounted for the 1+ second each lookup took.

My interest piqued, I had him take me back to his place and fire up Visual Studio so we could take a look.  I've never actually used EF or SQL CE, and I wanted to watch it all in action.  After fumbling around trying to remember how to see the query EF was generating (SQL CE doesn't have a Profiler),  we found the ugly offender.  It was a mess.

What we found, as described in this article, is that Table-Per-Type inheritance is pretty much a nightmare in EF.  (Microsoft claims this is a "high priority work item" for the next version of EF, so we'll see how that goes.)

Let's take a look at how this goes.  Suppose I have a base class, "Animal", and four classes which inherit from Animal: Dog, Cat, Tutle, and Fish.  The Entity Model is shown here:
(click to enlarge)
When you generate a database from this model, you get a SQL table for each class (hence, Table Per Type, duh), with foreign keys joining Cat, Dog, etc to Animal.  So far so good.

Now let's run a simple line of code to check whether an Animal named "Bob" already exists:

Animal a= animalContext.Animals1.FirstOrDefault(Animal => Animal.Name == "Bob");

And, let's see what EF came up with (sorry, you'll need to page down a few times):


SELECT 
[Limit1].[C1] AS [C1], 
[Limit1].[Id] AS [Id], 
[Limit1].[Name] AS [Name], 
[Limit1].[Age] AS [Age], 
[Limit1].[C2] AS [C2], 
[Limit1].[C3] AS [C3], 
[Limit1].[C4] AS [C4], 
[Limit1].[C5] AS [C5]
FROM ( SELECT TOP (1) 
[Extent1].[Id] AS [Id], 
[Extent1].[Name] AS [Name], 
[Extent1].[Age] AS [Age], 
CASE WHEN (( NOT (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL))) AND ( NOT (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL))) AND ( NOT (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL))) AND ( NOT (([UnionAll3].[C9] = 1) AND ([UnionAll3].[C9] IS NOT NULL)))) THEN '0X' WHEN (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL)) THEN '0X0X' WHEN (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL)) THEN '0X1X' WHEN (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL)) THEN '0X2X' ELSE '0X3X' END AS [C1], 
CASE WHEN (( NOT (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL))) AND ( NOT (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL))) AND ( NOT (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL))) AND ( NOT (([UnionAll3].[C9] = 1) AND ([UnionAll3].[C9] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL)) THEN [UnionAll3].[C2] WHEN (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) END AS [C2], 
CASE WHEN (( NOT (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL))) AND ( NOT (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL))) AND ( NOT (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL))) AND ( NOT (([UnionAll3].[C9] = 1) AND ([UnionAll3].[C9] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL)) THEN [UnionAll3].[C3] WHEN (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) END AS [C3], 
CASE WHEN (( NOT (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL))) AND ( NOT (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL))) AND ( NOT (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL))) AND ( NOT (([UnionAll3].[C9] = 1) AND ([UnionAll3].[C9] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL)) THEN [UnionAll3].[C4] END AS [C4], 
CASE WHEN (( NOT (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL))) AND ( NOT (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL))) AND ( NOT (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL))) AND ( NOT (([UnionAll3].[C9] = 1) AND ([UnionAll3].[C9] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) ELSE [UnionAll3].[C5] END AS [C5]
FROM  [dbo].[Animals1] AS [Extent1]
LEFT OUTER JOIN  (SELECT 
[UnionAll2].[C1] AS [C1], 
[UnionAll2].[C2] AS [C2], 
[UnionAll2].[C3] AS [C3], 
[UnionAll2].[C4] AS [C4], 
[UnionAll2].[C5] AS [C5], 
[UnionAll2].[C6] AS [C6], 
[UnionAll2].[C7] AS [C7], 
[UnionAll2].[C8] AS [C8], 
[UnionAll2].[C9] AS [C9]
FROM  (SELECT 
[UnionAll1].[Id] AS [C1], 
[UnionAll1].[IsClawed] AS [C2], 
[UnionAll1].[C1] AS [C3], 
[UnionAll1].[C2] AS [C4], 
[UnionAll1].[C3] AS [C5], 
[UnionAll1].[C4] AS [C6], 
[UnionAll1].[C5] AS [C7], 
[UnionAll1].[C6] AS [C8], 
[UnionAll1].[C7] AS [C9]
FROM  (SELECT 
[Extent2].[Id] AS [Id], 
[Extent2].[IsClawed] AS [IsClawed], 
CAST(NULL AS varchar(1)) AS [C1], 
CAST(NULL AS varchar(1)) AS [C2], 
CAST(NULL AS varchar(1)) AS [C3], 
cast(1 as bit) AS [C4], 
cast(0 as bit) AS [C5], 
cast(0 as bit) AS [C6], 
cast(0 as bit) AS [C7]
FROM [dbo].[Animals1_Cat] AS [Extent2]
UNION ALL
SELECT 
[Extent3].[Id] AS [Id], 
CAST(NULL AS varchar(1)) AS [C1], 
CAST(NULL AS varchar(1)) AS [C2], 
CAST(NULL AS varchar(1)) AS [C3], 
[Extent3].[FreshWater] AS [FreshWater], 
cast(0 as bit) AS [C4], 
cast(0 as bit) AS [C5], 
cast(0 as bit) AS [C6], 
cast(1 as bit) AS [C7]
FROM [dbo].[Animals1_Fish] AS [Extent3]) AS [UnionAll1]
UNION ALL
SELECT 
[Extent4].[Id] AS [Id], 
CAST(NULL AS varchar(1)) AS [C1], 
[Extent4].[IsHouseTrained] AS [IsHouseTrained], 
CAST(NULL AS varchar(1)) AS [C2], 
CAST(NULL AS varchar(1)) AS [C3], 
cast(0 as bit) AS [C4], 
cast(1 as bit) AS [C5], 
cast(0 as bit) AS [C6], 
cast(0 as bit) AS [C7]
FROM [dbo].[Animals1_Dog] AS [Extent4]) AS [UnionAll2]
UNION ALL
SELECT 
[Extent5].[Id] AS [Id], 
CAST(NULL AS varchar(1)) AS [C1], 
CAST(NULL AS varchar(1)) AS [C2], 
[Extent5].[ShellColor] AS [ShellColor], 
CAST(NULL AS varchar(1)) AS [C3], 
cast(0 as bit) AS [C4], 
cast(0 as bit) AS [C5], 
cast(1 as bit) AS [C6], 
cast(0 as bit) AS [C7]
FROM [dbo].[Animals1_Turtle] AS [Extent5]) AS [UnionAll3] ON [Extent1].[Id] = [UnionAll3].[C1]
WHERE N'Bob' = [Extent1].[Name]
)  AS [Limit1]

Notice the goofy kind-of-bit-mapped discriminator columns C4-C7.  Clever, I'm sure.  On my friend's database, there were five derived classes, and lots more columns, resulting in the famed 37kb SQL statement.  And, of course, EF is going to use a beast similar to this for any table access to any of the base or derived classes.

This, it turns out, is how Entity Framework handles a simple inheritance with TPT.  Neither my friend or I are experts on EF, but there doesn't appear to be a workaround here, without manually hacking the hell out of the mapping files-- which, as an added benefit, seems to make the graphical model designer in VS choke.

Here, I think, is the best way to work around this for now.  Luckily, there's no legacy databases to worry about, so redesigning the table structure is an option.

1. Download and install the Entity Designer Database Generation Power Pack (EDDGPP to its friends).
2. Restart Visual Studio
3. Now in the entity model properties, you'll find a whole lot of new xaml files installed under the Database Generation Workflow property, as shown below.
4. Choose "Generate T-SQL Via T4 (TPH).xaml (VS)
5. Re-generate the database from the model.

This creates a single table for Animal (TPH stans for Table-Per-Hierarchy), with all the additional attributes from Dog, Cat, etc added as additional columns in Animal.  EF also automatically generates a discriminator column in Animal, which is not mapped to a property in the object model.


There's a couple of caveats here:

  1. Each property in each of the classes needs to be mapped as nullable, or have a default value defined.  (Which makes sense: Fish won't have an IsClawed property, so we need a valid value for that column in SQL).
  2. Each property in all the classes-- both base and derived-- need to be unique (which should be self-evident from the table structure).
With that all done, the object model is exactly the same as it was using TPT mapping, but the SQL is a LOT cleaner.  Here's our code again:

Animal a= animalContext.Animals1.FirstOrDefault(Animal => Animal.Name == "Bob");

And here's the SQL generated:

SELECT TOP (1) 
[Extent1].[__Disc__] AS [__Disc__], 
[Extent1].[Id] AS [Id], 
[Extent1].[Name] AS [Name], 
[Extent1].[Age] AS [Age], 
[Extent1].[IsClawed] AS [IsClawed], 
[Extent1].[IsHouseTrained] AS [IsHouseTrained], 
[Extent1].[ShellColor] AS [ShellColor], 
[Extent1].[FreshWater] AS [FreshWater]
FROM [dbo].[Animals1] AS [Extent1]
WHERE ([Extent1].[__Disc__] IN ('Animal','Cat','Dog','Turtle','Fish')) AND (N'Bob' = [Extent1].[Name])

That makes a lot more sense all around, and I'm guessing it will perform a ton better.  The drawback (if it's really a drawback) is that now all animals are stored in one table, which may not be desirable in some situations, and of course we had to redesign the database, which could be a bit of an issue if there's existing data in the TPT tables.

So, is Entity Framework evil?  Not always: in most cases, the default EF mappings were fine for my friend, and as a developer-not-a-DBA, he's able to load and persist objects with ease.  But, God help you when the EF magic decides to do something truly ponderous like this to persist a pretty simple object model.

Thursday, February 24, 2011

A SQL Poem

One of the jobs I recently applied for asked for a SQL poem as part of the application.  I love stuff like this so I took a little time and came up with the following.  (This actually did happen to me once...)

I didn't originally title it, but I think I'll call it...

"Wait... STAT!" by S

The week was starting bad for Joe
His app was running way too slow.
He summoned Steve, from regions nether
To patch performance back together.

“Well, here’s your problem-- clear as Windex:
You’ve got a missing clustered index.
This calls for not just any geek; we'll
Need one who speaks fluent SQL."

Functions, unctions, query plans:
"We'll need more seeks, and fewer scans.
And then, to get some extra speed, yer
Gonna need a stored procedure."

Steve worked fast, with mumbled zeal,
Declared, "Success! Just watch 'er squeal!"
Said Joe, "While I applaud your fervor,
My problem's on this other server."