Tuesday, March 22, 2011

Entity Framework and the Inheritance Problem

[tl;dr Don't use EF's default TPT mapping if you care about... anything]

A friend and I were having lunch, and I mentioned a recent session I'd attended at the last 24 Hours of PASS called "Entity Framework: Not as Evil as You May Think".  I talked about how much I hate ORM's and the hideous SQL that they generated.

As it turns out, my friend had recently started using Entity Framework for a project he was working on.  As it turns out, his was the kind of project that EF is great for: the database won't ever hold more than a few thousand records in a handful of tables, and the whole thing will run in SQL CE in a single-user database.  In short, performance and scalability aren't huge concerns-- he just wants a basic data store.

But he had a problem.  During the initial population of the database, they might insert 1000 or more records into a certain table.  This process was taking minutes, and he'd tracked it down to a single line of code that checked to see if the record was already in the database.  A little snooping revealed that EF was generating a SQL statement that was 37,000 characters long.  He suspected that the parse and compile time alone accounted for the 1+ second each lookup took.

My interest piqued, I had him take me back to his place and fire up Visual Studio so we could take a look.  I've never actually used EF or SQL CE, and I wanted to watch it all in action.  After fumbling around trying to remember how to see the query EF was generating (SQL CE doesn't have a Profiler),  we found the ugly offender.  It was a mess.

What we found, as described in this article, is that Table-Per-Type inheritance is pretty much a nightmare in EF.  (Microsoft claims this is a "high priority work item" for the next version of EF, so we'll see how that goes.)

Let's take a look at how this goes.  Suppose I have a base class, "Animal", and four classes which inherit from Animal: Dog, Cat, Tutle, and Fish.  The Entity Model is shown here:
(click to enlarge)
When you generate a database from this model, you get a SQL table for each class (hence, Table Per Type, duh), with foreign keys joining Cat, Dog, etc to Animal.  So far so good.

Now let's run a simple line of code to check whether an Animal named "Bob" already exists:

Animal a= animalContext.Animals1.FirstOrDefault(Animal => Animal.Name == "Bob");

And, let's see what EF came up with (sorry, you'll need to page down a few times):


SELECT 
[Limit1].[C1] AS [C1], 
[Limit1].[Id] AS [Id], 
[Limit1].[Name] AS [Name], 
[Limit1].[Age] AS [Age], 
[Limit1].[C2] AS [C2], 
[Limit1].[C3] AS [C3], 
[Limit1].[C4] AS [C4], 
[Limit1].[C5] AS [C5]
FROM ( SELECT TOP (1) 
[Extent1].[Id] AS [Id], 
[Extent1].[Name] AS [Name], 
[Extent1].[Age] AS [Age], 
CASE WHEN (( NOT (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL))) AND ( NOT (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL))) AND ( NOT (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL))) AND ( NOT (([UnionAll3].[C9] = 1) AND ([UnionAll3].[C9] IS NOT NULL)))) THEN '0X' WHEN (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL)) THEN '0X0X' WHEN (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL)) THEN '0X1X' WHEN (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL)) THEN '0X2X' ELSE '0X3X' END AS [C1], 
CASE WHEN (( NOT (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL))) AND ( NOT (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL))) AND ( NOT (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL))) AND ( NOT (([UnionAll3].[C9] = 1) AND ([UnionAll3].[C9] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL)) THEN [UnionAll3].[C2] WHEN (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) END AS [C2], 
CASE WHEN (( NOT (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL))) AND ( NOT (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL))) AND ( NOT (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL))) AND ( NOT (([UnionAll3].[C9] = 1) AND ([UnionAll3].[C9] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL)) THEN [UnionAll3].[C3] WHEN (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) END AS [C3], 
CASE WHEN (( NOT (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL))) AND ( NOT (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL))) AND ( NOT (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL))) AND ( NOT (([UnionAll3].[C9] = 1) AND ([UnionAll3].[C9] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL)) THEN [UnionAll3].[C4] END AS [C4], 
CASE WHEN (( NOT (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL))) AND ( NOT (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL))) AND ( NOT (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL))) AND ( NOT (([UnionAll3].[C9] = 1) AND ([UnionAll3].[C9] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) ELSE [UnionAll3].[C5] END AS [C5]
FROM  [dbo].[Animals1] AS [Extent1]
LEFT OUTER JOIN  (SELECT 
[UnionAll2].[C1] AS [C1], 
[UnionAll2].[C2] AS [C2], 
[UnionAll2].[C3] AS [C3], 
[UnionAll2].[C4] AS [C4], 
[UnionAll2].[C5] AS [C5], 
[UnionAll2].[C6] AS [C6], 
[UnionAll2].[C7] AS [C7], 
[UnionAll2].[C8] AS [C8], 
[UnionAll2].[C9] AS [C9]
FROM  (SELECT 
[UnionAll1].[Id] AS [C1], 
[UnionAll1].[IsClawed] AS [C2], 
[UnionAll1].[C1] AS [C3], 
[UnionAll1].[C2] AS [C4], 
[UnionAll1].[C3] AS [C5], 
[UnionAll1].[C4] AS [C6], 
[UnionAll1].[C5] AS [C7], 
[UnionAll1].[C6] AS [C8], 
[UnionAll1].[C7] AS [C9]
FROM  (SELECT 
[Extent2].[Id] AS [Id], 
[Extent2].[IsClawed] AS [IsClawed], 
CAST(NULL AS varchar(1)) AS [C1], 
CAST(NULL AS varchar(1)) AS [C2], 
CAST(NULL AS varchar(1)) AS [C3], 
cast(1 as bit) AS [C4], 
cast(0 as bit) AS [C5], 
cast(0 as bit) AS [C6], 
cast(0 as bit) AS [C7]
FROM [dbo].[Animals1_Cat] AS [Extent2]
UNION ALL
SELECT 
[Extent3].[Id] AS [Id], 
CAST(NULL AS varchar(1)) AS [C1], 
CAST(NULL AS varchar(1)) AS [C2], 
CAST(NULL AS varchar(1)) AS [C3], 
[Extent3].[FreshWater] AS [FreshWater], 
cast(0 as bit) AS [C4], 
cast(0 as bit) AS [C5], 
cast(0 as bit) AS [C6], 
cast(1 as bit) AS [C7]
FROM [dbo].[Animals1_Fish] AS [Extent3]) AS [UnionAll1]
UNION ALL
SELECT 
[Extent4].[Id] AS [Id], 
CAST(NULL AS varchar(1)) AS [C1], 
[Extent4].[IsHouseTrained] AS [IsHouseTrained], 
CAST(NULL AS varchar(1)) AS [C2], 
CAST(NULL AS varchar(1)) AS [C3], 
cast(0 as bit) AS [C4], 
cast(1 as bit) AS [C5], 
cast(0 as bit) AS [C6], 
cast(0 as bit) AS [C7]
FROM [dbo].[Animals1_Dog] AS [Extent4]) AS [UnionAll2]
UNION ALL
SELECT 
[Extent5].[Id] AS [Id], 
CAST(NULL AS varchar(1)) AS [C1], 
CAST(NULL AS varchar(1)) AS [C2], 
[Extent5].[ShellColor] AS [ShellColor], 
CAST(NULL AS varchar(1)) AS [C3], 
cast(0 as bit) AS [C4], 
cast(0 as bit) AS [C5], 
cast(1 as bit) AS [C6], 
cast(0 as bit) AS [C7]
FROM [dbo].[Animals1_Turtle] AS [Extent5]) AS [UnionAll3] ON [Extent1].[Id] = [UnionAll3].[C1]
WHERE N'Bob' = [Extent1].[Name]
)  AS [Limit1]

Notice the goofy kind-of-bit-mapped discriminator columns C4-C7.  Clever, I'm sure.  On my friend's database, there were five derived classes, and lots more columns, resulting in the famed 37kb SQL statement.  And, of course, EF is going to use a beast similar to this for any table access to any of the base or derived classes.

This, it turns out, is how Entity Framework handles a simple inheritance with TPT.  Neither my friend or I are experts on EF, but there doesn't appear to be a workaround here, without manually hacking the hell out of the mapping files-- which, as an added benefit, seems to make the graphical model designer in VS choke.

Here, I think, is the best way to work around this for now.  Luckily, there's no legacy databases to worry about, so redesigning the table structure is an option.

1. Download and install the Entity Designer Database Generation Power Pack (EDDGPP to its friends).
2. Restart Visual Studio
3. Now in the entity model properties, you'll find a whole lot of new xaml files installed under the Database Generation Workflow property, as shown below.
4. Choose "Generate T-SQL Via T4 (TPH).xaml (VS)
5. Re-generate the database from the model.

This creates a single table for Animal (TPH stans for Table-Per-Hierarchy), with all the additional attributes from Dog, Cat, etc added as additional columns in Animal.  EF also automatically generates a discriminator column in Animal, which is not mapped to a property in the object model.


There's a couple of caveats here:

  1. Each property in each of the classes needs to be mapped as nullable, or have a default value defined.  (Which makes sense: Fish won't have an IsClawed property, so we need a valid value for that column in SQL).
  2. Each property in all the classes-- both base and derived-- need to be unique (which should be self-evident from the table structure).
With that all done, the object model is exactly the same as it was using TPT mapping, but the SQL is a LOT cleaner.  Here's our code again:

Animal a= animalContext.Animals1.FirstOrDefault(Animal => Animal.Name == "Bob");

And here's the SQL generated:

SELECT TOP (1) 
[Extent1].[__Disc__] AS [__Disc__], 
[Extent1].[Id] AS [Id], 
[Extent1].[Name] AS [Name], 
[Extent1].[Age] AS [Age], 
[Extent1].[IsClawed] AS [IsClawed], 
[Extent1].[IsHouseTrained] AS [IsHouseTrained], 
[Extent1].[ShellColor] AS [ShellColor], 
[Extent1].[FreshWater] AS [FreshWater]
FROM [dbo].[Animals1] AS [Extent1]
WHERE ([Extent1].[__Disc__] IN ('Animal','Cat','Dog','Turtle','Fish')) AND (N'Bob' = [Extent1].[Name])

That makes a lot more sense all around, and I'm guessing it will perform a ton better.  The drawback (if it's really a drawback) is that now all animals are stored in one table, which may not be desirable in some situations, and of course we had to redesign the database, which could be a bit of an issue if there's existing data in the TPT tables.

So, is Entity Framework evil?  Not always: in most cases, the default EF mappings were fine for my friend, and as a developer-not-a-DBA, he's able to load and persist objects with ease.  But, God help you when the EF magic decides to do something truly ponderous like this to persist a pretty simple object model.