Tuesday, March 22, 2011

Entity Framework and the Inheritance Problem

[tl;dr Don't use EF's default TPT mapping if you care about... anything]

A friend and I were having lunch, and I mentioned a recent session I'd attended at the last 24 Hours of PASS called "Entity Framework: Not as Evil as You May Think".  I talked about how much I hate ORM's and the hideous SQL that they generated.

As it turns out, my friend had recently started using Entity Framework for a project he was working on.  As it turns out, his was the kind of project that EF is great for: the database won't ever hold more than a few thousand records in a handful of tables, and the whole thing will run in SQL CE in a single-user database.  In short, performance and scalability aren't huge concerns-- he just wants a basic data store.

But he had a problem.  During the initial population of the database, they might insert 1000 or more records into a certain table.  This process was taking minutes, and he'd tracked it down to a single line of code that checked to see if the record was already in the database.  A little snooping revealed that EF was generating a SQL statement that was 37,000 characters long.  He suspected that the parse and compile time alone accounted for the 1+ second each lookup took.

My interest piqued, I had him take me back to his place and fire up Visual Studio so we could take a look.  I've never actually used EF or SQL CE, and I wanted to watch it all in action.  After fumbling around trying to remember how to see the query EF was generating (SQL CE doesn't have a Profiler),  we found the ugly offender.  It was a mess.

What we found, as described in this article, is that Table-Per-Type inheritance is pretty much a nightmare in EF.  (Microsoft claims this is a "high priority work item" for the next version of EF, so we'll see how that goes.)

Let's take a look at how this goes.  Suppose I have a base class, "Animal", and four classes which inherit from Animal: Dog, Cat, Tutle, and Fish.  The Entity Model is shown here:
(click to enlarge)
When you generate a database from this model, you get a SQL table for each class (hence, Table Per Type, duh), with foreign keys joining Cat, Dog, etc to Animal.  So far so good.

Now let's run a simple line of code to check whether an Animal named "Bob" already exists:

Animal a= animalContext.Animals1.FirstOrDefault(Animal => Animal.Name == "Bob");

And, let's see what EF came up with (sorry, you'll need to page down a few times):


SELECT 
[Limit1].[C1] AS [C1], 
[Limit1].[Id] AS [Id], 
[Limit1].[Name] AS [Name], 
[Limit1].[Age] AS [Age], 
[Limit1].[C2] AS [C2], 
[Limit1].[C3] AS [C3], 
[Limit1].[C4] AS [C4], 
[Limit1].[C5] AS [C5]
FROM ( SELECT TOP (1) 
[Extent1].[Id] AS [Id], 
[Extent1].[Name] AS [Name], 
[Extent1].[Age] AS [Age], 
CASE WHEN (( NOT (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL))) AND ( NOT (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL))) AND ( NOT (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL))) AND ( NOT (([UnionAll3].[C9] = 1) AND ([UnionAll3].[C9] IS NOT NULL)))) THEN '0X' WHEN (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL)) THEN '0X0X' WHEN (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL)) THEN '0X1X' WHEN (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL)) THEN '0X2X' ELSE '0X3X' END AS [C1], 
CASE WHEN (( NOT (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL))) AND ( NOT (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL))) AND ( NOT (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL))) AND ( NOT (([UnionAll3].[C9] = 1) AND ([UnionAll3].[C9] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL)) THEN [UnionAll3].[C2] WHEN (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) END AS [C2], 
CASE WHEN (( NOT (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL))) AND ( NOT (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL))) AND ( NOT (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL))) AND ( NOT (([UnionAll3].[C9] = 1) AND ([UnionAll3].[C9] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL)) THEN [UnionAll3].[C3] WHEN (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) END AS [C3], 
CASE WHEN (( NOT (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL))) AND ( NOT (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL))) AND ( NOT (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL))) AND ( NOT (([UnionAll3].[C9] = 1) AND ([UnionAll3].[C9] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL)) THEN [UnionAll3].[C4] END AS [C4], 
CASE WHEN (( NOT (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL))) AND ( NOT (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL))) AND ( NOT (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL))) AND ( NOT (([UnionAll3].[C9] = 1) AND ([UnionAll3].[C9] IS NOT NULL)))) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C6] = 1) AND ([UnionAll3].[C6] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C7] = 1) AND ([UnionAll3].[C7] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) WHEN (([UnionAll3].[C8] = 1) AND ([UnionAll3].[C8] IS NOT NULL)) THEN CAST(NULL AS varchar(1)) ELSE [UnionAll3].[C5] END AS [C5]
FROM  [dbo].[Animals1] AS [Extent1]
LEFT OUTER JOIN  (SELECT 
[UnionAll2].[C1] AS [C1], 
[UnionAll2].[C2] AS [C2], 
[UnionAll2].[C3] AS [C3], 
[UnionAll2].[C4] AS [C4], 
[UnionAll2].[C5] AS [C5], 
[UnionAll2].[C6] AS [C6], 
[UnionAll2].[C7] AS [C7], 
[UnionAll2].[C8] AS [C8], 
[UnionAll2].[C9] AS [C9]
FROM  (SELECT 
[UnionAll1].[Id] AS [C1], 
[UnionAll1].[IsClawed] AS [C2], 
[UnionAll1].[C1] AS [C3], 
[UnionAll1].[C2] AS [C4], 
[UnionAll1].[C3] AS [C5], 
[UnionAll1].[C4] AS [C6], 
[UnionAll1].[C5] AS [C7], 
[UnionAll1].[C6] AS [C8], 
[UnionAll1].[C7] AS [C9]
FROM  (SELECT 
[Extent2].[Id] AS [Id], 
[Extent2].[IsClawed] AS [IsClawed], 
CAST(NULL AS varchar(1)) AS [C1], 
CAST(NULL AS varchar(1)) AS [C2], 
CAST(NULL AS varchar(1)) AS [C3], 
cast(1 as bit) AS [C4], 
cast(0 as bit) AS [C5], 
cast(0 as bit) AS [C6], 
cast(0 as bit) AS [C7]
FROM [dbo].[Animals1_Cat] AS [Extent2]
UNION ALL
SELECT 
[Extent3].[Id] AS [Id], 
CAST(NULL AS varchar(1)) AS [C1], 
CAST(NULL AS varchar(1)) AS [C2], 
CAST(NULL AS varchar(1)) AS [C3], 
[Extent3].[FreshWater] AS [FreshWater], 
cast(0 as bit) AS [C4], 
cast(0 as bit) AS [C5], 
cast(0 as bit) AS [C6], 
cast(1 as bit) AS [C7]
FROM [dbo].[Animals1_Fish] AS [Extent3]) AS [UnionAll1]
UNION ALL
SELECT 
[Extent4].[Id] AS [Id], 
CAST(NULL AS varchar(1)) AS [C1], 
[Extent4].[IsHouseTrained] AS [IsHouseTrained], 
CAST(NULL AS varchar(1)) AS [C2], 
CAST(NULL AS varchar(1)) AS [C3], 
cast(0 as bit) AS [C4], 
cast(1 as bit) AS [C5], 
cast(0 as bit) AS [C6], 
cast(0 as bit) AS [C7]
FROM [dbo].[Animals1_Dog] AS [Extent4]) AS [UnionAll2]
UNION ALL
SELECT 
[Extent5].[Id] AS [Id], 
CAST(NULL AS varchar(1)) AS [C1], 
CAST(NULL AS varchar(1)) AS [C2], 
[Extent5].[ShellColor] AS [ShellColor], 
CAST(NULL AS varchar(1)) AS [C3], 
cast(0 as bit) AS [C4], 
cast(0 as bit) AS [C5], 
cast(1 as bit) AS [C6], 
cast(0 as bit) AS [C7]
FROM [dbo].[Animals1_Turtle] AS [Extent5]) AS [UnionAll3] ON [Extent1].[Id] = [UnionAll3].[C1]
WHERE N'Bob' = [Extent1].[Name]
)  AS [Limit1]

Notice the goofy kind-of-bit-mapped discriminator columns C4-C7.  Clever, I'm sure.  On my friend's database, there were five derived classes, and lots more columns, resulting in the famed 37kb SQL statement.  And, of course, EF is going to use a beast similar to this for any table access to any of the base or derived classes.

This, it turns out, is how Entity Framework handles a simple inheritance with TPT.  Neither my friend or I are experts on EF, but there doesn't appear to be a workaround here, without manually hacking the hell out of the mapping files-- which, as an added benefit, seems to make the graphical model designer in VS choke.

Here, I think, is the best way to work around this for now.  Luckily, there's no legacy databases to worry about, so redesigning the table structure is an option.

1. Download and install the Entity Designer Database Generation Power Pack (EDDGPP to its friends).
2. Restart Visual Studio
3. Now in the entity model properties, you'll find a whole lot of new xaml files installed under the Database Generation Workflow property, as shown below.
4. Choose "Generate T-SQL Via T4 (TPH).xaml (VS)
5. Re-generate the database from the model.

This creates a single table for Animal (TPH stans for Table-Per-Hierarchy), with all the additional attributes from Dog, Cat, etc added as additional columns in Animal.  EF also automatically generates a discriminator column in Animal, which is not mapped to a property in the object model.


There's a couple of caveats here:

  1. Each property in each of the classes needs to be mapped as nullable, or have a default value defined.  (Which makes sense: Fish won't have an IsClawed property, so we need a valid value for that column in SQL).
  2. Each property in all the classes-- both base and derived-- need to be unique (which should be self-evident from the table structure).
With that all done, the object model is exactly the same as it was using TPT mapping, but the SQL is a LOT cleaner.  Here's our code again:

Animal a= animalContext.Animals1.FirstOrDefault(Animal => Animal.Name == "Bob");

And here's the SQL generated:

SELECT TOP (1) 
[Extent1].[__Disc__] AS [__Disc__], 
[Extent1].[Id] AS [Id], 
[Extent1].[Name] AS [Name], 
[Extent1].[Age] AS [Age], 
[Extent1].[IsClawed] AS [IsClawed], 
[Extent1].[IsHouseTrained] AS [IsHouseTrained], 
[Extent1].[ShellColor] AS [ShellColor], 
[Extent1].[FreshWater] AS [FreshWater]
FROM [dbo].[Animals1] AS [Extent1]
WHERE ([Extent1].[__Disc__] IN ('Animal','Cat','Dog','Turtle','Fish')) AND (N'Bob' = [Extent1].[Name])

That makes a lot more sense all around, and I'm guessing it will perform a ton better.  The drawback (if it's really a drawback) is that now all animals are stored in one table, which may not be desirable in some situations, and of course we had to redesign the database, which could be a bit of an issue if there's existing data in the TPT tables.

So, is Entity Framework evil?  Not always: in most cases, the default EF mappings were fine for my friend, and as a developer-not-a-DBA, he's able to load and persist objects with ease.  But, God help you when the EF magic decides to do something truly ponderous like this to persist a pretty simple object model.

9 comments:

  1. Let's say you have a problem. You say, "I know: I'll use inheritance to solve this problem!" Now you have two problems.

    ReplyDelete
  2. Holy cats, S. That's awesome!

    Can we harness the power of your curiosity to irrigate the Sahara? If we turn that into an interesting SQL problem?

    ReplyDelete
  3. This comment has been removed by the author.

    ReplyDelete
  4. Why the retraction, Dean? Did you suddenly realize the awesome awesomeness of ORMs? (Incidentally, I'm told nHibernate doesn't generate nightmare SQL for TPT mapping... so maybe it's just *Microsoft* ORMs that are evil.)

    ReplyDelete
  5. This is actually an EF Designer problem, not an EF proplem per say. In trying to start my own technical blog, I actually have 2 entries on this very issue and a way to get around it. You just have to be willing to modify the .edmx file that contains the mappings.

    http://thundereagle.wordpress.com/2010/11/21/entity-framework-4-with-base-entities/

    http://thundereagle.wordpress.com/2010/11/21/entity-framework-with-base-entities-revisited/

    ReplyDelete
  6. Oh, and Entity Framework 4.1 that has a lot of CodeFirst goodness could make it easier to get the mappings/database you want.

    ReplyDelete
  7. Wow, thanks Scott, that's great info, and a pretty serious way to start a new blog. (Also it's becoming clear that I have to move to Wordpress)

    From this experience and listening to Julie Lerman's talk, it sounds like you need to get pretty knowledgeable about EF's mapping behavior (and almost never trust its default behavior) to deal with all but the most basic object models.

    That's a little disappointing, particularly for people like Lagged2Death who just need a pretty simple persistence store. EF is supposed to abstract away all the implementation details of data storage, and instead it feels a little like trading the complexity of rolling your own for the complexity of having to know what you want EF to do and then figuring out how to make it do that.

    I can see the appeal though. Once the mappings and database are setup, EF makes it *so* easy to interact with the database. And as I DBA I guess I should be thankful that EF generates wonky SQL-- job security for guys/gals like me.

    ReplyDelete
  8. "...and instead it feels a little like trading the complexity of rolling your own for the complexity of having to know what you want EF to do and then figuring out how to make it do that."

    That's pretty much the point I was making in my deleted post. But by the time the expletives were deleted, there was nothing left.

    ReplyDelete
  9. "That's a little disappointing, particularly for people like Lagged2Death who just need a pretty simple persistence store." Execept he used inheritance, that by definition is not simple. If he had not, there wouldn't have been any issue.

    There are 3 modes to creating tables from objects. Table per Hierarchy(TPH), Table per type(TPT) and Table Per Concrete type (TPC). TPC is how I think most of us think it should work, and is basically what I'm accounting for in my blog posts.

    TPH just creates one big ass table, talk about your simple SQL.

    TPT is just weird.

    ReplyDelete