I’m having a hard time believing what I’m about to write today. It goes against what I have experienced for years, and it explains why some of our auto-generated tools have continued to be adopted. Yesterday I shared some tips on optimizing data transformations for an MVC application, where the data returned from the database is reshaped into the form the client expects.
There were some great comments on the editorial. John left a comment to which I’d like to respond. He asked why, since we have a master/detail data relationship, we don’t make a single call that returns multiple data tables and transform the results from there. It’s a fair question, and I’d like to share my experience, because I was surprised by what I found.
I was a proponent of John’s suggestion for a long time because, historically, it delivered the best performance. Recent experience has shown me that this is no longer true in every situation. So, if I can use auto-generated ORM code and get great performance, why would I want to spend my development time writing boilerplate code when I could be building new features my company desperately needs?
That is a bold statement, and I think it requires some supporting evidence. So, I’ll use my scenario from yesterday as an example.
I actually did as John suggested. I had a stored procedure with two queries. The first joined my master table to all of the lookup tables that contribute a single row. Let’s say we’re working with my favorite purchase order example. The first query returns the summary information for a purchase order, resulting in a single row. It joins the purchase order to the customer, the shipping address, the person placing the order, the sales representative, and so on. You get the picture. The second query returns the line items, each joined to the product table, the pricing table, and any other supporting table for a purchase order line item. Instead of returning exactly one row, this query returns zero, one, or many rows, depending on the data entered for the purchase order.
I call the procedure from my application and measure the execution time on the database server. I also measure the overall execution time from my client, i.e., how long it takes to get my results back.
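For illustration, here is a minimal sketch of that client-side call using ADO.NET. The procedure name (`dbo.GetPurchaseOrder`), its parameter, and the connection string are placeholders I’ve assumed, not the actual code from this scenario.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using System.Diagnostics;

// Assumed names: the proc, its parameter, and the connection string
// are placeholders for illustration only.
const string connectionString = "Server=.;Database=Sales;Integrated Security=true";
const int orderId = 12345;

var watch = Stopwatch.StartNew();

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.GetPurchaseOrder", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("@PurchaseOrderId", orderId);

    // One round trip; the adapter opens the connection and fills
    // both result sets returned by the procedure.
    var results = new DataSet();
    new SqlDataAdapter(cmd).Fill(results);
    // results.Tables[0] = order summary (one row)
    // results.Tables[1] = line items (zero or more rows)
}

watch.Stop();
Console.WriteLine($"Client round trip: {watch.ElapsedMilliseconds} ms");
```

The `Stopwatch` here captures the client-side number; the server-side execution time comes from the database’s own tooling.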
Now, for comparison, I do the same thing using Entity Framework. I first retrieve the purchase order summary, using Include to pull in the lookup table data, just like my stored procedure. Then I execute a second query to retrieve the line items, which also uses Include to get the related data for each line item row.
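The two Entity Framework queries might look something like the sketch below. The entity and navigation-property names, and the `context` variable, are assumptions mirroring the joins described above, not the article’s actual model.

```csharp
// Sketch only: entity names, navigation properties, context, and orderId
// are assumed; the shape mirrors the stored procedure's joins.
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Query 1: the single-row summary, with its lookup tables eagerly loaded.
var order = context.PurchaseOrders
    .Include(po => po.Customer)
    .Include(po => po.ShippingAddress)
    .Include(po => po.PlacedBy)
    .Include(po => po.SalesRep)
    .Single(po => po.Id == orderId);

// Query 2: zero-to-many line items, each with its related rows.
var lineItems = context.PurchaseOrderLineItems
    .Include(li => li.Product)
    .Include(li => li.Pricing)
    .Where(li => li.PurchaseOrderId == orderId)
    .ToList();
```

Each `Include` becomes a join (or a follow-up query, depending on the EF version and settings), which is why this lines up so closely with the hand-written stored procedure.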
As with my stored procedure example, I call this from my application and measure the overall time, from the moment my client submits the call to the moment the data is returned. Guess what? The difference is less than 50 milliseconds. Here’s the big thing, though: the data retrieved via the stored procedure is still sitting in data tables, while the data from Entity Framework has already been transformed into objects, which is what we actually use in .NET.
Well, let’s be a pure DBA for a moment. We need to protect our database resource. It is only so big, and it doesn’t scale out, so every little 50 ms adds up. So I compared the time and resources used by my EF query and my stored procedure, isolating the comparison to the database server alone. Guess what? The difference in performance was negligible. One might be faster than the other in one run and slower in the next. I wrote a test harness to run thousands of calls using the two methods; the average difference between them was 2 ms.
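The harness itself doesn’t need to be fancy. Here’s a hedged sketch of the idea, assuming each code path is wrapped in a delegate; the iteration count and the stand-in delegates are my own placeholders, not the original harness.

```csharp
using System;
using System.Diagnostics;

// Stand-ins for the two code paths; each would wrap the real call.
Action runStoredProcedure = () => { /* ADO.NET call to the stored procedure */ };
Action runEntityFramework = () => { /* the two Entity Framework queries */ };

static double AverageMs(Action call, int iterations)
{
    call(); // warm-up, so connection pooling and plan caching don't skew the first sample
    var watch = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
        call();
    watch.Stop();
    return (double)watch.ElapsedMilliseconds / iterations;
}

Console.WriteLine($"SP: {AverageMs(runStoredProcedure, 5000):F2} ms avg");
Console.WriteLine($"EF: {AverageMs(runEntityFramework, 5000):F2} ms avg");
```

Averaging over thousands of iterations, with a warm-up call first, is what smooths out the run-to-run inversions mentioned above.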
Okay, that’s not a perfect comparison. There are caching effects, other users, background activity, and whatnot. However, short of doing a boatload of work to isolate the two methods, this is a pretty solid baseline.
So, why the difference from the client’s perspective? The answer is that Entity Framework transforms the returned results into actual .NET objects and collections, while the stored procedure version simply returns a dataset.
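To make that concrete, here is the kind of mapping step the stored-procedure path still leaves to you. Everything named here (the `results` DataSet, the column names, and the `PurchaseOrder` class) is assumed for illustration.

```csharp
// Assumed names throughout: results is the DataSet returned by the
// stored procedure, and PurchaseOrder is a hand-written DTO.
var summary = results.Tables[0].Rows[0];
var order = new PurchaseOrder
{
    Id           = (int)summary["PurchaseOrderId"],
    CustomerName = (string)summary["CustomerName"],
    OrderDate    = (DateTime)summary["OrderDate"]
};
// ...and the same again for every line-item row in results.Tables[1].
// Entity Framework performs this materialization for you.
```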
My conclusion is this: if you can build reliable code that performs reasonably using generated ORM code, why not use it? In fact, take the time to optimize your ORM methods so that they are a reasonable solution in more situations. When things get too complicated to do through an ORM, take full advantage of what the SQL engine does for you natively. You have both worlds at your fingertips. Use the time you aren’t writing boilerplate data-access code to build the other features your company really needs.
Cheers,
Ben