
Split queries in EF Core

by admin

The Entity Framework Core ORM gets richer in features with every version. The development team spends a lot of time on performance, and often a simple NuGet package update already gives a boost that users can feel. But today I want to talk about one very specific feature: a new query mode, "split queries".

Background

My current project uses Postgres as storage, accessed through Npgsql and EF Core. One of the central entities of the whole business process is the "PCR test methodology", which is essentially a recipe for how to perform a test and includes quite a bit of information:

  • a list of the reagents used, with their "mixing recipe"
  • a list of the results to be determined
  • a list of the equipment the test can be performed on
  • and so on, eight subcollections in total

The methodology object itself also has about a dozen fields, mostly small ones: name, description, version, etc.
Most of the collections are many-to-many links to other reference records, which themselves don't contain much information, maybe half a dozen scalar fields at most.

At first we didn't overthink it and used lazy loading, which worked well in "partial use" scenarios: when a methodology was retrieved with only its list of reagents, or only its list of results.
But some scenarios required the methodology "as a whole", which led to the classic N+1 problem: one query retrieves the entity, and then every related collection and every item it references costs another query.
This way of fetching the data dragged performance down badly, so the next step was to load the collections together with the parent object using the LINQ constructs Include().ThenInclude().
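For illustration, a minimal sketch of what that eager loading looked like; the context and property names here (Methodologies, Reagents, Components, Results, Equipment, methodologyId) are hypothetical, not the real model from the project:

// Hypothetical model: the real entity has eight subcollections.
var methodology = dbContext.Methodologies
    .Include(m => m.Reagents)
        .ThenInclude(r => r.Components)   // the reagent "mixing recipe"
    .Include(m => m.Results)
    .Include(m => m.Equipment)
    .First(m => m.ID == methodologyId);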

Everything worked more or less fine until the czar methodology came along. It is a comprehensive and complex test that includes many reagents, determines many results, and uses many detection channels; many forms of reagent kits are produced for it.
In short, we ended up with a branching structure in which each nested collection had not the usual 3-5 entries but 20-30. This is where our software choked.
Where loading the complete entity used to take from a few tens to a few hundred milliseconds, for the czar methodology it took up to half a dozen seconds, which sometimes made the database query fail outright. That was no longer acceptable and called for a solution.

A retelling of the documentation

As Microsoft writes in its article about split queries, normally each LINQ query is translated into a single SQL query that uses JOINs to retrieve the related collections.
I'll take the liberty of flipping the example from the documentation a bit and fetch posts and comments instead of blogs and posts. Let the data model look like this:

public class Post
{
    public int ID { get; set; }
    public string Content { get; set; }
    public List<Comment> Comments { get; set; }
}

public class Comment
{
    public int ID { get; set; }
    public Post ParentPost { get; set; }
    public int ParentPostID { get; set; }
    public string CommentText { get; set; }
}

And let's try to retrieve a hyped post with 1000 comments from the database using the following query:

var veryPopularPost = Posts.Include(x => x.Comments).First(x => x.ID == 42);

This translates into approximately the following SQL query (approximately, because the exact text varies between providers and versions):

SELECT t.id, t.content, c.id, c.parentpostid, c.commenttext
FROM (
    SELECT p.id, p.content
    FROM posts AS p
    WHERE p.id = 42
    LIMIT 1
) AS t
LEFT JOIN comments AS c ON (t.id = c.parentpostid);

What if the post is large? Say the author was on a roll and produced 100 KB of text (that's about 50 printed pages). The commenters read it, enjoyed it and left 1000 comments. Roughly how much will the result set that the ORM has to read back weigh?
Off the top of my head: p.content is 100 KB and is repeated in each of the 1000 rows, so the output is ~100 MB of the same text duplicated 1000 times. And that's not counting the int columns and the comment text itself.

Lazy loading would be faster in this case, although bombarding the database with a thousand and one queries is also questionable fun. Is there anything you can do about it without optimizing the queries by hand?

AsSplitQuery()

Yes, and here's how. EF Core 5.0 introduces a new method, AsSplitQuery(), which makes the query provider translate the loading of each included collection into a separate query.

var veryPopularPost = Posts.Include(x => x.Comments).AsSplitQuery().First(x => x.ID == 42);

This translates into the following SQL queries:

SELECT p.id, p.content
FROM posts AS p
WHERE p.id = 42
LIMIT 1;

SELECT c.id, c.parentpostid, c.commenttext, t.id
FROM (
    SELECT p.id
    FROM posts AS p
    WHERE p.id = 42
    LIMIT 1
) AS t
INNER JOIN comments AS c ON (t.id = c.parentpostid);

As a result, the post text is read only once instead of 1000 times.

For multiple collections the behavior is similar, which avoids a combinatorial explosion of the joined result set.
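To make "similar" concrete, here is a sketch with a second, hypothetical Tags collection on Post (it is not part of the model above), just to show the shape:

// Hypothetical second collection "Tags", added only for illustration.
var post = Posts
    .Include(x => x.Comments)
    .Include(x => x.Tags)
    .AsSplitQuery()
    .First(x => x.ID == 42);
// In single-query mode the two collections are joined into one result set of
// roughly |Comments| x |Tags| rows; in split mode EF Core issues three queries:
// one for the post and one per included collection.

It is also worth mentioning that split queries can be enabled globally when configuring the provider, so you don't have to sprinkle AsSplitQuery() over every query; the connection string below is just a placeholder:

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    => optionsBuilder.UseNpgsql(
        "Host=localhost;Database=posts",   // placeholder connection string
        o => o.UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery));

Individual queries can then opt back into the old behavior with AsSingleQuery().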

Benchmarks

I hastily put together a benchmark. Let there be a collection of MainEntity records that contain nothing but nested collections: five "small" ones and one "large" one.

public class MainEntity
{
    public int ID { get; set; }
    public List<RefEntity1> Ref1 { get; set; }
    public List<RefEntity2> Ref2 { get; set; }
    public List<RefEntity3> Ref3 { get; set; }
    public List<RefEntity4> Ref4 { get; set; }
    public List<RefEntity5> Ref5 { get; set; }
    public List<BigRefEntity> BigRef { get; set; }
}

public abstract class RefEntity
{
    public int ID { get; set; }
    public string Payload { get; set; } = string.Empty;
    public MainEntity MainEntity { get; set; }
    public int? MainEntityID { get; set; }
}

public class RefEntity1 : RefEntity { }
public class RefEntity2 : RefEntity { }
public class RefEntity3 : RefEntity { }
public class RefEntity4 : RefEntity { }
public class RefEntity5 : RefEntity { }
public class BigRefEntity : RefEntity { }

During database initialization the entries are filled with random strings: 10 characters long for the "small" entities and 1000 characters long for the "large" ones.
Each nested collection contains ItemsInCollection records. The test method retrieves two MainEntity records, attaching from 0 to 6 collections to them (parameter LoadRefs), in two modes: as a single query and as split queries (parameter SplitQueries).
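The seeding code itself is not shown here; below is a rough sketch of how such data could be generated (the helper and names are my assumption, not the actual benchmark code):

// Hypothetical seeding helper, not the actual benchmark code.
// Requires: using System; using System.Linq;
private static readonly Random Rng = new();

private static string RandomString(int length) =>
    new string(Enumerable.Range(0, length)
        .Select(_ => (char)Rng.Next('a', 'z' + 1))
        .ToArray());

private static MainEntity CreateEntity(int itemsInCollection) => new()
{
    // Five "small" collections with 10-character payloads...
    Ref1 = Enumerable.Range(0, itemsInCollection)
        .Select(_ => new RefEntity1 { Payload = RandomString(10) }).ToList(),
    // ...Ref2..Ref5 are filled the same way (omitted for brevity)...
    // ...and one "large" collection with 1000-character payloads.
    BigRef = Enumerable.Range(0, itemsInCollection)
        .Select(_ => new BigRefEntity { Payload = RandomString(1000) }).ToList(),
};

The benchmark method itself: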

[Benchmark]
public List<MainEntity> QueryLoad()
{
    IQueryable<MainEntity> query = LoadRefs switch
    {
        0 => dbContext.MainEntities,
        1 => dbContext.MainEntities.Include(x => x.Ref1),
        2 => dbContext.MainEntities.Include(x => x.Ref1).Include(x => x.Ref2),
        3 => dbContext.MainEntities.Include(x => x.Ref1).Include(x => x.Ref2).Include(x => x.Ref3),
        4 => dbContext.MainEntities.Include(x => x.Ref1).Include(x => x.Ref2).Include(x => x.Ref3).Include(x => x.Ref4),
        5 => dbContext.MainEntities.Include(x => x.Ref1).Include(x => x.Ref2).Include(x => x.Ref3).Include(x => x.Ref4).Include(x => x.Ref5),
        6 => dbContext.MainEntities.Include(x => x.Ref1).Include(x => x.Ref2).Include(x => x.Ref3).Include(x => x.Ref4).Include(x => x.Ref5).Include(x => x.BigRef),
        _ => throw new ArgumentOutOfRangeException()
    };

    var splitQuery = SplitQueries ? query.AsSplitQuery() : query;
    return splitQuery.Take(2).ToList();
}

The full benchmark code is available on GitHub.
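For reference, a benchmark class like this is typically wired up and launched via BenchmarkDotNet's runner; the class name and the [Params] layout below are my assumption, not necessarily how the repository is organized:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class SplitQueryBenchmark          // hypothetical name
{
    [Params(2, 5, 10)]
    public int ItemsInCollection { get; set; }

    [Params(false, true)]
    public bool SplitQueries { get; set; }

    [Params(0, 1, 2, 3, 4, 5, 6)]
    public int LoadRefs { get; set; }

    // A [GlobalSetup] method would create dbContext and seed the data;
    // the QueryLoad method shown above is the [Benchmark].
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<SplitQueryBenchmark>();
}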

I ran the benchmark on my home machine, with the DBMS in its default configuration and connected locally via localhost.

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
AMD Ryzen 5 2400G with Radeon Vega Graphics, 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.201
  [Host]     : .NET Core 5.0.4 (CoreCLR 5.0.421.11614, CoreFX 5.0.421.11614), X64 RyuJIT [AttachedDebugger]
  Job-HMJXLI : .NET Core 5.0.4 (CoreCLR 5.0.421.11614, CoreFX 5.0.421.11614), X64 RyuJIT
InvocationCount=1  UnrollFactor=1

| ItemsInCollection | SplitQueries | LoadRefs | Mean | Error | StdDev | Median |
|---|---|---|---|---|---|---|
| 2 | False | 0 | 694.6 μs | 18.42 μs | 52.57 μs | 686.9 μs |
| 2 | False | 1 | 1,004.3 μs | 25.43 μs | 69.60 μs | 983.4 μs |
| 2 | False | 2 | 1,255.3 μs | 32.02 μs | 89.25 μs | 1,237.0 μs |
| 2 | False | 3 | 1,578.9 μs | 45.46 μs | 126.73 μs | 1,545.1 μs |
| 2 | False | 4 | 2,013.3 μs | 56.55 μs | 162.26 μs | 1,976.8 μs |
| 2 | False | 5 | 2,685.2 μs | 69.00 μs | 196.85 μs | 2,651.1 μs |
| 2 | False | 6 | 4,646.8 μs | 134.52 μs | 392.41 μs | 4,515.2 μs |
| 2 | True | 0 | 726.5 μs | 17.60 μs | 48.76 μs | 725.0 μs |
| 2 | True | 1 | 1,403.1 μs | 34.46 μs | 96.06 μs | 1,394.3 μs |
| 2 | True | 2 | 1,928.7 μs | 57.68 μs | 165.51 μs | 1,923.3 μs |
| 2 | True | 3 | 2,639.6 μs | 96.20 μs | 277.56 μs | 2,584.5 μs |
| 2 | True | 4 | 3,128.8 μs | 117.46 μs | 340.77 μs | 3,180.4 μs |
| 2 | True | 5 | 3,725.9 μs | 121.37 μs | 357.87 μs | 3,713.8 μs |
| 2 | True | 6 | 4,299.9 μs | 166.28 μs | 485.04 μs | 4,233.4 μs |
| 5 | False | 0 | 706.6 μs | 18.03 μs | 50.25 μs | 698.9 μs |
| 5 | False | 1 | 1,071.6 μs | 20.91 μs | 51.69 μs | 1,068.6 μs |
| 5 | False | 2 | 1,512.7 μs | 30.13 μs | 54.33 μs | 1,513.6 μs |
| 5 | False | 3 | 2,809.9 μs | 148.44 μs | 435.35 μs | 2,619.9 μs |
| 5 | False | 4 | 7,803.3 μs | 435.35 μs | 1,242.08 μs | 7,243.8 μs |
| 5 | False | 5 | 37,752.4 μs | 439.33 μs | 366.86 μs | 37,791.4 μs |
| 5 | False | 6 | 321,948.5 μs | 3,336.86 μs | 2,605.20 μs | 321,361.0 μs |
| 5 | True | 0 | 714.0 μs | 12.87 μs | 11.41 μs | 715.7 μs |
| 5 | True | 1 | 1,436.5 μs | 33.54 μs | 92.37 μs | 1,418.8 μs |
| 5 | True | 2 | 2,233.7 μs | 79.47 μs | 230.55 μs | 2,232.8 μs |
| 5 | True | 3 | 3,056.3 μs | 166.89 μs | 476.15 μs | 3,051.3 μs |
| 5 | True | 4 | 3,339.3 μs | 105.32 μs | 303.88 μs | 3,340.5 μs |
| 5 | True | 5 | 3,962.7 μs | 179.15 μs | 508.21 μs | 3,862.4 μs |
| 5 | True | 6 | 4,496.6 μs | 133.87 μs | 394.71 μs | 4,484.2 μs |
| 10 | False | 0 | 747.7 μs | 30.51 μs | 88.51 μs | 719.0 μs |
| 10 | False | 1 | 1,211.5 μs | 49.81 μs | 142.92 μs | 1,162.0 μs |
| 10 | False | 2 | 2,161.1 μs | 88.84 μs | 259.14 μs | 2,123.4 μs |
| 10 | False | 3 | 9,423.3 μs | 702.14 μs | 2,014.57 μs | 9,313.8 μs |
| 10 | False | 4 | 90,392.5 μs | 821.13 μs | 727.91 μs | 90,467.2 μs |
| 10 | False | 5 | 1,202,652.5 μs | 23,336.09 μs | 24,969.36 μs | 1,205,782.6 μs |
| 10 | False | 6 | 34,625,732.4 μs | 691,082.68 μs | 1,055,356.24 μs | 34,718,363.9 μs |
| 10 | True | 0 | 747.0 μs | 24.88 μs | 68.93 μs | 738.7 μs |
| 10 | True | 1 | 1,712.9 μs | 53.74 μs | 154.20 μs | 1,697.2 μs |
| 10 | True | 2 | 2,519.9 μs | 107.27 μs | 316.28 μs | 2,491.5 μs |
| 10 | True | 3 | 3,349.0 μs | 149.58 μs | 436.33 μs | 3,295.7 μs |
| 10 | True | 4 | 4,268.4 μs | 165.83 μs | 483.72 μs | 4,274.0 μs |
| 10 | True | 5 | 4,882.6 μs | 188.59 μs | 547.13 μs | 4,832.2 μs |
| 10 | True | 6 | 5,560.8 μs | 249.02 μs | 726.40 μs | 5,478.1 μs |

Note the values for LoadRefs = 5 and 6 in single-query mode with 10 items per collection. Adding one additional collection with only 10 records (albeit with a relatively large payload) results in a degradation of almost 30 times.

When using split queries, the difference is nowhere near as dramatic, and the absolute numbers are much smaller (5.5 ms vs. 34,625 ms).

Thanks, everyone, and beware of combinatorial explosions!
