
Principles of IQueryable and LINQ data providers

by admin

LINQ lets .NET developers work uniformly with collections of objects in memory and with objects stored in a database or another remote source. For example, to query ten red apples from an in-memory list and from a database via Entity Framework, we can use exactly the same code:

List<Apple> appleList;
DbSet<Apple> appleDbSet;
var applesFromList = appleList.Where(apple => apple.Color == "red").Take(10);
var applesFromDb = appleDbSet.Where(apple => apple.Color == "red").Take(10);

However, these queries are executed differently. In the first case, when enumerating the result with foreach, the apples will be filtered using the given predicate and then the first 10 of them will be taken. In the second case, the syntax tree with the query expression will be passed to a special LINQ provider, which will translate it into a SQL query to the database and execute it, and then form C# objects for the 10 records found and return them. The IQueryable<T> interface designed for LINQ providers of external data sources can provide such behavior. Below we will try to understand the principles of organization and usage of this interface.
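
The in-memory half of this behavior can be checked directly. The following is a minimal sketch, assuming a hypothetical Apple class and a list of 100 red apples (neither comes from the article): it counts how often the predicate runs, showing that LINQ to Objects does nothing until the result is enumerated and then stops once ten matches have been found.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used only for this illustration.
public class Apple
{
    public string Color { get; set; }
}

public static class DeferredExecutionDemo
{
    // Returns the number of predicate calls before and after enumeration,
    // plus the number of apples actually taken.
    public static (int Before, int After, int Taken) Run()
    {
        // Assumed test data: 100 red apples held in memory.
        List<Apple> appleList = Enumerable.Range(0, 100)
            .Select(_ => new Apple { Color = "red" })
            .ToList();

        int predicateCalls = 0;
        IEnumerable<Apple> query = appleList
            .Where(apple => { predicateCalls++; return apple.Color == "red"; })
            .Take(10);

        int before = predicateCalls;          // building the query filters nothing
        List<Apple> result = query.ToList();  // enumeration runs the predicate

        return (before, predicateCalls, result.Count);
    }

    public static void Main()
    {
        var (before, after, taken) = Run();
        Console.WriteLine($"calls before: {before}, calls after: {after}, taken: {taken}");
    }
}
```

With a database-backed IQueryable<T> the predicate would not run in the application at all; it would be translated and evaluated by the data source.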

IEnumerable<T> and IQueryable<T> interfaces

At first glance, it might seem that LINQ is based on a set of extension methods such as Where(), Select(), First(), and Count() on the IEnumerable<T> interface, and that these alone let the developer write queries uniformly against in-memory objects (LINQ to Objects), databases (e.g., LINQ to SQL, LINQ to Entities), and remote services (e.g., LINQ to OData Services). But this is not the case. The extension methods on IEnumerable<T> already contain the implementation of the corresponding sequence operations. For example, in .NET Framework 4.5.2, whose source code is publicly available, the First<TSource>(Func<TSource, bool> predicate) method is implemented as follows:

public static TSource First<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    if (source == null) throw Error.ArgumentNull("source");
    if (predicate == null) throw Error.ArgumentNull("predicate");
    foreach (TSource element in source)
    {
        if (predicate(element)) return element;
    }
    throw Error.NoMatch();
}

Clearly, in the general case this method cannot be executed against data that lives in a database or a service. The only way to run it would be to load the entire dataset into the application first, which is obviously unacceptable.
To implement LINQ providers for data external to the application, the IQueryable<T> interface (a descendant of IEnumerable<T>) is used, together with a set of extension methods almost identical to those written for IEnumerable<T>. The apple queries at the beginning of the article are executed differently precisely because List<T> implements IEnumerable<T>, while DbSet<T> from Entity Framework implements IQueryable<T>.
What sets the IQueryable<T> extension methods apart is that they contain no data-processing logic. Instead, they merely form a syntactic structure that describes the query, building it up with each new method call in the chain. When an aggregate method (Count(), etc.) is called or the query is enumerated with foreach, the query description is passed to the provider encapsulated in the particular IQueryable<T> implementation, which translates the query into the language of its data source and executes it. For Entity Framework this language is SQL; for the .NET driver for MongoDB it is a JSON search document, and so on.
Incidentally, this is the source of some "interesting" characteristics of LINQ providers:

  • a query that one provider executes successfully may not be supported by another; moreover, we find this out not when the query is constructed but only when the provider executes it;
  • the provider can modify the query before executing it; for example, it can add a limit on the number of returned objects, additional filters, etc. to every query.

Do-it-yourself LINQ: ISimpleQueryable<T>

Before describing the structure of the IQueryable<T> interface, let's write a simple counterpart of it, the ISimpleQueryable<T> interface, along with a couple of LINQ-style extension methods. This will let us demonstrate the basic principles of working with IQueryable<T> without going into the nuances of its implementation.

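// Namespaces needed by the snippets in this section.
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq.Expressions;

public interface ISimpleQueryable<TSource> : IEnumerable<TSource>
{
    string QueryDescription { get; }
    ISimpleQueryable<TSource> CreateNewQueryable(string queryDescription);
    TResult Execute<TResult>();
}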

The interface has a QueryDescription property, which holds the description of the query, and an Execute<TResult>() method, which executes that query when needed. The method is generic because the result of execution can be either an enumeration or the value of an aggregate function such as Count(). The interface also has a CreateNewQueryable() method, which creates a new instance of ISimpleQueryable<T> with an updated query description each time a new LINQ method is added to the chain. Note that the query description here is a string, whereas LINQ uses expression trees for this purpose.
Now let's move on to the extension methods:

public static class SimpleQueryableExtensions
{
    public static ISimpleQueryable<TSource> Where<TSource>(
        this ISimpleQueryable<TSource> queryable,
        Expression<Func<TSource, bool>> predicate)
    {
        // Append a textual description of this call to the existing query.
        string newQueryDescription = queryable.QueryDescription + ".Where(" + predicate.ToString() + ")";
        return queryable.CreateNewQueryable(newQueryDescription);
    }

    public static int Count<TSource>(this ISimpleQueryable<TSource> queryable)
    {
        string newQueryDescription = queryable.QueryDescription + ".Count()";
        ISimpleQueryable<TSource> newQueryable = queryable.CreateNewQueryable(newQueryDescription);
        // An aggregate method cannot be chained further, so it triggers execution.
        return newQueryable.Execute<int>();
    }
}

As we can see, these methods simply append information about themselves to the query description and create a new instance of ISimpleQueryable<T>. Moreover, the Where() method, unlike its counterpart for IEnumerable<T>, accepts not the Func<TSource, bool> predicate itself but the previously mentioned expression tree that describes it, of type Expression<Func<TSource, bool>>. In this example the tree just gives us a string with the predicate code; in real LINQ it preserves all the details of the query as an expression tree.
Finally, let's create a simple implementation of ISimpleQueryable<T> that contains everything we need to write LINQ queries except the code that executes them. To make it more realistic, let's add a reference to the data source (_dataSource), which Execute() would use when running the query.
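
The difference between a delegate and an expression tree can be seen in a few lines. This is a small sketch rather than anything from the article; the class and method names are made up for illustration. The same lambda is assigned once to Func<string, bool> and once to Expression<Func<string, bool>>, and only the second can be inspected as data.

```csharp
using System;
using System.Linq.Expressions;

public static class DelegateVsExpressionDemo
{
    // Returns the name of the method called at the root of the lambda body.
    public static string CalledMethod(Expression<Func<string, bool>> tree)
    {
        var call = (MethodCallExpression)tree.Body;
        return call.Method.Name;
    }

    public static void Main()
    {
        // A delegate is compiled code: it can be invoked but not examined.
        Func<string, bool> asDelegate = s => s.Contains("substring");

        // The same lambda as an expression tree: an object graph describing the code.
        Expression<Func<string, bool>> asTree = s => s.Contains("substring");

        Console.WriteLine(asDelegate("a substring here")); // True
        Console.WriteLine(asTree.Parameters[0].Name);       // s
        Console.WriteLine(asTree.Body.NodeType);            // Call
        Console.WriteLine(CalledMethod(asTree));            // Contains

        // A tree can still be compiled into a delegate when it has to run locally.
        Func<string, bool> compiled = asTree.Compile();
        Console.WriteLine(compiled("nothing to see"));      // False
    }
}
```

A provider walks such a tree node by node, which is what makes translation into SQL or another query language possible.
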

public class FakeSimpleQueryable<TSource> : ISimpleQueryable<TSource>
{
    private readonly object _dataSource;

    public string QueryDescription { get; private set; }

    public FakeSimpleQueryable(string queryDescription, object dataSource)
    {
        _dataSource = dataSource;
        QueryDescription = queryDescription;
    }

    public ISimpleQueryable<TSource> CreateNewQueryable(string queryDescription)
    {
        return new FakeSimpleQueryable<TSource>(queryDescription, _dataSource);
    }

    public TResult Execute<TResult>()
    {
        // QueryDescription should be processed here and the resulting request applied to _dataSource.
        throw new NotImplementedException();
    }

    public IEnumerator<TSource> GetEnumerator()
    {
        return Execute<IEnumerator<TSource>>();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

Now let’s look at a simple query for FakeSimpleQueryable:

var provider = new FakeSimpleQueryable<string>("", null);
int result = provider
    .Where(s => s.Contains("substring"))
    .Where(s => s != "some string")
    .Count();

Let’s try to understand what will happen when the above code is executed (see also the figure below):

  • first, the initial Where() call takes the empty query description from the FakeSimpleQueryable instance created by the constructor, appends ".Where(s => s.Contains("substring"))" to it, and forms a second FakeSimpleQueryable instance with the new description;
  • then the second Where() call takes the query description from that second FakeSimpleQueryable, appends ".Where(s => s != "some string")" to it, and again forms a new, third FakeSimpleQueryable instance with the query description ".Where(s => s.Contains("substring")).Where(s => s != "some string")";
  • finally, the Count() call takes the query description from the instance created in the previous step, appends ".Count()" to it, forms a fourth FakeSimpleQueryable instance, and calls its Execute<int>() method, since no further query construction is possible;
  • as a result, inside the Execute() method QueryDescription equals ".Where(s => s.Contains("substring")).Where(s => s != "some string").Count()", which is what has to be processed next.

[Figure: Principles of IQueryable and LINQ data providers]
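
The walkthrough above can be reproduced with a variant of the fake that records the description instead of throwing. This is only a sketch: RecordingSimpleQueryable and SimpleQueryDemo are hypothetical names introduced here, and Execute() returns a default value because there is no real data source. Note also that the text produced by Expression.ToString() may differ slightly from the lambda as written in the source (for instance, in parenthesization).

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq.Expressions;

public interface ISimpleQueryable<TSource> : IEnumerable<TSource>
{
    string QueryDescription { get; }
    ISimpleQueryable<TSource> CreateNewQueryable(string queryDescription);
    TResult Execute<TResult>();
}

public static class SimpleQueryableExtensions
{
    public static ISimpleQueryable<TSource> Where<TSource>(
        this ISimpleQueryable<TSource> queryable,
        Expression<Func<TSource, bool>> predicate)
    {
        return queryable.CreateNewQueryable(
            queryable.QueryDescription + ".Where(" + predicate + ")");
    }

    public static int Count<TSource>(this ISimpleQueryable<TSource> queryable)
    {
        return queryable
            .CreateNewQueryable(queryable.QueryDescription + ".Count()")
            .Execute<int>();
    }
}

// Hypothetical stand-in for a provider: it only records what it is asked to run.
public class RecordingSimpleQueryable<TSource> : ISimpleQueryable<TSource>
{
    private readonly List<string> _log;

    public string QueryDescription { get; }

    public RecordingSimpleQueryable(string queryDescription, List<string> log)
    {
        QueryDescription = queryDescription;
        _log = log;
    }

    public ISimpleQueryable<TSource> CreateNewQueryable(string queryDescription)
    {
        return new RecordingSimpleQueryable<TSource>(queryDescription, _log);
    }

    public TResult Execute<TResult>()
    {
        _log.Add(QueryDescription);   // a real provider would translate and run the query here
        return default(TResult);      // no data source in this sketch
    }

    public IEnumerator<TSource> GetEnumerator()
    {
        Execute<IEnumerator<TSource>>();            // record the query
        return new List<TSource>().GetEnumerator(); // then yield nothing
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class SimpleQueryDemo
{
    // Builds the query from the article and returns every description that reached Execute().
    public static List<string> RunQuery()
    {
        var log = new List<string>();
        var source = new RecordingSimpleQueryable<string>("", log);
        source.Where(s => s.Contains("substring")).Where(s => s != "some string").Count();
        return log;
    }

    public static void Main()
    {
        foreach (string description in RunQuery())
            Console.WriteLine(description);
    }
}
```

Only one description reaches Execute(): the two Where() calls merely create new instances, and Count() is the call that triggers execution.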

The real IQueryable<T> … and IQueryProvider

Now let's see what the IQueryable<T> interface looks like in .NET:

public interface IQueryable : IEnumerable
{
    Expression Expression { get; }
    Type ElementType { get; }
    IQueryProvider Provider { get; }
}

public interface IQueryable<out T> : IEnumerable<T>, IQueryable
{
}

public interface IQueryProvider
{
    IQueryable CreateQuery(Expression expression);
    IQueryable<TElement> CreateQuery<TElement>(Expression expression);
    object Execute(Expression expression);
    TResult Execute<TResult>(Expression expression);
}

Note that:

  • .NET has both a generic and a non-generic version of IQueryable;
  • the Expression property stores the tree that describes the LINQ query (our implementation used the QueryDescription string instead);
  • the ElementType property holds the type of the elements returned by the query and is used in LINQ provider implementations for type checking;
  • the pair of methods for creating new IQueryable instances (CreateQuery() and CreateQuery<TElement>()) and the pair of methods for executing the query (Execute() and Execute<TResult>()) are moved to a separate interface, IQueryProvider; presumably this separates the query itself, which is recreated on every extension-method call, from the object that actually has access to the data source, does all the main work, and may be too "heavy" to recreate constantly;
  • the IQueryable.Provider property points to the associated IQueryProvider instance.
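
These members can be examined on any IQueryable<T>. The sketch below uses AsQueryable() over an in-memory array, which is backed by the built-in LINQ to Objects provider rather than a database provider; QueryableAnatomyDemo and Describe() are names made up for this illustration.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class QueryableAnatomyDemo
{
    // Summarizes a query as "<element type>:<root node type of its expression>".
    public static string Describe(IQueryable query)
    {
        return query.ElementType.Name + ":" + query.Expression.NodeType;
    }

    public static void Main()
    {
        IQueryable<int> source = new[] { 1, 2, 3, 4 }.AsQueryable();
        IQueryable<int> filtered = source.Where(n => n > 2);

        Console.WriteLine(Describe(source));    // Int32:Constant
        Console.WriteLine(Describe(filtered));  // Int32:Call

        // The new query's expression is a call to Where() whose first argument
        // is the source expression and whose second is the quoted predicate.
        var call = (MethodCallExpression)filtered.Expression;
        Console.WriteLine(call.Method.Name);            // Where
        Console.WriteLine(call.Arguments[0].NodeType);  // Constant
        Console.WriteLine(call.Arguments[1].NodeType);  // Quote

        Console.WriteLine(filtered.Provider != null);   // True
    }
}
```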

Now let's look at how the extension methods on IQueryable<T> work, using the Where() method as an example (shown here is the overload whose predicate also receives the element's index):

public static IQueryable<TSource> Where<TSource>(
    this IQueryable<TSource> source,
    Expression<Func<TSource, int, bool>> predicate)
{
    if (source == null) throw Error.ArgumentNull("source");
    if (predicate == null) throw Error.ArgumentNull("predicate");
    return source.Provider.CreateQuery<TSource>(
        Expression.Call(
            null,
            ((MethodInfo)MethodBase.GetCurrentMethod()).MakeGenericMethod(typeof(TSource)),
            new Expression[] { source.Expression, Expression.Quote(predicate) }));
}

We can see that the method constructs a new instance of IQueryable<TSource> by passing to CreateQuery<TSource>() an expression in which a call to the Where() method itself, with the given predicate as an argument, is applied to the source expression from source.Expression.
Thus, despite some differences between IQueryable<T> and IQueryProvider on the one hand and the ISimpleQueryable<T> we created earlier on the other, the principles of their use in LINQ are the same: each extension method added to the query augments the expression tree with information about itself and then creates a new instance of IQueryable<T> via the CreateQuery<T>() method, while aggregate methods additionally initiate query execution by calling the Execute<T>() method.
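
The same steps can be performed by hand. The following sketch (ManualWhere is a hypothetical helper, not part of .NET) builds the call expression for the single-argument-predicate overload of Queryable.Where and hands it to the provider's CreateQuery<T>() method; it obtains the MethodInfo through a delegate conversion instead of MethodBase.GetCurrentMethod().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public static class ManualWhereDemo
{
    // Does by hand what Queryable.Where() does: wraps the source expression
    // in a call to Where() and asks the provider for a new query.
    public static IQueryable<T> ManualWhere<T>(
        IQueryable<T> source, Expression<Func<T, bool>> predicate)
    {
        // Select the Where<T>(IQueryable<T>, Expression<Func<T, bool>>) overload.
        MethodInfo whereMethod =
            new Func<IQueryable<T>, Expression<Func<T, bool>>, IQueryable<T>>(Queryable.Where).Method;

        MethodCallExpression call = Expression.Call(
            whereMethod, source.Expression, Expression.Quote(predicate));

        return source.Provider.CreateQuery<T>(call);
    }

    public static void Main()
    {
        IQueryable<int> source = new[] { 1, 2, 3, 4 }.AsQueryable();
        List<int> even = ManualWhere(source, n => n % 2 == 0).ToList();
        Console.WriteLine(string.Join(", ", even)); // 2, 4
    }
}
```
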
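
This division of labor can be observed with a tiny provider of our own. The sketch below is an assumption-laden illustration, not a real data provider: LoggingProvider and LoggingQuery<T> are hypothetical types that record every expression sent to Execute() and then delegate the actual work to the built-in in-memory provider obtained from AsQueryable().

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical provider: logs each executed expression, then lets the
// built-in LINQ to Objects provider evaluate it.
public sealed class LoggingProvider : IQueryProvider
{
    private readonly IQueryProvider _inner;

    public List<string> Executed { get; } = new List<string>();

    public LoggingProvider(IQueryProvider inner) { _inner = inner; }

    public IQueryable<TElement> CreateQuery<TElement>(Expression expression)
        => new LoggingQuery<TElement>(this, expression);

    // Non-generic query creation is left out of this sketch.
    public IQueryable CreateQuery(Expression expression)
        => throw new NotSupportedException();

    public TResult Execute<TResult>(Expression expression)
    {
        Executed.Add(expression.ToString());
        return _inner.Execute<TResult>(expression);
    }

    public object Execute(Expression expression)
    {
        Executed.Add(expression.ToString());
        return _inner.Execute(expression);
    }
}

// The lightweight query object: just an expression plus a provider reference.
public sealed class LoggingQuery<T> : IQueryable<T>
{
    public LoggingQuery(IQueryProvider provider, Expression expression)
    {
        Provider = provider;
        Expression = expression;
    }

    public Type ElementType => typeof(T);
    public Expression Expression { get; }
    public IQueryProvider Provider { get; }

    public IEnumerator<T> GetEnumerator()
        => Provider.Execute<IEnumerable<T>>(Expression).GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class LoggingProviderDemo
{
    // Runs Where(...).Count() through the logging provider.
    public static (int Count, List<string> Log) Run()
    {
        IQueryable<int> data = new[] { 1, 2, 3, 4 }.AsQueryable();
        var provider = new LoggingProvider(data.Provider);
        IQueryable<int> root = new LoggingQuery<int>(provider, data.Expression);

        int count = root.Where(n => n > 2).Count();
        return (count, provider.Executed);
    }

    public static void Main()
    {
        var (count, log) = Run();
        Console.WriteLine(count);      // 2
        Console.WriteLine(log.Count);  // 1: only Count() reached Execute()
    }
}
```

Where() goes through CreateQuery<T>() and produces another LoggingQuery<T>; only Count() reaches Execute(), carrying the whole expression tree built so far.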

A few words about LINQ provider development

Since the mechanism for constructing a LINQ query is already implemented in .NET, most of the work in developing a LINQ provider comes down to implementing the Execute() and Execute<TResult>() methods. This is where the expression tree that arrives for execution has to be parsed, converted into the language of the data source, and executed, and where the results have to be wrapped in C# objects and returned. Unfortunately, this procedure involves handling a lot of different nuances. Moreover, the available information on LINQ provider development is rather scarce. Below are the most informative, in the author's opinion, articles on the subject:

I hope this article will be useful to everyone who has wanted to understand how LINQ providers for remote data sources are organized, or to start building such a provider, but has not yet dared to.
