-
Notifications
You must be signed in to change notification settings - Fork 15
DevGuide_Linq
- Introduction
- Basic LINQ operations
- INNER vs OUTER join
- Returning entities from a query
- Group-by and aggregates
- LINQ Includes - loading related entities
- Insert/Update/Delete with LINQ
VITA comes with the LINQ engine implementation for all supported database servers. LINQ technology allows you to formulate database queries as strongly-typed c# expressions, which are translated into SQL statements, then executed, and the results are delivered as entities to your code tier. We assume you are familiar with general LINQ concepts and have some knowledge of SQL.
The source code examples in this document are taken (adapted) from a small demo app available on github:
https://github.com/rivantsov/VitaGuideSample
The sample application models a very simple online bookstore with books, publishers, authors.
Any data operation starts with opening an entity session:
var session = booksApp.OpenSession();
NOTE: We use a simplified way to open session, a shortcut. In majority of cases you will be using the OperationContext to open sessions - this object carries the identity of the current user (even if it is a system or anonymous user), so that all data access operations are executed in a specific user context.
Let's find first 5 books with an author with first name "John", ordered by price, and return tuples (title, publisher name):
var booksByJohn = session.EntitySet<IBook>()
.Where(b => b.Authors.Any(a => a.FirstName == "John"))
.OrderBy(b => b.Price)
.Skip(skip).Take(take) //paging
.Select(b => new { b.Title, Publisher = b.Publisher.Name })
.ToList();
Comments:
- Building a LINQ query always starts with getting an instance of IQueryable<TEntity> interface - the session.EntitySet<>() method creates this object for you.
- You can use pretty much any of the facilities available on Queryable extension class: Where filters, OrderBy, Skip, Take, projection using Select and "new-auto-object", etc.
- You can return entities, your custom objects or auto-objects; you can return many or a single object/entity using methods like First(), with optional filter; you can return simple values using methods like Count().
- You can make joins in your query (SQL joins). One cool thing about VITA is that in many cases you do not need to specify the join explicitly - you can use the entity/entity-list properties instead. In this case the engine adds a SQL join automatically with proper join conditions. In this example we used two such properties:
- book.Authors list to filter on author's first name
- book.Publisher property to include publisher's name in the output.
Here is the resulting SQL:
SELECT b0$."Title" AS "Title", p$."Name" AS "Publisher"
FROM "books"."Book" b0$
INNER JOIN "books"."Publisher" p$ ON p$."Id" = b0$."Publisher_Id"
WHERE EXISTS
(SELECT *
FROM "books"."BookAuthor" ba$
INNER JOIN "books"."Author" a0$ ON a0$."Id" = ba$."Author_Id"
WHERE ba$."Book_Id" = b0$."Id" AND a0$."FirstName" = 'John')
ORDER BY b0$."Price"
OFFSET @P1 ROWS FETCH NEXT @P0 ROWS ONLY;
When this statement is executed, the results are converted into a list of auto-objects.
We used 'fluent' syntax to formulate the query (chain of Queryable calls), but we could use the SQL-like syntax:
var booksByJohn2 = (
from b in session.EntitySet<IBook>()
where b.Authors.Any(a => a.FirstName == "John")
orderby b.Price
select new { b.Title, Publisher = b.Publisher.Name }
)
.Skip(0).Take(5)
.ToList();
Both forms are equivalent, and the result is the same.
In the previous example, all joins in SQL were INNER joins. When deriving joins from entity-type properties, the engine uses the following rule:
If the property linking two entities is nullable, then the resulting join is OUTER. Otherwise, for non-nullable property, the join is INNER.
For example, book.Publisher property is non-nullable (publisher is required for a book), so the join in the SQL is INNER. But if you put [Nullable] attibute on the Publisher property, you will see an OUTER join.
One interesting case that does not come up often, but is worth mentioning - explicit OUTER JOIN (explicit in the sense using join keyword in c# LINQ). The technique involving DefaultIfEmpty method is described here: Perform left outer joins, and it is fully supported by VITA LINQ engine.
The previous query was returning tuples of strings. You can return entities as well:
var cheapBooks = session.EntitySet<IBook>()
.Where(b => b.Price < 10)
.ToList();
There are a few points worth noting:
- The returned entities are attached to the session - they are 'tracked' by the session. If you modify any of the entities, session 'knows' about it, and the session.SaveChanges() call will submit the updates to the database.
- The session ensures 'single-instance' rule - there is only one entity, single instance of an entity mapped to a specific row in database table, within a given session. So if you run other queries, before or after, and they return books as well, and one or more of these is 'cheap book', then across all queries' results, this book is represented by the same object(.NET object). Same primary key - same instance.
Because of the 'single-instance' rule, you can always rely on comparing .NET entities as objects, instead of comparing primary key values - it is often easier.
Sometimes it is desirable to return more than one entity per query result row. In this case, we can just return a tuple (auto-object):
var cheapBooksWithPubs = session.EntitySet<IBook>()
.Where(b => b.Price < 10)
.Select(b => new { Book = b, Pub = b.Publisher})
.ToList();
All the IBook and IPublisher entities returned in tuples are 'normal' entities, tracked by the session instance. This method is useful when you are planning to update both books and their publishers, so you need to load both.
VITA LINQ engine supports grouping and aggregate functions. The following query groups books by category and returns triplets (Category, book count, max price):
var booksByCatQuery =
from b in session.EntitySet<IBook>()
group b by b.Category into g
orderby g.Key
select new {
Category = g.Key, BookCount = g.Count(),
MaxPrice = g.Max(b => b.Price)
};
var booksByCat = booksByCatQuery.ToList();
Note that not all Queryable.GroupBy overloads are supported - those that have IEqualityComparer parameter are not supported. For other cases, you might eventually get 'groupby type non-supported' error for some non-trivial queries - this just means LINQ engine could not figure out the proper SQL, try to reformulate the query.
We can also use aggregate functions without group-by, over entire table, possibly with WHERE clause. For a single aggregate:
var progrBooksCount = session.EntitySet<IBook>()
.Where(b => b.Category == BookCategory.Programming)
.Count();
If we need to return multiple aggregates, then we use fake group-by clause (it groups all in a single group):
var progrBooksStats = session.EntitySet<IBook>()
.Where(b => b.Category == BookCategory.Programming)
.GroupBy(b => 0)
.Select(g => new { BookCount = g.Count(), MaxPrice = g.Max(b => b.Price) })
.ToList();
When working with databases and accessing data from code, we try to accomplish the task using minimum roundtrips (calls) to the database. In regards to this challenge of minimizing number of SQLs executed, there is one well known 'bad' pattern called 'N+1' problem. Let's illustrate it with an example. We load a number of books and then printout
var progBooks = session.EntitySet<IBook>()
.Where(b => b.Category == BookCategory.Programming)
.ToList();
foreach(var bk in progBooks)
Console.WriteLine(bk.Title + ", by " + bk.Publisher.Name); //causes load of Publisher
The first query retrieves all programming books. Then the foreach loop prints out books' titles and publishers' names. Inside the loop, the expression bk.Publisher triggers automatic 'lazy' load of the related IPublisher entity. Therefore the execution of the code results in (N + 1) SQL queries executed, N for publishers, and '1' for initial book query.
This is obviously very inefficient. Much better way would be to query all referenced publishers before the loop and right after the books query. VITA LINQ engine implements the Include facility that in fact does this automatically, with just a hint required from the programmer.
Here is how you add an Include call that fixes the problem in the previous code:
var progBooks2 = session.EntitySet<IBook>()
.Where(b => b.Category == BookCategory.Programming)
.Include(b => b.Publisher)
.ToList();
foreach(var bk in progBooks)
Console.WriteLine(bk.Title + ", by " + bk.Publisher.Name); // no extra SQLs
The LINQ engine does the following:
- Executes the books query and retrieves a set of books
- Collects the Publisher_Id foreign key column values from all loaded books
- Executes the query against the Publisher table filtering by the list of IDs from the previous step
- Runs through all loaded books and assigns the book.Publisher property looking up the value in loaded publishers list
The total number of queries is two - one for books and one for publishers, much better than N+1.
What if we need to printout books' authors as well? You can Include the lists, and many-to-many lists as well. We can add another call like '.Include(b => b.Authors)'.
Or we can specify multiple targets in a single Include call using an auto-object:
session.LogMessage("Running final Include example");
var progBooks3 = session.EntitySet<IBook>()
.Where(b => b.Category == BookCategory.Programming)
.Include(b => new { b.Publisher, b.Authors })
.ToList();
You can see the effect of the include if you inspect the log file. Run the code in the accompanying app (in VitaGuideDemo repo) and open the _bookStore.log file in the bin folder. Notice the LogMessage call in the code above - the message will help you find the location in the log file. The Include calls are specially marked in log file by 'Start/end include' tags.
LINQ queries in VITA are not limited to SELECT statements only. You can use LINQ expressions to formulate SQL commands that insert, update or delete multiple records based on some criteria. The process is two-step: first you define the 'base' query; then you execute it using one of the special session's method.
The base query for insert-type statements must return an auto-type with properties matching the properties of entity being inserted. For example, let's create a command to copy books from one publisher to another:
var booksToCopy = session.EntitySet<IBook>()
.Where(b => b.Publisher == pub1)
.Select(b => new { b.Category, b.Title, b.Description, b.Price,
b.PageCount, Publisher_Id = pub2.Id});
session.ExecuteInsert<IBook>(booksToCopy);
Book table's primary key Id is identity column, values are generated on insert, so we do not list it in the base query output. We list all columns/properties that must be copied. For publisher reference, we have to use the exact name of the foreign key column, Publisher_id - that's how it works. The base query booksToCopy is then passed to the ExecuteInsert method; the base query serves as a data source for the INSERT statement.
For updates, the base query output should contain:
- Primary key value (Id)
- Properties that should be updated with new values.
Let's say we want reduce price by 10% on all programming books:
var updateQuery = session.EntitySet<IBook>()
.Where(b => b.Category == BookCategory.Programming)
.Select(b => new { b.Id, Price = b.Price * 0.9m });
session.ExecuteUpdate<IBook>(updateQuery);
Note that the source entitySet/table can be any set of joined tables, not necessarily the one being updated.
See TestLinqNonQuery_Update unit test in VITA repo for more complex examples.
For Delete statements the base query output should be:
- The entity instance to delete
- Or just the primary key of the records to delete
Let's see how to delete all expensive (>20$) books:
var delQuery = session.EntitySet<IBook>()
.Where(b => b.Price > 20);
session.ExecuteDelete<IBook>(delQuery);
With base query returning primary keys:
var delQuery2 = session.EntitySet<IBook>()
.Where(b => b.Price > 20)
.Select(b => b.Id);
session.ExecuteDelete<IBook>(delQuery2);
As before, the base query might be built from any entity or a join of entities.
The LINQ-based non-query commands we have just shown were executed immediately. However, in many cases it is desirable to run the non-query within the context of a larger update operation triggered by the session.SaveChanges.
We can do this if we schedule commands instead of executing them. So instead of session.ExecuteXXX methods, use their ScheduleXXX counterpats: ScheduleInsert, ScheduleUpdate, ScheduleDelete. There is an extra option with these methods: you can choose to run the non-query either at the beginning or at the end of the SaveChanges action sequence.