Full Text Search
PostgreSQL has built-in support for full-text search, which allows you to conveniently and efficiently query natural language documents.
Mapping
PostgreSQL full text search types are mapped onto .NET types built-in to Npgsql. The tsvector
type is mapped to NpgsqlTsVector
and tsquery
is mapped to NpgsqlTsQuery
. This means you can use properties of type NpgsqlTsVector
directly in your model to create tsvector
columns. The NpgsqlTsQuery
type on the other hand, is used in LINQ queries.
public class Product
{
public int Id { get; set; }
public string Title { get; set; }
public string Description { get; set; }
public NpgsqlTsVector SearchVector { get; set; }
}
Setting up and querying a full text search index on an entity
As the PostgreSQL documentation explains, full-text search requires an index to run efficiently. This section will show two ways to do this, each having its benefits and drawbacks. Please read the PostgreSQL docs for more information on the two different approaches.
Method 1: tsvector column
This method adds a tsvector
column to your table, that is automatically updated when the row is modified. First, add an NpgsqlTsVector
property to your entity:
public class Product
{
public int Id { get; set; }
public string Name { get; set; }
public string Description { get; set; }
public NpgsqlTsVector SearchVector { get; set; }
}
Setting up the column to be auto-updated depends on your PostgreSQL version. On PostgreSQL 12 and above, the column can be a simple generated column, and version 5.0.0 contains sugar for setting that up. In previous versions, you must manually set up database triggers that update the column instead.
Note
The below only works on PostgreSQL 12 and version 5.0.0 of the EF Core provider.
The following will set up a generated tsvector
column, over which you can easily create an index:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<Product>()
.HasGeneratedTsVectorColumn(
p => p.SearchVector,
"english", // Text search config
p => new { p.Name, p.Description }) // Included properties
.HasIndex(p => p.SearchVector)
.HasMethod("GIN"); // Index method on the search vector (GIN or GIST)
}
Once your auto-updated tsvector
column is set up, any inserts or updates on the Products
table will now update the SearchVector
column and maintain it automatically. You can query it as follows:
var context = new ProductDbContext();
var npgsql = context.Products
.Where(p => p.SearchVector.Matches("Npgsql"))
.ToList();
Method 2: Expression index
Version 5.0.0 of the provider includes sugar for defining the appropriate expression index; if you're using an older version, you'll have to define a raw SQL migration yourself.
modelBuilder.Entity<Blog>()
.HasIndex(b => new { b.Title, b.Description })
.HasMethod("GIN")
.IsTsVectorExpressionIndex("english");
Once the index is created on the Title
and Description
columns, you can query as follows:
var context = new ProductDbContext();
var npgsql = context.Products
.Where(p => EF.Functions.ToTsVector("english", p.Title + " " + p.Description)
.Matches("Npgsql"))
.ToList();
Computed column over JSON columns
Starting with 7.0, the provider can also create computed tsvector
columns over JSON columns. Simply use HasGeneratedTsVectorColumn()
as shown above, and when applied to JSON columns, the provider will automatically generate json_to_tsvector/jsonb_to_tsvector
as appropriate.
Note that this will pass the filter all
to these functions, meaning that all values in the JSON document will be included. To customize the filter - or to create the computed column on older versions of the provider - simply specify the function yourself via HasComputedColumnSql
.
Operation translation
Almost all PostgreSQL full text search functions can be called through LINQ queries. All supported EF Core LINQ methods are defined in extension classes in the Microsoft.EntityFrameworkCore
namespace, so simply referencing the Npgsql provider will light up these methods. The following table lists all supported operations; if an operation you need is missing, please open an issue to request for it.
.NET | SQL |
---|---|
EF.Functions.ToTsVector(string) | to_tsvector(string) |
EF.Functions.ToTsVector("english", string) | to_tsvector('english'::regconfig, string) |
EF.Functions.ToTsQuery(string) | to_tsquery(string) |
EF.Functions.ToTsQuery("english", string ) | to_tsquery('english'::regconfig, string) |
EF.Functions.PlainToTsQuery(string) | plainto_tsquery(string) |
EF.Functions.PlainToTsQuery("english", string) | plainto_tsquery('english'::regconfig, string) |
EF.Functions.PhraseToTsQuery(string) | phraseto_tsquery(string) |
EF.Functions.PhraseToTsQuery("english", string) | phraseto_tsquery('english'::regconfig, string) |
EF.Functions.WebSearchToTsQuery(string) | websearch_to_tsquery(string) |
EF.Functions.WebSearchToTsQuery("english", string) | websearch_to_tsquery('english'::regconfig, string) |
EF.functions.ArrayToTsVector(new[] { "a", "b" }) | array_to_tsvector(ARRAY['a', 'b']) |
NpgsqlTsVector.Parse(string) | CAST(string AS tsvector) |
NpgsqlTsQuery.Parse(string) | CAST(queryString AS tsquery) |
tsvector.Matches(string) | tsvector @@ plainto_tsquery(string) |
tsvector.Matches(tsquery) | tsvector @@ tsquery |
tsquery1.And(tsquery2) | tsquery1 && tsquery2 |
tsquery1.Or(tsquery2) | tsquery1 || tsquery2 |
tsquery.ToNegative() | !! tsquery |
tsquery1.Contains(tsquery2) | tsquery1 @> tsquery2 |
tsquery1.IscontainedIn(tsquery2) | tsquery1 <@ tsquery2 |
tsquery.GetNodeCount() | numnode(query) |
tsquery.GetQueryTree() | querytree(query) |
tsquery.GetResultHeadline("a b c") | ts_headline('a b c', query) |
tsquery.GetResultHeadline("a b c", "MinWords=1, MaxWords=2") | ts_headline('a b c', query, 'MinWords=1, MaxWords=2') |
tsquery.Rewrite(targetQuery, substituteQuery) | ts_rewrite(to_tsquery(tsquery), to_tsquery(targetQuery), to_tsquery(substituteQuery)) |
tsquery1.ToPhrase(tsquery2) | tsquery_phrase(tsquery1, tsquery2) |
tsquery1.ToPhrase(tsquery2, distance) | tsquery_phrase(tsquery1, tsquery2, distance) |
tsvector1.Concat(tsvector2) | tsvector1 || tsvector2 |
tsvector.Delete("x") | ts_delete(tsvector, 'x') |
tsvector.Delete(new[] { "x", "y" }) | ts_delete(tsvector, ARRAY['x', 'y']) |
tsvector.Filter(new[] { "x", "y" }) | ts_filter(tsvector, ARRAY['x', 'y']) |
tsvector.GetLength() | length(tsvector) |
tsvector.Rank(tsquery) | ts_rank(tsvector, tsquery) |
tsvector.RankCoverDensity(tsquery) | ts_rank_cd(tsvector, tsquery) |
tsvector.SetWeight(NpgsqlTsVector.Lexeme.Weight.A) | setweight(tsvector, 'A') |
tsvector.ToStripped() | strip(tsvector) |
EF.Functions.Unaccent(string) | unaccent(string) |
EF.Functions.Unaccent(regdictionary, string) | unaccent(regdictionary, string) |