Table of Contents

Full Text Search

PostgreSQL has built-in support for full-text search, which allows you to conveniently and efficiently query natural language documents.

Mapping

PostgreSQL full text search types are mapped onto .NET types built-in to Npgsql. The tsvector type is mapped to NpgsqlTsVector and tsquery is mapped to NpgsqlTsQuery. This means you can use properties of type NpgsqlTsVector directly in your model to create tsvector columns. The NpgsqlTsQuery type on the other hand, is used in LINQ queries.

public class Product
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Description { get; set; }
    public NpgsqlTsVector SearchVector { get; set; }
}

Setting up and querying a full text search index on an entity

As the PostgreSQL documentation explains, full-text search requires an index to run efficiently. This section will show two ways to do this, each having its benefits and drawbacks. Please read the PostgreSQL docs for more information on the two different approaches.

Method 1: tsvector column

This method adds a tsvector column to your table, that is automatically updated when the row is modified. First, add an NpgsqlTsVector property to your entity:

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public NpgsqlTsVector SearchVector { get; set; }
}

Setting up the column to be auto-updated depends on your PostgreSQL version. On PostgreSQL 12 and above, the column can be a simple generated column, and version 5.0.0 contains sugar for setting that up. In previous versions, you must manually set up database triggers that update the column instead.

Note

The below only works on PostgreSQL 12 and version 5.0.0 of the EF Core provider.

The following will set up a generated tsvector column, over which you can easily create an index:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Product>()
        .HasGeneratedTsVectorColumn(
            p => p.SearchVector,
            "english",  // Text search config
            p => new { p.Name, p.Description })  // Included properties
        .HasIndex(p => p.SearchVector)
        .HasMethod("GIN"); // Index method on the search vector (GIN or GIST)
}

Once your auto-updated tsvector column is set up, any inserts or updates on the Products table will now update the SearchVector column and maintain it automatically. You can query it as follows:

var context = new ProductDbContext();
var npgsql = context.Products
    .Where(p => p.SearchVector.Matches("Npgsql"))
    .ToList();

Method 2: Expression index

Version 5.0.0 of the provider includes sugar for defining the appropriate expression index; if you're using an older version, you'll have to define a raw SQL migration yourself.

modelBuilder.Entity<Blog>()
    .HasIndex(b => new { b.Title, b.Description })
    .HasMethod("GIN")
    .IsTsVectorExpressionIndex("english");

Once the index is created on the Title and Description columns, you can query as follows:

var context = new ProductDbContext();
var npgsql = context.Products
    .Where(p => EF.Functions.ToTsVector("english", p.Title + " " + p.Description)
        .Matches("Npgsql"))
    .ToList();

Computed column over JSON columns

Starting with 7.0, the provider can also create computed tsvector columns over JSON columns. Simply use HasGeneratedTsVectorColumn() as shown above, and when applied to JSON columns, the provider will automatically generate json_to_tsvector/jsonb_to_tsvector as appropriate.

Note that this will pass the filter all to these functions, meaning that all values in the JSON document will be included. To customize the filter - or to create the computed column on older versions of the provider - simply specify the function yourself via HasComputedColumnSql.

Operation translation

Almost all PostgreSQL full text search functions can be called through LINQ queries. All supported EF Core LINQ methods are defined in extension classes in the Microsoft.EntityFrameworkCore namespace, so simply referencing the Npgsql provider will light up these methods. The following table lists all supported operations; if an operation you need is missing, please open an issue to request for it.

.NET SQL
EF.Functions.ToTsVector(string) to_tsvector(string)
EF.Functions.ToTsVector("english", string) to_tsvector('english'::regconfig, string)
EF.Functions.ToTsQuery(string) to_tsquery(string)
EF.Functions.ToTsQuery("english", string ) to_tsquery('english'::regconfig, string)
EF.Functions.PlainToTsQuery(string) plainto_tsquery(string)
EF.Functions.PlainToTsQuery("english", string) plainto_tsquery('english'::regconfig, string)
EF.Functions.PhraseToTsQuery(string) phraseto_tsquery(string)
EF.Functions.PhraseToTsQuery("english", string) phraseto_tsquery('english'::regconfig, string)
EF.Functions.WebSearchToTsQuery(string) websearch_to_tsquery(string)
EF.Functions.WebSearchToTsQuery("english", string) websearch_to_tsquery('english'::regconfig, string)
EF.functions.ArrayToTsVector(new[] { "a", "b" }) array_to_tsvector(ARRAY['a', 'b'])
NpgsqlTsVector.Parse(string) CAST(string AS tsvector)
NpgsqlTsQuery.Parse(string) CAST(queryString AS tsquery)
tsvector.Matches(string) tsvector @@ plainto_tsquery(string)
tsvector.Matches(tsquery) tsvector @@ tsquery
tsquery1.And(tsquery2) tsquery1 && tsquery2
tsquery1.Or(tsquery2) tsquery1 || tsquery2
tsquery.ToNegative() !! tsquery
tsquery1.Contains(tsquery2) tsquery1 @> tsquery2
tsquery1.IscontainedIn(tsquery2) tsquery1 <@ tsquery2
tsquery.GetNodeCount() numnode(query)
tsquery.GetQueryTree() querytree(query)
tsquery.GetResultHeadline("a b c") ts_headline('a b c', query)
tsquery.GetResultHeadline("a b c", "MinWords=1, MaxWords=2") ts_headline('a b c', query, 'MinWords=1, MaxWords=2')
tsquery.Rewrite(targetQuery, substituteQuery) ts_rewrite(to_tsquery(tsquery), to_tsquery(targetQuery), to_tsquery(substituteQuery))
tsquery1.ToPhrase(tsquery2) tsquery_phrase(tsquery1, tsquery2)
tsquery1.ToPhrase(tsquery2, distance) tsquery_phrase(tsquery1, tsquery2, distance)
tsvector1.Concat(tsvector2) tsvector1 || tsvector2
tsvector.Delete("x") ts_delete(tsvector, 'x')
tsvector.Delete(new[] { "x", "y" }) ts_delete(tsvector, ARRAY['x', 'y'])
tsvector.Filter(new[] { "x", "y" }) ts_filter(tsvector, ARRAY['x', 'y'])
tsvector.GetLength() length(tsvector)
tsvector.Rank(tsquery) ts_rank(tsvector, tsquery)
tsvector.RankCoverDensity(tsquery) ts_rank_cd(tsvector, tsquery)
tsvector.SetWeight(NpgsqlTsVector.Lexeme.Weight.A) setweight(tsvector, 'A')
tsvector.ToStripped() strip(tsvector)
EF.Functions.Unaccent(string) unaccent(string)
EF.Functions.Unaccent(regdictionary, string) unaccent(regdictionary, string)