JustNik

Indexing dates in a sortable format in Umbraco version 8


This is a quick guide on modifying the External Examine index in your Umbraco v8 site. The aim is to index dates in a sortable format instead of the default text format. It uses a few new concepts that come from the substantial changes between Umbraco v7 and Umbraco v8, as well as from the improved Examine 1.0 release.

This guide will cover the use of Components and Composers to hook into the Umbraco startup process. It will take advantage of the built-in Dependency Injection to access Umbraco services, and finally it will hook into the new TransformingIndexValues event to manipulate the data being saved to the index.

Getting set up

This guide has been created using Umbraco v8.1.4 and the default starter kit. On top of this, the "BlogPost" document type has been extended to add an Article Date property of type DatePicker. As the default mode for Models Builder is PureLive, it has been left this way, although this does mean that we don't get some of the niceties available in other modes.

Creating the Component

The first thing to create is the custom Component. This is possibly the most important element of this guide as it contains all of the indexing logic that will manipulate the index as well as the data being stored in it.

In Umbraco v8 the ApplicationEventHandler class from v7 was removed and replaced with Components and Composers. These work together, along with the DI framework, to build up the Umbraco application. Details on this aren't the focus of this post, but I highly recommend reading the documentation in order to understand more about it.

First off, we will create a new class called IndexerComponent which implements IComponent. Because of the interface it must contain two methods, Initialize and Terminate, although we won't actually be putting any code in the Terminate method. This empty class looks like this:

using Umbraco.Core.Composing;

namespace ExamineIndexing.ExamineHelper
{
    public class IndexerComponent : IComponent
    {
        public void Initialize()
        {
            
        }

        public void Terminate(){}
    }
}

Now that we have an empty class we need a few things, and the best way to get them is by telling the application we need them. How, you ask? Well, now that v8 has Dependency Injection (DI) in the core, we can take advantage of it!

To do this we need to add a constructor to our class and tell it what we are expecting. In this case we need the Examine Manager, the Umbraco Context Factory, and a Logger for logging error messages. Once we are given these we also need to store them somewhere, so we create some private read-only fields to hold these objects.

From the code below, add the list of using statements to the top of your class file and the additional code just after the opening { for your IndexerComponent class.

/* Updated Using statements
using Examine;
using System;
using Umbraco.Core.Logging;
using Umbraco.Core.Composing;
using Umbraco.Web;
*/

private readonly IExamineManager examineManager;
private readonly IUmbracoContextFactory umbracoContextFactory;
private readonly ILogger logger;

public IndexerComponent(IExamineManager examineManager, IUmbracoContextFactory umbracoContextFactory, ILogger logger)
{
    this.examineManager = examineManager ?? throw new ArgumentNullException(nameof(examineManager));
    this.umbracoContextFactory = umbracoContextFactory ?? throw new ArgumentNullException(nameof(umbracoContextFactory));
    this.logger = logger ?? throw new ArgumentNullException(nameof(logger));
}

Now that we have our dependencies being passed in we can get to the crux of this guide and start messing around with the Examine Indexes.

There are three parts to this:

  1. Finding the index
  2. Defining a field
  3. Manipulating the data

Finding the index and Defining a field

Parts 1 and 2 are reasonably straightforward once you know how, so let's quickly address those. In v7 of Umbraco the indexes were accessed from the ExamineManager class. This is the same in v8, although the method has changed as it no longer uses the IndexProviderCollection property (as it doesn't exist). Instead we can either access the Indexes property or, as in this case, use the TryGetIndex method.

Once we have our index, we then manipulate the FieldDefinitionCollection on the index to define our field. In the case of Date fields, I believe it is best to store them as "long" values, so that is what I'll be doing here. To do this we have to create a new FieldDefinition instance with a name and a type. If you want to override the value from your document type, this name should match the alias of the property you are overriding. If you want to store it in a new field instead, define its name here. E.g. articleDate vs articleDateSortable.

Manipulating the data

Once we've completed these two steps we need to address step 3, and the first part of that is hooking up to the TransformingIndexValues event. To get to this event we need to cast our IIndex (returned from TryGetIndex) to a concrete type; in this case we will use the BaseIndexProvider.

All of the steps for this so far need to be added to the Initialize method of our component and should result in a method that looks like this.

public void Initialize()
{
    if (examineManager.TryGetIndex("ExternalIndex", out IIndex externalIndex))
    {
        externalIndex.FieldDefinitionCollection.AddOrUpdate(
            new FieldDefinition("articleDate", FieldDefinitionTypes.Long));

        ((BaseIndexProvider)externalIndex).TransformingIndexValues += 
            IndexerComponent_TransformingIndexValues;
    }
}
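As a hedged variation on the method above, the sketch below writes to a separate field (the articleDateSortable name mentioned earlier) and avoids a hard cast in case the index isn't a BaseIndexProvider. It assumes that Constants.UmbracoIndexes.ExternalIndexName and the Warn<T> logging extension are available in your Umbraco version; the warning messages are purely illustrative.

```csharp
// Sketch only: an alternative Initialize() for IndexerComponent, not the version used in this post.
// Requires: using Examine; using Examine.Providers; using Umbraco.Core; using Umbraco.Core.Logging;
public void Initialize()
{
    // Assumption: this constant resolves to the same "ExternalIndex" name used above.
    if (!examineManager.TryGetIndex(Constants.UmbracoIndexes.ExternalIndexName, out IIndex externalIndex))
    {
        logger.Warn<IndexerComponent>("External index not found, date field was not configured.");
        return;
    }

    // A separate field leaves the original articleDate value in the index untouched.
    externalIndex.FieldDefinitionCollection.AddOrUpdate(
        new FieldDefinition("articleDateSortable", FieldDefinitionTypes.Long));

    // Pattern matching avoids an InvalidCastException if the index is another implementation.
    if (externalIndex is BaseIndexProvider provider)
    {
        provider.TransformingIndexValues += IndexerComponent_TransformingIndexValues;
    }
    else
    {
        logger.Warn<IndexerComponent>("External index does not expose TransformingIndexValues.");
    }
}
```

If you go this way, the event handler needs to write its value to the same articleDateSortable field name.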

Now that we are hooked into the TransformingIndexValues event we have the ability to alter the data that is going into the External Index. There are different approaches to this depending on your requirement and preference, however in this scenario we will be extracting content from the Umbraco Cache and overriding values in the Umbraco Index.

The first objective is to try and get the Id of the node being indexed, and the second is to make sure this is the document type we want to be working with. These two bits of information can be found on the ValueSet property of the IndexingItemEventArgs.

Once we know that we are working with the correct document we can utilise the umbracoContextFactory to get the document from the cache. We can then retrieve the value we want to manipulate, in this case the Article Date, and inject it into the index. As we want the Article Date in a long format we use the Ticks property.

The resulting code for this event handler looks as follows:

private void IndexerComponent_TransformingIndexValues(object sender, IndexingItemEventArgs e)
{
    if(int.TryParse(e.ValueSet.Id, out var nodeId))
        switch(e.ValueSet.ItemType)
        {
            case "blogpost":
                using (var umbracoContext = umbracoContextFactory.EnsureUmbracoContext())
                {
                    var contentNode = umbracoContext.UmbracoContext.Content.GetById(nodeId);
                    if(contentNode != null)
                    {
                        var articleDate = contentNode.Value<DateTime>("articleDate");
                        e.ValueSet.Set("articleDate", articleDate.Date.Ticks);
                    }
                }
                break;
        }
}
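To show what the Date.Ticks call is doing, here is a small standalone sketch using only System. The DateTicks class and its method names are made up for illustration; the point is that any two times on the same day produce the same long value, and that value can be turned back into a DateTime.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class DateTicks
{
    // Convert a date to the long value stored in the index, dropping the time of day.
    public static long ToIndexValue(DateTime value) => value.Date.Ticks;

    // Convert an indexed long back to a DateTime (midnight of the stored day).
    public static DateTime FromIndexValue(long ticks) => new DateTime(ticks);
}
```

Because ticks increase with time, comparing two of these values numerically gives the same ordering as comparing the dates themselves, which is what makes the field usable for sorting and ranges.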

The Composer

Now that we've completed our component we need to tell the application about it. In order to do this we need to utilise the Composer approach mentioned earlier. The following code is a composer that will register the component in the Umbraco application.

public class RegisterIndexerComponentComposer : IUserComposer
{
    public void Compose(Composition composition)
    {
        composition.Components().Append<IndexerComponent>();
    }
}

So here we have it, a complete example for indexing a date field in a sortable format.

This code is only an example. As you can see there is no error handling, and it is recommended to include this if you use it in a production environment. The ILogger implementation is already injected, so logging errors during your index manipulation will be possible.
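As a minimal sketch of what that error handling might look like, the handler can be wrapped in a try/catch that logs through the injected logger. This assumes the Error<T> logging extension from Umbraco.Core.Logging; the message text is just an example.

```csharp
// Sketch only: the same handler with basic error handling, inside IndexerComponent.
// Requires: using System; using Examine; using Umbraco.Core.Logging; using Umbraco.Web;
private void IndexerComponent_TransformingIndexValues(object sender, IndexingItemEventArgs e)
{
    // Skip anything without a numeric id, and anything that is not a blog post.
    if (!int.TryParse(e.ValueSet.Id, out var nodeId) || e.ValueSet.ItemType != "blogpost")
    {
        return;
    }

    try
    {
        using (var contextReference = umbracoContextFactory.EnsureUmbracoContext())
        {
            var contentNode = contextReference.UmbracoContext.Content.GetById(nodeId);
            if (contentNode == null)
            {
                return; // Not in the published cache, nothing to index.
            }

            var articleDate = contentNode.Value<DateTime>("articleDate");
            e.ValueSet.Set("articleDate", articleDate.Date.Ticks);
        }
    }
    catch (Exception ex)
    {
        // Log and carry on so that one bad item does not stop the rest from being indexed.
        logger.Error<IndexerComponent>(ex, "Failed to index articleDate for node {NodeId}", nodeId);
    }
}
```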

Below is a complete code listing for this post.

using Examine;
using System;
using Umbraco.Core;
using Umbraco.Core.Logging;
using Umbraco.Core.Composing;
using Umbraco.Web;
using Examine.Providers;

namespace ExamineIndexing.ExamineHelper
{
    public class IndexerComponent : IComponent
    {
        private readonly IExamineManager examineManager;
        private readonly IUmbracoContextFactory umbracoContextFactory;
        private readonly ILogger logger;

        public IndexerComponent(IExamineManager examineManager, 
            IUmbracoContextFactory umbracoContextFactory, 
            ILogger logger)
        {
            this.examineManager = examineManager ?? throw new ArgumentNullException(nameof(examineManager));
            this.umbracoContextFactory = umbracoContextFactory ?? throw new ArgumentNullException(nameof(umbracoContextFactory));
            this.logger = logger ?? throw new ArgumentNullException(nameof(logger));
        }

        public void Initialize()
        {
            if (examineManager.TryGetIndex("ExternalIndex", out IIndex externalIndex))
            {
                externalIndex.FieldDefinitionCollection.AddOrUpdate(
                    new FieldDefinition("articleDate", FieldDefinitionTypes.Long));

                ((BaseIndexProvider)externalIndex).TransformingIndexValues += 
                    IndexerComponent_TransformingIndexValues;
            }
        }

        private void IndexerComponent_TransformingIndexValues(object sender, IndexingItemEventArgs e)
        {
            if(int.TryParse(e.ValueSet.Id, out var nodeId))
                switch(e.ValueSet.ItemType)
                {
                    case "blogpost":
                        using (var umbracoContext = umbracoContextFactory.EnsureUmbracoContext())
                        {
                            var contentNode = umbracoContext.UmbracoContext.Content.GetById(nodeId);
                            if(contentNode != null)
                            {
                                var articleDate = contentNode.Value<DateTime>("articleDate");
                                e.ValueSet.Set("articleDate", articleDate.Date.Ticks);
                            }
                        }
                        break;
                }
        }

        public void Terminate(){}
    }

    public class RegisterIndexerComponentComposer : IUserComposer
    {
        public void Compose(Composition composition)
        {
            composition.Components().Append<IndexerComponent>();
        }
    }
}

Update - 2019-09-11

Since posting this, a few questions have been asked about this approach, so I want to clarify a few things. First and foremost, as with most things in Umbraco, there are different ways to approach any given problem. I'm not claiming this is the right way, or the only way, but it is how I ended up solving a real world problem.

IUserComposer vs ComponentComposer

In this post I've opted to use the IUserComposer approach. There is an alternative approach which involves the use of the ComponentComposer. This does, in theory, make it a bit easier to get up and running, but when testing it this way I found that my code was executing too early. This manifested itself when I tried to modify the External Index, as it wasn't yet defined. ComponentComposer implements IComposer but not IUserComposer, which changes where it is loaded in the Umbraco startup process. It runs the risk of being loaded before all of the Umbraco startup composers have executed, and this is what was happening for me. IUserComposers should always execute after the core composers, which is why I took this approach. If you want to use the ComponentComposer you might be able to work around this behaviour by using the ComposeAfter attribute to ensure it runs after a Core composer you are depending on.
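For anyone who wants to try the ComponentComposer route, a rough and untested sketch of the ComposeAfter idea is below. It assumes that ExamineComposer in the Umbraco.Web.Search namespace is the core composer responsible for setting up the indexes; check which composer you actually depend on in your version before relying on it.

```csharp
using Umbraco.Core.Composing;
using Umbraco.Web.Search;

namespace ExamineIndexing.ExamineHelper
{
    // Sketch only. Assumption: ExamineComposer is the core composer that the External Index depends on.
    [ComposeAfter(typeof(ExamineComposer))]
    public class IndexerComponentComposer : ComponentComposer<IndexerComponent>
    {
        // ComponentComposer<T> appends IndexerComponent for us, so no Compose override is needed.
    }
}
```

This would replace RegisterIndexerComponentComposer rather than sit alongside it, otherwise the component would be registered twice.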

Can this be done without using DI?

Yes. However, because DI is part of the core of Umbraco, I highly encourage people to use it where they can. There are ways to access core services without DI, but I'm not going to share them here, as I personally feel they should be avoided as much as possible and I don't really want to encourage them.

Why do this?

Out of the box the Umbraco External Index doesn't really know about the properties on your doc types. As a result, their raw values are what get indexed, and for the most part they seem to be indexed as strings. This, as you can imagine, isn't ideal. It also makes performing Sort/Range style searches impossible. In my experience you couldn't sort on any fields effectively, and fields that were thought to be numerical in value (or compatible with Range queries) didn't work. I did attempt multiple times to get Sort and Range queries working without this much extra code. Unfortunately I found this was the only approach that worked reliably for me. I may follow this post up with an example search service that uses sorting and ranges on dates in the not so distant future.
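As a rough idea of the kind of query this makes possible, here is an untested sketch of a range query with sorting against the articleDate field. BlogPostSearchService and SearchByDate are hypothetical names, and the fluent calls assume the Examine 1.0 query API (CreateQuery, NodeTypeAlias, RangeQuery, OrderByDescending), so treat it as a starting point rather than a finished service.

```csharp
using System;
using Examine;
using Examine.Search;

namespace ExamineIndexing.ExamineHelper
{
    // Hypothetical service, not part of the original code listing.
    public class BlogPostSearchService
    {
        private readonly IExamineManager examineManager;

        public BlogPostSearchService(IExamineManager examineManager)
        {
            this.examineManager = examineManager ?? throw new ArgumentNullException(nameof(examineManager));
        }

        // Returns blog posts dated between 'from' and 'to' (inclusive), newest first.
        public ISearchResults SearchByDate(DateTime from, DateTime to)
        {
            if (!examineManager.TryGetIndex("ExternalIndex", out IIndex index))
            {
                return null; // Index not available, callers need to handle this.
            }

            return index.GetSearcher()
                .CreateQuery("content")
                .NodeTypeAlias("blogpost")
                // Compare against the same Date.Ticks values that were written to the index.
                .And().RangeQuery<long>(new[] { "articleDate" }, from.Date.Ticks, to.Date.Ticks)
                // Sort as a long so the order is numeric rather than alphabetical.
                .OrderByDescending(new SortableField("articleDate", SortType.Long))
                .Execute();
        }
    }
}
```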

FieldDefinitionTypes.Long vs FieldDefinitionTypes.DateTime

I've been asked why I wanted my dates stored as Long instead of using the DateTime type. The answer is twofold, and the first part is a little legacy. In Umbraco v7, the version of Examine in use would store dates in a weird format, constructing a string in the format YYYYMMDDHHmmSS (approximately), and I found that sorting and range queries didn't like this very much. Results would regularly be out of order or missing, so I investigated indexing dates myself to get their Tick value. Moving on to v8, I found that, although better, the DateTime field definition still has issues. When attempting a range query on a field configured this way, I either got zero results or an exception was thrown. Granted, I've not delved into this further as I was on a deadline at the time, so I reverted to storing Tick values. Storing the Tick value also means I can quickly exclude the time element from the stored value (as seen in the post).

Possible issues

It was highlighted to me that my approach of accessing content from the Content Cache could have drawbacks. The most obvious is that the cached value might not match the value being indexed. For example, if you are modifying the Internal Index, it can contain newer versions of content than those that are published. This is something to consider when implementing this for yourself. Do you need to parse the existing value? Do you need to consider using the content service? What about variants? These are all things to bear in mind when using this approach, but they are beyond the scope of this post.
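On the "parse the existing value" question, one option is a small helper that turns whatever raw value is already in the value set into date-only ticks. The sketch below is standalone and uses only System. IndexDateParser and TryGetDateTicks are made-up names, and it assumes the raw value arrives as either a DateTime or a date string, which you would need to verify for your own setup.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, for illustration only.
public static class IndexDateParser
{
    // Turns a raw value (DateTime or date string) into date-only ticks.
    // Returns false for null or anything that cannot be read as a date.
    public static bool TryGetDateTicks(object raw, out long ticks)
    {
        ticks = 0;

        if (raw is DateTime date)
        {
            ticks = date.Date.Ticks;
            return true;
        }

        if (raw is string text && DateTime.TryParse(text, CultureInfo.InvariantCulture,
                DateTimeStyles.None, out var parsed))
        {
            ticks = parsed.Date.Ticks;
            return true;
        }

        return false;
    }
}
```

Inside the event handler this could be fed from the value already on e.ValueSet instead of going to the content cache, assuming your Examine version exposes a way to read it, such as a GetValue method.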

Questions / Further discussion

If you want to discuss this post further, have opinions on it, improvement ideas, or anything else, feel free to reach out on Twitter or the Umbraco community Slack channel and I'll happily discuss them.

© JustNik 2024