How to build a complex index

When I originally designed how the new Collette website would use Solr, the goal was to build an index to support the tour search page. That meant doing things differently from Sitecore’s out-of-the-box indexes. I consider those indexes (core, master, and web) to be simple indexes: they crawl a given part of the content tree, item by item, and index all the fields on each item. What makes our search index different is conditional crawling: we not only target items by their template, but each item must also pass certain conditions before it is indexed. For example, we make sure the item has been set up properly by the content authors and that its fields are filled in appropriately. We also make sure its market and currency match those of the current site. If any of these checks fails, we do not index the item.

How did we accomplish this?

We built a custom crawler class that inherits from Sitecore’s AbstractProviderCrawler and overrides the AddRecursive method. This lets us run a fast query against Sitecore to retrieve only the items with the intended template, then iterate over those items and verify the conditions for each one. Without this, the index would fill up with items we do not need, as well as tours that are not applicable to the site.

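// Sitecore namespaces used below; the Collette-specific namespaces
// (TourContextScope, TourContextFactory, etc.) are not shown in this post.
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Diagnostics;
using Sitecore.Data.Items;
using Sitecore.Diagnostics;
using Sitecore.Events;

public class TourCrawler : AbstractProviderCrawler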
{
	public override void AddRecursive(Item rootItem, IProviderUpdateContext context,
												ProviderIndexConfiguration indexConfiguration)
	{
		Assert.ArgumentNotNull(rootItem, "rootItem");
		Assert.ArgumentNotNull(context, "context");
		Assert.ArgumentNotNull(indexConfiguration, "indexConfiguration");

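		// This crawler only works with our template-based configuration;
		// bail out if we were handed anything else or no template is set.
		TemplateBasedIndexConfiguration config = indexConfiguration as TemplateBasedIndexConfiguration;
		if (config == null || string.IsNullOrEmpty(config.BaseTemplateId))
		{
			return;
		}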

		Event.RaiseEvent("indexing:adding", new object[] { context.Index.Name, rootItem.Uri });

		string query = "fast:/" + rootItem.Paths.FullPath + "//*[@@templateid = '" + config.BaseTemplateId + "']";
		List<Item> itemsToIndex = rootItem.Database.SelectItems(query).ToList();

		foreach (Item item in itemsToIndex)
		{
			if (item == null)
			{
				CrawlingLog.Log.Warn("TemplateBasedCrawler - Add Recursive - Item was null, continuing to next");
				continue;
			}

			TourContextScope scope = new TourContextScope();
			scope.LandingPage = item;
			scope.SiteSettings = item.GetSettingsItem();
			scope.Indexing = true;

			if (scope.SiteSettings != null)
			{
				Log.Debug("Site Settings Id: " + scope.SiteSettings.ID);
			}

			ITourContext tourContext = new TourContextFactory(scope).GetTourContext();
			if (tourContext == null)
			{
				Log.Debug("Template Based Crawler: Tour Context is null.  Id: " + item.ID);
				continue;
			}

			var sItem = new SitecoreIndexableItem(item);
			base.Operations.Add(sItem, context, indexConfiguration);
			Event.RaiseEvent("indexing:added", new object[] { context.Index.Name, rootItem.Uri });
		}
	}
}
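
In the crawler above, the conditions themselves are hidden inside TourContextFactory, which is not shown in this post. As a rough illustration only, here is a minimal, self-contained sketch of the kind of check described earlier (item set up properly, market and currency matching the site). The types TourCandidate, SiteProfile, and IndexFilter are hypothetical stand-ins and are not part of Sitecore or of the real Collette code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified view of a tour landing page.
public class TourCandidate
{
    public string Name { get; set; }
    public string Market { get; set; }
    public string Currency { get; set; }
    public bool RequiredFieldsSet { get; set; }
}

// Hypothetical, simplified view of the site the index is built for.
public class SiteProfile
{
    public string Market { get; set; }
    public string Currency { get; set; }
}

public static class IndexFilter
{
    // True only when the candidate passes every condition.
    public static bool ShouldIndex(TourCandidate tour, SiteProfile site)
    {
        if (tour == null || site == null) return false;
        if (!tour.RequiredFieldsSet) return false;

        return string.Equals(tour.Market, site.Market, StringComparison.OrdinalIgnoreCase)
            && string.Equals(tour.Currency, site.Currency, StringComparison.OrdinalIgnoreCase);
    }

    // Keeps only the candidates that should end up in the index.
    public static List<TourCandidate> SelectIndexable(IEnumerable<TourCandidate> candidates, SiteProfile site)
    {
        return candidates.Where(t => ShouldIndex(t, site)).ToList();
    }
}
```

Given a US/USD site, a tour with the wrong currency or with incomplete fields is simply skipped, which is the same effect the `continue` statements have in the crawler loop.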

Many to one

Another thing that sets our search index apart from the simple indexes is that it is “many to one”: each document is built from many items rather than the traditional one-to-one relationship between item and document. A tour is not just one item; it is made up of packages, and packages are made up of dates. We also have contracts, upgrades, and extensions, which do not live under the tour item but in their own places in the tree. Therefore, we could not use the simple approach that Sitecore provides.

We needed a way to include twenty-six custom fields, each of which takes the item being indexed and does some computation to produce its value. Examples include data indicators such as the product line, a list of features, the dates on which the tour runs, and the countries and continents it runs in. We accomplished this with computed fields.
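
To make the “many to one” idea concrete, here is a small, self-contained sketch of how values from several child items might be folded into single field values. It does not use Sitecore at all: Tour, TourPackage, and TourFieldComputations are hypothetical stand-ins for the real content tree and tour context, and the real computed fields may well work differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in: a package belongs to a tour and owns its own dates.
public class TourPackage
{
    public List<string> Countries { get; set; } = new List<string>();
    public List<DateTime> DepartureDates { get; set; } = new List<DateTime>();
}

// Hypothetical stand-in: a tour is the single document built from many packages.
public class Tour
{
    public string Title { get; set; }
    public List<TourPackage> Packages { get; set; } = new List<TourPackage>();
}

public static class TourFieldComputations
{
    // Multi-valued field: every distinct country across all packages, sorted.
    public static List<string> CountryNames(Tour tour)
    {
        return tour.Packages
            .SelectMany(p => p.Countries)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .OrderBy(name => name, StringComparer.OrdinalIgnoreCase)
            .ToList();
    }

    // Single-valued field: the earliest departure, or null when there are no dates.
    public static DateTime? EarliestDeparture(Tour tour)
    {
        List<DateTime> dates = tour.Packages.SelectMany(p => p.DepartureDates).ToList();
        return dates.Count == 0 ? (DateTime?)null : dates.Min();
    }
}
```

Each method corresponds loosely to one computed field: it receives the one thing being indexed and walks the related data to produce a single value or a list of values.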

Take a look at our tour index below. In order to use computed fields, the index needs to have its own configuration section and cannot use the provided default configuration. The main sections you want to be aware of are:

  1. The Fields section which holds the computed fields
  2. The Locations section which handles the crawler (code example of the crawler shown above)
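
Stripped down to just those two sections, the index definition has roughly the shape below. This is an abridged excerpt of the full configuration shown later in this post, with only one computed field kept; it leaves out the params, fieldMap, converters, and strategies, so it is not a working configuration on its own.

```xml
<index id="tours_collette" type="Sitecore.ContentSearch.SolrProvider.SwitchOnRebuildSolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
	<configuration type="Collette.Library.Search.TemplateBasedIndexConfiguration, Collette.Library">
		<!-- 1. Fields: one entry per computed field (the others are omitted here) -->
		<fields hint="raw:AddComputedIndexField">
			<field fieldName="CountryNames" returnType="stringCollection">Collette.Library.Search.ComputedFields.Tour.CountryNames,Collette.Library</field>
		</fields>
	</configuration>
	<!-- 2. Locations: the custom crawler and the root it starts from -->
	<locations hint="list:AddCrawler">
		<crawler type="Collette.Library.Search.Crawlers.TourCrawler,Collette.Library">
			<Database>web</Database>
			<Root>/sitecore/content/Home/Tours/</Root>
		</crawler>
	</locations>
</index>
```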

Creating a Computed Field

Creating a computed field is fairly simple. First, create a class that implements Sitecore’s IComputedIndexField interface (Sitecore.ContentSearch.ComputedFields.IComputedIndexField). Then put your logic in the ComputeFieldValue method, which returns an object. The type of that object needs to match the returnType attribute on the tag for that field in the index configuration. For example, in the index configuration below, the CountryNames field has its returnType set to stringCollection, which the fieldMap maps to a list of strings. The CountryNames class (underneath the config) accordingly returns a list of strings.

<index id="tours_collette" type="Sitecore.ContentSearch.SolrProvider.SwitchOnRebuildSolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
	<param desc="name">$(id)</param>
	<param desc="core">tours_collette</param>
	<param desc="rebuildcore">tours_collette_swap</param>
	<param desc="propertyStore" ref="contentSearch/databasePropertyStore" param1="$(id)" />
	<configuration type="Collette.Library.Search.TemplateBasedIndexConfiguration, Collette.Library">
		<!-- Indexes only items based on this template: Tour Landing Page -->
		<BaseTemplateId>{6308F77F-B718-42EF-B7B7-79B4A1AC9144}</BaseTemplateId>
		<IndexAllFields>false</IndexAllFields>
		<DeepTemplateLookup>false</DeepTemplateLookup>
		<TemplateLookupDepth>1</TemplateLookupDepth>

		<fieldMap type="Sitecore.ContentSearch.SolrProvider.SolrFieldMap, Sitecore.ContentSearch.SolrProvider" >
			<typeMatches hint="raw:AddTypeMatch">
				<typeMatch typeName="stringCollection"   type="System.Collections.Generic.List`1[System.String]"   fieldNameFormat="{0}_sm"   multiValued="true"   settingType="Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration, Sitecore.ContentSearch.SolrProvider" />
				<typeMatch typeName="string"             type="System.String"                                      fieldNameFormat="{0}_s"                         settingType="Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration, Sitecore.ContentSearch.SolrProvider" />
				<typeMatch typeName="int"                type="System.Int32"                                       fieldNameFormat="{0}_i"                         settingType="Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration, Sitecore.ContentSearch.SolrProvider" />
				<typeMatch typeName="guidCollection"     type="System.Collections.Generic.List`1[System.Guid]"     fieldNameFormat="{0}_sm"   multiValued="true"   settingType="Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration, Sitecore.ContentSearch.SolrProvider" />
				<typeMatch typeName="datetimeArray"      type="System.DateTime[]"                                  fieldNameFormat="{0}_tdtm" multiValued="true"   settingType="Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration, Sitecore.ContentSearch.SolrProvider" />
				<typeMatch typeName="datetimeCollection" type="System.Collections.Generic.List`1[System.DateTime]" fieldNameFormat="{0}_tdtm" multiValued="true"   settingType="Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration, Sitecore.ContentSearch.SolrProvider" />
				<typeMatch typeName="datetime"           type="System.DateTime"                                    fieldNameFormat="{0}_tdt"                       settingType="Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration, Sitecore.ContentSearch.SolrProvider" />
			</typeMatches>
		</fieldMap>

		<!-- COMPUTED FIELDS: This allows you to look up values to be placed into the index based off an item going into the index-->
		<fields hint="raw:AddComputedIndexField">
			<field fieldName="ActivityLevel"             returnType="string">Collette.Library.Search.ComputedFields.Tour.ActivityLevel,Collette.Library</field>
			<field fieldName="Content"                   returnType="stringCollection">Collette.Library.Search.ComputedFields.Tour.Content,Collette.Library</field>
			<field fieldName="TitleWords"                returnType="stringCollection">Collette.Library.Search.ComputedFields.Tour.TitleWords,Collette.Library</field>
			<field fieldName="ContinentNames"            returnType="stringCollection">Collette.Library.Search.ComputedFields.Tour.ContinentNames,Collette.Library</field>
			<field fieldName="CountryNames"              returnType="stringCollection">Collette.Library.Search.ComputedFields.Tour.CountryNames,Collette.Library</field>
			<field fieldName="CountryIds"                returnType="stringCollection">Collette.Library.Search.ComputedFields.Tour.CountryIds,Collette.Library</field>
			<field fieldName="DateRange"                 returnType="stringCollection">Collette.Library.Search.ComputedFields.Tour.DateRangeMonth,Collette.Library</field>
			<field fieldName="DayLength"                 returnType="stringCollection">Collette.Library.Search.ComputedFields.Tour.DayLength,Collette.Library</field>
			<field fieldName="Features"                  returnType="stringCollection">Collette.Library.Search.ComputedFields.Tour.Features,Collette.Library</field>
			<field fieldName="NumberOfDays_Max"          returnType="int">Collette.Library.Search.ComputedFields.Tour.NumberOfDaysMax,Collette.Library</field>
			<field fieldName="NumberOfDays_Min"          returnType="int">Collette.Library.Search.ComputedFields.Tour.NumberOfDaysMin,Collette.Library</field>
			<field fieldName="NumberOfMeals_Max"         returnType="int">Collette.Library.Search.ComputedFields.Tour.NumberOfMealsMax,Collette.Library</field>
			<field fieldName="NumberOfMeals_Min"         returnType="int">Collette.Library.Search.ComputedFields.Tour.NumberOfMealsMin,Collette.Library</field>
			<field fieldName="Price"                     returnType="int">Collette.Library.Search.ComputedFields.Tour.Price,Collette.Library</field>
			<field fieldName="PriceRange"                returnType="stringCollection">Collette.Library.Search.ComputedFields.Tour.DynamicPriceRange,Collette.Library</field>
			<field fieldName="SearchResultImage"         returnType="string">Collette.Library.Search.ComputedFields.Tour.SearchResultImage,Collette.Library</field>
			<field fieldName="SearchResultSmallImage"    returnType="string">Collette.Library.Search.ComputedFields.Tour.SearchResultSmallImage,Collette.Library</field>
			<field fieldName="Style"                     returnType="string">Collette.Library.Search.ComputedFields.Tour.Style,Collette.Library</field>
			<field fieldName="StyleCssClass"             returnType="string">Collette.Library.Search.ComputedFields.Tour.StyleCssClass,Collette.Library</field>
			<field fieldName="TourDetailUrl"             returnType="string">Collette.Library.Search.ComputedFields.Tour.TourDetailUrl,Collette.Library</field>
			<field fieldName="Summary"                   returnType="string">Collette.Library.Search.ComputedFields.Tour.Summary,Collette.Library</field>
			<field fieldName="Title"                     returnType="string">Collette.Library.Search.ComputedFields.Tour.Title,Collette.Library</field>
			<field fieldName="Start"                     returnType="datetime">Collette.Library.Search.ComputedFields.Tour.DepartureDate_Earliest,Collette.Library</field>
			<field fieldName="End"                       returnType="datetime">Collette.Library.Search.ComputedFields.Tour.DepartureDate_Latest,Collette.Library</field>
			<field fieldName="Dates"                     returnType="datetimeCollection">Collette.Library.Search.ComputedFields.Tour.DepartureDatesList,Collette.Library</field>
			<field fieldName="parsedlanguage"            returnType="string">Sitecore.ContentSearch.ComputedFields.ParsedLanguage,Sitecore.ContentSearch</field>
		</fields>

		<virtualFieldProcessors hint="raw:AddVirtualFieldProcessor">
			<virtualFieldProcessor fieldName="daterange" type="Sitecore.ContentSearch.VirtualFields.DateRangeFieldProcessor, Sitecore.ContentSearch" />
			<virtualFieldProcessor fieldName="_lastestversion" type="Sitecore.ContentSearch.VirtualFields.LatestVersionFieldProcessor, Sitecore.ContentSearch" />
			<virtualFieldProcessor fieldName="_url" type="Sitecore.ContentSearch.VirtualFields.UniqueIdFieldProcessor, Sitecore.ContentSearch" />
		</virtualFieldProcessors>

		<!-- INDEX FIELD STORAGE MAPPER: Maintains a collection of all the possible Convertors for the provider.-->
		<IndexFieldStorageValueFormatter type="Sitecore.ContentSearch.SolrProvider.Converters.SolrIndexFieldStorageValueFormatter, Sitecore.ContentSearch.SolrProvider">
			<converters hint="raw:AddConverter">
				<converter handlesType="System.Guid"                                                          typeConverter="Sitecore.ContentSearch.Converters.IndexFieldGuidValueConverter, Sitecore.ContentSearch" />
				<converter handlesType="Sitecore.Data.ID, Sitecore.Kernel"                                    typeConverter="Sitecore.ContentSearch.Converters.IndexFieldIDValueConverter, Sitecore.ContentSearch" />
				<converter handlesType="Sitecore.Data.ShortID, Sitecore.Kernel"                               typeConverter="Sitecore.ContentSearch.Converters.IndexFieldShortIDValueConverter, Sitecore.ContentSearch" />
				<converter handlesType="System.DateTime"                                                      typeConverter="Sitecore.ContentSearch.Converters.IndexFieldDateTimeValueConverter, Sitecore.ContentSearch" />
				<converter handlesType="System.DateTimeOffset"                                                typeConverter="Sitecore.ContentSearch.Converters.IndexFieldDateTimeOffsetValueConverter, Sitecore.ContentSearch" />
				<converter handlesType="System.TimeSpan"                                                      typeConverter="Sitecore.ContentSearch.Converters.IndexFieldTimeSpanValueConverter, Sitecore.ContentSearch" />
				<converter handlesType="Sitecore.ContentSearch.SitecoreItemId, Sitecore.ContentSearch"        typeConverter="Sitecore.ContentSearch.Converters.IndexFieldSitecoreItemIDValueConvertor, Sitecore.ContentSearch">
					<param type="Sitecore.ContentSearch.Converters.IndexFieldIDValueConverter, Sitecore.ContentSearch"/>
				</converter>
				<converter handlesType="Sitecore.ContentSearch.SitecoreItemUniqueId, Sitecore.ContentSearch"  typeConverter="Sitecore.ContentSearch.Converters.IndexFieldSitecoreItemUniqueIDValueConverter, Sitecore.ContentSearch">
					<param type="Sitecore.ContentSearch.Converters.IndexFieldItemUriValueConverter, Sitecore.ContentSearch"/>
				</converter>
				<converter handlesType="Sitecore.Data.ItemUri, Sitecore.Kernel"                               typeConverter="Sitecore.ContentSearch.Converters.IndexFieldItemUriValueConverter, Sitecore.ContentSearch" />
				<converter handlesType="Sitecore.Globalization.Language, Sitecore.Kernel"                     typeConverter="Sitecore.ContentSearch.Converters.IndexFieldLanguageValueConverter, Sitecore.ContentSearch" />
				<converter handlesType="System.Globalization.CultureInfo"                                     typeConverter="Sitecore.ContentSearch.Converters.IndexFieldCultureInfoValueConverter, Sitecore.ContentSearch" />
				<converter handlesType="Sitecore.Data.Version, Sitecore.Kernel"                               typeConverter="Sitecore.ContentSearch.Converters.IndexFieldVersionValueConverter, Sitecore.ContentSearch" />
				<converter handlesType="Sitecore.Data.Database, Sitecore.Kernel"                              typeConverter="Sitecore.ContentSearch.Converters.IndexFieldDatabaseValueConverter, Sitecore.ContentSearch" />
			</converters>
		</IndexFieldStorageValueFormatter>
	</configuration>
	<strategies hint="list:AddStrategy">
		<strategy ref="contentSearch/indexUpdateStrategies/sixHourRebuildOfIndex" />
	</strategies>
	<locations hint="list:AddCrawler">
		<crawler type="Collette.Library.Search.Crawlers.TourCrawler,Collette.Library">
			<Database>web</Database>
			<Root>/sitecore/content/Home/Tours/</Root>
		</crawler>
	</locations>
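</index>

using System.Collections.Generic;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;

public class CountryNames : IComputedIndexField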
{
	public object ComputeFieldValue(IIndexable indexable)
	{
		ITourContext tourContext = TourFactory.GetTourContext(indexable);
		if (tourContext == null)
		{
			return null;
		}

		List<string> result = tourContext.CountryNames;

		return result;
	}

	public string FieldName { get; set; }
	public string ReturnType { get; set; }
}

If you have any questions please feel free to reach out. Thank you for taking the time to check out my post.

Tim
