By the time you finish reading this blog post you should have a good understanding of the key points in the Solr configuration file. I always look at things like a big puzzle: the more pieces you put together, the more you understand what you are working with and everything around it. Hopefully this blog post will add a couple of pieces to your puzzle…
How is an index structured?
Each index within Sitecore has a configuration behind it. This configuration specifies how the index should interact and behave in the environment. From here it knows when to crawl, where to crawl, which database to crawl, which core to point back to in Solr, and whether or not it should swap cores when it finishes rebuilding. Let’s pull apart the screenshot below:
- ID – You can see on the index node that we are supplying it with an ID as well as a type. I always keep my ID consistent with the name of the core on the Solr side. No reason to start throwing different names all over the place. Keep it simple!
- Core Switching – If you want the index to switch cores when it is finished rebuilding you will want to make sure the class name is set to SwitchOnRebuildSolrSearchIndex and not the default SolrSearchIndex. Core switching is very handy; when Sitecore rebuilds an index one of the first things it does is purge all data from the index. If it takes 15 minutes to rebuild, that is 15 minutes where your index is not available. With the core switching, it builds the swap index then switches the two cores so there is no downtime. Again, simple!
- Parameters – I’ll be honest, I haven’t had a need to change the property store parameter, so I am going to focus on the other three. You can see from the screenshot that I am using the token $(id), which refers back to the ID attribute of the <index> tag. We set the name parameter and the core parameter to this value. The rebuildcore parameter (which is only needed if you are using core switching) maps directly to your swap core over on the Solr side.
- Configuration – This piece is very important. Within the configuration you can specify new field types and computed fields. For the most part in the Collette solution we have a configuration for each index. This is because each index usually requires computed fields which are specific to that index. We will drill down into this section further on in the post.
- Strategies – Here you can specify when you would like your index to rebuild. Is it after publishing, based off an interval such as 2 hours, manual or maybe something completely custom? More on this later! 🙂
- Locations – More fun! Here we get to define our crawler. We tell our crawler which database to target and where to target. If you would like to learn more about custom crawlers I have a blog post for that too!
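To make the pieces above concrete, here is a rough sketch of what an index definition along these lines can look like in a patch file. Treat it as an illustration, not a copy of the screenshot: the index ID airlines_index, the configuration name airlinesIndexConfiguration, the _rebuild suffix on the swap core and the crawl root are all names I made up for the example, and the exact type names and hints can differ between Sitecore versions, so compare against the stock Solr index configs that ship with yours.

```xml
<!-- Hypothetical patch file: ids, core names and paths are placeholders. -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes hint="list:AddIndex">
          <!-- ID: kept identical to the core name on the Solr side.
               Type: SwitchOnRebuildSolrSearchIndex enables core switching. -->
          <index id="airlines_index"
                 type="Sitecore.ContentSearch.SolrProvider.SwitchOnRebuildSolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
            <!-- Parameters: name and core both reuse the index id token. -->
            <param desc="name">$(id)</param>
            <param desc="core">$(id)</param>
            <!-- Only needed for core switching; must match your swap core in Solr. -->
            <param desc="rebuildcore">$(id)_rebuild</param>
            <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
            <!-- Configuration: points at the index configuration used by this index. -->
            <configuration ref="contentSearch/indexConfigurations/airlinesIndexConfiguration" />
            <!-- Strategies: when the index gets updated. -->
            <strategies hint="list:AddStrategy">
              <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
            </strategies>
            <!-- Locations: which database to crawl and where to start. -->
            <locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <Database>web</Database>
                <Root>/sitecore/content/Airlines</Root>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
```

If you do not need core switching, swapping the type back to SolrSearchIndex and dropping the rebuildcore parameter should be all that changes in a definition like this.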
In my mind the index configuration is what the index uses to translate the Sitecore data into a Solr document. It defines how the fields should be represented in the document array, it comes up with a dynamic name, and allows you to add computed fields to bring in data or to compute data that is not a field value on that item being processed/crawled. An index configuration can be used by many indexes. Let’s dig in.
- Type Matches – Have you ever wondered how every string field ends up in Solr with a “_s” suffix as part of the field name? Type matches take care of this, and they also tell Sitecore which object type to cast that field into when querying. Theoretically you could store custom objects. 🙂
- Field Types – Sitecore uses the field type section to map Sitecore fields to object types in Solr. For example, a single-line text field we all know is a string, but how does Solr know that? It knows it because it is called out. See the screenshot below. Now when the content search layer comes across a single-line text field, it knows to store it as text in Solr.
- Target Certain Templates – For this index configuration we are targeting airlines, therefore when the crawler goes to get all the items from the root, we want to filter out any items that do not match the templates I am specifying in the screenshot below. Note, you can include multiple templates.
- Index Field Storage Value Formatter – These converters actually do the conversion to and from a type of object. Let’s take the item’s version object. This is converted using the IndexFieldVersionValueConverter. It is able to take a field value from Solr and parse it into a Sitecore.Data.Version object. It is also able to do the inverse by taking the Sitecore Version object and parsing it into a string for Solr. Very interesting stuff! When we talked about storing custom objects in Solr in the type matches section, this section here will play a big part in that.
- Computed Fields – This section is crucial. It allows you to specify field values in Solr that are not field values on the Sitecore item. This allows you to programmatically compute data and store that data in Solr. For example, if you want a sum of three fields on the index, you can easily do this using a computed field rather than making schema changes on the Solr side of things. If you wanted to get the URL for the item, you can call LinkManager and store the URL as we are doing in the screenshot below. In another blog post I talk in much more detail regarding computed fields.
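Here is a hedged sketch of how a few of these sections can sit together inside one index configuration. It is modeled on the general shape of the stock Solr index configuration as I remember it, so verify the element names and hints against the default config in your Sitecore version (older versions place some of these nodes differently). The configuration name airlinesIndexConfiguration, the all-zero template ID, the field name itemurl and the class MySite.Search.ItemUrlComputedField are all placeholders for this example.

```xml
<!-- Hypothetical index configuration; names, IDs and classes are placeholders. -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <!-- Start from the default Solr configuration and override pieces of it. -->
        <airlinesIndexConfiguration ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration">
          <fieldMap type="Sitecore.ContentSearch.SolrProvider.SolrFieldMap, Sitecore.ContentSearch.SolrProvider">
            <!-- Type matches: this is where the "_s" suffix for strings comes from. -->
            <typeMatches hint="raw:AddTypeMatch">
              <typeMatch typeName="string" type="System.String" fieldNameFormat="{0}_s"
                         settingType="Sitecore.ContentSearch.SolrProvider.SolrSearchFieldConfiguration, Sitecore.ContentSearch.SolrProvider" />
            </typeMatches>
            <!-- Field types: a single-line text field is stored as text. -->
            <fieldTypes hint="raw:AddFieldByFieldTypeName">
              <fieldType fieldTypeName="single-line text" returnType="text" />
            </fieldTypes>
          </fieldMap>
          <documentOptions type="Sitecore.ContentSearch.SolrProvider.SolrDocumentBuilderOptions, Sitecore.ContentSearch.SolrProvider">
            <!-- Target certain templates: add one child node per template ID. -->
            <include hint="list:AddIncludedTemplate">
              <Airline>{00000000-0000-0000-0000-000000000000}</Airline>
            </include>
            <!-- Computed fields: values that are not fields on the item itself. -->
            <fields hint="raw:AddComputedIndexField">
              <field fieldName="itemurl" returnType="string">MySite.Search.ItemUrlComputedField, MySite.Search</field>
            </fields>
          </documentOptions>
        </airlinesIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>
```

The computed field entry only registers the class. The class itself would be your own code that calls LinkManager and returns the URL, as described above.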
Key Content Search Settings
There are several key settings you will want to be aware of when using the Content Search layer. Let’s take a look at these:
- ContentSearch.SearchMaxResults – This controls how many documents are returned to Sitecore when it queries Solr. The default is 500 documents, which is plenty. Keep in mind that requests to Solr are made over HTTP, so the more documents you ask for, the larger the response payload will be.
- ContentSearch.Solr.ServiceBaseAddress – This points to Solr. You should be able to take the value of this setting, paste it into a browser and get directly to Solr.
- ContentSearch.Update.BatchModeEnabled – Enabling this sends documents to Solr in batches, which will certainly help when you are rebuilding or updating an index.
- ContentSearch.Update.BatchSize – This sets how many documents go into each batch. It is only used when the above setting is set to true.
- ContentSearch.EnableSearchDebug – Set this to true while you are debugging your search queries. You will also need to make sure log4net is set to at least DEBUG.
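If you want to change any of these, a small patch file is the usual way to do it. The sketch below shows the idea. The values are only examples: the Solr address and the batch size are assumptions on my part, not recommendations, so use whatever fits your environment.

```xml
<!-- Hypothetical settings patch; all values are examples only. -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Maximum number of documents returned per query. -->
      <setting name="ContentSearch.SearchMaxResults">
        <patch:attribute name="value">500</patch:attribute>
      </setting>
      <!-- Base address of Solr; should open in a browser. -->
      <setting name="ContentSearch.Solr.ServiceBaseAddress">
        <patch:attribute name="value">https://localhost:8983/solr</patch:attribute>
      </setting>
      <!-- Batch updates while rebuilding or updating an index. -->
      <setting name="ContentSearch.Update.BatchModeEnabled">
        <patch:attribute name="value">true</patch:attribute>
      </setting>
      <setting name="ContentSearch.Update.BatchSize">
        <patch:attribute name="value">500</patch:attribute>
      </setting>
      <!-- Turn on only while debugging search queries. -->
      <setting name="ContentSearch.EnableSearchDebug">
        <patch:attribute name="value">true</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```

Remember to switch EnableSearchDebug back to false once you are done debugging.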