I’m doing some work in SharePoint 2013 and we want to take advantage of as many out of the box capabilities as possible. We’re replacing an existing Intranet that has grown up in SharePoint from 2007 to 2010, and we’d like to rebuild with as little custom code as possible, since SharePoint 2013 now contains features that had to be custom built in the past.
The Intranet is built using a Publishing Portal, and we want to use the Content Search Web Part (CSWP) to surface content in places like a home page rotator (the latest stories of certain Content Types within relative date ranges) and several “archive pages” (lists of historical content, sorted by descending publishing date). We also want to provide search on a page using the Search Results Web Part (SRWP) and the Refinement Web Part (RWP). The user stories and use cases here aren’t really all that complex: let’s show people the latest content of predetermined types, regardless of where it was created in the Publishing Portal.
The new Continuous Crawl capability in SharePoint 2013 sounds like it will fit the bill for us. We want the content that users see to be as fresh as possible. In fact, the TechNet article I link to below says that with Continuous Crawl “[t]he search results are very fresh, because the SharePoint content is crawled frequently to keep the search index up to date.” That sounds perfect, but we need to understand more about it.
We haven’t done much at all with the Search Service Application. We’ve got one Content Source, which is “Local SharePoint Sites”. In other words, it couldn’t be much simpler. Since search will underlie so much of the functionality, we need to understand exactly how the crawls are going to work and what sort of lag time we can expect users to have before they see content that is published. We can’t figure out exactly how Continuous Crawl works under the hood, so today I tried to do some experiments.
To start fresh, I set the Content Source to have no crawl schedules, and, just in case, I ran a Full Crawl.
When the Full Crawl was done, the Content Source showed this status:
Next I clicked the Enable Continuous Crawls radio button. Note that when I did this, the Incremental Crawl schedule was automatically set to every 4 hours. This can be changed, but the incremental schedule cannot be set to “None” while the Enable Continuous Crawls radio button is selected.
The Content Source status changed to this:
In the log, it looks like an Incremental Crawl fired off when I saved that change at 11:34.
I waited for the Incremental Crawl to complete and published a new News Item at 11:37. The new content showed up in the CSWPs and the search results around 11:55. For some reason, a new Incremental Crawl started at 11:55 (21 minutes after the previous crawl).
I added some more new content at 11:58. That content showed up in the CSWP by 12:09. (I’m not sure exactly how many minutes it took to get there, but it was less than 12.) There’s nothing in the logs to indicate that a crawl occurred:
At 12:30, there was still nothing new in the logs:
All in all, this is still confusing to me. Continuous Crawl seems to be working, but on some underlying schedule that isn’t visible. There have been some suggestions that the Continuous Crawl interval is set to 15 minutes by default, and the evidence above seems to support that: the second piece of content showed up within about 11 minutes of publishing (by 12:09), roughly 15 minutes after the last crawl that was visible in the logs (11:55).
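To keep the timestamps straight, here is the arithmetic behind the lags described above as a small Python sketch. It only subtracts times taken from my notes. The 15-minute value is the rumored default, treated purely as a hypothesis, and nothing here talks to SharePoint or reflects how the crawler actually schedules its work.

```python
from datetime import datetime

FMT = "%H:%M"  # 24-hour clock; all observations were on the same day


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from start to end, both given as HH:MM strings."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


# (published, first seen in the CSWP / search results), taken from the notes above
observations = {
    "first News Item": ("11:37", "11:55"),  # "around 11:55", so approximate
    "second item": ("11:58", "12:09"),      # "by 12:09", so an upper bound
}

# Assumption: the rumored default Continuous Crawl interval, not a confirmed value
HYPOTHETICAL_INTERVAL = 15

for label, (published, seen) in observations.items():
    lag = minutes_between(published, seen)
    fits = lag <= HYPOTHETICAL_INTERVAL
    print(f"{label}: {lag} min lag, within one {HYPOTHETICAL_INTERVAL}-min interval: {fits}")

# Gap between the two Incremental Crawls visible in the log
print("incremental crawl gap:", minutes_between("11:34", "11:55"), "min")
# Time from the last visible crawl to the second item appearing
print("since last visible crawl:", minutes_between("11:55", "12:09"), "min")
```

The first item’s lag (about 18 minutes) is longer than a single 15-minute interval, which may simply mean it was picked up by the 11:55 Incremental Crawl; the second item (at most 11 minutes) fits within one interval.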
There is some PowerShell you can use to get at properties of the Continuous Crawl, but it’s not totally clear what impact they have on the schedule.
$ssa = Get-SPEnterpriseSearchServiceApplication
Another thing that’s not clear is how many Continuous Crawl threads might stack up if things get backed up. One person has suggested an unlimited number, and someone else told me there’s a maximum of 8 threads. Obviously, there’s no clear understanding of this, either.
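The two answers I’ve heard (unlimited versus a maximum of 8) lead to very different outcomes when crawls back up. A rough Python sketch makes the difference concrete. It rests on an assumption I haven’t confirmed: that a new continuous crawl session starts every interval even if earlier sessions are still running. The 15-minute interval and the cap of 8 are just the figures people have suggested, not documented values, and `concurrent_sessions` is a hypothetical helper, not a SharePoint API.

```python
import math
from typing import Optional


def concurrent_sessions(interval_min: float, duration_min: float,
                        cap: Optional[int] = None) -> int:
    """Peak number of overlapping crawl sessions under a simple model.

    Hypothetical model: a session starts every `interval_min` minutes whether
    or not earlier sessions have finished, and each one runs for
    `duration_min` minutes. `cap` is an optional ceiling on simultaneous
    sessions (None means unlimited).
    """
    if interval_min <= 0 or duration_min <= 0:
        raise ValueError("interval and duration must be positive")
    # Sessions started in the last `duration_min` minutes are still running,
    # so at most ceil(duration / interval) of them overlap at once.
    overlapping = math.ceil(duration_min / interval_min)
    return overlapping if cap is None else min(overlapping, cap)


print(concurrent_sessions(15, 10))          # sessions finish before the next starts
print(concurrent_sessions(15, 200))         # backed up, no cap
print(concurrent_sessions(15, 200, cap=8))  # backed up, capped at 8
```

For example, if each session took 200 minutes under load, this model gives 14 overlapping sessions with no cap but only 8 with a cap of 8, which is why knowing the real limit matters for capacity planning.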
In researching this, these articles/posts seemed useful:
This TechNet article is way too vague and only focuses on what buttons to push to turn Continuous Crawl on or off:
In my opinion, we need much clearer documentation from Microsoft explaining how all of this holds together, and I’m trying to track down the right people to see if I can help make that happen. If you know who those people are and could give me an introduction, I’d appreciate it.