SharePoint 2013’s Search Continuous Crawl: An Enigma
I’m doing some work in SharePoint 2013 and we want to take advantage of as many out-of-the-box capabilities as possible. We’re replacing an existing Intranet that has grown up in SharePoint from 2007 to 2010, and we’d like to rebuild with as little custom code as possible, since SharePoint 2013 now contains features that had to be custom built in the past.
The Intranet is built using a Publishing Portal, and we want to use the Content Search Web Part (CSWP) to surface content in places like a home page rotator (the latest stories of certain Content Types within relative date ranges), in several “archive pages” (a list of the historical content, sorted by descending publishing date), and in search with the Search Results Web Part (SRWP) and the Refinement Web Part (RWP) on a page. The user stories and use cases here are not really all that complex: we want to show people the latest content of predetermined types, regardless of where it was created in the Publishing Portal.
The new Continuous Crawl capability in SharePoint 2013 sounds like it will fit the bill for us. We want the content that users see to be as fresh as possible. In fact, the TechNet article I link to below says that with Continuous Crawl “[t]he search results are very fresh, because the SharePoint content is crawled frequently to keep the search index up to date.” Sounds perfect, but we need to understand more about it.
We haven’t done much at all with the Search Service Application. We’ve got one Content Source, which is “Local SharePoint Sites”. In other words, it couldn’t be much simpler. Since search will underlie so much of the functionality, we need to understand exactly how the crawls are going to work and what sort of lag time we can expect users to have before they see content that is published. We can’t figure out exactly how Continuous Crawl works under the hood, so today I tried to do some experiments.
To start out, I removed all crawl schedules to get a clean baseline, and, just in case, I ran a Full Crawl.
When the Full Crawl was done, the Content Source showed this status:
Next I clicked the Enable Continuous Crawls radio button. Note that when I did this, the Incremental Crawl schedule was automatically set to every 4 hours. This can be changed, but the incremental schedule cannot be set to “None” while the Enable Continuous Crawls radio button is selected.
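For the record, the same switch can be flipped with PowerShell. Here’s a sketch (untested in your environment, and assuming the default Content Source name “Local SharePoint Sites”) that does the equivalent of selecting the radio button:

```powershell
# Load the SharePoint snap-in if you're in a plain PowerShell console
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$ssa = Get-SPEnterpriseSearchServiceApplication

# Equivalent of the "Enable Continuous Crawls" radio button on the Content Source
Set-SPEnterpriseSearchCrawlContentSource -Identity "Local SharePoint Sites" `
    -SearchApplication $ssa `
    -EnableContinuousCrawls $true
```

Pass `$false` to switch back to scheduled crawls; note that, as in the UI, the underlying incremental schedule stays in place while continuous crawls are enabled.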
The Content Source status changed to this:
In the log, it looks like an Incremental Crawl fired off when I saved that change at 11:34.
I waited for the Incremental Crawl to complete and published a new News Item at 11:37. The new content showed up in the CSWPs and the search results around 11:55. For some reason, a new Incremental Crawl started at 11:55 (21 minutes after the previous crawl).
I added some more new content at 11:58. That content showed up in the CSWP by 12:09. (I’m not sure exactly how many minutes it took to get there, but it was less than 12.) There’s nothing in the logs to indicate that a crawl occurred:
At 12:30, there was still nothing new in the logs:
All in all, this is still confusing to me. Continuous Crawl seems to be working, but at some underlying schedule which isn’t visible. There have been some suggestions that the Continuous Crawl schedule is set to every 15 minutes by default, and the evidence above seems to support that since the second piece of content showed up in 12 minutes, about 15 minutes after the last crawl that was visible in the logs.
There is some PowerShell you can use to get at properties of the Continuous Crawl, but it’s not totally clear what impact they have on the schedule.
$ssa = Get-SPEnterpriseSearchServiceApplication
$ssa.SetProperty("ContinuousCrawlInterval", 1)
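If that property really is the crawl interval in minutes, as some of the posts below suggest, you can at least read the value back to confirm what you’ve set. A sketch (the default of 15 is the community’s claim, not something I’ve found documented):

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication

# Read the current value; the suggestion is that this defaults to 15 (minutes)
$ssa.GetProperty("ContinuousCrawlInterval")

# Set it to 1 minute, persist the change, and read it back to confirm
$ssa.SetProperty("ContinuousCrawlInterval", 1)
$ssa.Update()
$ssa.GetProperty("ContinuousCrawlInterval")
```

Whether this actually drives the crawl cadence we observed above is exactly the kind of thing the documentation doesn’t say.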
Another thing that’s not clear is how many Continuous Crawl threads might stack up if things get backed up. One person has suggested an unlimited number, and someone else told me there’s a maximum of 8 threads. Obviously, there’s not a clear understanding of this, either.
In researching things, these articles/posts seemed useful:
- https://www.nothingbutsharepoint.com/sites/itpro/Pages/What-is-Continuous-Crawl-in-SharePoint-2013.aspx
- http://blog.octavie.nl/index.php/2012/11/23/sharepoint-2013-search-continuous-crawling/
- http://social.msdn.microsoft.com/Forums/en-US/sharepointsearch/thread/c5bb9c5f-1957-4691-9ae4-fefff880349c
This TechNet article is way too vague and only focuses on what buttons to push to turn Continuous Crawl on or off:
In my opinion, we need some much clearer documentation from Microsoft to explain how all of this holds together, and I’m trying to track down the right people to see if I can help to make that happen. If you know who those people are and could give me an introduction, I’d appreciate it.
Great post, Marc! I want to point out that the Content Search Web Part (CSWP) is an Enterprise-only item. If your clients have the Standard CAL, they can use the Content results with a couple of wrinkles in functionality. Agreed that MSFT needs to step up to the plate on documentation about this piece – reason for my tweet earlier…
Hi Martin,
In case of the Standard CAL, what would be these couple of wrinkles in functionality exactly? And how can the Content results be shown without the CSWP?
Check out the Search Results Web Part in the Search group on Standard. The “wrinkles” are that you don’t get to customize the managed property mappings for the display template through the UI, you have to define the properties you’re going to use directly in the template.
Because you can define the mappings in the UI in the Content Search Web Part, you can have reusable templates and you don’t need to specify the specific properties to use in the template. It also plays well with metadata navigation, allowing you to drop a Content Search Web Part on a category page and display different results based on the navigation. So you lose that functionality in the Search Results Web Part, but it still works great!!
Hi Marc,
Vaidy Raghavan was a Program Manager for the 2013 search crawler. He presented a session about the crawler (and continuous crawl) at SPC (SPC044 if you have the decks/vids). He might be a good person to reach out to. But I would check out his session. I think it may provide some clarity, but if memory serves, not in the definitive detail that you’re seeking.
Thanks Paul. I’ll check out that session.
M.
Hi Marc, we have a similar situation: we are using Content Search Web Parts and we were having problems with the content not showing up fast enough. We had two content sources, one was the local and the other was the domain that we set. We set local to crawl incrementally and we set the domain on continuous. I have read that by having 2 content sources you get an overlapping effect. The continuous crawl is set for every 15 minutes by default; however, I have read that you can change that setting in PowerShell. I set the incremental on local to be every 3 minutes. This really sped up my search content appearing in Content Search Web Parts. I was a little concerned that it was possibly slowing the site down; however, changing the incremental back up to 15 did not change it that much, since there is a dedicated server running the crawls. Anyway, hope you got it figured out in the past few months.
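For anyone wanting to try the 3-minute incremental schedule described above, something like this should be in the right neighborhood (sketched from memory, so treat the schedule parameter names and the default Content Source name as assumptions to verify against the cmdlet help):

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication

# Incremental crawl on the local content source, repeating every 3 minutes
# throughout the day (1440 minutes)
Set-SPEnterpriseSearchCrawlContentSource -Identity "Local SharePoint Sites" `
    -SearchApplication $ssa `
    -ScheduleType Incremental `
    -DailyCrawlSchedule `
    -CrawlScheduleRepeatInterval 3 `
    -CrawlScheduleRepeatDuration 1440
```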
Well, I know I’m kind of late to the party here – but I’m fighting the same with a SP SE version.
Users claim that News used to publish faster – but now it’s delayed for sometimes very long (hours).
I’m a “n00b” at SharePoint admin – and only recently learned that publishing new News in SharePoint does not mean that any web page is updated at that point. No – the crawl has to do its work when a timer job triggers it. And THEN the updated page is shown.
So – as a first step, I switched the full/incremental crawl schedules around. Didn’t help.
Then we decided to switch to Continuous Crawl because that “ensures max freshness”. Didn’t help either. I learned that even Continuous Crawl has a default schedule of 15 minutes, which can be changed to a smaller interval. That would affect the entire content source, though, and I don’t want to do that…
We have a default content source for “Local SharePoint Sites” and one for “Mysite and Profiles” – all seems pretty default.
I then thought – HEY, why don’t I just create an extra content source and make it crawl the specific location where they publish their News more often. But oh no, that can’t be done in SharePoint either – it complains this way: “The start address https://sharepoint.domain.name/sites/news already exists in this or another content source” !?!
I hope somebody ends up reading this and wants to chime in and educate me a bit – Is there any way I can tell Sharepoint to smarten up and concentrate on this single location to spot news immediately?
A way that doesn’t make it read all the rest of the SP data every time it starts crawling…
Thanks in advance