SharePoint Online Search Isn’t Displaying What I Expect – Part 1 – Trimmed Duplicates

For years, I’ve been skeptical about search indexing in SharePoint, especially in SharePoint Online in Office 365. The fact that we can’t know when a search crawl has run – thus updating the indices – is a huge part of the problem. In the early days, before Content Search Web Parts (CSWPs) were available in SharePoint Online, we routinely saw delays of days or even weeks between content creation and that content showing up in search results. Later the CSWP was enabled on SharePoint Online, and it is a fantastically powerful tool, far better than the Content Query Web Part (CQWP) which it nominally replaced.

But the value of any search-driven mechanism in SharePoint is directly tied to the recency and frequency of updates to the search index. While the CQWP is quite inefficient – since it actually goes out to look for content at the source every time it runs (though there may be some caching) – the CSWP uses the search index and can thus return results using fewer server resources in some cases. (One downside is that you can only retrieve up to 50 results with the CSWP.) Since we don’t know when the search crawls run in SharePoint Online, and we often don’t see the results we expect, we tend to blame indexing for the problem.

There are many things that can contribute to the indexing issues. Load on the indexing servers can mean that your tenant isn’t crawled as frequently as you might want – taking hours or in the worst cases days to display items you know should be there. Unfortunately, there is no way to know if the index is the issue. Based on what I’ve heard, there are usually multiple indexing servers per tenant, and those indices can supposedly get out of sync. Search is also a very complex beast: probably way too complicated for most use cases, as it is based on the old FAST search platform. Most people simply want to be able to see content they add to a SharePoint list or library right away if they search for it. Period. So simple, yet often not what happens.

The other day, Julie and I were certain we had an ironclad instance where search indexing simply wasn’t working correctly. The scenario seems to be a very common one, and we stripped it down to as minimal an example as possible and sent it off to the Product Group.

In the example, we were making a REST call to the search API to retrieve all of the events in calendars across an Intranet. The use case is a very common one: we have a calendar per department, office, etc., and in some cases we’d like to promote those calendar events to display on the home page of the Intranet. Put aside the governance questions here, and just assume that everyone who can create events gets to decide whether to check the Show on the home page box. To make this all work, we have a few custom content types which inherit from Event with a few extra columns. We have some nice, fancy display mechanisms on the home page using AngularJS and on a Company Calendar page using fullcalendar. But most of that doesn’t matter: we were seeing the issue in the call to the search API.

Our query looked something like this:

This query will return all (we thought!) list items which inherit from the Event Content Type, because its ContentTypeId is 0x0102. Of course, our actual query was more complicated: we requested specific fields with selectproperties, asked for more items by setting the rowlimit to 50, etc. But again, at the core we were simply asking for a bunch of events.
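A query along these lines can be sketched as a small URL builder. This is a minimal sketch, not our production code: the site URL, the function name `buildEventSearchUrl`, and the selected properties are assumptions for illustration. The `querytext`, `selectproperties`, and `rowlimit` parameters belong to the search REST endpoint at `/_api/search/query`, and the trailing `*` on the content type ID is meant to match content types which inherit from Event.

```javascript
// Sketch: build a search REST URL that asks for Event-based items.
// The site URL and property names passed in are illustrative assumptions.
function buildEventSearchUrl(siteUrl, options) {
  const opts = options || {};
  // 0x0102 is the Event content type ID; the * wildcard is intended to
  // match child content types whose IDs start with 0x0102.
  const kql = "ContentTypeId:0x0102*";
  const params = ["querytext='" + encodeURIComponent(kql) + "'"];

  if (opts.selectProperties && opts.selectProperties.length) {
    const props = encodeURIComponent(opts.selectProperties.join(","));
    params.push("selectproperties='" + props + "'");
  }
  if (opts.rowLimit) {
    params.push("rowlimit=" + opts.rowLimit);
  }
  // Trim a trailing slash so we never produce "//_api".
  return siteUrl.replace(/\/$/, "") + "/_api/search/query?" + params.join("&");
}

const eventSearchUrl = buildEventSearchUrl(
  "https://contoso.sharepoint.com/sites/intranet/", // hypothetical tenant
  { selectProperties: ["Title", "Path"], rowLimit: 50 }
);
console.log(eventSearchUrl);
```

The resulting URL can be passed to whatever AJAX helper you already use, with an `Accept` header requesting JSON.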

But we weren’t seeing all the events we expected. We assumed that the search index wasn’t working correctly, just like most site admins would.

There was a series of meetings going on about some HR changes, and the company was giving employees a set of webinars from which they could choose one to attend. The events were at four different times during the day. In our call to the search service, we were only getting one of those events. It happened to be the first one added to the calendar; all events had been added over 24 hours before.

When we tried searching for the title of the events in the regular old search box, we still only saw one result. At least that was consistent, and it showed we weren’t doing something stupid in our REST call. I’ve had to blur a number of things out in this screenshot, but here’s what we saw on the search results page. In this case, the results were coming to us in /_layouts/15/osssearchresults.aspx for the particular subsite where the events were, but it didn’t matter if we tried using the search center.

I included Mikael Svenson (@mikaelsvenson) in my email to my Product Group friends because if there’s something about search Mikael doesn’t know, it isn’t worth knowing. I probably should have just asked him in the first place, but we truly believed we had found a bug.

Mikael spotted that all four of the events we expected to retrieve had the same title. This isn’t so unusual: a couple of meetings on a given day with the same title. Maybe we have five interview slots set up for a new candidate, or have several different times when the Red Cross is running a blood drive on the same day, or exactly the example we have here.

We probably should have realized something was wrong when we looked at the search results above. As you can see below – now that I’ve highlighted it – there were three plus one items shown in the histogram for the Modified date.

It turned out that because the four events were so similar, search was considering them duplicates. Of course they weren’t duplicates to us: each is a unique event with its own value to end users.
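To get a feel for why near-identical items can look like duplicates, here is a rough illustration of shingling: break each text into overlapping word sequences and measure how much the two sets overlap. This is only a sketch of the general idea, not SharePoint’s actual duplicate detection, and the event text, shingle size, and function names are all made up for the example.

```javascript
// Illustration only: word-level shingling with Jaccard similarity.
// This is NOT how SharePoint search is implemented internally.
function shingles(text, size) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const set = new Set();
  // Each shingle is a run of `size` consecutive words.
  for (let i = 0; i + size <= words.length; i++) {
    set.add(words.slice(i, i + size).join(" "));
  }
  return set;
}

// Jaccard similarity: shared shingles divided by all distinct shingles.
function jaccard(a, b) {
  let shared = 0;
  a.forEach((s) => {
    if (b.has(s)) shared++;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Two hypothetical events which differ only in their time.
const morning = shingles("HR Changes Webinar for all employees 9:00 AM", 2);
const afternoon = shingles("HR Changes Webinar for all employees 1:00 PM", 2);
console.log(jaccard(morning, afternoon));
```

The more of the text two items share, the closer the score gets to 1, which is roughly the situation our four same-titled events were in.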

By adding trimduplicates=false to our REST call, we were able to retrieve all of the events we expected. It was a very simple fix, but given the black art of SharePoint search, not necessarily an obvious one. Perhaps we should have known better, but I don’t think this is an unusual problem. Add to that the fact that the standard SharePoint search results UI doesn’t give you any way to see the duplicates, and I think there is a significant issue.
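As a minimal sketch of that fix, the helper below forces `trimduplicates=false` onto an existing search REST URL. The function name `withoutDuplicateTrimming` and the sample URLs are hypothetical; only the `trimduplicates` parameter itself comes from the search API.

```javascript
// Sketch: make sure a search REST URL asks for untrimmed results.
function withoutDuplicateTrimming(searchUrl) {
  const q = searchUrl.indexOf("?");
  const base = q === -1 ? searchUrl : searchUrl.slice(0, q);
  const query = q === -1 ? "" : searchUrl.slice(q + 1);

  // Drop any existing trimduplicates parameter so we don't send it twice.
  const params = query
    .split("&")
    .filter((p) => p !== "" && !/^trimduplicates=/i.test(p));

  params.push("trimduplicates=false");
  return base + "?" + params.join("&");
}

// Hypothetical URLs, for illustration only.
const untrimmed = withoutDuplicateTrimming(
  "https://contoso.sharepoint.com/_api/search/query?querytext='ContentTypeId:0x0102*'&rowlimit=50"
);
console.log(untrimmed);
```

Replacing an existing `trimduplicates=true` rather than appending a second copy keeps the URL unambiguous if some other code has already set the parameter.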

I’ve made a suggestion on the SharePoint UserVoice: “Allow easier management of the trimduplicates setting.”

When we search for content, we often (some might say “usually”) need to see all results for our search query. It seems that by default, trimduplicates is set to true in SharePoint Online. This seems to be true in the search Web Parts and in the search API.

My suggestion is that we have far better and clearer ways to manage this setting, including:
* An easy toggle in the Content Search Web Part (CSWP)
* A clear way on the search results pages to choose to show duplicates where there are any. This was present in earlier versions of SharePoint, and I’m not sure when it was removed.

Deciding when duplicates are appropriate is a complex thing, and it varies greatly by use case. Giving people setting up SharePoint pages simpler control over the setting will both help build compelling user experiences on the platform and help confirm that search is indeed working properly. When someone searches for something they know is there and it doesn’t show up in search results, it undermines faith in the entire platform.

If you feel that my ideas have merit, please vote this suggestion up! Suggestions at UserVoice with enough votes truly do get the SharePoint Product Group’s attention.

If you find yourself in a situation like this, there is a tool that can help you work out what is going on. If you do much work with search, check out the SharePoint Search Query Tool on Codeplex. Mikael has contributed to this tool, and it basically allows you to issue REST calls through a UI that “understands” SharePoint search very well.

In the screenshot below I’ve done a search against our Sympraxis tenant using the querytext='test'. That’s certainly nothing fascinating, but it points out a few of the useful aspects of the tool:

  • Simple querytext configuration
  • Easy on/off switches for the various query options; in this case unchecking Trim Duplicates was the winner.
  • An easy way to see the effect of your settings on the REST call on the URL (right at the top of the screen)
  • On the right side, a clear view of the results returned, using a number of useful formats. If you’ve written JavaScript to parse search results, you’ll know that this is really helpful.

As a little bonus, here’s the key JavaScript we use to parse the RelevantResults table from the REST call to the search API. Because we’re requesting items which are all based on the Event Content Type, we can treat all search results the same way. In this example, we’re using AngularJS and jQuery, preparing the data for use with fullcalendar, as I mentioned above.
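A stripped-down sketch of that kind of parsing, in plain JavaScript rather than AngularJS and jQuery, might look like the following. It assumes the verbose JSON shape of the search response (`d.query.PrimaryQueryResult.RelevantResults`), and the managed property names `EventDateOWSDATE` and `EndDateOWSDATE` are assumptions; substitute whatever you requested in `selectproperties`. The sample data is hand-made for the example.

```javascript
// Sketch: turn each row of the RelevantResults table into a plain object
// keyed by managed property name. Assumes the odata=verbose response shape.
function rowsToObjects(data) {
  const rows = data.d.query.PrimaryQueryResult.RelevantResults.Table.Rows.results;
  return rows.map((row) => {
    const item = {};
    row.Cells.results.forEach((cell) => {
      item[cell.Key] = cell.Value;
    });
    return item;
  });
}

// Map the flattened rows to fullcalendar-style event objects.
// EventDateOWSDATE and EndDateOWSDATE are assumed property names.
function toCalendarEvents(data) {
  return rowsToObjects(data).map((r) => ({
    title: r.Title,
    start: r.EventDateOWSDATE,
    end: r.EndDateOWSDATE,
    url: r.Path,
  }));
}

// Hand-made sample response with two same-titled events.
function sampleRow(title, start, end, path) {
  return {
    Cells: {
      results: [
        { Key: "Title", Value: title },
        { Key: "EventDateOWSDATE", Value: start },
        { Key: "EndDateOWSDATE", Value: end },
        { Key: "Path", Value: path },
      ],
    },
  };
}

const sample = {
  d: { query: { PrimaryQueryResult: { RelevantResults: { Table: { Rows: { results: [
    sampleRow("HR Webinar", "2016-03-01T14:00:00Z", "2016-03-01T15:00:00Z", "https://contoso.sharepoint.com/hr/1"),
    sampleRow("HR Webinar", "2016-03-01T18:00:00Z", "2016-03-01T19:00:00Z", "https://contoso.sharepoint.com/hr/2"),
  ] } } } } } },
};

console.log(toCalendarEvents(sample));
```

Flattening the Key/Value cells first keeps the mapping step readable, and it means the same helper works for any set of selected properties.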

Hopefully this post gives you a little more insight into the inner workings of SharePoint search. To me, these little eddies and backwaters of search are what turn it into a black art. I’d love to see things get even simpler than the so-called “Quick Mode” in CSWPs.

Thanks again to Mikael and the Product Group folks who engaged on this with me. Notice that this is Part 1 – I expect to add more entries to this series.

References

Mikael pointed me to these two articles on duplicates and “shingling” in case you’d like to understand the underlying principles more fully.

Mikael’s post about duplicate trimming

SharePoint Search Query Tool

SharePoint UserVoice suggestion:

11 Comments

  1. Never thought this word would come from my mouth… Typescript….
    It is a good thing you are doing “rogue” JavaScript…
    you declare searchResult as an Array and then treat it as an Object..
    searchResult.fRecurrence !== "0" ? true : false
    is wasting CPU cycles

    • @Danny:

      Ah, the danger of letting people see your code. There’s always something “wrong” with it. I’ll take your comments as “feedback” and adjust accordingly. I do admit that sometimes I write things that help with readability, even if it “wast[es] CPU cycles”, which in this case will have virtually no effect.

      M.

  2. Marc
    Great article – another issue solved in my ongoing struggle with the brilliance/frustration of search.
    Whenever working on an intranet-based system (i.e. where anonymous access is not an issue) and looking for the return of a modest set of items, such as in your example, what made you persevere with the search-based solution over simply hitting a DVWP or CQWP, or even using the “other” REST API endpoints (/_api/lists)? I know (for reasons you outlined in the article) that we should be using the search index, but often calendar event-based data is needed now, not when a crawl has finally caught up, and it sure is easier to construct a query!

    • @Adrian:

      Using a DVWP wouldn’t make sense, as the items are coming from across a number of subsites, where the number will change. Similar answer for using /_api/lists. (In the old days, I would definitely have made either of these approaches work [SOAP, not REST], but it was ugly.) CQWPs are old tech and pretty damned inefficient; I wouldn’t be surprised to see them deprecated at some point.

      The issue is the definition of “now”. For events which are scheduled ahead of time, the delay in indexing shouldn’t pose too much of a problem – assuming the crawls are running as they should. I usually see things show up in indexes within two hours. If it takes longer, you might have an issue. It would be nice if Microsoft gave us a “search crawl SLA”. Because it’s been spotty in the past, we automatically suspect it of not working.

      M.

  3. I tried to find your entry on User Voice, but didn’t have any luck. Might it be security trimmed? I do not have a login. I searched for “trim” and “search”

      • Must be my browser or company filters. I clicked the link previously and couldn’t get there. I clicked it from home just now and it opened. That’s strange, but the “Reply” link under your post also doesn’t work now. Anyway, great post. Thanks!
