Let’s Capture Missing or Insufficient SharePoint REST Endpoints

Today I got an alert that the SharePoint UserVoice suggestion from Corey Roth (@coreyroth) entitled Add managed metadata term store operations to REST API got the coveted “Thinking About It” tag from the Product Group. I like to tweet out changes like this to let people know the Product Group is listening and acting on our feedback – beyond saying “That’s good feedback!” It’s not all wine and roses, though:

Thank you for your feedback! Just letting you know that we absolutely have this in our backlog, but unfortunately this currently is not included in our short term engineering tasks. We absolutely understand the request and seeing vote counts around this, will help to further prioritize this work for next sprints.

I got a couple of tweets back right away pointing out some other current holes in the REST APIs.

If you think there are other endpoints the REST APIs need or endpoints that don’t work well, please add them to the comments here. I’ll work them up into a list for the Product Group and let’s see what we can get moving! We’ll play by the rules and add the list to UserVoice, but I think all the individual suggestions get lost and it’s harder to see the bigger picture. For each item on the list, I’ve tried to capture related UserVoice suggestions.

The list so far:

SharePoint Online Search Isn’t Displaying What I Expect – Part 1 – Trimmed Duplicates


For years, I’ve been skeptical about search indexing in SharePoint, especially in SharePoint Online in Office 365. The fact that we can’t know when a search crawl has run – thus updating the indices – is a huge part of the problem. In the early days, before Content Search Web Parts (CSWPs) were available in SharePoint Online, we routinely saw delays of days or even weeks between content creation and that content showing up in search results. Later the CSWP was enabled on SharePoint Online, and it is a fantastically powerful tool, far better than the Content Query Web Part (CQWP) which it nominally replaced.

But the value of any search-driven mechanism in SharePoint is directly tied to the recency and frequency of updates to the search index. While the CQWP is quite inefficient – since it actually goes out to look for content at the source every time it runs (though there may be some caching) – the CSWP uses the search index and can thus return results using fewer server resources in some cases. (One downside is that you can only retrieve up to 50 results with the CSWP.) Since we don’t know when the search crawls run in SharePoint Online, and we often don’t see the results we expect, we tend to blame indexing for the problem.

There are many things that can contribute to the indexing issues. Load on the indexing servers can mean that your tenant isn’t crawled as frequently as you might want – taking hours or in the worst cases days to display items you know should be there. Unfortunately, there is no way to know if the index is the issue. Based on what I’ve heard, there are usually multiple indexing servers per tenant, and those indices can supposedly get out of sync. Search is also a very complex beast: probably way too complicated for most use cases, as it is based on the old FAST search platform. Most people simply want to be able to see content they add to a SharePoint list or library right away if they search for it. Period. So simple, yet often not what happens.

The other day, Julie and I were certain we had an ironclad instance where search indexing simply wasn’t working correctly. The scenario seems to be a very common one, and we stripped it down to as minimal an example as possible and sent it off to the Product Group.

In the example, we were making a REST call to the search API:

to retrieve all of the events in calendars across an Intranet. The use case is a very common one: we have a calendar per department, office, etc., and in some cases we’d like to promote those calendar events to display on the home page of the Intranet. Put aside the governance questions here, and just assume that everyone who can create events gets to decide whether to check the Show on the home page box. To make this all work, we have a few custom content types which inherit from Event with a few extra columns. We have some nice, fancy display mechanisms on the home page using AngularJS and on a Company Calendar page using fullcalendar. But most of that doesn’t matter: we were seeing the issue in the call to the search API.

Our query looked something like this:

This query will return all (we thought!) list items which inherit from the Event Content Type, because its ContentTypeId is 0x0102. Of course, our actual query was more complicated: we requested specific fields with selectproperties, asked for more items by setting the rowlimit to 50, etc. But again, at the core we were simply asking for a bunch of events.
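As a rough sketch of the kind of request described above (not the exact query from this post), a search REST URL for everything inheriting from Event might be assembled like this. The helper name `buildEventSearchUrl`, the `contoso` site URL, and the selected properties are all assumptions for illustration; only `querytext`, `selectproperties`, `rowlimit` and the 0x0102 ContentTypeId come from the text.

```javascript
// Hypothetical helper: builds a search REST URL for items whose content type
// inherits from Event (ContentTypeId starts with 0x0102).
function buildEventSearchUrl(siteUrl, options) {
  var opts = options || {};
  // The trailing * is a prefix match, so child content types are included.
  var params = ["querytext='ContentTypeId:0x0102*'"];
  if (opts.selectProperties && opts.selectProperties.length) {
    params.push("selectproperties='" + opts.selectProperties.join(",") + "'");
  }
  if (opts.rowLimit) {
    params.push("rowlimit=" + opts.rowLimit);
  }
  return siteUrl + "/_api/search/query?" + params.join("&");
}

// Example usage with an assumed tenant URL and assumed property names.
var eventSearchUrl = buildEventSearchUrl("https://contoso.sharepoint.com", {
  selectProperties: ["Title", "Path"],
  rowLimit: 50
});
```

The URL is left unencoded here for readability; depending on how you issue the request, you may need to encode the query text yourself.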

But we weren’t seeing all the events we expected. We assumed that the search index wasn’t working correctly, just like most site admins would.

There was a series of meetings going on about some HR changes, and the company was giving employees a set of webinars from which they could choose one to attend. The events were at four different times during the day. In our call to the search service, we were only getting one of those events. It happened to be the first one added to the calendar; all events had been added over 24 hours before.

When we tried searching for the title of the events in the regular old search box, we still only saw one result. At least that was consistent, and it showed we weren’t doing something stupid in our REST call. I’ve had to blur a number of things out in this screenshot, but here’s what we saw on the search results page. In this case, the results were coming to us in /_layouts/15/osssearchresults.aspx for the particular subsite where the events were, but it didn’t matter if we tried using the search center.

I included Mikael Svenson (@mikaelsvenson) in my email to my Product Group friends because if there’s something about search Mikael doesn’t know, it isn’t worth knowing. I probably should have just asked him in the first place, but we truly believed we had found a bug.

Mikael spotted that all four of the events we expected to retrieve had the same title. This isn’t so unusual: a couple of meetings on a given day with the same title. Maybe we have five interview slots set up for a new candidate, or have several different times when the Red Cross is running a blood drive on the same day, or exactly the example we have here.

We probably should have realized something was wrong when we looked at the search results above. As you can see below – now that I’ve highlighted it – there were three plus one items shown in the histogram for the Modified date.

It turned out that because the four events were so similar, search was considering them duplicates. Of course they weren’t duplicates to us: each is a unique event with its own value to end users.

By adding trimduplicates=false to our REST call, we were able to retrieve all of the events we expected. It was a very simple fix, but given the black art of SharePoint search, not necessarily an obvious one. Perhaps we should have known better, but I don’t think this is an unusual problem. Add to that the fact that the standard SharePoint search results UI doesn’t give you any way to see the duplicates, and I think there is a significant issue.
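To make the change concrete, here is a minimal sketch of a helper that adds the setting to an existing search REST URL. The function name `disableDuplicateTrimming` and the example URL are assumptions; the `trimduplicates=false` parameter is the one described above.

```javascript
// Hypothetical helper: ensures a search REST URL asks for untrimmed results.
function disableDuplicateTrimming(searchUrl) {
  var existing = /([?&])trimduplicates=[^&]*/i;
  // If the parameter is already present, overwrite its value.
  if (existing.test(searchUrl)) {
    return searchUrl.replace(existing, "$1trimduplicates=false");
  }
  // Otherwise append it, using ? or & depending on what the URL already has.
  var separator = searchUrl.indexOf("?") === -1 ? "?" : "&";
  return searchUrl + separator + "trimduplicates=false";
}

// Example usage with an assumed tenant URL.
var untrimmedUrl = disableDuplicateTrimming(
  "https://contoso.sharepoint.com/_api/search/query?querytext='ContentTypeId:0x0102*'"
);
```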

I’ve made a suggestion on the SharePoint UserVoice to Allow easier management of the trimduplicates setting.

When we search for content, we often (some might say “usually”) need to see all results for our search query. It seems that by default, trimduplicates is set to true in SharePoint Online. This seems to be true in the search Web Parts and in the search API.

My suggestion is that we have far better and clearer ways to manage this setting, including:
* An easy toggle in the Content Search Web Part (CSWP)
* A clear way on the search results pages to choose to show duplicates where there are any. This was present in earlier versions of SharePoint, and I’m not sure when it was removed.

Deciding when duplicates are appropriate is a complex thing, and it varies greatly by use case. Giving people setting up SharePoint pages simpler control over the setting will both help build compelling user experiences on the platform and help confirm that search is indeed working properly. When someone searches for something they know is there and it doesn’t show up in search results, it undermines faith in the entire platform.

If you feel that my ideas have merit, please vote this suggestion up! Suggestions at UserVoice with enough votes truly do get the SharePoint Product Group’s attention.

If you find yourself in a situation like this, there is a tool that can help you diagnose what is going on. If you do much work with search, check out the SharePoint Search Query Tool on CodePlex. Mikael has contributed to this tool, and it basically allows you to issue REST calls through a UI that “understands” SharePoint search very well.

In the screenshot below I’ve done a search against our Sympraxis tenant using the querytext='test'. That’s certainly nothing fascinating, but it points out a few of the useful aspects of the tool:

  • Simple querytext configuration
  • Easy on/off switches for the various query options; in this case unchecking Trim Duplicates was the winner.
  • An easy way to see the effect of your settings on the REST call on the URL (right at the top of the screen)
  • On the right side, a clear view of the results returned, using a number of useful formats. If you’ve written JavaScript to parse search results, you’ll know that this is really helpful.

As a little bonus, here’s the key JavaScript we use to parse the RelevantResults table from the REST call to the search API. Because we’re requesting items which are all based on the Event Content Type, we can treat all search results the same way. In this example, we’re using AngularJS and jQuery, preparing the data for use with fullcalendar, as I mentioned above.
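A minimal sketch of that kind of parsing, written in plain JavaScript rather than with AngularJS or jQuery, and not the exact code from this post. It assumes the response was requested with the verbose OData JSON format, where each row is a list of Key/Value cells. The function names and the managed property names `EventDate` and `EndDate` are assumptions; your tenant's mapped properties may well be named differently.

```javascript
// Flattens the RelevantResults table into one plain object per result,
// keyed by managed property name.
function parseRelevantResults(data) {
  var rows = data.d.query.PrimaryQueryResult.RelevantResults.Table.Rows.results;
  return rows.map(function (row) {
    var item = {};
    row.Cells.results.forEach(function (cell) {
      item[cell.Key] = cell.Value;
    });
    return item;
  });
}

// Maps a flattened result to the basic event shape fullcalendar accepts.
// EventDate and EndDate are assumed managed property names.
function toCalendarEvent(item) {
  return {
    title: item.Title,
    start: item.EventDate,
    end: item.EndDate,
    url: item.Path
  };
}
```

Because every result is based on the Event Content Type, the same mapping can be applied to all rows, for example `parseRelevantResults(data).map(toCalendarEvent)`.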

Hopefully this post gives you a little more insight into the inner workings of SharePoint search. To me, these little eddies and backwaters of search are what turn it into a black art. I’d love to see things get even simpler than the so-called “Quick Mode” in CSWPs.

Thanks again to Mikael and the Product Group folks who engaged on this with me. Notice that this is Part 1 – I expect to add more entries to this series.

References

Mikael pointed me to these two articles on duplicates and “shingling” in case you’d like to understand the underlying principles more fully.

Mikael’s post about duplicate trimming

SharePoint Search Query Tool

SharePoint UserVoice suggestion:

Dear Microsoft: Please Fix Retrieving SharePoint Lookup Columns with REST When the Lookup List is in Another Web

I love SharePoint. I really do. I especially love writing client side code to build awesome applications for my clients.

Today’s annoyance, though, comes while I am in the process of rewriting an application I built on SharePoint 2007, porting it to SharePoint Online in Office 365. This ought to feel like a huge leap forward technologically, and in some ways it does. I’m changing all my SOAP calls with SPServices to REST calls. I’m switching from KnockoutJS to AngularJS, which will simply perform better given the profile of the applications. (KnockoutJS was the right choice years ago when I first built the applications, but the data and feature requirements have outgrown it.)

Unfortunately, I’m running into a simple constraint that makes my life a lot harder. When I first started building these applications five years ago, I created what I’ve got to say is a very solid information architecture. It’s withstood shifting needs and requirements in the interim, and I stand by it. One of the aspects of this good information architecture is storing commonly used reference lists in the root site of the Site Collection. By creating a Site Column which is a lookup into each reference list, I can reuse those common reference values throughout my subsites.

This works great in SharePoint 2007 with SOAP calls. When I retrieve items with one of these lookup Site Columns from a list in a subsite, I simply get the ID and Title values, separated by a “;#”. However, when I try to do the same thing with “modern” REST calls, I get an error like this:

I’ve been a good team player, and I’ve suggested they fix this on the SharePoint UserVoice in my suggestion Enable support for lookup columns in other webs in the REST API. The votes are up, and it’s been a while.

There’s a workaround, but it’s not very pleasant. (The easiest workaround is to simply stick with SOAP calls and SPServices – I’ve done that in several cases in other projects. But SOAP is officially “deprecated”, so…)

Here’s a specific example. The client I’m working with is in financial services, and they issue recommendations on securities. Those recommendations are very standard, and predictable: Hold, Buy, Sell, etc. In other words, perfect to store in a list in the root site called Recommendations. Why not a Managed Metadata column, you might ask? Well, I also wanted to store several other columns in the Recommendations list, like Description (e.g., “The analyst expects the security to outperform their coverage universe.”), a SortOrder value so I could rearrange the values in dropdowns using SPArrangeChoices, and several other fields which drive configuration of some reports. In other words: great information architecture. The values are all consistent across the various subsites, I store them once, etc. Nice setup.

I created a Site Column back in the beginning called Recommendation, which is a lookup into the title column of the Recommendations list (Hold, Buy, Sell, etc.). I used that Site Column in many Content Types defined on the subsite level. Those Content Types are mainly used in a list I’ll call Notes.

In SOAP with SPServices, I can make this [simplified] call:

This retrieves the items and returns nice JSON for me. Because Recommendation is a lookup column, it comes back as something like “1;#Buy” and that’s easy to turn into a JSON object like:
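A minimal sketch of that conversion, not the original code from this post. The function name `parseLookupValue` and the `id` and `title` property names are assumptions.

```javascript
// Splits a SOAP-style lookup value such as "1;#Buy" into its two parts.
// Returns null if the value does not contain the ";#" separator.
function parseLookupValue(raw) {
  var separatorIndex = raw.indexOf(";#");
  if (separatorIndex === -1) {
    return null;
  }
  return {
    id: parseInt(raw.substring(0, separatorIndex), 10),
    // Everything after the first ";#" is the display value.
    title: raw.substring(separatorIndex + 2)
  };
}

var recommendation = parseLookupValue("1;#Buy");
```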

Easy, peasy.

However, when I try the analogous call in REST:

I get the error:

In other words, there’s no way to $expand the Recommendation column because it comes from another Web, even though that is ideal information architecture!

The workaround, which André Lage (@aaclage) pointed out in my UserVoice suggestion (but I clearly didn’t get at the time), is to simply ask for the Recommendation column’s ID instead. This isn’t obvious at all:

This doesn’t follow the syntax we’d expect: we need to append “Id” to the end of the lookup column’s InternalName. Of course, this just gets us the ID of the item in the Recommendations list; it doesn’t fetch us the Title value, which is what we really want. Because of this, I need to do a *separate* REST call to get the items from the Recommendations list and merge the values in my client side code.
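The client-side merge can be sketched roughly as follows. This is an illustration under assumptions, not the code from the application: the helper names are made up, and the URL assumes the list is titled Notes as in the example above. The `RecommendationId` field follows the "append Id to the InternalName" rule just described.

```javascript
// Assumed URL shape: ask for the lookup's Id instead of expanding the lookup.
function buildNotesUrl(webUrl) {
  return webUrl + "/_api/web/lists/getbytitle('Notes')/items" +
    "?$select=Id,Title,RecommendationId";
}

// Merges lookup values into items using the IDs returned by REST.
// items: results from the subsite list; lookupItems: results from the
// reference list in the root site (each with Id and Title).
function mergeLookup(items, lookupItems, idField, targetField) {
  var byId = {};
  lookupItems.forEach(function (lookupItem) {
    byId[lookupItem.Id] = lookupItem;
  });
  return items.map(function (item) {
    var merged = Object.assign({}, item);
    var match = byId[item[idField]];
    // Leave the target null when the ID has no match in the reference list.
    merged[targetField] = match ? { id: match.Id, title: match.Title } : null;
    return merged;
  });
}
```

With a half dozen lookup columns, the same `mergeLookup` call can be repeated once per column after each reference list has been fetched.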

Now, one could argue that this is more efficient. I don’t ask the server to $expand the values across thousands of notes (yes, there are way more than 5000; I’ve written enough about that lately – I may have mentioned it here and here and here and here), so it gets a break. Retrieving the 5-10 values in the reference list (in this case) is no big deal.

But I have a half dozen or so of these lookup columns to deal with in this application, which means a half dozen extra REST calls, plus the code to merge the values. More work for me, but more importantly a longer wait for the application user when they load the page. I believe that poor UX is what has doomed many a SharePoint roll out, and I loathe creating a poor UX myself. In this case, I’ll make it work, but I’d really like to see this change.

Yes, Virginia, You Can Get More than 5000 SharePoint Items with REST

If you haven’t been paying any attention, you might not know that I loathe the 5000 item limit in SharePoint. I may have mentioned it here and here and here and a bunch of other places, but I’m not sure. But given it’s there, we often need to work around it.

No 5000 item limit

I’ve written this little function before, but alas today I needed it again but couldn’t find any previous incarnations. That’s what this blog is for, though: to remind me of what I’ve already done!

In this case, I’m working with AngularJS 1.x. We have several lists that are nearing the 5000 item level, and I don’t want my code to start failing when we get there. We can get as many items as we want if we make repeated calls to the same REST endpoint, watching for the __next link in the results. That __next link tells us that we haven’t gotten all of the items yet, and provides us with the link to get the next batch, based on how we’ve set $top in the request.

Here’s an example. Suppose I want to get all the items from a list which contains all of the ZIP codes in the USA. I just checked, and that’s 27782 items. That’s definitely enough to make SharePoint angry at us, what with that 5000 item limit and all. Let’s not get into an argument about whether I need them all or not. Let’s just agree that I do. It’s an example, after all.

Well, if we set up our requests right, and use my handy-dandy recursive function below, we can get all of the items. First, let’s look at the setup. It should look pretty similar to anything you’ve done in an AngularJS service. I set up the request, and the success and error handlers just like I always do. Note I’m asking for the top 5000 items, using "&$top=5000" in my REST call.

If there are fewer than 5000 items, then we don’t have a problem; the base request would return them all. Line 32 is what would do that “normal” call. Instead, I call my recursive function below, passing in the request only, even though the function can take two more arguments: results and deferred.

The recursive function simply keeps calling itself whenever it sees that the __next attribute of the response is present, signifying there is more data to fetch. It concatenates the data into a single array as it goes. In my example, there would be 6 calls to the REST endpoint because there are 27782 / 5000 = 5.5564 “chunks” of items.
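The recursion can be sketched roughly like this. It is not the AngularJS function from this post: instead of `$http` and a deferred, it uses plain promises and takes an injected `getPage` function (an assumption) that fetches a URL and resolves with the parsed verbose-format response, so the paging logic stands on its own.

```javascript
// Follows __next links until the server stops returning one, concatenating
// each batch of d.results into a single array.
// getPage(url) is an assumed function that returns a promise for the
// parsed JSON response (verbose OData shape: { d: { results, __next } }).
function getAllItems(getPage, url, results) {
  var collected = results || [];
  return getPage(url).then(function (response) {
    var all = collected.concat(response.d.results);
    if (response.d.__next) {
      // More data to fetch: call ourselves with the link to the next batch.
      return getAllItems(getPage, response.d.__next, all);
    }
    return all;
  });
}

// Example call shape (list title and web URL are hypothetical):
// getAllItems(getPage, webUrl + "/_api/web/lists/getbytitle('ZIP Codes')/items?$top=5000")
//   .then(function (items) { /* all items are here */ });
```

The number of calls is the item count divided by the batch size, rounded up, which is where the 6 calls for 27782 items come from.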

Image from https://s-media-cache-ak0.pinimg.com/564x/16/bb/03/16bb034cb3b6b0bdc66d81f47a95a59f.jpg

Image from https://www.pinterest.com/pin/323062973239383839/

NOW, before a bunch of people get all angry at me about this, the Stan Lee rule applies here. If you have tens of thousands of items and you decide to load them all, don’t blame me if it takes a while. All those requests can take a lot of time. This also isn’t just a get out of jail free card. I’m posilutely certain that if we misuse this all over the place, the data police at Microsoft will shut us down.

In actual fact, the multiple calls will be separated by short periods of time to us, which are almost eternities to today’s high-powered servers. In some cases, you might even find that batches of fewer than 5000 items may be *faster* for you.

In any case, don’t just do this willy-nilly. Also understand that my approach here isn’t as great at handling errors as the base case. Feel free to improve on it and post your improvements in the comments!