Predictive Indexing Comes to Office 365 Lists and Libraries

The 5000-item view threshold in SharePoint lists and libraries has been a bane for many people for a long time. I’ve been very vocal about it for the community (as well as for my own selfish reasons – I have client work to do!). I have a plethora of posts tagged with 5000 item limit, most of which aim to help us work around it, though I admit there is a fair amount of just plain complaining, too.

SharePoint Online is finally coming into the modern age, with Microsoft firmly attacking the 5000 item limit in several new ways. In the Microsoft Ignite session Harness collective knowledge with SharePoint and OneDrive content services (ECM), Chris McNulty (@cmcnulty2000) and Ian Story (@storyid) talked through how predictive indexing will improve our overall SharePoint experience.

We’ve all seen the rhetoric here: for years lists and libraries have been able to contain up to 30 million items, yet in too many cases we see errors when we have 5000 or more items in a view. In the “modern” experience, that may mean a #sadicecreamcone.

Or a #sadflattire.

By using predictive indexing – basically, adding indices as the platform anticipates you’ll need them, in addition to any you add manually – the real limits of list views become much larger.

Predictive indexing rolled out in advance of Ignite, so you may already have noticed some improvements when you work with lists with over 5000 items – as long as you are in the “modern” experience. As you add grouping to views or slice and dice list content using the column headers, SharePoint may be adding indices behind the scenes for you. You can see this happening if you look at the list settings and go to Indexed columns. There, you may see that some indices have been added by the platform, not by you.

One of the side benefits of predictive indexing – which to many will feel like a main benefit – is that the Filters Pane is now “smarter”, too. As those indices are added, the platform realizes that those columns may be things you’d like to see in the Filters Pane.

So are we all set now? Well, predictive indexing isn’t a panacea, but it should reduce the number of throttling events in many real-world situations. We still have to consider the 5000 item limit in several places, unfortunately. And taking proactive steps with our own indices is never a bad idea – here’s one way to do it.
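
As a minimal sketch – assuming a classic page where jQuery, _spPageContextInfo, and the __REQUESTDIGEST field are available, and using hypothetical “BigList” and “City” names – you can flip a column’s Indexed property yourself via REST:

// Mark an existing column as indexed via REST. "BigList" and "City" are
// hypothetical names for this sketch. Note this only works while the list
// is still small enough that adding an index is allowed.
$.ajax({
  url: _spPageContextInfo.webAbsoluteUrl +
    "/_api/web/lists/getbytitle('BigList')/fields/getbytitle('City')",
  type: "POST",
  data: JSON.stringify({
    "__metadata": { "type": "SP.Field" },
    "Indexed": true
  }),
  headers: {
    "Accept": "application/json; odata=verbose",
    "Content-Type": "application/json; odata=verbose",
    "X-RequestDigest": $("#__REQUESTDIGEST").val(),
    "X-HTTP-Method": "MERGE" // update the existing field in place
  }
});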

First, predictive indexing doesn’t change the threshold for REST calls. If we want to get more than 5000 items from a list or library via REST, we still need to page through them or come up with some other fancy caching scheme. I’d love to be able to tell a better story than “the 5000 item limit is still in effect for REST calls”. As has been the case for a long time now, if we use indexed columns as our first filters, we can get past the first 5000 items. The first filter still needs to bring the count under 5000 items for us to avoid an error, and we can’t retrieve more than 5000 items in a single call.
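
Here’s a minimal sketch of what that paging looks like – assuming jQuery, a classic page where _spPageContextInfo is available, and a hypothetical list named “BigList”. Each request returns at most 5000 items, and we follow the __next link until there’s nothing left:

var allItems = [];

// Fetch one page of items, then recurse if the response says there are more
function getPage(url) {
  return $.ajax({
    url: url,
    headers: { "Accept": "application/json; odata=verbose" }
  }).then(function(data) {
    allItems = allItems.concat(data.d.results);
    // __next is only present when there are more pages to fetch
    if (data.d.__next) {
      return getPage(data.d.__next);
    }
    return allItems;
  });
}

getPage(_spPageContextInfo.webAbsoluteUrl +
  "/_api/web/lists/getbytitle('BigList')/items?$top=5000")
  .then(function(items) {
    console.log("Retrieved " + items.length + " items");
  });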

If you create a list or library and immediately dump more than 20,000 items into it, you’ll run into some of the same issues we used to have at 5000 items. We can create indices on lists with up to 20,000 items, but past that point we hit the familiar wall: we can’t add a new index without first getting below the 20,000 item number.

This also means Microsoft Flow or PowerApps may run into the 5000 item limit, depending on what you are doing (making REST calls yourself, for sure), as they use REST endpoints in the Microsoft Graph to retrieve data. The average power user building a PowerApp will see batches of 100 (but may never realize it). Someone taking things further (like reading from another list via a connector) might see batches of 500. In other words, if you are working with large lists, you need to understand the concept of paging on some level.

So what’s the upshot here? Well, primarily that Microsoft is working hard to make the constraints on large lists go away, or at least raise the limits to much higher levels. From an end user standpoint, this is Very Good News. From a developer standpoint, I think we should view it as a work in progress. Overall, though, it shows that Microsoft knows this is a weakness in the product and they are working hard to fix it.

I’m VERY happy to see this progress, and based on conversations with people like Chris and others in the Product Group I expect to see continuous improvement. Don’t worry – I’ll keep pushing them on it mercilessly, though!

Deleting a Very Large SharePoint List

I’ve complained about the 5000 item list threshold limit so many times, I’ve lost count. If you don’t believe me, you could read my posts here and here and here and here.

Today’s difficulty was caused by needing to delete a Very Large List. This isn’t just a Large List – it’s Very Large. The number of items is way over 5000, at almost 300,000. This list is part of a well-used application I built for a client about 3 years ago in SharePoint 2007. It’s been ticking away just fine, as there was no list threshold in 2007.

I’m migrating the application to SharePoint Online, so I’m totally rewriting it for several reasons. I copied across the list “shell” with Sharegate and brought along a few hundred items in the big list just for testing and building purposes.

But now I need to test “at volume”, so I brought across all 300k items. Unfortunately, I forgot to add an index to one of the columns where I need it to make some queries work. I figured “no big deal”. I’d just delete the list and start again.

Not so fast! When I tried to delete the list, I got a list view threshold error.

I’m trying to DELETE the list, so what gives? Well, I found a post from Mike Smith (@TechTrainNotes) entitled SharePoint 2016: List View Threshold Limit to Delete a List is 99,993 Items??? which pretty much told me why this wasn’t working. It’s crazy, but you can’t delete a list or library which has more than 100k objects in it. (Read Mike’s post to see where he ran into a few wrinkles on this.) One would think that deleting a list would have nothing to do with any thresholds, but one would be wrong.

I tried a number of things – deleting the list in SharePoint Designer, writing some code to delete items one by one, etc. – but either my attempts didn’t work or they were taking forever.
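
For what it’s worth, the “delete items one by one” approach looks roughly like this sketch (hypothetical “BigList” name; jQuery, _spPageContextInfo, and the __REQUESTDIGEST field on a classic page assumed). One HTTP round trip per item is exactly why it crawls on a list this size:

// Delete a single item by ID via REST – multiply by ~300k to see the problem
function deleteItem(itemId) {
  return $.ajax({
    url: _spPageContextInfo.webAbsoluteUrl +
      "/_api/web/lists/getbytitle('BigList')/items(" + itemId + ")",
    type: "POST",
    headers: {
      "Accept": "application/json; odata=verbose",
      "X-RequestDigest": $("#__REQUESTDIGEST").val(),
      "IF-MATCH": "*",          // delete regardless of the item's version
      "X-HTTP-Method": "DELETE" // tunnel the DELETE through a POST
    }
  });
}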

Luckily, I have no pride and I complained on Twitter. Kelvin Hoyle (@kelvin_hoyle) came to the rescue!

By connecting to the list with Microsoft Access, I was able to select all the items and hit delete. Yes, this is taking a while as well, but it’s clearly faster.

The steps to do this are:

  • Open Access – I’m using Access 2016 – and create a blank database.
  • In the ribbon, go to External Data and in the Import & Link section, click on the More dropdown.
  • Choose SharePoint List.
  • Provide the URL to the SharePoint site which houses the list.
  • Be sure to leave the default radio button selected, creating a link between Access and the SharePoint list.
  • Click Next, and choose the list you want to work with.
  • Click OK.

Access will set up a new table which is linked to your list. You can open it like any other table in Access. Open the table and – voila – you can select any items you’d like to edit or delete.

Yet another workaround for something which I think shouldn’t be a limitation in the modern age. Sigh.

Addendum

It took overnight, but my list was finally whittled down below the 100k object level and I was able to delete it. Innovation: 1. Productivity: 0.

After I posted this, I heard from another corner of the Twittersphere. In SharePoint, no one can hear you scream, but there are usually at least several ways to do anything.

Ryan Yeats’ Client-side SharePoint PowerShell on CodePlex looks like another good option. He tweeted some more details and let me know there’s a function called Clear-ListContent which does the heavy work. I’ve always avoided CSOM, and I’ve spent enough time on this today, but from a quick look, Ryan’s library looks useful here.

Getting Around SharePoint’s Threshold Limits for Large Reference Lists

In SharePoint 2007, we could build lists that contained as many items as we wanted and access them in any way we wanted. We may have done stupid things that were bad for server performance, but if we knew what we were doing there were few limits. We had the mythical “2000 item” rule of thumb (which had little basis in reality in many, many cases) but otherwise it was up to us. (See: Plan for software boundaries (Office SharePoint Server))

In SharePoint 2010, Microsoft introduced threshold limits to protect us from ourselves. This seemed counterintuitive to me, since with the upgrade to SharePoint 2010, one also had to go to 64 bit architecture and put a heck of a lot more iron behind the SharePoint farm. While we could potentially store 30,000,000 items in a list (30 million!), we had a list view threshold of 5000 items. (See: SharePoint Server 2010 capacity management: Software boundaries and limits)

SharePoint 2013 maintains similar limits to 2010, with 5000 items the limit for a list view. (See: Software boundaries and limits for SharePoint 2013)

Somehow, as technology has moved forward – significantly – we’ve got more limits.

5000 items is way too many to ever show in a list view, but it may not be too many if you want to do some client-side processing. It doesn’t matter whether you’re trying to build a truly bloated, Bad Idea list view or request data using REST: you’re stuck with that 5000 item limit.

But what if you know what you are doing and the 5000 item limit doesn’t work for you? I’ve written in the past about why we shouldn’t change some of the threshold limits (e.g., The SharePoint 2010 “List View Lookup Threshold” and Why We Don’t Change It).

If we’re working on Office365, we simply can’t change the limits – Microsoft gets to decide what they are. Sometimes those limits change without notice, too, so it’s truly out of our hands.

Well, what does all this add up to? Let’s take a specific example.

I needed to provide auto complete functionality in an InfoPath form. It was the classic scenario: we had a list of [in this case] 19,000+ cities and towns and we needed a good UX for entering selections into the form. We run into this sort of thing with data like cities and towns all the time. There are too many choices for a dropdown to make any sense, and SharePoint doesn’t offer up any better alternatives. jQueryUI’s autocomplete function is a perfect solution for this sort of thing.

The data was in a spreadsheet, and it’s easy to upload data from a spreadsheet to a SharePoint list. Bingo, bango, done, right? Well, no. It’s impossible to upload data from a spreadsheet *and* index the column. We need to index the column so that we can make the type of requests we need for autocomplete – basically, “give me all of the items that begin with or contain this string” – or else we run into the 5000 item threshold error on our requests.
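
For reference, the request autocomplete needs looks something like this sketch (hypothetical “CitiesAndTowns” list name; jQuery and _spPageContextInfo assumed). Without an index on the filtered column, a query like this fails against a 19,000+ item list even though it only asks for 10 results:

// "Give me the first few items that begin with this string" – the shape of
// request autocomplete relies on. This throws a threshold error on a large
// list unless the Title column is indexed.
$.ajax({
  url: _spPageContextInfo.webAbsoluteUrl +
    "/_api/web/lists/getbytitle('CitiesAndTowns')/items" +
    "?$filter=startswith(Title,'Bos')&$top=10",
  headers: { "Accept": "application/json; odata=verbose" }
}).done(function(data) {
  console.log(data.d.results); // the matching items
});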

No problem! We’ll just upload the data and *then* index the column. Not so fast, mister. When you try to index the column you run into – you guessed it – the 5000 item limit. No indexee, no laundry.

So seemingly we’re stuck. But wait – we can just create the list in the UI, add the index, and paste all 19,000 items into the Brave New World of SharePoint 2013’s “data sheet view” aka “Quick Edit”. Sadly, when I tried to paste in more than a few hundred items, the browser hung on me. This was on Office365, and I tried several different browsers. SharePoint Online seems to prefer nibbles to big bites of data. Unless I spent days just doing the pasting, this was never going to happen.

In this case, I decided to simply store the data in a text file containing JSON rather than trying to force it into a list. Thanks to @bpmccullough for the suggestion on Twitter.

The data I originally had received for the cities and towns was in a CSV file. I found a nice little site that would convert the CSV format to JSON (bookmark this one; you’ll need it in this JSON-oriented world).
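
The converted file ends up as a plain array of objects – something like this made-up sample, which matches the item.City and item.State references in the code below:

[
  { "City": "Abington", "State": "MA" },
  { "City": "Boston", "State": "MA" },
  { "City": "Chatham", "State": "MA" }
]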

Once I had the JSON file uploaded to a Document Library, the code below is what I used to set up the autocomplete. We wanted the autocomplete to allow multiple choices, so there’s a little monkeying around to enable that. It wasn’t tremendously peppy (19,000 cities, after all), but it worked. Remember, kids: this is an example. Don’t expect to drop it into your page and have it work.

// Requires jQuery and jQuery UI (with the autocomplete widget) on the page
var citiesAndStates = [];

// Get the City and State data from the file with JSON data
$.getJSON("/Shared%20Documents/CityState.txt", function(data) {
  var results = $(data);
  // Push the data into an array which is appropriate for jQueryUI autocomplete
  citiesAndStates = ($.map(results, function(item) {
    return {
      label: item.State + " - " + item.City, // Show "State - City" [e.g., MA - Boston] for selection
      value: item.City + ", " + item.State // Show "City, State" [e.g., Boston, MA] upon selection
    }
  }));
});

// When the field is available and gets focus, set up the autocomplete behavior
//ctl00_ctl42_g_6069846d_1b7f_4890_b767_2bdc15d2b133_FormControl0_V1_I1_S15_I4_T5
$(document).on("focus", "input[id$='FormControl0_V1_I1_S15_I4_T5']", function() {
  if ($(this).data("autocompleteSet") === undefined) {
    $(this).data("autocompleteSet", true);

    $(this)
    // Don't navigate away from the field on tab when selecting an item
    .bind("keydown", function(event) {
      if (event.keyCode === $.ui.keyCode.TAB && $(this).data("ui-autocomplete").menu.active) {
        event.preventDefault();
      }
    }).autocomplete({
      source: function(request, response) {
        // Delegate back to autocomplete, but extract the last term
        response($.ui.autocomplete.filter(citiesAndStates, extractLast(request.term)));
      },
      minLength: 3,
      focus: function() {
        // Prevent value inserted on focus
        return false;
      },
      select: function(event, ui) {
        var terms = split(this.value);
        // Remove the current input
        terms.pop();
        // Add the selected item
        terms.push(ui.item.value);
        // Add placeholder to get the semicolon-and-space at the end
        terms.push("");
        this.value = terms.join("; ");
        return false;
      }
    });
  }
});

function split(val) {
  return val.split(/;\s*/);
}

function extractLast(term) {
  return split(term).pop();
}

One note here: This approach basically violates my “put all data into a list so that end users can maintain it” rule. However, the approach I came up with was the best one for the situation. It’s still possible for a reasonably savvy (meaning “non-developer”) user to manage the data by downloading the file and using a JSON-aware editor.

The main thing was that we were able to set up something that worked, regardless of what the Office365 limits may change to.