Predictive Indexing Comes to Office 365 Lists and Libraries

The 5000-item view threshold in SharePoint lists and libraries has long been a bane for many people. I’ve been very vocal about it for the community (as well as for my own selfish reasons – I have client work to do!). I have a plethora of posts tagged with 5000 item limit, most of which aim to help us work around it, though I admit there is a fair amount of just plain complaining, too.

SharePoint Online is finally coming into the modern age, with Microsoft firmly attacking the 5000 item limit in several new ways. In the Microsoft Ignite session Harness collective knowledge with SharePoint and OneDrive content services (ECM), Chris McNulty (@cmcnulty2000) and Ian Story (@storyid) talked through how predictive indexing will improve our overall SharePoint experience.

We’ve all seen the rhetoric here: for years lists and libraries have been able to contain up to 30 million items, yet in too many cases we see errors when we have 5000 or more items in a view. In the “modern” experience, that may mean a #sadicecreamcone.

Or a #sadflattire.

By using predictive indexing – the platform adding indices as it anticipates you’ll need them, in addition to any you add manually – the practical limits of list views become much higher.

Predictive indexing rolled out in advance of Ignite, so you may already have noticed some improvements when you work with lists with over 5000 items – as long as you are in the “modern” experience. As you add grouping to views or slice and dice list content using the column headers, SharePoint may be adding indices behind the scenes for you. You can see this happening if you look at the list settings and go to Indexed columns. There, you may see that some indices have been added by the platform, not by you.
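
If you’d rather check from code, a quick REST call surfaces the same information. This is a minimal sketch – it assumes you’re running on a page that’s already authenticated to your site, and the site URL and list title are placeholders you’d swap for your own:

```typescript
// List every indexed column on a list via the SharePoint REST API.
// Indices added by predictive indexing show up here alongside your own.
const siteUrl = "https://contoso.sharepoint.com/sites/team"; // placeholder
const listTitle = "Documents"; // placeholder

async function getIndexedColumns(): Promise<void> {
  const response = await fetch(
    `${siteUrl}/_api/web/lists/getbytitle('${listTitle}')/fields` +
      `?$filter=Indexed eq true&$select=Title,InternalName`,
    { headers: { Accept: "application/json;odata=nometadata" } }
  );
  const data = await response.json();
  data.value.forEach((f: { Title: string; InternalName: string }) =>
    console.log(`${f.Title} (${f.InternalName})`)
  );
}

getIndexedColumns();
```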

One of the side benefits of predictive indexing – which to many will feel like a main benefit – is that the Filters Pane is now “smarter”, too. As those indices are added, the platform realizes that those columns may be things you’d like to see in the Filters Pane.

So are we all set now? Well, predictive indexing isn’t a panacea, but it should reduce the number of throttling events in many real-world situations. We still have to consider the 5000 item limit in several places, unfortunately. And taking proactive steps with our own indices is never a bad idea.

First, predictive indexing doesn’t change the threshold for REST calls. If we want to get more than 5000 items from a list or library via REST, we still need to page through them or come up with some other fancy caching scheme. I’d love to be able to tell a better story than “the 5000 item limit is still in effect for REST calls”. As has long been the case, if we use indexed columns as our first filters, we can get past the first 5000 items – but that first filter still needs to bring the result set under 5000 items to avoid an error, and we can’t retrieve more than 5000 items in a single call.
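
For what it’s worth, the paging itself isn’t hard – SharePoint hands back a continuation link until the list is exhausted. Here’s a minimal sketch, again assuming an already-authenticated page context, with the site URL and list title as placeholders:

```typescript
// Page through a large list 5000 items at a time by following the
// odata.nextLink continuation URL that the REST API returns.
const siteUrl = "https://contoso.sharepoint.com/sites/team"; // placeholder
const listTitle = "BigList"; // placeholder

async function getAllItems(): Promise<any[]> {
  const items: any[] = [];
  let url: string | undefined =
    `${siteUrl}/_api/web/lists/getbytitle('${listTitle}')/items` +
    `?$top=5000&$select=Id,Title`;

  while (url) {
    const response = await fetch(url, {
      headers: { Accept: "application/json;odata=nometadata" },
    });
    const data = await response.json();
    items.push(...data.value);
    // Present until there are no more pages to fetch.
    url = data["odata.nextLink"];
  }
  return items;
}
```

Note that this just walks the whole list in Id order; filtering on non-indexed columns can still trip the threshold.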

Second, the limit on when we can create indices has moved, not disappeared. We can create indices on lists with up to 20,000 items; beyond that, we can’t add a new index without first getting the item count back below 20,000. So if you create a list or library and immediately dump more than 20,000 items into it, you’ll hit the same kind of wall we used to hit at 5000 items. Better to index the columns you’ll need before loading the content, as in the sketch below.
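
Setting an index up front can be a checkbox in Indexed columns, or a REST call. A hedged sketch – the names are placeholders, and you’d need a valid request digest (from /_api/contextinfo) for the POST:

```typescript
// Mark a column as indexed *before* the list grows past 20,000 items,
// by MERGE-ing the field's Indexed property.
const siteUrl = "https://contoso.sharepoint.com/sites/team"; // placeholder
const listTitle = "BigList"; // placeholder
const fieldName = "Category"; // placeholder internal name

async function indexColumn(requestDigest: string): Promise<void> {
  await fetch(
    `${siteUrl}/_api/web/lists/getbytitle('${listTitle}')` +
      `/fields/getbyinternalnameortitle('${fieldName}')`,
    {
      method: "POST",
      headers: {
        Accept: "application/json;odata=nometadata",
        "Content-Type": "application/json;odata=nometadata",
        "X-RequestDigest": requestDigest, // obtained from /_api/contextinfo
        "X-HTTP-Method": "MERGE",
      },
      body: JSON.stringify({ Indexed: true }),
    }
  );
}
```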

This also means Microsoft Flow or PowerApps may run into the 5000 item limit, depending on what you are doing (making REST calls yourself, for sure), as they use REST endpoints in the Microsoft Graph to retrieve data. The average power user building a PowerApp will see batches of 100 (but may never realize it). Someone taking things further (like reading from another list via a connector) might see batches of 500. In other words, if you are working with large lists, you need to understand the concept of paging on some level – the sketch above shows the idea for direct REST calls.

So what’s the upshot here? Primarily that Microsoft is working hard to make the constraints on large lists go away, or at least push the limits to much higher levels. From an end user standpoint, this is Very Good News. From a developer standpoint, I think we should view it as a work in progress. Overall, it shows that Microsoft knows this is a weakness in the product and is actively fixing it.

I’m VERY happy to see this progress, and based on conversations with people like Chris and others in the Product Group I expect to see continuous improvement. Don’t worry – I’ll keep pushing them on it mercilessly, though!

19 Comments

  1. Thanks for the update, Marc!
    You answered my big question – indices are in addition to any you add manually. Do you know if they are increasing the maximum # of indices that can be on a list?
    Russell

      1. I’m up to 19 on one library. It was set up that way by a vendor, and I doubt I need them all. I think I’ll remove the ones I don’t think are needed and see if SharePoint replaces them with others.

        I’m over 5000 but under 20,000.

      2. I’ve had a few cases where I hit the 20-index limit, for customers with many different views (for different user groups) on several thousand items (you can argue whether all of those were really necessary, of course).

        A related issue I ran into: we rolled out our lists and indexes using PowerShell. However, SharePoint started proactively adding its own clever indexes halfway through that process, so I suddenly got an error when I arrived at scripted index no. 18 because the list had unexpectedly already reached 20.

        Also, I found that you cannot manipulate those auto-created indexes afterwards; they seem locked, which is understandable, but annoying nonetheless.
        So ideally you need to leave enough room for any future predictive indexes on your list, but making a “manual prediction” of how many index slots to leave open is difficult. That also rather defeats the purpose of predictive indexing, because it’s not something I want to have to consider upfront.

        But as you rightly said, it’s clearly work in progress. Any scenario where we can now escape the 5000-limit is another one off my list.

        -Pascal

  2. This is in addition to my comment via Twitter…

    The 5000 item view limit has been a big pain. Predictive indexing eases it but, as you mentioned, there is still lots to be done. It is a work in progress.
    I have encountered the following issues so far:

    - It only works in modern views, so if you have customizations that require the classic view, you will still run into the 5000 item view limit error.
    - Quick Edit doesn’t work; it throws the 5000 limit error.
    - Export to Excel doesn’t work; only 100 items are exported.

    My scenario:
    List with around 80 columns and 8705 records.

    Thanks for pushing hard on this issue. A client wanted to move away from SharePoint just because of it. For now I am using DataTables (with paging) to facilitate exporting the whole list to Excel.

  3. Hey Marc, awesome presentation on Collab365! Thanks. I have one question. You mentioned that you copy large lists into a document library as JSON objects to get around the 5000 limit. How on earth do you do that?

    1. @russell:

      I have some code that basically takes the list content and saves it into a text file as JSON. The next time the page loads, I check for that cached file, hydrate the JSON if it’s there, figure out the deltas, update the file, and save it again. All of that is much faster than reading from the list directly.
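
      Here’s the shape of it in a bare-bones sketch – the cache location and the list-reading helper are illustrative, and the delta logic is elided:

      ```typescript
      // Bare-bones sketch of the JSON caching idea described above.
      const cacheUrl = "/sites/team/SiteAssets/BigListCache.txt"; // illustrative

      async function loadItems(
        readWholeList: () => Promise<any[]> // e.g., a REST paging routine
      ): Promise<any[]> {
        const cached = await fetch(cacheUrl);
        if (cached.ok) {
          // Hydrate the cached JSON instead of hitting the list directly.
          // (Delta detection and the cache rewrite are omitted here.)
          return JSON.parse(await cached.text());
        }
        // No cache yet: read the whole list once, then persist it as JSON
        // (saving needs a POST to the library's Files endpoint, also omitted).
        return readWholeList();
      }
      ```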

      M.

  4. What about Lookup fields in lists / libraries based on lists that have exceeded the 5000 limit? Does anyone have any strategies for this? The only thing I know to do is replace lookup fields with single line of text fields and use jQuery UI’s Autocomplete or Marc’s SPServices autocomplete function to simulate the dropdown. This allows me to use CAML queries to filter the results. For lists, I have been using the DataTables JavaScript library.

    1. I’m with Collin here… I have a pretty effective solution for the classic interface, but that will all stop once we are forced to the modern view. I have been working around the 5000 item limit in the UI pretty effectively, but the “missing lookup column” is a huge pain and I have no solution for it in the modern interface. Telling my clients not to go over 5000 items is such a huge joke! Really!?!?! 5000?

  5. The most absurd fact is that you cannot index a library by document name (FileLeafRef), nor does a system index exist on that field. From the REST (WS) API perspective, the most useful CAML search/view option is filtering by name, and there you get hit by the 5000 item threshold. Why? Why?

    1. @Boris:

      I can’t recall a time I’ve ever filtered on filename in a REST call. It’s much more common for me to retrieve a set of documents based on some metadata value and filter on the client.

      M.

  6. Hi Marc, any update on this in 2021? I tried loading a library with 100k+ files, and only the views break. I am able to rename folders, move files, and add folder-level security.

    1. @Ben:

      If you add all 100k files at once without indexing columns first, you’ll probably go past the threshold before indexing can occur automatically. Before you add the docs, index the columns you want to use in view sorting or filtering.

      Keep in mind that views with more than 5000 items make little sense functionally. Create views which help people take action or make better decisions.

      M.
