Predictive Indexing Comes to Office 365 Lists and Libraries

The 5000-item view threshold in SharePoint lists and libraries has been a bane for many people for a long time. I’ve been very vocal about it for the community (as well as for my own selfish reasons – I have client work to do!). I have a plethora of posts tagged with 5000 item limit, most of which aim to help us work around it, though I admit there is a fair amount of just plain complaining, too.

SharePoint Online is finally coming into the modern age, with Microsoft firmly attacking the 5000 item limit in several new ways. In the Microsoft Ignite session Harness collective knowledge with SharePoint and OneDrive content services (ECM), Chris McNulty (@cmcnulty2000) and Ian Story (@storyid) talked through how predictive indexing will improve our overall SharePoint experience.

We’ve all seen the rhetoric here: for years lists and libraries have been able to contain up to 30 million items, yet in too many cases we see errors when we have 5000 or more items in a view. In the “modern” experience, that may mean a #sadicecreamcone.

Or a #sadflattire.

By using predictive indexing – basically, the platform adds indices as it anticipates you’ll need them, in addition to any you add manually – the real limits of list views become much larger.

Predictive indexing rolled out in advance of Ignite, so you may already have noticed some improvements when you work with lists with over 5000 items – as long as you are in the “modern” experience. As you add grouping to views or slice and dice list content using the column headers, SharePoint may be adding indices behind the scenes for you. You can see this happening if you look at the list settings and go to Indexed columns. There, you may see that some indices have been added by the platform, not by you.

One of the side benefits of predictive indexing – which to many will feel like a main benefit – is that the Filters Pane is now “smarter”, too. As those indices are added, the platform realizes that those columns may be things you’d like to see in the Filters Pane.

So are we all set now? Well, predictive indexing isn’t a total panacea, but should reduce the number of throttling events in many real-world situations. We still have to consider the 5000 item limit in several places, unfortunately. And taking proactive steps with our own indices is never a bad idea.

First, predictive indexing doesn’t change the threshold for REST calls. If we want to get more than 5000 items from a list or library via REST, we still need to page through them or come up with some other fancy caching scheme. I’d love to be able to tell a better story than “the 5000 item limit is still in effect for REST calls”. It’s still the case that – as has been the case for a long time now – if we use indexed columns as our first filters, we can get past the first 5000 items. The first filter still needs to bring the count under 5000 items for us to avoid an error, and we can’t retrieve more than 5000 items in a single call.
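As a hypothetical example (the list and column names here are made up), a call like this can succeed against a list with far more than 5000 items, because the first filter is on an indexed column and brings the result set under the threshold:

GET /_api/web/lists/getbytitle('Invoices')/items?$filter=FiscalYear eq 2017 and Status eq 'Open'&$top=5000

Filter first on an unindexed column – or let the first filter match 5000 or more items – and you get the familiar throttling error instead.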

If you create a list or library and immediately dump more than 20,000 items into it, you’ll run into a familiar problem. We can create indices on lists with up to 20,000 items, but beyond that we hit the same wall we used to hit at 5000: we can’t add a new index without first getting below the 20,000 item mark.

This also means Microsoft Flow or PowerApps may run into the 5000 item limit, depending on what you are doing (making REST calls yourself, for sure), as they use REST endpoints in the Microsoft Graph to retrieve data. The average power user building a PowerApp will see batches of 100 (but may never realize it). Someone taking things further (like reading from another list via a connector) might see batches of 500. In other words, if you are working with large lists, you need to understand the concept of paging on some level.
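If you’re curious what that paging looks like when you call the Microsoft Graph yourself, here’s a rough sketch – the site and list IDs are placeholders. Each response carries an @odata.nextLink, which you follow until it stops appearing:

GET https://graph.microsoft.com/v1.0/sites/{site-id}/lists/{list-id}/items?$top=500

"@odata.nextLink": "https://graph.microsoft.com/v1.0/sites/{site-id}/lists/{list-id}/items?$top=500&$skiptoken=..."

Flow and PowerApps do this for you behind the scenes; you mostly just need to know it’s happening.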

So what’s the upshot here? Well, primarily that Microsoft is working hard to make the constraints on large lists go away, or at least set the limitations to much higher levels. From an end user standpoint, this is Very Good News. From a developer standpoint, I think we should view it as a work in progress. Overall, though, it shows that Microsoft knows this is a weakness in the product and they are working hard to fix it.

I’m VERY happy to see this progress, and based on conversations with people like Chris and others in the Product Group I expect to see continuous improvement. Don’t worry – I’ll keep pushing them on it mercilessly, though!


Microsoft Flow: Updating File Metadata When You Have More Than 5000 Items

It’s been way too long since I did a blog post. Let’s remedy that, shall we?

Paul Culmsee (@paulculmsee) has been doing some great work with Microsoft Flow and PowerApps. His daughter Ashlee even won a contest put on by the PowerApps team to make a fidget spinner app! When Paul digs into a technology, stand back, because he beats it into submission like few others. If there is a Clever Workaround – and we need those way too often – he’ll find it.

<UPDATE>

Paul’s latest post on this topic, The (new current) best way to update SharePoint document metadata using Flow, shows that we can now get the ID of the newly created item from the Create File step, which we can use with the newer SharePoint – Update File Properties Flow action. Thank goodness my tricks here are no longer necessary!

</UPDATE>

Paul did a post back in August that I’ve referred to multiple times as I’ve tried to use Microsoft Flow. My approach is to try Flow first for anything new I need to automate in Office 365; if I hit a roadblock, I revert to SharePoint Designer workflows. The good news is that I find I revert less and less, though it still happens maybe a third of the time at this point. (I’m guessing, but that’s about what it feels like. I would love it to be 0%!) Unfortunately, each reversion means rework down the road if we want to be fully “modern”.

Paul’s article The (currently) best way to update SharePoint document metadata using Flow shows us the unfortunately byzantine way we need to do things in order to update metadata on a file we’ve just created. The fact that we don’t get a useful “handle” back to the file we just created is sorta crazy, and the Microsoft Flow team knows this. In the meantime, Paul’s approach works. Well, it works unless you have more than 5000 items in a library.

I had exactly that situation. At one of my clients, we have an automated process which dumps files into a library I’ve inventively called __Temp. When a new file shows up, I run a Flow to copy the file into a place where we want to store it and apply metadata to the copied file. My problem arose when I tried Paul’s second step. Because my library has way over 5000 items, that step failed.

So, until we are able to get the ID or some other useful handle for the newly created file, I needed to come up with a clever workaround on top of Paul’s Clever Workaround.

The trick was to do the following:

  • Add an index to the Modified column in the destination list.
  • Set up a TimeStamp variable in my Flow to capture startOfDay() when the Flow runs.
  • In the GetItems step, use a filter based on the Modified column, and *then* on FileLeafRef, to find the file I just created.

Here it is in pictures…

Make sure you have an index on the Modified column in the target list. In my case, I have indices on several other columns as well.

In the Flow, set up a variable to hold startOfDay(utcnow()). You set this up in the Expression Builder. I initially tried to set the variable just before I created the new file, setting it to utcnow(). In theory, this should have worked, but I found I had about a 50% failure rate. That doesn’t make a lot of sense to me, but to be safe, I switched to startOfDay(), which has proved reliable.

Just like Paul’s example, I then create the new file.

In the GetItems step, I use the more complex filter: Modified gt datetime'TimeStamp' and FileLeafRef eq 'Name'. Because Modified is an indexed column, and it is the first filter in my expression, the result set is reduced to fewer than 5000 items and the query works.
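To make that concrete with made-up values: if the Flow runs at any point on November 20, 2017, startOfDay(utcnow()) yields midnight UTC, and for a file named Report.docx the filter works out to:

Modified gt datetime'2017-11-20T00:00:00Z' and FileLeafRef eq 'Report.docx'

Only files modified since midnight survive the first (indexed) filter, and the FileLeafRef test then picks out the one file the Flow just created.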

Now everything works reliably. I can’t wait to reduce the complexity of this Flow when we get a handle to the newly created file in the Create File step!

Aren’t you proud of me? I wrote this whole post without whining about the 5000 item limit.

Wherein I Profess My Love for Document Sets, My Hatred of the 5000 Item Limit, and Some Tips

I love Document Sets. There, I’ve said it. They help us solve so many important business needs, it’s crazy. Unfortunately, because telemetry tells Microsoft that not very many people use Document Sets, they haven’t gotten any love for a long time. On the other hand, I hate the 5000 item limit in lists and libraries because it prevents us from getting good work done for our end users.

With Document Sets, we essentially get folders on steroids. We have a canvas available to us in the form of the Welcome page, which is a Web Part Page we can customize to our heart’s content. That means we can add images, other Web Parts, script (using a CEWP or SEWP), whatever we need in order to make the Document Set sing for our users. We can even push specific metadata from the Document Set level down into the documents within it.

While on one hand it’s been fine that Microsoft hasn’t touched Document Sets for a long time (at least they haven’t broken anything), it will be great when they eventually get the “modern” sheen. (See my comment about not breaking anything – that’s key if the “modern” version is to get any use.)

Today’s episode comes courtesy of one of my clients where we’re using Document Sets to the max in SharePoint Online. It’s a life sciences R&D operation, and we’re tracking most of their significant research data in Document Sets, among other mechanisms. It’s a really cool project, and I often wish I could show more of what we’re doing.

When we first built one of the main libraries using Document Sets as the basis (with 14 different Content Type variants inheriting from Document Set and each other), we talked about how many items would ever be in the library. At the time, 5000 seemed like a huge and distant number. Even so, I added some indices to hedge against it, but clearly not enough indices. It’s been over two years using this system, and we’ve done a bunch of development on top that we couldn’t have predicted originally.

Recently, a couple of things stopped working the way they should. Even though we never expected to, we recently went over the 5000 item limit in the Document Library – 5099 is the current count. Here are summaries of the issues and how we fixed them. The ever wonderful and talented Julie Turner (@jfj1997) came to my rescue on some of it, as you’ll see.

Adjusting the Indices While Over 5000 Items

This has historically been a HUGE problem. Once you crossed the 5000 item limit and actually NEEDED indices on some of your columns, you couldn’t create them. When you tried to do so, you’d get an error telling you that you had more than 5000 items, so you couldn’t add an index. Helpful. Off to Sharegate to move some content out, fix the indices, then Sharegate the content back in.

In our Document Set instances, we were getting some errors where we were making REST calls to retrieve items related to the current one. (The Document Sets connect together in pathways of experiments, and we store the ParentID for each item’s parent Document Set.) The REST call would only retrieve one item from the library, since there was a filter:
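The call looked something like this – the list name and ParentID value are illustrative here, not our actual ones:

GET /_api/web/lists/getbytitle('Experiments')/items?$filter=ParentID eq 42&$select=ID,Title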

Unfortunately, ParentID wasn’t indexed, so we were getting 500 errors back from our REST calls. Sigh. I assumed I’d need to shuffle some content out to add the index.

Just on the off chance, I went to add the index anyway, even though we were over the 5000 item limit. Never hurts to try, right?

Miracle of miracles, I was able to add the index without SharePoint batting an eye. I haven’t had a chance to test this elsewhere, but in this tenant I was able to do what previously was impossible.

If this is indeed the new normal, our lives have gotten a lot easier.

We can add indices to lists and libraries with over 5000 items!

In any case, it solved my immediate problem. Maybe I shouldn’t talk about it so loudly near the tenant in case it changes its mind.

Fixing the Broken Default View

No one – and I mean no one – likes to see this message on a SharePoint page:

This view cannot be displayed because it exceeds the list view threshold (5000 items) enforced by the administrator.

 

To view items, try selecting another view or creating a new view. If you do not have sufficient permissions to create views for this list, ask your administrator to modify the view so that it conforms to the list view threshold.

We were seeing this horrible message in the List View Web Part at the bottom of the Welcome Pages. Since I have code running in the page, I wasn’t 100% sure that it wasn’t my fault.

Since the List View Web Part is only showing documents for this Document Set, it should only want to show a handful for each Document Set; nowhere near 5000. I was starting to think the Document Set was fundamentally broken by design.

Luckily, Julie was online and I asked her to take a look. She had the answer, and probably saved me hours trying to figure out why this was happening.

Her suggestion was to make sure the view doesn’t have “Show all items without folders” set to true. Sure enough, when I checked the view we were using for the Document Sets List View Web Parts, that was the setting. Julie pointed me to the article Manage large lists and libraries in SharePoint, specifically:

If you choose the Show all items without folders option in the Folders section when you create or modify a view in this list or library, you must then use a filter that is based on a simple index to ensure you don’t reach the List View Threshold.

Aha! For whatever reason over the years, we had set that very setting, and that was the problem. By turning that off, everything was right as rain again.

Document Sets Can Have Unique Views

This leads me to a little known thing about Document Sets: we can have a different view at the root of the library than inside the Document Sets. In fact, since you can inherit from Document Sets, you can have a different view per Document Set-based Content Type!

I was reminded of this just yesterday by a post from Cameron Dwyer. Sure, it’s a setting on the Document Set settings page, but frankly, I’d rarely noticed it.

The setting isn’t visible in the Document Set settings page when you create the Content Type, because you aren’t yet in the context of a list. Once you have enabled the Content Type on a list, you’ll see the additional settings.

Here’s the bottom of the Document Set settings in the Content Type definition:


and here’s the bottom of the page in a list context:

Note that the options are slightly different. In the list context we can choose a unique view for the Document Set-based Content Type. That means in my library, I could have 14 different views of the Document Set contents, one per Content Type, should I choose to do so.

Summary

Document Sets are awesome. The 5000 item limit is not.


Deleting a Very Large SharePoint List

I’ve complained about the 5000 item list view threshold so many times, I’ve lost count. If you don’t believe me, you could read my posts here and here and here and here.

Today’s difficulty was caused by needing to delete a Very Large List. This isn’t just a Large List – it’s Very Large: the item count is way over 5000, at almost 300,000. This list is part of a well-used application I built for a client about 3 years ago in SharePoint 2007. It’s been ticking away just fine, as there was no list view threshold in 2007.

I’m migrating the application to SharePoint Online, so I’m totally rewriting it for several reasons. I copied across the list “shell” with Sharegate and brought along a few hundred items in the big list just for testing and building purposes.

But now I need to test “at volume”, so I brought across all 300k items. Unfortunately, I forgot to add an index to one of the columns where I need it to make some queries work. I figured “no big deal”. I’d just delete the list and start again.

Not so fast! When I tried to delete the list, I got this error:

I’m trying to DELETE the list, so what gives? Well, I found a post from Mike Smith (@TechTrainNotes) entitled SharePoint 2016: List View Threshold Limit to Delete a List is 99,993 Items??? which pretty much told me why this wasn’t working. It’s crazy, but you can’t delete a list or library which has more than 100k objects in it. (Read Mike’s post to see where he ran into a few wrinkles on this.) One would think that deleting a list would have nothing to do with any thresholds, but one would be wrong.

I tried a number of things – deleting the list in SharePoint Designer, writing some code to delete items one by one, etc. – but either my attempts didn’t work or they were taking forever.

Luckily, I have no pride and I complained on Twitter. Kelvin Hoyle (@kelvin_hoyle) came to the rescue!

By connecting to the list with Microsoft Access, I was able to select all the items and hit delete. Yes, this is taking a while as well, but it’s clearly faster.

The steps to do this are:

  • Open Access – I’m using Access 2016 – and create a blank database.
  • In the ribbon, go to External Data and, in the Import & Link section, click on the More dropdown.
  • Choose SharePoint List.
  • Provide the URL to the SharePoint site which houses the list.
  • Be sure to leave the default radio button selected, creating a link between Access and the SharePoint list.
  • Click Next, and choose the list you want to work with.
  • Click OK.

Access will set up a new table which is linked to your list. You can open it like any other table in Access. Open the table and – voila – you can select any items you’d like to edit or delete.

Yet another workaround for something which I think shouldn’t be a limitation in the modern age. Sigh.

Addendum

It took overnight, but my list finally was whittled down below the 100k object level and I was able to delete it. Innovation: 1. Productivity: 0.

After I posted this, I heard from another corner of the Twittersphere. In SharePoint (where no one can hear you scream) there are usually at least several ways to do anything.

Ryan Yates’ Client-side SharePoint PowerShell library on Codeplex looks like another good option. He tweeted some more details and let me know there’s a function called Clear-ListContent which does the heavy lifting. I’ve always avoided CSOM, and I’ve spent enough time on this today, but from a quick look, Ryan’s library looks useful here.

Yes, Virginia, You Can Get More than 5000 SharePoint Items with REST

If you haven’t been paying any attention, you might not know that I loathe the 5000 item limit in SharePoint. I may have mentioned it here and here and here and a bunch of other places, but I’m not sure. But given it’s there, we often need to work around it.

No 5000 item limit

I’ve written this little function before, but alas, today I needed it again and couldn’t find any previous incarnations. That’s what this blog is for, though: to remind me of what I’ve already done!

In this case, I’m working with AngularJS 1.x. We have several lists that are nearing the 5000 item level, and I don’t want my code to start failing when we get there. We can get as many items as we want if we make repeated calls to the same REST endpoint, watching for the __next link in the results. That __next link tells us that we haven’t gotten all of the items yet, and provides us with the link to get the next batch, based on how we’ve set $top in the request.

Here’s an example. Suppose I want to get all the items from a list which contains all of the ZIP codes in the USA. I just checked, and that’s 27782 items. That’s definitely enough to make SharePoint angry at us, what with that 5000 item limit and all. Let’s not get into an argument about whether I need them all or not. Let’s just agree that I do. It’s an example, after all.

Well, if we set up our requests right, and use my handy-dandy recursive function below, we can get all of the items. First, let’s look at the setup. It should look pretty similar to anything you’ve done in an AngularJS service. I set up the request, and the success and error handlers just like I always do. Note I’m asking for the top 5000 items, using "&$top=5000" in my REST call.

If there are fewer than 5000 items, then we don’t have a problem; the base request would return them all in one “normal” call. Instead, I call my recursive function below, passing in the request only, even though the function can take two more arguments: results and deferred.

The recursive function simply keeps calling itself whenever it sees that the __next attribute of the response is present, signifying there is more data to fetch. It concatenates the data into a single array as it goes. In my example, there would be 6 calls to the REST endpoint because there are 27782 / 5000 = 5.5564 “chunks” of items.
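Here’s a minimal sketch of the approach, assuming an AngularJS 1.x service with $http and $q injected; the module, list, and function names are illustrative rather than my exact code:

angular.module("myApp").factory("zipCodesService", ["$http", "$q", function ($http, $q) {

  // Recursive worker: results and deferred are omitted on the first call.
  function getItemsRecursive(requestUrl, results, deferred) {
    results = results || [];
    deferred = deferred || $q.defer();

    $http.get(requestUrl, {
      headers: { "Accept": "application/json; odata=verbose" }
    }).then(function (response) {
      // Accumulate this batch of (at most) 5000 items.
      results = results.concat(response.data.d.results);

      if (response.data.d.__next) {
        // More data to fetch: call ourselves with the __next link.
        getItemsRecursive(response.data.d.__next, results, deferred);
      } else {
        // No __next link: we have all the items.
        deferred.resolve(results);
      }
    }, function (error) {
      deferred.reject(error);
    });

    return deferred.promise;
  }

  return {
    // The "normal" single call, with $top=5000, is simply the first request.
    getAllZipCodes: function () {
      var requestUrl = _spPageContextInfo.webAbsoluteUrl +
        "/_api/web/lists/getbytitle('ZIP Codes')/items?$select=ID,Title&$top=5000";
      return getItemsRecursive(requestUrl);
    }
  };
}]);

Calling zipCodesService.getAllZipCodes() returns a promise which resolves with the full array – all 27782 ZIP codes, in my example – once the last batch arrives.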


NOW, before a bunch of people get all angry at me about this, the Stan Lee rule applies here: with great power comes great responsibility. If you have tens of thousands of items and you decide to load them all, don’t blame me if it takes a while. All those requests can take a lot of time. This also isn’t a get out of jail free card. I’m posilutely certain that if we misuse this all over the place, the data police at Microsoft will shut us down.

In actual fact, the multiple calls will be separated by periods of time that seem short to us but are almost eternities to today’s high-powered servers. In some cases, you might even find that batches of fewer than 5000 items are *faster* for you.

In any case, don’t just do this willy-nilly. Also understand that my approach here isn’t as great at handling errors as the base case. Feel free to improve on it and post your improvements in the comments!