Base Your SharePoint Database Architecture on Business Requirements First, Database Concerns Second

Sometimes when I’m speaking at SharePoint events, I’ll mention something about the fact that “the geeks” make decisions about Site Collections and database boundaries that are a detriment to the users. I got a question about this the other day:

…you mentioned something that I wanted to follow up on: basically it was a warning to avoid setting up the information architecture (site collections and thus ability to shuffle between databases) based on “the geeks”.  I would love to find out what the issues you’ve seen or experienced that were caused by breaking up data across site collections.  I image it might be the cross site query web parts and such.

The main thing I am referring to is the promise of being able to promote content from private or local contexts into wider contexts, therefore maintaining the single version of the truththat Microsoft sometimes talks about. This single version of the truth ought to apply not just to content when considered from a database perspective, but also from a Content Management perspective. A particular version of a document ought to exist in exactly one place. Of course, that’s far more easily said than done, but it’s a Content Management goal.

Note that when I talk about “the geeks”, I mean those of us (yes, I sometimes belong in this group, too) who view the technology for technology’s sake over what its function for the users should be. If you’re an admin who never talks to end users about what they need, then you probably are in this group, though you might not be because you are able to extrapolate their needs well.

Take the example where an Intranet has a root Site Collection and then a Site Collection per Department. (A simplified example, sure, but not that uncommon.) If there is a document in the HR site that we want to promote to the root site, there are no good mechanisms to do it. Site Collection boundaries are primarily security boundaries from the end user standpoint, so we can run into permission issues. We also don’t have a good way to keep a pointer on the root site to the actual document in synch. That means there can’t easily be one version of the truth.

CQWPs and DVWPs in CrossList mode are also out of the question for doing things like rolling up Announcements, for example, which is an extremely common use case. Again, Site Collections are permission boundaries, so those Web Parts don’t work across them. We have to resort to all sorts of tomfoolery using the Web Services or third party tools.

Of course, to the geeks, it makes total sense to make the Departments individual Site Collections. They want to be able to manage the content that way so that they can go to one Department if that Department’s content is getting out of hand to make them clean it up. But it often doesn’t work from a user perspective.

A more complex example would be a smaller organization (as a larger one is unlikely to find the idea palatable) which wants to use SharePoint for its Intranet, Extranet, and Internet sites. Ideally one could create work output in a Team Site in the Intranet somewhere and upon completion, promote it to expose it more widely on the Intranet. That itself if often a problem. But what if we then want to expose that piece of content in multiple Extranet sites? Or we want to expose it as part of the content corpus on the Intranet? Now, one can easily respond with all sorts of enterprise-class concerns about security, but that’s up to the customer. These scenarios come up with my clients all the time, and if they want to make them happen, it’s their choice.  The geeks will usually set things up to preclude any of this because of the content database concerns they have.

The business sometimes simply can’t do what it wants to do because of decisions based on database concerns, and SharePoint is blamed for a shortcoming which it doesn’t actually have. That’s not a good thing for any of us.

Just Because They Are SharePoint Lists It Doesn’t Mean Good Database Rules Don’t Apply

Matt Bramer and I were helping someone out in the Stump the Panel forums at EndUserSharePoint.com the other day and in the conversation, the person posted this list structure:

I created a list called “Solution Team Profiles” with the following columns
affinity to market
birth date
Bus Phone
Cell Phone
Clearence Levels
Deal 1 Allocation
Deal 1 Response Date
Deal 1 SiebelID
Deal 1 Status
Deal 1 Title
Deal 2 Allocation
Deal 2 Response Date
Deal 2 SiebelID
Deal 2 Status
Deal 2 Title
Deal 3 Allocation
Deal 3 Response Date
Deal 3 SiebelID
Deal 3 Status
Deal 3 Title
Deal 4 Allocation
Deal 4 Response Date
Deal 4 SiebelID
Deal 4 Status
Deal 4 Title
Deal 5 Allocation
Deal 5 Response Date
Deal 5 SiebelID
Deal 5 Status
Deal 5 Title
Deal 6 Allocation
Deal 6 Response Date
Deal 6 SiebelID
Deal 6 Status
Deal 6 Title
deal size
deal turn
email
employee
employeeid
Anniversary
location
netid
Other Allocation
other skills
Other Status
Other Title
primary expertise
role
secondary expertise
svc yrs
Team
Today
utilization
Vacation PLans
Created By
Modified By

I pointed out that since he had repeating groups, he really ought to be storing the deal information in a separate list. Matt pinged me with a question on why I made that suggestion.

I could go on and on about database theory, but the bottom line on this one is that is you find yourself naming columns in a SharePoint list something1, something2, something3, etc., you are violating relational rules. This would be what is called a “repeating group”, and that stuff should go into its own list. The best example of this is that of an order you place with Amazon.com. You might order three or four things on that order, right? So [very simplistically], you’d have one list which contained information about the order itself: order number, date created, number of items, ship to address, total value, etc. Then in a separate list, you’d have an row per item ordered: order number, line number, quantity, description, cost, extended price, etc. (Note that with today’s highly efficient databases, you might not store the totals information in the orders table because it’s something you can calculate on the fly if you need to from the details. When I learned my database stuff, you’d never do that because it was too expensive.)

Storing the repeating groups in a separate table means that you aren’t reserving space in the orders table which may or may not be populated, therefore, wasting space in the database. It also makes the orders table more efficient because in many cases, you’ll only need to query that primary table. For instance, “How many orders has Matt placed?” can be answered by looking only at the orders list. When you need to look at the order details, you can join to the detail list to get at that information, using the order number as the key.

If this all sounds like gobbledy-gook to you, pick up a beginning database theory book and skim it. It’s all pretty simple stuff once you get it, but even seasoned vets make mistakes about this stuff all the time. There is a set of database normalization rules (several of which I glossed over in the example above) which, if followed, will ensure good database structures which are efficient and can also meet all of your needs.

Just because SharePoint lists aren’t really database tables doesn’t mean that you shouldn’t try to follow good database design rules, as I pointed out in my blog post entitled “SharePoint Is Not a Database”.