Displaying the First N Words of a Long Rich Text Column with XSL

When you want to display blog posts and announcements with Data View Web Parts (DVWPs) in your SharePoint Site Collection, you usually don’t want to display the full posts. You want just enough to indicate what each item is about and to let the user decide whether to click through to see more. An example might be showing the last 3 blog posts on your Home Page. There isn’t an easy out-of-the-box way to do this.

For the following examples, let’s say that the @Body column contains the text: “The <em>quick</em> <span style="color: #a52a2a;">brown</span> fox jumped over the lazy dog.”, which actually looks like this: “The quick brown fox jumped over the lazy dog.”

One option is to use the ddwrt:Limit function. It lets you specify a maximum number of characters to show, along with some text to append if the original text is longer than that limit. So, for instance, ddwrt:Limit(string(@Body), 25, '…') would show the first 25 characters, followed by the '…' string if there are more than 25 characters in the @Body column. However, since the @Body column usually contains HTML markup, you usually don’t get what you really want, because the tags all count toward the character limit. With our example @Body text above, you’ll get “The <em>quick</em> <span …”, which isn’t even valid HTML, since it stops partway through the opening <span> tag. Depending on the browser you are using, you’ll probably see something like “The quick”.
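For reference, the naive call looks roughly like this in a DVWP’s XSL. This is a sketch: in a DVWP, SharePoint Designer normally declares the ddwrt namespace for you (ddwrt functions only exist inside SharePoint’s Data View runtime), and the template name NaiveTeaser is hypothetical.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ddwrt="http://schemas.microsoft.com/WebParts/v2/DataView/runtime">
  <!-- Hypothetical template name; call it with a list item (row) as the context node. -->
  <xsl:template name="NaiveTeaser">
    <!-- The markup counts toward the 25 characters, so the cut can land inside a tag. -->
    <xsl:value-of select="ddwrt:Limit(string(@Body), 25, '...')" disable-output-escaping="yes"/>
  </xsl:template>
</xsl:stylesheet>
```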

So, the first thing you might want to do is to strip out all of the HTML.  The StripHTML XSL template below will do this for you.

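<xsl:template name="StripHTML">
  <xsl:param name="HTMLText"/>
  <xsl:choose>
    <!-- Recurse while there is a '<' with a '>' somewhere after it. -->
    <xsl:when test="contains($HTMLText, '&lt;') and contains(substring-after($HTMLText, '&lt;'), '&gt;')">
      <xsl:call-template name="StripHTML">
        <!-- Keep the text before the first '<' and the text after the '>' that closes that tag. -->
        <xsl:with-param name="HTMLText" select="concat(substring-before($HTMLText, '&lt;'), substring-after(substring-after($HTMLText, '&lt;'), '&gt;'))"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <!-- No tags left: output what remains. -->
      <xsl:value-of select="$HTMLText"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>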

Once you have the HTML stripped out, ddwrt:Limit counts only the visible characters, but the text will probably be cut off mid-word. Looking at our example @Body text again, the StripHTML template will return “The quick brown fox jumped over the lazy dog.”, which the ddwrt:Limit call above turns into “The quick brown fox jumpe…”
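Putting StripHTML and ddwrt:Limit together might look like the sketch below. It assumes the StripHTML template is in the same stylesheet (or imported) and that the ddwrt namespace is declared, as it normally is in a DVWP; the template name LimitedPlainBody is hypothetical.

```xml
<!-- Hypothetical template; call it with a list item (row) as the context node. -->
<xsl:template name="LimitedPlainBody">
  <!-- Strip the markup first so only visible characters count toward the limit. -->
  <xsl:variable name="PlainBody">
    <xsl:call-template name="StripHTML">
      <xsl:with-param name="HTMLText" select="@Body"/>
    </xsl:call-template>
  </xsl:variable>
  <!-- string() turns the result tree fragment into text; disable-output-escaping
       (as FirstNWords uses) lets entities left in the text render as characters. -->
  <xsl:value-of select="ddwrt:Limit(string($PlainBody), 25, '...')" disable-output-escaping="yes"/>
</xsl:template>
```

This still cuts on a character count, which is where the mid-word break comes from.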

So, an even better solution is to first strip out the HTML and then return a specific word count.  The FirstNWords XSL template below takes care of this for you.

<xsl:template name="FirstNWords">
  <xsl:param name="TextData"/>
  <xsl:param name="WordCount"/>
  <xsl:param name="MoreText"/>
  <xsl:choose>
    <!-- More than one word still wanted and there is text before the next space (or double space):
         output the text up to the first space plus a space, then recurse on what follows that space. -->
    <xsl:when test="$WordCount &gt; 1 and
        (string-length(substring-before($TextData, ' ')) &gt; 0 or
        string-length(substring-before($TextData, '  ')) &gt; 0)">
      <xsl:value-of select="concat(substring-before($TextData, ' '), ' ')" disable-output-escaping="yes"/>
      <xsl:call-template name="FirstNWords">
        <xsl:with-param name="TextData" select="substring-after($TextData, ' ')"/>
        <xsl:with-param name="WordCount" select="$WordCount - 1"/>
        <xsl:with-param name="MoreText" select="$MoreText"/>
      </xsl:call-template>
    </xsl:when>
    <!-- Down to the last word and a space still follows it: output that word followed by MoreText. -->
    <xsl:when test="(string-length(substring-before($TextData, ' ')) &gt; 0 or
        string-length(substring-before($TextData, '  ')) &gt; 0)">
      <xsl:value-of select="concat(substring-before($TextData, ' '), $MoreText)" disable-output-escaping="yes"/>
    </xsl:when>
    <!-- Otherwise, output whatever text remains, unchanged. -->
    <xsl:otherwise>
      <xsl:value-of select="$TextData" disable-output-escaping="yes"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

With our example, StripHTML returns “The quick brown fox jumped over the lazy dog.” and then a call to FirstNWords with a WordCount of 5 will give you “The quick brown fox jumped…”  Much nicer!

Note that this won’t do a perfect job if there is a lot of odd spacing or punctuation, but most of the time, it’s a much cleaner solution.

NOTE (2009-02-05): I was working with some data today that had lots of double spaces and some escaped characters, so I tweaked my FirstNWords template to work a little better by adding the test for double spaces (though it isn’t foolproof with different types of white space).
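If odd white space keeps causing trouble, one alternative (a sketch, not the template above) is to let XPath’s normalize-space() clean the text first. It trims leading and trailing white space and collapses every run of spaces, tabs, carriage returns and line feeds into a single space, so the word splitting only ever sees single spaces. The wrapper name FirstNWordsNormalized is hypothetical, and it assumes FirstNWords is in the same stylesheet.

```xml
<!-- Hypothetical wrapper: normalizes white space, then delegates to FirstNWords. -->
<xsl:template name="FirstNWordsNormalized">
  <xsl:param name="TextData"/>
  <xsl:param name="WordCount"/>
  <xsl:param name="MoreText"/>
  <xsl:call-template name="FirstNWords">
    <!-- normalize-space() collapses spaces, tabs and line breaks to single spaces and trims the ends. -->
    <xsl:with-param name="TextData" select="normalize-space($TextData)"/>
    <xsl:with-param name="WordCount" select="$WordCount"/>
    <xsl:with-param name="MoreText" select="$MoreText"/>
  </xsl:call-template>
</xsl:template>
```

XPath doesn’t treat non-breaking spaces (or &nbsp; entities left in the text) as white space, so those still pass through untouched.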

UPDATE (2009-02-27): Here’s an example of how I’ve used these templates in the past to display blog posts.  First, I create a variable called BodyText that contains the contents of the @Body column with the HTML stripped out by using the StripHTML template.  Then I output a row with a link to the post and a second row with the first 25 words of the post, followed by ‘…’, using the FirstNWords template.

<xsl:template name="USG_Blog.rowview">
  <xsl:variable name="BodyText">
    <xsl:call-template name="StripHTML">
      <xsl:with-param name="HTMLText" select="@Body"/>
    </xsl:call-template>
  </xsl:variable>
  <tr>
    <td>
      <a href="{$WebURL}Lists/Posts/Post.aspx?ID={@ID}&amp;Source={$URL}" >
        <xsl:value-of select="@Title"/>
      </a>
    </td>
  </tr>
  <tr>
    <td>
      <xsl:call-template name="FirstNWords">
        <xsl:with-param name="TextData" select="$BodyText"/>
        <xsl:with-param name="WordCount" select="25"/>
        <xsl:with-param name="MoreText" select="'...'"/>
      </xsl:call-template>
    </td>
  </tr>
</xsl:template>

As a side note, I always store these “utility” templates in a separate file for reuse and use the xsl:import tag to pull them into the DVWP I’m working on. XSLT requires xsl:import to come before any other top-level element in the stylesheet, including xsl:output, as below.

<xsl:import href="/Style Library/XSL Style Sheets/Utilities.xsl"/>
<xsl:output method="html" indent="no"/>
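The imported file itself only needs to be a complete stylesheet whose top-level children are the shared templates. A minimal skeleton, assuming XSLT 1.0 and the Style Library location shown above:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Utilities.xsl: shared templates pulled into DVWPs with xsl:import. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Paste the complete StripHTML and FirstNWords xsl:template elements
       from this post here, as direct children of xsl:stylesheet. -->
</xsl:stylesheet>
```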

96 Comments

  1. This is a great article that provides some useful XSL templates; the only thing I’m not sure about is how to implement them. Could you provide an example of how you use them, with the syntax necessary to get this to work? I have used ddwrt:Limit(string(@Body), 25, '…') before, but I have been looking for a way to get a word count. Here it is, but I can’t get it to work. Also, I am a big fan of your blog; your posts are among the few that consistently cover relevant, real-world situations with solutions using DVWPs.

  2. Johnnie:

    Thanks for the kind words. I’m glad you find my musings helpful! I just updated the post with a real life example of how I’ve used this template to display blog posts. Let me know if this makes it clearer for you. If not, keep asking the questions.

    M.

  3. Hi Marc,

    Thanks for the article. It’s very helpful.

    I’m having a problem with the < signs from the "FirstNWords" template. It does not allow those in the data view. I have replaced them with > but it still has a problem with the line

    <xsl:with-param name="HTMLText" select="concat(substring-before($HTMLText, '<'), substring-after($HTMLText, '>'))"/>

    in the StripHTML template. Do you think you could upload your Utilities.xsl file? I’m also having a problem referencing that file: it tells me that it doesn’t exist, even though I put it in the XSL Style Sheets folder and referenced it correctly.

    • Courtenay:

      Sorry for the issues. The ‘greater than’ and ‘less than’ signs were showing rather than the encoding. I added a note above with how it should work. Let me know if it doesn’t make sense.

      Make sure that you check in the Utilities.xsl file.

      M.

  4. Hey there, this seems to do most of what I’m looking for, except that I’m not using SharePoint at all. I’m trying to transform XML into XHTML and truncate at ## characters, but I don’t seem to be able to implement this. Please let me know if I’m grossly missing something; here’s what I’ve got:

    – 3 XSLT files (FirstNWords.xsl, StripHTML.xsl, and Base.xsl)
    – XML source data (currently Google News, just as a proof of concept). This does contain a LOT of HTML text and is totally uncontrollable, so I thought it’d be a great example to work through.

    I’d really appreciate a little more instruction on how to implement this. As I said, if I’ve grossly missed something, feel free to chastise me accordingly.

    ~JR~

    • Sounds like you’re heading in the right direction, but I’m not sure what issue you are having. My XSL here works in the context of a Data View Web Part in SharePoint, but I would think that it would work elsewhere as well.

      M.

  5. Thanks for your work on this. This is exactly what I’m looking for.

    However, I’m afraid that I’m getting a bit lost. I’m not a developer and this is a bit beyond my expertise (I’m the tech guy for a non-profit organization – miracles on a shoestring!)

    I’m actually trying to do exactly what your example talks about – pulling the beginnings of blog entries from a subsite (imaginatively named /blog). Is there any chance you could fill in the few remaining gaps so that I could replicate it on my server?

    Thanks again!

    Mitch

    • I have a similar question… conceptually I get what you are doing, but I have no idea how to implement it. I’m a novice at customizing SharePoint, but I have been tasked with doing exactly what you discuss above: limiting the initial blog text on the homepage to 3 lines, and adding a “read more” link for the rest of it.

      I assume the function you wrote (above) needs to be added somewhere, and then implemented on the blog homepage… just not sure how to do it. Have you posted a step-by-step anywhere that I could reference?

      Thank you!

      Doug

      • Doug:

        The XSL I show can be used in a Data View Web Part (DVWP). You work with DVWPs in SharePoint Designer. If you haven’t done any of this work before, there’s a bit of a learning curve.

        M.

  6. This is a great example of code. One question: when I write a blog post, I usually put an image at the beginning of the post, before any of the text. With this code, the image is removed from the “teaser” and only the 25 words are displayed. Is there a way to allow an image to remain in the first 25 words, or to count an image as one of the 25 words so that it will display?

    Thanks, and again, this code has been great; I just wanted to say thanks.

    Garred

    • Garred:

      Glad you find this one helpful. I always like it when I can recurse. Maybe that’s just me.

      You could certainly adapt the FirstNWords template to leave img tags in the TextData and still include 25 “text words”. If the TextData is *always* going to include an image at the front (probably not a safe assumption), you could just skip past the img on the first call to the template. It would probably be a better idea to just adapt it such that it left any img tags, regardless where they were, in case you have a little leading text.

      M.

      • That makes sense. I’m still in the basic learning stages of SharePoint, so I’m not sure what to put in to keep the img tags from being wiped out, or exactly where I would put it. Would I put it into the FirstNWords template and create a variable for it, or would I put it into the StripHTML template? I think the first option is what I would like to do since, as you mentioned, there might be times when there is no image, or when there is an image but some text comes before it. Thanks again, Marc.

        • You’re right: the StripHTML template would need to change, as well as the FirstNWords template, to do what you are looking for.

          I’ll see if I can get some time to work a little on it for you, but it may be a while. It’s an interesting thought, and if the templates could take a list of HTML tags which they should “leave alone”, it could be helpful in other ways as well.

          M.

    • Nancy:

      The /Style Library/XSL Style Sheets “folder” is a Document Library which is created in the root site of most MOSS Site Collections. (I say most only because I haven’t created every type of Site Collection and checked; it may be all MOSS Site Collections.) The reason I put my commonly-used XSL there is that it is the location where MOSS stores similar XSL files. Better to put our garbage there than to bring theirs up, to paraphrase Arlo Guthrie.

      This approach was something I built up for Announcements initially, so it should work great for you, too.

      M.

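Garred’s question about keeping a leading image can also be approached without changing StripHTML or FirstNWords at all: pull the first img tag out of @Body separately and output it ahead of the word-limited text. This is a different, simpler approach than the tag-aware templates Marc describes, and only a sketch: FirstImgTag is a hypothetical name, it only matches a lowercase <img, and it assumes no > character appears inside the tag’s attribute values.

```xml
<!-- Hypothetical helper: outputs the first <img ...> tag in HTMLText, if there is one. -->
<xsl:template name="FirstImgTag">
  <xsl:param name="HTMLText"/>
  <!-- Only act if "<img" occurs and is followed somewhere by a ">". -->
  <xsl:if test="contains(substring-after($HTMLText, '&lt;img'), '&gt;')">
    <!-- Rebuild the tag from "<img", the text up to the next ">", and the ">" itself. -->
    <xsl:value-of select="concat('&lt;img', substring-before(substring-after($HTMLText, '&lt;img'), '&gt;'), '&gt;')" disable-output-escaping="yes"/>
  </xsl:if>
</xsl:template>

<!-- Possible use inside the second td of USG_Blog.rowview, just before the FirstNWords call: -->
<xsl:call-template name="FirstImgTag">
  <xsl:with-param name="HTMLText" select="@Body"/>
</xsl:call-template>
```

Because the image is emitted separately, it never counts toward the 25 words, and posts without an image simply produce nothing from this call.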
