Displaying the First N Words of a Long Rich Text Column with XSL

When you display blog posts and announcements with DVWPs in your SharePoint Site Collection, you usually don’t want to show the full posts, just enough to indicate what each item is about so the user can decide whether to click through.  An example might be showing the last 3 blog posts on your Home Page.  There isn’t an easy out-of-the-box way to do this.

For the following examples, let’s say that the @Body column contains the text: “The <em>quick</em> <span style="color: #a52a2a;">brown</span> fox jumped over the lazy dog.”, which renders as: “The quick brown fox jumped over the lazy dog.”

One option is to use the ddwrt:Limit function.  This allows you to specify a number of characters to show, along with some text to append if the original text is longer than the limit you set.  So, for instance, ddwrt:Limit(string(@Body), 25, '…') would show the first 25 characters, followed by the '…' string if there are more than 25 characters in the @Body column.  However, since the @Body column usually contains some HTML markup, you usually don’t get what you really want, because the tags are all counted as part of the number of characters.  With our example @Body text above, you’ll get “The <em>quick</em> <span …”, which isn’t even valid HTML since the <span> tag is cut off mid-tag.  Depending on the browser you are using, you’ll probably see something like “The quick”.
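As a rough sketch, the call described above might sit in a DVWP row template like this.  It assumes the ddwrt prefix is already bound on the xsl:stylesheet element (SharePoint Designer normally adds this declaration for you), and the disable-output-escaping attribute is my assumption about how a rich text column would typically be emitted.

```xml
<!-- Assumes this namespace is declared on xsl:stylesheet:
     xmlns:ddwrt="http://schemas.microsoft.com/WebParts/v2/DataView/runtime" -->
<!-- Shows the first 25 characters of @Body, markup included, then '...' if the text was longer -->
<xsl:value-of select="ddwrt:Limit(string(@Body), 25, '...')" disable-output-escaping="yes"/>
```

Because the character count includes the tags, this is the variant that produces the broken markup described above.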

So, the first thing you might want to do is to strip out all of the HTML.  The StripHTML XSL template below will do this for you.

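<xsl:template name="StripHTML">
  <xsl:param name="HTMLText"/>
  <xsl:choose>
    <!-- Recurse only while a complete tag remains: a '<' with a '>' somewhere after it -->
    <xsl:when test="contains(substring-after($HTMLText, '&lt;'), '&gt;')">
      <xsl:call-template name="StripHTML">
        <!-- Keep the text before the tag and the text after its closing '>' -->
        <xsl:with-param name="HTMLText" select="concat(substring-before($HTMLText, '&lt;'), substring-after(substring-after($HTMLText, '&lt;'), '&gt;'))"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$HTMLText"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>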

Once you have the HTML stripped out, the ddwrt:Limit function will do what you want, but the text will probably be cut off mid-word.  Looking at our example @Body text again, the StripHTML template will return “The quick brown fox jumped over the lazy dog.”, which with the ddwrt:Limit function above will look like “The quick brown fox jumpe…”
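Here is a minimal sketch of that intermediate step, stripping first and then limiting by characters.  BodyText is just an illustrative variable name, and the ddwrt prefix is assumed to be declared on the stylesheet as usual in a DVWP.

```xml
<!-- Strip the markup from @Body and hold the plain text in a variable -->
<xsl:variable name="BodyText">
  <xsl:call-template name="StripHTML">
    <xsl:with-param name="HTMLText" select="@Body"/>
  </xsl:call-template>
</xsl:variable>
<!-- Truncate the plain text to 25 characters, adding '...' only if it was longer -->
<xsl:value-of select="ddwrt:Limit(string($BodyText), 25, '...')"/>
```

The string() call converts the variable's result tree fragment to a plain string before it is passed to ddwrt:Limit.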

So, an even better solution is to first strip out the HTML and then return a specific word count.  The FirstNWords XSL template below takes care of this for you.

<xsl:template name="FirstNWords">
  <xsl:param name="TextData"/>
  <xsl:param name="WordCount"/>
  <xsl:param name="MoreText"/>
  <xsl:choose>
    <xsl:when test="$WordCount &gt; 1 and
        (string-length(substring-before($TextData, ' ')) &gt; 0 or
        string-length(substring-before($TextData, '  ')) &gt; 0)">
      <xsl:value-of select="concat(substring-before($TextData, ' '), ' ')" disable-output-escaping="yes"/>
      <xsl:call-template name="FirstNWords">
        <xsl:with-param name="TextData" select="substring-after($TextData, ' ')"/>
        <xsl:with-param name="WordCount" select="$WordCount - 1"/>
        <xsl:with-param name="MoreText" select="$MoreText"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:when test="(string-length(substring-before($TextData, ' ')) &gt; 0 or
        string-length(substring-before($TextData, '  ')) &gt; 0)">
      <xsl:value-of select="concat(substring-before($TextData, ' '), $MoreText)" disable-output-escaping="yes"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$TextData" disable-output-escaping="yes"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

With our example, StripHTML returns “The quick brown fox jumped over the lazy dog.” and then a call to FirstNWords with a WordCount of 5 will give you “The quick brown fox jumped…”  Much nicer!

Note that this won’t do a perfect job if there is a lot of odd spacing or punctuation, but most of the time, it’s a much cleaner solution.

NOTE (2009-02-05): I was working with some data today that had lots of double spaces and some escaped characters, so I tweaked my FirstNWords template to work a little better by adding the test for double spaces (though it isn’t foolproof with different types of white space).
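If irregular spacing is the main problem, one option (a sketch, not something I’ve tested against every kind of content) is to pass the text through the standard XPath normalize-space() function before counting words.  It trims leading and trailing whitespace and collapses runs of spaces, tabs, and line breaks into single spaces.  The example assumes the stripped text is held in a variable named BodyText.

```xml
<xsl:call-template name="FirstNWords">
  <!-- normalize-space() collapses runs of ordinary whitespace into one space -->
  <xsl:with-param name="TextData" select="normalize-space($BodyText)"/>
  <xsl:with-param name="WordCount" select="25"/>
  <xsl:with-param name="MoreText" select="'...'"/>
</xsl:call-template>
```

Note that normalize-space() does not treat non-breaking spaces (or an escaped &nbsp; left in the text) as whitespace, so those can still throw off the word count.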

UPDATE (2009-02-27): Here’s an example of how I’ve used these templates in the past to display blog posts.  First, I create a variable called BodyText that contains the contents of the @Body column with the HTML stripped out by using the StripHTML template.  Then I output a row with a link to the post and a second row with the first 25 words of the post, followed by ‘…’, using the FirstNWords template.

<xsl:template name="USG_Blog.rowview">
  <xsl:variable name="BodyText">
    <xsl:call-template name="StripHTML">
      <xsl:with-param name="HTMLText" select="@Body"/>
    </xsl:call-template>
  </xsl:variable>
  <tr>
    <td>
      <a href="{$WebURL}Lists/Posts/Post.aspx?ID={@ID}&amp;Source={$URL}" >
        <xsl:value-of select="@Title"/>
      </a>
    </td>
  </tr>
  <tr>
    <td>
      <xsl:call-template name="FirstNWords">
        <xsl:with-param name="TextData" select="$BodyText"/>
        <xsl:with-param name="WordCount" select="25"/>
        <xsl:with-param name="MoreText" select="'...'"/>
      </xsl:call-template>
    </td>
  </tr>
</xsl:template>

As a side note, I always store these “utility” functions in a separate file for reuse and use the xsl:import tag to pull them into the DVWP I’m working on.  XSLT requires xsl:import elements to come before every other top-level element in the stylesheet, so the import goes before the xsl:output tag, as below.

<xsl:import href="/Style Library/XSL Style Sheets/Utilities.xsl"/>
<xsl:output method="html" indent="no"/>
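The imported file has to be a complete stylesheet in its own right, not just a list of templates.  A bare-bones skeleton for Utilities.xsl might look like the following; the comment marks where the named templates from this post would be pasted.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Paste the StripHTML and FirstNWords templates here, unchanged.
       Neither one uses ddwrt functions, so no extra namespaces are needed. -->

</xsl:stylesheet>
```

Because both templates are named templates rather than match templates, importing them doesn’t change how the DVWP processes its rows; they only run when you call them.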

97 Comments

  1. Forgot to mark for comment notification…

    I am trying to truncate the body field of my announcements list. thank you!

  2. I want to use this to allow any announcements list to have its body field truncated to the first 25 words. I do not anticipate the need to strip html as it’s pretty unlikely I will have html in my announcement body fields on this particular site. Can I just remove the lines (3-5) that refer to this?

    And do I need to save the FirstNWords template file to the Style Library as well?

    1. Nancy:

      Obviously, you can adapt these templates however you’d like for your business needs. Even if you don’t expect HTML in the body, I’d leave the StripHTML call there just in case.

      As for the link you asked about, that’s just an example from a place where I used this technique with blog posts. You’ll need to define the $WebURL and $URL variables if you want to use them. In my example $WebURL is there because I was rolling up blog posts from across the Site Collection and needed to use the site path in the link. The $URL used in the Source Query String parameter ensures that the user will be brought back to the page when they are done looking at the item. You can see how to set up the $URL variable here:
      http://sympmarc.com/2007/10/19/data-view-web-part-parameters-based-on-server-variables/

      M.

    1. Andy:

      Code doesn’t come through on comments. If you want to, send me your code through the Contact tab on my blog and I’ll update your comment.

      M.

  3. Hello Marc,

    great work!

    I rebuilt it, everything went fine. But it only works when I use both, the HTMLStrip and FirstNWords, could not get the step between like “The quick brown fox jumpe…”

    Error Message I got was “Can not use the < in an attribute”

    Maybe you can tell me what went wrong?

    Another thing: I want to display a string which is combined of the URL to the article then a line break and finally the 25 first words. Is this possible?

    Actually I have 2 columns, first column has the URL, the second one has the 25 first words…

    Thanks!

    Kind regards
    Stephan

    1. Stephan:

      I’m not sure about your first question. Can you rephrase it? On the second question, you can do something like:

      <a href="http://whatever">Here's the link</a><br/><xsl:value-of select="$TwentyFive"/>

      M.

  4. I have been looking for something like this for months. I think your solution is great. The only problem is that none of the rich text formatting is preserved. Has anyone found a way to do this and still retain the rich text?

    1. Rob:

      You’ll note that I’ve got this implemented as two separate templates; the first strips the HTML, and the second truncates the words. The trick in the latter case is that it’s hard to count words in HTML. You’d need to be able to parse past all of the HTML tags, understanding that some can contain text (TR, SPAN, A, etc.) and some won’t. I’ve had a few other questions about this, and if you think about it, keeping the markup doesn’t really make a lot of sense anyway. You often need the surrounding word context for it to look right.

      M.

      1. I have been thinking on this for a while. What I was thinking was to go to the end of the current tag if it was open. i.e. a cutoff in the middle of a div tag <div class=" would become , and then close all open tags.
        I know that would mean keeping track of all open tags etc.

        My current solution is just to have another field include the short description the author wants to include.

        1. FWIW, I think you’re going to save yourself a lot of headaches and get better results by stripping the HTML. I’ve been through this in a real project and that’s why we got to this approach.

          M.

  5. Marc,

    This post and the XSL templates are just, well, freaking fantastic! I have been looking for a solution to cleanly limit the output of a SharePoint list field. Most of the time I find solutions that I piece together to make my desired outcome. However, with FirstNWords and the StripHTML templates that you shared, there was nothing left to be desired.

    For those that are wondering: at the beginning of my row template I added the xsl:variable tag, effectively stripping the HTML from the @SRDetails field and placing into $BodyText. Then where I wanted to display the output, I added the xsl:call-template tag for FirstNWords with the appropriate xsl:with-param values. The actual template code was placed directly below the row template, with the only changes being:

    1. StripHTML: replace the < sign in the xsl:when with &lt;.
    2. StripHTML: replace the other two signs () as Marc stated.
    3. FirstNWords: replace the > signs in the first and second xsl:when with &gt;.

    Awesome solution…
    thanks Marc.

    1. Aaron:

      Glad you found these templates helpful! WordPress has improved its sourcecode tags since I originally posted this, and I’ve fixed the code. (Your points 1-3 aren’t necessary anymore.)

      M.
