SharePoint Workflows: Failed on Start (retrying)

At one of my clients, I’m using a pretty simple SharePoint Designer workflow to manage their Purchase Order approval process.  About half the time, the workflow runs fine.  The other half, the workflow instance gets stuck with the status ‘Failed on Start (retrying)’.  If I terminate the “stuck” instance and fire it off again manually, it runs fine, so I don’t think that it’s a bug in my code or bad data.

The environment is Windows SharePoint Services (WSS) and appears to be current on all patches and hotfixes. No other workflows are running now, although people at the client have experimented with workflows in the past.

Looking around the Web, this doesn't seem to be an unusual occurrence, but I haven't found anything authoritative enough to explain it. One suggested fix is to reset the Windows Workflow Foundation (WWF) performance counters by running this command:

Lodctr /R "c:\Windows\Microsoft.Net\Framework\v3.0\Windows Workflow Foundation\perfcounters.ini"

We ran it, and everything seems to be working fine now.  I’ll update this post if we learn anything more…

2008-10-13

Alas, the fix above did not seem to work. All of the documents are being submitted by one person, and there still doesn't seem to be any pattern to which runs succeed and which fail. Whenever a workflow gets stuck, I see a permission error like this in the logs:

10/13/2008 08:52:33.66     w3wp.exe (0x178C)                          0x047C     Windows SharePoint Services     Workflow Infrastructure         98d8     Unexpected System.UnauthorizedAccessException: Access is denied. (Exception from HRESULT: 0x80070005 (E_ACCESSDENIED))     at Microsoft.SharePoint.SPGlobal.HandleUnauthorizedAccessException(UnauthorizedAccessException ex)     at Microsoft.SharePoint.Library.SPRequest.AddWorkItem(String bstrUrl, Guid& pWorkItemId, DateTime& pDeliveryDate, Guid workItemType, Guid workItemSubType, Guid parentId, Int32& pItemId, Boolean bRememberWebId, Guid& pItemGuid, Guid& pBatchId, Int32 userId, Object varBinaryPayload, String pwzTextPayload, Guid processingId, Boolean bAutoDelete)     at Microsoft.SharePoint.SPWorkItemCollection.Add(DateTime deliveryDate, Guid workItemId, Guid workItemSubType, Int32 itemId, Guid batchId, Guid itemGuid, Boolean rememberWebId, Int32 userId, Byte[] binaryPayload, String textPayload, Boo… 

10/13/2008 08:52:33.66*    w3wp.exe (0x178C)                          0x047C     Windows SharePoint Services     Workflow Infrastructure         98d8     Unexpected …lean inProgress, Boolean autoDeleteOldUnprocessedEvents)     at Microsoft.SharePoint.Workflow.SPWorkflowPendingEventCollection.Enqueue(SPWorkflowEvent workflowEvent, DateTime deliveryTime, SPRunWorkflowOptions runOptions)     at Microsoft.SharePoint.Workflow.SPWorkflowManager.RunWorkflowElev(SPWorkflow originalWorkflow, SPWorkflow workflow, Collection`1 events, SPRunWorkflowOptions runOptions)
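To line the stuck instances up against these errors, it helps to pull every occurrence out of the trace logs rather than eyeballing them. Below is a minimal Python sketch, assuming the trace (ULS) log layout of tab-separated columns with the message in the eighth column, and assuming, as in the excerpt above, that an entry too long for one line continues on the next line with "*" appended to its timestamp and the split marked by an ellipsis. The names parse_uls and access_denied_workflow_errors are hypothetical helpers, not part of any SharePoint API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional

# Continuation markers: the excerpt shows "…"; plain "..." is accepted too.
_ELLIPSES = ("\u2026", "...")


@dataclass
class UlsEntry:
    timestamp: str
    process: str
    thread: str
    area: str
    category: str
    event_id: str
    level: str
    message: str


def _drop_trailing_ellipsis(text: str) -> str:
    for mark in _ELLIPSES:
        if text.endswith(mark):
            return text[: -len(mark)]
    return text


def _drop_leading_ellipsis(text: str) -> str:
    for mark in _ELLIPSES:
        if text.startswith(mark):
            return text[len(mark):]
    return text


def parse_uls(lines: Iterable[str]) -> Iterator[UlsEntry]:
    """Yield log entries, rejoining messages that were split across lines.

    Assumption: tab-separated columns (Timestamp, Process, TID, Area,
    Category, EventID, Level, Message, ...) and each continuation line
    directly follows the line it continues, with "*" on its timestamp.
    """
    current: Optional[UlsEntry] = None
    for raw in lines:
        fields = [f.strip() for f in raw.rstrip("\r\n").split("\t")]
        if len(fields) < 8 or fields[0] == "Timestamp":
            continue  # skip the header row and blank or malformed lines
        ts, proc, tid, area, cat, eid, level, msg = fields[:8]
        if ts.endswith("*") and current is not None:
            # Continuation line: glue the split message back together.
            current.message = (_drop_trailing_ellipsis(current.message)
                               + _drop_leading_ellipsis(msg))
            continue
        if current is not None:
            yield current
        current = UlsEntry(ts.rstrip("*"), proc, tid, area, cat, eid, level, msg)
    if current is not None:
        yield current


def access_denied_workflow_errors(lines: Iterable[str]) -> List[UlsEntry]:
    """Return Workflow Infrastructure entries reporting an access-denied error."""
    return [
        e for e in parse_uls(lines)
        if e.category == "Workflow Infrastructure"
        and "UnauthorizedAccessException" in e.message
    ]
```

To scan a real trace log, open the file (by default under the 12\LOGS folder of the SharePoint installation) and pass the file object to access_denied_workflow_errors; the timestamps it returns can then be matched against the start times of the stuck workflow instances. Rejoining the continuation lines first matters because, as in the excerpt, the exception type and the call stack can land on different physical lines.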

I suspect the permissions error may be misleading, but I've raised the submitter's permissions on the document library to Full Control anyway.