Home » Tutorials » Rescue that Workflow Manager from certain doom, or at least get that OutboundCertificate fixed!

Rescue that Workflow Manager from certain doom, or at least get that OutboundCertificate fixed!

SharePoint 2013 and Workflow Manager have always proven to be a winning combination for late nights of

Doom I tell you. Doom!

Doom I tell you. Doom!

troubleshooting involving copious amounts of coffee and a complete loss of sleep. Or whatever may be your preferred means of caffeine intake. Plus your spouse may not appreciate yet another late night without you at home. Workflow Manager is a frustrating, burdensome beast, and it is not on the fun sunny side of life.

In this particular instance we are needing to resuscitate Workflow Manager from its current undead state. It pretends to be running and responsing to your commands. The users believe otherwise, and are wanting to lynch you because their workflows are showing angry messages about no longer being able to talk with the server. Looking at the server you find that the management databases are shot and cannot be worked with in their current state. In my recent case in particular, it was due to a fabulous mixture of expired certificates, revoked certificates and certificate templates that “update” your current certificates to certificates that are incompatible with Workflow Manager. This restore is also a method that can be used to replace the OutboundCertificate in the Workflow Manager farm if the Set-WFNextOutboundCertificateReference and Set-WFNextOutboundCertificateAsCurrent are not working for you.

Microsoft has a pretty decent article on disaster recovery for Workflow Manager 1.0. The problem I found with it as that it was incomplete, so thusly why I am putting together this post. Now, the topology we are working with in this scenario is a single SharePoint 2013 server, with a separate single SQL server, and a separate single Workflow Manager server. This scenario also requires you to have either working backups of your Workflow Manager databases, or that only the WFManagementDB and/or SbManagementDB are the only shot databases. You do have valid backups of everything, right? Go check again, right now, just to be safe. If you are doing a restore on a farm of multiple Workflow Manager servers then you may need a few extra steps to update those servers to the new databases. Also, check and make sure your certificates are up to date and that you know which service accounts are in use on your Workflow Manager farm and what their passwords are.

If you’re skipping ahead to the details on how to do this, here is where you need to start paying attention!

First off we need to uninstall Workflow Manager. Hopefully an easy enough step. If you’re installing 1.0 Refresh and you’re running Service Bus 1.0 then this would be a good time to move to Service Bus 1.1. It worked flawlessly for me when I did this. If that is the direction you are going to go then uninstall Service Bus 1.0

Next step! Let’s install Service Bus 1.1 followed by Workflow Manager Refresh 1.0. Hopefully that went smoothly for you.

Now we need to get the Service Bus farm up and running. Check your SQL server and make sure you remove your SbManagementDB and your WFManagementDB, just in case those still exist. Alternatively when rebuilding things you could name the databases something else, but I don’t see much of a point to that as it will just cause confusion further down the line. Identify your service account you are using for Service Bus and then we’ll get the database recreated. Pop open PowerShell and run

Import-Module ServiceBus

Restore-SBFarm -RunAsAccount DOMAIN\servicebussvc -GatewayDBConnectionString “Data Source=sql.jefferyland.com;Initial Catalog=SbGatewayDatabase;Integrated Security=SSPI;Asynchronous Processing=True” -SBFarmDBConnectionString “Data Source=sql.jefferyland.com;Initial Catalog=SbManagementDB;Integrated Security=SSPI;Asynchronous Processing=True” -FarmCertificateThumbprint 814AA8261BE6F0DD9031F802A4D26EBAD020770D -EncryptionCertificateThumbprint 814AA8261BE6F0DD9031F802A4D26EBAD020770D

That will get your replacement SbManagementDB created. The output of a successful run of the command will look something like the following, which don’t you love how on very critical commands like this it defaults to Yes?

This operation will restore the entire service bus farm
Are you sure you want to restore the service bus farm?
[Y] Yes [N] No [S] Suspend [?] Help (default is “Y”):
FarmType : SB
SBFarmDBConnectionString : Data Source=sql.jefferyland.com;Initial Catalog=SbManagementDB;Integrated
Security=True;Asynchronous Processing=True
ClusterConnectionEndpointPort : 9000
ClientConnectionEndpointPort : 9001
LeaseDriverEndpointPort : 9002
ServiceConnectionEndpointPort : 9003
RunAsAccount : DOMAIN\servicebussvc
AdminGroup : BUILTIN\Administrators
GatewayDBConnectionString : Data Source=sql.jefferyland.com;Initial Catalog=SbGatewayDatabase;Integrated
Security=True;Asynchronous Processing=True
HttpsPort : 9355
TcpPort : 9354
MessageBrokerPort : 9356
AmqpsPort : 5671
AmqpPort : 5672
FarmCertificate : Thumbprint: 814AA8261BE6F0DD9031F802A4D26EBAD020770D, IsGenerated: False
EncryptionCertificate : Thumbprint: 814AA8261BE6F0DD9031F802A4D26EBAD020770D, IsGenerated: False
Hosts : {}
RPHttpsPort : 9359
RPHttpsUrl :
FarmDNS :
AdminApiUserName :
TenantApiUserName :
BrokerExternalUrls :

The Service Bus farm has been successfully restored.

Note that it will complain if SbManagementDB already exists, so you will have to delete it or name this one something new. Now we’ll connect in the SbGatewayDatabase.

Restore-SBGateway -GatewayDBConnectionString “Data Source=sql.jefferyland.com;Initial Catalog=SbGatewayDatabase;Integrated Security=SSPI;Asynchronous Processing=True” -SBFarmDBConnectionString “Data Source=sql.jefferyland.com;Initial Catalog=SbManagementDB;Integrated Security=SSPI;Asynchronous Processing=True”

This operation will restore the Service Bus gateway database. This may require upgrading of gateway database and
message container databases.
Are you sure you want to restore the Service Bus gateway database?
[Y] Yes [N] No [S] Suspend [?] Help (default is “Y”):
Re-encrypting the global signing keys.
The following containers database has been restored:
WARNING: Failed to open a connection to the following dB: ”
WARNING: The database associated with container ‘1’ is not accessible. Please run Restore-SBMessageContainer -Id 1
-DatabaseServer <correct server> -DatabaseName <correct name> to restore container functionality.
Id : 1
Status : Active
Host :
DatabaseServer :
DatabaseName :
ConnectionString :
EntitiesCount : 13
DatabaseSizeInMB : 0

Restore-SBGateway : The operation has timed out.
At line:1 char:1
+ Restore-SBGateway -GatewayDBConnectionString “Data Source=sql.jefferyland.com; …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Restore-SBGateway], SqlCommandTimeoutException
+ FullyQualifiedErrorId : Microsoft.Cloud.ServiceBus.Common.Sql.SqlCommandTimeoutException,Microsoft.ServiceBus.Co
mmands.RestoreSBGatewayCommand

Do not be alarmed by the scary messages in there. I was alarmed at first but apparently everything went well. Next check your SQL server for SBMessageContainer* databases and you’ll need to run this command for each one. At least, according to Microsoft’s documentation. According to the command I ran it wasn’t necessary.

Restore-SBMessageContainer -Id 1 -SBFarmDBConnectionString “Data Source=sql.jefferyland.com;Initial Catalog=SbManagementDB;Integrated Security=SSPI;Asynchronous Processing=True” -ContainerDBConnectionString “Data Source=sql.jefferyland.com;Initial Catalog=SBMessageContainer01;Integrated Security=SSPI;Asynchronous Processing=True”

Id : 1
Status : Active
Host :
DatabaseServer : sql.jefferyland.com
DatabaseName : SBMessageContainer01
ConnectionString : Data Source=sql.jefferyland.com;Initial Catalog=SBMessageContainer01;Integrated
Security=True;Asynchronous Processing=True
EntitiesCount : 13
DatabaseSizeInMB : 48.6875

All entities are up to date. No changes were made to entities.
Please run Start-SBHost.

Now we need to add our host to the Service Bus farm.

Add-SBHost -SBFarmDBConnectionString “Data Source=sql.jefferyland.com;Initial Catalog=SbManagementDB;Integrated Security=SSPI;Asynchronous Processing=True” -RunAsPassword (ConvertTo-SecureString -Force -AsPlainText password!) -EnableFirewallRules:$true

FarmType : SB
SBFarmDBConnectionString : Data Source=sql.jefferyland.com;Initial Catalog=SbManagementDB;Integrated
Security=True;Asynchronous Processing=True
ClusterConnectionEndpointPort : 9000
ClientConnectionEndpointPort : 9001
LeaseDriverEndpointPort : 9002
ServiceConnectionEndpointPort : 9003
RunAsAccount : DOMAIN\servicebussvc
AdminGroup : BUILTIN\Administrators
GatewayDBConnectionString : Data Source=sql.jefferyland.com;Initial Catalog=SbGatewayDatabase;Integrated
Security=True;Asynchronous Processing=True
HttpsPort : 9355
TcpPort : 9354
MessageBrokerPort : 9356
AmqpsPort : 5671
AmqpPort : 5672
FarmCertificate : Thumbprint: 814AA8261BE6F0DD9031F802A4D26EBAD020770D, IsGenerated: False
EncryptionCertificate : Thumbprint: 814AA8261BE6F0DD9031F802A4D26EBAD020770D, IsGenerated: False
Hosts : {Name: workflow.jefferyland.com, Configuration State: HostConfigurationCompleted}
RPHttpsPort : 9359
RPHttpsUrl : https://workflow.jefferyland.com:9359/
FarmDNS :
AdminApiUserName :
TenantApiUserName :
BrokerExternalUrls :

We’ve finished up the Service Bus farm, hopefully successfully, so now we’re ready for the Workflow Manager farm. Fighting!

This can get a little bit messy if you’re running Service Bus 1.1 as there is a buggy cmdlet. If you’re not using Service Bus 1.1, or you do not receive an error like

Could not load file or assembly
'Microsoft.ServiceBus, Version=1.8.0.0, Culture=neutral,
PublicKeyToken=31bf3856ad364e35' or one of its dependencies.
The system cannot find the file specified.

then you can skip the following. If we are using Service Bus 1.1, then we need to work around a call to an old ServiceBus assembly in one of the cmdlets. Thanks to these posts, http://www.wictorwilen.se/issue-when-installing-workflow-manager-1.0-refresh-using-powershell and https://carolinepoint.wordpress.com/2012/07/10/sharepoint-2010-powershell-and-bindingredirects/ we have a valid work around.

Create or edit a file named C:\Windows\SysWOW64\WindowsPowerShell\v1.0\powershell.exe.config and paste the following into it:

<?xml version=”1.0″ encoding=”utf-8″ ?>
<configuration>
<runtime>
<assemblyBinding xmlns=”urn:schemas-microsoft-com:asm.v1″>
<dependentAssembly>
<assemblyIdentity name=”Microsoft.ServiceBus”
publicKeyToken=”31bf3856ad364e35″
culture=”en-us” />
<bindingRedirect oldVersion=”1.8.0.0″ newVersion=”2.1.0.0″ />
</dependentAssembly>
</assemblyBinding>
</runtime>
</configuration>

Then restart your PowerShell session to make this active. You may want to undo this part after you’re done restoring the farm just to be safe.

Continuing on with the farm build run the following.

Import-Module WorkflowManager

Restore-WFFarm -InstanceDBConnectionString “Data Source=sql.jefferyland.com;Initial Catalog=WFInstanceManagementDB;Integrated Security=SSPI;Asynchronous Processing=True” -ResourceDBConnectionString “Data Source=sql.jefferyland.com;Initial Catalog=WFResourceManagementDB;Integrated Security=SSPI;Asynchronous Processing=True” -WFFarmDBConnectionString “Data Source=sql.jefferyland.com;Initial Catalog=WFManagementDB;Integrated Security=SSPI;Asynchronous Processing=True” -OutboundCertificateThumbprint 814AA8261BE6F0DD9031F802A4D26EBAD020770D -EncryptionCertificateThumbprint 814AA8261BE6F0DD9031F802A4D26EBAD020770D -SslCertificateThumbprint 814AA8261BE6F0DD9031F802A4D26EBAD020770D -InstanceStateSyncTime (Get-Date)  -ConsistencyVerifierLogPath “C:\temp\wfverifierlog.txt” -RunAsAccount DOMAIN\workflowsvc -Verbose

A successful run through should get you output similar to this:

VERBOSE: [5/14/2015 11:56:58 PM]: Created and configured farm management database.
VERBOSE: [5/14/2015 11:56:58 PM]: Created and configured Workflow Manager resource management database.
VERBOSE: [5/14/2015 11:56:58 PM]: Created and configured Workflow Manager instance management database.
VERBOSE: [5/14/2015 11:56:58 PM]: Configuration added to farm management database.
VERBOSE: [5/14/2015 11:56:58 PM]: Workflow Manager configuration added to the Workflow Manager farm management
database.
VERBOSE: [5/14/2015 11:56:58 PM]: New-WFFarm successfully completed.
FarmType : Workflow
WFFarmDBConnectionString : Data Source=sql.jefferyland.com;Initial Catalog=WFManagementDB;Integrated
Security=True;Asynchronous Processing=True
RunAsAccount : DOMAIN\workflowsvc
AdminGroup : BUILTIN\Administrators
Hosts : {}
InstanceDBConnectionString : Data Source=sql.jefferyland.com;Initial Catalog=WFInstanceManagementDB;Integrated
Security=True;Asynchronous Processing=True
ResourceDBConnectionString : Data Source=sql.jefferyland.com;Initial Catalog=WFResourceManagementDB;Integrated
Security=True;Asynchronous Processing=True
HttpPort : 12291
HttpsPort : 12290
OutboundCertificate : Thumbprint: 814AA8261BE6F0DD9031F802A4D26EBAD020770D, IsGenerated: False
Endpoints : {}
SslCertificate : Thumbprint: 814AA8261BE6F0DD9031F802A4D26EBAD020770D, IsGenerated: False
EncryptionCertificate : Thumbprint: 814AA8261BE6F0DD9031F802A4D26EBAD020770D, IsGenerated: False

This will get our WFManagementDB recreated as well. Time to add the host back in!

Add-WFHost -WFFarmDBConnectionString “Data Source=sql.jefferyland.com;Initial Catalog=WFManagementDB;Integrated Security=SSPI;Asynchronous Processing=True” -RunAsPassword (ConvertTo-SecureString -Force -AsPlainText password!) -EnableFirewallRules:$true

This should have your farm up and running. Let’s check the status.

Get-WFFarmStatus

HostName ServiceName ServiceStatus
——– ———– ————-
workflow.jefferyland.com WorkflowServiceBackend Running
workflow.jefferyland.com WorkflowServiceFrontEnd Running

Restoration is done! This is where Microsoft’s documentation leaves you hanging. You need to reconnect the farm with SharePoint.

Register-SPWorkflowService -SPSite “https://sharepoint.jefferyland.com/&#8221; -WorkflowHostUri “https://workflow.jefferyland.com:12290&#8221; -AllowOAuthHttp -Force

Your workflows should now be showing up once again but we’re not done yet, we need to perform some maintenance on the SharePoint server. First clean-up the old certificates using the thumbprint of the old certificate for your filtering criteria:

Get-SPTrustedRootAuthority | ?{$_.Certificate -match “BF5CA00B6A639FE5B7FF5688C9A38FEBFBF03552”} | Remove-SPTrustedRootAuthority -Confirm:$false

Next we need to run some jobs to update the security token, otherwise you’ll get a HTTP 401 Invalid JWT token error. Alternatively you can wait until after midnight for the timer jobs to run themselves, but I’m pretty sure that would not be the healthiest decision here.

In Central Administration go to Monitoring->Timer Jobs:Job Definitions
Run these jobs:
Refresh Trusted Security Token Services Metadata feed.
Workflow Auto Cleanup
Notification Timer Job c02c63c2-12d8-4ec0-b678-f05c7e00570e
Hold Processing and Reporting
Bulk workflow task processing

Now check in your workflows. They should be running nice and healthy! That wraps up this post on rescuing your Workflow Manager farm and saves you from losing a night or two of sleep.

Advertisements

7 Comments

  1. Thanks, best of articles about how to overcome stuff when restoring SB/WF Farm, however still incomplete.

    1. I had to run Restore-SBMessageContainer before Restore-SBGateway
    2. When Restoring-WFFarm or (in case adding host to an existing farm with 0 hosts with Add-WFHost) – I was presented with a message: “Token provider returned message: ‘400The namespace ‘WorkflowDefaultNamespace’ does not have a valid issuer that can be used to issue tokens. Add a valid issuer with a valid signature to the namespace”

    To resolve, had to run:
    ​​Remove-SBNamespace -Name ‘WorkflowDefaultNamespace’
    New-SBNamespace -Name ‘WorkflowDefaultNamespace’ -AddressingScheme ‘Path’ -ManageUsers ‘domain\service account’,’domain\some admin account’ -Verbose;

    Rejoice!

    And then some more debugging why existing SP 2013 workflows cannot be initated and presents such a message (some info removed):

    SocialRESTExceptionProcessingHandler.DoServerExceptionProcessing – SharePoint Server Exception [Microsoft.Workflow.Client.ScopeInactiveException: Scope ‘/SharePoint/default//’ is not in an active state. Its current state is ‘Unregistering’. HTTP headers received from the server – ActivityId: . NodeId: . Scope: /SharePoint/default//. Client ActivityId : —> System.Net.WebException: The remote server returned an error: (403) Forbidden. ​

    To fix that, do with PowerShell:

    $cred = Get-Credentials
    Remove-WFScope -ScopeUri “https://:12290/SharePoint/default//” -Credential $cred​
    New-WFScope -ScopeUri “https://:12290/SharePoint/default//” -Credential $cred -DisplayName “” -UserComment “”

    And afterwards republish SP2013 Workflows.

    Rejoice again!

  2. jefferyland says:

    My suspicion as to the disparity in steps for restoring a WF farm is that it has changed slightly from version to version. I had to run through a restore of another farm a few months ago and referencing the steps that I had written down resulted in some extra steps required to successfully restore the farm. Definitely a very frustrating experience.

  3. Dom says:

    Does the restoration also restore specific workflow instances that were already running or do they get terminated and have to be started again?

  4. jefferyland says:

    I haven’t looked into it specifically but I’m pretty sure that all instances would have to be started again.

  5. Richard says:

    Hi,

    I was following your guide in restoring the workflow manager. I already completed restoring the SBFarm and I am now at the Restore-WFFarm. In this line of code I am encountering this error. I also now imported all the certificates from the other machine. Is there some configuration that I might have missed?

    Restore-WFFarm : The token provider was unable to provide a security token while accessing
    ‘https://testsite:9355/WorkflowDefaultNamespace/$STS/Windows/’. Token provider returned message:
    400The namespace ‘WorkflowDefaultNamespace’ does not have a valid issuer that can be used
    to issue tokens. Add a valid issuer with a valid signature to the
    namespace..TrackingId:0e98e5e0-c47f-4505-9e95-11ad524c56dd_Gtestsite,TimeStamp:4/21/2017 6:41:21
    AM’.
    At line:1 char:1
    + Restore-WFFarm -InstanceDBConnectionString “Data Source=TestDB\SharePoint;Init …
    + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : OperationStopped: (:) [Restore-WFFarm], UnauthorizedAccessException
    + FullyQualifiedErrorId : WFRuntimeSettingFailed,Microsoft.Workflow.Deployment.Commands.RestoreWFFarm

  6. jefferyland says:

    Did you specify the FarmCertificateThumbprint and EncryptionCertificateThumbprint when you were running the Restore-SBFarm command? If so, then the next step would be to make sure that you had run them with the correct certificates. Also make sure your certificates are imported. For some additional details check this post from Microsoft’s blogs: https://blogs.msdn.microsoft.com/biztalknotes/2014/05/14/workflow-manager-disaster-recovery/

  7. Richard says:

    Yes, I already updated the SBFarm using the 2 certificate. I also imported the certificate used for the WFFarm including the OutBoundCertificate.

    I noticed the comment above that we have the same error. It’s a bit weird, I followed his suggestions.
    I tried to execute the Remove-SBNamespace -Name ‘WorkflowDefaultNamespace’ but it says it does not exists.
    When I tried to execute the New-SBNamespace -Name ‘WorkflowDefaultNamespace’ -AddressingScheme ‘Path’ -ManageUsers ‘domain\service account’,’domain\some admin account’ -Verbose; the response is that the ‘WorkflowDefaultNamespace’ does exists.
    Also when I run the Get-SBNamespace, I do not get a response/result.

    Could this be related as well?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

wordpress visitor counter

RSS Subscriptions

Contact Me

%d bloggers like this: