
Rescue that Workflow Manager from certain doom, or at least get that OutboundCertificate fixed!

SharePoint 2013 and Workflow Manager have always proven to be a winning combination for late nights of troubleshooting involving copious amounts of coffee and a complete loss of sleep (or whatever your preferred means of caffeine intake may be). Your spouse may not appreciate yet another late night without you at home, either. Workflow Manager is a frustrating, burdensome beast, and it does not live on the fun, sunny side of life.

Doom I tell you. Doom!

In this particular instance we need to resuscitate Workflow Manager from its current undead state. It pretends to be running and responding to your commands, but the users believe otherwise and want to lynch you because their workflows are showing angry messages about no longer being able to talk with the server. Looking at the server, you find that the management databases are shot and cannot be worked with in their current state. In my recent case, this was due to a fabulous mixture of expired certificates, revoked certificates, and certificate templates that “update” your current certificates to ones that are incompatible with Workflow Manager. This restore is also a way to replace the OutboundCertificate in the Workflow Manager farm if the Set-WFNextOutboundCertificateReference and Set-WFNextOutboundCertificateAsCurrent cmdlets are not working for you.

Microsoft has a pretty decent article on disaster recovery for Workflow Manager 1.0. The problem I found with it is that it is incomplete, which is why I am putting together this post. The topology in this scenario is a single SharePoint 2013 server, a separate single SQL server, and a separate single Workflow Manager server. This scenario also requires that you either have working backups of your Workflow Manager databases, or that the WFManagementDB and/or SbManagementDB are the only damaged databases. You do have valid backups of everything, right? Go check again, right now, just to be safe. If you are restoring a farm with multiple Workflow Manager servers, you may need a few extra steps to point those servers at the new databases. Also, make sure your certificates are up to date, and that you know which service accounts your Workflow Manager farm uses and what their passwords are.
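Since expired and revoked certificates are what sank my farm, it is worth checking the certificate before you start. Below is a minimal sketch, assuming PowerShell 3.0 or later and that the farm certificate lives in the Local Machine Personal store. Test-FarmCertificate is a name I made up, not a Workflow Manager cmdlet, and it only checks that the certificate exists and is inside its validity dates; it does not check revocation.

```powershell
# Test-FarmCertificate is a hypothetical helper (not a Workflow Manager or
# Service Bus cmdlet). It reports whether a certificate with the given
# thumbprint exists and is inside its NotBefore/NotAfter window.
# It does NOT check revocation.
function Test-FarmCertificate {
    param(
        [Parameter(Mandatory = $true)]
        [string] $Thumbprint,

        # Defaults to the Local Machine Personal store (Windows only). Any
        # objects with Thumbprint, NotBefore and NotAfter properties will do.
        [object[]] $Certificates = (Get-ChildItem -Path Cert:\LocalMachine\My),

        [datetime] $AsOf = (Get-Date)
    )

    # -eq is case-insensitive, so a lowercase thumbprint still matches.
    $cert = $Certificates |
        Where-Object { $_ -and $_.Thumbprint -eq $Thumbprint } |
        Select-Object -First 1

    $notAfter = $null
    if ($cert) { $notAfter = $cert.NotAfter }

    [pscustomobject]@{
        Thumbprint = $Thumbprint
        Found      = [bool]$cert
        Valid      = ([bool]$cert -and ($cert.NotBefore -le $AsOf) -and ($cert.NotAfter -gt $AsOf))
        NotAfter   = $notAfter
    }
}
```

On the Workflow Manager server you would run something like Test-FarmCertificate -Thumbprint '814AA8261BE6F0DD9031F802A4D26EBAD020770D' and look for Found and Valid both being True.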

If you’re skipping ahead to the details on how to do this, here is where you need to start paying attention!

First off, we need to uninstall Workflow Manager. Hopefully that’s an easy enough step. If you’re going to install Workflow Manager 1.0 Refresh and you’re currently running Service Bus 1.0, this would be a good time to move to Service Bus 1.1; it worked flawlessly for me when I did this. If that is the direction you are going, uninstall Service Bus 1.0 as well.

Next step! Let’s install Service Bus 1.1, followed by Workflow Manager 1.0 Refresh. Hopefully that went smoothly for you.

Now we need to get the Service Bus farm up and running. Check your SQL server and remove your SbManagementDB and WFManagementDB, just in case those still exist. Alternatively, you could give the rebuilt databases different names, but I don’t see much point to that, as it will just cause confusion further down the line. Identify the service account you are using for Service Bus, and then we’ll get the database recreated. Pop open PowerShell and run:

Import-Module ServiceBus

Restore-SBFarm -RunAsAccount DOMAIN\servicebussvc -GatewayDBConnectionString "Data Source=sql.jefferyland.com;Initial Catalog=SbGatewayDatabase;Integrated Security=SSPI;Asynchronous Processing=True" -SBFarmDBConnectionString "Data Source=sql.jefferyland.com;Initial Catalog=SbManagementDB;Integrated Security=SSPI;Asynchronous Processing=True" -FarmCertificateThumbprint 814AA8261BE6F0DD9031F802A4D26EBAD020770D -EncryptionCertificateThumbprint 814AA8261BE6F0DD9031F802A4D26EBAD020770D

That will get your replacement SbManagementDB created. The output of a successful run will look something like the following. (Don’t you love how, on critical commands like this, the prompt defaults to Yes?)

This operation will restore the entire service bus farm
Are you sure you want to restore the service bus farm?
[Y] Yes [N] No [S] Suspend [?] Help (default is "Y"):
FarmType : SB
SBFarmDBConnectionString : Data Source=sql.jefferyland.com;Initial Catalog=SbManagementDB;Integrated
Security=True;Asynchronous Processing=True
ClusterConnectionEndpointPort : 9000
ClientConnectionEndpointPort : 9001
LeaseDriverEndpointPort : 9002
ServiceConnectionEndpointPort : 9003
RunAsAccount : DOMAIN\servicebussvc
AdminGroup : BUILTIN\Administrators
GatewayDBConnectionString : Data Source=sql.jefferyland.com;Initial Catalog=SbGatewayDatabase;Integrated
Security=True;Asynchronous Processing=True
HttpsPort : 9355
TcpPort : 9354
MessageBrokerPort : 9356
AmqpsPort : 5671
AmqpPort : 5672
FarmCertificate : Thumbprint: 814AA8261BE6F0DD9031F802A4D26EBAD020770D, IsGenerated: False
EncryptionCertificate : Thumbprint: 814AA8261BE6F0DD9031F802A4D26EBAD020770D, IsGenerated: False
Hosts : {}
RPHttpsPort : 9359
RPHttpsUrl :
FarmDNS :
AdminApiUserName :
TenantApiUserName :
BrokerExternalUrls :

The Service Bus farm has been successfully restored.
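The same long connection strings and certificate thumbprint get typed over and over in this walkthrough, which is an easy place for a typo to creep in. A minimal sketch of building them once and splatting them into Restore-SBFarm is below; New-FarmDbConnectionString is a hypothetical helper name, and the server, account, and thumbprint values are simply the ones from this post.

```powershell
# New-FarmDbConnectionString is a hypothetical helper (not a Service Bus or
# Workflow Manager cmdlet). It builds the connection-string format used by
# every command in this walkthrough.
function New-FarmDbConnectionString {
    param(
        [Parameter(Mandatory = $true)] [string] $Server,
        [Parameter(Mandatory = $true)] [string] $Database
    )
    "Data Source=$Server;Initial Catalog=$Database;Integrated Security=SSPI;Asynchronous Processing=True"
}

# Values from this post; replace them with your own.
$sqlServer  = 'sql.jefferyland.com'
$thumbprint = '814AA8261BE6F0DD9031F802A4D26EBAD020770D'
$sbFarmDb   = New-FarmDbConnectionString -Server $sqlServer -Database 'SbManagementDB'
$sbGateway  = New-FarmDbConnectionString -Server $sqlServer -Database 'SbGatewayDatabase'

# Parameters for Restore-SBFarm, collected in a hashtable for splatting.
$restoreSbFarmArgs = @{
    RunAsAccount                    = 'DOMAIN\servicebussvc'
    GatewayDBConnectionString       = $sbGateway
    SBFarmDBConnectionString        = $sbFarmDb
    FarmCertificateThumbprint       = $thumbprint
    EncryptionCertificateThumbprint = $thumbprint
}

# On the Workflow Manager server, after Import-Module ServiceBus:
# Restore-SBFarm @restoreSbFarmArgs
```

The same $sbFarmDb and $sbGateway variables can then be passed to Restore-SBGateway, Restore-SBMessageContainer, and Add-SBHost in the steps that follow.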

Note that Restore-SBFarm will complain if SbManagementDB already exists, so you will have to delete it or give the new one a different name. Now we’ll reconnect the SbGatewayDatabase:

Restore-SBGateway -GatewayDBConnectionString "Data Source=sql.jefferyland.com;Initial Catalog=SbGatewayDatabase;Integrated Security=SSPI;Asynchronous Processing=True" -SBFarmDBConnectionString "Data Source=sql.jefferyland.com;Initial Catalog=SbManagementDB;Integrated Security=SSPI;Asynchronous Processing=True"

This operation will restore the Service Bus gateway database. This may require upgrading of gateway database and
message container databases.
Are you sure you want to restore the Service Bus gateway database?
[Y] Yes [N] No [S] Suspend [?] Help (default is "Y"):
Re-encrypting the global signing keys.
The following containers database has been restored:
WARNING: Failed to open a connection to the following dB: ''
WARNING: The database associated with container ‘1’ is not accessible. Please run Restore-SBMessageContainer -Id 1
-DatabaseServer <correct server> -DatabaseName <correct name> to restore container functionality.
Id : 1
Status : Active
Host :
DatabaseServer :
DatabaseName :
ConnectionString :
EntitiesCount : 13
DatabaseSizeInMB : 0

Restore-SBGateway : The operation has timed out.
At line:1 char:1
+ Restore-SBGateway -GatewayDBConnectionString “Data Source=sql.jefferyland.com; …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Restore-SBGateway], SqlCommandTimeoutException
+ FullyQualifiedErrorId : Microsoft.Cloud.ServiceBus.Common.Sql.SqlCommandTimeoutException,Microsoft.ServiceBus.Co
mmands.RestoreSBGatewayCommand

Do not be alarmed by the scary warnings and the timeout error in there. I was alarmed at first, but apparently everything went well. Next, check your SQL server for SBMessageContainer* databases; according to Microsoft’s documentation, you need to run the following command for each one. Judging by the command’s output when I ran it (“All entities are up to date. No changes were made to entities.”), it wasn’t actually necessary in my case.

Restore-SBMessageContainer -Id 1 -SBFarmDBConnectionString "Data Source=sql.jefferyland.com;Initial Catalog=SbManagementDB;Integrated Security=SSPI;Asynchronous Processing=True" -ContainerDBConnectionString "Data Source=sql.jefferyland.com;Initial Catalog=SBMessageContainer01;Integrated Security=SSPI;Asynchronous Processing=True"

Id : 1
Status : Active
Host :
DatabaseServer : sql.jefferyland.com
DatabaseName : SBMessageContainer01
ConnectionString : Data Source=sql.jefferyland.com;Initial Catalog=SBMessageContainer01;Integrated
Security=True;Asynchronous Processing=True
EntitiesCount : 13
DatabaseSizeInMB : 48.6875

All entities are up to date. No changes were made to entities.
Please run Start-SBHost.
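If you have several SBMessageContainer* databases, running that command by hand for each one gets tedious. Below is a minimal sketch of a loop over them. Restore-AllMessageContainers is a hypothetical wrapper, not part of the Service Bus module, and it assumes you already know which container Id belongs to which database (for example from the Restore-SBGateway warnings); it does not discover that mapping for you.

```powershell
# Restore-AllMessageContainers is a hypothetical wrapper, not part of the
# Service Bus module. It assumes you already know which container Id goes with
# which SBMessageContainer* database; it does not discover that mapping.
function Restore-AllMessageContainers {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true)] [string] $SqlServer,
        [Parameter(Mandatory = $true)] [string] $SBFarmDBConnectionString,
        # Container Id -> database name, e.g. @{ 1 = 'SBMessageContainer01' }
        [Parameter(Mandatory = $true)] [hashtable] $Containers
    )
    # Restore in Id order so the output is easy to follow.
    foreach ($id in ($Containers.Keys | Sort-Object)) {
        $dbName = $Containers[$id]
        $containerDb = "Data Source=$SqlServer;Initial Catalog=$dbName;Integrated Security=SSPI;Asynchronous Processing=True"
        Write-Verbose "Restoring message container $id from $dbName"
        Restore-SBMessageContainer -Id $id `
            -SBFarmDBConnectionString $SBFarmDBConnectionString `
            -ContainerDBConnectionString $containerDb
    }
}
```

For the single container in this post, the call would be Restore-AllMessageContainers -SqlServer 'sql.jefferyland.com' -SBFarmDBConnectionString "Data Source=sql.jefferyland.com;Initial Catalog=SbManagementDB;Integrated Security=SSPI;Asynchronous Processing=True" -Containers @{ 1 = 'SBMessageContainer01' } -Verbose.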

Now we need to add our host to the Service Bus farm.

Add-SBHost -SBFarmDBConnectionString "Data Source=sql.jefferyland.com;Initial Catalog=SbManagementDB;Integrated Security=SSPI;Asynchronous Processing=True" -RunAsPassword (ConvertTo-SecureString -Force -AsPlainText 'password!') -EnableFirewallRules:$true

FarmType : SB
SBFarmDBConnectionString : Data Source=sql.jefferyland.com;Initial Catalog=SbManagementDB;Integrated
Security=True;Asynchronous Processing=True
ClusterConnectionEndpointPort : 9000
ClientConnectionEndpointPort : 9001
LeaseDriverEndpointPort : 9002
ServiceConnectionEndpointPort : 9003
RunAsAccount : DOMAIN\servicebussvc
AdminGroup : BUILTIN\Administrators
GatewayDBConnectionString : Data Source=sql.jefferyland.com;Initial Catalog=SbGatewayDatabase;Integrated
Security=True;Asynchronous Processing=True
HttpsPort : 9355
TcpPort : 9354
MessageBrokerPort : 9356
AmqpsPort : 5671
AmqpPort : 5672
FarmCertificate : Thumbprint: 814AA8261BE6F0DD9031F802A4D26EBAD020770D, IsGenerated: False
EncryptionCertificate : Thumbprint: 814AA8261BE6F0DD9031F802A4D26EBAD020770D, IsGenerated: False
Hosts : {Name: workflow.jefferyland.com, Configuration State: HostConfigurationCompleted}
RPHttpsPort : 9359
RPHttpsUrl : https://workflow.jefferyland.com:9359/
FarmDNS :
AdminApiUserName :
TenantApiUserName :
BrokerExternalUrls :

We’ve finished up the Service Bus farm, hopefully successfully, so now we’re ready for the Workflow Manager farm. Fighting!

This can get a little messy if you’re running Service Bus 1.1, because one of the cmdlets is buggy. If you’re not using Service Bus 1.1, or you do not receive an error like

Could not load file or assembly
'Microsoft.ServiceBus, Version=1.8.0.0, Culture=neutral,
PublicKeyToken=31bf3856ad364e35' or one of its dependencies.
The system cannot find the file specified.

then you can skip the following. If you are using Service Bus 1.1, we need to work around a call to an old ServiceBus assembly in one of the cmdlets. Thanks to these posts, http://www.wictorwilen.se/issue-when-installing-workflow-manager-1.0-refresh-using-powershell and https://carolinepoint.wordpress.com/2012/07/10/sharepoint-2010-powershell-and-bindingredirects/, we have a valid workaround.

Create or edit a file named C:\Windows\SysWOW64\WindowsPowerShell\v1.0\powershell.exe.config and paste the following into it:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="Microsoft.ServiceBus"
                          publicKeyToken="31bf3856ad364e35"
                          culture="en-us" />
        <bindingRedirect oldVersion="1.8.0.0" newVersion="2.1.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>

Then restart your PowerShell session to make this active. You may want to undo this part after you’re done restoring the farm just to be safe.
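If you would rather script this step (and its undo), here is a minimal sketch that writes the same XML shown above, keeps a backup of any existing file, and can put things back afterwards. Set-ServiceBusBindingRedirect and Undo-ServiceBusBindingRedirect are hypothetical names; the default path is the SysWOW64 path from this post, writing there needs an elevated session, and you still have to restart PowerShell afterwards.

```powershell
# Set-ServiceBusBindingRedirect / Undo-ServiceBusBindingRedirect are
# hypothetical helpers. They write (and later undo) the binding redirect XML
# from this post.
function Set-ServiceBusBindingRedirect {
    param(
        [string] $Path = 'C:\Windows\SysWOW64\WindowsPowerShell\v1.0\powershell.exe.config'
    )
    $backup = "$Path.bak"
    # Keep a copy of any existing config, but never overwrite an earlier backup.
    if ((Test-Path -LiteralPath $Path) -and -not (Test-Path -LiteralPath $backup)) {
        Copy-Item -LiteralPath $Path -Destination $backup
    }
    $config = @'
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="Microsoft.ServiceBus"
                          publicKeyToken="31bf3856ad364e35"
                          culture="en-us" />
        <bindingRedirect oldVersion="1.8.0.0" newVersion="2.1.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
'@
    # The [xml] cast throws on malformed XML, so a broken file is never written.
    [void][xml]$config
    Set-Content -LiteralPath $Path -Value $config -Encoding UTF8
}

function Undo-ServiceBusBindingRedirect {
    param(
        [string] $Path = 'C:\Windows\SysWOW64\WindowsPowerShell\v1.0\powershell.exe.config'
    )
    $backup = "$Path.bak"
    if (Test-Path -LiteralPath $backup) {
        # Put the original file back.
        Move-Item -LiteralPath $backup -Destination $Path -Force
    } elseif (Test-Path -LiteralPath $Path) {
        # There was no original file, so simply remove ours.
        Remove-Item -LiteralPath $Path
    }
}
```

Run Set-ServiceBusBindingRedirect before restarting PowerShell, and Undo-ServiceBusBindingRedirect once the farm is restored.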

Continuing on with the farm build, run the following:

Import-Module WorkflowManager

Restore-WFFarm -InstanceDBConnectionString "Data Source=sql.jefferyland.com;Initial Catalog=WFInstanceManagementDB;Integrated Security=SSPI;Asynchronous Processing=True" -ResourceDBConnectionString "Data Source=sql.jefferyland.com;Initial Catalog=WFResourceManagementDB;Integrated Security=SSPI;Asynchronous Processing=True" -WFFarmDBConnectionString "Data Source=sql.jefferyland.com;Initial Catalog=WFManagementDB;Integrated Security=SSPI;Asynchronous Processing=True" -OutboundCertificateThumbprint 814AA8261BE6F0DD9031F802A4D26EBAD020770D -EncryptionCertificateThumbprint 814AA8261BE6F0DD9031F802A4D26EBAD020770D -SslCertificateThumbprint 814AA8261BE6F0DD9031F802A4D26EBAD020770D -InstanceStateSyncTime (Get-Date) -ConsistencyVerifierLogPath "C:\temp\wfverifierlog.txt" -RunAsAccount DOMAIN\workflowsvc -Verbose

A successful run through should get you output similar to this:

VERBOSE: [5/14/2015 11:56:58 PM]: Created and configured farm management database.
VERBOSE: [5/14/2015 11:56:58 PM]: Created and configured Workflow Manager resource management database.
VERBOSE: [5/14/2015 11:56:58 PM]: Created and configured Workflow Manager instance management database.
VERBOSE: [5/14/2015 11:56:58 PM]: Configuration added to farm management database.
VERBOSE: [5/14/2015 11:56:58 PM]: Workflow Manager configuration added to the Workflow Manager farm management
database.
VERBOSE: [5/14/2015 11:56:58 PM]: New-WFFarm successfully completed.
FarmType : Workflow
WFFarmDBConnectionString : Data Source=sql.jefferyland.com;Initial Catalog=WFManagementDB;Integrated
Security=True;Asynchronous Processing=True
RunAsAccount : DOMAIN\workflowsvc
AdminGroup : BUILTIN\Administrators
Hosts : {}
InstanceDBConnectionString : Data Source=sql.jefferyland.com;Initial Catalog=WFInstanceManagementDB;Integrated
Security=True;Asynchronous Processing=True
ResourceDBConnectionString : Data Source=sql.jefferyland.com;Initial Catalog=WFResourceManagementDB;Integrated
Security=True;Asynchronous Processing=True
HttpPort : 12291
HttpsPort : 12290
OutboundCertificate : Thumbprint: 814AA8261BE6F0DD9031F802A4D26EBAD020770D, IsGenerated: False
Endpoints : {}
SslCertificate : Thumbprint: 814AA8261BE6F0DD9031F802A4D26EBAD020770D, IsGenerated: False
EncryptionCertificate : Thumbprint: 814AA8261BE6F0DD9031F802A4D26EBAD020770D, IsGenerated: False

This will get our WFManagementDB recreated as well. Time to add the host back in!

Add-WFHost -WFFarmDBConnectionString "Data Source=sql.jefferyland.com;Initial Catalog=WFManagementDB;Integrated Security=SSPI;Asynchronous Processing=True" -RunAsPassword (ConvertTo-SecureString -Force -AsPlainText 'password!') -EnableFirewallRules:$true

This should have your farm up and running. Let’s check the status.

Get-WFFarmStatus

HostName ServiceName ServiceStatus
-------- ----------- -------------
workflow.jefferyland.com WorkflowServiceBackend Running
workflow.jefferyland.com WorkflowServiceFrontEnd Running

Restoration is done! This is where Microsoft’s documentation leaves you hanging. You need to reconnect the farm with SharePoint.

Register-SPWorkflowService -SPSite "https://sharepoint.jefferyland.com/" -WorkflowHostUri "https://workflow.jefferyland.com:12290" -AllowOAuthHttp -Force

Your workflows should now be showing up again, but we’re not done yet: we need to perform some maintenance on the SharePoint server. First, clean up the old certificates, using the thumbprint of the old certificate as your filter:

Get-SPTrustedRootAuthority | Where-Object { $_.Certificate.Thumbprint -eq "BF5CA00B6A639FE5B7FF5688C9A38FEBFBF03552" } | Remove-SPTrustedRootAuthority -Confirm:$false
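If you aren’t sure which thumbprint belongs to the old certificate, it can help to list what SharePoint currently trusts before removing anything. Below is a minimal sketch for the SharePoint 2013 Management Shell; Get-TrustedRootSummary is a hypothetical name, and it only reads the Name and Certificate properties of each trusted root authority. Note that a revoked but unexpired certificate will still show Expired as False.

```powershell
# Get-TrustedRootSummary is a hypothetical helper that lists SharePoint's
# trusted root authorities with the thumbprint and expiry of each certificate.
function Get-TrustedRootSummary {
    param([datetime] $AsOf = (Get-Date))
    Get-SPTrustedRootAuthority | ForEach-Object {
        [pscustomobject]@{
            Name       = $_.Name
            Thumbprint = $_.Certificate.Thumbprint
            NotAfter   = $_.Certificate.NotAfter
            Expired    = $_.Certificate.NotAfter -lt $AsOf
        }
    }
}
```

For example, Get-TrustedRootSummary | Sort-Object NotAfter | Format-Table -AutoSize puts the oldest certificates at the top of the list.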

Next, we need to run some timer jobs to update the security token; otherwise you’ll get an HTTP 401 “Invalid JWT token” error. Alternatively, you can wait until after midnight for the timer jobs to run on their own, but I’m pretty sure that would not be the healthiest decision here.

In Central Administration, go to Monitoring > Timer Jobs: Job Definitions.
Run these jobs:
Refresh Trusted Security Token Services Metadata feed
Workflow Auto Cleanup
Notification Timer Job c02c63c2-12d8-4ec0-b678-f05c7e00570e
Hold Processing and Reporting
Bulk workflow task processing
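If you’d rather not click through Central Administration, the same jobs can be started from the SharePoint 2013 Management Shell with Get-SPTimerJob and Start-SPTimerJob. This is a minimal sketch that assumes each job’s DisplayName matches the name shown in Central Administration; Start-WorkflowRecoveryJobs is a hypothetical name. The notification job is matched by prefix because its GUID may differ in your farm, which also means every job starting with that name gets started, and jobs that exist once per web application are started for each one.

```powershell
# Start-WorkflowRecoveryJobs is a hypothetical helper. The patterns are the
# job names listed above; the notification job is matched by prefix because
# its GUID may differ between farms. Adjust the patterns for your farm.
function Start-WorkflowRecoveryJobs {
    [CmdletBinding()]
    param(
        [string[]] $Patterns = @(
            'Refresh Trusted Security Token Services Metadata feed',
            'Workflow Auto Cleanup',
            'Notification Timer Job*',
            'Hold Processing and Reporting',
            'Bulk workflow task processing'
        )
    )
    foreach ($job in Get-SPTimerJob) {
        foreach ($pattern in $Patterns) {
            # -like is case-insensitive and supports the * wildcard.
            if ($job.DisplayName -like $pattern) {
                Write-Verbose "Starting '$($job.DisplayName)'"
                Start-SPTimerJob -Identity $job
                $job    # emit the started job so the caller can see what ran
                break   # stop checking patterns for this job
            }
        }
    }
}
```

Running Start-WorkflowRecoveryJobs -Verbose | Select-Object DisplayName shows which jobs were kicked off.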

Now check on your workflows. They should be running nice and healthy! That wraps up this post on rescuing your Workflow Manager farm; hopefully it saves you from losing a night or two of sleep.
