Organizations use BizTalk to automate and digitize manual work, integrate applications, map data, implement new processes and more. BizTalk is either entrenched as a legacy system of varying complexity, often with little institutional knowledge or documentation, or deployed as a greenfield set-up. Either way, BizTalk is a critical part of the IT infrastructure supporting business processes owned by Line-Of-Business (LOB) owners.
Transactions processed by BizTalk are increasingly real time and often span an organization's value chain to partners, suppliers and customers. BizTalk therefore becomes critical for revenue generation and productivity, and performance issues will likely also impact corporate reputation and employee morale. Hence, performing BizTalk Health Checks at regular intervals should be a recurring task for all IT departments responsible for Microsoft’s BizTalk Server. However, for some organizations the cost of developing Health Checks using external partners may be prohibitive.
"Performing BizTalk Health Checks at regular intervals should be a recurring task for all IT departments responsible for Microsoft’s BizTalk server."
At AIMS we help customers automate the process of developing BizTalk Health Checks, and we go beyond the traditional scope of looking at BizTalk from a developer / operations perspective. What matters most is how BizTalk supports the business processes that drive revenue generation or productivity in the organization. Hence, a Health Check should always focus on the business processes supported and involve the Line-Of-Business owners of those processes.
A Health Check of your BizTalk environment should take a holistic, 360-degree view. That means sourcing “all” relevant data in order to capture “unknowns”. “Unknowns” are situations you are not aware of, or issues you would not normally think could happen, and they are typically what a Health Check should uncover. A Health Check that covers only known situations has limited value.
Gaining access to current and historical data for a full 360-degree view is the first hurdle, and that data needs to be available per business process / messaging pattern in BizTalk. That is not the topic of this blog, but AIMS is the only automated tool on the market to provide that insight.
To develop a holistic, 360-degree Health Check of your BizTalk environment we follow nine steps. Depending on what we find during these steps, we may adjust or add tasks to further identify poorly performing orchestrations, other performance problems, impact and dependencies. All the Health Checks we develop are fully automated and available to AIMS customers on demand, free of charge.
Identify the overall message / transaction cyclicality and latency vs hardware performance
The purpose of looking at the overall message / transaction cyclicality is to identify any critical load (and message growth) on hardware that causes performance problems. The correlation between message / transaction count and latency is key, but it cannot be relied on in isolation, because poor code can cause performance issues irrespective of hardware. Health Checks should cover several time periods, at a minimum daily, weekly and quarterly (90-day) cycles.
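The Python sketch below illustrates the count/latency correlation on a daily cycle. It averages hypothetical hourly samples per hour of day and then computes a Pearson correlation between message count and latency. The `(hour, count, latency)` input format, the function names and the sample numbers are all assumptions made for illustration. This is not how AIMS stores or analyzes data. Weekly or 90-day cycles would use a different bucket key, such as the weekday or the date.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y) if var_x and var_y else 0.0


def daily_profile(samples):
    """Average message count and latency per hour of day.

    samples: (hour_of_day, message_count, avg_latency_ms) tuples,
    a hypothetical export format.
    """
    buckets = defaultdict(list)
    for hour, count, latency in samples:
        buckets[hour].append((count, latency))
    return {
        hour: (sum(c for c, _ in rows) / len(rows), sum(l for _, l in rows) / len(rows))
        for hour, rows in sorted(buckets.items())
    }


# Two days of made-up samples for the 08:00-10:00 window.
samples = [(8, 1200, 310.0), (9, 1500, 420.0), (10, 900, 250.0),
           (8, 1000, 290.0), (9, 1700, 480.0), (10, 800, 240.0)]
profile = daily_profile(samples)
counts = [c for c, _ in profile.values()]
latencies = [l for _, l in profile.values()]
print(profile)
print(f"count/latency correlation: {pearson(counts, latencies):.3f}")
```

A strong positive value suggests that latency is driven by load. A weak value is a hint to look at code rather than hardware, as this step describes.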
Identify top business processes by message count and latency
Identifying the top business processes by message count and latency lets us spot trends in count, volume / size and latency per message pattern or business process, which is key to LOB owners. Reviewing latency in a BizTalk environment without knowing in which pattern / business process it occurs is of little use. Different processes have different acceptable latency levels (SLAs), depending on how critical the supported business process is. Revenue-generating processes, for example payment processing from a web shop, have very low latency requirements, while a batch process can accept material latency.
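The next sketch ranks processes by message count and compares each process's latency with its own SLA. It uses the 95th percentile (nearest-rank method) as one reasonable latency measure. The process names, SLA values and the `(process, latency_ms)` record format are hypothetical.

```python
import math
from collections import defaultdict


def p95(values):
    """95th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def top_processes(messages, sla_ms, top_n=3):
    """Rank business processes by message count and flag p95 latency above SLA.

    messages: iterable of (process_name, latency_ms); sla_ms: process -> SLA in ms.
    Both are hypothetical inputs for illustration.
    """
    by_process = defaultdict(list)
    for process, latency in messages:
        by_process[process].append(latency)
    report = []
    for process, latencies in by_process.items():
        latency_p95 = p95(latencies)
        limit = sla_ms.get(process)
        report.append({
            "process": process,
            "count": len(latencies),
            "p95_ms": latency_p95,
            # Processes without a known SLA are never flagged.
            "breach": limit is not None and latency_p95 > limit,
        })
    report.sort(key=lambda r: r["count"], reverse=True)
    return report[:top_n]


messages = [("WebShopPayment", 120), ("WebShopPayment", 900), ("WebShopPayment", 150),
            ("NightlyInvoiceBatch", 60000), ("NightlyInvoiceBatch", 45000)]
sla = {"WebShopPayment": 500, "NightlyInvoiceBatch": 3_600_000}
report = top_processes(messages, sla)
for row in report:
    print(row)
```

A 60-second latency passes comfortably for the batch process, while a single 900 ms payment pushes the web shop process over its 500 ms SLA. This is why latency has to be judged per process.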
Identify any hardware bottlenecks correlated with top business processes
Drilling down from the overall message cyclicality and the cyclicality per business process, we identify whether particular processes / patterns cause performance issues correlated with high hardware load. The purpose here is to find the business processes that cause degradation through high load on hardware. A single non-critical process that loads the hardware heavily can indirectly impact critical processes running on the same hardware.
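The following minimal sketch assumes that per-process message counts and server CPU utilization have already been sampled over the same intervals. It ranks processes by how closely their volume tracks CPU load. The process names, the numbers and the choice of CPU as the hardware metric are assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y) if var_x and var_y else 0.0


def rank_processes_by_cpu_correlation(process_counts, cpu_percent):
    """process_counts: process -> message counts per interval, aligned with cpu_percent.

    Returns (process, correlation) pairs, most CPU-correlated first.
    """
    scores = {p: pearson(counts, cpu_percent) for p, counts in process_counts.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


cpu = [35, 40, 85, 90, 38]
process_counts = {
    "OrderToCash": [520, 500, 510, 505, 530],    # steady volume, critical process
    "ArchiveExport": [0, 10, 900, 950, 5],       # non-critical bulk job
}
ranking = rank_processes_by_cpu_correlation(process_counts, cpu)
print(ranking)
```

In this made-up data, the non-critical export job explains the CPU peaks. Any latency the critical process sees during those intervals is therefore an indirect impact of the kind described above.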
Investigate any throttling on hosts
Latency and performance issues in processes may cause throttling on hosts. First we look at the number of hosts that are throttling. Then we identify the likely cause by examining the message publishing throttling state and the message delivery throttling state of individual hosts (more on hosts in step 8 below).
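The sketch below summarizes throttling per host. It assumes the readings come from the BizTalk:Message Agent performance counters "Message publishing throttling state" and "Message delivery throttling state", which report 0 when a host is not throttling. How the readings are collected, the host names and the sample values are assumptions. The sketch treats any non-zero value as throttling and does not decode the specific condition codes.

```python
def throttling_summary(samples):
    """Summarize throttling per host.

    samples: host name -> list of (publishing_state, delivery_state) readings,
    assumed to be collected from the BizTalk:Message Agent counters.
    A state of 0 means not throttling; any other value is a throttling condition.
    Returns (sorted list of throttling hosts, per-host percentages).
    """
    summary = {}
    for host, readings in samples.items():
        n = len(readings)
        publishing = sum(1 for pub, _ in readings if pub != 0)
        delivery = sum(1 for _, dlv in readings if dlv != 0)
        summary[host] = {
            "publishing_throttled_pct": 100.0 * publishing / n,
            "delivery_throttled_pct": 100.0 * delivery / n,
        }
    throttling_hosts = sorted(
        h for h, s in summary.items()
        if s["publishing_throttled_pct"] or s["delivery_throttled_pct"]
    )
    return throttling_hosts, summary


samples = {
    "ReceiveHost": [(0, 0), (2, 0), (2, 0), (0, 0)],
    "OrchestrationHost": [(0, 1), (0, 0), (0, 0), (0, 0)],
    "SendHost": [(0, 0), (0, 0)],
}
hosts, summary = throttling_summary(samples)
print(hosts)
print(summary)
```

Separating publishing from delivery shows the direction of the bottleneck. In this data, the receive host struggles to publish into the MessageBox, while the orchestration host occasionally throttles delivery.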
Drill down into any latency by business process to identify root cause in ports & orchestrations
After identifying business processes / patterns with performance issues, we dive into the ports & orchestrations in that pattern to identify the latency source. Usually this is an orchestration.
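The next minimal sketch shows the drill-down. It sums the time that messages in one pattern spend in each port or orchestration and ranks the components by their share of the total. The `(component, duration_ms)` tracking records and the component names are hypothetical.

```python
from collections import defaultdict


def latency_breakdown(component_timings):
    """Share of total time spent in each port / orchestration of one business process.

    component_timings: list of (component_name, duration_ms) observations for
    messages flowing through the pattern (hypothetical tracking data).
    Returns (component, share) pairs sorted largest first.
    """
    totals = defaultdict(float)
    for component, duration in component_timings:
        totals[component] += duration
    grand_total = sum(totals.values())
    return sorted(((c, t / grand_total) for c, t in totals.items()),
                  key=lambda kv: kv[1], reverse=True)


timings = [
    ("ReceivePort_Orders", 40), ("Orch_ValidateOrder", 900), ("SendPort_ERP", 60),
    ("ReceivePort_Orders", 35), ("Orch_ValidateOrder", 1100), ("SendPort_ERP", 65),
]
breakdown = latency_breakdown(timings)
for component, share in breakdown:
    print(f"{component}: {share:.0%}")
```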
Identify any spikes in errors or anomalies
A standard part of the check is to look at how the number of errors and performance anomalies develops over time. Here we look for correlation between errors, performance anomalies and the performance issues already identified.
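The sketch below flags spikes in an error-count series using a rolling z-score, which is one common spike-detection technique. It is not necessarily the method AIMS uses. The window length, the threshold and the daily error counts are assumptions for illustration.

```python
from statistics import mean, pstdev


def flag_spikes(counts, window=7, threshold=3.0):
    """Return indices whose value is more than `threshold` standard deviations
    above the mean of the preceding `window` values (a rolling z-score)."""
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            # Flat history: any increase counts as a spike.
            if counts[i] > mu:
                spikes.append(i)
        elif (counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


daily_errors = [5, 6, 4, 5, 7, 5, 6, 5, 40, 6]
print(flag_spikes(daily_errors))
```

The flagged days can then be lined up against the latency and throttling findings from the earlier steps to look for correlation.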
Look at started & stopped components as potential source of issues
Components that are started and stopped can easily be the cause of performance issues or errors. Frequent starting and stopping of components is generally a sign of either a badly engineered environment or frequent code deploys / maintenance.
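The following sketch counts state changes per component in a chronologically ordered event list and flags the components that toggle frequently. The event format, the "Started"/"Stopped" labels, the component names and the `min_changes` cut-off are hypothetical.

```python
from collections import Counter


def frequently_toggled(events, min_changes=3):
    """Flag components whose state changed at least `min_changes` times.

    events: chronologically ordered (component, state) tuples, e.g. 'Started'
    or 'Stopped' (a hypothetical export of component status changes).
    """
    last_state = {}
    changes = Counter()
    for component, state in events:
        # Only count a change when the new state differs from the previous one.
        if component in last_state and last_state[component] != state:
            changes[component] += 1
        last_state[component] = state
    return sorted(c for c, n in changes.items() if n >= min_changes)


events = [
    ("SendPort_SAP", "Started"), ("SendPort_SAP", "Stopped"),
    ("SendPort_SAP", "Started"), ("SendPort_SAP", "Stopped"),
    ("RcvLoc_FTP", "Started"), ("RcvLoc_FTP", "Started"),
    ("Orch_Billing", "Stopped"), ("Orch_Billing", "Started"),
]
print(frequently_toggled(events))
```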
Drill down into performance of host instances to identify host performance problems
In step 4 we looked at identifying hosts with throttling, but we do not stop at throttling at the host level. We also cover several other performance parameters for hosts where we find issues. Typical parameters we review are active instance count, database size, message publishing delay, message delivery delay, message delivery incoming rate, message delivery outgoing rate, process memory usage, suspended messages, resident orchestrations, queue length, inbound latency and outbound latency.
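The sketch below checks a subset of the host parameters listed above against limits. The metric keys and every threshold value are hypothetical. Sensible limits depend entirely on the environment, and this is not a recommended configuration.

```python
# Hypothetical thresholds for illustration only; real limits depend on the environment.
THRESHOLDS = {
    "process_memory_mb": 1500,
    "suspended_messages": 100,
    "message_publishing_delay_ms": 1000,
    "message_delivery_delay_ms": 1000,
    "queue_length": 5000,
}


def host_findings(host_metrics, thresholds=THRESHOLDS):
    """Return (host, metric, value, limit) for every metric above its threshold.

    host_metrics: host name -> {metric name: latest value}; missing metrics are skipped.
    """
    findings = []
    for host, metrics in sorted(host_metrics.items()):
        for metric, limit in thresholds.items():
            value = metrics.get(metric)
            if value is not None and value > limit:
                findings.append((host, metric, value, limit))
    return findings


metrics = {
    "OrchestrationHost": {"process_memory_mb": 1800, "suspended_messages": 12, "queue_length": 7000},
    "SendHost": {"process_memory_mb": 600, "suspended_messages": 250},
}
for finding in host_findings(metrics):
    print(finding)
```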
Look into correlation with dependent or connected systems
Many systems are typically integrated through BizTalk, from custom systems to the usual off-the-shelf solutions such as SAP, Dynamics and Salesforce. A common pain for BizTalk teams is that other IT or business teams point fingers at BizTalk whenever there is an issue. To help BizTalk teams, we identify the specific ports that interface these systems and provide insight into message count peaks, changes in message size and message latency. This insight often reveals that the cause of an issue actually resides with a different internal team or with an external party (customer, supplier). We do not even stop there. Depending on which systems the customer uses AIMS for, we may also look for correlations and cause-effect between BizTalk and SQL or any other system connected to AIMS, whether internal systems or, potentially, customer or supplier end-points.
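The last sketch compares per-port metrics between a baseline period and a recent period and flags large relative changes. The port names, the metric keys and the 50 % tolerance are hypothetical.

```python
def port_changes(baseline, recent, tolerance=0.5):
    """Compare per-port metrics between a baseline and a recent period.

    baseline / recent: port -> {"peak_count": ..., "avg_size_kb": ..., "avg_latency_ms": ...}
    Returns (port, metric, relative_change) for changes beyond `tolerance` (0.5 = 50 %).
    """
    flagged = []
    for port, before in sorted(baseline.items()):
        after = recent.get(port, {})
        for metric, old in before.items():
            new = after.get(metric)
            if new is None or old == 0:
                continue  # no comparable value in the recent period
            change = (new - old) / old
            if abs(change) > tolerance:
                flagged.append((port, metric, round(change, 2)))
    return flagged


baseline = {
    "SendPort_SAP": {"peak_count": 2000, "avg_size_kb": 12, "avg_latency_ms": 300},
    "SendPort_Salesforce": {"peak_count": 800, "avg_size_kb": 4, "avg_latency_ms": 250},
}
recent = {
    "SendPort_SAP": {"peak_count": 2100, "avg_size_kb": 30, "avg_latency_ms": 320},
    "SendPort_Salesforce": {"peak_count": 780, "avg_size_kb": 4, "avg_latency_ms": 900},
}
print(port_changes(baseline, recent))
```

In this made-up data, larger payloads on the SAP port would point at a change upstream, and rising latency on the Salesforce port could sit with the receiving system. Either finding would still need confirming with the team that owns that system.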
We are not limited to the steps above, but they represent the typical steps we take. Based on our findings, we add steps that apply to the specific customer environment.
The last step is to wrap up the charts in a PDF or PowerPoint layout and add commentary, key findings and recommendations. We typically complete this set-up in about 1-2 hours for a customer whose environment we are not familiar with. From then on, the customer can access the Health Check report as a standard Dashboard in AIMS and can also choose to subscribe to the report by e-mail to receive it at regular intervals.
We have developed a large number of BizTalk Health Checks and have never failed to identify potentially critical performance issues that were unknown to the responsible BizTalk dev or ops team.