API Analytics - Smart Diff

Smart Diff is a first-of-its-kind feature that uses artificial intelligence and data mining to find potential root causes for API issues, so you don’t have to manually comb through millions of events.

How it works

Smart Diff mines your API data to uncover hidden patterns that cause a particular outcome. You can use it to greatly reduce debugging time, or to surface insights for building a better API, such as answering questions like “What are the main factors that drive higher API adoption among my highest-paying users?”

Smart Diff works in the opposite direction of other ML techniques like anomaly detection and classification. For example, once Moesif detects a set of API calls as anomalous, you can use Smart Diff to find out why and how they are anomalous.

There are two components to Smart Diff:

  • The filtered dataset, which is a filtered view of your API data that the job runs on. For example, we can filter our API data to only look at API calls where verb == PUT and route == /settings/preferences.

  • The target filter, which defines the subgroup of events containing the outcome you are trying to find the cause for. For example, if we want to find out what drives 400 errors for our PUT /settings/preferences API calls, we would add a target filter response.status == 400.
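Conceptually, the two components amount to two layers of filtering over your raw events. The sketch below illustrates this with the examples above; the in-memory event structure and field names are hypothetical simplifications, not Moesif’s actual event schema:

```python
# Illustrative sketch of the two Smart Diff components.
# Event fields (verb, route, response_status) are hypothetical stand-ins
# for the filter examples above.
events = [
    {"verb": "PUT", "route": "/settings/preferences", "response_status": 400},
    {"verb": "PUT", "route": "/settings/preferences", "response_status": 200},
    {"verb": "GET", "route": "/users", "response_status": 200},
]

# 1. The filtered dataset: narrow to PUT /settings/preferences calls.
dataset = [e for e in events
           if e["verb"] == "PUT" and e["route"] == "/settings/preferences"]

# 2. The target filter: the outcome we want a root cause for (400 errors).
target = [e for e in dataset if e["response_status"] == 400]

print(len(dataset), len(target))  # 2 events in the dataset, 1 in the target
```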

Smart Diff will then find the rules or attributes that are highly correlated with membership in your target group, with high confidence.

While we use the term correlation, the displayed percentage is not the raw correlation; we blend several factors to achieve better confidence.

Configuring your Smart Diff Job

First, go to the Smart Diff view from the Event Analytics menu dropdown. You have all the same filters as other views like Event Stream. Configure the filters for the initial dataset. Use filters that narrow the scope of Smart Diff to a dataset that looks similar under normal operation. For example, you might set the HTTP verb and route, since you know the responses should be roughly similar. This allows Smart Diff to home in on root causes rather than noise.

At the same time, it helps to have a decent amount of data. If there are fewer than a few hundred events, Smart Diff may not find meaningful patterns.

After you set up your filters, click the Create Smart Diff job button, and you’ll be prompted to configure the criteria for the target group. In the example below, we want to know what causes the Elapsed Time to be slower than 600 ms.

Events that match your target group criteria form the target group, while events that don’t match the target criteria are bucketed into the non-target group.
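This partitioning is a simple split of the filtered dataset. A sketch, using the 600 ms elapsed-time criterion from the example above (the elapsed_time_ms and region field names are hypothetical):

```python
# Sketch: split the filtered dataset into target / non-target groups.
# "elapsed_time_ms" and "region" are hypothetical field names.
dataset = [
    {"elapsed_time_ms": 850, "region": "eu"},
    {"elapsed_time_ms": 120, "region": "us"},
    {"elapsed_time_ms": 900, "region": "eu"},
]

def is_target(event):
    # The target criterion: elapsed time slower than 600 ms.
    return event["elapsed_time_ms"] > 600

target_group = [e for e in dataset if is_target(e)]
non_target_group = [e for e in dataset if not is_target(e)]

print(len(target_group), len(non_target_group))  # 2 and 1
```

Smart Diff then looks for attributes (like region == eu here) that separate the two groups.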

Create Smart Diff Job

Once your target filters are set, click run. The job can take 1 to 3 minutes or more depending on the original dataset size. Please be patient.

If the chosen dataset is too small, or the target group filters contain contradictions so that no events match the target group, the job can fail. In that case, reconfigure your search filters and target group and try again.

How to interpret the results

When results are produced, they appear as a list of property rules, like the example below. Each row is ranked by potential impact, with the first row ranked highest.

Smart Diff API Analytics Report

  • Attributes is the list of discovered rules that drive the target outcome.
  • Positive Correlation is how strongly the attributes are correlated with being in the target group. Higher means stronger correlation.
  • Negative Correlation is how strongly the attributes are correlated with not being in the target group.
  • Target Coverage is the percentage of events in the target group that have these attributes. E.g. if the target coverage is 100%, it means all events in the target group have these attributes. If many of the top-ranked discovered rules have very low coverage, you may need to filter the initial dataset and try again.
  • Non-Target Coverage is the percentage of events not in the target group that have these attributes. E.g. if the non-target coverage is 100%, it means all events that aren’t in the target group have these attributes.
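The two coverage metrics can be computed as sketched below. The discovered rule (region == "eu") and the event fields are hypothetical, chosen only to illustrate the arithmetic:

```python
# Sketch: coverage metrics for one discovered attribute rule.
# Field names and the rule itself are hypothetical illustrations.
target_group = [
    {"region": "eu", "sdk": "v1"},
    {"region": "eu", "sdk": "v2"},
]
non_target_group = [
    {"region": "us", "sdk": "v1"},
    {"region": "eu", "sdk": "v1"},
]

def matches(event):
    # Hypothetical discovered rule: region == "eu"
    return event["region"] == "eu"

# Percentage of each group that the rule covers.
target_coverage = 100 * sum(matches(e) for e in target_group) / len(target_group)
non_target_coverage = 100 * sum(matches(e) for e in non_target_group) / len(non_target_group)

print(target_coverage, non_target_coverage)  # 100.0 and 50.0
```

A rule with high target coverage and low non-target coverage is a strong candidate root cause, since it describes most target events while excluding most non-target events.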
