Spring & Splunk logging tips

The most critical piece of the debug process

Yat Man, Wong
3 min read · Oct 2, 2022
Filter out what is useful from the logs

Use Case

The backend application I work on uses Splunk as its logging service. Splunk provides an easy way to view our logs once the application is deployed on Azure.

Splunk has a powerful search and filter feature. It is also the starting point of our bug discovery and on-call process. We generate alerts to email and OpsGenie when we detect error logs within a given time interval.

Make sure the logs are from where you expect

We made the mistake of not including the application version, environment, and instanceId in the logs of an older release.

This became a huge problem when we ran into the error: “The session limit for the database is 1200 and has been reached.” It took us a long time to realize that, for some reason, multiple instances of our application had been started and were running in different environments after a single deploy. We would have realized sooner if we had included env and instanceId in every log.

How to log environment variables

SPLUNK_PATTERN defines what we will see on the Splunk website. Below is the syntax to log environment variables.

(Blame Medium for not letting me paste this as text, so I had to use an image instead.)

We are deploying on Azure, so we can check which variables are available under our resource group -> Advanced Tools -> Kudu.
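
As a rough sketch of the same idea in plain Java (not the logback pattern from the image, which is not reproduced here), the snippet below reads two environment variables and puts them in front of a message. The variable names WEBSITE_INSTANCE_ID and WEBSITE_HOSTNAME are assumptions about what Azure App Service exposes, so check Kudu for the names available to your app. The LogPrefix class is hypothetical.

```java
import java.util.Map;

// Hypothetical helper: builds the "instanceId=... | env=..." prefix for a log line.
final class LogPrefix {

    // Assumed variable names; confirm them in Kudu for your own app.
    static final String INSTANCE_ID_VAR = "WEBSITE_INSTANCE_ID";
    static final String HOSTNAME_VAR = "WEBSITE_HOSTNAME";

    // Takes the environment as a map so it can be exercised without real variables.
    static String build(Map<String, String> env) {
        String instanceId = env.getOrDefault(INSTANCE_ID_VAR, "unknown");
        String host = env.getOrDefault(HOSTNAME_VAR, "unknown");
        return "instanceId=" + instanceId + " | env=" + host + " | ";
    }

    public static void main(String[] args) {
        // System.getenv() returns the process environment as an unmodifiable map.
        System.out.println(build(System.getenv()) + "msg=application started |");
    }
}
```

Falling back to "unknown" keeps the log format stable when a variable is missing, for example when running locally.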

How to log application version

The goal is to make the Splunk log update automatically with the project version from pom.xml.

I happen to have an answer here

If you do both, this is an example of what you should see:
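
Since the linked answer is not reproduced here, the following is only one possible sketch, not necessarily the approach from that answer. It assumes Maven resource filtering replaces a placeholder such as @project.version@ in a properties file at build time (the delimiter style used by spring-boot-starter-parent), and that the application reads the resulting value at startup. The property name app.version and the AppVersion class are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: extracts the build version from filtered properties text.
final class AppVersion {

    // Returns the value of app.version, or "unknown" when the key is missing
    // or the placeholder was never replaced by the build.
    static String fromProperties(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String version = props.getProperty("app.version", "unknown");
        return version.startsWith("@") ? "unknown" : version;
    }

    public static void main(String[] args) {
        // After filtering, the file would contain the real version from pom.xml.
        System.out.println("version:" + fromProperties("app.version=3.2.0-SNAPSHOT"));
    }
}
```

Checking for a leftover "@" guards against the case where filtering did not run, so an unresolved placeholder never ends up in the logs as if it were a version.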

{
logger: com.xyz.xyzscenario.common.RequestInterceptor
message: 19 Nov 2022 20:43:00 | com.xyz.xyzscenario.common.RequestInterceptor:getAuthRoles() | instanceId=852c22ea883c78a54ba3d4215fecb95f8f697f2e180227342478a9a8f902b2fd | env=xyz-am-aas-xyzscenarioapi-dev.azurewebsites.net | version:3.2.0-SNAPSHOT | msg=Received a request without token, request=/, granted no role |
severity: WARN
thread: http-nio-19982-exec-3
time: 1668919380.384
}

Generate a dashboard to track API response time

One useful use case of Splunk is gathering operational metrics such as API response time.

All API calls go through APIM, and each one has a log entry that includes the response time.

We want a dashboard that shows the response times of all APIs in a nice table.

From sample events:

{ 
durationMs: 83
properties: {
url: https://mywebsite/v1/organization/41547/buildings
}
correlationId: e581d476-fa5f-4023-a53e-53d6e06734ae
}

The difficult part is that we want to group URLs that differ only by their IDs as one URL. Otherwise we would get many rows of organization/id/buildings/id when they should all be the same row.

This is where the Splunk function to replace by regex comes in. The replace call matches any run of 15 or more alphanumeric characters that follows a “/” (to match a Mongo ID), and also any run of digits that follows a “/”.

replace(endpoint, "(\/[0-9a-zA-Z]{15,}|\/\d+)", "/{id}") 

This turns all of these organization/id/buildings/id URLs into a single value of one field called “endpoint”. We can then group by this endpoint field.

The corresponding query is:

{base search}
| rex field=properties.url ".+v1/(?<endpoint>.+)"
| eval endpoint=replace(endpoint, "(\/[0-9a-zA-Z]{15,}|\/\d+)", "/{id}")
| stats
count(durationMs) AS #calls
exactperc99(durationMs) AS P99
exactperc95(durationMs) AS P95
exactperc90(durationMs) AS P90
exactperc50(durationMs) AS P50
BY endpoint, properties.method
| sort endpoint
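
To check the regex outside Splunk, the rex and replace steps of the query can be mirrored in Java. This is a minimal sketch: the EndpointNormalizer class and its method names are hypothetical, and Java's regex engine is assumed to treat these two simple patterns the same way Splunk does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that mirrors the rex and replace steps of the query.
final class EndpointNormalizer {

    // Same shape as the rex step: capture everything after "v1/".
    private static final Pattern ENDPOINT = Pattern.compile(".+v1/(?<endpoint>.+)");

    // Same shape as the replace step: "/" + 15 or more alphanumerics, or "/" + digits.
    private static final Pattern ID_SEGMENT = Pattern.compile("/[0-9a-zA-Z]{15,}|/\\d+");

    // Returns the part of the URL after "v1/", or the input unchanged if there is none.
    static String extractEndpoint(String url) {
        Matcher m = ENDPOINT.matcher(url);
        return m.find() ? m.group("endpoint") : url;
    }

    // Replaces every ID-like path segment with the literal text "/{id}".
    static String normalize(String endpoint) {
        return ID_SEGMENT.matcher(endpoint).replaceAll(Matcher.quoteReplacement("/{id}"));
    }

    public static void main(String[] args) {
        String url = "https://mywebsite/v1/organization/41547/buildings";
        System.out.println(normalize(extractEndpoint(url)));
    }
}
```

For the sample event above this prints organization/{id}/buildings. One thing to watch: any path segment of 15 or more letters is also treated as an ID, so a route such as user/notificationsettings would collapse to user/{id}.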
