Troubleshooting the O365 Message Reporting Add-on for Splunk

Periodically this thing dies on me. It happened again, so here are my notes.

Messages stopped coming in, I got an alert, and found this log:

2019-11-11 13:53:56,750 DEBUG pid=20951 tid=MainThread file=connectionpool.py:_new_conn:809 | Starting new HTTPS connection (1): reports.office365.com
2019-11-11 13:53:57,019 DEBUG pid=20951 tid=MainThread file=connectionpool.py:_make_request:400 | https://reports.office365.com:443 "GET /ecp/reportingwebservice/reporting.svc/MessageTrace?$filter=StartDate%20eq%20datetime'2019-10-22T02:53:08.114678Z'%20and%20EndDate%20eq%20datetime'2019-10-22T03:08:08.114678Z' HTTP/1.1" 200 216
2019-11-11 13:53:57,022 DEBUG pid=20951 tid=MainThread file=base_modinput.py:log_debug:286 | No messages returned.  Setting max date to 2019-10-22 02:54:08.114678

The “No messages returned.” bit was the kicker. Lies!

I tried the same query in Postman and it worked fine. It’s a pretty simple basic auth endpoint where you do a GET request to the URL with the filter you want, and it returns a nightmare of XML.
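
If you’d rather script the check than click through Postman, here’s a rough Python sketch that rebuilds the same URL the add-on logged. The function name is mine, and the 15-minute window is just what the start and end dates in the log above work out to, not something I’ve confirmed in the add-on’s code.

```python
from datetime import datetime, timedelta
from urllib.parse import quote

# Endpoint copied from the add-on's debug log above.
BASE_URL = "https://reports.office365.com/ecp/reportingwebservice/reporting.svc/MessageTrace"
STAMP = "%Y-%m-%dT%H:%M:%S.%fZ"


def build_message_trace_url(start, window=timedelta(minutes=15)):
    """Return the MessageTrace URL for the window beginning at `start`.

    Hypothetical helper; the 15-minute default is an assumption read off the log.
    """
    end = start + window
    odata_filter = (
        "StartDate eq datetime'{}' and EndDate eq datetime'{}'"
        .format(start.strftime(STAMP), end.strftime(STAMP))
    )
    # Encode spaces as %20 but leave quotes and colons alone, as in the log.
    return BASE_URL + "?$filter=" + quote(odata_filter, safe="':")


print(build_message_trace_url(datetime(2019, 10, 22, 2, 53, 8, 114678)))
```

Hand the result to whatever HTTP client you like, along with your basic auth credentials.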

Last time this broke, it was something in the “state” it was keeping, so on to fixing that.

Get the current “state” objects - run this from the server you’re running the TA on. If you don’t have jq on all your servers, get it, it’s great!

# curl -sk -u admin https://localhost:8089/servicesNS/nobody/TA-MS_O365_Reporting/storage/collections/data/TA_MS_O365_Reporting_checkpointer | jq
Enter host password for user 'admin':
[
  {
    "state": "{\"max_date\": \"2019-10-22 02:27:08.114678\"}",
    "_user": "nobody",
    "_key": "myinput_obj_checkpoint"
  }
]
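
The state field is a JSON string tucked inside the JSON record, so it needs decoding twice. A small Python sketch (the function name is my own invention) that pulls max_date out of that output so you can compare it with the current time:

```python
import json
from datetime import datetime

STAMP = "%Y-%m-%d %H:%M:%S.%f"


def checkpoint_dates(kvstore_json):
    """Map each record's _key to its max_date, parsed as a datetime."""
    dates = {}
    for record in json.loads(kvstore_json):
        # "state" is itself a JSON document stored as a string.
        state = json.loads(record["state"])
        dates[record["_key"]] = datetime.strptime(state["max_date"], STAMP)
    return dates


# The curl output from above, pasted in as a raw string.
sample = r'''[{"state": "{\"max_date\": \"2019-10-22 02:27:08.114678\"}",
               "_user": "nobody", "_key": "myinput_obj_checkpoint"}]'''

for key, max_date in checkpoint_dates(sample).items():
    print(key, max_date)
```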

Well, that’s definitely old, but the add-on only advances it by a minute each time it gets no emails back, so it has fallen well behind.

Time to update it to something recent and see what happens… I created a lookup definition, “ta_ms_o365_reporting_checkpointer”, so I could query the collection from search:

Setting           Value
Type              KV Store
Collection Name   TA_MS_O365_Reporting_checkpointer
Supported Fields  _key, state

Now to pull the data from the interface… do the following search:

| inputlookup ta_ms_o365_reporting_checkpointer where _key=* | rename _key as Key | table Key state

Results:

Key                     state
myinput_obj_checkpoint  {"max_date": "2019-10-22 02:27:08.114678"}

We know it’s accessible now. Let’s update it.

| inputlookup ta_ms_o365_reporting_checkpointer where _key=myinput_obj_checkpoint 
| eval state="{\"max_date\":\"2019-11-09 22:45:57.781817\"}" 
| outputlookup ta_ms_o365_reporting_checkpointer append=false

The search example came from the outputlookup documentation page. Basically:

  1. Pull the data
  2. Update a field
  3. Push it back in

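Be aware: append=false makes outputlookup replace the whole collection with the search results, so if you have multiple configured inputs, this’ll blat the state for all the others. I only had one, so that’s OK. If you have more, the outputlookup docs show an append=true key_field=_key variant that should update just the matching record.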
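
If you do have more than one input, another option is to update just the one record through the same KV store REST endpoint the curl command used earlier. This is a sketch, not something I ran for this fix: it only builds the URL and JSON body, the helper name is made up, and as far as I know a POST to a record’s URL replaces that whole record.

```python
import json
from urllib.parse import quote

# App and collection names taken from the curl command earlier in the post.
APP = "TA-MS_O365_Reporting"
COLLECTION = "TA_MS_O365_Reporting_checkpointer"


def build_checkpoint_update(key, max_date, base="https://localhost:8089"):
    """Return (url, body) for updating one checkpoint record.

    Hypothetical helper: it builds the request but does not send it.
    """
    url = "{}/servicesNS/nobody/{}/storage/collections/data/{}/{}".format(
        base, APP, COLLECTION, quote(key, safe="")
    )
    # The add-on stores its state as a JSON string inside the record,
    # so the inner document is encoded first and then wrapped.
    state = json.dumps({"max_date": max_date})
    body = json.dumps({"state": state})
    return url, body


url, body = build_checkpoint_update(
    "myinput_obj_checkpoint", "2019-11-09 22:45:57.781817"
)
print(url)
print(body)
```

POST the body to that URL with a Content-Type: application/json header and your admin credentials, and only that one key should change.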

Suffice it to say, it started working on the next poll and kept going. Woo!

2019-11-11 14:09:57,728 DEBUG pid=4079 tid=MainThread file=base_modinput.py:log_debug:286 | Number of messages returned: 338
2019-11-11 14:09:57,862 DEBUG pid=4079 tid=MainThread file=base_modinput.py:log_debug:286 | Max date after getting messages: 2019-11-06 14:24:50.973122


#splunk #o365 #troubleshooting #office #API