Troubleshooting the O365 Message Reporting Add-on for Splunk

Periodically this thing dies on me. It happened again, so here are my notes.

Messages stopped coming in, I got an alert, and I found this in the add-on's log:

2019-11-11 13:53:56,750 DEBUG pid=20951 tid=MainThread file=connectionpool.py:_new_conn:809 | Starting new HTTPS connection (1): reports.office365.com
2019-11-11 13:53:57,019 DEBUG pid=20951 tid=MainThread file=connectionpool.py:_make_request:400 | https://reports.office365.com:443 "GET /ecp/reportingwebservice/reporting.svc/MessageTrace?$filter=StartDate%20eq%20datetime'2019-10-22T02:53:08.114678Z'%20and%20EndDate%20eq%20datetime'2019-10-22T03:08:08.114678Z' HTTP/1.1" 200 216
2019-11-11 13:53:57,022 DEBUG pid=20951 tid=MainThread file=base_modinput.py:log_debug:286 | No messages returned.  Setting max date to 2019-10-22 02:54:08.114678

The “No messages returned.” bit was the kicker. Lies!

I tried the same query in Postman and it worked fine. It’s a fairly simple endpoint that uses basic auth: you send a GET request to the URL with the filter you want, and it returns a nightmare of XML.
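For reference, here’s a minimal Python sketch of that request. The base URL and filter format are copied from the logged GET above; the 15-minute window only mirrors that log line, and the helper names (`build_message_trace_url`, `fetch_message_trace`) are mine, not the add-on’s. Whether the service still accepts basic auth is an assumption here.

```python
import base64
import urllib.parse
import urllib.request
from datetime import datetime, timedelta

# Base URL taken from the add-on's debug log.
BASE_URL = "https://reports.office365.com/ecp/reportingwebservice/reporting.svc/MessageTrace"


def odata_datetime(dt: datetime) -> str:
    """Format a naive UTC datetime the way it appears in the logged filter."""
    return dt.strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def build_message_trace_url(start: datetime,
                            window: timedelta = timedelta(minutes=15)) -> str:
    """Build the MessageTrace URL with a StartDate/EndDate $filter.

    The 15-minute default copies the window seen in the log (an assumption,
    not a documented add-on setting).
    """
    end = start + window
    flt = (f"StartDate eq datetime'{odata_datetime(start)}' "
           f"and EndDate eq datetime'{odata_datetime(end)}'")
    # Encode spaces as %20 but keep quotes and colons literal, as in the log.
    query = urllib.parse.quote(flt, safe="':")
    return f"{BASE_URL}?$filter={query}"


def fetch_message_trace(url: str, user: str, password: str,
                        timeout: float = 60.0) -> bytes:
    """GET the URL with HTTP basic auth and return the raw XML body."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Calling `build_message_trace_url(datetime(2019, 10, 22, 2, 53, 8, 114678))` should reproduce the query string from the log line, which makes it easy to diff what the add-on asks for against what you send from Postman.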

Last time this broke, the problem was in the checkpoint “state” the add-on keeps, so on to fixing that.

Get the current “state” objects by running this from the server the TA runs on. If you don’t have jq on all your servers, get it; it’s great!

# curl -sk -u admin https://localhost:8089/servicesNS/nobody/TA-MS_O365_Reporting/storage/collections/data/TA_MS_O365_Reporting_checkpointer | jq .
Enter host password for user 'admin':
[
  {
    "state": "{\"max_date\": \"2019-10-22 02:27:08.114678\"}",
    "_user": "nobody",
    "_key": "myinput_obj_checkpoint"
  }
]
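Note that `state` is JSON stored inside a JSON string, which is easy to trip over. A small Python sketch that unpacks it from the response above (the sample body is that curl output; `checkpoint_dates` is a made-up helper name, and the date format is inferred from the stored value):

```python
import json
from datetime import datetime
from typing import Dict

# Sample body copied from the curl output above.
SAMPLE = r'''[
  {
    "state": "{\"max_date\": \"2019-10-22 02:27:08.114678\"}",
    "_user": "nobody",
    "_key": "myinput_obj_checkpoint"
  }
]'''

# Format inferred from the stored max_date value.
MAX_DATE_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def checkpoint_dates(body: str) -> Dict[str, datetime]:
    """Map each record's _key to the max_date held in its JSON-encoded state."""
    result = {}
    for record in json.loads(body):
        state = json.loads(record["state"])  # second decode: state is a string
        result[record["_key"]] = datetime.strptime(state["max_date"], MAX_DATE_FORMAT)
    return result
```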

Well, that’s definitely old: whenever a poll came back empty, the checkpoint only moved forward by a minute, so it had fallen well behind.
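To put “well behind” in numbers, here’s a rough Python sketch. It assumes, based only on the log above (query starting at 02:53:08, max date then set to 02:54:08), that an empty poll advances the checkpoint by one minute; it ignores the wall-clock time between polls and any timezone difference between the log timestamps and the stored max_date. The function name is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption inferred from the log: an empty poll moves the checkpoint
# forward by one minute, regardless of how much real time has passed.
EMPTY_POLL_STEP = timedelta(minutes=1)


def empty_polls_to_catch_up(checkpoint: datetime, now: datetime,
                            step: timedelta = EMPTY_POLL_STEP) -> int:
    """Consecutive empty polls needed before the checkpoint reaches `now`,
    ignoring the time that passes between polls."""
    lag = now - checkpoint
    if lag <= timedelta(0):
        return 0
    # Ceiling division: timedelta // timedelta returns an int.
    return -(-lag // step)
```

With the stored checkpoint (2019-10-22 02:27:08) and the log timestamp (2019-11-11 13:53:56), that works out to roughly 29,500 empty polls, and since real time keeps moving between polls, it would never catch up on its own if polls run more than a minute apart.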

Time to update it to something recent and see what happens… So that I could query and edit it from search, I made a KV Store lookup definition, “ta_ms_o365_reporting_checkpointer”, for this troubleshooting:

Setting            Value
Type               KV Store
Collection Name    TA_MS_O365_Reporting_checkpointer
Supported Fields   _key, state
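If you’d rather define the lookup in config than through Settings > Lookups > Lookup definitions, the same settings go in a transforms.conf stanza: `external_type`, `collection` and `fields_list` are the standard KV Store lookup keys. The Python below is just a sketch that renders that stanza with configparser so the mapping from the table is explicit; `kvstore_lookup_stanza` is a hypothetical helper, and which app’s transforms.conf you put it in is up to you.

```python
import configparser
import io
from typing import List


def kvstore_lookup_stanza(name: str, collection: str, fields: List[str]) -> str:
    """Render a transforms.conf-style stanza for a KV Store lookup definition."""
    conf = configparser.ConfigParser()
    conf[name] = {
        "external_type": "kvstore",      # "Type: KV Store"
        "collection": collection,        # "Collection Name"
        "fields_list": ", ".join(fields),  # "Supported Fields"
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(kvstore_lookup_stanza("ta_ms_o365_reporting_checkpointer",
                                "TA_MS_O365_Reporting_checkpointer",
                                ["_key", "state"]))
```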

Now pull the data back through the lookup from the Splunk search interface with the following search:

| inputlookup ta_ms_o365_reporting_checkpointer where _key=* | rename _key as Key | table Key state

Results:

Key                     state
myinput_obj_checkpoint  {"max_date": "2019-10-22 02:27:08.114678"}

Now we know the lookup can read the collection. Let’s update it.

| inputlookup ta_ms_o365_reporting_checkpointer where _key=myinput_obj_checkpoint 
| eval state="{\"max_date\":\"2019-11-09 22:45:57.781817\"}" 
| outputlookup ta_ms_o365_reporting_checkpointer append=false

The search example came from the outputlookup documentation page. Basically:

  1. Pull the data
  2. Update a field
  3. Push it back in

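Be aware: if you have multiple configured inputs, this’ll blat the state for all the others, because the search keeps only the myinput_obj_checkpoint row and append=false overwrites the collection with whatever the search returns. I only had one input, so that’s OK. I’m sure you’ll work it out if you need to.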
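If you do have multiple inputs, one way around that (not what I did, so treat it as a sketch) is to update just the one record through the KV store REST API: a POST to `storage/collections/data/<collection>/<_key>` replaces only the record with that key. The app, collection and key names come from the curl command above; the host, credentials and helper names are placeholders. Building the state with `json.dumps` also avoids hand-escaping quotes the way the eval does.

```python
import base64
import json
import ssl
import urllib.parse
import urllib.request
from datetime import datetime

# Names from the curl command above; host and credentials are placeholders.
SPLUNKD = "https://localhost:8089"
APP = "TA-MS_O365_Reporting"
COLLECTION = "TA_MS_O365_Reporting_checkpointer"


def state_value(max_date: datetime) -> str:
    """Encode max_date in the same shape the add-on stored (a JSON string)."""
    return json.dumps({"max_date": max_date.strftime("%Y-%m-%d %H:%M:%S.%f")})


def build_update_request(key: str, max_date: datetime,
                         user: str, password: str) -> urllib.request.Request:
    """Build a POST that replaces only the record with this _key."""
    url = (f"{SPLUNKD}/servicesNS/nobody/{APP}/storage/collections/data/"
           f"{COLLECTION}/{urllib.parse.quote(key, safe='')}")
    body = json.dumps({"state": state_value(max_date)}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"Basic {token}",
                 "Content-Type": "application/json"},
    )


def send(req: urllib.request.Request) -> bytes:
    """Send the request, skipping certificate checks like curl -k does."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx) as resp:
        return resp.read()
```

Passing `build_update_request("myinput_obj_checkpoint", datetime(2019, 11, 9, 22, 45, 57, 781817), "admin", password)` to `send` should set the same value as the search above while leaving any other inputs’ records alone.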

Suffice it to say, it started working on the next poll and kept going. Woo!

2019-11-11 14:09:57,728 DEBUG pid=4079 tid=MainThread file=base_modinput.py:log_debug:286 | Number of messages returned: 338
2019-11-11 14:09:57,862 DEBUG pid=4079 tid=MainThread file=base_modinput.py:log_debug:286 | Max date after getting messages: 2019-11-06 14:24:50.973122


#splunk #o365 #troubleshooting #office #API