Security incidents happen. And when they do, they need to be dealt with quickly. That’s where detection comes into play: the faster incidents are detected, the faster they can be handed off to the security team and resolved. To make detection as fast as possible, teams usually rely on monitoring infrastructure that fires off an alert any time something even slightly questionable occurs. These alerts can become a deluge of information that is difficult for engineers to sift through. Even worse, a large number of them are false positives, caused by engineers running sudo -i or nmap as part of their everyday work.
Ignoring some of these alerts is tempting. After all, for every alert that involves a person, a member of the security team needs to manually reach out to them. More alerts means more work: we all know that Chris runs nmap about six times a day, and the SREs need to run sudo fairly often. So we can just ignore those alerts, right? Wrong. This sets a dangerous precedent that never ends well. There’s a clear need for a system that can reduce the burden of alerts for the security team.
A year ago, Slack set out to tackle this very issue. Instead of manually reaching out to employees to verify their actions, they built an automated system designed to reach out and send aggregate results back to the security team. We were inspired: what if our team at Dropbox built an automated, distributed alerting bot of our own? Could we reduce the burden of alerts for our security team, and help them sort through alerts faster than ever before? To answer that question, we developed and deployed Securitybot, and found out that yes, we could.
But we didn’t want to stop there. As a founding member of the TODO Group (short for Talk Openly, Develop Openly), we are committed to sharing our knowledge with the greater tech community through support of open source projects. So, today we are also open sourcing our implementation in the hopes that other companies can benefit from what we’ve built.
Efficient incident detection
One of the hardest, most time-consuming parts of security monitoring is manually reaching out to employees to confirm their actions. Despite already spending a significant amount of time on reach-outs, there were still alerts that we didn’t have time to follow up on. We wanted to implement a system that would reach more users while allowing us to spend more time on other things, like building better detection tools and proactively hunting for bad actors.
Securitybot now sits in our alert detection chain. Soon after an alert fires, the employee who triggered it receives a message asking them to confirm whether they performed a potentially malicious action. Their response is stored and later sent to the security team, and alert rollups are augmented with employees’ responses to the bot. If an employee reports that they did not perform an action, the security team is alerted immediately. This keeps most alerting in the background while surfacing the alerts that truly require prompt attention and follow-up. Rather than spending their time repeatedly reaching out, our security engineers now have more time to work on foundational projects that improve our overall security posture.
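The routing rule described above (store every answer for the rollup, escalate denials right away) can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: `Answer`, `Alert`, `ResponseRouter`, and `page_security_team` are names invented for the example, not Securitybot's actual code, and the "page" is just a print statement standing in for a real notification integration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Answer(Enum):
    CONFIRMED = "confirmed"  # employee says they performed the action
    DENIED = "denied"        # employee says they did NOT perform it


@dataclass
class Alert:
    alert_id: str
    user: str
    description: str


@dataclass
class ResponseRouter:
    """Hypothetical router: keeps every answer for the periodic rollup and
    escalates denials to the security team immediately."""
    rollup: list = field(default_factory=list)       # reviewed in later aggregations
    escalations: list = field(default_factory=list)  # needs prompt follow-up

    def handle(self, alert: Alert, answer: Answer, reason: str) -> None:
        record = (alert.alert_id, alert.user, answer.value, reason)
        # Every response, confirmed or denied, is stored for the rollup.
        self.rollup.append(record)
        if answer is Answer.DENIED:
            # An employee disowning an action is the signal that matters most.
            self.escalations.append(record)
            self.page_security_team(record)

    def page_security_team(self, record: tuple) -> None:
        # Stand-in for a real paging or notification integration.
        print(f"ESCALATE: {record}")


router = ResponseRouter()
router.handle(Alert("a1", "chris", "nmap run"), Answer.CONFIRMED, "routine scan")
router.handle(Alert("a2", "sre1", "sudo -i"), Answer.DENIED, "wasn't me")
```

Only the denied alert produces an escalation; the confirmed one simply waits for the next rollup, which is what keeps most alerting in the background.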
Design
When designing Securitybot, we wanted to cover all the key points from Slack’s post, and the core ideas are retained: Securitybot is tied into our detection and alerting system and our company-wide Slack instance. Upon getting an alert, the bot contacts whoever triggered the alert and logs the response for the security team. However, we also wanted to extend the design to make it more useful to Dropbox and, ideally, to the community at large. The goal was to make our implementation modular and reusable. For instance, if we switch chat platforms or monitoring systems, we want to be able to do so without rewriting the core code. Securitybot was designed around a set of core functions that reach out to monitoring and communication systems via a set of simple, composable plugins.
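To make the plugin idea concrete, here is a hypothetical Python sketch of a core that talks only to two small interfaces, one for alert sources and one for chat. The interface names (`AlertSource`, `ChatClient`), the `Core` class, and the toy plugins are assumptions made for illustration; they are not the interfaces in the open-sourced Securitybot.

```python
from typing import Iterable, Protocol


class AlertSource(Protocol):
    """Anything that can hand the bot new alerts (a SIEM, a log pipeline, ...)."""
    def new_alerts(self) -> Iterable[dict]: ...


class ChatClient(Protocol):
    """Anything that can message an employee (Slack or another chat tool)."""
    def send(self, user: str, text: str) -> None: ...


class Core:
    """Platform-agnostic core: knows nothing about Slack or any specific SIEM."""
    def __init__(self, source: AlertSource, chat: ChatClient) -> None:
        self.source = source
        self.chat = chat

    def run_once(self) -> int:
        """Message the user behind each new alert; return how many were sent."""
        count = 0
        for alert in self.source.new_alerts():
            self.chat.send(alert["user"], f"Hi! Did you just {alert['action']}?")
            count += 1
        return count


# Two toy plugins; replacing either one requires no change to Core.
class ListSource:
    def __init__(self, alerts: list[dict]) -> None:
        self.alerts = alerts

    def new_alerts(self) -> Iterable[dict]:
        # Hand out pending alerts exactly once.
        pending, self.alerts = self.alerts, []
        return pending


class PrintChat:
    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, user: str, text: str) -> None:
        self.sent.append((user, text))
        print(f"@{user}: {text}")
```

Because `typing.Protocol` is structural, plugins don't need to inherit from anything: any object with a matching `new_alerts` or `send` method fits, so swapping chat platforms means writing one new class.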
Securitybot alternates between grabbing new alerts from our monitoring tools and communicating with employees. Whenever a new alert is encountered, it’s logged and a message is queued for whoever triggered it. Regular polling ensures that we get alerts promptly and can deal with them as soon as possible. Later, when responses are collected, they’re brought back into our monitoring system to be available alongside the rest of our alerts.
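One iteration of that polling cycle might look like the following Python sketch. It assumes four injected callables (`fetch_alerts`, `send_message`, `fetch_replies`, `store_response`) standing in for the monitoring and chat plugins, and an alert shape with `id`, `user`, and `summary` keys; all of these names are hypothetical. Replies are matched to each user's oldest outstanding alert.

```python
import collections
import time


class PollingBot:
    """Hypothetical main loop: poll for alerts, queue one message per new alert,
    and push collected replies back into the monitoring system."""

    def __init__(self, fetch_alerts, send_message, fetch_replies, store_response):
        # All four callables are assumptions standing in for plugins.
        self.fetch_alerts = fetch_alerts
        self.send_message = send_message
        self.fetch_replies = fetch_replies
        self.store_response = store_response
        self.seen = set()                    # alert IDs already logged
        self.outbox = collections.deque()    # alerts waiting for a message
        self.awaiting = collections.defaultdict(collections.deque)  # user -> alert IDs

    def tick(self):
        # 1. Grab new alerts, skipping any we've already logged.
        for alert in self.fetch_alerts():
            if alert["id"] in self.seen:
                continue
            self.seen.add(alert["id"])
            self.outbox.append(alert)
        # 2. Message whoever triggered each queued alert.
        while self.outbox:
            alert = self.outbox.popleft()
            self.awaiting[alert["user"]].append(alert["id"])
            self.send_message(alert["user"], alert["summary"])
        # 3. Bring replies back, matching each to the user's oldest open alert.
        for user, text in self.fetch_replies():
            pending = self.awaiting.get(user)
            if pending:
                self.store_response(pending.popleft(), user, text)

    def run(self, interval_s=60.0):
        # Regular polling; the interval is an assumed value.
        while True:
            self.tick()
            time.sleep(interval_s)
```

Keeping the loop body in `tick()` makes it easy to exercise a single iteration in isolation, without waiting on the sleep in `run()`.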
Securitybot keeps user interaction prompt and streamlined. For each alert, we simply ask an employee whether they triggered it and for a brief explanation. These responses are then aggregated back into our monitoring infrastructure, so all of them are right there when we review hourly or daily aggregations. Responses are secured via 2FA, so even if an attacker managed to compromise Slack as well, they couldn’t fool the bot.
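The post doesn't say which second factor backs the responses, so the Python sketch below swaps in TOTP (RFC 6238, HMAC-SHA1) purely as a stand-in to show the idea: an answer typed into chat is only accepted if it comes with a valid code from a device the attacker doesn't control. `accept_response` and its one-step drift window are hypothetical choices for this example.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP (HMAC-SHA1), using the RFC 4226 dynamic truncation."""
    counter = int(at // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = (struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF) % (10 ** digits)
    return str(code).zfill(digits)


def accept_response(secret: bytes, code: str, answer: str,
                    now: Optional[float] = None) -> Optional[str]:
    """Hypothetical gate: record the answer only if the second factor checks out.
    Tolerates one 30-second step of clock drift in either direction."""
    now = time.time() if now is None else now
    for drift in (-30, 0, 30):
        # compare_digest avoids leaking how many characters matched.
        if hmac.compare_digest(totp(secret, now + drift), code):
            return answer
    return None
```

With this gate in place, a compromised chat account alone can produce messages but not valid codes, so a forged "yes, that was me" is rejected rather than silently closing the alert.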
Finally, we’ve added a bit of user friendliness. Rather than bombard employees with messages, we let most alerts “snooze.” If we ping you for using sudo, there’s a good chance you may be using it again in the future. So we don’t bother you for some period of time, because we can be pretty sure that three sudos in a row, in the same context, are all you.
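A snooze like this can be modeled as a small table of expiry times keyed on who triggered the alert, what kind of alert it was, and the context it happened in. The Python sketch below is hypothetical: the `Snoozer` class, its key shape, and the `window_s` length are assumptions (the post doesn't give a snooze duration), and the injectable `clock` exists only to make the behavior easy to test.

```python
import time
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str]  # (user, alert type, context), an assumed key shape


class Snoozer:
    """Hypothetical snooze table: after a user confirms an alert, identical
    alerts in the same context are not re-asked until the window expires."""

    def __init__(self, window_s: float, clock: Callable[[], float] = time.monotonic):
        self.window_s = window_s  # assumed snooze length
        self.clock = clock
        self._until: Dict[Key, float] = {}

    def should_ask(self, user: str, alert_type: str, context: str) -> bool:
        # Ask unless a snooze for this exact key is still in effect.
        expires = self._until.get((user, alert_type, context), float("-inf"))
        return self.clock() >= expires

    def confirm(self, user: str, alert_type: str, context: str) -> None:
        # Start (or extend) the snooze only once the user has confirmed.
        self._until[(user, alert_type, context)] = self.clock() + self.window_s
```

Starting the snooze on confirmation rather than on first contact means a denied or unanswered alert never silences the follow-ups, and keying on context means `sudo` on a different host still gets a fresh question.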
Effectiveness
First and foremost, Securitybot helps the security team sort through alerts faster than ever before. False positives are resolved without needing to reach out to employees, and possible incidents are immediately escalated.
Securitybot helps not only the security team but all Dropbox employees. Responding to a polite chat bot is much easier than responding, in full sentences at that, to a member of the security team, so the bot saves time for all of our employees, not just our security engineers. (After all, it’s not just production engineers: with the bot we can alert on anomalous events within employees’ email and Dropbox accounts, as well as on unusual activity on their laptops.)
We understand the annoyance of having to respond to a nagging security team, and having an unfeeling bot that doesn’t understand “I’m busy, I’ll get to it later” doesn’t make things much better. So, we devoted some time to workshopping the interaction between bot and user to ensure that it would be pleasant to deal with. We wanted our bot to be polite and cordial rather than blunt and robotic. It turned out that giving a bit of personality to our interactions moved the bot from “annoying” to “adorable.”