Monday, February 22, 2016

A Risk Based Datasource

I am often asked how, from an architecture perspective, we might raise the bar on handling sensitive data. In this case, let's say the data is Social Security numbers (SSNs). In other words, how can we handle SSNs consistently within an application and then promote the patterns and any reusable implementations to the enterprise?

The heart of the matter is most often improving the handling of sensitive data between the application and the datasource (let's assume it's a back-end database).

I think there are a number of ways that this can be approached. I propose a very simple solution: use the tools you already have.

First, I need to rant about how various vendors sell database security features and the failures of many to understand attack patterns.

No, full disk encryption isn't a silver bullet. If you're seeking a solution that addresses smash-and-grab theft or any mishandling of the hardware, then you have one. Definitely do this when there's sensitive data on a drive. However, this is probably the least of your worries, since your data centers are in distinct locations with elevated physical security and processes for handling decommissioned hardware (true, right?).

So does it solve data at-rest encryption requirements? Well, maybe enough to pass an audit. But what have you really done to limit access to data? The DBAs can see it. The OS administrators can see it. Anyone with an account can see it. Are all of these people properly provisioned? Is it required that they are able to see it to fulfill their business function?

Proprietary databases with baked-in encryption aren't all that much better. It's possible that you could keep your OS administrators out with some solutions that tie decryption to external trust authorities (e.g., Vormetric). But you still have the DBAs. But say we trust both of them enough and decide that's enough to address the risk. You still have all the service accounts (applications connected to it) and maybe the occasional business reports user. Does the database happily decrypt the data so long as the caller is authenticated? It probably does, so this doesn't do so much after all.

I saw an Oracle presentation a while back and spotted a lie in their diagrams. They talked up all of these features to address security concerns, but every diagram showed people interacting directly with databases. Those are the occasional business reports accounts. Great. That still leaves 99.9% of the traffic coming from those service accounts. The honest picture, in many cases, would show people interacting with databases by way of a big, sloppy, crappy application, with a service account between it and the database on one side and the unwashed masses on the other.

Why does this matter? Because the real users of this data don't know the service account credentials, or rather, from the database's perspective, they haven't proven they deserve access. They have only proven that they can get into the application that uses the service account. In other words, the database defers authorization to data, including sensitive data, to that clumsy beast that your company coded in-house with developers who have little appreciation of security (whether you can change this is another blog post). So, no, you can't say Bob logged into that app and this query coming from service account user MyBigAppUser is legitimate for Bob to execute. Sorry, Oracle, but this is the real problem that needs to be addressed.

So what can we do without paying more to the companies with the big booths at the RSA Conference? Or, more to the point, what can we do without writing database drivers or going completely off the rails trying to come up with something we imagine is perfect?

The solution is to dislodge the developers from their stoop. Force them to put on gloves when they handle sensitive data. Make them use a toothbrush to keep things clean. Make them use a special datasource when the risk of accessing data is higher.

How? Make it so that the service account, MyBigAppUser, can't see ANYTHING that is sensitive. Use a view if you have that option. Create the view so that if they ask for sensitive data, they get back nonsense or nothing at all or an error (in-app honeypot?).
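Here's a minimal sketch of the masking view idea. It uses Python's built-in sqlite3 module purely as a stand-in; the table name, column names, and the customers_masked view are hypothetical. SQLite has no user accounts, so the part where MyBigAppUser is granted SELECT on the view and nothing on the base table is an assumption you'd implement with your own database's permissions.

```python
import sqlite3

# Hypothetical schema for illustration; SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, ssn TEXT);
    INSERT INTO customers VALUES (1, 'Bob', '000-00-0000');

    -- The only object MyBigAppUser would be allowed to query.
    CREATE VIEW customers_masked AS
        SELECT id, name, 'XXX-XX-XXXX' AS ssn FROM customers;
    """
)

# What the ordinary service account sees when it asks for the SSN column.
masked_rows = conn.execute(
    "SELECT id, name, ssn FROM customers_masked"
).fetchall()
print(masked_rows)  # [(1, 'Bob', 'XXX-XX-XXXX')]
```

Returning a fixed mask is just one choice. The view could as easily return NULL, and a request for the column could be something you alert on.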

So how do they get at sensitive data? They use a second datasource with a separate service account. Let's call it MyBigAppSecureUser. This user can see sensitive data. The data is encrypted at-rest and decrypted when it's queried. But this is audited in a well-monitored, secure logging store (centralized, hard to corrupt). It could also have limits, like never returning more than one record because there is no use case that would demand more... or maybe 10, or 20 only. Whatever the case, limit it. You could also insist on more arguments that permit us to be reasonably sure that this data belongs to Bob. Do it transparently, derived from a verified authentication token, and call it SQL augmentation. Review and review again the smaller code set that touches this datasource. Push dynamic testing at it, turned up to brute force and fuzzing. Have security analysts pen test this over and over each time a change is released.
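To make that concrete, here's a rough sketch of a data-layer function sitting in front of the secure datasource. Everything named here is hypothetical: the ssn_records table, the fetch_ssns function, the MAX_ROWS value, and the request_id argument. It assumes the owner id was already pulled from a verified authentication token upstream, and it uses an in-memory SQLite connection in place of the MyBigAppSecureUser connection. In real life the audit logger would ship to that centralized, hard-to-corrupt store.

```python
import logging
import sqlite3

MAX_ROWS = 1  # assumed limit; pick whatever your use cases really need
audit_log = logging.getLogger("secure_datasource.audit")


class RowLimitExceeded(Exception):
    """Raised when a sensitive query matches more rows than allowed."""


def fetch_ssns(secure_conn, verified_owner_id, request_id):
    """Return SSNs for the owner identified by a verified auth token."""
    # The owner filter is added here, not by the caller ("SQL augmentation").
    # Ask for one row beyond the limit so an over-broad match is detectable.
    rows = secure_conn.execute(
        "SELECT ssn FROM ssn_records WHERE owner_id = ? LIMIT ?",
        (verified_owner_id, MAX_ROWS + 1),
    ).fetchall()
    # Audit every access, including ones that are about to be rejected.
    audit_log.info(
        "sensitive_read owner=%s request=%s rows=%d",
        verified_owner_id, request_id, len(rows),
    )
    if len(rows) > MAX_ROWS:
        raise RowLimitExceeded(f"more than {MAX_ROWS} row(s) matched")
    return [ssn for (ssn,) in rows]


# Demo: an in-memory database stands in for the secure datasource.
secure_conn = sqlite3.connect(":memory:")
secure_conn.executescript(
    """
    CREATE TABLE ssn_records (owner_id INTEGER, ssn TEXT);
    INSERT INTO ssn_records VALUES (1, '000-00-0001');
    INSERT INTO ssn_records VALUES (2, '000-00-0002');
    INSERT INTO ssn_records VALUES (2, '000-00-0003');
    """
)
bob_ssns = fetch_ssns(secure_conn, verified_owner_id=1, request_id="req-42")
```

The point of the design is that the caller never writes the WHERE clause for ownership and never chooses the limit, so the small piece of code worth reviewing over and over is this one function.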

Now the developer has to groan and think about handling sensitive data. They have to go to the special datasource, the special data layer and ask for this special data. When they do this, you also tell them to create an audit log from the application or you provide a way to do it transparently.

When the attacker comes along with a SQL injection that works, maybe they wind up hitting the non-sensitive datasource. Hopefully this alone raises alarms.  If they hit the sensitive datasource, alarms definitely ring both from activity in the app and in the database. Spotting the anomaly is now easier since we don't mingle all those queries that involve non-sensitive data with those that involve sensitive data.

If queries to the sensitive datasource come from a compromised application server, this can be detected in the logging and app monitoring.  The queries will lack correlation to front-end activity.  Be sure to watch for this and explain this threat to the log management team who will implement the correlation, monitoring, and alerting.
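The correlation check itself can be very simple. Here's a sketch, assuming (and this is my assumption, not something your logging stack gives you for free) that the front end logs a request id for each user action and that the same id gets recorded with every read against the sensitive datasource. The function name and record fields are hypothetical.

```python
def uncorrelated_reads(frontend_request_ids, db_audit_records):
    """Return sensitive-read audit records with no matching front-end request."""
    known = set(frontend_request_ids)
    return [rec for rec in db_audit_records if rec.get("request_id") not in known]


# Request ids seen in the front-end logs for some time window.
frontend_ids = ["req-42", "req-43"]

# Audit records from the sensitive datasource for the same window.
db_records = [
    {"request_id": "req-42", "owner": 1},
    {"request_id": "req-99", "owner": 7},  # never seen by the front end
    {"request_id": None, "owner": 3},      # no request id at all
]

suspicious = uncorrelated_reads(frontend_ids, db_records)
```

Anything that lands in suspicious is a read of sensitive data that no front-end activity accounts for, which is exactly what a compromised application server querying on its own would look like.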

Another bonus: encrypt the secure data at-rest and leave the other unencrypted. Now you're encrypting only data that is known to be sensitive. If sensitive data leaks out the other side, your Data Loss Prevention tooling in a discovery model can catch it.

Is it perfect? No. Is it elegant? Elegant enough, until you finally decide to dismantle your monolithic app in favor of a service design. To push it further without a full-on rewrite, you could make sensitive data functionality a whole separate app, a nanoservice... but that's another post.

(Note that this post was drafted a couple of years ago.  However, I still find it very relevant and hope that others do as well.)