As we’ve seen from the Edward Snowden incident and subsequent revelations by the US National Security Agency, as well as seemingly ever-larger and more complex data breaches in the private sector, one of the biggest challenges in the world of big data is finding an appropriate balance between security and privacy. Ever more data is being stored about all of us, and the bigger, more expansive, and more dynamic the data store, the greater the challenge in affording adequate protection to both individuals and data-holders.
CERN is challenging the limits of big data storage systems as they generate a petabyte of information every second in the quest to discover the origins of the universe. The organisation, in conjunction with the European Commission, established a Grid computing system that taps into computing power from data centres in 11 different countries in order to be able to store even a fraction of the data being collected.
While CERN is something of an extreme example, the EU is looking at a number of legislative measures to address the data storage / security challenge, including requiring companies to delete information that might enable an individual to be identified. However, a Harvard University study last year found that individuals in a genetics database could be re-identified by cross-referencing that database with only three items of information – zip code, data of birth, and gender – with a 42% accuracy rate. Adding a first name or nickname – something that’s easily extractable from many email addresses – increased that re-identification rate to 97%.
Data security and privacy: can we have both?
Some approaches to data security recommend distributed storage of information across multiple locations to minimise the risks to that data, but such a strategy could equally lead to increased vulnerability simply because it is harder to monitor multiple locations than a single location.
Given the challenge of storing a vast pool of information in a single physical location, organisations are turning to the cloud and commodity systems. While this approach makes for simpler management, a single software vulnerability could enable hackers to illegally access the entire system.
In the early days of big data, storage security was less of an issue, as the only organisations collecting and storing it were large corporations and government agencies with proprietary infrastructures isolated from the operational networks. As cloud storage became available, these organisations were able to transition to private clouds without needing to make too many adjustments to their data protection practices.
But in today’s hyper-digital world, even small companies are collecting huge amounts of data about customer behaviour, marketing campaign responses, market trends, and more. Unfortunately, those companies are storing their data in public cloud environments, where security is a whole different matter. The traditional security systems smaller companies use to secure static data on semi-isolated networks are not up to the task of securing big data in a public cloud.
In our next post, we’ll take a closer look at the options available today to secure big data – and where the future of big data security may lie.