The Office of the Australian Information Commissioner (“OAIC”) has released a consultation draft Guide to big data and the Australian Privacy Principles (the “Guide”). The draft Guide arrives at a time when many Australian businesses are exploring the potential of Big Data analysis and grappling for the first time with some of the associated data management risks.
There is no single legislative regime regulating Big Data activities. However, Australia’s web of privacy and other data management laws, including the Privacy Act 1988 (Cth) (“Privacy Act”), will affect many Big Data activities. With this in mind, the OAIC has developed the Guide to provide suggestions on how to conduct Big Data analysis in a way that complies with existing laws protecting personal information and individuals’ right to privacy.
The Guide itself will not be binding, and will simply serve as a point of reference about the approach the OAIC will take when exercising its regulatory authority over Big Data activities. The Guide outlines the key privacy requirements that are relevant to entities conducting Big Data activities and provides some useful guidance on implementing “privacy by design” in their culture and privacy compliance management processes. In this post, we provide an overview of what the OAIC means by “Big Data” and summarise some of the key recommendations from the Guide for entities involved in Big Data activities.
Big Data 101
Big Data is a broad term that refers to the collection, aggregation and analysis of large data sets (including personal information) on a mass scale. While there is no authoritative definition, the Guide refers to Big Data as being characterised by the “three Vs” identified by Gartner:
“high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight, decision making, and process optimization”
A key difference in the way that Big Data makes use of such “information assets” is the application of algorithms to analyse data sets and find previously unidentified correlations between them. This process also enables users to re-cut and mine existing data sets in innumerable ways, and can involve the creation of new data sets by aggregating existing data. This is different to how data was analysed in the past, when particular hypotheses were tested against discrete data sets. Big Data analysis techniques have enabled businesses to gain unprecedented insight into consumers and markets, identify trends and opportunities for innovation, and answer previously unanswerable questions (for instance, what is the correlation between unscented body lotion, extra-big bags of cotton balls, hand sanitizers and washcloths? Click here to find the answers!).
The usefulness of Big Data is by no means limited to consumer marketing. Thanks to Big Data, decoding the human genome, which originally took 10 years, can now be achieved in less than a day. Elite sports players are using data to dictate calorie intake, training levels and even fan interaction to optimise their performance on the field (à la Brad Pitt playing baseball manager Billy Beane in Moneyball). Astonishingly, despite recent advancements, it is estimated that at the moment only 0.5% of all data is ever analysed and used, so there is still huge potential for Big Data to unlock new insights. (It may be for these reasons that the Harvard Business Review has dubbed “Data Scientist” the sexiest job of the 21st century.)
But, as we’ve learned from Spider-Man, with great power comes great responsibility…
Key recommendations from the Guide
There are three overarching recommendations contained in the Guide:
- The adoption of “privacy by design”, a holistic approach where privacy is “integrated and embedded in an entity’s culture, practices and processes, systems and initiatives from the design stage onwards”. The Guide encourages entities to embed privacy by design into their businesses, including by implementing the four steps outlined in the Privacy Management Framework, which focusses on the importance of effective leadership and governance, the development of policies and risk management protocols, and procedures for continually evaluating the entity’s systems and responses to privacy issues in a proactive and forward thinking way. This reflects the near impossibility of “retrofitting” a privacy compliance framework to an existing project – appropriate privacy compliance can really only be assured if it is factored into the project design from the very beginning.
A focus on implementing data protection measures by integrating compliance safeguards into organisations’ business practices is also reflected in the EU’s General Data Protection Regulation 2016/679, which will apply from 25 May 2018. Article 25 of the Regulation requires entities to adopt data protection measures “by design and by default”, including by implementing measures to ensure that personal data is only processed to the extent, in the amount and for the period necessary for the specific purpose of the relevant processing.
- The importance of conducting privacy impact assessments (“PIAs”) as part of regular privacy risk and compliance management and planning programs, but also as a way to identify key risks and actions to be taken in relation to Big Data activities.
As noted above, the Guide itself will not be binding, so entities cannot be compelled to comply with its recommendations in respect of PIAs. However, it is worth noting that the Privacy Act currently gives the Information Commissioner the power to require a government agency to undertake a PIA. While this does not apply to private sector organisations at present, it may do so in the future.
Australian Privacy Principle 5 requires an entity that collects personal information about an individual to notify that individual of certain matters, or otherwise ensure that the individual is aware of them, including the purposes of collection. Australian Privacy Principle 6 limits how entities may use the personal information they collect: it will be difficult to justify any use beyond the primary purpose of collection, or a related secondary purpose that the relevant individual would reasonably expect. To establish an appropriate “reasonable expectation” about how information may be used, it is important for entities to clearly set out their intentions in their privacy collection notices.
Given that Big Data analysis will rarely be a primary purpose of collection, it will be important for entities to craft their collection notices in a way that allows maximum flexibility for any future analysis that they may wish to undertake. The Guide points out that research suggests that many people don’t read privacy notices, and recommends that entities consider how to ensure that all relevant information is included in their notices in an easy-to-read, dynamic and user-centric way. This may require a shift away from the largely passive way in which many entities currently provide information about their collection and use of personal information (e.g. through links on their website) towards more active engagement with their customers to explain what will happen with the data they provide.
The Guide also suggests that entities should consider allowing individuals to choose what types of uses and disclosures they are willing to agree to. While this may appear attractive in theory, this could present tremendous difficulties for organisations trying to manage large data sets – very sophisticated data management tools will be needed to sift and sort data that are subject to different use rights. Unless an entity is able to precisely track what uses and disclosures have been approved for each user record, then the entire data set may effectively become unusable. For these practical reasons, we expect that organisations will want to keep the scope of permitted uses as consistent as possible across all of the data that they collect.
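To make the tracking problem concrete, here is a minimal sketch of what per-record use permissions might look like in practice. It is illustrative only and does not come from the Guide: the `CustomerRecord` type, the `permitted_uses` field and the purpose labels such as "analytics" are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CustomerRecord:
    # Hypothetical record shape; the field names are assumptions for illustration.
    customer_id: str
    postcode: str
    # The set of uses this individual has agreed to (e.g. "analytics").
    permitted_uses: frozenset = field(default_factory=frozenset)


def usable_for(records, purpose):
    """Return only the records whose owner has approved the given purpose."""
    return [r for r in records if purpose in r.permitted_uses]


records = [
    CustomerRecord("c1", "2000", frozenset({"service_delivery", "analytics"})),
    CustomerRecord("c2", "3000", frozenset({"service_delivery"})),
    CustomerRecord("c3", "4000"),  # no permissions recorded at all
]

# Only records approved for analytics may enter the Big Data pipeline.
analytics_ids = [r.customer_id for r in usable_for(records, "analytics")]
print(analytics_ids)
```

Even in this toy case, two of the three records drop out of the analytics set, and a record with no permissions recorded cannot safely be used for anything. At scale, every downstream data set derived from these records would need to carry the same restrictions, which is why consistent permissions across the whole collection are so much easier to manage.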
On a related theme, the Guide also notes that Big Data activities often involve the aggregation of data sets from multiple sources, and recommends that when conducting their PIAs, entities will therefore also need to consider the circumstances in which data was collected by relevant third parties. While not expressly stated in the Guide, this will also be an important factor to consider when negotiating data sharing contracts – counterparties to these arrangements will want to seek appropriate assurances from one another that the information being shared has been collected in a way that complies with all relevant laws and allows the type of use that the parties are contemplating. As flagged above, it is critical to get things right at the initial collection stage, as correcting compliance issues by seeking consents at a later stage can be difficult and prohibitively costly.
- The importance of de-identification of personal information, together with the risks of re-identification.
Entities considering undertaking Big Data activities should first consider whether de-identified information could be used (such that the information is no longer about an identifiable individual and therefore not personal information under the Privacy Act). If data has been appropriately de-identified, then it can be used, shared and published without jeopardising individuals’ privacy rights. It also means that there is a lower risk of personal information being compromised in the event of a data breach. For the OAIC’s guidance on the de-identification process, see Privacy business resource 4: De-identification of data and information and Information policy agency resource 1: De-identification of data and information.
On the flip side is the need to manage the risk of re-identification. As part of their PIAs, entities should consider what de-identification techniques they will use, how the relevant data will be handled, and whether it will be disclosed to another entity for Big Data purposes. This is an important risk mitigation strategy because where de-identification is not done properly, Big Data activities may lead to re-identification of personal information, in which case the entity’s obligations under the Privacy Act will apply to its use of that data. Perhaps ironically, the more data that becomes available (particularly via the internet) the higher the likelihood that data can be aggregated and cross-referenced, and personal information re-identified. As such, the more sophisticated entities become in applying Big Data analysis techniques, the more difficult it may be to produce a data set that has in fact been effectively de-identified. Entities will need to find the right balance between retaining data in a form that is still useful (as there will be a point at which, for purposes of de-identification, data becomes so abstracted that it is no longer the source of any insight) while also managing privacy compliance risks by removing as much personally identifiable information as possible.
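The trade-off between usefulness and re-identification risk can be illustrated with a small sketch. The Guide does not prescribe any particular test; the example below uses k-anonymity, one commonly cited measure, purely as an illustration. The function names, the sample rows and the choice of age and postcode as "quasi-identifiers" (fields that could be cross-referenced with other data) are all assumptions made for this example.

```python
from collections import Counter


def smallest_group_size(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values.

    A result of 1 means at least one row is unique on those fields and is
    therefore a candidate for re-identification by cross-referencing.
    """
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def generalise(row):
    """Coarsen age to a decade band and postcode to its first two digits."""
    out = dict(row)
    out["age"] = f"{(row['age'] // 10) * 10}s"
    out["postcode"] = row["postcode"][:2] + "**"
    return out


# Hypothetical data with names already removed.
rows = [
    {"age": 34, "postcode": "2000", "purchase": "lotion"},
    {"age": 36, "postcode": "2010", "purchase": "cotton balls"},
    {"age": 52, "postcode": "3000", "purchase": "washcloths"},
    {"age": 58, "postcode": "3051", "purchase": "hand sanitiser"},
]
quasi_identifiers = ["age", "postcode"]

raw_k = smallest_group_size(rows, quasi_identifiers)
generalised_k = smallest_group_size([generalise(r) for r in rows], quasi_identifiers)
print(raw_k, generalised_k)
```

Removing names alone leaves every row unique on age and postcode; generalising those fields puts each row in a group of two, at the cost of less precise data. A check like this is only a rough indicator: a data set that passes it can still be re-identified if other fields or external data sources allow records to be singled out.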
This is by no means a comprehensive list of the OAIC’s recommendations, and interested parties are encouraged to read the full Guide here. The closing date for public comments is Monday 25 July 2016.