Risks of ‘Black Box’ Machine Learning in Compliance And Privacy Programs


By Daniel Fabbri, CEO of Maize Analytics,
Assistant Professor of Biomedical Informatics
and Computer Science at Vanderbilt University

Recent machine learning advances have the potential to revolutionize patient care through better clinical risk prediction and precision medicine. Understandably, the compliance and privacy communities are adapting these machine learning methods to help protect patient data. While these technologies will likely help detect and prevent future breaches [1], careful consideration must be given to the risks these methods pose when applied to compliance and privacy programs.

Healthcare providers access electronic medical record systems millions of times per day, and each access is recorded in an audit log. Manual review of these audit logs for inappropriate behavior does not scale. Machine learning algorithms have the potential to automate the detection of snooping, identity theft and other threats by learning the characteristics of good, bad and anomalous access patterns. However, many modern machine learning models are uninterpretable to humans. With these ‘black box’ models, compliance and privacy officers cannot tell which ‘privacy policies’ the system is applying, nor whether those policies are correct.
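To make the idea concrete, here is a minimal, hypothetical sketch of audit-log anomaly scoring: it learns how often each (role, department) access pattern appears and scores rare patterns as anomalous. All field names and the scoring scheme are illustrative assumptions, not the method of any particular product; a black-box model would produce a similar score without the inspectable counts behind it.

```python
from collections import Counter

def train_access_model(audit_log):
    """Count how often each (role, department) pair appears in the log.
    Frequent pairs look routine; rare or unseen pairs look anomalous."""
    counts = Counter((entry["role"], entry["department"]) for entry in audit_log)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def anomaly_score(model, access):
    """Return a score from 0 (routine) to 1 (never seen in training).
    Unlike a black-box model, the counts behind this score can be inspected."""
    freq = model.get((access["role"], access["department"]), 0.0)
    return 1.0 - freq

# Hypothetical audit log: mostly nurses accessing cardiology records.
log = [{"role": "nurse", "department": "cardiology"}] * 9 + \
      [{"role": "clerk", "department": "oncology"}]
model = train_access_model(log)
routine = anomaly_score(model, {"role": "nurse", "department": "cardiology"})
novel = anomaly_score(model, {"role": "intern", "department": "psychiatry"})
# The routine access scores low; the never-seen access scores 1.0.
```

Real systems use far richer features (patient relationships, time of day, care-team membership), but the contrast holds: a frequency table can be audited line by line, while a deep model's score cannot.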

The interpretability of machine learning models is an active area of research. While some machine learning problems can be adequately solved by predictions alone, without explanations of why each prediction was made, this paradigm is risky for compliance and privacy problems. An informal adage from the HHS Office for Civil Rights (OCR) is: “What is your policy and can you demonstrate you are following your policy to regulators?” If you cannot state what the machine learning algorithm is doing, how can you define what your policy is or even defend it to regulators?

The lack of interpretability also raises concerns about incorrectly learned privacy policies. Consider a training data set in which most accesses to hypertension patients are appropriate. Would the machine learning algorithm learn a policy that states that “all accesses to hypertension patients are appropriate?” Obviously, a diligent compliance officer would not want to deploy such a broad and arbitrary policy. Unfortunately, the compliance officer may have no means to identify or remedy these issues.
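The hypertension scenario can be sketched in a few lines. The naive rule learner below (a hypothetical illustration, with invented field names) emits a blanket "always appropriate" rule for any diagnosis whose labeled accesses were nearly all appropriate in training, which is exactly the overbroad policy a diligent compliance officer would reject:

```python
from collections import defaultdict

def learn_policy(training_set, threshold=0.95):
    """Naive rule learner: if at least `threshold` of the labeled accesses to
    patients with a given diagnosis were appropriate, declare ALL accesses to
    such patients appropriate. This is deliberately overbroad."""
    stats = defaultdict(lambda: [0, 0])  # diagnosis -> [appropriate, total]
    for example in training_set:
        counts = stats[example["diagnosis"]]
        counts[1] += 1
        if example["appropriate"]:
            counts[0] += 1
    return {dx: "always appropriate"
            for dx, (ok, total) in stats.items() if ok / total >= threshold}

# Skewed training data: 19 of 20 hypertension accesses happen to be appropriate.
training = [{"diagnosis": "hypertension", "appropriate": i != 0}
            for i in range(20)]
policy = learn_policy(training)
# The learner emits a blanket rule for hypertension patients, a policy that
# would wave through future snooping on any hypertension patient.
```

With an interpretable rule like this, the officer can spot and delete the bad policy; inside a black-box model, the same generalization would be invisible.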

Machine learning algorithms for compliance and privacy are better applied when they keep the compliance and privacy officer “in the loop.” Officer-in-the-loop algorithms leverage large-scale data analytics to identify trends and patterns in access data, but then recommend the policy (or the reason an access is appropriate or inappropriate) to the compliance officer [2, 3]. The compliance officer can then accept or reject each policy; in effect, the officer sets the policy. The auditing system applies the accepted policies going forward. This supervision allows compliance officers not only to defend their policies if audited by the OCR, but also to take advantage of the broad class of machine learning algorithms available today.
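The accept/reject workflow above can be sketched as follows. This is a simplified, hypothetical skeleton (the rule format, field names, and care-team check are all assumptions for illustration): machine-mined candidate rules become active policy only when the officer accepts them, and accesses explained by no accepted rule are flagged for manual review.

```python
def review_candidate_rules(candidates, officer_decides):
    """Machine-suggested rules become policy only if the officer accepts them,
    so every active rule is one the officer can state and defend to the OCR."""
    return [rule for rule in candidates if officer_decides(rule["description"])]

def audit(accesses, policy):
    """Flag every access that no accepted rule explains for manual review."""
    return [a for a in accesses if not any(r["matches"](a) for r in policy)]

# Hypothetical candidate rule mined from historical access patterns:
candidates = [
    {"description": "a treating physician may access their patient's record",
     "matches": lambda a: a["user"] in a["care_team"]},
]

# The officer reviews the human-readable description and accepts the rule.
policy = review_candidate_rules(candidates, officer_decides=lambda desc: True)

flags = audit(
    [{"user": "dr_a", "care_team": ["dr_a"]},   # explained by the rule
     {"user": "dr_b", "care_team": ["dr_a"]}],  # explained by no rule -> flag
    policy,
)
```

The key design point is that the machine proposes and the human disposes: the stated policy is always the set of rules the officer explicitly approved, which can be shown to a regulator verbatim.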

Machine learning and artificial intelligence are extremely useful tools for helping compliance officers audit at scale. However, when left unchecked, these tools can learn incorrect policies, leaving the hospital at risk. Be sure you can explain and defend to the OCR exactly what decisions your tool makes and why.


[1] Fabbri D, Frisse M, Malin B. The Need for Better Data Breach Statistics. JAMA Internal Medicine. 2017.

[2] Fabbri D, LeFevre K. Explaining accesses to electronic medical records using diagnosis information. Journal of the American Medical Informatics Association. 2013.

[3] Fabbri D, LeFevre K. Explanation-based auditing. Proceedings of the VLDB Endowment. 2012.
