A New Model for Sharing Insights While Protecting Privacy
Last week at JP Morgan, Mayo Clinic announced a new collaboration with nference that I would describe as "cloud-hosted, de-identified, federated learning in which the tools are brought to the data instead of sending data to the tools."
This Healthleaders article describes it well.
Here's a broad overview. Let's start with 3 containers.
The first container is controlled by Mayo Clinic, holds identified data, and has one purpose: the development and optimization of de-identification algorithms. Selected data scientists, who are accountable to Mayo Clinic, are asked to help with the algorithms via time-limited, audited access to the container. They are either Mayo staff or outside collaborators who are trained in Mayo policies and held to the same requirements as Mayo employees. No data ever leaves this container.
The second container is controlled by Mayo Clinic, holds identified data, and has one purpose: running the perfected de-identification algorithms and producing a de-identified data set. That de-identified data set is moved to the third container.
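To make the second container's job concrete, here is a toy sketch in Python. It is only an illustration under stated assumptions: the field names, the list of identifiers, the salted hash, and the `deidentify` function are all hypothetical, and real de-identification algorithms (including whatever Mayo Clinic develops in the first container) are far more sophisticated than dropping a few fields.

```python
import hashlib

# Hypothetical list of direct identifiers; a real list would be much longer.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def deidentify(record, salt):
    """Return a copy of the record without direct identifiers.

    The patient ID is replaced by a truncated salted hash so that rows for
    the same patient can still be linked inside the de-identified data set.
    """
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "patient_id"
    }
    raw = (salt + str(record["patient_id"])).encode("utf-8")
    cleaned["pseudo_id"] = hashlib.sha256(raw).hexdigest()[:16]
    return cleaned


# Made-up identified records, standing in for the data held in container two.
identified = [
    {"patient_id": 1, "name": "Jane Doe", "address": "1 Main St",
     "phone": "555-0100", "diagnosis": "diabetes", "age": 54},
    {"patient_id": 2, "name": "John Roe", "address": "2 Oak Ave",
     "phone": "555-0101", "diagnosis": "asthma", "age": 37},
]

# Only this output would be moved on to the third container.
deidentified = [deidentify(r, salt="demo-salt") for r in identified]
```

The point of the sketch is the direction of flow: identified records go in, and only the cleaned copies come out.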
The third container is for running innovative applications brought to Mayo by partners offering unique analytics on the de-identified data. No data leaves this container; the applications are brought into it. A joint tenancy model lets Mayo Clinic run the container while giving others limited, audited use of it to run their applications. The only things that ever exit the container are data insights or knowledge. For example, if nference is asked a question about drug discovery, its machine learning/natural language processing software in the container can pose the question. The answer is shared but not the data used to generate the answer.
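The "bring the tools to the data" idea can be sketched in a few lines of Python. This is a toy model, not the actual Mayo Clinic or nference design: the `InsightContainer` class, its `run` method, the rule for what counts as an insight, and the sample records are all assumptions made up for illustration. In practice the isolation would come from the cloud infrastructure, not from a Python class.

```python
from datetime import datetime, timezone


class InsightContainer:
    """Toy stand-in for the third container: data stays in, insights go out."""

    def __init__(self, records):
        self._records = records   # de-identified rows, never returned directly
        self.audit_log = []       # every attempted use is recorded

    def run(self, partner, app):
        """Run a partner-supplied function on the data and release only aggregates."""
        timestamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((timestamp, partner, app.__name__))
        # Hand the application copies so it cannot alter the stored rows.
        result = app([dict(r) for r in self._records])
        if not self._is_aggregate(result):
            raise ValueError("only aggregate insights may leave the container")
        return result

    @staticmethod
    def _is_aggregate(result):
        # Assumption: an "insight" is a number or a flat dict of named numbers.
        if isinstance(result, (int, float)):
            return True
        return isinstance(result, dict) and all(
            isinstance(k, str) and isinstance(v, (int, float))
            for k, v in result.items()
        )


def diabetes_rate(records):
    """Hypothetical partner application: share of records with a diabetes diagnosis."""
    cases = sum(1 for r in records if r["diagnosis"] == "diabetes")
    return cases / len(records)


def leak_rows(records):
    """Hypothetical misbehaving application that tries to export raw rows."""
    return records


container = InsightContainer([
    {"pseudo_id": "a1", "diagnosis": "diabetes", "age": 54},
    {"pseudo_id": "b2", "diagnosis": "asthma", "age": 37},
    {"pseudo_id": "c3", "diagnosis": "diabetes", "age": 61},
    {"pseudo_id": "d4", "diagnosis": "hypertension", "age": 48},
])

answer = container.run("nference", diabetes_rate)   # 0.5 leaves the container

try:
    container.run("nference", leak_rows)            # raw rows are refused
    leaked = True
except ValueError:
    leaked = False
```

Here the partner receives the answer (a rate) but never the rows behind it, and both the allowed and the refused request end up in the audit log.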
To me, this is the perfect balance of agility, innovation, and privacy protection. I've worked in many organizations and have not experienced a design with so many safeguards against data leakage.
We'll populate the third container with our first wave of de-identified data later this year. I'll continue to report on our progress.