Yesterday I talked a bit about the idea of sharing information for the “good of the many” – and there is a lot involved in making good on the promise of that type of aggregate information. For example, just making information from different systems usable together in any rational way is going to be a huge hurdle.
If we start with the premise that we even want to do this in the first place – sharing information on the whole to glean new levels of insight – and we solve the issues of privacy… blah, blah, blah…
Then it comes down to the mechanics a bit. If you think about how to get information from all these sources put together in a way that can be analyzed, it’s a huge undertaking.
I think that was one of the original pulls of XML – that it would be self-describing and, hopefully, with enough schema work completed, it would present some normalized views of denormalized information. I’ve worked through several projects where we tried to use existing schemas for data structures; it *sounds* so obvious and clear what is needed. I mean, how many ways can we describe name and address information?
It turns out, many. In my opinion, that approach isn’t going to get it done.
That means we’re left with smarter routines that can sniff out data structures, links and relational bits so we can get to those all-important “accidental” relationships. Those are the things we don’t normally see as related when we look at the information on an individual or small-scale basis.
“Huh. Did you know that something like 20% of the people that have XYZ happen to have been exposed to polyester AND cell phones AND wool socks in the last year?” Of course other things like accidental treatment regimes start to appear too. This person on medication to treat that other thing also saw some improvements in other areas… and so did these other unrelated patients from a different country altogether.
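That kind of accidental relationship is, at its simplest, a co-occurrence count across a large pile of records. Here's a minimal sketch of the idea, assuming hypothetical de-identified records with an `outcome` field and a set of `exposures` (all names and data are made up for illustration):

```python
# Minimal sketch: how often does an outcome co-occur with a given
# combination of exposures? All field names and data are hypothetical.
from itertools import combinations
from collections import Counter

records = [
    {"outcome": "XYZ",  "exposures": {"polyester", "cell_phone", "wool_socks"}},
    {"outcome": "XYZ",  "exposures": {"polyester", "cell_phone"}},
    {"outcome": "none", "exposures": {"wool_socks"}},
    {"outcome": "XYZ",  "exposures": {"polyester", "cell_phone", "wool_socks"}},
    {"outcome": "none", "exposures": {"cell_phone"}},
]

def exposure_rates(records, outcome, size=2):
    """Fraction of records with `outcome` containing each exposure combo."""
    with_outcome = [r for r in records if r["outcome"] == outcome]
    counts = Counter()
    for r in with_outcome:
        # Count every exposure combination of the requested size.
        for combo in combinations(sorted(r["exposures"]), size):
            counts[combo] += 1
    return {combo: n / len(with_outcome) for combo, n in counts.items()}

rates = exposure_rates(records, "XYZ", size=3)
# Combos shared by a surprising share of "XYZ" records surface here.
```

At real scale the interesting part is which combinations are *unexpectedly* common relative to the general population – that's where the learning systems come in – but the underlying mechanic is this simple.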
It has the potential to be magic.
But how do we, as data platform people, make that happen? I think it’s going to come down to AI, and patterns that can be recognized, if not instantly related. If we can figure out how to make that happen, then this sharing of information can start to make sense. You don’t need anything personally identifiable in medical RESEARCH records. You need symptoms, treatments, outcomes, etc. Things that don’t need a “head” on them (an identity), only the facts.
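Stripping the “head” off a record before it leaves the platform can be sketched very simply. This is just an illustration with hypothetical field names – real de-identification is much harder (quasi-identifiers like birth dates and zip codes can re-identify people in combination) – but it shows the shape of the idea:

```python
# Minimal sketch of removing identity fields from a research record
# before sharing. Field names are hypothetical, not any real standard;
# real de-identification must also handle quasi-identifiers.
IDENTITY_FIELDS = {"name", "address", "date_of_birth", "patient_id"}

def deidentify(record):
    """Return a copy of the record with direct identity fields removed."""
    return {k: v for k, v in record.items() if k not in IDENTITY_FIELDS}

raw = {
    "patient_id": "12345",
    "name": "Jane Doe",
    "symptoms": ["fatigue", "rash"],
    "treatment": "drug_a",
    "outcome": "improved",
}
shared = deidentify(raw)
# `shared` keeps only symptoms, treatment, and outcome – the facts.
```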
Taking that proverbial hill though, the one where we have to figure out how we teach computers to learn and infer and extrapolate (wow, twice in 2 days), is a big deal. I’ve seen hints of it. I’ve seen some really intriguing work by projects like Bing (recognizing that search traffic and other things it sees can be glued together to make predictions of public interactions) and Watson (Jeopardy, other massively cool learning systems).
I suspect that part of our job going forward will increasingly be to create platforms that participate in and contribute to these types of massive data stores that are interpreted, analyzed and learned from constantly.
I think that has the potential to be pretty cool.