Leveraging data and people to accelerate data science (Dr. Laura Haas)

Last week Dr. Laura Haas gave a distinguished speaker series talk at our department. Laura is currently the Dean of the College of Information and Computer Sciences at University of Massachusetts at Amherst. Before that, she was at IBM for many years, and most recently served as the Director of IBM Research's Accelerated Discovery Lab. Her talk was on her experiences at the Accelerated Discovery Lab, and was titled "Leveraging data and people to accelerate data science".

Accelerated Discovery Lab

The mission of the lab was to "help people get insight from data -- quickly".
The lab aimed to manage the technology and data complexity, so that clients can focus on solving their challenges. This job involved providing the clients with:

  1. expertise: math, ml, computing
  2. environment: hosted big data platforms
  3. data: curated & governed data sets to provide context
  4. analytics: rich collection of analytics and tools.

By providing these services, the lab "accelerated" around 30 projects. The talk highlighted 3 of them:

  • Working with CRM and social media data, which involved managing complex data.
  • Working with a major seed company, which involved analyzing satellite imaging, deciding what/how to plant, and choosing the workflows that provide auditability, quality, accuracy.
  • Working with a drug company, which involved "big" collaboration across various teams and organizations.

The Food Safety Project

The bulk of the talk focused on the food-safety case-study application that the lab worked on.

Food safety is an important problem. Every year, in the USA alone, food poisoning affects 1/6th of people, causes 50 million illnesses, 128K hospitalizations, 3K deaths, and costs 8 billion dollars.

Food poisoning is caused by contaminants/pathogens introduced in the supply-chain. Laura mentioned that the state of the art in food testing was based on developing a suspicion and as a result testing a culture. This means you need to know what you are looking for and when you are looking for it.

Recently DNA sequencing became affordable & fast and enabled the field of metagenomics. Metagenomics is the study of genetic material (across many organisms rather than a single organism) recovered directly from environmental samples. This enabled us to build a database of what is normal for each type of food bacteria pattern. Bacteria are the tiny witnesses to what is going on in the food! They are the canary in the coal mine. Change in the bacteria population may point to several pathologies, including lead poisoning. (Weimer et al. IBM Journal of R&D 2016.)

The big data challenge in food safety

If you have a safe sample for metagenomics, you can expect to see 2400 microbial species. As a result, one metagenomics file is 250 GB! And 50 metagenomics samples result in 149 workflows invoked 42K times producing 271K files and 116K graphs.
(Murat's inner voice: Umm, good luck blockchainizing that many big files. Ok, it may be possible to store only the hashes for integrity.)
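To make the "store only the hashes" aside concrete, here is a minimal sketch of what that could look like. It is not something from the talk: the file name, chunk size, and helper names are my own assumptions, and small in-memory buffers stand in for the 250 GB files. The idea is that only the short digests would need to go on a shared ledger, while the files themselves stay wherever they are.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary stream chunk by chunk, so a huge file never has to fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def build_manifest(named_streams):
    """Map each (hypothetical) file name to its digest; only this small manifest is shared."""
    return {name: sha256_of_stream(stream) for name, stream in named_streams.items()}


def verify(manifest, name, stream):
    """Return True if the stream still hashes to the digest recorded in the manifest."""
    return manifest.get(name) == sha256_of_stream(stream)


# Tiny in-memory stand-ins for the real sample files.
manifest = build_manifest({"sample_001.fastq": io.BytesIO(b"ACGT" * 1000)})
untouched = verify(manifest, "sample_001.fastq", io.BytesIO(b"ACGT" * 1000))
tampered = verify(manifest, "sample_001.fastq", io.BytesIO(b"ACGA" * 1000))
```

For real files one would pass `open(path, "rb")` instead of `io.BytesIO`. Note that this only detects changes; it says nothing about whether the original file was trustworthy in the first place.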

Another big challenge here is that contextual metadata is needed for the samples: when was the sample collected, how, under what conditions, etc.

A data lake helps with the management tasks. A data lake is a data ecosystem to acquire, catalogue, govern, find, and use data in context. It includes components such as

  • schema: fields, columns, types, keys
  • semantics: lineage, ontology
  • governance: owner, risk, guidelines, maintenance
  • collaboration: users, applications, notebooks
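As a rough illustration of how these components might sit together in one catalogue record, here is a small sketch. Everything in it (the `CatalogEntry` class, its field names, the `find_by_context` helper, the sample values) is hypothetical and chosen by me for illustration; the talk did not describe the lab's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data-lake catalogue record; all field names here are hypothetical."""
    name: str
    # schema: columns with their types, plus keys
    columns: dict = field(default_factory=dict)
    keys: list = field(default_factory=list)
    # semantics: lineage and ontology terms
    derived_from: list = field(default_factory=list)
    ontology_terms: list = field(default_factory=list)
    # governance: owner and risk level
    owner: str = ""
    risk: str = "unclassified"
    # collaboration: who uses this dataset
    users: set = field(default_factory=set)
    # contextual metadata: when/how/under what conditions the sample was collected
    context: dict = field(default_factory=dict)


def find_by_context(catalog, **filters):
    """Return the entries whose contextual metadata matches every given filter."""
    return [e for e in catalog
            if all(e.context.get(key) == value for key, value in filters.items())]


catalog = [
    CatalogEntry(name="sample_001", owner="lab-a",
                 context={"collected": "2016-03-01", "method": "swab"}),
    CatalogEntry(name="sample_002", owner="lab-b",
                 context={"collected": "2016-03-02", "method": "rinse"}),
]
swabs = find_by_context(catalog, method="swab")
```

The point of the sketch is only that "find data in context" becomes a simple query once the contextual metadata is stored next to the schema and governance fields.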

Since the data collected involves sensitive information, the employees that have access to the data were made to sign away rights for ensuring privacy of the data. Violating these terms constituted a firable offence. (This doesn't seem to be a rock solid process though. This relies on good intentions of people and the coverage of the monitoring to save the day.)

The big analytics challenge in food safety

Mapping a 60 GB raw test-sample data file against a 100 GB reference database may take hours to days! To make things worse, data and reference files keep getting updated.

The lab developed a workbench for metagenomic computational analytics. The multipurpose extensible analysis platform tracks 10K datasets and their derivation, performs automatic parallelization across compute clusters, and provides interactive UI and repeatability/transparency/accountability. (Edund et al. IBM Journal 2016)
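Tracking datasets "and their derivation" matters precisely because the reference files keep changing: when one is updated, you want to know which results are now stale. Here is a minimal sketch of that idea under my own assumptions; the `DerivationGraph` class and the dataset names are made up for illustration and are not the workbench's actual design.

```python
from collections import defaultdict, deque


class DerivationGraph:
    """Records which datasets were derived from which inputs (hypothetical sketch)."""

    def __init__(self):
        self._children = defaultdict(set)

    def record(self, output, inputs):
        """Note that `output` was produced from every dataset in `inputs`."""
        for source in inputs:
            self._children[source].add(output)

    def downstream(self, dataset):
        """Return every dataset derived, directly or transitively, from `dataset`."""
        seen, queue = set(), deque([dataset])
        while queue:
            current = queue.popleft()
            for child in self._children.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen


graph = DerivationGraph()
graph.record("mapped_reads", ["raw_sample", "reference_db"])
graph.record("species_table", ["mapped_reads"])
graph.record("qc_report", ["raw_sample"])

# If the reference database is updated, these results need to be recomputed.
stale = graph.downstream("reference_db")
```

Here `stale` contains `mapped_reads` and `species_table` but not `qc_report`, which never touched the reference database. The same graph, walked in the other direction, would answer the repeatability question of how a given result was produced.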

The big collaboration challenge in food safety

Four organizations worked together on the food safety project: IBM, Mars petfood company (pet-food chains were the most susceptible to contamination), UC Davis, and Bio-Rad labs. Later Cornell also joined. The participants spanned across US, UK, and China. The participants didn't speak the same language: the disciplines spanned across business, biology, bioinformatics, physics, chemistry, cs, ml, stat, math, and operations research.

For collaboration, email is not sufficient. There are typically around 40K datasets, which ones would you be emailing? Moreover, emailing also doesn't provide enough context about the datasets and metadata.

To support collaboration the lab built a labbook integration hub. (Inner voice: I am relieved it is not Lotus Notes.) The labbook is a giant knowledge graph that is annotatable, searchable, and is interfaced with many tools, including Python, Notebooks, etc. Sort of like Jupyter Notebooks on steroids?

Things learned

Laura finished with some lessons learned. Her take on this was: Data science needs to be holistic, incorporating all these three: people, data, analytics.

  • People: interdisciplinary work is hard, social practices/tools can help
  • Data: data governance is hard, it needs policies, tools
  • Analytics: many heterogeneous sets of tools need to be integrated for handling uncertainty

As for the future, Laura mentioned that being very broad does not work/scale well for a data science organization, since it is hard to please everyone. She said that the Accelerated Discovery Lab will focus on metagenomics and materials science to stay more relevant.

MAD questions

(This section is here due to my New Year's resolution.)

1. At the question and answer time for the talk, I asked Laura about IBM's recent push for blockchains in food safety and supply-chain applications. Given that the talk outlined many hard challenges for food safety supply chains, I wanted to learn about her views on what roles blockchains can play here. Laura said that in food supply-chains there were some problems with faked provenance of sources and blockchains may help address that issue. She also said that she won't comment on whether blockchain is the right way to address it, since it is not her field of expertise.

2. My question was prompted by IBM's recent tweet of this whitepaper, which I found to be overly exuberant about the role blockchains can play in the supply-chain problems. (https://twitter.com/Prof_DavidBader/status/964223296005967878) Below is the Twitter exchange that ensued after I took issue with the report. Notice how mature the @IBMServices account is about this? They pressed on to learn more about the criticism, which is a good indicator of intellectual honesty. (I wish I could have been a bit more restrained.)


I would like to write a post about the use of blockchains in supply-chains. Granted I don't know much about the domain, but when did that ever stop me?

3. This is a little unrelated, but was there any application of formal methods in the supply-chain problem before?

4. The talk also mentioned a bit about applying datascience to measure contributions of datascience in these projects. This is done via collaboration trace analysis: who is working with whom, how much, and in what contexts? A noteworthy finding was the "emergence of new vocabulary right before a discovery event". This rings very familiar in our research group meeting discussions. When we discover an interesting new phenomenon/perspective, we are forced to give it a made-up name, which sounds clumsy at first. When we keep referring to this given name, we know there is something interesting coming up from that front.
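Out of curiosity, here is a toy sketch of how one might look for "emergence of new vocabulary" in a collaboration trace. This is purely my guess at the flavor of such an analysis, not the lab's method: the function name, the `min_count` threshold, the crude tokenizer, and the example messages are all assumptions. It reports, for each time period, the words that are used repeatedly in that period but were never seen in any earlier one.

```python
import re


def new_terms_per_period(periods, min_count=2):
    """periods: time-ordered list of lists of messages (strings).

    Returns, per period, the sorted words that occur at least `min_count`
    times in that period and did not appear in any earlier period.
    """
    seen = set()
    result = []
    for messages in periods:
        counts = {}
        for message in messages:
            # Crude tokenizer: lowercase words of 2+ letters, hyphens allowed.
            for word in re.findall(r"[a-z][a-z\-]+", message.lower()):
                counts[word] = counts.get(word, 0) + 1
        fresh = sorted(w for w, c in counts.items()
                       if c >= min_count and w not in seen)
        result.append(fresh)
        seen.update(counts)
    return result


weeks = [
    ["the mapping job finished", "mapping results look fine"],
    ["the flip-flop pattern shows up again",
     "is the flip-flop pattern real",
     "mapping done"],
]
emerging = new_terms_per_period(weeks)
```

In this made-up trace, the second week surfaces "flip-flop" and "pattern" as newly coined, repeatedly used terms. The first period serves only as a baseline, since every word is new there.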
