ETL Pipelines with OODT, Solr and Stuff – Tom Barber
Discover a number of Apache projects you may not have heard of and how they can help you process both clinical and non-clinical data. Apache OODT, developed by NASA, lets users ingest and store files and metadata and run processing workflows. Combined with Apache cTAKES, OODT lets us extract clinical information from files, process it, and give end users access to the extracted data.
We can then take these sources and manipulate them further, creating a highly flexible ETL pipeline that offers reliability and scalability. With Apache Solr behind the pipeline, users can then interrogate the data via web interfaces and instigate further post-processing and investigation.
Of course, you may not have a clinical use case, but the platforms can be repurposed, allowing you to go away and build your own scalable data pipeline for processing and ingestion.