This blog is part of a series about the open-source data ingestion engine Alfred. For an overview of Alfred, read this blog. You can also learn what Alfred means for data scientists and data stewards.
CapTech recently announced our new open-source product, Alfred. Alfred was created to streamline the data scientist's workflow and provide relief to strained IT departments. In the pre-Alfred world, if a data scientist needed data moved into a data lake, they faced a complex procedure that could take weeks to get up and running. Alfred, on the other hand, lets data scientists explore new data independently and quickly determine whether it has value. This saves time for the data scientists, the data and IT groups, and the data stewards.
With this independent, sandbox approach, IT and data stewards only need to be brought in when data scientists decide the data may contain insights and want to load it into a production zone. At that point, the data steward can rest assured that Alfred's ingestion engine validates the data against the rules they have defined. During exploration, the sandbox keeps the data in an area restricted to the data scientist, preventing pollution of the data store.
Alfred doesn't supplant your current data tools. Alfred isn't a data catalog like Atlas or Collibra that indexes your data after it's in a data store - it's a gatekeeper for incoming data. Alfred isn't just a transfer tool like Sqoop or Flume - it's designed to ingest data into a target location so your scientists and analysts can work with it. It works with the Apache Hadoop open-source stack but can be applied to any type of data source and target. Alfred doesn't replace your data scientists, your data stewards, or your data engineers. What it does is provide a tool that makes your employees more effective and reduces unneeded overhead.
Alfred is available on GitHub.