In large-scale projects, we typically implement several data preparation steps before releasing the data to end-users for querying and exploration. These can be summarized as follows. First, we meet with the stakeholders to understand the scope of the project. We then gain access to the data, which could be in multiple systems and formats, and set up a staging environment for aggregating the data sets. At this point, we evaluate the existing data models and systems architecture, and remodel the data relationships if necessary. Afterward, we initiate the data cleaning and preparation process. Please note that these steps can involve as much or as little of your internal IT resources as you prefer.
Two things differentiate us from other cleaning and standardization solutions: 1) our award-winning data manipulation tools, which cater to people with or without an IT background, and 2) a mechanism that encourages and incentivizes scientific end-users to use these tools and stay actively involved throughout an adaptive data preparation process.
In a traditional data cleaning and standardization process, scientist end-users, who are the domain knowledge experts and final data consumers, usually have to wait for months, if not years, to gain access to prepared data derived from multiple active and legacy systems. During this data curation period, IT or data managers might have limited or full access to the data, while scientists usually have little or no access to any data at all.
With our unique Labmatrix tools, we have created a new workflow to engage and incentivize the scientists at much earlier stages of the data preparation process, so that both the IT team and the scientists benefit from an expedited process. We enable this workflow by allowing users, with or without prior programming experience, to easily create diagrammatic data queries using a graphical interface. In essence, raw and/or messy data sources are aggregated, sometimes through cross-system federation, and accessed centrally by the users to gain a perspective of the data landscape and understand just how "messy" the data might be. This has the inherent advantage of allowing the scientific domain experts to give further guidance on the adaptive data preparation process, such as directing more focus to certain areas and eliminating data sets that are of low interest or low quality. As a result, IT efforts are better optimized for curation of relevant data sets, while scientific users are able to get their hands on data earlier to preview, explore, visualize, and generate new ideas from the contents of less-than-perfect data.
Under the hood, Labmatrix has a set of powerful exploratory and data curation technologies that allow for dynamic, yet easy-to-learn data querying and data manipulation processes. In other words, scientists and other non-IT personnel with no programming background are able to pick up the basics of creating graphical queries and exploring data content after just a few minutes of training. At the same time, Labmatrix equips IT and power users with a comprehensive set of functions for iterative data cleaning to achieve internal data consistency, and for data standardization to achieve organizational integration and systems interoperability in projects involving internal and external resources. After initial data sets are consolidated into the Labmatrix platform via several import options (such as file, ETL, and API mechanisms), power users can further curate data sets by utilizing functions such as regular expression string manipulation formulas, complex filtering, pattern matching, case statements, pivot transformations, and compound join conditions. Views and snapshots of data sets at various stages of preparation can be saved natively in Labmatrix as checkpoints for presentation to end-users; existing staged data sets can therefore be subjected to additional iterative manipulations (as needed) in a single, flexible, access-controlled environment. Furthermore, draft and prepared data sets at all levels of readiness stored in Labmatrix can be easily exported to and imported from analytics tools. We would be happy to go into greater technical detail, should you wish to schedule an appointment with us.
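For readers who think in code, the curation functions named above (regular expression string manipulation, pattern matching, case statements, complex filtering, compound join conditions, and pivot transformations) can be illustrated with a small, generic sketch. This is not Labmatrix's interface or API, which is graphical; it is plain Python using only the standard library, and every record, field name, and helper function below is hypothetical and invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical raw records; all field names and values are invented for illustration.
samples = [
    {"subject": "subj-001", "visit": "V1", "assay": "glucose", "value": " 5.4 mmol/L"},
    {"subject": "SUBJ_001", "visit": "V1", "assay": "Glucose ", "value": "5.6"},
    {"subject": "Subj 002", "visit": "V1", "assay": "glucose", "value": "n/a"},
    {"subject": "subj-002", "visit": "V2", "assay": "hba1c", "value": "6.1 %"},
]
visits = [
    {"subject": "SUBJ-001", "visit": "V1", "site": "Boston"},
    {"subject": "SUBJ-002", "visit": "V2", "site": "Denver"},
]


def normalize_subject(raw):
    """Regex string manipulation: rewrite variants such as 'subj-001' or 'Subj 002' as 'SUBJ-001'."""
    match = re.fullmatch(r"\s*subj[\s_-]*(\d+)\s*", raw, flags=re.IGNORECASE)
    return f"SUBJ-{int(match.group(1)):03d}" if match else None


def parse_value(raw):
    """Pattern matching: extract the leading number from a free-text value, or None."""
    match = re.match(r"\s*(-?\d+(?:\.\d+)?)", raw)
    return float(match.group(1)) if match else None


def quality_flag(value):
    """Case-statement style classification of a parsed value."""
    if value is None:
        return "missing"
    if value < 0:
        return "invalid"
    return "ok"


# Cleaning pass: standardize identifiers, assay names, and numeric values.
cleaned = []
for row in samples:
    value = parse_value(row["value"])
    cleaned.append({
        "subject": normalize_subject(row["subject"]),
        "visit": row["visit"],
        "assay": row["assay"].strip().lower(),
        "value": value,
        "flag": quality_flag(value),
    })

# Complex filtering: keep rows with a recognized subject and a usable value.
usable = [r for r in cleaned if r["subject"] and r["flag"] == "ok"]

# Compound join condition: match on both subject and visit.
site_by_key = {(v["subject"], v["visit"]): v["site"] for v in visits}
joined = [dict(r, site=site_by_key.get((r["subject"], r["visit"]))) for r in usable]

# Pivot transformation: one row per (subject, visit), one column per assay (mean value).
cells = defaultdict(list)
for r in joined:
    cells[(r["subject"], r["visit"], r["assay"])].append(r["value"])
pivot = defaultdict(dict)
for (subject, visit, assay), values in cells.items():
    pivot[(subject, visit)][assay] = sum(values) / len(values)

for key, columns in sorted(pivot.items()):
    print(key, columns)
```

Each intermediate list (`cleaned`, `usable`, `joined`) plays the role of a checkpoint in this sketch: it can be inspected, shared with a domain expert, and refined again before the next step, which mirrors the iterative preparation described above.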
Labmatrix is extremely impressive in terms of features and is the only commercial system we've seen that focuses specifically on whole-lifecycle translational research. - Academic Translational Research Center