Data set transformations allow admins to create new (derived) data sets by applying various operations such as filtering, field manipulation, and joining. To create, edit, delete, schedule, or execute a data set transformation, go to Admin→Data Set Transformations.

To create a new transformation, select the desired type from the drop-down on the right side. Once the transformation is configured, execute it by clicking the run button in its row of the list table. Unless noted otherwise below, a transformation produces a result data set with a configurable data set ID, name, and storage type.

Transformations can be scheduled for periodic execution (daily at a specified time) and support stream-based processing for large data sets.
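
As an illustration of what "daily at a specified time" means, the following sketch computes the next execution time for a daily schedule. It is not the application's scheduler; the function name next_daily_run and its parameters are hypothetical.

```python
from datetime import datetime, time, timedelta

def next_daily_run(now, run_at):
    """Return the next datetime at which a daily job set for `run_at` fires."""
    candidate = datetime.combine(now.date(), run_at)
    # If today's slot has already passed (or is right now), use tomorrow's.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate

# Hypothetical schedule: every day at 02:30.
after_slot = next_daily_run(datetime(2024, 1, 1, 10, 0), time(2, 30))
before_slot = next_daily_run(datetime(2024, 1, 1, 1, 0), time(2, 30))
```

With a 02:30 schedule, a check at 10:00 yields the following day, while a check at 01:00 yields the same day.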

Note
Data set transformations can be created and executed only by admins.

Copy

Creates a complete copy of a source data set. This is useful as a starting point for further modifications or as a backup before applying destructive changes.


Filter

Creates a new data set containing only the rows that match specified filter conditions. This allows you to create subsets of data based on criteria such as field values, ranges, or combinations using AND logic.
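
The AND semantics can be sketched in plain Python. This is only an illustration of the behavior described above, not the application's implementation; the row layout, the field names, and the filter_rows helper are assumptions.

```python
# Illustrative only: rows are plain dicts, conditions are (field, predicate) pairs.
rows = [
    {"id": 1, "age": 34, "gender": "F"},
    {"id": 2, "age": 61, "gender": "M"},
    {"id": 3, "age": 58, "gender": "F"},
]

# Hypothetical conditions: age within [50, 70] AND gender equal to "F".
conditions = [
    ("age", lambda v: 50 <= v <= 70),
    ("gender", lambda v: v == "F"),
]

def filter_rows(rows, conditions):
    """Keep only the rows for which every condition holds (AND logic)."""
    return [r for r in rows if all(pred(r[field]) for field, pred in conditions)]

filtered = filter_rows(rows, conditions)
```

Only the row with id 3 satisfies both conditions. The source list is left untouched, mirroring the fact that the transformation writes to a new data set.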


Drop Fields

Creates a new data set with specified fields removed. Use this to strip unnecessary or sensitive columns from a data set.
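
A minimal sketch of the same idea in Python, assuming rows are represented as dicts; drop_fields and the field names are hypothetical and only illustrate the result.

```python
def drop_fields(rows, fields_to_drop):
    """Return new rows without the listed fields; the input rows are unchanged."""
    drop = set(fields_to_drop)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]

rows = [
    {"id": 1, "name": "Alice", "ssn": "000-00-0001", "age": 34},
    {"id": 2, "name": "Bob", "ssn": "000-00-0002", "age": 61},
]

# Strip the sensitive columns before sharing the data set.
public_rows = drop_fields(rows, ["name", "ssn"])
```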


Rename Fields

Creates a new data set with specified fields renamed using old-to-new name mappings. This is useful for standardizing field names across data sets or making them more descriptive.
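
The old-to-new mapping can be pictured as a dictionary applied to every row. The sketch below assumes dict-based rows; rename_fields and the field names are hypothetical.

```python
def rename_fields(rows, mapping):
    """Return new rows whose keys are renamed per `mapping` (old name -> new name).

    Fields not listed in the mapping keep their original names.
    """
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]

rows = [{"pid": 1, "sx": "F", "age": 34}]

# Hypothetical mapping that makes terse field names more descriptive.
renamed = rename_fields(rows, {"pid": "patient_id", "sx": "gender"})
```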


Change Field Types

Creates a new data set where specified fields have their data types changed (e.g., from String to Integer, or from Integer to Enum). This is helpful when automatic type inference did not produce the desired result.
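
As a rough illustration of a String-to-Integer change, the sketch below converts one field and treats empty strings as missing values. How the application handles missing or unparseable values is not stated here, so that part is an assumption, as are the helper names.

```python
def to_int(value):
    """Convert a string to int; None or blank strings become None (assumed)."""
    if value is None or value.strip() == "":
        return None
    return int(value)

def change_field_types(rows, converters):
    """Return new rows with each listed field passed through its converter."""
    return [
        {k: converters[k](v) if k in converters else v for k, v in row.items()}
        for row in rows
    ]

rows = [{"id": "1", "age": "34"}, {"id": "2", "age": ""}]

# Only "age" is converted; "id" stays a string.
converted = change_field_types(rows, {"age": to_int})
```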


Change Field Enums

Updates the enumeration values of specified fields in place (without creating a new data set). Use this to relabel categories, e.g., renaming "0" and "1" to "Control" and "Case".
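
The sketch below illustrates the in-place nature of this operation: only the field's label table changes, while the stored row values stay as they are. The metadata layout (a dict mapping stored values to labels) and the relabel_enum helper are assumptions made for illustration.

```python
# Hypothetical field metadata: stored value -> display label.
field_meta = {"name": "group", "enum_values": {0: "0", 1: "1"}}
rows = [{"group": 0}, {"group": 1}, {"group": 1}]

def relabel_enum(field_meta, new_labels):
    """Update enum labels in place; unknown stored values are rejected."""
    for stored_value, label in new_labels.items():
        if stored_value not in field_meta["enum_values"]:
            raise KeyError(f"unknown enum value: {stored_value!r}")
        field_meta["enum_values"][stored_value] = label

relabel_enum(field_meta, {0: "Control", 1: "Case"})

# Rows are untouched; labels are resolved through the updated metadata.
labels = [field_meta["enum_values"][r["group"]] for r in rows]
```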


Infer

Re-infers field types, enumerations, and statistics from the source data. This is useful after manual data modifications or when the original type inference needs to be refreshed.


Merge Multi Data Sets

Merges multiple data sets by appending rows (union-like operation) with explicit field name mappings. Use this when combining data sets that have different field names but represent the same data.
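
To make the explicit mapping concrete, the sketch below appends rows from two data sets whose fields are named differently. The mapping format (one source field name per data set for each target field) and the helper name are assumptions, not the application's configuration format.

```python
def merge_with_mappings(data_sets, field_mappings):
    """Append rows of all data sets, renaming fields per the explicit mapping.

    field_mappings: {target_field: [source field in data set 0, in data set 1, ...]}
    """
    merged = []
    for i, rows in enumerate(data_sets):
        for row in rows:
            merged.append(
                {target: row.get(sources[i]) for target, sources in field_mappings.items()}
            )
    return merged

site_a = [{"patient_id": 1, "sex": "F"}]
site_b = [{"pid": 2, "gender": "M"}]

merged = merge_with_mappings(
    [site_a, site_b],
    {"id": ["patient_id", "pid"], "gender": ["sex", "gender"]},
)
```

The result has one row per source row, all sharing the target field names "id" and "gender".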


Merge Fully Multi Data Sets

Merges multiple data sets with automatic field matching. All fields from all source data sets are included in the result, with automatic alignment of matching field names.
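
A sketch of automatic matching by field name, assuming dict-based rows. Filling fields that a source data set lacks with None is an assumption; the helper name is hypothetical.

```python
def merge_fully(data_sets):
    """Append rows of all data sets, using the union of all field names."""
    all_fields = []
    for rows in data_sets:
        for row in rows:
            for name in row:
                if name not in all_fields:
                    all_fields.append(name)  # preserve first-seen order
    # Fields missing from a row are filled with None (assumed behavior).
    return [{f: row.get(f) for f in all_fields} for rows in data_sets for row in rows]

set_a = [{"id": 1, "age": 34}]
set_b = [{"id": 2, "weight": 70.5}]

merged_fully = merge_fully([set_a, set_b])
```

Here "id" is aligned because both data sets use that name, while "age" and "weight" each appear in the result even though only one source provides them.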


Match Groups With Confounders

Creates a new data set by matching groups while controlling for confounding variables. This is commonly used in clinical studies to create balanced cohorts (e.g., matching cases and controls by age and gender).
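
The matching algorithm used by the application is not described here. As one possible reading, the sketch below performs greedy 1:1 exact matching: each case is paired with one unused control that has identical confounder values, and unmatched rows are dropped. All names and the matching strategy are assumptions.

```python
from collections import defaultdict

def match_groups(rows, group_field, case_value, control_value, confounders):
    """Pair each case with one unused control sharing all confounder values."""
    # Index the available controls by their confounder values.
    pool = defaultdict(list)
    for row in rows:
        if row[group_field] == control_value:
            pool[tuple(row[c] for c in confounders)].append(row)

    matched = []
    for row in rows:
        if row[group_field] == case_value:
            key = tuple(row[c] for c in confounders)
            if pool[key]:
                matched.append(row)
                matched.append(pool[key].pop(0))  # each control is used once
    return matched

rows = [
    {"id": 1, "group": "Case", "age": 60, "gender": "F"},
    {"id": 2, "group": "Case", "age": 45, "gender": "M"},
    {"id": 3, "group": "Control", "age": 60, "gender": "F"},
    {"id": 4, "group": "Control", "age": 60, "gender": "F"},
    {"id": 5, "group": "Control", "age": 30, "gender": "M"},
]

balanced = match_groups(rows, "group", "Case", "Control", ["age", "gender"])
```

In this example, case 1 is paired with control 3, and case 2 is left out because no control shares its age and gender. Real studies often match within tolerances (e.g., age within a few years) rather than exactly.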