# List of GUI modules for DAV³E

Most of DAV³E's functionality is contained in separate modules which represent the steps during data evaluation. Each module modifies specific parts of the whole project so that the user is guided through the process efficiently. The modules are collected in a frame, the [mainGUI](maingui), which provides some basic functionality relevant during the whole process.

* [Import](#import)
* [Preprocessing](#preprocessing)
* [Time correction](#time-correction)
* [Cycle ranges](#cycle-ranges)
* [Grouping](#grouping)
* [Features](#features)
* [Model](#model)
* [Apply and combine models](#apply-and-combine-models)

## Import
The Import module is the main way of getting raw data into DAV³E. When starting a new project, the first action is usually to import sensor data. A click on "import sensor" opens a file selection dialog in which any raw data file compatible with DAV³E can be chosen; multiple files can be selected at once. During import, DAV³E might ask for the exact type of a file if it cannot identify it unambiguously. The list includes many proprietary file formats, but the easiest way to get data into DAV³E is a CSV file with one cycle per row, or a MAT file with an arbitrary number of MATLAB matrices with one cycle per row. After successful import, all sensors show up in the main table of this module. Sensors are grouped in clusters, which are initially chosen depending on the file type. All sensors within one cluster must have unique captions and must share the same number of data points, number of cycles, sample rate, and time offset from the beginning of the measurement. If a sensor does not fit into its cluster, it can be moved to another or a new cluster by right-clicking the sensor in the table and choosing the target cluster under the "move to" option. This context menu also allows duplicating the sensor (e.g. if different preprocessing steps are to be applied) or removing it completely.
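
The "one cycle per row" CSV layout can be illustrated with a minimal Python sketch. The example data, the comma delimiter, and the absence of a header row are assumptions made for illustration; they are not taken from the DAV³E importer.

```python
import csv
import io
import math

# Hypothetical example data: 3 cycles with 5 data points each.
n_cycles, n_points = 3, 5
cycles = [
    [round(math.sin(2 * math.pi * p / n_points) + c, 4) for p in range(n_points)]
    for c in range(n_cycles)
]

# Write one cycle per row. An in-memory buffer stands in for a file on disk.
buffer = io.StringIO()
csv.writer(buffer).writerows(cycles)
csv_text = buffer.getvalue()

# Read the text back to confirm the layout: rows are cycles, columns are data points.
rows = [[float(v) for v in row] for row in csv.reader(io.StringIO(csv_text))]
print(len(rows), "cycles with", len(rows[0]), "data points each")
```

Writing `csv_text` to a file with a `.csv` extension would give a file in this layout.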

Certain types of clusters (again, determined by the imported file type) define virtual sensors. These are signals that were not actually measured but can be computed from one or a combination of other sensors, like conductance from resistance, or a signal from its raw signal and exponent. Such a virtual sensor can be added to (or removed from) suitable sensor clusters via a click on "add virtual sensor". In the dialog that opens, all possible choices for virtual sensors are listed and can be (un)checked. After clicking "OK", the chosen virtual sensors appear in the table, marked as "virtual". Apart from that, they can be used exactly like normal sensors from then on. Some virtual sensors have parameters which might need tuning. This is done by selecting the virtual sensor in the list and changing the parameters in the property grid on the right. This property grid also allows changing the caption of the sensor, its cluster, and its measurement.
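
As a rough illustration of the idea behind a virtual sensor, the following Python sketch derives conductance from a measured resistance signal. The function name and the data are hypothetical and do not reflect how DAV³E implements virtual sensors internally.

```python
def conductance(resistance_cycles):
    """Compute G = 1 / R element-wise; input and output hold one cycle per row."""
    return [[1.0 / r for r in cycle] for cycle in resistance_cycles]

# Hypothetical measured resistance in ohms: 2 cycles with 2 data points each.
resistance = [[100.0, 200.0], [50.0, 400.0]]

# The derived signal has the same shape as its source sensor.
virtual = conductance(resistance)
print(virtual)
```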

Furthermore, each sensor has an abscissa sensor, which can also be changed in the property grid. By default, every sensor has "virtual time" as its abscissa. "Virtual time" and "virtual datapoints" are two virtual sensors included in each cluster by default. They compute a time or a data point number, respectively, for each data point, which becomes important later for time adjustments. However, for certain sensors, plotting over equidistant time (or data points) does not produce meaningful plots and data (consider e.g. impedance spectroscopy), and in other cases additional information is revealed by choosing another sensor as the x-axis (consider e.g. hystereses). Thus, the abscissa can be freely chosen and is then used in all subsequent steps for this sensor. The user can also decide whether to use the abscissa sensor in its raw or preprocessed form (consider e.g. logarithmic preprocessing).
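
A minimal sketch of what the two default abscissa sensors could compute is given below. It assumes 1-based data point numbers and a time axis that starts at the cluster's time offset; both conventions, as well as the function names, are assumptions and not confirmed DAV³E behavior.

```python
def virtual_datapoints(n_points):
    """Data point numbers 1..n_points (1-based numbering is an assumption)."""
    return list(range(1, n_points + 1))

def virtual_time(n_points, sample_rate_hz, offset_s=0.0):
    """Equidistant time stamps in seconds, starting at the given time offset."""
    return [offset_s + i / sample_rate_hz for i in range(n_points)]

points = virtual_datapoints(4)
times = virtual_time(4, sample_rate_hz=2.0, offset_s=1.0)
print(points)
print(times)
```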

Upon import of a second file, the user can choose between importing it as a new sensor or as a new measurement. This is an important difference as it will affect the data fusion in subsequent steps. A measurement defines the time and the prevalent conditions during which the data of all sensors included in this measurement were acquired. This means that adding a new sensor to an existing measurement assumes that this sensor was exposed to the same conditions at the same time as all other contained sensors, i.e. that the new sensor ran in parallel to all other sensors. Eventually, this will result in additional features for each observation while the number of observations does not change. On the other hand, adding the second file as a new measurement adds observations to the dataset while not affecting the number of features (i.e. the number of sensors running in parallel).
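
The effect on the eventual feature matrix (rows are observations, columns are features) can be sketched as follows. The helper names and numbers are hypothetical; the sketch only illustrates the difference between adding columns and adding rows.

```python
def add_sensor(features, new_columns):
    """New sensor in the same measurement: more features, same observations."""
    if len(features) != len(new_columns):
        raise ValueError("a parallel sensor must provide one row per observation")
    return [row + extra for row, extra in zip(features, new_columns)]

def add_measurement(features, new_rows):
    """New measurement: more observations, same features."""
    if any(len(row) != len(features[0]) for row in new_rows):
        raise ValueError("a new measurement must provide the same features")
    return features + new_rows

features = [[1.0, 2.0], [3.0, 4.0]]  # 2 observations, 2 features

with_sensor = add_sensor(features, [[9.0], [8.0]])       # 2 observations, 3 features
with_measurement = add_measurement(features, [[5.0, 6.0]])  # 3 observations, 2 features
print(with_sensor)
print(with_measurement)
```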

Before advancing to the next step, the dataset should be reduced to include only sensors that are interesting for further evaluation. This can be done by (un)checking sensors in the main table. Checked sensors will be shown in the sensor set table at the bottom. This process is facilitated by the filtering options above the main table. The user can choose one or more restrictions for sensors, cluster, measurements, etc. to filter the table, and then use right click - "(de)select all" to (un)check all displayed sensors at once.

## Preprocessing
The Preprocessing module displays the sensor data in two different ways: as one cycle (bottom plot) and as a quasistatic signal (top plot). The cycle displayed in the bottom plot can be chosen by dragging the selector (vertical line with number "01") in the top plot. Alternatively, the number of the cycle can be typed in as the position of the selector in the upper table on the left (below the property grid). This table can also be used to define a color for the selector itself and for the cycle shown in the bottom plot. This is especially useful if there are several selectors, which can be added with a right click - "add point" in the top plot. The quasistatic signal in the top plot gives an overview of the whole measurement by showing the same point in time of each cycle. Which point that is can be chosen in the bottom plot in the very same way as the cycle is chosen in the top plot. Adding new points and defining colors works identically. Points can be deleted by clicking the red X in the table, or via right click - "remove" on the selector in the plot. The quasistatic signal can give a hint as to which parts of the cycle are especially sensitive to certain conditions, and the cyclic plot can reveal differences in cycle shape during different parts of the measurement.
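
The relation between the two plots can be sketched in Python, assuming the sensor data is stored with one cycle per row. The function names and the 0-based indices are illustrative assumptions.

```python
# Hypothetical sensor data: 4 cycles (rows) with 3 data points (columns) each.
data = [
    [10, 11, 12],
    [20, 21, 22],
    [30, 31, 32],
    [40, 41, 42],
]

def cycle(data, cycle_index):
    """Bottom plot: all data points of one selected cycle (one row)."""
    return list(data[cycle_index])

def quasistatic(data, point_index):
    """Top plot: the same data point taken from every cycle (one column)."""
    return [row[point_index] for row in data]

print(cycle(data, 1))        # second cycle
print(quasistatic(data, 2))  # third data point of each cycle
```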

In the property grid on the top-left, a preprocessing set can be compiled. This is done by choosing a preprocessing method from the right click context menu. Several methods can be combined to make up preprocessing chains. As the order is important, a method can be moved up or down or deleted via right click.
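
Why the order of methods in a chain matters can be shown with a small Python sketch. The two methods below are hypothetical stand-ins and are not necessarily among the preprocessing methods offered by DAV³E.

```python
def subtract_minimum(cycle):
    """Shift the cycle so that its smallest value becomes 0."""
    lowest = min(cycle)
    return [v - lowest for v in cycle]

def divide_by_maximum(cycle):
    """Scale the cycle so that its largest value becomes 1."""
    highest = max(cycle)
    return [v / highest for v in cycle]

def apply_chain(cycle, chain):
    """Apply the methods one after another, in the given order."""
    for method in chain:
        cycle = method(cycle)
    return cycle

raw = [2.0, 4.0, 6.0]
a = apply_chain(raw, [subtract_minimum, divide_by_maximum])
b = apply_chain(raw, [divide_by_maximum, subtract_minimum])
print(a)
print(b)
```

Both chains contain the same two methods, yet the first maps the cycle onto the full range from 0 to 1 while the second does not.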

Often, more than one preprocessing chain is to be tested, or different preprocessings should be applied to different sensors. In this case, another preprocessing set can be added by clicking the "+" button next to the dropdown menu. A set can also be renamed directly in the dropdown menu or deleted by clicking the "-" button. Right clicks on "+" or "-" open context menus to copy the current preprocessing set or copy it to specific sensors, or to select from a list which sets shall be deleted, respectively.

A preprocessing set is always applied automatically to the currently selected sensor (marked bold in the sensor set table at the bottom). For other sensors, the preprocessing set (also the quasistatic points set and feature set) can be changed in this table by choosing the desired set from the table dropdown list. To quickly apply the current preprocessing set to a bunch of sensors, use the "copy to sensor" feature (right click on "+").

The raw and preprocessed data are shown together in both plots: raw data on the left y-axis in a pale color, and preprocessed data on the right y-axis in full color. The sensor data to display can be chosen in the sensor set table at the bottom.

## Time correction
This module allows for adjustments in sample rate and time offset of different sensor clusters within a measurement. It displays a reference sensor, which is the current sensor chosen in the sensor set table, along with another sensor chosen in the property grid on the right. Based on this display, time offset and sample rate of the affected cluster can be adjusted so that the signals match.
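
The arithmetic behind such an adjustment can be sketched as follows. In the module the values are tuned visually by the user; the sketch instead assumes that the same event has been identified in both signals, and all function names and numbers are hypothetical.

```python
def sample_time(index, sample_rate_hz, offset_s):
    """Time in seconds of a sample for a given sample rate and time offset."""
    return offset_s + index / sample_rate_hz

def offset_to_align(ref_time_s, index, sample_rate_hz):
    """Time offset that places the given sample at the reference time."""
    return ref_time_s - index / sample_rate_hz

# An event appears at sample 20 of the reference sensor (10 Hz, no offset) ...
ref_time = sample_time(20, sample_rate_hz=10.0, offset_s=0.0)

# ... and at sample 3 of a sensor in another cluster that runs at 2 Hz.
new_offset = offset_to_align(ref_time, index=3, sample_rate_hz=2.0)
aligned_time = sample_time(3, sample_rate_hz=2.0, offset_s=new_offset)
print(new_offset, aligned_time)
```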

## Cycle ranges
The cycle ranges module ("Select relevant ranges") shows the previously defined quasistatic sensor signals and enables selection of parts of this signal. This is done with ranges, which, again, can be added by right click in the graph. They can be dragged as a whole by dragging their id, or resized by dragging either side of the range. Alternatively, the positions can also be typed into the table listing all the ranges. Each range should cover cycles which were recorded under the same conditions (e.g. during the same gas exposure). Even if there are no such cycles, e.g. because there are no groups but continuous values as a target, selecting cycle ranges still helps to isolate the important parts of the measurement.

Ranges can be loaded from several proprietary file formats ("Load"-"cycle ranges"). They can be colored and named for easier identification. Coloring can be done automatically in the "script" dialog, where every component can be assigned its unique color. Components can be added with a click on the "add component" button and removed with a click on the red X in the right table. They can also be renamed and have a unit. A component is, for example, a certain gas, where each range can have a specific concentration of that gas. In the "script" dialog, other useful actions are possible, like moving the ranges as a whole, as well as shrinking or stretching the ranges.
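
Moving and shrinking ranges as a whole can be sketched in Python as shown below. The data structure, the inclusive cycle numbers, and the function names are assumptions made for illustration and do not describe the "script" dialog's actual implementation.

```python
from collections import namedtuple

# A range covers the cycles start..end (inclusive, an assumption) and carries
# the concentration of one hypothetical component.
CycleRange = namedtuple("CycleRange", ["start", "end", "concentration"])

def shift(ranges, delta):
    """Move all ranges by the same number of cycles."""
    return [r._replace(start=r.start + delta, end=r.end + delta) for r in ranges]

def shrink(ranges, margin):
    """Cut the given number of cycles off both ends of every range."""
    shrunk = [r._replace(start=r.start + margin, end=r.end - margin) for r in ranges]
    if any(r.start > r.end for r in shrunk):
        raise ValueError("margin is too large for at least one range")
    return shrunk

ranges = [CycleRange(1, 10, 0.0), CycleRange(11, 20, 50.0)]
moved = shift(ranges, 2)
trimmed = shrink(ranges, 2)
print(moved)
print(trimmed)
```

Shrinking can be useful to exclude cycles close to a change in conditions, where the sensor signal may not have settled yet.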

Remember that cycle ranges must be defined for each measurement separately. Only the ranges of the current measurement are shown, and the current measurement can be changed by choosing a sensor from another measurement in the sensor set table. For convenience, ranges can be copied/pasted or deleted in the context menu that appears upon right click in the plot.

## Grouping
This module builds upon the previously defined cycle ranges. It shows a table where each row corresponds to one range. The columns assign classes to the ranges, and each column is one "grouping". Groupings can be added and deleted just like preprocessing sets, with the dropdown menu and the "+"/"-" buttons. Each class in a grouping can have a unique color (to be set in the property grid on the right), which is represented in the top plot and from here on used in every context this grouping appears (e.g. in scatter plots). This visual feedback aims to reduce mistakes like typing errors while setting up the groupings. As another visual guide, the currently selected range in the table will be displayed non-transparent in the plot. The color can be changed in the property grid. Additionally, changing the name of a class in the property grid will rename the class in the whole grouping.

By defining more than one grouping, it becomes easy to change the target of the evaluation later on, e.g. discriminating either gas types or the concentrations of one gas type. A range can be excluded from a grouping by typing "<ignore>" as its class.
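
How several groupings assign different classes to the same ranges can be sketched like this. The table layout, class names, and helper function are hypothetical; only the "<ignore>" marker is taken from the text above.

```python
IGNORE = "<ignore>"

# Three cycle ranges as (first cycle, last cycle), both inclusive (an assumption).
ranges = [(1, 3), (4, 6), (7, 8)]

# Each grouping assigns one class to every range.
groupings = {
    "gas": ["CO", "CO", "NH3"],
    "concentration": ["10", "50", IGNORE],
}

def labels_per_cycle(ranges, classes):
    """Map each cycle number to its class, skipping ignored ranges."""
    labels = {}
    for (start, end), cls in zip(ranges, classes):
        if cls == IGNORE:
            continue
        for cycle in range(start, end + 1):
            labels[cycle] = cls
    return labels

gas_labels = labels_per_cycle(ranges, groupings["gas"])
concentration_labels = labels_per_cycle(ranges, groupings["concentration"])
print(len(gas_labels), len(concentration_labels))
```

Switching the evaluation target then amounts to selecting a different column of the table, while the ranges stay untouched.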

Keep in mind that, like for cycle ranges, the groupings must be set for each measurement. Nevertheless, groupings are "global", i.e. classes that appear only in one measurement will show up in all other measurements in the property grid.

## Features
This module displays, in the bottom plot, an average cycle for each group of the last selected grouping (drawn in the color of the respective group). With a right click in the property grid in the top-left, a feature can be added, e.g. "mean" to compute the mean value of a certain range, or "linear fit" to compute the slope. Once at least one feature has been added, ranges can be defined for this feature. The ranges behave like cycle ranges, i.e. they can be dragged and copied/pasted, while the feature sets behave like preprocessing sets. More than one feature can be added, so that mean values and slopes can be computed on (different parts of) a cycle. Different feature sets can also be assigned to different sensors in the sensor set table.
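
The two example features can be sketched in Python as follows. The sketch assumes half-open index ranges and a slope expressed per data point rather than per unit of time; the function names are illustrative and not DAV³E's.

```python
def mean_feature(cycle, start, end):
    """Mean value of the data points cycle[start:end]."""
    segment = cycle[start:end]
    return sum(segment) / len(segment)

def slope_feature(cycle, start, end):
    """Least-squares slope of cycle[start:end] over the data point index."""
    segment = cycle[start:end]
    n = len(segment)
    if n < 2:
        raise ValueError("a linear fit needs at least two data points")
    x_mean = (n - 1) / 2
    y_mean = sum(segment) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(segment))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator

# Hypothetical cycle: a linear rise followed by a plateau.
cycle = [0.0, 1.0, 2.0, 3.0, 5.0, 5.0, 5.0, 5.0]

slope = slope_feature(cycle, 0, 4)      # slope of the rising part
mean_value = mean_feature(cycle, 4, 8)  # level of the plateau
print(slope, mean_value)
```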

The top plot displays a preview of the features, which updates every time a range is released from dragging. For speed, the preview features are computed from the average cycles shown in the bottom plot. Hence, they can obscure important details, but they often act as a first indicator for the discrimination quality of a feature.

Before proceeding to the next step, features must be computed with a click on the "compute" button in the bottom-left corner. Once features have been defined, the computation can be invoked anywhere, not only in this module. If the sensor set table contains sensors that shall not be used for features, they can be unchecked to exclude them from the computation.

## Model
This module uses the computed features to train and validate a statistical model. In a first step, the training data is selected by choosing the sensor set, features, and grouping. Features can be chosen one by one in the respective field of the property grid, or in groups by type, sensor or measurement by expanding the feature field (by clicking on the small "+" left of the row in the property grid). In the same manner, it can be decided how each class in the grouping is used: for training, for evaluation by predicting that class, or not at all. In addition, a part of the data can be held out and later used to test the model's validity. The size of this part is given in percent, and the selection is stratified, i.e. it maintains the group ratios. The model always shows the number of features and observations that are used for training.
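
A stratified hold-out can be sketched in plain Python as shown below. This is a generic illustration of the technique with a hypothetical function name, a fixed random seed, and simple rounding of the per-class count; it is not DAV³E's implementation.

```python
import random
from collections import defaultdict

def stratified_holdout(labels, holdout_percent, seed=0):
    """Split observation indices so that each class keeps its share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # Number of held-out observations for this class (rounded).
        n_test = round(len(shuffled) * holdout_percent / 100)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return sorted(train), sorted(test)

# Hypothetical grouping: 10 observations of class A and 20 of class B.
labels = ["A"] * 10 + ["B"] * 20
train, test = stratified_holdout(labels, holdout_percent=20)
print(len(train), len(test))
```

With 20 percent held out, two observations of class A and four of class B end up in the test part, so the 1:2 ratio of the classes is preserved.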

A right click in the property grid produces a context menu with several options for adding steps to the model. Feature preprocessings work like the raw data preprocessings, only applied to the feature matrix, and usually to each feature column separately. Response preprocessings do the same for the target/response/grouping vector. However, the vector must be entirely numeric (with the exception of "<ignore>") to be eligible for this step. Checking "revert response preprocessing" will take care of reverting all preprocessing steps so that the output of the model will be unpreprocessed values, e.g. the concentration and not ln(concentration).
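
The idea of reverting a response preprocessing can be sketched with the logarithm example from the text. The function names and concentration values are hypothetical.

```python
import math

def preprocess_response(concentrations):
    """Response preprocessing: the model is trained on ln(concentration)."""
    return [math.log(c) for c in concentrations]

def revert_response(predictions):
    """Reverting the preprocessing turns ln(concentration) back into concentration."""
    return [math.exp(p) for p in predictions]

concentrations = [10.0, 100.0, 1000.0]
log_response = preprocess_response(concentrations)

# A perfect model would predict exactly the preprocessed response;
# reverting then yields values in the original unit again.
reverted = revert_response(log_response)
print(reverted)
```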

Dimensionality reduction reduces the number of features with algorithms like PCA and LDA which can also be combined. Predictors are classifiers or regressors, which take the (preprocessed) feature data and response as input and train a model so that the classification or regression error is minimized. As a last step, validation can be applied to test the model's validity.

Each of the aforementioned steps is optional, so that models can be built flexibly according to the user's needs. Each method has its own set of plots, which can be viewed by clicking on the name of the plot. Some plots have additional parameters; after changing them, another click on the plot name applies the changes.

## Apply and combine models
This module can evaluate models with new data or combine models in a hierarchical fashion. After choosing a model and a sensor set, the property grid will display each sensor that is used in the model, and try to match the sensors with sensors from the chosen sensor set. Usually, the training and test sets should contain identical sensors. If that is not the case, sensors from the evaluation set can be assigned manually to sensors in the training set.

Each possible output of the model can be assigned another model. Only the data that was predicted into this class will be passed to the next model to simulate real, unknown data. However, if the correct classification is known, each model will produce a confusion matrix displayed in the MATLAB command window.
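
The hierarchical routing and the confusion matrix can be sketched in Python as follows. The toy models, thresholds, class names, and data are all hypothetical; the sketch only illustrates that each observation is passed on according to its predicted class, not its true class.

```python
from collections import Counter

def apply_hierarchy(observations, top_model, sub_models):
    """Pass each observation to the sub-model assigned to its predicted class."""
    results = []
    for obs in observations:
        predicted = top_model(obs)
        sub_model = sub_models.get(predicted)
        results.append((predicted, sub_model(obs) if sub_model else None))
    return results

def confusion_counts(true_labels, predicted_labels):
    """Count (true class, predicted class) pairs, i.e. confusion matrix entries."""
    return Counter(zip(true_labels, predicted_labels))

# Toy stand-ins for trained models.
def gas_type(obs):
    return "CO" if obs[0] > 0.5 else "NH3"

def co_level(obs):
    return "high" if obs[1] > 1.0 else "low"

observations = [(0.9, 2.0), (0.8, 0.3), (0.1, 5.0)]
results = apply_hierarchy(observations, gas_type, {"CO": co_level})

# If the correct classes are known, the top-level predictions can be checked.
true_gas = ["CO", "NH3", "NH3"]
matrix = confusion_counts(true_gas, [predicted for predicted, _ in results])
print(results)
print(matrix)
```

The second observation is misclassified as "CO" and is therefore still passed to the "CO" sub-model, which mirrors what would happen with real, unknown data.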