# Bar and Line Charts, Box Plots, and Heatmaps

These statistical plots use a setup interface similar to that of pivot tables. However, unlike pivot tables, they can display any number of FCS files per pivot position, which makes them ideal for summarizing replicate data. Annotations are used to select which FCS files to display.

Bar charts, line charts, and heatmaps show the arithmetic mean of all matching files, along with either the standard deviation or the standard error of the mean (see replicate data). Box plots show the median line flanked by the first and third quartiles, with whiskers extending up to 1.5 times the interquartile range beyond the quartiles, and individual points for each sample in the plot.

Tip

If you want to plot only a subset of your replicate data, such as a specific timepoint, treatment, or demographic group, use filter annotations.

Zero and indeterminate values

An FCS file will be omitted from a plot in the following scenarios:

• When showing channel statistics (e.g., means or medians) and the file has no events in the selected gate.
• When showing the geometric mean and the file has events with zero or negative values, because the geometric mean cannot be calculated in that case.

Additionally, box plots omit FCS files when the plot is log-scaled and the statistic is negative or zero. Bar and line charts and heatmaps include zeros and negatives when calculating the mean of multiple files, but the mean will not be displayed if it is negative or zero.
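The omission rules above can be sketched in Python. This is a minimal illustration, not CellEngine's implementation: the function names are hypothetical, an omitted file's statistic is represented as `None`, and the mean across files is a plain arithmetic mean of the files that remain.

```python
import math
from typing import Optional, Sequence


def geometric_mean(event_values: Sequence[float]) -> Optional[float]:
    """Geometric mean of one file's event values, or None if undefined.

    The geometric mean is exp(mean(ln v)), and ln v is undefined for
    v <= 0, so a file with any zero or negative value yields None.
    A file with no events also yields None.
    """
    if not event_values or any(v <= 0 for v in event_values):
        return None
    return math.exp(sum(math.log(v) for v in event_values) / len(event_values))


def mean_across_files(per_file_stats: Sequence[Optional[float]]) -> Optional[float]:
    """Arithmetic mean over files, skipping omitted (None) files.

    Zero and negative statistics are kept, as described for bar charts,
    line charts and heatmaps.
    """
    kept = [s for s in per_file_stats if s is not None]
    if not kept:
        return None
    return sum(kept) / len(kept)


print(geometric_mean([1.0, 10.0, 100.0]))         # approximately 10.0
print(geometric_mean([0.0, 5.0]))                 # None: contains a zero
print(mean_across_files([2.0, None, -1.0, 5.0]))  # 2.0
```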

## Scaling and Normalized Plotting

When looking at changes in data, such as cell signaling before and after treatment or changes in population frequencies over time, you can display normalized data in heatmaps, bar charts and line charts.

Howto

1. Set up your plot as described in the pivoting model. For example, to look at signaling markers under various stimulation conditions in a heatmap, set the row annotations to your cell signaling readouts and the column annotations to your stimulation conditions.
2. Select a normalization method from the scaling & normalization selector. For fluorescence-based signaling experiments, Log2 Ratio is common. See the table of scaling and normalization methods below for more information.
3. Select the values to normalize to. For example, if your unstimulated condition is in the left-most column, select Left Column. You may need to manually adjust the sort order of your annotations so that your normalize-to values are in an appropriate position.

The possible scaling and normalization methods are as follows:

| Method | Equation | Description and use cases |
| --- | --- | --- |
| Raw | $$x$$ | The unmodified value. Commonly used for population frequencies (event counts or percentages). |
| Raw Fold | $$\frac{x}{c}$$ | Fold change without scaling. |
| Raw Difference | $$x - c$$ | Use instead of raw fold when the control value is near zero, because dividing by a small number inflates the result. |
| Scaled | $$\operatorname{Scale}(x)$$ | For channel statistics only. Uses the channel’s scale. This shows the value on the same scale used in flow plots (e.g., for gating) and may therefore be easier to interpret. |
| Scaled Difference | $$\operatorname{Scale}(x) - \operatorname{Scale}(c)$$ | For channel statistics only. Uses the channel’s scale. Commonly used instead of log2 ratio for CyTOF signaling experiments because unstimulated signaling markers tend to be near zero. |
| Scaled Ratio | $$\operatorname{Scale}(x) / \operatorname{Scale}(c)$$ | For channel statistics only. Uses the channel’s scale. |
| Log2 | $$\log_2{x}$$ | |
| Log2 Ratio | $$\log_2{\left(\frac{x}{c}\right)}$$ | Commonly used for signaling experiments because it makes the control value zero, increased values positive, and decreased values negative. |
| Log10 | $$\log_{10}{x}$$ | Commonly used when visualizing a large range of data, where a linear scale would make changes at the low end difficult to see. |
| Log10 Ratio | $$\log_{10}{\left(\frac{x}{c}\right)}$$ | |

where $$x$$ is the experimental value and $$c$$ is the control value.
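The table can be read as a set of functions of the experimental value $$x$$ and the control value $$c$$. The sketch below assumes a hypothetical arcsinh channel scale with a cofactor of 5 as a stand-in for Scale(); a real channel's scale (linear, log, or arcsinh with another cofactor) may differ, and the method names are simply the table's labels used as dictionary keys.

```python
import math


def channel_scale(v: float, cofactor: float = 5.0) -> float:
    # Hypothetical stand-in for Scale(): an arcsinh transform with cofactor 5.
    # A real channel may use a linear, log or differently parameterized scale.
    return math.asinh(v / cofactor)


# Each method maps (x, c) to the plotted value; x is the experimental value
# and c is the control (normalize-to) value. Unnormalized methods ignore c.
METHODS = {
    "Raw": lambda x, c: x,
    "Raw Fold": lambda x, c: x / c,
    "Raw Difference": lambda x, c: x - c,
    "Scaled": lambda x, c: channel_scale(x),
    "Scaled Difference": lambda x, c: channel_scale(x) - channel_scale(c),
    "Scaled Ratio": lambda x, c: channel_scale(x) / channel_scale(c),
    "Log2": lambda x, c: math.log2(x),
    "Log2 Ratio": lambda x, c: math.log2(x / c),
    "Log10": lambda x, c: math.log10(x),
    "Log10 Ratio": lambda x, c: math.log10(x / c),
}


def normalize(method: str, x: float, c: float) -> float:
    return METHODS[method](x, c)


print(normalize("Log2 Ratio", 400.0, 100.0))  # 2.0: a 4-fold increase
print(normalize("Log2 Ratio", 100.0, 100.0))  # 0.0: the control itself
print(normalize("Raw Fold", 3.0, 0.1))        # about 30: a near-zero control inflates the ratio
print(normalize("Raw Difference", 3.0, 0.1))  # about 2.9
```

The last two lines illustrate why Raw Difference is preferred when the control is near zero: the same small absolute change reads as a 30-fold increase.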

## Replicate Data, Variability, Error and Error Bars

When replicate values are present, the mean of the values will be displayed along with the standard deviation (SD) or standard error of the mean (SEM). Bar charts, line charts and heatmaps show the variability or error in the hover text. Variability or error can also be displayed as error bars in bar and line charts.

• The standard deviation (SD) is an estimate of the variability of the entire population based on the representative set of samples in your data set. This value does not necessarily get smaller with larger sample sizes. This value should be used when you wish to describe the variability of a population.

• The standard error of the mean (SE or SEM) is an estimate of how precisely you have determined the mean with your experiment. This value gets smaller with larger sample sizes, as it is defined as the standard deviation divided by the square root of the number of samples. This value should be used when you wish to compare different groups of samples.

Regardless of your choice, you should always report which metric you are showing.
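As a worked example of the two definitions, the sketch below computes both values for a set of replicate statistics. It assumes the sample standard deviation (with an n - 1 denominator); whether CellEngine uses the sample or the population form is not stated here.

```python
import math
import statistics


def sd_and_sem(replicates):
    """Return (SD, SEM) for a list of replicate values.

    Assumes the sample standard deviation (n - 1 denominator).
    SEM is SD divided by the square root of the number of samples.
    """
    sd = statistics.stdev(replicates)
    sem = sd / math.sqrt(len(replicates))
    return sd, sem


sd, sem = sd_and_sem([1.0, 2.0, 3.0, 4.0, 5.0])
print(round(sd, 4), round(sem, 4))  # 1.5811 0.7071
```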

How the SD or SEM is calculated further depends on the selected scaling and normalization, as described in the table below. These formulas propagate measurement uncertainty through the scaling and normalization equations.

| Scaling \ normalization | (none) | Fold | Difference |
| --- | --- | --- | --- |
| Raw | absolute error ($$\sigma$$) | $$\left\lvert \frac{x}{c} \right\rvert \sqrt{\left( \frac{\sigma_x}{x} \right)^2 + \left( \frac{\sigma_c}{c} \right)^2}$$ | $$\sqrt{\sigma_x^2 + \sigma_c^2}$$ |
| Log2 | $$\frac{\sigma}{x \ln 2}$$ | $$\sqrt{\left( \frac{\sigma_x}{x \ln 2} \right)^2 + \left( \frac{\sigma_c}{c \ln 2} \right)^2}$$ | not applicable |
| Log10 | $$\frac{\sigma}{x \ln 10}$$ | $$\sqrt{\left( \frac{\sigma_x}{x \ln 10} \right)^2 + \left( \frac{\sigma_c}{c \ln 10} \right)^2}$$ | not applicable |
| Scale set | not supported | not supported | not supported |

where $$\sigma$$ is the SD or SEM, $$x$$ is the experimental value (mean, median, count, etc.), $$c$$ is the control value, $$\sigma_x$$ is the SD or SEM of the experimental value, and $$\sigma_c$$ is the SD or SEM of the control value.

Limitations

All of these formulas are first-order approximations and make assumptions, including that the experimental and control values are uncorrelated (i.e., that there is no systematic bias) and that the errors are small relative to the values.
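The propagation table can be transcribed directly into a function, under the same assumptions (uncorrelated, small errors). In this sketch the `scaling` and `normalization` string keys are hypothetical labels, and combinations the table marks as not applicable or not supported raise an error.

```python
import math
from typing import Optional


def propagated_error(scaling: str, normalization: Optional[str],
                     x: float, sigma_x: float,
                     c: Optional[float] = None,
                     sigma_c: Optional[float] = None) -> float:
    """Propagate SD or SEM through scaling/normalization (first order).

    scaling: "raw", "log2" or "log10"; normalization: None, "fold" or
    "difference". x and c are the experimental and control values;
    sigma_x and sigma_c are their SD or SEM.
    """
    if scaling == "raw":
        if normalization is None:
            return sigma_x
        if normalization == "fold":
            return abs(x / c) * math.hypot(sigma_x / x, sigma_c / c)
        if normalization == "difference":
            return math.hypot(sigma_x, sigma_c)
    elif scaling in ("log2", "log10"):
        ln_base = math.log(2) if scaling == "log2" else math.log(10)
        if normalization is None:
            return sigma_x / (x * ln_base)
        if normalization == "fold":
            return math.hypot(sigma_x / (x * ln_base), sigma_c / (c * ln_base))
    raise ValueError(f"not supported: {scaling} / {normalization}")


print(propagated_error("raw", "difference", 10.0, 3.0, 5.0, 4.0))  # 5.0
print(propagated_error("raw", "fold", 10.0, 1.0, 5.0, 0.5))        # about 0.283
```

`math.hypot(a, b)` computes $$\sqrt{a^2 + b^2}$$, which matches the square-root-of-sum-of-squares form used throughout the table.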

## Line and Bar Graphs

Howto

1. Insert a line graph or a bar graph using the corresponding button in the toolbar.
2. Under Axis Labels, click on Name and choose the annotation to display on the x-axis.
3. Choose the Values to display.
4. Optional: For line graphs, Connect Gaps continues the line through any intermediate axis values that have no data to display (either because there's no matching file or because the value cannot be calculated).
5. Optional: If the axis values are numeric, they are positioned proportionally to their values. To space numeric values evenly, use Categorical axis labels.
6. Under Legend Entries, choose the Name and Values for the data series (lines or bars) to draw on the graph.
7. In General, select a population and/or channel if applicable. (Where this selection would be redundant or nonsensical, the option is hidden. Common examples include a graph that uses populations as axis labels or that displays a “percent of parent” statistic instead of a channel-based statistic.)
8. Optional: Bar graphs can be reoriented under Plot Settings by choosing horizontal or vertical.
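Conceptually, the Axis Labels and Legend Entries settings pivot the matching FCS files into a grid, and each grid cell is summarized by the mean of its files. The sketch below illustrates this with a hypothetical list of file records (an `annotations` dict and a per-file `stat`, where `None` marks an omitted file); it is not CellEngine's data model.

```python
from collections import defaultdict
from statistics import mean


def build_series(files, axis_annotation, legend_annotation):
    """Group files into {legend value: {axis value: mean statistic}}."""
    cells = defaultdict(list)
    for f in files:
        if f["stat"] is None:  # omitted file, e.g. no events in the gate
            continue
        legend_value = f["annotations"][legend_annotation]
        axis_value = f["annotations"][axis_annotation]
        cells[(legend_value, axis_value)].append(f["stat"])

    series = defaultdict(dict)
    for (legend_value, axis_value), stats in cells.items():
        series[legend_value][axis_value] = mean(stats)
    return dict(series)


# Hypothetical replicate files annotated with "day" and "symptoms".
files = [
    {"annotations": {"day": 1, "symptoms": "yes"}, "stat": 10.0},
    {"annotations": {"day": 1, "symptoms": "yes"}, "stat": 14.0},
    {"annotations": {"day": 2, "symptoms": "yes"}, "stat": None},
    {"annotations": {"day": 1, "symptoms": "no"}, "stat": 8.0},
]
print(build_series(files, "day", "symptoms"))
# {'yes': {1: 12.0}, 'no': {1: 8.0}}
```

The missing day 2 entry for the "yes" series is the kind of gap that Connect Gaps draws the line through.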

### Example: Line Graph

For this line graph, the median value of a channel is tracked over time. The axis labels and legend entries used the “day” and “symptoms” annotations, respectively. The axis labels were set to categorical to space the data points evenly on the x-axis (otherwise, the data points taken at 60 days would squeeze the first week’s worth of data into the far left of the graph).

The data was normalized to uninfected samples (Day -1) by setting scaling to raw difference and normalization to the left axis group.
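The effect of numeric versus categorical axis labels in this example comes down to how axis positions are computed. The sketch below uses hypothetical sampling days and assumes a linear numeric axis that spans the data range.

```python
def numeric_positions(values):
    """Place values proportionally to their magnitude on a 0-1 axis."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def categorical_positions(values):
    """Space values evenly on a 0-1 axis, in the given order."""
    n = len(values)
    return [i / (n - 1) for i in range(n)]


days = [-1, 1, 2, 3, 8, 60]  # hypothetical sampling days
print([round(p, 3) for p in numeric_positions(days)])
# [0.0, 0.033, 0.049, 0.066, 0.148, 1.0]
print(categorical_positions(days))
# [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```

With numeric positioning, everything through day 8 is squeezed into the left 15% of the axis; categorical positioning gives each timepoint equal space.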

### Example: Bar Graph

This example, taken from the same dataset as the line graph above, shows a cell population at a particular timepoint, with the axis labels and legend entries set to the “virus shedding” and “fever” annotations, respectively.

A filter annotation was applied to select data collected at Day 8.

## Box Plots

Box plots are useful for displaying the distribution of samples within a group. Box plots show the median line flanked by the first and third quartiles, with whiskers extending up to 1.5 times the interquartile range beyond the quartiles. In addition, box plots display a dot for each sample.
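The box and whisker positions can be computed as follows. This sketch assumes Tukey's convention (each whisker ends at the most extreme sample within 1.5 × IQR of the box) and the "inclusive" quartile method of Python's `statistics.quantiles`; tools differ slightly in their quartile definitions, so exact values may not match CellEngine's.

```python
import statistics


def box_stats(samples):
    """Quartiles, whisker ends and outliers for one box.

    Assumes Tukey-style whiskers: each whisker ends at the most extreme
    sample lying within 1.5 * IQR of the box. Quartiles use the
    'inclusive' method of statistics.quantiles.
    """
    q1, median, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [s for s in samples if low_fence <= s <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [s for s in samples if s < low_fence or s > high_fence],
    }


print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
# {'q1': 3.25, 'median': 5.5, 'q3': 7.75, 'whisker_low': 1,
#  'whisker_high': 9, 'outliers': [30]}
```

Every sample is still drawn as a dot; the outlier list only identifies the dots that fall beyond the whiskers.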

Tip

Hover over a dot on a box plot to see more information, including the FCS file and its statistic.

### Creating a box plot

This example shows the frequencies of several cell types for five different species, based on several dozen biological replicates per species. A single dot is shown for each donor. The box shows the lower quartile, median, and upper quartile. The whiskers extend up to 1.5 times the interquartile range beyond the box.

Howto

1. Insert a box plot using the button in the toolbar.
2. Set up the pivot table dimensions as described in the pivoting model. In the example above, the settings are as follows:
• Axis Labels: Populations (pDCs, CD14+ Monocytes, CD16+ Monocytes, cDCs, NK Cells, CD4+ T cells, CD8+ T cells)
• Legend Entries (Data Series): Species (African green monkey, Cyno, Human, Mouse, Rhesus)
3. Select the statistic and scaling in the sidebar. In the example above, “Percent of” and “Singlets” are selected (that is, each population as a percentage of singlets), and the values are scaled by Log10 to improve the visibility of the large range of values.

## Heatmaps

File annotations can be used as column or row entries, but they can also be added as categorical data to existing heatmaps. This is particularly useful for heatmaps that show individual samples or donors. The clustered heatmap example below shows column annotations for the symptoms and viral shedding of individual donors.

Howto

1. In the sidebar, under Heatmap Annotations, click on Column Annotations or Row Annotations.
2. Select one or more annotations from the list.
3. Optional: Click on the arrows beside the annotation name to change the order of the displayed annotations.

### Heatmap styling

#### Color scale

Like biaxial plots, heatmaps offer a range of gradients to choose from under Plot Settings.

### Normalized heatmap

This example shows the degree of change in phospho-P38 in response to four stimuli in six cell types.

Howto

1. Insert a heatmap using the button in the toolbar.
2. Set up the pivot table dimensions as described in the pivoting model. In the example above, the settings are as follows:
• Columns: Populations (CD4+ T cells, CD8+ T cells, CD19+, CD33+, NKs, pDCs)
• Rows: Condition (IL2/GMCSF, IL10, PMA, LPS, unstim)
3. Select the statistic and scaling in the sidebar. In this example:

• Statistic: Median
• Channel: pP38
• Scaling: Scaled difference. This is a common normalization method for CyTOF data, akin to fold change for fluorescence data.
• Normalize to: Top Row. This sets the top row to 0; all other rows are expressed relative to the top row.

### Clustered heatmap

Heatmap columns and rows can be clustered with a variety of hierarchical clustering methods.

• Single linkage measures the distance between two clusters as the minimum distance between their members.

• Complete linkage uses the maximum distance between members of the two clusters.

• Average linkage uses the mean distance between all pairs of members of the two clusters.

• Ward’s method does not use a pairwise distance directly; instead, at each step it merges the pair of clusters that produces the smallest increase in total within-cluster variance.

The example above uses average linkage to cluster samples from donors with influenza. Heatmap annotations for viral shedding and symptoms were added to the columns.
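The four linkage criteria can be compared on two small clusters of points. This sketch assumes Euclidean distance between points (the distance metric CellEngine uses is not stated here) and expresses Ward's criterion as the increase in total within-cluster sum of squares caused by a merge, which equals n_a·n_b/(n_a + n_b) times the squared distance between the two centroids.

```python
import math
from itertools import product


def single_linkage(a, b):
    # Minimum distance over all cross-cluster pairs.
    return min(math.dist(p, q) for p, q in product(a, b))


def complete_linkage(a, b):
    # Maximum distance over all cross-cluster pairs.
    return max(math.dist(p, q) for p, q in product(a, b))


def average_linkage(a, b):
    # Mean distance over all cross-cluster pairs.
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def centroid(points):
    return [sum(coords) / len(points) for coords in zip(*points)]


def ward_merge_cost(a, b):
    # Increase in total within-cluster sum of squares if a and b merge:
    # n_a * n_b / (n_a + n_b) * ||centroid(a) - centroid(b)||^2
    squared = sum((u - v) ** 2 for u, v in zip(centroid(a), centroid(b)))
    return len(a) * len(b) / (len(a) + len(b)) * squared


a = [(0.0, 0.0), (0.0, 1.0)]
b = [(3.0, 0.0), (4.0, 0.0)]
print(single_linkage(a, b))              # 3.0
print(round(complete_linkage(a, b), 3))  # 4.123, i.e. sqrt(17)
print(round(average_linkage(a, b), 3))   # 3.571
print(ward_merge_cost(a, b))             # 12.5
```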

To make a clustered heatmap:

Howto

1. Insert a heatmap using the button in the toolbar.
2. Set up the pivot table dimensions as described in the pivoting model. In the example above, the settings are as follows:
• Columns: Donor (annotation), all values
• Rows: Day (annotation): -1, 1, 2, 3, 4, 5, 6, 7, 8
3. Select the statistic and scaling in the sidebar. In this example:
• Population: MCs, as a percentage of an ancestor gate
• Scaling: Raw difference ($$x - c$$)
• Normalize to: Bottom Row (the timepoint before infection)
4. Under Heatmap Clustering, check one of the cluster options and set the size, which is the percentage of the graph space the clustering diagram may occupy. In this case:
• Cluster columns
• Size: 20
5. Select the Linkage and Line thickness. This example uses:
• Linkage: Average
• Line thickness: 1.5
6. Optional: Add heatmap annotations by opening the drop-down box under Column Annotations or Row Annotations and selecting the checkboxes.
• The example uses virus shedding and symptoms.

### Heatmap correlation

Heatmaps can be used to show the correlation of experimental variables. This can be useful to show the coexpression of signaling molecules, the coordination of different cell types, or other instances where variables in the same group could affect or coincide with each other.

Heatmap correlation uses Pairwise Rows/Columns to decide which variables are displayed; each cell shows the correlation between one pair of these variables. Correlated Vectors determines the values that make up the vectors being correlated.

The example above shows correlation of signaling molecules, using the channels of molecules of interest as Pairwise Rows/Columns, and cell phenotypes created by gating as Correlated Vectors.

#### Correlation coefficient calculations

CellEngine offers three methods for calculating correlation. The appropriate method depends on your data.

Pearson requires the data to be normally distributed, to have no significant outliers, and to have a linear relationship.

Spearman uses ranked data and is suitable for any distribution (not necessarily normal) with any monotonic relationship (not necessarily linear). It is sensitive to error and thus not well suited to data with outliers.

Kendall also uses ranked data and is suitable for any distribution with any monotonic relationship. It is more accurate than Spearman when the sample size is small and is less sensitive to error, making it useful when there are outliers. Overall, it is more robust than Spearman.