Bar and Line Charts, Box Plots, and Heatmaps

These statistics plots use a setup interface similar to pivot tables. However, unlike pivot tables, any number of FCS files can be displayed per pivot position, which makes them well suited to summarizing replicate data.

Bar charts, line charts, and heatmaps show the arithmetic mean of all matching files, along with either the standard deviation or standard error of the mean (see replicate data).
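As a rough illustration of how several files can collapse into one value per pivot position, the sketch below groups hypothetical per-file statistics by axis label and data series and averages each group. The record layout and the `mean_per_position` helper are assumptions made for this example; they are not part of CellEngine.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-file statistics: (population, species, percent of singlets).
records = [
    ("NK Cells", "Human", 4.0),
    ("NK Cells", "Human", 6.0),
    ("NK Cells", "Mouse", 2.0),
    ("NK Cells", "Mouse", 3.0),
    ("NK Cells", "Mouse", 4.0),
]


def mean_per_position(rows):
    """Group values by (axis label, data series) and average each group."""
    groups = defaultdict(list)
    for population, species, value in rows:
        groups[(population, species)].append(value)
    return {position: mean(values) for position, values in groups.items()}


means = mean_per_position(records)
print(means[("NK Cells", "Human")])  # 5.0
print(means[("NK Cells", "Mouse")])  # 3.0
```

Each pivot position may hold a different number of files; the group sizes here are two and three.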

Box plots show the median line flanked by the first and third quartiles, with whiskers spanning ±1.5 times the interquartile range.
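The box plot quantities can be sketched with the Python standard library. This example assumes the "inclusive" quartile method and the common convention of placing whisker limits 1.5 times the interquartile range beyond the quartiles; the exact quartile method and whisker convention used by CellEngine are not stated here, so treat both as assumptions.

```python
from statistics import quantiles


def box_stats(values):
    """Quartiles plus whisker limits at 1.5 x IQR beyond the box (assumed convention)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_limit": q1 - 1.5 * iqr,
        "upper_limit": q3 + 1.5 * iqr,
    }


# Hypothetical replicate values; 30 lies beyond the upper limit.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(stats["q1"], stats["median"], stats["q3"])  # 3.0 5.0 7.0
print(stats["lower_limit"], stats["upper_limit"])  # -3.0 13.0
```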

Zero and indeterminate values

An FCS file will be omitted from a plot in the following scenarios:

• When showing channel statistics (e.g., means or medians) and the file has no events in the selected gate.
• When showing the geometric mean and the file has events with negative values or zeros, in which case the geometric mean cannot be calculated.

Additionally, box plots omit FCS files when the plot is log-scaled and the statistic is negative or zero. Bar and line charts and heatmaps include zeros and negatives when calculating the mean of multiple files, but the mean will not be displayed if it is negative or zero.
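The rules above can be illustrated with a small sketch. The function names are hypothetical, and `mean_for_log_display` assumes that the "not displayed" rule for a zero or negative mean applies to log-scaled plots; that reading is an assumption rather than something stated explicitly.

```python
import math


def geometric_mean_or_none(events):
    """Return the geometric mean, or None when it cannot be calculated.

    None is returned for an empty gate and for any zero or negative event,
    because the logarithm is undefined in those cases.
    """
    if not events or any(value <= 0 for value in events):
        return None
    return math.exp(sum(math.log(value) for value in events) / len(events))


def mean_for_log_display(file_statistics):
    """Average all files, zeros and negatives included; hide a non-positive mean."""
    average = sum(file_statistics) / len(file_statistics)
    return average if average > 0 else None


print(geometric_mean_or_none([0.0, 5.0]))      # None: the file would be omitted
print(mean_for_log_display([-1.0, 0.0, 4.0]))  # 1.0: negatives count toward the mean
print(mean_for_log_display([-2.0, 0.0, 1.0]))  # None: the mean itself is negative
```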

Examples

Box plot

This example shows the frequencies of several cell types for five different species, based on several dozen biological replicates per species. A single dot is shown for each donor. The box shows the lower quartile, median and upper quartile. The whiskers span ±1.5 times the interquartile range.

Howto

1. Insert a box plot using the button in the toolbar.
2. Set up the pivot table dimensions as described in the pivoting model. In the example above, the settings are as follows:
• Axis Labels Populations: pDCs, CD14+ Monocytes, CD16+ Monocytes, cDCs, NK Cells, CD4+ T cells, CD8+ T cells
• Legend Entries (Data Series) Species: African green monkey, Cyno, Human, Mouse, Rhesus
3. Select the statistic and scaling in the sidebar. In the example above, "Percent of" and "Singlets" are selected, and scaled by Log10 to improve visibility of the large range of values.

Tip

Box plot orientation can be changed from the sidebar. Under Plot Settings, choose horizontal or vertical.

Normalized heatmap

This example shows the degree of change in phospho-P38 in response to four stimuli in six cell types.

Howto

1. Insert a heatmap using the button in the toolbar.
2. Set up the pivot table dimensions as described in the pivoting model. In the example above, the settings are as follows:
• Columns Populations: CD4+ T cells, CD8+ T cells, CD19+, CD33+, NKs, pDCs
• Rows Condition: IL2/GMCSF, IL10, PMA, LPS, unsim
3. Select the statistic and scaling in the sidebar. In this example:
• Statistic Median
• Channel pP38
• Scaling Scaled difference. This is a common normalization method with CyTOF data akin to fold-change in fluorescence data.
• Normalize to Top Row. This makes the top row have the value 0; all other rows are normalized relative to the top row.
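The Scaled difference and Normalize to Top Row settings can be pictured with a short sketch. It assumes an arcsinh scale with a cofactor of 5, which is a common display scale for CyTOF data; the actual scale depends on the channel's settings, and the function names and values are hypothetical.

```python
import math


def arcsinh_scale(value, cofactor=5.0):
    # Assumption: arcsinh with cofactor 5 stands in for the channel's scale.
    return math.asinh(value / cofactor)


def normalize_to_top_row(grid, scale=arcsinh_scale):
    """Scaled difference of every row against the first row of the grid."""
    top = [scale(control) for control in grid[0]]
    return [[scale(x) - t for x, t in zip(row, top)] for row in grid]


# Hypothetical medians: one row per condition, one column per population.
medians = [
    [1.0, 2.0],   # top row, used as the control
    [10.0, 2.0],  # a stimulated condition
]
normalized = normalize_to_top_row(medians)
print(normalized[0])  # [0.0, 0.0]: the top row becomes zero
print(normalized[1])  # first value is positive, second is 0.0
```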

Heatmap Correlation

Heatmaps can be used to show the correlation of experimental variables. This can be useful to show the coexpression of signaling molecules, the coordination of different cell types, or other instances where variables in the same group could affect or coincide with each other.

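Heatmap correlation uses Pairwise Rows/Columns to decide which variables to display on both axes. Correlated Vectors supplies the values that make up the vector for each of those variables; a correlation coefficient is then calculated for each pair of vectors.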

The example above shows correlation of signaling molecules, using the channels of molecules of interest as Pairwise Rows/Columns, and cell phenotypes created by gating as Correlated Vectors.
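The relationship between the two settings can be sketched as follows: each Pairwise Rows/Columns value (here, a channel) gets one vector, with one entry per Correlated Vectors value (here, a population), and every pair of vectors is correlated. The data, the channel names other than pP38, and the helper functions are hypothetical. Pearson is used only to keep the example short.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical medians: one vector per channel, one entry per gated population.
vectors = {
    "pP38": [1.0, 2.0, 3.0, 4.0],
    "pERK": [2.0, 4.0, 6.0, 8.0],
    "pSTAT5": [4.0, 3.0, 2.0, 1.0],
}


def correlation_matrix(vectors, method=pearson):
    """Correlate every pair of vectors; keys are (row, column) names."""
    names = list(vectors)
    return {(r, c): method(vectors[r], vectors[c]) for r in names for c in names}


matrix = correlation_matrix(vectors)
print(round(matrix[("pP38", "pERK")], 6))    # 1.0
print(round(matrix[("pP38", "pSTAT5")], 6))  # -1.0
```

The resulting matrix is symmetric, and each variable correlates perfectly with itself along the diagonal.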

Correlation coefficient calculations

CellEngine offers three methods for calculating correlation. The appropriate method depends on your data.

Pearson requires that the data be normally distributed, have no significant outliers, and have a linear relationship.

Spearman uses ranked data and is suitable for any distribution (not necessarily normal) with any monotonic relationship (not necessarily linear). It is sensitive to error and thus not well suited to data with outliers.

Kendall also uses ranked data and is suitable for any distribution with any monotonic relationship. It is more accurate than Spearman when the sample size is small and is less sensitive to error, making it useful when there are outliers. Overall, it is more robust than Spearman.
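To make the difference between the three coefficients concrete, here is a minimal pure-Python sketch. It is not CellEngine's implementation: Spearman is computed as Pearson on average ranks, and Kendall is the simple tau-a variant without tie correction, which is an assumption about the variant. The data are hypothetical and follow a monotonic but non-linear relationship.

```python
import math


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the ranked data."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total pairs, no tie correction."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [value ** 2 for value in x]  # monotonic, but not linear
print(pearson(x, y))        # below 1 because the relationship is not linear
print(spearman(x, y))       # approximately 1.0
print(kendall_tau_a(x, y))  # 1.0
```

Both rank-based coefficients report a perfect monotonic relationship, while Pearson does not, which matches the distinction drawn above between linear and monotonic relationships.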

Creating a correlation heatmap

Howto

1. Insert a heatmap using the button in the toolbar.
2. In Heatmap Correlation, click on Correlation Coefficient, and choose a correlation coefficient method.
3. In the sidebar, under Pairwise Rows/Columns, click on Name and choose the variable you want to display.
4. Click on Values and select the values for the heatmap axes.
5. Under Correlated Vectors, choose the Name and Values that will build the vectors used to calculate the correlation coefficients.

Scaling and Normalized Plotting

When looking at changes in data, such as cell signaling before and after treatment or changes in population frequencies over time, you can display normalized data in heatmaps, bar charts and line charts.

Howto

1. Set up your plot as described in the pivoting model. For example, to look at signaling markers under various stimulation conditions in a heatmap, set the row annotations to your cell signaling readouts and the column annotations to your stimulation conditions.
2. Select a normalization method from the scaling & normalization selector. For fluorescence-based signaling experiments, Log2 Ratio is common. See the table of scaling and normalization methods below for more information.
3. Select values to which to normalize the visualization. For example, if your unstimulated condition is in the left-most column, select Left Column. You may need to manually adjust the sorting order of your annotations so that your normalize-to values are in an appropriate position.

The possible scaling and normalization methods are as follows:

| Method | Equation | Description and Use Cases |
| --- | --- | --- |
| Raw | $$x$$ | The unmodified value. Commonly used for population frequencies (event counts or percentages). |
| Raw Fold | $$\frac{x}{c}$$ | Fold change without scaling. |
| Raw Difference | $$x - c$$ | Use instead of raw fold when the control value is near zero, in which case dividing by a small number amplifies the experimental value. |
| Scaled | $$\operatorname{Scale}(x)$$ | For channel statistics only. Uses the channel's scale. This shows the value on the same scale used in flow plots (e.g. gating) and may thus be more approachable. |
| Scaled Difference | $$\operatorname{Scale}(x) - \operatorname{Scale}(c)$$ | For channel statistics only. Uses the channel's scale. Commonly used instead of log2 ratio for CyTOF signaling experiments because unstimulated signaling markers tend to be near zero. |
| Scaled Ratio | $$\operatorname{Scale}(x) / \operatorname{Scale}(c)$$ | For channel statistics only. Uses the channel's scale. |
| Log2 | $$\log_2{x}$$ | |
| Log2 Ratio | $$\log_2{\left(\frac{x}{c}\right)}$$ | Commonly used for signaling experiments because it makes the control value zero, increased values positive and decreased values negative. |
| Log10 | $$\log_{10}{x}$$ | Commonly used when visualizing a large range of data, in which case a linear scale would make changes at the low end of the scale difficult to see. |
| Log10 Ratio | $$\log_{10}{\left(\frac{x}{c}\right)}$$ | |

where $$x$$ is the experimental value and $$c$$ is the control value.
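The equations in the table translate directly into code. The sketch below is illustrative only: the `normalize` function and its method names are hypothetical, and the `scale` argument stands in for the channel's scale, which has to be supplied by the caller because it depends on the channel's settings.

```python
import math


def normalize(x, c=None, method="raw", scale=None):
    """Apply one of the scaling and normalization equations to x (control c)."""
    methods = {
        "raw": lambda: x,
        "raw_fold": lambda: x / c,
        "raw_difference": lambda: x - c,
        "scaled": lambda: scale(x),
        "scaled_difference": lambda: scale(x) - scale(c),
        "scaled_ratio": lambda: scale(x) / scale(c),
        "log2": lambda: math.log2(x),
        "log2_ratio": lambda: math.log2(x / c),
        "log10": lambda: math.log10(x),
        "log10_ratio": lambda: math.log10(x / c),
    }
    if method not in methods:
        raise ValueError(f"unknown method: {method}")
    return methods[method]()


print(normalize(8.0, 2.0, "raw_fold"))        # 4.0
print(normalize(8.0, 2.0, "raw_difference"))  # 6.0
print(normalize(8.0, 2.0, "log2_ratio"))      # 2.0
print(normalize(2.0, 2.0, "log2_ratio"))      # 0.0: the control maps to zero
```

With Log2 Ratio, a value of 0.5 would map to -1.0, so increases and decreases of the same fold are symmetric around zero.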

Replicate Data, Variability, Error and Error Bars

When replicate values are present, the mean of the values will be displayed along with the standard deviation (SD) or standard error of the mean (SEM). Bar charts, line charts and heatmaps show the variability or error in the hover text. Variability or error can also be displayed as error bars in bar and line charts.

• The standard deviation (SD) is an estimate of the variability of the entire population based on the representative set of samples in your data set. This value does not necessarily get smaller with larger sample sizes. This value should be used when you wish to describe the variability of a population.

• The standard error of the mean (SE or SEM) is an estimate of how precisely you have determined the mean with your experiment. This value gets smaller with larger sample sizes, as it is defined as the standard deviation divided by the square root of the number of samples. This value should be used when you wish to compare between different groups of samples.

Regardless of your choice, you should always report which metric you are showing.
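As a small worked example of the two metrics, the sketch below computes both from hypothetical replicate values. It assumes the sample standard deviation (n - 1 denominator); whether CellEngine uses the sample or population form is not stated here.

```python
import math
from statistics import mean, stdev


def sd_and_sem(values):
    """Sample SD (assumed n - 1 denominator) and SEM = SD / sqrt(n)."""
    sd = stdev(values)
    sem = sd / math.sqrt(len(values))
    return sd, sem


replicates = [10.0, 12.0, 14.0, 16.0]  # hypothetical replicate statistics
sd, sem = sd_and_sem(replicates)
print(mean(replicates))  # 13.0
print(round(sd, 3))      # 2.582
print(round(sem, 3))     # 1.291
```

Adding more replicates with similar spread would leave the SD roughly unchanged while shrinking the SEM.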

How the SD or SEM is calculated further depends on the selected scaling and normalization, as described in the table below. These formulas propagate measurement uncertainty through the scaling and normalization equations.

| scaling \ normalization | (none) | fold | difference |
| --- | --- | --- | --- |
| raw | absolute error ($$\sigma$$) | $$\left\lvert \frac{x}{c} \right\rvert \sqrt{\left( \frac{\sigma_x}{x} \right)^2 + \left( \frac{\sigma_c}{c} \right)^2}$$ | $$\sqrt{\sigma_x^2 + \sigma_c^2}$$ |
| log2 | $$\frac{\sigma}{x \ln 2}$$ | $$\sqrt{\left( \frac{\sigma_x}{x \ln 2} \right)^2 + \left( \frac{\sigma_c}{c \ln 2} \right)^2}$$ | not applicable |
| log10 | $$\frac{\sigma}{x \ln 10}$$ | $$\sqrt{\left( \frac{\sigma_x}{x \ln 10} \right)^2 + \left( \frac{\sigma_c}{c \ln 10} \right)^2}$$ | not applicable |
| scale set | not supported | not supported | not supported |

where $$\sigma$$ is the SD or SEM, $$x$$ is the experimental value (mean, median, count, etc.), $$c$$ is the control value, $$\sigma_x$$ is the SD or SEM of the experimental value and $$\sigma_c$$ is the SD or SEM of the control value.
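The propagation formulas in the table can be written out as a short sketch. The function names and the input values are hypothetical, and the code simply evaluates the equations as given; it is not CellEngine's implementation.

```python
import math


def raw_fold_error(x, c, sigma_x, sigma_c):
    """Error of x / c."""
    return abs(x / c) * math.sqrt((sigma_x / x) ** 2 + (sigma_c / c) ** 2)


def raw_difference_error(sigma_x, sigma_c):
    """Error of x - c."""
    return math.sqrt(sigma_x ** 2 + sigma_c ** 2)


def log_error(x, sigma, base):
    """Error of log_base(x); base is 2 or 10 in the table."""
    return sigma / (x * math.log(base))


def log_ratio_error(x, c, sigma_x, sigma_c, base):
    """Error of log_base(x / c)."""
    return math.sqrt(log_error(x, sigma_x, base) ** 2 + log_error(c, sigma_c, base) ** 2)


# Hypothetical experimental and control values with their SD or SEM.
x, c, sigma_x, sigma_c = 20.0, 10.0, 2.0, 1.0
print(round(raw_fold_error(x, c, sigma_x, sigma_c), 4))      # 0.2828
print(round(raw_difference_error(sigma_x, sigma_c), 4))      # 2.2361
print(round(log_error(x, sigma_x, 2), 4))                    # 0.1443
print(round(log_ratio_error(x, c, sigma_x, sigma_c, 2), 4))  # 0.204
```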

Limitations

All of these formulas are estimates and make assumptions, including that the experimental and control conditions are uncorrelated (i.e. that there is no systematic bias) and that the error is relatively small.