# Exporting Statistics

After you’ve gated populations, you can export a variety of statistics.

Howto

1. Click export statistics in the left sidebar.
2. Select the statistics to calculate.
3. Select the populations to calculate in the Populations selector.
4. For mean, median, geometric mean, CV, StdDev, MAD or quantile statistics, select the channels to calculate in the Channels selector.
5. Select the FCS files to calculate from in the FCS Files selector.
6. Select the compensation to use for gating.
7. Select the output file format (TSV or CSV with or without header, or JSON).
8. Select the output file layout (see descriptions below).

For TSV and CSV exports, three layouts are available:

| Layout | Description |
| --- | --- |
| Tall-Skinny | One row per combination of FCS file, population, statistic and channel. All statistics are in a single column titled value. This format is ideal for use with applications such as TIBCO Spotfire® that filter rows to isolate the data of interest. |
| Medium | One row per combination of FCS file, population and channel. Each statistic is in a separate column. |
| Short-Wide | One row per FCS file. Each combination of population, statistic and channel is in a separate column. This format is provided for users accustomed to FlowJo® output. This format is not readily machine-parsable and cannot include population IDs (only names). |
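
As an illustration of how the layouts relate, the following minimal sketch regroups Tall-Skinny rows into Medium-style records (one record per FCS file, population and channel). The column names (`filename`, `population`, `channel`, `statistic`, `value`) and the sample data are assumptions made for this example, not a guaranteed description of the exported header; check them against your own export.

```python
import csv
import io
from collections import defaultdict

# Hypothetical Tall-Skinny export. Column names and values are assumptions
# for illustration only.
TALL_SKINNY_CSV = """filename,population,channel,statistic,value
a.fcs,Lymphocytes,FITC-A,median,120.5
a.fcs,Lymphocytes,FITC-A,mean,150.0
b.fcs,Lymphocytes,FITC-A,median,98.0
b.fcs,Lymphocytes,FITC-A,mean,110.25
"""


def tall_to_medium(text):
    """Group Tall-Skinny rows into one record per (file, population, channel)."""
    medium = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["filename"], row["population"], row["channel"])
        # Each statistic becomes its own field, as in the Medium layout.
        medium[key][row["statistic"]] = float(row["value"])
    return dict(medium)


medium_rows = tall_to_medium(TALL_SKINNY_CSV)
for key, stats in medium_rows.items():
    print(key, stats)
```

Because the Tall-Skinny layout keeps every value in one column, regrouping it only requires choosing which columns form the key.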

Exports include file annotations for convenience.

Exports optionally include the IDs of FCS files and populations in addition to their names. If you are consuming exported data in analysis scripts, IDs provide an immutable reference, unlike names, which can be changed by users.

The uniquePopulationName property is the population name with the names of parent populations prepended until the result is unique. If all of your population names are already unique, this value is the same as the population name property.

## Statistic Types

Tip

NaN (not-a-number) or N/A values will occur in the following scenarios:

• When calculating channel statistics (mean, median, etc.) for a gate that contains no events
• When calculating the geometric mean for a gate that contains 0 or negative values

### Median

Definition at MathWorld

The median is a special case of a quantile and represents the center point of a set of observations. When the number of observations is even, linear interpolation is used (i.e. the median is the mean of the two middle values).

Compared to the arithmetic mean, this value is less sensitive to outliers and is thus ideal for avoiding confounding effects of experimental noise.
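
A small sketch using Python's standard `statistics` module shows both points: the even-count case averages the two middle values, and a single outlier shifts the mean far more than the median. The sample values are made up for illustration.

```python
import statistics

# Even number of observations: the median is the mean of the two middle values.
even_count = [10.0, 20.0, 30.0, 40.0]
median_even = statistics.median(even_count)  # (20 + 30) / 2

# One extreme outlier barely moves the median but dominates the mean.
with_outlier = [10.0, 20.0, 30.0, 40.0, 100000.0]
median_outlier = statistics.median(with_outlier)
mean_outlier = statistics.mean(with_outlier)

print(median_even, median_outlier, mean_outlier)
```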

### (Arithmetic) Mean

Definition at MathWorld

### Quantile (Percentile)

Definitions at MathWorld, Wikipedia

The threshold value below which the specified percentage of data points falls. For example, if the 90th percentile is 23,104, then 90% of data points are below 23,104.

There are at least nine definitions of “quantile” in common use. CellEngine uses the median-based estimate definition (definition 8 in R and Hyndman and Fan 1996). This definition is continuous (meaning that it interpolates between values), independent of the underlying distribution of the data, and median-unbiased. Because of these and several other qualities, it is the definition recommended by Hyndman and Fan.
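
For readers who want to reproduce the values in a script, here is a minimal pure-Python sketch of Hyndman and Fan's definition 8. It is an independent illustration of the published formula, not CellEngine's own code, and the function name `quantile_type8` is made up for this example. NumPy users can get the same definition with `numpy.quantile(..., method="median_unbiased")` (NumPy 1.22 or later).

```python
import math


def quantile_type8(values, p):
    """Hyndman and Fan definition 8 (median-unbiased) quantile estimate.

    Returns NaN for an empty input, mirroring the NaN behavior described
    for gates that contain no events.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        return math.nan
    # 1-based fractional position in the sorted data.
    h = (n + 1.0 / 3.0) * p + 1.0 / 3.0
    if h <= 1:
        return xs[0]   # clamp to the smallest observation
    if h >= n:
        return xs[-1]  # clamp to the largest observation
    lower = math.floor(h)
    frac = h - lower
    # Linear interpolation between the two neighboring order statistics.
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


print(quantile_type8(list(range(1, 11)), 0.9))
```

With `p = 0.5` the function reduces to the usual median, which is consistent with the median being a special case of a quantile.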

### Geometric Mean

Definition at MathWorld

Note that the geometric mean is undefined for populations that contain any negative values, because the formula takes the nth root of the product of all values, and a root of a negative number can be a complex number. The geometric mean is also reported as undefined for populations that contain any values equal to zero. A geometric mean of zero can be misleading because a single zero (which may be an outlier) in a dataset forces the entire geometric mean to zero.
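
The behavior described above can be sketched as follows. This is an illustrative implementation that computes the geometric mean through logarithms and returns NaN for empty input or for any zero or negative value; the function name is hypothetical and the code is not taken from CellEngine.

```python
import math


def geometric_mean(values):
    """Geometric mean, or NaN if the input is empty or has values <= 0."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        return math.nan
    # exp(mean(log(x))) equals the nth root of the product of the values,
    # but avoids overflow when multiplying many large numbers.
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(geometric_mean([1, 10, 100]))  # approximately 10
print(geometric_mean([0, 5, 20]))    # nan
```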

Tip

The use of geometric means in flow cytometry is largely a holdover from old, analog cytometers that stored data in logarithmic form. The arithmetic mean of log-transformed values is equal to the log of the geometric mean, so it was more convenient to calculate that than convert from log back to linear. Because modern instruments store high-resolution list-mode data, and because the geometric mean cannot be calculated when the dataset contains negative values (as is common with compensated and background-subtracted data), the median is generally a more suitable statistic. In fact, the geometric mean and the median are equal for log-normal distributions, and most biological data is presumed to be log-normal, so in that regard they can be considered interchangeable. See page 235 of Shapiro’s Practical Flow Cytometry for more information.

### Event Count

The event count is the number of events in a population.

(The word event is used instead of cell because flow cytometry may be used to analyze a wide variety of particles, such as virions, bacteria, fungi and beads.)

### Percent of ___

The percent is the number of events in a population divided by the number of events in the specified ancestor population, expressed as a percentage.
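
As a worked example, a population of 250 events inside an ancestor population of 1,000 events is 25% of that ancestor. The sketch below assumes the result is scaled to 0-100 and that an empty ancestor yields NaN; both are assumptions for illustration, and the function name is hypothetical.

```python
import math


def percent_of(population_count, ancestor_count):
    """Events in a population as a percentage of an ancestor population."""
    if ancestor_count == 0:
        # Assumption: treat division by an empty ancestor as undefined.
        return math.nan
    return 100.0 * population_count / ancestor_count


print(percent_of(250, 1000))
```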

### Standard Deviation (StdDev)

Definitions at MathWorld, Wikipedia

CellEngine reports the population standard deviation (as opposed to the sample standard deviation).
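
The difference matters when comparing against other tools: the population form divides by N, while the sample form divides by N - 1 and is therefore slightly larger. A short sketch with Python's standard library, using made-up values:

```python
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population standard deviation (divides by N); the form described above.
population_sd = statistics.pstdev(values)

# Sample standard deviation (divides by N - 1); shown for comparison only.
sample_sd = statistics.stdev(values)

print(population_sd, sample_sd)
```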

### Coefficient of Variation (CV)

Definition at MathWorld

The coefficient of variation is the standard deviation divided by the mean, resulting in a relative variation metric (standard deviation is an absolute variation metric).
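
A minimal sketch of the calculation follows. It assumes the population standard deviation is used and returns the CV as a fraction; whether an export expresses it as a fraction or a percentage is not stated here, so verify against your own data. The function name is hypothetical.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the arithmetic mean."""
    values = list(values)
    if not values:
        return math.nan  # no events, so the statistic is undefined
    mean = statistics.mean(values)
    if mean == 0:
        return math.nan  # assumption: undefined when the mean is zero
    return statistics.pstdev(values) / mean


cv = coefficient_of_variation([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(cv)  # standard deviation 2.0 over mean 5.0
```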