Data Specifications

This document is an example data transfer specification for the app.teiko.bio demo dataset UMN002.

Definitions

Gating: The process of classifying cell events into biologically descriptive cell populations and cell states based on marker expression. This involves drawing boundaries or “gates” over specific cell populations to classify those cell populations’ respective lineages.

Clustering: The process of using unsupervised machine learning to identify cell types defined by their expression of markers without prior knowledge of cell populations.

FCS: Flow Cytometry Standard, a .fcs file format for cytometry data containing mass cytometry acquisition events and their associated channel values. The files provided will be version 3.1. Specifications for this format can be found here.

GatingML2.0: A .gatingml file format used to store a representation of cytometry gates. The files provided will be version 2.0. Specifications for this format can be found here.

Data Description and Usage

Teiko Bio’s analysis produces one FCS file and one GatingML2.0 file per sample. It also generates five results files per analysis, each summarizing a different output of the analysis pipeline. The deliverables are listed below; items 1 to 5 are the results files:

  1. Gated population frequencies by sample
    1. UMN002_gated_population_frequencies_by_sample_YYYY_MM_DD_HH_MM_SS.csv
  2. Gated population cell state frequencies by sample
    1. UMN002_gated_cell_state_by_sample_YYYY_MM_DD_HH_MM_SS.csv
  3. Gated population marker expression by sample
    1. UMN002_gated_marker_expression_by_sample_YYYY_MM_DD_HH_MM_SS.csv
  4. Clustered population frequencies by sample
    1. UMN002_clustered_population_frequencies_YYYY_MM_DD_HH_MM_SS.csv
  5. Clustered population marker expression by sample
    1. UMN002_clustered_marker_expression_YYYY_MM_DD_HH_MM_SS.csv
  6. FCS files. A directory containing one FCS file per sample.
    1. UMN002_fcs_YYYY_MM_DD_HH_MM_SS
      1. <sample_name>.fcs
  7. GatingML files. A directory containing one GatingML2.0 file per sample.
    1. UMN002_gatingml_YYYY_MM_DD_HH_MM_SS
      1. <sample_name>.gatingml
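
The results file names in the list follow a common shape: a project code, a result type, and a YYYY_MM_DD_HH_MM_SS timestamp. The sketch below shows one way a recipient might split a name into those parts. The regular expression and the function name `parse_result_filename` are assumptions inferred from the names listed above, not part of the specification, and the timestamp in the usage note is made up.

```python
import re
from datetime import datetime

# Pattern inferred from the listed names (an assumption, not a published rule):
# <project>_<result type>_YYYY_MM_DD_HH_MM_SS.csv
RESULT_FILE_RE = re.compile(
    r"^(?P<project>[A-Za-z0-9]+)_(?P<kind>[a-z_]+)_"
    r"(?P<stamp>\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2})\.csv$"
)


def parse_result_filename(name):
    """Return (project, result type, timestamp), or None if the name does not match."""
    match = RESULT_FILE_RE.match(name)
    if match is None:
        return None
    generated_at = datetime.strptime(match["stamp"], "%Y_%m_%d_%H_%M_%S")
    return match["project"], match["kind"], generated_at
```

For instance, `parse_result_filename("UMN002_clustered_marker_expression_2024_01_15_09_30_00.csv")` would yield the project code `UMN002`, the result type `clustered_marker_expression`, and a `datetime` for the hypothetical timestamp.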

Each file contains metadata to identify results associated with a particular sample and subject.

For example, if a client sends Teiko Bio four cohorts with nine samples in each cohort (36 samples total), the client will receive 36 FCS files, 36 GatingML files, and the five results files described above.
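
The arithmetic in this example can be sketched as a small helper. The function name `expected_deliverables` is hypothetical, and the default of five results files is taken from the five CSV files enumerated in the list above.

```python
def expected_deliverables(cohorts, samples_per_cohort, results_files=5):
    """Count the files a delivery should contain.

    One FCS file and one GatingML file are produced per sample, while the
    results CSV files are produced once per analysis.
    """
    samples = cohorts * samples_per_cohort
    return {
        "samples": samples,
        "fcs_files": samples,
        "gatingml_files": samples,
        "results_files": results_files,
    }


counts = expected_deliverables(cohorts=4, samples_per_cohort=9)
```

A check like this can be run against a received delivery to confirm that no per-sample file is missing.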

File Structure

Metadata

| Column | Definition | Data Type | Example value(s) |
| --- | --- | --- | --- |
| sample_name | Teiko code associated with the patient sample from a single time point | Alphanumeric | UMN002-037 |
| accession_id | Accession number from external CRO, if applicable | Alphanumeric | 160816P001 |
| subject_name | Code associated with the patient | Alphanumeric | 14020-UMN-6876 |
| metadata_id_1 | Any metadata to associate with the sample. For example, dosage. | tbd | tbd |
|  | Any metadata to associate with the sample. | tbd | tbd |
| metadata_id_n | Any metadata to associate with the sample. | tbd | tbd |

File 0: Customer metadata associated with the project. This file is used for reconciliation across files 1 to 5.
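
As an illustration of that reconciliation, the sketch below attaches the metadata columns from file 0 to the rows of a results file using only the Python standard library. It assumes that `sample_name` in file 0 holds the same Teiko code as `teiko_sample_name` in files 1 to 5 (both show the example value UMN002-037); the specification does not state the join key explicitly, so treat this as an assumption. The function name `attach_metadata` and the `placeholder` metadata value are made up for the example.

```python
import csv
import io

# Tiny inline stand-ins for file 0 and file 1; the metadata value is a placeholder.
metadata_csv = """sample_name,accession_id,subject_name,metadata_id_1
UMN002-037,160816P001,14020-UMN-6876,placeholder
"""
frequencies_csv = """teiko_sample_name,sample_name,subject_name,cell_population,percentage_of_top_level_gate
UMN002-037,160816P001,14020-UMN-6876,B_CELL,1.081485
"""


def attach_metadata(metadata_text, results_text):
    """Add metadata_id_* columns to each results row, matched on the Teiko code."""
    metadata_by_sample = {
        row["sample_name"]: row
        for row in csv.DictReader(io.StringIO(metadata_text))
    }
    merged = []
    for row in csv.DictReader(io.StringIO(results_text)):
        # Assumed join key: file 0 sample_name == results teiko_sample_name.
        meta = metadata_by_sample.get(row["teiko_sample_name"], {})
        extra = {k: v for k, v in meta.items() if k.startswith("metadata_id_")}
        merged.append({**row, **extra})
    return merged


merged_rows = attach_metadata(metadata_csv, frequencies_csv)
```

Rows whose Teiko code is absent from file 0 pass through unchanged, which makes unmatched samples easy to spot afterwards.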

Gated population frequencies by sample

| Column | Definition | Data Type | Example value(s) |
| --- | --- | --- | --- |
| teiko_sample_name | Teiko code associated with the patient sample from a single time point | Alphanumeric | UMN002-037 |
| sample_name | Accession number from external CRO, if applicable | Alphanumeric | 160816P001 |
| subject_name | Code associated with the patient | Alphanumeric | 14020-UMN-6876 |
| top_level_cell_population | Code defining top-level population of the population being reported | Alphanumeric | nonGRAN |
| top_level_cell_population_display_name | Top-level population name in plain language | Alphanumeric | non-Granulocyte |
| cell_population | Code defining population being quantified | Alphanumeric | B_CELL |
| cell_population_display_name | Population name in plain language | Alphanumeric | B Cell |
| cell_population_event_count | Number of events associated with population | Numeric | 2157 |
| top_level_cell_population_event_count | Number of events associated with top-level population | Numeric | 199448 |
| percentage_of_top_level_gate | Percentage of top-level population associated with population | Numeric | 1.081485 |
| low_cell_count | Indicates whether a parent population is below (TRUE) or above (FALSE) the threshold for analysis. The standard threshold is 100 cells. | Boolean (TRUE or FALSE) | FALSE |

File 1: Gated population frequencies by sample, column descriptions
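
The example values in this table are consistent with `percentage_of_top_level_gate` being the population event count divided by the top-level event count, times 100. The sketch below recomputes it from the two example counts; the function name is hypothetical, and the formula is inferred from the column definitions rather than quoted from the pipeline.

```python
def percentage_of_top_level_gate(population_events, top_level_events):
    """Percentage of the top-level population that falls in the gated population."""
    return 100.0 * population_events / top_level_events


# Example counts from the table: 2157 B cell events out of 199448 nonGRAN events.
recomputed = percentage_of_top_level_gate(2157, 199448)

# Agrees with the tabulated example value to six decimal places.
matches_table = abs(recomputed - 1.081485) < 1e-6
```

Recomputing the percentage this way is a quick integrity check after loading the file.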

Gated population cell state frequencies by sample

| Column | Definition | Data Type | Example value(s) |
| --- | --- | --- | --- |
| teiko_sample_name | Teiko code associated with the patient sample from a single time point | Alphanumeric | UMN002-037 |
| sample_name | Accession number from external CRO, if applicable | Alphanumeric | 160816P001 |
| subject_name | Code associated with the patient | Alphanumeric | 14020-UMN-6876 |
| parent_cell_population | Code defining the parent population of the population being reported | Alphanumeric | B_CELL |
| parent_cell_population_display_name | Parent population name in plain language | Alphanumeric | B Cell |
| cell_state | Code defining population being quantified | Alphanumeric | PDL1 |
| cell_state_display_name | Cell state name in plain language | Alphanumeric | PD-L1 |
| cell_state_event_count | Number of events associated with population | Numeric | 1917 |
| parent_cell_population_event_count | Number of events associated with the parent population | Numeric | 2157 |
| percentage_of_parent | Percentage of parent population associated with population | Numeric | 88.873435 |
| low_cell_count | Indicates whether a parent population is below (TRUE) or above (FALSE) the threshold for analysis. The standard threshold is 100 cells. | Boolean (TRUE or FALSE) | FALSE |

File 2: Gated population cell state frequencies by sample, column descriptions.
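
The `low_cell_count` flag can be used to drop rows whose parent population is too small to interpret. The sketch below derives the flag from the parent event count and keeps only reportable rows. It assumes a parent population of exactly 100 events is not flagged, which the definition leaves open; the helper name `is_low_cell_count` is hypothetical, and the second row is made-up data included only to exercise the flag.

```python
LOW_CELL_THRESHOLD = 100  # standard threshold stated in the column definition


def is_low_cell_count(parent_event_count, threshold=LOW_CELL_THRESHOLD):
    """True when the parent population is below the analysis threshold.

    Assumption: a parent population exactly at the threshold is not flagged.
    """
    return parent_event_count < threshold


rows = [
    # Example values from the table above.
    {"cell_state": "PDL1", "cell_state_event_count": 1917,
     "parent_cell_population_event_count": 2157},
    # Made-up row with a small parent population.
    {"cell_state": "PDL1", "cell_state_event_count": 12,
     "parent_cell_population_event_count": 40},
]

for row in rows:
    parent = row["parent_cell_population_event_count"]
    row["percentage_of_parent"] = 100.0 * row["cell_state_event_count"] / parent
    row["low_cell_count"] = is_low_cell_count(parent)

# Keep only rows whose parent population meets the threshold.
reportable = [row for row in rows if not row["low_cell_count"]]
```

Percentages computed on very small parent populations swing widely with a handful of events, which is why filtering on the flag before comparing samples is a sensible default.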

Gated population marker expression by sample

| Column | Definition | Data Type | Example value(s) |
| --- | --- | --- | --- |
| teiko_sample_name | Teiko code associated with the patient sample from a single time point | Alphanumeric | UMN002-037 |
| sample_name | Accession number from external CRO, if applicable | Alphanumeric | 160816P001 |
| subject_name | Code associated with the patient | Alphanumeric | 14020-UMN-6876 |
| parent_cell_population | Code defining parent population of the population being reported | Alphanumeric | B_CELL |
| parent_cell_population_display_name | Parent population name in plain language | Alphanumeric | B Cell |
| cell_state | Code defining population being quantified | Alphanumeric | PDL1 |
| cell_state_display_name | Cell state name in plain language | Alphanumeric | PD-L1 |
| is_state_filter_applied | Indicates whether the population contains all cells (FALSE) or only contains cells positive for a specific state marker (TRUE) | Boolean (TRUE or FALSE) | FALSE |
| <Marker 1 of n> (ex. AREG) | Arcsinh-transformed median channel value of <Marker 1 of n> within the population; if a median channel value was not computed for this marker in this population, the value is reported as <blank> | Numeric | 0.093265 |
| <Marker 2 of n> (ex. CCR7) | Arcsinh-transformed median channel value of <Marker 2 of n> within the population; if a median channel value was not computed for this marker in this population, the value is reported as <blank> | Numeric | 2.995684 |
| [additional markers] |  |  |  |
| <Marker n of n> (ex. VISTA) | Arcsinh-transformed median channel value of <Marker n of n> within the population; if a median channel value was not computed for this marker in this population, the value is reported as <blank> | Numeric | 1.484823 |

File 3: Gated marker expression by sample, column descriptions
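
To make the marker columns concrete, the sketch below shows an arcsinh-transformed median and a parser that maps blank cells to `None`. Two things here are assumptions: the cofactor of 5 (a common choice for mass cytometry, but not stated in this specification) and the order of operations (the transform is applied to the median of the raw channel values). The function names are hypothetical.

```python
import math
from statistics import median

COFACTOR = 5.0  # assumed; the specification does not state the cofactor used


def arcsinh_median(channel_values, cofactor=COFACTOR):
    """Arcsinh-transform the median of the raw channel values."""
    return math.asinh(median(channel_values) / cofactor)


def parse_marker_value(cell):
    """Convert a marker cell to float; a blank cell means no median was computed."""
    text = cell.strip()
    return float(text) if text else None


def to_raw_scale(transformed_value, cofactor=COFACTOR):
    """Invert the transform to recover the median on the raw channel scale."""
    return math.sinh(transformed_value) * cofactor
```

Treating blanks as `None` rather than 0 matters because 0 is a legitimate transformed value, while a blank means the median was not computed.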

Clustered population frequencies by sample

| Column | Definition | Data Type | Example value(s) |
| --- | --- | --- | --- |
| teiko_sample_name | Teiko code associated with the patient sample from a single time point | Alphanumeric | UMN002-037 |
| sample_name | Accession number from external CRO, if applicable | Alphanumeric | 160816P001 |
| subject_name | Code associated with the patient | Alphanumeric | 14020-UMN-6876 |
| top_level_cell_population | Code defining top-level population of the population being reported | Alphanumeric | nonGRAN |
| top_level_cell_population_display_name | Top-level population name in plain language | Alphanumeric | non-Granulocyte |
| cell_population | Code defining population being quantified | Alphanumeric | MONO.0 |
| percentage_of_top_level_gate | Percentage of top-level population associated with population | Numeric | 1.8132 |
| cell_population_event_count | Number of events associated with population | Numeric | 3611 |

File 4: Clustered population frequencies by sample, column descriptions
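
Clustered population codes in the examples (MONO.0 here, B_MEM.38 in the clustered marker expression file) look like a population label followed by a dot and a number. The sketch below splits a code on that basis. Reading the suffix as a cluster index is an assumption based only on these two example values, and `split_cluster_code` is a hypothetical helper.

```python
def split_cluster_code(cell_population):
    """Split a code such as 'MONO.0' into ('MONO', 0).

    Codes without a numeric suffix are returned unchanged with None as the index.
    """
    label, separator, index = cell_population.rpartition(".")
    if not separator or not index.isdigit():
        return cell_population, None
    return label, int(index)
```

Grouping clusters by the label part gives a rough way to compare clustered results against the gated populations of the same name.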

Clustered population marker expression by sample

| Column | Definition | Data Type | Example value(s) |
| --- | --- | --- | --- |
| teiko_sample_name | Teiko code associated with the patient sample from a single time point | Alphanumeric | UMN002-037 |
| sample_name | Accession number from external CRO, if applicable | Alphanumeric | 160816P001 |
| subject_name | Code associated with the patient | Alphanumeric | 14020-UMN-6876 |
| cell_population | Code defining clustered cell population | Alphanumeric | B_MEM.38 |
| <Marker 1 of n> (ex. AREG) | Arcsinh-transformed median channel value of <Marker 1 of n> within the population; if a median channel value was not computed for this marker in this population, the value is reported as <blank> | Numeric | 0.00000 |
| <Marker 2 of n> (ex. CCR7) | Arcsinh-transformed median channel value of <Marker 2 of n> within the population; if a median channel value was not computed for this marker in this population, the value is reported as <blank> | Numeric | 2.665345 |
| [additional markers] |  |  |  |
| <Marker n of n> (ex. VISTA) | Arcsinh-transformed median channel value of <Marker n of n> within the population; if a median channel value was not computed for this marker in this population, the value is reported as <blank> | Numeric | 1.446879 |

File 5: Clustered population marker expression by sample, column descriptions

Glossary of Gated Populations (to change depending on project)

| Population Code (cell_population) | Population Name | Definition (Marker Parameters) |
| --- | --- | --- |
| LIVE | Live | DNA+ Event Length below 40 Cisplatin- |
| LEUKOCYTE | Total Leukocyte | DNA+ Event Length below 40 Cisplatin- CD61- CD235AB- |
| B_CELL | B Cell | nonGRAN CD3- CD19+ CD14- CD56- |
| B_MEM | B Memory | nonGRAN CD3- CD19+ CD14- CD56- CD38- CD27+ |
| B_NAIVE | B Naive | nonGRAN CD3- CD19+ CD14- CD56- CD38+ CD27- |
| PB | Plasmablast | nonGRAN CD3- CD19+ CD14- CD56- CD38hi CD27hi |
| CD4_T | CD4+ T |  |
| CD4_TCM | CD4+ T Central Memory |  |
| CD4_TEM | CD4+ T Effector Memory |  |
| CD4_TEMRA | CD4+ TEMRA |  |
| CD4_TNAIVE | CD4+ T Naive |  |
| TREG | Treg |  |
| CD8_T | CD8+ T |  |
| CD8_TCM | CD8+ T Central Memory |  |
| CD8_TEM | CD8+ T Effector Memory |  |
| CD8_TEMRA | CD8+ TEMRA |  |
| CD8_TNAIVE | CD8+ T Naive |  |
| DC | Dendritic Cell |  |
| cDC | Classical DC |  |
| pDC | Plasmacytoid DC |  |
| transDC | Transitional DC |  |
| MONO | Monocyte |  |
| cMONO | Classical Monocyte |  |
| inMONO | Intermediate Monocyte |  |
| ncMONO | Non-classical Monocyte |  |
| NK | Natural Killer |  |
| CD16_NK | Cytolytic NK |  |
| CD16neg_NK | Non-cytolytic NK |  |
| CD56hi_NK | Cytokine-producing NK |  |
| DNT | Double-negative T |  |
| DPT | Double-positive T |  |
| GDT | Gamma-delta T |  |
| NKT | NKT |  |
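
The marker definitions for the B cell populations follow a compact notation: an optional parent code (nonGRAN) followed by markers suffixed with `+`, `-`, or `hi`. The sketch below parses that notation into a parent and a marker-to-level mapping. It is a minimal illustration that covers only definitions of this shape; free-text terms such as "Event Length below 40" in the LIVE and LEUKOCYTE rows are outside its scope. The function name and the regular expression are assumptions, not part of the specification.

```python
import re

# A marker token is a name followed by one of the levels seen in the glossary.
TOKEN_RE = re.compile(r"^(?P<marker>[A-Za-z0-9]+?)(?P<level>\+|-|hi)$")


def parse_gate_definition(definition):
    """Return (parent code or None, {marker: level}) for a definition string."""
    parent = None
    markers = {}
    for token in definition.split():
        match = TOKEN_RE.match(token)
        if match:
            markers[match["marker"]] = match["level"]
        elif parent is None:
            # First token without a level suffix is taken to be the parent code.
            parent = token
    return parent, markers


parent, markers = parse_gate_definition(
    "nonGRAN CD3- CD19+ CD14- CD56- CD38hi CD27hi"  # Plasmablast (PB) row
)
```

With definitions in this form, two populations can be compared marker by marker, for example to see that B_MEM and B_NAIVE differ only in CD38 and CD27.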

Glossary of State Markers (to change depending on project)

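| State Code (state_marker) | State Name | Analysis of State Frequency | Analysis of State Marker Expression |
| --- | --- | --- | --- |
| TIM3 | TIM-3 | TRUE | TRUE |
| PDL1 | PD-L1 | TRUE | TRUE |
| TCF1 | TCF-1 | TRUE | TRUE |
| TBET | TBET | TRUE | TRUE |
| CTLA4 | CTLA-4 | TRUE | TRUE |
| KI67 | Ki67 | TRUE | TRUE |
| TIGIT | TIGIT | TRUE | TRUE |
| CD38 | CD38 | TRUE | TRUE |
| PD1 | PD-1 | TRUE | TRUE |
| LAG3 | LAG3 | TRUE | TRUE |
| CCR7 | CCR7 | TRUE | TRUE |
| HLADR | HLA-DR | TRUE | TRUE |
| CD11B | CD11B | TRUE | TRUE |
| CD25 | CD25 | TRUE | TRUE |
| CD38_HLADR | CD38+HLA-DR+ | TRUE | FALSE |
| LOX1 | LOX-1 | TRUE | TRUE |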
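
The glossary above can be held as a small lookup table to decide which state codes to expect in each kind of analysis. The sketch below transcribes the rows into a dictionary and lists the states flagged for marker expression analysis; the variable names are hypothetical, and since the glossary may change per project, the values should be regenerated from the delivered specification rather than hard-coded.

```python
# state code -> (display name, analyzed for frequency, analyzed for marker expression)
STATE_MARKERS = {
    "TIM3": ("TIM-3", True, True),
    "PDL1": ("PD-L1", True, True),
    "TCF1": ("TCF-1", True, True),
    "TBET": ("TBET", True, True),
    "CTLA4": ("CTLA-4", True, True),
    "KI67": ("Ki67", True, True),
    "TIGIT": ("TIGIT", True, True),
    "CD38": ("CD38", True, True),
    "PD1": ("PD-1", True, True),
    "LAG3": ("LAG3", True, True),
    "CCR7": ("CCR7", True, True),
    "HLADR": ("HLA-DR", True, True),
    "CD11B": ("CD11B", True, True),
    "CD25": ("CD25", True, True),
    "CD38_HLADR": ("CD38+HLA-DR+", True, False),
    "LOX1": ("LOX-1", True, True),
}

frequency_states = [code for code, (_, freq, _) in STATE_MARKERS.items() if freq]
expression_states = [code for code, (_, _, expr) in STATE_MARKERS.items() if expr]
```

In this glossary the combined state CD38_HLADR is the only one analyzed for frequency but not for marker expression.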