About

Data Portal Overview

The mapMECFS website serves as the omics data sharing portal for Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) research, created as part of the NIH-funded ME/CFS Network. This website enables researchers to gain a broader view of ME/CFS by:

  • Bringing together the data that researchers have collected across the multiple systems affected by ME/CFS
  • Providing a dynamic navigation portal to search across these domains
  • Facilitating the integration of complementary data types to offer a new, more complete picture of the disorder

Mission

Our mission is to help ME/CFS researchers discover new insights about the disorder, promote data sharing between experts, and present a comprehensive picture of the hallmarks of this disorder. We hope these efforts help millions of people suffering from ME/CFS by enabling a faster path to better diagnostics and treatments.

Registration Process

The mapMECFS data portal is open to any researcher who will use the data for research purposes only and is able to comply with the Data Use Agreement (DUA). New users must submit the registration form, including a brief description of how they would like to use the system, and agree to the mapMECFS DUA terms.

The registration form will be sent to the NIH for review and you will be notified when your account has been approved. The approval process should be quick (less than 2 days) and you will be notified of any delays.

While your account is pending approval, you will be able to log into the system, but you will not have access to any data. Once approved, a user-specific Organization will be created. A written email request is required to join an existing Organization or to add other mapMECFS-approved site users to your Organization (enabling sharing of private datasets). Please see more information here.

Please email mapmecfs@rti.org if you need assistance with registering.

Learn More About ME/CFS

Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) is characterized by debilitating fatigue that is worsened by physical or mental activity and does not subside with normal levels of rest. For more information about ME/CFS, see the CDC definition of the disease or the ME/CFS Network’s FAQ page.

Contact Information

mapMECFS is hosted by RTI International and supports data sharing for all organizations in the ME/CFS Network and other ME/CFS researchers. If you find an issue with the website, wish to request that specific features be implemented in the future, or have a question about how to use the portal, please email mapmecfs@rti.org.

Data Use Agreement (DUA)

mapMECFS site users must follow the data use agreement that was agreed to during registration. Please see a copy here. If you become aware of a violation of the terms, you must notify site administrators by emailing mapmecfs@rti.org with a description of the violation. User accounts may be suspended while the incident is reviewed.

Privacy Policy

mapMECFS site users must agree to the privacy policy during registration and on login. Please see a copy here.

Ongoing Curation

The MECFSnet DMCC prioritizes, curates, quality controls, and shares publicly available data from recent manuscripts. The DMCC prioritizes manuscripts and databases (e.g., Gene Expression Omnibus [GEO], Metabolomics Workbench, and MetaboLights) that are the most recent, are open access, and make their data available. The DMCC curates metadata, results tables, and supplemental files into a mapMECFS dataset. An independent team member then performs quality control on the dataset. The DMCC aims to share 15 datasets per quarter, including more than 100 result files.

API

mapMECFS has an API to programmatically access, upload, and browse data and metadata. Example API queries using curl, Python, and R are available on GitHub.
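As a rough illustration of what a programmatic query might look like, the sketch below builds (but does not send) a dataset search request. It assumes the portal exposes a CKAN-style action endpoint (/api/3/action/package_search) and accepts an API token in the Authorization header. Neither is stated on this page, and the base URL and token shown are placeholders, so check the published example queries for the actual details.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholders, not real values: substitute the portal address and your own token.
BASE_URL = "https://mapmecfs.example.org"
API_TOKEN = "YOUR-API-TOKEN"


def build_search_request(query, rows=10, base_url=BASE_URL, token=API_TOKEN):
    """Build (without sending) a keyword search request for datasets.

    Assumption: a CKAN-style action API with a package_search endpoint.
    """
    params = urlencode({"q": query, "rows": rows})
    url = f"{base_url}/api/3/action/package_search?{params}"
    return Request(url, headers={"Authorization": token})


request = build_search_request("metabolomics", rows=5)
print(request.full_url)
```

To actually run the query, the request object could be passed to urllib.request.urlopen and the JSON response decoded with the json module.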

Citation

Please cite the original authors when using data on mapMECFS as indicated within each dataset. The use of mapMECFS should also be referenced using the following citation:
Mathur, R.* & Carnes, M.U.*, et al. mapMECFS: a portal to enhance data discovery across biological disciplines and collaborative sites. J Transl Med 19, 461 (2021). https://doi.org/10.1186/s12967-021-03127-3

*contributed equally and are designated co-first authors

Code for the custom extensions implemented in mapMECFS is available for advanced authentication, search terms, and summary statistics.


A diagram describing the features of mapMECFS

Organization – A group of users who belong to the same institution, research center, or individual research lab. Users must be associated with an Organization to upload datasets. A written email request to mapmecfs@rti.org will be required to add other mapMECFS-approved site users to your Organization.

Dataset – A dataset is a collection of resources (such as data files, phenotype files, result files, supporting files, or website links) with a description and study-level metadata. A dataset will generally contain one data file, one phenotype file, and an unlimited number of supporting files. During upload, users must select either public or private visibility.

Public vs Private Visibility – Datasets are designated as either “public” or “private” when uploaded:

  • Public data are available to all mapMECFS site users
  • Private data are only available to users in your Organization

No datasets will be visible to individuals until they register for an account and are approved by their designated Organization. Click here for more information about how to register.

Group – Groups are collections of datasets that are all related to a common study, cohort, experiment, or publication. For example, a group can connect a collection of datasets containing experimental results from multiple assays that were all run on samples derived from the same set of study participants. Datasets in a group may originate from different Organizations or the same Organization.

Note: Public and private visibility settings of the data still apply when a dataset is part of a group.

Account Roles


Each user account has a defined account role. These roles are:

Organization Administrator
  • View public and private datasets within their Organization
  • Create datasets
  • Edit any dataset within their Organization
  • Edit member roles within their Organization
Editor
  • View public and private datasets within their Organization
  • Create datasets
  • Edit any dataset they created
Member
  • View public and private datasets within their Organization
To view your role in existing Organizations, select "My Organizations" from your dashboard or click here. If you have questions about your account, please email mapmecfs@rti.org.
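The visibility rules and account roles described above can be summarized as a small permission check. This is only a sketch of the rules as written on this page, not the portal's implementation. The class and function names are hypothetical, and the user is assumed to already be an approved site user.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    organization: str
    creator: str
    public: bool = False  # uploaded datasets are private by default


@dataclass
class User:
    name: str
    roles: dict  # Organization name -> "admin", "editor", or "member"


def can_view(user, dataset):
    """Public data: any approved user. Private data: Organization members only."""
    return dataset.public or dataset.organization in user.roles


def can_edit(user, dataset):
    """Administrators edit any dataset in their Organization; Editors only their own."""
    role = user.roles.get(dataset.organization)
    if role == "admin":
        return True
    if role == "editor":
        return dataset.creator == user.name
    return False


private_set = Dataset("cytokines", organization="Lab A", creator="ana")
ana = User("ana", {"Lab A": "editor"})
ben = User("ben", {"Lab A": "member"})
cat = User("cat", {"Lab B": "admin"})

print(can_view(ben, private_set), can_edit(ben, private_set))  # member: view only
print(can_view(cat, private_set))  # other Organization: private data is hidden
```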

Upload

The mapMECFS upload process is designed to be flexible. Users can upload data files in a variety of file types (TSV, TXT, PDF, etc.) or hyperlinks. Key study-level metadata are collected during upload by a simple drop-down menu.

Files should be uploaded in the expected format and a dataset should contain the following:

  • One Data File containing sample-level values as columns and molecules as rows. For example:
    • Gene expression counts
    • Methylation signal intensities
    • Metabolomics MS peak height
    • Others
      • Note: An exception is that the data type Demographic, Health, and Survey (DHS) does not support a Data File format. Instead, DHS data are uploaded as a Phenotype File, as shown in the expected format example. The DHS data type is best defined as data collected from study participants that are not derived from biological samples such as blood or saliva.
  • One Phenotype File containing subject-level clinical values, such as case-control status (“Phenotype”), age, sex, any relevant covariates, etc.
  • As many Results Files as needed, including summary statistics for your own analysis on molecules with at least a p-value and adjusted p-value reported, as shown in this example.
  • As many Supporting Documents as needed, such as
    • An SOP form (see this template) describing the dataset generation in more detail. Completing the SOP form is strongly recommended, as users can better understand the experimental conditions under which the Results Files were generated, as shown in this example.
    • A link to existing publication(s)
    • A Data Dictionary File containing information about elements in the Phenotype and/or Data Files. Please refer to the provided example (page 9), which outlines the suggested column names. Acceptable file extensions for a formatted Data Dictionary include .txt, .tsv, .csv, and .xlsx. A Data Dictionary that does not meet the mapMECFS formatting requirements should still be uploaded as a Supporting Document, but it may not interact with mapMECFS search features.
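Before uploading, it can help to check that a Data File and Phenotype File agree with each other. The sketch below is one possible pre-upload check, not an official validator. The column names Molecule, ParticipantID, and Phenotype come from the tagging requirements on this page. The toy file contents, and the idea that sample column headers match ParticipantID values, are assumptions made for illustration.

```python
import csv
import io

# Toy tab-separated contents: molecules as rows, samples as columns.
DATA_TSV = "Molecule\tP001\tP002\tP003\nIL17A\t5.1\t4.8\t6.0\nTNF\t2.2\t2.9\t2.4\n"
PHENO_TSV = (
    "ParticipantID\tPhenotype\tAge\tSex\n"
    "P001\tCase\t34\tF\nP002\tControl\t41\tM\nP003\tCase\t29\tF\n"
)


def read_tsv(text):
    """Return (column names, rows as dictionaries) for tab-separated text."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader.fieldnames or []), list(reader)


def check_dataset(data_text, pheno_text):
    """Return a list of problems; an empty list means the files look consistent."""
    problems = []
    data_cols, _ = read_tsv(data_text)
    pheno_cols, pheno_rows = read_tsv(pheno_text)

    if "Molecule" not in data_cols:
        problems.append("Data File has no 'Molecule' column")
    for required in ("ParticipantID", "Phenotype"):
        if required not in pheno_cols:
            problems.append(f"Phenotype File has no '{required}' column")

    # Assumption: every non-Molecule column in the Data File is a sample whose
    # header matches a ParticipantID value in the Phenotype File.
    participants = {row.get("ParticipantID") for row in pheno_rows}
    for sample in data_cols:
        if sample != "Molecule" and sample not in participants:
            problems.append(f"sample '{sample}' has no row in the Phenotype File")
    return problems


print(check_dataset(DATA_TSV, PHENO_TSV))
```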

During upload, Data and Phenotype Files in the expected format will be processed for tagging and summary statistics will be calculated. Results Files will only be processed for tagging when submitted in the expected format. Tagging and calculated summary statistics are not currently available for DHS datasets.

How to share data: Moving a dataset from Private to Public

We encourage researchers to share datasets with other approved mapMECFS site users when the corresponding data is part of a manuscript accepted for publication by making their datasets “public” (viewable to all approved site users). By default, all uploaded datasets will be “private” (only viewable to the uploader and other users registered with the same organization as the uploader).

To change a dataset from private to public, open the dataset you wish to share and select Manage > Set Visibility to Public. This will notify the mapMECFS site administration of the request, who will review the dataset for breaches of the Data Use Agreement (DUA), non-scientific content, and any personally identifiable information (PII).

Note: It is the uploader’s full responsibility to ensure none of these are present, that participant privacy is fully protected, and that sharing is compliant with all other governing policies (e.g., IRB-approved protocols, embargos, etc.). If the specific DUA changes (e.g., retroactive changes in the study’s approved IRB protocols), it is the uploader’s responsibility to remove the data or request help from mapMECFS site administrators by emailing mapmecfs@rti.org.

Data uploaded to mapMECFS should never contain PII. Users are encouraged to review the U.S. Department of Health and Human Services Safe Harbor Method for more information. In addition, we recommend including only metadata relevant to the study and, in datasets with few participants (e.g., <30), binning variables (e.g., age 20-30 years) so that there are always more than three participants in a group no matter how the data is stratified.
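The binning recommendation can be checked before upload with a few lines of code. The sketch below bins exact ages into ranges and reports any stratum with three or fewer participants. The function names, the ten-year non-overlapping bins, and the toy records are illustrative choices, not portal requirements.

```python
from collections import Counter


def age_bin(age, width=10):
    """Map an exact age to a non-overlapping range label, e.g. 27 -> '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def small_groups(records, keys, minimum=4):
    """Return strata with fewer than `minimum` participants (i.e. three or fewer)."""
    counts = Counter(tuple(record[key] for key in keys) for record in records)
    return {group: n for group, n in counts.items() if n < minimum}


# Toy phenotype records (invented values).
raw = [(21, "Case"), (24, "Case"), (27, "Case"), (29, "Case"),
       (33, "Control"), (35, "Control"), (38, "Control"), (62, "Control")]
binned = [{"age_bin": age_bin(age), "phenotype": status} for age, status in raw]

# Strata that would still need wider bins or fewer stratifying variables.
print(small_groups(binned, ["age_bin", "phenotype"]))
```

Any stratum reported here suggests that wider bins, or fewer stratifying variables, are needed before the file is shared.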

mapMECFS does not support the upload of SNP-level genotypes or raw sequencing data; instead, we recommend submitting the data to dbGaP or an alternative database. To make these data findable, users can create a dataset within mapMECFS describing the study with a link to the website containing the raw data and/or study accession IDs.

The mapMECFS Search function is designed to recognize user-specified terms describing multiple aspects of a Dataset. This flexibility allows users to enter keywords that identify:

  • Experiments involving a specific sample type (e.g., “blood” or “PBMCs”)
  • Results of a particular experimental class (e.g., “microbiome,” “metabolomics,” or “RNAseq”)
  • Studies with participants that share a specific ME/CFS case definition (e.g., “Fukuda”)
  • Data files containing specific common analytes (e.g., “glucose”, “IL17”, or “EBI2”)

Note regarding synonymous terms: Because different labs apply different conventions for annotating data, it is necessary to account for synonymous terminology describing common data features. For each dataset uploaded to mapMECFS, all features (including synonymous terms) of recognized data types are automatically tagged upon upload.

By tagging both the indicated annotation and all recognized synonyms, the system can more readily match the keywords entered by the user with the data type they seek. For example, if you are interested in the cytokine interleukin 17 analyte, simply type “IL-17” or “IL-17A” and the search will return all datasets containing the desired analyte (regardless of the convention used by the reporting lab) including data from gene expression assays or cytokine screens.

Uploaded data files are processed for tagging. This tagging process expands the search space to include the contents in the original files, molecule synonyms, and related identifiers. Tagging will only work on recognized data types; however, you can contact the mapMECFS team to request a new data type.
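The effect of synonym tagging on search can be illustrated with a toy example. The sketch below is not the portal's tagging code. The synonym table is a small hand-written stand-in for the reference databases, and the dataset names and contents are invented.

```python
# Hand-written stand-in for a synonym database (illustrative only).
SYNONYMS = {
    "IL17A": {"IL17", "IL-17", "IL-17A", "CTLA8"},
}


def normalize(term):
    """Compare terms case-insensitively and ignore surrounding whitespace."""
    return term.strip().upper()


def tags_for(molecules, synonyms=SYNONYMS):
    """Tag each molecule with its own name plus every recognized synonym."""
    tags = set()
    for molecule in molecules:
        key = normalize(molecule)
        tags.add(key)
        for canonical, aliases in synonyms.items():
            names = {normalize(canonical)} | {normalize(alias) for alias in aliases}
            if key in names:
                tags |= names
    return tags


def search(term, datasets):
    """Return the names of datasets whose tags contain the search term."""
    wanted = normalize(term)
    return [name for name, molecules in datasets.items() if wanted in tags_for(molecules)]


datasets = {
    "cytokine_screen": ["IL-17A", "TNF"],  # one lab's naming convention
    "rnaseq": ["IL17A", "GAPDH"],          # another lab's convention
    "metabolites": ["glucose"],
}
print(search("IL-17", datasets))
```

Because both labs' labels expand to the same set of tags, a single keyword finds both datasets regardless of the naming convention each lab used.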

Recognized Data Types for Tagging

Tagging allows users to search the contents in the original source file as well as an expanded search of relevant databases (see table below). The recognized data types for tagging include:

  • Cytokine Assay
  • Demographic, Health, and Survey (DHS)
  • Gene Expression
  • Metabolomics
  • Methylation
  • miRNA
  • Proteomics

NOTE: Upload is not restricted to these data types; any data files can be uploaded to mapMECFS. Tags will be cleaned annually by the DMCC for consistency across studies and to eliminate redundancy.

| Data Type | Required Data Column(s) | Database used for Tagging | What is Searchable? | Example Searches |
| --- | --- | --- | --- | --- |
| Cytokine Assay | Molecule | NCBI Gene (December 2021) | Any entry from the Molecule data column; matching gene synonyms | |
| Demographic, Health, and Survey | ParticipantID; Phenotype | N/A | Any entry from the ParticipantID data column | |
| Gene Expression | Molecule | NCBI Gene (December 2021) | Any entry from the Molecule data column; matching gene synonyms | |
| Metabolomics | InChiKey; Molecule; database_identifier | N/A | User input from any of the three required columns | |
| Methylation | Molecule | Illumina 450K (v.15017482_v1-2) or Infinium MethylationEPIC (v-1-0-b4). Please email mapmecfs@rti.org if another manifest file is needed. | Any entry from the Molecule data column; corresponding B37 coordinates (Chr:Pos) | |
| miRNA | Molecule | miRBase (March 2019) | Any entry from the Molecule data column; any miRNA related to the primary transcript; any matching alias | |
| Other | Molecule | N/A | Any entry from the Molecule data column | |
| Proteomics | Molecule | NCBI Gene (December 2021) | Any entry from the Molecule data column; matching protein/gene synonyms | |

For recognized Data Types, mapMECFS generates a Summary Statistics file to characterize how dataset measures compare between phenotype groups as annotated in the uploaded Phenotype File. A nonparametric Wilcoxon rank-sum test is used to assess how dataset features differ between groups within the study (e.g., between cases and controls). Summary statistics are automatically calculated for each feature in the uploaded gene expression, cytokine assay, metabolomics, miRNA, or methylation Data Files when a correctly formatted Phenotype File is uploaded to mapMECFS. Please note that summary statistics are processed asynchronously to avoid impacting load times. Because the calculations run in the background, the results may not be available immediately after upload.

Once calculations are complete, one can view the resulting Summary Stats file by opening the dataset of interest and scrolling to Summary Statistics > View Summary Statistics.

Summary columns in this file include:

  • Sample sizes in each group (labeled as “count”)
  • Median value for each group
  • Standard deviation
  • Wilcoxon rank-sum test statistic (labeled as "Ranksum stat")
  • Wilcoxon rank-sum p-value (labeled as "Ranksum p-value")
  • Wilcoxon rank-sum Bonferroni Corrected p-value (labeled as "Ranksum Bonf")
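To make these columns concrete, the sketch below computes them for a single feature using only the Python standard library. It is a minimal illustration rather than the portal's code. The page names the Wilcoxon rank-sum test but not the variant, so the normal approximation without tie correction used here is an assumption, as is reporting the median and standard deviation per group. The input values are invented.

```python
import math
import statistics


def average_ranks(values):
    """Rank values starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def ranksum(x, y):
    """Two-sided rank-sum test via the normal approximation (an assumed variant)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                       # rank sum of the first group
    expected = n1 * (n1 + n2 + 1) / 2         # mean of w under the null
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal tail probability
    return z, p


def summarize(cases, controls, n_features):
    """Build one summary row; n_features is the number of features tested."""
    stat, p = ranksum(cases, controls)
    return {
        "count": (len(cases), len(controls)),
        "median": (statistics.median(cases), statistics.median(controls)),
        "stdev": (statistics.stdev(cases), statistics.stdev(controls)),
        "Ranksum stat": stat,
        "Ranksum p-value": p,
        "Ranksum Bonf": min(1.0, p * n_features),  # Bonferroni correction
    }


cases = [5.1, 6.0, 5.8, 6.3]
controls = [4.8, 4.2, 4.9, 4.5]
row = summarize(cases, controls, n_features=3)
print(row)
```

The Bonferroni-corrected value multiplies each p-value by the number of features tested and caps the result at 1, which is why it depends on how many features the Data File contains.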

The Results File Explorer tool, available under the EXPLORER tab, lets users search for and view specific molecules compiled across all uploaded datasets. Dataset privacy settings are maintained, so only public datasets and the private datasets a user has access to will be displayed. Using this tool, users can quickly evaluate the robustness of a given result across multiple studies and identify datasets for subsequent integrated analyses.

The results contain three separate tables:

  • Data Files and Calculated Summary Statistics contains search results only from the mapMECFS-calculated summary statistics.
  • Results Files contains search results only from the user-uploaded results files.
  • Other contains search results from other elements of the dataset, including the title, description, and metadata.