Data Portal Overview
The mapMECFS website serves as the omics data sharing portal for Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) research, created as part of the NIH-funded ME/CFS Network. This website enables researchers to gain a broader view of ME/CFS by:
- Bringing together the data that researchers have collected across the multiple systems affected by ME/CFS
- Providing a dynamic navigation portal to search across these domains
- Facilitating the integration of complementary data types to offer a new, more complete picture of the disorder
Our mission is to help ME/CFS researchers discover new insights about the disorder, promote data sharing between experts, and present a comprehensive picture of the hallmarks of this disorder. We hope these efforts help millions of people suffering from ME/CFS by enabling a faster path to better diagnostics and treatments.
The mapMECFS data portal is open to any researcher who will use the data for research purposes only and is able to comply with the Data Use Agreement (DUA). New users must submit the registration form, including a brief description of how they would like to use the system, and agree to the mapMECFS DUA terms.
The registration form will be sent to the NIH for review, and you will be notified when your account has been approved. The approval process should be quick (typically less than 2 days), and you will be notified of any delays.
While your account is pending approval, you will be able to log into the system, but you will not have access to any data. Once approved, a user-specific Organization will be created. A written email request is required to join an existing Organization or to add other mapMECFS-approved site users to your Organization (enabling sharing of private datasets). Please see more information here.
Please email email@example.com if you need assistance with registering.
Learn More About ME/CFS
Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) is characterized by debilitating fatigue that is worsened by physical or mental activity and does not subside with normal levels of rest. For more information about ME/CFS, see the CDC definition of the disease or the ME/CFS Network’s FAQ page.
mapMECFS is hosted by RTI International and supports data sharing for all organizations in the ME/CFS Network and for other ME/CFS researchers. If you find an issue with the website, wish to request a new feature, or have a question about how to use the portal, please email firstname.lastname@example.org.
Data Use Agreement (DUA)
mapMECFS site users must follow the data use agreement that was agreed to during registration. Please see a copy here. If you become aware of a violation of the terms, you must notify site administrators by emailing email@example.com with a description of the violation. User accounts may be suspended while the incident is reviewed.
The MECFSnet DMCC prioritizes, curates, quality-controls, and shares publicly available data from recent manuscripts. It prioritizes the most recent open-access manuscripts with available data, along with data deposited in databases (e.g., Gene Expression Omnibus [GEO], Metabolomics Workbench, and MetaboLights). The DMCC curates the metadata, results tables, and supplemental files into a mapMECFS dataset. An independent team member then performs quality control on the dataset. The DMCC aims to share 15 datasets per quarter, comprising more than 100 results files.
mapMECFS has an API for programmatically accessing, uploading, and browsing data and metadata. Example API queries in curl, Python, and R are available on GitHub.
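mapMECFS uses the same vocabulary (Organizations, Datasets, Groups, resources) as the open-source CKAN data portal platform. If its API follows CKAN's Action API conventions, a keyword search might be built as shown below. This is a hedged sketch and the official GitHub examples remain the reference. Several CKAN conventions are assumptions here: the base URL, the `/api/3/action/package_search` path, the `q`/`rows` parameters, and sending the API token in the `Authorization` header.

```python
import json
import urllib.parse
import urllib.request

# Assumption: portal base URL; verify against the official GitHub examples.
BASE_URL = "https://www.mapmecfs.org"


def build_search_request(query, rows=10, api_token=None, base_url=BASE_URL):
    """Build a keyword search request using the (assumed) CKAN package_search action."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    url = f"{base_url}/api/3/action/package_search?{params}"
    headers = {}
    if api_token:
        # Assumption: CKAN-style API token passed in the Authorization header.
        headers["Authorization"] = api_token
    return urllib.request.Request(url, headers=headers)


def dataset_titles(response_body):
    """Extract dataset titles (falling back to names) from a CKAN-style JSON response."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        raise RuntimeError(f"API call failed: {payload.get('error')}")
    return [ds.get("title") or ds.get("name") for ds in payload["result"]["results"]]
```

To run a search, pass the request to `urllib.request.urlopen(...)` and give the response body to `dataset_titles`. Supplying a valid token is presumably what lets private datasets from your own Organization appear alongside public ones.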
When using data or features on mapMECFS, please cite the mapMECFS manuscript in addition to the original publication.
Website Structure and Terminology
Organization – A group of users who belong to the same institution, research center, or individual research lab. Users must be associated with an Organization to upload datasets. A written email request to firstname.lastname@example.org will be required to add other mapMECFS-approved site users to your Organization.
Dataset – A dataset is a collection of resources (such as data files, phenotype files, result files, supporting files, or website links) with a description and study-level metadata. A dataset will generally contain one data file, one phenotype file, and an unlimited number of supporting files. During upload, users must select either public or private visibility.
Public vs Private Visibility – Datasets are designated as either “public” or “private” when uploaded:
- Public data are available to all mapMECFS site users
- Private data are only available to users in your Organization
Group – Groups are collections of datasets that are all related to a common study, cohort, experiment, or publication. For example, a group might connect datasets containing experimental results from multiple assays that were all run on samples derived from the same set of study participants. Datasets in a group may originate from the same Organization or from different Organizations.
Note: Public and private visibility settings of the data still apply when a dataset is part of a group.
Each user account has one of three defined account roles. From most to least permissive, the roles allow a user to:
- View public and private datasets within their Organization, create datasets, edit any dataset within their Organization, and edit member roles within their Organization
- View public and private datasets within their Organization, create datasets, and edit any dataset they created
- View public and private datasets within their Organization
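The visibility rules and the three account roles can be condensed into a small permission check. This is a minimal sketch, not the portal's implementation. The role identifiers (`ORG_MANAGER`, `CREATOR`, `VIEWER`) are hypothetical placeholders, because the role names are not given above. The `User` and `Dataset` records are also hypothetical, as is the assumption that editing rights apply only within the user's own Organization. Group membership is deliberately ignored because visibility settings still apply when a dataset is part of a group.

```python
from dataclasses import dataclass

# Hypothetical role identifiers, ordered from most to least permissive.
ORG_MANAGER, CREATOR, VIEWER = "org_manager", "creator", "viewer"


@dataclass
class User:
    name: str
    organization: str
    role: str


@dataclass
class Dataset:
    name: str
    organization: str
    creator: str
    private: bool = True  # uploads start out private


def can_view(user, dataset):
    """Public datasets are visible to every approved user; private ones only within the Organization."""
    return not dataset.private or user.organization == dataset.organization


def can_create(user):
    return user.role in (ORG_MANAGER, CREATOR)


def can_edit(user, dataset):
    if user.organization != dataset.organization:
        return False
    if user.role == ORG_MANAGER:
        return True  # may edit any dataset in the Organization
    if user.role == CREATOR:
        return dataset.creator == user.name  # only datasets they created
    return False


def can_manage_members(user):
    return user.role == ORG_MANAGER
```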
Upload and Share Data
The mapMECFS upload process is designed to be flexible. Users can upload data files in a variety of file types (TSV, TXT, PDF, etc.) or provide hyperlinks. Key study-level metadata are collected during upload through simple drop-down menus.
Files should be uploaded in the expected format, and a dataset should contain the following:
- One Data File with samples as columns and molecules as rows, containing sample-level values such as:
  - Gene expression counts
  - Methylation signal intensities
  - Metabolomics MS peak heights
- One Phenotype File containing subject-level clinical values, such as case-control status (“Phenotype”), age, sex, and any relevant covariates
- As many Results Files as needed, containing summary statistics from your own analysis of the molecules and reporting at least a p-value and an adjusted p-value for each, as shown in this example.
- As many Supporting Documents as needed, such as:
  - An SOP form (see this template) describing the dataset generation in more detail. Completing the SOP form is strongly recommended, as users can better understand the experimental conditions under which the Results Files were generated, as shown in this example.
  - A link to existing publication(s)
  - A Data Dictionary File containing information about elements in the Phenotype and/or Data Files. Please refer to the provided example (page 9), which outlines the suggested column names. Acceptable file extensions for a formatted Data Dictionary include .txt, .tsv, .csv, and .xlsx. A Data Dictionary that does not meet the mapMECFS formatting requirements should still be uploaded as a Supporting Document, but it may not interact with mapMECFS search features.
During upload, Data and Phenotype Files in the expected format are processed for tagging, and summary statistics are calculated. Results Files are processed for tagging only when submitted in the expected format.
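A quick local check before upload can catch common layout problems. This sketch rests on several assumptions about the expected format, which may differ from what the portal actually requires:

- Both files are tab-separated.
- The Data File's first column holds molecule identifiers, and the remaining columns are samples.
- The Phenotype File has one row per subject, a `Phenotype` column, and a hypothetical `SampleID` identifier column.
- Empty cells and `NA` mark missing values.

```python
import csv
import io

MISSING = {"", "NA"}  # assumed missing-value markers


def check_dataset_files(data_tsv, phenotype_tsv, id_column="SampleID"):
    """Return a list of layout problems (an empty list means none were found)."""
    problems = []
    pheno_rows = list(csv.DictReader(io.StringIO(phenotype_tsv), delimiter="\t"))
    if not pheno_rows:
        return ["Phenotype File has no subject rows"]
    columns = pheno_rows[0].keys()
    if "Phenotype" not in columns:
        problems.append("Phenotype File lacks a 'Phenotype' column")
    if id_column not in columns:
        problems.append(f"Phenotype File lacks a '{id_column}' column")
        return problems
    subjects = {row[id_column] for row in pheno_rows}

    data = [row for row in csv.reader(io.StringIO(data_tsv), delimiter="\t") if row]
    if len(data) < 2:
        return problems + ["Data File needs a header row and at least one molecule row"]
    header, molecule_rows = data[0], data[1:]
    samples = header[1:]  # column 0 holds molecule identifiers
    unmatched = [s for s in samples if s not in subjects]
    if unmatched:
        problems.append(f"Data File samples missing from Phenotype File: {unmatched}")
    for row in molecule_rows:
        if len(row) != len(header):
            problems.append(f"Row '{row[0]}' has {len(row)} fields, expected {len(header)}")
            continue
        for sample, value in zip(samples, row[1:]):
            if value in MISSING:
                continue
            try:
                float(value)
            except ValueError:
                problems.append(f"Non-numeric value {value!r} for {row[0]} in sample {sample}")
    return problems
```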
How to share data: Moving a dataset from Private to Public
When the corresponding data are part of a manuscript accepted for publication, we encourage researchers to share their datasets with other approved mapMECFS site users by making them “public” (viewable to all approved site users). By default, all uploaded datasets are “private” (viewable only to the uploader and other users registered with the same Organization as the uploader).
To change a dataset from private to public, open the dataset you wish to share and select Manage > Set Visibility to Public. This sends a request to the mapMECFS site administrators, who will review the dataset for breaches of the Data Use Agreement (DUA), non-scientific content, and any personally identifiable information (PII).
Note: It is the uploader’s full responsibility to ensure that none of these are present, that participant privacy is fully protected, and that sharing complies with all other governing policies (e.g., IRB-approved protocols, embargoes). If the applicable DUA changes (e.g., retroactive changes to the study’s approved IRB protocols), it is the uploader’s responsibility to remove the data or request help from mapMECFS site administrators by emailing email@example.com.
Data uploaded to mapMECFS should never contain PII. Users are encouraged to review the U.S. Department of Health and Human Services Safe Harbor Method for more information. In addition, we recommend including only metadata relevant to the study. In datasets with few participants (e.g., fewer than 30), we also recommend binning variables (e.g., age 20-30 years) so that every group contains more than three participants no matter how the data are stratified.
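The binning recommendation can be checked before upload. The sketch below uses hypothetical function names and is not a mapMECFS tool. It works in two steps:

1. It bins exact ages into 10-year ranges (the `20-29` label style is just one choice).
2. It reports the smallest group size across every combination of the released variables. Under the guidance above, that number should stay above three.

Only strata that actually occur in the data are counted.

```python
from collections import Counter
from itertools import combinations


def bin_age(age, width=10):
    """Map an exact age to a range label such as '20-29'."""
    low = (int(age) // width) * width
    return f"{low}-{low + width - 1}"


def smallest_cell(rows, variables):
    """Smallest group size over every combination of the given variables."""
    smallest = None
    for k in range(1, len(variables) + 1):
        for combo in combinations(variables, k):
            counts = Counter(tuple(row[v] for v in combo) for row in rows)
            cell = min(counts.values())
            smallest = cell if smallest is None else min(smallest, cell)
    return smallest


def meets_minimum_cell_size(rows, variables, threshold=3):
    """True if every observed stratum has more than `threshold` participants."""
    return smallest_cell(rows, variables) > threshold
```

If the check fails, widening the bins (for example `width=20`) or dropping a variable that is not relevant to the study are the usual remedies.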
mapMECFS does not support the upload of SNP-level genotypes or raw sequencing data; instead, we recommend submitting the data to dbGaP or an alternative database. To make these data findable, users can create a dataset within mapMECFS describing the study with a link to the website containing the raw data and/or study accession IDs.
The mapMECFS Search function is designed to recognize user-specified terms describing multiple aspects of a Dataset. This flexibility allows users to enter keywords that identify:
- Experiments involving a specific sample type (e.g., “blood” or “PBMCs”)
- Results of a particular experimental class (e.g., “microbiome,” “metabolomics,” or “RNAseq”)
- Studies with participants that share a specific ME/CFS case definition (e.g., “Fukuda”)
- Data files containing specific common analytes (e.g., “glucose”, “IL17”, or “EBI2”)
Note regarding synonymous terms: Because different labs apply different conventions for annotating data, it is necessary to account for synonymous terminology describing common data features. For each dataset uploaded to mapMECFS, all features (including synonymous terms) of recognized data types are automatically tagged upon upload.
By tagging both the indicated annotation and all recognized synonyms, the system can more readily match the keywords entered by the user with the data type they seek. For example, if you are interested in the cytokine interleukin 17 analyte, simply type “IL-17” or “IL-17A” and the search will return all datasets containing the desired analyte (regardless of the convention used by the reporting lab) including data from gene expression assays or cytokine screens.
Uploaded data files are processed for tagging. This tagging process expands the search space to include the contents in the original files, molecule synonyms, and related identifiers. Tagging will only work on recognized data types; however, you can contact the mapMECFS team to request a new data type.
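Synonym expansion can be illustrated with a tiny alias table. This is a sketch of the idea only, not the portal's tagging code. `SYNONYMS` is a small illustrative subset, whereas the real tags come from the databases listed in the table below. The function names are hypothetical. Each recognized feature is expanded to all of its aliases at upload time, so a search for any alias matches datasets that used a different convention.

```python
# Illustrative subset of gene aliases (real tagging draws on NCBI Gene, miRBase, etc.).
SYNONYMS = {
    "IL17A": {"IL17A", "IL-17A", "IL17", "IL-17", "CTLA8"},
    "GPR183": {"GPR183", "EBI2"},
}


def normalize(term):
    return term.strip().upper()


# Reverse index: any alias -> canonical symbol.
ALIAS_INDEX = {normalize(a): canon for canon, aliases in SYNONYMS.items() for a in aliases}


def tag_features(features):
    """Expand each recognized feature to all of its synonyms; keep others as-is."""
    tags = set()
    for feature in features:
        canon = ALIAS_INDEX.get(normalize(feature))
        if canon is None:
            tags.add(normalize(feature))  # unrecognized: original term only
        else:
            tags |= {normalize(a) for a in SYNONYMS[canon]}
    return tags


def search(tagged_datasets, keyword):
    """Names of datasets whose tags contain the keyword (case-insensitive)."""
    kw = normalize(keyword)
    return sorted(name for name, tags in tagged_datasets.items() if kw in tags)
```

In this example, a gene expression dataset reporting `IL17A` and a cytokine screen reporting `IL-17` are both returned by a search for `IL-17A`.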
Recognized Data Types for Tagging
Tagging lets users search both the contents of the original source file and an expanded set of related terms drawn from relevant databases (see table below). The recognized data types for tagging include:
- Gene Expression
- Cytokine Assay
- Metabolomics
- miRNA
- Methylation
NOTE: Upload is not restricted to these data types; any data file can be uploaded to mapMECFS. The DMCC cleans tags annually to keep them consistent across studies and to eliminate redundancy.
| Data Type | Required Data Column(s) | Database used for Tagging | What is Searchable? | Example Searches |
|---|---|---|---|---|
| Gene Expression | | NCBI Gene (December 2018) | | |
| Cytokine Assay | | NCBI Gene (December 2018) | | |
| Metabolomics | | N/A | | |
| miRNA | | miRBase (March 2019) | | |
| Methylation | | Illumina 450K (v.15017482_v1-2) or Infinium MethylationEPIC (v-1-0-b4). Please email firstname.lastname@example.org if another manifest file is needed. | | |
For recognized Data Types, mapMECFS generates a Summary Statistics file that characterizes how dataset measures compare between the phenotype groups annotated in the uploaded Phenotype File. A nonparametric Wilcoxon rank-sum test is used to assess whether each feature differs between groups within the study (e.g., between cases and controls). Summary statistics are calculated automatically for each feature in uploaded gene expression, cytokine assay, metabolomics, miRNA, or methylation Data Files when a correctly formatted Phenotype File is also uploaded. Summary statistics are calculated asynchronously in the background to avoid slowing page loads, so they may not be available immediately after upload.
Once calculations are complete, one can view the resulting Summary Stats file by opening the dataset of interest and scrolling to Summary Statistics > View Summary Statistics.
Summary columns in this file include:
- Sample sizes in each group (labeled as “count”)
- Median value for each group
- Standard deviation
- Wilcoxon rank-sum test statistic (labeled as "Ranksum stat")
- Wilcoxon rank-sum p-value (labeled as "Ranksum p-value")
- Wilcoxon rank-sum Bonferroni Corrected p-value (labeled as "Ranksum Bonf")
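The calculation behind these columns can be sketched in standard-library Python. This is an illustration under stated assumptions, not the mapMECFS code:

- It uses the large-sample normal approximation of the rank-sum statistic, with no tie or continuity correction. This is the same form as `scipy.stats.ranksums`.
- It reports a per-group sample standard deviation.
- It applies the Bonferroni correction over all features in the file.

The function names are hypothetical, and each group needs at least two samples for the standard deviation.

```python
import math
import statistics


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of ties
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def ranksum(x, y):
    """Wilcoxon rank-sum z statistic and two-sided p-value (normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first group
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    return z, math.erfc(abs(z) / math.sqrt(2))


def summary_statistics(features, phenotypes, groups=("case", "control")):
    """Per-feature summary loosely mirroring the Summary Stats columns."""
    a_label, b_label = groups
    n_tests = len(features)  # Bonferroni multiplier
    table = {}
    for feature, values in features.items():
        a = [v for v, g in zip(values, phenotypes) if g == a_label]
        b = [v for v, g in zip(values, phenotypes) if g == b_label]
        stat, p = ranksum(a, b)
        table[feature] = {
            "count": {a_label: len(a), b_label: len(b)},
            "median": {a_label: statistics.median(a), b_label: statistics.median(b)},
            "sd": {a_label: statistics.stdev(a), b_label: statistics.stdev(b)},
            "Ranksum stat": stat,
            "Ranksum p-value": p,
            "Ranksum Bonf": min(1.0, p * n_tests),
        }
    return table
```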
Results File Explorer
The Results File Explorer tool, available under the EXPLORER tab, lets users search for and view specific molecules across all uploaded datasets. Dataset privacy settings are maintained, so only public datasets and the private datasets a user has access to are displayed. With this tool, users can quickly evaluate the robustness of a given result across multiple studies and identify datasets for subsequent integrated analyses.
Search results are displayed in three separate tables:
- Data Files and Calculated Summary Statistics contains search results only from the mapMECFS-calculated summary statistics.
- Results Files contains search results only from the user-uploaded results files.
- Other contains search results from other elements of the dataset, including the title, description, and metadata.
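The routing of hits into these three tables can be sketched as follows, with privacy filtering applied first. The dataset dictionary keys (`summary_stats_molecules`, `results_molecules`, `metadata`, and so on) and the function names are hypothetical. Matching in the "Other" table is a naive case-insensitive substring search, which may differ from how the portal actually matches text.

```python
SUMMARY = "Data Files and Calculated Summary Statistics"
RESULTS = "Results Files"
OTHER = "Other"


def visible(dataset, user_org):
    """Public datasets plus private ones belonging to the user's Organization."""
    return not dataset["private"] or dataset["organization"] == user_org


def explore(molecule, datasets, user_org):
    """Route datasets mentioning one molecule into the three Explorer tables."""
    target = molecule.upper()
    tables = {SUMMARY: [], RESULTS: [], OTHER: []}
    for ds in datasets:
        if not visible(ds, user_org):
            continue  # privacy settings are maintained
        if target in {m.upper() for m in ds.get("summary_stats_molecules", [])}:
            tables[SUMMARY].append(ds["name"])
        if target in {m.upper() for m in ds.get("results_molecules", [])}:
            tables[RESULTS].append(ds["name"])
        # Naive substring match over title, description, and metadata values.
        text = " ".join([ds.get("title", ""), ds.get("description", ""), *ds.get("metadata", [])])
        if target in text.upper():
            tables[OTHER].append(ds["name"])
    return tables
```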