This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. This means that you can copy, share and modify the work, as long as the result is distributed under the same license.
By Veronique Voisin, Chaitra Sarathy and Ruth Isserlin
Introduction
scNetViz is a Cytoscape application designed to perform downstream analysis of single cell RNAseq (scRNA) data. The aim is to explore the results of a single cell RNAseq experiment in the context of the pathways involved and within the framework of network analysis.
It is assumed that the data has already been pre-processed, for example using Seurat or a similar pipeline. Ideally, the experimental results have been published and deposited in a public repository. Biologists with little or no computer experience can now load the data set of their choice into Cytoscape.
Each dataset is loaded with metadata indicating, for example, the different groups that the cells belong to, e.g. control or treated groups, or specific groups revealed by upstream data analysis. The idea is to select groups or clusters of interest and perform differential expression analysis.
The scNetViz application creates graphs that visualize the results of differential expression analysis, including heat maps and violin plots.
The app takes the top n differentially expressed genes and creates a network that represents them. One of the best features of this app is that, from this point on, you can take advantage of all Cytoscape features, which makes it easy to integrate other omics data.
We can perform pathway enrichment analysis on the created network (the top genes) allowing the identification of functions associated with the differentially expressed top genes.
The application takes as input the scRNA data stored in the Single Cell Expression Atlas repository maintained by the EMBL European Bioinformatics Institute (https://www.ebi.ac.uk/gxa/sc/home). This repository contains scRNA experiments from animals, plants, fungi, and protists. It features experiments from the Human Cell Atlas, the Fly Cell Atlas, the Chan Zuckerberg Biohub, and more.
Goal of the practical lab
In this example, we will browse the Single Cell Expression Atlas from within Cytoscape, explore a particular data set, perform differential expression analysis based on one of the provided cell annotation categories, generate networks from the top differentially expressed genes for each group within the chosen category, and functionally characterize and visualize the networks.
Data
- ACCESSION NUMBER: E-MTAB-7417
- TISSUE: Cells were taken from a digested skin sample of two 8-week-old female C57BL/6 mice.
- INSTRUMENT(S): Illumina HiSeq 4000
- ORGANISM(S): Mus musculus
- DISEASE(S): Normal
- DATA PROCESSING: Droplet-based sequencing data was aligned, filtered, and quantified using the Cell Ranger Single-Cell software package (version 22.2.0), against the mouse reference genome provided by Cell Ranger.
scNetViz accesses the data directly from the Single Cell Expression Atlas repository, so there is no need to download the data for the hands-on lab.
However, if you want to work with data offline, there is an option to manually import the data into scNetViz:
- In the Cytoscape menu bar, select Apps -> scNetViz -> Load Experiment -> Import from file.
This option is also useful if you already have your data of interest downloaded to your computer.
The instructions below show you how to download the data directly from the repository, but you don't need to pull your data from there.
- DATA AVAILABLE AT: https://www.ebi.ac.uk/gxa/sc/experiments/E-MTAB-7417/downloads
- Scroll down to Result files and download Normalised counts files (MatrixMarket archive).
- An archive named E-GEOD-109979-normalised-files.zip is now downloaded to your computer.
- Note 1: Values in the file are normalized to counts per million.
- Note 2: Do not unzip the file before loading it into scNetViz.
- Also download the cluster information: E-MTAB-7417.clusters.tsv. Note: Cells were clustered using the Louvain algorithm.
- Reference paper: Davidson S, Efremova M, Riedel A, Mahata B, Pramanik J et al. (2018) Single cell RNA sequencing reveals a dynamic stromal niche within the evolving tumor microenvironment [https://pubmed.ncbi.nlm.nih.gov/32433953/].
- This paper studies melanoma using a mouse model. The scRNA data used here come from skin samples of 2 healthy mice. Only clusters identified as fibroblasts (based on Col1a1 and Col1a2 expression) were considered for comparison with stromal clusters.
Right click on the link below and select "Save Link As...".
Place it in the appropriate modules directory of your CBW working directory.
There is no need to download this data, as you can load it directly from within the scNetViz application.
- E-MTAB-7417.aggregated_filtered_normalised_counts.mtx
- E-MTAB-7417.aggregated_filtered_normalised_counts.mtx_cols
- E-MTAB-7417.aggregated_filtered_normalised_counts.mtx_rows
- E-MTAB-7417.clusters.tsv
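If you want to look inside a downloaded `.mtx` file outside Cytoscape, the MatrixMarket coordinate format is simple enough to parse by hand. The sketch below is only an illustration: the function name and the toy input are made up, and it assumes a plain `coordinate real general` file (a header line, optional `%` comment lines, a size line, then one `row column value` triple per line with 1-based indices).

```python
import io


def read_matrix_market(handle):
    """Parse a MatrixMarket coordinate file into (n_rows, n_cols, entries).

    `entries` maps (row, col) -> value, keeping the file's 1-based indices.
    """
    header = handle.readline()
    if not header.startswith("%%MatrixMarket"):
        raise ValueError("not a MatrixMarket file")
    line = handle.readline()
    while line.startswith("%"):  # skip comment lines
        line = handle.readline()
    n_rows, n_cols, n_entries = (int(x) for x in line.split())
    entries = {}
    for _ in range(n_entries):
        row, col, value = handle.readline().split()
        entries[(int(row), int(col))] = float(value)
    return n_rows, n_cols, entries


# Toy input (made up): 3 genes x 4 cells with 5 stored values.
demo = io.StringIO(
    "%%MatrixMarket matrix coordinate real general\n"
    "3 4 5\n"
    "1 1 12.5\n"
    "1 3 3.0\n"
    "2 2 7.25\n"
    "3 1 1.0\n"
    "3 4 40.0\n"
)
n_genes, n_cells, counts = read_matrix_market(demo)
filled = len(counts) / (n_genes * n_cells)  # fraction of gene/cell pairs with a value
print(n_genes, n_cells, round(filled, 3))
```

For a real file, pass `open("E-MTAB-7417.aggregated_filtered_normalised_counts.mtx")` instead of the toy `StringIO`; the `_rows` and `_cols` companion files then give the gene and cell names for each index.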
Steps
- Open Cytoscape.
- Click on the icon Explore the EBI Single Cell Expression Atlas on the Cytoscape toolbar. This opens the Single Cell Expression Atlas browser.
- Click the column header labeled Accession and search for E-MTAB-7417 in the resulting table, which is now sorted by accession number.
- Select the row with accession number E-MTAB-7417 by double-clicking on it.
- An experiment table with 3 tabs will open. Select the first tab, named TPM.
This tab contains the genes as rows and the individual cells as columns. Each cell in the table represents the normalized counts for a given gene and cell.
The table has many blank entries. Single cell RNA-seq matrices are sparse because not all genes are detected in all cells.
- Select the second tab, called Categories.
- Make sure that Cluster is selected in the Available categories field.
Clustering has been calculated by the Single Cell Expression Atlas. The Louvain algorithm has been run with various resolution values. Higher resolutions produce a larger number of clusters and lower resolutions produce fewer clusters.
The scNetViz application selects a resolution of 1 by default, which corresponds to the default of the Louvain method. It is indicated with "True" in the sel.K column.
In this example, the clustering with the parameter K=20 is chosen for further analysis. The other columns indicate the cluster membership of each cell.
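To make the structure of the cluster table concrete, here is a minimal sketch of how the row for a given K could be picked out of a clusters file. The layout is an assumption based on the description above (first column `sel.K`, second column `K`, then one column per cell), and the function names and toy data are hypothetical.

```python
import csv
import io


def memberships_for_k(tsv_text, k):
    """Return {cell_id: cluster} for the row whose K column equals `k`.

    Assumed layout: column 0 = sel.K flag, column 1 = K, remaining columns = cells.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    cells = next(reader)[2:]
    for row in reader:
        if int(row[1]) == k:
            return dict(zip(cells, (int(c) for c in row[2:])))
    raise KeyError(f"no clustering with K={k}")


def default_k(tsv_text):
    """Return the K value of the row flagged as selected in sel.K."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # skip header
    for row in reader:
        if row[0].strip().lower() == "true":
            return int(row[1])
    return None


# Toy table (made up): three cells, two alternative clusterings.
demo = (
    "sel.K\tK\tcellA\tcellB\tcellC\n"
    "FALSE\t2\t1\t1\t2\n"
    "TRUE\t3\t1\t2\t3\n"
)
print(default_k(demo), memberships_for_k(demo, 2))
```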
Calculate differential expression
- Calculate the differential expression:
- To obtain the table of differentially expressed genes, press the button labeled Calculate Diff Exp. This starts a comparison of each cluster with all other clusters. There are a total of 20 clusters and we will get results for each of them in the same table.
- Differential gene expression is calculated using a Wilcoxon rank sum test. The log2 fold change (log2FC) is the log2 ratio of the mean expression of a gene in the cells of the selected cluster to the mean expression of that gene in all other cells.
- Create a protein-protein network for each of the cluster marker genes:
- Check Only positive. Because each cluster is compared against all remaining clusters, genes with positive scores are specific to that cluster and genes with negative scores are specific to the remaining set of clusters. We are therefore only interested in genes that are positive and specific for a given cluster.
- Click on Create networks.
- For each cluster, a maximum of 200 genes with an FDR < 0.05 and a logFC > 0 are selected to create the representative network for that cluster. (These parameters are adjustable.)
- Since there are 20 networks to create, this step takes a few seconds.
- Once created, the list of networks is visible on the Network tab in the Control Panel on the left. In the image below, cluster 20 is selected; it contains 79 genes and is displayed in the main network window.
- In the Results Panel on the right-hand side, we can see the STRING app parameters that were used to create the protein interaction networks. Nodes (genes/proteins) are connected to each other if they are known to physically associate or interact with each other.
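The statistics behind the steps above can be sketched in a few lines. This is not scNetViz's own code: it is a pure-Python illustration of a one-versus-rest comparison for a single cluster, using a normal-approximation rank sum test (no tie correction of the variance), Benjamini-Hochberg FDR, and the "positive, FDR < 0.05, at most 200 genes" filter. The pseudocount in the fold change and the ranking of kept genes by log2FC are assumptions made for the example, and the toy expression values are made up.

```python
import math
from statistics import mean


def log2_fold_change(in_cluster, others, pseudocount=1.0):
    """log2 ratio of mean expression in the cluster vs. all other cells.

    The pseudocount (an assumption) avoids division by zero.
    """
    return math.log2((mean(in_cluster) + pseudocount) / (mean(others) + pseudocount))


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank sum p-value via the normal approximation."""
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(pooled)
    start = 0
    while start < len(pooled):
        end = start
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        average_rank = (start + end) / 2 + 1  # tied values share their mean rank
        for k in range(start, end + 1):
            ranks[pooled[k][1]] = average_rank
        start = end + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs((w - mu) / sigma) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (FDR) in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_markers(stats, max_genes=200, fdr_cutoff=0.05, logfc_cutoff=0.0):
    """stats: list of (gene, log2fc, fdr). Keep positive, significant genes."""
    kept = [s for s in stats if s[2] < fdr_cutoff and s[1] > logfc_cutoff]
    kept.sort(key=lambda s: s[1], reverse=True)  # assumed ranking: largest log2FC first
    return [s[0] for s in kept[:max_genes]]


# Toy data (made up): gene -> (values in the cluster, values in all other cells).
toy = {
    "GeneA": ([9, 8, 10, 9, 11, 10, 9, 12], [1, 2, 0, 1, 3, 2, 1, 0]),
    "GeneB": ([1, 0, 2, 1, 0, 1, 2, 1], [8, 9, 7, 10, 9, 8, 11, 9]),
    "GeneC": ([3, 4, 2, 5, 3, 4, 2, 5], [4, 3, 5, 2, 4, 3, 5, 2]),
}
genes = list(toy)
fdrs = benjamini_hochberg([rank_sum_pvalue(*toy[g]) for g in genes])
stats = [(g, log2_fold_change(*toy[g]), q) for g, q in zip(genes, fdrs)]
print(select_markers(stats))
```

GeneB is significant but negative, so it is dropped by the Only positive filter; GeneC does not differ between the two groups.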
Cluster pathway analysis
- The next step is to perform a pathway analysis on one of these gene lists, i.e. on one of these clusters.
- Click on the network named Cluster 15. Cluster 15 has the largest number of connections: it contains 192 nodes (genes/proteins) and 1045 edges (associations/interactions).
- Locate the STRING app tab in the Results Panel on the right, undock it and enlarge it.
- Click on Functional Enrichment.
- A Retrieve functional enrichment window opens. Click on OK.
- The STRING enrichment table appears in the STRING Enrichment panel below the network in the Table Panel. The pathways are ranked by the FDR values of the enrichment.
- The STRING app uses more than 15 pathway and gene set sources to calculate pathway enrichment. For clarity, we can filter the table to show only the GO Biological Process results.
- Click the funnel icon at the top left of the STRING enrichment table.
- Select the category GO Biological Process and click OK.
- The table now contains only the results for the GO Biological Process pathways.
- Click on the top GO term. This highlights in yellow all the genes in the network that are annotated to this pathway.
- Optional: Perform the same analysis on the remaining clusters.
- Here are the functional enrichment results for each cluster:
- Extracellular matrix
- Keratinization
- Tissue development/cell migration
- Skin development/epithelial cell differentiation
- Regulation of metabolic process
- Angiogenesis/blood vessel development
- Tissue development/skeletal muscle cell differentiation
- Anatomical structure development/nervous system development
- Muscle structure development
- Nervous system development
- Tissue development
- Small molecule metabolic process
- Response to endogenous stimulus
- Skin development
- Immune system/cell activation
- Tissue development/cell migration
- Lipid biosynthesis
- Regulation of cell motility
- Angiogenesis
- Regulation of cell motility
This analysis highlights the heterogeneity of skin tissue composed of differentiated keratinocytes, but also epidermal stem cells, fibroblasts, endothelial cells, immune cells such as T and B cells, and macrophages.
Pathway enrichment analysis with marker genes can help identify cell types based on scRNA clustering and subsequent steps could focus solely on groups of interest.
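As a rough illustration of what an enrichment p-value measures, the sketch below computes a one-sided hypergeometric (over-representation) test for a single term. Whether the STRING app computes its values in exactly this way is an assumption; the function name and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe, annotated, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn from `universe` genes,
    of which `annotated` belong to the term of interest."""
    upper = min(annotated, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Made-up numbers: a 192-gene network, a term annotating 300 of 20000 genes,
# and 25 of the network genes carrying that annotation.
p = hypergeom_enrichment_p(universe=20000, annotated=300, selected=192, overlap=25)
print(f"{p:.3g}")
```

With about 3 annotated genes expected by chance (192 x 300 / 20000), an overlap of 25 gives a very small p-value. In practice the p-values of all tested terms are then corrected for multiple testing to give the FDR shown in the table.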
We used the top pathway to label each cluster; this is an arbitrary decision. The STRING app has an interface similar to EnrichmentTable, and you have the option of creating an enrichment map or enhanced graphics from the STRING pathway enrichment results (see the Cytoscape primer for steps on how to use this feature). Combined with the AutoAnnotate app, this could be a more comprehensive approach to exploring the functions associated with each cluster.
Color the Cluster 15 nodes proportionally to the logFC
- Locate the Style tab and select Fill Color. Expand it using the down arrow.
- In the Column field, select the Cluster 15 log2FC attribute.
- Double-click on the color gradient.
- Choose a color palette of your choice.
- The option to choose a color-blind-friendly color gradient is available.
- Click Yes at the message "This will reset your current settings. Are you sure you want to continue?"
- Another window called "Continuous Mapping Editor for Node Fill Color" appears. Click on OK.
- The network nodes are now colored using the logFC (fold change) from the differential expression analysis. Red indicates the top marker genes of cluster 15.
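Conceptually, a continuous color mapping like the one set up above assigns each node a color by interpolating between the ends of the gradient according to its log2FC. The sketch below shows simple linear interpolation on a white-to-red gradient; the function name, the gradient and the value range are assumptions for illustration and are not read from Cytoscape.

```python
def gradient_color(value, vmin, vmax, low=(255, 255, 255), high=(255, 0, 0)):
    """Map `value` in [vmin, vmax] to a hex color between `low` and `high`."""
    if vmax == vmin:
        t = 0.0
    else:
        t = (value - vmin) / (vmax - vmin)
    t = max(0.0, min(1.0, t))  # clamp values outside the range
    rgb = tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low, high))
    return "#{:02X}{:02X}{:02X}".format(*rgb)


# Made-up log2FC values for three nodes, mapped on a 0..5 scale.
for gene, logfc in [("GeneA", 0.0), ("GeneB", 1.0), ("GeneC", 5.0)]:
    print(gene, gradient_color(logfc, 0.0, 5.0))
```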
Automation (for advanced users)
scNetViz provides its own automation commands, which are useful for scripting scNetViz operations. Details are available in the Swagger documentation (Help -> Automation -> CyREST Command API) and in the scNetViz reference paper.
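As a starting point for scripting, the sketch below builds the CyREST URL that lists the commands of an app namespace. It assumes Cytoscape is running locally with CyREST on its default port 1234 and that the scNetViz command namespace is `scnetviz`; check both against the Swagger page mentioned above. No specific scNetViz command names are assumed here, and the helper names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

CYREST_BASE = "http://localhost:1234/v1"  # assumption: default CyREST address


def commands_url(namespace, command=None, base=CYREST_BASE):
    """Build the CyREST URL for a command namespace or one command in it."""
    parts = [base, "commands", quote(namespace)]
    if command is not None:
        parts.append(quote(command))  # spaces in command names become %20
    return "/".join(parts)


def list_commands(namespace):
    """Ask a running Cytoscape for the commands in `namespace` (plain-text reply)."""
    with urlopen(commands_url(namespace)) as reply:
        return reply.read().decode("utf-8")


# Only prints the URL; call list_commands("scnetviz") once Cytoscape is running.
print(commands_url("scnetviz"))
```

Opening the printed URL in a browser while Cytoscape is running should show the same list of commands as the Swagger page.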
scNetViz References
- ISCB 2019 video: https://www.youtube.com/watch?v=GGpsWKD9iQE&t=36s
- Reference paper: https://pubmed.ncbi.nlm.nih.gov/34912541/
- Tutorial: http://www.rbvi.ucsf.edu/cytoscape/scNetViz/index.shtml