This repository contains a Dockerfile "recipe" for the 10x Genomics Cell Ranger ATAC pipeline, which processes single-cell ATAC-seq data.
Build the image:
docker build -t cellranger-atac:latest .
To log the whole build process, you can run
docker build -t cellranger-atac:latest . > >(tee -a log.txt) 2> >(tee -a log.txt >&2)
- > >(tee -a log.txt) captures standard output (stdout) and appends it to log.txt
- 2> >(tee -a log.txt >&2) captures standard error (stderr) and appends it to log.txt
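If stdout and stderr do not need to stay on separate streams, piping through tee is a simpler alternative to the redirection above. The sketch below assumes bash and defines a hypothetical helper, log_run, which is not part of this repository. It merges both streams, appends them to a log file, and still returns the exit status of the wrapped command.

```shell
#!/usr/bin/env bash
# log_run is a hypothetical helper, not part of this repository.
# Usage: log_run LOGFILE COMMAND [ARGS...]
log_run() {
    local logfile=$1
    shift
    # Run in a subshell so that pipefail does not leak into the calling shell.
    (
        set -o pipefail
        # Merge stderr into stdout, show it on the terminal, and append it to the log.
        "$@" 2>&1 | tee -a "$logfile"
    )
}

# Example (requires Docker, so it is left commented out):
# log_run log.txt docker build -t cellranger-atac:latest .
```

Because of pipefail, a failed build makes log_run return a non-zero status, so it can be used in scripts that should stop when the build fails.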
Create required directories for data organization:
mkdir -p ~/cellranger_data/{fastqs,reference,output}
- Place your FASTQ files in ~/cellranger_data/fastqs/
- Download reference data to ~/cellranger_data/reference/
- Output will be generated in ~/cellranger_data/output/
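Before starting a run, it can help to confirm that the layout is in place and that FASTQ files for each read type are present. The following is a minimal sketch, assuming bash and the file naming convention described under FASTQ Files. The helper name check_layout is hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# check_layout is a hypothetical helper, not part of this repository.
# Usage: check_layout [DATA_DIR]   (defaults to ~/cellranger_data)
check_layout() {
    local data_dir=${1:-"$HOME/cellranger_data"}
    local status=0 sub rt count

    # The three directories created by the mkdir command.
    for sub in fastqs reference output; do
        if [ ! -d "$data_dir/$sub" ]; then
            echo "missing directory: $data_dir/$sub"
            status=1
        fi
    done

    # Count FASTQ files per read type, matching names like *_R1_001.fastq.gz.
    for rt in R1 R2 R3; do
        count=$(( $(find "$data_dir/fastqs" -maxdepth 1 -name "*_${rt}_001.fastq.gz" 2>/dev/null | wc -l) ))
        echo "$rt files: $count"
        [ "$count" -gt 0 ] || status=1
    done
    return "$status"
}
```

Calling check_layout with no argument inspects ~/cellranger_data. It returns non-zero if a directory is missing or any read type has no files.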
Non-interactive docker container run
docker run -v ~/cellranger_data:/data cellranger-atac \
cellranger-atac count \
--id=run1 \
--reference=/data/reference/refdata-cellranger-arc-GRCh38-2020-A-2.0.0 \
--fastqs=/data/fastqs \
--sample=SAMPLE_NAME
Interactive docker container run
docker run -it -v ~/cellranger_data:/data cellranger-atac bash
This command:
- -it enables an interactive terminal
- -v mounts your local data directory at /data inside the container
- bash starts an interactive bash shell
Once inside the container, you can:
- Navigate directories: cd /data
- Check Cell Ranger ATAC version: cellranger-atac --version
- Run commands directly:
cellranger-atac count \
--id=run1 \
--reference=/data/reference/refdata-cellranger-arc-GRCh38-2020-A-2.0.0 \
--fastqs=/data/fastqs \
--sample=SAMPLE_NAME
- Explore outputs in real-time
- Test different parameters
- Exit container when done: exit
The interactive session gives flexibility in monitoring the analysis within the container environment.
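For repeated non-interactive runs, the count command can be assembled by a small script and inspected before it is executed. The sketch below assumes bash. The helper build_count_cmd is hypothetical and not part of this repository. It only prints the command. The -w /data/output option is an assumption: it sets the container's working directory so that results are written to the mounted output folder, in case the image does not already do so. The --rm option, also added here, removes the container after the run.

```shell
#!/usr/bin/env bash
# build_count_cmd is a hypothetical helper, not part of this repository.
# It prints the docker command for a run instead of executing it.
# Usage: build_count_cmd RUN_ID SAMPLE_NAME [DATA_DIR]
build_count_cmd() {
    local run_id=$1 sample=$2
    local data_dir=${3:-"$HOME/cellranger_data"}
    # Assumption: -w /data/output makes the pipeline write into the mounted
    # output directory; drop it if the image already sets a working directory.
    local cmd=(
        docker run --rm
        -v "$data_dir:/data"
        -w /data/output
        cellranger-atac
        cellranger-atac count
        "--id=$run_id"
        --reference=/data/reference/refdata-cellranger-arc-GRCh38-2020-A-2.0.0
        --fastqs=/data/fastqs
        "--sample=$sample"
    )
    # %q quotes each argument so the printed line can be pasted into a shell.
    printf '%q ' "${cmd[@]}"
    printf '\n'
}
```

Running build_count_cmd run1 SAMPLE_NAME prints the full command. Once it looks right, it can be pasted into the terminal or executed with eval "$(build_count_cmd run1 SAMPLE_NAME)".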
FASTQ Files
- R1: Forward genomic reads
- R2: Cell barcode (10x barcode read); the sample index, when present, is in a separate I1 file
- R3: Reverse genomic reads
Example naming convention:
SAMPLE_S1_L002_R1_001.fastq.gz
SAMPLE_S1_L002_R2_001.fastq.gz
SAMPLE_S1_L002_R3_001.fastq.gz
Download reference data from 10x Genomics:
- Mouse genome (mm10)
wget https://cf.10xgenomics.com/supp/cell-atac/refdata-cellranger-arc-mm10-2020-A-2.0.0.tar.gz
- Human genome (GRCh38)
wget https://cf.10xgenomics.com/supp/cell-atac/refdata-cellranger-arc-GRCh38-2020-A-2.0.0.tar.gz
The pipeline generates several key outputs in the ~/cellranger_data/output/run1/ directory:
- web_summary.html: A summary report of the analysis results.
- cloupe.cloupe: A file for visualization in Loupe Browser.
- filtered_peak_bc_matrix/: Directory containing filtered peak-barcode matrices.
- filtered_tf_bc_matrix/: Directory containing filtered transcription factor matrices.
- analysis/: Directory with various analysis results (clustering, dimensionality reduction, etc.).
- possorted_bam.bam: Aligned reads in BAM format.
- peaks.bed: Called peaks in BED format.
- fragments.tsv.gz: Fragment file for custom analyses.
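After a run finishes, a quick check that the listed outputs exist can catch an incomplete run early. The following is a minimal sketch, assuming bash. The helper check_outputs is hypothetical and not part of this repository. Cell Ranger pipelines normally place result files in an outs/ subfolder of the run directory, so the function takes the directory as an argument and either location can be checked.

```shell
#!/usr/bin/env bash
# check_outputs is a hypothetical helper, not part of this repository.
# Usage: check_outputs OUTPUT_DIR
check_outputs() {
    local out_dir=$1 missing=0 item
    for item in web_summary.html cloupe.cloupe filtered_peak_bc_matrix \
                filtered_tf_bc_matrix analysis possorted_bam.bam \
                peaks.bed fragments.tsv.gz; do
        # -e accepts both regular files and directories.
        if [ ! -e "$out_dir/$item" ]; then
            echo "missing: $item"
            missing=$((missing + 1))
        fi
    done
    echo "$missing item(s) missing"
    [ "$missing" -eq 0 ]
}
```

For example, check_outputs ~/cellranger_data/output/run1/outs lists anything that is absent and returns non-zero if at least one item is missing.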
- Number of cells detected
- Median fragments per cell
- TSS enrichment score
- Fraction of reads in peaks
- Fragment size distribution
- Dimension reduction plots (UMAP/tSNE)
- Clustering information
- Peak annotations
- Cell-type assignments (if reference is provided)
Recommended specifications:
- RAM: 32GB minimum
- CPU: 8 cores minimum
- Storage: 500GB minimum
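The minimums above can be compared against the current machine before a long run is started. The sketch below assumes bash on a Linux host, since it relies on nproc and /proc/meminfo. Both helper names are hypothetical and not part of this repository. The reported memory is usually slightly below the installed amount, so a machine with exactly 32GB may be flagged as low.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of this repository.
# Linux only: check_resources uses nproc and /proc/meminfo.

# meets_minimum LABEL ACTUAL REQUIRED
# Prints a verdict and returns non-zero if ACTUAL is below REQUIRED.
meets_minimum() {
    local label=$1 actual=$2 required=$3
    if [ "$actual" -ge "$required" ]; then
        echo "ok: $label $actual (minimum $required)"
    else
        echo "LOW: $label $actual (minimum $required)"
        return 1
    fi
}

# check_resources [DIR]
# Compares this machine with the recommended minimums; DIR is where data will live.
check_resources() {
    local dir=${1:-"$HOME"} status=0 cores mem_gb disk_gb
    cores=$(nproc)
    # MemTotal is reported in kB; convert to whole GB (rounded down).
    mem_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
    # Column 4 of POSIX df output is the available space in 1024-byte blocks.
    disk_gb=$(df -Pk "$dir" | awk 'NR == 2 {print int($4 / 1024 / 1024)}')
    meets_minimum "CPU cores" "$cores" 8 || status=1
    meets_minimum "RAM (GB)" "$mem_gb" 32 || status=1
    meets_minimum "free disk (GB)" "$disk_gb" 500 || status=1
    return "$status"
}
```

Running check_resources ~/cellranger_data prints one line per resource. Note that Docker Desktop may give containers less than the host has, so the same limits should also be checked in Docker's own resource settings.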
If you encounter issues:
- Ensure Docker has sufficient resources allocated.
- Verify input data integrity and format.
- Check for sufficient disk space.
- Review logs in the output directory for specific error messages.
Cell Ranger ATAC belongs to 10x Genomics, and its licence is available from 10x Genomics. No changes, licence infringements, or violations have been introduced to the software.
For questions or support, please open an issue in this repository.