Matches in Nanopublications for { ?s ?p ?o <https://w3id.org/np/RAd7ADyixVqjfKOIK7ny49jWGeR2EmgWOW4Prv3BwKbIo/assertion>. }
- a49165ba-fbd5-4483-9c39-9cb8258e6c5a type Phrase assertion.
- b673a29e-23c3-487e-ad95-8fc69f7d7547 type Phrase assertion.
- ccf5e4c4-0c4e-48dd-8bdd-84006c60ec4b type Phrase assertion.
- e193bcdc-d8c4-4b4e-8331-c1d46c14c206 type Phrase assertion.
- faf7e5b3-b0ef-4e09-b55c-a745db352edf type Phrase assertion.
- 0d2c2af7-0ab6-4218-885f-a838c95f4871 type Sentence assertion.
- 0fc5108a-3620-4c1b-94d3-a22e717bd424 type Sentence assertion.
- 13a2600e-aeba-4334-9fd1-abb8f2542e75 type Sentence assertion.
- 5cf3d332-4cd5-432a-8423-200792b8d509 type Sentence assertion.
- 78fbbea2-7dd0-4f4b-984d-625c6013c161 type Sentence assertion.
- 428 type Person assertion.
- 0000-0001-8197-3303 type Person assertion.
- Workflow-RO-Crate type CreativeWork assertion.
- 17e5cc71-3d5c-46fd-8a27-709516f324bd type CreativeWork assertion.
- ro-crate-metadata.json type CreativeWork assertion.
- 3fdc0374-95f4-4c7d-928c-24dd80fbd26f type Dataset assertion.
- 2a0f5a70-7132-4851-8b07-d08c25f21ffe type Dataset assertion.
- 53f07e4a-00bf-42ad-aa4b-b369718baac1 type Dataset assertion.
- 6816861d-1552-4acb-bce5-5a7392b92880 type Dataset assertion.
- 9511483c-1216-4050-acdc-a6b331ce81c7 type Dataset assertion.
- a3ad6b5f-1ddd-4d58-b35d-3ff382255e47 type Dataset assertion.
- c16b77ee-3113-4dca-ab0a-e871e51008c0 type Dataset assertion.
- e95cef25-bb33-4427-ae67-fa3ac3e94c9f type Dataset assertion.
- 03bab0c3-27a1-4bad-9192-54fd5718efb6 type MediaObject assertion.
- 165ceeeb-0982-40e6-a0ad-a77d0eec7316 type MediaObject assertion.
- 17e5cc71-3d5c-46fd-8a27-709516f324bd type MediaObject assertion.
- 19242bbc-9eae-461c-9535-fff02a359034 type MediaObject assertion.
- 1b909ce5-a128-47dc-93dd-4e0a1795d8c7 type MediaObject assertion.
- 22f177c1-06e1-49bf-ade6-d3ba16e6d622 type MediaObject assertion.
- 23c6fd11-d3c3-4615-8946-3c11f0876dac type MediaObject assertion.
- 2699beb6-1816-4b09-a671-c7b4e8e27632 type MediaObject assertion.
- 26f96889-0418-4090-92f3-3040ccf7daad type MediaObject assertion.
- 2d4cc403-0af3-4ea5-9cae-06b8e07ae93f type MediaObject assertion.
- 2e217125-2078-43e1-a55c-b9db9862ce5a type MediaObject assertion.
- 365c007e-1313-401c-8ac1-1f681739363a type MediaObject assertion.
- 3cded656-98eb-4101-a93e-5842ba43a49e type MediaObject assertion.
- 3daa2270-78fe-4674-aa82-9272af94aee1 type MediaObject assertion.
- 44eaf315-663c-450e-b246-b417fb1251ba type MediaObject assertion.
- 4cdb7678-4556-4b45-9bd8-05790d8e59b7 type MediaObject assertion.
- 4ff57492-0375-4aa6-a794-2009fbbbf255 type MediaObject assertion.
- 52bc888b-bb39-405f-8491-53b49d676d40 type MediaObject assertion.
- 5aacd754-ef17-459c-a829-f6e0af1013f6 type MediaObject assertion.
- 5c0d3b16-c3ec-4020-8d86-927d5ed39ad4 type MediaObject assertion.
- 5d6b0cd0-9e8e-456c-bc4d-f371f89984c3 type MediaObject assertion.
- 60e2708f-da85-46e8-b4a9-0907b507fd15 type MediaObject assertion.
- 672f2b4f-fac0-4518-8150-408e4469fbca type MediaObject assertion.
- 68fd58a3-bed8-4164-9bca-bc4aabe88cec type MediaObject assertion.
- 6d7b6733-4b86-4ace-98af-8db33f4eba06 type MediaObject assertion.
- 7746fab3-7efa-4abf-a940-7b31e2c7fd98 type MediaObject assertion.
- 890b303b-e8c0-4c2b-b1a0-cabcdbe1e1f2 type MediaObject assertion.
- 8a3c0ab0-e7e0-43b9-99b0-d199e24487c3 type MediaObject assertion.
- 90ea7f1f-c153-4af1-8544-9f4dc0889864 type MediaObject assertion.
- 9223fbb7-c378-4a6d-87fe-0da8b12a0404 type MediaObject assertion.
- 93bebfa3-e544-4175-9efb-7d82ea7d27bb type MediaObject assertion.
- 985f7fa0-bee5-4e8d-88cc-b1aba653c3fd type MediaObject assertion.
- 98f60e6a-e6fa-4fbb-bc56-ccda2a3340e3 type MediaObject assertion.
- 99e9f2a6-a57d-4443-86af-15afee791284 type MediaObject assertion.
- 9b9b25b1-77f6-4dac-af7e-92fa55863cea type MediaObject assertion.
- 9d59e482-707c-4522-87c0-c5ca05313e38 type MediaObject assertion.
- a034a7c7-cf41-47cb-920b-f4904701712a type MediaObject assertion.
- a0f3633c-223b-439d-99f3-2a4926996f00 type MediaObject assertion.
- a2b0640e-a918-4ddf-8f36-2021c706c9fe type MediaObject assertion.
- a4a0093b-349f-443b-9fac-bb4a60a213bd type MediaObject assertion.
- acfc63b3-0045-4720-80a4-8a36d26a72b4 type MediaObject assertion.
- b520ea10-64d3-4124-b441-154cd4749048 type MediaObject assertion.
- b7726c93-0c81-4159-89fb-e8d59cdc7182 type MediaObject assertion.
- bceff5e5-5748-47fa-8225-46eda16071b0 type MediaObject assertion.
- bf76b440-5e45-49fc-bc9d-fe7b0398a545 type MediaObject assertion.
- c4e0cdb1-351c-4fc2-8f72-55d35d80297e type MediaObject assertion.
- d29dda34-5600-44a7-80be-d89fe409e4d0 type MediaObject assertion.
- d38663ba-d7a4-4c02-981c-f14cef1ade0a type MediaObject assertion.
- dd1cc899-8d02-43b3-8bf7-38de97153ed0 type MediaObject assertion.
- e0e3ae31-7616-4e18-9b9c-fa616bffa161 type MediaObject assertion.
- about.workflowhub.eu type Organization assertion.
- 148 type Organization assertion.
- 148 type Project assertion.
- 0000-0003-2388-0744 type Agent assertion.
- enrichment_service-account-enrichment type Agent assertion.
- 3fdc0374-95f4-4c7d-928c-24dd80fbd26f mainEntity "workflow/Snakefile" assertion.
- 3fdc0374-95f4-4c7d-928c-24dd80fbd26f importedBy 0000-0003-2388-0744 assertion.
- 985f7fa0-bee5-4e8d-88cc-b1aba653c3fd sdPublisher "https://about.workflowhub.eu/" assertion.
- 3fdc0374-95f4-4c7d-928c-24dd80fbd26f creativeWorkStatus "Work-in-progress" assertion.
- 985f7fa0-bee5-4e8d-88cc-b1aba653c3fd producer "https://workflowhub.eu/projects/148" assertion.
- 3fdc0374-95f4-4c7d-928c-24dd80fbd26f description 3fdc0374-95f4-4c7d-928c-24dd80fbd26f assertion.
- 985f7fa0-bee5-4e8d-88cc-b1aba653c3fd description "# prepareChIPs This is a simple `snakemake` workflow template for preparing **single-end** ChIP-Seq data. The steps implemented are: 1. Download raw fastq files from SRA 2. Trim and Filter raw fastq files using `AdapterRemoval` 3. Align to the supplied genome using `bowtie2` 4. Deduplicate Alignments using `Picard MarkDuplicates` 5. Call Macs2 Peaks using `macs2` A pdf of the rulegraph is available [here](workflow/rules/rulegraph.pdf) Full details for each step are given below. Any additional parameters for tools can be specified using `config/config.yml`, along with many of the requisite paths. To run the workflow with default settings, simply run as follows (after editing `config/samples.tsv`) ```bash snakemake --use-conda --cores 16 ``` If running on an HPC cluster, a snakemake profile will be required for submission to the queueing system and appropriate resource allocation. Please discuss this with your HPC support team. Nodes may also have restricted internet access and rules which download files may not work on many HPCs. Please see below or discuss this with your support team. Whilst no snakemake wrappers are explicitly used in this workflow, the underlying scripts are utilised where possible to minimise any issues with HPC clusters with restrictions on internet access. These scripts are based on `v1.31.1` of the snakemake wrappers ### Important Note Regarding OSX Systems It should be noted that this workflow is **currently incompatible with OSX-based systems**. There are two unsolved issues 1. `fasterq-dump` has a bug which is specific to conda environments. This has been updated in v3.0.3 but this patch has not yet been made available to conda environments for OSX. Please check [here](https://anaconda.org/bioconda/sra-tools) to see if this has been updated. 2. 
The following error appears in some OSX-based R sessions, in a system-dependent manner: ``` Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : polygon edge not found ``` The fix for this bug is currently unknown ## Download Raw Data ### Outline The file `samples.tsv` is used to specify all steps for this workflow. This file must contain the columns: `accession`, `target`, `treatment` and `input` 1. `accession` must be an SRA accession. Only single-end data is currently supported by this workflow 2. `target` defines the ChIP target. All files common to a target and treatment will be used to generate summarised coverage in bigWig Files 3. `treatment` defines the treatment group each file belongs to. If only one treatment exists, simply use the value 'control' or similar for every file 4. `input` should contain the accession for the relevant input sample. These will only be downloaded once. Valid input samples are *required* for this workflow As some HPCs restrict internet access for submitted jobs, *it may be prudent to run the initial rules in an interactive session* if at all possible. This can be performed using the following (with 2 cores provided as an example) ```bash snakemake --use-conda --until get_fastq --cores 2 ``` ### Outputs - Downloaded files will be gzipped and written to `data/fastq/raw`. - `FastQC` and `MultiQC` will also be run, with output in `docs/qc/raw` Both of these directories are able to be specified as relative paths in `config.yml` ## Read Filtering ### Outline Read trimming is performed using [AdapterRemoval](https://adapterremoval.readthedocs.io/en/stable/). Default settings are customisable using config.yml, with the defaults set to discard reads shorter than 50nt, and to trim using quality scores with a threshold of Q30. 
### Outputs - Trimmed fastq.gz files will be written to `data/fastq/trimmed` - `FastQC` and `MultiQC` will also be run, with output in `docs/qc/trimmed` - AdapterRemoval 'settings' files will be written to `output/adapterremoval` ## Alignments ### Outline Alignment is performed using [`bowtie2`](https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml) and it is assumed that this index is available before running this workflow. The path and prefix must be provided using config.yml. This index will also be used to produce the file `chrom.sizes` which is essential for conversion of bedGraph files to the more efficient bigWig files. ### Outputs - Alignments will be written to `data/aligned` - `bowtie2` log files will be written to `output/bowtie2` (not the conventional log directory) - The file `chrom.sizes` will be written to `output/annotations` Both sorted and the original unsorted alignments will be returned. However, the unsorted alignments are marked with `temp()` and can be deleted using ```bash snakemake --delete-temp-output --cores 1 ``` ## Deduplication ### Outline Deduplication is performed using [MarkDuplicates](https://gatk.broadinstitute.org/hc/en-us/articles/360037052812-MarkDuplicates-Picard-) from the Picard set of tools. By default, deduplication will remove the duplicates from the set of alignments. All resultant bam files will be sorted and indexed. ### Outputs - Deduplicated alignments are written to `data/deduplicated` and are indexed - DuplicationMetrics files are written to `output/markDuplicates` ## Peak Calling ### Outline This is performed using [`macs2 callpeak`](https://pypi.org/project/MACS2/). - Peak calling will be performed on: a. each sample individually, and b. merged samples for those sharing a common ChIP target and treatment group. - Coverage bigWig files for each individual sample are produced using CPM values (i.e. 
Signal Per Million Reads, SPMR) - For all combinations of target and treatment coverage bigWig files are also produced, along with fold-enrichment bigWig files ### Outputs - Individual outputs are written to `output/macs2/{accession}` + Peaks are written in `narrowPeak` format along with `summits.bed` + bedGraph files are automatically converted to bigWig files, and the originals are marked with `temp()` for subsequent deletion + callpeak log files are also added to this directory - Merged outputs are written to `output/macs2/{target}/` + bedGraph Files are also converted to bigWig and marked with `temp()` + Fold-Enrichment bigWig files are also created with the original bedGraph files marked with `temp()`" assertion.
- 3fdc0374-95f4-4c7d-928c-24dd80fbd26f description "# prepareChIPs This is a simple `snakemake` workflow template for preparing **single-end** ChIP-Seq data. The steps implemented are: 1. Download raw fastq files from SRA 2. Trim and Filter raw fastq files using `AdapterRemoval` 3. Align to the supplied genome using `bowtie2` 4. Deduplicate Alignments using `Picard MarkDuplicates` 5. Call Macs2 Peaks using `macs2` A pdf of the rulegraph is available [here](workflow/rules/rulegraph.pdf) Full details for each step are given below. Any additional parameters for tools can be specified using `config/config.yml`, along with many of the requisite paths. To run the workflow with default settings, simply run as follows (after editing `config/samples.tsv`) ```bash snakemake --use-conda --cores 16 ``` If running on an HPC cluster, a snakemake profile will be required for submission to the queueing system and appropriate resource allocation. Please discuss this with your HPC support team. Nodes may also have restricted internet access and rules which download files may not work on many HPCs. Please see below or discuss this with your support team. Whilst no snakemake wrappers are explicitly used in this workflow, the underlying scripts are utilised where possible to minimise any issues with HPC clusters with restrictions on internet access. These scripts are based on `v1.31.1` of the snakemake wrappers ### Important Note Regarding OSX Systems It should be noted that this workflow is **currently incompatible with OSX-based systems**. There are two unsolved issues 1. `fasterq-dump` has a bug which is specific to conda environments. This has been updated in v3.0.3 but this patch has not yet been made available to conda environments for OSX. Please check [here](https://anaconda.org/bioconda/sra-tools) to see if this has been updated. 2. 
The following error appears in some OSX-based R sessions, in a system-dependent manner: ``` Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : polygon edge not found ``` The fix for this bug is currently unknown ## Download Raw Data ### Outline The file `samples.tsv` is used to specify all steps for this workflow. This file must contain the columns: `accession`, `target`, `treatment` and `input` 1. `accession` must be an SRA accession. Only single-end data is currently supported by this workflow 2. `target` defines the ChIP target. All files common to a target and treatment will be used to generate summarised coverage in bigWig Files 3. `treatment` defines the treatment group each file belongs to. If only one treatment exists, simply use the value 'control' or similar for every file 4. `input` should contain the accession for the relevant input sample. These will only be downloaded once. Valid input samples are *required* for this workflow As some HPCs restrict internet access for submitted jobs, *it may be prudent to run the initial rules in an interactive session* if at all possible. This can be performed using the following (with 2 cores provided as an example) ```bash snakemake --use-conda --until get_fastq --cores 2 ``` ### Outputs - Downloaded files will be gzipped and written to `data/fastq/raw`. - `FastQC` and `MultiQC` will also be run, with output in `docs/qc/raw` Both of these directories are able to be specified as relative paths in `config.yml` ## Read Filtering ### Outline Read trimming is performed using [AdapterRemoval](https://adapterremoval.readthedocs.io/en/stable/). Default settings are customisable using config.yml, with the defaults set to discard reads shorter than 50nt, and to trim using quality scores with a threshold of Q30. 
### Outputs - Trimmed fastq.gz files will be written to `data/fastq/trimmed` - `FastQC` and `MultiQC` will also be run, with output in `docs/qc/trimmed` - AdapterRemoval 'settings' files will be written to `output/adapterremoval` ## Alignments ### Outline Alignment is performed using [`bowtie2`](https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml) and it is assumed that this index is available before running this workflow. The path and prefix must be provided using config.yml. This index will also be used to produce the file `chrom.sizes` which is essential for conversion of bedGraph files to the more efficient bigWig files. ### Outputs - Alignments will be written to `data/aligned` - `bowtie2` log files will be written to `output/bowtie2` (not the conventional log directory) - The file `chrom.sizes` will be written to `output/annotations` Both sorted and the original unsorted alignments will be returned. However, the unsorted alignments are marked with `temp()` and can be deleted using ```bash snakemake --delete-temp-output --cores 1 ``` ## Deduplication ### Outline Deduplication is performed using [MarkDuplicates](https://gatk.broadinstitute.org/hc/en-us/articles/360037052812-MarkDuplicates-Picard-) from the Picard set of tools. By default, deduplication will remove the duplicates from the set of alignments. All resultant bam files will be sorted and indexed. ### Outputs - Deduplicated alignments are written to `data/deduplicated` and are indexed - DuplicationMetrics files are written to `output/markDuplicates` ## Peak Calling ### Outline This is performed using [`macs2 callpeak`](https://pypi.org/project/MACS2/). - Peak calling will be performed on: a. each sample individually, and b. merged samples for those sharing a common ChIP target and treatment group. - Coverage bigWig files for each individual sample are produced using CPM values (i.e. 
Signal Per Million Reads, SPMR) - For all combinations of target and treatment coverage bigWig files are also produced, along with fold-enrichment bigWig files ### Outputs - Individual outputs are written to `output/macs2/{accession}` + Peaks are written in `narrowPeak` format along with `summits.bed` + bedGraph files are automatically converted to bigWig files, and the originals are marked with `temp()` for subsequent deletion + callpeak log files are also added to this directory - Merged outputs are written to `output/macs2/{target}/` + bedGraph Files are also converted to bigWig and marked with `temp()` + Fold-Enrichment bigWig files are also created with the original bedGraph files marked with `temp()` " assertion.
- 985f7fa0-bee5-4e8d-88cc-b1aba653c3fd version "1" assertion.
- Workflow-RO-Crate version "0.2.0" assertion.
- 44eaf315-663c-450e-b246-b417fb1251ba contentSize "0" assertion.
- 5c0d3b16-c3ec-4020-8d86-927d5ed39ad4 contentSize "0" assertion.
- a034a7c7-cf41-47cb-920b-f4904701712a contentSize "0" assertion.
- a2b0640e-a918-4ddf-8f36-2021c706c9fe contentSize "33" assertion.
- 03bab0c3-27a1-4bad-9192-54fd5718efb6 contentSize "89" assertion.
- 26f96889-0418-4090-92f3-3040ccf7daad contentSize "95" assertion.
- bceff5e5-5748-47fa-8225-46eda16071b0 contentSize "99" assertion.
- 3daa2270-78fe-4674-aa82-9272af94aee1 contentSize "100" assertion.
- 22f177c1-06e1-49bf-ade6-d3ba16e6d622 contentSize "114" assertion.
- 60e2708f-da85-46e8-b4a9-0907b507fd15 contentSize "122" assertion.
- 9223fbb7-c378-4a6d-87fe-0da8b12a0404 contentSize "138" assertion.
- dd1cc899-8d02-43b3-8bf7-38de97153ed0 contentSize "139" assertion.