Matches in Nanopublications for { ?s ?p ?o <https://w3id.org/np/RA-Lfsn87bQrDCVAAovciy2Jmxvb_ZHgsyv7kDPMm2gfA/assertion>. }
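The pattern above asks for every triple `?s ?p ?o` inside the named assertion graph. A minimal sketch of how such a listing could be produced — assuming a SPARQL endpoint that serves nanopublications (the endpoint itself is hypothetical; only the graph IRI comes from this listing):

```python
# Sketch: build the SPARQL query whose matches are listed below.
# The assertion-graph IRI is taken from this document; the idea of
# sending it to an endpoint is an assumption for illustration.
from urllib.parse import urlencode

ASSERTION_GRAPH = (
    "https://w3id.org/np/RA-Lfsn87bQrDCVAAovciy2Jmxvb_ZHgsyv7kDPMm2gfA/assertion"
)

def build_assertion_query(graph_iri: str) -> str:
    """Return a SPARQL SELECT over all triples in the named graph."""
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        f"  GRAPH <{graph_iri}> {{ ?s ?p ?o . }}\n"
        "}"
    )

query = build_assertion_query(ASSERTION_GRAPH)

# A typical endpoint would receive this as a URL-encoded form parameter:
params = urlencode({"query": query})
```

Each result row then corresponds to one `subject predicate object` line in the listing below.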
- 985f7fa0-bee5-4e8d-88cc-b1aba653c3fd type SoftwareSourceCode assertion.
- 985f7fa0-bee5-4e8d-88cc-b1aba653c3fd type Resource assertion.
- 985f7fa0-bee5-4e8d-88cc-b1aba653c3fd type MediaObject assertion.
- 98f60e6a-e6fa-4fbb-bc56-ccda2a3340e3 type Resource assertion.
- 98f60e6a-e6fa-4fbb-bc56-ccda2a3340e3 type MediaObject assertion.
- 99e9f2a6-a57d-4443-86af-15afee791284 type Resource assertion.
- 99e9f2a6-a57d-4443-86af-15afee791284 type MediaObject assertion.
- 9b9b25b1-77f6-4dac-af7e-92fa55863cea type Resource assertion.
- 9b9b25b1-77f6-4dac-af7e-92fa55863cea type MediaObject assertion.
- 9d59e482-707c-4522-87c0-c5ca05313e38 type Resource assertion.
- 9d59e482-707c-4522-87c0-c5ca05313e38 type MediaObject assertion.
- a034a7c7-cf41-47cb-920b-f4904701712a type Resource assertion.
- a034a7c7-cf41-47cb-920b-f4904701712a type MediaObject assertion.
- a0f3633c-223b-439d-99f3-2a4926996f00 type Resource assertion.
- a0f3633c-223b-439d-99f3-2a4926996f00 type MediaObject assertion.
- a2b0640e-a918-4ddf-8f36-2021c706c9fe type Resource assertion.
- a2b0640e-a918-4ddf-8f36-2021c706c9fe type MediaObject assertion.
- a4a0093b-349f-443b-9fac-bb4a60a213bd type Resource assertion.
- a4a0093b-349f-443b-9fac-bb4a60a213bd type MediaObject assertion.
- acfc63b3-0045-4720-80a4-8a36d26a72b4 type Resource assertion.
- acfc63b3-0045-4720-80a4-8a36d26a72b4 type MediaObject assertion.
- b520ea10-64d3-4124-b441-154cd4749048 type Resource assertion.
- b520ea10-64d3-4124-b441-154cd4749048 type MediaObject assertion.
- b7726c93-0c81-4159-89fb-e8d59cdc7182 type Resource assertion.
- b7726c93-0c81-4159-89fb-e8d59cdc7182 type MediaObject assertion.
- bceff5e5-5748-47fa-8225-46eda16071b0 type Resource assertion.
- bceff5e5-5748-47fa-8225-46eda16071b0 type MediaObject assertion.
- bf76b440-5e45-49fc-bc9d-fe7b0398a545 type Resource assertion.
- bf76b440-5e45-49fc-bc9d-fe7b0398a545 type MediaObject assertion.
- c4e0cdb1-351c-4fc2-8f72-55d35d80297e type Resource assertion.
- c4e0cdb1-351c-4fc2-8f72-55d35d80297e type MediaObject assertion.
- d29dda34-5600-44a7-80be-d89fe409e4d0 type Resource assertion.
- d29dda34-5600-44a7-80be-d89fe409e4d0 type MediaObject assertion.
- d38663ba-d7a4-4c02-981c-f14cef1ade0a type Resource assertion.
- d38663ba-d7a4-4c02-981c-f14cef1ade0a type MediaObject assertion.
- dd1cc899-8d02-43b3-8bf7-38de97153ed0 type Resource assertion.
- dd1cc899-8d02-43b3-8bf7-38de97153ed0 type MediaObject assertion.
- e0e3ae31-7616-4e18-9b9c-fa616bffa161 type Resource assertion.
- e0e3ae31-7616-4e18-9b9c-fa616bffa161 type MediaObject assertion.
- ro-crate-metadata.json type CreativeWork assertion.
- 4366f40c-5956-4a88-87bf-794b7c12d33b type NASA assertion.
- 46d241be-9337-4749-9543-e01808c828ce type FieldOfResearch assertion.
- 49f24138-b352-4b9d-bbcd-b10c266846ca type NASA assertion.
- 52e0bc84-491c-4913-9a92-5fe4af038b25 type Phrase assertion.
- 5bf0c07d-8b93-4a78-b020-101fa59fd22a type Lemma assertion.
- 5cf3d332-4cd5-432a-8423-200792b8d509 type Sentence assertion.
- 61a7eb88-5e9f-4cfc-b578-c406b7a43144 type IPTC assertion.
- 6277e661-77ff-4bd7-b1b3-f2f51c0ab0ec type Concept assertion.
- 64f93c98-c150-4431-885a-8ac5fb5add82 type FieldOfResearch assertion.
- 69951488-193c-43c5-ae9a-c5135803cb3e type Phrase assertion.
- 6b05e40d-078b-4023-b984-3dd996317ee9 type Lemma assertion.
- 708386a0-6582-4493-a7a2-7e22bc2c0dbb type IPTC assertion.
- 78fbbea2-7dd0-4f4b-984d-625c6013c161 type Sentence assertion.
- 8e9e8062-ddbd-46c3-a771-5be65444212f type Concept assertion.
- 9156e762-e630-40b0-b9aa-c059ba0a40c1 type Lemma assertion.
- a1bb094f-e223-4eb7-90c8-2d55ec57e261 type Domain assertion.
- a49165ba-fbd5-4483-9c39-9cb8258e6c5a type Phrase assertion.
- a6991c9d-c113-435e-9455-dfbbd1f0ed19 type IPTC assertion.
- af02653a-41cb-4cbc-86f3-4da526e29d72 type Domain assertion.
- b1a61304-f050-4541-9d32-259ff31f5fdc type Concept assertion.
- b673a29e-23c3-487e-ad95-8fc69f7d7547 type Phrase assertion.
- b827efbe-6c40-47b1-bba2-b9623dc2ed2f type IPTC assertion.
- c33311df-7f79-464a-98d9-4ca351c53eb6 type IPTC assertion.
- c55159f8-5154-4d92-b6c8-ebd2fcfc4bf0 type Domain assertion.
- c65c143c-5478-46a4-a257-4ec365270d90 type NASA assertion.
- c9a9f204-3830-48be-a7a8-9f4e19d0fd00 type NASA assertion.
- ccf5e4c4-0c4e-48dd-8bdd-84006c60ec4b type Phrase assertion.
- d35c78c4-748a-4706-a8b1-cee0aa47f9ab type Concept assertion.
- d80380ce-e484-491a-b68b-e978fd2930c2 type Concept assertion.
- d98db78b-77a7-454e-af9d-4ef2514949dc type Lemma assertion.
- dba77b48-6599-4bbd-82d2-f9d4cfd9abd8 type Lemma assertion.
- dfdcee9f-16ae-4a8f-bae0-4b0b69a73d53 type Concept assertion.
- e08487bf-fd3c-4882-90b4-5f9ce844a146 type IPTC assertion.
- e193bcdc-d8c4-4b4e-8331-c1d46c14c206 type Phrase assertion.
- f7b300ce-ad48-4976-9503-40223a6a9144 type Concept assertion.
- faf7e5b3-b0ef-4e09-b55c-a745db352edf type Phrase assertion.
- ff38c043-b00d-41e1-b3e0-4f9aa80de812 type Concept assertion.
- 0000-0001-8197-3303 type Person assertion.
- 3fdc0374-95f4-4c7d-928c-24dd80fbd26f mainEntity "workflow/Snakefile" assertion.
- 3fdc0374-95f4-4c7d-928c-24dd80fbd26f importedBy 0000-0003-2388-0744 assertion.
- 985f7fa0-bee5-4e8d-88cc-b1aba653c3fd sdPublisher "https://about.workflowhub.eu/" assertion.
- 3fdc0374-95f4-4c7d-928c-24dd80fbd26f creativeWorkStatus "Work-in-progress" assertion.
- 985f7fa0-bee5-4e8d-88cc-b1aba653c3fd producer "https://workflowhub.eu/projects/148" assertion.
- 3fdc0374-95f4-4c7d-928c-24dd80fbd26f description 3fdc0374-95f4-4c7d-928c-24dd80fbd26f assertion.
- 3fdc0374-95f4-4c7d-928c-24dd80fbd26f description "# prepareChIPs This is a simple `snakemake` workflow template for preparing **single-end** ChIP-Seq data. The steps implemented are: 1. Download raw fastq files from SRA 2. Trim and Filter raw fastq files using `AdapterRemoval` 3. Align to the supplied genome using `bowtie2` 4. Deduplicate Alignments using `Picard MarkDuplicates` 5. Call Macs2 Peaks using `macs2` A pdf of the rulegraph is available [here](workflow/rules/rulegraph.pdf) Full details for each step are given below. Any additional parameters for tools can be specified using `config/config.yml`, along with many of the requisite paths To run the workflow with default settings, simply run as follows (after editing `config/samples.tsv`) ```bash snakemake --use-conda --cores 16 ``` If running on an HPC cluster, a snakemake profile will be required for submission to the queueing system and appropriate resource allocation. Please discuss this with your HPC support team. Nodes may also have restricted internet access and rules which download files may not work on many HPCs. Please see below or discuss this with your support team Whilst no snakemake wrappers are explicitly used in this workflow, the underlying scripts are utilised where possible to minimise any issues with HPC clusters with restrictions on internet access. These scripts are based on `v1.31.1` of the snakemake wrappers ### Important Note Regarding OSX Systems It should be noted that this workflow is **currently incompatible with OSX-based systems**. There are two unsolved issues 1. `fasterq-dump` has a bug which is specific to conda environments. This has been fixed in v3.0.3 but this patch has not yet been made available to conda environments for OSX. Please check [here](https://anaconda.org/bioconda/sra-tools) to see if this has been updated. 2. 
The following error appears in some OSX-based R sessions, in a system-dependent manner: ``` Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : polygon edge not found ``` The fix for this bug is currently unknown ## Download Raw Data ### Outline The file `samples.tsv` is used to specify all steps for this workflow. This file must contain the columns: `accession`, `target`, `treatment` and `input` 1. `accession` must be an SRA accession. Only single-end data is currently supported by this workflow 2. `target` defines the ChIP target. All files common to a target and treatment will be used to generate summarised coverage in bigWig Files 3. `treatment` defines the treatment group each file belongs to. If only one treatment exists, simply use the value 'control' or similar for every file 4. `input` should contain the accession for the relevant input sample. These will only be downloaded once. Valid input samples are *required* for this workflow As some HPCs restrict internet access for submitted jobs, *it may be prudent to run the initial rules in an interactive session* if at all possible. This can be performed using the following (with 2 cores provided as an example) ```bash snakemake --use-conda --until get_fastq --cores 2 ``` ### Outputs - Downloaded files will be gzipped and written to `data/fastq/raw`. - `FastQC` and `MultiQC` will also be run, with output in `docs/qc/raw` Both of these directories can be specified as relative paths in `config.yml` ## Read Filtering ### Outline Read trimming is performed using [AdapterRemoval](https://adapterremoval.readthedocs.io/en/stable/). Default settings are customisable using config.yml, with the defaults set to discard reads shorter than 50nt, and to trim using quality scores with a threshold of Q30. 
### Outputs - Trimmed fastq.gz files will be written to `data/fastq/trimmed` - `FastQC` and `MultiQC` will also be run, with output in `docs/qc/trimmed` - AdapterRemoval 'settings' files will be written to `output/adapterremoval` ## Alignments ### Outline Alignment is performed using [`bowtie2`](https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml) and it is assumed that this index is available before running this workflow. The path and prefix must be provided using config.yml This index will also be used to produce the file `chrom.sizes` which is essential for conversion of bedGraph files to the more efficient bigWig files. ### Outputs - Alignments will be written to `data/aligned` - `bowtie2` log files will be written to `output/bowtie2` (not the conventional log directory) - The file `chrom.sizes` will be written to `output/annotations` Both sorted and the original unsorted alignments will be returned. However, the unsorted alignments are marked with `temp()` and can be deleted using ```bash snakemake --delete-temp-output --cores 1 ``` ## Deduplication ### Outline Deduplication is performed using [MarkDuplicates](https://gatk.broadinstitute.org/hc/en-us/articles/360037052812-MarkDuplicates-Picard-) from the Picard set of tools. By default, deduplication will remove the duplicates from the set of alignments. All resultant bam files will be sorted and indexed. ### Outputs - Deduplicated alignments are written to `data/deduplicated` and are indexed - DuplicationMetrics files are written to `output/markDuplicates` ## Peak Calling ### Outline This is performed using [`macs2 callpeak`](https://pypi.org/project/MACS2/). - Peak calling will be performed on: a. each sample individually, and b. merged samples for those sharing a common ChIP target and treatment group. - Coverage bigWig files for each individual sample are produced using CPM values (i.e. 
Signal Per Million Reads, SPMR) - For all combinations of target and treatment coverage bigWig files are also produced, along with fold-enrichment bigWig files ### Outputs - Individual outputs are written to `output/macs2/{accession}` + Peaks are written in `narrowPeak` format along with `summits.bed` + bedGraph files are automatically converted to bigWig files, and the originals are marked with `temp()` for subsequent deletion + callpeak log files are also added to this directory - Merged outputs are written to `output/macs2/{target}/` + bedGraph Files are also converted to bigWig and marked with `temp()` + Fold-Enrichment bigWig files are also created with the original bedGraph files marked with `temp()` " assertion.
- 985f7fa0-bee5-4e8d-88cc-b1aba653c3fd description "# prepareChIPs This is a simple `snakemake` workflow template for preparing **single-end** ChIP-Seq data. The steps implemented are: 1. Download raw fastq files from SRA 2. Trim and Filter raw fastq files using `AdapterRemoval` 3. Align to the supplied genome using `bowtie2` 4. Deduplicate Alignments using `Picard MarkDuplicates` 5. Call Macs2 Peaks using `macs2` A pdf of the rulegraph is available [here](workflow/rules/rulegraph.pdf) Full details for each step are given below. Any additional parameters for tools can be specified using `config/config.yml`, along with many of the requisite paths To run the workflow with default settings, simply run as follows (after editing `config/samples.tsv`) ```bash snakemake --use-conda --cores 16 ``` If running on an HPC cluster, a snakemake profile will be required for submission to the queueing system and appropriate resource allocation. Please discuss this with your HPC support team. Nodes may also have restricted internet access and rules which download files may not work on many HPCs. Please see below or discuss this with your support team Whilst no snakemake wrappers are explicitly used in this workflow, the underlying scripts are utilised where possible to minimise any issues with HPC clusters with restrictions on internet access. These scripts are based on `v1.31.1` of the snakemake wrappers ### Important Note Regarding OSX Systems It should be noted that this workflow is **currently incompatible with OSX-based systems**. There are two unsolved issues 1. `fasterq-dump` has a bug which is specific to conda environments. This has been fixed in v3.0.3 but this patch has not yet been made available to conda environments for OSX. Please check [here](https://anaconda.org/bioconda/sra-tools) to see if this has been updated. 2. 
The following error appears in some OSX-based R sessions, in a system-dependent manner: ``` Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : polygon edge not found ``` The fix for this bug is currently unknown ## Download Raw Data ### Outline The file `samples.tsv` is used to specify all steps for this workflow. This file must contain the columns: `accession`, `target`, `treatment` and `input` 1. `accession` must be an SRA accession. Only single-end data is currently supported by this workflow 2. `target` defines the ChIP target. All files common to a target and treatment will be used to generate summarised coverage in bigWig Files 3. `treatment` defines the treatment group each file belongs to. If only one treatment exists, simply use the value 'control' or similar for every file 4. `input` should contain the accession for the relevant input sample. These will only be downloaded once. Valid input samples are *required* for this workflow As some HPCs restrict internet access for submitted jobs, *it may be prudent to run the initial rules in an interactive session* if at all possible. This can be performed using the following (with 2 cores provided as an example) ```bash snakemake --use-conda --until get_fastq --cores 2 ``` ### Outputs - Downloaded files will be gzipped and written to `data/fastq/raw`. - `FastQC` and `MultiQC` will also be run, with output in `docs/qc/raw` Both of these directories can be specified as relative paths in `config.yml` ## Read Filtering ### Outline Read trimming is performed using [AdapterRemoval](https://adapterremoval.readthedocs.io/en/stable/). Default settings are customisable using config.yml, with the defaults set to discard reads shorter than 50nt, and to trim using quality scores with a threshold of Q30. 
### Outputs - Trimmed fastq.gz files will be written to `data/fastq/trimmed` - `FastQC` and `MultiQC` will also be run, with output in `docs/qc/trimmed` - AdapterRemoval 'settings' files will be written to `output/adapterremoval` ## Alignments ### Outline Alignment is performed using [`bowtie2`](https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml) and it is assumed that this index is available before running this workflow. The path and prefix must be provided using config.yml This index will also be used to produce the file `chrom.sizes` which is essential for conversion of bedGraph files to the more efficient bigWig files. ### Outputs - Alignments will be written to `data/aligned` - `bowtie2` log files will be written to `output/bowtie2` (not the conventional log directory) - The file `chrom.sizes` will be written to `output/annotations` Both sorted and the original unsorted alignments will be returned. However, the unsorted alignments are marked with `temp()` and can be deleted using ```bash snakemake --delete-temp-output --cores 1 ``` ## Deduplication ### Outline Deduplication is performed using [MarkDuplicates](https://gatk.broadinstitute.org/hc/en-us/articles/360037052812-MarkDuplicates-Picard-) from the Picard set of tools. By default, deduplication will remove the duplicates from the set of alignments. All resultant bam files will be sorted and indexed. ### Outputs - Deduplicated alignments are written to `data/deduplicated` and are indexed - DuplicationMetrics files are written to `output/markDuplicates` ## Peak Calling ### Outline This is performed using [`macs2 callpeak`](https://pypi.org/project/MACS2/). - Peak calling will be performed on: a. each sample individually, and b. merged samples for those sharing a common ChIP target and treatment group. - Coverage bigWig files for each individual sample are produced using CPM values (i.e. 
Signal Per Million Reads, SPMR) - For all combinations of target and treatment coverage bigWig files are also produced, along with fold-enrichment bigWig files ### Outputs - Individual outputs are written to `output/macs2/{accession}` + Peaks are written in `narrowPeak` format along with `summits.bed` + bedGraph files are automatically converted to bigWig files, and the originals are marked with `temp()` for subsequent deletion + callpeak log files are also added to this directory - Merged outputs are written to `output/macs2/{target}/` + bedGraph Files are also converted to bigWig and marked with `temp()` + Fold-Enrichment bigWig files are also created with the original bedGraph files marked with `temp()`" assertion.
- Workflow-RO-Crate version "0.2.0" assertion.
- 985f7fa0-bee5-4e8d-88cc-b1aba653c3fd version "1" assertion.
- 3fdc0374-95f4-4c7d-928c-24dd80fbd26f contentSize "75099" assertion.
- 03bab0c3-27a1-4bad-9192-54fd5718efb6 contentSize "89" assertion.
- 165ceeeb-0982-40e6-a0ad-a77d0eec7316 contentSize "6262" assertion.
- 17e5cc71-3d5c-46fd-8a27-709516f324bd contentSize "24823" assertion.
- 19242bbc-9eae-461c-9535-fff02a359034 contentSize "444" assertion.
- 1b909ce5-a128-47dc-93dd-4e0a1795d8c7 contentSize "861" assertion.
- 22f177c1-06e1-49bf-ade6-d3ba16e6d622 contentSize "114" assertion.
- 23c6fd11-d3c3-4615-8946-3c11f0876dac contentSize "3680" assertion.
- 2699beb6-1816-4b09-a671-c7b4e8e27632 contentSize "1427" assertion.
- 26f96889-0418-4090-92f3-3040ccf7daad contentSize "95" assertion.
- 2d4cc403-0af3-4ea5-9cae-06b8e07ae93f contentSize "1162" assertion.
- 2e217125-2078-43e1-a55c-b9db9862ce5a contentSize "1107" assertion.