Matches in Nanopublications for { ?s <http://schema.org/description> ?o ?g. }
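For reference, this quad pattern can be evaluated as a standard SPARQL SELECT over named graphs. A minimal sketch using the SPARQLWrapper Python client; the endpoint URL below is a placeholder, not the actual service these matches came from:

```python
# Minimal sketch: retrieve (subject, description, graph) matches via SPARQL.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://example.org/sparql"  # placeholder endpoint (assumption)

QUERY = """
SELECT ?s ?o ?g WHERE {
  GRAPH ?g { ?s <http://schema.org/description> ?o . }
}
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for b in results["results"]["bindings"]:
    print(b["s"]["value"], b["o"]["value"], b["g"]["value"])
```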
- f4351d12-996f-42c0-a920-dbc4513691c5 description "This folder contains project documents such as the DMP, links to the website and GitHub repository, etc." assertion.
- 9b1c9bc2-f28a-449d-be32-0b36fe29ab1c description "This picture shows Tina Odaka presenting the Global Fish Tracking System (GFTS) DestinE DESP Use Case at the 8th International Bio-logging Science Symposium, Tokyo, Japan (4-8 March 2024)." assertion.
- a09a17f7-75cb-4d19-8166-aea9308ce506 description "Slide extracted from the presentation to the 3rd Destination Earth User eXchange (2024)." assertion.
- egusphere-egu24-10741 description "Poster presented at EGU 2024." assertion.
- egusphere-egu24-15500 description "Presentation given at EGU 2024." assertion.
- zenodo.10213946 description "Presentation given at the kick-off meeting of the GFTS project." assertion.
- zenodo.10372387 description "Slides presented by Mathieu Woillez at the Roadshow Webinar: DestinE in action – meet the first DESP use cases (13 December 2023)" assertion.
- zenodo.10809819 description "Poster presented at the 8th International Bio-logging Science Symposium by Tina Odaka, March 2024." assertion.
- zenodo.11185948 description "Project Management Plan for the Global Fish Tracking System Use Case on the DestinE Platform." assertion.
- zenodo.11186084 description "Deliverable 5.2 - Use Case Descriptor for the Global Fish Tracking System Use Case on the DestinE Platform." assertion.
- zenodo.11186123 description "The Global Fish Tracking System Use Case Application on the DestinE Platform." assertion.
- zenodo.11186179 description "Deliverable 5.5 corresponding to the Global Fish Tracking System Use Case Promotion Package." assertion.
- zenodo.11186191 description "This report corresponds to the Software Reuse File for the GFTS DestinE Platform Use Case. New versions will be uploaded regularly." assertion.
- zenodo.11186227 description "The Software Release Plan for the Global Fish Tracking System DestinE Use Case." assertion.
- zenodo.11186257 description "The Software Requirement Specifications for the Global Fish Tracking System DestinE Use Case." assertion.
- zenodo.11186288 description "The Software Verification and Validation Plan for the Global Fish Tracking System DestinE Use Case." assertion.
- zenodo.11186318 description "The Software Verification and Validation Report from the Global Fish Tracking System DestinE Use Case." assertion.
- zenodo.13908850 description "Poster presented at the 2nd DestinE User eXchange Conference." assertion.
- pangeo_openeo.png description "Image used to illustrate the joint collaboration between Pangeo and OpenEO." assertion.
- pangeo-openeo-BiDS-2023 description "It points to the rendered version of the Pangeo & OpenEO training material that was generated with Jupyter Book. Please bear in mind that it links to the latest version and not necessarily to the exact version used during the training. The bids2023 release is the source code (Markdown & Jupyter Book) used for the actual training." assertion.
- 80e2215b-49cb-456e-9ae9-803f3bcdbba3 description "BiDS - Big Data from Space. Big Data from Space 2023 (BiDS) brings together key actors from industry, academia, EU entities and government to reveal user needs, exchange ideas and showcase the latest technical solutions and applications touching all aspects of space and big data technologies. The 2023 edition of BiDS will focus not only on the technologies enabling insight and foresight inferable from big data, but will also emphasize how these technologies impact society. More information can be found on the BiDS’23 website. ## Pangeo & OpenEO tutorial The tutorials are divided into 3 parts: - Introduction to Pangeo - Introduction to OpenEO - Unlocking the Power of Space Data with Pangeo & OpenEO The workshop timelines, setup and content are accessible online at [https://pangeo-data.github.io/pangeo-openeo-BiDS-2023](https://pangeo-data.github.io/pangeo-openeo-BiDS-2023)." assertion.
- 0da58fff-a17b-4ec7-97d1-3eb9cb89e1cf description "Jupyter Book source code that was collaboratively developed to deliver the Pangeo & OpenEO training at BiDS 2023. All the Jupyter Notebooks are made available under the MIT license. The tutorial is written in Markdown and can be rendered in HTML using Jupyter Book." assertion.
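The rendering step mentioned above can be reproduced locally. A minimal sketch, assuming Jupyter Book is already installed (pip install jupyter-book) and the repository has been cloned; the directory name is illustrative:

```python
# Minimal sketch: build the HTML rendering of the training material.
import subprocess

# Hypothetical local checkout of the pangeo-openeo-BiDS-2023 repository.
subprocess.run(["jupyter-book", "build", "pangeo-openeo-BiDS-2023"], check=True)

# Jupyter Book writes the rendered site to <source>/_build/html/.
print("Open pangeo-openeo-BiDS-2023/_build/html/index.html in a browser.")
```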
- dd5c3d62-b632-46a1-99e4-761f2e6cb60d description dd5c3d62-b632-46a1-99e4-761f2e6cb60d assertion.
- dd5c3d62-b632-46a1-99e4-761f2e6cb60d description "## Summary HPPIDiscovery is a scientific workflow to augment, predict and perform an in silico curation of host-pathogen protein-protein interactions (PPIs), using graph theory to build new candidate PPIs and machine learning to predict and evaluate them by combining multiple PPI detection methods according to three categories: structural, based on the primary amino acid sequence, and functional annotations.<br> HPPIDiscovery contains three main steps: (i) acquisition of pathogen and host protein information from seed PPIs provided by HPIDB search methods, (ii) model training and generation of new candidate PPIs from the HPIDB seed proteins' partners, and (iii) evaluation of new candidate PPIs and export of the results. (i) The first step identifies the taxonomy ids of the host and pathogen organisms in the result files. It then parses and cleans the HPIDB results and downloads the protein interactions of the found organisms from the STRING database. The STRING protein identifiers are also mapped using the ID mapping tool of the UniProt API, retrieving the UniProt entry ids along with the functional annotations, sequence, domains and KEGG enzymes. (ii) The second step builds the training dataset using the non-redundant HPIDB validated interactions of each genome as the positive set and random STRING low-confidence PPIs from each genome as the negative set. The PredPrin tool is then executed in training mode to obtain the model that will evaluate the new candidate PPIs. The new PPIs are generated by a pairwise combination of the STRING partners of the host and pathogen HPIDB proteins. Finally, (iii) in the third step, the PredPrin tool is used in test mode to evaluate the new PPIs and generate the reports and the list of positively predicted PPIs. The figure below illustrates the steps of this workflow. ## Requirements * Edit the configuration file (config.yaml) according to your own data, filling out the following fields: - base_data: location of the organism folders directory, e.g. /home/user/data/genomes - parameters_file: since this workflow may process multiple organisms in parallel, you must prepare a tabulated file containing the genome folder names located in base_data, where the HPIDB files are located, e.g. /home/user/data/params.tsv. It must have the following columns: genome (folder name), hpidb_seed_network (the result exported by one of the search methods available in the HPIDB database), hpidb_search_method (the type of search used to generate the results) and target_taxon (the target taxon id). The column hpidb_search_method may have two values: keyword or homology. In keyword mode, you provide a taxonomy, protein name, publication id or detection method, and you save all the results (mitab.zip) in the genome folder. Homology mode allows the user to search for host-pathogen PPIs by giving as input the FASTA sequences of a set of proteins of the target pathogen for enrichment (so you have to select the search for a pathogen set), saving the zipped results (interaction data) in the genome folder. This option is extremely useful when you are not sure that your organism has validated protein interactions, as it finds validated interactions from the closest proteins in the database. When using homology mode, the identifiers of the pathogens' query FASTA sequences must be UniProt IDs, and all the query protein IDs must belong to the same target organism (taxon id). - model_file: path of a previously trained model in joblib format (if you want to train from the known validated PPIs given as seeds, just put a 'None' value) ## Usage Instructions The steps below create an SQLite database file with all the task events, which can be used afterwards to retrieve the execution time taken by the tasks. It is also possible to run locally (see luigi's documentation to change the running command). <br><br> * Preparation: 1. `git clone https://github.com/YasCoMa/hppidiscovery.git` 2. `cd hppidiscovery` 3. `mkdir luigi_log` 4. `luigid --background --logdir luigi_log` (start the luigi server) 5. `conda env create -f hp_ppi_augmentation.yml` 6. `conda activate hp_ppi_augmentation` 6.1. execute `pip3 install wget` (it is not installed in the environment) 7. run the `pwd` command and get the full path 8. substitute the full path obtained in the previous step into config_example.yaml 9. download the SPRINT pre-computed similarities from https://www.csd.uwo.ca/~ilie/SPRINT/precomputed_similarities.zip and unzip them inside workflow_hpAugmentation/predprin/core/sprint/HSP/ 10. `cd workflow_hpAugmentation/predprin/` 11. uncompress annotation_data.zip 12. uncompress sequence_data.zip 13. `cd ../../` 14. `cd workflow_hpAugmentation` 15. `snakemake -n` (check the plan of jobs; it should return no errors or exceptions) 16. `snakemake -j 4` (change this number according to the number of genomes to analyse and the number of cores available on your machine)" assertion.
- 13c69a83-de3f-4379-b137-6a12d45bf6e7 description "## Summary HPPIDiscovery is a scientific workflow to augment, predict and perform an in silico curation of host-pathogen protein-protein interactions (PPIs), using graph theory to build new candidate PPIs and machine learning to predict and evaluate them by combining multiple PPI detection methods according to three categories: structural, based on the primary amino acid sequence, and functional annotations.<br> HPPIDiscovery contains three main steps: (i) acquisition of pathogen and host protein information from seed PPIs provided by HPIDB search methods, (ii) model training and generation of new candidate PPIs from the HPIDB seed proteins' partners, and (iii) evaluation of new candidate PPIs and export of the results. (i) The first step identifies the taxonomy ids of the host and pathogen organisms in the result files. It then parses and cleans the HPIDB results and downloads the protein interactions of the found organisms from the STRING database. The STRING protein identifiers are also mapped using the ID mapping tool of the UniProt API, retrieving the UniProt entry ids along with the functional annotations, sequence, domains and KEGG enzymes. (ii) The second step builds the training dataset using the non-redundant HPIDB validated interactions of each genome as the positive set and random STRING low-confidence PPIs from each genome as the negative set. The PredPrin tool is then executed in training mode to obtain the model that will evaluate the new candidate PPIs. The new PPIs are generated by a pairwise combination of the STRING partners of the host and pathogen HPIDB proteins. Finally, (iii) in the third step, the PredPrin tool is used in test mode to evaluate the new PPIs and generate the reports and the list of positively predicted PPIs. The figure below illustrates the steps of this workflow. ## Requirements * Edit the configuration file (config.yaml) according to your own data, filling out the following fields: - base_data: location of the organism folders directory, e.g. /home/user/data/genomes - parameters_file: since this workflow may process multiple organisms in parallel, you must prepare a tabulated file containing the genome folder names located in base_data, where the HPIDB files are located, e.g. /home/user/data/params.tsv. It must have the following columns: genome (folder name), hpidb_seed_network (the result exported by one of the search methods available in the HPIDB database), hpidb_search_method (the type of search used to generate the results) and target_taxon (the target taxon id). The column hpidb_search_method may have two values: keyword or homology. In keyword mode, you provide a taxonomy, protein name, publication id or detection method, and you save all the results (mitab.zip) in the genome folder. Homology mode allows the user to search for host-pathogen PPIs by giving as input the FASTA sequences of a set of proteins of the target pathogen for enrichment (so you have to select the search for a pathogen set), saving the zipped results (interaction data) in the genome folder. This option is extremely useful when you are not sure that your organism has validated protein interactions, as it finds validated interactions from the closest proteins in the database. When using homology mode, the identifiers of the pathogens' query FASTA sequences must be UniProt IDs, and all the query protein IDs must belong to the same target organism (taxon id). - model_file: path of a previously trained model in joblib format (if you want to train from the known validated PPIs given as seeds, just put a 'None' value) ## Usage Instructions The steps below create an SQLite database file with all the task events, which can be used afterwards to retrieve the execution time taken by the tasks. It is also possible to run locally (see luigi's documentation to change the running command). <br><br> * Preparation: 1. `git clone https://github.com/YasCoMa/hppidiscovery.git` 2. `cd hppidiscovery` 3. `mkdir luigi_log` 4. `luigid --background --logdir luigi_log` (start the luigi server) 5. `conda env create -f hp_ppi_augmentation.yml` 6. `conda activate hp_ppi_augmentation` 6.1. execute `pip3 install wget` (it is not installed in the environment) 7. run the `pwd` command and get the full path 8. substitute the full path obtained in the previous step into config_example.yaml 9. download the SPRINT pre-computed similarities from https://www.csd.uwo.ca/~ilie/SPRINT/precomputed_similarities.zip and unzip them inside workflow_hpAugmentation/predprin/core/sprint/HSP/ 10. `cd workflow_hpAugmentation/predprin/` 11. uncompress annotation_data.zip 12. uncompress sequence_data.zip 13. `cd ../../` 14. `cd workflow_hpAugmentation` 15. `snakemake -n` (check the plan of jobs; it should return no errors or exceptions) 16. `snakemake -j 4` (change this number according to the number of genomes to analyse and the number of cores available on your machine)" assertion.
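The parameters file described in these entries is a plain tab-separated table. A minimal sketch of building it in Python; the folder name and taxon id below are illustrative assumptions, while the column names come from the workflow description:

```python
# Minimal sketch: write the params.tsv expected by the HPPIDiscovery workflow.
import csv

COLUMNS = ["genome", "hpidb_seed_network", "hpidb_search_method", "target_taxon"]

rows = [
    # Hypothetical genome folder with a keyword-mode HPIDB export (mitab.zip);
    # the taxon id is illustrative only.
    {"genome": "organism_A", "hpidb_seed_network": "mitab.zip",
     "hpidb_search_method": "keyword", "target_taxon": "9606"},
]

with open("params.tsv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=COLUMNS, delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)
```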
- edit?usp=sharing description "This file uses new_matrix.csv as a starting point, filtered by date, e.g. from 1 January 2022 to 31 December 2022. Then a pivot table is created with sort, q and reslabel as columns and c as rows." assertion.
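A minimal pandas sketch of the filter-and-pivot described above; the 'date' column name is an assumption (the real schema of new_matrix.csv is not shown here), and counting is used as the aggregation, which the description does not specify:

```python
# Minimal sketch: filter new_matrix.csv to 2022 and pivot it.
import pandas as pd

df = pd.read_csv("new_matrix.csv")

# Keep only rows dated in 2022 (assumes a 'date' column; adjust to the schema).
df["date"] = pd.to_datetime(df["date"])
df_2022 = df[(df["date"] >= "2022-01-01") & (df["date"] <= "2022-12-31")]

# Pivot: sort, q and reslabel as columns, c as rows, counting occurrences.
matrix = pd.pivot_table(df_2022, index="c",
                        columns=["sort", "q", "reslabel"],
                        aggfunc="size", fill_value=0)
print(matrix.head())
```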
- edit?usp=sharing description "This Google Sheet contains the FIP convergence matrix for the year 2022. The 'matrix' tab is the main tab, while the FERs tab is the unique list of FERs and the Communities tab the list of unique community names." assertion.
- 073ab8fc-67b3-4ec7-915e-17ffb47f09c5 description "This Research Object contains all the data used for producing a FIP convergence matrix for the year 2022. The raw data has been fetched from [https://github.com/peta-pico/dsw-nanopub-api](https://github.com/peta-pico/dsw-nanopub-api) on Thursday 5 October 2023. The original matrix, called new_matrix.csv ([https://github.com/peta-pico/dsw-nanopub-api/blob/main/tables/new_matrix.csv](https://github.com/peta-pico/dsw-nanopub-api/blob/main/tables/new_matrix.csv)), is stored in the raw data folder for reference. The methodology used to create the FIP convergence matrix is detailed in the presentation from Barbara Magagna: [https://osf.io/de6su/](https://osf.io/de6su/)." assertion.
- 7683f508-3363-4c9a-8eb2-12d31d7e5a4a description "Partial view of the FIP convergence matrix for illustration purposes only." assertion.
- a55b4924-4d0b-4a17-8a5c-22dce26fcf6c description "This CSV file is the result of a SPARQL query executed by a GitHub action." assertion.
- bc3f8893-19ec-4d72-a1fb-20fc92244634 description "PDF file generated from the FIP convergence Matrix google sheet." assertion.
- 0a79368c-5f22-42b6-b315-3e76354918f9 description "This repository contains the Python code to reproduce the experiments in Dłotko, Gurnari, 'Euler Characteristic Curves and Profiles: a stable shape invariant for big data problems'." assertion.
- ed0379fa-4990-4eb0-8375-3c3572847495 description "This repository contains the Python code to reproduce the experiments in Dłotko, Gurnari, 'Euler Characteristic Curves and Profiles: a stable shape invariant for big data problems'." assertion.
- content description "This Notebook provides a workflow of ArcGIS toolboxes to identify marine litter (ML) targets from bathymetry." assertion.
- ro-id.EHMJMDN68Q description "It reads and processes the ASCII files produced with FM Midwater in order to calculate some statistics (mean and standard deviation of backscatter, depth, and angle of the net for each ping) and to produce a graph with the track of the net." assertion.
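A minimal pandas sketch of the per-ping statistics described above; the FM Midwater export layout and all column names here are assumptions for illustration, not the real schema:

```python
# Minimal sketch: per-ping statistics and a net-track plot.
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical whitespace-delimited FM Midwater ASCII export.
df = pd.read_csv("fm_midwater_export.txt", sep=r"\s+")

stats = df.groupby("ping").agg(
    backscatter_mean=("backscatter", "mean"),
    backscatter_std=("backscatter", "std"),
    depth_mean=("depth", "mean"),
    angle_mean=("angle", "mean"),
)

# Track of the net: mean depth per ping, with depth increasing downward.
ax = stats["depth_mean"].plot(title="Net track (mean depth per ping)")
ax.invert_yaxis()
plt.show()
```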
- ro-id.IKMY8URJ9Q description "It calculates the sink velocity of a net floating in water starting from water column data." assertion.
- 058215c9-d7d3-4e78-8f49-5655f899d5e3 description "A dedicated workflow in ArcGIS (shared as a Jupyter notebook) was developed to identify targets from the bathymetry within the MAELSTROM Project - Smart technology for MArinE Litter SusTainable RemOval and Management. In that framework, the workflow identified marine litter on the seafloor starting from a bathymetric surface collected in Sacca Fisola (Venice Lagoon) in 2021." assertion.
- 2ca9dde1-bb6c-4a17-8a9c-121bbf83ac7b description "Marine Litter Identification from Bathymetry" assertion.
- 515fd558-2e2e-41e6-a613-71146ba0866c description "EASME/EMFF/2017/1.2.1.12/S2/05/SI2.789314 MarGnet official document with a description of the ROs developed within the MarGnet Project." assertion.
- 51d98bb1-2fc4-4468-8f17-c9c05edb7b57 description "Schema of the ArcGIS tool nested in the Jupyter Notebook." assertion.
- ce7f4ce9-0a6e-41a1-b1ec-f95d1f883c5a description "EASME/EMFF/2017/1.2.1.12/S2/05/SI2.789314 MarGnet official document with description of the tool" assertion.
- df584059-663d-4da3-8eec-d01467544383 description "Workflow requirements" assertion.
- fc64e1c2-a7de-41db-a2b2-372b81cda150 description "EASME/EMFF/2017/1.2.1.12/S2/05/SI2.789314 MarGnet official document with description of the application of the tool" assertion.
- 1873-0604.2012018 description "This paper presents a semi-automated method to recognize, spatially delineate, and morphometrically characterise pockmarks at the seabed." assertion.