Usage Statistics

No | Pipeline name | Description | Uses
1 | POSTECH_EPIGENOME_SEQUENCING_FASTQC_BOWTIE_MACS_PIPING | The analysis performed at each step is as follows. The quality-control step checks the sequencing quality of the input data. The quality-filter step removes low-quality reads from the data. The alignment step maps the data against the reference sequence. The cross-correlation step performs quality control on the alignment result. The peak-calling step searches for significant regions (peaks), using MACS. The annotation step adds detailed descriptions of the regions found in the previous step. The visualization step visualizes the mapping data and the peak data. | 122
2 | RNASeq_TOPHAT2_CUFFLINKS_PIPELINE | This pipeline analyzes and processes RNA-Seq samples: it assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples using Cufflinks. | 64
3 | POSTECH_BROAD_SOURCE_CHIP_SEQ_FASTQC_BWA_MACS2_PIPING | The analysis performed at each step is as follows. The quality-control step checks the sequencing quality of the input data. The quality-filter step removes low-quality reads from the data. The alignment step maps the data against the reference sequence, then checks the mapping quality and duplication level of the mapped data. The visualization step visualizes the mapping data and the peak data. The peak-calling step searches for significant regions, peaks (or domains), using RSEG/SICER/hiddenDomains/BCP/MACS2, which are specialized for broad-source factors. The annotation step adds detailed descriptions of the regions found in the previous step. | 44
4 | RNASeq_STAR_RSEM_PIPELINE | This pipeline is an RNA sequencing pipeline that aligns reads with the STAR program and performs quantification with RSEM. | 33
5 | RNASeq_STAR_HTSEQ_PIPELINE | This pipeline is an RNA sequencing pipeline that aligns reads with the STAR program and performs quantification with HTSeq. | 31
6 | RNASeq_KALLISTO_PIPELINE | This pipeline is an RNA sequencing pipeline that performs pseudo-alignment and quantification quickly using the Kallisto program. | 19
7 | COLLECTIVE_GENOME_PCA_KIMURA_PIPING | First, convert the VCF file to PLINK format. Second, convert the PED file of the converted PLINK-format data into a FASTA-format file. Third, use this converted FASTA file to generate a pairwise matrix of Kimura two-parameter distances for all samples. Fourth, PCA plots for PC1 and PC2 are generated by singular value decomposition (SVD) on the pairwise matrix. Finally, the phylogenetic tree is drawn from the pairwise matrix using the BioNJ algorithm, an improved version of the neighbor-joining algorithm, and a Newick file that can additionally be used with MEGA7 is generated. | 15
8 | RNASeq_TOPHAT2_CUFFLINKS_PIPING | This pipeline analyzes and processes RNA-Seq samples: it assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples using Cufflinks. | 13
9 | POSTECH_BROAD_SOURCE_CHIP_SEQ_FASTQC_BWA_MACS2_PIPELINE | The analysis performed at each step is as follows. The quality-control step checks the sequencing quality of the input data. The quality-filter step removes low-quality reads from the data. The alignment step maps the data against the reference sequence, then checks the mapping quality and duplication level of the mapped data. The visualization step visualizes the mapping data and the peak data. The peak-calling step searches for significant regions, peaks (or domains), using RSEG/SICER/hiddenDomains/BCP/MACS2, which are specialized for broad-source factors. The annotation step adds detailed descriptions of the regions found in the previous step. | 9
10 | resequencing_pipeline | Resequencing pipeline. | 8
11 | RNASeq_EMSAR_PPIPELINE | This pipeline analyzes the RNA-Seq data to obtain isoform-level estimates with EMSAR, and then produces gene-level expression estimates from the isoform-level estimates. | 7
12 | RNASeq_STARFUSION_PIPELINE | The fusion-detection pipeline for RNA-Seq using STAR-Fusion consists of two stages: Quality Check, and Alignment & Fusion Prediction. The analysis performed at each stage is as follows. The first stage, Quality Check, checks the sequencing quality of the input data with FastQC. Before moving on to the Alignment & Fusion Prediction stage, the reference files (reference genome FASTA file, transcriptome annotation file, BLAST matching gene-pair file, and fusion annotation file) are indexed to create a reference index. Mapping is then performed against the resulting library index files as the reference, and after the fusion-prediction step a final fusion_prediction.tsv file is obtained. Various options can be used to include annotation in the results, and the fusion_prediction.tsv file is used for downstream analysis. | 7
13 | POSTECH_EPIGENOME_SEQUENCING_FASTQC_BOWTIE_MACS_PIPELINE | The analysis performed at each step is as follows. The quality-control step checks the sequencing quality of the input data. The quality-filter step removes low-quality reads from the data. The alignment step maps the data against the reference sequence. The cross-correlation step performs quality control on the alignment result. The peak-calling step searches for significant regions (peaks), using MACS. The annotation step adds detailed descriptions of the regions found in the previous step. The visualization step visualizes the mapping data and the peak data. | 6
14 | RNASeq_STAR_RSEM_PIPING | This pipeline is an RNA sequencing pipeline that aligns reads with the STAR program and performs quantification with RSEM. | 6
15 | COLLECTIVE_GENOME_BETWEEN_GROUPS_PI_CALCULATION_PIPING | First, calculate the pi value for the chosen window size and step using each group's VCF file. Second, calculate basic statistics of the pi values computed for each group. Third, check whether any pi values are abnormal so that the per-group pi values can be visualized together. Fourth, the pi value calculated for each group is converted into a file for visualization. Fifth, the converted files are merged into the final input file for visualization. Finally, visualize the pi values for each group using the final input. | 5
16 | COLLECTIVE_GENOME_LD_DECAY_CALCULATION_PIPING | First, calculate the average LD decay for each group as a function of distance using the software PopLDdecay with each group's VCF file. Second, convert the LD decay values calculated for each group into files for visualization. Third, the converted files are merged into one file, which is the final input for visualization. Finally, the final input is used to visualize the average LD decay versus distance for each group. | 5
17 | COLLECTIVE_GENOME_BETWEEN_GROUPS_XP_CLR_CALCULATION_PIPING | First, split the per-group VCF files and the whole VCF file by chromosome. Second, to generate a reference (map) file for the XP-CLR calculation, each per-chromosome VCF is converted into an individual map file. Third, the per-group, per-chromosome VCF files from the first step are converted into per-chromosome XP-CLR input files. Fourth, XP-CLR is calculated for each chromosome using the per-chromosome XP-CLR input files and the per-chromosome map files from the whole VCF. Fifth, the per-chromosome XP-CLR outputs are integrated into one output. Sixth, the integrated XP-CLR output is converted into a file for plotting. Finally, visualize the XP-CLR output for the two groups as a Manhattan plot. | 4
18 | COLLECTIVE_GENOME_BETWEEN_GROUPS_FST_CALCULATION_PIPING | First, using the whole VCF file, supply the sample list for each group to be compared and calculate the Fst value between the two groups with the desired window size and step. Second, after calculating basic statistics of the Fst values between the two groups, the Fst values are normalized and p-values are calculated. Third, the normalized between-group Fst values are converted into a final input file for visualization. Finally, a Manhattan plot of the normalized Fst values for the two groups is generated using the final input. | 4
19 | COLLECTIVE_GENOME_BETWEEN_GROUPS_GENE_FLOW_CALCULATION_PIPING | First, convert the whole VCF file to PLINK format and, at the same time, create a clust file for each sample. Second, the clust file generated for each sample is readjusted so that the group is explicitly specified. Third, using the readjusted clust file and the file created in step 1, an allele-frequency file is generated indicating which allele is dominant in each group at each SNP in the VCF. Fourth, using this allele-frequency file, files representing the ML tree and recent gene flow are created with TreeMix. Fifth, a visualization is drawn from the generated files. Sixth, the error of the file generated in the fourth step is calculated. Finally, the error for each group is visualized as a pairwise plot. | 4
20 | COLLECTIVE_GENOME_BETWEEN_GROUPS_XP_EHH_CALCULATION_PIPING | First, convert each group's VCF file to IMPUTE format. Second, convert the .hap file among each group's input files into a per-group XP-EHH input file. Third, to generate a reference (map) file for the XP-EHH calculation, a map file is created using the whole VCF. Fourth, XP-EHH is calculated using the XP-EHH input files for the two groups and the map file. Fifth, an XP-EHH output file is created, using the whole VCF to reconstruct a file for plotting. Finally, visualize the XP-EHH output for the two groups as a Manhattan plot. | 4
21 | RNASeq_STAR_HTSEQ_PIPING | This pipeline is an RNA sequencing pipeline that aligns reads with the STAR program and performs quantification with HTSeq. | 3
22 | RNASeq_STARFUSION_PIPING | The fusion-detection pipeline for RNA-Seq using STAR-Fusion consists of two stages: Quality Check, and Alignment & Fusion Prediction. The analysis performed at each stage is as follows. The first stage, Quality Check, checks the sequencing quality of the input data with FastQC. Before moving on to the Alignment & Fusion Prediction stage, the reference files (reference genome FASTA file, transcriptome annotation file, BLAST matching gene-pair file, and fusion annotation file) are indexed to create a reference index. Mapping is then performed against the resulting library index files as the reference, and after the fusion-prediction step a final fusion_prediction.tsv file is obtained. Various options can be used to include annotation in the results, and the fusion_prediction.tsv file is used for downstream analysis. | 2
23 | RNASeq_RSEM_VOOM_PIPELINE | The pipeline consists of six modules: quality control, adaptive trimming, alignment, read filtering, quantification, and differential expression. The analysis performed at each step is as follows. The first step, quality control, checks the sequencing quality of the input data with FastQC. The adaptive-trimming step uses Sickle to remove low-quality reads and adapters from the input data, then matches the R1/R2 pairs to obtain the common sequences. These common R1 and R2 sequences are used as input to the alignment step, which builds a reference index with Bowtie 1 and maps the reads with MapSplice2. The read-filtering step takes the mapped data as input, sorts the mapped BAM file with Picard, sorts it by genomic location with SAMtools, and indexes it to improve performance. A Perl script then re-sorts it into the same chromosome order as the reference, a Java program annotates the transcriptome, and reads with large indels or inserts, or with poor mapping, are removed. The resulting BAM file is quantified with RSEM to count reads; FPKM, TPM, and read-count values are obtained in this process. The final differential-expression step uses the R package limma-voom to compare the expression levels of gene transcripts and obtain differentially expressed genes (DEGs). | 2
24 | RNASeq_KALLISTO_PIPING | This pipeline is an RNA sequencing pipeline that performs pseudo-alignment and quantification quickly using the Kallisto program. | 1
25 | RNASeq_RSEM_VOOM_PIPING | The pipeline consists of six modules: quality control, adaptive trimming, alignment, read filtering, quantification, and differential expression. The analysis performed at each step is as follows. The first step, quality control, checks the sequencing quality of the input data with FastQC. The adaptive-trimming step uses Sickle to remove low-quality reads and adapters from the input data, then matches the R1/R2 pairs to obtain the common sequences. These common R1 and R2 sequences are used as input to the alignment step, which builds a reference index with Bowtie 1 and maps the reads with MapSplice2. The read-filtering step takes the mapped data as input, sorts the mapped BAM file with Picard, sorts it by genomic location with SAMtools, and indexes it to improve performance. A Perl script then re-sorts it into the same chromosome order as the reference, a Java program annotates the transcriptome, and reads with large indels or inserts, or with poor mapping, are removed. The resulting BAM file is quantified with RSEM to count reads; FPKM, TPM, and read-count values are obtained in this process. The final differential-expression step uses the R package limma-voom to compare the expression levels of gene transcripts and obtain differentially expressed genes (DEGs). | 1
26 | RNASeq_EMSAR_PIPING | This pipeline analyzes the RNA-Seq data to obtain isoform-level estimates with EMSAR, and then produces gene-level expression estimates from the isoform-level estimates. | 1
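Most of the pipelines above share an early "quality filter" step that drops low-quality reads before alignment. As an illustration only (not the platform's actual implementation, which uses tools such as fastx_fastq_quality_filter or Sickle), a minimal sketch of that step, assuming Phred+33-encoded FASTQ qualities and a mean-quality cutoff, might look like this:

```python
# Hypothetical sketch of the "quality filter" step: drop reads whose mean
# Phred+33 quality falls below a threshold. Names and data are illustrative.

def mean_quality(qual_line: str) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - 33 for c in qual_line) / len(qual_line)

def quality_filter(fastq_records, min_mean_q: float = 20.0):
    """Yield only the 4-tuple FASTQ records whose mean quality passes the cutoff."""
    for header, seq, plus, qual in fastq_records:
        if mean_quality(qual) >= min_mean_q:
            yield header, seq, plus, qual

# Toy input: one good read ('I' = Q40) and one bad read ('#' = Q2).
reads = [
    ("@read1", "ACGT", "+", "IIII"),
    ("@read2", "ACGT", "+", "####"),
]
kept = list(quality_filter(reads))
```

Real tools additionally trim from the 3' and 5' ends with sliding windows rather than discarding whole reads outright; this sketch only captures the filtering idea.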
No | Program name | Description | Type | Uses
1 | ubu_sort_bam | Parameterize samtools properly (sorts an aligned BAM file by chromosome so that it matches the reference order). | LINUX | 22
2 | SAMTools_faidx | Index a reference sequence in FASTA format or extract subsequences from an indexed reference sequence. If no region is specified, faidx will index the file and create a .fai file on disk. If regions are specified, the subsequences will be retrieved and printed to stdout in FASTA format. | LINUX | 24
3 | qiime_split_libraries | Split libraries according to barcodes specified in the mapping file. | LINUX | 10
4 | picard_buildbamindex | Generates a BAM index (.bai) file. This tool creates an index file for the input BAM that allows fast look-up of data in the BAM file, like an index on a database. | LINUX | 18
5 | ngsgd | Gender-determination tool for NGS data. | LINUX | 1
6 | emsar_postprocessing | Calculates TPM values for a sample using the 'gfpkm', 'gene read count', and 'gene' outputs. | LINUX | 10
7 | cmpfastq | A simple Perl program that allows the user to compare QC-filtered FASTQ files. | LINUX | 17
8 | interproscan | Users who have novel nucleotide or protein sequences that they wish to functionally characterize can use the InterProScan software package to run the scanning algorithms from the InterPro database in an integrated way. Sequences are submitted in FASTA format. Matches are calculated against all of the required member databases' signatures, and the results are output in a variety of formats. | LINUX | 23
9 | homer_annotatepeaks | All-in-one program for performing peak annotation (checks the genomic features of the peaks produced by peak calling). | LINUX | 25
10 | decompress | Decompresses compressed files (tar.gz, tar.bz2, tar.xz, tar, gz, bz2, xz, zip). | LINUX | 3
11 | trimmomatic_pe | Trimmomatic is a fast, multithreaded command-line tool that can trim and crop Illumina (FASTQ) data and remove adapters. Paired-end mode maintains the correspondence of read pairs and also uses the additional information contained in paired reads to better find adapter or PCR-primer fragments introduced by the library-preparation process. | LINUX | 1
12 | hadoop_bam_index | Hadoop-BAM is a Java library for manipulating files in common bioinformatics formats using the Hadoop MapReduce framework with the Picard SAM JDK, plus command-line tools similar to SAMtools. The index algorithm indexes a BAM file. | HADOOP | 1
13 | picard_mergesamfiles | Merges multiple (six or fewer) SAM and/or BAM files into a single file. This tool combines SAM and/or BAM files from different runs or read groups, similarly to the "merge" function of SAMtools. | LINUX | 17
14 | fastx_fastq_quality_filter | Filters sequences based on quality (removes low-quality sequences). | LINUX | 42
15 | picard_collectalignmentsummarymetrics | Produces a summary of alignment metrics from a SAM or BAM file. | LINUX | 18
16 | macs_pe | Model-based Analysis of ChIP-Seq (MACS), which analyzes data generated by short-read sequencers such as Solexa's Genome Analyzer (finds histone-enriched regions). | LINUX | 25
17 | bwa_aln | Finds the SA coordinates of the input reads. Up to maxSeedDiff differences are allowed in the first seedLen subsequence and up to maxDiff differences in the whole sequence (the process of finding SA coordinates). | LINUX | 17
18 | make_g2b | Sorts a GTF file by chromosome and then converts the GTF file to a BED file. | LINUX | 22
19 | qiime_make_phylogeny | Produces a tree from a multiple sequence alignment. | LINUX | 10
20 | emsar_transcript_stat | Creates a stats file expressing the GC content, isoform information, etc. of each transcript. | LINUX | 10
21 | snpeff | SnpEff is a variant-annotation and effect-prediction tool. It annotates and predicts the effects of variants on genes (such as amino-acid changes). | LINUX | 18
22 | BWA_mem | Aligns 70 bp-1 Mbp query sequences with the BWA-MEM algorithm. Briefly, the algorithm works by seeding alignments with maximal exact matches (MEMs) and then extending seeds with the affine-gap Smith-Waterman (SW) algorithm. | LINUX | 24
23 | bedtools_genomecov_bam | Measures the genome-wide coverage of a feature file (here, genome coverage from a BAM file). | LINUX | 18
24 | make_g2t | Creates a g2t file from gene IDs and their matching transcript IDs. | LINUX | 10
25 | samtools_flagstat | Does a full pass through the input file to calculate statistics and print them to stdout. | LINUX | 39
26 | SAMTools_sort | Sorts alignments by leftmost coordinate, or by read name when -n is used. An appropriate @HD SO sort-order header tag will be added, or an existing one updated if necessary. | LINUX | 5
27 | GATK_RealignerTargetCreator | Defines intervals to target for local realignment. | LINUX | 24
28 | fastx_fastx_artifacts_filter | Removes some sequence artifacts from the reads. | LINUX | 17
29 | ubu_translate | Translates from genome to transcriptome coordinates (annotates the transcriptome onto a chromosome-sorted BAM file). | LINUX | 22
30 | SAMTools_view | With no options or regions specified, prints all alignments in the specified input alignment file (in SAM, BAM, or CRAM format) to standard output in SAM format (without a header). | LINUX | 24
31 | gatk_baserecalibrator_two_databases | The Genome Analysis Toolkit (GATK) is a software package for the analysis of high-throughput sequencing data, developed by the Data Science and Data Engineering group at the Broad Institute (recalibrates and analyzes sequencing data using two known-sites databases). | LINUX | 18
32 | make_matrix_count | Builds the count matrix for limma-voom (enter 10 or fewer isoforms.result files). | LINUX | 22
33 | Picard_MarkDuplicates | Locates and tags duplicate reads in a BAM or SAM file, where duplicates are defined as reads originating from the same fragment of DNA. | LINUX | 24
34 | samtools_rmdup | Removes potential PCR duplicates: if multiple read pairs have identical external coordinates, only the pair with the highest mapping quality is retained. In paired-end mode, this command only works with FR orientation and requires that ISIZE be correctly set. It does not work for unpaired reads. | LINUX | 43
35 | qiime_filter_alignment | Filters a sequence alignment by removing highly variable regions. | LINUX | 10
36 | ubu_filtering | Filters reads from a paired-end SAM or BAM file (only outputs paired reads; removes reads with large indels or inserts, or with poor mapping). | LINUX | 22
37 | r_summary_for_flagstat_DepthofCoverage | Summarizes the flagstat and DepthOfCoverage results with R into a single table. | LINUX | 17
38 | GATK_PrintReads | Writes reads from a SAM-format file (SAM/BAM/CRAM) that pass criteria to a new file. A common use case is to subset reads by genomic interval using the -L argument. Note that when applying genomic intervals, the tool is literal and does not retain mates of paired-end reads outside the interval, if any. Data with missing mates will fail ValidateSamFile validation with MATE_NOT_FOUND, but certain tools may still analyze the data. If needed, to rescue such mates, use either FilterSamReads or ExtractOriginalAlignmentRecordsByNameSpark. By default, PrintReads applies the WellformedReadFilter at the engine level, meaning the tool does not print reads that fail that filter. Other engine-level filters can similarly be applied with the --read-filter argument; see the 'Read Filters' documentation category for a list of available filters. To keep reads that do not pass the WellformedReadFilter, either disable the filter with --disable-read-filter or disable all default filters with --disable-tool-default-read-filters. The reference is strictly required when handling CRAM files. | LINUX | 24
39 | Picard_SortSam | Sorts a SAM or BAM file. This tool sorts the input SAM or BAM file by coordinate, queryname (QNAME), or some other property of the SAM record. The SortOrder of a SAM/BAM file is found in the SAM file header tag @HD, in the field labeled SO. | LINUX | 24
40 | fq_split | Splits a FASTQ file. | LINUX | 1
41 | hadoop_blastp | An algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences, running on Hadoop. | HADOOP | 1
42 | bowtie_se | Bowtie is an ultrafast, memory-efficient short-read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour (produces a SAM file). | LINUX | 27
43 | hadoop_bam_cat | Hadoop-BAM is a Java library for manipulating files in common bioinformatics formats using the Hadoop MapReduce framework with the Picard SAM JDK, plus command-line tools similar to SAMtools. Cat concatenates partial SAM and BAM files. | HADOOP | 1
44 | qiime_make_otu_mapping_table | Tabulates the number of times an OTU is found in each sample, and adds the taxonomic prediction for each OTU in the last column if a taxonomy file is supplied. | LINUX | 10
45 | picard_collecinsertsizemetrics | Provides useful metrics for validating library construction, including the insert-size distribution and read orientation of paired-end libraries. | LINUX | 35
46 | gsa_seq | At Ambry, Sanger gene sequencing is performed on specific regions of DNA that have been amplified by polymerase chain reaction (PCR). Double-stranded sequencing occurs in both sense and antisense directions to detect sequence variations. For specific-site analysis, specific regions of DNA are amplified by PCR and sequenced. Sanger sequencing is performed for any regions missing or with insufficient read-depth coverage for reliable heterozygous variant detection. Suspect variant calls other than "likely benign" or "benign" are verified by Sanger sequencing (compares expression levels between experimental and control groups in RNA-Sequencing data to reveal biological functions that differ). | LINUX | 1
47 | bwa_sampe | Generates alignments in SAM format given paired-end reads. Repetitive read pairs will be placed randomly. | LINUX | 17
48 | SAMTools_index | Index a coordinate-sorted BAM or CRAM file for fast random access. (Note that this does not work with SAM files, even if they are bgzip-compressed; to index such files, use tabix(1) instead.) | LINUX | 30
49 | qiime_pick_otus | The OTU-picking step assigns similar sequences to operational taxonomic units (OTUs) by clustering sequences based on a user-defined similarity threshold. | LINUX | 10
50 | picard_sortbam | Sorts a BAM file. This tool sorts the input BAM file by coordinate, queryname (QNAME), or some other property of the SAM record. | LINUX | 1
51 | bowtie_pe | Bowtie is an ultrafast, memory-efficient short-read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour (produces a SAM file). | LINUX | 13
52 | rsem_bam_calculate_expression | Aligns input reads against a reference transcriptome with Bowtie and calculates expression values from the alignments (quantification that counts reads against the reference using the filtered BAM file). | LINUX | 22
53 | FastQC | FastQC aims to provide a simple way to do some quality-control checks on raw sequence data coming from high-throughput sequencing pipelines. It provides a modular set of analyses which you can use to get a quick impression of whether your data has any problems you should be aware of before doing any further analysis. | LINUX | 24
54 | clustalo | Clustal Omega is a general-purpose multiple sequence alignment program. | LINUX | 14
55 | Sickle_pe | Sickle is a tool that uses sliding windows along with quality and length thresholds to determine when quality is sufficiently low to trim the 3'-end of reads, and also when quality is sufficiently high to trim the 5'-end of reads (paired-file mode). | LINUX | 24
56 | gatk_analyzecovariates | Generates plots for visualizing the quality of a recalibration run. | LINUX | 18
57 | Cufflinks | Cufflinks is both the name of a suite of tools and a program within that suite. Cufflinks the program assembles transcriptomes from RNA-Seq data and quantifies their expression. | LINUX | 1
58 | qiime_make_otu_table | Tabulates the number of times an OTU is found in each sample, and adds the taxonomic prediction for each OTU in the last column if a taxonomy file is supplied. | LINUX | 10
59 | emsar_build_pe | Uses the Transcriptome.fa file to build an index file (.rsh) that makes computation easy. | LINUX | 10
60 | GATK_IndelRealigner | The local realignment process is designed to consume one or more BAM files and locally realign reads such that the number of mismatching bases is minimized across all reads. In general, a large percentage of regions requiring local realignment are due to the presence of an insertion or deletion (indel) in the individual's genome with respect to the reference genome. Such alignment artifacts result in many bases mismatching the reference near the misalignment, which are easily mistaken for SNPs. Moreover, since read-mapping algorithms operate on each read independently, it is impossible to place reads on the reference genome such that mismatches are minimized across all reads. Consequently, even when some reads are correctly mapped with indels, reads covering the indel near the start or end of the read are often incorrectly mapped with respect to the true indel, also requiring realignment. Local realignment transforms regions with misalignments due to indels into clean reads containing a consensus indel suitable for standard variant-discovery approaches. | LINUX | 24
61 | cuffmerge | Transcriptome assembly and differential-expression analysis for RNA-Seq. | LINUX | 40
62 | last | Components of the LAST suite: LAST_lastdb, LAST_lastal, LAST_split, LAST_maf-swap, LAST_maf-convert. | LINUX | 26
63 | GATK_BaseRecalibrator | First pass of base quality score recalibration. Generates a recalibration table based on various covariates; the defaults are read group, reported quality score, machine cycle, and nucleotide context. This walker generates tables based on the specified covariates. It performs a by-locus traversal, operating only at sites that are in the known-sites VCF; ExAC, gnomAD, or dbSNP resources can be used as known sites of variation. All remaining reference mismatches are assumed to be errors indicative of poor base quality. Since there is a large amount of data, an empirical probability of error can then be calculated for the particular covariates seen at each site, where p(error) = num mismatches / num observations. The output file is a table (of the several covariate values, number of observations, number of mismatches, and empirical quality score). | LINUX | 24
64 | bwa_samse | Generates alignments in SAM format given single-end reads. Repetitive hits will be chosen randomly. | LINUX | 17
65 | qiime_align_seqs | Aligns the sequences in a FASTA file to each other or to a template sequence alignment, depending on the method chosen. | LINUX | 11
66 | gatk_depthofcoverage | Assesses sequence coverage by a wide array of metrics, partitioned by sample, read group, or library. | LINUX | 17
67 | qiime_beta_diversity_through_plots | Performs beta-diversity and principal-coordinate analysis, and generates a preferences file along with 3D PCoA plots. | LINUX | 10
68 | spark_bwa_mem | SparkBWA MEM is a tool that integrates the Burrows-Wheeler Aligner (BWA) into an Apache Spark framework running on top of Hadoop. | HADOOP | 2
69 | Picard_AddOrReplaceReadGroups | Replaces read groups in a BAM file. This tool enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. | LINUX | 24
70 | hadoop_bam_fixmate | Hadoop-BAM is a Java library for manipulating files in common bioinformatics formats using the Hadoop MapReduce framework with the Picard SAM JDK, plus command-line tools similar to SAMtools. Fixmate fixes BAM and SAM mate information. | HADOOP | 1
71 | qiime_pick_rep_set | After picking OTUs, picks a representative set of sequences. | LINUX | 10
72 | samtools_index_bam | Indexes a coordinate-sorted BAM or CRAM file for fast random access (improves performance for random access to the BAM file). | LINUX | 86
73 | Picard_CreateSequenceDictionary | Creates a sequence dictionary for a reference sequence. This tool creates a sequence dictionary file (with ".dict" extension) from a reference sequence provided in FASTA format, which is required by many processing and analysis tools. The output file contains a header but no SAMRecords, and the header contains only sequence records. | LINUX | 24
74 | qiime_assign_taxonomy | Assigns taxonomy to each sequence. | LINUX | 10
75 | GATK_AnalyzeCovariates_single | Evaluates and compares base quality score recalibration tables. This tool generates plots to assess the quality of a recalibration run as part of the Base Quality Score Recalibration (BQSR) procedure. The goal of BQSR is to correct for systematic bias that affects the assignment of base-quality scores by the sequencer. The first pass calculates error empirically and finds patterns in how error varies with basecall features over all bases; the relevant observations are written to a recalibration table. The second pass applies numerical corrections to each individual basecall based on the patterns identified in the first step (recorded in the recalibration table) and writes the recalibrated data to a new BAM or CRAM file. (Single-file variant.) | LINUX | 24
76 | Cuffdiff | Comparing expression levels of genes and transcripts in RNA-Seq experiments is a hard problem. Cuffdiff is a highly accurate tool for performing these comparisons, and can tell you not only which genes are up- or down-regulated between two or more conditions, but also which genes are differentially spliced or are undergoing other types of isoform-level regulation. | LINUX | 1
77 | muscle | MUSCLE is one of the best-performing multiple alignment programs according to published benchmark tests (fast, accurate multiple sequence alignment). | LINUX | 3
78 | Picard_FixMateInformation | Verifies mate-pair information between mates and fixes it if needed. This tool ensures that all mate-pair information is in sync between each read and its mate pair. If no OUTPUT file is supplied, the output is written to a temporary file and then copied over the INPUT file. Reads marked with the secondary-alignment flag are written to the output file unchanged. | LINUX | 24
79 | homer_makeucscfile | The UCSC Genome Browser is quite possibly one of the best computational tools ever developed. Not only does it contain an incredible amount of data in a single application, it allows users to upload custom information, such as data from their ChIP-Seq experiments, so that it can be easily visualized and compared to other information (creates a bedGraph-format file). | LINUX | 25
80 | qiime_validate_mapping_file | Checks the user's metadata mapping file for required data and valid format. | LINUX | 11
81 | cutadapt | Cutadapt finds and removes adapter sequences, primers, poly-A tails, and other types of unwanted sequence from high-throughput sequencing reads. | LINUX | 17
82 | BWA_index | Indexes database sequences in FASTA format. | LINUX | 24
83 | homer_maketagdirectory | To facilitate the analysis of ChIP-Seq (or any other type of short-read resequencing data), it is useful to first transform the sequence alignments into a platform-independent data structure representing the experiment, analogous to loading the data into a database (shows the ChIP-fragment density at every position in the genome). | LINUX | 25
84 | qiime_alpha_rarefaction | Generates rarefied OTU tables, computes alpha-diversity metrics for each rarefied OTU table, collates the alpha-diversity results, and generates alpha rarefaction plots. | LINUX | 10
85 | samtools_index_sam | Indexes a coordinate-sorted file for fast random access (improves performance for random access to the SAM file). | LINUX | 1
86 | big_bwa_mem | Uses Hadoop to boost the performance of the Burrows-Wheeler Aligner (BWA works by seeding alignments with maximal exact matches (MEMs) and then extending seeds with the affine-gap Smith-Waterman (SW) algorithm). | HADOOP | 4
87 | hadoop_bam_sort | Hadoop-BAM is a Java library for manipulating files in common bioinformatics formats using the Hadoop MapReduce framework with the Picard SAM JDK, plus command-line tools similar to SAMtools. The sort algorithm sorts and merges BAM or SAM files. | HADOOP | 1
88 | qiime_summarize_taxa_through_plots | Summarizes OTUs by category (optional; pass -c), summarizes taxonomy, and plots the taxonomy summary. | LINUX | 10
89 | r_voom | The voom method estimates the mean-variance relationship of the log-counts, generates a precision weight for each observation, and enters these into the limma empirical-Bayes analysis pipeline (creates a design matrix expressing the mutant vs. wild-type contrast and, using that design matrix, performs quantile normalization via the voom transformation). | LINUX | 22
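Several of the quantification programs above (e.g. emsar_postprocessing, rsem_bam_calculate_expression) report TPM values. As an illustration only, with made-up gene names and counts rather than real tool output, the TPM calculation can be sketched as: divide each gene's read count by its transcript length, then rescale so the values sum to one million.

```python
# Hedged sketch of TPM computation; gene names, counts, and lengths below are
# hypothetical and only illustrate the normalization, not any specific tool.

def tpm(read_counts, lengths_kb):
    """TPM: normalize counts by transcript length (kb), then scale to sum to 1e6."""
    rate = {g: read_counts[g] / lengths_kb[g] for g in read_counts}
    total = sum(rate.values())
    return {g: rate[g] / total * 1_000_000 for g in rate}

counts = {"geneA": 100, "geneB": 300}   # hypothetical read counts
lengths = {"geneA": 1.0, "geneB": 3.0}  # hypothetical transcript lengths in kb
values = tpm(counts, lengths)
```

Because TPM normalizes by length before rescaling, geneB's 3x higher raw count is exactly offset by its 3x longer transcript here, so both genes end up with the same TPM; this length correction is what distinguishes TPM from raw read counts.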