
TP 1.2: Single execution on a cluster

Goal: Identify the genes in a transcript FASTA file with the NCBI alignment software BLAST, running on the cluster's compute nodes.

Simple submission command

We will use sequence alignment with NCBI_Blast+ as a use case.

Interactive job

Question

Connect to a node in interactive mode.

Correct social behavior expected

Never run a calculation on a login node! Use an interactive job or a batch job.

Solution

srun --pty bash
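
Once the interactive shell opens, it can help to confirm that you really are on a compute node. A minimal sketch, assuming Slurm exports the SLURM_JOB_ID variable inside the allocation; the helper name in_slurm_job is hypothetical:

```shell
# Hypothetical helper: succeeds only inside a Slurm allocation,
# where Slurm sets the SLURM_JOB_ID environment variable.
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_job; then
    echo "Job ${SLURM_JOB_ID} running on $(uname -n)"
else
    echo "Not inside a Slurm job: start one with 'srun --pty bash'"
fi
```

On a login node the second message is printed, which is the cue to open an interactive job before running anything heavy.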

Prerequisite

Load the NCBI_Blast+ module

module load bioinfo/NCBI_Blast+/2.10.0+

Run blast

Question

Launch a blast comparing the file contigs.fasta against ensembl_danio_rerio_pep databank in interactive mode on the cluster.

Your query is nucleotide and your databank is protein, so you need to use the blastx program.

Tip

For more help on blast, type

blastx -help

Solution

# On genobioinfo cluster, NCBI blast databanks are available in /bank/blastdb,
# however the cluster was configured in a way that you don't need to specify the path.
blastx -query contigs.fasta -db ensembl_danio_rerio_pep \
    -evalue 10e-10 -out contigs.blastx_dr
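
Before launching the alignment, it can be useful to know how many sequences the query holds, since the run time generally grows with that number. A minimal sketch, assuming a standard FASTA file in which every record starts with a ">" header line; count_fasta_records is a hypothetical helper name:

```shell
# Hypothetical helper: count FASTA records by counting header lines,
# i.e. lines that begin with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Usage on the query from this exercise, if it is in the current directory.
if [ -f contigs.fasta ]; then
    count_fasta_records contigs.fasta
fi
```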

Look for running jobs

Question

Open a new terminal and connect to the cluster again. Now check all the jobs running or waiting on the cluster. In particular, check your own job.

Solution

squeue
...
squeue -t R
...
squeue -t PD
...
squeue -u <username>
...

Question

On which node are you running?

Solution

squeue -u toto
  JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1232823     workq bash toto  R 0:05     1 node129
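
The node name sits in the last column, NODELIST(REASON), of the row whose state (ST) is R. A small sketch that pulls the job ID and node out of the squeue output shown above; running_nodes is a hypothetical helper, and it assumes the default eight-column layout with job names that contain no spaces:

```shell
# Hypothetical helper: read default squeue output on stdin and print
# "<jobid> <node>" for every running job (state R).
running_nodes() {
    awk 'NR > 1 && $5 == "R" { print $1, $8 }'
}

# Usage on the cluster: squeue -u "$USER" | running_nodes
# Here the helper is fed the sample output from this exercise.
running_nodes <<'EOF'
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1232823 workq bash toto R 0:05 1 node129
EOF
```

Pending jobs (state PD) are skipped because their last column holds a reason, not a node.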

Stop a running job

Question

Kill your job.

Solution

scancel 1232823

Batch mode

Question

Use a text editor to create a command file blastn.sh with the same module load and almost the same blast command line (replace blastx with blastn and ensembl_danio_rerio_pep with ensembl_danio_rerio_cdna). The first line of the file must be:

blastn.sh
#!/bin/sh

Launch it in batch mode.

Solution

File blastn.sh contains:

#!/bin/sh
module load bioinfo/NCBI_Blast+/2.10.0+
blastn -db ensembl_danio_rerio_cdna -query contigs.fasta -evalue 10e-10 -out contigs.blastn_dr

Launch it with:

sbatch blastn.sh
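
sbatch replies with a line of the form "Submitted batch job <job_id>". A minimal sketch for keeping that ID in a variable so later commands can reuse it; job_id_from_sbatch is a hypothetical helper name:

```shell
# Hypothetical helper: extract the job ID from sbatch's confirmation line,
# "Submitted batch job <job_id>".
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Usage on the cluster:
#   job_id=$(sbatch blastn.sh | job_id_from_sbatch)
#   squeue -j "$job_id"
# Demonstration on a sample confirmation line:
echo "Submitted batch job 1232823" | job_id_from_sbatch
```

The --parsable option of sbatch prints the job ID directly, which avoids the parsing step.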

Check running job

Question

Check the execution. When it's over, look at the blast output file and the execution trace file slurm-xxxxx.out.

Has the job finished correctly?

Solution

# Get job state
squeue -u <username>
# Check blast output
less contigs.blastn_dr
# Check execution logs
less slurm-XXXXX.out
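
Once a job has left the queue, squeue no longer lists it, so the final state has to come from the accounting records. A minimal sketch, assuming job accounting is enabled on the cluster and sacct is available; job_state is a hypothetical helper name:

```shell
# Hypothetical helper: print "<State>|<ExitCode>" for a job.
# -X keeps only the allocation line, -n drops the header,
# -P separates fields with '|', -o selects the columns.
job_state() {
    sacct -X -n -P -j "$1" -o State,ExitCode
}

# Usage on the cluster (replace 1232823 with your own job ID):
#   job_state 1232823
```

A job that ended normally is typically reported with the state COMPLETED and the exit code 0:0; any other state is a reason to read the trace file.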

Batch mode with inline command

Question

Launch the same command without using a file (option --wrap='command').

Check the execution.

When it's over, look at the blast output file and the execution trace file (slurm-xxxxx.out).

Has the job finished correctly?

Solution

sbatch -J blastdr --wrap='module load bioinfo/NCBI_Blast+/2.10.0+; blastn -db ensembl_danio_rerio_cdna -query contigs.fasta -evalue 10e-10 -out contigs.blastn_dr'

Look at the trace file

If you have not had any error so far, resubmit the previous job with a deliberate error in the command, then have a look at the trace file.
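
By default Slurm writes both the standard output and the error output of a batch job to slurm-<job_id>.out, so a mistyped command shows up there. A small sketch for spotting the trace files that report a problem; traces_with_errors is a hypothetical helper, and the search pattern is an assumption about how the shell words its messages:

```shell
# Hypothetical helper: list the trace files that mention an error.
# -l prints only file names, -i ignores case, -E enables alternation.
traces_with_errors() {
    grep -l -i -E 'error|not found' "$@"
}

# Usage: scan every Slurm trace file in the current directory, if any.
for f in slurm-*.out; do
    if [ -e "$f" ]; then
        traces_with_errors "$f" || echo "$f: no error reported"
    fi
done
```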

How many resources

Question

Look at the resources used by previous jobs. In particular, pay attention to CPU and memory usage.

Solution

seff <job_id>

where you replace <job_id> with the ID of your previous job.
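
The figures reported by seff can guide the resources requested for the next run. A sketch of a batch script that asks for them explicitly; the values (4 CPUs, 4G of memory) and the file name blastn_res.sh are placeholders chosen for illustration, not recommendations for this cluster:

```shell
# Write a variant of blastn.sh that requests CPUs and memory explicitly.
# The 4 CPUs and 4G below are illustrative placeholders.
cat > blastn_res.sh <<'EOF'
#!/bin/sh
#SBATCH -J blastdr
#SBATCH --cpus-per-task=4
#SBATCH --mem=4G
module load bioinfo/NCBI_Blast+/2.10.0+
# Slurm sets SLURM_CPUS_PER_TASK from the --cpus-per-task request.
blastn -db ensembl_danio_rerio_cdna -query contigs.fasta \
    -evalue 10e-10 -num_threads "$SLURM_CPUS_PER_TASK" \
    -out contigs.blastn_dr
EOF

# Submit on the cluster with: sbatch blastn_res.sh
```

Passing the allocated CPU count to -num_threads keeps blastn from using more threads than the job was granted.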