TP 2: Multithreading and ecological behavior
Objective: speed up a job by using several CPUs on one node. Create efficient jobs in a context of digital sobriety/ecological practices.
Going faster
Multithreading
We use the work we did in TP 1.2 as the basis for a script:
Question
Create a script file named blastx.sh with following content:
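A minimal sketch of such a script; the module name, query file and bank path are placeholders standing in for the actual values from TP 1.2:

```shell
#!/bin/bash

# Load blast (placeholder module name, check `module avail` on the cluster)
module load bioinfo/blast

# Single-threaded blastx run, as in TP 1.2 (file names are placeholders)
blastx -query sequences.fasta -db /bank/blastdb/nt
</imports>
```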
Edit blastx.sh in order to run blastx with 8 CPUs on the same node.
Check the execution in detail while it is running.
Solution
The script file:
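A plausible version of the modified script (module name and file names are placeholders; the relevant parts are the `#SBATCH` directive and the `-num_threads` argument):

```shell
#!/bin/bash
#SBATCH --cpus-per-task=8

# Load blast (placeholder module name)
module load bioinfo/blast

# The trailing \ continues the command on the next line
blastx -query sequences.fasta -db /bank/blastdb/nt \
       -num_threads $SLURM_CPUS_PER_TASK
```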
- Line 2 (`--cpus-per-task=8`) defines the number of CPUs reserved by Slurm.

Two things to note:

- The `\` at the end of line 8 allows splitting a command over several lines, for readability.
- `$SLURM_CPUS_PER_TASK` is the value defined by `--cpus-per-task` (line 2).
The script blastx.sh is submitted as a job with the following command:
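With Slurm this is the standard `sbatch` submission:

```shell
sbatch blastx.sh
```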
An inline version, without relying on a script file is also possible by using --wrap option:
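A sketch of the inline form; `--wrap` takes the command string directly instead of a script file (module name and file names are placeholders as before):

```shell
# Note the escaped \$ so that the variable is expanded inside the job,
# not at submission time
sbatch --cpus-per-task=8 --wrap="module load bioinfo/blast; \
  blastx -query sequences.fasta -db /bank/blastdb/nt \
         -num_threads \$SLURM_CPUS_PER_TASK"
```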
Running jobs can be checked with one of the following commands:
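Typical checks with Slurm's own tools (`<jobid>` stands for the actual job id):

```shell
# List your jobs in the queue; $(whoami) expands to your username
squeue -u $(whoami)

# Detailed view of one job (state, resources, node)
scontrol show job <jobid>
```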
The `$(...)` part is called a subshell: it means "run the command and get back its result". Here the command `whoami` returns the username.

How much faster?
Question
When the job has ended, take a look at the resources used. How much time and memory were consumed?
Here is an extract of the seff output for the blastx job on 1 CPU. What is the speedup provided by the blastx job on 8 CPUs? Compare the memory consumption.
```
Cluster: genobioinfo
User/Group: ...
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:05:57
CPU Efficiency: 95.20% of 00:06:15 core-walltime
Job Wall-clock time: 00:06:15
Memory Utilized: 18.14 MB
Memory Efficiency: 0.89% of 2.00 GB
```
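The report above is produced by Slurm's `seff` utility, called with the job id once the job has finished:

```shell
# Efficiency report for a finished job (replace <jobid> with the actual id)
seff <jobid>
```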
Tip
It is a good practice to check the resources a job has consumed.
Solution
8x CPUs doesn't mean 8x faster (~3.6x in this example). For blast, 4 CPUs is a good tradeoff (~2.7x in this example).
Digital sobriety
Genotoul-bioinfo provides some resources about digital sobriety applied to bioinformatics.
Alternative tools
Question
Some alternatives can go faster than blast on proteins. Create a script diamondx.sh where blastx is replaced with diamond. Usually, diamond banks are available in /bank/diamonddb/, but the one we will use in this practical is in /bank/users_banks/banks_tp_cluster/.
When the job has ended, look at the resources used. What can you conclude regarding time and memory?
Solution
The script:
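A plausible sketch of the script; the module name, query file, output file and bank file name are placeholders (only the directory /bank/users_banks/banks_tp_cluster/ comes from the exercise):

```shell
#!/bin/bash
#SBATCH --cpus-per-task=8

# Load diamond (placeholder module name)
module load bioinfo/diamond

# diamond blastx is the drop-in equivalent of blastx
# (bank file name below is hypothetical)
diamond blastx --query sequences.fasta \
        --db /bank/users_banks/banks_tp_cluster/bank.dmnd \
        --threads $SLURM_CPUS_PER_TASK \
        --out diamond_results.tsv
```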
Run it with:
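As before, the script is submitted with `sbatch`:

```shell
sbatch diamondx.sh
```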
The speedup provided (100x to 1000x faster) makes diamond a good tool when targeting digital sobriety.
Tuning the Slurm parameters
Question
Reduce the amount of memory used by the diamond job. What happens if you reduce the amount too much?
Tip
Setting resources correctly (number of CPUs, memory, max time) ensures a job doesn't waste resources. A side effect is that your jobs may also start sooner. However, it requires some knowledge to set them beforehand.
We provide a page to help you with some tools. In addition, ask the community to help you choose the right tools and set efficient parameters.
Solution
The breakpoint is under 170 MB: below that, Slurm kills the job for exceeding its memory request.
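One way to probe the limit from the command line, since `sbatch --mem` overrides any value set in the script (the exact values are illustrative, given a breakpoint just under 170 MB):

```shell
sbatch --mem=200M diamondx.sh   # still enough memory
sbatch --mem=150M diamondx.sh   # below the breakpoint: the job fails (OUT_OF_MEMORY)
```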
You can pass `--mem` on the command line, or edit the script diamondx.sh to keep track of the parameters:
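A plausible version of the edited script, with the memory request recorded as a `#SBATCH` directive (module name, file names and bank file name are placeholders, as before):

```shell
#!/bin/bash
#SBATCH --cpus-per-task=8
#SBATCH --mem=200M

# Load diamond (placeholder module name)
module load bioinfo/diamond

# Bank file name below is hypothetical
diamond blastx --query sequences.fasta \
        --db /bank/users_banks/banks_tp_cluster/bank.dmnd \
        --threads $SLURM_CPUS_PER_TASK \
        --out diamond_results.tsv
```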
Even with 10x the memory consumption, diamond is still a good tool for digital sobriety. Some diamond options (e.g. the block size) allow reducing memory consumption by trading off speed.