TP 1.1: Prepare data
Goal: Refresh your memory of Linux commands.
Prepare
Connect to cluster
Start your machine and open a terminal (on Windows, please use MobaXterm). You can now connect to the genotoul server using ssh.
Don't forget to replace <username> with your own username.
Create project
Question
In the work directory, create a new directory named cluster and go into it.
Solution
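One possible answer is sketched below. It assumes the work directory is reachable as ~/work, which is an assumption rather than something stated on this page; adjust the path if your work space is located elsewhere.

```shell
# Assumption: the work directory is ~/work; change this if yours differs.
work_dir="$HOME/work"

# -p creates missing parent directories and does not fail
# if the directory already exists.
mkdir -p "$work_dir/cluster"

# Move into the new directory and print the current location to check.
cd "$work_dir/cluster"
pwd
```

Using mkdir -p makes the command safe to run again if you repeat the exercise.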
Get data
Question
Download the transcript file from https://web-genobioinfo.toulouse.inrae.fr/~formation/cluster/data/contigs.fasta.gz
Solution
Uncompress files
Question
Uncompress the file.
Solution
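One common way to do this is gunzip. So that it can be tried anywhere, the sketch below runs on a small stand-in file in a temporary directory; the file name demo.fasta and its two records are hypothetical and are not part of the course data.

```shell
# Work in a scratch directory so nothing in your project is touched.
scratch=$(mktemp -d)
cd "$scratch"

# Hypothetical two-record stand-in for the downloaded archive.
printf '>seq1 demo\nACGT\n>seq2 demo\nGGCC\n' > demo.fasta
gzip demo.fasta            # leaves only demo.fasta.gz

# gunzip replaces the .gz file with the uncompressed file.
gunzip demo.fasta.gz
ls                         # lists demo.fasta only

# For the exercise itself, the equivalent command would be:
#   gunzip contigs.fasta.gz
```

Note that gunzip removes the compressed file once it has been expanded; if you want to keep both, gunzip -c demo.fasta.gz > demo.fasta writes the uncompressed data to a new file and leaves the archive in place.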
Note
Manipulating files (compressing, zipping, ...) can use a lot of resources, so it should be done on a cluster node whenever possible. We will learn how to connect to a node in the next practical sessions.
Look at data
Question
Display the first ten lines of the contigs.fasta file, then the first twenty lines.
What is the file format?
What kind of data does it contain?
Solution
The commands:
- The first ten lines:
  head contigs.fasta
- The first twenty lines:
  head -n 20 contigs.fasta
- The file format:
  file contigs.fasta
  contigs.fasta: ASCII text
The file contigs.fasta is a FASTA file: a text file made up of blocks of data. Each block begins with a single description line that starts with > and describes the data. The lines that follow, up to the next >, hold the sequence data, which can be nucleotide or protein.
Here, contigs.fasta contains nucleotide sequences.
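The block structure described above can be checked from the command line. The sketch below uses a small hypothetical file (demo.fasta, not part of the course data): it counts records by counting description lines, then lists the distinct characters found in the sequence lines. Seeing only A, C, G, T (and possibly N) suggests nucleotide data, although this is a quick heuristic rather than a formal check.

```shell
# Build a tiny hypothetical FASTA file in a scratch directory.
scratch=$(mktemp -d)
cd "$scratch"
printf '>contig1 first demo record\nACGTACGT\nTTGA\n>contig2 second demo record\nGGCCNA\n' > demo.fasta

# Each record has exactly one description line, so counting
# the lines that start with ">" counts the records.
n_records=$(grep -c '^>' demo.fasta)
echo "records: $n_records"

# Drop description lines, split the sequences into one character
# per line, and keep each distinct character once.
alphabet=$(grep -v '^>' demo.fasta | fold -w 1 | LC_ALL=C sort -u | tr -d '\n')
echo "alphabet: $alphabet"
```

The same two commands can be pointed at contigs.fasta; on a large file, the alphabet step reads every sequence line and may take some time.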