Natalie Elphick
Bioinformatician II
Yihang Xin (TA)
Software Engineer III
Run the following commands if you did not attend part 1:
mkdir unix_workshop
cd unix_workshop
curl -L -o unix_workshop.tar.gz 'https://www.dropbox.com/scl/fi/tdzpoivf7mienlenunqhf/unix_workshop.tar.gz?rlkey=6bfxnqgc5n4lgc9mc80ld75z4&dl=0'
tar -xzf unix_workshop.tar.gz
cd unix_workshop
curl -o part_2/homo_sapiens.refseq.tsv.gz https://ftp.ensembl.org/pub/current_tsv/homo_sapiens/Homo_sapiens.GRCh38.113.refseq.tsv.gz
gzip: compresses a file and replaces it with a compressed version (.gz)
tar: creates and manipulates archive files
Archive: a single file that bundles one or more files and/or folders, often compressed (for example, .tar.gz)
gunzip part_2/homo_sapiens.refseq.tsv.gz
du -h part_2/homo_sapiens.refseq.tsv
33M part_2/homo_sapiens.refseq.tsv
gzip part_2/homo_sapiens.refseq.tsv
du -h part_2/homo_sapiens.refseq.tsv.gz
3.3M part_2/homo_sapiens.refseq.tsv.gz
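A minimal sketch of the gzip/gunzip round trip on a small throwaway file, so nothing in part_2 is touched. The temporary directory comes from mktemp -d and demo.txt is a hypothetical name. cmp prints nothing and exits successfully when the two files are identical.

```bash
workdir=$(mktemp -d)      # throwaway directory for this demo
cd "$workdir" || exit 1

# 1000 identical lines: repetitive text compresses very well
yes "ENSG00000142611 RefSeq_mRNA DIRECT" | head -n 1000 > demo.txt
cp demo.txt original_copy.txt   # keep a copy to compare against later

gzip demo.txt            # demo.txt is replaced by demo.txt.gz
ls                       # lists demo.txt.gz and original_copy.txt

gunzip demo.txt.gz       # demo.txt.gz is replaced by demo.txt
cmp demo.txt original_copy.txt && echo "round trip OK"
```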
tar -czf part_1.tar.gz part_1
ls -l
total 8
drwx---rw-@ 4 nelphick staff 128 Feb 10 11:16 part_1
-rw-r--r-- 1 nelphick staff 801 Feb 10 11:16 part_1.tar.gz
drwxr-xr-x@ 4 nelphick staff 128 Feb 10 11:16 part_2
tar -xzf part_1.tar.gz
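A sketch of the tar flags used above, on a hypothetical project/ folder inside a temporary directory. It adds two options not shown on the slide: -t lists an archive's contents without extracting, and -C extracts into a different directory.

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir project
echo "hello" > project/notes.txt
printf 'a\t1\nb\t2\n' > project/data.tsv

# -c create, -z compress with gzip, -f write to the named file
tar -czf project.tar.gz project

# -t list the archive's contents without extracting them
tar -tzf project.tar.gz

# -x extract; -C changes into the given directory first
mkdir restored
tar -xzf project.tar.gz -C restored
ls restored/project
```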
gunzip -c: writes the decompressed contents to standard output and leaves the .gz file unchanged
gunzip -c part_2/homo_sapiens.refseq.tsv.gz | head
gene_stable_id transcript_stable_id protein_stable_id xref db_name info_type source_identity xref_identity linkage_type
ENSG00000142611 ENST00000378391 ENSP00000367643 NP_955533 RefSeq_peptide DIRECT 100 100 -
ENSG00000142611 ENST00000378391 ENSP00000367643 NM_199454 RefSeq_mRNA DIRECT 99 62 -
ENSG00000142611 ENST00000270722 ENSP00000270722 NP_071397 RefSeq_peptide DIRECT 100 100 -
ENSG00000142611 ENST00000270722 ENSP00000270722 NM_022114 RefSeq_mRNA DIRECT 100 100 -
ENSG00000157911 ENST00000288774 ENSP00000288774 NP_001361354 RefSeq_peptide INFERRED_PAIR - - -
ENSG00000157911 ENST00000288774 ENSP00000288774 NP_001361355 RefSeq_peptide INFERRED_PAIR - - -
ENSG00000157911 ENST00000288774 ENSP00000288774 NP_722540 RefSeq_peptide DIRECT 100 100 -
ENSG00000157911 ENST00000288774 ENSP00000288774 NM_001374425 RefSeq_mRNA DIRECT 99 100 -
ENSG00000157911 ENST00000288774 ENSP00000288774 NM_001374426 RefSeq_mRNA DIRECT 94 92 -
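Because gunzip -c writes to standard output, it can feed any pipeline while the .gz file stays on disk. A small sketch with a made-up sample.tsv.gz:

```bash
workdir=$(mktemp -d)
# Build a tiny gzipped TSV (contents are invented for illustration)
printf 'gene\tdb_name\nG1\tRefSeq_mRNA\nG2\tRefSeq_peptide\nG3\tRefSeq_mRNA\n' \
  | gzip > "$workdir/sample.tsv.gz"

# Count lines (header included) without writing a decompressed copy;
# tr -d ' ' strips the padding some versions of wc add
n_lines=$(gunzip -c "$workdir/sample.tsv.gz" | wc -l | tr -d ' ')
echo "$n_lines"

# The compressed file is still there afterwards
ls "$workdir"
```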
Example: $HOME holds the path to your home directory
echo $HOME
/Users/nelphick
Environment variable names can differ between operating systems and programs; for example, TMPDIR may also appear as TEMP, TEMPDIR or TMP.
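A sketch of the difference between a plain shell variable and an environment variable; MY_GREETING is a hypothetical name. Only exported variables are passed on to the programs and scripts you start from the shell.

```bash
MY_GREETING="hello"                          # plain shell variable
sh -c 'echo "child sees: [$MY_GREETING]"'    # child sees: []

export MY_GREETING                           # now an environment variable
sh -c 'echo "child sees: [$MY_GREETING]"'    # child sees: [hello]

# printenv prints the value of an environment variable
printenv HOME
```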
When you run a command, the shell searches the directories listed in $PATH to find its associated executable file
echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/go/bin:/usr/local/mysql/bin
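Add a directory to $PATH like this:
export PATH="/path/to/new/software:$PATH"
This changes $PATH only for the current terminal session; to make it permanent, add the export line to ~/.bashrc or ~/.zshrc
Changing $PATH incorrectly can break system functionality
which: shows the full path of the executable that runs when you type a command
which ls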
/bin/ls
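A sketch of adding a directory to $PATH, using a hypothetical hello_tool script in a temporary directory. command -v is a shell built-in that, like which, reports the path the shell will run.

```bash
tooldir=$(mktemp -d)
# Create a tiny executable script named hello_tool
printf '#!/bin/sh\necho "hello from hello_tool"\n' > "$tooldir/hello_tool"
chmod u+x "$tooldir/hello_tool"

# Before: no directory in $PATH contains hello_tool
command -v hello_tool || echo "hello_tool not found"

# Prepend the directory so it is searched first (this session only)
export PATH="$tooldir:$PATH"

command -v hello_tool    # prints the full path to hello_tool
hello_tool               # hello from hello_tool
```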
Shell scripts are text files of commands, conventionally named with a .sh extension
nano part_2/example_script.sh
#!/bin/bash
#! (the "shebang") on the first line tells the OS where the interpreter is
which bash
/bin/bash
ls -l part_2/example_script.sh
-rw-r--r-- 1 nelphick staff 287 Feb 10 11:16 part_2/example_script.sh
chmod u+x part_2/example_script.sh
ls -l part_2/example_script.sh
-rwxr--r-- 1 nelphick staff 287 Feb 10 11:16 part_2/example_script.sh
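A sketch of why chmod u+x is needed, with a hypothetical greet.sh in a temporary directory: the shell refuses to run a file as ./greet.sh until it has execute permission (u = the file's owner, +x = add execute).

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '#!/bin/sh\necho "it ran"\n' > greet.sh

./greet.sh || echo "not executable yet"   # fails with "Permission denied"

chmod u+x greet.sh    # give the owner execute permission
ls -l greet.sh        # permissions now start with -rwx
./greet.sh            # it ran
```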
#!/bin/bash
# This is a comment. Comments are ignored by the shell.
# $1 is the first argument passed to the script
echo "Counting the genes in $1"
# count the unique genes in the file
# (note: the header value gene_stable_id is counted too)
u_genes=$(gunzip -c "$1" | cut -f 1 | sort -u | wc -l)
echo "There are $u_genes unique genes in $1"
./part_2/example_script.sh part_2/homo_sapiens.refseq.tsv.gz
Counting the genes in part_2/homo_sapiens.refseq.tsv.gz
There are 36353 unique genes in part_2/homo_sapiens.refseq.tsv.gz
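The pipeline in example_script.sh appears to count the header value gene_stable_id as one of the unique genes, because the header line passes through cut and sort -u like any other line. A sketch on a tiny made-up file that adds tail -n +2 (print from line 2 onward) to skip the header; count_genes is a hypothetical helper name.

```bash
workdir=$(mktemp -d)
printf 'gene_stable_id\txref\nENSG_A\tNM_1\nENSG_A\tNP_1\nENSG_B\tNM_2\n' \
  | gzip > "$workdir/mini.tsv.gz"

count_genes() {
  # "$1" is quoted so paths containing spaces still work
  gunzip -c "$1" | tail -n +2 | cut -f 1 | sort -u | wc -l | tr -d ' '
}

with_header=$(gunzip -c "$workdir/mini.tsv.gz" | cut -f 1 | sort -u | wc -l | tr -d ' ')
without_header=$(count_genes "$workdir/mini.tsv.gz")
echo "including header: $with_header"      # including header: 3
echo "excluding header: $without_header"   # excluding header: 2
```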
for i in {1..3}
do
echo $i
done
1
2
3
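Loops are mostly used to repeat a command over many files. A sketch, on hypothetical sample_*.txt files created in a temporary directory, that compresses each file in turn:

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for i in 1 2 3; do
  echo "sample $i" > "sample_$i.txt"
done

# The glob sample_*.txt expands to the matching file names
for f in sample_*.txt
do
  gzip "$f"               # compress each file in turn
  echo "compressed $f"
done
ls                        # lists sample_1.txt.gz, sample_2.txt.gz, sample_3.txt.gz
```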
count=0
while [ $count -lt 5 ] # loop while count is less than 5
do
echo $count
count=$((count+1))
done
0
1
2
3
4
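A common companion to the while loop above is while read, which processes a file one line at a time. A sketch on a made-up two-column genes.tsv; setting IFS to a tab splits each line into the gene and db variables.

```bash
workdir=$(mktemp -d)
printf 'ENSG00000142611\tRefSeq_mRNA\nENSG00000157911\tRefSeq_peptide\n' > "$workdir/genes.tsv"

n=0
# IFS=<tab> splits on tabs; -r keeps backslashes literal
while IFS="$(printf '\t')" read -r gene db
do
  n=$((n+1))
  echo "line $n: gene=$gene db=$db"
done < "$workdir/genes.tsv"
echo "read $n lines"
```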
x=5
if [ $x -gt 10 ] # check if x is greater than 10
then
echo "x is greater than 10"
else
echo "x is not greater than 10"
fi # end if statement
x is not greater than 10
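if is often combined with file tests such as -f (true when the path exists and is a regular file). A sketch with a hypothetical describe_file helper, checking one file that exists and one that does not:

```bash
workdir=$(mktemp -d)
touch "$workdir/present.tsv"

describe_file() {
  if [ -f "$1" ]   # -f: true if $1 exists and is a regular file
  then
    echo "found"
  else
    echo "missing"
  fi
}

describe_file "$workdir/present.tsv"   # found
describe_file "$workdir/absent.tsv"    # missing
```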
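sed (stream editor) example: replace every occurrence of search_string with replace_string and write the result to output.txt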
sed 's/search_string/replace_string/g' input.txt > output.txt
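A sketch of the s/search/replace/g substitution on a single made-up line: without the trailing g, sed replaces only the first match on each line.

```bash
text='RefSeq_mRNA and RefSeq_mRNA'

first=$(echo "$text" | sed 's/RefSeq_/RS_/')    # first match per line only
all=$(echo "$text" | sed 's/RefSeq_/RS_/g')     # every match
echo "$first"   # RS_mRNA and RefSeq_mRNA
echo "$all"     # RS_mRNA and RS_mRNA
```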
ssh username@remote
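username is your user on the remote server and remote is the hostname or IP address of the remote server or computer
scp: securely copy files between computers
scp [options] [source] [destination]
Copy a local file to a directory on the remote server:
scp /path/to/local/file.txt username@remote:/path/to/remote/directory/
Copy a file from the remote server to a local directory:
scp username@remote:/path/to/file.txt /path/to/local/directory/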
awk: a language for processing text one line and field at a time. Basic command:
awk options 'pattern {action}' input_file
awk -F '\t' '{print $1+$2}' part_1/list_numbers.tsv
4
15
17
-F '\t': split fields on tabs; $1, $2: the first and second fields
Average source_identity (column 7) across RefSeq_mRNA rows:
gunzip -c part_2/homo_sapiens.refseq.tsv.gz | \
awk -F '\t' '$5 == "RefSeq_mRNA" {sum += $7; count++} \
END {print sum / count}'
64.2653
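In awk arithmetic a non-numeric field such as "-" counts as 0, so rows with "-" in the averaged column lower the mean. Whether any RefSeq_mRNA rows in the real file have "-" in column 7 is not shown here. A sketch on made-up data that adds $3 != "-" to the pattern to skip such rows:

```bash
workdir=$(mktemp -d)
printf 'g1\tNM_1\t100\ng2\tNM_2\t-\ng3\tNM_3\t80\n' > "$workdir/mini.tsv"

# "-" is treated as 0: (100 + 0 + 80) / 3
all_rows=$(awk -F '\t' '{sum += $3; count++} END {print sum / count}' "$workdir/mini.tsv")
# skip "-": (100 + 80) / 2
numeric_only=$(awk -F '\t' '$3 != "-" {sum += $3; count++} END {print sum / count}' "$workdir/mini.tsv")
echo "$all_rows"       # 60
echo "$numeric_only"   # 90
```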
Introduction to RNA-Seq Analysis
February 13-February 14, 2025 1:00-4:00pm PST
Intermediate RNA-Seq Analysis Using R
February 20, 2025 9:00am-12:00pm PST
Introduction to Statistics, Experimental Design and Hypothesis Testing
February 24-February 25, 2025 1:00-3:00pm PST