Natalie Elphick
Bioinformatician I
Yihang Xin (TA)
Software Engineer III
Run the following commands if you did not attend part 1:
mkdir unix_workshop
cd unix_workshop
curl -L -o unix_workshop_2024.tar.gz 'https://www.dropbox.com/scl/fi/o8msrl3a1k986jvjll4mv/unix_workshop_2024.tar.gz?rlkey=m7jfkvpz0iq12zdzphq7013l5&dl=0'
tar -xzf unix_workshop_2024.tar.gz
cd unix_workshop_2024
curl -o part_2/homo_sapiens.refseq.tsv.gz https://ftp.ensembl.org/pub/current_tsv/homo_sapiens/Homo_sapiens.GRCh38.111.refseq.tsv.gz
gzip: compresses a file and replaces it with a compressed version (.gz)
tar: creates and manipulates archive files
Archive: a single file that bundles one or more files and/or folders, often in compressed form
gunzip part_2/homo_sapiens.refseq.tsv.gz
du -h part_2/homo_sapiens.refseq.tsv
33M part_2/homo_sapiens.refseq.tsv
gzip part_2/homo_sapiens.refseq.tsv
du -h part_2/homo_sapiens.refseq.tsv.gz
3.2M part_2/homo_sapiens.refseq.tsv.gz
tar -czf part_1.tar.gz part_1
ls -l
total 8
drwx---rw-@ 4 nelphick staff 128 Mar 12 09:36 part_1
-rw-r--r-- 1 nelphick staff 803 Mar 12 12:52 part_1.tar.gz
drwxr-xr-x@ 4 nelphick staff 128 Mar 12 12:52 part_2
tar -xzf part_1.tar.gz
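Before extracting an archive it can help to check what it contains. The sketch below is self-contained: it builds a small throwaway archive (the `demo_dir` folder and its files are made up for illustration) and lists it with `tar -tzf`, which prints the stored paths without extracting anything.

```shell
# Work in a temporary directory so the workshop folder is not touched
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical folder with two small files
mkdir demo_dir
echo "hello" > demo_dir/a.txt
echo "world" > demo_dir/b.txt

# -c create an archive, -z compress it with gzip, -f name of the archive file
tar -czf demo_dir.tar.gz demo_dir

# -t lists the archive contents instead of extracting them
listing=$(tar -tzf demo_dir.tar.gz | sort)
echo "$listing"
```

The same `-t` flag works on `part_1.tar.gz` from the step above.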
gunzip -c: writes the decompressed contents to standard output and leaves the compressed file unchanged
gunzip -c part_2/homo_sapiens.refseq.tsv.gz | head
gene_stable_id transcript_stable_id protein_stable_id xref db_name info_type source_identity xref_identity linkage_type
ENSG00000228037 ENST00000424215 - NR_121638 RefSeq_ncRNA DIRECT - - -
ENSG00000142611 ENST00000378391 ENSP00000367643 NP_955533 RefSeq_peptide DIRECT 100 100 -
ENSG00000142611 ENST00000378391 ENSP00000367643 NM_199454 RefSeq_mRNA DIRECT 99 62 -
ENSG00000142611 ENST00000270722 ENSP00000270722 NP_071397 RefSeq_peptide DIRECT 100 100 -
ENSG00000142611 ENST00000270722 ENSP00000270722 NM_022114 RefSeq_mRNA DIRECT 100 100 -
ENSG00000157911 ENST00000288774 ENSP00000288774 NP_001361354 RefSeq_peptide INFERRED_PAIR - - -
ENSG00000157911 ENST00000288774 ENSP00000288774 NP_001361355 RefSeq_peptide INFERRED_PAIR - - -
ENSG00000157911 ENST00000288774 ENSP00000288774 NP_722540 RefSeq_peptide DIRECT 100 100 -
ENSG00000157911 ENST00000288774 ENSP00000288774 NM_001374425 RefSeq_mRNA DIRECT 99 100 -
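The `-c` flag works in both directions. As a minimal sketch on a made-up file (`demo.tsv` is not part of the workshop data), `gzip -c` keeps the original file in place, and `gunzip -c` streams the decompressed text into another command without writing it to disk.

```shell
# Work in a temporary directory
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical three-line, tab-separated file
printf 'gene\tvalue\nA\t1\nB\t2\n' > demo.tsv

# -c sends the compressed data to stdout, so demo.tsv is not replaced
gzip -c demo.tsv > demo.tsv.gz

# Count the lines of the compressed file without decompressing it on disk
# (tr removes the padding some versions of wc add around the number)
n_lines=$(gunzip -c demo.tsv.gz | wc -l | tr -d ' ')
echo "demo.tsv.gz holds $n_lines lines"
```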
Example:
echo $HOME
/Users/nelphick
These can change depending on the specific OS or program. For example, TMPDIR can also be TEMP, TEMPDIR or TMP.
When you run a command, the shell searches the directories listed in $PATH to find its associated executable file
echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/go/bin:/usr/local/mysql/bin
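Add a directory to $PATH like this:
export PATH="/path/to/new/software:$PATH"
This changes $PATH only for the current terminal session
To make the change permanent, add the export line to ~/.bashrc or ~/.zshrc
Modifying $PATH incorrectly can break system functionality
which ls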
/bin/ls
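The effect of prepending a directory to $PATH can be checked directly. This is a minimal sketch under stated assumptions: `my_tool` and its directory are invented for the demo and stand in for newly installed software.

```shell
# Hypothetical software directory created just for this demo
tooldir=$(mktemp -d)

# A tiny executable script standing in for a new program
printf '#!/bin/sh\necho "hello from my_tool"\n' > "$tooldir/my_tool"
chmod u+x "$tooldir/my_tool"

# Prepend the directory so the shell searches it first
export PATH="$tooldir:$PATH"

# command -v reports the file the shell will run, much like which
found=$(command -v my_tool)
echo "$found"

# The program can now be run by name alone
my_tool
```

Because the change is made with `export` inside the running shell, it disappears when the terminal session ends.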
Shell scripts are plain text files of commands, conventionally saved with a .sh extension
nano part_2/example_script.sh
#!/bin/bash
#! tells the OS where the interpreter is
which bash
/bin/bash
ls -l part_2/example_script.sh
-rw-r--r-- 1 nelphick staff 287 Mar 12 12:52 part_2/example_script.sh
chmod u+x part_2/example_script.sh
ls -l part_2/example_script.sh
-rwxr--r-- 1 nelphick staff 287 Mar 12 12:52 part_2/example_script.sh
#!/bin/bash
# This is a comment. Comments are ignored by the shell.
# $1 is the first argument passed to the script
echo "Counting the genes in $1"
# count the unique genes in the file
u_genes=$(gunzip -c "$1" | cut -f 1 | sort -u | wc -l)
echo "There are $u_genes unique genes in $1"
./part_2/example_script.sh part_2/homo_sapiens.refseq.tsv.gz
Counting the genes in part_2/homo_sapiens.refseq.tsv.gz
There are 33338 unique genes in part_2/homo_sapiens.refseq.tsv.gz
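A script like this fails in a confusing way when it is run without an argument. One possible guard, sketched here as a function so it can be pasted into a terminal, checks `$#` (the number of arguments) first. The name `count_unique` and the demo file are hypothetical. Note that the header line is counted as one of the unique values, so the demo prints 3 for two distinct IDs.

```shell
# Hypothetical helper: count unique values in column 1 of a gzipped TSV
count_unique() {
    # $# is the number of arguments the function (or script) received
    if [ "$#" -ne 1 ]; then
        echo "Usage: count_unique <file.tsv.gz>" >&2
        return 1
    fi
    # Quoting "$1" keeps a path that contains spaces in one piece
    gunzip -c "$1" | cut -f 1 | sort -u | wc -l | tr -d ' '
}

# Made-up input: a header line plus three rows holding two distinct IDs
workdir=$(mktemp -d)
printf 'id\tx\ng1\t1\ng1\t2\ng2\t3\n' | gzip -c > "$workdir/demo.tsv.gz"

n_unique=$(count_unique "$workdir/demo.tsv.gz")
echo "$n_unique"
```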
for i in {1..3}
do
echo $i
done
1
2
3
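A for loop is more often used over files than over numbers. The sketch below assumes three made-up text files in a temporary directory and reports the line count of each; the `*.txt` pattern expands to every matching file.

```shell
# Three made-up files with 1, 2 and 3 lines
workdir=$(mktemp -d)
printf 'a\n' > "$workdir/s1.txt"
printf 'a\nb\n' > "$workdir/s2.txt"
printf 'a\nb\nc\n' > "$workdir/s3.txt"

# The glob is expanded by the shell before the loop starts
for f in "$workdir"/*.txt
do
    # Reading from stdin makes wc print only the number
    n=$(wc -l < "$f" | tr -d ' ')
    echo "$(basename "$f"): $n lines"
done
```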
count=0
while [ $count -lt 5 ] # loop while count is less than 5
do
echo $count
count=$((count+1))
done
0
1
2
3
4
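A common use of while is reading a file one line at a time. In this sketch the input file and its contents are invented; `read` splits each line on whitespace (tabs included) into the variables `name` and `value`.

```shell
# Made-up two-column input
infile=$(mktemp)
printf 'geneA\t10\ngeneB\t20\ngeneC\t30\n' > "$infile"

total=0
# -r stops read from treating backslashes as escape characters
while read -r name value
do
    echo "$name has value $value"
    total=$((total + value))
done < "$infile"   # the loop reads its lines from the file

echo "Total: $total"
```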
x=5
if [ $x -gt 10 ] # check if x is greater than 10
then
echo "x is greater than 10"
else
echo "x is not greater than 10"
fi # end if statement
x is not greater than 10
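Besides comparing numbers, `[ ]` can test files. The following sketch uses `-f`, which is true when a path exists and is a regular file; `results.txt` and the `check_file` helper are hypothetical names used only for this demo.

```shell
workdir=$(mktemp -d)
target="$workdir/results.txt"   # hypothetical file name

check_file() {
    # -f is true when the path exists and is a regular file
    if [ -f "$1" ]
    then
        echo "exists"
    else
        echo "missing"
    fi
}

before=$(check_file "$target")   # the file has not been created yet
touch "$target"                  # create an empty file
after=$(check_file "$target")
echo "before: $before, after: $after"
```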
Example:
sed 's/search_string/replace_string/g' input.txt > output.txt
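The trailing `g` in the command above matters when a line has more than one match. A small sketch on made-up input shows the difference:

```shell
# Made-up input: the first line contains the search string twice
workdir=$(mktemp -d)
printf 'chr1 chr1\nchr2\n' > "$workdir/input.txt"

# Without g, only the first match on each line is replaced
first_only=$(sed 's/chr/chromosome_/' "$workdir/input.txt")

# With g, every match on each line is replaced
all_matches=$(sed 's/chr/chromosome_/g' "$workdir/input.txt")

echo "$first_only"
echo "$all_matches"
```

In both cases `input.txt` itself is left unchanged, because sed writes its result to standard output.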
ssh username@remote
username would be your user on the remote server and remote is the hostname or IP address of the remote server or computer
scp [options] [source] [destination]
scp /path/to/local/file.txt username@remote:/path/to/remote/directory/
scp username@remote:/path/to/file.txt /path/to/local/directory/
Basic command:
awk options 'pattern {action}' input_file
awk -F '\t' '{print $1+$2}' part_1/list_numbers.tsv
4
15
17
$1, $2: the first and second fields
gunzip -c part_2/homo_sapiens.refseq.tsv.gz | \
awk -F '\t' '$5 == "RefSeq_mRNA" {sum += $7; count++} \
END {print sum / count}'
64.1533
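awk treats a non-numeric field such as `-` as 0 in arithmetic, so any row with a missing value would pull an average like the one above down. One possible variant, sketched on a made-up table rather than the Ensembl file, skips the header with `NR > 1` and skips rows whose score is `-`:

```shell
# Made-up table: name, type, score ("-" marks a missing score)
data=$(printf 'name\ttype\tscore\na\tmRNA\t90\nb\tmRNA\t-\nc\tmRNA\t100\nd\tncRNA\t50\n')

# NR > 1 skips the header line; $3 != "-" skips rows without a score
mean=$(printf '%s\n' "$data" | awk -F '\t' '
    NR > 1 && $2 == "mRNA" && $3 != "-" {sum += $3; count++}
    END {if (count > 0) print sum / count}')

echo "$mean"
```

Only rows a and c contribute, so the result is the mean of 90 and 100. The `count > 0` check avoids a division by zero when no row matches.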
Introduction to Pathway Analysis
April 2, 2024 1:00-4:00pm PDT
Statistics of Enrichment Analysis Methods
April 11-April 12, 2024 1:00-3:00pm PDT
Working on Wynton
April 15, 2024 1:00-4:00pm PDT
Introduction to Linear Mixed Effects Models
April 25-April 26, 2024 1:00-3:00pm PDT