Introduction to UNIX and RNA-seq Repositories

Day 1 of the workshop. June 1st, 2026.

The slides are in Slides/ (PDF and PPTX). Answer keys for each exercise are in Exercises/answers/. Data files used in the exercises are in Exercises/data/.

Before the session

You need a terminal.

Mac: just open Terminal.
Linux: same, open your terminal.
Windows: install WSL Ubuntu and open it. From PowerShell:

wsl --install -d Ubuntu
wsl -l -v
wsl -d Ubuntu

Then run the setup script from the repo:

bash setup.sh

The script installs wget, sra-toolkit, and the small command-line utilities we use (awk, grep, sed, tar, gzip, curl). It creates ~/workshop/data and prints a check at the end. It is safe to re-run.
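If you want to confirm the result yourself, a check along these lines may help. It is a minimal sketch, not the contents of setup.sh: the function name check_tools is made up for illustration, and it assumes fastq-dump is the command that sra-toolkit puts on your PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of setup.sh): report which tools are on PATH.
# Returns the number of missing tools, so 0 means everything was found.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok       $tool"
        else
            echo "MISSING  $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Tool list taken from the paragraph above; fastq-dump is assumed to come from sra-toolkit.
check_tools wget curl awk grep sed tar gzip fastq-dump \
    || echo "Some tools are missing; try re-running: bash setup.sh"

# The setup script is described as creating this directory.
if [ -d "$HOME/workshop/data" ]; then
    echo "ok       ~/workshop/data"
else
    echo "MISSING  ~/workshop/data"
fi
```

Any line starting with MISSING is worth mentioning when you arrive early for troubleshooting.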

If it fails on your machine, do not panic. Come a few minutes early and we will sort it out.

Outline

Setup and troubleshooting.
UNIX basics: navigation, inspection, pipes and redirects, text search.
GEO basics: accession types (GSE and GSM in GEO, SRR in SRA), common files, downloading with wget and fastq-dump.
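As a taste of the pipes, redirects, and text search part, here is a small sketch. The sample sheet is made up for illustration and is not one of the workshop data files; only standard grep, cut, sort, and uniq options are used.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing in the repo is touched.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Made-up sample sheet (illustration only).
printf '%s\n' \
    'sample,condition' \
    'S1,control' \
    'S2,treated' \
    'S3,untreated' \
    'S4,control' > "$work/samples.csv"

# Plain match: "treated" also matches inside "untreated".
grep 'treated' "$work/samples.csv"

# -w matches whole words only; the redirect (>) writes the result to a file.
grep -w 'treated' "$work/samples.csv" > "$work/treated.csv"

# -v inverts the match; -n prefixes each hit with its line number.
grep -v 'control' "$work/samples.csv"
grep -n 'control' "$work/samples.csv"

# Pipe: skip the header, keep column 2, then count each condition.
tail -n +2 "$work/samples.csv" | cut -d, -f2 | sort | uniq -c
```

The difference between the first two grep calls is the usual reason to reach for -w.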

Exercises

Exercises/answers/Exercise1_unix.sh — directory tree, file creation, seq, mv, concatenation.
Exercises/answers/Exercise_grep.sh — grep on happiness.csv: plain match, -w, -v, -n.
Exercises/answers/Exercise_wordle.sh — mini-capstone: solve today's Wordle with cat | tr | egrep over /usr/share/dict/words.
Exercises/answers/Exercise_awk_fastq.sh — awk on the paired-end FASTQ subset: read count, average length, GC%, N filtering, top 5' hexamers.
Exercises/answers/Exercise2_geo_download.sh — wget for a GEO supplementary file and an ENA FASTQ, then fastq-dump -X 10000 for a capped SRA pull.
Exercises/answers/Exercise3_1_counts_csv.sh — inspecting the gzipped count matrix from GSE251845.
Exercises/answers/Exercise3_2_fastq.sh — counting reads, hexamer bias, motif search, and safely subsetting a FASTQ.
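To give a feel for the awk exercises, the sketch below computes read count, mean length, and GC% on a tiny made-up FASTQ, and drops reads that contain N. It is not the answer key: the function names and the three reads are invented for illustration, and it assumes the common four-lines-per-record FASTQ layout (no wrapped sequences).

```shell
#!/usr/bin/env bash
# Made-up three-read FASTQ (illustration only, not workshop data).
fq=$(mktemp)
trap 'rm -f "$fq"' EXIT
cat > "$fq" <<'EOF'
@read1
ACGTACGT
+
IIIIIIII
@read2
GGCCNNAA
+
IIIIIIII
@read3
ATAT
+
IIII
EOF

# Hypothetical helper: summary statistics from the sequence lines.
# In a 4-line record the sequence is line 2, so NR % 4 == 2 selects it.
fastq_stats() {
    awk 'NR % 4 == 2 {
             n++                        # one read per sequence line
             len += length($0)          # measure before gsub edits the line
             gc += gsub(/[GCgc]/, "")   # gsub returns how many G or C it removed
         }
         END {
             if (n == 0) { print "reads=0"; exit }
             printf "reads=%d mean_len=%.2f gc_pct=%.2f\n", n, len / n, 100 * gc / len
         }' "$1"
}

# Hypothetical helper: keep whole records whose sequence has no N.
# Lines 1-3 of a record land in rec[1..3]; line 4 lands in rec[0].
drop_n_reads() {
    awk '{ rec[NR % 4] = $0 }
         NR % 4 == 0 && rec[2] !~ /N/ {
             print rec[1]; print rec[2]; print rec[3]; print rec[0]
         }' "$1"
}

fastq_stats "$fq"      # prints: reads=3 mean_len=6.67 gc_pct=40.00
drop_n_reads "$fq"     # prints the read1 and read3 records only
```

Filtering record by record, rather than line by line with grep -v, keeps the header, sequence, separator, and quality lines together, which is what "safely subsetting" a FASTQ comes down to.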

The demo accession is SRR390728 (small, public, finishes fast). The count matrix is from GSE251845.
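The capped pull for the demo accession can be sketched as below. The helper names accession_type and capped_pull and the RUN switch are made up for illustration. By default the sketch only prints the command; set RUN=1 to actually call fastq-dump, which needs network access and writes into the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify an accession by its prefix.
accession_type() {
    case "$1" in
        GSE*) echo "GEO series" ;;
        GSM*) echo "GEO sample" ;;
        SRR*) echo "SRA run" ;;
        *)    echo "unknown" ;;
    esac
}

# Hypothetical helper: pull at most max_spots reads for a run accession.
# This sketch only accepts SRR accessions.
# fastq-dump -X N stops after N spots, which keeps the download small.
capped_pull() {
    local acc=$1 max_spots=${2:-10000}
    if [ "$(accession_type "$acc")" != "SRA run" ]; then
        echo "expected a run accession (SRR...), got: $acc" >&2
        return 1
    fi
    if [ "${RUN:-0}" = "1" ]; then
        fastq-dump -X "$max_spots" "$acc"
    else
        # Dry run: show the command instead of executing it.
        echo "fastq-dump -X $max_spots $acc"
    fi
}

accession_type GSE251845   # prints: GEO series
capped_pull SRR390728      # prints: fastq-dump -X 10000 SRR390728
```

Printing the command first is a cheap way to check the accession and the cap before anything is downloaded.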

References

Count matrix: GSE251845.
Paired-end FASTQ subsets: statOmics SGA2019 airway data.
