Chapter 3: Conda setup on macOS (Apple Silicon)

Learning Objectives

Conceptualize the role of a package manager (Conda) as a "Lab Manager" that handles software acquisition and compatibility.
Install and configure Miniconda and the Bioconda channel to access thousands of specialized bioinformatics tools.
Create and activate isolated environments to prevent software conflicts (the "sterile hood" method).
Verify software installation and export environment details to ensure your analysis is reproducible by others.

In the previous chapter, we built the physical structure of your lab (the folders). Now, we need to buy the equipment. In the wet lab, if you want to run a PCR, you need a thermocycler. If you want to check DNA quality, you need a Bioanalyzer. In computational biology, we need software tools. We need a tool to align reads, a tool to check quality, and a tool to call peaks.

However, installing scientific software has historically been a nightmare.

The "Dependency" Problem: Imagine you buy a new centrifuge, but the manual says it only works if you plug it into a wall outlet installed specifically in 1998, and it requires a specific brand of fuse that hasn't been made in ten years.
The Conflict Problem: Imagine you fix the centrifuge, but doing so causes the freezer on the same circuit to shut down.

In the digital world, this happens constantly. One tool requires "Python 2.7," while another requires "Python 3.9." If you try to install both on your main computer, they clash.

The Solution: Conda.

Think of Conda as an automated Lab Manager. Instead of you manually hunting down the right version of every software library, you tell Conda: "I need to run Samtools." Conda automatically looks up the "recipe," finds the correct versions of every dependency, downloads them, and installs them in a way that doesn't break your other tools.

Part 1: Installing the Lab Manager (Miniconda)

We will install a lightweight version of Conda called Miniconda.

Note for Apple Users: This guide is specific to Macs with Apple Silicon (M1, M2, or M3 chips). If you bought your Mac after late 2020, this is likely you. You can check by clicking the Apple Logo (top left) → About This Mac. If it says "Chip: Apple M1/M2/etc," follow the steps below.
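If you would rather check from the Terminal, the short sketch below reads the machine architecture with uname -m. Apple Silicon Macs report arm64 and Intel Macs report x86_64. One caveat: a Terminal that is running under Rosetta may report x86_64 even on an M-series Mac, so treat the "About This Mac" window as the final word. The variable name arch_name is only for illustration.

```shell
# Ask the operating system which CPU architecture it is running on.
arch_name="$(uname -m)"

if [ "$arch_name" = "arm64" ]; then
  echo "Apple Silicon detected: the arm64 installer in this chapter applies."
else
  echo "Detected $arch_name: the arm64 installer in this chapter may not match your machine."
fi
```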

Step 1: Download the Installer

Open your Terminal. We are going to use a command called curl. Think of this as a digital pipette: it draws data from the internet and dispenses it onto your computer.

Copy and paste this command:

curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh

What happened? The -O flag told curl to save the file under its original name in your current folder. You downloaded a script file ending in .sh. If you type ls, you will see it sitting in your folder.
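Before running anything you downloaded, it is worth a quick look at the file. The sketch below assumes you are still in the folder where you ran curl. It prints the file's size and its SHA-256 fingerprint, which you can optionally compare against the hash Anaconda publishes for this installer. The variable name installer is just a convenience.

```shell
# Name of the file that curl -O saved in the current folder.
installer="Miniconda3-latest-MacOSX-arm64.sh"

if [ -f "$installer" ]; then
  # Show the file size in human-readable units.
  ls -lh "$installer"
  # Print the SHA-256 checksum (shasum ships with macOS).
  shasum -a 256 "$installer"
else
  echo "Installer not found in $(pwd); re-run the curl command from this folder."
fi
```

A file of only a few hundred bytes usually means the download failed and you saved an error page instead of the installer.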

Step 2: Run the Installer

Now we need to actually run that script to install the program.

bash Miniconda3-latest-MacOSX-arm64.sh

The Process: The screen will scroll with text (the License Agreement).
Your Job:
1. Press Enter repeatedly to scroll through the license.
2. Type yes when asked if you agree.
3. Press Enter to accept the default install location (usually /Users/yourname/miniconda3).
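4. Crucial Step: It will ask something like: "Do you wish the installer to initialize Miniconda3 by running conda init?" (the exact wording varies between installer versions). Type yes. This adds a few lines to your shell's startup file so that Conda is available in every new Terminal window.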

Step 3: Waking up Conda

The installer finished, but your current Terminal window doesn't know that yet. It's like installing a new lightbulb but forgetting to flip the switch.

To activate it, close your Terminal window completely and open a new one. Alternatively, if you use zsh (the default shell on recent versions of macOS), run this command to reload your settings immediately:

source ~/.zshrc

Verification: To prove it worked, ask Conda to identify itself:

conda --version

If it replies with a version number (e.g., conda 23.9.0), you have successfully hired your Lab Manager.
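If the version check fails with "command not found," the sketch below can help narrow down why. It first tests whether your shell can find conda at all, and only then asks Conda where it was installed. The variable conda_status is a made-up name used purely for this check.

```shell
# Check whether the shell can find a command (or shell function) named conda.
if command -v conda >/dev/null 2>&1; then
  conda_status="found"
  conda --version
  # Print the folder where the base installation lives.
  conda info --base
else
  conda_status="missing"
  echo "conda is not on your PATH yet; open a new Terminal window and try again."
fi
```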

Part 2: Setting up the Suppliers (Bioconda)

Conda is the manager, but it needs to know where to buy supplies. By default, it looks at a general store. We need to point it toward the specialist biological supply store.

We call these "Channels."

conda-forge: The general hardware store (core utilities).
bioconda: The specialist bio-store (genomics tools).
defaults: Anaconda's own channel, which Miniconda uses out of the box.

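We must set the priority order strictly. With strict priority, when the same package exists in several channels, Conda takes it from the highest-ranked channel and ignores the copies further down. We want conda-forge ranked first so that packages are drawn from one consistent source wherever possible.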

Run these commands one by one:

conda config --add channels defaults

conda config --add channels bioconda

conda config --add channels conda-forge

conda config --set channel_priority strict

Note: The order you run these matters! The last one added becomes the top priority. We want conda-forge on top.

Verify the order:

conda config --show channels

You should see:

channels:
  - conda-forge
  - bioconda
  - defaults
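The conda config commands store these settings in a small text file in your home folder called .condarc. As a rough sketch, assuming the default location, you can print that file and confirm the priority setting like this (condarc is just an illustrative variable name):

```shell
# Default location of the per-user Conda configuration file.
condarc="$HOME/.condarc"

if [ -f "$condarc" ]; then
  # Show the saved channel list and priority setting.
  cat "$condarc"
else
  echo "No $condarc found yet; run the conda config commands first."
fi

# Ask Conda directly for the priority setting, if Conda is available.
if command -v conda >/dev/null 2>&1; then
  conda config --show channel_priority
fi
```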

Part 3: Creating a "Sterile Hood" (Environments)

This is the most critical concept in computational reproducibility.

In a wet lab, you have a Pre-PCR Room and a Post-PCR Room. You never carry equipment between them to avoid cross-contamination.

In Conda, we create Environments.

The (base) environment is your hallway. It contains the bare minimum. You should never install your experiment's heavy tools here.
We will create a new room called bio-tools specifically for this project.

Step 1: Create the Environment

We will create the room and install two small tools immediately: seqtk (a Swiss-army knife for manipulating sequence files) and fastqc (for quality control).

conda create -n bio-tools seqtk fastqc

create: Build a new environment.
-n bio-tools: Name it "bio-tools".
seqtk fastqc: Install these two tools inside it immediately.

Conda will calculate the "recipe" (dependencies). It might show a list of 20+ other tiny programs it needs to install to make fastqc work. This is good: Conda is solving the dependency headache for you. Type y and press Enter.

Step 2: Enter the Room (Activate)

Creating the room isn't enough; you are still standing in the hallway (Base). You must step inside.

conda activate bio-tools

Look at your prompt!

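It used to start with (base), as in (base) user@mac. Now it starts with (bio-tools), as in (bio-tools) user@mac. The rest of your prompt may look different from this example. You are now inside the sterile hood. Any tool you run now comes from this specific environment.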
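If your prompt is customized and does not show the environment name, you can ask the shell directly. The sketch below reads CONDA_DEFAULT_ENV, a variable that Conda sets when an environment is activated, and then shows which copy of fastqc your shell would run. The name active_env is only for illustration.

```shell
# Name of the active Conda environment, or "none" if nothing is activated.
active_env="${CONDA_DEFAULT_ENV:-none}"
echo "Active environment: $active_env"

# Show the full path of the fastqc program the shell would run.
if command -v fastqc >/dev/null 2>&1; then
  echo "fastqc resolves to: $(command -v fastqc)"
else
  echo "fastqc is not on your PATH; run 'conda activate bio-tools' first."
fi
```

Inside the environment, the path should point into a folder named bio-tools. When you are finished working, conda deactivate steps you back out into the hallway.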

Part 4: Verification and Protocol

Did it actually work? Let's check our equipment.

fastqc --version

If it prints FastQC v0.11.9 (or similar), you are ready to go.
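FastQC is only one item on the shelf. To see everything Conda placed in the room alongside it, including version numbers and the channel each package came from, a sketch like the following should work whether or not the environment is currently active:

```shell
# List every package in the bio-tools environment with its version and channel.
if command -v conda >/dev/null 2>&1; then
  conda list -n bio-tools || echo "Could not list packages; does the bio-tools environment exist?"
  list_attempted="yes"
else
  list_attempted="no"
  echo "conda is not available in this shell."
fi
```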

The "Frozen" Protocol

In a paper, you write: "We used FastQC v0.11.9." But what about the underlying libraries? To give someone else the best chance of rebuilding your software environment 5 years from now, we export a "YAML" file. This is a snapshot of every package in your environment, with its exact version.

# First, ensure your meta folder exists (from Chapter 2)
mkdir -p meta

# Export the snapshot
conda env export --name bio-tools > meta/bio-tools-env.yml
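One limitation of the full snapshot is that it records builds specific to your operating system, so it may not rebuild cleanly on a colleague's Linux or Intel machine. As a complementary sketch, the --from-history flag writes only the packages you asked for by name, which tends to travel better between platforms. The output filename bio-tools-env-minimal.yml is a hypothetical choice, not a convention from this book.

```shell
# Make sure the meta folder exists before writing into it.
mkdir -p meta

if command -v conda >/dev/null 2>&1 && conda env list | grep -q '^bio-tools[[:space:]]'; then
  # Record only the packages that were explicitly requested (seqtk, fastqc).
  conda env export --name bio-tools --from-history > meta/bio-tools-env-minimal.yml
  echo "Wrote meta/bio-tools-env-minimal.yml"
else
  echo "Skipping export: conda or the bio-tools environment was not found."
fi
```

A collaborator could then rebuild the environment from either file, for example with conda env create --file meta/bio-tools-env.yml.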

Summary

You have successfully:

Installed the package manager (Miniconda).
Configured the biological supply lines (Bioconda).
Built a sterile workspace (bio-tools environment).
Installed your first tools (fastqc and seqtk).

Next Step

Your lab is built (Chapter 2) and your equipment is purchased and installed (Chapter 3). You are now ready to handle biological samples. In Chapter 4, we will download a real dataset and use FastQC to verify that our sequencing reads are healthy enough for analysis.
