Are you new to the world of molecular biology and genetics, or just need a refresher? Then read on!
Nearly all cells in the human body (apart from red blood cells) have the same basic components, although they can look very different depending on the role they play.
Cytoplasm: The factory floor of the cell, which contains the machinery needed for it to function. Proteins, lipids and other molecules are made, broken down and transported in and out of the cell.
Mitochondria: The cell's power plants, which provide energy for the cell's processes.
Nucleus: Contains the DNA of the cell, the instruction book for life.
Most cells in our body divide and multiply using a process called mitosis.
Deoxyribonucleic acid (DNA) makes up the instruction manuals for almost all life on Earth. How we develop from a single-cell embryo, grow, function and reproduce is all encoded in this simple molecule.
DNA consists of two backbones that run in opposite directions and twist around each other (forming the double-helix shape), and four chemical bases, adenine (A), cytosine (C), guanine (G) and thymine (T), which connect together in a specific manner:
G pairs with C
A pairs with T
It's the number and order of these letters that makes all the difference!
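The pairing rules above are simple enough to express in a few lines of code. The Python sketch below is illustrative only; the function names `complement` and `reverse_complement` are chosen here and do not come from any particular library. Because the two backbones run in opposite directions, the partner strand is found by pairing each base and then reading the result backwards.

```python
# Pairing rules from the text: G pairs with C, A pairs with T.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base that pairs with each base, position by position."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own direction.

    The two strands run in opposite directions, so the partner is the
    complement read backwards.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("GATTACA"))          # CTAATGT
    print(reverse_complement("GATTACA"))  # TGTAATC
```

Changing a single letter of the input changes exactly one letter of the partner strand, which is why the order of the bases carries the information.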
The double-helix structure of DNA was first described by James Watson and Francis Crick in 1953, and earned them the 1962 Nobel Prize in Physiology or Medicine alongside Maurice Wilkins. Rosalind Franklin's X-ray diffraction work was also a key part of the discovery, but she died in 1958, before the prize was awarded.
DNA is now being used in research to store computer data! It's very early days, but offers the potential to store data for thousands of years in a fraction of the space and energy required for conventional methods.
In the nucleus, DNA is tightly wound around histones and other structural proteins to form chromatin. This is then packaged into chromosomes.
Chromosomes are usually fairly loose and dispersed within the nucleus during normal cell activity, but condense as the cell prepares to divide and are at their most compact during the metaphase stage of cell division. The classic 'X' shape is actually two duplicated sister chromatids, joined together at their centromere prior to the cell dividing.
Chromosomes are identified by their unique banding pattern under a chemical stain, and are capped by regions of repeated DNA sequence called telomeres.
Humans have 22 pairs of chromosomes (autosomes) and one pair that determines our biological sex (sex chromosomes).
Males are XY
Females are XX
Large-scale alterations in chromosome number or makeup (deletions, duplications or translocations) can cause diseases such as Down syndrome and cancer.
Traits and inheritance
We possess two copies of each chromosome in our cells, making us diploid organisms. We inherit one set from each of our parents.
The exception to this is our gametes (eggs and sperm), which are haploid. They are formed from germ cells by a special reductive cell division called meiosis, which leaves only one copy of each chromosome. Otherwise, each generation would double the amount of genetic material in the cell.
During meiosis, genetic material is swapped between paired chromosomes using a process called recombination. This process, together with the random shuffling of maternal and paternal chromosomes into each gamete, is responsible for much of the variation we see in humans and other animals.
The genetics of inheritance was first described in 1866 by Gregor Mendel, an Austrian monk who studied traits (phenotypes) in pea plant breeding. By crossing plants with different colours or seed coats, he found that the different traits came out in the following generations at predictable ratios and forms.
The next big discovery, which made the link between inherited phenotypes and chromosomes, came from Thomas Hunt Morgan in 1910. He found that a rare white-eyed mutation in fruit flies was only seen in males when he bred them, and deduced that the gene responsible must be on the X chromosome.
Many diseases and syndromes are inherited from our parents. These are called genetic diseases.
So what is responsible for these different phenotypes and diseases? It's all in our genes!
Most of the Y chromosome does not recombine during meiosis, which makes it useful for studying ancient human migration patterns. Every Y chromosome in the human race falls into one of a small number of major Y haplogroups!
In 2013, the oldest known ancient haplogroup A00 was discovered in an African-American man who had submitted his DNA for analysis.
Classically, genes are small sections of DNA within the genome that code for proteins. Genes control our embryonic development, how our bodies are put together and the characteristics and phenotypes which make us who we are.
Protein-coding genes consist of the following main components:
Promoters act as an on/off switch.
Enhancers and silencers act as binding sites for transcription factors and small RNAs and control gene regulation: whether the gene is switched on and, if so, how much protein is produced.
Exons code for the actual proteins.
Introns are spacers between exons and are edited out before protein production.
A different class of genes, known as non-coding genes, is used to make RNA molecules that are not translated into protein. Many of these RNAs are used in gene regulation.
Genes do not work in isolation and are highly regulated, both from within the cell (usually by proteins made by other genes) and from signals outside of it. They work in large interconnected pathways, and if a gene is mutated and stops functioning properly, it can cause disease.
Not all genes are switched on (expressed) in every cell or at all stages of development. It's this differential expression which determines the eventual fate of the cell.
Genes can be engineered to be expressed in the wrong place (ectopic expression), as shown in the image of the fly with eyes on its legs!
Image taken by Elise Delagnes and Hannah Rollins (UC Berkeley).
Homeobox genes act as master control switches for organs and body parts during early embryonic development.
Mutations in these genes can have dire consequences. For example, a mutation in the Antennapedia (Antp) Hox gene results in the fly developing legs where the antennae usually are!
Alleles, phenotypes and disease
Different forms of the same genes are referred to as alleles, and are a result of mutations through the ages. It's these different alleles which result in different traits, or phenotypes.
The most common form of a gene is called the 'wild type allele', although this term is usually only used in respect to species studied in the lab.
Your genotype is based on how many copies of a particular allele you have:
Homozygous = two copies of the same allele
Heterozygous = two different alleles.
One allele can mask the effect of the other, a process called dominance. Dominant alleles only need one copy to show their phenotype, whereas recessive alleles need two. Some examples of human disease and dominance are shown below:
Autosomal dominant: Huntington disease
Autosomal recessive: Cystic fibrosis, Sickle-cell anaemia
Sex-linked: Hemophilia A.
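The difference between dominant and recessive alleles can be made concrete with a small cross, in the spirit of Mendel's pea experiments. The Python sketch below is a simplified illustration: it assumes a single gene with two alleles, complete dominance, and the common convention (an assumption here, not something stated above) that an uppercase letter marks the dominant allele. The function names are made up for this example.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes for a one-gene cross.

    Each parent is a two-letter genotype such as "Aa". Every pairing of one
    allele from each parent is treated as equally likely.
    """
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that "aA" and "Aa" count as the same genotype.
        offspring["".join(sorted(allele1 + allele2))] += 1
    return offspring


def phenotype(genotype: str) -> str:
    """One dominant (uppercase) allele is enough to show the dominant trait."""
    return "dominant" if any(a.isupper() for a in genotype) else "recessive"


def phenotype_counts(parent1: str, parent2: str) -> Counter:
    """Count offspring phenotypes for the same cross."""
    counts = Counter()
    for genotype, n in cross(parent1, parent2).items():
        counts[phenotype(genotype)] += n
    return counts


if __name__ == "__main__":
    print(dict(cross("Aa", "Aa")))             # one AA, two Aa, one aa
    print(dict(phenotype_counts("Aa", "Aa")))  # three dominant to one recessive
```

Two heterozygous parents give the 3:1 ratio of dominant to recessive phenotypes. For a recessive disease, two carrier parents therefore have a one in four chance of an affected child at each pregnancy.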
The entirety of the DNA in our chromosomes makes up our genome. Mitochondria also have their own mini-genome (mtDNA) which, in humans, is 16,569bp. The composition of the genome might not quite be what you expect!
Less than 2% of the human genome actually codes for proteins!
50% is made up of unique sequences including known regulatory regions, genes that make small RNA molecules and the introns of coding genes.
Some non-coding sequences can be highly conserved between species, suggesting an important role in gene regulation.
The other half of the genome is made up of tandem repeats, interspersed repeats consisting of retrotransposons and DNA transposons, and non-functional pseudogenes which have accumulated over evolution.
On June 26th, 2000, a draft sequence of the first human genome was announced. It took ten years to complete and cost over $2.5bn.
Now a genome can be sequenced for as little as $1500 in a matter of days. Genomics England plan to study 100,000 genomes over the next few years!
How big is a genome?
1 kilobase = 1000 bases. 1 megabase = 1 million bases. 1 gigabase = 1 billion bases!
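As a quick illustration of these units, the following Python sketch converts a raw base count into the largest unit that fits. The helper name `format_bases` and the two-decimal rounding are choices made for this example; the 16,569-base input is the mitochondrial genome size quoted earlier.

```python
def format_bases(n_bases: int) -> str:
    """Express a sequence length using the largest unit that fits."""
    units = [(1_000_000_000, "Gb"), (1_000_000, "Mb"), (1_000, "kb")]
    for size, label in units:
        if n_bases >= size:
            return f"{n_bases / size:.2f} {label}"
    return f"{n_bases} bp"


if __name__ == "__main__":
    print(format_bases(16_569))         # 16.57 kb
    print(format_bases(2_500_000))      # 2.50 Mb
    print(format_bases(1_000_000_000))  # 1.00 Gb
```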
The size of genomes and the number of genes vary enormously between species. Although genome size might be expected to reflect the complexity of an organism, it is often a poor guide. The lungfish Protopterus aethiopicus, for example, has a genome of roughly 130 gigabases (Gb), about forty times larger than the human genome!
Genes don't always code for just one protein, however. One gene can code for many different forms of a protein depending on the tissue or developmental stage in which it is expressed.
But how is this possible? It's all to do with transcripts and alternative splicing.
Analysing genomes takes a lot of computing power. The data centre at the Wellcome Trust Sanger Institute in the UK contains 17,000 processor cores and over 30 petabytes of storage. That's 30 million gigabytes!
Transcription, translation and the genetic code
DNA lives in the cell nucleus, and the machinery to make proteins is in the cytoplasm.
To relay the information from the gene to the rest of the cell, the cell uses a different nucleic acid called ribonucleic acid (RNA).
RNA is single stranded and uses the base Uracil (U) instead of Thymine (T).
An enzyme (a protein involved in chemical reactions) called RNA polymerase binds to the DNA at the promoter, unzips it and synthesizes a strand of RNA based on the DNA template as it travels down, a process called transcription. It eventually reaches a termination signal and the RNA is released.
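Transcription follows the same base-pairing logic as DNA itself, with U taking the place of T. The Python sketch below is a deliberately simplified model that ignores promoters, termination signals and the enzyme itself; the function names are made up for this example. It shows two equivalent ways to get the RNA: pairing against the template strand, or copying the opposite (coding) strand and swapping T for U.

```python
# DNA template base -> RNA base added opposite it (U replaces T in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Build the RNA that pairs with a DNA template strand.

    The template is written 3'->5', the direction in which it is read,
    so the returned RNA is written 5'->3'.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def coding_to_rna(coding_5_to_3: str) -> str:
    """Shortcut: the RNA matches the coding strand with T swapped for U."""
    return coding_5_to_3.upper().replace("T", "U")


if __name__ == "__main__":
    print(transcribe("TACGGCATT"))     # AUGCCGUAA
    print(coding_to_rna("ATGCCGTAA"))  # AUGCCGUAA
```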
Before the RNA can be used to make a protein, it needs to be processed into messenger RNA (mRNA). The introns are spliced out and the molecule is stabilised before it is transported into the cytoplasm.
By retaining different combinations of exons during post-processing, many different mRNA transcripts can be made from one gene. This allows for a huge variation of different proteins to be made without increasing the gene count.
To actually make a protein, however, another process is needed. This is called translation.
The human alpha-tropomyosin gene has 15 exons and can make nine distinct transcripts and proteins depending on the cell type they are expressed in.
If you think that's a lot, the Dscam-hv gene in the fruit fly has over 38,000 potential transcripts!
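The arithmetic behind these large transcript counts is simple multiplication: if several positions in a gene can each be filled by one of a set of alternative exons, the number of possible transcripts is the product of the choices. The Python sketch below illustrates this with a hypothetical toy gene. The Dscam cluster sizes used (12, 48, 33 and 2 alternative exons) are the commonly cited figures rather than numbers taken from this article, and the function names are made up for this example.

```python
from itertools import product
from math import prod


def count_transcripts(alternatives_per_position):
    """Multiply the number of exon choices at each variable position."""
    return prod(alternatives_per_position)


def list_transcripts(exon_positions):
    """List every transcript for a gene described position by position.

    Each entry of exon_positions lists the exons that could fill that
    position; a position with a single entry is always included.
    """
    return ["-".join(choice) for choice in product(*exon_positions)]


if __name__ == "__main__":
    # Hypothetical toy gene: positions 2 and 4 have alternative exons.
    toy_gene = [["E1"], ["E2a", "E2b"], ["E3"], ["E4a", "E4b", "E4c"]]
    for transcript in list_transcripts(toy_gene):
        print(transcript)
    print(count_transcripts([12, 48, 33, 2]))  # 38016
```

Just four variable positions are enough to reach tens of thousands of combinations, without adding a single extra gene.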
Mutations are the driving force of variation and evolution. Although they can be benign or even beneficial, many mutations are the cause of disease. Mutations can be large, duplicating or deleting one or more genes, or small, affecting a single base. Out of roughly 3 billion letters of DNA, just one incorrect letter in a vital place can mean the difference between life and death. Some examples are shown below.
Base substitutions can cause the incorrect amino acid to be added to the protein (missense), or even introduce a stop signal (nonsense).
This can cause structural or folding issues, or alter a crucial part of an enzyme responsible for facilitating chemical reactions (the catalytic domain).
They can also create aberrant splice donor or acceptor sites, causing problems in processing the mRNA properly.
For example, sickle-cell anaemia is due to a single base change in the β-globin gene, which changes a glutamic acid amino acid to a valine.
Frame-shifts occur if an insertion or deletion of sequence occurs that is not a multiple of three.
The wrong amino acids are added to the growing protein and a premature stop codon may be introduced. The resulting protein is not functional.
For example, one common cause of Tay-Sachs disease is a 4 bp (TATC) insertion in the HEXA gene.
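The effect of a frame-shift is easiest to see by reading a sequence three letters at a time before and after an insertion. The Python sketch below is a toy model: the mRNA sequence is hypothetical, the helper names are made up for this example, and amino acids are shown as one-letter codes with `*` standing for a stop codon. The codon table is the standard genetic code, built from a compact 64-letter string.

```python
from itertools import product

# Standard genetic code, with codons ordered UUU, UUC, UUA, UUG, UCU, ...
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read codons from the first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def insert_bases(mrna: str, position: int, bases: str) -> str:
    """Return a copy of the sequence with extra bases added at position."""
    return mrna[:position] + bases + mrna[position:]


if __name__ == "__main__":
    normal = "AUGGCUGAAGGUUGCUAA"  # hypothetical sequence
    print(translate(normal))                          # MAEGC
    print(translate(insert_bases(normal, 3, "CCC")))  # MPAEGC
    print(translate(insert_bases(normal, 3, "C")))    # MR
```

Inserting three bases adds one amino acid and leaves the rest of the protein intact. Inserting a single base changes every codon downstream, and in this example creates a premature stop after only two amino acids.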
Triplet expansion is where a run of the same three-base repeat gets much longer than normal as a result of errors in DNA replication. If the repeat lies within the coding sequence, the protein gains a longer run of the same amino acid.
One example of this is Huntington's disease, where a CAG repeat in the huntingtin (HTT) gene expands to 36 or more repeats.
Triplet expansion diseases can get worse in subsequent generations, as the repeat tends to lengthen further each time it is passed on.
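Counting a triplet repeat is a simple pattern-matching task. The Python sketch below finds the longest uninterrupted run of CAG in a DNA string using the standard `re` module. The function names and the example sequences are hypothetical, and treating 36 or more repeats as expanded is an assumption made for illustration, not a diagnostic rule.

```python
import re


def longest_cag_run(dna: str) -> int:
    """Return the longest uninterrupted run of CAG, counted in repeats."""
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


def is_expanded(dna: str, threshold: int = 36) -> bool:
    """Flag a sequence whose longest CAG run reaches the threshold.

    The default threshold is an assumption made for this example.
    """
    return longest_cag_run(dna) >= threshold


if __name__ == "__main__":
    typical = "TT" + "CAG" * 20 + "CAACAG"  # hypothetical sequences
    expanded = "TT" + "CAG" * 45 + "CAACAG"
    print(longest_cag_run(typical), is_expanded(typical))    # 20 False
    print(longest_cag_run(expanded), is_expanded(expanded))  # 45 True
```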
Some diseases are not just caused by one type of mutation. Cystic Fibrosis, which is caused by mutations in the CFTR gene, has hundreds of different causative mutations, split into five distinct classes.
Diseases caused by simple mutations in single genes are prime candidates for repair by gene editing.
The genetic code, translation and making proteins
Translation of mRNA to proteins is performed in the cytoplasm by a complex cellular machine called the ribosome.
But how do letters of DNA and RNA make a protein? How does the cell know what to make? It's all in the genetic code.
The genetic code is based on triplets of letters, called codons. Each codon codes for a particular amino acid, the building blocks of proteins.
The standard genetic code specifies 20 amino acids (human cells also use a 21st, selenocysteine). Nine of these are ‘essential’: our bodies cannot synthesise them, so they must come from our diet.
The triplet AUG (methionine) is used as the start signal for protein production. A stop codon (e.g. UAG) signals to the ribosome to end production of the protein.
To make a protein, the ribosome attaches to the mRNA and reads along it until it gets to the start signal.
At this point, the ribosome then begins to attach molecules called tRNAs.
tRNAs consist of a special RNA molecule which is bound to a specific amino acid. The tip of the tRNA has a specific 'anti-codon' which binds only to the next correct codon in the messenger RNA.
If the code is correct, then the amino acid is joined on to the growing chain.
Eventually the ribosome will reach a stop signal, at which point the new protein is released.
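The steps above can be sketched in a few lines of Python. This is a toy model rather than a description of the real machinery: it scans for the first AUG, reads one codon at a time, and stops at the first stop codon. The mRNA sequence is hypothetical, the function names are made up for this example, and amino acids are shown as one-letter codes with `*` standing for stop.

```python
from itertools import product

# Standard genetic code, with codons ordered UUU, UUC, UUA, UUG, UCU, ...
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Return the tRNA anticodon that pairs with a codon, written 3'->5'."""
    return "".join(RNA_PAIRS[base] for base in codon)


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start signal, so no protein is made
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop signal: the finished protein is released
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = "GGCAUGGCUGAAGGUUGCUAAGG"  # hypothetical sequence
    print(anticodon("AUG"))  # UAC
    print(translate(mrna))   # MAEGC
```

The bases before the AUG and after the stop codon are present in the mRNA but never end up in the protein.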
A summary of transcription and translation