by Integrated Genomics™
To the untrained eye, microalgae are typical photosynthetic, unicellular organisms. Most species are green, red, or brown in color, grow at a moderate rate, and can be as small as one micron in diameter.
However, looks can be deceiving. Hidden inside a rather simplistic exterior lies the intracellular machinery capable of producing natural products, nutraceuticals and biofuels.
Approximately 30,000 species of microalgae have been identified to date, yet our understanding of algal genomics is limited. Although the Department of Energy’s Joint Genome Institute has over 60 algal genome projects currently underway, only six strains have been fully sequenced and annotated. Consequently, the scarcity of such information in the public domain has created a bottleneck that limits the extent to which researchers can examine the genes, proteins, and pathways that render microalgae so valuable for commercial applications.
One solution lies in developing a thorough understanding of the biological capabilities of these organisms through genomic DNA sequencing and bioinformatic analysis. Specifically, the value of raw DNA sequence data is almost entirely dependent upon the quality of comparative genomic tools used to decipher biological significance. Armed with the appropriate software, researchers can dissect a microorganism’s genetic composition and proceed with metabolic engineering to optimize growth conditions and more efficiently produce desirable products.
Algal DNA Sequencing
Deciphering genetic potential with next-generation sequencing methods is a key component of algal genomics. The primary differences among sequencing methods lie in their accuracy and in the number of nucleotides that can be read at one time. Short DNA sequence fragments, or ‘reads’, can be generated more quickly and cheaply than longer ones but tend to be less accurate.
There are three predominant next-generation DNA sequencing technologies in the current marketplace. In each case, the DNA to be sequenced is sheared into a large number of small fragments and the ends of each fragment are read. Life Technologies recently launched the 5500 and 5500xl SOLiD™ DNA sequencers, which read 75 base pairs at a time. In contrast, the 454 GS FLX System (Roche) can generate longer reads of up to 400 base pairs, with each run producing over one million reads. The third option is Illumina’s HiSeq™ 2000. Available since early 2010, the HiSeq 2000 yields reads between 35 and 100 base pairs long. Its throughput is about fivefold higher than that of competing systems, and it reads roughly 200 billion base pairs in one run.
Recent advances in DNA sequencing technology have driven down costs and increased efficiency to the point that some algal genomes of roughly 100 megabases can now be sequenced in a single day! Despite this progress, several challenges remain. The diversity of algal species and the ease with which samples can be obtained complicate the decision of exactly which strains to sequence. Algal genomes can also be large.
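To put the figures above in perspective, the short Python sketch below estimates average sequencing depth as total bases read divided by genome size. The function name is illustrative, and the calculation assumes an idealized run in which every base read is usable and evenly distributed, which real runs do not achieve.

```python
def mean_coverage(total_bases_read: float, genome_size: float) -> float:
    """Average number of times each genome position is read (idealized)."""
    return total_bases_read / genome_size


# Figures quoted in the article: about 200 billion bases per HiSeq 2000 run
# and an algal genome of roughly 100 megabases.
run_output = 200e9
genome_size = 100e6

print(mean_coverage(run_output, genome_size))  # 2000.0
```

Under these assumptions a single run would cover a 100-megabase genome about 2,000 times over, which suggests why several strains could share one run.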
When sequencing is completed, the next step is to assemble the reads into longer contiguous sequences, or contigs, and close the genome in a process analogous to solving a jigsaw puzzle. During genome assembly, puzzle pieces take the form of millions of DNA sequence fragments. The goal is to arrange the small pieces of DNA, end-to-end, in the correct order using software such as the publicly available tools from the AMOS (A Modular, Open-Source assembler) consortium for whole genome assembly. Similar assembly programs include SOAP (Short Oligonucleotide Analysis Package), developed by the Beijing Genomics Institute, as well as ARACHNE, developed at the Broad Institute.
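The jigsaw analogy can be illustrated with a toy greedy assembler written in Python. This is a minimal sketch for intuition only: it is not how AMOS, SOAP, or ARACHNE work internally, the function names are hypothetical, and it assumes error-free reads taken from a single strand.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences that share the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing left to join: the remaining contigs have gaps
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]
    return contigs


reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

When no remaining pair overlaps, the function returns several contigs rather than one, which mirrors an assembly that still has gaps to close.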
Despite the plethora of tools available for genome assembly, two problems remain. First, many genomes, including those of algal strains, contain short, identical regions of DNA. Such repeats can confuse assembly software, leading to mistakes or mis-assemblies. Second, sophisticated molecular biology techniques are required to sequence gaps because even the best assembly programs are unable to join every DNA fragment.
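One simple way to see why repeats are troublesome is to count the short subsequences (k-mers) that occur more than once in a sequence. The Python sketch below does this for a made-up 16-base sequence. The function name and the example sequence are illustrative assumptions, not data from a real algal genome.

```python
from collections import Counter


def repeated_kmers(sequence: str, k: int) -> dict[str, int]:
    """Return every k-mer that appears more than once, with its count."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


toy_genome = "ACGTTGCAACGTTGCC"  # invented sequence containing a repeat
print(repeated_kmers(toy_genome, 6))  # {'ACGTTG': 2, 'CGTTGC': 2}
```

A read that ends in one of these repeated k-mers overlaps equally well with the sequence following either copy, so an assembler has no way to pick the right neighbor from overlap alone.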
Genome Annotation and the ERGO Genome Analysis Suite
Once a genome of interest is assembled, it can be annotated in a two-step process. First, individual genes and the corresponding proteins they encode are identified. Computer algorithms calculate protein similarity scores between putative proteins and those of known function. Next, each newly identified protein is assigned a biological function and characterized further based on its role in one or more cellular pathways.
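As a rough illustration of a similarity score, the Python sketch below computes a global alignment score (Needleman-Wunsch) between a putative protein and two reference proteins, then picks the closest match. The scoring values, protein names, and sequences are all invented for the example. Production tools use substitution matrices and statistical significance measures, and nothing here represents the scoring used by ERGO.

```python
def alignment_score(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -2) -> int:
    """Global alignment score (Needleman-Wunsch) with a toy scoring scheme."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Hypothetical reference proteins of "known function" (invented sequences).
known = {
    "hypothetical_lipase": "MKTLLVAGS",
    "hypothetical_kinase": "MSDQEAKPT",
}
query = "MKTLIVAGS"  # putative protein from a newly assembled genome

best = max(known, key=lambda name: alignment_score(query, known[name]))
print(best, alignment_score(query, known[best]))  # hypothetical_lipase 7
```

The query differs from the first reference at a single position, so it scores 7 out of a possible 9 and would be a candidate for the same functional assignment.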
One of the commercial packages for genome analysis, the ERGO Genome Analysis Suite (Integrated Genomics, IL), adopts a curation-based approach to drive the annotation process. Packaged as a secure, web-based bioinformatics platform, ERGO™ integrates a curated and diverse array of biochemical and genomic data from the public domain and primary literature. A comprehensive and user-friendly resource of this nature is beneficial when elucidating gene function in the context of metabolic and non-metabolic pathways.
ERGO provides bioinformatics resources for both public and proprietary genomes. Open reading frames (ORFs) are identified by comparing DNA fragments against a non-redundant collection of over 8.1 million sequences from all domains of life. Over 2,000 annotated genomes currently reside in ERGO, and between 20 and 30 new genomes are added each month. Annotations are typically performed in an automated fashion, although customized manual annotation of a genome of interest is an alternative for scientists seeking increased accuracy.
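For readers unfamiliar with ORFs, the Python sketch below shows the most basic way to locate candidates: scan each of the three forward reading frames for a start codon (ATG) followed in-frame by a stop codon. This is a generic textbook approach and an assumption made for illustration, not a description of the database comparison ERGO performs. It ignores the reverse strand and introns, both of which matter in real algal genomes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for ATG-to-stop ORFs in forward frames.

    Positions are 0-based and `end` is exclusive, as in Python slicing.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, dna[start:pos + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGGCTTGGTAAGG"))  # [(2, 14, 'ATGGCTTGGTAA')]
```

Each candidate found this way would then be translated and compared against sequences of known function before being accepted as a gene.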
Publicly available tools can also assist with the annotation process. Examples include AAT and Manatee, although results generated from such programs may contain ambiguities. With the exception of Augustus, a tool for gene prediction, there are few genome analysis programs tailored specifically to algae. More general resources pertaining to algae are described below.
Once a genome has been annotated, additional information can be obtained by harnessing the potential of comparative tools such as ERGO. Further investigation may focus on sequence alignments, the identification of single nucleotide polymorphisms (SNPs), or patterns of gene expression. For researchers seeking a more comprehensive genomic overview, Integrated Genomics provides metabolic reconstruction reports that describe all pathways and subsystems contained within a particular algal strain.
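As an example of one such follow-up analysis, the Python sketch below reports SNPs between two sequences that have already been aligned to the same length. The function name and the sequences are invented for illustration. Real SNP calling works from many overlapping reads and weighs base quality, which this sketch does not attempt.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (1-based position, reference base, sample base) for each SNP.

    Both inputs must already be aligned; '-' marks a gap and is skipped,
    since an insertion or deletion is not a single nucleotide polymorphism.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s and "-" not in (r, s)
    ]


reference = "ATGGCTTGG"  # invented reference strain fragment
sample = "ATGACTTGC"     # invented fragment from a second strain
print(find_snps(reference, sample))  # [(4, 'G', 'A'), (9, 'G', 'C')]
```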
Applications of Algal Genomics
Algae can manufacture various byproducts in ways that are more affordable, sustainable, and environmentally friendly compared to traditional methods. By augmenting existing research with the wealth of pathway data derived from genome analysis, scientists can expand the algae industry in several directions.
The ability to use algae as biomass to drive the production of biofuels is perhaps the area of algal research that has garnered the greatest amount of interest. As a renewable resource with remarkable photosynthetic efficiency, algae are the ideal vehicle in which to produce bioethanol or biodiesel.
Optimizing algal production strains is challenging. Although lipids are one of the key building blocks of biofuels, increasing lipid biosynthesis may decrease growth. However, tools such as ERGO™ provide access to annotated pathways that can help guide metabolic engineering strategies. Lipid profiles and growth characteristics can be compared across several algal genomes to identify a candidate that is likely to improve biofuel yield.
Carotenoids and omega-3 polyunsaturated fatty acids such as docosahexaenoic acid (DHA) can be produced using algae. Benefits of algae include the ease of cultivation and the naturally high levels of compounds such as β-carotene and astaxanthin in certain strains.
Opportunities to leverage genome analysis occur when researchers wish to correlate SNPs or patterns of gene expression with increased product yields. In addition, manufacturers can save time and money by assessing modifications to biosynthetic pathways in silico instead of at the lab bench.
Besides being a source of biofuels and nutraceuticals, microalgae called phytoplankton serve as a food source for fish and bivalve mollusks such as clams and oysters. Although algae tend to be less expensive than other feedstocks, there are certain drawbacks that a genomics-based approach might help overcome.
Algae contain certain toxins that negatively impact the flavor of shellfish. If pathways responsible for producing such contaminants can be identified, more seafood could be sold as opposed to discarded. Similarly, the rapid growth rate of certain algae is normally an advantage except in the case of algal blooms. Excess algae that are not consumed threaten to destroy ecosystems. A closer look at certain algal genomes might suggest growth pathways or individual genes that trigger such extensive proliferation.
For more information or to learn more about the ERGO™ platform, please contact Integrated Genomics™, an IG Assets, Inc. company, at (312) 491–0846.