chemistry: Augmenting the alphabet

Taken from Nature Science Update


The French writer Georges Perec once wrote an entire novel without using the letter 'e'. Communicating with a restricted alphabet must be a frustrating business. Yet the genetic messages of the cell get by with just four characters, encoded in the four varieties of small molecule, called 'bases', that constitute DNA.

Now a team of US chemists has come up with a way to broaden the genetic alphabet, creating the potential to write molecular messages in a language quite alien to normal cells. This work is reported in the Journal of the American Chemical Society [1].

The four DNA bases are adenine, thymine, cytosine and guanine, denoted A, T, C and G. These recur in specific sequences along the backbone of DNA's coiled double helix, making up a code that tells the cell how to put together the protein molecules that it needs in order to function.

Protein molecules have complex structures built up from 20 components (amino acids); but these can be translated from the four-character genetic code because the DNA bases are read in groups of three, giving 4 x 4 x 4 = 64 possible triplets, more than enough to specify all 20.
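The arithmetic behind that figure can be checked with a short sketch. The Python below is purely illustrative and not taken from the research described here; the names BASES and triplets are made up for the example. It simply enumerates every ordered group of three letters drawn from the four-letter alphabet.

```python
from itertools import product

# The four natural DNA bases named in the article.
BASES = "ATCG"

def triplets(alphabet: str = BASES) -> list[str]:
    """Return every ordered three-letter group drawn from the alphabet.

    Letters may repeat within a group, so an alphabet of n letters
    gives n * n * n groups.
    """
    return ["".join(group) for group in product(alphabet, repeat=3)]

codons = triplets()
print(len(codons))       # number of three-letter groups from four bases
print(len(codons) >= 20) # enough distinct groups to name 20 components?
```

Because letters may repeat within a group, the count grows as the cube of the alphabet size, which is why even a small addition to the alphabet would greatly enlarge the vocabulary of possible three-letter 'words'.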

Nevertheless, researchers have many reasons to be interested in creating a new genetic code with different characters. For example, DNA with non-natural bases might be more resistant to chemical degradation than natural DNA. Or non-natural 'genes' inserted into the genome might influence the way that natural genes are translated into proteins.

Conventional bases combine in pairs that zip together the twin strands of the double helix. Their stickiness comes from a kind of interaction called a hydrogen bond, and is highly selective: A sticks only to T, and C to G. Previously, researchers have tried to introduce new hydrogen-bonding bases into DNA, but have been hampered by the bases' tendency to stick together less discriminately.
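The selectivity of natural pairing (A only with T, C only with G) can be written as a simple lookup table. The sketch below is an illustration only: the names PAIRS and partner_strand are invented for the example, and it ignores the fact that the two strands of a real double helix run in opposite directions, matching bases position by position instead.

```python
# Natural pairing rules as stated in the article: A with T, C with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def partner_strand(strand: str) -> str:
    """Return the base that pairs with each position of the given strand.

    Simplification: strand direction is ignored; bases are matched
    position by position. Unknown letters raise KeyError.
    """
    return "".join(PAIRS[base] for base in strand)

def is_matched(base_a: str, base_b: str) -> bool:
    """True only when the two bases form one of the selective pairs."""
    return PAIRS.get(base_a) == base_b

print(partner_strand("ATCG"))
print(is_matched("A", "T"), is_matched("A", "G"))
```

In these terms, a new base that pairs 'less discriminately' would be one for which no such one-to-one table holds, because the same base would stick to several different partners.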

Floyd Romesberg and colleagues at the Scripps Research Institute in La Jolla, California, have taken a different approach. They have incorporated into DNA a base pair that unites instead via 'hydrophobic' interactions. Unlike conventional DNA bases, the new bases are relatively insoluble in water, and so they tend to attract one another and clump together, rather as grease globules coalesce in water. This has the advantage that the artificial bases have virtually no inclination to pair up with the natural, hydrogen-bonding bases.

Romesberg and colleagues have shown previously that one particular pair of synthetic bases, denoted '7AI' and 'ICS', pair up efficiently when attached to the backbone of the double helix. But to truly function as DNA bases, such artificial constructs must be able to interact with the protein enzymes that process DNA in the cell. Most importantly, they must be accepted as respectable components of DNA by the enzyme that puts the double helix together, called 'DNA polymerase'. If the enzyme decides that the new bases are acceptable, the mutated version of DNA with a broader character set could be constructed and copied at will.

Both 7AI and ICS are tolerated as building blocks by a particular kind of bacterial DNA polymerase. But they are far from ideal. Each has a strong propensity to pair up with another copy of itself, forming 7AI:7AI and ICS:ICS unions as well as the desired 7AI:ICS pair. Romesberg and colleagues have now found a way to avoid these 'mismatches' by using modified versions of the two artificial bases.

They synthesized a whole range of variants and trawled through them to find versions that would pair up selectively. The search produced a non-natural pair that DNA polymerase would incorporate into a growing DNA double helix efficiently and selectively. This makes the new pairing, denoted PP:MICS, the first addition to the genetic code that allows new genetic 'words' to be written without too many mistakes.

1. Wu, Y. et al. Efforts towards expansion of the genetic alphabet: optimization of interbase hydrophobic interactions. Journal of the American Chemical Society 122, 7621-7632 (2000).