Article Published in the Author Account of

Benjamin Yang

Executive Summary of a Book — The Genome War: How Craig Venter Tried to Capture the Code of Life and Save the World

In May of 1998, Craig Venter unveiled an ambitious and audacious plan to complete the sequencing of the human genome sometime in 2001, four years ahead of the anticipated release of the complete human genome sequence being assembled by the Human Genome Project (HGP). The plan was an absolute bombshell for the publicly funded HGP, an international consortium of high-profile scientists formed to undertake this mammoth project. Halfway into its fifteen-year timeline, the consortium was beginning to fall behind schedule, having sequenced only 3% of the genome. Venter’s revelation that he was forming a biotechnology company that would sequence the human genome and commercially harness the wealth of information it contained by selling access to a gigantic database shocked the academic researchers of the HGP, and brought about a major revision to their plans for delivering a complete sequence map of the human genome by 2005. A little over two years later, two enterprises that had been aggressively chasing the same goal, powerful scientific forces that had many times clashed head-on with one another, would together announce a staggering scientific achievement: the completion of the human genome project, the decoding of the human book of life, years ahead of even the most optimistic schedules.

At the time of Venter’s announcement regarding the birth of Celera Genomics, the HGP had been in existence for some eight years, the past four of which had been under the direction of Francis Collins, Director of the National Human Genome Research Institute (NHGRI) since 1994. The consortium consisted of numerous National Institutes of Health (NIH)-funded academic centers within the U.S. with a reputation for high-quality gene sequencing, together with the U.K.’s Sanger Centre, a brand new, state-of-the-art facility funded by the Wellcome Trust, which was to complete one sixth of the sequencing effort. The plan of attack was straightforward enough. The groundwork for an all-out sequencing effort was to be laid by careful “low resolution” mapping of the genome. Once these efforts had been completed, it was hoped that the technology to sequence vast quantities of genetic material would have evolved sufficiently, and the genetic maps would provide a context within which countless thousands of pieces of sequence data could be interpreted and strung together into ever larger strings of contiguous genetic information. Furthermore, in an effort to break such a monumental task down into more manageable pieces, sequencing would be divided out among different participating centers in the same manner by which nature had already divided up the human genome, i.e., chromosome by chromosome. Scientists within the HGP consortium were united by two tenets: the desire to put accuracy above all else, and a commitment to make sequence data immediately and freely available via the GenBank database.

If accuracy above all else was the credo of the HGP consortium, Celera’s was “sooner rather than later.” Venter’s plans were diametrically opposed to those of the HGP, and relied heavily on the success of as yet unproven technological advances. Rather than dividing the sequencing of the genome into chromosome-sized pieces that would be sequenced by different centers essentially working in parallel, Venter’s plan called for a single, factory-sized sequencing facility, which, thanks to recent advances in automated sequencing technology, would spew forth sequence data from a massive array of sequencers virtually 24 hours a day, 7 days a week. As soon as the raw sequence data had been generated by these banks of sequencers, the Celera plan called for a computer mainframe, enormous in both size and complexity, running specially designed, but as yet undeveloped, software algorithms that would search for overlapping regions within individual sequences, and assemble them into continuously elongating stretches of contiguous sequence. The technology requirements of this approach were staggering, but by far the most controversial aspect of the Celera plan was the basic biological methodology underlying the generation of sequence data. Venter’s plan was to use the whole genome shotgun method of sequencing, a technique in which the entire genome is deliberately fragmented into small pieces of defined size, before each of those fragments is individually sequenced. Venter and Nobel Laureate Hamilton Smith would rely on the shotgun approach to produce the perfectly sized fragments of DNA that would feed Celera’s army of automated sequencers. As Smith and Mark Adams refined their techniques, they would supply Celera’s sequencers with progressively larger DNA fragments.
Inspired by Venter’s success using the whole genome shotgun methodology to sequence bacterial genomes, two scientists from outside of the HGP, Jim Weber and Gene Myers, had formulated a plan to adapt this methodology to the vastly more complex problem of sequencing the human genome.
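The overlap-and-merge idea behind shotgun assembly can be illustrated with a toy sketch. The Python below is not Celera’s software, whose algorithms are not described here; it is a minimal greedy merger that assumes error-free fragments read from a single strand and a genome with no repeated sequence. The function names, the `min_overlap` parameter, and the example fragments are all hypothetical.

```python
def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when the longest match is shorter than `min_overlap`.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_overlap=3):
    """Repeatedly merge the pair of fragments sharing the longest overlap.

    Stops when no remaining pair overlaps by at least `min_overlap` bases
    and returns whatever contiguous stretches (contigs) are left.
    """
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs
        # Join the two fragments, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Hypothetical fragments cut from the made-up sequence ATGGCGTACGTTAGC.
    reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
    print(greedy_assemble(reads, min_overlap=3))
```

With these three toy fragments, the two five-base overlaps are merged in turn and a single contig remains. A real genome is far less forgiving: repeated sequence and read errors make a purely greedy merge unreliable, which is part of why the method was so controversial at human scale.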

In February of 1996, Myers presented the details of this strategy at a meeting of the sequencing consortium in Bermuda. He met with nothing but resistance to his thinking, and the whole genome shotgun assembly approach was dismissed outright. Having days earlier decided upon what became known as the Bermuda Accord, i.e., an undertaking to make any sequence of 2,000 base pairs or more publicly available within 24 hours of its generation, the program leaders of the HGP stiffened their resolve to insist upon only the highest quality sequence data. This “Bermuda standard” would accept an error rate of no more than 0.01%, and together with the Bermuda Accord, committed the HGP to a course that would prove very difficult to follow, especially in the face of the sequencing juggernaut envisioned by Craig Venter two and a half years later.

Destined, at least in many people’s view, to become a threat to its very existence, Craig Venter was for some time a major contributor to the Human Genome Project. In 1989, as the Human Genome Project was conceived under the leadership of James Watson, Craig Venter was working at the National Institute of Neurological Disorders and Stroke. Two years earlier, while focusing upon the analysis of brain proteins, Venter had purchased two first generation automated sequencers manufactured by Applied Biosystems Inc. (ABI), and in 1989, showed the fruits of his labor to James Watson. Realizing that Venter had succeeded where others had failed, by harvesting useful information from the newly developed and temperamental automated sequencers, Watson backed Venter’s request for major funding to continue and expand his automated sequencing efforts. However, Venter’s aggressive approach did not fit with the “map first, sequence later” approach decided upon by the HGP, and his requests for grant funding were rejected. Shortly thereafter, Venter decided to focus upon finding new genes not by complete sequencing, but by the isolation and sequencing of short cDNA fragments, later known as expressed sequence tags (ESTs), just big enough to indicate the presence and unique nature of the whole genes from which they were derived. Venter saw ESTs as a way to inventory the human genome much more rapidly and cheaply than by complete sequence analysis. By the summer of 1990, Venter’s group had amassed a large number of ESTs, and Venter pushed his ideas hard with Watson. But Watson was unimpressed, and trouble lay ahead. By spring of 1991, Venter’s efforts had caught the attention of Reid Adler, a lawyer and Head of NIH’s Technology Transfer Office. 
Adler explained to Venter that, under the Bayh-Dole Act, which obliged researchers funded by federal tax dollars to secure intellectual property rights on any commercially valuable discoveries before publication, the NIH must make some attempt to obtain patent protection for any and all novel gene sequences that had been discovered by the EST method. Hesitant at first, Venter grew to accept the idea, and advised the Human Genome Project of his plans. Watson hated the very notion, and is quoted as having described the idea of patenting gene sequences as “sheer lunacy.” Within months, Venter left the NIH, and by April 1992, embittered by the wrangling over patent issues within the NIH, Watson too had resigned.

The HGP’s lack of interest in Venter’s efforts to catalogue the human genome by way of ESTs was in no way reflected by the business community, and it wasn’t long before Venter received an offer he couldn’t refuse from Wallace Steinberg, the head of a New Jersey-based venture capital investment group called HealthCare Investment Corporation. The Institute for Genomic Research (TIGR) was founded shortly thereafter. Craig Venter would receive $70 million over a period of seven years with which to run a non-profit research institute, in return for which a separate profit-making entity would receive proprietary commercial rights for any marketable discoveries made by Venter’s enterprise. It appeared to be the best of both worlds, free from both academic politics and the need to make a profit, and Venter brought many of his NIH staff into the new TIGR facilities, including Hamilton Smith, computational biologist Granger Sutton, and molecular biologist Mark Adams. Meanwhile, Steinberg brought in William Haseltine to direct the “for profit” arm of the arrangement, christened Human Genome Sciences, Inc. (HGS). Venter’s group would search for and sequence novel ESTs using large numbers of newly acquired ABI sequencers, and scientists at HGS would have up to six months from the discovery of novel sequences to determine whether they might have commercial value. After that, TIGR could publish the data provided that HGS retained the first option of commercial rights on anything useful spawned by the sequence data. Furthermore, on top of the six-month lag between discovery and publication, HGS also had the option of requesting an additional twelve-month hold on publication of any sequence that was deemed worthy of closer scrutiny.

Despite TIGR’s scientific success with the EST discovery method, trouble was not far around the corner. Merck offered to fund a rival EST discovery project at Washington University, ironically one of the HGP participants, and Incyte Pharmaceuticals had been founded to further exploit the power of ESTs in finding novel genes. Driven by Haseltine’s grandiose ambitions, HGS countered by signing an exclusive deal with SmithKline Beecham, which netted them $85 million up front, in return for access to TIGR’s repository of genes, together with another $40 million for access to genes not yet discovered. By that time, Venter’s agenda of discovery and publication was clashing head-on with that of HGS, which was invoking the option of delaying publication of any sequence with so much as a hint of medical value, and relations between Venter and Haseltine foundered.

As the book’s author Shreeve points out, Craig Venter did not himself conceive the idea of a privately funded company sequencing the entire human genome. Rather, it was an idea that came into being as the technology with which to accomplish an undertaking of that scale matured, and as the company behind that technology made the decision to do more than just sell sequencing instruments. In 1995, two years after the Perkin Elmer Corporation (PE) acquired Applied Biosystems, Inc. (ABI), Tony White was appointed CEO of PE. By that time, ABI was already supplying the vast majority of the world’s automated DNA sequencing instruments. White divided PE into two separate corporate entities, one devoted to life sciences, which included the company’s PCR product lines and ABI, the other to analytical instruments and an assortment of other products. White saw PE’s future rooted firmly in the life sciences, feeding off the rapidly growing biotechnology industry. He also realized that PE might just as easily end up being no more than a supplier of the tools required by companies interested in mining wealth out of the genome, tools that included a new multi-capillary automated sequencing instrument being developed by a team of ABI researchers. New genes would inevitably lead to a better understanding of disease processes and to new “rational” targets for therapeutic intervention. A comprehensive catalogue of the human genome would be of incalculable value to the pharmaceutical industry, and White wanted PE to move up the “scientific food chain,” and into the business of genetic information. Why not use their own new multi-capillary automated sequencers to sequence the human genome in house, and sell the information gleaned from it to the pharmaceutical industry, to academic researchers, and to anyone else who might be interested?

Celera Genomics had become a physical reality by August of 1998, having leased a large building in Rockville, MD, now undergoing a major renovation. Two of its four floors had been completely gutted to accommodate the sequencing instruments on one, and the computer system on another. By mid-September, Celera had fifty employees on the staff, and twice that number being recruited, and the building renovations were well in hand. But it would be early January before the first Prism 3700 sequencers would be delivered, and months still before the massive computer system that would make sense of the sequence data would come on line. In the meantime, the HGP made its own “bombshell” announcement. Confident that they could finish sequencing the human genome in 2003, they planned to have a draft covering 90% of its genetic code completed sometime in 2000, essentially the same time by which Venter promised the completion of Celera’s sequencing efforts. With $202 million available to the US based HGP participants over the next two years, supplemented by an additional $85 million from the Department of Energy, and a staggering $77 million available to the U.K.’s Sanger Centre for the coming year alone, Collins and his colleagues were poised to put together automated sequencing factories of their own. Furthermore, the HGP announced that it now planned to make use of the ABI Prism 3700 sequencer, the very same instrument that had spawned Celera Genomics in the first place. The battle lines had been drawn, and war between the privately funded and publicly funded programs seemed inevitable.

The next six months saw Celera’s fortunes ebb and flow dramatically. By spring of 1999, about the same time that the Human Genome Project was to receive its new wave of funding and embark upon a purchasing orgy of some two hundred Prism 3700s, Celera had taken delivery of some 90 sequencing machines of its own. However, its technicians were experiencing problems with a significant proportion of them. In addition, the total number of Prism 3700 sequencers delivered thus far was woefully short of the 230 originally planned. The inevitable finger pointing began, and some of the key relationships between Celera and PE staff were becoming increasingly strained. Venter’s relationship with his superiors at PE suffered in particular, but just as things appeared to be unraveling for Venter himself, Celera’s fortunes were taking a turn for the better. The collaboration between Gerry Rubin, head of the Drosophila Genome Project based at the University of California Berkeley, and Celera, formulated to sequence the entire genome of the fruit fly, was steadily getting back on track after falling behind schedule thanks to a series of setbacks with the sequencers themselves. Additional 3700s sanctioned by PE were arriving almost daily, the number of installation technicians had tripled, and the Drosophila sequence was being accumulated rapidly; by late July, it appeared that it would be possible to run an assembly of the sequence data just in time for the Genome Sequencing and Analysis Conference (GSAC) in early fall. By the latter half of August, the number of ABI 3700 sequencers online had risen to 180, and a test run of the sequence assembly software developed by Myers and Sutton successfully reassembled the complete Haemophilus influenzae genome from a deliberately shredded copy. Furthermore, a reassembly of the 20% of Drosophila’s sequence using genes already put together by Gerry Rubin proved successful.
Realizing that Celera would in fact need less overlap in sequence data to reassemble the whole genome than originally thought, Myers was even more optimistic about the prospects of presenting the fruit fly data at the GSAC meeting, thereby proving the validity of the ever controversial whole genome shotgun method. During the final week of August 1999, sufficient data had been accumulated, and a full scale assembly run was made. Disaster struck once more. Rather than a complete genome with a few thousand gaps as anticipated, the assembly program had instead produced over 800,000 short sequences, and it was well over a week before the problem was traced to a single line of faulty code, which had led to the rejection of valid sequence matches and overlaps. With Celera having submitted a last minute provisional patent covering the assembly of the fruit fly genome the very day that the GSAC conference began, Gene Myers and colleagues presented the results of the sequencing of the fruit fly genome to a stunned audience at the meeting. They had proven that the whole genome shotgun method could, and in fact did, work.

As soon as the sequencing of the fruit fly genome had been completed, human DNA began surging through the 300 sequencers now installed at Celera. But the four-month delay in completing the Drosophila sequencing had cost Celera its lead over the Human Genome Project, which had already sequenced 739 million base pairs of the human genome. Furthermore, Venter had become convinced that the winner of the race to sequence the human genome would be perceived to be whoever had sequenced the most of it by the time the HGP consortium delivered its working draft the following spring. Now, thanks to an ironic twist of events, the seeds of collaboration between the two rivals were sown by Eric Lander, director of the Whitehead Institute’s Genome Center. One of the five main centers of the HGP consortium, the Whitehead Institute, now completely re-equipped with ABI 3700 sequencers, was sequencing at a rate far in excess of the rate at which it was receiving pre-mapped clones that were being distributed by Washington University.

Lander wondered whether some as yet undefined co-operative effort between Celera and the Human Genome Project might combine the strengths of the two programs, and ultimately deliver a more complete sequence than either was likely to deliver in a “winner takes all” race. In early October 1999, Lander met with key scientists and board members from Celera, but although both sides were optimistic about reaching some sort of accord, the two parties were far from any agreement. At the very heart of the debate, not surprisingly, was the issue of access to the human genome sequence. Celera was a private enterprise formed to commercially exploit the information contained within the human genome, and develop patentable intellectual property from it, whereas the HGP was a taxpayer-funded effort that insisted upon immediate and unrestricted access to the genome data for anyone who might have an interest in it, including other private enterprises that might themselves commercially harness the data. In addition to the concerns about data access, there was also the question of authorship and publication. Despite considerable suspicion on both sides, talks between Lander and Celera continued, and at the very end of December 1999, Francis Collins and several other representatives of the HGP consortium met directly with Celera. More entrenched in their respective positions than ever, the two parties left the meeting further away from any collaborative agreement than they had been before it.

Despite repeated attempts to reach a collaborative agreement over the next few months, including one last-ditch and highly original plan brokered by Norton Zinder, a key figure in the establishment of the human genome project and a member of Celera’s board, no such agreement was forthcoming. Convinced that the Human Genome Project was planning to make a preemptive, and in their eyes premature, release of a working draft at the annual Cold Spring Harbor Symposium in May, Celera began to sequence chunks of DNA two and a half times the size of the chunks they had been using, and their sequencing effort ran at full throttle. Contingent upon the availability of sufficient data in time, they even toyed with the idea of using an invitation to present their assembly of the fruit fly genome at that same meeting to make a surprise unveiling of a complete human genome two days before they suspected Collins might do so. Even so, they also made a decision to augment their own raw sequence data with that already deposited into GenBank, significantly reducing the amount of sequencing that would be required to assemble a complete linear sequence. This would not only save Celera an enormous amount of money, but perhaps more importantly, would free the sequencing pipeline for an earlier than scheduled launch of the mouse genome sequence analysis, in many respects more useful in terms of drug development than the human genome, and therefore a commodity of enormous value to the pharmaceutical industry, Celera’s primary clientele. In the meantime, the HGP consortium had also changed tack. Whereas Celera was trying to assemble the entire human genome en masse, as one would assemble an enormous jigsaw puzzle, the HGP scientists had divided the genome into thousands of smaller sub-puzzles.
However, given the low resolution coverage planned for their working draft, these puzzles would not be fully resolved, containing stretches of DNA sequence out of order, and perhaps in incorrect orientations. Unlike the HGP investigators, Celera had been sequencing from both ends of their shotgun DNA fragments, and Collins saw a similar approach as a way of “signposting” sequences within the smaller sequence puzzles that the HGP was trying to solve.

Collins planned to implement this additional sequencing effort via collaboration with an alliance of eleven drug companies, together with the Wellcome Trust, searching for single nucleotide polymorphisms (i.e., tiny sequence variations or SNPs) within the human genome. Within only weeks of the complete breakdown in talks between Celera and the HGP, Collins was broaching the subject of the new sequencing initiative with none other than Celera’s primary commercial rival - Incyte Pharmaceuticals. News of the new paired-end sequencing initiative reached Celera at a time when relations between Celera and HGP had become so strained and so publicly venomous, that it was attracting the attention of governments on both sides of the Atlantic. Early in March of 2000, Bill Clinton, and his UK counterpart Tony Blair, issued a joint statement that they would lead an initiative to ensure that the fundamental data contained within the human genome remain within the public domain. Somehow misconstrued as a plan to prohibit intellectual property protection for any gene based discovery, the announcement sent biotechnology stocks into free fall. By day’s end, $40 billion had been pulled out of the biotechnology sector, Celera alone losing $2 billion. Things got worse. Two weeks later, just one week before a congressional hearing set to review the genome controversy in early April, Venter and Collins appeared side by side at an NIH conference. During the now customary heated exchanges, Collins revealed the HGP consortium’s plan to begin paired-end sequencing to a shocked Venter.

In the end, no collaboration between HGP and Celera was ever to become a reality. Instead, the two sides effectively declared a truce. Each agreed not to make a pre-emptive release of an incomplete sequence designed to upstage the other, as both parties expected one another to do, and agreed to the concept of coordinated announcements regarding the completion and release of their respective genome sequences. Furthermore, although merging of the two data sets was simply not an option for the very same reasons as before, simultaneous publication of the data within a single issue of the journal Science seemed to be an option, provided the issue of public access to the data could be resolved; Celera would under no circumstances deposit its sequence data into the GenBank database, the fully accessible database into which published sequence data would ordinarily be deposited. At last, it seemed that the sequence of the human genome would in the end, become the focus of attention, upstaging the increasingly vitriolic public wrangling between the two scientific rivals.

It was only a great deal of quiet, highly effective diplomacy brokered by Ari Patrinos, head of the Genome Project at the Department of Energy, and long time friend, colleague and confidant of both Craig Venter and Francis Collins, that would define the terms of the truce. After months of mutual suspicion concerning a preemptive release of the genome, Venter and Collins agreed, in effect, to declare the result of the race a tie, i.e., to make a joint statement concerning the simultaneous completion of the human genome by both parties. Set for Monday June 26, 2000, the announcement was to be made at a White House Press Briefing presided over by Bill Clinton, flanked on one side by Craig Venter, on the other by Francis Collins, with supporting members of the cast from both sides present in some abundance, and British Prime Minister Tony Blair present thanks to a giant video screen. However, even as the final arrangements for the press briefing that would announce completion of the human genome sequencing effort were being made, there remained one last stumbling block for the scientists of both the Human Genome Project and Celera; neither had successfully assembled anything resembling a complete sequence.

At Celera, the problem centered around inclusion of publicly available sequence data into the pool of information from which a program, specifically designed to assemble the genome from data produced only via whole genome shotgun sequencing, attempted to generate a sequence. Efforts to assemble the genome within the HGP consortium were also foundering. The number of base pairs of raw sequence data available for analysis was 100 million base pairs short of the 90% coverage required even for the working draft, and on the whole, the data lacked any overriding order. Both programs needed nothing short of a miracle in order to deliver a sequence that would come close to living up to its impending billing as “complete”.

Celera’s miracle took the form of “a sedative” for the more complex of the two computer programs being used to assemble the genome from thousands of sequence fragments. Nicknamed “The Grande”, based upon the amount of coffee Gene Myers had consumed while developing the software, the program had twice choked on the raw sequence data, and during its most recent run, had assembled only slightly more than half of it into a badly truncated genome. Having been up throughout the night nursing the program through that ill-fated assembly run, Myers had spent several more hours searching for bugs in the program that might have caused the problem, before being struck by a simple solution. Thinking that he had set the stringency of the assembly too high, Myers adjusted the program to relax a little, reducing the amount of sequence overlap required between fragments for them to be called truly overlapping. Myers’s sedation of his program worked, although full assembly would not be complete until only a day before the White House press briefing.
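The effect of assembly stringency can be shown with a small, hedged illustration. The Python below is not the program Myers wrote; it is a toy that simply lists which fragment pairs would be “called” as overlapping at a given minimum overlap. The function names, the `min_overlap` parameter, and the fragments are hypothetical, and the fragments are assumed to be error-free.

```python
def suffix_prefix_overlap(left, right):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def accepted_overlaps(fragments, min_overlap):
    """Ordered index pairs (i, j) whose overlap meets the stringency threshold."""
    return [
        (i, j)
        for i, left in enumerate(fragments)
        for j, right in enumerate(fragments)
        if i != j and suffix_prefix_overlap(left, right) >= min_overlap
    ]


if __name__ == "__main__":
    # Hypothetical fragments cut from the made-up sequence ATGGCGTACGTTAGC.
    reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
    for threshold in (6, 5, 2):
        print(threshold, accepted_overlaps(reads, threshold))
```

In this toy case a threshold of 6 rejects every pair, including the two genuine five-base overlaps, so nothing can be joined. Relaxing it to 5 accepts exactly the genuine pairs, while dropping it to 2 also admits chance two-base matches. Setting the stringency is therefore a balance between discarding valid overlaps and accepting spurious ones.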

The Human Genome Project’s miracle materialized in the form of the Gigassembler program that had been developed by Jim Kent, a graduate student working within a group of HGP computational specialists based at the University of California at Santa Cruz. All efforts prior to his development of the Gigassembler program had met with no success. Unbelievable as it may seem, Kent ran his program for the first time on June 22, 2000, using the 80% or so of sequence available to him. Given the incomplete nature of the data being analyzed, the program naturally made plenty of mistakes, and overall, the finished product contained many large holes. However, when completion of the genome sequencing was announced only days later, the HGP could legitimately claim to have assembled a working draft of it.

On the day of the great announcement, June 26, 2000, the key players assembled at the White House. Hamilton Smith, developer of the techniques that had given Celera perfectly sized chunks of DNA to sequence, was there, along with Mark Adams and with Gene Myers and his close colleague Granger Sutton, co-developers of Celera’s assembly software. James Watson was there, as was Norton Zinder, together with Ari Patrinos, architect of the declaration of the tie that was about to be made. Also in attendance was Francis Collins, of course, together with Eric Lander from the Human Genome Project. Sadly, Michael Hunkapiller, who had driven ABI’s development of the Prism 3700 automated sequencers that generated so much of the sequence data, was unable to attend because of chicken pox. Clinton began his announcement by stating that “we are here to celebrate the completion of the first survey of the entire human genome.” After Clinton’s initial remarks, Tony Blair acknowledged the scope of the achievement via satellite, and then Francis Collins, followed by Craig Venter, mounted the podium and made their respective comments. It seemed, for a brief moment, that everyone involved in the sequencing of the human genome was happy.

Venter took on a giant rival and earned his place in the history of science, medicine, and mankind. He challenged his competitor from far behind and energized both himself and his competitor in a breathtaking race against each other and against time. He must have had a good time.

[Discovery Medicine, 4(21):84-89, 2004]
