Careers In Chemical Information: It's Not Just Literature Searching Anymore
Lisa M. Balbes
Everyone's heard that this is the Information Age, and chemistry has not been left out. In fact, a whole new set of careers has evolved around chemical information. Chemical information refers to any type of data related to chemistry, including:
- Spectra and spectral data
- Crystallographic data
- Chemical structure databases, both 2D and 3D
- Chemical compound and process patents
- Chemical nomenclature
- Chemical and physical properties (including activity) of compounds
- Chemical syntheses or reactions
- Chemical safety or toxicology information
- Commercially available or proprietary corporate compounds
- Published scientific articles, and abstracts thereof
... and much more.
Chemical information is the key to keeping track of the chemistry done in any organization, and all scientists need access to the information. Chemical information careers can involve acquiring, managing, or using information. In each area, people are needed to develop systems, as well as use them. In large or highly specialized companies, any one type of task may be a full-time job. In smaller companies, a scientist is likely to do more than one type of task routinely.
Information can be generated and collected by traditional analysis or experimentation, or by computer simulation (an entire field in itself called computational chemistry, see below). Effective use of information can be accomplished through visualization techniques, or by the development of sophisticated decision support systems.
Information management includes creating, searching and maintaining databases of all kinds: databases of 2D or 3D chemical structures, primary journal literature databases, and abstract databases. The rise of new techniques such as combinatorial synthesis and high throughput screening allows a single company to synthesize and analyze hundreds of thousands of molecules in a single year. While originally developed by the pharmaceutical industry, these techniques are now being adopted by the chemical industry in general.1 The vast amounts of information generated create demand for libraries of chemicals and raw materials, and also for systems to manage the generated data. These problems with storing and retrieving data offer exciting new opportunities for chemical informatics professionals.
The closely related field of bioinformatics (the creation, management or use of biological information) has also exploded recently, fueled by the massive amounts of information generated by the Human Genome Project. While more widely known, bioinformatics techniques are probably not as highly developed as cheminformatics techniques.
From Library Science to Computational Chemistry
Chemical information and computational chemistry are two fields that have grown up separately but have begun to overlap, covering the range from library science to computer science, with chemistry firmly in the middle. Opportunities in chemical information range from scientific librarian, to technical information specialist, to technical publisher, to software developer, to software architect.
Chemical information is about raw data, for example, finding answers to specific questions, searching primary literature, registering compounds in a corporate database, keeping track of computed and/or measured data associated with compounds and so on. Chemical information specialists communicate with coworkers, formulate queries, perform searches, analyze results, and deliver written and oral reports. They have to know the right sources to search (in-house or commercial databases, reference books, Internet, literature, etc.). They may search for physical properties of a particular compound, synthesis of it or similar compounds, biological activity in a variety of systems, patent restrictions on synthetic methods or biological uses, or use the chemical entity as an index to the literature. The skills required for these activities are a degree in chemistry and a Master of Library Science (MLS), knowledge of information sources, some computer skills, and foreign language skills, if possible.
The original objective of chemical librarians was to perform complex and expensive literature searching work for lab scientists and management. With the proliferation of personal computers, it became possible for individuals to search databases themselves, and the librarians became intermediaries. Now, the tremendous amount of information being generated again requires information experts who know which databases are likely to have the desired answers, and who can coordinate searching of multiple resources.
Chemistry librarians are responsible for providing general and in-depth reference services for students, faculty, researchers and other users, using reference tools for chemistry and related areas. They may also train others on how to use paper and electronic reference resources. An MLS is often required, along with experience with chemical information systems and chemical database systems, an understanding of how chemical research (and learning, if in an academic environment) is done, the ability to handle bibliographic citations in a wide variety of languages, a working knowledge of the technical underpinnings of the digital library environments of the future, and familiarity with multimedia resources and with microcomputer software applications.
Andrew Berks, Senior Information Scientist at Merck & Company, says, "There are more jobs than people in chemical information right now. This field grew out of librarianship, but in the online era, it has taken on a new life and requires people who can use computers and databases, have analytical skills, and can write. The skill level needed has become high, and unfortunately salaries have not kept pace. However, with the increasing complexity of modern science, salaries and status will catch up, to the point that more people will be drawn in to the field." He describes chemical information as the "needle in the haystack problem."
Chemical information management also includes registering compounds in a corporate database, and keeping track of computed and/or measured data associated with each compound. It can require training in chemistry to work on the data entry end, or in computer science to work on the database development end.
Cheminformatics (replacing the older term "chemoinformatics") is the next step towards computer science, and is more about "processing, developing and making decisions based upon the information that has been derived from the raw data, usually through computer processing and analysis", says Dr. Peter Gund, Sr. Director of Cheminformatics Marketing at Accelrys Inc. Cheminformatics encompasses data mining and analysis in combinatorial chemistry and high throughput screening, descriptor-based model generation, and so on. Chemically intelligent OCR software that can extract the content of printed texts, including chemical structures and reactions, is also starting to appear. This type of work is extremely computer intensive, and generally requires a Ph.D. in chemistry, as well as postdoctoral work in this area.
Slightly further along the continuum is computational chemistry, which uses mathematical models to calculate molecular properties or to simulate molecular behavior. This includes molecular modeling, quantitative structure activity relationship (QSAR) predictions, protein folding and 3D structure predictions, and so on. The chemical information in this case is actually computed data.
Many methods have been devised to measure the similarity, or dissimilarity, of chemical compounds, usually to search databases for other compounds that have similar properties. The basis of any structure similarity method is the descriptors used to characterize the molecules. These can include various types of molecular fingerprints, 3D pharmacophores, physico-chemical properties, electrotopological states or connectivity indices. Some represent 2D molecules; some handle 3D ones. Some are "meaningful" to medicinal chemists; some are not. Generating molecular descriptors is another computationally intense field, and requires an advanced degree.
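To make the idea of descriptor-based similarity searching concrete, here is a minimal sketch in Python. It assumes each molecule has already been reduced to a fingerprint, represented as the set of bit positions that are switched on, and it scores pairs with the Tanimoto coefficient. The article does not name a particular measure, so the choice of Tanimoto is an assumption made for illustration, and the compound names and fingerprints are invented toy data rather than real molecules.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit positions."""
    if not fp_a and not fp_b:
        # Convention chosen for this sketch: two empty fingerprints count as identical.
        return 1.0
    shared = len(fp_a & fp_b)
    # Shared bits divided by the number of bits set in either fingerprint.
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(query_fp, database):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy fingerprints; real ones would come from a cheminformatics toolkit.
TOY_DATABASE = {
    "compound_A": frozenset({1, 4, 7, 9}),
    "compound_B": frozenset({1, 4, 8}),
    "compound_C": frozenset({2, 3, 5}),
}
QUERY = frozenset({1, 4, 7})

if __name__ == "__main__":
    for name, score in rank_by_similarity(QUERY, TOY_DATABASE):
        print(f"{name}: {score:.2f}")
```

A different descriptor set (3D pharmacophores, physico-chemical properties and so on) would change how the fingerprints are built, but the ranking step would look much the same.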
On the computer science end of the spectrum are those who develop programs that chemists use. These are generally computer scientists by training, but a background in chemistry can be very valuable. "The programmer who can understand at least in part what the researcher does will be more likely to find a good fit than someone with no science background. Without the chemical knowledge, programmers have to rely heavily on the scientists to keep the design of systems on track. Programmers with a science background are mostly few and far between these days. It is a big plus when we see someone with that background," says Donna Triebe, Senior Product Developer, Tripos, Inc.
Specialized Sub-Fields
Indexing and Abstracting
By far the best known indexers and abstractors of chemical literature are at the American Chemical Society's Chemical Abstracts Service in Columbus, OH. They index and prepare abstracts of periodical articles, putting the journal information into a form that can be easily accessed by researchers and scientists worldwide. With a 700% increase in scientific journals since 1900, researchers there also spend a significant amount of time designing systems to make the information easily searchable.
ADME/Tox
In the ever-increasing pressure to bring better drugs to market sooner, companies are using more data to make decisions and are using it much earlier in the pipeline. ADME/Tox stands for absorption, distribution, metabolism, excretion and toxicity. These are the properties, in general, that differentiate a chemical from a "drug." Informatics plays a role in everything from using computers to predict ADME/Tox properties of compounds that have not yet been synthesized, to storing the results of such studies from biological screens and feeding that information back to make better predictions, thus allowing unpromising candidates to be eliminated before they are ever made.
In all cases, searchers need to understand computer systems and data handling. They need to be able to work with others to identify what information is needed, and then work independently to find that data. They need to have the skills and the confidence to say that information doesn't exist, not just that it was missed in the search. Not only that, they need to be able to read between the lines, learning from what is not there as well as from what is. This is especially true in patent searching.
Patents
Patent searching is a specialized sub-field in itself. Most companies want to patent their new compounds, inventions and technologies. In order to do this, they must first know that the compound or method has not already been patented by someone else. If it has been patented, then by whom and when the patent expires are of prime interest. If it hasn't been patented, the real work begins. In order to patent a new compound or method, the company must prove that it is novel, not an obvious extension of the current state of the art in that field, and that it has a specific utility. In order to prove this, the current state of the art in that field must be fully researched and documented. This involves searches not only of published scientific literature and conference proceedings, but also of other patents. Complicating this is the fact that indexing has changed over time, particularly the indexing of chemical patents, so different search strategies are required for searches of different periods of time. The ability to efficiently use Markush structures in searches is also critical. A Markush structure is one that contains one or more structural variables based on a list of stated alternatives. Each compound that could be constructed from the list is covered by the claims, and must be covered in the search.
R1 = CH3, CH2CH3 or CH2CH2CH3
R2 = NH2, OH or CH3
A Markush Structure. This represents 9 different chemical structures, every possible combination of the two substituent groups.
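The count of nine follows from taking every pairing of the two substituent lists (3 x 3). A minimal Python sketch of that enumeration is shown here. The core scaffold that the substituents attach to is not reproduced in the text, so each structure is represented only by its pair of R-group choices; the function name `enumerate_markush` is hypothetical.

```python
from itertools import product

# Substituent alternatives exactly as listed for the example Markush structure.
R1_OPTIONS = ["CH3", "CH2CH3", "CH2CH2CH3"]
R2_OPTIONS = ["NH2", "OH", "CH3"]


def enumerate_markush(r1_options, r2_options):
    """List every R1/R2 combination covered by a two-variable Markush structure."""
    return [{"R1": r1, "R2": r2} for r1, r2 in product(r1_options, r2_options)]


structures = enumerate_markush(R1_OPTIONS, R2_OPTIONS)

if __name__ == "__main__":
    for number, combo in enumerate(structures, start=1):
        print(f"{number}: R1 = {combo['R1']}, R2 = {combo['R2']}")
```

Because the number of combinations is the product of the list lengths, claims with more variables or longer lists of alternatives grow very quickly, which is part of what makes Markush searching demanding.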
Skills and Education
Suzanne Robins of Patent Information Services Inc. describes what success in chemical information demands: "Good interpersonal and organizational skills, and an attention to detail are absolutely essential. Rarely do I conduct a search that does not involve multiple communications with the requester and some request a full written report, including an analysis of the results."
Everyone learns informally how to search for chemical information in graduate school, but there are few formal courses. For those who decide to make this their career, most training is on the job, provided by the employer after they are hired. A Ph.D. degree can be an advantage for chemical information positions, but is most likely required for positions on the computational chemistry end of the spectrum.
Combinatorial chemistry itself is just beginning to be taught: New York State's University at Buffalo offers a one-semester course (one-third lecture and two-thirds lab experience), and a semester-long course in the theoretical foundations of combinatorial chemistry.
There are currently three schools that offer M.Sc. degrees in informatics. Indiana University offers a two-year program in either chemical informatics or bioinformatics, while the University of Sheffield and the University of Manchester Institute of Science and Technology each offer one-year programs in chemical information. All programs assume a first degree in chemistry.2
Summary
Chemical information is an exciting field that is currently experiencing growth in a wide variety of directions. It covers a wide range of activities, from traditional chemical literature searching to the development of new computational methods for predicting properties. Some areas of work require only an undergraduate degree; others require Ph.D.s and beyond. It's an exciting option for those who are intrigued by the principles of chemistry.
Lisa M. Balbes, Ph.D., founded Balbes Consultants (formerly Osiris Consultants) in 1992, offering a wide range of services to the scientific software and bio/chemoinformatics industries. Balbes Consultants is located at 648 Simmons Avenue, Kirkwood MO 63122, USA, tel. 314-966-5298, e-mail [email protected] or visit www.Balbes.com.
Related Reading:
The ACS Division of Chemical Information is committed to providing a forum for the exchange of information and expertise among the generators, developers, providers, and users of chemical information worldwide through innovative high-quality programs and publications, and through opportunities for career development and recognition of excellence. The division also provides links to other chemical information resources on the Web.
The bioinformatics industry is full of players these days. The Motley Fool looks at candidates for consolidation.
1. Andrew Wood, Combinatorial chemistry, Chemical Week (New York), Jul 18, 2001.
2. Helen Schofield, Gary Wiggins and Peter Willett, Recent developments in chemoinformatics education, Drug Discovery Today, 2001, 6(18), 931-934.