Careers In Chemical
Information: It's Not Just Literature Searching
Anymore
Lisa M. Balbes
Everyone's heard that this is the Information
Age, and chemistry has not been left out. In fact,
a whole new set of careers has evolved around
chemical information. Chemical information refers
to any type of data related to chemistry, including:
- Spectra and spectral data
- Crystallographic data
- Chemical structure databases, both 2D and
3D
- Chemical compound and process patents
- Chemical nomenclature
- Chemical and physical properties (including
activity) of compounds
- Chemical syntheses or reactions
- Chemical safety or toxicology information
- Commercially available or proprietary corporate
compounds
- Published scientific articles, and abstracts
thereof
... and much more.
Chemical information is the key to keeping track
of the chemistry done in any organization, and
all scientists need access to the information.
Chemical information careers can involve acquiring,
managing, or using information. In each area,
people are needed to develop systems, as well
as use them. In large or highly specialized companies,
any one type of task may be a full-time job. In
smaller companies, a scientist is likely to do
more than one type of task routinely.
Information can be generated and collected by
traditional analysis or experimentation, or by
computer simulation (an entire field in itself
called computational chemistry, see below). Effective
use of information can be accomplished through
visualization techniques, or by the development
of sophisticated decision support systems.
Information management includes creating, searching
and maintaining databases of all kinds: databases
of 2D or 3D chemical structures, primary journal
literature databases, and abstract databases. The
rise of new techniques such as combinatorial synthesis
and high throughput screening allows a single company
to synthesize and analyze hundreds of thousands
of molecules in a single year. While originally
developed by the pharmaceutical industry, these
techniques are now being adopted by the chemical
industry in general.[1] The vast amounts of information
generated create demand for libraries of chemicals
and raw materials, and also for systems to manage
the generated data. These problems with storing
and retrieving data offer exciting new opportunities
for chemical informatics professionals.
The closely related field of bioinformatics (the
creation, management or use of biological information)
has also exploded recently, fueled by the massive
amounts of information generated by the Human
Genome Project. While more widely known, bioinformatics
techniques are probably not as highly developed
as cheminformatics techniques.
From Library Science to Computational Chemistry
Chemical information and computational chemistry
are two fields that have grown up separately,
but have begun to overlap, covering the range from
library science to computer science, with chemistry
firmly in the middle. Opportunities in chemical
information range from scientific librarian, to
technical information specialist, to technical
publisher, to software developer, to software
architect.
Chemical information is about raw data: for example,
finding answers to specific questions, searching
primary literature, registering compounds in a
corporate database, keeping track of computed
and/or measured data associated with compounds,
and so on. Chemical information specialists communicate
with coworkers, formulate queries, perform searches,
analyze results, and deliver written and oral
reports. They have to know the right sources to
search (in-house or commercial databases, reference
books, Internet, literature, etc.). They may search
for physical properties of a particular compound,
synthesis of it or similar compounds, biological
activity in a variety of systems, patent restrictions
on synthetic methods or biological uses, or use
the chemical entity as an index to the literature.
The skills required for these activities are a
degree in chemistry and a Master of Library Science
(MLS), knowledge of information sources, some
computer skills, and foreign language skills,
if possible.
The original objective of chemical librarians
was to perform complex and expensive literature
searching work for lab scientists and management.
With the proliferation of personal computers,
it became possible for individuals to search databases
themselves, and the librarians became intermediaries.
Now, the tremendous amount of information being
generated again requires information experts who
know which databases are likely to have the desired
answers, and who can coordinate searches across
multiple resources.
Chemistry librarians are responsible for providing
general and in-depth reference services for students,
faculty, researchers and other users, using reference
tools for subjects of chemistry and related areas.
They may also train others on how to use paper
and electronic reference resources. An MLS is
often required, along with experience with chemical
information systems and chemical database systems,
an understanding of how chemical research is done
(and how chemistry is learned, if in an academic environment),
the ability to handle bibliographic citations
in a wide variety of languages, a working knowledge
of the technical underpinnings of the digital
library environments of the future, and familiarity
with multimedia resources and with microcomputer
software applications.
Andrew Berks, Senior Information Scientist at
Merck & Company,
says, "There are more jobs than people in
chemical information right now. This field grew
out of librarianship, but in the online era, it
has taken on a new life and requires people who
can use computers and databases, have analytical
skills, and can write. The skill level needed
has become high, and unfortunately salaries have
not kept pace. However, with the increasing complexity
of modern science, salaries and status will catch
up, to the point that more people will be drawn
in to the field." He describes chemical information
as the "needle in the haystack problem."
Chemical information management also includes
registering compounds in a corporate database,
and keeping track of computed and/or measured
data associated with each compound. It can require
training in chemistry to work on the data entry
end, or in computer science to work on the database
development end.
Cheminformatics (replacing the older term "chemoinformatics")
is the next step towards computer science, and
is more about "processing, developing and
making decisions based upon the information that
has been derived from the raw data, usually through
computer processing and analysis", says Dr.
Peter Gund, Sr. Director of Cheminformatics Marketing
at Accelrys
Inc. Cheminformatics encompasses data mining
and analysis in combinatorial chemistry and high
throughput screening, descriptor-based model generation,
and so on. Chemically intelligent OCR software
that can extract the content of printed texts,
including chemical structures and reactions, is
also starting to appear. This type of work is
extremely computer intensive, and generally requires
a Ph.D. in chemistry, as well as postdoctoral
work in this area.
Slightly further along the continuum is computational
chemistry, which uses mathematical models to calculate
molecular properties, or to simulate molecular
behavior. This includes molecular modeling, quantitative
structure-activity relationship (QSAR) predictions,
protein folding and 3D structure predictions,
and so on. The chemical information in this case
is actually computed data.
Many methods have been devised to measure the
similarity, or dissimilarity, of chemical compounds,
usually to search databases for other compounds
that have similar properties. The basis of any
structure similarity method is the descriptors
used to characterize the molecules. These can
include various types of molecular fingerprints,
3D pharmacophores, physico-chemical properties,
electrotopological states or connectivity indices.
Some represent 2D molecules; some handle 3D ones.
Some are "meaningful" to medicinal chemists;
some are not. Generating molecular descriptors
is another computationally intense field, and
requires an advanced degree.
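As a rough illustration of how a fingerprint-based similarity search can work, the sketch below compares fingerprints with the Tanimoto coefficient (shared features divided by total distinct features). The Tanimoto coefficient is a widely used measure, but the article does not name any particular one, so its use here is an assumption. The fingerprints, the compound names, and the helper functions `tanimoto` and `rank_by_similarity` are all hypothetical and chosen only for the example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints.

    Each fingerprint is a collection of the bit positions that are "on".
    Returns a value from 0.0 (nothing shared) to 1.0 (identical).
    """
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Assumed convention: two empty fingerprints count as dissimilar.
        return 0.0
    return len(a & b) / union


def rank_by_similarity(query_fp, database):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up fingerprints, for illustration only (not real chemical data).
query = {1, 4, 7, 9}
database = {
    "compound_A": {1, 4, 7, 9},
    "compound_B": {1, 4, 8},
    "compound_C": {2, 3, 5},
}

ranked = rank_by_similarity(query, database)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets, and the choice of descriptor matters at least as much as the choice of similarity measure.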
On the computer science end of the spectrum are
those who develop programs that chemists use.
These are generally computer scientists by training,
but a background in chemistry can be very valuable.
"The programmer who can understand at least
in part what the researcher does will be more
likely to find a good fit than someone with no
science background. Without the chemical knowledge,
programmers have to rely heavily on the scientists
to keep the design of systems on track. Programmers
with a science background are mostly few and far
between these days. It is a big plus when we see
someone with that background," says Donna
Triebe, Senior Product Developer, Tripos,
Inc.
Specialized Sub-Fields
Indexing and Abstracting
By far the best known indexers and abstractors
of chemical literature are at the American Chemical
Society's Chemical
Abstracts Service in Columbus, OH. They index
and prepare abstracts of periodical articles,
putting the journal information into a form that
can be easily accessed by researchers and scientists
worldwide. With a 700% increase in scientific
journals since 1900, researchers there also spend
a significant amount of time designing systems
to make the information easily searchable.
ADME/Tox
Under ever-increasing pressure to bring better
drugs to market sooner, companies are using more
data to make decisions and are using it much earlier
in the pipeline. ADME/Tox stands for absorption,
distribution, metabolism, excretion and toxicity.
These are the properties, in general, that differentiate
a chemical from a "drug." Informatics
plays a role in everything from using computers
to predict ADME/Tox properties of compounds that
have not yet been synthesized, to storing the
results of such studies from biological screens
and feeding that information back to make better
predictions, thus allowing unpromising candidates
to be eliminated before they are ever made.
In all cases, searchers need to
understand computer systems and data handling.
They need to be able to work with others to identify
what information is needed, and then work independently
to find that data. They need to have the skills
and the confidence to say that information doesn't
exist, not just that it was missed in the search.
Not only that, they need to be able to read between
the lines, learning from what is not there as
well as from what is. This is especially true
in patent searching.
Patents
Patent searching is a specialized sub-field in
itself. Most companies want to patent their new
compounds, inventions and technologies. In order
to do this, they must first know that the compound
or method has not already been patented by
someone else. If it has been patented, then by
whom and when the patent expires are of prime
interest. If it hasn't been patented, the real
work begins. In order to patent a new compound
or method, the company must prove that it is novel,
not an obvious extension of the current state
of the art in that field, and that it has a specific
utility. In order to prove this, the current state
of the art in that field must be fully researched
and documented. This involves searches not only
of published scientific literature and conference
proceedings, but also of other patents. Complicating this
is the fact that indexing has changed over time,
particularly the indexing of chemical patents,
so different search strategies are required for
searches of different periods of time. The ability
to efficiently use Markush structures in searches
is also critical. A Markush
structure is one that contains one or more
structural variables based on a list of stated
alternatives. Each compound that could be constructed
from the list is covered by the claims, and must
be covered in the search.
R1 = CH3, CH2CH3 or CH2CH2CH3
R2 = NH2, OH or CH3
A Markush Structure. This represents 9 different
chemical structures, every possible combination
of the two substituent groups.
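The count of 9 follows from choosing one of three R1 groups and one of three R2 groups (3 x 3). A minimal sketch of that enumeration is shown here. The substituent lists are copied from the example above, while the function name `enumerate_markush` is hypothetical. Real Markush searching works on full chemical structures, not on text labels like these.

```python
from itertools import product

# Substituent alternatives copied from the Markush example above.
R1_OPTIONS = ["CH3", "CH2CH3", "CH2CH2CH3"]
R2_OPTIONS = ["NH2", "OH", "CH3"]


def enumerate_markush(*substituent_lists):
    """Return every combination of one choice per variable position."""
    return list(product(*substituent_lists))


combinations = enumerate_markush(R1_OPTIONS, R2_OPTIONS)
for r1, r2 in combinations:
    print(f"R1 = {r1}, R2 = {r2}")
print(len(combinations))  # 9
```

Each added variable position multiplies the total by its number of alternatives, so claims with several positions and long lists of alternatives cover a very large number of compounds.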
Skills and Education
According to Suzanne Robins of Patent Information
Services Inc., success in chemical information
demands, "Good interpersonal and organizational
skills, and an attention to detail are absolutely
essential. Rarely do I conduct a search that does
not involve multiple communications with the requester
and some request a full written report, including
an analysis of the results."
Everyone learns informally how to search for
chemical information in graduate school, but there
are few formal courses. For those who decide to
make this their career, most training is on the
job, provided by the employer after they are hired.
A Ph.D. degree can be an advantage for chemical
information positions, but is most likely required
for positions on the computational chemistry end
of the spectrum.
Combinatorial chemistry itself is just beginning
to be taught: New
York State's University at Buffalo offers
a one-semester course (one-third lecture and two-thirds
lab experience), and a semester-long course in
the theoretical foundations of combinatorial chemistry.
There are currently three schools that offer
M.Sc. degrees in informatics. Indiana
University offers a two-year program in either
chemical informatics or bioinformatics, while
the University
of Sheffield and the University
of Manchester Institute of Science and Technology
each offer one-year programs in chemical information.
All programs assume a first degree in chemistry.[2]
Summary
Chemical information is an exciting field that
is currently experiencing growth in a wide variety
of directions. It covers a wide range of activities,
from traditional chemical literature searching
to the development of new computational methods
for predicting properties. Some areas of work
require only an undergraduate degree, others require
Ph.D.s and beyond. It's an exciting option for
those who are intrigued by the principles of chemistry.
Lisa M. Balbes, Ph.D., founded Balbes Consultants
(formerly Osiris Consultants) in 1992, offering
a wide range of services to the scientific software
and bio/chemoinformatics industries. Balbes Consultants
is located at 648 Simmons Avenue, Kirkwood MO
63122, USA, tel. 314-966-5298, e-mail
or visit www.Balbes.com.
Related Reading:
The ACS
Division of Chemical Information is committed
to providing a forum for the exchange of information
and expertise among the generators, developers,
providers, and users of chemical information worldwide
through innovative high-quality programs and publications,
and through opportunities for career development
and recognition of excellence. The division also
provides links to other chemical information resources
on the Web.
The bioinformatics industry is full of players
these days. The
Motley Fool looks at candidates for consolidation.
[1] Combinatorial chemistry, Andrew Wood, Chemical
Week; New York; Jul 18, 2001.
[2] Recent developments in chemoinformatics education,
Helen Schofield, Gary Wiggins and Peter Willett,
Drug Discovery Today, 2001, 6:18:931-934.