This is a very short glossary of terms that may be useful to students of linguistics who are unfamiliar with computational linguistics. For detailed information on the theory underlying current advances in speech technology, read Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky and James H. Martin (Prentice Hall: New Jersey, 2000). This text is clearly written and integrates basic linguistic theory with computational theory. CLABU members recommend it highly.
According to Merriam-Webster's Collegiate Dictionary (http://www.yourdictionary.com), artificial intelligence is "1 : the capability of a machine to imitate intelligent human behavior 2 : a branch of computer science dealing with the simulation of intelligent behavior in computers."
Artificial intelligence is "made possible by [computer] strength in information-processing capability associated with certain basic areas. These areas include matching, goal reduction, constraint exploitation, search, control, problem solving, and logic." (Patrick Henry Winston, Artificial Intelligence, 2nd ed., Addison-Wesley: Reading, MA, 1984, p. 17.)
One application of artificial intelligence is natural language processing.
"Simply put, computational linguistics is the scientific study of language from a computational perspective. Computational linguists are interested in providing computational models of various kinds of linguistic phenomena. These models may be 'knowledge-based' ('hand-crafted') or 'data-driven' ('statistical' or 'empirical'). Work in computational linguistics is in some cases motivated from a scientific perspective in that one is trying to provide a computational explanation for a particular linguistic or psycholinguistic phenomenon; and in other cases the motivation may be more purely technological in that one wants to provide a working component of a speech or natural language system. Indeed, the work of computational linguists is incorporated into many working systems today, including speech recognition systems, text-to-speech synthesizers, automated voice response systems, web search engines, text editors, language instruction materials, to name just a few."
A corpus is "a collection of linguistic data, either written texts or a transcription of recorded speech, which can be used as a starting-point of linguistic description or as a means of verifying hypotheses about a language." (David Crystal, A Dictionary of Linguistics and Phonetics, 3rd ed., Blackwell, 1991.)
A corpus may or may not be "tagged" (annotated) for linguistic features such as parts of speech, discourse features, phonetic transcriptions, or syntactic information. The most useful corpora are in electronic form, so that they can be searched for selected words, phrases, patterns, or features. Commonly used corpora include the CHILDES database, COBUILD, the British National Corpus, and the Corpus of Professional Spoken American English. See Michael Barlow's website for a list of corpora and corpus linguistics references.
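To make the idea of a tagged, searchable electronic corpus concrete, here is a minimal Python sketch. It assumes a hypothetical three-sentence corpus stored in the common word/TAG notation; the tags loosely follow Penn Treebank conventions and are illustrative only. The sketch searches the corpus both for a word and for a part-of-speech pattern (determiner, adjective, noun).

```python
# Hypothetical toy corpus in word/TAG format (tags are illustrative,
# loosely following Penn Treebank conventions).
TAGGED_CORPUS = """The/DT dog/NN barked/VBD ./.
A/DT small/JJ dog/NN chased/VBD the/DT cat/NN ./.
The/DT old/JJ man/NN smiled/VBD ./."""


def parse_tagged(text):
    """Turn each line into a sentence: a list of (word, tag) pairs."""
    sentences = []
    for line in text.splitlines():
        pairs = []
        for token in line.split():
            # Split on the last "/" so a word like "./." still parses.
            word, _, tag = token.rpartition("/")
            pairs.append((word, tag))
        sentences.append(pairs)
    return sentences


def find_word(sentences, target):
    """Return (sentence index, position) for every occurrence of a word, ignoring case."""
    return [(i, j)
            for i, sent in enumerate(sentences)
            for j, (word, _) in enumerate(sent)
            if word.lower() == target.lower()]


def find_tag_pattern(sentences, pattern):
    """Return the word sequences whose tags exactly match a pattern such as ["DT", "JJ", "NN"]."""
    hits = []
    n = len(pattern)
    for sent in sentences:
        tags = [tag for _, tag in sent]
        for k in range(len(sent) - n + 1):
            if tags[k:k + n] == pattern:
                hits.append(" ".join(word for word, _ in sent[k:k + n]))
    return hits


corpus = parse_tagged(TAGGED_CORPUS)
print(find_word(corpus, "dog"))                      # [(0, 1), (1, 2)]
print(find_tag_pattern(corpus, ["DT", "JJ", "NN"]))  # ['A small dog', 'The old man']
```

The second search is only possible because the corpus is tagged: in plain text, "a small dog" and "the old man" share no words, but they share a part-of-speech pattern.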
Corpus linguistics is "the study of human language using extensive and authentic examples of language in use" (Luisa Plaja of L&H/Dragon). Because the selected body of authentic language, or corpus, is typically large, quantitative analysis is usually employed to extract linguistic information.
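As a small illustration of the kind of quantitative analysis mentioned above, the following Python sketch computes token counts, type counts, the type/token ratio, and a frequency list from raw text. The tokenizer is a deliberately naive assumption (lowercased runs of letters and apostrophes); real corpus tools tokenize far more carefully.

```python
import re
from collections import Counter


def tokenize(text):
    """Naive tokenizer (an assumption for this sketch): lowercased runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_profile(text, top_n=3):
    """Return basic frequency statistics for a piece of text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    return {
        "tokens": len(tokens),      # running words
        "types": len(counts),       # distinct word forms
        "type_token_ratio": len(counts) / len(tokens) if tokens else 0.0,
        "most_common": counts.most_common(top_n),
    }


sample = "The cat sat on the mat. The dog sat on the cat."
print(frequency_profile(sample))
# {'tokens': 12, 'types': 6, 'type_token_ratio': 0.5,
#  'most_common': [('the', 4), ('cat', 2), ('sat', 2)]}
```

On a corpus of realistic size, the same counts reveal patterns that no reader could tally by hand, such as which function words dominate a register.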
"The goal of Natural Language Processing (NLP) is to design and build a computer system that will analyze, understand, and generate natural human-languages. Applications of NLP include machine translation of one human-language text to another; generation of human-language text such as fiction, manuals, and general descriptions; interfacing to other systems such as databases and robotic systems thus enabling the use of human-language type commands and queries; and understanding human-language text to provide a summary or to draw conclusions. One of the easiest tasks for a NLP system is to parse a sentence to determine its syntax. A more difficult task is determining the semantic meaning of a sentence. One of the most difficult tasks is the analysis of the context to determine the true meaning and comparing that with other text."
PC AI - Natural Language Processing
http://www.pcai.com/web/ai_info/natural_lang_proc.html
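To illustrate what parsing a sentence to determine its syntax involves, here is a minimal sketch of a CKY recognizer in Python. The grammar and lexicon are hypothetical and deliberately tiny, written in Chomsky normal form (every rule rewrites to two categories, or a category to a word). A recognizer only answers whether the grammar can derive the sentence; a full parser would also store back-pointers in each table cell so that the tree could be recovered.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (binary rules only).
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
# Hypothetical lexicon mapping each word to its possible categories.
LEXICON = {
    "the": {"Det"},
    "a": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "chased": {"V"},
}


def cky_recognize(words, start="S"):
    """Return True if the toy grammar derives the word sequence from the start symbol."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds every category that can span words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(LEXICON.get(word.lower(), set()))
    # Fill longer spans from shorter ones, trying every split point k.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left, right in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n]


print(cky_recognize("the dog chased a cat".split()))  # True
print(cky_recognize("dog the chased cat a".split()))  # False
```

Even this toy shows why syntax is the "easy" end of the scale: the recognizer needs only categories and rules, whereas deciding what "the dog chased a cat" means, or how it relates to other text, requires knowledge the grammar does not contain.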
"Automatic speech recognition is the process by which a computer maps an acoustic speech signal to text. Automatic speech understanding is the process by which a computer maps an acoustic speech signal to some form of abstract meaning of the speech. Speech synthesis is the task of transforming written input to spoken output. The input can either be provided in a graphemic/orthographic or a phonemic script, depending on its source. As a consequence of its reliance on phonology, linguistics, signal processing, statistics, computer science, acoustics, connectionist networks, psychology and other fields, there are many technologies involved in speech technology."
PC AI - Speech Recognition
http://www.pcai.com/web/ai_info/speech_recognition.html