Cognitive Approach to Natural Language Processing (SD213)
Processing language is one of the most important and most challenging problems in Artificial Intelligence. NLP (Natural Language Processing) has many applications. It is commonly used in machine translation, text mining, speech recognition, dialogue-based applications, text generation, automatic summarization, Web search, etc. Conversely, it is hard to imagine an "intelligent" machine that would be unable to understand language.
NLP remains a challenging task. Statistical techniques perform well in domains such as machine translation, but they are intrinsically limited to average meanings and cannot take contextual knowledge into account. This course explores some symbolic alternatives to mere statistics.
Some NLP techniques, like grammars, parsing and ontologies, are classic symbolic methods. Others are inspired by cognitive modelling. They include procedural semantics, aspect processing and dialogue processing. The point is not only to adopt a "reverse engineering" approach to language, but also to adapt engineering techniques to human requirements to improve efficiency and acceptability.
This course presents different NLP methods that are inspired by the study of natural language and of the underlying cognitive processes. The techniques and concepts that will be studied have, however, a broader scope in artificial intelligence and are used to study reasoning, decision making and symbolic machine learning. They include:
- Syntactic processing using context-free grammars. Basic parsing methods.
- Knowledge representation – Meaning representation – Procedural semantics – Aspect.
- Relevance: interest, newsworthiness, argumentative relevance and processing.
Students are expected to have taken SD206 (Logic and knowledge representation), or an equivalent course.
Students are asked to complete the exercises of each session within 7 days.
- Answers to questions during lab sessions will be recorded and evaluated.
- Students will be asked to carry out a small technical study (in pairs) that extends an issue addressed during the lab sessions. They will be given the opportunity to present their work for a few minutes at the end of the course ("soutenance" day). They will also write a 3-page report.
- Students will answer a small quiz.
Each student will choose a problem related to the above topics and carry out a micro-research study on that problem. Students will write a 3-page paper (typical structure: problem, relevant studies, claim, evidence, discussion, bibliography (with weblinks)).
The study should be related to symbolic
The easiest way to do this study is to work on a topic closely related to one of the lab work sessions. You are free, however, to work on any other relevant topic. Be careful to keep it feasible: it’s supposed to be a mini-study.
Caveat: if your study involves statistical aspects, only the symbolic part will be considered in the evaluation.
Implementation language should be Prolog or Python (ask in case of problem).
Examples: extend a grammar to analyze more complex sentences (such as the first sentence of this section); create a grammar for a different language; extend the lab work on procedural semantics to understand more sentences about chess, or to understand sentences about the genealogy of an actual family; extend the lab work on aspect to include more aspectual words (always, ancient, already, still, ...); create a mini-knowledge base on a specific domain (football, Roland-Garros...) and use CAN (last lab work) to propose interactive dialogues; etc.
Students may work in pairs. In this case, the respective contributions of each student should appear unambiguously, and the expectations are of course doubled.
You will present your work on the "soutenance" day in no more than 5 minutes (audio). Make sure it is relevant to the audience. A couple of days before the presentations, you will be asked to post a few slides that will be displayed while you are talking.
The project itself can be handed in until the last week.
- your report
- your source code
- any suitable additional material (e.g. slides if they changed)
- Difficulty of interpreting in context:
Scott, T. (2020). The Sentences Computers Can’t Understand -
- Critique of statistical translation:
Hofstadter, D. R. (2018). The shallowness of Google Translate. The Atlantic, January 30.
(see also this critique of deep learning)
- A short introduction to syntax:
Lasnik, H. (2002). The minimalist program in syntax. Trends in Cognitive Sciences, 6(10), 432-437.
- Another introduction to syntax:
Gerold Schneider. Introduction to Government and binding
- About feature structures and unification: See these slides taken from:
Bender, E. M., Sag, I. A. & Wasow, T. (2003). Syntactic theory: A formal introduction. CSLI, Stanford, CA.
- About using Prolog to process natural language:
Natural Language Processing Techniques in Prolog by Patrick Blackburn and Kristina Striegnitz