This post is based on our RANLP 2023 paper “Exploring the Landscape of Natural Language Processing Research”. You can read more details there.
As an efficient approach to understanding, generating, and processing natural language texts, research in natural language processing (NLP) has spread rapidly and been widely adopted in recent years. Given these rapid developments, obtaining and maintaining an overview of the field is difficult. This blog post aims to provide a structured overview of the different fields of study in NLP and analyzes recent developments in this field.
Fields of study are academic disciplines and concepts that usually consist of (but are not limited to) tasks or techniques.
In this article, we investigate the following questions:
- What are the different fields of study investigated in NLP?
- What are the characteristics and developments over time of the research literature in NLP?
- What are the current trends and directions for future work in NLP?
Although most fields of study in NLP are well known and defined, there is currently no commonly used taxonomy or categorization scheme that attempts to collect and structure these fields of study in a consistent and understandable format. Therefore, getting an overview of the entire field of NLP research is difficult. While there are lists of NLP topics in conferences and textbooks, they tend to vary considerably and are often either too broad or too specialized. We therefore developed a taxonomy encompassing a wide range of fields of study in NLP. Although this taxonomy may not include all possible NLP concepts, it covers many of the most popular fields of study, and missing fields of study can be considered subtopics of the included ones. While developing the taxonomy, we found that certain lower-level fields of study had to be assigned to multiple higher-level fields of study rather than just one. As a result, some fields of study are listed multiple times in the NLP taxonomy, each time under a different higher-level field of study. The final taxonomy was developed empirically in an iterative process together with domain experts.
The taxonomy serves as an overarching classification scheme in which NLP publications can be classified according to at least one of the included fields of study, even if they do not directly address one of these fields of study but only subtopics thereof. To analyze recent developments in NLP, we trained a weakly supervised model to classify ACL Anthology papers according to the NLP taxonomy.
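A taxonomy in which one lower-level field can sit under several higher-level fields is naturally stored as a mapping from parents to children, and assigning a paper to a lower-level field then implies all of that field's parents. The sketch below illustrates only this data structure; the excerpt is hypothetical (loosely based on field names mentioned in this post, not the actual taxonomy from the paper), and the helper names `parents_of` and `expand_labels` are illustrative.

```python
# Hypothetical two-level excerpt; NOT the actual NLP taxonomy from the paper.
# "Speech Recognition" is deliberately listed under two parents to show
# how a lower-level field can belong to several higher-level fields.
TAXONOMY = {
    "Natural Language Interfaces": ["Question Answering", "Dialogue Systems & Conversational Agents"],
    "Text Generation": ["Paraphrasing", "Question Generation", "Speech Recognition"],
    "Multimodality": ["Speech & Audio", "Speech Recognition"],
}


def parents_of(field):
    """Return all higher-level fields that list `field` as a child, sorted by name."""
    return sorted(parent for parent, children in TAXONOMY.items() if field in children)


def expand_labels(labels):
    """Add every higher-level parent of each assigned lower-level field."""
    expanded = set(labels)
    for label in labels:
        expanded.update(parents_of(label))
    return expanded


print(parents_of("Speech Recognition"))      # ['Multimodality', 'Text Generation']
print(expand_labels(["Question Answering"]))
```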
You can read more details about the development of the classification model and the NLP taxonomy in our paper.
The following section provides short explanations of the fields of study included in the NLP taxonomy above.
Multimodality
Multimodality refers to the capability of a system or method to process input of different types or modalities (Garg et al., 2022). We distinguish between systems that can process natural language text together with visual data, speech & audio, programming languages, or structured data such as tables or graphs.
Natural Language Interfaces
Natural language interfaces can process data based on natural language queries (Voigt et al., 2021) and are usually implemented as question answering or dialogue & conversational systems.
Semantic Text Processing
This high-level field of study includes all types of concepts that attempt to derive meaning from natural language and enable machines to interpret textual data semantically. One of the most powerful fields of study in this regard is language modeling, where language models attempt to learn the joint probability function of sequences of words (Bengio et al., 2000). Recent advances in language model training have enabled these models to successfully perform various downstream NLP tasks (Soni et al., 2022). In representation learning, semantic text representations are usually learned in the form of embeddings (Fu et al., 2022), which can be used to compare the semantic similarity of texts in semantic search settings (Reimers and Gurevych, 2019). Additionally, knowledge representations, e.g., in the form of knowledge graphs, can be incorporated to improve various NLP tasks (Schneider et al., 2022).
Sentiment Analysis
Sentiment analysis attempts to identify and extract subjective information from texts (Wankhade et al., 2022). Usually, studies focus on extracting opinions, emotions, or polarity from texts. More recently, aspect-based sentiment analysis has emerged as a way to provide more detailed information than general sentiment analysis, since it aims to predict the sentiment polarities of given aspects or entities in a text (Xue and Li, 2018).
Syntactic Text Processing
This high-level field of study aims at analyzing the grammatical syntax and vocabulary of texts (Bessmertny et al., 2016). Representative tasks in this context are syntactic parsing of word dependencies in sentences, tagging words with their respective parts of speech, segmenting texts into coherent sections, and correcting erroneous texts with respect to grammar and spelling.
Linguistics & Cognitive NLP
Linguistics & Cognitive NLP deals with natural language based on the assumptions that our linguistic abilities are firmly rooted in our cognitive abilities, that meaning is essentially conceptualization, and that grammar is shaped by usage (Dabrowska and Divjak, 2015). Many different linguistic theories exist that generally argue that language acquisition is governed by universal grammatical rules common to all typically developing humans (Wise and Sevcik, 2017). Psycholinguistics attempts to model how the human brain acquires and produces language, processes and comprehends it, and provides feedback (Balamurugan, 2018). Cognitive modeling is concerned with modeling and simulating human cognitive processes in various forms, particularly in a computational or mathematical form (Sun, 2020).
Responsible & Trustworthy NLP
Responsible & trustworthy NLP is concerned with implementing methods that focus on fairness, explainability, accountability, and ethical aspects at their core (Barredo Arrieta et al., 2020). Green & sustainable NLP is mainly focused on efficient approaches to text processing, while low-resource NLP aims to perform NLP tasks when data is scarce. Additionally, robustness in NLP attempts to develop models that are insensitive to biases, resistant to data perturbations, and reliable for out-of-distribution predictions.
Reasoning
Reasoning enables machines to draw logical conclusions and derive new knowledge from the information available to them, using techniques such as deduction and induction. Argument mining automatically identifies and extracts the structure of inference and reasoning expressed as arguments in natural language texts (Lawrence and Reed, 2019). Textual inference, usually modeled as an entailment problem, automatically determines whether a natural-language hypothesis can be inferred from a given premise (MacCartney and Manning, 2007). Commonsense reasoning bridges premises and hypotheses using world knowledge that is not explicitly provided in the text (Ponti et al., 2020), while numerical reasoning performs arithmetic operations (Al-Negheimish et al., 2021). Machine reading comprehension aims to teach machines to determine the correct answers to questions based on a given passage (Zhang et al., 2021).
Multilinguality
Multilinguality covers all types of NLP tasks that involve more than one natural language and is conventionally studied in machine translation. Additionally, code-switching freely interchanges multiple languages within a single sentence or between sentences (Diwan et al., 2021), while cross-lingual transfer techniques use data and models available for one language to solve NLP tasks in another language.
Information Retrieval
Information retrieval is concerned with finding texts that satisfy an information need from within large collections (Manning et al., 2008). Typically, this involves retrieving documents or passages.
Information Extraction & Text Mining
This field of study focuses on extracting structured knowledge from unstructured text and enables the analysis and identification of patterns or correlations in data (Hassani et al., 2020). Text classification automatically categorizes texts into predefined classes (Schopf et al., 2021), while topic modeling aims to discover latent topics in document collections (Grootendorst, 2022), often using text clustering techniques that organize semantically similar texts into the same clusters. Summarization produces summaries of texts that include the key points of the input in less space and keep repetition to a minimum (El-Kassas et al., 2021). Furthermore, the information extraction & text mining field of study includes named entity recognition, which deals with the identification and categorization of named entities (Leitner et al., 2020); coreference resolution, which aims to identify all references to the same entity in discourse (Yin et al., 2021); term extraction, which aims to extract relevant terms such as keywords or keyphrases (Rigouts Terryn et al., 2020); relation extraction, which aims to extract relations between entities; and open information extraction, which facilitates the domain-independent discovery of relational tuples (Yates et al., 2007).
Text Generation
The objective of text generation approaches is to generate texts that are both comprehensible to humans and indistinguishable from text authored by humans. Frequently, the input consists of text, as in paraphrasing, which renders the input text in a different surface form while preserving its semantics (Niu et al., 2021); question generation, which aims to generate a fluent and relevant question given a passage and a target answer (Song et al., 2018); or dialogue-response generation, which aims to generate natural-looking text relevant to the prompt (Zhang et al., 2020). In many cases, however, the text is generated from input in other modalities, as in data-to-text generation, which generates text based on structured data such as tables or graphs (Kale and Rastogi, 2020), captioning of images or videos, or speech recognition, which transcribes a speech waveform into text (Baevski et al., 2022).
Considering the NLP literature, we start our analysis with the number of studies as an indicator of research interest. The distribution of publications over the 50-year observation period is shown in the Figure above. While the first publications appeared in 1952, the number of annual publications grew slowly until 2000. Between 2000 and 2017, the number of publications roughly quadrupled, while in the subsequent five years it doubled again. We therefore observe near-exponential growth in the number of NLP studies, indicating increasing attention from the research community.
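The growth figures above can be converted into implied compound annual growth rates, which makes the two periods directly comparable. This small sketch assumes that "roughly quadrupled" and "doubled" are exact factors of 4 and 2, and that the periods span 17 years (2000 to 2017) and five years (after 2017); the variable names are illustrative.

```python
def annual_growth_rate(growth_factor, years):
    """Compound annual growth rate implied by a total growth factor over `years` years."""
    return growth_factor ** (1 / years) - 1


# Assumed factors and periods, read off the prose above.
rate_2000_2017 = annual_growth_rate(4, 17)
rate_after_2017 = annual_growth_rate(2, 5)
print(f"2000-2017: {rate_2000_2017:.1%} per year")   # 2000-2017: 8.5% per year
print(f"after 2017: {rate_after_2017:.1%} per year")  # after 2017: 14.9% per year
```

Under these assumptions, the implied annual rate rises from about 8.5% to about 14.9%.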