NAMED ENTITY RECOGNITION TOOLS AND TECHNIQUES IN CUSTOM NATURAL LANGUAGE PROCESSING MODELS
Keywords
Named Entity Recognition, Natural Language Processing, SpaCy, Apache OpenNLP, TensorFlow, Information Extraction, Model Comparison, Performance Evaluation, Machine Learning, Text Mining
Abstract
Named Entity Recognition (NER) automatically extracts key information such as names, locations, and organizations from unstructured text. This research presents a comparative analysis of prominent NLP libraries, including SpaCy, Apache OpenNLP, and TensorFlow, for building custom NER models. Performance is evaluated on accuracy, F-score (F-measure), prediction time, model size, training loss, and training efficiency. All models are compared on the same dataset, and SpaCy consistently achieves higher accuracy than the other libraries. The study also surveys NER techniques, covering rule-based, learning-based, and hybrid approaches, and highlights their applications in diverse domains. It further examines the strengths and weaknesses of the libraries and their associated tools across the Java, Python, and Cython programming languages. These results make SpaCy a valuable tool for extracting information from the ever-growing volume of text available online, on social media, and in other sources. The study contributes to a better understanding of NER techniques and tools, helping researchers and developers select the most appropriate approach for their specific needs, and its findings underline how much the choice of tools and techniques matters for efficient and accurate information extraction.
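As an illustration of two of the metrics named above, the following minimal sketch computes an entity-level F-score and measures prediction time for a single call. It is not the evaluation code used in the study. It assumes exact-match scoring, where a predicted entity counts as correct only if its start offset, end offset, and label all match a gold entity. The function names `entity_f_score` and `timed_predict`, and the sample spans, are hypothetical.

```python
import time


def entity_f_score(gold, predicted):
    """Return (precision, recall, F1) over exact-match entity spans.

    Each entity is a (start, end, label) tuple. Exact matching is an
    assumption of this sketch, not a detail taken from the study.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def timed_predict(predict_fn, text):
    """Run predict_fn(text) and return its result with elapsed seconds."""
    start = time.perf_counter()
    entities = predict_fn(text)
    elapsed = time.perf_counter() - start
    return entities, elapsed


# Hypothetical gold and predicted annotations for one sentence.
gold = [(0, 5, "PERSON"), (15, 21, "GPE"), (30, 36, "ORG")]
predicted = [(0, 5, "PERSON"), (15, 21, "ORG"), (30, 36, "ORG")]

precision, recall, f1 = entity_f_score(gold, predicted)

# Stand-in for a real model's prediction function.
entities, seconds = timed_predict(lambda text: predicted, "sample text")
```

In this example two of the three predicted entities match the gold annotations exactly, so precision, recall, and F1 are all 2/3. Any model from the compared libraries could be wrapped in a function and passed to `timed_predict` in place of the stand-in lambda.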