PAMOJA: a component framework for grammar-aware engineering
Abstract
Formal grammars are fundamental to numerous data and text processing applications, including compilers for software languages and natural language processing systems. From formal grammars, tools such as scanners, parsers, compilers, and interpreters can be built. These grammars and tools also find application in fields such as speech recognition, information security, and genetic sequencing. Collectively, software applications that involve grammar knowledge are termed \emph{grammar-aware} software. While numerous tools exist for implementing software languages, developers often face challenges when integrating them into general-purpose development environments like NetBeans, Eclipse, and Microsoft Visual Studio, especially when building applications that depend on grammars and associated tools. Many language implementation tools function as standalone systems and cannot easily be integrated into the workflow of grammar-aware software development. Although various approaches have been proposed to ease this integration, they often come with steep learning curves, posing a challenge for users with limited expertise in language implementation. This thesis addresses the challenge of integrating grammars and associated tools, particularly language processor front ends, into grammar-aware software development. By applying component-based software development principles, the study aims to develop a software component framework that integrates seamlessly into general-purpose development environments. This is achieved through pragmatic, solution-oriented research, guided by the design science research methodology. The research produced key artifacts, including a prescriptive architecture and the instantiation of the PAMOJA software component framework.
In the first phase, the study conceptualized the research problem and, by reviewing literature and consulting experts, identified component-based software development as a potential solution. The second phase involved establishing a conceptual framework to guide the design and development of PAMOJA, followed by designing an architecture encompassing design requirements, guiding principles, key decisions, a structural model, and the development platform. PAMOJA was then instantiated through an iterative design and development process, allowing for continuous refinement. In the third phase, the study demonstrated PAMOJA's value in facilitating grammar-aware software development. Through building and testing, the study confirmed that PAMOJA is technically feasible and works under certain assumptions, and that component-based software development principles have the potential to solve or mitigate the identified problem. Two demonstration cases further validated its effectiveness: using PAMOJA to create a hybrid text/structure editor adaptable to various programming languages, and using it to develop extensible tools for language processor education. Finally, expert evaluation and technical action research confirmed that users in the grammar-aware community regard PAMOJA as a valuable artifact for improving and facilitating grammar-aware software development.