- Facilitating Computational Linguists
Ellogon tries to facilitate many of the tasks computational linguists usually perform within the platform, especially tasks that involve creating annotated corpora, adapting linguistic processing components, or running evaluations. By providing a wide range of highly customisable and easy-to-use annotation tools, Ellogon is well suited to annotated corpus construction. The available annotators support flat marking (e.g. part-of-speech tagging or named-entity annotation) as well as annotation of hierarchical information (e.g. syntactic relation annotation), on both plain-text and HTML corpora. (Two annotation tools are shown here and here.)
Adapting linguistic processing components to a new domain is another frequent task. It usually involves modifying the domain-specific resources that the processing components use internally. Ellogon facilitates the adaptation process because the modified component can be applied immediately, and the user can easily identify the effect of the modifications through the comparison facilities offered by the platform. Ellogon provides significant infrastructure for comparing the linguistic information associated with textual data. The Collection Comparison tool (figure 1, figure 2) compares the linguistic information stored in a set (or collection) of documents. Constraints on which information is compared can be specified through the tool's graphical user interface, and the comparison results are presented using standard measures such as recall, precision and F-measure. Additionally, the comparison tool can present a comparison log. This log is a graphical representation of the differences found during the comparison and helps the user locate, and possibly correct, the errors.
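The measures reported by such a comparison can be illustrated with a minimal Python sketch. It is not Ellogon's implementation or API: the `(type, start, end)` annotation tuple, the exact-match criterion and the function names `compare_annotations` and `comparison_log` are all assumptions made for the example.

```python
from typing import List, NamedTuple, Set, Tuple

# Hypothetical annotation layout: (type, start offset, end offset).
# This is an assumption for illustration, not Ellogon's data model.
Annotation = Tuple[str, int, int]


class Scores(NamedTuple):
    precision: float
    recall: float
    f_measure: float


def compare_annotations(reference: Set[Annotation],
                        candidate: Set[Annotation]) -> Scores:
    """Score candidate annotations against a reference set (exact match only)."""
    matched = len(reference & candidate)
    precision = matched / len(candidate) if candidate else 0.0
    recall = matched / len(reference) if reference else 0.0
    total = precision + recall
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    return Scores(precision, recall, f_measure)


def comparison_log(reference: Set[Annotation],
                   candidate: Set[Annotation]
                   ) -> Tuple[List[Annotation], List[Annotation]]:
    """Return (missing, spurious) annotations, a textual stand-in for a log."""
    missing = sorted(reference - candidate)
    spurious = sorted(candidate - reference)
    return missing, spurious


reference = {("person", 0, 5), ("location", 10, 16),
             ("organisation", 20, 27), ("person", 30, 34)}
candidate = {("person", 0, 5), ("location", 10, 16), ("person", 20, 27)}

scores = compare_annotations(reference, candidate)
missing, spurious = comparison_log(reference, candidate)
print(scores)
print("missing:", missing)
print("spurious:", spurious)
```

Here two of the three candidate annotations match the reference, so precision is 2/3, recall is 2/4 and the F-measure is 4/7. A real comparison would probably also need partial-overlap matching and per-type constraints, which this sketch leaves out.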
- Facilitating Language Engineers
One of the most frequent tasks language engineers perform inside Ellogon is, of course, the development of processing components. Significant infrastructure is provided to facilitate component development, from the first step of writing the component to verifying that it works as expected. Operating as an integrated development environment (IDE), Ellogon allows components to be created in a wide range of programming languages (C, C++, Tcl, Java, Perl, Python). All the code needed for the component structure is generated automatically when a component is first created, and a component can be compiled, linked, loaded and tested from inside Ellogon. For all supported languages except Java, a component can even be unloaded, modified, recompiled and reloaded, so that the effect of a modification can be tested quickly.
Developing components for Ellogon is a fairly easy process, as a high-level API is provided either as a set of functions or, if the programming language allows it, as an object-oriented hierarchy of classes. Additionally, Ellogon is distributed with a small set of components whose source code can serve as examples of how to perform some commonly needed tasks.
The fact that almost everything in Ellogon is defined in terms of components offers a large degree of flexibility to component developers. Combined with its modular architecture, this allows Ellogon to be tailored to specific needs. For example, particular parts of Ellogon can be wrapped together with specific processing components to form a stand-alone application that performs a specific processing task (possibly with a purpose-built graphical interface). Such an application will even run without requiring the installation of Ellogon.
- Facilitating End Users
End users of Ellogon can be roughly divided into two categories: users of applications or services based on Ellogon, and users who treat Ellogon as a "black box" in order to process corpora and collect the results.
For the first category of end users, Ellogon provides many facilities for creating stand-alone applications with customised graphical interfaces that are easy to use. Such an application is shown in this figure, where all the complexity of creating collections, applying the required processing components and exporting the processing results is hidden behind a simple graphical interface. In addition to specialised applications, Ellogon can be driven through services such as ActiveX, DDE, HTTP or SOAP, which allow other applications to use Ellogon facilities in a way that is transparent to the end user.
The second category consists of users who want to perform some sort of linguistic processing by simply applying the components available through Ellogon to a corpus. For these users, Ellogon is a toolbox of "black boxes": for example, they may want to apply a named-entity recognition system operating within Ellogon, or use more primitive components such as a syntactic analyser. Ellogon supports these users with an easy-to-use graphical interface for creating collections from a wide variety of sources and applying any available processing component to them. Processing results can be examined through the large set of available viewers, or exported to widely used formats such as SGML or XML. Finally, Ellogon can automate tasks through the definition of "macro" commands, which is especially useful for tasks that must be repeated many times.
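The idea behind a "macro" command, a recorded sequence of component applications replayed over a collection, can be sketched in Python as follows. This is only a conceptual illustration: the `Macro` class, the dictionary-based documents and the `tokenise` and `count_tokens` components are hypothetical and do not correspond to Ellogon's actual macro facility or API.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins: a document is a plain dictionary and a
# component is any function that updates a document in place.
Document = Dict[str, Any]
Component = Callable[[Document], None]


def tokenise(doc: Document) -> None:
    """Toy component: split the text on whitespace."""
    doc["tokens"] = doc["text"].split()


def count_tokens(doc: Document) -> None:
    """Toy component: record how many tokens an earlier step produced."""
    doc["token_count"] = len(doc.get("tokens", []))


class Macro:
    """A named, replayable sequence of component applications."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.steps: List[Component] = []

    def record(self, component: Component) -> "Macro":
        self.steps.append(component)
        return self  # allow chained calls

    def run(self, collection: List[Document]) -> List[Document]:
        # Apply every recorded step, in order, to every document.
        for doc in collection:
            for step in self.steps:
                step(doc)
        return collection


macro = Macro("tokenise-and-count").record(tokenise).record(count_tokens)
collection = [{"text": "Ellogon processes corpora"},
              {"text": "A second document"}]
macro.run(collection)
print([doc["token_count"] for doc in collection])
```

Because the macro stores the steps rather than their results, the same object can be run again on any other collection, which is the property that makes macros useful for repeated tasks.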