Friday, April 9, 2010

10th April, 2010

Some more information about the task mode of operation for the Annotation Interfaces (not yet implemented for the Parallel Corpus Markup):


  • The annotation process starts with one copy of the document to be annotated, say, story-1.

  • When an annotator (say, john) claims this document (opens and saves it in the task mode), another copy is created, with the file name story-1-john. This is the file on which the annotator will be working.

  • However, the task name (e.g. story-1) will apply to all the copies.

  • When another user (e.g. terry) has claimed the same task, another copy will be created (story-1-terry).

  • Now the adjudicators can use the annotation comparison facility to compare the two annotations.

  • The adjudicators can select one of the two annotations and make changes to it directly from the comparison facility.

  • When an adjudicator (say, chapman) saves the selected and modified version of one of the annotations, it gets saved with the original document name (story-1), i.e., the original document is overwritten.

  • An option worth considering is whether, instead of overwriting the original document, another copy should be created with the name of the adjudicator (story-1-chapman). But what if the adjudicator is also one of the annotators? That should not be a problem, because the same person cannot be both an annotator and an adjudicator for the same document.

  • However, the copies of the work by the two annotators remain available for more work by the annotators or for any other use later, such as calculating inter-annotator agreement.
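
The naming scheme described in the list above can be sketched in a few lines of Python. This is only an illustration of the file-naming convention, not Sanchay's actual code: the function names claim_task and save_adjudicated are hypothetical, and the adjudicator's edits are reduced to a plain file copy.

```python
import shutil
from pathlib import Path


def claim_task(task_dir, task_name, annotator):
    """Create (once) and return the annotator's working copy, e.g. story-1-john."""
    task_dir = Path(task_dir)
    working_copy = task_dir / f"{task_name}-{annotator}"
    if not working_copy.exists():
        # First save in the task mode: copy the original document.
        shutil.copyfile(task_dir / task_name, working_copy)
    return working_copy


def save_adjudicated(task_dir, task_name, chosen_annotator, adjudicator=None):
    """Save the annotation selected by the adjudicator.

    With adjudicator=None the original (story-1) is overwritten, which is the
    current behaviour; with a name, a separate copy (story-1-chapman) is
    written instead, which is the option under consideration.
    """
    task_dir = Path(task_dir)
    source = task_dir / f"{task_name}-{chosen_annotator}"
    suffix = f"-{adjudicator}" if adjudicator else ""
    target = task_dir / f"{task_name}{suffix}"
    shutil.copyfile(source, target)
    return target
```

In both variants the annotators' own copies (story-1-john, story-1-terry) are left untouched, so they stay available for later use.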



That brings me to the facility I will try to add soon: calculating inter-annotator agreement for different levels of annotation.
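
To give an idea of what such a calculation involves, here is a minimal sketch of Cohen's kappa for two annotators. Which agreement measure the facility will use is not decided here, so kappa is an assumption on my part, as are the function name and the sample labels.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Proportion of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical chunk labels taken from story-1-john and story-1-terry.
john = ["NP", "VP", "NP", "PP", "NP", "VP"]
terry = ["NP", "VP", "PP", "PP", "NP", "NP"]
print(round(cohen_kappa(john, terry), 3))  # 11/23, about 0.478
```

The same function could be applied separately to each level of annotation (for example, tags, chunk labels, or dependency relations), as long as the two copies are aligned item by item.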

Also, note that the task mode of working is not yet available for the sentence alignment and word alignment interfaces. That is another item on the agenda.

Thursday, April 8, 2010

9th April, 2010

Version 0.4.1 released on Sourceforge.net.

There are the following major changes/additions in this release:


  1. Propbank Annotation is a separate application now, instead of being a mode of operation in the Syntactic Annotation Interface. The two-letter code is PB.

  2. Both the Propbank Annotation and the Syntactic Annotation interfaces can now work in two modes: the file mode and the task mode. The mode is specified in the file Sanchay/props/client-modes.txt, which you can change according to your needs.

  3. In the task mode, the dormant but previously active facilities of annotation comparison and task generation are available. They are quite simple and should be easy to use.

  4. In the task mode, new facilities include the ability to specify two users and two adjudicators for every annotation task. In the beginning, all the tasks (which have been created) are available to everyone, but as soon as two users claim them (by opening and saving them), they become unavailable to others. Only the adjudicators can use the comparison (annotation diff) facility. Look for a file like Sanchay/workspace/syn-annotation/Premchand/Premchand-list.txt, which lists the tasks belonging to the task group named Premchand (one of the greatest Hindi writers). There is a task properties file for every task in the task group, which specifies the details used by the annotation interface. Generating tasks means specifying the files to be annotated and automatically generating the task properties files and the task list file.



If you intend to use the task mode of annotation, which is what I would recommend, then you should try to use the Task Generation facility. You can access it in the task mode, when you select the Setup mode (the other modes being the Work mode and the Compare mode). Once you go through a task list file, such as the one mentioned above, and the task properties files generated in the Setup mode, you will get a fairly good idea about how the task mode operates.

The assumption for the task mode right now is that Sanchay will be located on one computer (preferably Linux: I am not sure whether it can work on Windows for multiple users) and that computer will have accounts for every user who is going to be involved in the annotation process. To make sure that Sanchay is accessible to all these users (write permissions are needed for some properties files), one simple way is to create a group and give read and write permissions to that group for the whole Sanchay directory, except for the files to which you want restricted access, e.g. the task list file and the client mode file.
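
As a rough sketch of the permission setup, the Python function below adds group read and write permissions to everything under a directory, while leaving a set of restricted files read-only for the group. The function name and the choice of read-only (rather than no access) for restricted files are assumptions. The sketch only changes permission bits: creating the group and assigning the directory to it (for example with groupadd and chgrp) has to be done separately.

```python
import os
import stat
from pathlib import Path


def open_to_group(root, restricted_names=()):
    """Give the group read/write access under root, except for restricted files."""
    restricted = set(restricted_names)
    for dirpath, _dirnames, filenames in os.walk(root):
        directory = Path(dirpath)
        mode = stat.S_IMODE(directory.stat().st_mode)
        # Directories also need the execute bit so that they can be entered.
        directory.chmod(mode | stat.S_IRGRP | stat.S_IWGRP | stat.S_IXGRP)
        for name in filenames:
            path = directory / name
            mode = stat.S_IMODE(path.stat().st_mode)
            if name in restricted:
                # Assumption: restricted files stay readable but not writable.
                path.chmod((mode | stat.S_IRGRP) & ~stat.S_IWGRP)
            else:
                path.chmod(mode | stat.S_IRGRP | stat.S_IWGRP)
```

A call might look like open_to_group("Sanchay", {"client-modes.txt", "Premchand-list.txt"}), run by the owner of the Sanchay directory.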

Thursday, April 1, 2010

2nd April, 2010

Update 1.2. Released only here. Includes the following changes:


  1. In the Propbank Annotation Interface (Syntactic Annotation in the Propbank mode), it is now possible to navigate by word stem and tag, for which files have to be provided. The default location of these files is Sanchay/workspace/syn-annotation. The file for word stems is named word-navigation-list.txt and the one for tags is named tag-navigation-list.txt. The navigation works on the combination of word stem and tag. For example, you can annotate all the main verbs (tag regex "^VM") with the stem कर (kara: do).

  2. In the Propbank mode, some of the dependency structure information will be hidden so that Propbank annotation can be performed without being biased by the dependency structure information and also to ensure that there is less chance of changing this information by mistake. You can still view the dependency tree in the tree visualizer and change it by drag and drop, but that will be fixed in one of the following updates.
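
The stem plus tag navigation from item 1 can be pictured as a simple filter over the tokens of a document. The sketch below is an illustration only: the token representation (word, stem, tag), the function name, and the idea that the two navigation files are loaded as plain lists of stems and tag regexes are all assumptions, not the actual file format or implementation.

```python
import re


def matching_positions(tokens, stems, tag_patterns):
    """Return the indices of tokens whose stem is listed and whose tag matches a regex."""
    stems = set(stems)
    tag_regexes = [re.compile(pattern) for pattern in tag_patterns]
    return [
        index
        for index, (_word, stem, tag) in enumerate(tokens)
        if stem in stems and any(regex.search(tag) for regex in tag_regexes)
    ]


# Hypothetical tokens as (word, stem, tag) triples.
tokens = [
    ("राम", "राम", "NNP"),
    ("कर", "कर", "NN"),      # the noun 'tax': same stem, but not a main verb
    ("करता", "कर", "VM"),
    ("है", "है", "VAUX"),
]
print(matching_positions(tokens, ["कर"], ["^VM"]))  # [2]
```

Navigation would then mean jumping from one matching position to the next, skipping everything else.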



In the previous (1.1) update, some further changes were made to the tokenizer so that it no longer splits a sentence at bullet points (in the case where the bullets are decimal numbers).
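
The kind of check involved can be sketched as follows. This is not the Sanchay tokenizer, only a toy whitespace-based splitter with an assumed rule: a token that looks like a numbered bullet ("1.", "2.3.") is never treated as the end of a sentence.

```python
import re

# A token such as "1." or "2.3." is assumed to be a bullet, not a sentence end.
BULLET = re.compile(r"^\d+(\.\d+)*\.$")
SENTENCE_END = (".", "?", "!", "।")


def split_sentences(text):
    """Split text into sentences without breaking after numbered bullets."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token.endswith(SENTENCE_END) and not BULLET.match(token):
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


print(split_sentences("1. Open the file. 2. Save it."))
# ['1. Open the file.', '2. Save it.']
```

A rule this simple has an obvious cost: a sentence that really ends in a bare number (for example "released in 2010.") will not be split either, so a real tokenizer needs more context than this.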