Saturday, May 21, 2011

An incomplete list of some recent goings-on:

  • The Sanchay website has been moved to a different server, for two main reasons. First, certain things (like online Java applications) can't be hosted on the earlier host (where the site was previously hosted), and the online apps were already hosted on another server. Second, the earlier host doesn't allow outgoing emails, which means that if I create an account on the site for some user (where the user is not already a user of that host), the confirmation mail won't be sent to that user.
  • In the Syntactic Annotation Interface, it is now possible to build a dependency tree directly on lexical items (words), rather than on chunks. You can, of course, still use the chunk mode, which is the one being used for the major treebank projects for Indian languages.
  • There have been some extensions to the corpus query language (Sandhaan). The website for Sandhaan is also being moved, though there isn't much content there.
  • There is now a facility that gives you accumulated statistics for syntactically annotated data. You can query it for specific words, tags, relations, etc. For example, you can check which tags have been given so far to a selected word. You can also do the opposite, i.e., see which words have been assigned a given tag (say, the tag given to the current word in the Syntactic Annotation Interface). The same works for chunks and chunk tags, as well as for chunks and chunk relations, etc.
  • There is now a version of the validation tool that uses Sandhaan queries instead of programs or scripts written in Perl or some other language.
  • All these changes were made some time ago and many things have happened since then, so I am having trouble remembering what other changes there were ...
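The two-way lookup in the statistics facility (which tags a word has received, and which words have received a tag) can be pictured as a pair of counters indexed in both directions. This is only a minimal Python sketch under assumptions, not how Sanchay actually implements it: the class name `PairStats`, its methods, and the sample words and tags are all hypothetical.

```python
from collections import Counter, defaultdict


class PairStats:
    """Hypothetical accumulator of co-occurrence counts between two kinds
    of annotation (e.g. word and tag, or chunk and chunk relation),
    queryable in both directions."""

    def __init__(self):
        self._forward = defaultdict(Counter)   # key -> Counter of values seen with it
        self._backward = defaultdict(Counter)  # value -> Counter of keys seen with it

    def add(self, key, value):
        """Record one occurrence of `key` annotated with `value`."""
        self._forward[key][value] += 1
        self._backward[value][key] += 1

    def values_for(self, key):
        """E.g. which tags a word has been given so far, most frequent first."""
        return self._forward.get(key, Counter()).most_common()

    def keys_for(self, value):
        """E.g. which words have been assigned a given tag, most frequent first."""
        return self._backward.get(value, Counter()).most_common()


# Hypothetical (word, tag) pairs standing in for annotated data.
words_tags = PairStats()
for word, tag in [("book", "NN"), ("read", "VM"), ("book", "NN"), ("book", "VM")]:
    words_tags.add(word, tag)

print(words_tags.values_for("book"))  # [('NN', 2), ('VM', 1)]
print(words_tags.keys_for("VM"))      # [('read', 1), ('book', 1)]
```

The same class could be reused with chunks as keys and chunk tags or chunk relations as values; keeping both directions updated on every `add` makes either kind of query a single dictionary lookup.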