Re: Commit: OTS dictionaries

From: Nadav Rotem (nadavrotem@mail.ru)
Date: Mon Jul 14 2003 - 09:16:03 EDT


    > CVS:
    > ----------------------------------------------------------------------
    > CVS: Enter Log. Lines beginning with `CVS:' are
    > removed automatically
    > CVS:
    > CVS: Committing in .
    > CVS:
    > CVS: Modified Files:
    > CVS: dic/de.dic dic/hu.dic dic/nl.dic dic/pt.dic
    > CVS:
    > ----------------------------------------------------------------------
    >
    > Additions to OTS dictionary files.
    > I've also sorted each of these dictionaries in the
    > correct sorting order for their locale. Please try to
    > keep them this way as it makes them much easier to
    > check.
    >
    > Somebody please check the Portuguese file. I removed
    > several English words, but there seem to be more that I
    > wasn't 100% sure about, and also some that look like
    > broken English contractions such as "doesn". Also, this
    > file seems to have some entries entered twice, once
    > capitalized and once not.
    >
    > Some of the files contain an entry "000" - is this
    > there on purpose? I haven't read all the docs yet...

    No need for both upper and lower case.
    When a number such as 1,000,000 is parsed, the parser treats "000" as a
    word, so the "000" entry is there on purpose.
    If an article talks about the age of 17, then the number 17 is an
    important idea, but 000 is not.
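
    To illustrate the point about "000", here is a rough Python sketch. It is
    not the OTS parser itself; tokenize(), significant_words() and the
    IGNORED_WORDS set are made-up names, and the splitting rule (break on
    anything that is not a letter or digit) and the lower-casing of tokens are
    assumptions about how such a parser might behave.

```python
import re

# Hypothetical stand-in for a few dictionary entries; not the real file contents.
IGNORED_WORDS = {"000", "the", "of", "is", "a", "to"}


def tokenize(text):
    """Split on any run of non-alphanumeric characters and lower-case the result.

    Under this (assumed) rule the commas in "1,000,000" act as separators,
    so the number comes out as the tokens "1", "000", "000".
    """
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


def significant_words(text):
    """Drop tokens listed in the dictionary, keeping the rest in order."""
    return [tok for tok in tokenize(text) if tok not in IGNORED_WORDS]


if __name__ == "__main__":
    print(tokenize("1,000,000"))
    print(significant_words("The age of 17 is a threshold; 1,000,000 is not."))
```

    With "000" in the list, the fragments of a large number are dropped while
    a number like 17 survives. If tokens are lower-cased before lookup, as
    assumed here, one lower-case entry per word is enough.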


