wordStem                package:Rstem                R Documentation

_G_e_t _t_h_e _c_o_m_m_o_n _r_o_o_t/_s_t_e_m _o_f _w_o_r_d_s

_D_e_s_c_r_i_p_t_i_o_n:

     This function computes the stem of each word in the given
     character vector. Stemming reduces a word to its base component,
     making it easier to compare related words such as win, winning,
     and winner. See <URL: http://snowball.tartarus.org/> for more
     information about the concept and algorithms for stemming.

_U_s_a_g_e:

     wordStem(words, language = character())

_A_r_g_u_m_e_n_t_s:

   words: a character vector of words whose stems are to be computed.

language: the name of a recognized language for the package. This
          should be either a single string that is an element of the
          vector returned by 'getStemLanguages', or a character vector
          of length 3 giving the names of the routines for creating a
          Snowball 'SN_env' environment, closing it, and performing
          the stem (in that order). See the examples below.
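
     The two accepted shapes of 'language' can be checked before
     calling 'wordStem'. The following is a minimal sketch, not part
     of Rstem: 'checkLanguage' is a hypothetical helper, and the
     'known' vector used here is a made-up stand-in for the result of
     'getStemLanguages'.

```r
# Hypothetical helper, not part of Rstem: classify a 'language' argument
# according to the shapes described in the Arguments section.
# 'known' should be the vector returned by getStemLanguages().
checkLanguage <- function(language, known) {
  stopifnot(is.character(language), is.character(known))
  n <- length(language)
  if (n == 0L) {
    return("default")            # empty vector, the default shown in Usage
  }
  if (n == 1L) {
    if (!(language %in% known)) {
      stop("unrecognized language: ", language)
    }
    return("built-in")
  }
  if (n == 3L) {
    return("routines")           # create, close, stem (in that order)
  }
  stop("'language' must have length 0, 1 or 3")
}

# Stand-in for getStemLanguages(); the real vector may differ.
known <- c("english", "french")

kinds <- c(
  checkLanguage("english", known),
  checkLanguage(c("testDynCreate", "testDynClose", "testDynStem"), known)
)
print(kinds)
```

     In real use, 'known' would be replaced by a call to
     'getStemLanguages()' so that the check tracks the languages
     actually compiled into the package.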

_D_e_t_a_i_l_s:

     This uses Dr. Martin Porter's stemming algorithm and the interface
     generated by Snowball <URL: http://snowball.tartarus.org/>.

_V_a_l_u_e:

     A character vector with as many elements as the input vector;
     each element is the stem of the corresponding input word.
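
     Because the result lines up element by element with the input,
     it can be used directly as a grouping key. The sketch below is
     an illustration only: 'groupByStem' and 'toyStem' are
     hypothetical names, and 'toyStem' is a crude stand-in that is
     not the Porter algorithm. It is there so the example runs
     without Rstem installed.

```r
# Hypothetical helper, not part of Rstem: group words that share a stem.
# 'stemmer' is any function that maps a character vector to a character
# vector of the same length, for example Rstem::wordStem.
groupByStem <- function(words, stemmer) {
  stems <- stemmer(words)
  stopifnot(length(stems) == length(words))   # relies on the 1:1 alignment
  split(words, stems)
}

# Toy stand-in stemmer for illustration only: strips a trailing
# "ning" or "s". It is NOT the Porter algorithm.
toyStem <- function(w) sub("(ning|s)$", "", w)

groups <- groupByStem(c("win", "winning", "wins", "cat", "cats"), toyStem)
print(groups)

# With Rstem installed, the real stemmer can be passed instead.
if (requireNamespace("Rstem", quietly = TRUE)) {
  print(groupByStem(c("win", "winning", "winner"), Rstem::wordStem))
}
```

     Passing the stemmer as an argument keeps the grouping logic
     independent of the language or routines chosen for 'wordStem'.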

_A_u_t_h_o_r(_s):

     Duncan Temple Lang <duncan@wald.ucdavis.edu>

_R_e_f_e_r_e_n_c_e_s:

     See <URL: http://snowball.tartarus.org/>

_E_x_a_m_p_l_e_s:

      # Simple example; the result is
      # "win"    "win"    "winner"
      wordStem(c("win", "winning", "winner"))

      # Test against the vocabulary supplied with the package.
      testWords = readLines(system.file("words", "english", "voc.txt", package = "Rstem"))
      validate = readLines(system.file("words", "english", "output.txt", package = "Rstem"))

     ## Not run: 
      # Read the test words directly from the snowball site over the Web
      testWords = readLines(url("http://snowball.tartarus.org/english/voc.txt"))
     ## End(Not run)

      testOut = wordStem(testWords)
      all(validate == testOut)

      # Specify the language from one of the built-in languages.
      testOut = wordStem(testWords, "english")
      all(validate == testOut)

      # Illustrate the dynamic lookup of symbols, which makes it easy to
      # add new languages or to supply custom routines for creating and
      # closing environments (for example, to manage a pool of
      # environments if efficiency were an issue).
      testOut = wordStem(testWords, c("testDynCreate", "testDynClose", "testDynStem"))
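
     The 'all(validate == testOut)' checks above report only whether
     every stem matched. When a check fails, it can help to list the
     words that differ. The following is a minimal sketch:
     'reportMismatches' is a hypothetical helper, and the 'actual'
     vector is made-up data, not output from 'wordStem'.

```r
# Hypothetical helper, not part of Rstem: list the words whose computed
# stem differs from the expected stem.
reportMismatches <- function(words, expected, actual) {
  stopifnot(length(words) == length(expected),
            length(expected) == length(actual))
  bad <- which(expected != actual)
  data.frame(word = words[bad],
             expected = expected[bad],
             actual = actual[bad],
             stringsAsFactors = FALSE)
}

words    <- c("win", "winning", "winner")
expected <- c("win", "win", "winner")
actual   <- c("win", "winn", "winner")   # made-up output with one error

mismatches <- reportMismatches(words, expected, actual)
print(mismatches)
```

     With the vectors from the examples above, the call would be
     'reportMismatches(testWords, validate, testOut)'; an empty data
     frame means every stem matched.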

