Including all tokens in the term-document matrix in the R tm package
I'm trying to make a term-document matrix with the TermDocumentMatrix function of the tm package in R, and found that some words are not included.
> library(tm)
> tdm <- TermDocumentMatrix(Corpus(VectorSource("The book is of great importance.")))
> rownames(tdm)
[1] "book"        "great"       "importance." "the"
Here, the words "is" and "of" have been excluded from the matrix. If the corpus includes only the deleted words, it gives the following message.
> tdm <- TermDocumentMatrix(Corpus(VectorSource("of of is")))
Warning message:
In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
> rownames(tdm)
NULL
The message signals that "is" and "of" are deleted before the matrix is built, but I have not been able to figure out why this occurs or how I can include all tokens in the corpus.
Any help is appreciated.
Use the control argument of TermDocumentMatrix:
require(tm)
tdm <- TermDocumentMatrix(Corpus(VectorSource("of of is")),
                          control = list(stopwords = FALSE, wordLengths = c(0, Inf)))
rownames(tdm)
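As for why this happens: as far as I know, the wordLengths control option defaults to c(3, Inf), so any token shorter than three characters is discarded, while stopwords already defaults to FALSE (which would explain why "the" survived in the question's output). The sketch below compares the default call with one that only lowers the minimum word length. The sentence is the one from the question; the variable names (corp, tdm_default, tdm_all, dropped) and the value c(1, Inf) are my own choices for illustration, and the exact tokens may differ a little between tm versions.

```r
library(tm)

# Same example sentence as in the question.
corp <- Corpus(VectorSource("The book is of great importance."))

# Default control: with wordLengths at its default of c(3, Inf),
# tokens shorter than three characters are discarded.
tdm_default <- TermDocumentMatrix(corp)

# Lower only the minimum word length; stopwords is left at its default.
# c(1, Inf) is an illustrative choice that keeps every non-empty token.
tdm_all <- TermDocumentMatrix(corp, control = list(wordLengths = c(1, Inf)))

# Terms present only in the second matrix are the ones the length filter removed.
dropped <- setdiff(rownames(tdm_all), rownames(tdm_default))
print(dropped)
```

If dropped contains only the short words, the length filter rather than stopword removal is what excluded them, and setting wordLengths alone should be enough to keep every token.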