Including all tokens in the term-document matrix in the R tm package


I'm trying to make a term-document matrix with the TermDocumentMatrix function of the tm package in R, and found that some words are not included.

> library(tm)
> tdm <- TermDocumentMatrix(Corpus(VectorSource("the book is of great importance.")))
> rownames(tdm)
[1] "book"        "great"       "importance." "the"

Here, the words "is" and "of" have been excluded from the matrix. If the corpus includes only the deleted words, it gives the following message.

> tdm <- TermDocumentMatrix(Corpus(VectorSource("of of is")))
Warning message:
In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
> rownames(tdm)
NULL

The message signals that "is" and "of" are deleted before the matrix is built, but I have not been able to figure out why this occurs and how I can include all tokens in the corpus.

Any help is appreciated.

Use the control argument of TermDocumentMatrix:

require(tm)
tdm <- TermDocumentMatrix(Corpus(VectorSource("of of is")),
                          control = list(stopwords = FALSE, wordLengths = c(0, Inf)))
rownames(tdm)
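
As for why the words disappear: the tm documentation for termFreq gives wordLengths a default of c(3, Inf), so any token shorter than three characters is discarded before the matrix is built. The sketch below is only an illustration, assuming a reasonably recent tm version and using a sentence like the one in the question; the variable names (corp, tdm_default, tdm_all) are made up for this example.

```r
library(tm)

# One-document corpus built from a sentence like the one in the question.
corp <- Corpus(VectorSource("the book is of great importance."))

# Default control: wordLengths is c(3, Inf), so "is" and "of" are dropped.
tdm_default <- TermDocumentMatrix(corp)

# Lower the minimum term length to 1 so that short tokens are kept as well.
tdm_all <- TermDocumentMatrix(corp, control = list(wordLengths = c(1, Inf)))

# Compare the terms that survive under each setting.
print(rownames(tdm_default))
print(rownames(tdm_all))
```

Since it is the length filter that removes "is" and "of", setting wordLengths alone should be enough here. The stopwords = FALSE entry in the answer matches the documented default and is harmless to keep.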
