python - Numpy Genfromtxt slower than pandas read_csv -


I'm loading a CSV file (if you want the specific file, it's the training CSV from http://www.kaggle.com/c/loan-default-prediction). Loading the CSV in numpy takes dramatically more time than in pandas.

timeit("genfromtxt('train_v2.csv', delimiter=',')", "from numpy import genfromtxt", number=1)
102.46608114242554

timeit("pandas.io.parsers.read_csv('train_v2.csv')", "import pandas", number=1)
13.833590984344482
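A self-contained sketch of the same comparison, using a small synthetic CSV so it runs anywhere (the file name `sample.csv` and its size are assumptions, not the Kaggle data):

```python
import timeit

import numpy as np
import pandas as pd

# Build a small numeric CSV so the benchmark is self-contained.
rows, cols = 10_000, 10
data = np.random.default_rng(0).random((rows, cols))
np.savetxt("sample.csv", data, delimiter=",")

# Time a single parse with each reader, mirroring the measurements above.
t_numpy = timeit.timeit(
    lambda: np.genfromtxt("sample.csv", delimiter=","), number=1)
t_pandas = timeit.timeit(
    lambda: pd.read_csv("sample.csv", header=None), number=1)

print(f"genfromtxt: {t_numpy:.3f}s  read_csv: {t_pandas:.3f}s")
```

Even at this small size, the gap is visible: genfromtxt parses in pure Python, while read_csv uses a C tokenizer.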

I'll also mention that numpy's memory usage fluctuates more wildly, peaks higher, and stays higher once the data is loaded (2.49 GB for numpy vs. ~600 MB for pandas). All datatypes in pandas are 8 bytes, so differing dtypes are not the source of the difference. Nor did I get near maxing out memory, so the time difference cannot be ascribed to paging.
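A quick way to confirm the "8 bytes per value" point is to compare steady-state sizes after loading: both readers end up with float64 payloads, so the divergence has to come from parse-time behavior rather than the final representation. A minimal sketch (file name and sizes are made up for illustration):

```python
import numpy as np
import pandas as pd

rows, cols = 1_000, 10
np.savetxt("mem_sample.csv",
           np.random.default_rng(0).random((rows, cols)), delimiter=",")

arr = np.genfromtxt("mem_sample.csv", delimiter=",")
df = pd.read_csv("mem_sample.csv", header=None)

# Both store float64 (8 bytes per value), so the settled payloads match;
# the peak-memory difference comes from how each parser buffers while reading.
print(arr.dtype, arr.nbytes)
print(df.dtypes.unique(), df.memory_usage(index=False).sum())
```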

Is there any reason for the difference? Is genfromtxt just way less efficient (and leaks a bunch of memory)?
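If the end goal is an ndarray, a common workaround is to parse with pandas' C reader and then hand the result to numpy. A hedged sketch (in the pandas 0.13 era the attribute was `.values`; `DataFrame.to_numpy()` is the modern spelling):

```python
import numpy as np
import pandas as pd

rows, cols = 1_000, 5
np.savetxt("convert_sample.csv",
           np.random.default_rng(1).random((rows, cols)), delimiter=",")

# Parse with the fast C reader, then convert to a plain ndarray.
arr = pd.read_csv("convert_sample.csv", header=None).to_numpy()
print(arr.shape, arr.dtype)
```

The resulting array holds the same values genfromtxt would produce, at a fraction of the parse time.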

edit:

numpy version 1.8.0

pandas version 0.13.0-111-ge29c8e8
