python - NumPy genfromtxt slower than pandas read_csv


I'm loading a CSV file (if you want the specific file, it's the training CSV from http://www.kaggle.com/c/loan-default-prediction). Loading the CSV with NumPy takes dramatically more time than with pandas:

    >>> timeit("genfromtxt('train_v2.csv', delimiter=',')", "from numpy import genfromtxt", number=1)
    102.46608114242554
    >>> timeit("pandas.io.parsers.read_csv('train_v2.csv')", "import pandas", number=1)
    13.833590984344482
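A minimal sketch of the same benchmark on a small synthetic file (a stand-in for `train_v2.csv`, which is several hundred MB; the file name and sizes here are made up for illustration):

```python
import os
import tempfile
import timeit

import numpy as np
import pandas as pd

# Build a small all-numeric CSV as a stand-in for the Kaggle file.
rows, cols = 2000, 20
data = np.random.rand(rows, cols)
path = os.path.join(tempfile.gettempdir(), "bench_demo.csv")
np.savetxt(path, data, delimiter=",")

# Time both parsers on the same file.
t_np = timeit.timeit(lambda: np.genfromtxt(path, delimiter=","), number=3)
t_pd = timeit.timeit(lambda: pd.read_csv(path, header=None), number=3)
print(f"genfromtxt: {t_np:.3f}s  read_csv: {t_pd:.3f}s")
```

On a file this small the absolute numbers are tiny, but the same ordering tends to hold, since `read_csv` uses a C tokenizer while `genfromtxt` is pure Python.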

I'll also mention that NumPy's memory usage fluctuates more wildly, peaks higher, and stays higher once the data is loaded (2.49 GB for NumPy vs ~600 MB for pandas). All datatypes in pandas are 8 bytes, so differing dtypes are not the difference. I got nowhere near maxing out memory, so the time difference cannot be ascribed to paging.
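A quick sketch of how to confirm the "8 bytes per value" point: for an all-numeric CSV both loaders settle on `float64`, so the steady-state storage per value is identical, and the gap described above is peak memory during parsing rather than the final dtype. (The array here is a small stand-in, not the Kaggle data.)

```python
import numpy as np
import pandas as pd

# Stand-in for the parsed data: 1000 rows x 10 float columns.
arr = np.random.rand(1000, 10)
df = pd.DataFrame(arr)

# Both sides store 8-byte float64 values: 1000 * 10 * 8 = 80000 bytes.
print(arr.dtype, arr.nbytes)                                    # float64 80000
print(df.dtypes.unique(), df.memory_usage(index=False).sum())   # [float64] 80000
```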

Any reason for the difference? Is genfromtxt just far less efficient (and does it leak a bunch of memory)?
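One hedged workaround, if the end goal is a plain NumPy array rather than a DataFrame: parse with the faster pandas reader and convert via `.values` (available in the pandas 0.13 used here; newer pandas also offers `.to_numpy()`). Sketched on an inline stand-in for the CSV contents:

```python
from io import StringIO

import numpy as np
import pandas as pd

# Stand-in for the contents of train_v2.csv.
csv_text = "1.0,2.0,3.0\n4.0,5.0,6.0\n"

# Parse with pandas, then hand back a float64 ndarray,
# the same layout genfromtxt would produce for numeric data.
arr = pd.read_csv(StringIO(csv_text), header=None).values
print(arr.dtype, arr.shape)
```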

Edit:

numpy version 1.8.0

pandas version 0.13.0-111-ge29c8e8

