python - Numpy Genfromtxt slower than pandas read_csv
I'm loading a CSV file (if you want the specific file, it's the training CSV from http://www.kaggle.com/c/loan-default-prediction). Loading the CSV in NumPy takes dramatically more time than in pandas.
timeit("genfromtxt('train_v2.csv', delimiter=',')", "from numpy import genfromtxt", number=1)
102.46608114242554

timeit("pandas.io.parsers.read_csv('train_v2.csv')", "import pandas", number=1)
13.833590984344482
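For anyone wanting to reproduce the comparison without downloading the ~7 GB Kaggle file, here is a minimal sketch that benchmarks both parsers on a synthetic in-memory CSV. The data shape, row count, and number of repetitions are arbitrary choices for illustration, not values from the original benchmark:

```python
import io
import timeit
import numpy as np
import pandas as pd

# Synthetic numeric CSV standing in for train_v2.csv (assumption: the real
# file is mostly float columns, so random floats are a reasonable proxy).
rows, cols = 1000, 20
rng = np.random.default_rng(0)
data = rng.random((rows, cols))
csv_text = "\n".join(",".join(f"{x:.6f}" for x in row) for row in data)

def load_numpy():
    # Fresh StringIO each call, since parsers consume the stream.
    return np.genfromtxt(io.StringIO(csv_text), delimiter=",")

def load_pandas():
    return pd.read_csv(io.StringIO(csv_text), header=None).to_numpy()

t_np = timeit.timeit(load_numpy, number=3)
t_pd = timeit.timeit(load_pandas, number=3)
print(f"genfromtxt: {t_np:.3f}s  read_csv: {t_pd:.3f}s")
```

The absolute numbers will differ from the question's (tiny synthetic file, different hardware), but the gap in favor of read_csv should already be visible at this size.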
I'll also mention that NumPy's memory usage fluctuates more wildly, peaks higher, and stays higher once the data is loaded (2.49 GB for NumPy vs. ~600 MB for pandas). All datatypes in pandas are 8 bytes, so differing dtypes are not the difference. I got nowhere near maxing out memory, so the time difference cannot be ascribed to paging.
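To make the memory comparison concrete, here is a small sketch of how the two loaders' in-memory footprints can be inspected. The tiny synthetic CSV is my own stand-in; `nbytes` and `memory_usage` report only the steady-state footprint, not the transient peaks described above:

```python
import io
import numpy as np
import pandas as pd

# Small all-numeric CSV; every parsed value ends up as a float64 (8 bytes).
csv_text = "\n".join(
    ",".join(str(float(i * 10 + j)) for j in range(10)) for i in range(100)
)

arr = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
df = pd.read_csv(io.StringIO(csv_text), header=None)

print(arr.dtype, arr.nbytes)                  # dtype and bytes held by the array
print(df.memory_usage(deep=True).sum())       # pandas per-column accounting
```

With identical float64 data on both sides, the resident sizes are comparable, which supports the point that the 2.49 GB vs. ~600 MB gap comes from how genfromtxt parses (and what it allocates along the way), not from the final dtypes.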
Any reason for the difference? Is genfromtxt just that much less efficient (and does it leak a bunch of memory)?
Edit:
numpy version 1.8.0
pandas version 0.13.0-111-ge29c8e8