I was recently running into memory issues (related to not closing a multiprocessing.Pool) and came up with the following piece of code while debugging:
import pickle


class DynamoDataFrame:
    def render_fhandle(self, mode='wb'):
        return open(file=self.full_fpath, mode=mode)

    def __init__(self, full_fpath=None, df=None):
        self.full_fpath = full_fpath
        if df is not None:
            # Write the frame to disk right away; the with-block closes the handle.
            with self.render_fhandle() as fhandle:
                pickle.dump(obj=df, file=fhandle)

    def load_pickle(self):
        # Close the read handle after loading instead of leaking it.
        with self.render_fhandle(mode='rb') as fhandle:
            return pickle.load(file=fhandle)

    def __getitem__(self, item):
        # Load from disk, return the requested item, keep nothing cached.
        return self.load_pickle()[item]
Although this was unrelated to my pool issue, it is a useful piece of code.
What’s nice is that __getitem__ behaves like a Series or DataFrame accessor, but the data is only held in memory for the duration of the lookup! I tend to call these things “dynamo”, for dynamic.
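To make the access pattern concrete, here is a minimal usage sketch. It restates the class in compact form so that it runs on its own, and it makes two assumptions that are not in the original: a plain dict of columns stands in for a pandas DataFrame (so no pandas install is needed), and the file name table.pkl and the helper demo() are made up for illustration.

```python
import os
import pickle
import tempfile


class DynamoDataFrame:
    """Compact restatement of the class above, so this sketch runs on its own."""

    def __init__(self, full_fpath=None, df=None):
        self.full_fpath = full_fpath
        if df is not None:
            with open(self.full_fpath, 'wb') as fhandle:
                pickle.dump(df, fhandle)

    def load_pickle(self):
        with open(self.full_fpath, 'rb') as fhandle:
            return pickle.load(fhandle)

    def __getitem__(self, item):
        return self.load_pickle()[item]


def demo():
    # Hypothetical data: a dict of columns stands in for a pandas DataFrame.
    table = {'price': [1.0, 2.5, 4.0], 'qty': [10, 20, 30]}
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, 'table.pkl')  # illustrative file name
        dynamo = DynamoDataFrame(full_fpath=path, df=table)
        del table  # the wrapper holds only the path, not the data
        # Each lookup re-reads the pickle and returns just the requested column.
        return dynamo['price'], dynamo['qty']
```

The trade-off is that every lookup pays for a full unpickle of the file, so this pattern probably suits objects that are large but accessed rarely rather than ones read in a tight loop.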
https://gist.github.com/jasonmellone/511d80886f85331326d8c3bb74b32a41