In this tutorial, we will see how to use Django's DB models and ORM layer without the rest of the Django web framework. This is helpful when we want to avoid maintaining more than one ORM framework (Django and SQLAlchemy, for example) in a project that includes both a web application and standalone scripts.
I've been flirting with the idea of trying to shore up my machine learning knowledge by applying it to some interesting projects that would be useful to me personally. Being invested in the equity market, I think it would be good to have a personal database of market data rather than always relying on some website to show me statistics at some fixed granularity. It is always easier to write a small piece of custom code than to raise a feature request.
A major part of my frustration with machine learning material is that books and courses tend to teach you what to do with data that is already at hand, and little or nothing about how to collect it. In an online world, web scraping is an important way to do that. In Python, Scrapy is one of the go-to frameworks, but it does not handle JavaScript out of the box. Of course, we can fix that by adding another layer to the architecture (Splash, for example), but for a one-time task, I think we are better off with simpler solutions.
In my case, the website is BSE India. Historical day-wise data for every scrip listed there is available for download in CSV format. For this, one needs to:
- Know the security code
- Go to the Historical Data Page
- Set the timeline for which data is required (using a form)
- Click submit
- Click on the Download CSV link (a JS link), which exports the paginated, tabulated data to a CSV and opens a download prompt in the browser
- Save it to disk
While there might be other ways to deal with the JS and handle the download, I did it with Selenium and its Google Chrome driver.
Here's how:
[Some text missing]
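The original snippet is lost, but the steps above can be sketched with Selenium. Everything page-specific here is an assumption: the URL, the element IDs, and the link text are hypothetical placeholders to be matched against the live page.

```python
# Hedged sketch of the Selenium flow described above. The URL, element IDs
# and link text below are ASSUMED placeholders, not the real BSE page markup.
import os
import time

# Assumed location of the historical data page.
HISTORY_URL = "https://www.bseindia.com/markets/equity/EQReports/StockPrcHistori.aspx"


def download_history(scrip_code, from_date, to_date, download_dir):
    """Fill the historical-data form for one scrip and export the CSV."""
    # Imported here so the module loads even without a browser environment.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    # Have Chrome save downloads into our folder without prompting.
    options.add_experimental_option(
        "prefs", {"download.default_directory": os.path.abspath(download_dir)}
    )
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(HISTORY_URL)
        # Hypothetical element IDs; inspect the page to find the real ones.
        driver.find_element(By.ID, "scrip_code").send_keys(scrip_code)
        driver.find_element(By.ID, "from_date").send_keys(from_date)
        driver.find_element(By.ID, "to_date").send_keys(to_date)
        driver.find_element(By.ID, "submit").click()
        # The JS link that exports the tabulated data to CSV.
        driver.find_element(By.LINK_TEXT, "Download CSV").click()
        time.sleep(5)  # crude wait for the download to land on disk
    finally:
        driver.quit()
```

Looping this over a list of security codes fills the folder with one CSV per scrip.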
Now, with all the data downloaded, I thought I had what was required to put on my machine learning hat, delve into the time series data and come up with wonderful predictions about the fate of my stocks. Start simple, I thought.
The problem statement is:
Given a stock's data:
- Define a class to keep it
- Have methods to plot its various trends over different timelines and parameters
A folder full of CSV files named by their security codes is painful to search, sort and update. So, we would like to have the rows represented as objects. But loading all the data into memory to use a very small part of it would not only be wasteful, it would also mean doing all the searching and sorting myself. That would be reinventing things that already exist! No doubt it would be instructive, but right now we are interested in analytics, not database design, right? So we would be better off keeping things in a DB and operating on the DB rows (as objects). But the moment we start to talk about treating DB rows as objects, we run into the same "Don't Repeat Yourself" cycle. If we're just using a DB driver, we'd still need to convert the high-level requests into DB queries and convert the results into appropriate class objects. This would mean we'd have to:
- Be proficient in SQL queries
- Have at least two methods - to save (to_db) and to retrieve (to_representation)
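To make the pain concrete, here is a sketch of that hand-rolled approach with a bare DB driver (sqlite3 from the standard library). The DayQuote class, its fields and the table schema are illustrative assumptions; the point is that every class needs its own save/retrieve plumbing, hand-synchronized with the SQL.

```python
# Sketch of the boilerplate a bare DB driver forces on us: the DayQuote
# class and its schema are hypothetical, invented for illustration.
import sqlite3
from dataclasses import dataclass


@dataclass
class DayQuote:
    scrip_code: str
    date: str
    close: float

    def to_db(self, conn):
        # Hand-written INSERT, kept in sync with the schema by hand.
        conn.execute(
            "INSERT INTO quotes (scrip_code, date, close) VALUES (?, ?, ?)",
            (self.scrip_code, self.date, self.close),
        )

    @classmethod
    def to_representation(cls, row):
        # Hand-written row -> object conversion, one per class.
        return cls(*row)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (scrip_code TEXT, date TEXT, close REAL)")
DayQuote("500325", "2019-01-01", 1234.5).to_db(conn)
rows = conn.execute("SELECT scrip_code, date, close FROM quotes").fetchall()
quotes = [DayQuote.to_representation(r) for r in rows]
```

Every new class, and every schema change, means touching this plumbing again; that repetition is exactly what an ORM abstracts away.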
At this point, we would look for an abstraction layer that already does this for us in a DB-agnostic manner. Since I am already conversant with Django, I know what its ORM layer can do. So, I want to use Django's models and ORM without the rest of the web framework.