If you are wondering where to get data for your thesis, this article is for you.
Data come in all shapes and forms. If you are doing your thesis, you are in search of a proof of concept. In other words, you are attempting to prove the validity of an idea or concept, not to produce an industry-ready solution. Therefore, most of the time, you do not need particularly large datasets. However, you want to be sure that they are of sufficient quality. After all, you need to process them without introducing too much noise.
What follows is a selected set of resources. It is not exhaustive and I am pretty sure that you will be able to find more by searching the Internet and asking your supervisor. However, this list is a good starting point in case you are trying to build a research/thesis proposal or you are stuck and in search of an idea.
Google Dataset Search:
You guessed right. Google has a dedicated search engine for datasets. It is freely available and indexes data that are described using a particular schema.org format. Some of the data may be behind paywalls. However, academics and students can usually contact the data provider to get access to them, or to a selected portion that is sufficient for their study.
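To make the "schema.org format" a bit more concrete: publishers typically describe a dataset in a JSON-LD block embedded in the web page and typed as a schema.org `Dataset`. The sketch below uses only the Python standard library to pull such a block out of a page. The HTML snippet, the dataset name and the URL are invented for illustration, and real pages will vary.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def find_datasets(html):
    """Return the JSON-LD objects in the page whose @type is Dataset."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [b for b in parser.blocks
            if isinstance(b, dict) and b.get("@type") == "Dataset"]


# Hypothetical page, made up for this example.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Example rainfall measurements",
 "description": "Invented dataset used only to illustrate the markup.",
 "url": "https://example.org/rainfall"}
</script>
</head><body></body></html>
"""

datasets = find_datasets(SAMPLE_PAGE)
for d in datasets:
    print(d["name"], "->", d["url"])
```

Looking for this kind of block in the source of a data provider's page is a quick way to read a dataset's name, description and location in a structured form.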
EU Open Data Portal:
Once again, you guessed right. The European Union is committed to fostering a strong data transparency policy. At the time of writing this article, I was able to find 15399 different datasets that are freely available. They cover a large variety of topics and relate to the many matters under the jurisdiction of the EU.
Yahoo Finance:
- Link: https://finance.yahoo.com/
- Link to the help center: https://help.yahoo.com/kb/download-historical-data-yahoo-finance-sln2311.html
There are plenty of financial databases on the web. Yahoo Finance is the best known and the most straightforward. In case you are looking for daily quotations of various financial assets, this is the place for you. The second link explains how to download historical data.
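Once you have downloaded a historical-data file, a few lines of code are enough to start working with it. The sketch below assumes a CSV export with `Date` and `Close` columns (check the header of your own file, since the layout may differ) and computes simple daily returns. The prices are made up.

```python
import csv
import io


def daily_returns(csv_file):
    """Read the Date and Close columns and return (date, simple return) pairs."""
    rows = [(r["Date"], float(r["Close"])) for r in csv.DictReader(csv_file)]
    rows.sort(key=lambda r: r[0])  # ISO dates (YYYY-MM-DD) sort correctly as text
    returns = []
    for (_, prev_close), (date, close) in zip(rows, rows[1:]):
        returns.append((date, close / prev_close - 1.0))
    return returns


# Made-up sample standing in for a downloaded file. With a real file, use
# open("your_file.csv", newline="") instead of io.StringIO.
SAMPLE_CSV = io.StringIO(
    "Date,Open,High,Low,Close,Volume\n"
    "2020-01-02,100.0,102.0,99.0,100.0,1000\n"
    "2020-01-03,100.0,111.0,100.0,110.0,1200\n"
    "2020-01-06,110.0,110.0,98.0,99.0,900\n"
)

result = daily_returns(SAMPLE_CSV)
for date, r in result:
    print(f"{date}: {r:+.2%}")
```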
More on financial data:
- Link Thomson Reuters API: https://customers.reuters.com/developer/apis_tech.aspx
Some programming knowledge is required. An Application Programming Interface (API) lets you access the data of a website programmatically. Reuters has a dataset that includes financial news and press releases. All you have to do is write a few lines of code to access this information.
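To give a feel for what "a few lines of code" means, here is a minimal sketch of a programmatic request using only the Python standard library. The endpoint `https://api.example.com/news`, its `query` and `limit` parameters, the `Authorization` header and the shape of the response are all hypothetical. They are not the Reuters API, so consult the provider's documentation for the real details.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.com/news"


def build_url(query, limit=10):
    """Encode the search parameters into the request URL."""
    params = urllib.parse.urlencode({"query": query, "limit": limit})
    return f"{BASE_URL}?{params}"


def parse_headlines(payload):
    """Extract headlines from JSON shaped like {"items": [{"headline": ...}]}."""
    data = json.loads(payload)
    return [item["headline"] for item in data.get("items", [])]


def fetch_headlines(query, api_key, limit=10):
    """Send the request and return the headlines (needs network and a real API)."""
    request = urllib.request.Request(
        build_url(query, limit),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_headlines(response.read().decode("utf-8"))


# Offline demonstration with an invented response body.
sample_payload = (
    '{"items": [{"headline": "Example headline A"},'
    ' {"headline": "Example headline B"}]}'
)
url = build_url("central bank", limit=2)
headlines = parse_headlines(sample_payload)
print(url)
print(headlines)
```

Keeping the URL building and the parsing separate from the network call makes it possible to try the logic on a saved response before you have an API key.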
Kaggle.com, a community approach:
Kaggle offers its members the possibility to access and share data. In addition, it provides demo code for accessing and manipulating the data. The community is data science driven. However, some of the datasets do not require particular programming skills.
Get your data by yourself:
There is an ever-growing number of websites that can be accessed programmatically. ProgrammableWeb is a directory that tries to list them. In addition, it lists links to the resources you need to use them. Note that using the APIs will require some programming. However, you may end up creating original datasets, and this is already a valuable outcome of your thesis.
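Building an original dataset usually comes down to collecting the records an API returns and storing them in a tabular file that you can analyze later. A minimal sketch, assuming the records arrive as a list of JSON objects. The field names and values below are invented.

```python
import csv
import io
import json


def records_to_csv(records, fieldnames, out_file):
    """Write a list of dicts to CSV, keeping only the chosen fields."""
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    for record in records:
        # Missing fields become empty cells instead of raising an error;
        # fields not listed in fieldnames are dropped.
        writer.writerow({name: record.get(name, "") for name in fieldnames})


# Invented API response, standing in for what a real service would return.
api_response = json.loads(
    '[{"id": 1, "title": "First post", "likes": 12, "internal": "x"},'
    ' {"id": 2, "title": "Second post"}]'
)

buffer = io.StringIO()  # with a real file: open("dataset.csv", "w", newline="")
records_to_csv(api_response, ["id", "title", "likes"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Deciding up front which fields to keep, and how to treat missing values, is part of documenting how your dataset was built.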
Where to get data for your thesis? In summary:
I hope that by now you have realized that datasets are far from being a scarce resource. This collection of resources can serve as a starting point for finding what you are looking for, or for refining and developing your research idea.
This article (Where to get Data: a collection of resources for your thesis) is part of the miniseries on how to do a good thesis. You can see the full list of posts at the following link: