Many of the latest data science innovations happen on Kaggle. In addition to competitions, Kaggle hosts a large collection of datasets from various fields. The easiest way to interact with Kaggle is through its public API, using the command-line tool (CLI). Setting it up outside of Kaggle kernels is one of the first tasks. In this post, I will guide you through that process.
Prerequisite: Python 3 (>3.6) and the latest pip installed.
Installation
Terminal
pip install --user kaggle
Tip: Install the kaggle package inside your conda ML development environment rather than outside of it or in the base env. Don’t run sudo pip install kaggle, as it would require admin privileges for every run.
Download API token
- Create or log in to your Kaggle account.
- From the site header, click on your user profile picture and select Account. You will land on your profile with the Account tab active.
- Scroll down to the API section. Click Create New API Token. A json file will be downloaded to your default download directory.
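Before moving the token anywhere, it can help to confirm that the download is intact. Here is a minimal sketch, assuming the file is plain JSON with a username field and a key field; the function name token_looks_valid and the Downloads path are my own choices for illustration, not part of Kaggle's tooling.

```python
import json
from pathlib import Path


def token_looks_valid(path):
    """Return True if the file is JSON with non-empty 'username' and 'key' strings."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing file, unreadable file, or malformed JSON.
        return False
    if not isinstance(data, dict):
        return False
    return all(
        isinstance(data.get(field), str) and data.get(field)
        for field in ("username", "key")
    )


if __name__ == "__main__":
    # Assumed download location; adjust to wherever your browser saved the file.
    candidate = Path.home() / "Downloads" / "kaggle.json"
    print(candidate, "looks valid:", token_looks_valid(candidate))
```

This only checks the shape of the file. It does not contact Kaggle, so a well-formed but expired token will still pass.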
Move the .json file to the correct location
- Move it to .kaggle in the home directory. Create the directory if it is absent.
Terminal
cd
mkdir -p ~/.kaggle
mv <location>/kaggle.json ~/.kaggle/kaggle.json
- For your security, ensure that other users of your computer do not have read access to your credentials. On Unix-based systems you can do this with the following command:
Terminal
chmod 600 ~/.kaggle/kaggle.json
- Restart the terminal and, if necessary, activate the env where the kaggle package is installed.
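If you prefer to script the move and the permission change, the two steps above can be sketched in Python as follows. This is only an illustration under my own assumptions: install_token is a hypothetical helper name, and the home parameter exists so you can try it on a scratch directory first.

```python
import os
import shutil
import stat
from pathlib import Path


def install_token(src, home=None):
    """Move a downloaded kaggle.json into <home>/.kaggle and restrict it to the owner."""
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    target = target_dir / "kaggle.json"
    shutil.move(str(src), str(target))
    # Owner read/write only, the equivalent of chmod 600 on Unix-based systems.
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target
```

For example, install_token("/path/to/Downloads/kaggle.json") would place the file at ~/.kaggle/kaggle.json. On Windows the chmod call has only a limited effect, so treat the permission step as Unix-specific.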
Check if it is properly installed
- Run:
Terminal
$ python
>>> import kaggle
Importing kaggle shouldn’t return an error. If there is an error, check whether you’re in the right env where kaggle is installed.
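To tell a missing package apart from other import problems, you can ask Python whether it can locate the module without actually importing it. A minimal sketch using the standard library; is_installed is a name I made up for this example. As far as I know, importing kaggle also tries to load your credentials, so an import error may come from a missing token rather than a missing package, but treat that as my assumption.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # sys.executable shows which interpreter (and therefore which env) is active.
    print("Interpreter:", sys.executable)
    print("kaggle installed:", is_installed("kaggle"))
```

If the interpreter path printed here is not inside the env you expected, activate the right env and try again.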
If there is no error, exit the Python shell and type the following command in the terminal.
Terminal
kaggle competitions list
If installed properly, the command will list the available competitions. If not, the binary path may be incorrect. The executable is usually installed in ~/.local/bin. Try using:
Terminal
~/.local/bin/kaggle competitions list
- If the above command works, add that directory to your PATH in the shell configuration (e.g. ~/.bashrc) so that you can use just kaggle next time.
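The PATH check described above can also be done from Python. The sketch below looks for the executable on the current PATH first and then falls back to ~/.local/bin; find_cli and its fallback_dirs parameter are hypothetical names for illustration only.

```python
import os
import shutil
from pathlib import Path


def find_cli(name="kaggle", fallback_dirs=None):
    """Return the full path of an executable, or None if it cannot be found."""
    found = shutil.which(name)  # search the current PATH first
    if found:
        return found
    if fallback_dirs is None:
        # Usual location for pip install --user on Linux.
        fallback_dirs = [Path.home() / ".local" / "bin"]
    search_path = os.pathsep.join(str(d) for d in fallback_dirs)
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    location = find_cli()
    if location is None:
        print("kaggle executable not found")
    else:
        print("kaggle executable:", location)
```

If this reports a path under ~/.local/bin while the plain kaggle command fails in your shell, that directory is most likely missing from your PATH.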
API usage
It is time to use the Kaggle API. For example, to see what the datasets command offers, enter the following in the CLI:
Terminal
kaggle datasets --help
Tip: Remember to accept the competition’s terms and conditions before downloading the dataset. You will get a Forbidden error if you try to download before agreeing.
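If you want to drive downloads from a script, one option is to build the CLI command in Python and hand it to subprocess. The sketch below only assembles the argument list; build_download_command is a hypothetical helper, and the competition name titanic and the data directory are placeholders. It assumes the download subcommand accepts the competition or dataset reference as a positional argument plus -p for the target directory, so confirm with kaggle competitions download --help on your version.

```python
import subprocess


def build_download_command(kind, ref, dest="."):
    """Build the argument list for a kaggle download command.

    kind is "competitions" or "datasets"; ref is a competition name
    or an owner/dataset-name reference; dest is the target directory.
    """
    if kind not in ("competitions", "datasets"):
        raise ValueError("kind must be 'competitions' or 'datasets'")
    return ["kaggle", kind, "download", ref, "-p", dest]


if __name__ == "__main__":
    cmd = build_download_command("competitions", "titanic", "data")
    print("Would run:", " ".join(cmd))
    # Uncomment to actually run it; this needs a valid token and,
    # for competitions, accepted rules on the Kaggle website.
    # subprocess.run(cmd, check=True)
```

With check=True, a Forbidden response from Kaggle surfaces as a subprocess.CalledProcessError, which is a convenient place to remind yourself to accept the competition rules.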
For more info on the API, Kaggle’s GitHub page is an excellent resource.