April 27, 2023

Boto3 is the AWS Python SDK that gives access to AWS services such as EC2 and S3. It provides an object-oriented Python API as well as low-level access to AWS services.

import boto3, botocore
import glob

files = glob.glob('data/*')  # to upload multiple files
files

['data/Player Data.xlsx',
 'data/30-days-create-folds.ipynb',
 'data/ARK_GENOMIC_REVOLUTION_ETF_ARKG_HOLDINGS.csv',
 'data/star_pattern_turtlesim.png']
By default, S3 buckets are created in the N. Virginia (us-east-1) region. To create buckets in another region, the region name has to be specified explicitly, for example through a Session object. S3 bucket names also have to follow the bucket naming rules.
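The helper functions below assume that a low-level client (s3client) and a resource object (s3resource) have already been created. A minimal setup sketch, assuming the us-east-2 (Ohio) region to match the bucket names used later in this post:

session = boto3.session.Session(region_name='us-east-2')  # explicit region via a Session object
s3client = session.client('s3')      # low-level client API
s3resource = session.resource('s3')  # object-oriented resource API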
Checking whether a resource already exists before creating it helps avoid unnecessary errors. Here we check if the buckets already exist.
def check_bucket(bucket):
    """
    Checks if a bucket is present in S3.
    Args:
        bucket: bucket name to check
    """
    try:
        s3client.head_bucket(Bucket=bucket)
        print('Bucket exists')
        return True
    except botocore.exceptions.ClientError as e:
        # If a client error is thrown, check the error code:
        # 403 means the bucket exists but access is forbidden,
        # 404 means the bucket does not exist.
        error_code = int(e.response['Error']['Code'])
        if error_code == 403:
            print("Private Bucket. Forbidden Access!")
            return True
        elif error_code == 404:
            print("Bucket Does Not Exist!")
            return False
If the buckets don’t exist, we create them. We need to supply the bucket name and a CreateBucketConfiguration dictionary specifying the region in which the bucket has to be created.
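The exact creation code isn’t shown here; a short sketch, assuming the two bucket names seen in the output later and the us-east-2 region:

bucket_names = ['my-s3bucket1-usohio-region', 'my-s3bucket2-usohio-region']  # names taken from the output below

for bucket in bucket_names:
    if not check_bucket(bucket):  # only create buckets that don't exist yet
        s3client.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={'LocationConstraint': 'us-east-2'},  # region dictionary
        )
        print(f'Created bucket: {bucket}')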
Bucket versioning has no initial state by default. When versioning has never been initialised, the get_bucket_versioning response carries no status information; the Status key is simply absent. Status can take two values: Enabled and Suspended. On first creation, the bucket is effectively in a disabled, unversioned state. To make the status appear in the REST response, versioning must first be enabled through the BucketVersioning() boto3 resource. If we then check the status, it will be present in the response.
def get_buckets_versioning_client(bucketname):
    """
    Checks if bucket versioning is enabled/suspended or not yet initialised.
    Args:
        bucketname: bucket name to check versioning for
    Returns: versioning status - Enabled or Suspended
    """
    response = s3client.get_bucket_versioning(Bucket=bucketname)
    if 'Status' in response and response['Status'] in ('Enabled', 'Suspended'):
        print(f'Bucket {bucketname} status: {response["Status"]}')
        return response['Status']
    else:
        print(f'Bucket versioning not initialised for bucket: {bucketname}. Enabling...')
        s3resource.BucketVersioning(bucketname).enable()
        return s3resource.BucketVersioning(bucketname).status
for bucket_name in bucket_names:
    version_status = get_buckets_versioning_client(bucket_name)
    print(f'Versioning status: {version_status}')
    if version_status == 'Enabled':
        print('Disabling again..')
        s3resource.BucketVersioning(bucket_name).suspend()
Bucket my-s3bucket1-usohio-region status: Enabled
Versioning status: Enabled
Disabling again..
Bucket my-s3bucket2-usohio-region status: Enabled
Versioning status: Enabled
Disabling again..
for bucket_name in bucket_names:
    version_status = get_buckets_versioning_client(bucket_name)
    print(f'Versioning status: {version_status}')
    if version_status == 'Suspended':
        print('Enabling again..')
        s3resource.BucketVersioning(bucket_name).enable()
Bucket my-s3bucket1-usohio-region status: Suspended
Versioning status: Suspended
Enabling again..
Bucket my-s3bucket2-usohio-region status: Suspended
Versioning status: Suspended
Enabling again..
We can list the buckets in S3 using the list_buckets() client function. It returns a dict, and we can iterate through the Buckets key to find the names of the buckets.
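A small sketch of listing the buckets this way:

response = s3client.list_buckets()                # returns a dict
names = [b['Name'] for b in response['Buckets']]  # iterate the 'Buckets' key
print(names)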
Boto3 allows file uploads to S3. The upload_file client function requires three mandatory arguments:
1. Filename - the local file to be uploaded
2. Bucket - the bucket into which the file is uploaded
3. Key - the name of the object in S3
def upload_files_to_s3(filename, bucket_name, key=None, ExtraArgs=None):
    """
    Uploads a file to an S3 bucket.
    Args:
        filename: local filename to be uploaded
        bucket_name: name of the bucket into which the file is uploaded
        key: name of the object in the bucket. Default: None (uses filename)
        ExtraArgs: extra upload arguments passed to upload_file. Default: None
    """
    if key is None:
        key = filename
    try:
        s3client.upload_file(filename, bucket_name, key, ExtraArgs=ExtraArgs)
        print(f'uploaded file:{filename}')
    except botocore.exceptions.ClientError as e:
        print(e)
We can make use of the glob module to upload multiple files from a folder.
bucket1_files = [files[1],files[2]]
bucket2_files = [files[0],files[3]]
bucket1_files, bucket2_files
(['data/30-days-create-folds.ipynb',
'data/ARK_GENOMIC_REVOLUTION_ETF_ARKG_HOLDINGS.csv'],
['data/Player Data.xlsx', 'data/star_pattern_turtlesim.png'])
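The loop that produced the upload messages below isn’t shown; a plausible sketch, assuming bucket1_files go to the first bucket and bucket2_files to the second:

for f in bucket1_files:
    upload_files_to_s3(f, bucket_names[0])   # uploads shown in the output below
for f in bucket2_files:
    upload_files_to_s3(f, bucket_names[1])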
uploaded file:data/30-days-create-folds.ipynb
uploaded file:data/ARK_GENOMIC_REVOLUTION_ETF_ARKG_HOLDINGS.csv
Getting the list of files in each bucket is done using the list_objects client function. It returns a dict, and we can iterate through the Contents key to retrieve the object keys.
for bucket in bucket_names:
    print(f'Listing object inside bucket:{bucket}')
    list_obj_response = s3client.list_objects(Bucket=bucket)
    for obj in list_obj_response['Contents']:
        print(obj['Key'])
    print()
Listing object inside bucket:my-s3bucket1-usohio-region
data/30-days-create-folds.ipynb
data/ARK_GENOMIC_REVOLUTION_ETF_ARKG_HOLDINGS.csv
Listing object inside bucket:my-s3bucket2-usohio-region
data/Player Data.xlsx
data/star_pattern_turtlesim.png
Downloading a file is very similar to uploading one. We need to specify the bucket name, the key of the file to be downloaded, and the destination filename.
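A minimal download sketch using the download_file client function; the object key and destination path here are illustrative assumptions:

def download_file_from_s3(bucket_name, key, destination):
    """
    Downloads a single object from S3 to a local file.
    """
    try:
        s3client.download_file(bucket_name, key, destination)  # Bucket, Key, local Filename
        print(f'downloaded file:{key}')
    except botocore.exceptions.ClientError as e:
        print(e)

download_file_from_s3(bucket_names[0], 'data/30-days-create-folds.ipynb', 'downloads/30-days-create-folds.ipynb')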
This blog post shows how to use the boto3 Python SDK to manage the S3 AWS service. With the help of the documentation, we can implement the required functionality.