Save Dataframe to csv directly to s3 Python

Question

I have a pandas DataFrame that I want to upload to a new CSV file. The problem is that I don't want to save the file locally before transferring it to S3. Is there any method like to_csv for writing the DataFrame to S3 directly? I am using boto3. Here is what I have so far:

    import boto3
    import pandas as pd

    s3 = boto3.client('s3', aws_access_key_id='key', aws_secret_access_key='secret_key')
    read_file = s3.get_object(Bucket, Key)
    df = pd.read_csv(read_file['Body'])

    # Make alterations to DataFrame

    # Then export DataFrame to CSV through direct transfer to s3

User · Answer

If you pass None as the first argument to to_csv(), the data will be returned as a string. From there it's an easy step to upload that to S3 in one go. It should also be possible to pass a StringIO object to to_csv(), but using a string will be easier.
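The approach in this answer might look roughly like the sketch below. It is not the answerer's code: the helper name `upload_csv_string` and the bucket and key in the usage note are hypothetical, and `client` is assumed to be a boto3 S3 client such as the one returned by `boto3.client('s3')`.

```python
import pandas as pd


def upload_csv_string(client, df: pd.DataFrame, bucket: str, key: str) -> str:
    """Serialize df to CSV in memory and upload it with one put_object call."""
    # With None as the first argument, to_csv returns the CSV text
    # instead of writing a file.
    csv_text = df.to_csv(None, index=False)
    # Encode explicitly so the uploaded object is UTF-8 bytes.
    client.put_object(Bucket=bucket, Key=key, Body=csv_text.encode("utf-8"))
    return csv_text
```

With a real client this would be called as `upload_csv_string(boto3.client('s3'), df, 'my-bucket', 'data/out.csv')`. Taking the client as a parameter keeps credential handling outside the helper.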

User · Answer

This is a more up-to-date answer:

    import s3fs

    s3 = s3fs.S3FileSystem(anon=False)

    # Use 'w' for py3, 'wb' for py2
    with s3.open('<bucket-name>/<filename>.csv', 'w') as f:
        df.to_csv(f)

The problem with StringIO is that it will eat away at your memory. With this method, you are streaming the file to S3, rather than converting it to a string and then writing it to S3. Holding the pandas DataFrame and its string copy in memory seems very inefficient.

If you are working on an EC2 instance, you can give it an IAM role that allows writing to S3, so you don't need to pass in credentials directly. However, you can also connect to a bucket by passing credentials to the S3FileSystem() function. See the documentation: https://s3fs.readthedocs.io/en/latest/
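The streaming idea works because to_csv accepts any writable text handle. Here is a small sketch of that pattern, under some assumptions: the helper name `stream_csv` is hypothetical, and `fs` is assumed to behave like an `s3fs.S3FileSystem`, meaning `fs.open(path, 'w')` returns a writable text handle usable in a `with` block.

```python
import pandas as pd


def stream_csv(fs, df: pd.DataFrame, path: str) -> None:
    """Write df as CSV through a file handle opened by fs."""
    # fs is assumed to behave like s3fs.S3FileSystem: open(path, "w")
    # returns a writable text handle that works as a context manager.
    with fs.open(path, "w") as handle:
        # to_csv writes into the handle, so the full CSV text is not
        # first built as one Python string.
        df.to_csv(handle, index=False)
```

The filesystem could be created with `s3fs.S3FileSystem(anon=False)` when credentials come from the environment or an IAM role, or with `s3fs.S3FileSystem(key=..., secret=...)` when they are passed explicitly.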

User · Answer

You can also use the AWS Data Wrangler:

    import awswrangler as wr

    wr.s3.to_csv(
        df=df,
        path="s3://...",
    )

Note that it will handle multipart upload for you to make the upload faster.

User · Answer

I found a very simple solution that seems to be working:

    import boto3

    s3 = boto3.client("s3")

    s3.put_object(
        Body=open("filename.csv").read(),
        Bucket="your-bucket",
        Key="your-key"
    )

Hope that helps!

User · Answer

I read a CSV with two columns from an S3 bucket and put the content of the file into a pandas DataFrame.

Example:

config.json

    {
      "credential": {
        "access_key": "xxxxxx",
        "secret_key": "xxxxxx"
      },
      "s3": {
        "bucket": "mybucket",
        "key": "csv/user.csv"
      }
    }

cls_config.py

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    import os
    import json

    class cls_config(object):

        def __init__(self, filename):
            self.filename = filename

        def getConfig(self):
            # Resolve the config file relative to this module
            fileName = os.path.join(os.path.dirname(__file__), self.filename)
            with open(fileName) as f:
                config = json.load(f)
            return config

cls_pandas.py

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    import pandas as pd
    import io

    class cls_pandas(object):

        def __init__(self):
            pass

        def read(self, stream):
            # Parse the decoded CSV text into a DataFrame
            df = pd.read_csv(io.StringIO(stream), sep=",")
            return df

cls_s3.py

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    import boto3
    import json

    class cls_s3(object):

        def __init__(self, access_key, secret_key):
            self.s3 = boto3.client('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key)

        def getObject(self, bucket, key):
            # Download the object and decode its body as UTF-8 text
            read_file = self.s3.get_object(Bucket=bucket, Key=key)
            body = read_file['Body'].read().decode('utf-8')
            return body

test.py

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    from cls_config import *
    from cls_s3 import *
    from cls_pandas import *

    class test(object):

        def __init__(self):
            self.conf = cls_config('config.json')

        def process(self):
            conf = self.conf.getConfig()

            bucket = conf['s3']['bucket']
            key = conf['s3']['key']

            access_key = conf['credential']['access_key']
            secret_key = conf['credential']['secret_key']

            s3 = cls_s3(access_key, secret_key)
            ob = s3.getObject(bucket, key)

            pa = cls_pandas()
            df = pa.read(ob)

            print(df)

    if __name__ == '__main__':
        test = test()
        test.process()

User · Answer

You can directly use the S3 path. I am using pandas 0.24.1.

    In [1]: import pandas as pd

    In [2]: df = pd.DataFrame([[1, 1, 1], [2, 2, 2]], columns=['a', 'b', 'c'])

    In [3]: df
    Out[3]:
       a  b  c
    0  1  1  1
    1  2  2  2

    In [4]: df.to_csv('s3://experimental/playground/temp_csv/dummy.csv', index=False)

    In [5]: pd.__version__
    Out[5]: '0.24.1'

    In [6]: new_df = pd.read_csv('s3://experimental/playground/temp_csv/dummy.csv')

    In [7]: new_df
    Out[7]:
       a  b  c
    0  1  1  1
    1  2  2  2

Release Note:

S3 File Handling

pandas now uses s3fs for handling S3 connections. This shouldn't break any code. However, since s3fs is not a required dependency, you will need to install it separately, like boto in prior versions of pandas. GH11915.
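When credentials are not picked up from the environment, newer pandas releases (1.2 and later, not the 0.24.1 used in this answer) also accept a `storage_options` dict that is forwarded to s3fs. This is a minimal sketch under that assumption. The helper name `write_csv_to_s3` is hypothetical, and s3fs must be installed for the `s3://` URL to resolve.

```python
from typing import Optional

import pandas as pd


def write_csv_to_s3(df: pd.DataFrame, bucket: str, key: str,
                    storage_options: Optional[dict] = None) -> str:
    """Write df to s3://bucket/key using pandas' built-in S3 support."""
    url = f"s3://{bucket}/{key}"
    # storage_options (pandas >= 1.2) is passed through to s3fs, for
    # example {"key": "...", "secret": "..."}. None leaves credential
    # lookup to the s3fs defaults.
    df.to_csv(url, index=False, storage_options=storage_options)
    return url
```

The helper returns the URL so the same string can be handed to `pd.read_csv` to read the file back.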

User · Answer

You can use:

    from io import StringIO  # python3; python2: BytesIO
    import boto3

    bucket = 'my_bucket_name'  # already created on S3
    csv_buffer = StringIO()
    df.to_csv(csv_buffer)
    s3_resource = boto3.resource('s3')
    s3_resource.Object(bucket, 'df.csv').put(Body=csv_buffer.getvalue())

User · Answer

Since you are using boto3.client(), try:

    import boto3
    from io import StringIO  # python3

    s3 = boto3.client('s3', aws_access_key_id='key', aws_secret_access_key='secret_key')

    def copy_to_s3(client, df, bucket, filepath):
        csv_buf = StringIO()
        df.to_csv(csv_buf, header=True, index=False)
        csv_buf.seek(0)
        client.put_object(Bucket=bucket, Body=csv_buf.getvalue(), Key=filepath)
        print(f'Copy {df.shape[0]} rows to S3 Bucket {bucket} at {filepath}, Done!')

    copy_to_s3(client=s3, df=df_to_upload, bucket='abc', filepath='def/test.csv')

User · Answer

I like s3fs, which lets you use S3 (almost) like a local filesystem.

You can do this:

    import s3fs

    bytes_to_write = df.to_csv(None).encode()
    fs = s3fs.S3FileSystem(key=key, secret=secret)
    with fs.open('s3://bucket/path/to/file.csv', 'wb') as f:
        f.write(bytes_to_write)

s3fs supports only rb and wb modes of opening the file, that's why I did this bytes_to_write stuff.

User · Answer

I found this can be done using client also, and not just resource.

    from io import StringIO
    import boto3

    s3 = boto3.client("s3",
                      region_name=region_name,
                      aws_access_key_id=aws_access_key_id,
                      aws_secret_access_key=aws_secret_access_key)
    csv_buf = StringIO()
    df.to_csv(csv_buf, header=True, index=False)
    csv_buf.seek(0)
    s3.put_object(Bucket=bucket, Body=csv_buf.getvalue(), Key='path/test.csv')

User · Answer

I use AWS Data Wrangler. For example:

    import awswrangler as wr
    import pandas as pd

    # read a local dataframe
    df = pd.read_parquet('my_local_file.gz')

    # upload to S3 bucket
    wr.s3.to_parquet(df=df, path='s3://mys3bucket/file_name.gz')

The same applies to csv files. Instead of read_parquet and to_parquet, use read_csv and to_csv with the proper file extension.
