[python] Pandas: how to change all the values of a column?

I have a data frame with a column called "Date" and want all the values from this column to have the same value (the year only). Example:

City     Date
Paris    01/04/2004
Lisbon   01/09/2004
Madrid   2004
Pekin    31/2004

What I want is:

City     Date
Paris    2004
Lisbon   2004
Madrid   2004
Pekin    2004

Here is my code:

fr61_70xls = pd.ExcelFile('AMADEUS FRANCE 1961-1970.xlsx')

#Here we import the individual sheets and clean the sheets    
years=(['1961','1962','1963','1964','1965','1966','1967','1968','1969','1970'])

fr={}

header=(['City','Country','NACE','Cons','Last_year','Op_Rev_EUR_Last_avail_yr','BvD_Indep_Indic','GUO_Name','Legal_status','Date_of_incorporation','Legal_status_date'])

for year in years:
    # save every sheet in variable fr['1961'], fr['1962'] and so on
    fr[year]=fr61_70xls.parse(year,header=0,parse_cols=10)
    fr[year].columns=header
    # drop the entire Legal status date column
    fr[year]=fr[year].drop(['Legal_status_date','Date_of_incorporation'],axis=1)
    # drop every row where GUO Name is empty
    fr[year]=fr[year].dropna(axis=0,how='all',subset=[['GUO_Name']])
    fr[year]=fr[year].set_index(['GUO_Name','Date_of_incorporation'])

It happens that in my DataFrames, called for example fr['1961'] the values of Date_of_incorporation can be anything (strings, integer, and so on), so maybe it would be best to completely erase this column and then attach another column with only the year to the DataFrames?

This question is related to python database pandas

The answer is


As @DSM points out, you can do this more directly using the vectorised string methods:

df['Date'].str[-4:].astype(int)

Or using extract (assuming there is only one set of digits of length 4 somewhere in each string):

df['Date'].str.extract('(?P<year>\d{4})').astype(int)

An alternative slightly more flexible way, might be to use apply (or equivalently map) to do this:

df['Date'] = df['Date'].apply(lambda x: int(str(x)[-4:]))
             #  converts the last 4 characters of the string to an integer

The lambda function, is taking the input from the Date and converting it to a year.
You could (and perhaps should) write this more verbosely as:

def convert_to_year(date_in_some_format):
    date_as_string = str(date_in_some_format)  # cast to string
    year_as_string = date_in_some_format[-4:] # last four characters
    return int(year_as_string)

df['Date'] = df['Date'].apply(convert_to_year)

Perhaps 'Year' is a better name for this column...


You can do a column transformation by using apply

Define a clean function to remove the dollar and commas and convert your data to float.

def clean(x):
    x = x.replace("$", "").replace(",", "").replace(" ", "")
    return float(x)

Next, call it on your column like this.

data['Revenue'] = data['Revenue'].apply(clean)

Or if one want to use lambda function in the apply function:

data['Revenue']=data['Revenue'].apply(lambda x:float(x.replace("$","").replace(",", "").replace(" ", "")))

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to database

Implement specialization in ER diagram phpMyAdmin - Error > Incorrect format parameter? Authentication plugin 'caching_sha2_password' cannot be loaded Room - Schema export directory is not provided to the annotation processor so we cannot export the schema SQL Query Where Date = Today Minus 7 Days MySQL Error: : 'Access denied for user 'root'@'localhost' SQL Server date format yyyymmdd How to create a foreign key in phpmyadmin WooCommerce: Finding the products in database TypeError: tuple indices must be integers, not str

Examples related to pandas

xlrd.biffh.XLRDError: Excel xlsx file; not supported Pandas Merging 101 How to increase image size of pandas.DataFrame.plot in jupyter notebook? Trying to merge 2 dataframes but get ValueError Python Pandas User Warning: Sorting because non-concatenation axis is not aligned How to show all of columns name on pandas dataframe? Pandas/Python: Set value of one column based on value in another column Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Python convert object to float