[r] How to convert dataframe into time series?

I have one csv file in which I have 2 closing prices of stock(on daily basis)

Dates   Bajaj_close Hero_close
3/14/2013   1854.8  1669.1
3/15/2013   1850.3  1684.45
3/18/2013   1812.1  1690.5
3/19/2013   1835.9  1645.6
3/20/2013   1840    1651.15
3/21/2013   1755.3  1623.3
3/22/2013   1820.65 1659.6
3/25/2013   1802.5  1617.7
3/26/2013   1801.25 1571.85
3/28/2013   1799.55 1542

I want to convert above data into time series format. (start date is 3/14/2013 and end date is 3/13/2015) I have tried this but its giving me some weird output

values <- bajaj_hero[, -1]  (excluded first column i.e date in real dataset)
bajaj_hero_timeseries <- ts(values,start=c(2013,1),end=c(2015,3),frequency=365)

Output is:

           Bajaj_close Hero_close
2013.000     1854.80    1669.10
2013.003     1850.30    1684.45
2013.005     1812.10    1690.50
2013.008     1835.90    1645.60
2013.011     1840.00    1651.15
2013.014     1755.30    1623.30
2013.016     1820.65    1659.60
2013.019     1802.50    1617.70
2013.022     1801.25    1571.85

This question is related to r csv dataframe time-series

The answer is


Input. We will start with the text of the input shown in the question since the question did not provide the csv input:

Lines <- "Dates   Bajaj_close Hero_close
3/14/2013   1854.8  1669.1
3/15/2013   1850.3  1684.45
3/18/2013   1812.1  1690.5
3/19/2013   1835.9  1645.6
3/20/2013   1840    1651.15
3/21/2013   1755.3  1623.3
3/22/2013   1820.65 1659.6
3/25/2013   1802.5  1617.7
3/26/2013   1801.25 1571.85
3/28/2013   1799.55 1542"

zoo. "ts" class series normally do not represent date indexes but we can create a zoo series that does (see zoo package):

library(zoo)
z <- read.zoo(text = Lines, header = TRUE, format = "%m/%d/%Y")

Alternately, if you have already read this into a data frame DF then it could be converted to zoo as shown on the second line below:

DF <- read.table(text = Lines, header = TRUE)
z <- read.zoo(DF, format = "%m/%d/%Y")

In either case above z ia a zoo series with a "Date" class time index. One could also create the zoo series, zz, which uses 1, 2, 3, ... as the time index:

zz <- z
time(zz) <- seq_along(time(zz))

ts. Either of these could be converted to a "ts" class series:

as.ts(z)
as.ts(zz)

The first has a time index which is the number of days since the Epoch (January 1, 1970) and will have NAs for missing days and the second will have 1, 2, 3, ... as the time index and no NAs.

Monthly series. Typically "ts" series are used for monthly, quarterly or yearly series. Thus if we were to aggregate the input into months we could reasonably represent it as a "ts" series:

z.m <- as.zooreg(aggregate(z, as.yearmon, mean), freq = 12)
as.ts(z.m)

With library fpp, you can easily create time series with date format: time_ser=ts(data,frequency=4,start=c(1954,2))

here we start at the 2nd quarter of 1954 with quarter fequency.


Late to the party, but the tsbox package is designed to perform conversions like this. To convert your data into a ts-object, you can do:

dta <- data.frame(
  Dates = c("3/14/2013", "3/15/2013", "3/18/2013", "3/19/2013"),
  Bajaj_close = c(1854.8, 1850.3, 1812.1, 1835.9),
  Hero_close = c(1669.1, 1684.45, 1690.5, 1645.6)
)

dta
#>       Dates Bajaj_close Hero_close
#> 1 3/14/2013      1854.8    1669.10
#> 2 3/15/2013      1850.3    1684.45
#> 3 3/18/2013      1812.1    1690.50
#> 4 3/19/2013      1835.9    1645.60

library(tsbox)
ts_ts(ts_long(dta))
#> Time Series:
#> Start = 2013.1971293045 
#> End = 2013.21081883954 
#> Frequency = 365.2425 
#>          Bajaj_close Hero_close
#> 2013.197      1854.8    1669.10
#> 2013.200      1850.3    1684.45
#> 2013.203          NA         NA
#> 2013.205          NA         NA
#> 2013.208      1812.1    1690.50
#> 2013.211      1835.9    1645.60

It automatically parses the dates, detects the frequency and makes the missing values at the weekends explicit. With ts_<class>, you can convert the data to any other time series class.


See this question: Converting data.frame to xts order.by requires an appropriate time-based object, which suggests looking at argument to order.by,

Currently acceptable classes include: ‘Date’, ‘POSIXct’, ‘timeDate’, as well as ‘yearmon’ and ‘yearqtr’ where the index values remain unique.

And further suggests an explicit conversion using order.by = as.POSIXct,

df$Date <- as.POSIXct(strptime(df$Date,format),tz="UTC")
xts(df[, -1], order.by=as.POSIXct(df$Date))

Where your format is assigned elswhere,

format <- "%m/%d/%Y" #see strptime for details

Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to csv

Pandas: ValueError: cannot convert float NaN to integer Export result set on Dbeaver to CSV Convert txt to csv python script How to import an Excel file into SQL Server? "CSV file does not exist" for a filename with embedded quotes Save Dataframe to csv directly to s3 Python Data-frame Object has no Attribute (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape How to write to a CSV line by line? How to check encoding of a CSV file

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe

Examples related to time-series

How to convert dataframe into time series? Peak signal detection in realtime timeseries data Pandas: rolling mean by time interval Excel plot time series frequency with continuous xaxis How to calculate rolling / moving average using NumPy / SciPy? Plotting two variables as lines using ggplot2 on the same graph