[xml] How to parse XML to R data frame

I tried to parse XML to R data frame, this link helped me a lot:

how to create an R data frame from a xml file

But still I was not able to figure out my problem:

Here is my code:

data <- xmlParse("http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML")
xmlToDataFrame(nodes=getNodeSet(data1,"//data"))[c("location","time-layout")]
step1 <- xmlToDataFrame(nodes=getNodeSet(data1,"//location/point"))[c("latitude","longitude")]
step2 <- xmlToDataFrame(nodes=getNodeSet(data1,"//time-layout/start-valid-time"))
step3 <- xmlToDataFrame(nodes=getNodeSet(data1,"//parameters/temperature"))[c("type="hourly"")]

The data frame I want to have is like this:

latitude  longitude   start-valid-time   hourly_temperature
29.803     -82.411  2013-06-19T15:00:00-04:00    91
29.803     -82.411  2013-06-19T16:00:00-04:00    90

I'm stuck at the xmlToDataFrame(), Any help would be very much appreciated, thanks.

This question is related to xml r

The answer is


Use xpath more directly for both performance and clarity.

time_path <- "//start-valid-time"
temp_path <- "//temperature[@type='hourly']/value"

df <- data.frame(
    latitude=data[["number(//point/@latitude)"]],
    longitude=data[["number(//point/@longitude)"]],
    start_valid_time=sapply(data[time_path], xmlValue),
    hourly_temperature=as.integer(sapply(data[temp_path], as, "integer"))

leading to

> head(df, 2)
  latitude longitude          start_valid_time hourly_temperature
1    29.81    -82.42 2014-02-14T18:00:00-05:00                 60
2    29.81    -82.42 2014-02-14T19:00:00-05:00                 55

You can try the code below:

# Load the packages required to read XML files.
library("XML")
library("methods")

# Convert the input xml file to a data frame.
xmldataframe <- xmlToDataFrame("input.xml")
print(xmldataframe)

Here's a partial solution using xml2. Breaking the solution up into smaller pieces generally makes it easier to ensure everything is lined up:

library(xml2)
data <- read_xml("http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML")

# Point locations
point <- data %>% xml_find_all("//point")
point %>% xml_attr("latitude") %>% as.numeric()
point %>% xml_attr("longitude") %>% as.numeric()

# Start time
data %>% 
  xml_find_all("//start-valid-time") %>% 
  xml_text()

# Temperature
data %>% 
  xml_find_all("//temperature[@type='hourly']/value") %>% 
  xml_text() %>% 
  as.integer()