UPDATE: Suggest using DataFrames, plus something like ... .write.mode(SaveMode.Overwrite) ... .
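For instance, a minimal sketch of the DataFrame route (the helper name writeOverwriting and the output path are mine, not from the original):

import org.apache.spark.sql.{DataFrame, SaveMode}

// Overwrite any existing output at the target path (path is hypothetical).
def writeOverwriting(df: DataFrame): Unit =
  df.write.mode(SaveMode.Overwrite).parquet("/tmp/output")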
Handy pimp:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{SaveMode, SparkSession}

implicit class PimpedStringRDD(rdd: RDD[String]) {
  // Write the RDD as text, overwriting any existing output at path p.
  def write(p: String)(implicit ss: SparkSession): Unit = {
    import ss.implicits._
    rdd.toDF().as[String].write.mode(SaveMode.Overwrite).text(p)
  }
}
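Usage, assuming an implicit SparkSession in scope (the app name, example data, and path are hypothetical):

implicit val spark: SparkSession = SparkSession.builder()
  .appName("overwrite-example")
  .getOrCreate()

val lines: RDD[String] = spark.sparkContext.parallelize(Seq("a", "b", "c"))
lines.write("/tmp/output")  // overwrites /tmp/output if it already exists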
For older versions try:

yourSparkConf.set("spark.hadoop.validateOutputSpecs", "false")  // skip the output-exists check
val sc = new SparkContext(yourSparkConf)
In Spark 1.1.0 you can set configuration settings using the spark-submit script with the --conf flag, e.g. --conf spark.hadoop.validateOutputSpecs=false.
WARNING (older versions): According to @piggybox there is a bug in Spark where it only overwrites the files it needs in order to write its part-files; any other files in the output directory are left untouched.
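A common workaround for that bug (not from @piggybox's report, just a standard pattern) is to delete the output directory yourself via the Hadoop FileSystem API before writing. A sketch, assuming an existing SparkContext sc and RDD rdd, with a hypothetical output path:

import org.apache.hadoop.fs.{FileSystem, Path}

val outputPath = new Path("/tmp/output")  // hypothetical path
val fs = FileSystem.get(sc.hadoopConfiguration)
if (fs.exists(outputPath)) {
  fs.delete(outputPath, true)  // recursive: removes stale part-files too
}
rdd.saveAsTextFile(outputPath.toString)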