I would suggest you to try
https://spark.apache.org/docs/latest/sql-programming-guide.html#rdds
JavaRDD<Person> people = sc.textFile("examples/src/main/resources/people.txt").map(
new Function<String, Person>() {
public Person call(String line) throws Exception {
String[] parts = line.split(",");
Person person = new Person();
person.setName(parts[0]);
person.setAge(Integer.parseInt(parts[1].trim()));
return person;
}
});
You have to have a class in this example person with the spec of your file header and associate your data to the schema and apply criteria like in mysql.. to get desired result