[r] How to split data into training/testing sets using the sample function
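
One common way to do this in base R is to draw a random subset of row indices with `sample()`, use those rows for training, and keep the remaining rows for testing. The sketch below is only an illustration under some assumptions that are not part of the question: it uses the built-in `mtcars` data frame as stand-in data, a 75/25 split, and an arbitrary seed, and the names `data`, `train_size`, `train_idx`, `train`, and `test` are made up for the example.

```r
# Stand-in data (assumption): the built-in mtcars data frame, 32 rows.
data <- mtcars

# Arbitrary seed (assumption) so the same split is produced on every run.
set.seed(123)

# Assumed ratio: 75% of the rows go to training, the rest to testing.
train_size <- floor(0.75 * nrow(data))

# sample() draws train_size distinct row indices (no replacement by default).
train_idx <- sample(seq_len(nrow(data)), size = train_size)

# Rows at the sampled indices form the training set;
# negative indexing drops them to leave the testing set.
train <- data[train_idx, ]
test  <- data[-train_idx, ]
```

Sampling indices instead of rows directly keeps the two sets disjoint, since every row is either in `train_idx` or not. Change the `0.75` to whatever ratio fits your data.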