How to make a great R reproducible example

Question

When discussing performance with colleagues, teaching, sending a bug report, or searching for guidance on mailing lists and here on Stack Overflow, a reproducible example is often asked for and always helpful.

What are your tips for creating an excellent example? How do you paste data structures from R in a text format? What other information should you include?

Are there other tricks in addition to using dput(), dump() or structure()? When should you include library() or require() statements? Which reserved words should one avoid, in addition to c, df, data, etc.?

How does one make a great R reproducible example?

User · Answer

I wonder if an http://old.r-fiddle.org/ link could be a very neat way of sharing a problem. It receives a unique ID, and one could even think about embedding it in SO.

User · Answer

The answers so far are obviously great for the reproducibility part. This is merely to clarify that a reproducible example cannot and should not be the sole component of a question. Don't forget to explain what you want it to look like and the contours of your problem, not just how you have attempted to get there so far. Code is not enough; you need words also.

Here's a reproducible example of what to avoid doing (drawn from a real example, names changed to protect the innocent):

The following is sample data and part of function I have trouble with.

    code
    code
    code
    code
    code (40 or so lines of it)

How can I achieve this?

User · Answer

The R-help mailing list has a posting guide which covers both asking and answering questions, including an example of generating data:

Examples: Sometimes it helps to provide a small example that someone can actually run. For example:

If I have a matrix x as follows:

    > x <- matrix(1:8, nrow=4, ncol=2,
                  dimnames=list(c("A","B","C","D"), c("x","y")))
    > x
      x y
    A 1 5
    B 2 6
    C 3 7
    D 4 8
    >

how can I turn it into a dataframe with 8 rows, and three columns named 'row', 'col', and 'value', which have the dimension names as the values of 'row' and 'col', like this:

    > x.df
       row col value
    1    A   x      1
    ...

(To which the answer might be:

    > x.df <- reshape(data.frame(row=rownames(x), x), direction="long",
                      varying=list(colnames(x)), times=colnames(x),
                      v.names="value", timevar="col", idvar="row")
)

The word small is especially important. You should be aiming for a minimal reproducible example, which means that the data and the code should be as simple as possible to explain the problem.

EDIT: Pretty code is easier to read than ugly code. Use a style guide.

User · Answer

I am developing the wakefield package to address this need to quickly share reproducible data. Sometimes dput works fine for smaller data sets, but many of the problems we deal with are much larger, and sharing such a large data set via dput is impractical.

About:

wakefield allows the user to share minimal code to reproduce data. The user sets n (number of rows) and specifies any number of preset variable functions (there are currently 70) that mimic real data (things like gender, age, income etc.).

Installation:

Currently (2015-06-11), wakefield is a GitHub package but will go to CRAN eventually after unit tests are written. To install quickly, use:

    if (!require("pacman")) install.packages("pacman")
    pacman::p_load_gh("trinker/wakefield")

Example:

Here is an example:

    r_data_frame(
        n = 500,
        id,
        race,
        age,
        sex,
        hour,
        iq,
        height,
        died
    )

This produces:

        ID  Race Age    Sex     Hour  IQ Height  Died
    1  001 White  33   Male 00:00:00 104     74  TRUE
    2  002 White  24   Male 00:00:00  78     69 FALSE
    3  003 Asian  34 Female 00:00:00 113     66  TRUE
    4  004 White  22   Male 00:00:00 124     73  TRUE
    5  005 White  25 Female 00:00:00  95     72  TRUE
    6  006 White  26 Female 00:00:00 104     69  TRUE
    7  007 Black  30 Female 00:00:00 111     71 FALSE
    8  008 Black  29 Female 00:00:00 100     64  TRUE
    9  009 Asian  25   Male 00:30:00 106     70 FALSE
    10 010 White  27   Male 00:30:00 121     68 FALSE
    ...

User · Answer

Basically, a minimal reproducible example (MRE) should enable others to exactly reproduce your issue on their machines.

An MRE consists of the following items:

- a minimal dataset, necessary to demonstrate the problem
- the minimal runnable code necessary to reproduce the error, which can be run on the given dataset
- all necessary information on the used packages, the R version, and the OS it is run on
- in the case of random processes, a seed (set by set.seed()) for reproducibility

For examples of good MREs, see section "Examples" at the bottom of help files on the function you are using. Simply type e.g. help(mean), or short ?mean, into your R console.

Providing a minimal dataset

Usually, sharing huge data sets is not necessary and may rather discourage others from reading your question. Therefore, it is better to use built-in datasets or create a small "toy" example that resembles your original data, which is actually what is meant by minimal. If for some reason you really need to share your original data, you should use a method, such as dput(), that allows others to get an exact copy of your data.

Built-in datasets

You can use one of the built-in datasets. A comprehensive list of built-in datasets can be seen with data(). There is a short description of every data set, and more information can be obtained, e.g. with ?iris, for the 'iris' data set that comes with R. Installed packages might contain additional datasets.

Creating example data sets

Preliminary note: Sometimes you may need special formats (i.e. classes), such as factors, dates, or time series. For these, make use of functions like as.factor, as.Date, as.xts, ... Example:

    d <- as.Date("2020-12-30")

where

    class(d)
    # [1] "Date"

Vectors

    x <- rnorm(10)  ## random vector normal distributed
    x <- runif(10)  ## random vector uniformly distributed
    x <- sample(1:100, 10)  ## 10 random draws out of 1, 2, ..., 100
    x <- sample(LETTERS, 10)  ## 10 random draws out of built-in latin alphabet

Matrices

    m <- matrix(1:12, 3, 4, dimnames=list(LETTERS[1:3], LETTERS[1:4]))
    m
    #   A B C  D
    # A 1 4 7 10
    # B 2 5 8 11
    # C 3 6 9 12

Data frames

    set.seed(42)  ## for sake of reproducibility
    n <- 6
    dat <- data.frame(id=1:n,
                      date=seq.Date(as.Date("2020-12-26"), as.Date("2020-12-31"), "day"),
                      group=rep(LETTERS[1:2], n/2),
                      age=sample(18:30, n, replace=TRUE),
                      type=factor(paste("type", 1:n)),
                      x=rnorm(n))
    dat
    #   id       date group age   type         x
    # 1  1 2020-12-26     A  27 type 1 0.0356312
    # 2  2 2020-12-27     B  19 type 2 1.3149588
    # 3  3 2020-12-28     A  20 type 3 0.9781675
    # 4  4 2020-12-29     B  26 type 4 0.8817912
    # 5  5 2020-12-30     A  26 type 5 0.4822047
    # 6  6 2020-12-31     B  28 type 6 0.9657529

Note: Although it is widely used, better do not name your data frame df, because df() is an R function for the density (i.e. height of the curve at point x) of the F distribution and you might get a clash with it.

Copying original data

If you have a specific reason, or data that would be too difficult to construct an example from, you could provide a small subset of your original data, best by using dput.

Why use dput()?

dput throws all information needed to exactly reproduce your data on your console. You may simply copy the output and paste it into your question.

Calling dat (from above) produces output that still lacks information about variable classes and other features if you share it in your question. Furthermore, the spaces in the type column make it difficult to do anything with it. Even when we set out to use the data, we won't manage to get important features of your data right.

      id       date group age   type         x
    1  1 2020-12-26     A  27 type 1 0.0356312
    2  2 2020-12-27     B  19 type 2 1.3149588
    3  3 2020-12-28     A  20 type 3 0.9781675

Subset your data

To share a subset, use head(), subset() or the indices iris[1:4, ]. Then wrap it into dput() to give others something that can be put in R immediately. Example:

    dput(iris[1:4, ]) # first four rows of the iris data set

Console output to share in your question:

    structure(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6), Sepal.Width = c(3.5,
    3, 3.2, 3.1), Petal.Length = c(1.4, 1.4, 1.3, 1.5), Petal.Width = c(0.2,
    0.2, 0.2, 0.2), Species = structure(c(1L, 1L, 1L, 1L), .Label = c("setosa",
    "versicolor", "virginica"), class = "factor")), row.names = c(NA,
    4L), class = "data.frame")

When using dput, you may also want to include only relevant columns, e.g. dput(mtcars[1:3, c(2, 5, 6)]).

Note: If your data frame has a factor with many levels, the dput output can be unwieldy because it will still list all the possible factor levels even if they aren't present in the subset of your data. To solve this issue, you can use the droplevels() function. Notice below how Species is a factor with only one level, e.g. dput(droplevels(iris[1:4, ])).

One other caveat for dput is that it will not work for keyed data.table objects or for grouped tbl_df (class grouped_df) from the tidyverse. In these cases you can convert back to a regular data frame before sharing: dput(as.data.frame(my_data)).

Producing minimal code

Combined with the minimal data (see above), your code should exactly reproduce the problem on another machine by simply copying and pasting it.

This should be the easy part but often isn't. What you should not do:

- show all kinds of data conversions; make sure the provided data is already in the correct format (unless that is the problem, of course)
- copy-paste a whole script that gives an error somewhere. Try to locate which lines exactly result in the error. More often than not, you'll find out what the problem is yourself.

What you should do:

- add which packages you use if you use any (using library())
- test run your code in a fresh R session to ensure the code is runnable. People should be able to copy-paste your data and your code in the console and get the same as you have.
- if you open connections or create files, add some code to close them or delete the files (using unlink())
- if you change options, make sure the code contains a statement to revert them back to the original ones (e.g. op <- par(mfrow=c(1,2)) ...some code... par(op))

Providing necessary information

In most cases, just the R version and the operating system will suffice. When conflicts arise with packages, giving the output of sessionInfo() can really help. When talking about connections to other applications (be it through ODBC or anything else), one should also provide version numbers for those, and if possible, also the necessary information on the setup.

If you are running R in RStudio, using rstudioapi::versionInfo() can help report your RStudio version.

If you have a problem with a specific package, you may want to provide the package version by giving the output of packageVersion("name of the package").

Seed

Using set.seed() you may specify a seed [1], i.e. the specific state in which R's random number generator is fixed. This makes it possible for random functions, such as sample(), rnorm(), runif() and lots of others, to always return the same result.

Example:

    set.seed(42)
    rnorm(3)
    # [1]  1.3709584 -0.5646982  0.3631284

    set.seed(42)
    rnorm(3)
    # [1]  1.3709584 -0.5646982  0.3631284

[1] Note: The output of set.seed() differs between R >3.6.0 and previous versions. Specify which R version you used for the random process, and don't be surprised if you get slightly different results when following old questions. To get the same result in such cases, you can use the RNGversion() function before set.seed() (e.g.: RNGversion("3.5.2")).
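As a quick check that the text you paste really recreates your data, the dput() output can be parsed back and compared with the original. The following is only a minimal sketch in base R; the object names small, dput_text and rebuilt are illustrative and not part of the answer above.

```r
# Take a small subset and capture exactly what dput() would print
small <- iris[1:4, c("Sepal.Length", "Species")]
dput_text <- capture.output(dput(small))

# Evaluate that text the way a reader would after pasting it into R
rebuilt <- eval(parse(text = dput_text))

# Stops with an error if the pasted data differs from the original
stopifnot(isTRUE(all.equal(small, rebuilt)))

# The same idea for random data: resetting the seed repeats the draws
set.seed(42)
a <- rnorm(3)
set.seed(42)
b <- rnorm(3)
stopifnot(identical(a, b))
```

Running a check like this in a fresh R session before posting catches most copy-paste problems early.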

User · Answer

It's a good idea to use functions from the testthat package to show what you expect to occur. Thus, other people can alter your code until it runs without error. This eases the burden of those who would like to help you, because it means they don't have to decode your textual description. For example

    library(testthat)
    # code defining x and y
    if (y >= 10) {
        expect_equal(x, 1.23)
    } else {
        expect_equal(x, 3.21)
    }

is clearer than "I think x would come out to be 1.23 for y equal to or exceeding 10, and 3.21 otherwise, but I got neither result". Even in this silly example, I think the code is clearer than the words. Using testthat lets your helper focus on the code, which saves time, and it provides a way for them to know they have solved your problem, before they post it.
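If you would rather not depend on an extra package, roughly the same idea can be expressed in base R. This is a sketch of an alternative, not the testthat approach itself; the values of x, result and expected are made up for illustration.

```r
# Code under discussion: here simply a cumulative sum
x <- c(1.5, 2.5, 4)
result <- cumsum(x)

# State the expected result as data rather than as prose
expected <- c(1.5, 4, 8)

# all.equal() tolerates tiny floating point differences;
# stopifnot() raises an error when the expectation is not met
stopifnot(isTRUE(all.equal(result, expected)))
```

A helper can then change the code until the last line runs without error, which plays the same role as expect_equal() above.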

User · Answer

Please do not paste your console outputs like this:

    If I have a matrix x as follows:
    > x <- matrix(1:8, nrow=4, ncol=2,
                dimnames=list(c("A","B","C","D"), c("x","y")))
    > x
      x y
    A 1 5
    B 2 6
    C 3 7
    D 4 8
    >

    How can I turn it into a dataframe with 8 rows, and three
    columns named `row', `col', and `value', which have the
    dimension names as the values of `row' and `col', like this:
    > x.df
        row col value
    1    A   x      1
    ...
    (To which the answer might be:
    > x.df <- reshape(data.frame(row=rownames(x), x), direction="long",
    +                varying=list(colnames(x)), times=colnames(x),
    +                v.names="value", timevar="col", idvar="row")
    )

We cannot copy-paste it directly.

To make questions and answers properly reproducible, try to remove + and > before posting it and put # in front of outputs and comments, like this:

    #If I have a matrix x as follows:
    x <- matrix(1:8, nrow=4, ncol=2,
                dimnames=list(c("A","B","C","D"), c("x","y")))
    x
    #  x y
    #A 1 5
    #B 2 6
    #C 3 7
    #D 4 8

    # How can I turn it into a dataframe with 8 rows, and three
    # columns named `row', `col', and `value', which have the
    # dimension names as the values of `row' and `col', like this:

    #x.df
    #    row col value
    #1    A   x      1
    #...
    #To which the answer might be:

    x.df <- reshape(data.frame(row=rownames(x), x), direction="long",
                    varying=list(colnames(x)), times=colnames(x),
                    v.names="value", timevar="col", idvar="row")

One more thing: if you have used any function from a certain package, mention that library.
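The clean-up described here can be partly automated. The helper below, clean_transcript(), is a hypothetical name used only for this sketch: it assumes that code lines start with > or + and that every other line is output, which is a simplification (output that itself starts with + would be misread).

```r
# Strip console prompts from code lines and comment out output lines
clean_transcript <- function(lines) {
  is_code <- grepl("^[>+]", lines)
  ifelse(is_code,
         sub("^[>+] ?", "", lines),  # drop the prompt and one space
         paste("#", lines))          # turn output into a comment
}

transcript <- c("> x <- 1:3", "> x", "[1] 1 2 3")
cleaned <- clean_transcript(transcript)
writeLines(cleaned)
```

The cleaned lines can be pasted straight into a question, and anyone copying them gets runnable code with the output kept as comments.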

User · Answer

I have a very easy and efficient way to make an R example that has not been mentioned above. You can define your structure first. For example,

    mydata <- data.frame(a=character(0), b=numeric(0), c=numeric(0), d=numeric(0))

    fix(mydata)

Then you can input your data manually. This is efficient for smaller examples rather than big ones.

User · Answer

Sometimes the problem really isn't reproducible with a smaller piece of data, no matter how hard you try, and doesn't happen with synthetic data (although it's useful to show how you produced synthetic data sets that did not reproduce the problem, because it rules out some hypotheses).

- Posting the data to the web somewhere and providing a URL may be necessary.
- If the data can't be released to the public at large but could be shared at all, then you may be able to offer to e-mail it to interested parties (although this will cut down the number of people who will bother to work on it).
- I haven't actually seen this done, because people who can't release their data are sensitive about releasing it in any form, but it would seem plausible that in some cases one could still post data if it were sufficiently anonymized/scrambled/corrupted slightly in some way.

If you can't do either of these then you probably need to hire a consultant to solve your problem ...

edit: Two useful SO questions for anonymization/scrambling:

- How to create example data set from private data (replacing variable names and levels with uninformative place holders)?
- Given a set of random numbers drawn from a continuous univariate distribution, find the distribution
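One simple form of scrambling is to replace column names and factor levels with neutral placeholders. The function mask_names_and_levels() below is a hypothetical helper written for this sketch, not taken from the linked questions; note that it leaves numeric values untouched, so it is not sufficient for data where the numbers themselves are sensitive.

```r
# Replace column names and factor levels with uninformative placeholders
mask_names_and_levels <- function(df) {
  for (j in seq_along(df)) {
    if (is.factor(df[[j]])) {
      levels(df[[j]]) <- paste0("level", seq_along(levels(df[[j]])))
    }
  }
  names(df) <- paste0("var", seq_along(df))
  df
}

# iris stands in for the private data here
masked <- mask_names_and_levels(head(iris, 3))
str(masked)
```

The structure of the data (classes, number of levels, dimensions) survives, which is often all a helper needs to reproduce the problem.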

User · Answer

Reproducible code is key to getting help. However, there are many users that might be skeptical of pasting even a chunk of their data. For instance, they could be working with sensitive data or on original data collected for use in a research paper.

For whatever reason, I thought it would be nice to have a handy function for "deforming" my data before pasting it publicly. The anonymize function from the package SciencesPo is very silly, but for me it works nicely with the dput function.

    install.packages("SciencesPo")

    dt <- data.frame(
        Z = sample(LETTERS,10),
        X = sample(1:10),
        Y = sample(c("yes", "no"), 10, replace = TRUE)
    )

    > dt
       Z  X   Y
    1  D  8  no
    2  T  1 yes
    3  J  7  no
    4  K  6  no
    5  U  2  no
    6  A 10 yes
    7  Y  5  no
    8  M  9 yes
    9  X  4 yes
    10 Z  3  no

Then I anonymize it:

    > anonymize(dt)
         Z    X  Y
    1   b2  2.5 c1
    2   b6 -4.5 c2
    3   b3  1.5 c1
    4   b4  0.5 c1
    5   b7 -3.5 c1
    6   b1  4.5 c2
    7   b9 -0.5 c1
    8   b5  3.5 c2
    9   b8 -1.5 c2
    10 b10 -2.5 c1

One may also want to sample a few variables instead of the whole data before applying the anonymization and dput command.

    # sample two variables without replacement
    > anonymize(sample.df(dt,5,vars=c("Y","X")))
       Y    X
    1 a1 -0.4
    2 a1  0.6
    3 a2 -2.4
    4 a1 -1.4
    5 a2  3.6

User · Answer

If you have a large dataset which cannot easily be put into the script using dput(), post your data to pastebin and load them using read.table:

    d <- read.table("http://pastebin.com/raw.php?i=m1ZJuKLH")

Inspired by @Henrik.

User · Answer

Guidelines:

Your main objective in crafting your questions should be to make it as easy as possible for readers to understand and reproduce your problem on their systems. To do so:

- Provide input data
- Provide expected output
- Explain your problem succinctly
  - if you have over 20 lines of text + code, you can probably go back and simplify
  - simplify your code as much as possible while preserving the problem/error

This does take some work, but it seems like a fair trade-off since you ask others to do work for you.

Providing Data:

Built-in Data Sets

The best option by far is to rely on built-in datasets. This makes it very easy for others to work on your problem. Type data() at the R prompt to see what data is available to you. Some classic examples:

- iris
- mtcars
- ggplot2::diamonds (external package, but almost everyone has it)

Inspect the built-in datasets to find one suitable for your problem. If you can rephrase your problem to use the built-in datasets, you are much more likely to get good answers (and upvotes).

Self Generated Data

If your problem is specific to a type of data that is not represented in the existing data sets, then provide the R code that generates the smallest possible dataset that your problem manifests itself on. For example

    set.seed(1)  # important to make random data reproducible
    myData <- data.frame(a=sample(letters[1:5], 20, replace=TRUE), b=runif(20))

Someone trying to answer my question can copy/paste those two lines and start working on the problem immediately.

dput

As a last resort, you can use dput to transform a data object to R code (e.g. dput(myData)). I say as a "last resort" because the output of dput is often fairly unwieldy, annoying to copy-paste, and obscures the rest of your question.

Provide Expected Output:

Someone once said:

A picture of expected output is worth 1000 words -- a sage person

If you can add something like "I expected to get this result":

       cyl   mean.hp
    1    6 122.28571
    2    4  82.63636
    3    8 209.21429

to your question, people are much more likely to understand what you are trying to do quickly. If your expected result is large and unwieldy, then you probably haven't thought enough about how to simplify your problem (see next).

Explain Your Problem Succinctly

The main thing to do is simplify your problem as much as possible before you ask your question. Re-framing the problem to work with the built-in datasets will help a lot in this regard. You will also often find that just by going through the process of simplification, you will answer your own problem.

Here are some examples of good questions:

- with built in data set
- with user generated data

In both cases, the user's problems are almost certainly not with the simple examples they provide. Rather they abstracted the nature of their problem and applied it to a simple data set to ask their question.

Why Yet Another Answer To This Question?

This answer focuses on what I think is the best practice: use built-in data sets and provide what you expect as a result in a minimal form. The most prominent answers focus on other aspects. I don't expect this answer to rise to any prominence; this is here solely so that I can link to it in comments to newbie questions.
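An expected-output table like the one for mean horsepower per cylinder count can itself be produced from a built-in dataset. The sketch below uses base R's aggregate(), which is an assumption on my part: the answer does not say how its table was made, and aggregate() sorts the groups by cyl, so the row order differs from the table shown in the answer.

```r
# Mean horsepower per number of cylinders, from the built-in mtcars data
expected <- aggregate(hp ~ cyl, data = mtcars, FUN = mean)
names(expected)[2] <- "mean.hp"
print(expected)
```

Pasting the printed table into the question, prefixed with comment markers, shows readers exactly what result you are after.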

User · Answer

To quickly create a dput of your data you can just copy (a piece of) the data to your clipboard and run the following in R:

for data in Excel:

    dput(read.table("clipboard", sep="\t", header=TRUE))

for data in a txt file:

    dput(read.table("clipboard", sep="", header=TRUE))

You can change the sep in the latter if necessary. This will only work if your data is in the clipboard, of course.

User · Answer

If you have one or more factor variable(s) in your data that you want to make reproducible with dput(head(mydata)), consider adding droplevels to it, so that levels of factors that are not present in the minimized data set are not included in your dput output, in order to make the example minimal:

    dput(droplevels(head(mydata)))
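The effect is easy to see on a built-in dataset. This is a small illustration using iris in place of mydata; the name first_rows is only for this sketch.

```r
first_rows <- head(iris)

# Without droplevels(), all three species are still listed as levels
nlevels(first_rows$Species)

# After droplevels(), only the level actually present remains
nlevels(droplevels(first_rows)$Species)

# This is the version worth pasting into a question
dput(droplevels(first_rows))
```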

User · Answer

Personally, I prefer "one" liners. Something along the lines:

    my.df <- data.frame(col1 = sample(c(1,2), 10, replace = TRUE),
            col2 = as.factor(sample(10)), col3 = letters[1:10],
            col4 = sample(c(TRUE, FALSE), 10, replace = TRUE))
    my.list <- list(list1 = my.df, list2 = my.df[3], list3 = letters)

The data structure should mimic the idea of the writer's problem and not the exact verbatim structure. I really appreciate it when variables don't overwrite my own variables or, god forbid, functions (like df).

Alternatively, one could cut a few corners and point to a pre-existing data set, something like:

    library(vegan)
    data(varespec)
    ord <- metaMDS(varespec)

Don't forget to mention any special packages you might be using.

If you're trying to demonstrate something on larger objects, you can try

    my.df2 <- data.frame(a = sample(10e6), b = sample(letters, 10e6, replace = TRUE))

If you're working with spatial data via the raster package, you can generate some random data. A lot of examples can be found in the package vignette, but here's a small nugget.

    library(raster)
    r1 <- r2 <- r3 <- raster(nrow=10, ncol=10)
    values(r1) <- runif(ncell(r1))
    values(r2) <- runif(ncell(r2))
    values(r3) <- runif(ncell(r3))
    s <- stack(r1, r2, r3)

If you need some spatial object as implemented in sp, you can get some datasets via external files (like ESRI shapefile) in "spatial" packages (see the Spatial view in Task Views).

    library(rgdal)
    ogrDrivers()
    dsn <- system.file("vectors", package = "rgdal")[1]
    ogrListLayers(dsn)
    ogrInfo(dsn=dsn, layer="cities")
    cities <- readOGR(dsn=dsn, layer="cities")

User · Answer

Often you need some data for an example; however, you don't want to post your exact data. To use an existing data.frame from an established library, use the data command to import it.

e.g.,

    data(mtcars)

and then do the problem

    names(mtcars)
    # your problem demonstrated on the mtcars data set

User · Answer

You can do this using reprex.

As mt1022 noted, "... good package for producing minimal, reproducible example is "reprex" from tidyverse".

According to Tidyverse:

The goal of "reprex" is to package your problematic code in such a way that other people can run it and feel your pain.

An example is given on the tidyverse web site.

    library(reprex)
    y <- 1:4
    mean(y)
    reprex()

I think this is the simplest way to create a reproducible example.

User · Answer

Here's my advice from How to write a reproducible example. I've tried to make it short but sweet.

How to write a reproducible example

You are most likely to get good help with your R problem if you provide a reproducible example. A reproducible example allows someone else to recreate your problem by just copying and pasting R code.

You need to include four things to make your example reproducible: required packages, data, code, and a description of your R environment.

- Packages should be loaded at the top of the script, so it's easy to see which ones the example needs.
- The easiest way to include data in an email or Stack Overflow question is to use dput() to generate the R code to recreate it. For example, to recreate the mtcars dataset in R, I'd perform the following steps:
  1. Run dput(mtcars) in R
  2. Copy the output
  3. In my reproducible script, type mtcars <- then paste.
- Spend a little bit of time ensuring that your code is easy for others to read:
  - Make sure you've used spaces and your variable names are concise, but informative
  - Use comments to indicate where your problem lies
  - Do your best to remove everything that is not related to the problem. The shorter your code is, the easier it is to understand.
- Include the output of sessionInfo() in a comment in your code. This summarises your R environment and makes it easy to check if you're using an out-of-date package.

You can check you have actually made a reproducible example by starting up a fresh R session and pasting your script in.

Before putting all of your code in an email, consider putting it on GitHub Gist. It will give your code nice syntax highlighting, and you don't have to worry about anything getting mangled by the email system.
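Put together, the four ingredients might look like the skeleton below. It is only an illustrative sketch: the model, the five-row subset and the object names cars_small and fit are made up, and in a real question the data section would hold your own pasted dput() output.

```r
# 1. Packages: loaded at the top (stats ships with R, shown for the pattern)
library(stats)

# 2. Data: small, and created entirely by the script itself
cars_small <- head(mtcars[, c("mpg", "wt")], 5)

# 3. Code: only the lines needed to show the problem
fit <- lm(mpg ~ wt, data = cars_small)
coef(fit)  # <- say here what you expected and what you got instead

# 4. Environment: paste this output into the question as a comment
sessionInfo()
```

Pasting this into a fresh R session should run from top to bottom without relying on anything in your workspace.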

User · Answer

Inspired by this very post, I now use a handy function, reproduce(<mydata>), when I need to post to StackOverflow.

QUICK INSTRUCTIONS

If myData is the name of your object to reproduce, run the following in R:

    install.packages("devtools")
    library(devtools)
    source_url("https://raw.github.com/rsaporta/pubR/gitbranch/reproduce.R")

    reproduce(myData)

Details:

This function is an intelligent wrapper to dput and does the following:

- Automatically samples a large data set (based on size and class; sample size can be adjusted)
- Creates a dput output
- Allows you to specify which columns to export
- Appends to the front of it objName <- ... so that it can be easily copy+pasted, but ...
- If working on a Mac, the output is automagically copied to the clipboard, so that you can simply run it and then paste it to your question.

The source is available here:

- Github - pubR/reproduce.R

Example:

    # sample data
    DF <- data.frame(id=rep(LETTERS, each=4)[1:100], replicate(100, sample(1001, 100)), Class=sample(c("Yes", "No"), 100, TRUE))

DF is about 100 x 102. I want to sample 10 rows and a few specific columns:

    reproduce(DF, cols=c("id", "X1", "X73", "Class"))  # I could also specify the column number.

Gives the following output:

    This is what the sample looks like:

        id  X1 X73 Class
    1    A 266 960   Yes
    2    A 373 315    No            Notice the selection split
    3    A 573 208    No           (which can be turned off)
    4    A 907 850   Yes
    5    B 202  46   Yes
    6    B 895 969   Yes   <-- 70% of selection is from the top rows
    7    B 940 928    No
    98   Y 371 171   Yes
    99   Y 733 364   Yes   <-- 30% of selection is from the bottom rows.
    100  Y 546 641    No

        ==X==============================================================X==
             Copy+Paste this part. (If on a Mac, it is already copied!)
        ==X==============================================================X==

     DF <- structure(list(id = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 25L, 25L, 25L), .Label = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y"), class = "factor"), X1 = c(266L, 373L, 573L, 907L, 202L, 895L, 940L, 371L, 733L, 546L), X73 = c(960L, 315L, 208L, 850L, 46L, 969L, 928L, 171L, 364L, 641L), Class = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L), .Label = c("No", "Yes"), class = "factor")), .Names = c("id", "X1", "X73", "Class"), class = "data.frame", row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 98L, 99L, 100L))

        ==X==============================================================X==

Notice also that the entirety of the output is in a nice single, long line, not a tall paragraph of chopped up lines. This makes it easier to read on SO question posts and also easier to copy+paste.

Update Oct 2013:

You can now specify how many lines of text the output will take up (i.e., what you will paste into StackOverflow). Use the lines.out=n argument for this. Example:

reproduce(DF, cols=c(1:3, 17, 23), lines.out=7) yields:

        ==X==============================================================X==
             Copy+Paste this part. (If on a Mac, it is already copied!)
        ==X==============================================================X==

     DF <- structure(list(id = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 25L,25L, 25L), .Label
          = c("A", "B", "C", "D", "E", "F", "G", "H","I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U","V", "W", "X", "Y"), class = "factor"),
          X1 = c(809L, 81L, 862L,747L, 224L, 721L, 310L, 53L, 853L, 642L),
          X2 = c(926L, 409L,825L, 702L, 803L, 63L, 319L, 941L, 598L, 830L),
          X16 = c(447L,164L, 8L, 775L, 471L, 196L, 30L, 420L, 47L, 327L),
          X22 = c(335L,164L, 503L, 407L, 662L, 139L, 111L, 721L, 340L, 178L)), .Names = c("id","X1",
          "X2", "X16", "X22"), class = "data.frame", row.names = c(1L,2L, 3L, 4L, 5L, 6L, 7L, 98L, 99L, 100L))

        ==X==============================================================X==

User · Answer

Apart from all the above answers, which I found very interesting, it can sometimes be very easy, as discussed here: HOW TO MAKE A MINIMAL REPRODUCIBLE EXAMPLE TO GET HELP WITH R.

There are many ways to make a random vector (Create a 100 number vector with random values in R rounded to 2 decimals) or a random matrix in R:

    mydf1 <- matrix(rnorm(100), nrow=20, ncol=5)

Note that sometimes it is very difficult to share given data for various reasons, such as its dimensions. However, all the above answers are great and very important to think about and use when one wants to make a reproducible data example. But note that in order to make the data as representative as the original (in case the OP cannot share the original data), it is good to add some information with the data example, such as (if we call the data mydf1):

    class(mydf1)
    # this shows the type of the data you have
    dim(mydf1)
    # this shows the dimension of your data

Moreover, one should know the type, length and attributes of the data (see Data structures), which can be found with the following:

    # typeof(mydf1), what it is.
    # length(mydf1), how many elements it contains.
    # attributes(mydf1), additional arbitrary metadata.

If you cannot share your original data, you can str it and give an idea about the structure of your data:

    str(mydf1)
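Collected into one runnable snippet, the structural summary could look like this. It is a sketch on a made-up random matrix; the seed value and the choice of a 20 x 5 matrix are only for illustration.

```r
set.seed(1)  # so the random matrix is the same for every reader
mydf1 <- matrix(rnorm(100), nrow = 20, ncol = 5)

class(mydf1)       # kind of object
dim(mydf1)         # number of rows and columns
typeof(mydf1)      # storage type of the elements
length(mydf1)      # total number of elements
attributes(mydf1)  # any extra metadata, here just the dimensions
str(mydf1)         # compact overview of all of the above
```

Posting this output alongside a synthetic example tells readers how closely the example matches the data you cannot share.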

User · Answer

Here are some of my suggestions:

- Try to use default R datasets
- If you have your own dataset, include it with dput, so others can help you more easily
- Do not use install.packages() unless it is really necessary; people will understand if you just use require or library
- Try to be concise:
  - Have some dataset
  - Try to describe the output you need as simply as possible
  - Do it yourself before you ask the question
- It is easy to upload an image, so upload plots if you have them
- Also include any errors you may have

All these are part of a reproducible example.

User · Answer

Since R 2.14 (I guess) you can feed your data text representation directly to read.table:

    df <- read.table(header=TRUE,
      text="Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1          5.1         3.5          1.4         0.2  setosa
    2          4.9         3.0          1.4         0.2  setosa
    3          4.7         3.2          1.3         0.2  setosa
    4          4.6         3.1          1.5         0.2  setosa
    5          5.0         3.6          1.4         0.2  setosa
    6          5.4         3.9          1.7         0.4  setosa
    ")

User · Answer

Here is a good guide.

The most important point is: Just make sure that you make a small piece of code that we can run to see what the problem is. A useful function for this is dput(), but if you have very large data, you might want to make a small sample dataset or only use the first 10 lines or so.

EDIT:

Also make sure that you identified where the problem is yourself. The example should not be an entire R script with "On line 200 there is an error". If you use the debugging tools in R (I love browser()) and Google, you should be able to really identify where the problem is and reproduce a trivial example in which the same thing goes wrong.
