Select N random elements from a List<T> in C#

Question

I need a quick algorithm to select 5 random elements from a generic list. For example, I'd like to get 5 random elements from a List<string>.

User · Answer

Using LINQ:

    YourList.OrderBy(x => rnd.Next()).Take(5)

User · Answer

It is a lot harder than one would think. See the great article "Shuffling" from Jeff. I did write a very short article on that subject, including C# code: "Return random subset of N elements of a given array".

User · Answer

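This will solve your issue:

    var entries = new List<T>();
    var selectedItems = new List<T>();
    var random = new Random(); // one shared instance; instances created in a tight loop may share a seed

    // Picks 10 items; entries must hold at least 10 distinct items or the inner loop never ends.
    for (var i = 0; i < 10; i++)
    {
        var rdm = random.Next(entries.Count);
        while (selectedItems.Contains(entries[rdm]))
            rdm = random.Next(entries.Count);

        selectedItems.Add(entries[rdm]);
    }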

User · Answer

    public static IEnumerable<T> GetRandom<T>(this IList<T> list, int count, Random random)
    {
        // Probably you should throw exception if count > list.Count
        count = Math.Min(list.Count, count);

        var selectedIndices = new SortedSet<int>();

        // Random upper bound (exclusive): the number of items not yet selected
        int randomMax = list.Count;

        while (selectedIndices.Count < count)
        {
            int randomIndex = random.Next(0, randomMax);

            // skip over already selected indices
            foreach (var selectedIndex in selectedIndices)
                if (selectedIndex <= randomIndex)
                    ++randomIndex;
                else
                    break;

            yield return list[randomIndex];

            selectedIndices.Add(randomIndex);
            --randomMax;
        }
    }

Memory: ~count. Complexity: O(count^2).

User · Answer

Here is a benchmark of three different methods:

1. The implementation of the accepted answer from Kyle.
2. An approach based on random index selection with HashSet duplication filtering, from drzaus.
3. A more academic approach posted by Jesús López, called the Fisher-Yates shuffle.

The testing consists of benchmarking the performance with multiple different list sizes and selection sizes. I also included a measurement of the standard deviation of these three methods, i.e. how well distributed the random selection appears to be.

In a nutshell, drzaus's simple solution seems to be the best overall of these three. The selected answer is great and elegant, but it's not that efficient, given that its time complexity is based on the sample size (the size of the source list), not the selection size. Consequently, if you select a small number of items from a long list, it will take orders of magnitude more time. Of course it still performs better than the solutions based on complete reordering.

Curiously enough, this O(n) time complexity issue is true even if you only touch the list when you actually return an item, like I do in my implementation. The only thing I can think of is that Random.Next() is pretty slow, and that performance benefits if you generate only one random number for each selected item.

Also interestingly, the StdDev of Kyle's solution was significantly higher in comparison. I have no clue why; maybe the fault is in my implementation.

Sorry for the long code and output that will commence now, but I hope it's somewhat illuminating. Also, if you spot any issues in the tests or implementations, let me know and I'll fix it.

    static void Main()
    {
        BenchmarkRunner.Run<Benchmarks>();

        new Benchmarks() { ListSize = 100, SelectionSize = 10 }
            .BenchmarkStdDev();
    }

    [MemoryDiagnoser]
    public class Benchmarks
    {
        [Params(50, 500, 5000)]
        public int ListSize;

        [Params(5, 10, 25, 50)]
        public int SelectionSize;

        private Random _rnd;
        private List<int> _list;
        private int[] _hits;

        [GlobalSetup]
        public void Setup()
        {
            _rnd = new Random(12345);
            _list = Enumerable.Range(0, ListSize).ToList();
            _hits = new int[ListSize];
        }

        [Benchmark]
        public void Test_IterateSelect()
            => Random_IterateSelect(_list, SelectionSize).ToList();

        [Benchmark]
        public void Test_RandomIndices()
            => Random_RandomIdices(_list, SelectionSize).ToList();

        [Benchmark]
        public void Test_FisherYates()
            => Random_FisherYates(_list, SelectionSize).ToList();

        public void BenchmarkStdDev()
        {
            RunOnce(Random_IterateSelect, "IterateSelect");
            RunOnce(Random_RandomIdices, "RandomIndices");
            RunOnce(Random_FisherYates, "FisherYates");

            void RunOnce(Func<IEnumerable<int>, int, IEnumerable<int>> method, string methodName)
            {
                Setup();
                for (int i = 0; i < 1000000; i++)
                {
                    var selected = method(_list, SelectionSize).ToList();
                    Debug.Assert(selected.Count == SelectionSize);
                    foreach (var item in selected) _hits[item]++;
                }
                var stdDev = GetStdDev(_hits);
                Console.WriteLine($"StdDev of {methodName}: {stdDev:n} (% of average: {stdDev / (_hits.Average() / 100):n})");
            }

            double GetStdDev(IEnumerable<int> hits)
            {
                var average = hits.Average();
                return Math.Sqrt(hits.Average(v => Math.Pow(v - average, 2)));
            }
        }

        public IEnumerable<T> Random_IterateSelect<T>(IEnumerable<T> collection, int needed)
        {
            var count = collection.Count();
            for (int i = 0; i < count; i++)
            {
                if (_rnd.Next(count - i) < needed)
                {
                    yield return collection.ElementAt(i);
                    if (--needed == 0)
                        yield break;
                }
            }
        }

        public IEnumerable<T> Random_RandomIdices<T>(IEnumerable<T> list, int needed)
        {
            var selectedItems = new HashSet<T>();
            var count = list.Count();

            while (needed > 0)
                if (selectedItems.Add(list.ElementAt(_rnd.Next(count))))
                    needed--;

            return selectedItems;
        }

        public IEnumerable<T> Random_FisherYates<T>(IEnumerable<T> list, int sampleSize)
        {
            var count = list.Count();
            if (sampleSize > count) throw new ArgumentException("sampleSize may not be greater than list count", "sampleSize");
            var indices = new Dictionary<int, int>(); int index;

            for (int i = 0; i < sampleSize; i++)
            {
                int j = _rnd.Next(i, count);
                if (!indices.TryGetValue(j, out index)) index = j;

                yield return list.ElementAt(index);

                if (!indices.TryGetValue(i, out index)) index = i;
                indices[j] = index;
            }
        }
    }

Output:

    |        Method | ListSize | Select |         Mean |     Error |    StdDev |  Gen 0 | Allocated |
    |-------------- |--------- |------- |-------------:|----------:|----------:|-------:|----------:|
    | IterateSelect |       50 |      5 |     711.5 ns |   5.19 ns |   4.85 ns | 0.0305 |     144 B |
    | RandomIndices |       50 |      5 |     341.1 ns |   4.48 ns |   4.19 ns | 0.0644 |     304 B |
    |   FisherYates |       50 |      5 |     573.5 ns |   6.12 ns |   5.72 ns | 0.0944 |     447 B |
    | IterateSelect |       50 |     10 |     967.2 ns |   4.64 ns |   3.87 ns | 0.0458 |     220 B |
    | RandomIndices |       50 |     10 |     709.9 ns |  11.27 ns |   9.99 ns | 0.1307 |     621 B |
    |   FisherYates |       50 |     10 |   1,204.4 ns |  10.63 ns |   9.94 ns | 0.1850 |     875 B |
    | IterateSelect |       50 |     25 |   1,358.5 ns |   7.97 ns |   6.65 ns | 0.0763 |     361 B |
    | RandomIndices |       50 |     25 |   1,958.1 ns |  15.69 ns |  13.91 ns | 0.2747 |    1298 B |
    |   FisherYates |       50 |     25 |   2,878.9 ns |  31.42 ns |  29.39 ns | 0.3471 |    1653 B |
    | IterateSelect |       50 |     50 |   1,739.1 ns |  15.86 ns |  14.06 ns | 0.1316 |     629 B |
    | RandomIndices |       50 |     50 |   8,906.1 ns |  88.92 ns |  74.25 ns | 0.5951 |    2848 B |
    |   FisherYates |       50 |     50 |   4,899.9 ns |  38.10 ns |  33.78 ns | 0.4349 |    2063 B |
    | IterateSelect |      500 |      5 |   4,775.3 ns |  46.96 ns |  41.63 ns | 0.0305 |     144 B |
    | RandomIndices |      500 |      5 |     327.8 ns |   2.82 ns |   2.50 ns | 0.0644 |     304 B |
    |   FisherYates |      500 |      5 |     558.5 ns |   7.95 ns |   7.44 ns | 0.0944 |     449 B |
    | IterateSelect |      500 |     10 |   5,387.1 ns |  44.57 ns |  41.69 ns | 0.0458 |     220 B |
    | RandomIndices |      500 |     10 |     648.0 ns |   9.12 ns |   8.54 ns | 0.1307 |     621 B |
    |   FisherYates |      500 |     10 |   1,154.6 ns |  13.66 ns |  12.78 ns | 0.1869 |     889 B |
    | IterateSelect |      500 |     25 |   6,442.3 ns |  48.90 ns |  40.83 ns | 0.0763 |     361 B |
    | RandomIndices |      500 |     25 |   1,569.6 ns |  15.79 ns |  14.77 ns | 0.2747 |    1298 B |
    |   FisherYates |      500 |     25 |   2,726.1 ns |  25.32 ns |  22.44 ns | 0.3777 |    1795 B |
    | IterateSelect |      500 |     50 |   7,775.4 ns |  35.47 ns |  31.45 ns | 0.1221 |     629 B |
    | RandomIndices |      500 |     50 |   2,976.9 ns |  27.11 ns |  24.03 ns | 0.6027 |    2848 B |
    |   FisherYates |      500 |     50 |   5,383.2 ns |  36.49 ns |  32.35 ns | 0.8163 |    3870 B |
    | IterateSelect |     5000 |      5 |  45,208.6 ns | 459.92 ns | 430.21 ns |      - |     144 B |
    | RandomIndices |     5000 |      5 |     328.7 ns |   5.15 ns |   4.81 ns | 0.0644 |     304 B |
    |   FisherYates |     5000 |      5 |     556.1 ns |  10.75 ns |  10.05 ns | 0.0944 |     449 B |
    | IterateSelect |     5000 |     10 |  49,253.9 ns | 420.26 ns | 393.11 ns |      - |     220 B |
    | RandomIndices |     5000 |     10 |     642.9 ns |   4.95 ns |   4.13 ns | 0.1307 |     621 B |
    |   FisherYates |     5000 |     10 |   1,141.9 ns |  12.81 ns |  11.98 ns | 0.1869 |     889 B |
    | IterateSelect |     5000 |     25 |  54,044.4 ns | 208.92 ns | 174.46 ns | 0.0610 |     361 B |
    | RandomIndices |     5000 |     25 |   1,480.5 ns |  11.56 ns |  10.81 ns | 0.2747 |    1298 B |
    |   FisherYates |     5000 |     25 |   2,713.9 ns |  27.31 ns |  24.21 ns | 0.3777 |    1795 B |
    | IterateSelect |     5000 |     50 |  54,418.2 ns | 329.62 ns | 308.32 ns | 0.1221 |     629 B |
    | RandomIndices |     5000 |     50 |   2,886.4 ns |  36.53 ns |  34.17 ns | 0.6027 |    2848 B |
    |   FisherYates |     5000 |     50 |   5,347.2 ns |  59.45 ns |  55.61 ns | 0.8163 |    3870 B |

    StdDev of IterateSelect: 671.88 (% of average: 0.67)
    StdDev of RandomIndices: 296.07 (% of average: 0.30)
    StdDev of FisherYates: 280.47 (% of average: 0.28)

User · Answer

Based on Kyle's answer, here's my C# implementation.

    /// <summary>
    /// Picks random selection of available game ID's
    /// </summary>
    private static List<int> GetRandomGameIDs(int count)
    {
        var gameIDs = (int[])HttpContext.Current.Application["NonDeletedArcadeGameIDs"];
        var totalGameIDs = gameIDs.Count();
        if (count > totalGameIDs) count = totalGameIDs;

        var rnd = new Random();
        var leftToPick = count;
        var itemsLeft = totalGameIDs;
        var arrPickIndex = 0;
        var returnIDs = new List<int>();
        while (leftToPick > 0)
        {
            if (rnd.Next(0, itemsLeft) < leftToPick)
            {
                returnIDs.Add(gameIDs[arrPickIndex]);
                leftToPick--;
            }
            arrPickIndex++;
            itemsLeft--;
        }

        return returnIDs;
    }

User · Answer

I recently did this on my project using an idea similar to Tyler's point 1. I was loading a bunch of questions and selecting five at random. Sorting was achieved using IComparable. All questions were loaded into a QuestionSorter list, which was then sorted using the List's Sort function, and the first k elements were selected.

    private class QuestionSorter : IComparable<QuestionSorter>
    {
        public double SortingKey
        {
            get;
            set;
        }

        public Question QuestionObject
        {
            get;
            set;
        }

        public QuestionSorter(Question q)
        {
            this.SortingKey = RandomNumberGenerator.RandomDouble;
            this.QuestionObject = q;
        }

        public int CompareTo(QuestionSorter other)
        {
            if (this.SortingKey < other.SortingKey)
            {
                return -1;
            }
            else if (this.SortingKey > other.SortingKey)
            {
                return 1;
            }
            else
            {
                return 0;
            }
        }
    }

Usage:

    List<QuestionSorter> unsortedQuestions = new List<QuestionSorter>();

    // add the questions here

    // Sort() with no arguments uses the IComparable<QuestionSorter> implementation above
    unsortedQuestions.Sort();

    // select the first k elements

User · Answer

You can use this, but the ordering will happen on the client side:

    .AsEnumerable().OrderBy(n => Guid.NewGuid()).Take(5);

User · Answer

Was thinking about the comment by @JohnShedletsky on the accepted answer regarding (paraphrase):

    you should be able to do this in O(subset.Length), rather than O(originalList.Length)

Basically, you should be able to generate subset random indices and then pluck them from the original list.

The Method

    public static class EnumerableExtensions {

        public static Random randomizer = new Random(); // you'd ideally be able to replace this with whatever makes you comfortable

        public static IEnumerable<T> GetRandom<T>(this IEnumerable<T> list, int numItems) {
            return (list as T[] ?? list.ToArray()).GetRandom(numItems);

            // because ReSharper whined about duplicate enumeration...
            /*
            items.Add(list.ElementAt(randomizer.Next(list.Count()))) ) numItems--;
            */
        }

        // just because the parentheses were getting confusing
        public static IEnumerable<T> GetRandom<T>(this T[] list, int numItems) {
            var items = new HashSet<T>(); // don't want to add the same item twice; otherwise use a list
            while (numItems > 0)
                // if we successfully added it, move on
                if (items.Add(list[randomizer.Next(list.Length)])) numItems--;

            return items;
        }

        // and because it's really fun; note -- you may get repetition
        public static IEnumerable<T> PluckRandomly<T>(this IEnumerable<T> list) {
            while (true)
                yield return list.ElementAt(randomizer.Next(list.Count()));
        }

    }

If you wanted to be even more efficient, you would probably use a HashSet of the indices, not the actual list elements (in case you've got complex types or expensive comparisons).

The Unit Test

And to make sure we don't have any collisions, etc.

    [TestClass]
    public class RandomizingTests : UnitTestBase {
        [TestMethod]
        public void GetRandomFromList() {
            this.testGetRandomFromList((list, num) => list.GetRandom(num));
        }

        [TestMethod]
        public void PluckRandomly() {
            this.testGetRandomFromList((list, num) => list.PluckRandomly().Take(num), requireDistinct: false);
        }

        private void testGetRandomFromList(Func<IEnumerable<int>, int, IEnumerable<int>> methodToGetRandomItems, int numToTake = 10, int repetitions = 100000, bool requireDistinct = true) {
            var items = Enumerable.Range(0, 100);
            IEnumerable<int> randomItems = null;

            while (repetitions-- > 0) {
                randomItems = methodToGetRandomItems(items, numToTake);
                Assert.AreEqual(numToTake, randomItems.Count(),
                                "Did not get expected number of items {0}; failed at {1} repetition--", numToTake, repetitions);
                if (requireDistinct) Assert.AreEqual(numToTake, randomItems.Distinct().Count(),
                                "Collisions (non-unique values) found, failed at {0} repetition--", repetitions);
                Assert.IsTrue(randomItems.All(o => items.Contains(o)),
                            "Some unknown values found; failed at {0} repetition--", repetitions);
            }
        }
    }
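The "HashSet of the indices" idea mentioned in the answer above could look roughly like the following. This is only a sketch: `IndexSampling` and `SampleByIndex` are hypothetical names that do not appear in the answer, and the method takes an `IList<T>` so that indexing is cheap.

```csharp
using System;
using System.Collections.Generic;

public static class IndexSampling
{
    // Hypothetical helper (not from the answer above): picks `count` items
    // at distinct positions of `source`.
    public static List<T> SampleByIndex<T>(IList<T> source, int count, Random rng)
    {
        if (count < 0 || count > source.Count)
            throw new ArgumentOutOfRangeException(nameof(count));

        // Track chosen positions rather than elements, so T is never hashed
        // and equal values stored at different positions count as separate picks.
        var chosen = new HashSet<int>();
        var result = new List<T>(count);
        while (result.Count < count)
        {
            int index = rng.Next(source.Count); // uniform in [0, Count)
            if (chosen.Add(index))              // Add returns false if already picked
                result.Add(source[index]);
        }
        return result;
    }
}
```

As with the element-based version, the number of retries grows when `count` approaches the size of the list, so this is best suited to picking a small subset of a large list.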

User · Answer

Here you have one implementation based on the Fisher-Yates shuffle whose algorithmic complexity is O(n), where n is the subset or sample size instead of the list size, as John Shedletsky pointed out.

    public static IEnumerable<T> GetRandomSample<T>(this IList<T> list, int sampleSize)
    {
        if (list == null) throw new ArgumentNullException("list");
        if (sampleSize > list.Count) throw new ArgumentException("sampleSize may not be greater than list count", "sampleSize");
        var indices = new Dictionary<int, int>(); int index;
        var rnd = new Random();

        for (int i = 0; i < sampleSize; i++)
        {
            int j = rnd.Next(i, list.Count);
            if (!indices.TryGetValue(j, out index)) index = j;

            yield return list[index];

            if (!indices.TryGetValue(i, out index)) index = i;
            indices[j] = index;
        }
    }

User · Answer

Here's my approach (full text here: http://krkadev.blogspot.com/2010/08/random-numbers-without-repetition.html).

It should run in O(K) instead of O(N), where K is the number of wanted elements and N is the size of the list to choose from:

    public <T> List<T> take(List<T> source, int k) {
        int n = source.size();
        if (k > n) {
            throw new IllegalStateException(
                "Can not take " + k +
                " elements from a list with " + n +
                " elements");
        }
        List<T> result = new ArrayList<T>(k);
        Map<Integer,Integer> used = new HashMap<Integer,Integer>();
        int metric = 0;
        for (int i = 0; i < k; i++) {
            int off = random.nextInt(n - i);
            while (true) {
                metric++;
                Integer redirect = used.put(off, n - i - 1);
                if (redirect == null) {
                    break;
                }
                off = redirect;
            }
            result.add(source.get(off));
        }
        assert metric <= 2*k;
        return result;
    }

User · Answer

Extending from @ers's answer, if one is worried about possible different implementations of OrderBy, this should be safe:

    // Instead of this
    YourList.OrderBy(x => rnd.Next()).Take(5)

    // Temporarily transform
    YourList
        .Select(v => new {v, i = rnd.Next()}) // Associate a random index to each entry
        .OrderBy(x => x.i).Take(5)            // Sort by (at this point fixed) random index
        .Select(x => x.v);                    // Go back to enumerable of entry

User · Answer

Iterate through and for each element make the probability of selection = (number needed)/(number left).

So if you had 40 items, the first would have a 5/40 chance of being selected. If it is, the next has a 4/39 chance, otherwise it has a 5/39 chance. By the time you get to the end you will have your 5 items, and often you'll have all of them before that.

This technique is called selection sampling, a special case of Reservoir Sampling. It's similar in performance to shuffling the input, but of course allows the sample to be generated without modifying the original data.
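The walk described above can be sketched in C# as follows. The class and method names (`SelectionSampling`, `Sample`) are made up for illustration, and the caller is assumed to supply the `Random` instance.

```csharp
using System;
using System.Collections.Generic;

public static class SelectionSampling
{
    // Hypothetical helper: returns `needed` items from `source`,
    // in their original relative order.
    public static List<T> Sample<T>(IList<T> source, int needed, Random rng)
    {
        if (needed < 0 || needed > source.Count)
            throw new ArgumentOutOfRangeException(nameof(needed));

        var result = new List<T>(needed);
        for (int i = 0; i < source.Count && needed > 0; i++)
        {
            int left = source.Count - i;   // items not yet examined, including this one
            if (rng.Next(left) < needed)   // true with probability needed / left
            {
                result.Add(source[i]);
                needed--;
            }
        }
        return result;
    }
}
```

Once the number of items left equals the number still needed, the test always succeeds, which is why the loop is guaranteed to return exactly the requested count.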

User · Answer

This method may be equivalent to Kyle's.

Say your list is of size n and you want k elements.

    Random rand = new Random();
    for (int i = 0; k > 0; ++i)
    {
        int r = rand.Next(0, n - i);
        if (r < k)
        {
            // include element i
            k--;
        }
    }

Works like a charm :)

-Alex Gilbert

User · Answer

I just ran into this problem, and some more Google searching brought me to the problem of randomly shuffling a list: http://en.wikipedia.org/wiki/Fisher-Yates_shuffle

To completely randomly shuffle your list (in place) you do this:

To shuffle an array a of n elements (indices 0..n-1):

    for i from n - 1 downto 1 do
        j ← random integer with 0 ≤ j ≤ i
        exchange a[j] and a[i]

If you only need the first 5 elements, then instead of running i all the way from n-1 to 1, you only need to run it to n-5 (ie: n-5).

Let's say you need k items. This becomes:

    for (i = n - 1; i >= n-k; i--)
    {
        j = random integer with 0 ≤ j ≤ i
        exchange a[j] and a[i]
    }

Each item that is selected is swapped toward the end of the array, so the k elements selected are the last k elements of the array.

This takes time O(k), where k is the number of randomly selected elements you need.

Further, if you don't want to modify your initial list, you can write down all your swaps in a temporary list, reverse that list, and apply them again, thus performing the inverse set of swaps and returning you your initial list without changing the O(k) running time.

Finally, for the real stickler, if (n == k), you should stop at 1, not n-k, as the randomly chosen integer will always be 0.
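A rough C# rendering of the truncated shuffle described above is shown below. It is a sketch under a simplifying assumption: instead of the undo-log trick from the answer, it copies the input first, which keeps the caller's list intact but costs O(n) for the copy. `PartialShuffle` and `TakeRandom` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class PartialShuffle
{
    // Hypothetical helper: returns k randomly chosen items in random order.
    public static T[] TakeRandom<T>(IList<T> source, int k, Random rng)
    {
        int n = source.Count;
        if (k < 0 || k > n) throw new ArgumentOutOfRangeException(nameof(k));

        // Work on a copy so the original list is not reordered.
        var a = new T[n];
        source.CopyTo(a, 0);

        // Run the shuffle loop only for the last k positions.
        // The extra "i >= 1" bound covers the n == k case noted above.
        for (int i = n - 1; i >= n - k && i >= 1; i--)
        {
            int j = rng.Next(i + 1);       // random integer with 0 <= j <= i
            (a[i], a[j]) = (a[j], a[i]);   // exchange a[j] and a[i]
        }

        // The selected items are the last k elements of the array.
        var result = new T[k];
        Array.Copy(a, n - k, result, 0, k);
        return result;
    }
}
```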

User · Answer

Using LINQ with large lists (when it is costly to touch each element) AND if you can live with the possibility of duplicates:

    new int[5].Select(o => (int)(rnd.NextDouble() * maxIndex)).Select(i => YourIEnum.ElementAt(i))

For my use I had a list of 100,000 elements, and because they were being pulled from a DB I roughly halved (or better) the time compared to a rnd on the whole list.

Having a large list will greatly reduce the odds of duplicates.

User · Answer

The simple solution I use (probably not good for large lists): copy the list into a temporary list, then in a loop randomly select an item from the temp list and put it in the selected items list while removing it from the temp list (so it can't be reselected).

Example:

    List<Object> temp = OriginalList.ToList();
    List<Object> selectedItems = new List<Object>();
    Random rnd = new Random();
    Object o;
    int i = 0;
    while (i < NumberOfSelectedItems)
    {
        o = temp[rnd.Next(temp.Count)];
        selectedItems.Add(o);
        temp.Remove(o);
        i++;
    }

User · Answer

I think the selected answer is correct and pretty sweet. I implemented it differently though, as I also wanted the result in random order.

    static IEnumerable<SomeType> PickSomeInRandomOrder<SomeType>(
        IEnumerable<SomeType> someTypes,
        int maxCount)
    {
        Random random = new Random(DateTime.Now.Millisecond);

        Dictionary<double, SomeType> randomSortTable = new Dictionary<double, SomeType>();

        foreach (SomeType someType in someTypes)
            randomSortTable[random.NextDouble()] = someType;

        return randomSortTable.OrderBy(KVP => KVP.Key).Take(maxCount).Select(KVP => KVP.Value);
    }

User · Answer

    public static List<T> GetRandomElements<T>(this IEnumerable<T> list, int elementsCount)
    {
        return list.OrderBy(arg => Guid.NewGuid()).Take(elementsCount).ToList();
    }

User · Answer

Why not something like this:

    Dim ar As New ArrayList
    Dim numToGet As Integer = 5
    'hard code just to test
    ar.Add("12")
    ar.Add("11")
    ar.Add("10")
    ar.Add("15")
    ar.Add("16")
    ar.Add("17")

    Dim randomListOfProductIds As New ArrayList

    Dim toAdd As String = ""
    For i = 0 To numToGet - 1
        toAdd = ar(CInt((ar.Count - 1) * Rnd()))

        randomListOfProductIds.Add(toAdd)
        'remove from id list
        ar.Remove(toAdd)

    Next

Sorry, I'm lazy and have to write VB at work, and didn't feel like converting to C#.

User · Answer

I would use an extension method.

    public static IEnumerable<T> TakeRandom<T>(this IEnumerable<T> elements, int countToTake)
    {
        var random = new Random();

        var internalList = elements.ToList();

        var selected = new List<T>();
        for (var i = 0; i < countToTake; ++i)
        {
            var next = random.Next(0, internalList.Count - selected.Count);
            selected.Add(internalList[next]);
            internalList[next] = internalList[internalList.Count - selected.Count];
        }
        return selected;
    }

User · Answer

This isn't as elegant or efficient as the accepted solution, but it's quick to write up. First, permute the array randomly, then select the first K elements. In Python:

    import numpy as np

    N = 20
    K = 5

    idx = np.arange(N)
    np.random.shuffle(idx)

    print(idx[:K])

User · Answer

Selecting N random items from a group shouldn't have anything to do with order! Randomness is about unpredictability and not about shuffling positions in a group. All the answers that deal with some kind of ordering are bound to be less efficient than the ones that do not. Since efficiency is the key here, I will post something that doesn't change the order of items too much.

1) If you need true random values, which means there is no restriction on which elements to choose from (ie, a once-chosen item can be reselected):

    public static List<T> GetTrueRandom<T>(this IList<T> source, int count,
                                           bool throwArgumentOutOfRangeException = true)
    {
        if (throwArgumentOutOfRangeException && count > source.Count)
            throw new ArgumentOutOfRangeException();

        var randoms = new List<T>(count);
        randoms.AddRandomly(source, count);
        return randoms;
    }

If you set the exception flag off, then you can choose random items any number of times.

    If you have { 1, 2, 3, 4 }, then it can give { 1, 4, 4 }, { 1, 4, 3 } etc for 3 items or even { 1, 4, 3, 2, 4 } for 5 items!

This should be pretty fast, as it has nothing to check.

2) If you need individual members from the group with no repetition, then I would rely on a dictionary (as many have pointed out already).

    public static List<T> GetDistinctRandom<T>(this IList<T> source, int count)
    {
        if (count > source.Count)
            throw new ArgumentOutOfRangeException();

        if (count == source.Count)
            return new List<T>(source);

        var sourceDict = source.ToIndexedDictionary();

        if (count > source.Count / 2)
        {
            while (sourceDict.Count > count)
                sourceDict.Remove(source.GetRandomIndex());

            return sourceDict.Select(kvp => kvp.Value).ToList();
        }

        var randomDict = new Dictionary<int, T>(count);
        while (randomDict.Count < count)
        {
            int key = source.GetRandomIndex();
            if (!randomDict.ContainsKey(key))
                randomDict.Add(key, sourceDict[key]);
        }

        return randomDict.Select(kvp => kvp.Value).ToList();
    }

The code is a bit lengthier than other dictionary approaches here because I'm not only adding, but also removing from the list, so it's kind of two loops. You can see here that I have not reordered anything at all when count becomes equal to source.Count. That's because I believe randomness should be in the returned set as a whole. I mean, if you want 5 random items from 1, 2, 3, 4, 5, it shouldn't matter if it's 1, 3, 4, 2, 5 or 1, 2, 3, 4, 5, but if you need 4 items from the same set, then it should unpredictably yield 1, 2, 3, 4 or 1, 3, 5, 2 or 2, 3, 5, 4 etc. Secondly, when the count of random items to be returned is more than half of the original group, then it's easier to remove source.Count - count items from the group than to add count items. For performance reasons I have used source instead of sourceDict to get the random index in the remove method.

    So if you have { 1, 2, 3, 4 }, this can end up in { 1, 2, 3 }, { 3, 4, 1 } etc for 3 items.

3) If you need truly distinct random values from your group by taking into account the duplicates in the original group, then you may use the same approach as above, but a HashSet will be lighter than a dictionary.

    public static List<T> GetTrueDistinctRandom<T>(this IList<T> source, int count,
                                                   bool throwArgumentOutOfRangeException = true)
    {
        if (count > source.Count)
            throw new ArgumentOutOfRangeException();

        var set = new HashSet<T>(source);

        if (throwArgumentOutOfRangeException && count > set.Count)
            throw new ArgumentOutOfRangeException();

        List<T> list = set.ToList();

        if (count >= set.Count)
            return list;

        if (count > set.Count / 2)
        {
            while (set.Count > count)
                set.Remove(list.GetRandom());

            return set.ToList();
        }

        var randoms = new HashSet<T>();
        randoms.AddRandomly(list, count);
        return randoms.ToList();
    }

The randoms variable is made a HashSet to avoid duplicates being added in the rarest of rare cases where Random.Next can yield the same value, especially when the input list is small.

    So { 1, 2, 2, 4 } => 3 random items => { 1, 2, 4 } and never { 1, 2, 2 }

    { 1, 2, 2, 4 } => 4 random items => exception!! or { 1, 2, 4 } depending on the flag set.

Some of the extension methods I have used:

    static Random rnd = new Random();

    public static int GetRandomIndex<T>(this ICollection<T> source)
    {
        return rnd.Next(source.Count);
    }

    public static T GetRandom<T>(this IList<T> source)
    {
        return source[source.GetRandomIndex()];
    }

    static void AddRandomly<T>(this ICollection<T> toCol, IList<T> fromList, int count)
    {
        while (toCol.Count < count)
            toCol.Add(fromList.GetRandom());
    }

    public static Dictionary<int, T> ToIndexedDictionary<T>(this IEnumerable<T> lst)
    {
        return lst.ToIndexedDictionary(t => t);
    }

    public static Dictionary<int, T> ToIndexedDictionary<S, T>(this IEnumerable<S> lst,
                                                               Func<S, T> valueSelector)
    {
        int index = -1;
        return lst.ToDictionary(t => ++index, valueSelector);
    }

If it's all about performance with tens of 1000s of items in the list having to be iterated 10000 times, then you may want a faster random class than System.Random, but I don't think that's a big deal considering the latter most probably is never a bottleneck; it's quite fast enough.

Edit: If you need to re-arrange the order of returned items as well, then there's nothing that can beat dhakim's Fisher-Yates approach: short, sweet and simple.

User · Answer

This is actually a harder problem than it sounds like, mainly because many mathematically correct solutions will fail to actually allow you to hit all the possibilities (more on this below).

First, here are some easy-to-implement algorithms that are correct if you have a truly random number generator:

(0) Kyle's answer, which is O(n).

(1) Generate a list of n pairs [(0, rand), (1, rand), (2, rand), ...], sort them by the second coordinate, and use the first k (for you, k=5) indices to get your random subset. I think this is easy to implement, although it is O(n log n) time.

(2) Init an empty list s = [] that will grow to be the indices of k random elements. Choose a number r in {0, 1, 2, ..., n-1} at random, r = rand % n, and add this to s. Next take r = rand % (n-1) and stick it in s; add to r the number of elements less than it in s to avoid collisions. Next take r = rand % (n-2), and do the same thing, etc., until you have k distinct elements in s. This has worst-case running time O(k^2). So for k << n, this can be faster. If you keep s sorted and track which contiguous intervals it has, you can implement it in O(k log k), but it's more work.

@Kyle - you're right, on second thought I agree with your answer. I hastily read it at first, and mistakenly thought you were indicating to sequentially choose each element with fixed probability k/n, which would have been wrong - but your adaptive approach appears correct to me. Sorry about that.

Ok, and now for the kicker: asymptotically (for fixed k, n growing), there are n^k/k! choices of k-element subsets out of n elements [this is an approximation of (n choose k)]. If n is large, and k is not very small, then these numbers are huge. The best cycle length you can hope for in any standard 32-bit random number generator is 2^32 = 256^4. So if we have a list of 1000 elements, and we want to choose 5 at random, there's no way a standard random number generator will hit all the possibilities: (1000 choose 5) is about 8.25 * 10^12, far more than 2^32, which is about 4.3 * 10^9. However, as long as you're ok with a choice that works fine for smaller sets, and always "looks" random, then these algorithms should be ok.

Addendum: After writing this, I realized that it's tricky to implement idea (2) correctly, so I wanted to clarify this answer. To get O(k log k) time, you need an array-like structure that supports O(log m) searches and inserts - a balanced binary tree can do this. Using such a structure to build up an array called s, here is some pseudopython:

    # Returns a container s with k distinct random numbers from {0, 1, ..., n-1}
    def ChooseRandomSubset(n, k):
      s = []                                      # Conceptually the balanced-tree container described above
      for i in range(k):
        r = UniformRandom(0, n-i)                 # May be 0, must be < n-i
        q = s.FirstIndexSuchThat( s[q] - q > r )  # This is the search.
        s.InsertInOrder(q ? r + q : r + len(s))   # Inserts right before q.
      return s

I suggest running through a few sample cases to see how this efficiently implements the above English explanation.
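Idea (2) above can be sketched in runnable Python. This is only an illustration of the simple O(k^2) variant, not the O(k log k) tree-based one: the function name `choose_random_subset` and the `rng` parameter are made up for this sketch, and a plain sorted list stands in for the balanced tree.

```python
import bisect
import random


def choose_random_subset(n, k, rng=random):
    """Return k distinct indices from range(n), in the order they were drawn."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    taken = []   # sorted list of the indices chosen so far
    picked = []  # the same indices, in draw order
    for i in range(k):
        # r is the rank of the pick among the n - i indices still free.
        r = rng.randrange(n - i)
        # Shift r past every taken index at or below it, so that it lands
        # on the r-th free index. `taken` is sorted, so stop at the first
        # taken index that is still above r.
        for t in taken:
            if t <= r:
                r += 1
            else:
                break
        bisect.insort(taken, r)
        picked.append(r)
    return picked


if __name__ == "__main__":
    print(choose_random_subset(1000, 5))
```

Each draw needs exactly one random number and never retries, which is the point of the "add the number of smaller elements" adjustment; the inner scan over `taken` is what makes this version quadratic in k.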

User · Answer

I combined several of the above answers to create a lazily evaluated extension method. My testing showed that Kyle's approach (Order(N)) is many times slower than drzaus' use of a set to propose the random indices to choose (Order(K)). The former performs many more calls to the random number generator, plus iterates more times over the items.

The goals of my implementation were:

1) Do not realize the full list if given an IEnumerable that is not an IList. If I am given a sequence of a zillion items, I do not want to run out of memory. Use Kyle's approach for an on-line solution.

2) If I can tell that it is an IList, use drzaus' approach, with a twist. If K is more than half of N, I risk thrashing as I choose many random indices again and again and have to skip them. Thus I compose a list of the indices to NOT keep.

3) I guarantee that the items will be returned in the same order that they were encountered. Kyle's algorithm required no alteration. drzaus' algorithm required that I not emit items in the order that the random indices are chosen. I gather all the indices into a SortedSet, then emit items in sorted index order.

4) If K is large compared to N and I invert the sense of the set, then I enumerate all items and test if the index is not in the set. This means that I lose the Order(K) run time, but since K is close to N in these cases, I do not lose much.

Here is the code:

    // Extension methods must be declared inside a static class.

    /// <summary>
    /// Takes k elements from the next n elements at random, preserving their order.
    ///
    /// If there are fewer than n elements in items, this may return fewer than k elements.
    /// </summary>
    /// <typeparam name="TElem">Type of element in the items collection.</typeparam>
    /// <param name="items">Items to be randomly selected.</param>
    /// <param name="k">Number of items to pick.</param>
    /// <param name="n">Total number of items to choose from.
    /// If the items collection contains more than this number, the extra members will be skipped.
    /// If the items collection contains fewer than this number, it is possible that fewer than k items will be returned.</param>
    /// <returns>Enumerable over the retained items.
    ///
    /// See http://stackoverflow.com/questions/48087/select-a-random-n-elements-from-listt-in-c-sharp for the commentary.
    /// </returns>
    public static IEnumerable<TElem> TakeRandom<TElem>(this IEnumerable<TElem> items, int k, int n)
    {
        var r = new FastRandom();
        var itemsList = items as IList<TElem>;

        if (k >= n || (itemsList != null && k >= itemsList.Count))
            foreach (var item in items) yield return item;
        else
        {
            // If we have a list, we can infer more information and choose a better algorithm.
            // When using an IList, this is about 7 times faster (on one benchmark)!
            if (itemsList != null && k < n/2)
            {
                // Since we have a List, we can use an algorithm suitable for Lists.
                // If there are fewer than n elements, reduce n.
                n = Math.Min(n, itemsList.Count);

                // This algorithm picks K index-values randomly and directly chooses those items to be selected.
                // If k is more than half of n, then we will spend a fair amount of time thrashing, picking
                // indices that we have already picked and having to try again.
                // Because of the k < n/2 check above, invertSet can only be true when n was just reduced.
                var invertSet = k >= n/2;
                var positions = invertSet ? (ISet<int>) new HashSet<int>() : (ISet<int>) new SortedSet<int>();

                var numbersNeeded = invertSet ? n - k : k;
                while (numbersNeeded > 0)
                    if (positions.Add(r.Next(0, n))) numbersNeeded--;

                if (invertSet)
                {
                    // positions contains all the indices of elements to Skip.
                    for (var itemIndex = 0; itemIndex < n; itemIndex++)
                    {
                        if (!positions.Contains(itemIndex))
                            yield return itemsList[itemIndex];
                    }
                }
                else
                {
                    // positions contains all the indices of elements to Take.
                    foreach (var itemIndex in positions)
                        yield return itemsList[itemIndex];
                }
            }
            else
            {
                // Since we do not have a list, we will use an online algorithm.
                // This permits us to skip the rest as soon as we have enough items.
                var found = 0;
                var scanned = 0;
                foreach (var item in items)
                {
                    var rand = r.Next(0, n - scanned);
                    if (rand < k - found)
                    {
                        yield return item;
                        found++;
                    }
                    scanned++;
                    if (found >= k || scanned >= n)
                        break;
                }
            }
        }
    }

I use a specialized random number generator, but you can just use C#'s Random if you want. (FastRandom was written by Colin Green and is part of SharpNEAT. It has a period of 2^128-1, which is better than many RNGs.)

Here are the unit tests:

    [TestClass]
    public class TakeRandomTests
    {
        /// <summary>
        /// Ensure that when randomly choosing items from an array, all items are chosen with roughly equal probability.
        /// </summary>
        [TestMethod]
        public void TakeRandom_Array_Uniformity()
        {
            const int numTrials = 2000000;
            const int expectedCount = numTrials/20;
            var timesChosen = new int[100];
            var century = new int[100];
            for (var i = 0; i < century.Length; i++)
                century[i] = i;

            for (var trial = 0; trial < numTrials; trial++)
            {
                foreach (var i in century.TakeRandom(5, 100))
                    timesChosen[i]++;
            }
            var avg = timesChosen.Average();
            var max = timesChosen.Max();
            var min = timesChosen.Min();
            var allowedDifference = expectedCount/100;
            AssertBetween(avg, expectedCount - 2, expectedCount + 2, "Average");
            AssertBetween(min, expectedCount - allowedDifference, expectedCount, "Min");
            AssertBetween(max, expectedCount, expectedCount + allowedDifference, "Max");

            var countInRange = timesChosen.Count(i => i >= expectedCount - allowedDifference && i <= expectedCount + allowedDifference);
            Assert.IsTrue(countInRange >= 90, String.Format("Not enough were in range: {0}", countInRange));
        }

        /// <summary>
        /// Ensure that when randomly choosing items from an IEnumerable that is not an IList,
        /// all items are chosen with roughly equal probability.
        /// </summary>
        [TestMethod]
        public void TakeRandom_IEnumerable_Uniformity()
        {
            const int numTrials = 2000000;
            const int expectedCount = numTrials / 20;
            var timesChosen = new int[100];

            for (var trial = 0; trial < numTrials; trial++)
            {
                foreach (var i in Range(0, 100).TakeRandom(5, 100))
                    timesChosen[i]++;
            }

            var avg = timesChosen.Average();
            var max = timesChosen.Max();
            var min = timesChosen.Min();
            var allowedDifference = expectedCount / 100;
            var countInRange =
                timesChosen.Count(i => i >= expectedCount - allowedDifference && i <= expectedCount + allowedDifference);
            Assert.IsTrue(countInRange >= 90, String.Format("Not enough were in range: {0}", countInRange));
        }

        private IEnumerable<int> Range(int low, int count)
        {
            for (var i = low; i < low + count; i++)
                yield return i;
        }

        private static void AssertBetween(int x, int low, int high, String message)
        {
            Assert.IsTrue(x > low, String.Format("Value {0} is less than lower limit of {1}. {2}", x, low, message));
            Assert.IsTrue(x < high, String.Format("Value {0} is more than upper limit of {1}. {2}", x, high, message));
        }

        private static void AssertBetween(double x, double low, double high, String message)
        {
            Assert.IsTrue(x > low, String.Format("Value {0} is less than lower limit of {1}. {2}", x, low, message));
            Assert.IsTrue(x < high, String.Format("Value {0} is more than upper limit of {1}. {2}", x, high, message));
        }
    }
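The index-set idea from goals 2 to 4 may be easier to see in a compact form. The following Python sketch is an illustration under my own assumptions, not a port of the method above: the name `take_random_indexed` and the `rng` parameter are hypothetical, and it inverts whenever k exceeds half of n.

```python
import random


def take_random_indexed(items, k, rng=random):
    """Pick k items from a sequence, keeping them in their original order."""
    n = len(items)
    if k >= n:
        return list(items)
    # When k is more than half of n, drawing the indices to SKIP needs
    # fewer draws (and fewer collisions) than drawing the indices to keep.
    invert = k > n // 2
    needed = n - k if invert else k
    positions = set()
    while len(positions) < needed:
        positions.add(rng.randrange(n))  # duplicates are ignored by the set
    if invert:
        # positions holds the indices to skip; scan everything else in order.
        return [x for i, x in enumerate(items) if i not in positions]
    # positions holds the indices to take; sort them to preserve item order.
    return [items[i] for i in sorted(positions)]


if __name__ == "__main__":
    print(take_random_indexed(list("abcdefghij"), 3))
    print(take_random_indexed(list("abcdefghij"), 8))
```

Sorting the chosen indices plays the role of the SortedSet, and the inverted branch trades the O(k) run time for a full scan, which costs little when k is close to n.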

User · Answer

When N is very large, the normal method that randomly shuffles the N numbers and selects, say, the first k numbers can be prohibitive because of space complexity. The following algorithm requires only O(k) for both time and space complexities.

http://arxiv.org/abs/1512.00501

    import random

    def random_selection_indices(num_samples, N):
        modified_entries = {}  # sparse record of positions whose value differs from their index
        seq = []
        for n in range(num_samples):
            i = N - n - 1
            # j must be allowed to equal i (0 <= j <= i), as in a Fisher-Yates shuffle
            j = random.randrange(i + 1)

            # swap a[j] and a[i]
            a_j = modified_entries[j] if j in modified_entries else j
            a_i = modified_entries[i] if i in modified_entries else i

            if a_i != j:
                modified_entries[j] = a_i
            elif j in modified_entries:   # no need to store the modified value if it is the same as index
                modified_entries.pop(j)

            if a_j != i:
                modified_entries[i] = a_j
            elif i in modified_entries:   # no need to store the modified value if it is the same as index
                modified_entries.pop(i)
            seq.append(a_j)
        return seq

User · Answer

This is the best I could come up with on a first cut:

    public List<String> getRandomItemsFromList(int returnCount, List<String> list)
    {
        List<String> returnList = new List<String>();
        Dictionary<int, int> randoms = new Dictionary<int, int>();

        // Create the generator once; constructing a new Random on every
        // iteration can reuse the same seed and repeat the same values.
        Random random = new Random();

        // Never ask for more unique items than the list holds, or the loop cannot end.
        returnCount = Math.Min(returnCount, list.Count);

        while (randoms.Count != returnCount)
        {
            // generate a new random index between 0 and list.Count - 1
            int randomInt = random.Next(list.Count);

            // store this in dictionary to ensure uniqueness
            try
            {
                randoms.Add(randomInt, randomInt);
            }
            catch (ArgumentException aex)
            {
                Console.Write(aex.Message);
            } // we can assume this element exists in the dictionary already

            // check for randoms length and then iterate through the original list,
            // adding items we select via random to the return list
            if (randoms.Count == returnCount)
            {
                foreach (int key in randoms.Keys)
                    returnList.Add(list[randoms[key]]);

                break; // break out of the while loop
            }
        }

        return returnList;
    }

Using a list of random indices in the range 0 to (total list count - 1) and then simply pulling those items from the list seemed to be the best way, but using the Dictionary to ensure uniqueness is something I'm still mulling over. Also note I used a string list; replace as needed.

User · Answer

Goal: Select N items from a source collection without duplication. I created an extension for any generic collection. Here's how I did it:

    public static class CollectionExtension
    {
        public static IList<TSource> RandomizeCollection<TSource>(this IList<TSource> source, int maxItems)
        {
            int randomCount = source.Count > maxItems ? maxItems : source.Count;
            int?[] randomizedIndices = new int?[randomCount];
            Random random = new Random();

            for (int i = 0; i < randomizedIndices.Length; i++)
            {
                int randomResult = -1;
                while (randomizedIndices.Contains((randomResult = random.Next(0, source.Count))))
                {
                    // 0 -> since all lists start from index 0; source.Count -> maximum number of items that can be randomized
                    // continue looping while the generated random number is already in the list of randomizedIndices
                }

                randomizedIndices[i] = randomResult;
            }

            IList<TSource> result = new List<TSource>();
            foreach (int index in randomizedIndices)
                result.Add(source.ElementAt(index));

            return result;
        }
    }

User · Answer

12 years on and this question is still active. I didn't find an implementation of Kyle's solution I liked, so here it is:

    public IEnumerable<T> TakeRandom<T>(IEnumerable<T> collection, int take)
    {
        var random = new Random();
        var available = collection.Count(); // requires System.Linq; enumerates the collection once up front
        var needed = take;
        foreach (var item in collection)
        {
            // Select this item with probability needed / available.
            if (random.Next(available) < needed)
            {
                needed--;
                yield return item;
                if (needed == 0)
                {
                    break;
                }
            }
            available--;
        }
    }
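For readers who prefer Python, the same selection-sampling loop can be sketched as a generator. The name `take_random` and the `rng` parameter are illustrative assumptions, and the input is assumed to be a sequence with a known length.

```python
import random


def take_random(items, take, rng=random):
    """Lazily yield `take` items from a sequence, preserving their order."""
    available = len(items)  # items not yet examined
    needed = take           # items still to be selected
    for item in items:
        if needed <= 0:
            break
        # Select the current item with probability needed / available.
        if rng.randrange(available) < needed:
            needed -= 1
            yield item
        available -= 1


if __name__ == "__main__":
    print(list(take_random(["a", "b", "c", "d", "e", "f"], 3)))
```

If `take` is at least the length of the input, `needed` never drops below `available`, so every item is returned. Otherwise exactly `take` items come out, because once `needed` equals `available` every remaining item is selected.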

User · Answer

From Dragons in the Algorithm, an interpretation in C#:

    int k = 10; // items to select
    var items = new List<int>(new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 });
    var selected = new List<int>();
    double needed = k;
    double available = items.Count;
    var rand = new Random();
    while (selected.Count < k)
    {
        // Walk the list from the end, taking each item with probability needed / available.
        if (rand.NextDouble() < needed / available)
        {
            selected.Add(items[(int)available - 1]);
            needed--;
        }
        available--;
    }

This algorithm will select unique indices of the items list.
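The needed / available rule is easy to check exactly on small inputs. The Python sketch below is a verification aid built on my own assumptions: the name `subset_probability` is hypothetical, and it walks indices from the front rather than from the end, which does not change the argument. It multiplies the per-step probabilities with exact fractions to get the chance of ending up with one particular subset.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def subset_probability(n, subset):
    """Exact probability that selection sampling over range(n) picks `subset`."""
    chosen = set(subset)
    needed = len(chosen)
    prob = Fraction(1)
    for index in range(n):
        available = n - index
        p_take = Fraction(needed, available)  # chance of taking this index
        if index in chosen:
            prob *= p_take
            needed -= 1
        else:
            prob *= 1 - p_take
    return prob


if __name__ == "__main__":
    n, k = 6, 3
    target = Fraction(1, comb(n, k))
    for subset in combinations(range(n), k):
        assert subset_probability(n, subset) == target
    print("every", k, "element subset of", n, "items has probability", target)
```

Every k-element subset comes out with probability 1 / (n choose k), which is what it means for the selection to be uniform, provided the underlying random numbers are themselves uniform.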
