[arrays] Comparing two arrays & get the values which are not common

i wanted a small logic to compare contents of two arrays & get the value which is not common amongst them using powershell

example if

$a1=@(1,2,3,4,5)
$b1=@(1,2,3,4,5,6)

$c which is the output should give me the value "6" which is the output of what's the uncommon value between both the arrays.

Can some one help me out with the same! thanks!

This question is related to arrays powershell

The answer is


Your results will not be helpful unless the arrays are first sorted. To sort an array, run it through Sort-Object.

$x = @(5,1,4,2,3)
$y = @(2,4,6,1,3,5)

Compare-Object -ReferenceObject ($x | Sort-Object) -DifferenceObject ($y | Sort-Object)

Collection

$a = 1..5
$b = 4..8

$Yellow = $a | Where {$b -NotContains $_}

$Yellow contains all the items in $a except the ones that are in $b:

PS C:\> $Yellow
1
2
3

$Blue = $b | Where {$a -NotContains $_}

$Blue contains all the items in $b except the ones that are in $a:

PS C:\> $Blue
6
7
8

$Green = $a | Where {$b -Contains $_}

Not in question, but anyways; Green contains the items that are in both $a and $b.

PS C:\> $Green
4
5

Note: Where is an alias of Where-Object. Alias can introduce possible problems and make scripts hard to maintain.


Addendum 12 October 2019

As commented by @xtreampb and @mklement0: although not shown from the example in the question, the task that the question implies (values "not in common") is the symmetric difference between the two input sets (the union of yellow and blue).

Union

The symmetric difference between the $a and $b can be literally defined as the union of $Yellow and $Blue:

$NotGreen = $Yellow + $Blue

Which is written out:

$NotGreen = ($a | Where {$b -NotContains $_}) + ($b | Where {$a -NotContains $_})

Performance

As you might notice, there are quite some (redundant) loops in this syntax: all items in list $a iterate (using Where) through items in list $b (using -NotContains) and visa versa. Unfortunately the redundancy is difficult to avoid as it is difficult to predict the result of each side. A Hash Table is usually a good solution to improve the performance of redundant loops. For this, I like to redefine the question: Get the values that appear once in the sum of the collections ($a + $b):

$Count = @{}
$a + $b | ForEach-Object {$Count[$_] += 1}
$Count.Keys | Where-Object {$Count[$_] -eq 1}

By using the ForEach statement instead of the ForEach-Object cmdlet and the Where method instead of the Where-Object you might increase the performance by a factor 2.5:

$Count = @{}
ForEach ($Item in $a + $b) {$Count[$Item] += 1}
$Count.Keys.Where({$Count[$_] -eq 1})

LINQ

But Language Integrated Query (LINQ) will easily beat any native PowerShell and native .Net methods (see also High Performance PowerShell with LINQ and mklement0's answer for Can the following Nested foreach loop be simplified in PowerShell?:

To use LINQ you need to explicitly define the array types:

[Int[]]$a = 1..5
[Int[]]$b = 4..8

And use the [Linq.Enumerable]:: operator:

$Yellow   = [Int[]][Linq.Enumerable]::Except($a, $b)
$Blue     = [Int[]][Linq.Enumerable]::Except($b, $a)
$Green    = [Int[]][Linq.Enumerable]::Intersect($a, $b)
$NotGreen = [Int[]]([Linq.Enumerable]::Except($a, $b) + [Linq.Enumerable]::Except($b, $a))

Benchmark

Benchmark results highly depend on the sizes of the collections and how many items there are actually shared, as a "average", I am presuming that half of each collection is shared with the other.

Using             Time
Compare-Object    111,9712
NotContains       197,3792
ForEach-Object    82,8324
ForEach Statement 36,5721
LINQ              22,7091

To get a good performance comparison, caches should be cleared by e.g. starting a fresh PowerShell session.

$a = 1..1000
$b = 500..1500

(Measure-Command {
    Compare-Object -ReferenceObject $a -DifferenceObject $b  -PassThru
}).TotalMilliseconds
(Measure-Command {
    ($a | Where {$b -NotContains $_}), ($b | Where {$a -NotContains $_})
}).TotalMilliseconds
(Measure-Command {
    $Count = @{}
    $a + $b | ForEach-Object {$Count[$_] += 1}
    $Count.Keys | Where-Object {$Count[$_] -eq 1}
}).TotalMilliseconds

(Measure-Command {
    $Count = @{}
    ForEach ($Item in $a + $b) {$Count[$Item] += 1}
    $Count.Keys.Where({$Count[$_] -eq 1})
}).TotalMilliseconds

[Int[]]$a = $a
[Int[]]$b = $b
(Measure-Command {
    [Int[]]([Linq.Enumerable]::Except($a, $b) + [Linq.Enumerable]::Except($b, $a))
}).TotalMilliseconds

This should help, uses simple hash table.

$a1=@(1,2,3,4,5) $b1=@(1,2,3,4,5,6)


$hash= @{}

#storing elements of $a1 in hash
foreach ($i in $a1)
{$hash.Add($i, "present")}

#define blank array $c
$c = @()

#adding uncommon ones in second array to $c and removing common ones from hash
foreach($j in $b1)
{
if(!$hash.ContainsKey($j)){$c = $c+$j}
else {hash.Remove($j)}
}

#now hash is left with uncommon ones in first array, so add them to $c
foreach($k in $hash.keys)
{
$c = $c + $k
}

Try:

$a1=@(1,2,3,4,5)
$b1=@(1,2,3,4,5,6)
(Compare-Object $a1 $b1).InputObject

Or, you can use:

(Compare-Object $b1 $a1).InputObject

The order doesn't matter.


Look at Compare-Object

Compare-Object $a1 $b1 | ForEach-Object { $_.InputObject }

Or if you would like to know where the object belongs to, then look at SideIndicator:

$a1=@(1,2,3,4,5,8)
$b1=@(1,2,3,4,5,6)
Compare-Object $a1 $b1