[php] Highlight the difference between two strings in PHP

What is the easiest way to highlight the difference between two strings in PHP?

I'm thinking along the lines of the Stack Overflow edit history page, where new text is in green and removed text is in red. If there are any pre-written functions or classes available, that would be ideal.

This question is related to php string diff word-diff

The answer is




If you want a robust library, Text_Diff (a PEAR package) looks to be pretty good. It has some pretty cool features.


This is a nice one, also http://paulbutler.org/archives/a-simple-diff-algorithm-in-php/

Solving the problem is not as simple as it seems, and the problem bothered me for about a year before I figured it out. I managed to write my algorithm in PHP, in 18 lines of code. It is not the most efficient way to do a diff, but it is probably the easiest to understand.

It works by finding the longest sequence of words common to both strings, and recursively finding the longest sequences of the remainders of the string until the substrings have no words in common. At this point it adds the remaining new words as an insertion and the remaining old words as a deletion.

You can download the source here: PHP SimpleDiff...


Another solution (for side-by-side comparison as opposed to a unified view): https://github.com/danmysak/side-by-side.


I came across this PHP diff class by Chris Boulton based on Python difflib which could be a good solution:

PHP Diff Lib


I would recommend looking at these awesome functions from PHP core:

similar_text — Calculate the similarity between two strings

http://www.php.net/manual/en/function.similar-text.php

levenshtein — Calculate Levenshtein distance between two strings

http://www.php.net/manual/en/function.levenshtein.php

soundex — Calculate the soundex key of a string

http://www.php.net/manual/en/function.soundex.php

metaphone — Calculate the metaphone key of a string

http://www.php.net/manual/en/function.metaphone.php


What you are looking for is a "diff algorithm". A quick google search led me to this solution. I did not test it, but maybe it will do what you need.


There is also a PECL extension for xdiff:

In particular:

Example from PHP Manual:

<?php
$old_article = file_get_contents('./old_article.txt');
$new_article = $_POST['article'];

$diff = xdiff_string_diff($old_article, $new_article, 1);
if (is_string($diff)) {
    echo "Differences between two articles:\n";
    echo $diff;
}


I had terrible trouble with the both the PEAR-based and the simpler alternatives shown. So here's a solution that leverages the Unix diff command (obviously, you have to be on a Unix system or have a working Windows diff command for it to work). Choose your favourite temporary directory, and change the exceptions to return codes if you prefer.

/**
 * @brief Find the difference between two strings, lines assumed to be separated by "\n|
 * @param $new string The new string
 * @param $old string The old string
 * @return string Human-readable output as produced by the Unix diff command,
 * or "No changes" if the strings are the same.
 * @throws Exception
 */
public static function diff($new, $old) {
  $tempdir = '/var/somewhere/tmp'; // Your favourite temporary directory
  $oldfile = tempnam($tempdir,'OLD');
  $newfile = tempnam($tempdir,'NEW');
  if (!@file_put_contents($oldfile,$old)) {
    throw new Exception('diff failed to write temporary file: ' . 
         print_r(error_get_last(),true));
  }
  if (!@file_put_contents($newfile,$new)) {
    throw new Exception('diff failed to write temporary file: ' . 
         print_r(error_get_last(),true));
  }
  $answer = array();
  $cmd = "diff $newfile $oldfile";
  exec($cmd, $answer, $retcode);
  unlink($newfile);
  unlink($oldfile);
  if ($retcode != 1) {
    throw new Exception('diff failed with return code ' . $retcode);
  }
  if (empty($answer)) {
    return 'No changes';
  } else {
    return implode("\n", $answer);
  }
}

If you want a robust library, Text_Diff (a PEAR package) looks to be pretty good. It has some pretty cool features.


Here is a short function you can use to diff two arrays. It implements the LCS algorithm:

function computeDiff($from, $to)
{
    $diffValues = array();
    $diffMask = array();

    $dm = array();
    $n1 = count($from);
    $n2 = count($to);

    for ($j = -1; $j < $n2; $j++) $dm[-1][$j] = 0;
    for ($i = -1; $i < $n1; $i++) $dm[$i][-1] = 0;
    for ($i = 0; $i < $n1; $i++)
    {
        for ($j = 0; $j < $n2; $j++)
        {
            if ($from[$i] == $to[$j])
            {
                $ad = $dm[$i - 1][$j - 1];
                $dm[$i][$j] = $ad + 1;
            }
            else
            {
                $a1 = $dm[$i - 1][$j];
                $a2 = $dm[$i][$j - 1];
                $dm[$i][$j] = max($a1, $a2);
            }
        }
    }

    $i = $n1 - 1;
    $j = $n2 - 1;
    while (($i > -1) || ($j > -1))
    {
        if ($j > -1)
        {
            if ($dm[$i][$j - 1] == $dm[$i][$j])
            {
                $diffValues[] = $to[$j];
                $diffMask[] = 1;
                $j--;  
                continue;              
            }
        }
        if ($i > -1)
        {
            if ($dm[$i - 1][$j] == $dm[$i][$j])
            {
                $diffValues[] = $from[$i];
                $diffMask[] = -1;
                $i--;
                continue;              
            }
        }
        {
            $diffValues[] = $from[$i];
            $diffMask[] = 0;
            $i--;
            $j--;
        }
    }    

    $diffValues = array_reverse($diffValues);
    $diffMask = array_reverse($diffMask);

    return array('values' => $diffValues, 'mask' => $diffMask);
}

It generates two arrays:

  • values array: a list of elements as they appear in the diff.
  • mask array: contains numbers. 0: unchanged, -1: removed, 1: added.

If you populate an array with characters, it can be used to compute inline difference. Now just a single step to highlight the differences:

function diffline($line1, $line2)
{
    $diff = computeDiff(str_split($line1), str_split($line2));
    $diffval = $diff['values'];
    $diffmask = $diff['mask'];

    $n = count($diffval);
    $pmc = 0;
    $result = '';
    for ($i = 0; $i < $n; $i++)
    {
        $mc = $diffmask[$i];
        if ($mc != $pmc)
        {
            switch ($pmc)
            {
                case -1: $result .= '</del>'; break;
                case 1: $result .= '</ins>'; break;
            }
            switch ($mc)
            {
                case -1: $result .= '<del>'; break;
                case 1: $result .= '<ins>'; break;
            }
        }
        $result .= $diffval[$i];

        $pmc = $mc;
    }
    switch ($pmc)
    {
        case -1: $result .= '</del>'; break;
        case 1: $result .= '</ins>'; break;
    }

    return $result;
}

Eg.:

echo diffline('StackOverflow', 'ServerFault')

Will output:

S<del>tackO</del><ins>er</ins>ver<del>f</del><ins>Fau</ins>l<del>ow</del><ins>t</ins> 

StackOerverfFaulowt

Additional notes:

  • The diff matrix requires (m+1)*(n+1) elements. So you can run into out of memory errors if you try to diff long sequences. In this case diff larger chunks (eg. lines) first, then diff their contents in a second pass.
  • The algorithm can be improved if you trim the matching elements from the beginning and the end, then run the algorithm on the differing middle only. A latter (more bloated) version contains these modifications too.

There is also a PECL extension for xdiff:

In particular:

Example from PHP Manual:

<?php
$old_article = file_get_contents('./old_article.txt');
$new_article = $_POST['article'];

$diff = xdiff_string_diff($old_article, $new_article, 1);
if (is_string($diff)) {
    echo "Differences between two articles:\n";
    echo $diff;
}


This is a nice one, also http://paulbutler.org/archives/a-simple-diff-algorithm-in-php/

Solving the problem is not as simple as it seems, and the problem bothered me for about a year before I figured it out. I managed to write my algorithm in PHP, in 18 lines of code. It is not the most efficient way to do a diff, but it is probably the easiest to understand.

It works by finding the longest sequence of words common to both strings, and recursively finding the longest sequences of the remainders of the string until the substrings have no words in common. At this point it adds the remaining new words as an insertion and the remaining old words as a deletion.

You can download the source here: PHP SimpleDiff...


I had terrible trouble with the both the PEAR-based and the simpler alternatives shown. So here's a solution that leverages the Unix diff command (obviously, you have to be on a Unix system or have a working Windows diff command for it to work). Choose your favourite temporary directory, and change the exceptions to return codes if you prefer.

/**
 * @brief Find the difference between two strings, lines assumed to be separated by "\n|
 * @param $new string The new string
 * @param $old string The old string
 * @return string Human-readable output as produced by the Unix diff command,
 * or "No changes" if the strings are the same.
 * @throws Exception
 */
public static function diff($new, $old) {
  $tempdir = '/var/somewhere/tmp'; // Your favourite temporary directory
  $oldfile = tempnam($tempdir,'OLD');
  $newfile = tempnam($tempdir,'NEW');
  if (!@file_put_contents($oldfile,$old)) {
    throw new Exception('diff failed to write temporary file: ' . 
         print_r(error_get_last(),true));
  }
  if (!@file_put_contents($newfile,$new)) {
    throw new Exception('diff failed to write temporary file: ' . 
         print_r(error_get_last(),true));
  }
  $answer = array();
  $cmd = "diff $newfile $oldfile";
  exec($cmd, $answer, $retcode);
  unlink($newfile);
  unlink($oldfile);
  if ($retcode != 1) {
    throw new Exception('diff failed with return code ' . $retcode);
  }
  if (empty($answer)) {
    return 'No changes';
  } else {
    return implode("\n", $answer);
  }
}

Here is a short function you can use to diff two arrays. It implements the LCS algorithm:

function computeDiff($from, $to)
{
    $diffValues = array();
    $diffMask = array();

    $dm = array();
    $n1 = count($from);
    $n2 = count($to);

    for ($j = -1; $j < $n2; $j++) $dm[-1][$j] = 0;
    for ($i = -1; $i < $n1; $i++) $dm[$i][-1] = 0;
    for ($i = 0; $i < $n1; $i++)
    {
        for ($j = 0; $j < $n2; $j++)
        {
            if ($from[$i] == $to[$j])
            {
                $ad = $dm[$i - 1][$j - 1];
                $dm[$i][$j] = $ad + 1;
            }
            else
            {
                $a1 = $dm[$i - 1][$j];
                $a2 = $dm[$i][$j - 1];
                $dm[$i][$j] = max($a1, $a2);
            }
        }
    }

    $i = $n1 - 1;
    $j = $n2 - 1;
    while (($i > -1) || ($j > -1))
    {
        if ($j > -1)
        {
            if ($dm[$i][$j - 1] == $dm[$i][$j])
            {
                $diffValues[] = $to[$j];
                $diffMask[] = 1;
                $j--;  
                continue;              
            }
        }
        if ($i > -1)
        {
            if ($dm[$i - 1][$j] == $dm[$i][$j])
            {
                $diffValues[] = $from[$i];
                $diffMask[] = -1;
                $i--;
                continue;              
            }
        }
        {
            $diffValues[] = $from[$i];
            $diffMask[] = 0;
            $i--;
            $j--;
        }
    }    

    $diffValues = array_reverse($diffValues);
    $diffMask = array_reverse($diffMask);

    return array('values' => $diffValues, 'mask' => $diffMask);
}

It generates two arrays:

  • values array: a list of elements as they appear in the diff.
  • mask array: contains numbers. 0: unchanged, -1: removed, 1: added.

If you populate an array with characters, it can be used to compute inline difference. Now just a single step to highlight the differences:

function diffline($line1, $line2)
{
    $diff = computeDiff(str_split($line1), str_split($line2));
    $diffval = $diff['values'];
    $diffmask = $diff['mask'];

    $n = count($diffval);
    $pmc = 0;
    $result = '';
    for ($i = 0; $i < $n; $i++)
    {
        $mc = $diffmask[$i];
        if ($mc != $pmc)
        {
            switch ($pmc)
            {
                case -1: $result .= '</del>'; break;
                case 1: $result .= '</ins>'; break;
            }
            switch ($mc)
            {
                case -1: $result .= '<del>'; break;
                case 1: $result .= '<ins>'; break;
            }
        }
        $result .= $diffval[$i];

        $pmc = $mc;
    }
    switch ($pmc)
    {
        case -1: $result .= '</del>'; break;
        case 1: $result .= '</ins>'; break;
    }

    return $result;
}

Eg.:

echo diffline('StackOverflow', 'ServerFault')

Will output:

S<del>tackO</del><ins>er</ins>ver<del>f</del><ins>Fau</ins>l<del>ow</del><ins>t</ins> 

StackOerverfFaulowt

Additional notes:

  • The diff matrix requires (m+1)*(n+1) elements. So you can run into out of memory errors if you try to diff long sequences. In this case diff larger chunks (eg. lines) first, then diff their contents in a second pass.
  • The algorithm can be improved if you trim the matching elements from the beginning and the end, then run the algorithm on the differing middle only. A latter (more bloated) version contains these modifications too.

I came across this PHP diff class by Chris Boulton based on Python difflib which could be a good solution:

PHP Diff Lib


What you are looking for is a "diff algorithm". A quick google search led me to this solution. I did not test it, but maybe it will do what you need.


If you want a robust library, Text_Diff (a PEAR package) looks to be pretty good. It has some pretty cool features.


Another solution (for side-by-side comparison as opposed to a unified view): https://github.com/danmysak/side-by-side.


If you want a robust library, Text_Diff (a PEAR package) looks to be pretty good. It has some pretty cool features.


What you are looking for is a "diff algorithm". A quick google search led me to this solution. I did not test it, but maybe it will do what you need.


Just wrote a class to compute smallest (not to be taken literally) number of edits to transform one string into another string:

http://www.raymondhill.net/finediff/

It has a static function to render a HTML version of the diff.

It's a first version, and likely to be improved, but it works just fine as of now, so I am throwing it out there in case someone needs to generate a compact diff efficiently, like I needed.

Edit: It's on Github now: https://github.com/gorhill/PHP-FineDiff


I would recommend looking at these awesome functions from PHP core:

similar_text — Calculate the similarity between two strings

http://www.php.net/manual/en/function.similar-text.php

levenshtein — Calculate Levenshtein distance between two strings

http://www.php.net/manual/en/function.levenshtein.php

soundex — Calculate the soundex key of a string

http://www.php.net/manual/en/function.soundex.php

metaphone — Calculate the metaphone key of a string

http://www.php.net/manual/en/function.metaphone.php


Examples related to php

I am receiving warning in Facebook Application using PHP SDK Pass PDO prepared statement to variables Parse error: syntax error, unexpected [ Preg_match backtrack error Removing "http://" from a string How do I hide the PHP explode delimiter from submitted form results? Problems with installation of Google App Engine SDK for php in OS X Laravel 4 with Sentry 2 add user to a group on Registration php & mysql query not echoing in html with tags? How do I show a message in the foreach loop?

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to diff

Create patch or diff file from git repository and apply it to another different git repository Comparing the contents of two files in Sublime Text Git diff between current branch and master but not including unmerged master commits Fast way of finding lines in one file that are not in another? Python - difference between two strings How to see the changes in a Git commit? unix diff side-to-side results? Find the files existing in one directory but not in the other git diff between two different files How to get the difference (only additions) between two files in linux

Examples related to word-diff

Highlight the difference between two strings in PHP