How do I create a URL shortener?

Question

I want to create a URL shortener service where you can write a long URL into an input field and the service shortens the URL to "http://www.example.org/abcdef".

Instead of "abcdef" there can be any other string with six characters containing a-z, A-Z and 0-9. That makes about 56.8 billion (62^6) possible strings.

My approach:

I have a database table with three columns:

1. id, integer, auto-increment
2. long, string, the long URL the user entered
3. short, string, the shortened URL (or just the six characters)

I would then insert the long URL into the table. Then I would select the auto-increment value for "id" and build a hash of it. This hash should then be inserted as "short". But what sort of hash should I build? Hash algorithms like MD5 create strings that are too long, so I don't think I should use them. A self-built algorithm will work, too.

My idea:

For "http://www.google.de/" I get the auto-increment id 239472. Then I do the following steps:

    short = '';
    if divisible by 2, add "a" + the result to short
    if divisible by 3, add "b" + the result to short
    ... until I have divisors for a-z and A-Z.

That could be repeated until the number isn't divisible any more. Do you think this is a good approach? Do you have a better idea?

Due to the ongoing interest in this topic, I've published an efficient solution to GitHub, with implementations for JavaScript, PHP, Python and Java. Add your solutions if you like.

User · Answer

Here is a decent URL encoding function for PHP:

    // From http://snipplr.com/view/22246/base62-encode--decode
    private function base_encode($val, $base = 62, $chars = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ') {
        $str = '';
        do {
            // Take the lowest base-62 digit and prepend its character
            $i = (int) fmod($val, $base);
            $str = $chars[$i] . $str;
            $val = ($val - $i) / $base;
        } while ($val > 0);
        return $str;
    }

User · Answer

This is what I use:

    import string

    # Generate a [0-9a-zA-Z] string
    ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

    def encode_id(id_number, alphabet=ALPHABET):
        """Convert an integer to a string."""
        if id_number == 0:
            return alphabet[0]

        alphabet_len = len(alphabet)  # Cache

        result = ''
        while id_number > 0:
            id_number, mod = divmod(id_number, alphabet_len)
            result = alphabet[mod] + result

        return result

    def decode_id(id_string, alphabet=ALPHABET):
        """Convert a string to an integer."""
        alphabet_len = len(alphabet)  # Cache
        return sum(alphabet.index(char) * pow(alphabet_len, power)
                   for power, char in enumerate(reversed(id_string)))

It's very fast and can take long integers.

User · Answer

Take a look at https://hashids.org. It is open source and available in many languages.

Their page outlines some of the pitfalls of other approaches.

User · Answer

I keep incrementing an integer sequence per domain in the database and use Hashids to encode the integer into a URL path.

    static hashids = Hashids(salt = "my app rocks", minSize = 6)

I ran a script to see how long it takes until it exhausts the character length. For six characters it can do 164,916,224 links and then goes up to seven characters. Bitly uses seven characters. Under five characters looks weird to me.

Hashids can decode the URL path back to an integer, but a simpler solution is to use the entire short link sho.rt/ka8ds3 as a primary key.

Here is the full concept:

    function addDomain(domain) {
        table("domains").insert("domain", domain, "seq", 0)
    }

    function addURL(domain, longURL) {
        seq = table("domains").where("domain = ?", domain).increment("seq")
        shortURL = domain + "/" + hashids.encode(seq)
        table("links").insert("short", shortURL, "long", longURL)
        return shortURL
    }

    // GET /:hashcode
    function handleRequest(req, res) {
        shortURL = req.host + "/" + req.param("hashcode")
        longURL = table("links").where("short = ?", shortURL).get("long")
        res.redirect(301, longURL)
    }

User · Answer

These are my initial thoughts; more thinking can be done, or a simulation can be run to see whether it works well or needs improvement.

My answer is to store the long URL in the database and use the ID 0 to 9999999999999999 (or however large the number needs to be).

But the ID 0 to 9999999999999999 can be an issue, because:

1. it can be shorter if we use hexadecimal, or even base62 or base64 (base64 just like YouTube, using A-Z, a-z, 0-9, _ and -);
2. if it increases from 0 to 9999999999999999 uniformly, then hackers can visit the IDs in that order and learn what URLs people are sending each other, so it can be a privacy issue.

We can do this:

1. Have one server allocate 0 to 999 to one server, Server A, so now Server A has 1000 such IDs. If there are 20 or 200 servers constantly wanting new IDs, they don't have to keep asking for each new ID; instead they ask once for 1000 IDs.
2. For the ID 1, for example, reverse the bits. So 000...00000001 becomes 10000...000, so that when converted to base64 the IDs no longer increase uniformly.
3. Use XOR to flip the bits for the final IDs. For example, XOR with 0xD5AA96...2373 (like a secret key), and some bits will be flipped: wherever the secret key has a 1 bit, it flips the corresponding bit of the ID. This makes the IDs even harder to guess and makes them appear more random.

Following this scheme, the single server that allocates the IDs can form the IDs, and so can the 20 or 200 servers requesting the allocation of IDs. The allocating server has to use a lock or semaphore to prevent two requesting servers from getting the same batch (or, if it accepts one connection at a time, that already solves the problem). We don't want the queue of servers waiting for an allocation to get too long, which is why allocating 1000 or 10000 IDs at a time helps.
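The batch allocation, bit reversal and XOR steps described in this answer could look roughly like the following Python sketch. The 32-bit width, the key value `SECRET_KEY` and the class name `BatchAllocator` are assumptions made for illustration; the answer does not specify them. Note that this only scatters the IDs so they do not look sequential; it is a reversible mapping, not encryption.

```python
import threading

ID_BITS = 32              # assumed fixed width of the numeric ID
SECRET_KEY = 0x5DEECE66   # hypothetical key; must fit in ID_BITS bits
MASK = (1 << ID_BITS) - 1


class BatchAllocator:
    """Hands out disjoint ranges of sequential IDs, one batch per request."""

    def __init__(self, batch_size: int = 1000):
        self._next = 0
        self._batch_size = batch_size
        self._lock = threading.Lock()

    def allocate(self) -> range:
        # The lock ensures two requesters never receive the same batch.
        with self._lock:
            start = self._next
            self._next += self._batch_size
        return range(start, start + self._batch_size)


def reverse_bits(value: int, width: int = ID_BITS) -> int:
    """Return value with its lowest `width` bits in reverse order."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def obfuscate(seq_id: int) -> int:
    """Map a sequential ID to a scattered-looking ID (reversible)."""
    return reverse_bits(seq_id & MASK) ^ SECRET_KEY


def deobfuscate(public_id: int) -> int:
    """Invert obfuscate(): undo the XOR, then reverse the bits back."""
    return reverse_bits(public_id ^ SECRET_KEY)
```

The resulting integer would then be converted to base62 or base64 for the short URL, as the answer suggests.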

User · Answer

Not an answer to your question, but I wouldn't use case-sensitive shortened URLs. They are hard to remember, usually unreadable (many fonts render 1 and l, 0 and O and other characters so similarly that they are nearly impossible to tell apart) and downright error-prone. Try to use lower or upper case only.

Also, try to have a format where you mix the numbers and characters in a predefined form. There are studies that show that people tend to remember one form better than others (think phone numbers, where the numbers are grouped in a specific form). Try something like num-char-char-num-char-char. I know this will lower the number of combinations, especially if you don't have upper and lower case, but it would be more usable and therefore useful.
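One way to produce codes in a fixed num-char-char-num-char-char shape is a mixed-radix encoding, where each position has its own alphabet. The sketch below is only an illustration of that idea: the choice of plain lowercase letters and all ten digits is an assumption, and look-alike characters could be removed from either alphabet without changing the logic.

```python
import string

DIGITS = string.digits             # 10 symbols
LETTERS = string.ascii_lowercase   # 26 symbols

# One alphabet per position: num-char-char-num-char-char
PATTERN = [DIGITS, LETTERS, LETTERS, DIGITS, LETTERS, LETTERS]

# Number of distinct codes this pattern can represent.
CAPACITY = 1
for symbols in PATTERN:
    CAPACITY *= len(symbols)


def encode(n: int) -> str:
    """Convert an ID into a code that follows PATTERN."""
    if not 0 <= n < CAPACITY:
        raise ValueError("id out of range for this pattern")
    chars = []
    # Fill from the rightmost position, each with its own radix.
    for symbols in reversed(PATTERN):
        n, rem = divmod(n, len(symbols))
        chars.append(symbols[rem])
    return "".join(reversed(chars))


def decode(code: str) -> int:
    """Convert a PATTERN-shaped code back into the ID."""
    n = 0
    for symbols, ch in zip(PATTERN, code):
        n = n * len(symbols) + symbols.index(ch)
    return n
```

With these alphabets the pattern covers 45,697,600 IDs, far fewer than 62^6, which is the trade-off the answer mentions.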

User · Answer

    import string

    alphabet = list(string.ascii_lowercase + string.ascii_uppercase + string.digits)

    def lookup(k, a=alphabet):
        # An integer index returns its character; a character returns its index.
        if isinstance(k, int):
            return a[k]
        elif isinstance(k, str):
            return a.index(k)

    def encode(i, a=alphabet):
        '''Takes an integer and returns it in the given base with mappings for upper/lower case letters and numbers 0-9.'''
        try:
            i = int(i)
        except Exception:
            raise TypeError("Input must be an integer.")

        def incode(i=i, a=a):
            # Emit the leading digits recursively, then the last digit.
            if i <= 61:
                return lookup(i, a)
            return incode(i // 62, a) + lookup(i % 62, a)

        return incode()

    def decode(s, a=alphabet):
        '''Takes a base 62 string in our alphabet and returns it in base10.'''
        try:
            s = str(s)
        except Exception:
            raise TypeError("Input must be a string.")

        return sum(lookup(c, a) * pow(62, p) for p, c in enumerate(reversed(s)))

Here's my version for whoever needs it.

User · Answer

You could hash the entire URL, but if you just want to shorten the id, do as Marcel suggested. I wrote this Python implementation:

https://gist.github.com/778542

User · Answer

Why not just translate your id to a string? You just need a function that maps a digit between, say, 0 and 61 to a single letter (upper/lower case) or digit. Then apply this to create, say, 4-letter codes, and you've got 14.7 million URLs covered.

User · Answer

I would continue your "convert number to string" approach. However, you will realize that your proposed algorithm fails if your ID is a prime and greater than 52.

Theoretical background

You need a Bijective Function f. This is necessary so that you can find an inverse function g('abc') = 123 for your f(123) = 'abc' function. This means:

- There must be no x1, x2 (with x1 ≠ x2) that will make f(x1) = f(x2),
- and for every y you must be able to find an x so that f(x) = y.

How to convert the ID to a shortened URL

1. Think of an alphabet we want to use. In your case, that's [a-zA-Z0-9]. It contains 62 letters.
2. Take an auto-generated, unique numerical key (the auto-incremented id of a MySQL table, for example). For this example, I will use 125_10 (125 with a base of 10).
3. Now you have to convert 125_10 to X_62 (base 62).

       125_10 = 2×62^1 + 1×62^0 = [2,1]

   This requires the use of integer division and modulo. A pseudo-code example:

       digits = []

       while num > 0
         remainder = modulo(num, 62)
         digits.push(remainder)
         num = divide(num, 62)

       digits = digits.reverse

   Now map the indices 2 and 1 to your alphabet. This is how your mapping (with an array, for example) could look:

       0  → a
       1  → b
       ...
       25 → z
       ...
       52 → 0
       61 → 9

   With 2 → c and 1 → b, you will receive cb_62 as the shortened URL:

       http://shor.ty/cb

How to resolve a shortened URL to the initial ID

The reverse is even easier. You just do a reverse lookup in your alphabet.

1. e9a_62 will be resolved to "4th, 61st, and 0th letter in the alphabet".

       e9a_62 = [4,61,0] = 4×62^2 + 61×62^1 + 0×62^0 = 19158_10

2. Now find your database record with WHERE id = 19158 and do the redirect.

Example implementations (provided by commenters)

C++, Python, Ruby, Haskell, C#, CoffeeScript, Perl
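As a check on the worked numbers in this answer, here is a minimal Python sketch of the same conversion. It assumes the alphabet order a-z, A-Z, 0-9 used in the answer's mapping; the function names `encode` and `decode` are chosen for illustration.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
BASE = len(ALPHABET)  # 62


def encode(num: int) -> str:
    """Convert a non-negative integer ID to a base-62 string."""
    if num < 0:
        raise ValueError("id must be non-negative")
    if num == 0:
        return ALPHABET[0]
    digits = []
    while num > 0:
        # Integer division and modulo, as in the pseudo-code.
        num, remainder = divmod(num, BASE)
        digits.append(ALPHABET[remainder])
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(digits))


def decode(code: str) -> int:
    """Convert a base-62 string back to the integer ID."""
    num = 0
    for ch in code:
        num = num * BASE + ALPHABET.index(ch)
    return num


print(encode(125))    # cb
print(decode("e9a"))  # 19158
```

The two printed values match the answer's examples: 125 maps to "cb", and "e9a" resolves to 19158.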

User · Answer

Very good answer. I have created a Golang implementation of the bjf:

    package bjf

    import (
        "math"
        "strings"
        "strconv"
    )

    const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

    func Encode(num string) string {
        n, _ := strconv.ParseUint(num, 10, 64)
        t := make([]byte, 0)

        /* Special case */
        if n == 0 {
            return string(alphabet[0])
        }

        /* Map */
        for n > 0 {
            r := n % uint64(len(alphabet))
            t = append(t, alphabet[r])
            n = n / uint64(len(alphabet))
        }

        /* Reverse */
        for i, j := 0, len(t)-1; i < j; i, j = i+1, j-1 {
            t[i], t[j] = t[j], t[i]
        }

        return string(t)
    }

    func Decode(token string) int {
        r := int(0)
        p := float64(len(token)) - 1

        for i := 0; i < len(token); i++ {
            r += strings.Index(alphabet, string(token[i])) * int(math.Pow(float64(len(alphabet)), p))
            p--
        }

        return r
    }

Hosted at GitHub: https://github.com/xor-gate/go-bjf

User · Answer

C# version:

    public class UrlShortener
    {
        private static String ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
        private static int    BASE     = 62;

        public static String encode(int num)
        {
            StringBuilder sb = new StringBuilder();

            // Collect digits least significant first
            while ( num > 0 )
            {
                sb.Append( ALPHABET[( num % BASE )] );
                num /= BASE;
            }

            // Reverse to get the most significant digit first
            StringBuilder builder = new StringBuilder();
            for (int i = sb.Length - 1; i >= 0; i--)
            {
                builder.Append(sb[i]);
            }
            return builder.ToString();
        }

        public static int decode(String str)
        {
            int num = 0;

            for ( int i = 0, len = str.Length; i < len; i++ )
            {
                num = num * BASE + ALPHABET.IndexOf( str[(i)] );
            }

            return num;
        }
    }

User · Answer

Did you omit O, 0, and i on purpose?

I just created a PHP class based on Ryan's solution.

    <?php

    $shorty = new App_Shorty();

    echo 'ID: ' . 1000;
    echo '<br/> Short link: ' . $shorty->encode(1000);
    echo '<br/> Decoded Short Link: ' . $shorty->decode($shorty->encode(1000));

    /**
     * A nice shorting class based on Ryan Charmley's suggestion; see the link on Stack Overflow below.
     * @author Svetoslav Marinov (Slavi) | http://WebWeb.ca
     * @see http://stackoverflow.com/questions/742013/how-to-code-a-url-shortener/10386945#10386945
     */
    class App_Shorty {
        /**
         * Explicitly omitted: i, o, 1, 0 because they are confusing. Also use only lowercase,
         * as dictating this over the phone might be tough.
         * @var string
         */
        private $dictionary = "abcdfghjklmnpqrstvwxyz23456789";
        private $dictionary_array = array();

        public function __construct() {
            $this->dictionary_array = str_split($this->dictionary);
        }

        /**
         * Gets ID and converts it into a string.
         * @param int $id
         */
        public function encode($id) {
            $str_id = '';
            $base = count($this->dictionary_array);

            while ($id > 0) {
                $rem = $id % $base;
                $id = ($id - $rem) / $base;
                $str_id .= $this->dictionary_array[$rem];
            }

            return $str_id;
        }

        /**
         * Converts /abc into an integer ID
         * @param string
         * @return int $id
         */
        public function decode($str_id) {
            $id = 0;
            $id_ar = str_split($str_id);
            $base = count($this->dictionary_array);

            for ($i = count($id_ar); $i > 0; $i--) {
                $id += array_search($id_ar[$i - 1], $this->dictionary_array) * pow($base, $i - 1);
            }
            return $id;
        }
    }
    ?>

User · Answer

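Function based on Xeoncross's class:

    function shortly($input) {
        $dictionary = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'];
        if ($input === 0)
            return $dictionary[0];
        $base = count($dictionary);

        // Numeric input: encode the ID to a short string
        if (is_numeric($input)) {
            $result = [];
            while ($input > 0) {
                $result[] = $dictionary[($input % $base)];
                $input = floor($input / $base);
            }
            return join("", array_reverse($result));
        }

        // String input: decode the short string back to the ID
        $i = 0;
        $input = str_split($input);
        foreach ($input as $char) {
            $pos = array_search($char, $dictionary);
            $i = $i * $base + $pos;
        }
        return $i;
    }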

User · Answer

For a quality Node.js / JavaScript solution, see the id-shortener module, which is thoroughly tested and has been used in production for months.

It provides an efficient id / URL shortener backed by pluggable storage defaulting to Redis, and you can even customize your short id character set and whether or not shortening is idempotent. This is an important distinction that not all URL shorteners take into account.

In relation to other answers here, this module implements Marcel Jackwerth's excellent accepted answer above.

The core of the solution is provided by the following Redis Lua snippet:

    local sequence = redis.call('incr', KEYS[1])

    local chars = '0123456789ABCDEFGHJKLMNPQRSTUVWXYZ_abcdefghijkmnopqrstuvwxyz'
    local remaining = sequence
    local slug = ''

    while (remaining > 0) do
      local d = (remaining % 60)
      local character = string.sub(chars, d + 1, d + 1)

      slug = character .. slug
      remaining = (remaining - d) / 60
    end

    redis.call('hset', KEYS[2], slug, ARGV[1])

    return slug

User · Answer

Don't know if anyone will find this useful. It is more of a "hack n slash" method, yet it is simple and works nicely if you want only specific chars.

    $dictionary = "abcdfghjklmnpqrstvwxyz23456789";
    $dictionary = str_split($dictionary);

    // Encode
    $str_id = '';
    $base = count($dictionary);

    while ($id > 0) {
        $rem = $id % $base;
        $id = ($id - $rem) / $base;
        $str_id .= $dictionary[$rem];
    }

    // Decode
    $id_ar = str_split($str_id);
    $id = 0;

    for ($i = count($id_ar); $i > 0; $i--) {
        $id += array_search($id_ar[$i - 1], $dictionary) * pow($base, $i - 1);
    }

User · Answer

Why not just generate a random string and append it to the base URL? This is a very simplified version of doing this in C#.

    static string chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
    static string baseUrl = "https://google.com/";

    private static string RandomString(int length) {
        char[] s = new char[length];
        Random rnd = new Random();
        for (int x = 0; x < length; x++) {
            s[x] = chars[rnd.Next(chars.Length)];
        }
        Thread.Sleep(10);

        return new String(s);
    }

Then just append the random string to the baseUrl:

    string tinyURL = baseUrl + RandomString(5);

Remember this is a very simplified version, and it's possible the RandomString method could create duplicate strings. In production you would want to account for duplicate strings to ensure you always have a unique URL. I have some code that handles duplicate strings by querying a database table, which I could share if anyone is interested.

User · Answer

A Node.js and MongoDB solution

We know the format that MongoDB uses to create a new ObjectId with 12 bytes:

- a 4-byte value representing the seconds since the Unix epoch,
- a 3-byte machine identifier,
- a 2-byte process id,
- a 3-byte counter (in your machine), starting with a random value.

Example (I choose a random sequence): a1b2c3d4e5f6g7h8i9j1k2l3

- a1b2c3d4 represents the seconds since the Unix epoch,
- e5f6g7 represents the machine identifier,
- h8i9 represents the process id,
- j1k2l3 represents the counter, starting with a random value.

Since the counter is unique as long as we store the data on the same machine, we can use it without worrying that it will be duplicated.

So the short URL will be the counter. Here is a code snippet, assuming that your server is running properly.

    const mongoose = require('mongoose');
    const Schema = mongoose.Schema;

    // Create a schema
    const shortUrl = new Schema({
        long_url: { type: String, required: true },
        short_url: { type: String, required: true, unique: true },
    });
    const ShortUrl = mongoose.model('ShortUrl', shortUrl);

    // The user can request to get a short URL by providing a long URL using a form

    app.post('/shorten', function(req, res) {
        // Create a new shortUrl
        // The submit form has an input with longURL as its name attribute.
        const longUrl = req.body["longURL"];
        const newUrl = ShortUrl({
            long_url: longUrl,
            short_url: "",
        });
        // The last six hex characters of the ObjectId are the counter
        const shortUrl = newUrl._id.toString().slice(-6);
        newUrl.short_url = shortUrl;
        console.log(newUrl);
        newUrl.save(function(err) {
            console.log("the new URL is added");
        });
    });

User · Answer

Here is a Node.js implementation that works similarly to bit.ly: it generates a highly random seven-character string.

It uses Node.js crypto to generate a highly random string from 25 random bytes (hex-encoded), and then randomly selects seven characters from it.

    var crypto = require("crypto");
    exports.shortURL = new function () {
        this.getShortURL = function () {
            var sURL = '',
                _rand = crypto.randomBytes(25).toString('hex'),
                _base = _rand.length;
            for (var i = 0; i < 7; i++)
                sURL += _rand.charAt(Math.floor(Math.random() * _rand.length));
            return sURL;
        };
    }

User · Answer

For a similar project, to get a new key, I make a wrapper function around a random string generator that calls the generator until I get a string that hasn't already been used in my hashtable. This method will slow down once your namespace starts to get full, but as you have said, even with only 6 characters, you have plenty of namespace to work with.
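The retry wrapper described here could be sketched in Python as follows. The in-memory `used` set stands in for whatever hashtable or database actually stores the keys, and the `max_attempts` cap is an added assumption so the loop cannot spin forever when the namespace is nearly full.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # a-z, A-Z, 0-9


def new_key(used: set, length: int = 6, max_attempts: int = 100) -> str:
    """Generate random keys until one is found that is not in `used`."""
    for _ in range(max_attempts):
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in used:
            # Reserve the key so later calls cannot return it again.
            used.add(candidate)
            return candidate
    raise RuntimeError("no free key found; the namespace may be nearly full")
```

In a real service the membership check and the insert would need to be one atomic operation (for example, a unique constraint on the key column), otherwise two concurrent requests could claim the same key.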

User · Answer

If you don't want to re-invent the wheel: http://lilurl.sourceforge.net

User · Answer

    public class UrlShortener {
        private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
        private static final int    BASE     = ALPHABET.length();

        public static String encode(int num) {
            StringBuilder sb = new StringBuilder();
            // Collect digits least significant first, then reverse
            while ( num > 0 ) {
                sb.append( ALPHABET.charAt( num % BASE ) );
                num /= BASE;
            }
            return sb.reverse().toString();
        }

        public static int decode(String str) {
            int num = 0;
            for ( int i = 0; i < str.length(); i++ )
                num = num * BASE + ALPHABET.indexOf(str.charAt(i));
            return num;
        }
    }

User · Answer

Implementation in Scala:

    class Encoder(alphabet: String) extends (Long => String) {

      val Base = alphabet.size

      override def apply(number: Long) = {
        def encode(current: Long): List[Int] = {
          if (current == 0) Nil
          else (current % Base).toInt :: encode(current / Base)
        }
        encode(number).reverse
          .map(current => alphabet.charAt(current)).mkString
      }
    }

    class Decoder(alphabet: String) extends (String => Long) {

      val Base = alphabet.size

      override def apply(string: String) = {
        def decode(current: Long, encodedPart: String): Long = {
          if (encodedPart.size == 0) current
          else decode(current * Base + alphabet.indexOf(encodedPart.head), encodedPart.tail)
        }
        decode(0, string)
      }
    }

Test example with ScalaTest:

    import org.scalatest.{FlatSpec, Matchers}

    class DecoderAndEncoderTest extends FlatSpec with Matchers {

      val Alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

      "A number with base 10" should "be correctly encoded into base 62 string" in {
        val encoder = new Encoder(Alphabet)
        encoder(127) should be ("cd")
        encoder(543513414) should be ("KWGPy")
      }

      "A base 62 string" should "be correctly decoded into a number with base 10" in {
        val decoder = new Decoder(Alphabet)
        decoder("cd") should be (127)
        decoder("KWGPy") should be (543513414)
      }
    }

User · Answer

I have a variant of the problem, in that I store web pages from many different authors and need to prevent discovery of pages by guesswork. So my short URLs add a couple of extra digits to the Base-62 string for the page number. These extra digits are generated from information in the page record itself, and they ensure that only 1 in 3844 URLs is valid (assuming 2-digit Base-62). You can see an outline description at http://mgscan.com/MBWL.
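A rough Python sketch of the check-digit idea follows. The answer does not say how its extra digits are derived, so deriving them from an HMAC of the page ID with a server-side `SECRET` is purely an assumption for illustration, as are all function names here.

```python
import hashlib
import hmac
from typing import Optional

ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
BASE = len(ALPHABET)           # 62
SECRET = b"replace-with-a-real-secret"  # hypothetical server-side key


def encode(num: int) -> str:
    """Plain base-62 encoding of a non-negative integer."""
    if num == 0:
        return ALPHABET[0]
    digits = []
    while num > 0:
        num, rem = divmod(num, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode(code: str) -> int:
    """Plain base-62 decoding; raises ValueError on unknown characters."""
    num = 0
    for ch in code:
        num = num * BASE + ALPHABET.index(ch)
    return num


def check_digits(page_id: int) -> str:
    """Derive two base-62 check digits (62 * 62 = 3844 possibilities)."""
    digest = hmac.new(SECRET, str(page_id).encode(), hashlib.sha256).digest()
    value = int.from_bytes(digest[:4], "big") % (BASE * BASE)
    return ALPHABET[value // BASE] + ALPHABET[value % BASE]


def make_code(page_id: int) -> str:
    """Short code = base-62 page ID followed by its two check digits."""
    return encode(page_id) + check_digits(page_id)


def resolve(code: str) -> Optional[int]:
    """Return the page ID if the code is valid, otherwise None."""
    if len(code) < 3:
        return None
    try:
        page_id = decode(code[:-2])
    except ValueError:
        return None
    # Re-derive the full code so only the canonical form is accepted.
    return page_id if hmac.compare_digest(make_code(page_id), code) else None
```

For any given page ID, exactly one of the 3844 possible two-character suffixes passes `resolve`, which is the 1-in-3844 ratio the answer describes.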

User · Answer

Why would you want to use a hash?

You can just use a simple translation of your auto-increment value to an alphanumeric value. You can do that easily by using some base conversion. Say your character space (A-Z, a-z, 0-9, etc.) has 62 characters: convert the id to a base-62 number and use the characters as the digits.

User · Answer

A simple approach:

    $original_id = 56789;

    $shortened_id = base_convert($original_id, 10, 36);

    $un_shortened_id = base_convert($shortened_id, 36, 10);

User · Answer

    public class TinyUrl {

        private final String characterMap = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
        private final int charBase = characterMap.length();

        public String covertToCharacter(int num) {
            StringBuilder sb = new StringBuilder();

            while (num > 0) {
                sb.append(characterMap.charAt(num % charBase));
                num /= charBase;
            }

            return sb.reverse().toString();
        }

        public int covertToInteger(String str) {
            int num = 0;
            for (int i = 0; i < str.length(); i++)
                num += characterMap.indexOf(str.charAt(i)) * Math.pow(charBase, (str.length() - (i + 1)));
            return num;
        }
    }

    class TinyUrlTest {

        public static void main(String[] args) {
            TinyUrl tinyUrl = new TinyUrl();
            int num = 122312215;
            String url = tinyUrl.covertToCharacter(num);
            System.out.println("Tiny url:  " + url);
            System.out.println("Id: " + tinyUrl.covertToInteger(url));
        }
    }

User · Answer

Here is one I have created and deployed in the Google Cloud console. It is written in Java and Spring Boot: https://jol.ink

If you want details, just let me know in the comments section. I will edit this post and explain it in detail.

User · Answer

My approach: take the database ID, then Base36-encode it. I would NOT use both upper AND lowercase letters, because that makes transmitting those URLs over the telephone a nightmare, but you could of course easily extend the function to be a base 62 en/decoder.
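A small Python sketch of the Base36 approach, assuming lowercase letters for the output. Decoding uses the built-in `int(code, 36)`, which accepts either case, so a code read out over the phone and typed in capitals still resolves.

```python
import string

ALPHABET36 = string.digits + string.ascii_lowercase  # 0-9 then a-z


def encode36(num: int) -> str:
    """Convert a non-negative database ID to a lowercase base-36 string."""
    if num < 0:
        raise ValueError("id must be non-negative")
    if num == 0:
        return "0"
    digits = []
    while num > 0:
        num, rem = divmod(num, 36)
        digits.append(ALPHABET36[rem])
    return "".join(reversed(digits))


def decode36(code: str) -> int:
    """Convert a base-36 string (upper or lower case) back to the ID."""
    return int(code, 36)


print(encode36(56789))   # 17th
print(decode36("17TH"))  # 56789
```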

User · Answer

Here is my PHP 5 class.

    <?php
    class Bijective
    {
        public $dictionary = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

        public function __construct()
        {
            $this->dictionary = str_split($this->dictionary);
        }

        public function encode($i)
        {
            if ($i == 0)
                return $this->dictionary[0];

            $result = array();
            $base = count($this->dictionary);

            // Collect digits least significant first
            while ($i > 0)
            {
                $result[] = $this->dictionary[($i % $base)];
                $i = floor($i / $base);
            }

            $result = array_reverse($result);

            return join("", $result);
        }

        public function decode($input)
        {
            $i = 0;
            $base = count($this->dictionary);

            $input = str_split($input);

            foreach ($input as $char)
            {
                $pos = array_search($char, $this->dictionary);

                $i = $i * $base + $pos;
            }

            return $i;
        }
    }

User · Answer

My Python 3 version:

    base_list = list("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
    base = len(base_list)

    def encode(num: int):
        result = []
        if num == 0:
            result.append(base_list[0])

        while num > 0:
            result.append(base_list[num % base])
            num //= base

        print("".join(reversed(result)))

    def decode(code: str):
        num = 0
        code_list = list(code)
        for index, code in enumerate(reversed(code_list)):
            num += base_list.index(code) * base ** index
        print(num)

    if __name__ == '__main__':
        encode(341413134141)
        decode("60FoItT")
