How to join components of a path when you are constructing a URL in Python

Question

For example, I want to join a prefix path to resource paths like /js/foo.js.

I want the resulting path to be relative to the root of the server. In the above example, if the prefix was "media", I would want the result to be /media/js/foo.js.

os.path.join does this really well, but how it joins paths is OS dependent. In this case I know I am targeting the web, not the local file system.

Is there a best alternative when you are working with paths you know will be used in URLs? Will os.path.join work well enough? Should I just roll my own?
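To make the OS dependence concrete: ntpath (the module behind os.path on Windows) and posixpath (the module behind os.path on Linux and macOS) can both be imported on any platform, so a short sketch can compare them side by side. The prefix "/media" and the resource "js/foo.js" are just example values.

```python
import ntpath
import posixpath

prefix = "/media"
resource = "js/foo.js"

# What os.path.join does on Linux/macOS: always uses "/".
posix_result = posixpath.join(prefix, resource)
# What os.path.join does on Windows: inserts a backslash separator.
nt_result = ntpath.join(prefix, resource)

print(posix_result)  # /media/js/foo.js
print(nt_result)     # /media\js/foo.js
```

The Windows result is not a usable URL path, which is why os.path.join is risky for URLs.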

User · Answer

I found things not to like about all the above solutions, so I came up with my own. This version makes sure parts are joined with a single slash and leaves the overall leading and trailing slashes alone. No pip install, no urllib.parse.urljoin weirdness.

    In [1]: from functools import reduce

    In [2]: def join_slash(a, b):
       ...:     return a.rstrip('/') + '/' + b.lstrip('/')
       ...:

    In [3]: def urljoin(*args):
       ...:     return reduce(join_slash, args) if args else ''
       ...:

    In [4]: parts = ['https://foo-bar.quux.net', '/foo', 'bar', '/bat/', '/quux/']

    In [5]: urljoin(*parts)
    Out[5]: 'https://foo-bar.quux.net/foo/bar/bat/quux/'

    In [6]: urljoin('https://quux.com/', '/path', 'to/file///', '//here/')
    Out[6]: 'https://quux.com/path/to/file/here/'

    In [7]: urljoin()
    Out[7]: ''

    In [8]: urljoin('//', 'beware', 'of/this///')
    Out[8]: '/beware/of/this///'

    In [9]: urljoin('/leading', 'and/', '/trailing/', 'slash/')
    Out[9]: '/leading/and/trailing/slash/'

User · Answer

How about this: It is Somewhat Efficient & Somewhat Simple. It only needs to join 2 parts of a URL path:

    def UrlJoin(a, b):
        a, b = a.strip(), b.strip()
        a = a if a.endswith('/') else a + '/'
        b = b if not b.startswith('/') else b[1:]
        return a + b

OR: More Conventional, but Not as efficient if joining only 2 URL parts of a path:

    def UrlJoin(*parts):
        return '/'.join([p.strip().strip('/') for p in parts])

Test Cases:

    >>> UrlJoin('https://example.com/', '/TestURL_1')
    'https://example.com/TestURL_1'

    >>> UrlJoin('https://example.com', 'TestURL_2')
    'https://example.com/TestURL_2'

Note: I may be splitting hairs here, but it is at least good practice and potentially more readable.
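The two versions above behave differently on some inputs. Here is a minimal comparison sketch; both functions are copied from the answer but renamed (url_join_two, url_join_many are hypothetical names) so they can coexist in one file.

```python
def url_join_two(a: str, b: str) -> str:
    # First version: ensure `a` ends with "/", drop ONE leading "/" from `b`.
    a, b = a.strip(), b.strip()
    a = a if a.endswith("/") else a + "/"
    b = b if not b.startswith("/") else b[1:]
    return a + b


def url_join_many(*parts: str) -> str:
    # Second version: strip whitespace, then ALL leading/trailing slashes.
    return "/".join(p.strip().strip("/") for p in parts)


print(url_join_two("https://example.com/", "//x"))   # https://example.com//x
print(url_join_many("https://example.com/", "//x"))  # https://example.com/x
print(url_join_many("/media", "js/foo.js"))          # media/js/foo.js
```

The first version only removes a single leading slash, and the second one also drops the root slash of the first part, so neither produces /media/js/foo.js from "/media" without extra handling.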

User · Answer

Like you say, os.path.join joins paths based on the current OS. posixpath is the underlying module that is used on POSIX systems under the namespace os.path:

    >>> import os, posixpath
    >>> os.path.join is posixpath.join
    True
    >>> posixpath.join('/media/', 'js/foo.js')
    '/media/js/foo.js'

So you can just import and use posixpath.join instead for URLs, which is available and will work on any platform.

Edit: Pete's suggestion is a good one; you can alias the import for increased readability:

    from posixpath import join as urljoin

Edit: I think this is made clearer, or at least it helped me understand, if you look into the source of os.py (the code here is from Python 2.7.11, plus I've trimmed some bits). There are conditional imports in os.py that pick which path module to use in the namespace os.path. All the underlying modules (posixpath, ntpath, os2emxpath, riscospath) that may be imported in os.py, aliased as path, are there and exist to be used on all systems. os.py is just picking one of the modules to use in the namespace os.path at run time based on the current OS.

    # os.py
    import sys, errno

    _names = sys.builtin_module_names

    if 'posix' in _names:
        # ...
        from posix import *
        # ...
        import posixpath as path
        # ...

    elif 'nt' in _names:
        # ...
        from nt import *
        # ...
        import ntpath as path
        # ...

    elif 'os2' in _names:
        # ...
        from os2 import *
        # ...
        if sys.version.find('EMX GCC') == -1:
            import ntpath as path
        else:
            import os2emxpath as path
            from _emx_link import link
        # ...

    elif 'ce' in _names:
        # ...
        from ce import *
        # ...
        # We can use the standard Windows path.
        import ntpath as path

    elif 'riscos' in _names:
        # ...
        from riscos import *
        # ...
        import riscospath as path
        # ...

    else:
        raise ImportError, 'no os specific module found'
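One caveat with posixpath.join: any component that starts with "/" is treated as absolute and discards everything before it. A minimal workaround sketch, assuming you always want later parts nested under the first one (url_path_join is a hypothetical helper, not part of posixpath):

```python
import posixpath


def url_path_join(first: str, *rest: str) -> str:
    # posixpath.join restarts at any component beginning with "/",
    # so strip leading slashes from every part after the first.
    return posixpath.join(first, *(part.lstrip("/") for part in rest))


plain = posixpath.join("/media", "/js/foo.js")
fixed = url_path_join("/media", "/js/foo.js")
print(plain)  # /js/foo.js
print(fixed)  # /media/js/foo.js
```

Note that posixpath.join does not collapse repeated slashes inside a part, so "/media//" stays as written.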

User · Answer

Since, from the comments the OP posted, it seems he doesn't want to preserve "absolute URLs" in the join (which is one of the key jobs of urlparse.urljoin ;-), I'd recommend avoiding that. os.path.join would also be bad, for exactly the same reason.

So, I'd use something like '/'.join(s.strip('/') for s in pieces) (if the leading / must also be ignored; if the leading piece must be special-cased, that's also feasible of course ;-).
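A small sketch of that one-liner wrapped in a function, with the leading piece special-cased through an optional flag. The function name join_pieces and the leading_slash parameter are hypothetical, chosen for illustration:

```python
def join_pieces(pieces, leading_slash=False):
    # Strip slashes from both ends of every piece, then glue with single "/".
    joined = "/".join(s.strip("/") for s in pieces)
    # Optionally root the result at "/" (do not use this with full URLs,
    # or "https://..." would become "/https:/...").
    return "/" + joined if leading_slash else joined


print(join_pieces(["/media/", "/js/foo.js"]))                   # media/js/foo.js
print(join_pieces(["media", "js/foo.js"], leading_slash=True))  # /media/js/foo.js
```

Because "https:" has no slash at either end of "https://example.com", a full URL as the first piece survives the strip unchanged.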

User · Answer

The basejoin function in the urllib package (Python 2) might be what you're looking for:

    basejoin = urljoin(base, url, allow_fragments=True)
        Join a base URL and a possibly relative URL to form an absolute
        interpretation of the latter.

Edit: I didn't notice before, but urllib.basejoin seems to map directly to urlparse.urljoin, making the latter preferred.
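To show what "absolute interpretation of the latter" means in practice, here is a sketch using the Python 3 equivalent, urllib.parse.urljoin. The base URL and CDN URL are hypothetical example values:

```python
from urllib.parse import urljoin

base = "http://example.com/media/"  # hypothetical base URL

relative = urljoin(base, "js/foo.js")
parent = urljoin(base, "../js/foo.js")
absolute = urljoin(base, "http://cdn.example.org/x.js")

print(relative)  # http://example.com/media/js/foo.js
print(parent)    # http://example.com/js/foo.js
print(absolute)  # http://cdn.example.org/x.js
```

Relative references are resolved against the base (including ".."), while an absolute URL replaces the base entirely, which is useful for links but not for blindly concatenating path pieces.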

User · Answer

Using furl and regex (Python 3):

    >>> import re
    >>> import furl
    >>> p = re.compile(r'(\/)+')
    >>> url = furl.furl('/media/path').add(path='/js/foo.js').url
    >>> url
    '/media/path/js/foo.js'
    >>> p.sub(r"\1", url)
    '/media/path/js/foo.js'
    >>> url = furl.furl('/media/path').add(path='js/foo.js').url
    >>> url
    '/media/path/js/foo.js'
    >>> p.sub(r"\1", url)
    '/media/path/js/foo.js'
    >>> url = furl.furl('/media/path/').add(path='js/foo.js').url
    >>> url
    '/media/path/js/foo.js'
    >>> p.sub(r"\1", url)
    '/media/path/js/foo.js'
    >>> url = furl.furl('/media///path///').add(path='//js///foo.js').url
    >>> url
    '/media///path/////js///foo.js'
    >>> p.sub(r"\1", url)
    '/media/path/js/foo.js'
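A stdlib-only sketch of the slash-collapsing step. Applying the regex to a whole URL would also collapse the "//" after the scheme, so this version splits the URL first and only touches the path. The names _SLASH_RUN and collapse_path_slashes are hypothetical. One assumption to keep in mind: a scheme-less input that starts with "//" is parsed by urlsplit as a network location, so this suits full URLs or paths with a single leading slash.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches any run of two or more slashes.
_SLASH_RUN = re.compile(r"/{2,}")


def collapse_path_slashes(url: str) -> str:
    """Collapse repeated slashes in the path component only."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=_SLASH_RUN.sub("/", parts.path)))


naive = _SLASH_RUN.sub("/", "https://example.com//a")  # mangles the scheme
print(naive)                                                   # https:/example.com/a
print(collapse_path_slashes("https://example.com//a///b"))     # https://example.com/a/b
print(collapse_path_slashes("/media///path/////js///foo.js"))  # /media/path/js/foo.js
```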

User · Answer

Rune Kaagaard provided a great and compact solution that worked for me; I expanded on it a little:

    def urljoin(*args):
        # Remember whether the last argument ends with a slash.
        trailing_slash = '/' if str(args[-1]).endswith('/') else ''
        return '/'.join(map(lambda x: str(x).strip('/'), args)) + trailing_slash

This allows all arguments to be joined regardless of leading and trailing slashes, while preserving the last slash if present.

User · Answer

You can use urllib.parse.urljoin:

    >>> from urllib.parse import urljoin
    >>> urljoin('/media/path/', 'js/foo.js')
    '/media/path/js/foo.js'

But beware:

    >>> urljoin('/media/path', 'js/foo.js')
    '/media/js/foo.js'
    >>> urljoin('/media/path', '/js/foo.js')
    '/js/foo.js'

The first case differs because, without a trailing slash, path is treated as the last segment of the base and gets replaced by the relative reference. The reason you get different results from /js/foo.js and js/foo.js is that the former begins with a slash, which signifies that it already begins at the website root.

On Python 2, you have to do:

    from urlparse import urljoin
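If you want urljoin to always nest the second part under the base, one workaround is to normalize both sides first: give the base a trailing slash and strip the reference's leading slash. This is a sketch; join_under is a hypothetical helper name.

```python
from urllib.parse import urljoin


def join_under(base: str, ref: str) -> str:
    # A trailing slash makes the last base segment a "directory",
    # so urljoin appends to it instead of replacing it.
    if not base.endswith("/"):
        base += "/"
    # Dropping the leading slash stops ref from resetting to the root.
    return urljoin(base, ref.lstrip("/"))


print(join_under("/media/path", "js/foo.js"))   # /media/path/js/foo.js
print(join_under("/media/path", "/js/foo.js"))  # /media/path/js/foo.js
print(join_under("/media/path", "../x.js"))     # /media/x.js
```

As the last line shows, ".." segments are still resolved, so this is not a sandbox against climbing out of the base path.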

User · Answer

To improve slightly over Alex Martelli's response, the following will not only clean up extra slashes but also preserve trailing (ending) slashes, which can sometimes be useful:

    >>> items = ["http://www.website.com", "/api", "v2/"]
    >>> url = "/".join([(u.strip("/") if index + 1 < len(items) else u.lstrip("/")) for index, u in enumerate(items)])
    >>> print(url)
    http://www.website.com/api/v2/

It's not as easy to read though, and won't clean up multiple extra trailing slashes.
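The comprehension above can be unrolled into a small function that is easier to read and also reduces several trailing slashes to one. This is a sketch; join_keep_trailing is a hypothetical name, and like the original it strips slashes from both ends of every item except the last.

```python
def join_keep_trailing(*items: str) -> str:
    if not items:
        return ""
    *head, last = items
    # Every item but the last loses slashes on both ends.
    parts = [p.strip("/") for p in head]
    # The last item loses its leading slashes; any run of trailing
    # slashes is reduced to a single one.
    tail = last.strip("/")
    if last.endswith("/") and tail:
        tail += "/"
    parts.append(tail)
    return "/".join(parts)


print(join_keep_trailing("http://www.website.com", "/api", "v2/"))    # http://www.website.com/api/v2/
print(join_keep_trailing("http://www.website.com", "/api", "v2///"))  # http://www.website.com/api/v2/
```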

User · Answer

This does the job nicely:

    def urljoin(*args):
        """
        Joins given arguments into a URL. Trailing but not leading slashes are
        stripped for each argument.
        """

        return "/".join(map(lambda x: str(x).rstrip('/'), args))

User · Answer

Using furl (pip install furl), it will be:

    furl.furl('/media/path/').add(path='js/foo.js')

User · Answer

I know this is a bit more than the OP asked for. However, I had the pieces to the following URL and was looking for a simple way to join them:

    >>> url = 'https://api.foo.com/orders/bartag?spamStatus=awaiting_spam&page=1&pageSize=250'

Doing some looking around (this is Python 2, where the module is urlparse; in Python 3 it is urllib.parse):

    >>> import urlparse
    >>> split = urlparse.urlsplit(url)
    >>> split
    SplitResult(scheme='https', netloc='api.foo.com', path='/orders/bartag', query='spamStatus=awaiting_spam&page=1&pageSize=250', fragment='')
    >>> type(split)
    <class 'urlparse.SplitResult'>
    >>> dir(split)
    ['__add__', '__class__', '__contains__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__getstate__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__weakref__', '_asdict', '_fields', '_make', '_replace', 'count', 'fragment', 'geturl', 'hostname', 'index', 'netloc', 'password', 'path', 'port', 'query', 'scheme', 'username']
    >>> split[0]
    'https'
    >>> split = (split[:])
    >>> type(split)
    <type 'tuple'>

So, in addition to the path joining which has already been answered in the other answers, to get what I was looking for I did the following:

    >>> split = ('https', 'api.foo.com', '/orders/bartag', 'spamStatus=awaiting_spam&page=1&pageSize=250', '')
    >>> unsplit = urlparse.urlunsplit(split)
    >>> unsplit
    'https://api.foo.com/orders/bartag?spamStatus=awaiting_spam&page=1&pageSize=250'

According to the documentation, it takes EXACTLY a 5-part tuple, with the following format:

    Attribute  Index  Value                   Value if not present
    scheme     0      URL scheme specifier    empty string
    netloc     1      Network location part   empty string
    path       2      Hierarchical path       empty string
    query      3      Query component         empty string
    fragment   4      Fragment identifier     empty string
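For Python 3, the same idea can combine the path joining from the other answers with urlunsplit and urlencode. This is a sketch; path_pieces and params are hypothetical inputs standing in for "the pieces" of the URL above.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

# Hypothetical inputs: path pieces and query parameters kept separately.
path_pieces = ["orders", "bartag"]
params = {"spamStatus": "awaiting_spam", "page": 1, "pageSize": 250}

# Join the path pieces with single slashes, rooted at "/".
path = "/" + "/".join(p.strip("/") for p in path_pieces)

# urlunsplit takes the five parts: scheme, netloc, path, query, fragment.
url = urlunsplit(("https", "api.foo.com", path, urlencode(params), ""))
print(url)  # https://api.foo.com/orders/bartag?spamStatus=awaiting_spam&page=1&pageSize=250

# SplitResult is a namedtuple, so _replace() swaps one component.
other = urlsplit(url)._replace(path="/orders/other").geturl()
print(other)  # https://api.foo.com/orders/other?spamStatus=awaiting_spam&page=1&pageSize=250
```

Building the query with urlencode also takes care of percent-encoding values that contain spaces or other reserved characters.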
