Imagine that you want to develop a non-trivial end-user desktop (not web) application in Python. What is the best way to structure the project's folder hierarchy?
Desirable features are ease of maintenance, IDE-friendliness, suitability for source control branching/merging, and easy generation of install packages.
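In particular:
- Where do you put the source?
- Where do you put application startup scripts?
- Where do you put the IDE project cruft?
- Where do you put the unit/acceptance tests?
- Where do you put non-Python data such as config files?
- Where do you put non-Python sources such as C++ for pyd/so binary extension modules?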
This blog post by Jean-Paul Calderone is commonly given as an answer in #python on Freenode.
Filesystem structure of a Python project
Do:
- name the directory something related to your project. For example, if your project is named "Twisted", name the top-level directory for its source files Twisted. When you do releases, you should include a version number suffix: Twisted-2.5.
- create a directory Twisted/bin and put your executables there, if you have any. Don't give them a .py extension, even if they are Python source files. Don't put any code in them except an import of and call to a main function defined somewhere else in your projects. (Slight wrinkle: since on Windows, the interpreter is selected by the file extension, your Windows users actually do want the .py extension. So, when you package for Windows, you may want to add it. Unfortunately there's no easy distutils trick that I know of to automate this process. Considering that on POSIX the .py extension is only a wart, whereas on Windows the lack is an actual bug, if your userbase includes Windows users, you may want to opt to just have the .py extension everywhere.)
- If your project is expressible as a single Python source file, then put it into the directory and name it something related to your project. For example, Twisted/twisted.py. If you need multiple source files, create a package instead (Twisted/twisted/, with an empty Twisted/twisted/__init__.py) and place your source files in it. For example, Twisted/twisted/internet.py.
- put your unit tests in a sub-package of your package (note - this means that the single Python source file option above was a trick - you always need at least one other file for your unit tests). For example, Twisted/twisted/test/. Of course, make it a package with Twisted/twisted/test/__init__.py. Place tests in files like Twisted/twisted/test/test_internet.py.
- add Twisted/README and Twisted/setup.py to explain and install your software, respectively, if you're feeling nice.
Don't:
- put your source in a directory called src or lib. This makes it hard to run without installing.
- put your tests outside of your Python package. This makes it hard to run the tests against an installed version.
- create a package that only has a __init__.py and then put all your code into __init__.py. Just make a module instead of a package, it's simpler.
- try to come up with magical hacks to make Python able to import your module or package without having the user add the directory containing it to their import path (either via PYTHONPATH or some other mechanism). You will not correctly handle all cases and users will get angry at you when your software doesn't work in their environment.
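As a rough illustration of the "executables contain only an import and a call" advice, here is a minimal sketch. The module path project/main.py, the --name option, and the greeting are hypothetical placeholders, not anything prescribed by the blog post.

```python
import argparse


def main(argv=None):
    """Entry point that would live inside the package (hypothetical: project/main.py).

    Passing argv explicitly keeps the function easy to call from tests;
    when argv is None, argparse falls back to sys.argv[1:].
    """
    parser = argparse.ArgumentParser(prog="project")
    parser.add_argument("--name", default="world", help="who to greet")
    args = parser.parse_args(argv)
    print(f"Hello, {args.name}!")
    return 0  # process exit status


# The launcher in bin/ would then hold nothing but the import and the call:
#
#     #!/usr/bin/env python
#     import sys
#     from project.main import main
#     sys.exit(main())
```

Because all logic stays in the package, the same main function can be exercised from unit tests or reused by another launcher without touching the script in bin/.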
Non-Python data is best bundled inside your Python modules using the package_data support in setuptools. One thing I strongly recommend is using namespace packages to create shared namespaces which multiple projects can use -- much like the Java convention of putting packages in com.yourcompany.yourproject (and being able to have a shared com.yourcompany.utils namespace).
Re branching and merging, if you use a good enough source control system it will handle merges even through renames; Bazaar is particularly good at this.
Contrary to some other answers here, I'm +1 on having a src directory top-level (with doc and test directories alongside). Specific conventions for documentation directory trees will vary depending on what you're using; Sphinx, for instance, has its own conventions which its quickstart tool supports.
Please, please leverage setuptools and pkg_resources; this makes it much easier for other projects to rely on specific versions of your code (and for multiple versions to be simultaneously installed with different non-code files, if you're using package_data).
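To make the shared-namespace idea concrete, the following sketch builds two separate throwaway project directories that both contribute to a yourcompany namespace and imports from each. Note the assumption: it relies on Python 3.3+ implicit namespace packages (PEP 420, no yourcompany/__init__.py), not on the setuptools/pkg_resources namespace mechanism the answer has in mind. The helper names and the NAME constant are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_project(root: Path, subpackage: str, body: str) -> None:
    """Create root/yourcompany/<subpackage>/__init__.py.

    No yourcompany/__init__.py is written, so "yourcompany" becomes an
    implicit namespace package that can span several directories.
    """
    pkg = root / "yourcompany" / subpackage
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text(body)


def demo():
    """Import yourcompany.utils and yourcompany.yourproject from two roots."""
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        make_project(Path(a), "utils", "NAME = 'utils'\n")
        make_project(Path(b), "yourproject", "NAME = 'yourproject'\n")
        sys.path[:0] = [a, b]  # both project roots on the import path
        try:
            importlib.invalidate_caches()
            utils = importlib.import_module("yourcompany.utils")
            project = importlib.import_module("yourcompany.yourproject")
            return utils.NAME, project.NAME
        finally:
            # Undo the import-path and module-cache changes made above.
            sys.path.remove(a)
            sys.path.remove(b)
            for name in list(sys.modules):
                if name == "yourcompany" or name.startswith("yourcompany."):
                    del sys.modules[name]
```

Calling demo() imports both subpackages even though they live under different roots, which is the property that lets independent projects share one top-level name.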
The "Python Packaging Authority" has a sampleproject:
https://github.com/pypa/sampleproject
It is a sample project that exists as an aid to the Python Packaging User Guide's Tutorial on Packaging and Distributing Projects.
According to Jean-Paul Calderone's Filesystem structure of a Python project:
Project/
|-- bin/
| |-- project
|
|-- project/
| |-- test/
| | |-- __init__.py
| | |-- test_main.py
| |
| |-- __init__.py
| |-- main.py
|
|-- setup.py
|-- README
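For the test sub-package in the tree above, a file such as project/test/test_main.py might look roughly like the sketch below. The greet function is a hypothetical stand-in defined inline so the example is self-contained; in the real layout it would be imported from project.main.

```python
import unittest


def greet(name: str) -> str:
    """Stand-in for code that would live in project/main.py (hypothetical)."""
    if not name:
        raise ValueError("name must not be empty")
    return f"Hello, {name}!"


class GreetTests(unittest.TestCase):
    def test_greets_by_name(self):
        self.assertEqual(greet("Ada"), "Hello, Ada!")

    def test_rejects_empty_name(self):
        with self.assertRaises(ValueError):
            greet("")


def run_tests() -> unittest.TestResult:
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(GreetTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the tests sit inside the package, they can be run from the Project/ directory with python -m unittest project.test.test_main, and in the same way against an installed copy.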
In my experience, it's just a matter of iteration. Put your data and code wherever you think they go. Chances are, you'll be wrong anyway. But once you get a better idea of exactly how things are going to shape up, you're in a much better position to make these kinds of guesses.
As far as extension sources, we have a Code directory under trunk that contains a directory for python and a directory for various other languages. Personally, I'm more inclined to try putting any extension code into its own repository next time around.
With that said, I go back to my initial point: don't make too big a deal out of it. Put it somewhere that seems to work for you. If you find something that doesn't work, it can (and should) be changed.
Try starting the project using the python_boilerplate template. It largely follows the best practices (e.g. those here), but is better suited in case you find yourself willing to split your project into more than one egg at some point (and believe me, with anything but the simplest projects, you will; one common situation is where you have to use a locally-modified version of someone else's library).
- Where do you put the source? Under PROJECT_ROOT/src/<egg_name>.
- Where do you put application startup scripts? Register them as an entry_point in one of the eggs.
- Where do you put the IDE project cruft? Many IDEs keep it in PROJECT_ROOT/.<something> in the root of the project, and this is fine.
- Where do you put the unit/acceptance tests? In each egg's PROJECT_ROOT/src/<egg_name>/tests directory. I personally prefer to use py.test to run them.
- Where do you put non-Python data such as config files? It depends on the kind of data:
  - Resources packaged inside an egg can be read via the pkg_resources package from setuptools, or since Python 3.7 via the importlib.resources module from the standard library.
  - Config files can live in PROJECT_ROOT/config during development. For deployment there can be various options. On Windows one can use %APPDATA%/<app-name>/config, on Linux, /etc/<app-name> or /opt/<app-name>/config.
  - Files generated at run time can go in PROJECT_ROOT/var during development, and under /var during Linux deployment.
- Where do you put non-Python sources such as C++ for pyd/so binary extension modules? Into PROJECT_ROOT/src/<egg_name>/native.
Documentation would typically go into PROJECT_ROOT/doc or PROJECT_ROOT/src/<egg_name>/doc (this depends on whether you regard some of the eggs to be separate large projects). Some additional configuration will be in files like PROJECT_ROOT/buildout.cfg and PROJECT_ROOT/setup.cfg.
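The config-file locations mentioned above could be resolved with a small helper along these lines. This is only a sketch under stated assumptions: the function name config_dir and its parameters are hypothetical, the Windows branch reads the APPDATA environment variable, and the Linux branch picks /etc/<app-name> rather than the /opt alternative.

```python
import os
import sys
from pathlib import Path, PurePosixPath, PureWindowsPath


def config_dir(app_name, project_root=None, platform=sys.platform, environ=os.environ):
    """Return the directory where config files are expected.

    project_root set  -> development layout: PROJECT_ROOT/config
    platform "win32"  -> %APPDATA%/<app-name>/config
    anything else     -> /etc/<app-name> (one of the Linux options)
    platform and environ are parameters so the logic can be tested anywhere.
    """
    if project_root is not None:
        return Path(project_root) / "config"
    if platform.startswith("win"):
        return PureWindowsPath(environ["APPDATA"]) / app_name / "config"
    return PurePosixPath("/etc") / app_name
```

Keeping the platform and environment as arguments means the deployment rules can be checked in unit tests without actually running on each operating system.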
Check out Open Sourcing a Python Project the Right Way.
Let me excerpt the project layout part of that excellent article:
When setting up a project, the layout (or directory structure) is important to get right. A sensible layout means that potential contributors don't have to spend forever hunting for a piece of code; file locations are intuitive. Since we're dealing with an existing project, it means you'll probably need to move some stuff around.
Let's start at the top. Most projects have a number of top-level files (like setup.py, README.md, requirements.txt, etc). There are then three directories that every project should have:
- A docs directory containing project documentation
- A directory named with the project's name which stores the actual Python package
- A test directory in one of two places
  - Under the package directory containing test code and resources
  - As a stand-alone top level directory
To get a better sense of how your files should be organized, here's a simplified snapshot of the layout for one of my projects, sandman:
$ pwd
~/code/sandman
$ tree
.
|- LICENSE
|- README.md
|- TODO.md
|- docs
| |-- conf.py
| |-- generated
| |-- index.rst
| |-- installation.rst
| |-- modules.rst
| |-- quickstart.rst
| |-- sandman.rst
|- requirements.txt
|- sandman
| |-- __init__.py
| |-- exception.py
| |-- model.py
| |-- sandman.py
| |-- test
|     |-- models.py
|     |-- test_sandman.py
|- setup.py
As you can see, there are some top level files, a docs directory (generated is an empty directory where sphinx will put the generated documentation), a sandman directory, and a test directory under sandman.
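If you want to start a new project with a similar skeleton, a few lines of pathlib are enough. The sketch below is an assumption-laden convenience, not part of the article: the function name scaffold and the exact set of empty files it creates are hypothetical choices modelled on the snapshot above.

```python
import tempfile
from pathlib import Path


def scaffold(root: Path, package: str) -> list:
    """Create an empty skeleton with top-level files, docs/, and <package>/test/.

    Returns the created file paths relative to root, using forward slashes.
    """
    files = [
        "LICENSE",
        "README.md",
        "requirements.txt",
        "setup.py",
        "docs/conf.py",
        "docs/index.rst",
        f"{package}/__init__.py",
        f"{package}/test/test_{package}.py",
    ]
    for relative in files:
        target = root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()  # empty placeholder; fill in real content later
    return sorted(files)


def demo() -> list:
    """Build the skeleton in a temporary directory and list what exists."""
    with tempfile.TemporaryDirectory() as tmp:
        scaffold(Path(tmp), "sandman")
        return sorted(
            p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*") if p.is_file()
        )
```

The demo function only inspects a temporary directory, so it is safe to run; pointing scaffold at a real path creates the files there.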
Source: Stackoverflow.com