The Pip Python Package Manager
Before we begin, let's do a quick glossary check and determine what a 'Python package' REALLY is. A package is a Python module that can contain other modules or, recursively, other packages. It is the kind of package that you import in your Python code. However, this article is NOT about such packages.
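As a quick illustration of that definition, an import package is just a module that carries a __path__ attribute, which is what allows it to contain submodules. The helper name is_import_package below is made up for this sketch, and it relies only on the standard library:

```python
import importlib


def is_import_package(module_name):
    """Return True if the named module is a package (it has a __path__)."""
    module = importlib.import_module(module_name)
    return hasattr(module, "__path__")


# json is a package: it contains submodules such as json.decoder.
print(is_import_package("json"))
# math is a plain module with no submodules.
print(is_import_package("math"))
```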
This article is about the 'Distribution Package', a versioned archive file that contains Python modules, packages and other resource files and is used to distribute a particular Release (think of a release as a versioned snapshot of a project). The archive file is what you as an end-user download from the internet and install, which makes distribution packages the main way the community shares and distributes its projects. This article aims to help you understand the tools that make it easier to use and maintain the many Python packages you will come across as a data scientist and as a programmer in general.
A distribution package is more commonly referred to as a 'package' or a 'distribution'. Context matters, however: it should not be confused with an 'import package' (also commonly called just a 'package'), or with another kind of distribution, such as a Linux distribution or a programming language distribution, which are often referred to with the single term 'distribution'.
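To make the difference concrete, the standard library can tell you which version of a distribution package is installed, independently of whether you import anything from it. This is a minimal sketch that assumes Python 3.8 or later (where importlib.metadata is available); the function name installed_version is hypothetical, and 'pip' is used only as a distribution that is likely to be present:

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version of a distribution package, or None if absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# 'pip' is only an example of a distribution that is probably installed.
print(installed_version("pip"))
# A name that should not correspond to any installed distribution.
print(installed_version("surely-not-an-installed-distribution"))
```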
So now that the meaning of 'package' is settled in the context of this article, let's begin.
Be sure to check out our Intro to Python for Data Science course.
Installing Python
Well, the first step is to make sure you have Python installed on your system and that you can run it from the command line. You can check this, along with the installed version, by typing:
python --version
If you get an error as below:
>>> python --version
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'python' is not defined
or this:
python --version
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-3-a4637bbefc43> in <module>()
----> 1 python --version
NameError: name 'python' is not defined
This is because the command is meant to be run in your operating system's terminal (also called a shell or console), not inside the Python interpreter. When you do this correctly and you do have Python installed, you will get output like this: Python 3.6.3. This tells you that Python version 3.6.3 is installed on your system.
If you do not have Python, please go ahead and install the latest 3.x version. The Hitchhiker’s Guide to Python can walk you through the steps.
If you have Python installed but are using an enhanced shell like IPython or the Jupyter notebook, you can prefix the command with a ! to see the version of Python you are working with:
!python --version
Python 3.5.2 :: Continuum Analytics, Inc.
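If you are already inside a Python session, where typing the shell command raises the NameError shown earlier, the same information is available from the standard library. A small sketch:

```python
import platform
import sys

# The version string, i.e. the "3.6.3" part of "Python 3.6.3".
print(platform.python_version())

# A structured form that is convenient for comparisons in code.
print(sys.version_info >= (3, 0))

# Path of the interpreter that is actually running this code.
print(sys.executable)
```

Printing sys.executable is handy when several Python installations coexist and you are not sure which one your shell or notebook is using.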
Python Package Index
The Python Package Index (abbreviated as PyPI), also known as the Cheese Shop, is the official third-party software repository for Python. It primarily hosts Python packages in the form of archives called 'sdists' (source distributions) or precompiled wheels (you will see these later). In a sentence: PyPI is a giant online repository of packages shared by the Python community.
PyPI lets you submit any number of versions of your distribution to the index. If you alter the metadata for a particular version, you can submit it again and the index will be updated. PyPI holds a record for each (name, version) combination submitted. As an end-user, you can search for packages by keyword or filter them by their metadata, which is what makes it an index. At the time of writing, over 113,000 Python packages can be accessed through PyPI.
Why should you be aware of PyPI? Because it describes distributions packaged with 'distutils', and it can also hold package data such as distribution files if the package author wishes. It is also where 'easy_install' and 'pip' search for available packages by default (more on this later).
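The idea of one record per (name, version) pair can be pictured with a toy in-memory index. This is purely an illustration and not how PyPI is implemented; the class ToyIndex, its methods and the package name examplepkg are all invented for the sketch:

```python
class ToyIndex:
    """A toy, in-memory stand-in for a package index (not PyPI's real code)."""

    def __init__(self):
        # One metadata record per (name, version) pair.
        self._records = {}

    def submit(self, name, version, summary):
        # Submitting the same (name, version) again overwrites its metadata.
        self._records[(name, version)] = {"summary": summary}

    def versions(self, name):
        # Plain string sorting is enough for this toy example.
        return sorted(v for (n, v) in self._records if n == name)

    def search(self, keyword):
        keyword = keyword.lower()
        return sorted(
            key for key, meta in self._records.items()
            if keyword in key[0].lower() or keyword in meta["summary"].lower()
        )


index = ToyIndex()
index.submit("examplepkg", "1.0", "Tools for plotting")
index.submit("examplepkg", "1.1", "Tools for plotting")
index.submit("examplepkg", "1.1", "Tools for plotting and charts")  # metadata update
print(index.versions("examplepkg"))
print(index.search("charts"))
```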
Distutils
Distutils was for a long time the standard tool for packaging in Python. It is included in the standard library of Python 2 and Python 3.0 to 3.6 (it was later deprecated in Python 3.10 and removed in Python 3.12). Distutils exposes two commands for submitting package data to PyPI: the register command for submitting metadata to PyPI and the upload command for submitting distribution files.
Register your package
The distutils command register is used to submit your distribution’s metadata to the index. You can invoke it with the command:
python setup.py register
You will then be prompted to log in or register so that you can submit your distribution package. As noted earlier, you may submit any number of versions of your distribution, and resubmitting altered metadata for a particular version updates the index.
Upload your package
The distutils command upload pushes the distribution files to PyPI. For more details on the steps and distutils, check out this page.
Setuptools
Setuptools is a package development library (a tool, just like Distutils) designed to facilitate packaging Python projects. It is a collection of enhancements to the Python distutils that allows you to more easily build and distribute Python distributions, especially ones that depend on other packages.
Why do Setuptools and Distutils coexist?
It is essentially because of the division of responsibility drafted by the Python core team. They reserved the "core standards" and "minimal necessary compilation" parts for themselves by developing distutils, while leaving all the features beyond that (extended compiler, package format and other support) to third parties. Setuptools is a third-party library that is not developed by the core Python team, and so it is not included in the Python standard library.
Building and Distributing Packages
Check out the developer's guide section to learn more about installing setuptools to build and distribute your packages. In this tutorial, let's try to concentrate more on tools that let you manage Python packages to make your life easy.
Easy Install
Easy Install (easy_install) is a package manager for Python bundled with setuptools. It automatically downloads, builds, installs and manages Python packages for you. For download links and installation instructions for each of the supported platforms, head over to the setuptools PyPI page.
For basic use of easy_install, you only need to supply the filename or URL of a source distribution. Easy Install accepts URLs, filenames, PyPI package names (distutils “distribution” names), and package+version specifiers. It will attempt to locate the latest available version that meets your criteria. When downloading or processing downloaded files, Easy Install recognizes distutils source distribution files with extensions of .tgz, .tar, .tar.gz, .tar.bz2, or .zip. It also handles already-built .egg distributions as well as .win32.exe installers built using distutils.
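The file types listed above can be summarised in a few lines of code. This is only an illustration of the naming rules described in the text, not easy_install's actual logic; the function name and the returned labels are made up for the sketch:

```python
# Source distribution extensions taken from the paragraph above.
SOURCE_EXTENSIONS = (".tgz", ".tar", ".tar.gz", ".tar.bz2", ".zip")


def classify_distribution_file(filename):
    """Roughly classify a file by its extension (illustrative only)."""
    lower = filename.lower()
    if lower.endswith(".win32.exe"):
        return "windows installer"
    if lower.endswith(".egg"):
        return "egg"
    if lower.endswith(SOURCE_EXTENSIONS):
        return "source distribution"
    return "unrecognized"


for name in ("Example-1.0.tar.gz", "Example-1.0-py3.6.egg", "notes.txt"):
    print(name, "->", classify_distribution_file(name))
```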
By default, packages are installed to the running Python installation's site-packages directory. site-packages is part of the Python search path by default and is the target directory for manually built Python packages, so modules installed there can be imported easily afterwards. You can override the installation directory with the -d or --install-dir option; the -s or --script-dir option overrides where a package's scripts are installed.
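If you are curious where site-packages lives for the interpreter you are running, the standard library can report it. A minimal sketch using sysconfig and sys.path:

```python
import sys
import sysconfig

# Directory where pure-Python third-party packages are installed
# for the interpreter running this code.
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)

# The import search path; site-packages normally appears in this list.
for entry in sys.path:
    print(entry)
```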
Let's see some commands to download, install, upgrade or even delete a package using easy_install:
Install a package by name, searching PyPI for the latest version. This automatically downloads, builds, and installs the package:
>>easy_install PackageName
Install or upgrade a package by name and version by finding links on a given download page:
>>easy_install -f URL PackageName
Upgrade an already-installed package to the latest version listed on PyPI:
>>easy_install --upgrade PackageName
Alternatively, to upgrade to a specific version, type the package name followed by the required version:
>>easy_install "PackageName==2.0"
If you have upgraded a package, but want to revert to a previously installed version, you can use the command:
>>easy_install PackageName==1.3.4
To uninstall a package, first run the command:
>>easy_install -m PackageName
This ensures that Python will not search for a package you are planning to delete. After you’ve done this, you can safely delete the .egg files or directories, along with any scripts you wish to remove. If you have replaced a package with another version, then you can just delete the package(s) you don't need by deleting the PackageName-versioninfo.egg file or directory (found in the installation directory).
Deprecated/abandoned tools
Distribute
Distribute was a fork of setuptools that was merged back into setuptools 0.7. It shared the same namespace, so if you had Distribute installed, import setuptools would actually import the package distributed with Distribute. You don't need to use Distribute any more. In fact, the version available on PyPI is just a compatibility layer that installs setuptools.
Distutils2
Distutils2 was an attempt to take the best of Distutils, Setuptools and Distribute and become the standard tool included in Python's standard library. The idea was that Distutils2 would be distributed for old Python versions, and that it would be renamed to packaging for Python 3.3, which would include it in its standard library. These plans did not go as intended, and Distutils2 is now an abandoned project. The latest release was in March 2012, and its PyPI home page has finally been updated to reflect its death with a tl;dr: "keep using setuptools and pip for now, don’t use distutils2".
Pip
Pip is one of the most famous and widely used package management systems for installing and managing software packages written in Python and found in the Python Package Index (PyPI). Pip is a recursive acronym that can stand for either "Pip Installs Packages" or "Pip Installs Python". Alternatively, pip stands for "preferred installer program".
Python 2.7.9 and later (in the Python 2 series) and Python 3.4 and later include pip (pip3 for Python 3) by default. It is an explicit replacement for, and indirect successor to, easy_install, which you saw earlier. Check out the Python Packaging User Guide's page on pip vs easy_install for a detailed discussion.
To ensure you can run pip from the command line, type:
>>pip --version
If pip isn't installed, you can install it through your system package manager or by downloading the bootstrap script with cURL (a client-side data transfer tool) and running it:
>>curl https://bootstrap.pypa.io/get-pip.py | python
While you are at it, it is a good idea to update pip, setuptools and wheel:
>>python -m pip install --upgrade pip setuptools wheel
While pip alone is sufficient to install from pre-built binary archives (files that are already built and ready to install, with no compilation step on your machine), up-to-date copies of the setuptools and wheel projects are useful to ensure you can also install from source archives.
Let's check out some handy commands to use pip:
To install the latest version of a package:
>>pip install 'PackageName'
To install a specific version, type the package name followed by the required version:
>>pip install 'PackageName==1.4'
To upgrade an already installed package to the latest from PyPI:
>>pip install --upgrade PackageName
Uninstalling/removing a package is very easy with pip:
>>pip uninstall PackageName
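The commands above can also be driven from a Python script. A common approach is to run pip as a module of the current interpreter, so that packages land in the same installation your script is using. The helper names pip_command and run_pip are hypothetical, and PackageName is a placeholder exactly as in the commands above:

```python
import subprocess
import sys


def pip_command(*args):
    """Build a command that runs pip with the current interpreter."""
    return [sys.executable, "-m", "pip", *args]


def run_pip(*args):
    """Run pip; raises CalledProcessError if pip exits with an error."""
    return subprocess.run(pip_command(*args), check=True)


# Show the command without executing it.
print(pip_command("install", "--upgrade", "PackageName"))
```

Calling run_pip("--version") would be a harmless first test, provided pip is installed for that interpreter.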
Pip can manage full lists of packages and their corresponding version numbers through a requirements file, requirements.txt. Typically, this file lists all the pip packages that a project uses. You can install everything in that file by using:
>>pip install -r requirements.txt
You can read more about the requirement files here.
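To give a feel for what such a file contains, here is a sketch with made-up contents and a deliberately simplified parser. It only understands lines of the form name or name==version; the real requirements format supports much more (other version operators, URLs, options), so treat this as an illustration rather than a replacement for pip:

```python
SAMPLE_REQUIREMENTS = """\
# Hypothetical requirements.txt contents
numpy==1.14.0
pandas==0.22.0

requests
"""


def parse_requirements(text):
    """Parse only the simplest lines: 'name' or 'name==version'."""
    parsed = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, _, version = line.partition("==")
        parsed.append((name.strip(), version.strip() or None))
    return parsed


for name, version in parse_requirements(SAMPLE_REQUIREMENTS):
    print(name, version)
```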
Packaging formats: Egg and Wheel
The two terms 'Python egg' and 'wheel' come up a lot. They are both packaging formats that aim to support the use case of needing an install artifact that doesn’t require building or compilation, since building and compiling can be costly in testing and production workflows.
The egg format (.egg) was introduced by setuptools in 2004. It is a logical structure embodying the release of a specific version of a Python project, comprising its code, resources, and metadata: basically, a .zip file with metadata. Eggs follow the same concept as a .jar file in Java.
A wheel is a ZIP-format archive with a specially formatted filename and the .whl extension. The Wheel format was introduced by PEP 427 in 2012. Earlier, installing a Python package using pip or easy_install could require you to compile a bunch of underlying code, making the installation take longer. Wheels provide the option to pre-compile the code for a target architecture and operating system. Wheels are the new standard of Python distribution and are intended to replace eggs. Support is offered in pip >= 1.4 and setuptools >= 0.8.
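The "specially formatted filename" follows the pattern name-version(-build)-python-abi-platform.whl defined in PEP 427, which is how an installer can tell whether a wheel matches your interpreter and operating system. A minimal sketch of splitting such a name follows; the function name and the example filename are made up:

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into the fields defined by PEP 427."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of fields in: " + filename)
    return {
        "name": name,
        "version": version,
        "build": build_tag,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


print(parse_wheel_filename("examplepkg-1.0-cp36-cp36m-manylinux1_x86_64.whl"))
```

A pure-Python wheel typically ends in py3-none-any.whl, meaning it is not tied to a particular ABI or platform.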
You can read the important differences between Wheel and Egg here.
Conda
Conda is an open source package management system and environment management system. It is maintained by Continuum Analytics (since renamed Anaconda, Inc.). Conda can quickly install, run and update packages and their dependencies. As an environment manager, it can also easily create, save, load and switch between environments on your local computer.
Although Conda was created for Python programs, it can package and distribute software for any language. The conda package and environment manager is included in all versions of Anaconda, a free and open source distribution of the Python and R programming languages for data science and machine learning applications. Additionally, Anaconda works alongside pip, which lets you install additional libraries that are not available in conda.
Hence, it is a good idea to download and work with Anaconda. This page will walk you through the Anaconda installation; the whole process is pretty straightforward. Once you have Anaconda installed, if you are working on Windows, head to the Start menu, then search for and open Anaconda Prompt. If you are working on macOS or Linux, open a terminal window. Then go ahead and type:
>>conda --version
This verifies that conda is installed and running on your system and displays the version number that you have installed. To update conda to the current version, type the following:
>>conda update conda
Conda allows you to create separate environments containing files, packages and their dependencies that will not interact with other environments. When you begin using conda, you already have a default environment named base. But don’t put programs into your base environment. Rather, create separate environments to keep your programs isolated from each other. You can learn more about creating environments and managing Python and Python packages in Conda here.
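From inside Python you can check which conda environment is active, which is a cheap way to avoid installing into base by accident. This sketch assumes the environment was activated in the usual way, so that conda has set the CONDA_DEFAULT_ENV environment variable; the function names are hypothetical:

```python
import os


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None outside conda."""
    return environ.get("CONDA_DEFAULT_ENV")


def is_base_env(environ=os.environ):
    """True when the active environment is 'base', which is best kept clean."""
    return active_conda_env(environ) == "base"


print(active_conda_env())
print(is_base_env())
```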
Packaging it all up
You have been introduced to quite a lot of terminology and tools in this tutorial. Take a break and let it all sink in. This is a general overview of the tools available for managing your Python packages; what you finally use depends a lot on the task at hand and the environment you are working in. In the end, there isn't one package manager that suits everyone, and you have to pick your own poison.
Be sure to check out DataCamp's Intermediate Python for Data Science course, to learn more about Python.