[python] How to do sed like text replace with python?

I would like to enable all apt repositories in this file

cat /etc/apt/sources.list
## Note, this file is written by cloud-init on first boot of an instance                                                                                                            
## modifications made here will not survive a re-bundle.                                                                                                                            
## if you wish to make changes you can:                                                                                                                                             
## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg                                                                                                                
##     or do the same in user-data
## b.) add sources in /etc/apt/sources.list.d                                                                                                                                       
#                                                                                                                                                                                   

# See http://help.ubuntu.com/community/UpgradeNotes for how to upgrade to                                                                                                           
# newer versions of the distribution.                                                                                                                                               
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick main                                                                                                                   
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick main                                                                                                               

## Major bug fix updates produced after the final release of the                                                                                                                    
## distribution.                                                                                                                                                                    
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates main                                                                                                           
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates main                                                                                                       

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu                                                                                                         
## team. Also, please note that software in universe WILL NOT receive any                                                                                                           
## review or updates from the Ubuntu security team.                                                                                                                                 
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick universe                                                                                                               
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick universe                                                                                                           
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates universe
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates universe

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu 
## team, and may not be under a free licence. Please satisfy yourself as to
## your rights to use the software. Also, please note that software in 
## multiverse WILL NOT receive any review or updates from the Ubuntu
## security team.
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick multiverse
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates multiverse

## Uncomment the following two lines to add software from the 'backports'
## repository.
## N.B. software from this repository may not have been tested as
## extensively as that contained in the main release, although it includes
## newer versions of some applications which may provide useful features.
## Also, please note that software in backports WILL NOT receive any review
## or updates from the Ubuntu security team.
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-backports main restricted universe multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-backports main restricted universe multiverse

## Uncomment the following two lines to add software from Canonical's
## 'partner' repository.
## This software is not part of Ubuntu, but is offered by Canonical and the
## respective vendors as a service to Ubuntu users.
# deb http://archive.canonical.com/ubuntu maverick partner
# deb-src http://archive.canonical.com/ubuntu maverick partner

deb http://security.ubuntu.com/ubuntu maverick-security main
deb-src http://security.ubuntu.com/ubuntu maverick-security main
deb http://security.ubuntu.com/ubuntu maverick-security universe
deb-src http://security.ubuntu.com/ubuntu maverick-security universe
# deb http://security.ubuntu.com/ubuntu maverick-security multiverse
# deb-src http://security.ubuntu.com/ubuntu maverick-security multiverse

With sed this is a simple sed -i 's/^# deb/deb/' /etc/apt/sources.list what's the most elegant ("pythonic") way to do this?

This question is related to python regex linux

The answer is


[None of the answers works properly above !]

I have a case of multiple key-value replacement in one file around 1000 lines. And after replacement the file structure should keep the same. for example:

key1=value_tobe_replaced1
key2=value_tobe_replaced1
.     .
.     .
key1000=value_tobe_replaced1000

I've tried:

  1. the voted answer from @elmotec for massedit.

  2. answer from @Cecil Curry.

  3. answer from @Keithel.

The three answers definitely helped me a lot but after test I found it costs nearly 40-50s for 1st and 2ed. 3rd is not suitable for multi-replacement so I fixed it.

Notice: refer to the answers before go on.

Here's my code:

Line replacement mode:

start_time = datetime.datetime.now()
with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
    with open(abs_keypair_file) as kf:
        for line in kf:
            line_to_write = ''
            match_flag = False
            for (key, value) in tuple_list:
                # print '  %s = %r' % (key, value)
                if  not re.search(patten, line, flags=re.I):
                    continue
                line_to_write = re.sub(r'\$\({}\)'.format(key), value, line, flags=re.I)
                match_flag = True

            if not match_flag:
                line_to_write = line
            tmp_file.write(line_to_write)

shutil.copystat(abs_keypair_file, tmp_file.name)
shutil.move(tmp_file.name, abs_keypair_file)

time_costs = datetime.datetime.now() - start_time
print 'time costs: %s' % time_costs
time costs: 0:00:42.533879

file replacement mode:

start_time = datetime.datetime.now()
with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
    with open(abs_keypair_file) as kf:
        text = kf.read()
        for (key, value) in tuple_list:
            text = re.sub(patten, value, text, flags=re.M|re.I)
        tmp_file.write(text)
shutil.copystat(abs_keypair_file, tmp_file.name)
shutil.move(tmp_file.name, abs_keypair_file)

time_costs = datetime.datetime.now() - start_time
print 'time costs: %s' % time_costs
time costs: 0:00:00.348458

So I suggest if you match my case and your file size is not too large you may follow file replacement mode.

How to replace if file size is huge? I have no idea.

Hope this helps.


This is such a different approach, I don't want to edit my other answer. Nested with since I don't use 3.1 (Where with A() as a, B() as b: works).

Might be a bit overkill to change sources.list, but I want to put it out there for future searches.

#!/usr/bin/env python
from shutil   import move
from tempfile import NamedTemporaryFile

with NamedTemporaryFile(delete=False) as tmp_sources:
    with open("sources.list") as sources_file:
        for line in sources_file:
            if line.startswith("# deb"):
                tmp_sources.write(line[2:])
            else:
                tmp_sources.write(line)

move(tmp_sources.name, sources_file.name)

This should ensure no race conditions of other people reading the file. Oh, and I prefer str.startswith(...) when you can do without a regexp.


Authoring a homegrown sed replacement in pure Python with no external commands or additional dependencies is a noble task laden with noble landmines. Who would have thought?

Nonetheless, it is feasible. It's also desirable. We've all been there, people: "I need to munge some plaintext files, but I only have Python, two plastic shoelaces, and a moldy can of bunker-grade Maraschino cherries. Help."

In this answer, we offer a best-of-breed solution cobbling together the awesomeness of prior answers without all of that unpleasant not-awesomeness. As plundra notes, David Miller's otherwise top-notch answer writes the desired file non-atomically and hence invites race conditions (e.g., from other threads and/or processes attempting to concurrently read that file). That's bad. Plundra's otherwise excellent answer solves that issue while introducing yet more – including numerous fatal encoding errors, a critical security vulnerability (failing to preserve the permissions and other metadata of the original file), and premature optimization replacing regular expressions with low-level character indexing. That's also bad.

Awesomeness, unite!

import re, shutil, tempfile

def sed_inplace(filename, pattern, repl):
    '''
    Perform the pure-Python equivalent of in-place `sed` substitution: e.g.,
    `sed -i -e 's/'${pattern}'/'${repl}' "${filename}"`.
    '''
    # For efficiency, precompile the passed regular expression.
    pattern_compiled = re.compile(pattern)

    # For portability, NamedTemporaryFile() defaults to mode "w+b" (i.e., binary
    # writing with updating). This is usually a good thing. In this case,
    # however, binary writing imposes non-trivial encoding constraints trivially
    # resolved by switching to text writing. Let's do that.
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
        with open(filename) as src_file:
            for line in src_file:
                tmp_file.write(pattern_compiled.sub(repl, line))

    # Overwrite the original file with the munged temporary file in a
    # manner preserving file attributes (e.g., permissions).
    shutil.copystat(filename, tmp_file.name)
    shutil.move(tmp_file.name, filename)

# Do it for Johnny.
sed_inplace('/etc/apt/sources.list', r'^\# deb', 'deb')

If you really want to use a sed command without installing a new Python module, you could simply do the following:

import subprocess
subprocess.call("sed command")

You can do that like this:

with open("/etc/apt/sources.list", "r") as sources:
    lines = sources.readlines()
with open("/etc/apt/sources.list", "w") as sources:
    for line in lines:
        sources.write(re.sub(r'^# deb', 'deb', line))

The with statement ensures that the file is closed correctly, and re-opening the file in "w" mode empties the file before you write to it. re.sub(pattern, replace, string) is the equivalent of s/pattern/replace/ in sed/perl.

Edit: fixed syntax in example


I wanted to be able to find and replace text but also include matched groups in the content I insert. I wrote this short script to do that:

https://gist.github.com/turtlemonvh/0743a1c63d1d27df3f17

The key component of that is something that looks like like this:

print(re.sub(pattern, template, text).rstrip("\n"))

Here's an example of how that works:

# Find everything that looks like 'dog' or 'cat' followed by a space and a number
pattern = "((cat|dog) (\d+))"

# Replace with 'turtle' and the number. '3' because the number is the 3rd matched group.
# The double '\' is needed because you need to escape '\' when running this in a python shell
template = "turtle \\3"

# The text to operate on
text = "cat 976 is my favorite"

Calling the above function with this yields:

turtle 976 is my favorite

Not sure about elegant, but this ought to be pretty readable at least. For a sources.list it's fine to read all the lines before hand, for something larger you might want to change "in place" while looping through it.

#!/usr/bin/env python
# Open file for reading and writing
with open("sources.list", "r+") as sources_file:
    # Read all the lines
    lines = sources_file.readlines()

    # Rewind and truncate
    sources_file.seek(0)
    sources_file.truncate()

    # Loop through the lines, adding them back to the file.
    for line in lines:
        if line.startswith("# deb"):
            sources_file.write(line[2:])
        else:
            sources_file.write(line)

EDIT: Use with-statement for better file-handling. Also forgot to rewind before truncate before.


Try pysed:

pysed -r '# deb' 'deb' /etc/apt/sources.list

You could do something like:

p = re.compile("^\# *deb", re.MULTILINE)
text = open("sources.list", "r").read()
f = open("sources.list", "w")
f.write(p.sub("deb", text))
f.close()

Alternatively (imho, this is better from organizational standpoint) you could split your sources.list into pieces (one entry/one repository) and place them under /etc/apt/sources.list.d/


Cecil Curry has a great answer, however his answer only works for multiline regular expressions. Multiline regular expressions are more rarely used, but they are handy sometimes.

Here is an improvement upon his sed_inplace function that allows it to function with multiline regular expressions if asked to do so.

WARNING: In multiline mode, it will read the entire file in, and then perform the regular expression substitution, so you'll only want to use this mode on small-ish files - don't try to run this on gigabyte-sized files when running in multiline mode.

import re, shutil, tempfile

def sed_inplace(filename, pattern, repl, multiline = False):
    '''
    Perform the pure-Python equivalent of in-place `sed` substitution: e.g.,
    `sed -i -e 's/'${pattern}'/'${repl}' "${filename}"`.
    '''
    re_flags = 0
    if multiline:
        re_flags = re.M

    # For efficiency, precompile the passed regular expression.
    pattern_compiled = re.compile(pattern, re_flags)

    # For portability, NamedTemporaryFile() defaults to mode "w+b" (i.e., binary
    # writing with updating). This is usually a good thing. In this case,
    # however, binary writing imposes non-trivial encoding constraints trivially
    # resolved by switching to text writing. Let's do that.
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
        with open(filename) as src_file:
            if multiline:
                content = src_file.read()
                tmp_file.write(pattern_compiled.sub(repl, content))
            else:
                for line in src_file:
                    tmp_file.write(pattern_compiled.sub(repl, line))

    # Overwrite the original file with the munged temporary file in a
    # manner preserving file attributes (e.g., permissions).
    shutil.copystat(filename, tmp_file.name)
    shutil.move(tmp_file.name, filename)

from os.path import expanduser
sed_inplace('%s/.gitconfig' % expanduser("~"), r'^(\[user\]$\n[ \t]*name = ).*$(\n[ \t]*email = ).*', r'\1John Doe\[email protected]', multiline=True)

If you are using Python3 the following module will help you: https://github.com/mahmoudadel2/pysed

wget https://raw.githubusercontent.com/mahmoudadel2/pysed/master/pysed.py

Place the module file into your Python3 modules path, then:

import pysed
pysed.replace(<Old string>, <Replacement String>, <Text File>)
pysed.rmlinematch(<Unwanted string>, <Text File>)
pysed.rmlinenumber(<Unwanted Line Number>, <Text File>)

If I want something like sed, then I usually just call sed itself using the sh library.

from sh import sed

sed(['-i', 's/^# deb/deb/', '/etc/apt/sources.list'])

Sure, there are downsides. Like maybe the locally installed version of sed isn't the same as the one you tested with. In my cases, this kind of thing can be easily handled at another layer (like by examining the target environment beforehand, or deploying in a docker image with a known version of sed).


Here's a one-module Python replacement for perl -p:

# Provide compatibility with `perl -p`

# Usage:
#
#     python -mloop_over_stdin_lines '<program>'

# In, `<program>`, use the variable `line` to read and change the current line.

# Example:
#
#         python -mloop_over_stdin_lines 'line = re.sub("pattern", "replacement", line)'

# From the perlrun documentation:
#
#        -p   causes Perl to assume the following loop around your
#             program, which makes it iterate over filename arguments
#             somewhat like sed:
# 
#               LINE:
#                 while (<>) {
#                     ...             # your program goes here
#                 } continue {
#                     print or die "-p destination: $!\n";
#                 }
# 
#             If a file named by an argument cannot be opened for some
#             reason, Perl warns you about it, and moves on to the next
#             file. Note that the lines are printed automatically. An
#             error occurring during printing is treated as fatal. To
#             suppress printing use the -n switch. A -p overrides a -n
#             switch.
# 
#             "BEGIN" and "END" blocks may be used to capture control
#             before or after the implicit loop, just as in awk.
# 

import re
import sys

for line in sys.stdin:
    exec(sys.argv[1], globals(), locals())
    try:
        print line,
    except:
        sys.exit('-p destination: $!\n')

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to linux

grep's at sign caught as whitespace How to prevent Google Colab from disconnecting? "E: Unable to locate package python-pip" on Ubuntu 18.04 How to upgrade Python version to 3.7? Install Qt on Ubuntu Get first line of a shell command's output Cannot connect to the Docker daemon at unix:/var/run/docker.sock. Is the docker daemon running? Run bash command on jenkins pipeline How to uninstall an older PHP version from centOS7 How to update-alternatives to Python 3 without breaking apt?