The following gives multiple approaches in EmacsLisp code for managing duplicate lines in a file. Try them out by EvaluatingExpressions or by adding to your InitFile. For duplicating lines see CopyFromAbove.
See this blog post by Bozhidar Batsov on how to use ‘delete-duplicate-lines’
:
http://emacsredux.com/blog/2014/03/01/a-peek-at-emacs-24-dot-4-delete-duplicate-lines/
You can try to remove duplicate lines with ReplaceRegexp or `C-M-%
’ (‘query-replace-regexp’
), by replacing \([^ C-q C-j ]+ C-q C-j \)\1+
with \1
. A newline is entered with a ‘C-q C-j’
. The Lisp version of this command would be
(replace-regexp "\\([^\n]+\n\\)\\1+" "\\1")
Since it doesn’t backtrack, there’s no way remove a line with more than one duplicate.
Duplicate line 1 Unique line 1 Duplicate line 1 Duplicate line 1
It will leave duplicate lines.
Duplicate line 1 Unique line 1 Duplicate line 1
See bug #13032 for delete-duplicate-lines in core emacs.
Here’s some Lisp to find duplicate lines and keep only the first occurrence by starting each search and replace at the start of the last duplicate – (goto-char start)
.
(defun uniquify-all-lines-region (start end) "Find duplicate lines in region START to END keeping first occurrence." (interactive "*r") (save-excursion (let ((end (copy-marker end))) (while (progn (goto-char start) (re-search-forward "^\\(.*\\)\n\\(\\(.*\n\\)*\\)\\1\n" end t)) (replace-match "\\1\n\\2"))))) (defun uniquify-all-lines-buffer () "Delete duplicate lines in buffer and keep first occurrence." (interactive "*") (uniquify-all-lines-region (point-min) (point-max)))
So for this buffer:
Duplicate line 1 Unique line 1 Duplicate line 1 Unique line 2 Unique line 3 Duplicate line 1 Duplicate line 2 Duplicate line 2 Unique line 4
running ‘M-x uniquify-all-lines-buffer’
produces:
Duplicate line 1 Unique line 1 Unique line 2 Unique line 3 Duplicate line 2 Unique line 4
Another implementation that adds distinct lines to a temporary list, and checks each line in the list with ‘assoc’
will give the same result as the previous.
(defun uniquify-all-lines-region (start end) "Find duplicate lines in region START to END keeping first occurrence." (interactive "*r") (save-excursion (let ((lines) (end (copy-marker end))) (goto-char start) (while (and (< (point) (marker-position end)) (not (eobp))) (let ((line (buffer-substring-no-properties (line-beginning-position) (line-end-position)))) (if (member line lines) (delete-region (point) (progn (forward-line 1) (point))) (push line lines) (forward-line 1)))))))
The unix utility called uniq
removes duplicate consecutive lines, keeping only one instance. The output of uniq
on the example above is:
Duplicate line 1 Unique line 1 Duplicate line 1 Unique line 2 Unique line 3 Duplicate line 1 Duplicate line 2 Unique line 4
Note that non-consecutive duplication of the first line are not removed.
The unix utility sort
and its -u
argument can give the same result of unique lines as ‘M-x uniquify-all-lines-buffer’
. However, the lines are sorted rather than kept in the order of first appearance.
The output of sort -u
on the example above is:
Duplicate line 1 Duplicate line 2 Unique line 1 Unique line 2 Unique line 3 Unique line 4
The command ‘M-x uniquify-buffer-lines’
will remove identical adjacent lines in the current buffer, similar to what is obtained with the unix uniq
command.
(defun uniquify-region-lines (beg end) "Remove duplicate adjacent lines in region." (interactive "*r") (save-excursion (goto-char beg) (while (re-search-forward "^\\(.*\n\\)\\1+" end t) (replace-match "\\1")))) (defun uniquify-buffer-lines () "Remove duplicate adjacent lines in the current buffer." (interactive) (uniquify-region-lines (point-min) (point-max)))
It is important to note that functions which find duplicate lines don’t always sort lines before looking for dups as this may or may not be what one expects or desires of a particular function.
Where the lines of a file are presorted it can be convenient to use something like this:
(defun find-duplicate-lines (&optional insertp interp) (interactive "i\np") (let ((max-pon (line-number-at-pos (point-max))) (gather-dups)) (while (< (line-number-at-pos) max-pon) (= (forward-line) 0) (let ((this-line (buffer-substring-no-properties (line-beginning-position 1) (line-end-position 1))) (next-line (buffer-substring-no-properties (line-beginning-position 2) (line-end-position 2)))) (when (equal this-line next-line) (setq gather-dups (cons this-line gather-dups))))) (if (or insertp interp) (save-excursion (new-line) (princ gather-dups (current-buffer))) gather-dups)))
This function, while inefficient (note cons in tail of while form) is quite handy for locating duplicates before removing them, i.e. situations of type: ‘uniquify-maybe’
. Extend ‘find-duplicate-lines’
by comparing its result list with one or more of the list comparison procedures ‘set-difference’
, ‘union’
, ‘intersection’
, etc. from the CL package (require ‘cl).
I had a file that looks like
0001 line1 original 0001 line1 modified 0002 line2 0003 line3 original 0003 line3 modified
Use this regexp in query-replace-regexp
to find the duplicated lines : \(\([0-9]\{6,7\}\).*\)<newline, insert it with C-q C-j>\(\2.*\)
then you can replace it with \1
to keep the original lines, or \3
to keep the modified ones