How can I write a stored procedure that imports data from a CSV file and populates the table?

Why a stored procedure? COPY does the trick – Frank Heikens Jun 7 '10 at 6:43

I have a user interface that uploads the CSV file; to hook this up I need a stored procedure that actually copies the data from the CSV file – vardhan Jun 7 '10 at 6:58

That's what COPY does... – Frank Heikens Jun 7 '10 at 6:59

Bozhidar Batsov already gave you a link to an example, the fine manual could also help: postgresql.org/docs/8.4/interactive/sql-copy.html – Frank Heikens Jun 7 '10 at 7:11

Current manual: postgresql.org/docs/current/static/sql-copy.html – Basil Bourque Aug 9 '14 at 20:37

10 Answers

Take a look at this short article.


Solution paraphrased here:

Create your table:

CREATE TABLE zip_codes 
(ZIP char(5), LATITUDE double precision, LONGITUDE double precision, 
CITY varchar, STATE char(2), COUNTY varchar, ZIP_CLASS varchar);

Copy data from your CSV file to the table:

COPY zip_codes FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV;
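
Since the question asked for a stored procedure, you can also wrap the COPY in one so your application can call it. A minimal sketch, assuming the zip_codes table above (import_zip_codes is a made-up name; being COPY underneath, it still reads the file on the db server):

CREATE OR REPLACE FUNCTION import_zip_codes(path text) RETURNS void AS $$
BEGIN
    -- %L quotes the path as a literal for the dynamic COPY statement
    EXECUTE format('COPY zip_codes FROM %L DELIMITER '','' CSV', path);
END;
$$ LANGUAGE plpgsql;

SELECT import_zip_codes('/path/to/csv/ZIP_CODES.txt');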
actually using \copy does the same trick if you do not have superuser access; it complains on my Fedora 16 when using COPY with a non-root account. – asksw0rder Oct 15 '12 at 17:07

TIP: you can indicate what columns you have in the CSV using zip_codes(col1, col2, col3). The columns must be listed in the same order that they appear in the file. – David Pelaez Jan 2 '13 at 5:16

@asksw0rder does \copy have the same syntax? Because I'm getting a syntax error with \copy – JhovaniC May 29 '13 at 19:59

Should I include the header row? – bernie2436 Oct 27 '13 at 23:09

You can easily include the header row -- just add HEADER in the options: COPY zip_codes FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV HEADER; postgresql.org/docs/9.1/static/sql-copy.html – Barrett Clark Nov 8 '13 at 15:17

If you don't have permission to use COPY (which works on the db server), you can use \copy instead (which works in the db client). Using the same example as Bozhidar Batsov:

Create your table:

CREATE TABLE zip_codes 
(ZIP char(5), LATITUDE double precision, LONGITUDE double precision, 
CITY varchar, STATE char(2), COUNTY varchar, ZIP_CLASS varchar);

Copy data from your CSV file to the table:

\copy zip_codes FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV

You can also specify the columns to read:

\copy zip_codes(ZIP,CITY,STATE) FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV
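
\copy accepts the same options as COPY, so if your file has a header row you can skip it the same way (same example file assumed):

\copy zip_codes FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV HEADER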
\copy voters(ZIP,CITY) FROM '/Users/files/Downloads/WOOD.TXT' DELIMITER ',' CSV HEADER; ERROR: extra data after last expected column CONTEXT: COPY voters, line 2: "OH0012781511,87,26953,HOUSEHOLDER,SHERRY,LEIGH,,11/26/1965,08/19/1988,,211 N GARFIELD ST , ,BLOOMD..." – JZ. Sep 6 '15 at 17:29

@JZ. I had a similar error. It was because I had extra blank columns. Check your csv and if you have blank columns, that could be the reason. – alex bennett Jul 8 '16 at 4:32
This is somewhat misleading: the difference between COPY and \copy is much more than just permissions, and you can't simply add a `\` to make it magically work. See the description (in the context of export) here: stackoverflow.com/a/1517692/157957 – IMSoP Jan 26 at 16:29
@IMSoP: you're right, I added a mention of server and client to clarify – bjelli Jan 27 at 9:03

One quick way of doing this is with the Python pandas library (version 0.15 or above works best). This will handle creating the columns for you, although obviously the choices it makes for data types might not be what you want. If it doesn't quite do what you want, you can always use the 'create table' code it generates as a template.

Here's a simple example:

import pandas as pd
df = pd.read_csv('mypath.csv')
df.columns = [c.lower() for c in df.columns] #postgres doesn't like capitals or spaces

from sqlalchemy import create_engine
engine = create_engine('postgresql://username:password@localhost:5432/dbname')

df.to_sql("my_table_name", engine)

And here's some code that shows you how to set various options:

#Set this so the raw sql output is logged
import logging
logging.basicConfig()
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)

import sqlalchemy  #needed for the dtype mapping below

df.to_sql("my_table_name2", 
          engine, 
          if_exists="append",  #options are 'fail', 'replace', 'append'; default 'fail'
          index=False,  #Do not output the index of the dataframe
          dtype={'col1': sqlalchemy.types.NUMERIC,
                 'col2': sqlalchemy.types.String})  #Datatypes should be sqlalchemy types
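
If your file is large, you can load it in chunks so the whole thing never has to fit in memory at once. A minimal sketch, reusing the engine and table name from above (the chunk size of 10000 is an arbitrary assumption):

import pandas as pd

# read the CSV in slices of 10000 rows and append each slice to the table
for chunk in pd.read_csv('mypath.csv', chunksize=10000):
    chunk.columns = [c.lower() for c in chunk.columns]  # postgres doesn't like capitals or spaces
    chunk.to_sql("my_table_name", engine, if_exists="append", index=False)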
In addition, the if_exists parameter can be set to replace or append to an existing table, e.g. df.to_sql("fhrs", engine, if_exists='replace') – Joel Ostblom Apr 30 '15 at 0:47

username and password: you need to create a login and assign the DB to the user. If using pgAdmin, create a "Login/Group role" using the GUI – Somnath Kadam Mar 24 at 12:52

Pandas is a super slow way of loading to sql (vs csv files). Can be orders of magnitude slower. – user48956 May 4 at 18:46

You could also use pgAdmin, which offers a GUI to do the import. That's shown in this SO thread. The advantage of using pgAdmin is that it also works for remote databases.

Much like the previous solutions though, you would need to have your table in the database already. Everyone has their own approach, but what I usually do is open the CSV in Excel, copy the headers, paste special with transposition on a different worksheet, place the corresponding data type in the next column, then copy and paste all of that into a text editor together with the appropriate SQL table-creation query, like so:

CREATE TABLE my_table (
    /*paste data from Excel here for example ... */
    col_1 bigint,
    col_2 bigint,
    /* ... */
    col_n bigint 
)

As Paul mentioned, import works in pgAdmin:

right click on table -> import

select local file, format and encoding

Here is a German pgAdmin GUI screenshot:

[screenshot: pgAdmin import GUI]

You can do a similar thing with DbVisualizer (I have a license, not sure about the free version):

right click on a table -> Import Table Data...

[screenshot: DbVisualizer import GUI]

DBVisualizer took 50 seconds to import 1400 rows with three fields -- and I had to cast everything back from a String to whatever it was supposed to be. – Noumenon Sep 29 '16 at 10:46

Most other solutions here require that you create the table in advance/manually. This may not be practical in some cases (e.g., if you have a lot of columns in the destination table). So, the approach below may come in handy.

Given the path to your csv file and the number of columns it contains, you can use the following function to load your data into a temp table:

CREATE OR REPLACE FUNCTION csv_to_temp_table (IN path TEXT, IN col_count INTEGER)
RETURNS VOID AS $$
BEGIN
    DROP TABLE IF EXISTS temp;
    CREATE TABLE temp ();

    -- add just enough columns, all typed varchar
    FOR iter IN 1..col_count
    LOOP
        EXECUTE 'ALTER TABLE temp ADD COLUMN col_' || iter || ' VARCHAR;';
    END LOOP;

    -- copy the data from the csv file
    EXECUTE FORMAT('COPY temp FROM %L WITH DELIMITER '',''', path);
END;
$$ LANGUAGE plpgsql;

The columns are named col_1, col_2, ..., and so on.
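
For example, a call like this loads a seven-column file into temp (a sketch; the path and column count are assumptions matching the zip-codes file used earlier):

SELECT csv_to_temp_table('/path/to/csv/ZIP_CODES.txt', 7);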

Furthermore, in most typical scenarios the top row holds the column names. To rename the columns accordingly, you can use the following, a bit more sophisticated version:

create or replace function data.load_csv_file
(
    target_table text,
    csv_path text,
    col_count integer
)

returns void as $$

declare

iter integer; -- dummy integer to iterate columns with
col text; -- variable to keep the column name at each iteration
col_first text; -- first column name, e.g., top left corner on a csv file or spreadsheet

begin
    set schema 'your-schema';

    create table temp_table ();

    -- add just enough number of columns
    for iter in 1..col_count
    loop
        execute format('alter table temp_table add column col_%s text;', iter);
    end loop;

    -- copy the data from csv file
    execute format('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_path);

    iter := 1;
    col_first := (select col_1 from temp_table limit 1);

    -- update the column names based on the first row which has the column names
    for col in execute format('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
    loop
        execute format('alter table temp_table rename column col_%s to %s', iter, col);
        iter := iter + 1;
    end loop;

    -- delete the columns row
    execute format('delete from temp_table where %s = %L', col_first, col_first);

    -- change the temp table name to the name given as parameter, if not blank
    if length(target_table) > 0 then
        execute format('alter table temp_table rename to %I', target_table);
    end if;

end;

$$ language plpgsql;
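
A sample call, assuming the csv has 7 columns and you've pointed 'your-schema' inside the function at a real schema (the path here is just a placeholder):

select data.load_csv_file('zip_codes', '/path/to/csv/ZIP_CODES.txt', 7);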
Hi Mehmet, thanks for the answer you posted but when I run your code I get the following error message: ERROR: schema "data" does not exist – user2867432 Nov 8 '16 at 5:34

user2867432 you need to change the schema name that you use accordingly (e.g., public) – mehmet Nov 8 '16 at 13:05

COPY table_name FROM 'path/to/data.csv' DELIMITER ',' CSV HEADER;

Use this SQL code:

    copy table_name(attribute1, attribute2, attribute3, ...)
    from 'E:\test.csv' delimiter ',' csv header

The header keyword lets the DBMS know that the csv file has a header row with attribute names.

For more, visit http://www.postgresqltutorial.com/import-csv-file-into-posgresql-table/


IMHO, the most convenient way is to follow "Import CSV data into postgresql, the comfortable way ;-)", using csvsql from csvkit, which is a Python package installable via pip.
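
A typical invocation looks something like this (a sketch; the connection string and file name are assumptions):

csvsql --db postgresql://username:password@localhost:5432/dbname --insert data.csv

With --db and --insert, csvsql infers the column types, creates the table, and inserts the rows in one step.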

Link rot is voracious! The article you linked to no longer works, which makes me uncomfortable :( – chbrown Jul 27 '16 at 20:18

you might want to mention that this is Python. – mountainclimber Aug 9 '16 at 14:46

For me I get a MemoryError if trying to import a large CSV so it looks like it doesn't stream. – DavidC Oct 20 '16 at 12:32

@DavidC Interesting. How big is your file? How much memory do you have? If it doesn't stream as it appears, I suggest chunking the data before insertion – sal Oct 31 '16 at 12:13

@DavidC Or you use the csvsql command without the --insert option and insert the data later via COPY, or you go by RobinL's answer stackoverflow.com/a/29722393/2772305 – sal Oct 31 '16 at 12:20

Maybe this tool will be helpful: http://www.convertcsv.com/csv-to-sql.htm It has a lot of configuration options for converting CSV into a database script.
