Some tips for querying sqlite3 in Python



I've been using sqlite3 a lot lately in projects involving heavy data processing. My initial attempt didn't involve any database at all: all data was kept in memory, and queries were dictionary lookups, iterations, and conditionals. That worked, but there is only so much that fits into memory, and regenerating or reloading the data from disk into memory was a tedious and time-consuming process.

I decided to give sqlite3 a try. This increases the amount of data that can be processed and reduces the load time of the application to zero, since only the connection to the database needs to be opened. Also, I can replace a lot of Python logic statements with SQL queries.

I would like to share some insights and findings about this experience.

TL;DR

Use batch operations (aka executemany).

You don't need to use the cursor (most of the time).

Cursors can be iterated over.

Use a context manager.

Use pragmas (when it makes sense).

Postpone index creation.

Use placeholders to insert Python values.

If you need to insert many rows into the database at once, then you really shouldn't use execute. The sqlite3 module provides a way to insert batches: executemany.

Instead of doing something like this:

for row in iter_data():
    connection.execute('INSERT INTO my_table VALUES (?)', row)

You can take advantage of the fact that executemany accepts a generator of tuples as arguments:

connection.executemany(
    'INSERT INTO my_table VALUES (?)',
    iter_data()
)

This is not only more concise but also more efficient. In fact, the sqlite3 module implements execute on top of executemany behind the scenes; execute simply inserts one row instead of many.

I wrote a small benchmark to insert a million rows into an empty table (database in memory):

executemany: 1.6 seconds

execute: 2.7 seconds
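A comparison along the lines of that benchmark can be sketched as follows. This is a minimal, hypothetical version: it inserts 100,000 rows instead of a million to keep it quick, and the absolute timings will vary with your machine.

```python
import sqlite3
import time

def iter_rows(n=100_000):
    # Generate single-column tuples on the fly
    for i in range(n):
        yield (i,)

# One executemany call over the whole generator
conn_many = sqlite3.connect(':memory:')
conn_many.execute('CREATE TABLE t(x)')
start = time.perf_counter()
conn_many.executemany('INSERT INTO t VALUES (?)', iter_rows())
elapsed_many = time.perf_counter() - start

# One execute call per row
conn_one = sqlite3.connect(':memory:')
conn_one.execute('CREATE TABLE t(x)')
start = time.perf_counter()
for row in iter_rows():
    conn_one.execute('INSERT INTO t VALUES (?)', row)
elapsed_one = time.perf_counter() - start

print(f'executemany: {elapsed_many:.2f}s, execute loop: {elapsed_one:.2f}s')
```

Note that executemany happily consumes the generator lazily, so the rows never need to exist as a full list in memory.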

One of the things that confused me at first was cursor management. Online examples and documentation usually look like this:

connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
# Do something with cursor

But in most cases you don't need the cursor at all; you can simply use the connection object directly (more on this at the end of this article).

Operations like execute and executemany can be called directly on the connection. Here is an example to demonstrate this:

import sqlite3

connection = sqlite3.connect(':memory:')

# Create a table
connection.execute('CREATE TABLE events(ts, msg)')

# Insert values
connection.executemany(
    'INSERT INTO events VALUES (?, ?)',
    [
        (1, 'foo'),
        (2, 'bar'),
        (3, 'baz')
    ]
)

# Print inserted rows
for row in connection.execute('SELECT * FROM events'):
    print(row)

You may often see examples that use fetchone or fetchall to process the results of a SELECT query, but I find the most natural way to consume these results is to iterate directly over the cursor:

for row in connection.execute('SELECT * FROM events'):
    print(row)

This way, you can stop the query as soon as you have enough results, without wasting resources. Of course, if you know in advance how many results you need, you can use SQL's LIMIT clause instead, but Python generators are very handy and let you separate data generation from data consumption.
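Early termination can be sketched like this. The table and row values here are just illustrative; the point is that breaking out of the loop stops fetching further rows from the cursor.

```python
import sqlite3

connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE events(ts, msg)')
connection.executemany(
    'INSERT INTO events VALUES (?, ?)',
    ((i, 'msg %d' % i) for i in range(1000)),
)

# Stop iterating as soon as we have the first five rows;
# the remaining rows are never materialized in Python
first_five = []
for row in connection.execute('SELECT * FROM events ORDER BY ts'):
    first_five.append(row)
    if len(first_five) == 5:
        break
```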

Nasty things can happen in the middle of an SQL transaction. To avoid handling rollbacks or commits manually, you can simply use the connection object as a context manager. In the following example, we create a table and insert duplicate values by mistake:

import sqlite3

connection = sqlite3.connect(':memory:')

with connection:
    connection.execute(
        'CREATE TABLE events(ts, msg, PRIMARY KEY(ts, msg))')

try:
    with connection:
        connection.executemany('INSERT INTO events VALUES (?, ?)', [
            (1, 'foo'),
            (2, 'bar'),
            (3, 'baz'),
            (1, 'foo'),
        ])
except (sqlite3.OperationalError, sqlite3.IntegrityError) as e:
    print('Could not complete operation:', e)

# No row was inserted because the transaction failed
for row in connection.execute('SELECT * FROM events'):
    print(row)

connection.close()

Use pragmas... when it makes sense.

There are several pragmas that can be used to adjust the behavior of sqlite3 in your program. In particular, one that can improve performance is synchronous:

connection.execute('PRAGMA synchronous = OFF')

You should know that this can be dangerous: if the application crashes unexpectedly in the middle of a transaction, the database may be left in an inconsistent state. So use it carefully! But if you need to insert many rows faster, it can be an option.
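A minimal sketch of using the pragma follows. Since synchronous mainly affects disk writes, it uses a throwaway on-disk database (the file path here is hypothetical); setting it to OFF means SQLite no longer waits for data to be flushed to disk between writes.

```python
import os
import sqlite3
import tempfile

# A throwaway on-disk database, since synchronous mainly affects file I/O
path = os.path.join(tempfile.mkdtemp(), 'demo.db')
connection = sqlite3.connect(path)

# Trade durability for speed: no fsync between writes
connection.execute('PRAGMA synchronous = OFF')

connection.execute('CREATE TABLE events(ts, msg)')
connection.executemany(
    'INSERT INTO events VALUES (?, ?)',
    ((i, 'msg') for i in range(10_000)),
)
connection.commit()
count = connection.execute('SELECT COUNT(*) FROM events').fetchone()[0]
connection.close()
```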

Suppose you need to create several indexes on a database while also inserting many rows. Postponing index creation until after all rows have been inserted can lead to substantial performance improvements.
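The pattern can be sketched as follows (the table and index names are illustrative): insert everything first, then build the index in a single pass over the finished table, instead of updating the index on every insert.

```python
import sqlite3

connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE events(ts, msg)')

# Insert all rows first...
connection.executemany(
    'INSERT INTO events VALUES (?, ?)',
    ((i, 'msg') for i in range(10_000)),
)

# ...then create the index once, over the fully populated table
connection.execute('CREATE INDEX idx_events_ts ON events(ts)')

indexes = [row[1] for row in connection.execute('PRAGMA index_list(events)')]
```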

It is tempting to include values in queries using Python string manipulation. But this is very unsafe, and sqlite3 gives you a better way to do it:

# Do not do this!
my_timestamp = 1
c.execute("SELECT * FROM events WHERE ts = '%s'" % my_timestamp)

# Do this instead
my_timestamp = (1,)
c.execute('SELECT * FROM events WHERE ts = ?', my_timestamp)

Also, string interpolation with Python's %s (or str.format, or f-strings) is not always possible with executemany, so there's no real point in even trying it here!
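Besides the ? question-mark style shown above, sqlite3 also supports named placeholders, which map dictionary keys to columns and work with executemany as well. A small sketch (table and values are illustrative):

```python
import sqlite3

connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE events(ts, msg)')

# Named placeholders (:name) take their values from dict keys
connection.executemany(
    'INSERT INTO events VALUES (:ts, :msg)',
    [{'ts': 1, 'msg': 'foo'}, {'ts': 2, 'msg': 'bar'}],
)

rows = connection.execute(
    'SELECT msg FROM events WHERE ts = :ts', {'ts': 2}
).fetchall()
```

Named placeholders are worth the extra typing when a query has many parameters, since they make the pairing of value and column explicit.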

Keep in mind that these little tricks may (or may not) benefit you, depending on your specific use case. You should always try them yourself and decide whether they are worth it.

Translator: Holy Son, Tocy

Original : https://remusao.github.io/posts/2017-10-21-few-tips-sqlite-perf.html

Translation: https://www.oschina.net/translate/few-tips-sqlite-perf

