To insert multiple rows at once using DML in BigQuery, you can do something like this:
from google.cloud import bigquery
from google.cloud.bigquery import dbapi
from faker import Faker
import uuid
...
# Build the rows to insert; the dict keys match the named parameters in the query below.
rows_to_insert = []
for i in range(args.count):
    id = str(uuid.uuid4())
    createdAt = fake.date_time_between(args.start_date, args.end_date).isoformat()
    rows_to_insert.append({
        "id": id,
        "createdAt": createdAt,
    })

bqclient = bigquery.Client(project=args.storage_project_id)
dataset = args.storage_project_id
table = 'table'

# Named parameters with explicit types, in the pyformat style the BigQuery DB-API expects.
query_text = f"""
    INSERT INTO {dataset}.{table} (id, createdAt)
    VALUES (%(id:STRING)s, %(createdAt:TIMESTAMP)s)
"""

# Open a DB-API connection over the existing client and run the parameterized
# INSERT once per row in rows_to_insert.
conn = dbapi.connect(bqclient)
cur = conn.cursor()
cur.executemany(query_text, rows_to_insert)
cur.close()
conn.close()
The snippet above inserts multiple rows into the target table (here a table named table, with columns id and createdAt) in one go.
You can also see that value types are supplied alongside the parameter names (id:STRING and createdAt:TIMESTAMP). Binding values through parameters this way helps prevent SQL injection and ensures the values match the column types.
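If you would rather stay on the core client instead of the DB-API, the same typed binding is available through query parameters. Here is a minimal sketch for a single row, reusing the bqclient, dataset, table and rows_to_insert names from the snippet above (the @id/@createdAt placeholders are my own, not part of the original code):

# Equivalent single-row insert using the core client's typed query parameters.
# Assumes row["createdAt"] is an ISO-8601 string, which BigQuery accepts for a
# TIMESTAMP parameter.
row = rows_to_insert[0]
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("id", "STRING", row["id"]),
        bigquery.ScalarQueryParameter("createdAt", "TIMESTAMP", row["createdAt"]),
    ]
)
insert_sql = f"INSERT INTO {dataset}.{table} (id, createdAt) VALUES (@id, @createdAt)"
bqclient.query(insert_sql, job_config=job_config).result()

Either way, the values travel as bound parameters rather than being spliced into the SQL string.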
I am using the DB-API instead of the streaming API here because rows inserted via the streaming API are not available for update for up to 90 minutes.
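For contrast, the streaming path I avoided would look roughly like the sketch below, again reusing bqclient, dataset, table and rows_to_insert from above. It returns quickly, but the rows land in the streaming buffer, where DML statements cannot modify them until the buffer drains:

# Streaming insert (tabledata.insertAll); rows sit in the streaming buffer and
# cannot be changed by UPDATE/DELETE/MERGE until the buffer is flushed.
errors = bqclient.insert_rows_json(f"{dataset}.{table}", rows_to_insert)
if errors:
    raise RuntimeError(f"Streaming insert failed: {errors}")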