Rails 7 introduces attribute-level encryption for Active Record models. After a bit of setup, declaring an encrypted column on a model is as simple as -
class User < ApplicationRecord
  encrypts :card_number
end
Please refer to the following articles for setup -
Though Rails has made it dead simple to encrypt model attributes, encryption comes at the cost of extra memory consumption and extra computation. In this article, we explore the impact of encryption on models. Let's consider the following areas of impact -
- Performance
- Search limitation
- Migrations
Performance
We benchmarked INSERT and SELECT queries with and without encryption. The setup for the benchmarking script is as follows -
- We've created 2 models User and EncryptedUser with the same attributes and same data types:
t.string "name"
t.integer "age"
t.string "email"
t.string "card"
- The EncryptedUser model has an encrypted card column, while the User model stores card in plain text.
# EncryptedUser
class EncryptedUser < ApplicationRecord
  encrypts :card, deterministic: true
end

# User
class User < ApplicationRecord
end
We want to compare the performance of the same operations on the encrypted and the unencrypted model. Let's look at the create script -
require 'benchmark'

n = 100_000 # number of records to create per model

# bmbm runs a rehearsal pass before the measured pass
Benchmark.bmbm do |x|
  x.report("plaintext") do
    (1..n).each do |i|
      User.create!(name: "Name1234#{i}", age: i, email: "tes#{i}t@test.com", card: "test123#{i}")
    end
  end

  x.report("encrypted") do
    (1..n).each do |i|
      EncryptedUser.create!(name: "Name1234#{i}", age: i, email: "tes#{i}t@test.com", card: "test123#{i}")
    end
  end
end
Rehearsal ---------------------------------------------
plaintext 19.729288 20.898011 40.627299 ( 46.350099)
encrypted 26.822756 20.954334 47.777090 ( 53.442452)
----------------------------------- total: 88.404389sec
user system total real
plaintext 19.537394 20.863304 40.400698 ( 46.030128)
encrypted 27.223730 21.005352 48.229082 ( 53.446412)
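It's clear from the above benchmark that the encrypted model spends roughly 36-39% more user CPU time than the plaintext model, which works out to about 16% more wall-clock time. Before each INSERT, Rails has to encrypt the value and serialize the payload, and this extra work is the source of the performance loss. The logged query shows the encrypted value being written -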
3.0.3 :004 > EncryptedUser.create!(name: "Name1234", age: 25, email: "test@test.com", card: "test123")
TRANSACTION (0.0ms) begin transaction
EncryptedUser Create (0.7ms) INSERT INTO "encrypted_users" ("name", "age", "email", "card", "created_at", "updated_at")
VALUES (?, ?, ?, ?, ?, ?)
[["name", "Name1234"], ["age", 25], ["email", "test@test.com"],
["card", "{\"p\":\"rfUfxkqjcw==\",\"h\":{\"iv\":\"RAp2wrgaYxAdp0ks\",\"at\":\"KblUHHVkbfKvwC3elwOsmg==\"}}"],
["created_at", "2022-01-27 14:11:29.916323"], ["updated_at", "2022-01-27 14:11:29.916323"]]
TRANSACTION (0.2ms) commit transaction
Let's quickly benchmark SELECT queries against the database as well.
require 'benchmark'

n = 100_000 # number of lookups per model

Benchmark.bmbm do |x|
  x.report("plaintext") { (1..n).each { |i| User.find_by(card: "test123#{i}") } }
  x.report("encrypted") { (1..n).each { |i| EncryptedUser.find_by(card: "test123#{i}") } }
end
Rehearsal ---------------------------------------------
plaintext 18.433694 0.475423 18.909117 ( 18.909963)
encrypted 22.760291 0.499193 23.259484 ( 23.260218)
----------------------------------- total: 42.168601sec
user system total real
plaintext 18.387190 0.468856 18.856046 ( 18.856920)
encrypted 22.755669 0.505394 23.261063 ( 23.262506)
Here too, the encrypted model spends about 23-24% more user CPU time, for the same reason as above: the search term has to be encrypted before the query is issued.
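The relative overhead can be recomputed directly from the measured pass of each benchmark (the second table in each output). The snippet below is only a small arithmetic helper; the method name overhead_percent and the TIMINGS constant are ours, and the numbers are copied from the output above.

```ruby
# Timings in seconds, copied from the measured (non-rehearsal) benchmark runs.
TIMINGS = {
  insert: { plaintext: { user: 19.537394, real: 46.030128 },
            encrypted: { user: 27.223730, real: 53.446412 } },
  select: { plaintext: { user: 18.387190, real: 18.856920 },
            encrypted: { user: 22.755669, real: 23.262506 } }
}.freeze

# Extra time taken by the encrypted model, as a percentage of the plaintext time.
def overhead_percent(plaintext, encrypted)
  ((encrypted - plaintext) / plaintext * 100).round(1)
end

TIMINGS.each do |operation, runs|
  %i[user real].each do |metric|
    pct = overhead_percent(runs[:plaintext][metric], runs[:encrypted][metric])
    puts "#{operation} #{metric}: +#{pct}%"
  end
end
```

Looking at both the user and the real columns is useful here: for INSERT the database write dominates the wall-clock time, so the relative cost of encryption looks smaller there than in the user CPU column.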
Search Limitation
Note: Searching is only possible with deterministic encryption.
The format of the encrypted data in the database helps us understand the limitations of search and build a mental model of the queries that are possible.
3.0.3 :005 > enc_user = JSON.parse(EncryptedUser.connection.
execute('SELECT card FROM encrypted_users LIMIT 1').first["card"])
(0.4ms) SELECT card FROM encrypted_users LIMIT 1
=> {"p"=>"VO78P4kccg==", "h"=>{"iv"=>"OwJbRnrMcXncCYsL", "at"=>"+HxFTBKD8DEQzpJjDDlYfA=="}}
As we can see, Rails stores encrypted data in 2 main parts -
- p (payload) - This is our encrypted data
- h (headers) - This stores the context in which our data was encrypted. In other words, this is the metadata that is needed for decryption. It has 2 components -
- iv is the initialization vector that was used for the encryption
- at is the authentication tag generated during encryption. It is used to verify that the payload has not been altered.
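To make the payload format more concrete, here is a minimal sketch that produces a JSON document with the same p, iv and at keys using plain OpenSSL. It is not the Rails implementation: the method names encrypt_attribute and decrypt_attribute are hypothetical, the choice of AES-256-GCM is an assumption that matches the 12-byte iv and 16-byte at values seen above, and the key is just random bytes rather than a key derived the way Rails derives it.

```ruby
require "openssl"
require "json"

# Strict Base64 helpers built on pack/unpack1, so no extra library is needed.
def b64(bytes)
  [bytes].pack("m0")
end

def unb64(text)
  text.unpack1("m0")
end

# Encrypts a value and serializes it as {"p": ..., "h": {"iv": ..., "at": ...}}.
def encrypt_attribute(plaintext, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv # a fresh random IV for every call
  ciphertext = cipher.update(plaintext) + cipher.final
  message = {
    "p" => b64(ciphertext),
    "h" => { "iv" => b64(iv), "at" => b64(cipher.auth_tag) }
  }
  JSON.generate(message)
end

# Reverses encrypt_attribute; raises OpenSSL::Cipher::CipherError when the
# payload, IV or authentication tag has been modified.
def decrypt_attribute(serialized, key)
  message = JSON.parse(serialized)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.decrypt
  cipher.key = key
  cipher.iv = unb64(message["h"]["iv"])
  cipher.auth_tag = unb64(message["h"]["at"])
  plaintext = cipher.update(unb64(message["p"])) + cipher.final
  plaintext.force_encoding(Encoding::UTF_8)
end

demo_key = OpenSSL::Random.random_bytes(32) # 256-bit key, for the demo only
stored = encrypt_attribute("test123", demo_key)
puts stored
puts decrypt_attribute(stored, demo_key)
```

Because the authentication tag is checked during decryption, changing even one byte of p makes decrypt_attribute raise instead of returning corrupted data.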
It is evident from the payload that if Active Record wants to fetch a record by an encrypted attribute, it has to search for the full payload. This has the following complications -
- Fuzzy search - Fuzzy (partial-match) search won't be possible because the payload generated for a fragment of the value will not match the payload stored for the whole value.
- Case Insensitive Search - Let's take an example to understand this scenario
3.0.3 :006 > EncryptedUser.find_by(card: "hello")
EncryptedUser Load (42.5ms) SELECT "encrypted_users".* FROM "encrypted_users" WHERE "encrypted_users"."card" = ? LIMIT ? [["card", "{\"p\":\"vfWQJw8=\",\"h\":{\"iv\":\"MGIPdU60GC2eD2+g\",\"at\":\"1vTWU2vq+L4xSidsjpb9rQ==\"}}"], ["LIMIT", 1]]
=> nil
3.0.3 :007 > EncryptedUser.find_by(card: "HELLO")
EncryptedUser Load (30.8ms) SELECT "encrypted_users".* FROM "encrypted_users" WHERE "encrypted_users"."card" = ? LIMIT ? [["card", "{\"p\":\"myHo5A8=\",\"h\":{\"iv\":\"B9DhI2q/yWiyZ+Zb\",\"at\":\"VhIiFBhe7MC7DhjODkAbGQ==\"}}"], ["LIMIT", 1]]
As we can see, 2 different payloads are generated for the same text in different cases, and because of this, case-insensitive search is not possible by default. There are 2 options to enable it -
- Save the content in lowercase and lowercase the search query as well. Active Record supports this with downcase: true, which lowercases both the stored content and the search term automatically.
class User < ApplicationRecord
  encrypts :card, deterministic: true, downcase: true
end
- The original casing of the data is lost with the above approach, so when casing matters we can use ignore_case: true instead. This requires adding one more column named original_<column_name> to the table, which retains the content in its original case.
class User < ApplicationRecord
  # content with the original case will be stored
  # in the original_card column.
  encrypts :card, deterministic: true, ignore_case: true
end
- Unique Validations - Uniqueness validations only work with deterministically encrypted columns. With non-deterministic encryption the payload is different every time, even for the same data, so duplicates cannot be detected.
- Unique Indexes - To support these, we need to ensure that the ciphertext of the encrypted column never changes. Don't use unique indexes if you plan to rotate the encryption keys periodically.
- SQL queries - We will not be able to run direct SQL queries against the plain-text values of an encrypted column, because the table only contains the encrypted payloads.
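The limitations above all follow from how the initialization vector is chosen. The sketch below is a simplified illustration, not the Rails code: encrypt_value is a hypothetical helper, and deriving the IV from an HMAC of the plaintext with the same key is our assumption about how a deterministic mode can be built. It shows why equal inputs match under deterministic encryption, why "hello" and "HELLO" do not, and why random IVs rule out equality checks altogether.

```ruby
require "openssl"

# Encrypts with AES-256-GCM and returns "iv.ciphertext.tag" in Base64.
# deterministic: true  -> the IV is derived from the plaintext, so equal
#                         inputs always produce equal outputs.
# deterministic: false -> the IV is random, so every output is different.
def encrypt_value(plaintext, key, deterministic:)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.encrypt
  cipher.key = key
  iv = if deterministic
         # Simplification: the encryption key doubles as the HMAC key here.
         OpenSSL::HMAC.digest("SHA256", key, plaintext)[0, cipher.iv_len]
       else
         cipher.random_iv
       end
  cipher.iv = iv
  ciphertext = cipher.update(plaintext) + cipher.final
  [iv, ciphertext, cipher.auth_tag].map { |part| [part].pack("m0") }.join(".")
end

demo_key = OpenSSL::Random.random_bytes(32)

same = encrypt_value("hello", demo_key, deterministic: true) ==
       encrypt_value("hello", demo_key, deterministic: true)
puts "deterministic, same text matches: #{same}"

other_case = encrypt_value("hello", demo_key, deterministic: true) ==
             encrypt_value("HELLO", demo_key, deterministic: true)
puts "deterministic, different case matches: #{other_case}"

downcased = encrypt_value("hello", demo_key, deterministic: true) ==
            encrypt_value("HELLO".downcase, demo_key, deterministic: true)
puts "deterministic, downcased first matches: #{downcased}"

random = encrypt_value("hello", demo_key, deterministic: false) ==
         encrypt_value("hello", demo_key, deterministic: false)
puts "non-deterministic, same text matches: #{random}"
```

The third comparison is essentially what downcase: true buys us: normalizing the value before encryption makes the two ciphertexts identical again.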
Migrations
There are 2 scenarios when we want to encrypt already existing columns:
Unencrypted columns
We can use the config.active_record.encryption.support_unencrypted_data option to support a column that contains plain-text as well as encrypted data. In this case, the generated queries include the plain-text value alongside the encrypted one, so that rows that are not yet encrypted are matched too. Please note that this option should only be used during the transition period, while the table holds both encrypted and plain data. It should be disabled once the migration is complete.
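To picture what such a lookup has to do during the transition period, here is a small in-memory simulation. Everything in it is hypothetical: fake_encrypt is an insecure stand-in for deterministic encryption, ROWS stands in for the table, and find_by_card only mimics the idea of searching for both the encrypted and the plain value. As far as we know, Rails additionally requires config.active_record.encryption.extend_queries to be enabled for the real queries to behave this way, so please verify that against the Rails guides.

```ruby
# Insecure stand-in for deterministic encryption, for illustration only.
def fake_encrypt(value)
  "enc:#{value.reverse}"
end

# Values a lookup must match while plain and encrypted rows coexist.
def lookup_candidates(value, support_unencrypted_data:)
  candidates = [fake_encrypt(value)]
  candidates << value if support_unencrypted_data
  candidates
end

# Mimics WHERE card IN (...candidates...) LIMIT 1 on an in-memory table.
def find_by_card(rows, value, support_unencrypted_data:)
  candidates = lookup_candidates(value, support_unencrypted_data: support_unencrypted_data)
  rows.find { |row| candidates.include?(row[:card]) }
end

ROWS = [
  { id: 1, card: "test1231" },               # not migrated yet, plain text
  { id: 2, card: fake_encrypt("test1232") }  # already encrypted
].freeze

p find_by_card(ROWS, "test1231", support_unencrypted_data: true)
p find_by_card(ROWS, "test1231", support_unencrypted_data: false)
p find_by_card(ROWS, "test1232", support_unencrypted_data: false)
```

With the option switched off, the plain-text row can no longer be found, which is why it should only be disabled after every row has been encrypted.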
Already encrypted columns
If the data is already encrypted using some other gem, such as Lockbox or attr_encrypted, we need to devise a migration strategy: decrypt the data to plain text, store it temporarily in a new column, and then re-encrypt it using Active Record encryption. We can use the following steps to switch to the default Rails encryption -
- Add a new column to the table for the plaintext data
- Write a rake job to copy the plain text data from the encrypted column to the unencrypted column
- Run the rake job for the data migration
- Setup Rails encryption in the app.
- Add the encrypts :<column_name> in the model
- Write a rake job to copy the plain text data to the encrypted column
- Delete the plain text column.
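The flow of the steps above can be simulated on in-memory rows. This is only a sketch of the data movement, not a rake task: the lambdas are insecure stand-ins for the legacy gem and for Active Record encryption, and the column name card_plaintext as well as the three method names are made up for the example.

```ruby
# Insecure stand-ins for the two encryption schemes, for illustration only.
LEGACY_ENCRYPT = ->(value) { "legacy:#{value.reverse}" }
LEGACY_DECRYPT = ->(value) { value.delete_prefix("legacy:").reverse }
NEW_ENCRYPT    = ->(value) { "rails:#{value.swapcase}" }
NEW_DECRYPT    = ->(value) { value.delete_prefix("rails:").swapcase }

# First data migration: decrypt with the legacy scheme into the temporary column.
def copy_to_plaintext_column(rows)
  rows.each { |row| row[:card_plaintext] = LEGACY_DECRYPT.call(row[:card]) }
end

# Second data migration: re-encrypt from the temporary column with the new scheme.
def reencrypt_from_plaintext_column(rows)
  rows.each { |row| row[:card] = NEW_ENCRYPT.call(row[:card_plaintext]) }
end

# Final step: remove the temporary plain-text column.
def drop_plaintext_column(rows)
  rows.each { |row| row.delete(:card_plaintext) }
end

rows = [
  { id: 1, card: LEGACY_ENCRYPT.call("test1231") },
  { id: 2, card: LEGACY_ENCRYPT.call("test1232") }
]

copy_to_plaintext_column(rows)
reencrypt_from_plaintext_column(rows)
drop_plaintext_column(rows)

rows.each { |row| puts "#{row[:id]}: #{row[:card]} -> #{NEW_DECRYPT.call(row[:card])}" }
```

Keeping the phases separate mirrors the steps in the list: each one can be run and verified on its own, and the plain-text column is only dropped after the re-encrypted values have been checked.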
Happy coding :).
You can connect with us on Twitter for your feedback or suggestions.