9. Advanced - How to Define Custom Gates
In an earlier example notebook, we showed you the basics of defining custom gates. The approach demonstrated there may already be enough for your purposes, and potentially already as efficient as possible. In this notebook, we will discuss the considerations that go into making high-performing gates.
using Revise
using PauliPropagationThe SWAP example
Let us again consider the SWAP gate example
struct CustomSWAPGate <: StaticGate
qinds::Tuple{Int, Int} # The two sites to be swapped
endAgain define the action,
function PauliPropagation.apply(gate::CustomSWAPGate, pstr, coeff; kwargs...)
# get the Pauli on the first site
pauli1 = getpauli(pstr, gate.qinds[1])
# get the Pauli on the second site
pauli2 = getpauli(pstr, gate.qinds[2])
# set the Pauli on the first site to the second Pauli
pstr = setpauli(pstr, pauli2, gate.qinds[1])
# set the Pauli on the second site to the first Pauli
pstr = setpauli(pstr, pauli1, gate.qinds[2])
# apply() is always expected to return a tuple of (pstr, coeff) tuples
return tuple((pstr, coeff))
endNow set up a bigger simulation with 25 qubits on a 5 by 5 grid.
nx = 5
ny = 5
nq = nx * ny
topology = rectangletopology(nx, ny)nl layers of a circuit consisting of RX and RZZ Pauli rotations, but insert a layer of swaps in between.
trotter_layer = tfitrottercircuit(nq, 1; topology=topology);
nl = 3
ourSWAP_circuit = Gate[]
append!(ourSWAP_circuit, trotter_layer)
for _ in 2:nl
append!(ourSWAP_circuit, (CustomSWAPGate(pair) for pair in topology))
append!(ourSWAP_circuit, trotter_layer)
endnparams = countparameters(ourSWAP_circuit)195Define our observable as $Z_7 Z_{13}$.
pstr = PauliString(nq, [:Z, :Z], [7, 13])PauliString(nqubits: 25, 1.0 * IIIIIIZIIIIIZIIIIIII...)Circuit parameters with a random seed.
using Random
Random.seed!(42)
thetas = randn(nparams)For this notebook, we will use a minimum coefficient threshold. The results are still almost exact in this simple case.
min_abs_coeff = 1e-30.001Run the circuit
@time ourSWAP_psum = propagate(ourSWAP_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff) 3.075229 seconds (2.46 M allocations: 121.152 MiB, 0.77% gc time, 97.81% compilation time)
PauliSum(nqubits: 25, 13198 Pauli terms:
-0.013537 * IIZZIZXXIIIIYZIIIZII...
0.0025156 * IIIIIIXYIIIIZIIIIXZI...
-0.0016705 * IIIIIIYYXIIXYZIIZZII...
0.002661 * IIIZIIZYYIIZYXIIIZZI...
-0.0027996 * IIXZIIYYIIIIYZIIIYZI...
-0.001725 * IIZZIIYYZIIIZYIIIIZI...
0.0013354 * IIZZIYYYIIIIXIIIIXII...
0.001438 * IIIIIIXYZIIZIYIIIIZZ...
0.0015348 * IZIIIIXIIIIXYIIIIXZI...
-0.0018285 * IZIIIZYIIIIYYIIIIYZI...
0.0034953 * IIIIIIZYIIIIZIIIIZII...
0.0027765 * IIIIIIYIZIIXXXZIZYZI...
-0.0010834 * IIZIIZXYZIIIYYZIIZZZ...
-0.0030723 * IIZZIYYZZIIIYYIIIYII...
-0.0011065 * IZIZIIXZIIIIYZIIIXZI...
0.0025227 * IIXIIZZIIIIZXIIIIYZI...
0.001622 * IZIIIYYYIIIXXIIIZZII...
-0.0020525 * IIIIIYYYIIIXYIIIZZII...
-0.0025499 * IIIIIIXIIIIXXIIIZXZI...
-0.0066057 * IIYZIZZIIIIZXIIIIZII...
⋮)Overlap with the zero-state:
overlapwithzero(ourSWAP_psum)0.11159958430219122We already mentioned that PauliPropagation.jl contains a CliffordGate implementation of SWAP. Let's implement the same thing and compare performance.
cliffSWAP_circuit = Gate[]
append!(cliffSWAP_circuit, trotter_layer)
for _ in 2:nl
append!(cliffSWAP_circuit, (CliffordGate(:SWAP, pair) for pair in topology))
append!(cliffSWAP_circuit, trotter_layer)
end@time cliffSWAP_psum = propagate(cliffSWAP_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff) 0.308252 seconds (110.23 k allocations: 6.868 MiB, 79.18% compilation time)Are the results the same?
overlapwithzero(cliffSWAP_psum)0.11159958430219122cliffSWAP_psum == ourSWAP_psumtrueYes! We can also benchmark the performance.
using BenchmarkTools@btime propagate($ourSWAP_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff) 44.344 ms
(1355 allocations: 1.83 MiB)@btime propagate($cliffSWAP_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff) 41.943 ms
(1355 allocations: 1.83 MiB)No downside at all from defining our custom gate. How? This is because the apply function for this gate is type stable! Type stability is absolutely crucial in Julia, and codes live and die by it.
@code_warntype apply(CustomSWAPGate((1, 1)), pstr.term, 1.0)MethodInstance for apply(::CustomSWAPGate, ::PauliPropagation.UInt56, ::Float64)
from apply(
[90mgate[39m::[1mCustomSWAPGate[22m, [90mpstr[39m, [90mcoeff[39m; kwargs...)[90m @[39m [90mMain[39m [90m[4mIn[3]:1[24m[39m
Arguments
#self#[36m::Core.Const(PauliPropagation.PropagationBase.apply)[39m
gate[36m::CustomSWAPGate[39m
pstr[36m::PauliPropagation.UInt56[39m
coeff[36m::Float64[39m
Body[36m::Tuple{Tuple{PauliPropagation.UInt56, Float64}}[39m
[90m1 ─[39m
%1 = Main
.:(var"#apply#1")[36m::Core.Const(Main.var"#apply#1")[39m
[90m│ [39m %2 = Core
.NamedTuple()[36m::Core.Const(NamedTuple())[39m
[90m│ [39m %3 = Base.pairs(%
2)[36m::Core.Const(Base.Pairs{Symbol, Union{}, Nothing, @NamedTuple{}}())[39m
[90m│ [39m %4 = (%1)(%3, #self#, gate, pstr, coeff)[36m::Tuple{Tuple{PauliPropagation.UInt56, Float64}}[39m
[90m└──[39m return %4All blue means that everything is great! If correctly implemented, apply will be type stable if it returns a known number of Pauli and coefficient pairs. Here it is just 1 because it is a Clifford gate.
A gate that branches into more than one Pauli string
Onto an example of a gate that can split a Pauli string into two: The T gate.
struct CustomTGate <: StaticGate
qind::Int
endA T gate is a non-Clifford gate that commutes with I and Z, splits X into cos(π/4)X - sin(π/4)Y, and Y into cos(π/4)Y + sin(π/4)X (in the Heisenberg picture).
Let's write the code for that.
function PauliPropagation.apply(gate::CustomTGate, pstr, coeff; kwargs...)
# get the Pauli on the site `gate.qind`
pauli = getpauli(pstr, gate.qind)
if pauli == 0 || pauli == 3 # I or Z commute
# return a tuple of one (pstr, coeff) tuple
return tuple((pstr, coeff))
end
if pauli == 1 # X goes to X, -Y
new_pauli = 2 # Y
# set the Pauli
new_pstr = setpauli(pstr, new_pauli, gate.qind)
# adapt the coefficients
new_coeff = -1 * coeff * sin(π/4)
else # Y goes to Y, X
new_pauli = 1 # X
# set the Pauli
new_pstr = setpauli(pstr, new_pauli, gate.qind)
# adapt the coefficients
new_coeff = coeff * sin(π/4)
end
updated_coeff = coeff * cos(π/4)
# return a tuple of two (pstr, coeff) tuples
return tuple((pstr, updated_coeff), (new_pstr, new_coeff))
endInsert a layer of TGates after each Trotter layer.
ourT_circuit = Gate[]
for _ in 1:nl
append!(ourT_circuit, trotter_layer)
append!(ourT_circuit, (CustomTGate(qind) for qind in 1:nq))
endAnd run:
@time ourT_psum = propagate(ourT_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff) 0.370778 seconds (119.49 k allocations: 7.802 MiB, 75.85% compilation time)
PauliSum(nqubits: 25, 15787 Pauli terms:
-0.0032424 * IXIIIIZXXIIZXZIIIZII...
0.0027962 * IZZIIIIYIIIZXZIIIZII...
-0.0014795 * IIZIIIZXZIIZZXZIIIZI...
0.0037487 * IZIIIYXZIIIXXIIIZZII...
0.0014359 * IZIIIIYZIIIXYIIIZXZI...
-0.0017245 * IYYIIIXXXIIIXIIIIZII...
-0.0029594 * IXIIIIXXIIIIYZIIIZII...
0.0010417 * XXZIIZXYIIIIYZIIIZII...
0.0025217 * IIIIIZXIZIIIYYIIZXII...
-0.0018537 * IXIIIIYIIIIZXZIIIYZI...
-0.0010255 * IZIIIZYZIIIXYZIIZYII...
-0.0037346 * IZIIIYXYIIIXYZIIZZII...
-0.0010858 * IZIIIIYIIIIZXZIIZYZI...
-0.0017078 * IZZIIYXYZIIZXZIIIZII...
0.0013272 * IYIIIYYXXIIIYZIIIZII...
0.0010339 * IIIIIIZIZZIIYXYIIYIZ...
0.0011072 * IZIIIIXIIIIXYIIIIXZI...
-0.0010679 * IIIZIIZIYIIIYZZIIZII...
-0.0010275 * IXZIIYXYZIIZXZIIIZII...
0.0012996 * IYIIIYYIIIIZZIIIIZII...
⋮)overlapwithzero(ourT_psum)0.3146654070299996But did it work? Again, we have an implementation of a TGate in our library. In case you are interested, we currently implement T gates as Pauli Z rotations at an angle of π/4. Let's compare to that.
libraryT_circuit = Gate[]
for _ in 1:nl
append!(libraryT_circuit, trotter_layer)
append!(libraryT_circuit, (TGate(qind) for qind in 1:nq))
endIf you call PauliGate(:Z, qind, parameter), this will create a so-called FrozenGate wrapping the parametrized PauliGate, with a fixed parameter at the time of circuit construction.
Run it and compare
@time libraryT_psum = propagate(libraryT_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff) 0.100648 seconds (15.11 k allocations: 2.890 MiB, 16.24% compilation time)overlapwithzero(libraryT_psum)0.3146654070299997libraryT_psum == ourT_psumtrueIt works! But is it optimal?
using BenchmarkTools@btime propagate($ourT_circuit, $pstr, $thetas;min_abs_coeff=$min_abs_coeff) 65.407 ms
(1351 allocations: 2.22 MiB)@btime propagate($libraryT_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff) 54.233 ms
(1656 allocations: 2.24 MiB)No, because apply for the CustomTGate is not type-stable.
@code_warntype apply(CustomTGate(1), pstr.term, 0.0)MethodInstance for apply(::CustomTGate, ::PauliPropagation.UInt56, ::Float64)
from apply([90mgate[39m::[1mCustomTGate[22m, [90mpstr[39m, [90mcoeff[39m; kwargs...)[90m @[39m [90mMain[39m [90m[4mIn[21]:1[24m[39m
Arguments
#self#[36m::Core.Const(PauliPropagation.PropagationBase.apply)[39m
gate[36m::CustomTGate[39m
pstr[36m::PauliPropagation.UInt56[39m
coeff[36m::Float64[39m
Body[33m[1m::Union{Tuple{Tuple{PauliPropagation.UInt56, Float64}}, Tuple{Tuple{PauliPropagation.UInt56, Float64}, Tuple{PauliPropagation.UInt56, Float64}}}[22m[39m
[90m1 ─[39m %1 = Main.:(var"#apply#8")[36m::Core.Const(Main.var"#apply#8")[39m
[90m│ [39m %2 = Core.NamedTuple()[36m::Core.Const(NamedTuple())[39m
[90m│ [39m %3 = Base.pairs(%2)[36m::Core.Const(Base.Pairs{Symbol, Union{}, Nothing, @NamedTuple{}}())[39m
[90m│ [39m %4 = (%1)(%3, #self#, gate, pstr, coeff)[33m[1m::Union{Tuple{Tuple{PauliPropagation.UInt56, Float64}}, Tuple{Tuple{PauliPropagation.UInt56, Float64}, Tuple{PauliPropagation.UInt56, Float64}}}[22m[39m
[90m└──[39m return %4It either returns a tuple of one tuple Tuple{Tuple{UInt64, Float64}} or a tuple of two tuples Tuple{Tuple{UInt64, Float64}, Tuple{UInt64, Float64}}. Yellow @code_warntype output means it might be okay (it is not that much slower after all), but be wary of red. When this is the case, you may want to define some more involved functions above apply() for optimal performance. This is how we would do it. To avoid such type instabilities, we can overload a higher level function applytoall!(). This is also usefull because the runtime of the T-gate simulation is dominated by commutation (because the Pauli I is very comon for local observables), we could leave those commuting Pauli strings where they are -> in their original Pauli sum. For this, we can overload the function applytoall!(), which differs in that one performs the loop over the Pauli strings in the Pauli sum here, and one can thus use the old Pauli sum more flexibly. You will receive a AbstractPauliPropagationCache object (here a PauliPropagationCache that wraps two PauliSum objects. Our convention is that anything left in the main psum or the auxiliary aux_psum is later merged back into psum. Thus, we can simply skip the commuting Pauli strings, and edit the coefficient of Pauli strings in-place. See this version of the function:
function PauliPropagation.applytoall!(gate::CustomTGate, prop_cache::AbstractPauliPropagationCache; kwargs...)
psum = mainsum(prop_cache)
aux_psum = auxsum(prop_cache)
for (pstr, coeff) in psum
# the content of the previous apply() function:
pauli = getpauli(pstr, gate.qind)
if pauli == 0 || pauli == 3 # I or Z commute
# do nothing
continue
end
if pauli == 1 # X goes to X, -Y
new_pauli = 2 # Y
new_pstr = setpauli(pstr, new_pauli, gate.qind)
new_coeff = -1 * coeff * sin(π/4)
else # Y goes to Y, X
new_pauli = 1 # X
new_pstr = setpauli(pstr, new_pauli, gate.qind)
new_coeff = coeff * sin(π/4)
end
updated_coeff = coeff * cos(π/4)
# you can use add!() if the Pauli sums may already contain that string
# here we know they don't, so we use set!() for minorly better performance
set!(psum, pstr, updated_coeff)
set!(aux_psum, new_pstr, new_coeff)
end
return prop_cache
end@time ourT_psum2 = propagate(ourT_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff) 0.208963 seconds (46.88 k allocations: 4.395 MiB, 46.51% compilation time: 100% of which was recompilation)overlapwithzero(ourT_psum2)0.3146654070299997ourT_psum == ourT_psum2trueAnd check the performance. It is equivalent for all intents and purposes.
@btime propagate($ourT_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff) 54.878 ms
(1356 allocations: 2.23 MiB)@btime propagate($libraryT_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff) 53.936 ms
(1656 allocations: 2.24 MiB)Enjoy defining custom and high-performance gates!