The CLAS12 Trigger System

This article describes the CLAS12 Trigger System. The simulation, hardware, and software design, as well as all validation procedures, are discussed. The firmware development tools used are discussed as well, including our experience with VIVADO High Level Synthesis.


Overview
The CLAS12 Trigger System provides trigger signals for the CLAS12 detector Data Acquisition (DAQ) system [1]. It was originally designed to select physics events with scattered electrons detected in the CLAS12 Electromagnetic Calorimeter System (ECAL) [2] and High Threshold Cherenkov Counter (HTCC) [3], with the possibility to require a track in the CLAS12 Drift Chambers (DC) [4]. In later stages of development, signals from additional detectors were included in the Trigger System, making it flexible and efficient for selecting events for the different experiments within the CLAS12 physics program.

Requirements
The CLAS12 detector (see Ref. [5]) was designed to study the interactions of electrons and photons with nucleons and nuclei at a nominal luminosity of 1 × 10³⁵ cm⁻²s⁻¹. The CLAS12 Trigger System has to provide trigger signals for these processes. Based on the simulation of the physics processes of interest, the required event rate was estimated to be up to 20 kHz. The supported trigger latency is required to be at least 8 µs to provide sufficient time for the trigger logic processing.
The following detectors were defined to be part of the trigger system:
• the High Threshold Cherenkov Counter (HTCC);
• the Electromagnetic Calorimeters (EC and PCAL);
• the Drift Chambers (DC);
• the Forward Time-of-Flight System (FTOF);
• the Central Time-of-Flight System (CTOF);
• the Central Neutron Detector (CND);
• the Forward Tagger Calorimeter and Hodoscope (FT).

Design
The CLAS12 Trigger System was designed as a 3-stage pipeline-style system with a total latency of up to 8 µs. Input information for the Trigger System comes from two sources: Flash Analog-to-Digital Converters (FADCs) used in the photomultiplier tube (PMT)-based detectors, and Drift Chamber Readout Boards (DCRBs) used in the Drift Chambers. The FADCs and DCRBs work at the pre-trigger level, reporting information to the Trigger System in the appropriate form. Stage 1 receives information from the FADCs and DCRBs, and performs data processing according to the type of detector. Stage 2 performs a timing and geometry coincidence between different subsets of the detectors in six groups, corresponding to the six-sector CLAS12 Forward Detector structure, and can also require a coincidence with information from the central detectors. Stage 3 forms the final trigger decision. The CLAS12 trigger diagram is shown in Fig. 1.

FADCs as Pre-trigger
All PMT-based detectors in CLAS12 participating in the Trigger System use JLab VXS 250 MHz flash ADCs (FADCs) [1] as the starting point of the trigger logic. Each channel of the FADC boards is pre-programmed with a gain, a pedestal, and an amplitude threshold above pedestal. Every pulse above the amplitude threshold is integrated and sent to the corresponding section of the Stage 1 trigger logic. The 16-channel FADC boards report 13-bit pulse integrals and a 3-bit pulse time every 32 ns, which allows the following trigger logic to restore 4 ns pulse resolution, while the double pulse resolution remains 32 ns. Based on the FADC reporting schedule, the following trigger logic stages can work on a 250 MHz clock. However, in that case we found it problematic to meet the Field Programmable Gate Array (FPGA) timing. Because of this, our Stage 1 algorithms run on 125 MHz or slower clocks as described below. The trigger information is provided to the following stages using VXS backplane serial lines.
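As an illustration of this reporting scheme, the following minimal C++ sketch (our own simplified model with hypothetical type and field names, not the actual firmware interface) shows how the 3-bit fine time reported within each 32 ns frame can be combined with the frame counter to recover a hit time with 4 ns resolution.

    // Simplified model of the FADC pre-trigger report: a 13-bit pulse
    // integral and a 3-bit fine time are sent every 32 ns frame.
    // Hypothetical names; not the actual CLAS12 firmware interface.
    #include <cstdint>

    struct FadcReport {
        uint16_t pulse_integral; // 13-bit integrated pulse amplitude
        uint8_t  fine_time;      // 3 bits: position of the hit inside the 32 ns frame
    };

    // Recover the hit time with 4 ns resolution from the 32 ns frame counter
    // and the 3-bit fine time (8 sub-slots of 4 ns each).
    inline uint32_t hitTimeNs(uint32_t frame_counter, const FadcReport& r) {
        return frame_counter * 32u + (r.fine_time & 0x7u) * 4u;
    }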

DCRBs as Pre-trigger
The Drift Chamber-based trigger uses JLab 125 MHz discriminator/TDC boards (DCRBs) [1] to feed the Trigger System. These 96-channel units report hits above the pre-programmed thresholds every 16 ns. As with the FADC boards, the DCRBs are implemented in VXS format and provide trigger information using VXS backplane serial lines.

Stage 1 Trigger
The Stage 1 trigger uses specially designed VXS Trigger Processor boards (VTPs) (see Section 4.2). The VTP boards are installed in switch slots in every VXS crate participating in the Trigger System. The VTPs collect trigger data from the pre-trigger boards (FADCs and DCRBs) over VXS serial lines.
The most complex processing is performed for the Electromagnetic Calorimeter system (cluster finding) and the Drift Chambers (segment and road finding). In the following sections we describe the design of the various trigger components.

Electromagnetic Calorimeters
The CLAS12 Electromagnetic Calorimeter (ECAL) [2] includes two separate subsystems, the EC and PCAL. Each consists of multiple layers of scintillating strips and lead sheets with PMT readout on one side of the scintillators (the PCAL is shown in Fig. 2, the EC is similar). The primary purpose of these detectors is electron identification by measuring the energy and coordinates of their electromagnetic showers, referred to as clusters. The cluster finding algorithm was well established during offline data processing development, and was adopted for the trigger implementation with some simplifications.
The algorithm first searches for one-dimensional clusters in each of the three calorimeter views (U, V, W), sorting them by energy and keeping only those above threshold, with a maximum of four clusters in each view. Next the algorithm searches for two-dimensional clusters by looking for overlaps between the three views. For all two-dimensional clusters found, it performs attenuation corrections based on pre-loaded tables of the attenuation lengths of the scintillation strips, using the distance from the cluster to the PMT, to determine the correct cluster energy. Finally, the algorithm sorts the two-dimensional clusters by energy and reports those above threshold, with a maximum number limited to four. For every cluster, the energy and coordinates are reported to the Stage 2 trigger every 8 ns. There is a persistency parameter that allows the same clusters to be reported for several consecutive 8 ns intervals to check for a timing coincidence with the other trigger components, as well as a timing delay parameter for the same purpose. One event with a single cluster is shown in the PCAL (Pre-shower Calorimeter) in Fig. 2. The corrected energies are shown for the individual strips.
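The two building blocks of this algorithm, selecting the four most energetic 1D clusters per view and applying the attenuation correction, can be sketched in software as follows. This is a highly simplified illustration with hypothetical types; the real implementation is HLS/VHDL firmware with fixed-point arithmetic.

    // Software-style sketch of two pieces of the ECAL trigger cluster finding:
    // per-view peak selection (at most four, above threshold) and the
    // attenuation correction. Hypothetical names, not the firmware code.
    #include <algorithm>
    #include <cmath>
    #include <vector>

    struct Peak1D { double energy; double coord; };   // one 1D cluster in U, V or W

    // Keep at most the four most energetic 1D clusters above threshold.
    std::vector<Peak1D> selectPeaks(std::vector<Peak1D> peaks, double threshold) {
        std::sort(peaks.begin(), peaks.end(),
                  [](const Peak1D& a, const Peak1D& b) { return a.energy > b.energy; });
        std::vector<Peak1D> out;
        for (const auto& p : peaks)
            if (p.energy > threshold && out.size() < 4) out.push_back(p);
        return out;
    }

    // Attenuation correction: scale the measured energy by exp(d / lambda),
    // where d is the distance from the cluster to the PMT and lambda is the
    // pre-loaded attenuation length of the strip.
    double correctEnergy(double measured, double distanceToPmt, double attenuationLength) {
        return measured * std::exp(distanceToPmt / attenuationLength);
    }

Overlaps of the selected U, V, and W peaks then define up to four two-dimensional clusters, whose corrected energies and coordinates are reported to Stage 2 every 8 ns.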
It should be mentioned that such an algorithm is designed to find clusters with a maximum energy to target electron identification. For some CLAS12 experiments, it is necessary to identify minimum-ionizing particles (MIPs) using the same trigger component. For that purpose, clusters with energy below a certain defined threshold can be selected. Such a method works for events where the number of clusters does not exceed four; otherwise, there is a risk of losing low-energy clusters corresponding to MIPs. Intensive trigger efficiency studies were conducted for such cases, and the trigger efficiency was measured.
Figure 2: Trigger System representation of a cluster reconstruction using the three views of the PCAL in one sector of CLAS12. Each of the three peaks is shown as a bar before (black) and after (red) the energy correction. The crossing lines indicate that the current event satisfies the Dalitz rule and will be accepted. The green line represents a single peak in one of the views that does not have partners in the other two views satisfying the Dalitz rule.

High Threshold Cherenkov Counter
The CLAS12 High Threshold Cherenkov Counter (HTCC) [3] serves as one of the primary components of the electron trigger logic. It was specially designed to discriminate electrons from other charged particles. The HTCC consists of 48 mirror sections read out by PMTs connected to FADCs (see Fig. 3). For trigger purposes, a 2×2 section sliding window is used to identify clusters. The cluster may include from one to four PMT signals collecting the Cherenkov light from the adjacent mirrors as shown in Fig. 3. The configuration parameters include the single channel energy threshold, cluster multiplicity threshold, and cluster energy threshold. The results are reported to the Stage 2 trigger as 48-bit masks every 4 ns.

The FADC "gain" configuration parameter allows for PMT energy calibrations, making it possible to set energy thresholds in terms of the number of photoelectrons.
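The 2×2 sliding-window cluster search described above can be illustrated with the following simplified C++ sketch. We assume here, purely for illustration, a 4 × 12 (polar × azimuthal) arrangement of the 48 PMT channels with wrap-around in azimuth, and we return only the most energetic window; the real channel mapping and the firmware, which reports all clusters above threshold, differ.

    // Sketch of a 2x2 sliding-window cluster search over the 48 HTCC PMTs.
    // A 4 x 12 (polar x azimuthal) channel grid with azimuthal wrap-around is
    // assumed for illustration only.
    #include <array>

    struct HtccCluster { int seed_row, seed_col; double energy; int multiplicity; };

    HtccCluster findBestCluster(const std::array<std::array<double, 12>, 4>& adc,
                                double channelThreshold) {
        HtccCluster best{0, 0, 0.0, 0};
        for (int r = 0; r + 1 < 4; ++r) {                 // 2x2 window in polar angle
            for (int c = 0; c < 12; ++c) {                // wrap-around in azimuth
                double sum = 0.0;
                int mult = 0;
                for (int dr = 0; dr < 2; ++dr)
                    for (int dc = 0; dc < 2; ++dc) {
                        double e = adc[r + dr][(c + dc) % 12];
                        if (e > channelThreshold) { sum += e; ++mult; }
                    }
                if (sum > best.energy) best = {r, c, sum, mult};
            }
        }
        return best;   // compared against the cluster multiplicity and energy thresholds
    }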

Drift Chambers
The CLAS12 Drift Chambers (DCs) [4] contain six superlayers in each of the six CLAS12 forward sectors. Each superlayer contains six layers with 112 sense wires in each layer. There is no signal amplitude information available; only hit information can be used in the trigger. The trigger algorithm was designed as a two-step process.

In the first step it searches for segments in each of the six superlayers, reporting a 112-bit mask with the bits set for the segments found in each superlayer. The search for segments is conducted based on a pre-loaded segment dictionary, generated by the Drift Chamber simulation software based on the wire locations in the superlayers. The 112-bit mask indicates whether a segment was found that has a hit starting at the bit position of the first wire layer (see Fig. 4). If several segments are found around the same location, the one with the maximum number of hits is kept. In theory, the number of layers contributing to each segment must be equal to 6, and the number of hit wires in a segment can vary from 6 to 12 depending on the track position and angle. In practice, the number of layers and hits in each segment can be less because of Drift Chamber inefficiencies and hardware problems, so the threshold for the segment finder in the trigger logic is set to require 4 out of 6 layers to match.
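A minimal software model of this 4-of-6 layer requirement is sketched below. The pre-loaded segment dictionary is replaced here, purely for illustration, by a trivial "same wire in every layer" pattern; the real dictionary allows a window of wires per layer that depends on the track angle.

    // Simplified sketch of the superlayer segment finder: for each of the 112
    // wire positions, count how many of the 6 layers contribute a hit, and set
    // the output bit if at least 4 layers do.
    #include <array>
    #include <bitset>

    using LayerHits = std::bitset<112>;

    std::bitset<112> findSegments(const std::array<LayerHits, 6>& layers,
                                  int minLayers = 4) {
        std::bitset<112> segments;
        for (int wire = 0; wire < 112; ++wire) {
            int nLayers = 0;
            for (int layer = 0; layer < 6; ++layer)
                if (layers[layer][wire]) ++nLayers;   // real logic: dictionary window per layer
            if (nLayers >= minLayers) segments.set(wire);
        }
        return segments;   // 112-bit segment mask for this superlayer
    }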
After the segment search is complete and the six 112-bit masks are ready, the second step is performed, in which a pre-loaded road dictionary is used to identify possible track candidates (so-called road finding). The road dictionaries were generated by the GEMC Monte Carlo simulation program [10] or taken from the beam data (see details in Section 10.4). At least five out of six superlayers are required to satisfy the trigger condition. A 512-bit mask is generated every 32 ns containing the road projections to other detectors (HTCC, PCAL, and FTOF) and sent to the Stage 2 trigger where geometry matching is done.
Figure 5: Drift chamber reconstructed tracks used for road generation of the scattered electron. Tracks are from data taken with beam. The black and red lines in the left plot of polar angle θ vs. momentum indicate the region used for the road dictionary generator. Additional requirements include a 1 GeV minimum momentum and a z-vertex position cut (along the beamline) of -3 cm ± 7 cm (see the right plot).
To reduce the size of the road dictionary and improve the selection purity, we constrained the particle energy, vertex position, and kinematics of the tracks used to generate the roads (see Fig. 5). Multiple dictionaries can be run simultaneously based on the needs of the physics program. The data-based electron dictionary generator can be seen to slowly converge as more data-based tracks are fed into it (see Fig. 6). Segment positions are smeared by ±1 to fill in holes in the dictionary, which brings the efficiency to nearly 100% with much less data needed to generate the dictionary (at the cost of lower purity). The dictionary efficiencies without and with smearing are demonstrated in Fig. 7 and Fig. 8, respectively. The current road dictionary uses a single FPGA LUT6 element for each unique road entry, allowing for roughly 200k unique entries. Using six 112-bit masks, the dictionary size requirement exceeded the available FPGA resources, so we reduced the bit masks to 56 bits each by combining every two consecutive bits. The scattered electron dictionary size is only 36k LUT6 elements, leaving space for several additional dictionaries.
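The bit-mask reduction described above, combining every two consecutive wire bits, can be sketched as follows (a software illustration only; in the firmware the reduced masks feed LUT6-based dictionary logic).

    // Sketch of the road-finder input reduction: each 112-bit segment mask is
    // reduced to 56 bits by OR-ing consecutive bit pairs before the dictionary
    // lookup. Hypothetical helper, not the firmware code.
    #include <bitset>

    std::bitset<56> reduceMask(const std::bitset<112>& mask) {
        std::bitset<56> reduced;
        for (int i = 0; i < 56; ++i)
            if (mask[2 * i] || mask[2 * i + 1]) reduced.set(i);
        return reduced;
    }

In the firmware, the reduced masks from the six superlayers are then matched against the pre-loaded road dictionary, with at least five of the six superlayers required to contain a matching segment.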

Forward Time-of-Flight System
The CLAS12 Forward Time-of-Flight System (FTOF) [6] contains two layers of scintillating counters in each forward sector, but only one layer is used by the trigger logic. This layer contains 62 counters with PMT readout on both ends. When both PMTs report a signal above threshold, the trigger system considers it as a hit. A 62-bit hit mask is reported to the Stage 2 trigger every 4 ns. The trigger logic configuration includes a single channel energy threshold and a counter average energy threshold (geometric mean). The FTOF participates in non-electron triggers such as the muon trigger.

Central Time-of-Flight System
The CLAS12 Central Time-of-Flight System (CTOF) [7] consists of 48 scintillation counters, surrounding the target as a barrel, with PMT readout from both ends. Its trigger logic is similar to that for FTOF, with a 48-bit mask reported to the Stage 2 trigger every 4 ns.

Central Neutron Detector
The CLAS12 Central Neutron Detector (CND) [8] consists of three layers of scintillation counters, installed radially outward from CTOF, with 24 counters per layer and 72 counters total. Its trigger logic is similar to that for FTOF and CTOF, with a 24-bit mask reported to the Stage 2 trigger every 4 ns (usually the inner layer only).

Forward Tagger Calorimeter and Hodoscope
The CLAS12 Forward Tagger Calorimeter and Hodoscope (FT) [9] trigger is designed to trigger on electrons at small forward polar angles (from 2° to 5°). The calorimeter is a stack of 332 lead tungstate crystals connected to avalanche photodiodes (APDs) that are read out by FADCs. The hodoscope consists of two scintillating fiber layers, each having 116 pixels (of two sizes) that match the geometry of the calorimeter. The calorimeter trigger finds clusters by looking for a seed hit at each crystal location. If the deposited energy in a crystal is greater than the seed threshold and is a local maximum in space (using a 3×3 crystal view) and time, then it is considered a seed hit. For each seed hit, a cluster is formed by summing all of the energies centered on the seed hit in a 3×3 crystal view for all hit times coincident with the seed hit (up to ±16 ns). The seed hit time, which due to time-walk effects is the earliest hit in the cluster, is used for the cluster time stamp, providing a 4 ns resolution. The geometrically matched hodoscope pixels for both layers are checked for time coincident hits with the calorimeter seed hit, and the cluster is tagged as having none, layer 1, layer 2, or both layers of the hodoscope present. Found clusters are serialized and streamed to the Stage 2 trigger where several programmable trigger cuts can discriminate clusters based on energy, charge, and multiplicity.
Figure 7: Dictionary efficiency for electron roads with no segment position smearing. The data-generated road dictionary shows efficiencies of 85% and above.
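The FT calorimeter seed and cluster search described above can be sketched in software as follows. A plain rectangular grid of crystals is assumed here for simplicity, whereas the real FT-Cal has a circular layout of 332 crystals; names are hypothetical.

    // Sketch of the FT calorimeter cluster search: a crystal is a seed if its
    // energy exceeds the seed threshold and is a local maximum of its 3x3
    // neighbourhood; the cluster energy is the 3x3 sum of hits coincident in
    // time with the seed (within +-16 ns).
    #include <cmath>
    #include <vector>

    struct Hit { double energy; double time; };       // one crystal hit (0 energy = no hit)
    using Grid = std::vector<std::vector<Hit>>;       // [row][column]

    bool isSeed(const Grid& g, int r, int c, double seedThreshold) {
        if (g[r][c].energy <= seedThreshold) return false;
        for (int dr = -1; dr <= 1; ++dr)
            for (int dc = -1; dc <= 1; ++dc) {
                if (dr == 0 && dc == 0) continue;
                int rr = r + dr, cc = c + dc;
                if (rr < 0 || cc < 0 || rr >= (int)g.size() || cc >= (int)g[0].size()) continue;
                if (g[rr][cc].energy > g[r][c].energy) return false;   // not a local maximum
            }
        return true;
    }

    double clusterEnergy(const Grid& g, int r, int c, double coincidenceNs = 16.0) {
        double sum = 0.0;
        for (int dr = -1; dr <= 1; ++dr)
            for (int dc = -1; dc <= 1; ++dc) {
                int rr = r + dr, cc = c + dc;
                if (rr < 0 || cc < 0 || rr >= (int)g.size() || cc >= (int)g[0].size()) continue;
                if (std::fabs(g[rr][cc].time - g[r][c].time) <= coincidenceNs)
                    sum += g[rr][cc].energy;
            }
        return sum;
    }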

Stage 2 Trigger
The Stage 2 trigger collects data from Stage 1 using fiber optics. It is based on 7 SubSystem Processor boards (SSPs) (see Section 4.3), all installed in one VXS crate. After receiving the Stage 1 trigger streams, the SSPs form subsystem coincidences for the six identical sets of forward detectors (called sectors) and for the central detectors (all separately). Each subsystem trigger stream goes through a programmable delay that provides 4 ns resolution when deskewing to optimize the time coincidence. Next follows a programmable coincidence window for each subsystem trigger stream, also with a 4 ns step resolution, to ensure that the different subdetector signals will remain stable long enough to form a time coincidence regardless of jitter due to particle time-of-flight, detector response, and trigger jitter. The Stage 2 trigger specifications are shown in Table 1. The central detectors participating in the trigger consist of CTOF, CND, and FT.² A single SSP collects all forward detector trigger streams from a single sector of CLAS12, and a single SSP collects all central detector trigger streams. After the delay and coincidence widths are applied to each input stream, the input streams are copied to 8 programmable sector trigger bits. Each sector trigger bit contains a variety of trigger primitives and customizable thresholds/cuts that can be tailored for a particular trigger type.
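The input conditioning described above, a programmable delay followed by a programmable coincidence width in 4 ns steps, can be modeled in software as shown below (a hypothetical helper, not the SSP firmware itself).

    // Software model of the Stage 2 input conditioning: each subsystem trigger
    // stream is delayed by a programmable number of 4 ns steps and then
    // stretched by a programmable coincidence width, so that the conditioned
    // streams can simply be AND-ed slice by slice to form a time coincidence.
    #include <vector>

    // One element per 4 ns time slice; true = trigger primitive present.
    std::vector<bool> delayAndStretch(const std::vector<bool>& stream,
                                      int delaySteps, int widthSteps) {
        std::vector<bool> out(stream.size(), false);
        for (size_t t = 0; t < stream.size(); ++t) {
            if (!stream[t]) continue;
            for (int w = 0; w < widthSteps; ++w) {
                size_t slot = t + delaySteps + w;
                if (slot < out.size()) out[slot] = true;
            }
        }
        return out;
    }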

The sector trigger bits are computed and sent to the final Stage 3 trigger. The Forward Detector trigger primitives are shown in Table 2 and the Central Detector trigger primitives are shown in Table 3.

Stage 3 Trigger
The Stage 3 trigger is the final stage and collects all sector and central trigger bit streams in a single module where they can be combined in a variety of ways to generate the global trigger bits used for reading out the Data Acquisition System (DAQ). It is implemented on a single VTP board installed in the switch slot of the same VXS crate where all Stage 2 trigger SSPs reside. There are 32 independent trigger bits that can form a trigger based on any combination of sector and/or central trigger bits. Each trigger bit contains two sector trigger bit conditions (required to both be true) and a single central trigger bit condition. Additionally, each trigger bit contains a 16-bit prescaler, final pulse width, and scaler. The Stage 3 trigger specifications are shown in Table 4.
Note: since the PCAL U strips are close and parallel to the FTOF bars, they are used to make a geometrical coincidence with PCAL clusters.
² Note that the FT is actually part of the CLAS12 Forward Detector, but because it is not divided into sectors like the other Forward Detector subsystems, for triggering purposes it is listed here as part of the Central Detector.
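The structure of one global trigger bit, a coincidence of two required sector trigger bits and one central trigger bit followed by a 16-bit prescaler and a scaler, can be sketched as follows. All names and the exact prescaling convention are hypothetical.

    // Sketch of one Stage 3 global trigger bit: an AND of two required sector
    // trigger bits and one central trigger bit, followed by a prescaler and a
    // scaler counter. Hypothetical model of the logic described above.
    #include <cstdint>

    struct GlobalTriggerBit {
        uint32_t sectorMaskA;          // first required sector trigger bit (bit mask)
        uint32_t sectorMaskB;          // second required sector trigger bit
        uint32_t centralMask;          // required central trigger bit
        uint16_t prescale;             // pass 1 out of (prescale + 1) coincidences
        uint16_t prescaleCount = 0;
        uint64_t scaler = 0;           // counts all coincidences before prescaling

        bool evaluate(uint32_t sectorBits, uint32_t centralBits) {
            bool pass = (sectorBits & sectorMaskA) && (sectorBits & sectorMaskB) &&
                        (centralBits & centralMask);
            if (!pass) return false;
            ++scaler;
            if (prescaleCount++ < prescale) return false;   // prescaled away
            prescaleCount = 0;
            return true;                                    // global trigger bit fires
        }
    };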

Trigger Information in Data Stream
An important part of the Trigger System is the Event Builder, which allows the trigger components to participate in event-by-event readout the same way as is done for the DAQ components. All three stages of the Trigger System are equipped with Event Builders. Every time the CLAS12 DAQ is triggered, Stage 1 will build the data bank(s) with trigger decision details (such as the ECAL cluster coordinate/energy or DC segment/road information), Stage 2 will build the data bank with sector-level and Central Detector coincidence results, and Stage 3 will build the data bank that contains the trigger bit decisions for all final 32 trigger bit decisions. Event Builders read information from the pipeline-style buffers for a given programmable window related to the readout trigger time. All trigger-related data banks are available in the data stream along with the DAQ data banks, providing detailed information about the trigger decision for every accepted event. In particular, this allows the Trigger System to be run in "tagging mode", which is a powerful way to test the trigger efficiency (using either a loose or a random trigger).

Hardware Implementation
The CLAS12 Trigger System is implemented using High Speed Serial (VXS) techniques for a complete, fully pipelined, multi-crate Trigger System that takes advantage of the elegant high-speed VXS serial extensions for VME. This Trigger System includes a pre-trigger level and three stages, starting with the front-end VXS crate Trigger Processor (VTP), a sector-level SubSystem Processor (SSP), a global VTP processor (GTP), and a Trigger Supervisor (TS) that manages the timing, synchronization, and front-end event readout.
Within a front-end crate, the trigger information is gathered from the pre-trigger boards, consisting of 16-channel, 12-bit FADC and 96-channel DCRB [1] modules, via the VXS backplane to a VXS Trigger Processor (VTP). Each VTP is capable of handling these 500 MBps VXS links from the 16 modules, and then performs real-time crate-level trigger algorithms. The VTP transmits the Stage 1 trigger information through multiple Gigabit transceivers that are combined into a fiber link. The VTP uses a multi-fiber link to increase the aggregate trigger data transfer rate to the global trigger to 10 Gbps.
The trigger data is transmitted on the VXS backplane, and on the multi-fiber link using the Aurora protocol from Xilinx. The front-end VXS modules use Virtex-5 devices with Gigabit Transceivers operating at 2.5 Gbps. The VTP collects these serial streams with a Virtex-7 device and works with a Zynq-7000 processor to manage the network interface and on-board Linux operating system.
The entire Trigger System is synchronous and operates at 250 MHz with the Trigger Supervisor managing not only the front-end event readout, but also the distribution of the critical timing clocks, synchronization signals, and global trigger signals to each front-end readout crate. These signals are distributed to the front-end crates on a separate fiber link, and each crate is synchronized using a unique encoding scheme to guarantee that each front-end crate is synchronous with a fixed latency, independent of the distance between each crate. The overall trigger signal latency is <8 µs, and the CLAS12 experiments require a trigger rate of up to 20 kHz, which can be easily handled since the hardware has the ability to operate with a trigger rate of up to 200 kHz. The following sections describe the main Trigger System hardware components.

Pre-trigger Boards
Two types of boards are used at the pre-trigger level to supply information to the trigger system: FADCs and DCRBs. They are described in detail in the CLAS12 DAQ paper [1].

VTP Board
The VXS Trigger Processor (VTP, see Fig. 9) is a VXS switch card that is used to implement the trigger logic in the front-end crates (Stage 1) and the global trigger crate (Stage 3). There are 80 full-duplex serial links, each capable of running at up to 8.5 Gbps, that can be used for transporting the trigger data. The links are bonded in groups of 4 for a total of 20 channels, which include 16 VXS payload slot interfaces (copper) and 4 QSFP interfaces (optical).
Front-end (Stage 1) Crate Processing. The VTP in the front-end crate collects data from the VXS payload FADC and DCRB modules (and optionally from some of the QSFP links), where it aligns the data in time for all links and presents it to the detector-specific trigger logic, which resides in an XC7V550T FPGA. The trigger logic processes the data and produces an output trigger stream that is sent to the Stage 2 trigger crate (and optionally to other Stage 1 VTP modules) using up to 4 QSFP optical links. The QSFP optical links allow the Stage 1 trigger logic to use information from multiple Stage 1 crates, which is required for some detectors that span multiple VXS crates (e.g. the DC and FT subsystems). The QSFP optical links also allow multiple links to go to Stage 2 when more bandwidth is needed (e.g. HTCC and CTOF).

Global Trigger (Stage 3) Crate Processing. In the global trigger crate the VTP collects data from the VXS payload SSP modules. The SSP modules supply a stream of trigger bits for each sector (HTCC, FTOF, EC, PCAL, and DC) and also a stream of trigger bits for the central detectors (CTOF, CND, and FT). These sector and central trigger bit streams have already had timing, multiplicity, and geometry coincidences applied between the detectors within the sector or central detectors. The Stage 3 VTP allows the final ("global") trigger bits (up to 32) to be defined using different combinations of sectors, sector trigger bits, and central trigger bits. The 32 global trigger bit decisions are evaluated at 250 MHz so that no additional jitter is introduced by this stage. These bits are sent to the TS via the high-density LVDS front-panel output over a twisted-pair ribbon cable.

Event Builder. A Zynq FPGA is used on the VTP to run the standard CLAS12 CODA readout controller (ROC) component, which allows the VTP to be configured and read out the same way as other VME/Intel-based CODA components. Event data generated by the VTPs contain the trigger decisions for both the Stage 1 and Stage 3 components, which are used to understand the trigger efficiency. Additionally, there is a large buffer (4 GB with 200 Gbps bandwidth) and a 40 Gbps Ethernet interface that is intended for future upgrades of the front-end crate readout system, which would use the VTP and 40 Gbps Ethernet for event readout rather than the VME interface. Fig. 10 shows the interfaces between the FPGAs, memory, network, VXS, and fiber modules.

SSP Board
The SubSystem Processor (SSP, see Fig. 11) is a VXS payload card used to collect data from multiple front-end (Stage 1) crates. The SSP performs the Stage 2 trigger processing by creating the sector and central trigger bit decisions. Up to 16 SSP modules can be housed in a single VXS crate, but only 7 are currently needed: 6 for the sector-based detectors and 1 for the central detectors. There are 36 full-duplex serial links, each capable of running at up to 6.5 Gbps, that can be used for transporting the trigger data. The links are bonded in groups of 4 for a total of 9 channels: 1 VXS switch slot interface (copper) and 8 QSFP interfaces (optical). The SSP is based on a Xilinx XC5VTX150T FPGA, which is responsible for making the Stage 2 trigger decision.
Stage 2 Trigger Processing. The Stage 1 optical data arrive at the SSP where they are aligned and processed through various algorithms to make the sector and central trigger bit decisions. There are 8 sector and central trigger bits (expandable to 32) that are evaluated at 250 MHz so that no jitter is introduced by this stage. These bits are sent to the Stage 3 VTP using the VXS switch serial interface. As with the VTP, the SSP has an Event Builder that allows for readout of the trigger information and its insertion into the data stream.

High Level Synthesis in CLAS12 Trigger System Development
A significant portion of the Trigger System components were developed using Vivado High-Level Synthesis (HLS) from Xilinx [11]. HLS was introduced to reduce the electronics knowledge required to design hardware. It also simplifies the hardware design flow when a certain behavioral model must be achieved, without requiring detailed attention to the underlying electronics.

HLS makes it easier to incorporate well-established data processing algorithms, typically written in C++ or other high-level languages, into FPGA-based projects. HLS allows scientists without an electronics engineering background to be involved in the CLAS12 Trigger System development, and it allows for the involvement of programmers who developed the algorithms for offline data processing but who have limited or no FPGA programming experience. It also makes it possible to validate code within the offline processing framework.

HLS was used to develop most of the Stage 1 components of the CLAS12 trigger. These include the following elements:
• High Threshold Cherenkov Counter (cluster energy reconstruction);
• Forward and Central Time-of-Flight Counters (clustering and timing correction);
• Electromagnetic and Pre-shower Calorimeters (cluster energy and position reconstruction).
The Time-of-Flight and Cherenkov counter trigger implementations were rather straightforward. They typically take less than 10-15% of the Virtex-7 chip resources, and the timing requirements were easily met.
The calorimeter trigger implementation required much more effort because of its complexity and significant FPGA resource requirements. The details are explained in the next section using the ECAL as an example.
It should be mentioned that it took a significant amount of time to implement the desired ECAL algorithm, mostly because of our initial lack of experience with HLS. As soon as all important details of the HLS tool were understood, the development process converged, and the trigger components related to the various other CLAS12 detectors were implemented promptly.

CLAS12 Electromagnetic Calorimeters (ECAL)
Among all of the Trigger System elements, the most challenging for the FPGA implementation is the trigger component serving the two CLAS12 electromagnetic calorimeters. Due to their structure, these calorimeters do not provide cluster coordinates or energies without significant event reconstruction. The trigger implementation details are described in Section 3.3.1.

Below we describe our experience with HLS using the ECAL as an example.

C++ vs. HLS C++
The FPGA implementation of the ECAL trigger was done in a 125 MHz clock domain, a balance between speed and resource utilization. The Trigger System components, in general, require a fixed latency, which sets certain constraints on the design. The reconstruction algorithm borrowed from the offline analysis framework was adapted for VIVADO HLS by rewriting it in C++ using HLS streams, HLS pragmas, and unrolled for-loops, and by using the HLS and VIVADO tools to address various issues related to generating an FPGA image that met the timing requirements and fit within the resource allotment.
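The kind of restructuring involved can be illustrated with the following minimal HLS C++ fragment: a streaming interface, a fully unrolled fixed-size loop, and a pipelined top function. The data layout, function name, and threshold handling are hypothetical and do not reproduce the actual ECAL trigger code.

    // Minimal illustration of the HLS C++ style used for Stage 1 components:
    // streaming interfaces, fully unrolled fixed-size loops, and a pipelined
    // top function with fixed latency.
    #include <ap_int.h>
    #include <hls_stream.h>

    struct StripFrame { ap_uint<13> energy[36]; };   // one frame of strip energies
    struct PeakFrame  { ap_uint<13> peakEnergy; ap_uint<6> peakStrip; };

    void find_peak(hls::stream<StripFrame>& in, hls::stream<PeakFrame>& out,
                   ap_uint<13> threshold) {
    #pragma HLS PIPELINE II=1
        StripFrame f = in.read();
        ap_uint<13> best = 0;
        ap_uint<6> bestStrip = 0;
        for (int i = 0; i < 36; ++i) {
    #pragma HLS UNROLL
            if (f.energy[i] > best) { best = f.energy[i]; bestStrip = i; }
        }
        PeakFrame p;
        p.peakEnergy = (best > threshold) ? best : ap_uint<13>(0);
        p.peakStrip  = bestStrip;
        out.write(p);
    }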

HLS and HDL
When HLS is used, compiling the design consists of the following main steps:
• VIVADO HLS - convert C++ to a Hardware Description Language (HDL);
• VIVADO synthesis - convert HDL to FPGA primitives;
• VIVADO implementation - map FPGA primitives to the chip and route connections.
For large designs, VIVADO HLS will very often report extremely optimistic results that suggest a viable solution, but the design will then fail to meet the timing requirements during VIVADO implementation. To address this, the failing paths must be traced back to the HLS component, where changes can be made to try to improve the design. It often took many iterations to find workable HLS settings, code structure, or clock period adjustments.

HLS Clock Domain
For the different trigger components related to the different CLAS12 detectors, we use different clock domains between 250 MHz and 31.25 MHz. In the 250 MHz domain, modules occupying more than 10% of an XC7V550T Xilinx FPGA failed to meet the timing requirements. In the 125 MHz and lower frequency domains, the FPGA utilization could be close to 100%. For the ECAL project, with a chip utilization of about 70%, the 125 MHz clock was used.
In general, a slower clock speed (31.25 MHz) was preferable for smaller projects where resources were plentiful. When using a slow clock, the HLS code could be written as a single module and had no problem meeting the timing requirements during implementation.
Larger projects, such as for the ECAL, require more efficient use of the FPGA resources and have latency requirements that demand a faster clock, but one that is not so fast that the HLS modules cannot reliably meet the timing requirements. The 125 MHz clock was found to be the optimal middle ground for the "-1" speed grade Virtex-7 used in the CLAS12 Trigger System.

HLS Project Size and Organization
The typical HLS project for the CLAS12 Trigger System contains only a few routines, and uses HLS streams in the function parameter list to communicate easily with the surrounding HDL. That scheme works well for small projects.
For the ECAL, with some versions being close to 100% FPGA utilization, the situation was quite different. The biggest problem we faced was the inability to meet the timing requirements during the implementation (even when HLS reports that the timing is good). HLS uses state machines to schedule the operations it synthesizes. For large HLS components, the generated state machines can have massive control signal fanouts. As the clock period shrinks, so must the maximum signal fanout for the general control signals for a design to reliably meet the timing requirements. For a clock period of 8 ns using a "-1" speed grade Virtex-7, each HLS module was kept smaller than 30k look-up tables (LUTs) (<10% of the LUT resources) to achieve a design that consistently meets the timing requirements.
The original ECAL project consisted of about 20 C++ procedures that occupied most of the FPGA resources. With HLS generating big fanouts on this scale, it was impossible to meet the 8 ns timing at the implementation stage. The workaround was to split the entire project into smaller procedures, glued together in HDL using well-defined, simple interfaces between the separate procedures. Still, some procedures were too big, especially for the sorting algorithms. We were able to split some procedures further until finally the entire project met the timing requirements and the resulting FPGA image was loaded into the hardware.
After every significant change, we re-tested the code on simulated data, making sure it still produced correct results. The chart in Fig. 12 shows how many HLS projects were created in the end.
Another reason for subdividing the project is the lack of multi-clock domain support. Since the Event Builder in the VTP board works in a 250 MHz domain and most projects use a slower clock, every project was subdivided and the separate pieces communicated over an HDL-written interface. The necessity of subdividing HLS projects and of using HDL to assemble them together is probably the most restrictive feature of HLS usage. Much of this subdividing can be reduced by using the HLS DATAFLOW directives (which can isolate functions communicating through registered FIFO interfaces), but this requires further code restructuring to be compatible with this flow and does not support multiple clock domains.
Figure 12: ECAL HLS project chart. The entire project is split into multiple smaller projects to satisfy the timing requirements. In particular, the hit sorting section is split into seven identical projects.

As mentioned before, splitting the project into smaller pieces allowed us to meet the timing requirements. This worked, in particular, because we were able to eliminate combinatorial paths between the HLS projects connected by the streams. Such dependencies can be clearly seen by looking at the schematics for the failing timing paths, and were usually related to the large state machine control signals going between modules. Initially we used HLS version 2015, but we could not eliminate these long combinatorial paths across modules. This was resolved after switching to HLS version 2017, where the streams could be fully registered (with the pragma "axis register both port="). This meant that if registered HLS streams were used between separate HLS projects, then the state machine paths were also registered between modules. With that, it was only a matter of splitting projects into smaller pieces to improve and meet the timing requirements.
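For illustration, the interface pragma quoted above is applied to the stream ports of a top-level HLS function as shown below, registering both sides of each AXI-Stream port so that no combinatorial path crosses between separately compiled HLS projects. The function, port, and data types are hypothetical.

    // Example of registering the stream ports of a top-level HLS function
    // (Vivado HLS 2017.x syntax) so that paths between separately compiled
    // HLS projects are broken by registers.
    #include <ap_int.h>
    #include <hls_stream.h>

    void sort_stage(hls::stream<ap_uint<64> >& clusters_in,
                    hls::stream<ap_uint<64> >& clusters_out) {
    #pragma HLS INTERFACE axis register both port=clusters_in
    #pragma HLS INTERFACE axis register both port=clusters_out
    #pragma HLS PIPELINE II=1
        // pass-through placeholder for the real sorting logic
        clusters_out.write(clusters_in.read());
    }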

HLS Settings
The clock uncertainty is set to 30% of the main clock period, which we found forces HLS to produce more realistic timing estimates. A single HLS project often cannot exceed several percent of the flip-flop (FF) and LUT budget, otherwise it may be difficult to meet the timing requirement in the VIVADO implementation step. A typical HLS project for one of the PCAL trigger elements is shown in Fig. 13.

Common settings for VIVADO were used as shown in Fig. 14. It usually takes 3+ hours to compile the PCAL project on a Dell R730 server under RHEL7. For some firmware versions, we were able to utilize 100% of the LUTs and still meet the timing requirements if the clock domain was 125 MHz or lower.

The VIVADO project for the PCAL trigger is shown in Fig. 14.

Firmware Validation for HLS-based Projects
The ability to validate the firmware using a C++ implementation is one of the biggest advantages of HLS. During the course of development and commissioning, we ran the HLS C++ code on simulated and beam data from the CLAS12 detectors, implementing the required features and fixing bugs. During data taking we were able to find and fix observed problems or add new features within several hours, which was very important to save beam time.

Conclusions about HLS Usage
The CLAS12 ECAL and other detectors were successfully incorporated into the Trigger System using HLS to produce the core part of the firmware. This trigger was used in the first physics run in 2018 and worked as expected. We were able to select events based on individual ECAL cluster energy, something that was previously possible only during offline data processing.
HLS in general appears to be a useful tool, especially for implementing smaller trigger components like the Cherenkov or Time-of-Flight counters. For components utilizing a significant portion of the FPGA, development would benefit significantly from improving HLS in the following directions:
• support multi-clock domains;
• improve subroutine calls by allowing the option to fully register paths between modules;
• improve the state machine logic, for example by supporting streams between routines inside the project and by generating separate state machines for separate routines. This would avoid the need to split the project manually and use HDL as a top-level interface, as we are currently forced to do.
Figure 13: Typical HLS project for one of the PCAL trigger elements. 4% LUT utilization is close to the maximum possible to meet the timing requirements of the following steps. With an 8 ns target, the clock uncertainty is set to 3 ns.

Software
This section describes the various software tools used during the CLAS12 Trigger System development, validation, and operation.

Development Software
Several software packages were used to implement and test the trigger logic developed for CLAS12. These were the FPGA synthesis and implementation packages, and the FPGA high-level synthesizer.
Figure 14: VIVADO project for the PCAL trigger. The strategies used were "Vivado Synthesis Defaults" and "Performance ExplorePostRoutePhysOpt". LUT utilization is about 2/3 of the total budget. It was relatively easy to meet the timing requirements with an FPGA clock of 125 MHz or slower.
For FPGA synthesis and implementation, Xilinx ISE/PlanAhead and Vivado were used. Most of the front-end boards were developed years ago and use Virtex-5 and Spartan-6 FPGAs, which are not supported by Xilinx Vivado, so we relied on Xilinx ISE/PlanAhead for synthesis and implementation. Even though these tools are no longer updated by Xilinx, they have proved to be stable and reliable, and deliver consistent results. Newer designs use Xilinx 7-series parts (we used Artix, Kintex, and Virtex), so we used the Vivado tools, which had far better support than ISE/PlanAhead.
Vivado HLS was used to implement a variety of trigger algorithms, Event Builder logic, and general purpose logic. HLS components could be verified with C/C++ test benches on their own without anything more than GNU GCC and the associated C/C++ header files from the HLS toolchain. This tool often allowed for faster FPGA implementation, but many components were still implemented in HDL where resource and/or timing requirements became critical. Occasionally, simulation of the HDL files generated from HLS was required to debug C/C++ modules when simulation under GCC found no problem but the HDL result did. Differences between the C/C++ simulation and the corresponding HDL output were primarily due to: the C/C++ assumption of infinite length buffering while the HDL buffers were finite, ambiguous C/C++ coding where GCC and HLS behaviors differed, and latency issues due to the C/C++ simulation having no concept of elapsed time. Our experience with HLS is described in detail in Section 5.

Operating Systems
Linux operating systems are used on all readout controllers in the CLAS12 DAQ. The VME crate controllers use Intel-based CPUs and run a standard CentOS distribution with a standard Linux kernel. The VTP modules use an ARMv7 CPU with custom hardware and run Arch Linux using a custom Linux kernel. Both CPU types boot from the network using shared kernel and root filesystem images, which simplifies administration.

Configuration Files
The Trigger System settings are defined in flat text configuration files, shown for example in Fig. 15. Every line in these configuration files contains a keyword followed by the corresponding parameter values. The directive "include" can be used to create a hierarchical set of configuration files. Normally the main configuration file is selected during the run startup procedure, and the CLAS12 run control software resolves all "include" directives, resulting in the creation of one big configuration file. That file is used to program all trigger hardware registers, and its content is also written to the data stream for bookkeeping purposes. The register contents are read back and the results are recorded into the data stream as well, providing full control of the Trigger System settings. Normally the same configuration files contain the DAQ settings as well, making them a complete source for the entire DAQ/Trigger System settings.

One of the important aspects in setting up the trigger is the measurement of the relative timing between the signals from the detector elements used to define a trigger. These are referred to as delay curves. For this purpose, software procedures were developed. These include special trigger configuration files and software tools to scan individual subsystem latencies, record sets of beam-current-normalized scalers, and produce the corresponding delay plots (see Fig. 16). The trigger time setting for a specific detector element was kept at a constant value, which determined the DAQ readout time window width and offset, while the timing for the other subsystems was changed step by step to map out the delay curve. Delays and coincidence widths were adjusted to account for known jitter sources to ensure no events were lost due to poor timing alignment. This procedure was repeated every time the trigger logic was changed.
Figure 15: Configuration file example. Flat text configuration files were used for all DAQ and trigger components. Download and Upload procedures were implemented, with uploaded settings being stored in data files and databases. The format used allows personnel with appropriate training to understand and modify the system settings using text editors. Version control is enforced using GitHub.

Gain Calibration and Threshold Settings
One of the important settings in the Trigger System is for the FADCs. As stated above, the FADC boards serve as the pre-trigger for most of the Stage 1 trigger components (except the Drift Chambers), and correct pedestal and gain calibrations for these units are critical for correct Trigger System performance. Pedestal and gain measurements are conducted before run startup, and the values are loaded using configuration files. All of the thresholds in the configuration files are set using physical units such as MeV for the calorimeter energy and the number of photoelectrons for the Cherenkov counter.

Readout and Control Software
All trigger hardware modules were implemented in VXS format and installed into VXS crates along with the other DAQ electronics. All readout and control libraries were developed as part of the DAQ software project as described in Ref. [1]. From the software point of view, the DAQ and Trigger Systems can be considered as one system equipped with standard software tools.

Simulation Tools
Several simulation tools were used in the development of the CLAS12 Trigger System. These tools were very useful during the development and implementation stages, and are still in use now for continued validation. The primary simulation tools for the CLAS12 Trigger System are detailed below.
Figure 16: CTOF×FT delay scan. In this scan of the CTOF×FT coincidence rate vs. the CTOF trigger delay, the FT trigger time was fixed as having the smallest jitter, and the CTOF trigger delay was changed. Similar delay scans were measured for all detectors participating in the trigger logic, to make sure all trigger components were in time.

GEMC/Geant4 Simulation Tool
In the beginning of the CLAS12 Trigger System development, the hardware was not available, so input data were generated by the CLAS12 Geant4 Monte-Carlo package (GEMC) [10]. This package was used to produce data files with data banks in the same form as produced by the DAQ. The Trigger System software includes a playback package that is able to read GEMC-generated files and produce FADC and DCRB responses identical to those from the hardware. All Stage 1 trigger components implemented with HLS were developed using simulated data, including for the most complex responses in the calorimeter and Drift Chambers. For VHDL-written components, simulated data were used as well, along with other specialized tools described below.
For example, Fig. 17 and Fig. 18 show a comparison of the energy and coordinates of the EC clusters reconstructed by the offline analysis software and by the Trigger System. Different FPGA components were used to perform the "divide" operation in the Trigger System, including one based on digital signal processing (DSP) and another based on using look-up tables. The first method provides better results but requires more resources than the second method. Similar comparisons were used to guide many decisions during system development.
Another example (see Fig. 19) shows the absolute EC cluster energy obtained by offline analysis and by the Trigger System.

As can be seen, the Trigger System provides the same result as predicted by simulation and obtained from offline analysis.

FPGA Simulation Tools
A cycle accurate simulator was set up to model, test, and debug the full Trigger System using Aldec Riviera. This tool is able to perform simulations of the full Trigger System in a mixed language environment (VHDL, Verilog, C/C++) for all FPGA components used in CLAS12 (Xilinx series 5, 6, and 7) and for all board types of the Trigger System (DCRB, FADC, VTP, SSP). HDL wrappers were created to model the VXS crates that include: backplane, fiber interconnect, trigger distribution, clock distribution, configuration, and readout. Table 5 summarizes the trigger simulation components.
Figure 17: EC cluster finding: difference between results from the offline reconstruction and the Trigger System (using division in the coordinate calculation). Ideally it would be a delta function, but in reality we see a difference between the offline reconstruction and the trigger decision for the cluster energy.
Figure 18: EC cluster finding: difference between results from the offline reconstruction and the Trigger System (using a look-up table in the coordinate calculation). In this case we see a difference not only in the cluster energy, but in the cluster coordinates as well. This means that the look-up table method should not be used for the cluster coordinate definition in the trigger.

Each of the front-end crates uses a VTP trigger module that runs the detector-specific trigger algorithm. The front-end VTP modules feed the trigger data to the global trigger (GT) crate Stage 2 (SSP). The SSP modules feed the trigger data into the final trigger Stage 3 (GTP). There are 10 different VTP firmware types to support the Stage 1 (front-end) and Stage 3 (GTP) algorithms. There are 2 different SSP firmware types that support the Stage 2 CLAS12 Forward and Central Detector trigger logic.
This full simulation is primarily run for two scenarios. The first is whenever a significant firmware change is made. In this case a small number of specially selected events (about 2k) can be fed through to tag the trigger decisions on each. The second is whenever the DAQ system records events where the trigger failed to properly tag them (typically during a random trigger run to assess the efficiency). For both cases the failed events can be loaded into the simulation and the failed decisions can be explored in detail to determine the cause. The first scenario takes one day or more depending on the number of events that need to be checked, while the second case can take minutes, since only failed events are presented to the simulation, so problems can be examined immediately.

Trigger System Firmware Development
After the generic Trigger System design was complete and the hardware components entered their production stage, the firmware development began. Both HLS and VHDL tools were used. Work was performed using data samples generated by the CLAS12 GEMC/Geant4 package, and with cosmic data after the hardware components were installed. This section describes our procedures. Additional development and validation with beam are described in Section 10.

Preparation of Simulated Data Sets
Simulated data sets for Trigger System development were prepared using GEMC (the Geant4-based CLAS12 simulation package [10]). GEMC has a fully realistic CLAS12 geometry description and complete maps of the magnetic field, and produces digitized results suitable to be converted into the pre-trigger data format.
Various data sets were generated depending on what was needed for the development of particular trigger components. For example, fixed-energy, single electron sets were produced for the initial development of the EC and PCAL components. For these data sets, all detectors positioned upstream of the EC or PCAL were disabled to make sure the single electron directly hit the EC or PCAL. In this way the cluster-finding algorithm could be developed and tested in ideal conditions. After that, realistic data samples were produced and the algorithm was tested again.
Another data set was used to create the road dictionary for the Drift Chamber-based trigger component. For this purpose, positively and negatively charged tracks were generated uniformly in a selected momentum, θ, and φ range, and tracked through the CLAS12 detector to determine the list of DC wires associated with the particle trajectory. More details can be found in Section 10.4.

The trigger development process consisted of several methods that depended on the nature of the trigger component. Most Stage 1 components were implemented using the HLS/VIVADO tools, where the firmware was written using an HLS C++ extension. In that case, it was possible to develop and validate the firmware as part of the offline reconstruction framework using a standard desktop computer. Usually the offline processing algorithms were re-written using HLS/C++, with appropriate simplifications and structural changes to make them suitable for the FPGA firmware. Simulated data were used as input, processed directly by the HLS/C++ code, and compared with the initial simulation parameters. In addition, the same samples were processed by the offline reconstruction software and the results were compared with the trigger output. This double-check method practically guarantees a bug-free implementation. There was not a single case in which the C++ implementation passed tests on the simulated data and then failed during the final validation stage. The most complicated Stage 1 components were developed and tested using this method.
Several components of the trigger were written mostly in VHDL, and initially no software existed for feeding GEMC data into the HDL simulations. This was the case for the Stage 1 FT-Cal+FT-Hodo and DC triggers, as well as the Stage 2 and Stage 3 components of the trigger. These modules relied on standard VHDL test benches to feed/generate test vectors for evaluating the correctness of the design modules. For example, the FT test bench generated clusters at each position of the calorimeter and hodoscope to test the channel mapping and geometry matching. Additional specific test cases verified the FT-Cal+FT-Hodo trigger clustering time coincidence, cluster multiplicity, and latency to ensure it operated as expected. C/C++ modules were written that emulated the FT-Cal+FT-Hodo and the DC triggers so the algorithms could work in the same offline framework as described above for the other Stage 1 components.

When the hardware components for the CLAS12 detector were constructed and mostly installed, and the first version of the firmware was ready for testing, all three Trigger System firmware stages were loaded and development continued for the entire Trigger System using cosmic data. At that point we started to perform Trigger System validation for some components, while development continued for others, as described in the following sections.

Alternative "Hit-Based" Trigger System
The CLAS12 detector inherited some components from the original CLAS detector (see Ref. [12]), in particular its Trigger System. That system was fed by TDC/discriminator boards and was able to produce "hit-based" information only (i.e., based only on the list of channels above threshold).

The efficiency and spatial uniformity of the cluster finding trigger in the EC/PCAL described in Section 3.3.1 require already calibrated calorimeters, with pre-determined PMT gains and light attenuation constants loaded into the VTP/FPGA trigger firmware. Calibration runs using a special-purpose MIP trigger were used to obtain these constants. For that purpose a so-called "pixel trigger" was developed and loaded into the Stage 1 firmware along with the main trigger, so it was possible to calibrate the system using the pixel trigger and then switch to the main one for data taking. This pixel trigger used a simple multiplicity condition on the 1D cluster size for each U, V, W view to reject undesirable muon trajectories and select normally incident tracks. This reduced the trigger and data rate by 95% and ensured the same MIP energy was deposited for all possible triple intersections of single strips. The pixel trigger pipeline executes the following steps in parallel, with the user-configurable parameters written in capitals (a simplified sketch of these steps is given after the list):
1) If the FADC hit energy > EMIN, make a pulse of HITWIDTH*4 ns for that strip.
2) Look for coincidences of U, V, W pixel strip candidates from step 1.
3) Evaluate the multiplicity EVALDELAY*4 ns clock cycles after the leading edge of a candidate pixel from step 2.
4) Generate a pixel trigger if the multiplicity requirement is met and there is still a hit on U, V, and W.
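The sketch below models these steps in plain C++ for a single U/V/W strip triple, with one boolean per 4 ns time slice; it is an illustration only, with hypothetical names, and ignores the per-view cluster-size condition handled by the firmware.

    // Software-style sketch of the pixel trigger steps listed above.
    // EMIN, HITWIDTH and EVALDELAY are the user-configurable parameters.
    #include <vector>

    struct PixelTriggerConfig { double EMIN; int HITWIDTH; int EVALDELAY; };

    // Step 1: turn an FADC hit above EMIN into a pulse of HITWIDTH*4 ns.
    std::vector<bool> stripPulse(const std::vector<double>& energyVsTime,
                                 const PixelTriggerConfig& cfg) {
        std::vector<bool> pulse(energyVsTime.size(), false);
        for (size_t t = 0; t < energyVsTime.size(); ++t)
            if (energyVsTime[t] > cfg.EMIN)
                for (int w = 0; w < cfg.HITWIDTH && t + w < pulse.size(); ++w)
                    pulse[t + w] = true;
        return pulse;
    }

    // Steps 2-4: require U, V and W candidates in coincidence, and re-check the
    // coincidence EVALDELAY slices after its leading edge (streams assumed to
    // have equal length).
    bool pixelTrigger(const std::vector<bool>& u, const std::vector<bool>& v,
                      const std::vector<bool>& w, const PixelTriggerConfig& cfg) {
        for (size_t t = 0; t + cfg.EVALDELAY < u.size(); ++t) {
            bool coincidence = u[t] && v[t] && w[t];
            bool leadingEdge = coincidence && (t == 0 || !(u[t-1] && v[t-1] && w[t-1]));
            size_t te = t + cfg.EVALDELAY;
            if (leadingEdge && u[te] && v[te] && w[te]) return true;
        }
        return false;
    }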

Additional configurable trigger elements were introduced, including a total energy sum threshold ESUM and a look-up table for triplets of strips that satisfy the geometrical constraint dU + dV + dW = DALITZ, where d is the normalized distance to the hit strip indicated by the arrows in Fig. 20 and DALITZ = 2 for perfect pixel events. The latter test was sometimes necessary to prevent noisy PMTs from saturating the multiplicity (N=3) trigger condition. Offline analysis showed that about 90% of pixel triggers satisfied the Dalitz test (see Fig. 21), while adjacent calorimeter elements that did not use the trigger had a much smaller pixel fraction. This suggests the pixel trigger helps to suppress events that undergo multiple scattering, which would trigger adjacent strips and violate the multiplicity requirement.
Figure 21: Offline analysis of events that satisfied the pixel trigger in the EC inner calorimeter. The left plot shows that 89% of the EC inner triggers satisfied the pixel test dU + dV + dW = 2. The right plot shows that only 14% of the EC inner triggers found an EC outer event that satisfied both the N = 3 and pixel tests.

DC-Based Trigger System with Cosmic Data
Early tests of the Drift Chamber tracking trigger were done using cosmic events triggered from the ECAL. A small fraction of events had tracks near the target location where the road dictionary was defined, but within a day enough statistics could be collected to check that the tracking trigger was functioning. Offline reconstruction of events with reconstructed tracks was checked to see if the tracking trigger fired, as shown in Fig. 22. Any events with tracks that passed through the target but failed to be tagged by the tracking trigger were run through simulations to identify the reason. The tests clearly showed a very loose acceptance and motivated tighter kinematic constraints on the dictionary generation, which were eventually applied when studies with beam were later performed.
Figure 22: DC negatively charged cosmic tracks rejected (left) and accepted (right) by the tracking trigger in plots of reconstructed momentum vs. reconstructed z-vertex position. Any track rejected above 1 GeV with a z-vertex within ±20 cm would indicate a trigger failure, but the tests showed 100% efficiency.

Stage 2 and Stage 3 Validation with Cosmic Data
While the Stage 1 trigger components were validated separately from each other during the development stage, the Stage 2 and Stage 3 components required the entire system to be assembled to perform validation. Initially those two stages were programmed with simplified algorithms to test signal propagation and basic Trigger System functionality, and only the timing coincidence between different detectors was implemented. The development of Stage 2 and Stage 3 continued during cosmic run operations and later with beam operations, adding geometrical matches between different detectors and increasing the coincidence logic complexity.

Trigger System Flexibility and "Permanent Development" Mode
The initial plan was to develop the Trigger System firmware to satisfy all CLAS12 experiments for the entire duration of CLAS12 operation, meaning high (close to 100%) trigger efficiency and reasonable purity. As the power and flexibility of the Trigger System were revealed to the community, additional requirements were put forward to improve the system purity, until the point was reached where relatively small improvements in the trigger purity could only be achieved with significant effort. At that point the development was declared complete. The nature of the FPGA-based Trigger System allows almost endless improvements, but such a "permanent development" mode is not practical.

Electron Trigger
The electron trigger is designed to select inclusive electron scattering from the CLAS12 targets:

e(p, n, A) → e X.   (1)

The trigger selects events with at least one scattered electron detected by the forward detectors. The High Threshold Cherenkov Counter (HTCC) discriminates electrons from other charged particles. This detector must be calibrated in terms of the number of photoelectrons before the start of any experiment. The HTCC trigger logic searches for clusters and calculates the total number of photoelectrons detected by the HTCC. The cluster may include up to four PMT signals that collect the Cherenkov light from the adjacent mirrors as described in Section 3.3.2. The minimum number of photoelectrons in the cluster is one of the main electron trigger parameters. Usually this threshold is set to 1-2 photoelectrons depending on the experiment requirements.
The PCAL and EC calorimeters are designed to detect photons and electrons as described in Section 3.3.1. A high energy deposition in the calorimeters is a signature of electron detection, and is one of the electron trigger parameters. The PCAL and EC detectors must be calibrated before the start of any experiment in terms of energy deposition measured in MeV. The electron trigger uses cuts on the cluster energy in the PCAL (E_PCAL) and EC (E_EC) separately, and cuts on the total energy deposition in both detectors, E_Total = E_PCAL + E_EC. These cuts depend on the beam energy and the experiment requirements, and usually lie in the range of 150-300 MeV (corresponding to a minimum electron energy of 600-1200 MeV when accounting for the sampling fraction of the ECAL [2]) for the energy sum E_Total.
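For orientation, the quoted correspondence between the deposited-energy thresholds and the minimum electron energy follows from the ECAL sampling fraction; a minimal worked relation, assuming the roughly 25% sampling fraction quoted later for the electron trigger validation:

```latex
% Relation between the deposited-energy threshold and the minimum electron
% energy, assuming a sampling fraction f_s of roughly 0.25.
\[
  E_{\mathrm{Total}} \approx f_s\, E_e, \qquad
  E_e^{\min} \approx \frac{E_{\mathrm{Total}}^{\min}}{f_s}
  = \frac{150\ \mathrm{MeV}}{0.25} = 600\ \mathrm{MeV}
  \quad\text{or}\quad
  \frac{300\ \mathrm{MeV}}{0.25} = 1200\ \mathrm{MeV}.
\]
```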

Geometrical matching between the HTCC signal and the position of the shower in the PCAL calorimeter helps to suppress random coincidences between the two detectors. The trigger firmware uses an HTCC-PCAL look-up table to make a proper event selection.
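The geometrical HTCC-PCAL matching is implemented as a pre-computed look-up table. The following is a minimal sketch of such a table; the segmentation, bin counts, and names are assumptions, not the actual firmware table format.

```cpp
// Sketch of a geometrical look-up table match between an HTCC cluster and a
// PCAL cluster. Granularity and indexing are illustrative assumptions; in the
// firmware the table is pre-computed and loaded as a bit mask.
#include <array>
#include <bitset>

constexpr int kHtccSegments = 8;   // assumed HTCC position segments per sector
constexpr int kPcalBins     = 16;  // assumed PCAL position bins per sector

// lut[htccSegment] has bit p set if PCAL bin p is geometrically compatible.
using HtccPcalLUT = std::array<std::bitset<kPcalBins>, kHtccSegments>;

bool htccPcalMatch(const HtccPcalLUT& lut, int htccSegment, int pcalBin) {
    return lut[htccSegment].test(pcalBin);
}
```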

The track reconstruction in the DC system at the trigger level is very useful for the further suppression of accidental background, as described in Section 3.3.3. The trigger decision requires at least 3 layers in every superlayer and at least 5 superlayers in every road, which is a standard setting for all triggers where the DC-based component is used. The geometrical matching between track candidates and hits in the PCAL and EC detectors is used to strengthen the trigger performance in terms of event purity. The electron trigger configuration may be represented by the formula

Trigger_i = (N_phe > N_min^HTCC) × (E_PCAL > E_min^PCAL) × (E_PCAL + E_EC > E_min^Total) × DC_i,   (2)

where index i is the CLAS12 sector number and N_phe is the number of photoelectrons detected by the HTCC in a defined cluster; N_min^HTCC, E_min^PCAL, and E_min^Total are trigger parameters, and DC means that a track was reconstructed by the DC system. The spatial correlations between all detectors and the coordinates of the track are implemented as well. As an example, the event display with a 4.5 GeV electron selected by the trigger is shown in Fig. 23.
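To make the per-sector decision concrete, a minimal C++ sketch of Eq. 2 follows. The structure names, the illustrative threshold values, and the way matching flags are passed in are assumptions, not the actual VTP firmware interface.

```cpp
// Illustrative per-sector electron trigger decision (Eq. 2). Mirrors the
// conditions listed in the text; names and values are assumptions.
struct SectorTriggerInput {
    int    htccNphe;       // photoelectrons in the HTCC cluster
    double pcalClusterE;   // MeV, PCAL cluster energy
    double ecClusterE;     // MeV, EC (inner + outer) cluster energy
    bool   dcTrackFound;   // road with >=3 layers/superlayer and >=5 superlayers
    bool   htccPcalMatch;  // HTCC-PCAL geometrical match from the look-up table
    bool   dcCaloMatch;    // DC road matched to the PCAL/EC cluster position
};

struct ElectronTriggerCuts {
    int    nPheMin   = 2;    // N_min^HTCC
    double ePcalMin  = 60.;  // E_min^PCAL, MeV (illustrative value)
    double eTotalMin = 300.; // E_min^Total, MeV
};

// True if the electron trigger bit should be set for this sector.
bool electronTriggerBit(const SectorTriggerInput& s, const ElectronTriggerCuts& c) {
    const double eTotal = s.pcalClusterE + s.ecClusterE;
    return s.htccNphe     >= c.nPheMin
        && s.pcalClusterE >= c.ePcalMin
        && eTotal         >= c.eTotalMin
        && s.dcTrackFound
        && s.htccPcalMatch
        && s.dcCaloMatch;
}
```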

Photoproduction Trigger
The photoproduction trigger is designed to select events where a scattered electron is detected by the Forward Tagger in the polar angular range from 2° to 5°. Strictly speaking it is not a photoproduction process but electron scattering with low four-momentum transfer Q² = 4 E_beam E' sin²(θ/2). The trigger logic continuously searches for clusters in the FT calorimeter (FT-Cal) from an electromagnetic shower, and calculates the shower energy and space coordinates. The cluster energy is the sum of all crystal energies within a 3×3 spatial array that meet the time-matching constraints. Once the clustering algorithm has identified a cluster, the corresponding data are reported to the next trigger stage. This includes the time stamp, the energy, and the spatial coordinates (center of the seed crystal). The cluster energy is not corrected for shower leakage effects at this stage. Finally, the trigger processor makes the trigger decision by applying further cuts to the clusters.
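The following C++ sketch illustrates the 3×3, time-matched cluster-energy sum described above; the hit structure, the time-window value, and the seed selection are illustrative assumptions rather than the actual FT-Cal firmware code.

```cpp
// Sketch of the FT-Cal cluster-energy evaluation: sum the 3x3 array of
// crystals around the seed, keeping only hits inside a time-matching window.
// Units, window value, and data layout are assumptions.
#include <cmath>
#include <cstdlib>
#include <vector>

struct FTCalHit {
    int    ix, iy;   // crystal indices
    double energy;   // MeV
    double time;     // ns
};

struct FTCluster {
    int    seedIx = 0, seedIy = 0;  // reported coordinates (seed crystal center)
    double energy = 0.0;            // 3x3 energy sum, not corrected for leakage
    double time   = 0.0;            // seed time, used as the cluster time stamp
};

// Build a cluster around the highest-energy hit (the seed).
FTCluster buildCluster(const std::vector<FTCalHit>& hits, double timeWindowNs = 8.0) {
    if (hits.empty()) return {};
    const FTCalHit* seed = &hits.front();
    for (const auto& h : hits)
        if (h.energy > seed->energy) seed = &h;

    FTCluster c{seed->ix, seed->iy, 0.0, seed->time};
    for (const auto& h : hits) {
        const bool inArray = std::abs(h.ix - seed->ix) <= 1 &&
                             std::abs(h.iy - seed->iy) <= 1;       // 3x3 window
        const bool inTime  = std::abs(h.time - seed->time) <= timeWindowNs;
        if (inArray && inTime) c.energy += h.energy;
    }
    return c;
}
```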
The trigger selection is based on lower and upper energy limits and the number of hits in the cluster. The trigger may also select events with a specified number of clusters detected by the calorimeter. The coincidence with the two-plane scintillating hodoscope FT-Hodo, located in front of the calorimeter, serves to discriminate charged particles from high-energy photons. The geometrical matching between the FT cluster and the FT-Hodo hit helps to suppress background coming from photons. The trigger logic also provides the possibility to select reactions with an electron and several photons in the final state, for example ep → e γγX.
The display of two events with one and two clusters in the FT-Cal, selected by the FT trigger, is shown in Fig. 24. The Trigger System may use the information from the CLAS12 Forward and Central Detectors to select events with several charged or neutral particles in coincidence with the electron in the FT-Cal. The trigger detector composition depends on the reaction under study.
Charged particles in the forward detectors are selected by a coincidence between the FTOF, PCAL, and EC with tracks reconstructed by the DC system. Space correlations between all trigger detectors are required, including coordinates of tracks crossing the detector planes. Hit matching along the track is an important part of the background reduction at the trigger level. The cuts on the energy depositions in the trigger detectors are used to select charged and neutral particles.
The central detectors, the Central Time-of-Flight (CTOF) and the Central Neutron Detector (CND), were used for the selection of events with at least one particle detected in the Central Detector. The corresponding trigger configuration was used for the selection of events with an electron in the FT, at least one charged particle going in the forward direction, and at least one particle detected in the central detectors.
Here h_C^± stands for the charged particle in the Central Detector.
The CND detector could be added to the coincidence chain, with a space correlation between the CTOF and CND counters, in case the trigger rate is too high. As stated above, the minimum energy depositions in all detectors in the trigger are parameters that depend on the individual experiment requirements.
Two decay modes are useful for the selection of the J/ψ meson: J/ψ → e⁺e⁻ and J/ψ → µ⁺µ⁻. The conventional electron and photoproduction triggers select the J/ψ meson in case of its decay to an electron-positron pair. However, these trigger configurations do not work with muons in the final state. Therefore, another trigger was added to select one more decay mode for this experiment. The CLAS12 spectrometer has no dedicated muon system, but it turns out that the selection of particles with energy deposition in the PCAL-EC calorimeters close to the minimum-ionizing value is sufficient to suppress the background from pions when the invariant mass of the two particles (muons or pions) is near the J/ψ mass. The muons from the J/ψ decay appear in opposite CLAS12 sectors, which allowed for a trigger configuration requiring two such minimum-ionizing-like clusters in opposite sectors (with the energy thresholds specified in MeV). Note that there is no requirement to search for the scattered electron at all. This gives an order of magnitude advantage in the virtual photon flux in comparison with the case when the electron is detected in the FT calorimeter. The event display with two particles with opposite charges in the opposite sectors, selected by the muon trigger, is shown in Fig. 25.
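A minimal sketch of such an opposite-sector MIP coincidence is given below; the MIP energy-window values and the interface are assumptions for illustration only, since the actual thresholds are experiment-dependent.

```cpp
// Sketch of the opposite-sector muon (MIP) trigger condition: two clusters
// with PCAL+EC energy deposition close to the minimum-ionizing value found
// in sectors i and i+3. Window values (MeV) are illustrative assumptions.
#include <array>

struct MuonTriggerCuts {
    double mipMin = 30.;   // MeV, lower edge of the MIP-like energy window
    double mipMax = 120.;  // MeV, upper edge of the MIP-like energy window
};

// ecalEnergy[i] = PCAL+EC cluster energy in sector i (0..5), 0 if no cluster.
bool muonTriggerBit(const std::array<double, 6>& ecalEnergy, const MuonTriggerCuts& c) {
    auto isMip = [&](double e) { return e >= c.mipMin && e <= c.mipMax; };
    for (int i = 0; i < 3; ++i)                 // sectors i and i+3 are opposite
        if (isMip(ecalEnergy[i]) && isMip(ecalEnergy[i + 3]))
            return true;
    return false;
}
```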

When beam operations started, the Trigger System validation was completed as part of the entire CLAS12 detector commissioning. This section describes the trigger validation procedures.

Random Trigger Runs
The ultimate validation of the trigger is done using the so-called "Random Trigger" (RT) runs. RT runs are special runs where the event readout is initiated not by the trigger logic, but by an external random generator that can be tuned to the desired frequency. Most of the events in the RT runs do not contain any tracks; however, a small fraction of the events will have real particles that were reconstructed because the particles accidentally fell in the readout window that was initiated by the random generator. In the event readout, in addition to various data banks from the DAQ system, the trigger decisions are stored as well (see Section 3.6). These accidental "good" events are used to check whether the desired trigger bit in the Stage 3 32-bit trigger mask was set by the trigger logic. In case it is not set, information from the Stage 1 and Stage 2 trigger is available to analyze the possible reasons for the inefficiency. The technique of the trigger validation is as follows. The trigger logic is configured exactly as it will be set in an experiment, but it runs in "tagging mode", reporting trigger decisions into the data stream for every randomly generated event. After several hours of running we collect at least 100 million events.
After the data is processed and the events are reconstructed, we select a subset of events with the correct trigger time. This is done using FADC spectra for the detectors participating in the trigger logic. We need to select events with FADC pulse times similar to those in the data obtained using the regular trigger. Figure 27 (a) shows typical FADC pulse arrival timing for the regular beam (not random) trigger data. Reconstructing and analyzing the data obtained using a random trigger, we select events with a signal in the middle of the FADC window to make sure we do not have boundary effects when the signal region is selected. Based on the typical pulse shape, we ignore areas with hit times below 50 ns and above 150 ns (see Fig. 27 (b)).
Figure 27: HTCC FADC pulse arrival times: (a) physics triggers, (b) random trigger. Plot (a) was used to select "good" events from the Random Trigger runs. For such events, the FADC timing has to be at least 50 ns from both timing window edges to avoid boundary effects.
We typically find several thousand events that accidentally fall into the correct trigger window. These events can be used to obtain the trigger efficiency and purity, assuming that our offline reconstruction software works correctly. It should be mentioned that correct operation of the offline reconstruction is an important prerequisite for complete trigger validation.
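As an illustration of this procedure, the following sketch selects "good" accidental events inside the 50-150 ns window and counts how often a given Stage 3 trigger bit was set. The event structure and function names are assumptions, not the CLAS12 offline framework API.

```cpp
// Sketch of the offline selection used on Random Trigger data: keep only
// events whose FADC hit time lies well inside the readout window, then count
// how often the expected trigger bit was set. Names are illustrative.
#include <cstdint>
#include <vector>

struct RTEvent {
    double   fadcHitTimeNs;  // FADC pulse time of the candidate particle
    uint32_t triggerMask;    // Stage 3 32-bit trigger mask stored with the event
};

struct EfficiencyCounter {
    long selected = 0;  // "good" accidental events inside the time window
    long tagged   = 0;  // of those, events with the desired trigger bit set
    double value() const { return selected ? double(tagged) / selected : 0.0; }
};

EfficiencyCounter triggerEfficiency(const std::vector<RTEvent>& events,
                                    int bit, double tMin = 50., double tMax = 150.) {
    EfficiencyCounter eff;
    for (const auto& e : events) {
        if (e.fadcHitTimeNs < tMin || e.fadcHitTimeNs > tMax) continue; // boundary cut
        ++eff.selected;
        if (e.triggerMask & (1u << bit)) ++eff.tagged;                  // bit check
    }
    return eff;
}
```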

Validation of the Electron Trigger
As a reminder, the electron trigger logic uses responses from the PCAL, EC, HTCC, and DC (see Eq. 2), and as described in Section 10.1, Random Trigger data were used for trigger validation. The first step in the validation of the electron trigger is the selection of events with a "clean electron". The CLAS12 offline reconstruction software assigns a particle identification (PID) to each reconstructed particle [13] (for electrons PID=11); however, in these studies we imposed additional cuts. In particular, we:
• required a cut on the vertex z coordinate to make sure the track originates from the target, since the DC roads are optimized for tracks originating from the target;
• selected events where the electron hits the calorimeters in the fiducial region, to make sure the shower energy is fully reconstructed;
• applied trigger-condition cuts on the offline cluster energies in the PCAL and EC, and also on the number of photoelectrons in the HTCC.
After applying the above-mentioned cuts for each of these electrons, we checked whether the electron trigger bit was set for the corresponding sector. At the end, the trigger efficiency is defined as the number of "Bit Set" events over the number of all events with a "clean" electron. The CLAS12 experiments required a trigger efficiency close to 100% for electrons above 2 GeV. Since both the PCAL and EC are sampling calorimeters, 2 GeV electrons will deposit only part (on average about 25% in our case) of their total energy. Because of shower and light fluctuations, some 2 GeV electrons will have less than 25% of their energy reconstructed in the calorimeters. Based on this, we required the energy threshold in the trigger to be more than 300 MeV, which guarantees that more than 99% of 2 GeV electrons will deposit energy above the threshold.

Figure 28: (a) Momentum distribution of "good electrons". The brown distribution represents all "good electrons", the blue histogram represents all events where the electron trigger bit was not set, the black histogram represents events that do not have an EC×PCAL trigger bit, and the red histogram represents events that missed the HTCC trigger bit. (b) Distribution of the number of photoelectrons for events where the electron has more than 2 GeV energy and missed the HTCC trigger bit.

Figure 28a shows the momentum distributions of all "good" electrons (in brown), electrons when the electron trigger bit was not set (in blue), when the EC×PCAL bit was not set (in black), and events when the HTCC bit was not set (in red). Above 2 GeV most events have only the HTCC bit missing. Figure 28b shows the distribution of the number of photoelectrons for the events that have no HTCC trigger bit. About 90% of these events are at the threshold region (a 2 photoelectron threshold was employed). The Trigger System uses gain and pedestal values with different precision than the offline reconstruction. In particular, in the trigger the pedestal value is constant for all events in the run, but in the offline reconstruction the pedestal is calculated event by event using samples of the FADC readout data before the signal region. This creates such threshold-related effects. The final trigger efficiency is shown in Fig. 29, which shows that the trigger efficiency is above 99.5% for electrons with momentum above 2 GeV.

Validation of the Photoproduction Trigger
As described in Section 9.2, the CLAS12 photoproduction trigger requires a coincidence between one electron measured in the Forward Tagger (FT) detector and two hadrons measured in the CLAS12 detector in either the forward or central part.

The validation procedure aims to verify whether, for a given event with one final-state electron in the FT acceptance and two or more hadrons within the CLAS12 acceptance, the Trigger System recognizes it properly, resulting in event readout. In order to validate the system with beam during commissioning, the following strategy was adopted. First, the electron detection by the FT was validated using Random Trigger runs. After this, the detection of single hadrons in CLAS12 was studied in special runs where the only trigger source was the FT. Finally, the coincidence between the two systems was assessed. Full details are described below.

Validation of Electron Detection in the FT
A scattered electron in the FT is identified as an electromagnetic shower in the FT Calorimeter (FT-Cal) within a defined energy range, in time coincidence and geometrically matched to a hit in both layers of the FT Hodoscope (FT-Hodo). The map providing the matching between the cluster seed position in the FT-Cal and the tile position in the FT-Hodo was first derived from the nominal detector geometry, and then confirmed by Monte Carlo simulations.

The identification of the scattered electron in the FT was validated through a similar procedure as the one adopted for the CLAS12 electron trigger discussed in Section 10.2, based on Random Trigger runs. The recorded events were processed through the standard CLAS12 reconstruction software and filtered, keeping only those with a reconstructed electron in the FT system. Since event readout was triggered by a random pulser, events with the reconstructed electron signal close to the margins of the readout window were also rejected. For these events, the electromagnetic clusters found by the reconstruction software ("offline" clusters) were compared to those reported by the Trigger System and stored in the trigger data banks.
The efficiency of the FT-Cal clustering algorithm in the Trigger System was evaluated by comparing all "offline" clusters to those matched, in space and time, to the "online" clusters. The efficiency was computed as

ε = N_trigger / N_all,

where N_all and N_trigger are, respectively, the total number of "offline" clusters and the number of "offline" clusters matched to an "online" cluster. The result is shown in Fig. 30, reporting the FT trigger efficiency for electromagnetic clusters as a function of the corresponding corrected energy. The efficiency is higher than 99.5% over the full energy range of interest. This small inefficiency is mainly due to the fact that the clustering algorithm in the Trigger System works on a 3×3 matrix of crystals.

A charged hadron in the CLAS12 Forward Detector is identified as a geometrical coincidence between a hit in the FTOF counters and the U-strips of the Pre-shower Calorimeter [2] associated with a cluster with energy larger than a programmable threshold. The map providing the geometrical matching between the FTOF counter and the PCAL U-strip was first derived from the nominal detector geometry, and then confirmed by Monte Carlo simulations. To reduce the rate of random coincidences, the Trigger System also requires the presence of a segment in 5 out of 6 Drift Chamber superlayers in a given CLAS12 sector. The charged hadron identification algorithm was validated in special data-taking runs in which the Forward Tagger was the only enabled event readout source. In these runs, the Trigger System was configured to report in the output trigger bank the presence of a charged hadron in any CLAS12 sector, as defined in Section 3.6.
The recorded events were processed through the standard reconstruction software and filtered, keeping only those with a well reconstructed forward charged track measured in CLAS12. The track was required to be within the nominal acceptance of the CLAS12 PCAL, and a momentum threshold of 300 MeV was applied. The Trigger System efficiency was evaluated by comparing all reconstructed tracks to those recognized by the Trigger System. During commissioning, the efficiency was evaluated as a function of different observables, such as the energy deposited in the FTOF counters and in the PCAL, and the topology of the geometrical matching window. The trigger parameters were individually tuned to maximize the trigger efficiency. In the final configuration, energy thresholds of 2 MeV and 10 MeV were selected for the FTOF counters and the PCAL clusters, respectively. The result is reported in Fig. 32, showing the CLAS12 Forward Detector trigger efficiency for charged hadrons as a function of the track momentum. The efficiency is larger than 99% in the full momentum range, with the inefficiency dominated by threshold effects for the PCAL clusters.

A charged hadron in the CLAS12 Central Detector is identified as a hit in the CTOF with energy deposition above a programmable threshold; a coincidence with other detectors, such as the CND [8], was also implemented. As for the CLAS12 Forward Detector case, the CLAS12 Central Detector charged hadron identification algorithm was validated in special data-taking runs in which the FT was the only enabled event readout source.

The recorded events were processed through the standard reconstruction software and filtered, keeping only those with a well reconstructed charged track measured in the CLAS12 Central Detector. The track was required to be associated with a CTOF hit, with energy deposition larger than the corresponding threshold of 1 MeV. The result is reported in Fig. 33, showing the CLAS12 Central Detector trigger efficiency for charged hadrons as a function of the track momentum. At low momenta (∼1 GeV), where most of the tracks are concentrated, the trigger efficiency is ∼99%.

Full Trigger Efficiency
As described in Section 9.2, the two main CLAS12 trigger configurations for photoproduction measurements are: (i) the coincidence of one charged cluster in the FT-Cal with two tracks in different CLAS12 Forward Detector sectors and (ii) the coincidence of one charged cluster in the FT-Cal, one track in any CLAS12 Forward Detector sector, and one track in the CLAS12 Central Detector. The corresponding trigger efficiency was evaluated using data from special data-taking runs in which the FT was the only enabled event readout source.

The recorded events were processed through the standard reconstruction software and filtered, keeping only those with two forward tracks or one forward track and one central track. The efficiency was evaluated as the ratio between the number of events with the trigger bit set and the total number of events.

The obtained efficiency is ∼97.5% for both configurations.

Drift Chamber-Based Trigger Components and Data-Based Dictionary
The road dictionary for the DCs used within the Trigger System was initially generated using a fast Monte Carlo approach, where positively and negatively charged particles in a selected momentum and angular range were randomly generated, tracked in the CLAS12 magnetic field using the CLAS12 "swimmer" developed for the offline reconstruction based on a 4th-order Runge-Kutta approach [13], and projected onto the DC wire planes to determine the hit positions and therefore the DC wire numbers. This method has intrinsic limitations because the approximation used to track the particle through the detector does not include energy loss, multiple scattering, or other effects due to the particle's interactions with the detector material.
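A minimal sketch of this fast Monte Carlo dictionary generation is shown below; the kinematic ranges, the road representation, and the stubbed swimmer are assumptions for illustration, while the real implementation uses the CLAS12 offline swimmer and the full detector geometry.

```cpp
// Sketch of fast-Monte-Carlo road dictionary generation: random charged
// particles are "swum" through the field, their crossings with the DC wire
// planes are converted to wire numbers, and the resulting per-superlayer
// pattern is stored as a road. All names and ranges are illustrative.
#include <array>
#include <random>
#include <set>

constexpr int kSuperlayers = 6;
using Road = std::array<int, kSuperlayers>;   // one representative wire per superlayer

// Stub standing in for the real swimmer + wire-plane projection: the actual
// implementation swims the track with a 4th-order Runge-Kutta integrator and
// records the closest wire at each superlayer crossing. Here we only return a
// deterministic dummy pattern so that the sketch is self-contained.
Road swimAndProject(double p, double theta, double phi, int charge) {
    Road r{};
    for (int sl = 0; sl < kSuperlayers; ++sl) {
        int w = static_cast<int>(theta * 2.0 + phi + charge * p + sl * 3);
        r[sl] = ((w % 112) + 112) % 112;      // 112 sense wires per DC layer
    }
    return r;
}

std::set<Road> generateDictionary(long nTracks, unsigned seed = 1) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> mom(0.3, 10.0);    // GeV, assumed range
    std::uniform_real_distribution<double> theta(5.0, 40.0);  // deg, forward detector
    std::uniform_real_distribution<double> phi(-30.0, 30.0);  // deg, one sector
    std::uniform_int_distribution<int> q(0, 1);               // charge sign

    std::set<Road> roads;                                     // duplicate roads collapse
    for (long i = 0; i < nTracks; ++i)
        roads.insert(swimAndProject(mom(gen), theta(gen), phi(gen), q(gen) ? +1 : -1));
    return roads;
}
```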
To overcome these limitations, roads were also generated from full Geant4 simulations of the CLAS12 detector based on the GEMC package as described in Section 8.1. This provides an accurate description of the relevant materials the particles travel through, resulting in a more accurate road dictionary at the expense of a significantly higher computing time to generate the same-size dictionary.
The effectiveness of these two approaches was tested by using real tracks from beam data to evaluate the completeness of the dictionaries, i.e. the fraction of tracks for which a matching road is found. This study indicated that very large statistics is needed in the dictionary-making to populate specific regions of the phase space.
As a third alternative approach, dictionaries were also produced from real tracks from beam data: in this case, dictionaries with very large statistics can be produced in limited computing time, with the advantage of the best accuracy in accounting both for particle interactions in matter and for the actual detector geometry. These were the dictionaries that were used in the final trigger implementation.

Final Validation Before Experiment Start-up
Even though the Trigger System components were validated when the CLAS12 detector was commissioned, we still have to execute our validation processes for the entire system at the beginning of every experiment. This is necessary because different experiments request configuration changes in the Trigger System that take advantage of its flexibility. Also, we apply firmware changes occasionally to improve the Trigger System components based on our previous experience, and then the changes have to be validated. The final Trigger System validation is performed by taking beam data with a random trigger (see Section 10.1).
The final trigger validation procedure was executed several times during the first year of CLAS12 experiments and has proven to be very useful: bugs in the trigger firmware were found and fixed, and the trigger configuration parameters were optimized. On one occasion a firmware bug was introduced into the PCAL Stage 1 trigger logic during a firmware update that was expected to be small and simple. The final validation procedure revealed an irregularity in the spatial distribution of clusters (see Fig. 34); the figure also shows that one CLAS12 sector is missing, but this was another problem, unrelated to the Trigger System. Since the PCAL Stage 1 trigger firmware is implemented in C++/HLS, the Geant4 data sample was reprocessed through the C++ firmware implementation (see Fig. 35), and the problem was confirmed and subsequently found and fixed. The firmware was recompiled and reloaded, and the final trigger validation was repeated, showing that the problem was fixed. It took only several hours between finding the problem and being ready to run again. Every experiment in CLAS12 begins with a complete Trigger System validation.

Figure 34: PCAL trigger bug in beam data. The red crosses inside the blue areas correspond to trigger inefficiencies. This was discovered during beam data processing.

Figure 35: PCAL trigger bug in Geant4 simulation. The blue lines correspond to trigger inefficiencies. This is much more visible in simulation than in beam data, and points to the exact problem.

Performance and Reliability
The CLAS12 Trigger System operates as a free-running, pipeline-style system run from a global 250 MHz clock. It provides 32 global trigger decisions based on >10 different subdetectors (>28k channels), which allows for multiple experiments to acquire data simultaneously. The trigger efficiencies have in general been measured to be about 99%, indicating a reliable and efficient trigger implementation. The trigger purity has been measured to be about 55% for electrons (negatives-inbending torus polarity configuration), taking advantage of energy-corrected clustering in the ECAL along with Drift Chamber track matching. It is possible to improve this purity by reducing the timing coincidence windows, the jitter, and the cell size of the Drift Chamber tracking dictionary. This can be checked by re-analyzing existing event data, since it contains the raw waveform data. Not all physics triggers utilize the tracking trigger due to an incomplete road dictionary (e.g., for neutral particles decaying into charged particles with a detached vertex). Further work to expand the dictionary roads could be undertaken to increase the selection purity.

Conclusions
The work on the CLAS12 Trigger System started in 2008. The system was designed and implemented from 2008 until 2017 and has been successfully used during the development, testing, and commissioning phases of all CLAS12 detectors. In December 2017, the CLAS12 Trigger System was ready for the first beam experiment. During the first year of operation of CLAS12, the Trigger System was improved to take advantage of its flexible and powerful design, to account for the performance of the various components, and to add new features to increase system efficiency and, most of all, purity. By the end of 2018, the system was in full operation mode, allowing accumulation of data with a fraction of "good" events above 50%. The achieved performance of the CLAS12 Trigger System allows it to be used without significant changes for the entire CLAS12 physics program.

Acknowledgments
We are grateful to the administrative, engineering, and technical staff of JLab for constant support. The CLAS12 Trigger System development was conducted in close cooperation with all CLAS12 detector groups, who contributed to the system design. In particular, we appreciate the hard work of the JLab Fast Electronics Group and the JLab CODA Group personnel. Our special thanks go to CLAS12 project leaders Volker Burkert and Latifa Elouadrhiri for their leadership and for setting goals. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Nuclear Physics under contract DE-AC05-06OR23177.