All Hardware Types Interview Questions

Digital Design/RTL

Topics

Overview

RTL Design involves designing and integrating digital logic modules for chips and IP blocks.

Successful interviews require both technical knowledge and strong behavioral preparation.

Behavioral Preparation

Critical for success. Be ready to discuss past projects with specific, measurable details.

"Improved circuit speed by 25%" is far more compelling than "worked hard on the project."

Prepare to explain:
  • Architecture decisions and tradeoffs
  • Performance improvements and optimization strategies
  • Debugging approaches and problem-solving
  • Design verification methods

Verilog/System Verilog

  • Verilog and SystemVerilog are essential languages to master. You don't need to be an expert, but you must understand the common constructs: blocking vs. non-blocking assignments, always and initial blocks, the three types of fork-join, and which statements are synthesizable. Frequently tested modules include sequence detectors and edge detectors; more challenging ones include FIFOs and round-robin arbiters. SystemVerilog questions focus on its OOP aspects; I wasn't asked about them much this time, but they are frequently tested.
  • Be able to discuss and quickly implement the following structures:
    • Simple counter
    • State machine
    • FIFOs (synchronous and asynchronous)
      • You should know the minimum depth of asynchronous FIFOs.
      • You also need a reasonable test plan for FIFOs.
      • Full/empty detection for circular FIFOs (pointer comparison vs. counter methods)
    • Arbiters (round-robin, priority, etc.)
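
As a rough illustration of the pointer-comparison method for full/empty detection, here is a minimal behavioral sketch in Python rather than RTL. It assumes a power-of-two depth and read/write pointers that carry one extra wrap bit; the class name `SyncFifo` and its method names are made up for this example.

```python
class SyncFifo:
    """Behavioral model of a power-of-two-depth circular FIFO."""

    def __init__(self, depth: int):
        assert depth > 0 and depth & (depth - 1) == 0, "depth must be a power of two"
        self.depth = depth
        self.mem = [None] * depth
        # Pointers carry one extra wrap bit, so they count modulo 2 * depth.
        self.wr_ptr = 0
        self.rd_ptr = 0

    def empty(self) -> bool:
        # Identical pointers (address bits and wrap bit) mean nothing is stored.
        return self.wr_ptr == self.rd_ptr

    def full(self) -> bool:
        # Same address bits but opposite wrap bit: the writer is one lap ahead.
        return (self.wr_ptr ^ self.rd_ptr) == self.depth

    def push(self, value) -> bool:
        if self.full():
            return False  # write is rejected (backpressure)
        self.mem[self.wr_ptr % self.depth] = value
        self.wr_ptr = (self.wr_ptr + 1) % (2 * self.depth)
        return True

    def pop(self):
        if self.empty():
            return None  # nothing to read
        value = self.mem[self.rd_ptr % self.depth]
        self.rd_ptr = (self.rd_ptr + 1) % (2 * self.depth)
        return value
```

The counter method would instead keep an occupancy count next to plain address pointers; the wrap-bit form is the one that carries over to asynchronous FIFOs, where the pointers are Gray-coded before crossing domains.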

Digital Design Fundamentals

  • Logic design basics: universal gates, MUX implementations
  • Counter design
  • Basic multipliers and adders
  • State Machines (major topic):
    • Draw state diagrams for scenarios
    • Mealy vs Moore models - critical to master both
    • Convert diagrams to Verilog - practice templates for quick implementation
    • Binary vs one-hot encoding tradeoffs

Computer Architecture - (See Computer Architecture section)

  • OS Fundamentals (Important)
  • Cache (Important)
  • Memory hierarchy
  • 5-stage pipeline basics
  • Out of order execution
  • Branch Prediction
  • Parallelism

Static Timing Analysis

  • Definitions: setup time, hold time, clock insertion delay, variation
  • Analyze Flop-Logic-Flop configurations:
    • Detect violations
    • Fix violations
  • Time borrowing, metastability, MTBF
  • Logic optimization:
    • Karnaugh maps, mux/decoder implementations
    • Glitches in combinational circuits - identify and resolve (hazard covers)

Programming & Scripting - (More details in DV section)

  • C/C++:
    • LeetCode (see the most common DV questions at the bottom of the page)
    • How to write a Makefile
    • Typical C++ knowledge-based questions: OOP concepts, classes
    • Given a snippet of code, be able to explain the logic.
  • Scripting: Python

Protocols & Interfaces

  • Understand the working principles of common SoC communication protocols such as AXI, PCIe, I2C, UART, and SPI; these are frequently asked about in interviews. Having a project you're particularly proud of is a great asset. When interviewers ask me to describe a project on my resume, I always choose that one because it's something my teammates and I worked on from the very beginning. We had a clear understanding of all the design and optimization details, and the final result was quite good, so I felt confident when discussing it.

Things you should cover:

  • Valid-Ready handshaking (ubiquitous)
  • AXI protocols if applicable
  • Backpressure mechanisms (valid-ready, valid-afull, valid-credit)

Clock Domain Crossing (CDC)

  • CDC techniques and synchronization methods
  • How to synchronize 1 bit or multiple bits, Gray code and potential problems
  • Know how to calculate the minimum clock cycle or propagation delay of a given circuit
  • Handling data crossing clock domains safely, some cases to consider:
    • Slower clock domain to faster clock domain
    • Faster clock domain to slower clock domain
    • Crossing between clock domains with lots of data
    • ... and know what happens if you don't handle it properly for these cases
  • Synchronizers (2-flop, multi-flop)
  • Handshaking protocols for CDC
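
Since Gray code comes up whenever a multi-bit pointer crosses clock domains, a small Python sketch of the two conversions may help. The function names are chosen for this example only; the point is that adjacent Gray values differ in exactly one bit, so a synchronizer that samples mid-transition sees either the old or the new value and never an unrelated one.

```python
def bin_to_gray(value: int) -> int:
    """Convert a non-negative binary number to its Gray-code equivalent."""
    # Each Gray bit is the XOR of a binary bit with the bit above it.
    return value ^ (value >> 1)


def gray_to_bin(gray: int) -> int:
    """Convert a Gray-code value back to plain binary."""
    value = 0
    # Binary bit i is the XOR of all Gray bits from i upward.
    while gray:
        value ^= gray
        gray >>= 1
    return value
```

In RTL the binary-to-Gray direction is a single XOR per bit, while Gray-to-binary is an XOR chain from the MSB down, which is why it is usually done after the synchronizer rather than before.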

FPGA-Specific Roles

  • FPGA vs ASIC differences
  • Optimized RTL for FPGA architectures (register abundance, DSP blocks)

RTL Design Fundamentals

1.
What is the difference between wire and reg?
2.
What's the difference between a Mealy and Moore machine? How will you convert your Mealy machine to Moore? What's an advantage of a Mealy machine?
3.
Difference between blocking and non-blocking assignments? What's the difference between always_ff and always_comb?
4.
What is the difference between bit and logic?
5.
What are the tradeoffs between the FSM v/s shift register approach?
6.
What is pipelining? Explain the 5 stages of pipelining.
7.
If we have a reset tree that's too big and we can't meet reset deassertion timing, what can we do in this case?
8.
NVIDIA phone screen - Design a XOR gate with 2:1 MUX.
9.
Why are latches discouraged?
10.
How do you do synchronous deassertion of reset?
11.
What are the considerations before designing the microarchitecture?
12.
FPGA Engineer - Optiver Take Home Test Question:

Select the boolean equation that matches the following truth table:

A B C | O
0 0 0 | 1
0 0 1 | 0
0 1 0 | 1
0 1 1 | 0
1 0 0 | 0
1 0 1 | 1
1 1 0 | 0
1 1 1 | 1
Pick ONE option:
  • a) (A xor B) and C
  • b) A or (B and C)
  • c) A xnor C
  • d) B
  • e) B and C
  • f) A xor (B xor C)
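
One way to sanity-check a multiple-choice answer like this is to brute-force every option against the truth table. The sketch below does that in Python; the dictionary layout and the `matching_options` helper are illustrative choices, not part of the original test.

```python
# Truth table from the question: (A, B, C) -> O
TRUTH_TABLE = {
    (0, 0, 0): 1, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 0,
    (1, 0, 0): 0, (1, 0, 1): 1, (1, 1, 0): 0, (1, 1, 1): 1,
}

# Candidate equations a) through f), written with bitwise operators on 0/1 values.
OPTIONS = {
    "a": lambda a, b, c: (a ^ b) & c,
    "b": lambda a, b, c: a | (b & c),
    "c": lambda a, b, c: 1 - (a ^ c),  # xnor
    "d": lambda a, b, c: b,
    "e": lambda a, b, c: b & c,
    "f": lambda a, b, c: a ^ (b ^ c),
}


def matching_options():
    """Return the option letters whose output matches every row of the table."""
    return [
        name
        for name, fn in OPTIONS.items()
        if all(fn(*row) == out for row, out in TRUTH_TABLE.items())
    ]
```

By hand, the quicker route is to notice that O never changes when B toggles, so B is a don't-care and the answer can only depend on A and C.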
13.
FPGA Engineer - Optiver Take Home Test Question:

If we used lookup tables (LUTs) with 4 inputs and 1 output to implement the LogicModule module below, how many lookup tables would be used?

module LogicModule (
    input  logic       Clk,
    input  logic       Rst,
    input  logic [7:0] DataIn,
    output logic [7:0] DataOut
);
    always @(posedge Clk) begin
        DataOut[7] <= DataIn[0] | DataIn[1];
        DataOut[6] <= DataIn[1] | DataIn[2];
        DataOut[5] <= DataIn[2] | DataIn[3];
        DataOut[4] <= DataIn[3] | DataIn[4];
        DataOut[3] <= DataIn[4] | DataIn[5];
        DataOut[2] <= DataIn[5] | DataIn[6];
        DataOut[1] <= DataIn[6] | DataIn[7];
        DataOut[0] <= DataIn[7] | DataIn[0];
    end
endmodule
14.
Verilog Timing Analysis Question:

What is the value of A & B at various times of simulation - 0 time_unit, 1 time_unit, 2 time_unit, 3 time_unit?

initial begin
A = 0;
B = 1;
end
// Section 1.1:
always @(posedge clk) begin
A <= 2;
end
always @(posedge clk) begin
B <= A;
end
// Section 1.2:
always @(posedge clk) begin
A = 2;
end
always @(posedge clk) begin
B <= A;
end
// Section 1.3:
always @(posedge clk) begin
A <= 2;
end
always @(posedge clk) begin
B = A;
end
// Section 1.4:
always @(posedge clk) begin
A = 2;
end
always @(posedge clk) begin
B = A;
end
15.
Given the initial values a=1, b=0, c=0 just before a rising clock edge, compute the final values of a, b, c after one clock for each snippet.

Variant A – Non-blocking (<=)
always_ff @(posedge clk) begin a <= b; b <= c; c <= a; end
Variant B – Blocking (=)
always @(posedge clk) begin a = b; b = c; c = a; end
What are the final values of a, b, c for Variant A and Variant B? (Show your steps by drawing a waveform or a small timing table.)
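
To check a hand-drawn timing table for this question, the two update rules can be modeled in a few lines of Python. This is only a sketch of the scheduling semantics (the function names are invented here): non-blocking assignments sample every right-hand side before any register updates, while blocking assignments take effect statement by statement.

```python
def nonblocking_edge(a, b, c):
    """Variant A: a <= b; b <= c; c <= a;"""
    # All right-hand sides are read from the pre-edge values,
    # then all three registers update together.
    return b, c, a


def blocking_edge(a, b, c):
    """Variant B: a = b; b = c; c = a;"""
    a = b  # a changes immediately
    b = c
    c = a  # reads the already-updated a
    return a, b, c
```

With the given starting values, `nonblocking_edge(1, 0, 0)` rotates the values, while `blocking_edge(1, 0, 0)` loses the original `a` because it is overwritten before `c` reads it.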
16.
Once you write RTL, how do you make sure that it synthesizes correctly? How do you make sure that there are no unintended latches or combinational loops?
17.
What is wrong with this code?
always @(posedge clk) begin : pipeline
    Q1 = in;
    Q2 = Q1;
    Q3 = Q2;
  end
Solution: The code uses blocking assignments inside a clocked block, so the current input propagates through Q1→Q2→Q3 in the same cycle. That doesn't model pipeline flops (each stage should capture the previous cycle's value). Use non-blocking assignments (<=), ideally inside always_ff, to model sequential behavior.
18.
How do you write a fixed-priority arbiter?
19.
Meta phone screen - How do you write a round robin arbiter? Talk it out loud.

RTL Coding Challenges

1.
Design a circuit that can detect 10X011 (X can be either 0 or 1)
2.
Let's say we have a 3 bit signal coming in, we want the output to be high when the input is 1, 2 and 4. How will you design that? (very simple question to start off)
3.
Basic Verilog question - Write a Sequence Detector for 101
4.
Basic Verilog question - Write a Sequence Detector for 101011
5.
Swap 2 variables without using a temporary variable.
6.
Meta phone screen - You have valid / ready interface. How will you add a pipe stage.
7.
AMD Xilinx - Write Verilog code for a 2-bit up/down counter. Inputs are clk, async rstb, up, down and output is count. Don't jump into writing RTL first, draw truth table first for ALL possible input combinations and start coding after that. Drawing circuit diagram after helps as well.
8.
AMD Xilinx - An input bit pattern is coming in. Determine at any point if the number is divisible by 5 or not.
9.
AMD Xilinx - Do a non-state machine approach for the previous problem (divisible by 5).
10.
Apple RTL Role Interview Question: Find if a stream of bits is divisible by 5 for infinite length, 1 bit per cycle
Solution: https://electronics.stackexchange.com/questions/345189/vhdl-interview-question-detecting-if-a-number-can-be-divided-by-5-without-rema
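
The usual FSM answer tracks only the running remainder, which needs five states. Here is a minimal Python model of that idea, assuming the bits arrive MSB first (the question does not say; LSB-first needs a different update rule). The generator name is made up for this sketch.

```python
def divisible_by_5_stream(bits):
    """Yield, after each incoming bit, whether the number seen so far is divisible by 5.

    Assumes bits arrive MSB first, so each new bit doubles the value and adds the bit.
    """
    remainder = 0  # FSM state: value so far modulo 5 (five possible states)
    for bit in bits:
        # Shifting in a bit computes value = 2 * value + bit, so only the remainder is needed.
        remainder = (2 * remainder + bit) % 5
        yield remainder == 0
```

For the input bits 1, 0, 1, 0 the running values are 1, 2, 5, 10, so the output is low for the first two cycles and high for the last two. In RTL the same table becomes a 3-bit state register with a 10-entry next-state function.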
11.
Design a block with inputs d and clk and output match. Detect a pattern of 1101. How would you make this block configurable to detect all 4-bit patterns?
12.
Write RTL for a 512:1 multiplexer.
13.
NVIDIA phone screen - You have a 7 bit vector coming in a[6:0]. How would you find out the number of 1s in this vector?
14.
Write RTL - You've 1 bit per cycle data coming in. Take that input and make it 3 bit. Detect whether the content of the 3 bit register can be divided by 3 or not. After reset, the register value will be 0.
15.
Write RTL for a 128-bit memory with 32-bit data interface. What happens if both wr_en and rd_en are 1 in the same cycle?
16.
Create a data buffer - type and number of samples are configurable. Depth of buffer can change (could be say 5, 10 or 100). Samples gathered at the same time periodically (every 10 clks). Read can happen at any time. We want the latest X samples to be available. We can stall writes, can't stall reads.
17.
Amazon phone screen - You have a 10-bit tag coming in. Each 10-bit tag has to be assigned to a unique 4-bit ID and sent downstream. When the 4-bit ID comes back as response, the 10-bit tag has to be returned to the master. There is a valid and ready on the incoming tag side and a valid and ready on the downstream side to send the tag out. Design this block (Had to write RTL to design this)
18.
Popular question - Design a traffic light controller (Hint - use a FSM)
19.
Build a 32-bit adder with two 16-bit adders and other logic. The 16-bit adder doesn't have Cin.
20.
Write RTL for an SRAM (behavioral model) with parameterizable depth and width.
21.
Draw a circuit to detect even number of 1s.
22.
Determine Sum and Carry for Half Adder and Full Adder. (Yep Even this. Actually took me a minute to derive Cout for a full-adder)
23.
RTL for determining number of 1s in an input bit-stream.
24.
Write a Verilog module that sorts the values in the given memory, so that the lowest value is at address 0x00 and the highest value is at 0xFF. The memory has one read port and one write port.
25.
MSFT Question

Implement a frame filter module that accepts input frames and forwards only good ones. The module should:

• Drop any frame with an error on any beat
• Drop frames with invalid start/end sequence
• Apply backpressure when FIFO/buffer is almost full
• Handle frames of variable length (1 beat to many beats)
• Ensure no partial frames are sent to the output

// Frame filter: accept input frames, forward only good ones
// Drop any frame with an error on any beat
// Drop frames with invalid start/end sequence
// Apply backpressure when FIFO/buffer is almost full
// FPGA-friendly implementation
module frame_filter #(
    parameter int DW = 512,
    parameter int PW = 6
)(
    input  logic          clk,
    input  logic          rst_n,
    // Incoming stream
    input  logic          in_vld,        // valid beat
    input  logic          in_sof,        // start-of-frame
    input  logic          in_eof,        // end-of-frame
    input  logic          in_err,        // error on this beat
    input  logic [PW-1:0] in_tail_pad,   // valid only when in_eof=1
    input  logic [DW-1:0] in_data,
    output logic          in_backpress,  // assert to stall sender
    // Outgoing stream
    output logic          out_vld,
    output logic          out_sof,
    output logic          out_eof,
    output logic [PW-1:0] out_tail_pad,
    output logic [DW-1:0] out_data,
    input  logic          out_stall
);
    // Tasks:
    // 1) Track when a frame starts and ends
    // 2) If any beat in a frame has in_err=1, mark frame as bad
    // 3) Do not send any beat of a bad frame to the output
    // 4) Handle frames of variable length (1 beat to many beats)
    // 5) Apply backpressure when buffer is almost full
    // 6) What happens if frame never sends in_eof? (flush? timeout?)
    // 7) Make sure you don't send partial frames to the output
endmodule
26.
Microsoft Interview Question (asked to code this up in 45 minutes) - One value arrives each clock. Using a stack-based approach, track the second-largest value observed so far. When out_valid is asserted, output (latest_value, second_largest_so_far). Example stream: 1, 4, 5, 2, 3 → on first out_valid: latest=3, second_largest=4; on next out_valid: latest=2, second_largest=4. Duplicates count as the second-largest value. State assumptions (bit width, signed/unsigned, duplicate handling, reset behavior, latency)
Solution: Maintain two stacks, one for the input order (S1) and one to track second largest (S2) in each cycle. Also one additional flop to track largest, let's call this L1
Every new input:
1. Push input into S1
2.
  If input >= L1
    Push L1 into S2
    Push input into L1
  else if input >= S2
    Push input into S2
  else
    Push head S2 into S2
27.
Please write RTL for the following register design.
Clk_50Mhz – 50 MHz input clock
Reset_n – Active-low reset
Data_in_1..4 – four 16-bit data inputs
Data_out – 16-bit data output
WE – write enable for selected input
Address – 2-bit selects which input to write
28.
Design a circuit that counts number of 1s in a[3:0] if the only available component is a full-adder
 a3 a2 a1 a0
 |  |  |  |
 ____________
|            |
|____________|
      |
    count
29.
Design a component to rotate a 4x4 byte array & write out the output. Data comes in 4 cycles.
Input: One row of bytes per cycle (4 inputs for the array):
Cycle 0 : B3, B2, B1, B0
Cycle 1 : B7, B6, B5, B4
Cycle 2 : B11, B10, B9, B8
Cycle 3 : B15, B14, B13, B12
Output: 1 rotated column of bytes per cycle:
Cycle 0 : B12, B8, B4, B0
Cycle 1 : B13, B9, B5, B1
Cycle 2 : B14, B10, B6, B2
Cycle 3 : B15, B11, B7, B3
30.
Create this Fibonacci generator system -
        ___
Clk  --|   |
Rst  --|   |-- fib[15:0]
Next --|___|
Next is a 1 bit pulse. When next is 1, generate next number in sequence. Hold previous value until next is 1.
33.
Design this system:
We get an async event.
If ≥1 event: Count as 1 event
If 0 event: Miss.

Requirements:
  1. Create system to raise error flag if we have 2 misses every 40 cycles
  2. Modify to raise error flag for 2 misses in any 40 cycle window
34.
How can a fixed 16-bit adder (black box) be used as two independent 8-bit adders? You may add external logic on inputs/outputs but cannot modify the adder internals. Normally it computes a[15:0]+b[15:0] → s[15:0]; now require a[7:0]+b[7:0] → s[7:0] and a[15:8]+b[15:8] → s[15:8] using that single adder. State assumptions (single-cycle vs multi-cycle/time-multiplexed, latency allowed, carry behavior).
Solution: At the input to the lower adder, force a[7] and b[7] to 0. Then at the output, S0[7] = S0_from_adder[7] ^ a[7] ^ b[7]. Upper sum remains as is.
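
The trick above is easy to check exhaustively in software. The following Python sketch models the black-box adder and the external fix-up logic; `adder16` and `dual_add8` are names invented for this example. Forcing bit 7 of both operands to 0 guarantees no carry leaves the lower byte, and the XOR on the output restores the true sum bit 7.

```python
MASK16 = 0xFFFF


def adder16(a: int, b: int) -> int:
    """The fixed black-box adder: 16-bit sum, carry-out discarded."""
    return (a + b) & MASK16


def dual_add8(a: int, b: int) -> int:
    """Use adder16 as two independent 8-bit adders packed in one 16-bit word."""
    # Clear bit 7 on both inputs so the lower byte can never carry into bit 8.
    a_in = a & 0xFF7F
    b_in = b & 0xFF7F
    s = adder16(a_in, b_in)
    # Bit 7 of s now holds only the carry into bit 7; XOR with a[7] ^ b[7]
    # to rebuild the real sum bit. The upper byte is already correct.
    return s ^ ((a ^ b) & 0x0080)
```

The cost is two AND gates on the inputs and a pair of XORs on one output bit, all outside the adder.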
35.
Design a parameterized encoder, which converts an N bit one hot signal to a binary value, specifying the location of the set bit. It should not synthesize with priority and you can assume a "don't care" output for invalid inputs.
36.
Design hardware to implement the IIR filter
H(z) = Y(z)/X(z) = (1 + 2z^{-1} + z^{-2}) / (1 − 1.5z^{-1} + 1.5z^{-2}).
Specify structure (Direct Form I/II or transposed), fixed-point widths/quantization, overflow/saturation behavior, latency, and reset strategy.
37.
What is the minimum number of multipliers required to implement Y = (AX^4) + (BX^3) + (CX^2) + (DX)?
Solution: 4, by factoring the polynomial into nested (Horner) form: Y = X[X{X(AX + B) + C} + D]
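
The nested form can be checked numerically with a short Python sketch. The function below is illustrative only; it evaluates the polynomial one multiply at a time and counts the multiplications so the total is visible.

```python
def horner_y(a, b, c, d, x):
    """Evaluate A*x^4 + B*x^3 + C*x^2 + D*x in nested form.

    Returns (value, number_of_multiplications).
    """
    mults = 0
    acc = a * x          # A*x
    mults += 1
    acc = (acc + b) * x  # (A*x + B)*x
    mults += 1
    acc = (acc + c) * x  # ((A*x + B)*x + C)*x
    mults += 1
    acc = (acc + d) * x  # (((A*x + B)*x + C)*x + D)*x
    mults += 1
    return acc, mults
```

For example, with A=1, B=2, C=3, D=4 and X=5 the direct expansion gives 625 + 250 + 75 + 20 = 970, and the nested form reaches the same value with four multiplications. In hardware the four multiplies are sequentially dependent, so this trades latency (or pipeline depth) for multiplier count.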

FIFO Design

1.
Sync FIFO + RTL code. Full/empty condition
2.
AMD Xilinx - Write Async FIFO RTL
3.
Gray to binary and Bin to gray conversion + RTL code
4.
How do you design non power of 2 depth Async FIFO?
5.
Qualcomm phone screen -
a) A system has a 100MHz write clock. The write logic performs 16 write operations within 80 clock cycles. The write pattern is flexible (can be burst or random). What is the minimum read frequency required to ensure that the write operations are never backpressured?
b) After finding the frequency, what is the minimum depth that this FIFO should have?
6.
Question from NVIDIA - FIFO Write: 250 MHz, FIFO Read: 200 MHz, 70 valid data burst every 100 cycles, calculate FIFO depth.
Solution: Time to write one data item = 4 ns (250 MHz). Time to write 70 items back to back = 280 ns. Time to read one item = 5 ns (200 MHz). Items read in 280 ns = 56. FIFO depth = 70 - 56 = 14. The key point is that he asked me to justify treating the 70 writes as one back-to-back burst when he had originally said 70 valid writes per 100 cycles.
7.
Calculate fifo depth and width: write side writes 18 bit/cycle, write frequency = 60MHz, read side, reads 20bit/cycle, read frequency = 36 MHz, burst size = 100 writes.
Solution: Width: 20 bits with 30 entries (+4 to sync)
Burst I assume is 100 cycles and not bits. So 1800 bits written in 100 cycles while read drains (36/60 x 20/18) x 1800 = 1200 bits in the same time. 600 bits to buffer, divides by 20 for 30 entries.
If you want to add sync delay to this, assuming read starts after a delay, can have 3-4 more entries.
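
The two depth calculations above follow the same pattern: count what the reader drains while one back-to-back burst is written, and buffer the rest. A small Python sketch of that arithmetic is shown below. The function name and parameters are invented for this example, and it deliberately ignores synchronizer latency and any burst that follows immediately after, both of which would add entries.

```python
def fifo_depth_entries(burst_len, f_write_mhz, f_read_mhz, write_bits=1, read_bits=1):
    """Entries (each read_bits wide) needed to absorb one back-to-back write burst.

    Frequencies are integers in MHz so the division stays exact.
    """
    # Reads completed during the time it takes to write the whole burst.
    reads_done = burst_len * f_read_mhz // f_write_mhz
    # Bits written minus bits drained in the same interval.
    bits_left = burst_len * write_bits - reads_done * read_bits
    # Round up to whole entries; never report a negative depth.
    return max(0, -(-bits_left // read_bits))
```

`fifo_depth_entries(70, 250, 200)` reproduces the 14-entry answer, and `fifo_depth_entries(100, 60, 36, write_bits=18, read_bits=20)` reproduces the 30-entry answer before any margin for synchronization is added.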
8.
Question from NVIDIA - Number generator. Synchronous reset—while reset is asserted, the first post-reset output should be 1. Generate the sequence 1, 4, 9, 16, 25, 36, 49, … without using multipliers. How many adders are needed? (Hint: it's not "square the number.")
Solution: Use sum of consecutive odd numbers to generate squares.
Architecture:
• Register the output (out_r).
• Keep an odd-number counter starting at 3 and increment by 2.
• On each step: out_r ← out_r + odd_count.
Adders required: 2 (one for the odd counter, one for out_r + odd_count).
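
The architecture above can be modeled cycle by cycle in Python to confirm the sequence. The register names follow the solution text (`out_r`, `odd_count`); the function wrapper itself is just for illustration.

```python
def squares(count):
    """Model the two-adder square generator for `count` clock cycles after reset."""
    out_r = 1       # reset value: first output is 1
    odd_count = 3   # next odd number to add
    outputs = []
    for _ in range(count):
        outputs.append(out_r)
        out_r = out_r + odd_count      # adder 1: accumulate the next odd number
        odd_count = odd_count + 2      # adder 2: step to the following odd number
    return outputs
```

This works because the difference between consecutive squares, (n+1)^2 - n^2 = 2n + 1, is always the next odd number.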
9.
Is there something you could change in your design so that you could use a single port SRAM but still have the capability of continuous data coming in and out?

RAM Design

1.
Design a RAM with a write port and a read port. The write port is 16 bits wide and the read port is 16 bits wide. The RAM should be able to store 16 words.
2.
What is the difference between Open page and Closed page policy?
3.
MSFT Hardware Engineer II. FPGA Virtualization/SDN team

How would you implement malloc() and free() in hardware (Verilog)?

module hw_malloc_free #(
    parameter DEPTH      = 16,  // number of memory blocks
    parameter ADDR_WIDTH = 4    // log2(DEPTH)
)(
    input  wire                  clk,
    input  wire                  rst,
    // Allocation request
    input  wire                  alloc_req,   // request to allocate a block
    output reg  [ADDR_WIDTH-1:0] alloc_addr,  // allocated address index
    // Free request
    input  wire                  free_req,    // request to free a block
    input  wire [ADDR_WIDTH-1:0] free_addr,   // address to free
    // Status
    output wire                  full,        // no free blocks
    output wire                  empty        // all blocks free
);
4.
FPGA Image Processing Interview Question

AXI-Stream 5x5 Line-Buffer Design

You are given an AXI-Stream video-style input:
s_axis_tdata — 8-bit pixel
s_axis_tvalid
s_axis_tready
s_axis_tlast — end of line
s_axis_tuser — start of frame

Resolution is fixed (e.g., 1920 pixels per line), 1 pixel per cycle.

Question: Design an RTL block that outputs a 5x5 pixel window every cycle using only line buffers (BRAM-based).

Output is also AXI-Stream:
m_axis_tdata — 25 pixels (5x5 window)
m_axis_tvalid
m_axis_tready
m_axis_tuser — aligned to center pixel
m_axis_tlast — aligned to center pixel

What you must explain in your answer:

1. How many line buffers are required and why?
2. How horizontal pixel delays are created for each line.
3. How the module knows when the 5x5 window is "valid."
4. How tuser and tlast must be delayed to align with the center of the 5x5 window.
5. What happens at borders (first 2 rows/columns).
6. How you keep the AXI-Stream protocol compliant (tvalid/tready).

Low-Power & Power Intent (UPF)

1.
Latch vs FF for clock gating. Which is preferred and why?
Solution: Negative level-sensitive latch is preferred.

Why:
1) Area & Power: Latches are smaller/lower power than flops; savings multiply across thousands of ICGs.
2) Timing slack: A negative latch is transparent when clk=0, giving the enable nearly a full cycle to meet timing. A negedge FF gives ~½ cycle, tighter and riskier.
3) Glitch-free gating: In a standard ICG, latch output is ANDed with clk. While clk=0, the AND output stays low so enable changes can’t glitch the gated clock; latch closes at clk↑ and propagates a stable enable.
Using a posedge FF risks races at the AND; a negedge FF offers no benefit over the latch but costs more area/power.
2.
How did you implement power-saving in your design?
3.
How do you handle signals going from an ON to OFF domain and vice versa? How do you manage isolation, does it matter?
4.
AMD Xilinx - If clock gating logic is moved into the DEN (Data Enable), will this cause LEC (Logical Equivalence Check) to fail or not? Why?
5.
Static and dynamic power, what are some ways to reduce both.

Advanced RTL Topics

1.
What are the different ways of arbitrating for resources? What is the difference between Find First and Round Robin arbiters? What are some pros and cons, and why might you choose Find First or Round Robin? What about the HW cost tradeoffs? Can you write RTL code for these?
2.
NVIDIA phone screen - Open ended question: You have to design a memory controller block. It is an SRAM memory that you are accessing. How would you design the block?

Things to think about:
a) What interfaces would you use? How many?
b) How would you arbitrate between requests?
c) How many ports does memory have?
d) What other blocks your memory controller should have?
3.
Apple phone screen - In a system how will you prevent RAW hazard?

Design Verification

Topics

Overview

Design Verification involves writing tests for digital logic modules to ensure comprehensive coverage of all use cases and corner conditions.

Behavioral Preparation

Same concepts as RTL Design roles (see RTL section)

Verilog/System Verilog

You won't need to write SV for complex structures such as round robin arbiters, but you will be asked how to test them (e.g. how would you test a FIFO?).

See RTL section for more details.

Digital Design Fundamentals

Same concepts as RTL Design roles (see RTL section)

Computer Architecture

Same concepts as RTL Design roles (see RTL section)

Static Timing Analysis

Same concepts as RTL Design roles (see RTL section)

Clock Domain Crossing (CDC)

Same concepts as RTL Design roles (see RTL section)

Programming & Scripting

Enhanced requirements compared to RTL roles (see RTL section):

Object-Oriented Programming
  • You should at least be proficient in one of C++ or Python, with a preference for C++. Then, do some LeetCode easy-level problems. If you really can't solve them, just look at the answers, but after looking at them, you should definitely write some code and run it yourself.
  • I recommend some recursive and string problems, such as Fibonacci sequence and palindrome detection, although my reason for recommending them is simply that I've seen others encounter these problems on the forum.
  • I only did about a dozen problems in total, so the key isn't how many problems you do, but rather understanding the usage of data structures and algorithms. We're not computer science, so we don't need to know so much.
  • There is a list of all the most common leetcode questions for DV at the bottom of the page.
  • C++ is good for SystemVerilog's OOP foundation
  • Sometimes skipped in interviews, but valuable knowledge regardless
  • Understand and code examples for OOP principles:
    • Classes, Objects, Abstraction, Encapsulation
    • Inheritance and Polymorphism (frequently tested - identifying object class relationships)
Scripting
  • Automation scripting is also very important; for example, printing specific information from a file. I didn't know how to do that, so I felt quite helpless when asked about it.
  • Python works well for entry-level interviews these days since CocoTB is becoming more and more popular.
  • Common pattern: parsing and processing large text files
    • Practice file I/O operations
    • Basic string parsing

UVM

This is only for non-internship roles, unless you have experience with UVM at work before.

UVM Cookbook should be enough preparation for the interview. Chipverify is also good. UVM mostly asks sequence/driver questions.

You need a good understanding of the verification architecture, including how generators, drivers, monitors, and scoreboards work. Ideally, you should be able to do some simple projects yourself. UVM is a big plus, but it's normal not to know it, and not knowing it won't count against you.

What to Expect

DV interviews are particularly challenging because you're expected to be a strong designer as well. Prepare all RTL Design content thoroughly.

  • Verilog design questions often appear (typically simpler than RTL-focused roles)
  • Strong digital design fundamentals separate good candidates from great ones
  • For experienced candidates:
    • With prior DV experience (including internships), expect deep dives into UVM and advanced verification concepts
    • RTL Design questions may be minimal or skipped entirely in these cases

Verification Methodologies & Strategies

1.
How would you test a memory black box?
2.
How do you verify a Round Robin arbiter?
3.
How would you test a FIFO?

UVM

1.
What is a UVM factory? What is the difference between type_id::create() and new()?
Solution:

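The UVM factory creates objects and components by registered type, so a test can substitute a derived class without editing the testbench code. Classes register with the factory through the uvm_object_utils / uvm_component_utils macros and are built with type_id::create("<inst_name>"), which honors any overrides. new() is the plain SystemVerilog constructor; calling it directly bypasses the factory, so overrides are ignored.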

2.
What are the two different types of override that UVM supports?
Solution:

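Type override (replaces every instance of a given type) and instance override (replaces only the instances at a given hierarchical path)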

3.
What is the method that is called to run a test?
Solution:

run_test()

4.
What is a virtual interface?
Solution:

Pointer to the actual interface. uvm_config_db is used for setting and getting the pointer to the actual interface.

5.
What is a virtual sequencer?
Solution:

A sequencer that contains pointers to all other sequencers.

6.
What are the different phases of a component? What is the check_phase used for? Which phases are task(s) and which are function(s)?
Solution:

Most commonly used phases: build_phase(), connect_phase(), run_phase(), check_phase(). check_phase() can be used to check the size of queues and flag an error if a queue that is expected to be empty at the end of simulation is not. run_phase() (along with its run-time sub-phases such as reset_phase() and main_phase()) is a task because it consumes simulation time, while the rest are functions.

7.
Draw a testbench diagram and explain each component. Explain the handshake between a sequencer and a driver.
Solution:

Testbench Components:

  • Test
  • Environment
    • Scoreboard
    • Predictor (optional)
    • Agent(s)
      • Monitor
      • Sequencer
      • Driver
  • Virtual sequencer (optional)
  • Coverage monitor (applicable if you do class based coverage collection)
  • Other models necessary in your testbench (e.g., memory model, etc.)

Sequencer-Driver Handshake: The sequencer generates transactions and sends them to the driver through a TLM (Transaction Level Modeling) port. The driver requests transactions from the sequencer using get_next_item() or try_next_item(), and when done processing, calls item_done() to signal completion back to the sequencer.

8.
Write a monitor to capture a transaction. Transaction starts when vld is asserted and ends when vld is de-asserted. Transaction needs to be sent to scoreboard once vld is de-asserted.
Solution:

Note: All but 2 of my screening interviews had this question. The interviewers mainly want to see whether or not you understand some key components of a monitor. Interviewers don't care about syntax.

// Passing clk and rst as interface ports is not necessary; you can declare them as signals in the intf itself
interface intf (input logic clk, input logic rst);
  logic       vld;
  logic [7:0] data;

  // A clocking block is not necessary, but if you understand it, it will look good to the interviewer.
  // If you use one, the interviewer will ask you to explain what it is and what it does.
  clocking mon_cb @(posedge clk);
    input vld;
    input data;
  endclocking
endinterface

class trxn extends uvm_object;
  `uvm_object_utils(trxn)
  logic [7:0] data_q[$];

  function new(string name = "trxn");
    super.new(name);
  endfunction
endclass

class mon extends uvm_monitor;
  `uvm_component_utils(mon)

  uvm_analysis_port #(trxn) m_port;
  virtual intf m_vif;  // vif = virtual intf

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  function void build_phase(uvm_phase phase);
    m_port = new("m_port", this);
    // Get the handle to the actual interface; the "vif" field name is a placeholder
    if (!uvm_config_db #(virtual intf)::get(this, "", "vif", m_vif))
      `uvm_fatal("NOVIF", "virtual interface not found in uvm_config_db")
  endfunction

  function void connect_phase(uvm_phase phase);
    // The analysis port is connected to its subscribers in the environment,
    // since the subscribers are not within the scope of this monitor
  endfunction

  task run_phase(uvm_phase phase);
    // Not necessary to fork, but good coding practice
    fork
      mon_intf();
    join_none
  endtask

  task mon_intf();
    trxn data_trxn;
    bit  prev_vld = 0;
    forever begin
      @(m_vif.mon_cb);
      if (!prev_vld && m_vif.mon_cb.vld)  // vld rose: start a new transaction
        data_trxn = trxn::type_id::create("data_trxn");
      if (m_vif.mon_cb.vld)
        data_trxn.data_q.push_back(m_vif.mon_cb.data);
      if (prev_vld && !m_vif.mon_cb.vld)  // vld fell: transaction is complete
        m_port.write(data_trxn);          // broadcast to all subscribers of this analysis port
      prev_vld = m_vif.mon_cb.vld;
    end
  endtask
endclass
9.
You have 5 parallel tasks running. Once any 4 tasks complete, the 5th remaining task should be terminated. How would you implement this behavior using SystemVerilog?
10.
Which phase are task, which are function?
11.
What phase is top-down and what is bottom up?
12.
What is the difference between uvm_config_db and uvm_resource_db?
13.
Why do we need uvm_factory? (for easy type overriding without modifying base code)
14.
What is a virtual sequence/sequencer, and why do we need one?
15.
How to pass a virtual interface through uvm_config_db?
16.
What is a UVM reactive sequence? My understanding is that the stimulus is based on the response from the driver, passed back with put_response()/get_response(). Also note that you need set_id_info() so the response is routed from the driver to the correct sequence.

SystemVerilog Testbench & Constraints

1.
Question from Meta DV role - Write a constraint for 4 monkeys having to share 10 bananas and make sure every monkey gets at least 1 banana
Solution:
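rand int unsigned fruits_per_monkey [NUM_MONKEYS];  // NUM_MONKEYS = 4

constraint fruits_per_monkey_c {
  fruits_per_monkey.sum() == 10;
  foreach (fruits_per_monkey[id]) {
    // At least 1 each; the upper bound keeps the 32-bit sum() from
    // wrapping around to 10 with huge element values
    fruits_per_monkey[id] inside {[1:10]};
  }
}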
2.
Question from NVIDIA panel - You are generating a 32-bit random variable in SystemVerilog. Write a constraint to ensure that the randomized value does not contain five or more consecutive bits that are all 1's or all 0's. Solve without using the 'unique' keyword or 'post_randomize'.

In other words, within the 32-bit value, there should be no sequence of 5 contiguous bits that are all set or all unset.
Solution:
rand bit [31:0] arr;

constraint c {
  foreach (arr[i]) {
    if (i <= 27) {
      !(&arr[i+:5]);
      !(&(~arr[i+:5]));
    }
  }
}

Explanation:

  • foreach (arr[i]) - Iterates through each bit position
  • if (i <= 27) - Only check positions where we can fit 5 consecutive bits (positions 0-27)
  • !(&arr[i+:5]) - Ensures no 5 consecutive 1's (AND of 5 bits is 0)
  • !(&(~arr[i+:5])) - Ensures no 5 consecutive 0's (AND of inverted 5 bits is 0)
3.
You are given a queue of integers:

int array[$] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};

Write SystemVerilog constraints to divide the elements of this queue into three new queues (q1, q2, and q3) such that:

  1. Every element from the original queue appears in exactly one of the three new queues
  2. All three queues together contain unique elements (no duplicates)
  3. Each queue must have at least one element
  4. You cannot use post_randomize() to perform the split; it must be handled entirely within the constraint block

How would you approach this problem?
Solution:
module test();
  class temp;
    int array[$] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
    rand int idx[15];
    rand int q1[$], q2[$], q3[$];

    constraint c1 {
      // Total elements must match
      q1.size() + q2.size() + q3.size() == array.size();
      // Size bounds
      q1.size() < 10; q2.size() < 10; q3.size() < 10;
      // Each queue must have at least one element
      q1.size() > 0; q2.size() > 0; q3.size() > 0;
      // Generate unique shuffled indices, each a valid index into array
      foreach (idx[i]) { idx[i] inside {[0:14]}; }
      unique {idx};
    }

    constraint assign_to_queue {
      // Assign elements using shuffled indices
      foreach (q1[i]) { q1[i] == array[idx[i]]; }
      foreach (q2[i]) { q2[i] == array[idx[i + q1.size()]]; }
      foreach (q3[i]) { q3[i] == array[idx[i + q1.size() + q2.size()]]; }
    }
  endclass

  initial begin
    temp t = new();
    assert(t.randomize());
    $display("Indices: %p", t.idx);
    $display("Queue 1: %p", t.q1);
    $display("Queue 2: %p", t.q2);
    $display("Queue 3: %p", t.q3);
    $finish;
  end
endmodule

Explanation:

  • idx[15] - Array of shuffled indices to randomize element assignment
  • unique {idx} - Ensures all indices are unique (no duplicates)
  • q1.size() + q2.size() + q3.size() == array.size() - Ensures all elements are distributed
  • q1.size() > 0; q2.size() > 0; q3.size() > 0 - Each queue has at least one element
  • The assign_to_queue constraint uses the shuffled indices to assign elements from the original array to the three queues
4.
Write SystemVerilog constraints for an integer array with 10 elements such that:

  1. Exactly three elements are identical (i.e., one value appears three times)
  2. All the remaining seven elements are unique

How would you implement this constraint?
Solution:
class aa;
  rand byte unsigned val[10], val3;
  constraint s1 {
    foreach (val[i]) {
      val[i] < 10;
      // Every value except val3 must appear only once
      val[i] != val3 -> val.sum() with (int'(item == val[i])) == 1;
    }
    // val3 must appear three times in the array
    val.sum() with (int'(item == val3)) == 3;
  }
endclass

Explanation:

  • val[10] - Array of 10 random unsigned bytes
  • val3 - The value that appears exactly 3 times
  • val[i] < 10 - Constrains each element to be less than 10
  • val[i] != val3 -> val.sum() with (int'(item==val[i])) == 1 - If an element is not equal to val3, it must appear exactly once
  • val.sum() with (int'(item==val3)) == 3 - val3 must appear exactly 3 times in the array
5.
Write a constraint to make sure every even index in an array is even and every odd index is odd.
Solution:
constraint x {
  foreach (arr[i]) {
    (i & 1'b1) == (arr[i] & 1'b1);
  }
}
6.
Given a 32-bit value bit [31:0] val, write a constraint such that each randomization differs from the previous randomization in exactly 2 bits.
7.
SystemVerilog Multithreading Question:

What are some use cases for wait fork and disable fork? How would one kill thread_2() after fork-join_any below finishes and then continue waiting for remaining threads to finish? You can make changes to the code below if needed.

fork
thread_1();
thread_2();
thread_3();
join_any
// how would one kill thread_2() after fork-join_any finishes?
Answer:
fork
begin : tid_1
thread_1();
end
begin : tid_2
thread_2();
end
begin : tid_3
thread_3();
end
join_any
disable tid_2;
wait fork;
8.
SystemVerilog OOP Virtual Keyword Question:

Upcasting vs downcasting. What does "virtual" keyword do in SystemVerilog?

class base;
function void print();
$display("INSIDE BASE \n");
endfunction : print
endclass : base
class derived extends base;
// what are the prints when virtual keyword is 1) missing and 2) present?
<virtual> function void print();
$display("INSIDE DERIVED \n");
endfunction : print
endclass : derived
initial begin
base b_h;
derived d_h;

b_h = new();
b_h.print();
//what's the print?

d_h = new();
d_h.print();
//what's the print?

b_h = d_h;
b_h.print();
//what's the print?
end
9.
SystemVerilog Thread Output Question:

What will be the output of the following code block? Update the code block to print the lines below (in any order).

Desired Output:
• thread id: 0
• thread id: 1
• thread id: 2

// Original Code:
task automatic run_thread(int thread_id);
$display("thread id: %0d", thread_id);
endtask

for (int i = 0; i < 3; i++) begin
fork
run_thread(i);
join_none
end
// Updated Code:
task automatic run_thread(int thread_id);
$display("thread id: %0d", thread_id);
endtask

for (int i = 0; i < 3; i++) begin
fork
begin
int j = i;
run_thread(j);
end
join_none
end

wait fork;
10.
SpaceX Constraint Question:

// Question:
class even_odd;
// array of integers for which we want a
// patterned series of N even values followed by
// N odd values
rand int array[];
rand int N;
// constraint to enforce the even-odd
// pattern
constraint pattern;
endclass: even_odd
/*
* Create a constraint which will ensure that
* the array is randomized according
* to our pattern
* EX (N = 2): {0, 2, 3, 5}
* EX (N = 5): {18, 8, 18, 24, 98, 9, 15, 33, 71, 1}
* EX (N = 1): {4, 1, 22, 9, 36}
*/
constraint even_odd::pattern{
}
11.
Write a constraint to generate a random number with exactly 5 bits set, where the set bits are consecutive 80% of the time.
12.
NVIDIA Constraint Question: Randomize three poker cards, each with a different type and color.
13.
NVIDIA Constraint Question: Create a constraint for an int array with 10 elements whose values range from 1 to 10. Exactly 2 of the elements have the same value and the rest are all different. The indices of the 2 matching elements must also be randomized.
14.
AMD DV Interview Question: Write a constraint for an integer array with 10 elements such that exactly 3 of them are the same and the rest are unique.
15.
YoE 4-5 Meta DV Interview question - Constraint: 2D image 320×240, each pixel is 16 bits. Constrain each pixel such that pixel is less than the sum of its 4 neighbours (top, bottom, left, right).
16.
More SystemVerilog Constraint Problems:

1. Write a constraint such that the sum of arr[10] is 100 without using the .sum() method
2. Write a constraint for an integer array with 10 elements such that exactly 3 of them are the same and the rest are unique
3. Write a constraint to randomize a 3x3x3 array with unique elements
4. Write constraints to generate an MxN matrix where each element is 0 or 1 and the sum of all elements is less than MAX_SUM
5. Write constraints to pick a ball out of 10 differently colored balls such that the picked color is not repeated in the next 3 draws
6. Write a constraint to divide the values of 1 queue into 3 queues so that all 3 queues have unique elements
7. Write a constraint to generate a dynamic array of 300 elements. Each element can have the value 0, 1, 2, 3, or 4, and each of these values should be present more than 40 times in the array. Element 0 can repeat consecutively, while 1/2/3/4 are not allowed to repeat consecutively. Example: 001342... is allowed (0 can be repeated); 0122431... is not allowed (2 is repeated)
8. Generate a solved sudoku puzzle
9. For an 8-bit variable, if the previous randomization resulted in an odd value, the next randomization should be even with 75% probability; otherwise it should be even with 25% probability. Write a constraint
10. Stream of random numbers are getting generated. how to write constraints to make sure most recent four numbers are unique.
11. Write a constraint to generate two dynamic arrays such that array1 size = [6:9], array2 size = array1 size. Array 1 should be assembled in ascending order while array2 should have all the values picked from array1
12. Write a constraint on a 4-bit variable such that the lower two bits have the same value only 5% of the time
13. Write a constraint for generating leap years
14. Write a constraint to make sure every even item in an array is even and every odd item is odd.
15. Write constraints for a matrix such that: a. The matrix is square, with a randomized odd dimension. b. Each square sub-matrix has exactly one max element; the remaining elements are less than the max and may repeat. c. The max elements of the square sub-matrices are unique, meaning no two sub-matrices have the same max element. d. The constraint should scale to any size, limited only by 32 bits
16. Write a constraint for a 10-bit variable en such that 10% of the time 1 bit in en is high, 10% of the time 2 bits in en are high, and so on, up to 10% of the time all 10 bits in en are high.
17. Write a constraint to generate a number whose binary representation has all of its 1s grouped together
18. Write a constraint to randomly map elements from an array into N non-empty queues
19. Write code so each element from input_array appears in output_queues, no output queue should be empty
bit [7:0] input_array[]; int unsigned input_count; rand bit [7:0] output_queues[$][]; constraint test_c { output_queues.size() == input_count; }
20. The dynamic array size should be 300 and its elements can be 0, 1, 2, 3, 4, or 5. Each value should occur at least 40 times in the array. Also, there should be no consecutive 0's.
21. Write a constraint for distinct adjacent elements (adjacent elements should not be the same) in a 2D array, specifically for the boundary elements. The size of the 2D array is mxm, where m can be any integer
22. Write a constraint for a 2d array such that it has a unique max value in each row and that max value should not be equal to any other max value in other rows
23. Generate a parameterized array whose values are equal to a Magic square using constraints
24. Write a constraint to generate a random value with 10 bits set to 1 such that no two set bits are adjacent. This should be written without using $countones.
25. Write an SV constraint to limit the sum of the odd elements of an array to 30 and the sum of the even elements to 60
26. Write a constraint for a number to be a power of 4.
27. Write a constraint that generates ADD, MUL, SUB, and NOP instructions such that no ADD instruction is repeated within 3 clock cycles and SUB is not repeated within the last 3 valid instructions. NOP is not a valid instruction
28. Write an SV constraint to generate two 3x3 matrices such that the min value in each matrix is unique
29. Write an SV constraint for a tic-tac-toe game. The matrix size is 3x3.
30. Write constraint for an integer array with 10 elements such that exactly 3 of them are same and rest are unique
31. Write a constraint to randomize a 100-bit variable such that exactly 5 bits are 1 and those 5 bits are always consecutive
32. Write a constraint to randomize an array such that one specific element always has a constant value, e.g., the element at index 5 is always 100.
33. Write a constraint to generate a random number with exactly 5 bits set, where the set bits are consecutive 80% of the time
34. Write a constraint for a square matrix, then rotate it 90 degrees counterclockwise
35. Write a constraint for 3x3x3 array to have unique elements
36. Say you have 4 instructions: ADD, SUB, MUL, NOP. They execute in X cycles, Y cycles, Z cycles, and 1 cycle respectively. How would you write a constraint to inject these into a design so that two like instructions don't overlap during the same period of time? Bonus: Is there a way you could generate this without using SV constraints?
37. Write SystemVerilog constraints for the eight queens problem.
38. Write SystemVerilog constraints for the Knight's tour problem.
39. Add "size" entries to a queue, where each entry is randomized between 0 and "size"
40. Write constraints to generate an n-bit value such that the number of bits set is equal to the number of bits that are zero

STA/Static Timing Analysis

1.
What are setup and hold times? (Fundamental concept.)
2.
What is metastability? How can it occur? How do you resolve it? How do you design metastability-free circuits?
3.
How do you solve timing violations if the logic cannot be split into two stages?
4.
How would you resolve hold time violations in the back end (BE)? What if there isn't enough setup margin?
5.
As an RTL design engineer, how would you design to avoid timing issues? How do you get timing feedback?
6.
We have a shift register with 4 back-to-back flip-flops. Each flop has a positive clock-to-q delay and a positive hold time requirement. Is it possible for this path to have hold violations?
7.
When you analyze a timing path, apart from setup and hold time, what other parameters need to be considered?
8.
For setup should you use the min or max delay? What about hold?
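As a starting point for the question above, the two checks can be written out for a simple flop-to-flop path. This is a sketch under assumptions: $t_{skew}$ is defined here as capture-clock arrival minus launch-clock arrival, and effects such as clock uncertainty and on-chip variation are left out.

```latex
\begin{aligned}
\text{Setup (uses max delays):}\quad & t_{cq}^{\max} + t_{comb}^{\max} + t_{setup} \le T_{clk} + t_{skew} \\
\text{Hold (uses min delays):}\quad  & t_{cq}^{\min} + t_{comb}^{\min} \ge t_{hold} + t_{skew}
\end{aligned}
```

Setup asks whether the slowest data can arrive before the next capture edge, so it uses the maximum data-path delay. Hold asks whether the fastest data can corrupt the current capture, so it uses the minimum. Note that the hold check does not depend on $T_{clk}$.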
9.
Apple phone screen - Given the following RTL code:
logic [63:0] c;
logic [63:0] b;
logic [63:0] a;
assign c = a + b;

  1. How does this synthesize?
  2. How will you optimize this design for timing?
  3. Change the RTL based on your new optimized design.
  4. Given the following unit area values for each of the gates in this design:
    • Full Adder: 8 unit area
    • Half Adder: 5 unit area
    • XOR gate: 3 unit area
    • AND gate: 1 unit area
    Now answer:
    • What will be the area for the old design?
    • What will be the area for the optimized design?
    • What will be the critical timing path of the old design?
    • What will be the critical timing path of the optimized design?
10.
Amazon phone screen - How would you constrain an asynchronous FIFO?
11.
Amazon phone screen - You have a timeslice (regslice). The input data is a million bits (million registers). What would be the critical path in this design? (Ans: the regslice address select logic would either pick entry[0] or entry[1] of the regslice, but either way it would be driving select to a HUGE fanout of million flops. This would be the critical path)
12.
AMD Xilinx - Explain set_input_delay, set_output_delay, set_max_delay, and set_false_path.
13.
The STA tool reports available skew budget on critical paths. How do you use this information during clock tree synthesis? What is useful skew and when would you intentionally introduce it?
14.
How do you guarantee coherence between constraints written in CDC versus the ones written in STA? Say a clock is constrained as async in CDC and sync in STA. How do you catch that? How do timing folks mark a path as asynchronous?

Clock Domain Crossing (CDC)

1.
How do you safely handle data crossing from a slower clock domain to a faster clock domain?
2.
How do you safely handle data crossing from a faster clock domain to a slower clock domain?
3.
How to handle data crossing clock domains safely with LOTS of data?
4.
What CDC synchronization technique would you use for a 1-bit level signal when FRx is greater than 1.5 times FTx? (Tx - Transmitter/Sender, Rx- Receiver)
5.
What CDC technique would you use for a 1-bit pulse from slow Tx to fast Rx when pulses are infrequent?
6.
What CDC sequence would you use for a 1-bit pulse from fast Tx to slow Rx when pulses are infrequent?
7.
AMD Xilinx - Design a reset synchronizer circuit.
8.
For an n-bit signal, how do the following CDC options compare:
  • a) Recirculation-mux with a synchronized ready
  • b) FSM-based synchronized req/ack (any frequency ratio, but limited throughput)
  • c) Asynchronous FIFO (higher throughput, extra hardware)
9.
What are CDC reconvergence and divergence issues?
10.
Why can't we just place a synchronizer after the AND gate?
11.
What method do you use to determine whether a CDC violation is real?

Bus Interconnect Design (AXI/APB/Custom)

1.
How would you do an async crossing for AXI?
2.
How would you merge 2 AXI interfaces (e.g., 2 masters that can communicate with the same slave)?
3.
How would you split an AXI interface (e.g., 1 master that can communicate with 2 slaves)?
4.
Combine the 2 above, i.e., 2 masters that can both communicate with 2 slaves independently.
5.
How would you add a register slice between a master and a slave?
6.
If a master can issue multiple outstanding transactions, what are the pros and cons of issuing all transactions with the same ID compared to issuing all transactions with different IDs?
7.
Where exactly is AXI used, and how is it different from other protocols such as APB?
8.
What is meant by register slicing?
9.
What is meant by a split transaction?
10.
Senior FPGA Engineer Question

Why might an AXI memory-mapped burst transaction hang if the target AXI-MM slave is reset during an active transaction, and what measures can be taken to prevent this situation?
11.
Senior FPGA Engineer Question

What does the following constraint indicate to the tool about the logic driven by clocks clk_a, clk_b, and clk_c?

set_clock_groups -asynchronous -group clk_a -group clk_b -group clk_c

Embedded Systems/Firmware

System Design & Architecture

1.
Design a thermostat (not just system design but also implement core functions)
2.
Design a system that monitors a sensor:
  • Block diagram of tasks and functionality
  • RTOS vs no RTOS decision
  • Data protection and passing mechanisms
  • Reliable periodic data transfer while another task sends data at a non-deterministic rate
3.
Design RTOS for safety critical system with sensor module, safety microcontroller, and motor controller
4.
Design a system that gets data from GPS at certain Hz and sends data to user on demand (store 15 mins of data, handle data loss/corruption)
5.
Design audio/video system with input, signal processing and output to display/speaker
6.
Design elevator control system
7.
Design traffic light controller with state machine and different tasks
8.
Design linked list - basic leetcode question
9.
Design stack (with an array) - basic leetcode question
10.
Apple Wireless firmware role interview - Why do we need digital communication and coding? What are the benefits?
11.
Apple Wireless firmware role interview - What is compression? Distinguish source vs. channel coding

Protocols & Interconnects

1.
Is there a pre-check needed for a bus lock?
2.
What mechanism does your block use to ensure an ack is received?
3.
Compare I2C vs. SPI vs. CAN: topology, bandwidth, addressing, reliability, use cases, and typical failure modes
4.
Does your IP use a shared or independent bus? How do you handle two requesters issuing at the same time?
5.
If a processor writes to memory, how does the communication happen? What kinds of read/write requests exist?
6.
How many cores does the fabric support? How do you make the IP configurable for different core counts?
7.
Open-ended: Connect one AHB primary to two AHB secondaries, each running on its own clock. How would you design the interconnect and CDC boundaries?

Embedded Coding Challenges

1.
Write a program to find the second largest element in an array
2.
In a sorted array, find the index of a target value. Provide O(log n) and discuss edge cases.
3.
Return the n-th node from the end of a singly linked list in one pass. State assumptions.
4.
Implement pow(x, y) for integers with fast exponentiation. Handle negative y and overflow.
5.
Write C code to detect if the system is little-endian or big-endian without stdlib helpers.
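One common approach is to store a known multi-byte value and inspect its first byte in memory. A minimal sketch, where the function name `is_little_endian` is hypothetical:

```c
#include <stdint.h>

/* Returns 1 on a little-endian host and 0 on a big-endian host. */
int is_little_endian(void)
{
    const uint16_t probe = 0x0001;
    /* Reading an object through an unsigned char pointer is well-defined.
     * On little-endian hosts the least significant byte is stored first. */
    return *(const unsigned char *)&probe == 0x01;
}
```

A `union` of a `uint32_t` and a byte array is another frequently shown variant; the pointer cast avoids any discussion of union type punning.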
6.
Write a snippet to read an input key value with debouncing
7.
Implement a circular buffer that supports chunk read/write
8.
Scale a 5-bit value to 16-bit value (use simple bitwise ops rather than division)
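A typical division-free answer is bit replication: repeat the 5-bit pattern until all 16 output bits are filled, so that 0 maps to 0x0000 and 31 maps to 0xFFFF. A sketch, with the hypothetical name `scale5to16`:

```c
#include <stdint.h>

/* Expands a 5-bit value (0..31) to 16 bits by replicating its bit pattern:
 * abcde -> abcde abcde abcde a. This approximates v * 65535 / 31. */
uint16_t scale5to16(uint8_t v)
{
    uint16_t x = v & 0x1Fu; /* keep only the low 5 bits */
    return (uint16_t)((x << 11) | (x << 6) | (x << 1) | (x >> 4));
}
```

Simply shifting left by 11 would map the maximum input to 0xF800 instead of 0xFFFF, which is why the lower bits are filled with copies of the upper bits.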
9.
Implement 64-bit × 64-bit multiply on a 32-bit architecture
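One way to approach this is schoolbook multiplication on 32-bit halves, using only 32 × 32 → 64-bit products, which is what a 32-bit core typically provides. The sketch below returns the full 128-bit result; the type `u128_t` and the function `mul64x64` are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    uint64_t hi; /* upper 64 bits of the product */
    uint64_t lo; /* lower 64 bits of the product */
} u128_t;

u128_t mul64x64(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* Four 32x32 -> 64-bit partial products */
    uint64_t p0 = (uint64_t)a_lo * b_lo; /* weight 2^0  */
    uint64_t p1 = (uint64_t)a_lo * b_hi; /* weight 2^32 */
    uint64_t p2 = (uint64_t)a_hi * b_lo; /* weight 2^32 */
    uint64_t p3 = (uint64_t)a_hi * b_hi; /* weight 2^64 */

    /* Sum of the 2^32 column. Three values below 2^32 cannot overflow 64 bits. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    u128_t r;
    r.lo = (mid << 32) | (uint32_t)p0;
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}
```

If only the low 64 bits are needed, `p3` and the carries into `hi` can be dropped, which saves one multiply.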
10.
Swap even and odd bits
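A common mask-and-shift answer for a 32-bit word is sketched below; the function name `swap_even_odd_bits` is hypothetical.

```c
#include <stdint.h>

/* Swaps each even-position bit with its odd-position neighbour
 * (bit 0 with bit 1, bit 2 with bit 3, and so on). */
uint32_t swap_even_odd_bits(uint32_t x)
{
    uint32_t odd_bits  = x & 0xAAAAAAAAu; /* bits 1, 3, 5, ... */
    uint32_t even_bits = x & 0x55555555u; /* bits 0, 2, 4, ... */
    return (odd_bits >> 1) | (even_bits << 1);
}
```

For example, 0x17 (binary 010111) becomes 0x2B (binary 101011).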
11.
Get stack frame size without using assembly
12.
Apple Embedded Role Interview Question - Design an embedded system to count people entering a zoo through a turnstile. The turnstile has a single output wire connected to a microcontroller that generates a "high pulse" (low → high → low) each time the turnstile completes a full rotation (one person entering).

Questions:
  • How would you design the system to count the number of people who have entered using this signal?
  • How would you keep track of the current state of the turnstile within the microcontroller?
  • Suppose a person stops halfway through the turnstile, keeping the signal high until they move again. How would you handle this condition to ensure the count remains accurate?
13.
Write a C function that reads a single input line and prints the tokens in reverse order in the exact format shown below.
  • Delimiters: any non-alphanumeric character (treat consecutive delimiters as one split)
  • Do not use strtok/strtok_r/strsep. Use pointer arithmetic or your own scanning
  • Time: O(n)

Function to implement (one of):

void print_reversed_tokens(const char *s);
// or
size_t tokenize_reverse(const char *s, const char *out[], size_t max_out); // returns count, caller prints

Output format (exact):

<TOKEN_COUNT> tokens: [<TOKEN_k>|<TOKEN_k-1>|...|<TOKEN_1>]

Examples:

Input: "Hello, Apple! team-2025 :)"
Output: 4 tokens: [2025|team|Apple|Hello]
14.
Order Structure (2021 Amazon Embedded SWE OA) - Write a function that takes an `OrderBatch` data structure and serializes that structure along with the orders it contains into the defined output format.

Function signature:

FuncStatus_t serialize_order(const struct OrderBatch *order_batch, const size_t out_max_length, uint8_t *out);

Input parameters:

  • const struct OrderBatch *order_batch: a pointer to an `OrderBatch` struct to be serialized.
  • const size_t out_max_length: the number of bytes allocated in `out`. Do not put serialized data outside of the pre-allocated buffer.

Output parameters:

  • uint8_t *out: the pre-allocated buffer to store the serialized `OrderBatch` data.

Return:

  • FuncStatus_t: the status of serialization effort. See enum descriptions below.

Provided non-standard type definitions:

Structures:

struct OrderBatch {
    uint32_t order_count;
    uint16_t batch_id;
    struct Order *orders;
};

struct Order {
    uint16_t quantity;
    uint64_t order_id;
    uint8_t part_number[16];
    char email_address[32]; /* NULL terminated string */
};

FuncStatus_t:

typedef enum {
    STATUS_SUCCESS,                    /* Conversion performed successfully, 'out' */
    STATUS_INSUFFICIENT_OUTPUT_BUFFER, /* Unable to serialize the input data stru */
    STATUS_NULL_INPUT                  /* OrderBatch struct is a null pointer, 'out' */
} FuncStatus_t;

Background:

The serialized output data will have the following format:

  • Note 1: Assume the host is little-endian.
  • Note 2: You may receive an `OrderBatch` where `order_count` is less than 1.
  • Note 3: Payload Len is the number of bytes in the serialized input including the first 10 bytes (i.e., `0xFACE` and Payload Len).

Serialized Layout:

+--------+-------------+-------------+----------+----------+ ... +----------+
| 0xFACE | Payload Len | Order count | Batch ID | Order #1 |     | Order #N |
|  2 B   |     8 B     |     4 B     |   8 B    |   58 B   |     |   58 B   |
+--------+-------------+-------------+----------+----------+ ... +----------+

Each Order (58 bytes total):

+----------+----------+-------------+-------------------------------+
| Quantity | Order ID | Part Number | Email Address                 |
|   2 B    |   8 B    |    16 B     | 32 B (NULL-terminated/padded) |
+----------+----------+-------------+-------------------------------+

Examples:

Example 1

Inputs:

OrderBatch(2, 42, ordersPtr)

ordersPtr[0]:
  quantity = 8
  order_id = 12
  part_number = {0,1,2,3,4,5,6,7, 0,1,2,3,4,5,6,7} // 16 bytes
  email_address = "xyz@abc.com"

ordersPtr[1]:
  quantity = 2
  order_id = 14
  part_number = {1,2,3,4,5,6,7,8, 9,0,1,2,3,4,5,6} // 16 bytes
  email_address = "abc@abc.com"

Output (hex dump):

CE FA 8A 00 00 00 00 00 00 00 02 00 00 00 2A 00 00 00 00 00 00 00 08 00 0C 00 00 00 00 00 00 00 00 01 02 03 04 05 06 07 00 01 02 03 04 05 06 07 78 79 7A 40 61 62 63 2E 63 6F 6D 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00 0E 00 00 00 00 00 00 00 01 02 03 04 05 06 07 08 09 00 01 02 03 04 05 06 61 62 63 40 61 62 63 2E 63 6F 6D 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Returns:

STATUS_SUCCESS
15.
Starting in a bit (2021 Amazon Embedded SWE OA) - Write an algorithm to find the starting bit position of the first occurrence of a specific 32-bit, big-endian pattern (`0xFE6B2840`) within a given byte array. The input byte array is in network byte order. A crucial detail is that the pattern may or may not be byte-aligned in the input.

Function signature:

int findPattern(const uint32_t numBytes, const uint8_t data[]);

Description of function parameters:

Input parameters:

  • uint32_t numBytes: The number of bytes in the array named `data`.
  • uint8_t data[]: The byte stream of data to search.

Return:

  • -1: Returned if the given pattern (`0xFE6B2840`) is not found.
  • -2: Returned if the input data is `NULL` or the size of the `data` is insufficient to find the pattern (`0xFE6B2840`).
  • Otherwise: the pattern is found. Return the starting bit position of the pattern (`0xFE6B2840`).

Background:

Two C library functions, htonl() and ntohl(), are provided for endian conversion:

  • uint32_t htonl(uint32_t hostlong); (host to network order)
  • uint32_t ntohl(uint32_t netlong); (network to host order)

Note: Network byte order is big-endian, host byte order is little-endian.

Examples:

Example 1 - Byte Aligned:

Inputs:

numBytes: 8
data: [ 0x00, 0x01, 0xFE, 0x6B, 0x28, 0x40, 0x02, 0x03 ]

Returns:

16 // Starting position is here at bit 16

Example 2 - Non-byte Aligned:

This is the same as Example 1, left-shifted by 1 bit.

Inputs:

numBytes: 8
data: [ 0x00, 0x03, 0xFC, 0xD6, 0x50, 0x80, 0x04, 0x06 ]

Returns:

15 // Starting position is here at bit 15
   // (least significant bit of the second byte of the input)

Starter Code:

#include <arpa/inet.h>

int findPattern(const uint32_t numBytes, const uint8_t data[])
{
    // Implement your code here!
}
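One possible approach is to shift the input through a 32-bit window one bit at a time and compare the window against the pattern after every shift, which handles aligned and unaligned matches uniformly. The sketch below assumes bit 0 is the most significant bit of `data[0]` (consistent with the two examples) and that the input is small enough for the bit position to fit in an `int`. It processes bytes directly, so it does not need `htonl()`/`ntohl()`.

```c
#include <stddef.h>
#include <stdint.h>

#define PATTERN 0xFE6B2840u

int findPattern(const uint32_t numBytes, const uint8_t data[])
{
    /* The pattern is 4 bytes long, so anything shorter cannot contain it. */
    if (data == NULL || numBytes < 4) {
        return -2;
    }

    uint32_t window = 0; /* the most recent 32 bits seen */
    const uint64_t totalBits = (uint64_t)numBytes * 8u;

    for (uint64_t bit = 0; bit < totalBits; bit++) {
        /* Extract bits MSB-first within each byte. */
        uint32_t b = (data[bit >> 3] >> (7u - (bit & 7u))) & 1u;
        window = (window << 1) | b;

        /* The window is only fully populated once 32 bits have been shifted in. */
        if (bit >= 31 && window == PATTERN) {
            return (int)(bit - 31); /* position of the first bit of the match */
        }
    }
    return -1;
}
```

This runs in O(n) over the number of bits. A byte-at-a-time variant that tests the 8 possible alignments per step is faster in practice but harder to get right under interview time pressure.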
16.
Read data from the flash (2021 Amazon Embedded SWE OA) - Implement a wrapper driver API to read an arbitrary number of bytes from a custom flash memory that can have "bad bytes" (unreliable data). You must use an existing `read_8b` API that reads exactly 8 bytes at a time and provides a bit mask indicating which bytes are invalid.

Function signature:

int read(uint8_t* buffer, uint32_t n_bytes, uint32_t offset);

Existing API - read_8b:

int read_8b(uint8_t* buf, uint8_t* mask);

Description of read_8b:

  • Reads exactly 8 consecutive bytes from flash memory into buf
  • Returns 0 on success, 1 when end of flash memory is reached
  • mask is an 8-bit value where each bit corresponds to a byte in buf
  • Bit i in mask corresponds to buf[i]
  • If mask bit i is 1, then buf[i] is a "bad byte" (invalid)
  • If mask bit i is 0, then buf[i] is valid

Mask bit mapping:

| BIT    | BIT    | BIT    | BIT    | BIT    | BIT    | BIT    | BIT    |
|--------|--------|--------|--------|--------|--------|--------|--------|
| 7      | 6      | 5      | 4      | 3      | 2      | 1      | 0      |
| buf[7] | buf[6] | buf[5] | buf[4] | buf[3] | buf[2] | buf[1] | buf[0] |

Example of mask usage:

If read_8b returns mask = 5 (binary 00000101):

  • buf[0] is invalid (bit 0 = 1)
  • buf[1] is valid (bit 1 = 0)
  • buf[2] is invalid (bit 2 = 1)
  • buf[3] through buf[7] are valid (bits 3-7 = 0)

Your read function requirements:

  • Read n_bytes of valid data from flash memory
  • Skip the first offset valid bytes before starting to store data in buffer
  • Skip any "bad bytes" encountered during reading
  • Return the actual number of valid bytes successfully read and stored in buffer
  • Stop reading when end of flash memory is reached (when read_8b returns 1)

Examples:

Example 1:

Input:

Data in flash memory: [0x03, 0x04, 0xa2, 0xa6, 0x10, 0x73, 0xff, 0x99]
Corresponding bit mask: 0x01 (only buf[0] is bad)
n_bytes: 5
offset: 1

Output:

buffer: [0xa2, 0xa6, 0x10, 0x73, 0xff]
Return: 5

Example 2:

Input:

Data in flash memory: [0x83, 0x84, 0xa2, 0xa6, 0x18, 0x73, 0xff, 0x99, 0x53, 0x2a, 0x83, 0xf3, 0xab, 0xea, 0x11, 0x38]
Corresponding bit masks: [0x01, 0x05] (buf[0] is bad in the first block; buf[0] and buf[2] are bad in the second block)
n_bytes: 11
offset: 0

Output:

buffer: [0x84, 0xa2, 0xa6, 0x18, 0x73, 0xff, 0x99, 0x2a, 0xf3, 0xab, 0xea]
Return: 11

Example 3:

Input:

Data in flash memory: [0x83, 0x84, 0xa2, 0xa6, 0x18, 0x73, 0xff, 0x99, 0x53, 0x2a, 0x83, 0xf3, 0xab, 0xea, 0x11, 0x38]
Corresponding bit masks: [0x01, 0x05] (same as Example 2)
n_bytes: 11
offset: 3

Output:

buffer: [0x18, 0x73, 0xff, 0x99, 0x2a, 0xf3, 0xab, 0xea, 0x11, 0x38]
Return: 10

Starter Code:

// Assume read_8b is already implemented
int read_8b(uint8_t* buf, uint8_t* mask);

int read(uint8_t* buffer, uint32_t n_bytes, uint32_t offset)
{
    // Implement your code here!
}
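A possible shape for the wrapper is sketched below. It follows the requirement that `offset` counts valid bytes, and it makes several assumptions that are not in the problem statement: `read_8b` reads sequential blocks and returns 1 without filling `buf` once the flash is exhausted; the wrapper is named `flash_read` rather than `read` to avoid clashing with the POSIX function; and `read_8b` is replaced by a hypothetical in-memory mock (`mock_flash_init`) so that the sketch can run on a host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for the flash device. */
static const uint8_t *mock_data;
static const uint8_t *mock_masks;
static size_t mock_blocks;
static size_t mock_pos;

void mock_flash_init(const uint8_t *data, const uint8_t *masks, size_t blocks)
{
    mock_data = data;
    mock_masks = masks;
    mock_blocks = blocks;
    mock_pos = 0;
}

/* Mock of the provided API: returns 0 on success, 1 at end of flash. */
int read_8b(uint8_t *buf, uint8_t *mask)
{
    if (mock_pos >= mock_blocks) {
        return 1;
    }
    for (int i = 0; i < 8; i++) {
        buf[i] = mock_data[mock_pos * 8 + i];
    }
    *mask = mock_masks[mock_pos++];
    return 0;
}

/* Skips bad bytes and the first `offset` valid bytes, then stores up to
 * n_bytes valid bytes. Returns the number of bytes actually stored. */
int flash_read(uint8_t *buffer, uint32_t n_bytes, uint32_t offset)
{
    uint32_t stored = 0;
    uint8_t block[8];
    uint8_t mask;

    if (buffer == NULL) {
        return 0;
    }
    while (stored < n_bytes && read_8b(block, &mask) == 0) {
        for (int i = 0; i < 8 && stored < n_bytes; i++) {
            if (mask & (1u << i)) {
                continue; /* bad byte: does not count toward offset */
            }
            if (offset > 0) {
                offset--; /* valid byte consumed by the offset */
                continue;
            }
            buffer[stored++] = block[i];
        }
    }
    return (int)stored;
}
```

Because `read_8b` has no seek parameter, any valid bytes left over in the last block are dropped when the request completes mid-block. A production driver would likely cache that partial block for the next call.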

Concurrency, Synchronization & OS

1.
Explain the difference between process and thread: contrast memory layout, scheduling, IPC costs, and context-switch overhead
2.
Return two 32-bit registers (high & low) timer value as a 64-bit value (avoid race conditions)
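A widely used lock-free answer is to read high, then low, then high again, and retry if the high word changed, which means the low word rolled over between the reads. A sketch, assuming the two registers are passed in as pointers (the name `read_timer64` and the parameter names are hypothetical):

```c
#include <stdint.h>

/* Reads a free-running 64-bit counter exposed as two 32-bit registers. */
uint64_t read_timer64(const volatile uint32_t *hi_reg,
                      const volatile uint32_t *lo_reg)
{
    uint32_t hi, lo;
    do {
        hi = *hi_reg;
        lo = *lo_reg;
        /* If the high word changed, the low word wrapped between the two
         * reads and the pair may be inconsistent, so sample again. */
    } while (hi != *hi_reg);
    return ((uint64_t)hi << 32) | lo;
}
```

This avoids disabling interrupts. Some timers instead latch the high word when the low word is read, in which case the hardware-specified read order should be used.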
3.
Design a method to handle floating point operations in RTOS (assume no GPU)
4.
How differently would you design thread safety mechanisms for data transfer to NAND flash vs serial port
5.
Explain exactly what happens during a context switch from Thread A to Thread B. What registers and state must be saved/restored? How does this differ from a process context switch?
6.
Your RTOS guarantees a maximum interrupt latency of 10μs. What factors contribute to interrupt latency, and how would you measure and optimize it in your system?
7.
Implement priority inheritance for a mutex. Explain the scenario where priority inversion occurs and how your implementation prevents it.
8.
Explain how a virtual address is translated to a physical address. Walk through the complete MMU process including TLB lookup and page table walk.
9.
What's the difference between a page fault and a segmentation fault? Provide scenarios where each would occur.
10.
Quickly design a circular buffer (also called ring buffer). Follow up question: Make it thread safe.
11.
Show how to determine if the stack is growing upward or downward

Memory & Drivers

1.
Explain the volatile keyword (the favourite of every company)
2.
What is a linker file?
3.
Implement memcpy and memmove (follow up on why they're different)
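The key difference is overlap: `memcpy` may assume the source and destination do not overlap, while `memmove` must produce the correct result even when they do. A byte-wise sketch with hypothetical names `my_memcpy` and `my_memmove` (real implementations copy word-sized chunks where alignment allows):

```c
#include <stddef.h>
#include <stdint.h>

/* Forward copy. Behaviour is undefined if the regions overlap. */
void *my_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) {
        *d++ = *s++;
    }
    return dst;
}

/* Overlap-safe copy: picks the direction that never overwrites
 * source bytes before they have been read. */
void *my_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((uintptr_t)d < (uintptr_t)s) {
        while (n--) {
            *d++ = *s++; /* destination is lower: copy forward */
        }
    } else if ((uintptr_t)d > (uintptr_t)s) {
        d += n;
        s += n;
        while (n--) {
            *--d = *--s; /* destination is higher: copy backward */
        }
    }
    return dst;
}
```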
4.
Design malloc without using system calls (from scratch)
5.
Write a simple driver for a protocol: includes length field, data, and checksum
6.
Design OTA update for all software stacks (firmware/driver/kernel to application layer) with different rollback mechanisms
7.
Apple Core OS question - Process a continuous ADC buffer (12-bit, ISR-sampled) and return valid samples.
8.
Apple Core OS question - Explain the path for a user-space read/write request to hardware

Cache & Coherency

1.
How does cache work? - explain cache lines, tags, hit/miss, write policies (through/back)
2.
Qualcomm phone screen - What are the different types of cache?
3.
Qualcomm phone screen - Suppose a 32-bit address comes in, the cache has 128 cache lines, and each cache line is 64 B. How would you use the address to determine where the cache line should go for each of these cache types?
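The breakdown can be sketched in code. A 64 B line always needs 6 offset bits; the number of index bits then depends on the number of sets, which is 128 divided by the associativity. The names `cache_addr_t` and `split_address` are hypothetical, and `ways` is assumed to be a power of two between 1 (direct-mapped) and 128 (fully associative).

```c
#include <stdint.h>

#define NUM_LINES   128u
#define OFFSET_BITS 6u   /* log2(64-byte line) */

typedef struct {
    uint32_t tag;
    uint32_t index;  /* set index; always 0 for a fully associative cache */
    uint32_t offset; /* byte offset within the line */
} cache_addr_t;

cache_addr_t split_address(uint32_t addr, uint32_t ways)
{
    uint32_t sets = NUM_LINES / ways;
    uint32_t index_bits = 0;
    while ((1u << index_bits) < sets) {
        index_bits++; /* index_bits = log2(sets) */
    }

    cache_addr_t f;
    f.offset = addr & ((1u << OFFSET_BITS) - 1u);
    f.index  = (addr >> OFFSET_BITS) & (sets - 1u);
    f.tag    = addr >> (OFFSET_BITS + index_bits);
    return f;
}
```

With these parameters, a direct-mapped cache uses 6 offset, 7 index, and 19 tag bits; a 4-way cache uses 6, 5, and 21; and a fully associative cache uses 6 offset and 26 tag bits with no index.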
4.
Qualcomm phone screen - Suppose you have a 16-way set associative cache; you have a traffic coming in that is very uneven; it only accesses cache line 0,1,2,3 and never accesses the rest; how will you improve the overall performance of this system?
5.
How is a cache designed in terms of the locality rules? How can we design a cache to exploit spatial and temporal locality?
6.
How do we know when to replace contents in a cache? Won't it be nice to have an infinitely large cache?
7.
How to optimize access based on the cache size?
8.
How does the cache line size affect our design?
9.
What is coherency?
10.
Implement Least Recently Used algorithm, LRU Cache
11.
Is there a reason why we don't care about coherency?
12.
A core has a cache line in modified state in its L2. A non-coherent requestor wants to write a new value to the same cache line. What might be some hazards?

Advanced / Unique Questions

1.
Design a software timer handler for 10 tasks, each with its own timeout and callback, using only one HW timer
2.
Design a charging system manager for M machines and N charging points (M>N) with plug/unplug APIs based on threshold and charge level
3.
Encode/decode functions:
  • Encode a string inside an image without visually distorting it
  • Decoder retrieves the string from the encoded image
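
One common way to approach the encode/decode question is least-significant-bit (LSB) steganography; the sketch below is an illustration of that technique, not necessarily what the interviewer expects. It assumes the image is available as a raw buffer of 8-bit channel values stored losslessly, and the function names stego_encode and stego_decode are hypothetical. Each message bit replaces the LSB of one pixel byte, so every byte changes by at most 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hide msg (including its '\0' terminator) in the LSBs of pixels,
 * one bit per byte. Returns 0 on success, -1 if the image is too small. */
static int stego_encode(uint8_t *pixels, size_t num_bytes, const char *msg)
{
    size_t len = 0;
    assert(pixels != NULL && msg != NULL);
    while (msg[len] != '\0') {
        len++;
    }
    len++;                                  /* also store the terminator */
    if (len > num_bytes / 8) {
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        for (int b = 0; b < 8; b++) {
            uint8_t bit = (uint8_t)(((uint8_t)msg[i] >> b) & 1u);
            size_t p = i * 8 + (size_t)b;
            pixels[p] = (uint8_t)((pixels[p] & 0xFEu) | bit);
        }
    }
    return 0;
}

/* Recover at most out_size - 1 characters, stopping at the embedded '\0'.
 * Returns the number of characters written (excluding the terminator). */
static size_t stego_decode(const uint8_t *pixels, size_t num_bytes,
                           char *out, size_t out_size)
{
    size_t n = 0;
    assert(pixels != NULL && out != NULL && out_size > 0);
    while (n + 1 < out_size && (n + 1) * 8 <= num_bytes) {
        uint8_t c = 0;
        for (int b = 0; b < 8; b++) {
            c |= (uint8_t)((pixels[n * 8 + (size_t)b] & 1u) << b);
        }
        if (c == 0) {
            break;
        }
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

A natural follow-up to raise is that this only survives lossless storage; lossy compression such as JPEG would destroy the LSBs.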

Computer Architecture / Performance modeling

Topics

Overview

Architecture interviews combine theoretical knowledge with programming skills. Expect conversational deep dives where your answers guide the questioning direction.

Cache & Memory Hierarchy

  • Memory hierarchy basics
  • Cache mapping: direct-mapped, set-associative, fully associative
  • 3C's of cache misses: Compulsory, Capacity, Conflict (+ Coherence for multicore)
  • Cache sizing and properties across levels
  • Replacement policies: LRU, FIFO, SRRIP
  • VIPT vs PIPT caches, homonyms/synonyms
  • Inclusive vs exclusive hierarchies
  • Banking schemes [Advanced - rarely covered in undergrad]
  • Coherence protocols: MESI, MOESI, MSI [Advanced]
    • Walk through state transitions for each
    • Know 'O' state (MOESI), 'E' state (MESI)
  • Coherence vs Consistency [Advanced]
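
When walking through MESI state transitions, it can help to think of the protocol as a small next-state function per cache line. The sketch below is a simplified textbook model under stated assumptions: the names mesi_state_t, mesi_event_t, and mesi_next are hypothetical, bus upgrades are folded into BUS_RDX, and write-backs/flushes are only noted in comments rather than modeled.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MESI_I, MESI_S, MESI_E, MESI_M } mesi_state_t;

typedef enum {
    PR_RD,    /* local processor read */
    PR_WR,    /* local processor write */
    BUS_RD,   /* another cache reads the line (snooped) */
    BUS_RDX   /* another cache requests exclusive ownership (snooped) */
} mesi_event_t;

/* Next state of one line in one cache. others_have_copy only matters
 * for a read miss from Invalid (it selects Shared vs Exclusive). */
static mesi_state_t mesi_next(mesi_state_t s, mesi_event_t ev,
                              bool others_have_copy)
{
    switch (ev) {
    case PR_RD:
        if (s == MESI_I) {
            return others_have_copy ? MESI_S : MESI_E;
        }
        return s;                  /* read hit: state unchanged */
    case PR_WR:
        return MESI_M;             /* E->M is silent; I/S must invalidate others first */
    case BUS_RD:
        return (s == MESI_I) ? MESI_I : MESI_S;  /* M supplies/writes back data */
    case BUS_RDX:
        return MESI_I;             /* M writes back data before invalidating */
    default:
        assert(!"unknown MESI event");
        return s;
    }
}
```

The 'E' state is what makes the E->M write silent (no bus traffic), which is a useful point to bring up when asked why MESI improves on MSI.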

Architecture Fundamentals

  • Von Neumann vs Harvard architecture
  • ISA basics (RISC vs CISC)

Pipeline & Execution

  • Hazards in in-order pipelines (structural, data, control)
  • Superscalar architectures
  • Multi-functional unit pipelines with different latencies
  • Tomasulo's algorithm - walk through it
  • Hazard mitigation in OoO pipelines
  • Precise exceptions, ROB, reservation stations [Graduate]
  • Register renaming: RAT, physical register file, ROB-based [Graduate]
  • Register recycling/reuse [Graduate]
  • Memory disambiguation, load-store queues [Graduate]
  • MSHR-cache-processor interaction [Graduate]

Branch Prediction

  • Schemes: Gshare, Gselect, Bimodal, global vs local
  • Branch Target Buffer, Return Address Stack
  • Value prediction/prefetching [Only if in your projects]
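
As a concrete reference for the Gshare scheme listed above, here is a minimal sketch: the branch PC is XORed with a global history register to index a table of 2-bit saturating counters. The table size (GSHARE_BITS = 10), the pc >> 2 shift (which assumes 4-byte aligned instructions), and all names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GSHARE_BITS 10u
#define GSHARE_SIZE (1u << GSHARE_BITS)

typedef struct {
    uint8_t  counters[GSHARE_SIZE]; /* 2-bit saturating counters, values 0..3 */
    uint32_t ghr;                   /* global history register, GSHARE_BITS wide */
} gshare_t;

static void gshare_init(gshare_t *p)
{
    assert(p);
    for (uint32_t i = 0; i < GSHARE_SIZE; i++) {
        p->counters[i] = 1;         /* start weakly not-taken */
    }
    p->ghr = 0;
}

/* Index = low PC bits XOR global history. */
static uint32_t gshare_index(const gshare_t *p, uint32_t pc)
{
    return ((pc >> 2) ^ p->ghr) & (GSHARE_SIZE - 1u);
}

static bool gshare_predict(const gshare_t *p, uint32_t pc)
{
    return p->counters[gshare_index(p, pc)] >= 2;   /* 2,3 = predict taken */
}

/* Train the counter with the real outcome, then shift it into the history. */
static void gshare_update(gshare_t *p, uint32_t pc, bool taken)
{
    uint8_t *c = &p->counters[gshare_index(p, pc)];
    if (taken && *c < 3) {
        (*c)++;
    }
    if (!taken && *c > 0) {
        (*c)--;
    }
    p->ghr = ((p->ghr << 1) | (taken ? 1u : 0u)) & (GSHARE_SIZE - 1u);
}
```

Dropping the XOR with ghr turns this into a bimodal predictor; concatenating PC and history bits instead of XORing them gives Gselect.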

Virtual Memory & OS

  • Multi-level page tables, TLB operation
  • Page table walks on TLB misses
  • VIPT/PIPT addressing
  • Stack vs Heap
  • Processes vs threads, context switching
  • Mutex/Semaphore implementation (TestAndSet, CompareAndSwap, LL/SC)
  • Atomics, memory barriers, memory fences
  • Interrupts vs exceptions
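
For the TestAndSet item above, a minimal spinlock sketch using C11 atomics is shown below. The type spinlock_t and the spin_* function names are hypothetical; atomic_flag_test_and_set_explicit and atomic_flag_clear_explicit are the standard <stdatomic.h> calls. The acquire/release orderings are what keep accesses inside the critical section from leaking out of it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag locked;
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* test_and_set atomically sets the flag and returns its previous value:
 * a previous value of false means this caller just took the lock. */
static bool spin_trylock(spinlock_t *l)
{
    assert(l);
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

static void spin_lock(spinlock_t *l)
{
    while (!spin_trylock(l)) {
        /* busy-wait until the holder clears the flag */
    }
}

static void spin_unlock(spinlock_t *l)
{
    assert(l);
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

This is the simplest form; it offers no fairness and burns CPU while waiting, which is usually the lead-in to discussing ticket locks, backoff, or blocking mutexes.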

Programming Skills

See RTL Design section for C++ and data structures coverage. Architecture interviews expect strong programming fundamentals.

Interview Dynamics

Conversational exploration: Your answers determine depth and direction.

Example flow:
  • "Explain the fetch stage" → "Mention icache" → "sizing/associativity questions" → "Mention I-TLB" → "virtual memory dive" → "Mention branch prediction" → "BTB/predictor details"

Key notes:

  • Architecture falls under "CS" at many companies - strong programming expected
  • EE backgrounds: Programming may be your challenge - practice extensively
  • Syntax often matters more than algorithms
  • Pseudocode typically acceptable

Sample Questions

  • Shallow vs deep copy differences
  • Graph connectivity algorithms
  • Design swap instruction for MIPS with 2 registers but many ALUs
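
One approach often cited for the two-register swap question is the XOR trick, which exchanges two values without a temporary. The sketch below shows it in C as an illustration; whether the interviewer wants this or a different ALU-level design is an assumption, and xor_swap is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Swap *a and *b using three XORs and no temporary storage. */
static void xor_swap(uint32_t *a, uint32_t *b)
{
    assert(a && b);
    if (a == b) {
        return;        /* same location: x ^ x would zero the value */
    }
    *a ^= *b;          /* a = a0 ^ b0 */
    *b ^= *a;          /* b = b0 ^ (a0 ^ b0) = a0 */
    *a ^= *b;          /* a = (a0 ^ b0) ^ a0 = b0 */
}
```

Note that the three XORs form a dependency chain, so extra ALUs do not speed this version up; that tradeoff is worth mentioning in the discussion.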

Analog/Mixed-Signal Design

Topics

Basic RC Circuits

  • Understand fundamentals of resistor-capacitor behavior

High/Low Pass Filters

  • Be able to sketch input/output waveforms and frequency response
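
As a quick reference for sketching these responses, the first-order RC relations are summarized below. This assumes the simplest single-R, single-C filters driven by an ideal source with no load.

```latex
% First-order RC low-pass (output across C) and high-pass (output across R)
H_{LP}(j\omega) = \frac{1}{1 + j\omega RC}, \qquad
H_{HP}(j\omega) = \frac{j\omega RC}{1 + j\omega RC}

% -3 dB corner frequency, shared by both
f_c = \frac{1}{2\pi RC}

% Low-pass response to a step of height V_0 (time constant \tau = RC)
v_{out}(t) = V_0\left(1 - e^{-t/RC}\right)
```

Both filters roll off at 20 dB/decade away from the corner, and the phase shift at f_c is 45 degrees (lagging for the low-pass, leading for the high-pass).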

Ideal Amplifier Applications

  • Know the standard op-amp configurations and when to use them

Current Mirror Applications

  • Understand biasing, matching, and typical use cases

OTA (Operational Transconductance Amplifier) Design Details

  • Miller effect and compensation
  • How to adjust the amplification factor
  • Transistor sizing and purpose of each device
  • Function of each capacitor in the design

On-the-Spot Circuit Analysis

  • Some interviewers will draw a circuit and ask you to analyze:
    • Gain
    • Output impedance
    • Feedback topology

Basic Device Physics

  • CMOS/BJT small-signal models and their tradeoffs
  • Deriving gm and rout for CMOS and BJT
  • PMOS/NMOS cross-section drawings
  • Latch-up mechanism and prevention
  • Vth vs temperature relationship
  • gm vs temperature relationship
  • Channel length modulation and pinch-off
  • Body effect

Bandgap Reference Circuits

  • PTAT (Proportional To Absolute Temperature) and CTAT concepts
  • Basic bandgap circuit topology
  • Vbe temperature coefficient
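
The PTAT/CTAT idea can be summarized in a few lines, following the standard textbook treatment. The scaling factor M and the emitter-area ratio n are generic symbols introduced here for illustration, and the temperature coefficients are typical room-temperature approximations rather than exact values.

```latex
% CTAT term: V_BE falls with temperature (typical textbook value)
\frac{\partial V_{BE}}{\partial T} \approx -1.5\ \text{mV/K}

% PTAT term: difference of two V_BE's at a current-density ratio n
\Delta V_{BE} = V_T \ln n, \qquad V_T = \frac{kT}{q}, \qquad
\frac{\partial V_T}{\partial T} = \frac{k}{q} \approx +0.087\ \text{mV/K}

% Bandgap output: weighted sum with zero first-order temperature coefficient
V_{REF} = V_{BE} + M\,\Delta V_{BE}, \qquad
M \ln n \approx \frac{1.5}{0.087} \approx 17.2
\;\Rightarrow\; V_{REF} \approx 1.25\ \text{V}
```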

Layout Techniques for Analog

  • Common-centroid layout
  • Dummy resistors and their purpose
  • Layout techniques to reduce mismatch

Power Converter Fundamentals

  • Buck/boost converter topologies
  • Voltage control mode vs current control mode
  • CCM, DCM, and BCM operation modes
  • Inductor selection and current waveforms
  • Type-III compensation theory
  • Non-overlapping buffers (dead-time control)

GaN Power Devices

  • Advantages/disadvantages vs Silicon
  • Gate charging characteristics

LDO Design

  • Basic LDO topology
  • PMOS vs NMOS pass transistor tradeoffs
  • Compensation techniques
  • PSRR fundamentals
  • Layout considerations for feedback resistors

Key Takeaways from Interviewers

  1. Razavi's book is essential - "Design of Analog CMOS Integrated Circuits" is heavily referenced in interviews.
  2. Tape-out experience matters - New graduates without tape-out experience may be at a disadvantage; be prepared to discuss any hands-on silicon experience.
  3. Fundamentals are critical for new grads - Expect many basic questions requiring you to draw circuits and derive formulas on the spot.
  4. Ask for hints if stuck - Don't immediately say "I don't know." Ask the interviewer for a hint and work through the problem.

Recommended Study Resources

  • Razavi, "Design of Analog CMOS Integrated Circuits"
  • Gray & Meyer, "Analysis and Design of Analog Integrated Circuits"
  • Company-specific PMIC application notes
  • Practice drawing circuits and deriving transfer functions by hand

Analog Design Fundamentals

1.
Difference between analog-on-top versus digital-on-top design?
2.
Explain how to build an ADC and talk about how to choose a component
3.
Apple Wireless firmware role interview - Why do you use filtering? What are different types of filters?
4.
Apple Wireless firmware role interview - What is the relation between RSSI and TX power? How would you measure it? What tools can you use?
5.
Draw the small-signal model for CMOS and BJT. Explain the advantages and disadvantages of each.
6.
Derive gm and rout for both CMOS and BJT transistors.
7.
Draw the cross-section of PMOS and NMOS transistors.
8.
How does CMOS latch-up occur, and how do you prevent it?
9.
Draw a common-source amplifier and source follower. Derive the transfer function using small-signal analysis.
10.
What is the relationship between threshold voltage (Vth) and temperature?
11.
What is the relationship between transconductance (gm) and temperature?
12.
Explain channel length modulation and pinch-off.
13.
Explain body effect and its impact on circuit performance.
14.
Explain PTAT and CTAT. How are they used in bandgap reference design?
15.
Draw a basic bandgap reference circuit.
16.
What is the temperature coefficient of Vbe?
17.
Why does a cascode current mirror have higher accuracy than a simple current mirror? Derive the formula.
18.
How do you reduce the headroom requirement of a cascode current mirror?
19.
What is the input resistance of a current mirror?
20.
Explain common-centroid layout technique and when to use it.
21.
What are dummy resistors and why are they used in analog layout?
22.
How can layout techniques reduce mismatch in matched devices?

MOSFETs & Power Electronics

1.
Explain LDO regulators vs switch-mode regulators - pros, cons, and operation of both.
2.
Explain how to improve the feedback regulation of a linear regulator.
3.
What are the advantages and disadvantages of using GaN instead of Silicon for power transistors?
4.
Draw the gate voltage vs. time curve when charging a GaN transistor. Use this to explain GaN's advantages over Si.
5.
Draw a basic buck converter. Explain voltage control mode vs current control mode.
6.
Draw the voltage waveform at the switch-side terminal of the inductor over time.
7.
If the voltage at the switch-side of the inductor exceeds the input voltage and causes reverse current, how do you handle it?
8.
How do you compensate a buck converter? Why use Type-III (third-order) compensation?
9.
What determines whether a converter operates in CCM, DCM, or BCM? Explain Rcrit.
10.
Draw the inductor current waveform over time. How do you select the inductor value? Write out the formula.
11.
Why use non-overlapping buffers (dead-time control) in power converters?
12.
Draw a basic LDO circuit.
13.
What are the differences between using PMOS vs NMOS as the pass transistor in an LDO?
14.
How do you compensate an LDO for stability?
15.
What layout considerations are important for feedback resistors in an LDO?
16.
Explain PSRR (Power Supply Rejection Ratio) in LDOs. How do you improve it?
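
For the inductor-selection and Rcrit questions above, the ideal buck converter relations can be sketched as a few helper functions. This assumes a lossless converter in steady state; the function names and the example operating point are made up for illustration. The formulas used are D = Vout/Vin, L = (Vin - Vout) * D / (fsw * ΔI_L), and Rcrit = 2 * L * fsw / (1 - D), where a load resistance above Rcrit pushes the converter into DCM.

```c
#include <assert.h>

/* Ideal CCM duty cycle of a buck converter. */
static double buck_duty(double vin, double vout)
{
    assert(vin > 0.0 && vout > 0.0 && vout < vin);
    return vout / vin;
}

/* Inductance (H) that gives a peak-to-peak inductor ripple current
 * delta_i (A) at switching frequency fsw (Hz). */
static double buck_inductance(double vin, double vout, double fsw,
                              double delta_i)
{
    assert(fsw > 0.0 && delta_i > 0.0);
    return (vin - vout) * buck_duty(vin, vout) / (fsw * delta_i);
}

/* Load resistance (ohm) at the CCM/DCM boundary, where the average load
 * current equals half the ripple. R > Rcrit means DCM. */
static double buck_rcrit(double vin, double vout, double fsw, double l)
{
    assert(fsw > 0.0 && l > 0.0);
    return 2.0 * l * fsw / (1.0 - buck_duty(vin, vout));
}
```

As a worked example, Vin = 12 V, Vout = 3 V, fsw = 500 kHz, and 0.9 A of ripple give L = 5 µH, and that inductor puts the CCM/DCM boundary at a load of about 6.67 Ω.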

Amplifiers & Filters

1.
Given an inverting op-amp with R₁ || C₁ at the input and R₂ || C₂ as feedback, describe the operation of this circuit.
2.
Sketch the transfer function (Vout/Vin in dB) versus frequency for this circuit. Do not use equations—reason through the behavior qualitatively.

Follow up Q1 - What happens to the transfer function at extremely high frequencies? What determines the final rolloff?
Follow up Q2 - What causes the additional pole and zero beyond those set by R₁C₁ and R₂C₂?
Follow up Q3 - What potential issues could arise from this circuit topology? Consider what the Bode plot implies about noise performance.
Solution: Transfer function sketch for inverting op-amp circuit

Bode Plot Walkthrough

Frequency  | What's Happening                        | Gain
Low        | Caps open → resistors dominate          | R₂/R₁
Mid        | Caps short → capacitors dominate        | C₁/C₂
High       | Op-amp runs out of gain, rolls off      | Dropping
Very High  | Everything is shorted, passive divider  | ~0 dB

Key Answers

Follow up Q1 - What happens at extreme high frequency?

Both caps are shorts. Op-amp can't keep up. Circuit becomes a passive network → gain flattens near 0 dB.

Follow up Q2 - What causes the extra zero (z₂)?

The op-amp's open-loop gain is falling. When it can no longer enforce the virtual ground, the rolloff stops and a zero appears. It's the op-amp GBW interacting with the feedback network.

Follow up Q3 - Limitations?

The peaking region (C₁/C₂) amplifies noise. Higher mid-band gain = worse noise performance.
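
Although the question asks for qualitative reasoning, the low and mid rows of the walkthrough can be checked against the ideal op-amp transfer function. The derivation below assumes an ideal op-amp (infinite gain and bandwidth), so it does not capture the high-frequency rolloff or the feedthrough behavior.

```latex
% Z_1 = R_1 \parallel C_1, \quad Z_2 = R_2 \parallel C_2
Z_k = \frac{R_k}{1 + sR_kC_k}

H(s) = -\frac{Z_2}{Z_1}
     = -\frac{R_2}{R_1}\cdot\frac{1 + sR_1C_1}{1 + sR_2C_2}

% Zero and pole set by the two RC pairs
f_z = \frac{1}{2\pi R_1C_1}, \qquad f_p = \frac{1}{2\pi R_2C_2}

% Limits
\lim_{s\to 0}|H| = \frac{R_2}{R_1}, \qquad
\lim_{s\to\infty}|H| = \frac{C_1}{C_2}
```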

3.
Draw telescopic and folded-cascode opamp structures. Compare their advantages and disadvantages. Derive gain and rout for both. Analyze the poles and zeros.
4.
For a basic two-stage opamp (7-transistor topology): derive gain, rout, and input/output swing. What causes systematic offset vs random offset?
5.
Explain Miller compensation. Draw the Bode plot and explain why it provides stability. Label GBW, phase margin, and gain margin. How do you make tradeoffs between these parameters?
6.
Draw an input rail-to-rail opamp structure. What are the design challenges?
7.
How do you improve slew rate in an opamp?
8.
How do you improve gain in an opamp? Explain gain boosting techniques.

Physical Design

Physical Design Fundamentals

1.
What is the use of multi-bit cells in physical design?
2.
What are techniques to reduce dynamic power consumption?
3.
What is the use of clock gating?
4.
How will you reduce power through clock gating?
5.
What are advantages and disadvantages of clock gating?
6.
What are the different techniques to reduce static IR drop and dynamic IR drop?
7.
What is the disadvantage of dcap (decoupling capacitor) cells?
8.
How will you apply derates for launch clock and capture clock?
9.
If the cells are placed far apart, will you apply smaller or larger derates?
10.
What checks are done after synthesis?

Study Resources & Leetcode Numbers

🌐 Online Resources we use

  1. EDA Playground
    Online Verilog/SystemVerilog simulator
  2. montychoy.com
    Ultimate list of hardware engineering internship interview questions
  3. HDLBits
    Leetcode but RTL Ver.1 - I personally like this one better
  4. Chipdev
    Leetcode but RTL Ver.2
  5. srnvl/Embedded_SWE_Prep
    Embedded SWE Prep Github Repository
  6. FIFO Depth Calculation Made Easy (PDF)
    FIFO depth calculation guide
  7. RFIC Interview Questions
    RFIC interview questions
  8. Asynchronous FIFO Verilog Code
    Asynchronous FIFO Verilog Code
  9. RTL Design Practice Problems
    More RTL Interview Questions
  10. UVM CookBook
    Siemens Verification Academy UVM CookBook
  11. Verification Guide (UVM TestBench architecture)
    Coding UVM components
  12. FSM-Finite State Machine-Questions
    Some challenging FSM Questions
  13. VLSI and hardware engineering interview questions
    A painstakingly long question list (worth skimming through; there are a few interesting questions)
  14. DV Interview Prep Guide
    Github repository for DV Interview Prep Guide
  15. Example Interview Questions for a job in FPGA, VHDL, Verilog
    From the nandland YouTube channel
  16. Chipverify
    Good resource for SystemVerilog/UVM/Verification
  17. XOR Trick
    XOR Trick that is good to know for interviews
  18. Electronics Interview Questions: STA part 1
    The typical STA interview question
  19. Electronics Interview Questions: STA part 2
    The 2nd typical STA interview question
  20. What is AXI
    Dillon Huff - Good YouTube channel for AXI protocol basics
  21. Quick Review of Computer Architecture
    Good YouTube channel for a quick review of computer architecture concepts
  22. VLSI Verify
    Verilog Interview Questions
  23. Technical Internship Interview Questions
    More interview questions
  24. Computer Architecture Youtube Playlist
    High Performance Computer Architecture (Udacity)
  25. Ultimate Folder
    Misc. list of VLSI interview questions

🔢 Leetcode Numbers

Most commonly asked LeetCode problems for hardware interviews.

Embedded/Firmware Tips

We recommend using C and Python to solve these LeetCode questions. If you have to choose only one language, use C.

Stack & String Processing

Sorting & Searching