All Hardware Types Interview Questions

Digital Design/RTL

Topics

RTL Design involves the design/integration of digital logic modules that go into a chip or IP.

Topics Covered

Verilog Syntax and Programming
  • Try to be familiar with all the synthesizable constructs used in Verilog
    • Ex. Blocking vs Non-blocking statements - trivial, but it is surprising how often it comes up (and how often interviewees stumble)
  • Casex, Casez and the variations
A very common question pattern:
  • Write Verilog code to do 'something' (Ex. BCD counter)
  • What do the different sections of your code synthesize to? This is very important to know, and it is where Verilog interviews differ from other coding interviews. For example, always @(posedge clock) typically infers flip-flops
  • Make some kind of changes to the input/output (Ex. Output should be triggered by another signal, Output should be seen with a delay of one cycle, Synchronous vs Asynchronous reset, etc)
Digital Design Concepts
  • Basic logic design - questions typically found in undergraduate digital electronics courses
    • Universal gates, design different gates using a MUX
  • Designing a counter - commonly asked question
  • State Machine design - major subsection:
    • Design state machines (Draw the state diagram) for the given scenario
    • Mealy vs Moore model - very important - learn how to come up with both models for any scenario
    • Typically, state machine questions are converted to Verilog questions (i.e. write verilog code for the state machine you designed). It is good to practice a template for state machines (both models) so that you can convert any state diagram to code easily
    • Another variation here could be binary encoding vs one-hot encoding for the states - each has pros and cons
Static Timing Analysis
  • What the different terms (setup time, hold time, etc) mean
  • Given a Flop - Logic - Flop configuration:
    • Are there setup or hold violations?
    • If there are any, how to fix them
    • There are multiple variations of this question, and they are well documented in any STA book or tutorial - spend time to master this
  • Time borrowing (Especially if you are a grad student)
  • Metastability and how to deal with it
  • Know the basics for all interviews, but also learn about specifics like MTBF in case of clocking-related roles
C Programming

Basic C programming, up to the complexity of Linked Lists, could be asked in both RTL design and verification roles. Sorting is commonly asked.

Things to Note

  • Prepare a little above and below the VLSI stack for an RTL design role interview:
    • Up the stack: A theoretical understanding of Computer Architecture, including typical 5-stage pipelines, Memory hierarchy, and Branch prediction
    • Down the stack: MOSFETs and CMOS design, specifically knowing about CMOS inverters and basic logic gates at the transistor level, though these are not asked as often
  • If you have previous experience in RTL design:
    • You should be able to identify key blocks from your previous design (e.g., Content Addressable Memories, FIFOs) to demonstrate depth of experience
    • You should also be able to quickly implement these structures, writing basic working HDL, even if skipping edge cases
  • Understanding basic Valid-Ready handshaking and how to design for it, as it's used in almost every digital block
    • Read about AXI protocols (channels and transactions) if you have experience with AXI blocks
    • Backpressure: how to tell a block to stop sending data. While valid-ready is common, other protocols like valid-afull and valid-credit exist
  • If you are applying for an FPGA company or team, make sure you understand FPGAs well, and how they are different from ASICs
    • And also how to write better RTL for FPGAs (they are register rich, have hard DSP blocks, etc - how does this impact your design decisions?)

RTL Design Fundamentals

1.
What is the difference between wire and reg?
2.
What's the difference between a Mealy and Moore machine? How will you convert your Mealy machine to Moore? What's an advantage of a Mealy machine?
3.
Difference between blocking and non-blocking assignments? What's the difference between always_ff and always_comb?
4.
What is the difference between bit and logic?
5.
What are the tradeoffs between the FSM v/s shift register approach?
6.
What is pipelining? Explain the stages of the classic 5-stage pipeline.
7.
If we have a reset tree that's too big and we can't meet reset deassertion timing, what can we do in this case?
8.
NVIDIA phone screen - Design a XOR gate with 2:1 MUX.
9.
Why are latches discouraged?
10.
How do you do synchronous deassertion of reset?
11.
What are the considerations before designing the microarchitecture?
12.
Verilog Timing Analysis Question:

What is the value of A & B at various times of simulation - 0 time_unit, 1 time_unit, 2 time_unit, 3 time_unit?

initial begin
A = 0;
B = 1;
end
// Section 1.1:
always @(posedge clk) begin
A <= 2;
end
always @(posedge clk) begin
B <= A;
end
// Section 1.2:
always @(posedge clk) begin
A = 2;
end
always @(posedge clk) begin
B <= A;
end
// Section 1.3:
always @(posedge clk) begin
A <= 2;
end
always @(posedge clk) begin
B = A;
end
// Section 1.4:
always @(posedge clk) begin
A = 2;
end
always @(posedge clk) begin
B = A;
end
13.
Given the initial values a=1, b=0, c=0 just before a rising clock edge, compute the final values of a, b, c after one clock for each snippet.

Variant A – Non-blocking (<=)
always_ff @(posedge clk) begin a <= b; b <= c; c <= a; end
Variant B – Blocking (=)
always @(posedge clk) begin a = b; b = c; c = a; end
What are the final values of a, b, c for Variant A and Variant B? (Show your steps by drawing a waveform or a small timing table.)
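A minimal Python sketch of the two scheduling rules, assuming zero-delay simulation semantics: with non-blocking assignments every right-hand side is sampled before any left-hand side is updated, whereas each blocking assignment completes before the next statement reads its operands. The function names are hypothetical, and only one clock edge is modeled.

```python
def nonblocking(a, b, c):
    """Model of `a <= b; b <= c; c <= a;` on one clock edge.

    All right-hand sides are evaluated first, then every left-hand
    side is updated together (the NBA update happens after evaluation).
    """
    rhs = (b, c, a)      # sample the old values
    a, b, c = rhs        # commit all updates at once
    return a, b, c


def blocking(a, b, c):
    """Model of `a = b; b = c; c = a;` on one clock edge.

    Each statement finishes before the next one reads its operands,
    so later statements see values written by earlier ones.
    """
    a = b
    b = c
    c = a                # reads the *new* value of a
    return a, b, c


if __name__ == "__main__":
    start = (1, 0, 0)    # a=1, b=0, c=0 just before the rising edge
    print("Variant A (<=):", nonblocking(*start))  # (0, 0, 1): the values rotate
    print("Variant B (=): ", blocking(*start))     # (0, 0, 0): the 1 in a is lost
```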
14.
Once you write RTL, how do you make sure that it synthesizes as intended? How do you make sure that there are no unintended latches or combinational loops?
15.
What is wrong with this code?
always @(posedge clk) begin : pipeline
    Q1 = in;
    Q2 = Q1;
    Q3 = Q2;
  end
Solution: The code uses blocking assignments inside a clocked block, so the current input propagates through Q1→Q2→Q3 in the same cycle. That doesn't model pipeline flops (each stage should capture the previous cycle's value of the stage before it). Use non-blocking assignments (<=), ideally inside an always_ff block, to model sequential behavior.
16.
How do you write a fixed-priority arbiter?
17.
Meta phone screen - How do you write a round robin arbiter?

RTL Coding Challenges

1.
Let's say we have a 3-bit signal coming in, and we want the output to be high when the input is 1, 2, or 4. How will you design that? (A very simple question to start off.)
2.
Design a circuit that can detect 10X011 (X can be either 0 or 1)
3.
Swap 2 variables without using a temporary variable.
4.
Meta phone screen - You have valid / ready interface. How will you add a pipe stage.
5.
AMD Xilinx - Write Verilog code for a 2-bit up/down counter. Inputs are clk, async rstb, up, and down; the output is count. Don't jump into writing RTL; first draw the truth table for ALL possible input combinations and start coding after that. Drawing the circuit diagram afterwards helps as well.
6.
AMD Xilinx - An input bit pattern is coming in. Determine at any point if the number is divisible by 5 or not.
7.
AMD Xilinx - Do a non-state machine approach for the previous problem (divisible by 5).
8.
Apple RTL Role Interview Question: Find if a stream of bits is divisible by 5 for infinite length, 1 bit per cycle
Solution: https://electronics.stackexchange.com/questions/345189/vhdl-interview-question-detecting-if-a-number-can-be-divided-by-5-without-rema
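As a sketch of the state-machine approach (assuming the bits arrive MSB first, which the question does not state), the only state needed is the remainder mod 5: shifting in a bit doubles the value and adds the bit, so the next remainder is (2*r + bit) mod 5. The Python model below builds that 5-state next-state table; the names are hypothetical, and it could double as a scoreboard reference for the RTL.

```python
# Next-state table of the 5-state FSM: NEXT[r][bit] = (2*r + bit) % 5
NEXT = [[(2 * r + b) % 5 for b in (0, 1)] for r in range(5)]


def div5_stream(bits):
    """Yield True after each bit if the value seen so far (MSB first) is divisible by 5.

    The state is the running remainder mod 5, so the hardware needs only
    3 flops no matter how long the stream gets.
    """
    r = 0                       # reset state: the empty number is 0
    for bit in bits:
        r = NEXT[r][bit]
        yield r == 0            # output: remainder 0 means divisible


if __name__ == "__main__":
    # Prefix values: 1, 2, 5, 10
    print(list(div5_stream([1, 0, 1, 0])))  # [False, False, True, True]
```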
9.
Design a block with inputs d and clk and output match. Detect a pattern of 1101. How would you make this block configurable to detect all 4-bit patterns?
10.
Write RTL for a 512:1 multiplexer.
11.
NVIDIA phone screen - You have a 7 bit vector coming in a[6:0]. How would you find out the number of 1s in this vector?
12.
Write RTL - You have 1 bit of data coming in per cycle. Shift that input into a 3-bit register and detect whether the content of the 3-bit register is divisible by 3. After reset, the register value is 0.
13.
Write RTL for a 128-bit memory with 32-bit data interface. What happens if both wr_en and rd_en are 1 in the same cycle?
14.
Create a data buffer - type and number of samples are configurable. Depth of buffer can change (could be say 5, 10 or 100). Samples gathered at the same time periodically (every 10 clks). Read can happen at any time. We want the latest X samples to be available. We can stall writes, can't stall reads.
15.
Amazon phone screen - You have a 10-bit tag coming in. Each 10-bit tag has to be assigned to a unique 4-bit ID and sent downstream. When the 4-bit ID comes back as response, the 10-bit tag has to be returned to the master. There is a valid and ready on the incoming tag side and a valid and ready on the downstream side to send the tag out. Design this block (Had to write RTL to design this)
16.
Write RTL for an SRAM (behavioral model) with parameterizable depth and width.
17.
Draw a circuit to detect even number of 1s.
18.
Determine Sum and Carry for a Half Adder and a Full Adder. (Yes, even this. It actually took me a minute to derive Cout for a full adder.)
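The standard equations, as a small Python truth-table check: Sum is the XOR of all inputs, and Cout is 1 when at least two inputs are 1, written here as a&b | cin&(a^b) (equivalently the majority function). The function names are illustrative.

```python
def half_adder(a, b):
    """Half adder: sum is XOR, carry is AND."""
    return a ^ b, a & b


def full_adder(a, b, cin):
    """Full adder: sum = a ^ b ^ cin, cout = a&b | cin&(a ^ b)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


if __name__ == "__main__":
    print("a b cin | s cout")
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                s, cout = full_adder(a, b, cin)
                print(a, b, cin, "|", s, cout)
```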
19.
RTL for determining number of 1s in an input bit-stream.
20.
Write a Verilog module that sorts the values in a given memory, so that the lowest value is at address 0x00 and the highest value is at 0xFF. The memory has one read port and one write port.
21.
Microsoft Interview Question (asked to code this up in 45 minutes) - One value arrives each clock. Using a stack-based approach, track the second-largest value observed so far. When out_valid is asserted, output (latest_value, second_largest_so_far). Example stream: 1, 4, 5, 2, 3 → on first out_valid: latest=3, second_largest=4; on next out_valid: latest=2, second_largest=4. Duplicates count as the second-largest value. State assumptions (bit width, signed/unsigned, duplicate handling, reset behavior, latency)
Solution: Maintain two stacks: S1 holds the inputs in arrival order, and S2 holds the second-largest value as of each push. One additional register, L1, tracks the current largest value.
For every new input:
1. Push the input onto S1.
2. Update S2 and L1:
  If input >= L1
    Push L1 onto S2
    L1 = input
  else if input >= top of S2
    Push input onto S2
  else
    Push the current top of S2 onto S2 again
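A behavioral Python sketch of the S1/S2/L1 scheme. The pop behavior on out_valid is an assumption inferred from the example (each out_valid pops one entry from both stacks), as are unsigned values and L1 resetting to 0, so an empty history reports a second-largest value of 0. The class name is hypothetical. Restoring L1 on a pop needs no extra stack: if the popped input equals L1, the entry popped from S2 is the largest of the remaining values.

```python
class SecondLargestTracker:
    """Model of the two-stack scheme: S1 (inputs), S2 (second largest per push), L1 (largest)."""

    def __init__(self):
        self.s1 = []    # inputs in arrival order
        self.s2 = []    # second-largest value as of each push
        self.l1 = 0     # current largest value (a single register, reset to 0)

    def push(self, x):
        self.s1.append(x)
        if x >= self.l1:
            self.s2.append(self.l1)      # the old largest becomes the second largest
            self.l1 = x
        elif x >= self.s2[-1]:
            self.s2.append(x)            # new second largest
        else:
            self.s2.append(self.s2[-1])  # second largest unchanged

    def pop(self):
        """On out_valid: return (latest_value, second_largest_so_far) and undo the last push."""
        latest = self.s1.pop()
        second = self.s2.pop()
        if latest == self.l1:
            # The popped input was the largest, so the S2 entry pushed with it
            # is the largest of what remains.
            self.l1 = second
        return latest, second


if __name__ == "__main__":
    t = SecondLargestTracker()
    for v in [1, 4, 5, 2, 3]:
        t.push(v)
    print(t.pop())  # (3, 4)
    print(t.pop())  # (2, 4)
```

With the example stream 1, 4, 5, 2, 3, the first two pops return (3, 4) and (2, 4), matching the expected outputs.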
22.
Please write RTL for the following register design.
Clk_50Mhz – 50 MHz input clock
Reset_n – Active-low reset
Data_in_1..4 – four 16-bit data inputs
Data_out – 16-bit data output
WE – write enable for selected input
Address – 2-bit selects which input to write
23.
Design a circuit that counts number of 1s in a[3:0] if the only available component is a full-adder
(Diagram: inputs a3, a2, a1, a0 feed a block built only from full adders, which produces count.)
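One possible arrangement, modeled in Python (a sketch, not the only answer; function names are hypothetical): one full adder sums three of the input bits, a second adds the fourth bit to that sum to produce count[0], and a third adds the two weight-2 carries to produce count[2:1]. Unused full-adder inputs are tied to 0.

```python
def full_adder(a, b, cin):
    """One full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def popcount4(a3, a2, a1, a0):
    """Number of 1s in a[3:0] using three full adders; returns count[2:0] as an int."""
    s1, c1 = full_adder(a0, a1, a2)  # a0 + a1 + a2 = 2*c1 + s1
    s2, c2 = full_adder(s1, a3, 0)   # s1 + a3 = 2*c2 + s2  -> count[0] = s2
    s3, c3 = full_adder(c1, c2, 0)   # c1 + c2 = 2*c3 + s3  -> count[1] = s3, count[2] = c3
    return (c3 << 2) | (s3 << 1) | s2


if __name__ == "__main__":
    for n in range(16):
        bits = [(n >> i) & 1 for i in (3, 2, 1, 0)]
        print(f"a[3:0]={n:04b} count={popcount4(*bits)}")
```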
24.
Design a component to rotate a 4x4 byte array & write out the output. Data comes in 4 cycles.
Input: One row of bytes per cycle (4 inputs for the array):
Cycle 0 : B3, B2, B1, B0
Cycle 1 : B7, B6, B5, B4
Cycle 2 : B11, B10, B9, B8
Cycle 3 : B15, B14, B13, B12
Output: 1 rotated column of bytes per cycle:
Cycle 0 : B12, B8, B4, B0
Cycle 1 : B13, B9, B5, B1
Cycle 2 : B14, B10, B6, B2
Cycle 3 : B15, B11, B7, B3
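A Python model of the required reordering, useful as a reference when checking the RTL; the function name and row representation are assumptions. Because output cycle 0 already needs B12, which only arrives in input cycle 3, the hardware must capture all 16 bytes before emitting the first column (double-buffering would be one way to accept the next matrix while draining the current one).

```python
def rotate_4x4(rows):
    """Reorder 4 input rows (one per cycle) into 4 output columns (one per cycle).

    rows[k] is listed MSB-first as in the question: rows[k] = [B(4k+3), ..., B(4k)].
    Output cycle j is byte column j read from the last row to the first,
    e.g. cycle 0 = [B12, B8, B4, B0].
    """
    def byte(k, j):
        # Byte j of row k, where byte 0 is the rightmost (lowest) position
        return rows[k][3 - j]

    return [[byte(k, j) for k in (3, 2, 1, 0)] for j in range(4)]


if __name__ == "__main__":
    rows = [[f"B{4 * k + 3 - i}" for i in range(4)] for k in range(4)]
    for cycle, column in enumerate(rotate_4x4(rows)):
        print(f"Cycle {cycle} :", ", ".join(column))
```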
25.
Create this Fibonacci generator system -
Block ports: inputs Clk, Rst, Next; output fib[15:0].
Next is a 1-bit pulse. When Next is 1, generate the next number in the sequence; hold the previous value until Next is 1 again.
28.
Design this system:
We get an async event.
If ≥1 event: Count as 1 event
If 0 event: Miss.

Requirements:
  1. Create system to raise error flag if we have 2 misses every 40 cycles
  2. Modify to raise error flag for 2 misses in any 40 cycle window
29.
How can a fixed 16-bit adder (black box) be used as two independent 8-bit adders? You may add external logic on the inputs/outputs but cannot modify the adder internals. Normally it computes a[15:0]+b[15:0] → s[15:0]; now you need a[7:0]+b[7:0] → s[7:0] and a[15:8]+b[15:8] → s[15:8] using that single adder. State your assumptions (single-cycle vs multi-cycle/time-multiplexed, allowed latency, carry behavior).
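Solution: At the adder inputs, force a[7] and b[7] to 0, so bit 7 can neither generate nor propagate a carry into bit 8. Then, at the output, s[7] = s_from_adder[7] ^ a[7] ^ b[7], where s_from_adder[7] (bit 7 of the adder output) is now just the carry into bit 7. The upper sum s[15:8] remains as is.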
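A Python sketch that checks this trick against two real 8-bit additions. adder16 stands in for the black-box adder; both function names are hypothetical. Forcing bit 7 of both operands to 0 means bit 7 of the adder output is exactly the carry into bit 7, so XOR-ing it with the original a[7] and b[7] restores the true sum bit.

```python
MASK16 = 0xFFFF
BIT7 = 1 << 7


def adder16(a, b):
    """Stand-in for the black box: a plain 16-bit adder with the carry-out dropped."""
    return (a + b) & MASK16


def dual_add8(a, b):
    """Two independent 8-bit sums from one 16-bit adder.

    Returns {a[15:8]+b[15:8], a[7:0]+b[7:0]} with each half taken mod 256.
    """
    a7 = (a >> 7) & 1
    b7 = (b >> 7) & 1
    # Force bit 7 of both operands to 0: bit 7 can no longer generate or
    # propagate a carry, so nothing leaks into the upper byte.
    s = adder16(a & ~BIT7 & MASK16, b & ~BIT7 & MASK16)
    # Bit 7 of the adder output is now just the carry into bit 7.
    carry_into_7 = (s >> 7) & 1
    s7 = carry_into_7 ^ a7 ^ b7
    return (s & ~BIT7 & MASK16) | (s7 << 7)


if __name__ == "__main__":
    print(hex(dual_add8(0x01FF, 0x0101)))  # upper: 0x01+0x01, lower: 0xFF+0x01 (wraps)
```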
30.
Design a parameterized encoder, which converts an N bit one hot signal to a binary value, specifying the location of the set bit. It should not synthesize with priority and you can assume a "don't care" output for invalid inputs.
31.
Design hardware to implement the IIR filter
$H(z) = \frac{Y(z)}{X(z)} = \frac{1 + 2z^{-1} + z^{-2}}{1 - 1.5z^{-1} + 1.5z^{-2}}$
Specify structure (Direct Form I/II or transposed), fixed-point widths/quantization, overflow/saturation behavior, latency, and reset strategy.
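Cross-multiplying gives the difference equation y[n] = x[n] + 2x[n-1] + x[n-2] + 1.5y[n-1] - 1.5y[n-2]. Below is a floating-point Direct Form I reference model in Python: a sketch for checking a fixed-point implementation against, not a hardware design. Word widths and quantization are deliberately left open, and the function name is hypothetical.

```python
def iir_df1(x):
    """Direct Form I model of H(z) = (1 + 2z^-1 + z^-2) / (1 - 1.5z^-1 + 1.5z^-2).

    y[n] = x[n] + 2x[n-1] + x[n-2] + 1.5y[n-1] - 1.5y[n-2]
    Both delay lines reset to 0.
    """
    x1 = x2 = 0.0   # x[n-1], x[n-2]
    y1 = y2 = 0.0   # y[n-1], y[n-2]
    out = []
    for xn in x:
        yn = xn + 2 * x1 + x2 + 1.5 * y1 - 1.5 * y2
        out.append(yn)
        x2, x1 = x1, xn   # shift the input delay line
        y2, y1 = y1, yn   # shift the output delay line
    return out


if __name__ == "__main__":
    print(iir_df1([1, 0, 0, 0]))  # impulse response: [1.0, 3.5, 4.75, 1.875]
```

Two observations that may matter for the design choices: every coefficient is 1, 2, or 1.5 = 1 + 0.5, so the datapath needs only shifts and adders; and the denominator's poles have magnitude sqrt(1.5) ≈ 1.22 > 1, so the filter as written is unstable and its impulse response grows without bound, which makes the overflow/saturation strategy a central question.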
32.
What's the minimum multipliers required to implement Y= (AX^4)+(BX^3)+(CX^2)+(DX)?
Solution: 4 multipliers, using Horner's rule: Y = X(X(X(AX + B) + C) + D)
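A quick Python check of the Horner form (the function name is hypothetical): three multiply-add steps plus a final multiply by X give four multiplications in total, and the result matches direct evaluation of the polynomial.

```python
def poly_horner(a, b, c, d, x):
    """Evaluate Y = A*X^4 + B*X^3 + C*X^2 + D*X as X*(X*(X*(A*X + B) + C) + D).

    Returns (Y, number_of_multiplications) so the multiplier count is explicit.
    """
    mults = 0
    acc = a
    for coeff in (b, c, d):
        acc = acc * x + coeff   # one multiplier (plus an adder) per step
        mults += 1
    y = acc * x                 # no constant term, so finish with one more multiply by X
    mults += 1
    return y, mults


if __name__ == "__main__":
    print(poly_horner(2, 3, 5, 7, 2))  # (90, 4)
```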

STA/Static Timing Analysis

1.
What're setup and hold times? - fundamental concept
2.
What is Metastability? How can it occur? How to resolve it? How to design metastability free circuits?
3.
How would you resolve hold time violations in the back end (BE)? What if there isn't enough setup margin?
4.
As an RTL design engineer, how would you design to avoid timing issues? How do you get timing feedback?
5.
We have a shift register with 4 back-to-back flip-flops. Each flop has a positive clock-to-q delay and a positive hold time requirement. Is it possible for this path to have hold violations?
6.
When you analyze a timing path, apart from setup and hold time, what other parameters need to be considered?
7.
For setup should you use the min or max delay? What about hold?
8.
Apple phone screen - Given the following RTL code:
logic [63:0] c;
logic [63:0] b;
logic [63:0] a;
assign c = a + b;

  1. How does this synthesize?
  2. How will you optimize this design for timing?
  3. Change the RTL based on your new optimized design.
  4. Given the following unit area values for each of the gates in this design:
    • Full Adder: 8 unit area
    • Half Adder: 5 unit area
    • XOR gate: 3 unit area
    • AND gate: 1 unit area
    Now answer:
    • What will be the area for the old design?
    • What will be the area for the optimized design?
    • What will be the critical timing path of the old design?
    • What will be the critical timing path of the optimized design?
9.
Amazon phone screen - How would you constrain an asynchronous FIFO?
10.
Amazon phone screen - You have a timeslice (regslice). The input data is a million bits (million registers). What would be the critical path in this design? (Ans: the regslice address select logic would either pick entry[0] or entry[1] of the regslice, but either way it would be driving select to a HUGE fanout of million flops. This would be the critical path)
11.
AMD Xilinx - Explain set_input_delay, set_output_delay, set_max_delay, and set_false_path.
12.
The STA tool reports available skew budget on critical paths. How do you use this information during clock tree synthesis? What is useful skew and when would you intentionally introduce it?
13.
How do you guarantee coherence between constraints written in CDC versus the ones written in STA? Say a clock is constrained as async in CDC and sync in STA. How do you catch that? How do timing folks mark a path as asynchronous?

RAM Design

1.
Design a RAM with a write port and a read port. The write port is 16 bits wide and the read port is 16 bits wide. The RAM should be able to store 16 words.
2.
What is the difference between Open page and Closed page policy?

Low-Power & Power Intent (UPF)

1.
Latch vs FF for clock gating. Which is preferred and why?
Solution: Negative level-sensitive latch is preferred.

Why:
1) Area & Power: Latches are smaller/lower power than flops; savings multiply across thousands of ICGs.
2) Timing slack: A negative latch is transparent when clk=0, giving the enable nearly a full cycle to meet timing. A negedge FF gives ~½ cycle, tighter and riskier.
3) Glitch-free gating: In a standard ICG, latch output is ANDed with clk. While clk=0, the AND output stays low so enable changes can’t glitch the gated clock; latch closes at clk↑ and propagates a stable enable.
Using a posedge FF risks races at the AND; a negedge FF offers no benefit over the latch but costs more area/power.
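A zero-delay, discrete-time Python sketch of point 3 (each list element is one time step; the 8-step clock period and the enable waveform are arbitrary assumptions, and the function names are hypothetical). Plain AND gating truncates a clock pulse when the enable changes while clk is high, whereas the latch only follows the enable while clk is low, so the gated clock is either a full-width pulse or nothing.

```python
def clock(n_steps, half=4):
    """Clock waveform starting low: `half` steps low, then `half` steps high, repeating."""
    return [(t // half) % 2 for t in range(n_steps)]


def and_gate(clk, en):
    """Naive gating: gclk = clk & en, so an enable change while clk is high cuts the pulse."""
    return [c & e for c, e in zip(clk, en)]


def latch_icg(clk, en):
    """Latch-based ICG: latch transparent while clk is low, holding while clk is high."""
    q = 0
    out = []
    for c, e in zip(clk, en):
        if c == 0:
            q = e              # transparent phase: follow the enable
        out.append(c & q)      # while clk is high, q is frozen
    return out


def high_pulse_widths(wave):
    """Widths of each run of 1s in a waveform."""
    widths, run = [], 0
    for v in wave + [0]:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    return widths


if __name__ == "__main__":
    clk = clock(16)                    # low 0-3, high 4-7, low 8-11, high 12-15
    en = [1] * 6 + [0] * 7 + [1] * 3   # enable falls at t=6 and rises at t=13, both while clk is high
    print("AND gating widths:", high_pulse_widths(and_gate(clk, en)))   # [2, 3]
    print("Latch ICG widths: ", high_pulse_widths(latch_icg(clk, en)))  # [4]
```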
2.
How did you implement power-saving in your design?
3.
How do you handle signals going from an ON to OFF domain and vice versa? How do you manage isolation, does it matter?
4.
AMD Xilinx - If clock gating logic is moved into the DEN (Data Enable), will this cause LEC (Logical Equivalence Check) to fail or not? Why?
5.
Static and dynamic power: what are some ways to reduce each?

Advanced RTL Topics

1.
What are the different ways of arbitrating for resources? What is the difference between Find First and Round Robin arbiters? What are some pros and cons: why might you want to use find first, and why might you want to use round robin? What about the HW cost tradeoffs? RTL code for these?
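A behavioral Python sketch of both policies (function names hypothetical). The fixed-priority (find-first) arbiter grants the lowest-numbered request, which is small and fast but can starve high-numbered requesters; the round-robin arbiter starts its search just after the last grant, which guarantees fairness at the cost of a pointer register and more logic.

```python
def fixed_priority_grant(req, n):
    """Find-first arbiter: grant the lowest-numbered request (bit 0 = highest priority).

    Returns a one-hot grant, or 0 if there are no requests.
    """
    req &= (1 << n) - 1
    return req & -req            # isolates the lowest set bit


def round_robin_grant(req, last, n):
    """Round-robin arbiter: search starts just after the previously granted index `last`.

    Returns (one-hot grant, new_last); the pointer is unchanged when nobody requests.
    """
    for offset in range(1, n + 1):
        idx = (last + offset) % n
        if (req >> idx) & 1:
            return 1 << idx, idx
    return 0, last


if __name__ == "__main__":
    last = 3
    for _ in range(5):           # all 4 requesters always active
        grant, last = round_robin_grant(0b1111, last, 4)
        print(f"grant={grant:04b}")
```

In RTL, req & -req is usually written req & (~req + 1). One common (not the only) round-robin structure uses two fixed-priority arbiters, one on the requests masked to positions after the pointer and one on the unmasked requests, taking the masked result when it is non-zero.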
2.
NVIDIA phone screen - Open ended question: You have to design a memory controller block. It is an SRAM memory that you are accessing. How would you design the block?

Things to think about:
a) What interfaces would you use? How many?
b) How would you arbitrate between requests?
c) How many ports does memory have?
d) What other blocks your memory controller should have?
3.
Apple phone screen - In a system, how would you prevent a RAW (read-after-write) hazard?

Design Verification

Topics

Design Verification involves writing tests for digital logic modules to cover all the different use-cases.

Topics Covered

Digital Design Concepts (Same as in RTL Design)
C Programming (Same as in RTL Design)
Scripting
  • Be very familiar with one scripting language
    • Industry uses a lot of TCL and Perl, but Python works well for entry level interviews
  • Questions usually involve large text files
    • So read about how to parse information from text files, and maybe some efficient ways to do it

Object Oriented Programming (OOP)

  • Since a lot of SystemVerilog constructs are OOP based, understanding all the OOP concepts is important
    • Sometimes this is completely skipped, but good to know anyway
  • Be able to come up with code examples for each of the OOP principles (like Classes, Objects, Abstraction, Encapsulation, etc)
  • Inheritance and Polymorphism are usually favorites (Finding out which class different objects belong to)

Things to Note

  • DV interviews are usually challenging because you are expected to also be a good designer. So I suggest preparing all the content of RTL design too.
    • Verilog design questions might be asked (Although they tend to be simpler)
  • If you have previous DV experience (even in an internship), then you also need to work on UVM and other DV concepts that you might have used
    • In this case, the RTL Design part will probably be skipped

Verification Methodologies & Strategies

1.
How would you test a memory black box?
2.
How do you verify a Round Robin arbiter?
3.
How would you test a FIFO?

UVM

1.
What is a UVM factory? What is the difference between type_id::create() and new()?
Solution:

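The UVM factory is a mechanism for creating objects and components by type. Classes are registered with it using the uvm_object_utils / uvm_component_utils macros, and instances are created with type_id::create("<inst_name>", ...). Because creation goes through the factory, a test can override a type (or a specific instance) with a derived class without changing the code that creates it. new() is the default SystemVerilog constructor; calling it directly bypasses the factory, so overrides do not apply.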

2.
What are the two different types of override that UVM supports?
Solution:

Class override and instance override

3.
What is the method that is called to run a test?
Solution:

run_test()

4.
What is a virtual interface?
Solution:

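A handle (reference) to an actual interface instance. Classes such as drivers and monitors cannot contain a static interface, so they access the DUT signals through a virtual interface. uvm_config_db is typically used to set this handle (e.g., from the top-level module) and to get it inside the component.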

5.
What is a virtual sequencer?
Solution:

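A sequencer that contains handles to other sequencers (typically the agents' sequencers), so that a virtual sequence running on it can coordinate stimulus across several interfaces.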

6.
What are the different phases of a component? What is the check_phase used for? Which phases are task(s) and which are function(s)?
Solution:

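Most commonly used phases: build_phase(), connect_phase(), run_phase(), check_phase(). check_phase() can be used, for example, to check the size of queues and flag an error if a queue that is expected to be empty at the end of simulation still has entries. run_phase() (like the other run-time phases, e.g., main_phase()) is a task because it consumes simulation time; the build, connect, end_of_elaboration, start_of_simulation, extract, check, report, and final phases are functions.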

7.
Draw a testbench diagram and explain each component. Explain the handshake between a sequencer and a driver.
Solution:

Testbench Components:

  • Test
  • Environment
    • Scoreboard
    • Predictor (optional)
    • Agent(s)
      • Monitor
      • Sequencer
      • Driver
  • Virtual sequencer (optional)
  • Coverage monitor (applicable if you do class based coverage collection)
  • Other models necessary in your testbench (e.g., memory model, etc.)

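Sequencer-Driver Handshake: A sequence generates transactions (start_item()/finish_item()), and the sequencer arbitrates between sequences and hands the items to the driver through a TLM connection (the driver's seq_item_port connected to the sequencer's seq_item_export). The driver pulls transactions from the sequencer using get_next_item() or try_next_item(), and when done processing, calls item_done() to signal completion back to the sequencer (and thus to the waiting sequence).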

8.
Write a monitor to capture a transaction. Transaction starts when vld is asserted and ends when vld is de-asserted. Transaction needs to be sent to scoreboard once vld is de-asserted.
Solution:

Note: All but 2 of my screening interviews had this question. The interviewers mainly want to see whether or not you understand some key components of a monitor. Interviewers don't care about syntax.

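// Parameterizing the interface is not necessary; you can add clk and rst as signals in the interface itself
interface intf (input logic clk, input logic rst);
  logic       vld;
  logic [7:0] data;
  // A clocking block is not necessary, but if you understand it, it will look good to the interviewer.
  // If you use a clocking block, the interviewer will ask you to explain what it is and what it does.
  clocking mon_cb @(posedge clk);
    input vld;
    input data;
  endclocking
endinterface

class trxn extends uvm_object;
  `uvm_object_utils(trxn)          // register with the factory so type_id::create() works
  logic [7:0] data_q[$];
  function new(string name = "trxn");
    super.new(name);
  endfunction
endclass

class mon extends uvm_monitor;
  `uvm_component_utils(mon)
  uvm_analysis_port #(trxn) m_port;
  virtual intf m_vif;              // vif = virtual interface

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    m_port = new("m_port", this);
    // Get the handle to the actual interface ("vif" is whatever key the top level used in set())
    if (!uvm_config_db#(virtual intf)::get(this, "", "vif", m_vif))
      `uvm_fatal("NOVIF", "virtual interface not found in uvm_config_db")
  endfunction

  // No connect_phase needed here: the analysis port is connected to its subscribers
  // in the environment, since the subscribers are not within the scope of this monitor.

  task run_phase(uvm_phase phase);
    // Forking is not necessary, but it is good coding practice
    fork
      mon_intf();
    join_none
  endtask

  task mon_intf();
    trxn data_trxn;
    bit  prev_vld = 0;
    forever begin
      @(m_vif.mon_cb);
      // vld rising: start a new transaction
      if (!prev_vld && m_vif.mon_cb.vld)
        data_trxn = trxn::type_id::create("data_trxn");
      // Collect data on every cycle that vld is high
      if (m_vif.mon_cb.vld)
        data_trxn.data_q.push_back(m_vif.mon_cb.data);
      // vld falling: the transaction is complete, send it to the scoreboard
      if (prev_vld && !m_vif.mon_cb.vld)
        m_port.write(data_trxn);   // broadcast to all subscribers of this analysis port
      prev_vld = m_vif.mon_cb.vld;
    end
  endtask
endclass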
9.
You have 5 parallel tasks running. Once any 4 tasks complete, the 5th remaining task should be terminated. How would you implement this behavior using SystemVerilog?
10.
Which phases are tasks and which are functions?
11.
Which phases are top-down and which are bottom-up?
12.
What is the difference between uvm_config_db and uvm_resource_db?
13.
Why do we need uvm_factory? (For easy type overriding without modifying the base code.)
14.
What is a virtual sequence/sequencer, and why do we need them?
15.
How do you pass a virtual interface through uvm_config_db?
16.
What is a UVM reactive sequence? My understanding: the stimulus is based on the response from the driver, via put_response()/get_response(). Note that you need to call set_id_info() on the response so it is routed from the driver back to the correct sequence.

SystemVerilog Testbench & Constraints

1.
Question from Meta DV role - Write a constraint for 4 monkeys having to share 10 bananas, making sure every monkey gets at least 1 banana
Solution:
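localparam int NUM_MONKEYS = 4;
rand int unsigned fruits_per_monkey[NUM_MONKEYS];
constraint fruits_per_monkey_c {
  fruits_per_monkey.sum() == 10;
  foreach (fruits_per_monkey[id]) {
    // At least 1 banana each; the upper bound also keeps the 32-bit unsigned
    // sum() from wrapping around to 10 with huge element values
    fruits_per_monkey[id] inside {[1:10]};
  }
}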
2.
Question from NVIDIA panel - You are generating a 32-bit random variable in SystemVerilog. Write a constraint to ensure that the randomized value does not contain five or more consecutive bits that are all 1's or all 0's. Solve without using the 'unique' or 'post_randomize' keyword.

In other words, within the 32-bit value, there should be no sequence of 5 contiguous bits that are all set or all unset.
Solution:
rand bit [31:0] arr;
constraint c {
  foreach (arr[i]) {
    if (i <= 27) {
      !(&arr[i+:5]);      // no 5 consecutive 1's
      !(&(~arr[i+:5]));   // no 5 consecutive 0's
    }
  }
}

Explanation:

  • foreach (arr[i]) - Iterates through each bit position
  • if (i <= 27) - Only check positions where we can fit 5 consecutive bits (positions 0-27)
  • !(&arr[i+:5]) - Ensures no 5 consecutive 1's (AND of 5 bits is 0)
  • !(&(~arr[i+:5])) - Ensures no 5 consecutive 0's (AND of inverted 5 bits is 0)
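A Python checker for the same property (names are hypothetical), which could be used in a scoreboard or to sanity-check randomized values. window_check is a direct translation of the constraint's sliding 5-bit windows; obeys_constraint expresses the same rule as a longest-run test.

```python
def longest_run(value, width=32):
    """Length of the longest run of identical bits in a `width`-bit value."""
    best = run = 1
    prev = value & 1
    for i in range(1, width):
        bit = (value >> i) & 1
        run = run + 1 if bit == prev else 1
        best = max(best, run)
        prev = bit
    return best


def obeys_constraint(value, width=32, max_run=4):
    """True if no max_run + 1 (here 5) contiguous bits are all 1s or all 0s."""
    return longest_run(value, width) <= max_run


def window_check(value):
    """Direct translation: for i in 0..27, bits [i+4:i] are neither 5'b11111 nor 5'b00000."""
    return all(((value >> i) & 0x1F) not in (0x00, 0x1F) for i in range(28))


if __name__ == "__main__":
    for v in (0xCCCCCCCC, 0xF0F0F0F0, 0xF8F8F8F8):
        print(hex(v), obeys_constraint(v), window_check(v))
```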
3.
You are given a queue of integers:

int array[$] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};

Write SystemVerilog constraints to divide the elements of this queue into three new queues (q1, q2, and q3) such that:

  1. Every element from the original queue appears in exactly one of the three new queues
  2. All three queues together contain unique elements (no duplicates)
  3. Each queue must have at least one element
  4. You cannot use post_randomize() to perform the split — it must be handled entirely within the constraint block

How would you approach this problem?
Solution:
module test();
  class temp;
    int array[$] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
    rand int idx[15];
    rand int q1[$], q2[$], q3[$];

    constraint c1 {
      // Total elements must match
      q1.size() + q2.size() + q3.size() == array.size();
      // Size bounds
      q1.size() < 10; q2.size() < 10; q3.size() < 10;
      // Each queue must have at least one element
      q1.size() > 0; q2.size() > 0; q3.size() > 0;
      // Generate unique shuffled indices; each must be a valid position in array
      foreach (idx[i]) { idx[i] inside {[0:array.size()-1]}; }
      unique {idx};
    }

    constraint assign_to_queue {
      // Assign elements using shuffled indices
      foreach (q1[i]) { q1[i] == array[idx[i]]; }
      foreach (q2[i]) { q2[i] == array[idx[i + q1.size()]]; }
      foreach (q3[i]) { q3[i] == array[idx[i + q1.size() + q2.size()]]; }
    }
  endclass

  initial begin
    temp t;
    t = new();
    assert (t.randomize());
    $display("Indices: %p", t.idx);
    $display("Queue 1: %p", t.q1);
    $display("Queue 2: %p", t.q2);
    $display("Queue 3: %p", t.q3);
    $finish;
  end
endmodule

Explanation:

  • idx[15] - Array of shuffled indices to randomize element assignment
  • unique {idx} - Ensures all indices are unique (no duplicates)
  • q1.size() + q2.size() + q3.size() == array.size() - Ensures all elements are distributed
  • q1.size() > 0; q2.size() > 0; q3.size() > 0 - Each queue has at least one element
  • The assign_to_queue constraint uses the shuffled indices to assign elements from the original array to the three queues
4.
Write SystemVerilog constraints for an integer array with 10 elements such that:

  1. Exactly three elements are identical (i.e., one value appears three times)
  2. All the remaining seven elements are unique

How would you implement this constraint?
Solution:
class aa;
  rand byte unsigned val[10], val3;
  constraint s1 {
    foreach (val[i]) {
      val[i] < 10;
      // Every value except val3 must appear only once
      val[i] != val3 -> val.sum() with (int'(item == val[i])) == 1;
    }
    // val3 must appear three times in the array
    val.sum() with (int'(item == val3)) == 3;
  }
endclass

Explanation:

  • val[10] - Array of 10 random unsigned bytes
  • val3 - The value that appears exactly 3 times
  • val[i] < 10 - Constrains each element to be less than 10
  • val[i] != val3 -> val.sum() with (int'(item==val[i])) == 1 - If an element is not equal to val3, it must appear exactly once
  • val.sum() with (int'(item==val3)) == 3 - val3 must appear exactly 3 times in the array
5.
Write a constraint to make sure every even index in an array is even and every odd index is odd.
Solution:
constraint x { foreach (arr[i]) { (i & 1'b1) == (arr[i] & 1'b1); } }
6.
Given a 32-bit value bit [31:0] val, write a constraint so that every randomization differs from the previous randomization in only 2 bits.
7.
SystemVerilog Multithreading Question:

What are some use cases for wait fork and disable fork? How would one kill thread_2() after fork-join_any below finishes and then continue waiting for remaining threads to finish? You can make changes to the code below if needed.

fork
thread_1();
thread_2();
thread_3();
join_any
// how would one kill thread_2() after fork-join_any finishes?
Answer:
fork
begin : tid_1
thread_1();
end
begin : tid_2
thread_2();
end
begin : tid_3
thread_3();
end
join_any
disable tid_2;
wait fork;
8.
SystemVerilog OOP Virtual Keyword Question:

Upcasting vs downcasting. What does "virtual" keyword do in SystemVerilog?

class base;
  function void print();
    $display("INSIDE BASE \n");
  endfunction : print
endclass : base
class derived extends base;
  // what are the prints when virtual keyword is 1) missing and 2) present?
  <virtual> function void print();
    $display("INSIDE DERIVED \n");
  endfunction : print
endclass : derived
initial begin
base b_h;
derived d_h;

b_h = new();
b_h.print();
//what's the print?

d_h = new();
d_h.print();
//what's the print?

b_h = d_h;
b_h.print();
//what's the print?
end
9.
SystemVerilog Thread Output Question:

What will be the output of the following code block? Update the code block to print the lines below (in any order).

Desired Output:
• thread id: 0
• thread id: 1
• thread id: 2

// Original Code:
task automatic run_thread(int thread_id);
$display("thread id: %0d", thread_id);
endtask

for (int i = 0; i < 3; i++) begin
fork
run_thread(i);
join_none
end
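// Updated Code:
task automatic run_thread(int thread_id);
$display("thread id: %0d", thread_id);
endtask

for (int i = 0; i < 3; i++) begin
fork
// Declare j directly in the fork block (not inside a nested begin-end):
// it is then initialized with the current i before the thread is spawned.
// Inside a nested begin-end, the initialization would only run when the
// spawned thread starts, so j would not reliably hold this iteration's i.
automatic int j = i;
run_thread(j);
join_none
end

wait fork;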
10.
SpaceX Constraint Question:

// Question:
class even_odd;
// array of integers for which we want a
// patterned series of N even values followed by
// N odd values
rand int array[];
rand int N;
// constraint to enforce the even-odd
// pattern
constraint pattern;
endclass: even_odd
/*
* Create a constraint which will ensure that
* the array is randomized according
* to our pattern
* EX (N = 2): {0, 2, 3, 5}
* EX (N = 5): {18, 8, 18, 24, 98, 9, 15, 33, 71, 1}
* EX (N = 1): {4, 1, 22, 9, 36}
*/
constraint even_odd::pattern{
}
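The examples suggest (an interpretation, since the question does not say so) that the pattern repeats with period 2N and that the array length need not be a multiple of 2N: with N = 1, {4, 1, 22, 9, 36} simply alternates even/odd. Under that reading, element i must be even exactly when (i / N) is even. A Python checker of that rule (the name is hypothetical), handy for validating whatever constraint you write:

```python
def matches_pattern(array, n):
    """True if `array` is N even values, then N odd values, repeating.

    Assumed reading of the examples: element i must be even exactly when
    (i // N) is even; the last group may be cut short.
    """
    if n <= 0:
        return False
    for i, value in enumerate(array):
        want_even = (i // n) % 2 == 0
        if (value % 2 == 0) != want_even:
            return False
    return True


if __name__ == "__main__":
    print(matches_pattern([0, 2, 3, 5], 2))                           # True
    print(matches_pattern([18, 8, 18, 24, 98, 9, 15, 33, 71, 1], 5))  # True
    print(matches_pattern([4, 1, 22, 9, 36], 1))                      # True
```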
11.
Write a constraint to generate a random number with exactly 5 bits set, where the set bits are consecutive 80% of the time
12.
NVIDIA Constraint Question: Randomize three poker cards each with different type and color.
13.
NVIDIA Constraint Question: Create a constraint for an int array with 10 elements, with values from 1 to 10. Two of the elements must have the same value and the rest must all be different; the indices of the 2 equal elements also have to be randomized.
14.
AMD DV Interview Question: Write constraint for an integer array with 10 elements such that exactly 3 of them are same and rest are unique
15.
YoE 4-5 Meta DV Interview question - Constraint: a 2D image is 320×240, and each pixel is 16 bits. Constrain each pixel to be less than the sum of its 4 neighbours (top, bottom, left, right).
16.
More SystemVerilog Constraint Problems:

1. Write constraint such that sum of arr[10] is 100 without using .sum method
2. Write constraint for an integer array with 10 elements such that exactly 3 of them are same and rest are unique
3. Write a constraint to randomize 3x3x3 array with unique elements
4. Write constraints to generate an MxN matrix with each element 0 or 1 and the sum of all elements less than MAX_SUM
5. Write constraints to pick a ball out of 10 different colored balls such that the color is not repeated in the next 3 draws
6. Write a constraint to divide values of 1 queue into 3 queues so that all 3 queues have unique elements
7. Write a constraint to generate a dynamic array of 300 elements. Each element can have the value 0, 1, 2, 3, or 4, and each of these values must appear more than 40 times in the array. The value 0 can be repeated consecutively, while 1/2/3/4 are not allowed to repeat consecutively. Ex: 001342.. is allowed (0 can be repeated); 0122431.. is not allowed (2 is repeated)
8. Generate a solved sudoku puzzle
9. For an 8-bit variable, if the previous randomization resulted in an odd value, the next randomization should be even with 75% probability; otherwise it should be even with 25% probability. Write a constraint
10. A stream of random numbers is being generated. How would you write constraints to make sure the four most recent numbers are unique?
11. Write a constraint to generate two dynamic arrays such that the array1 size is in [6:9] and the array2 size equals the array1 size. Array1 should be sorted in ascending order, while array2 should have all its values picked from array1
12. Write a constraint on a 4-bit variable such that the probability of values being the same on the lower two bits have only 5% chance
13. Write a constraint for generating leap years
14. Write a constraint to make sure every even item in an array is even and every odd item is odd.
15. a. The matrix size should be randomized to odd-sized square matrices only. b. Each square sub-matrix should have only one max element; the rest are less than the max and can repeat. c. The max elements of the square sub-matrices should be unique, meaning no two sub-matrices have the same max element. d. The constraint should scale up to any size, limited by 32 bits
16. Write a constraint for a 10-bit variable en so that 10% of the time 1 bit of en is high, 10% of the time 2 bits are high, ..., and 10% of the time all 10 bits are high.
17. Write a constraint to generate a number whose binary 1s are all grouped together
18. Write a constraint to randomly map elements from an array into N non-empty queues
19. Write code so each element from input_array appears in output_queues, no output queue should be empty
bit [7:0] input_array[]; int unsigned input_count; rand bit [7:0] output_queues[$][]; constraint test_c { output_queues.size() == input_count; }
20. The dynamic array size should be 300 and its elements can be 0, 1, 2, 3, 4 or 5. Each value should occur at least 40 times in the array. Also, no two 0s may appear consecutively.
21. Write a constraint so that adjacent elements in a 2D array are distinct (adjacent elements must not be the same), taking particular care with the boundary elements. The array is mxm, where m can be any integer
22. Write a constraint for a 2D array such that each row has a unique max value, and no row's max value equals the max value of any other row
23. Generate a parameterized array whose values form a magic square using constraints
24. How would you write a constraint to generate a random value with exactly 10 bits set to 1, no two of which are adjacent? This should be written without using $countones.
25. Write an SV constraint to limit the sum of the odd elements of an array to 30 and the sum of the even elements to 60
26. Write a constraint for a number to be a power of 4.
27. Write a constraint that generates ADD, MUL, SUB and NOP instructions such that no ADD instruction is repeated within 3 clock cycles and SUB is not repeated within the last 3 valid instructions. NOP is not a valid instruction
28. Write an SV constraint to generate two 3x3 matrices such that the min value in each matrix is unique
29. Write an SV constraint for a tic-tac-toe game. The matrix size is 3x3.
30. Write a constraint for an integer array with 10 elements such that exactly 3 of them are the same and the rest are unique
31. Write a constraint to randomize a 100-bit variable such that exactly 5 bits are 1s, and those 5 bits are always consecutive
32. Write a constraint to randomize an array such that one specific element is always a constant value, e.g., the element at index 5 is always 100.
33. Write a constraint to generate a random number with exactly 5 bits set, where the set bits are consecutive 80% of the time
34. Write a constraint for a square matrix, and then rotate it 90 degrees counterclockwise
35. Write a constraint for a 3x3x3 array to have unique elements
36. Say you have 4 instructions: ADD, SUB, MUL, NOP. They execute in X cycles, Y cycles, Z cycles, and 1 cycle respectively. How would you generate a constraint to inject these into a design so that two like instructions don't overlap during the same period of time? Bonus: Is there a way you could generate this without using SV constraints?
37. Write SystemVerilog constraints for the eight queens problem.
38. Write SystemVerilog constraints for the Knight's tour problem.
39. Add "size" entries to a queue, with each entry randomized between 0 and "size"
40. Write constraints to generate an n-bit value in which the number of bits set equals the number of bits that are zero
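Several of the bit-pattern problems above (#17, #24, #26, #31/#33 and #40) reduce to a few bitwise identities. The C predicates below are a sketch of a reference model or scoreboard check for constrained output, not SystemVerilog constraints themselves; the 100-bit case in #31 is narrowed to 64 bits, and all function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Portable population count (number of 1 bits). */
static int popcount64(uint64_t x) {
    int count = 0;
    while (x) {
        x &= x - 1;          /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* #17: all 1s form one contiguous run. */
bool ones_contiguous(uint64_t x) {
    if (x == 0) return false;
    while ((x & 1) == 0) x >>= 1;      /* strip trailing zeros */
    return (x & (x + 1)) == 0;         /* what remains must be 0b0..011..1 */
}

/* #26: power of 4 = exactly one bit set, at an even bit position. */
bool is_power_of_4(uint64_t x) {
    return x != 0 && (x & (x - 1)) == 0 && (x & 0x5555555555555555ULL) != 0;
}

/* #24: exactly k bits set and no two set bits adjacent. */
bool k_bits_no_adjacent(uint64_t x, int k) {
    return popcount64(x) == k && (x & (x >> 1)) == 0;
}

/* #31 / #33 (64-bit version): exactly k bits set, all consecutive. */
bool k_consecutive_ones(uint64_t x, int k) {
    return popcount64(x) == k && ones_contiguous(x);
}

/* #40: an n-bit value (n even, 1..64) with as many 1s as 0s. */
bool balanced_bits(uint64_t x, int n) {
    if (n <= 0 || n > 64 || (n % 2) != 0) return false;
    uint64_t mask = (n == 64) ? ~0ULL : ((1ULL << n) - 1);
    if (x & ~mask) return false;       /* value must fit in n bits */
    return popcount64(x) == n / 2;
}
```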

FIFO Design

1.
Sync FIFO + RTL code. Full/empty condition
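A common way to derive full/empty is to give each pointer one extra wrap bit: the FIFO is empty when the pointers are identical, and full when the address bits match but the wrap bits differ. The C reference model below sketches that scheme, assuming a power-of-2 depth; FIFO_DEPTH and the function names are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 8u                       /* assumed power of 2 */
#define PTR_MASK   (2u * FIFO_DEPTH - 1u)   /* address bits + 1 wrap bit */

typedef struct {
    uint32_t mem[FIFO_DEPTH];
    uint32_t wr_ptr;     /* log2(DEPTH) + 1 bits used */
    uint32_t rd_ptr;
} sync_fifo_t;

static bool fifo_empty(const sync_fifo_t *f) {
    return f->wr_ptr == f->rd_ptr;          /* same address, same wrap bit */
}

static bool fifo_full(const sync_fifo_t *f) {
    /* Same address bits, opposite wrap bit: writer is one lap ahead. */
    return (f->wr_ptr ^ f->rd_ptr) == FIFO_DEPTH;
}

bool fifo_push(sync_fifo_t *f, uint32_t data) {
    if (fifo_full(f)) return false;
    f->mem[f->wr_ptr % FIFO_DEPTH] = data;
    f->wr_ptr = (f->wr_ptr + 1u) & PTR_MASK;
    return true;
}

bool fifo_pop(sync_fifo_t *f, uint32_t *data) {
    if (fifo_empty(f)) return false;
    *data = f->mem[f->rd_ptr % FIFO_DEPTH];
    f->rd_ptr = (f->rd_ptr + 1u) & PTR_MASK;
    return true;
}
```

In RTL the same comparisons map directly to `empty = (wr_ptr == rd_ptr)` and a full check that compares the MSBs for inequality and the remaining bits for equality.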
2.
AMD Xilinx - Write Async FIFO RTL
3.
Gray-to-binary and binary-to-Gray conversion + RTL code
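Both conversions are pure XOR networks, so a C model mirrors the RTL almost line for line. A minimal sketch for 32-bit values (function names are illustrative):

```c
#include <stdint.h>

/* Binary -> Gray: each Gray bit is the XOR of two adjacent binary bits. */
uint32_t bin_to_gray(uint32_t bin) {
    return bin ^ (bin >> 1);
}

/* Gray -> binary: each binary bit is the XOR of all Gray bits at or above it.
 * The shifts fold that prefix XOR in log2(32) = 5 steps. */
uint32_t gray_to_bin(uint32_t gray) {
    gray ^= gray >> 16;
    gray ^= gray >> 8;
    gray ^= gray >> 4;
    gray ^= gray >> 2;
    gray ^= gray >> 1;
    return gray;
}
```

In Verilog the encoder is typically `assign gray = bin ^ (bin >> 1);`, while the decoder is usually written as a loop over the bits from the MSB down.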
4.
How do you design a non-power-of-2 depth async FIFO?
5.
Qualcomm phone screen -
a) A system has a 100MHz write clock. The write logic performs 16 write operations within 80 clock cycles. The write pattern is flexible (can be burst or random). What is the minimum read frequency required to ensure that the write operations are never backpressured?
b) After finding the frequency, what is the minimum depth that this FIFO should have?
6.
Question from NVIDIA - FIFO write: 250 MHz, FIFO read: 200 MHz, 70 valid data words in every 100 write cycles; calculate the FIFO depth.
Solution: Time to write one word = 4 ns (250 MHz). Time to write 70 words = 280 ns. Time to read one word = 5 ns (200 MHz). Words read in 280 ns = 56. FIFO depth = 70 - 56 = 14. The key point: the interviewer asked me to show the steps behind assuming the 70 words arrive as one back-to-back burst, since he originally only said 70 valid in 100 cycles.
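The arithmetic generalizes to: depth = burst length minus the words the reader drains while the burst is being written. A small C helper makes it easy to try both interpretations. If the 70 valid writes can sit at the end of one 100-cycle window and the next 70 at the start of the following window, the writer produces up to 140 back-to-back writes, which is the usual worst case in FIFO sizing; which case applies is worth confirming with the interviewer. The sketch ignores synchronizer latency and assumes the reader starts immediately; the function name is hypothetical:

```c
#include <stdint.h>

/* FIFO depth for a burst of 'burst' words, one word written per write period
 * and one word read per read period (periods in picoseconds). */
uint32_t fifo_depth(uint32_t burst, uint32_t t_wr_ps, uint32_t t_rd_ps) {
    uint64_t burst_time_ps = (uint64_t)burst * t_wr_ps;   /* time to write the burst */
    uint64_t words_read = burst_time_ps / t_rd_ps;         /* whole words drained meanwhile */
    if (words_read >= burst) return 0;                     /* reader keeps up: no backlog */
    return (uint32_t)(burst - words_read);
}
```

With 4 ns write and 5 ns read periods, `fifo_depth(70, 4000, 5000)` gives 14 and `fifo_depth(140, 4000, 5000)` gives 28.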
7.
Calculate the FIFO depth and width: the write side writes 18 bits/cycle at 60 MHz, the read side reads 20 bits/cycle at 36 MHz, and the burst size is 100 writes.
Solution: Width: 20 bits, with 30 entries (+4 for synchronization)
I assume a burst is 100 write cycles, not 100 bits. So 1800 bits are written in 100 cycles, while the read side drains (36/60 × 20/18) × 1800 = 1200 bits in the same time. That leaves 600 bits to buffer; 600 / 20 = 30 entries.
To account for synchronizer delay (assuming reading starts after a delay), add 3-4 more entries.
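The same reasoning with mismatched widths works in bits: count the bits written during the burst, subtract the bits read during the read cycles that elapse in that time, and round the backlog up to whole read-width entries. A minimal C sketch (hypothetical helper; synchronizer margin not included):

```c
#include <stdint.h>

/* Entries (rd_bits wide) needed when the writer sends wr_bits per cycle at
 * f_wr_mhz for burst_cycles cycles and the reader takes rd_bits per cycle
 * at f_rd_mhz. */
uint32_t fifo_entries(uint32_t burst_cycles, uint32_t wr_bits, uint32_t f_wr_mhz,
                      uint32_t rd_bits, uint32_t f_rd_mhz) {
    uint64_t bits_written = (uint64_t)burst_cycles * wr_bits;
    /* Whole read-clock cycles that elapse while the burst is written. */
    uint64_t read_cycles = (uint64_t)burst_cycles * f_rd_mhz / f_wr_mhz;
    uint64_t bits_read = read_cycles * rd_bits;
    if (bits_read >= bits_written) return 0;
    uint64_t backlog = bits_written - bits_read;
    return (uint32_t)((backlog + rd_bits - 1) / rd_bits);  /* round up */
}
```

For this question, `fifo_entries(100, 18, 60, 20, 36)` gives 60 read cycles, 1200 bits drained, a 600-bit backlog and 30 entries.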
8.
Question from NVIDIA - Number generator with synchronous reset: the first output after reset should be 1. Generate the sequence 1, 4, 9, 16, 25, 36, 49, … without using multipliers. How many adders are needed? (Hint: it’s not “square the number.”)
Solution: Use sum of consecutive odd numbers to generate squares.
Architecture:
• Register the output (out_r), resetting it to 1.
• Keep an odd-number counter (odd_count) that resets to 3 and increments by 2 each cycle.
• On each step: out_r ← out_r + odd_count.
Adders required: 2 (one for the odd counter, one for out_r + odd_count).
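The architecture relies on n² = 1 + 3 + 5 + … + (2n - 1). A cycle-level C model of the two registers and two adders can serve as a reference model for the RTL; each call to the hypothetical square_gen_step() stands for one clock edge, and both updates use the pre-edge values, as non-blocking assignments would:

```c
#include <stdint.h>

typedef struct {
    uint32_t out_r;   /* current output: 1, 4, 9, 16, ... */
    uint32_t odd_r;   /* next odd number to add: 3, 5, 7, ... */
} square_gen_t;

void square_gen_reset(square_gen_t *g) {
    g->out_r = 1;     /* first output after reset */
    g->odd_r = 3;
}

void square_gen_step(square_gen_t *g) {
    g->out_r = g->out_r + g->odd_r;   /* adder 1: accumulate the odd number */
    g->odd_r = g->odd_r + 2;          /* adder 2: next odd number */
}
```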
9.
Is there something you could change in your design so that you could use a single-port SRAM but still support continuous data coming in and going out?

Bus Interconnect Design (AXI/APB/Custom)

1.
How would you do an async (clock-domain) crossing for AXI?
2.
How would you merge 2 AXI interfaces (e.g., 2 masters that can communicate with the same slave)?
3.
How would you split an AXI interface (e.g., 1 master that can communicate with 2 slaves)?
4.
Combine the 2 above, i.e., 2 masters that can each communicate with 2 slaves independently
5.
How would you add a register slice between a master and a slave?
6.
If a master can put out multiple outstanding transactions, what are the pros and cons of issuing all transactions with the same ID compared to issuing each transaction with a different ID?
7.
Where exactly is AXI used, and how is it different from other buses like APB?
8.
What is meant by register slicing?
9.
What is meant by a split transaction?

Clock Domain Crossing (CDC)

1.
What CDC synchronization technique would you use for a 1-bit level signal when FRx is greater than 1.5 times FTx? (Tx - Transmitter/Sender, Rx- Receiver)
2.
What CDC technique would you use for a 1-bit pulse from slow Tx to fast Rx when pulses are infrequent?
3.
What CDC technique would you use for a 1-bit pulse from fast Tx to slow Rx when pulses are infrequent?
4.
AMD Xilinx - Design a reset synchronizer circuit.
5.
For an n-bit signal, how do the following CDC options compare:
  • a) Recirculation-mux with a synchronized ready
  • b) FSM-based synchronized req/ack (any frequency ratio, but limited throughput)
  • c) Asynchronous FIFO (higher throughput, extra hardware)
6.
What are CDC reconvergence and divergence issues?
7.
Why can't we just place a synchronizer after the AND gate?
8.
What method do you use to determine whether a CDC violation is real?

DFT & Testability

1.
What's a stuck-at fault?

Embedded Systems/Firmware

System Design & Architecture

1.
Design a thermostat (not just system design but also implement core functions)
2.
Design a system that monitors a sensor:
  • Block diagram of tasks and functionality
  • RTOS vs no RTOS decision
  • Data protection and passing mechanisms
  • Reliable periodic data transfer while another task sends data at a non-deterministic rate
3.
Design an RTOS for a safety-critical system with a sensor module, a safety microcontroller, and a motor controller
4.
Design a system that gets data from a GPS at a fixed rate (Hz) and sends data to the user on demand (store 15 minutes of data, handle data loss/corruption)
5.
Design an audio/video system with input, signal processing, and output to a display/speaker
6.
Design an elevator control system
7.
Design a traffic light controller with a state machine and different tasks
8.
Design a linked list - basic LeetCode question
9.
Design a stack (with an array) - basic LeetCode question
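A fixed-capacity array stack needs only an array and a top index; the interview discussion usually centers on overflow and underflow handling. A minimal C sketch (capacity and names are illustrative):

```c
#include <stdbool.h>
#include <stddef.h>

#define STACK_CAPACITY 16   /* fixed size chosen for illustration */

typedef struct {
    int data[STACK_CAPACITY];
    size_t top;             /* number of elements = next free slot */
} int_stack_t;

void stack_init(int_stack_t *s) { s->top = 0; }
bool stack_is_empty(const int_stack_t *s) { return s->top == 0; }
bool stack_is_full(const int_stack_t *s) { return s->top == STACK_CAPACITY; }

bool stack_push(int_stack_t *s, int value) {
    if (stack_is_full(s)) return false;     /* overflow */
    s->data[s->top++] = value;
    return true;
}

bool stack_pop(int_stack_t *s, int *value) {
    if (stack_is_empty(s)) return false;    /* underflow */
    *value = s->data[--s->top];
    return true;
}

bool stack_peek(const int_stack_t *s, int *value) {
    if (stack_is_empty(s)) return false;
    *value = s->data[s->top - 1];
    return true;
}
```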
10.
Apple Wireless firmware role interview - Why do we need digital communication and coding? What are the benefits?
11.
Apple Wireless firmware role interview - What is compression? Distinguish source vs. channel coding

Protocols & Interconnects

1.
Is there a pre-check needed for a bus lock?
2.
What mechanism does your block use to ensure an ack is received?
3.
Compare I2C vs. SPI vs. CAN: topology, bandwidth, addressing, reliability, use cases, and typical failure modes
4.
Does your IP use a shared or independent bus? How do you handle two requesters issuing at the same time?
5.
If a processor writes to memory, how does the communication happen? What kinds of read/write requests exist?
6.
How many cores does the fabric support? How do you make the IP configurable for different core counts?
7.
Open-ended: Connect one AHB primary to two AHB secondaries, each running on its own clock. How would you design the interconnect and CDC boundaries?

Embedded Coding Challenges

1.
Write a program to find the second largest element in an array
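One pass is enough if the largest and second-largest values are tracked together. The sketch below assumes "second largest" means the second largest distinct value (so {5, 5, 3} yields 3) and reports failure when there is no such value; the function name is hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Finds the second largest distinct value in one pass.
 * Returns false if the array has fewer than two distinct values. */
bool second_largest(const int *arr, size_t n, int *result) {
    bool have_first = false, have_second = false;
    int first = 0, second = 0;
    for (size_t i = 0; i < n; i++) {
        int x = arr[i];
        if (!have_first || x > first) {
            if (have_first) {            /* old max becomes the runner-up */
                second = first;
                have_second = true;
            }
            first = x;
            have_first = true;
        } else if (x != first && (!have_second || x > second)) {
            second = x;
            have_second = true;
        }
    }
    if (have_second) *result = second;
    return have_second;
}
```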
2.
In a sorted array, find the index of a target value. Provide O(log n) and discuss edge cases.
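A binary search over a half-open range [lo, hi) handles the common edge cases (empty array, target below the first or above the last element) without special-casing, and computing mid as lo + (hi - lo) / 2 avoids overflow. A minimal sketch, assuming ascending order; with duplicates it returns the index of any matching element:

```c
#include <stddef.h>

/* Returns the index of target in ascending arr[0..n-1], or -1 if absent. */
long binary_search(const int *arr, size_t n, int target) {
    size_t lo = 0, hi = n;                 /* search range is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;   /* no (lo + hi) overflow */
        if (arr[mid] < target) {
            lo = mid + 1;
        } else if (arr[mid] > target) {
            hi = mid;
        } else {
            return (long)mid;
        }
    }
    return -1;
}
```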
3.
Return the n-th node from the end of a singly linked list in one pass. State assumptions.
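The usual one-pass approach moves a lead pointer n nodes ahead, then advances lead and trail together until lead falls off the end. A sketch under the assumptions that n = 1 means the last node and that n = 0 or a list shorter than n returns NULL (the struct is illustrative):

```c
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

struct node *nth_from_end(struct node *head, size_t n) {
    if (n == 0) return NULL;
    struct node *lead = head;
    for (size_t i = 0; i < n; i++) {       /* put lead n nodes ahead */
        if (lead == NULL) return NULL;     /* list shorter than n */
        lead = lead->next;
    }
    struct node *trail = head;
    while (lead != NULL) {                 /* keep the n-node gap */
        lead = lead->next;
        trail = trail->next;
    }
    return trail;
}
```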
4.
Implement pow(x, y) for integers with fast exponentiation. Handle negative y and overflow.
5.
Write C code to detect if the system is little-endian or big-endian without stdlib helpers.
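The standard trick is to store a known multi-byte value and inspect the byte at the lowest address; reading an object's bytes through an unsigned char pointer is well-defined in C. A minimal sketch (function name is illustrative):

```c
/* Returns 1 on a little-endian host, 0 on a big-endian host. */
int is_little_endian(void) {
    unsigned int probe = 1;
    const unsigned char *first_byte = (const unsigned char *)&probe;
    return *first_byte == 1;   /* LSB stored first => little-endian */
}
```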
6.
Write a snippet to read an input key value with debouncing
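One common debouncing approach samples the key at a fixed tick and accepts a new level only after it has been seen for several consecutive samples. The sketch below assumes debounce_update() is called from a periodic timer tick (for example 1 ms) with the raw pin level; the sample count and names are hypothetical and would be tuned to the switch:

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBOUNCE_SAMPLES 5   /* e.g. 5 samples at a 1 ms tick = 5 ms */

typedef struct {
    bool stable;     /* debounced key state */
    uint8_t count;   /* consecutive samples that disagree with 'stable' */
} debounce_t;

/* Returns true exactly when the debounced state changes. */
bool debounce_update(debounce_t *d, bool raw) {
    if (raw == d->stable) {
        d->count = 0;             /* bounced back to the old level: restart */
        return false;
    }
    if (++d->count >= DEBOUNCE_SAMPLES) {
        d->stable = raw;          /* new level held long enough: accept it */
        d->count = 0;
        return true;
    }
    return false;
}
```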
7.
Implement a circular buffer that supports chunk read/write
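For chunk operations, the key observation is that any chunk touches at most two contiguous regions of the backing array: up to the end, then wrapped to the start. A minimal single-producer, single-context sketch using memcpy for each region (buffer size and names are illustrative; no locking):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RB_SIZE 16u   /* capacity in bytes */

typedef struct {
    uint8_t buf[RB_SIZE];
    size_t head;      /* next write index */
    size_t tail;      /* next read index */
    size_t count;     /* bytes currently stored */
} ring_t;

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Writes up to len bytes; returns how many were accepted (stops when full). */
size_t ring_write(ring_t *r, const uint8_t *src, size_t len) {
    size_t n = min_size(len, RB_SIZE - r->count);
    size_t first = min_size(n, RB_SIZE - r->head);   /* up to array end */
    memcpy(&r->buf[r->head], src, first);
    memcpy(&r->buf[0], src + first, n - first);      /* wrapped part */
    r->head = (r->head + n) % RB_SIZE;
    r->count += n;
    return n;
}

/* Reads up to len bytes into dst; returns how many were copied. */
size_t ring_read(ring_t *r, uint8_t *dst, size_t len) {
    size_t n = min_size(len, r->count);
    size_t first = min_size(n, RB_SIZE - r->tail);
    memcpy(dst, &r->buf[r->tail], first);
    memcpy(dst + first, &r->buf[0], n - first);
    r->tail = (r->tail + n) % RB_SIZE;
    r->count -= n;
    return n;
}
```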
8.
Scale a 5-bit value to a 16-bit value (use simple bitwise ops rather than division)
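Bit replication is the usual answer: shift the 5-bit value to the top and repeat its pattern into the lower bits. This closely approximates v × 65535 / 31 and maps 0 to 0x0000 and 31 to 0xFFFF exactly. A minimal sketch (function name is illustrative):

```c
#include <stdint.h>

uint16_t scale5_to_16(uint8_t v) {
    v &= 0x1F;                                    /* keep 5 bits */
    return (uint16_t)((v << 11) | (v << 6) | (v << 1) | (v >> 4));
}
```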
9.
Implement 64-bit × 64-bit multiply on a 32-bit architecture
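The schoolbook approach splits each operand into 32-bit halves and combines four 32 × 32 → 64-bit partial products, which 32-bit cores typically provide in one instruction (for example UMULL on 32-bit ARM) or via a compiler helper. A sketch producing the full 128-bit result as two 64-bit halves (function name is illustrative):

```c
#include <stdint.h>

void mul64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint64_t p0 = (uint64_t)a_lo * b_lo;   /* weight 2^0  */
    uint64_t p1 = (uint64_t)a_lo * b_hi;   /* weight 2^32 */
    uint64_t p2 = (uint64_t)a_hi * b_lo;   /* weight 2^32 */
    uint64_t p3 = (uint64_t)a_hi * b_hi;   /* weight 2^64 */

    /* Middle column: at most 3 * (2^32 - 1), so it cannot overflow. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    *lo = (mid << 32) | (uint32_t)p0;
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```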
10.
Swap even and odd bits
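Masking with 0xAAAAAAAA selects the odd bits and 0x55555555 the even bits; shifting each group by one position toward the other and ORing them swaps every pair. A minimal 32-bit sketch:

```c
#include <stdint.h>

/* bit 0 <-> bit 1, bit 2 <-> bit 3, ... */
uint32_t swap_even_odd_bits(uint32_t x) {
    return ((x & 0xAAAAAAAAu) >> 1) | ((x & 0x55555555u) << 1);
}
```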
11.
Get stack frame size without using assembly
12.
Apple Embedded Role Interview Question - Design an embedded system to count people entering a zoo through a turnstile. The turnstile has a single output wire connected to a microcontroller that generates a "high pulse" (low → high → low) each time the turnstile completes a full rotation (one person entering).

Questions:
  • How would you design the system to count the number of people who have entered using this signal?
  • How would you keep track of the current state of the turnstile within the microcontroller?
  • Suppose a person stops halfway through the turnstile, keeping the signal high until they move again. How would you handle this condition to ensure the count remains accurate?
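One way to answer all three bullets is a two-state machine that counts edges rather than levels: a person is counted once per complete low → high → low pulse, on the falling edge, so a line held high for any length of time is still counted exactly once. A sketch assuming turnstile_sample() receives each new sample of the input, from a periodic poll or a pin-change interrupt (names are hypothetical). If the counter is updated from an ISR it would need to be volatile and read atomically by the main loop, and a mechanical contact may also need debouncing:

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { TS_IDLE, TS_HIGH } ts_state_t;

typedef struct {
    ts_state_t state;   /* TS_HIGH while a rotation is in progress */
    uint32_t people;    /* completed pulses */
} turnstile_t;

void turnstile_sample(turnstile_t *t, bool level) {
    switch (t->state) {
    case TS_IDLE:
        if (level) t->state = TS_HIGH;   /* rising edge: rotation started */
        break;
    case TS_HIGH:
        if (!level) {                    /* falling edge: rotation complete */
            t->state = TS_IDLE;
            t->people++;
        }
        break;
    }
}
```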
13.
Write a C function that reads a single input line and prints the tokens in reverse order in the exact format shown below.
  • Delimiters: any non-alphanumeric character (treat consecutive delimiters as one split)
  • Do not use strtok/strtok_r/strsep. Use pointer arithmetic or your own scanning
  • Time: O(n)

Function to implement (one of):

void print_reversed_tokens(const char *s);

// or

size_t tokenize_reverse(const char *s, const char *out[], size_t max_out); // returns count, caller prints

Output format (exact):

<TOKEN_COUNT> tokens: [<TOKEN_k>|<TOKEN_k-1>|...|<TOKEN_1>]

Examples:

Input: "Hello, Apple! team-2025 :)"
Output: 4 tokens: [2025|team|Apple|Hello]
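Under the stated rule, '-' is a delimiter, so "team-2025" splits into "team" and "2025", and the sample input yields 4 tokens: [2025|team|Apple|Hello]. The sketch below uses two O(n) scans: a forward pass to count tokens for the prefix, then a backward pass that emits tokens right to left, so no token storage is needed. It formats into a caller buffer (a hypothetical helper, easier to test) and print_reversed_tokens() simply prints that buffer:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Delimiter = any non-alphanumeric character. */
static int is_delim(char c) { return !isalnum((unsigned char)c); }

/* Appends len bytes to out without overflowing; out stays NUL-terminated. */
static void append(char *out, size_t out_size, size_t *used, const char *src, size_t len) {
    while (len-- > 0 && *used + 1 < out_size) out[(*used)++] = *src++;
    if (out_size > 0) out[*used] = '\0';
}

/* Formats "<N> tokens: [tok_N|...|tok_1]" into out and returns N. */
size_t format_reversed_tokens(const char *s, char *out, size_t out_size) {
    size_t count = 0, used = 0;
    for (const char *p = s; *p; p++) {                     /* pass 1: count */
        if (!is_delim(*p) && (p == s || is_delim(p[-1]))) count++;
    }
    char prefix[48];
    int k = snprintf(prefix, sizeof prefix, "%zu tokens: [", count);
    append(out, out_size, &used, prefix, (size_t)k);

    const char *end = s + strlen(s);                       /* pass 2: backward */
    int first = 1;
    while (end > s) {
        while (end > s && is_delim(end[-1])) end--;        /* skip delimiters */
        if (end == s) break;
        const char *start = end;
        while (start > s && !is_delim(start[-1])) start--; /* find token start */
        if (!first) append(out, out_size, &used, "|", 1);
        append(out, out_size, &used, start, (size_t)(end - start));
        first = 0;
        end = start;
    }
    append(out, out_size, &used, "]", 1);
    return count;
}

void print_reversed_tokens(const char *s) {
    char line[1024];   /* sketch limit: longer output is truncated */
    format_reversed_tokens(s, line, sizeof line);
    puts(line);
}
```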
14.
Order Structure (2021 Amazon Embedded SWE OA) - Write a function that takes an `OrderBatch` data structure and serializes that structure along with the orders it contains into the defined output format.

Function signature:

FuncStatus_t serialize_order(const struct OrderBatch *order_batch, const size_t out_max_length, uint8_t *out);

Input parameters:

  • const struct OrderBatch *order_batch: a pointer to an `OrderBatch` struct to be serialized.
  • const size_t out_max_length: the number of bytes allocated in `out`. Do not put serialized data outside of the pre-allocated buffer.

Output parameters:

  • uint8_t *out: the pre-allocated buffer to store the serialized `OrderBatch` data.

Return:

  • FuncStatus_t: the status of serialization effort. See enum descriptions below.

Provided non-standard type definitions:

Structures:

struct OrderBatch {
    uint32_t order_count;
    uint16_t batch_id;
    struct Order *orders;
};

struct Order {
    uint16_t quantity;
    uint64_t order_id;
    uint8_t part_number[16];
    char email_address[32]; /* NULL terminated string */
};

FuncStatus_t:

typedef enum {
    STATUS_SUCCESS,                    /* Conversion performed successfully, 'out' */
    STATUS_INSUFFICIENT_OUTPUT_BUFFER, /* Unable to serialize the input data stru */
    STATUS_NULL_INPUT                  /* OrderBatch struct is a null pointer, 'out' */
} FuncStatus_t;

Background:

The serialized output data will have the following format:

  • Note 1: Assume the host is little-endian.
  • Note 2: You may receive an `OrderBatch` where `order_count` is less than 1.
  • Note 3: Payload Len is the number of bytes in the serialized input including the first 10 bytes (i.e., `0xFACE` and Payload Len).

Serialized Layout:

+--------+-------------+-------------+----------+----------+ ... +----------+
| 0xFACE | Payload Len | Order count | Batch ID | Order #1 | ... | Order #N |
|  2 B   |     8 B     |     4 B     |   8 B    |   58 B   | ... |   58 B   |
+--------+-------------+-------------+----------+----------+ ... +----------+

Each Order (58 bytes total):

+----------+----------+-------------+-------------------------------+
| Quantity | Order ID | Part Number | Email Address                 |
|   2 B    |   8 B    |    16 B     | 32 B (NULL-terminated/padded) |
+----------+----------+-------------+-------------------------------+

Examples:

Example 1

Inputs:

OrderBatch(2, 42, ordersPtr)

ordersPtr[0]:
    quantity      = 8
    order_id      = 12
    part_number   = {0,1,2,3,4,5,6,7, 0,1,2,3,4,5,6,7} // 16 bytes
    email_address = "xyz@abc.com"

ordersPtr[1]:
    quantity      = 2
    order_id      = 14
    part_number   = {1,2,3,4,5,6,7,8, 9,0,1,2,3,4,5,6} // 16 bytes
    email_address = "abc@abc.com"

Output (hex dump):

CE FA 8A 00 00 00 00 00 00 00 02 00 00 00 2A 00 00 00 00 00 00 00 08 00 0C 00 00 00 00 00 00 00 00 01 02 03 04 05 06 07 00 01 02 03 04 05 06 07 78 79 7A 40 61 62 63 2E 63 6F 6D 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00 0E 00 00 00 00 00 00 00 01 02 03 04 05 06 07 08 09 00 01 02 03 04 05 06 61 62 63 40 61 62 63 2E 63 6F 6D 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Returns:

STATUS_SUCCESS
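A possible solution writes every field byte by byte in little-endian order instead of copying the structs with memcpy: struct padding and host layout would otherwise leak into the output, and the 16-bit batch_id has to be widened to an 8-byte field. The sketch checks the buffer size up front so nothing is written on failure. Treating a NULL `out`, or a non-empty batch with a NULL `orders` array, as STATUS_NULL_INPUT is an assumption, as are the put_le helper and the size macros:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct Order { uint16_t quantity; uint64_t order_id; uint8_t part_number[16]; char email_address[32]; };
struct OrderBatch { uint32_t order_count; uint16_t batch_id; struct Order *orders; };
typedef enum { STATUS_SUCCESS, STATUS_INSUFFICIENT_OUTPUT_BUFFER, STATUS_NULL_INPUT } FuncStatus_t;

#define HEADER_BYTES 22u  /* 0xFACE(2) + Payload Len(8) + Order count(4) + Batch ID(8) */
#define ORDER_BYTES  58u  /* Quantity(2) + Order ID(8) + Part Number(16) + Email(32) */

/* Stores the low nbytes of value little-endian; returns the next write position. */
static uint8_t *put_le(uint8_t *p, uint64_t value, size_t nbytes) {
    for (size_t i = 0; i < nbytes; i++) {
        p[i] = (uint8_t)(value >> (8 * i));
    }
    return p + nbytes;
}

FuncStatus_t serialize_order(const struct OrderBatch *order_batch,
                             const size_t out_max_length, uint8_t *out) {
    if (order_batch == NULL || out == NULL ||
        (order_batch->order_count > 0 && order_batch->orders == NULL)) {
        return STATUS_NULL_INPUT;
    }
    /* 64-bit math so a large order_count cannot wrap around. */
    uint64_t total = HEADER_BYTES + (uint64_t)order_batch->order_count * ORDER_BYTES;
    if (total > out_max_length) {
        return STATUS_INSUFFICIENT_OUTPUT_BUFFER;     /* nothing written */
    }

    uint8_t *p = out;
    p = put_le(p, 0xFACE, 2);
    p = put_le(p, total, 8);                          /* includes the header */
    p = put_le(p, order_batch->order_count, 4);
    p = put_le(p, order_batch->batch_id, 8);          /* uint16_t widened to 8 bytes */

    for (uint32_t i = 0; i < order_batch->order_count; i++) {
        const struct Order *o = &order_batch->orders[i];
        p = put_le(p, o->quantity, 2);
        p = put_le(p, o->order_id, 8);
        memcpy(p, o->part_number, sizeof o->part_number);
        p += sizeof o->part_number;
        /* Copy the email up to its terminator and zero-pad the rest. */
        memset(p, 0, sizeof o->email_address);
        for (size_t k = 0; k < sizeof o->email_address && o->email_address[k] != '\0'; k++) {
            p[k] = (uint8_t)o->email_address[k];
        }
        p += sizeof o->email_address;
    }
    return STATUS_SUCCESS;
}
```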
15.
Starting in a bit (2021 Amazon Embedded SWE OA) - Write an algorithm to find the starting bit position of the first occurrence of a specific 32-bit, big-endian pattern (`0xFE6B2840`) within a given byte array. The input byte array is in network byte order. A crucial detail is that the pattern may or may not be byte-aligned in the input.

Function signature:

int findPattern(const uint32_t numBytes, const uint8_t data[]);

Description of function parameters:

Input parameters:

  • uint32_t numBytes: The number of bytes in the array named `data`.
  • uint8_t data[]: The byte stream of data to search.

Return:

  • -1: Returned if the given pattern (`0xFE6B2840`) is not found.
  • -2: Returned if the input data is `NULL` or the size of the `data` is insufficient to find the pattern (`0xFE6B2840`).
  • Otherwise: the pattern is found. Return the starting bit position of the pattern (`0xFE6B2840`).

Background:

Two C library functions, htonl() and ntohl(), are provided for endian conversion:

  • uint32_t htonl(uint32_t hostlong); (host to network order)
  • uint32_t ntohl(uint32_t netlong); (network to host order)

Note: Network byte order is big-endian, host byte order is little-endian.

Examples:

Example 1 - Byte Aligned:

Inputs:

numBytes: 8
data: [ 0x00, 0x01, 0xFE, 0x6B, 0x28, 0x40, 0x02, 0x03 ]

Returns:

16 // Starting position is here at bit 16

Example 2 - Non-byte Aligned:

This is the same as Example 1, left-shifted by 1 bit.

Inputs:

numBytes: 8
data: [ 0x00, 0x03, 0xFC, 0xD6, 0x50, 0x80, 0x04, 0x06 ]

Returns:

15 // Starting position is here at bit 15 // (least significant bit of the second byte of the input)

Starter Code:

#include <arpa/inet.h>

int findPattern(const uint32_t numBytes, const uint8_t data[]) {
    // Implement your code here!
}
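A straightforward solution shifts the stream, MSB of data[0] first (network order), through a 32-bit window one bit at a time and compares the window with the pattern after every bit, which handles aligned and unaligned matches identically. Bit positions count from 0 at the MSB of data[0], matching both examples. This sketch does not need htonl()/ntohl(), because the comparison happens in stream order; a byte-at-a-time or 64-bit window would be faster but is harder to verify in an OA:

```c
#include <stddef.h>
#include <stdint.h>

#define PATTERN 0xFE6B2840u

int findPattern(const uint32_t numBytes, const uint8_t data[]) {
    if (data == NULL || numBytes < 4) {
        return -2;                               /* too short to hold 32 bits */
    }
    uint32_t window = 0;
    uint64_t total_bits = (uint64_t)numBytes * 8;
    for (uint64_t bit = 0; bit < total_bits; bit++) {
        uint32_t b = (data[bit / 8] >> (7 - (bit % 8))) & 1u;  /* MSB first */
        window = (window << 1) | b;              /* oldest bit drops off the top */
        if (bit >= 31 && window == PATTERN) {
            return (int)(bit - 31);              /* position of the first pattern bit */
        }
    }
    return -1;
}
```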
16.
Read data from the flash (2021 Amazon Embedded SWE OA) - Implement a wrapper driver API to read an arbitrary number of bytes from a custom flash memory that can have "bad bytes" (unreliable data). You must use an existing `read_8b` API that reads exactly 8 bytes at a time and provides a bit mask indicating which bytes are invalid.

Function signature:

int read(uint8_t* buffer, uint32_t n_bytes, uint32_t offset);

Existing API - read_8b:

int read_8b(uint8_t* buf, uint8_t* mask);

Description of read_8b:

  • Reads exactly 8 consecutive bytes from flash memory into buf
  • Returns 0 on success, 1 when end of flash memory is reached
  • mask is an 8-bit value where each bit corresponds to a byte in buf
  • Bit i in mask corresponds to buf[i]
  • If mask bit i is 1, then buf[i] is a "bad byte" (invalid)
  • If mask bit i is 0, then buf[i] is valid

Mask bit mapping:

| BIT 7  | BIT 6  | BIT 5  | BIT 4  | BIT 3  | BIT 2  | BIT 1  | BIT 0  |
|--------|--------|--------|--------|--------|--------|--------|--------|
| buf[7] | buf[6] | buf[5] | buf[4] | buf[3] | buf[2] | buf[1] | buf[0] |

Example of mask usage:

If read_8b returns mask = 5 (binary 00000101):

  • buf[0] is invalid (bit 0 = 1)
  • buf[1] is valid (bit 1 = 0)
  • buf[2] is invalid (bit 2 = 1)
  • buf[3] through buf[7] are valid (bits 3-7 = 0)

Your read function requirements:

  • Read n_bytes of valid data from flash memory
  • Skip the first offset valid bytes before starting to store data in buffer
  • Skip any "bad bytes" encountered during reading
  • Return the actual number of valid bytes successfully read and stored in buffer
  • Stop reading when end of flash memory is reached (when read_8b returns 1)

Examples:

Example 1:

Input:

Data in flash memory: [0x03, 0x04, 0xa2, 0xa6, 0x10, 0x73, 0xff, 0x99]
Corresponding bit mask: 0x01 (only buf[0] is bad)
n_bytes: 5
offset: 1

Output:

buffer: [0xa2, 0xa6, 0x10, 0x73, 0xff]
Return: 5

Example 2:

Input:

Data in flash memory: [0x83, 0x84, 0xa2, 0xa6, 0x18, 0x73, 0xff, 0x99, 0x53, 0x2a, 0x83, 0xf3, 0xab, 0xea, 0x11, 0x38]
Corresponding bit masks: [0x01, 0x05] (only buf[0] is bad in the first block; buf[0] and buf[2] are bad in the second block)
n_bytes: 11
offset: 0

Output:

buffer: [0x84, 0xa2, 0xa6, 0x18, 0x73, 0xff, 0x99, 0x2a, 0xf3, 0xab, 0xea]
Return: 11

Example 3:

Input:

Data in flash memory: [0x83, 0x84, 0xa2, 0xa6, 0x18, 0x73, 0xff, 0x99, 0x53, 0x2a, 0x83, 0xf3, 0xab, 0xea, 0x11, 0x38]
Corresponding bit masks: [0x01, 0x05] (same as Example 2)
n_bytes: 11
offset: 3

Output:

buffer: [0x18, 0x73, 0xff, 0x99, 0x2a, 0xf3, 0xab, 0xea, 0x11, 0x38]
Return: 10

Starter Code:

// Assume read_8b is already implemented
int read_8b(uint8_t* buf, uint8_t* mask);

int read(uint8_t* buffer, uint32_t n_bytes, uint32_t offset) {
    // Implement your code here!
}
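A sketch of the driver that follows the written requirements: it pulls 8-byte blocks, drops bytes whose mask bit is set, skips the first `offset` valid bytes, and stops at n_bytes or end of flash. It assumes each read_8b() call returns the next sequential block and that no data is returned when it reports end of flash. The function is named flash_read here (the prompt calls it read) to avoid clashing with POSIX read(). The mock read_8b and mock_rewind() are hypothetical test scaffolding using the flash contents and masks [0x01, 0x05] from Examples 2 and 3:

```c
#include <stdint.h>
#include <string.h>

int read_8b(uint8_t *buf, uint8_t *mask);   /* provided by the platform */

int flash_read(uint8_t *buffer, uint32_t n_bytes, uint32_t offset) {
    uint8_t block[8];
    uint8_t mask;
    uint32_t skipped = 0, stored = 0;

    while (stored < n_bytes) {
        if (read_8b(block, &mask) != 0) {
            break;                                  /* end of flash */
        }
        for (int i = 0; i < 8 && stored < n_bytes; i++) {
            if (mask & (1u << i)) continue;         /* bad byte: drop it */
            if (skipped < offset) {                 /* still consuming offset */
                skipped++;
                continue;
            }
            buffer[stored++] = block[i];
        }
    }
    return (int)stored;
}

/* --- Hypothetical mock of read_8b: two 8-byte blocks, then end of flash --- */
static const uint8_t mock_data[16] = {0x83, 0x84, 0xa2, 0xa6, 0x18, 0x73, 0xff, 0x99,
                                      0x53, 0x2a, 0x83, 0xf3, 0xab, 0xea, 0x11, 0x38};
static const uint8_t mock_masks[2] = {0x01, 0x05};
static unsigned mock_block = 0;

void mock_rewind(void) { mock_block = 0; }

int read_8b(uint8_t *buf, uint8_t *mask) {
    if (mock_block >= 2) return 1;
    memcpy(buf, &mock_data[8 * mock_block], 8);
    *mask = mock_masks[mock_block];
    mock_block++;
    return 0;
}
```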

Concurrency, Synchronization & OS

1.
Explain the difference between process and thread: contrast memory layout, scheduling, IPC costs, and context-switch overhead
2.
Read a 64-bit timer exposed as two 32-bit registers (high and low) and return it as a 64-bit value without race conditions
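The race arises when the low half wraps between the two reads. A common fix is to read high, then low, then high again, and retry if the two high reads differ, so both halves come from the same instant. The sketch assumes a free-running counter without hardware latching; the register pointers are parameters here, whereas real code would use fixed MMIO addresses:

```c
#include <stdint.h>

uint64_t read_timer64(const volatile uint32_t *hi_reg, const volatile uint32_t *lo_reg) {
    uint32_t hi, lo, hi_again;
    do {
        hi = *hi_reg;
        lo = *lo_reg;
        hi_again = *hi_reg;        /* changed => lo wrapped in between */
    } while (hi != hi_again);
    return ((uint64_t)hi << 32) | lo;
}
```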
3.
Design a method to handle floating point operations in RTOS (assume no GPU)
4.
How would you design thread-safety mechanisms differently for data transfer to NAND flash vs. a serial port?
5.
Explain exactly what happens during a context switch from Thread A to Thread B. What registers and state must be saved/restored? How does this differ from a process context switch?
6.
Your RTOS guarantees a maximum interrupt latency of 10μs. What factors contribute to interrupt latency, and how would you measure and optimize it in your system?
7.
Implement priority inheritance for a mutex. Explain the scenario where priority inversion occurs and how your implementation prevents it.
8.
Explain how a virtual address is translated to a physical address. Walk through the complete MMU process including TLB lookup and page table walk.
9.
What's the difference between a page fault and a segmentation fault? Provide scenarios where each would occur.
10.
Quickly design a circular buffer (also called ring buffer). Follow up question: Make it thread safe.
11.
Show how to determine if the stack is growing upward or downward

Memory & Drivers

1.
Explain the volatile keyword (the favourite of every company)
2.
Implement memcpy and memmove (follow-up: why are they different?)
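The difference is overlap: memcpy may assume the regions are disjoint, while memmove must behave as if the source were copied to a temporary first. A byte-wise sketch (the my_ names are illustrative; real libc versions copy word-at-a-time). memmove copies backward when the destination starts inside the source range, since a forward copy would overwrite source bytes before reading them; comparing addresses through uintptr_t assumes a flat address space:

```c
#include <stddef.h>
#include <stdint.h>

/* Caller guarantees no overlap, so a forward copy is always safe. */
void *my_memcpy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) *d++ = *s++;
    return dst;
}

/* Regions may overlap. */
void *my_memmove(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    if ((uintptr_t)d > (uintptr_t)s && (uintptr_t)d < (uintptr_t)s + n) {
        while (n--) d[n] = s[n];        /* dst inside src: copy backward */
    } else {
        while (n--) *d++ = *s++;        /* forward copy is safe */
    }
    return dst;
}
```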
3.
Design malloc without using system calls (from scratch)
4.
Write a simple driver for a protocol: includes length field, data, and checksum
5.
Design OTA update for all software stacks (firmware/driver/kernel to application layer) with different rollback mechanisms
6.
Apple Core OS question - Process a continuous ADC buffer (12-bit, ISR-sampled) and return valid samples.
7.
Apple Core OS question - Explain the path for a user-space read/write request to hardware

Cache & Coherency

1.
How does cache work? - explain cache lines, tags, hit/miss, write policies (through/back)
2.
Qualcomm phone screen - What are the different types of cache?
3.
Qualcomm phone screen - If you have a 32-bit address coming in, 128 cache lines, and 64 B per cache line, how will you use the address to determine where the cache line should go for each of these cache types?
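With 64 B lines, the low 6 bits are the byte offset in every organization; what changes is the number of sets, which is 128 / ways. Direct-mapped (1 way) uses 7 index bits and a 19-bit tag, fully associative (128 ways) has no index and a 26-bit tag, and an N-way cache sits in between. A small C helper that splits an address this way (names are hypothetical; sizes are assumed to be powers of 2):

```c
#include <stdint.h>

typedef struct {
    unsigned offset_bits, index_bits, tag_bits;
    uint32_t offset, index, tag;
} addr_fields_t;

static unsigned log2u(uint32_t x) {     /* x assumed to be a power of 2 */
    unsigned n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

/* ways = 1: direct-mapped; ways = lines: fully associative. */
addr_fields_t split_address(uint32_t addr, uint32_t lines, uint32_t line_bytes, uint32_t ways) {
    addr_fields_t f;
    uint32_t sets = lines / ways;
    f.offset_bits = log2u(line_bytes);
    f.index_bits = log2u(sets);
    f.tag_bits = 32 - f.offset_bits - f.index_bits;
    f.offset = addr & (line_bytes - 1);
    f.index = (addr >> f.offset_bits) & (sets - 1);
    f.tag = (uint32_t)((uint64_t)addr >> (f.offset_bits + f.index_bits));
    return f;
}
```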
4.
Qualcomm phone screen - Suppose you have a 16-way set associative cache and very uneven incoming traffic: it only accesses cache lines 0, 1, 2 and 3 and never the rest. How will you improve the overall performance of this system?
5.
How is a cache designed in terms of the locality rules? How can we design a cache to exploit spatial and temporal locality?
6.
How do we know when to replace contents in a cache? Wouldn't it be nice to have an infinitely large cache?
7.
How to optimize access based on the cache size?
8.
How does the cache line size affect our design?
9.
What is coherency?
10.
Implement the Least Recently Used (LRU) algorithm / an LRU cache
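For a small, hardware-style fully associative cache, a per-entry timestamp is the simplest correct LRU: every hit refreshes the entry's timestamp, and a miss on a full cache evicts the entry with the oldest one. This is O(capacity) per operation; the O(1) software version combines a hash map with a doubly linked list. A minimal sketch with illustrative names and capacity:

```c
#include <stdbool.h>
#include <stdint.h>

#define LRU_CAPACITY 4

typedef struct {
    bool valid;
    int key, value;
    uint64_t last_used;   /* logical time of the most recent access */
} lru_entry_t;

typedef struct {
    lru_entry_t entries[LRU_CAPACITY];
    uint64_t clock;       /* increments on every access */
} lru_cache_t;

bool lru_get(lru_cache_t *c, int key, int *value) {
    for (int i = 0; i < LRU_CAPACITY; i++) {
        if (c->entries[i].valid && c->entries[i].key == key) {
            c->entries[i].last_used = ++c->clock;   /* now most recent */
            *value = c->entries[i].value;
            return true;
        }
    }
    return false;
}

void lru_put(lru_cache_t *c, int key, int value) {
    int victim = -1;
    for (int i = 0; i < LRU_CAPACITY; i++) {        /* 1) key already present? */
        if (c->entries[i].valid && c->entries[i].key == key) { victim = i; break; }
    }
    if (victim < 0) {                               /* 2) free slot, else LRU */
        victim = 0;
        for (int i = 0; i < LRU_CAPACITY; i++) {
            if (!c->entries[i].valid) { victim = i; break; }
            if (c->entries[i].last_used < c->entries[victim].last_used) victim = i;
        }
    }
    c->entries[victim] = (lru_entry_t){ true, key, value, ++c->clock };
}
```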
11.
Is there a reason why we don't care about coherency?
12.
A core has a cache line in modified state in its L2. A non-coherent requestor wants to write a new value to the same cache line. What might be some hazards?

Advanced / Unique Questions

1.
Design a software timer handler for 10 tasks, each with its own timeout and callback, using only one HW timer
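One simple design programs the single hardware timer for a fixed periodic tick and, in its interrupt, decrements every active software timer, firing the callback and either reloading (periodic) or disarming (one-shot) on expiry. A common lower-power alternative keeps timers in a sorted delta list and reprograms the hardware timer for the next expiry only. The tick-based sketch below uses hypothetical names; in a real system the callbacks should be short or deferred, and start/stop calls from task context need a critical section:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SW_TIMERS 10

typedef void (*timer_cb_t)(void *arg);

typedef struct {
    bool active;
    uint32_t remaining;   /* ticks until expiry */
    uint32_t period;      /* reload value; 0 = one-shot */
    timer_cb_t cb;
    void *arg;
} sw_timer_t;

static sw_timer_t sw_timers[MAX_SW_TIMERS];

bool sw_timer_start(unsigned id, uint32_t ticks, bool periodic, timer_cb_t cb, void *arg) {
    if (id >= MAX_SW_TIMERS || ticks == 0 || cb == NULL) return false;
    sw_timers[id] = (sw_timer_t){ true, ticks, periodic ? ticks : 0, cb, arg };
    return true;
}

void sw_timer_stop(unsigned id) {
    if (id < MAX_SW_TIMERS) sw_timers[id].active = false;
}

/* Called from the hardware timer's periodic interrupt, once per tick. */
void sw_timer_tick(void) {
    for (unsigned i = 0; i < MAX_SW_TIMERS; i++) {
        sw_timer_t *t = &sw_timers[i];
        if (!t->active) continue;
        if (--t->remaining == 0) {
            if (t->period) t->remaining = t->period;   /* periodic: reload */
            else t->active = false;                    /* one-shot: disarm */
            t->cb(t->arg);
        }
    }
}

/* Example callback: counts how many times it fired. */
void count_expiry(void *arg) { (*(int *)arg)++; }
```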
2.
Design a charging system manager for M machines and N charging points (M>N) with plug/unplug APIs based on threshold and charge level
3.
Encode/decode functions:
  • Encode a string inside an image without visually distorting it
  • Decoder retrieves the string from the encoded image
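A standard technique is least-significant-bit (LSB) steganography: each message bit replaces the LSB of one 8-bit sample, changing it by at most 1 out of 255, which is not visible. The sketch stores the message MSB first, including its terminating NUL so the decoder knows where to stop, and assumes raw or losslessly stored pixel data (lossy compression such as JPEG would destroy the LSBs); the function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 0 on success, -1 if the image has too few samples. */
int lsb_encode(uint8_t *pixels, size_t n_pixels, const char *msg) {
    size_t n_bytes = strlen(msg) + 1;               /* include the NUL */
    if (n_bytes > n_pixels / 8) return -1;          /* check before writing */
    for (size_t k = 0; k < n_bytes; k++) {
        unsigned char c = (unsigned char)msg[k];
        for (int i = 0; i < 8; i++) {
            uint8_t b = (c >> (7 - i)) & 1u;        /* MSB first */
            uint8_t *px = &pixels[k * 8 + (size_t)i];
            *px = (uint8_t)((*px & 0xFEu) | b);     /* replace the LSB only */
        }
    }
    return 0;
}

/* Rebuilds characters from the LSBs until a NUL or out is full; returns length. */
size_t lsb_decode(const uint8_t *pixels, size_t n_pixels, char *out, size_t out_size) {
    size_t len = 0;
    if (out_size == 0) return 0;
    for (size_t k = 0; (k + 1) * 8 <= n_pixels && len + 1 < out_size; k++) {
        unsigned char c = 0;
        for (int i = 0; i < 8; i++) {
            c = (unsigned char)((c << 1) | (pixels[k * 8 + (size_t)i] & 1u));
        }
        if (c == '\0') break;
        out[len++] = (char)c;
    }
    out[len] = '\0';
    return len;
}
```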

Analog/Mixed-Signal Design

Analog Design Fundamentals

1.
Difference between analog-on-top versus digital-on-top design?
2.
Explain LDO regulators vs. switch-mode regulators - pros, cons, and operation of both
3.
Explain how to improve the feedback loop of a linear regulator
4.
Explain how to build an ADC and how you would choose its components
5.
Apple Wireless firmware role interview - Why do you use filtering? What are different types of filters?
6.
Apple Wireless firmware role interview - What is the relation between RSSI and TX power? How would you measure it? What tools can you use?

Physical Design

Physical Design Fundamentals

1.
What is the use of multi-bit cells in physical design?
2.
What are some techniques to reduce dynamic power consumption?
3.
What is the use of clock gating?
4.
How will you reduce power through clock gating?
5.
What are the advantages and disadvantages of clock gating?
6.
What are the different techniques to reduce static IR drop and dynamic IR drop?
7.
What is the disadvantage of decap (dcap) cells?
8.
How will you apply derates for the launch clock and capture clock?
9.
If the cells are placed far apart, will you apply smaller or larger derates?
10.
What checks are done after synthesis?

Study Resources & Leetcode Numbers

🌐 Online Resources we use

  1. EDA Playground
    Online Verilog/SystemVerilog simulator
  2. montychoy.com
    Ultimate list of hardware engineering internship interview questions
  3. HDLBits
    Leetcode but RTL Ver.1 - I personally like this one better
  4. Chipdev
    Leetcode but RTL Ver.2
  5. srnvl/Embedded_SWE_Prep
    Embedded SWE Prep Github Repository
  6. FIFO Depth Calculation Made Easy (PDF)
    FIFO depth calculation guide
  7. RFIC Interview Questions
    RFIC interview questions
  8. Asynchronous FIFO Verilog Code
    Asynchronous FIFO Verilog Code
  9. RTL Design Practice Problems
    More RTL Interview Questions
  10. Verification Guide (UVM TestBench architecture)
    Coding UVM components
  11. FSM-Finite State Machine-Questions
    Some challenging FSM Questions
  12. VLSI and hardware engineering interview questions
    A painstakingly long question list (worth skimming through; there are a few interesting questions)
  13. DV Interview Prep Guide
    Github repository for DV Interview Prep Guide
  14. Example Interview Questions for a job in FPGA, VHDL, Verilog
    From nandland youtube channel
  15. Chipverify
    Good resource for SystemVerilog/UVM/Verification
  16. Electronics Interview Questions: STA part 1
    The typical STA interview question
  17. Electronics Interview Questions: STA part 2
    The 2nd typical STA interview question
  18. What is AXI
    AXI protocol basics playlist
  19. Technical Internship Interview Questions
    More interview questions
  20. Computer Architecture Youtube Playlist
    High Performance Computer Architecture (Udacity)
  21. Ultimate Folder
    Misc. list of VLSI interview questions

🔢 Leetcode Numbers

Most commonly asked LeetCode problems for hardware interviews.

Embedded/Firmware Tips

We recommend using C and Python to solve these LeetCode questions. If you have to choose only one language, use C.

Stack & String Processing

Sorting & Searching