All Hardware Types Interview Questions
Digital Design/RTL
Topics
RTL Design involves the design/integration of digital logic modules that go into a chip or IP.
Topics Covered
Verilog Syntax and Programming
- Try to be familiar with all the synthesizable constructs used in Verilog
- Ex. Blocking vs Non-blocking statements - trivial, but it is surprising how often it comes up (and how often interviewees stumble)
- casex, casez, and their variations
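The blocking vs non-blocking distinction mentioned above can be illustrated with a small behavioral model. This is only a sketch in Python, not Verilog: the register names `q1`, `q2`, `q3` and both function names are hypothetical, and each call stands for one rising clock edge on a three-register chain.

```python
def clock_edge_blocking(regs, d):
    """Model of `q1 = d; q2 = q1; q3 = q2;` inside always @(posedge clk).

    Each blocking assignment takes effect immediately, so later statements
    already see the updated value.
    """
    q1, q2, q3 = regs
    q1 = d
    q2 = q1
    q3 = q2
    return (q1, q2, q3)


def clock_edge_nonblocking(regs, d):
    """Model of `q1 <= d; q2 <= q1; q3 <= q2;`.

    All right-hand sides are sampled first, then every register updates
    together, so each stage captures the previous stage's old value.
    """
    q1, q2, _q3 = regs
    return (d, q1, q2)
```

Starting from `(0, 0, 0)` with `d = 1`, the blocking model returns `(1, 1, 1)` after one edge while the non-blocking model returns `(1, 0, 0)`, which is the shift-register behavior a pipeline needs.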
A very common question pattern:
- Write Verilog code to do 'something' (Ex. BCD counter)
- What do the different sections of your code synthesize to? This is very important to know, and it is where Verilog interviews differ from other coding interviews. For example, always @(posedge clock) typically synthesizes to logic involving flip-flops
- Make some kind of changes to the input/output (Ex. Output should be triggered by another signal, Output should be seen with a delay of one cycle, Synchronous vs Asynchronous reset, etc)
Digital Design Concepts
- Basic logic design - questions typically found in undergraduate digital electronics courses
- Universal gates, design different gates using a MUX
- Designing a counter - commonly asked question
- State Machine design - major subsection:
- Design state machines (Draw the state diagram) for the given scenario
- Mealy vs Moore model - very important - learn how to come up with both models for any scenario
- Typically, state machine questions are converted to Verilog questions (i.e. write verilog code for the state machine you designed). It is good to practice a template for state machines (both models) so that you can convert any state diagram to code easily
- Another variation here could be binary encoding vs one-hot encoding for the states - each has pros and cons
Static Timing Analysis
- What the different terms (setup time, hold time, etc) mean
- Given a Flop - Logic - Flop configuration:
- Are there setup or hold violations?
- If there are any, how do you fix them?
- There are multiple variations of this question, and they are well documented in any STA book or tutorial - spend time to master this
- Time borrowing (Especially if you are a grad student)
- Metastability and how to deal with it
- Know the basics for all interviews, but also learn about specifics like MTBF in case of clocking-related roles
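The Flop - Logic - Flop checks above reduce to two slack equations. A minimal sketch as a Python calculator follows; the field names, the choice of nanoseconds, and the skew convention (capture clock arrival minus launch clock arrival) are assumptions made for illustration, not taken from any particular STA tool.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    """Delays for one launch flop -> logic -> capture flop path, in ns."""
    t_clk: float        # clock period
    t_cq_max: float     # slowest clock-to-Q of the launch flop
    t_cq_min: float     # fastest clock-to-Q of the launch flop
    t_logic_max: float  # longest combinational delay
    t_logic_min: float  # shortest combinational delay
    t_setup: float      # setup time of the capture flop
    t_hold: float       # hold time of the capture flop
    t_skew: float = 0.0  # capture clock arrival minus launch clock arrival


def setup_slack(p):
    # Data launched on one edge must settle t_setup before the next capture edge.
    return (p.t_clk + p.t_skew) - (p.t_cq_max + p.t_logic_max + p.t_setup)


def hold_slack(p):
    # Data launched on an edge must not overwrite what is captured on that same edge.
    return (p.t_cq_min + p.t_logic_min) - (p.t_hold + p.t_skew)
```

Negative slack means a violation. Note that the clock period appears only in the setup equation, which is why slowing the clock can fix a setup violation but never a hold violation; hold fixes add delay to the data path instead.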
C Programming
Basic C programming, up to the complexity of Linked Lists, could be asked in both RTL design and verification roles. Sorting is commonly asked.
Things to Note
- Prepare a little above and below the VLSI stack for an RTL design role interview:
- Up the stack: A theoretical understanding of Computer Architecture, including typical 5-stage pipelines, Memory hierarchy, and Branch prediction
- Down the stack: MOSFETs and CMOS design, specifically knowing about CMOS inverters and basic logic gates at the transistor level, though these are not asked as often
- If you have previous experience in RTL design:
- You should be able to identify key blocks from your previous design (e.g., Content Addressable Memories, FIFOs) to demonstrate depth of experience
- You should also be able to quickly implement these structures, writing basic working HDL, even if skipping edge cases
- Understanding basic Valid-Ready handshaking and how to design for it, as it's used in almost every digital block
- Read about AXI protocols (channels and transactions) if there's experience with AXI blocks
- Backpressure: how to tell a block to stop sending data. While valid-ready is common, other protocols like valid-afull and valid-credit exist
- If you are applying for an FPGA company or team, make sure you understand FPGAs well, and how they are different from ASICs
- And also how to write better RTL for FPGAs (they are register rich, have hard DSP blocks, etc - how does this impact your design decisions?)
RTL Design Fundamentals
wire and reg?
What is the value of A & B at various times of simulation - 0 time_unit, 1 time_unit, 2 time_unit, 3 time_unit?
a=1, b=0, c=0 just before a rising clock edge, compute the final values of a, b, c after one clock for each snippet.
Variant A – Non-blocking (<=)
a, b, c for Variant A and Variant B? (Show your steps by drawing a waveform or a small timing table.)
always @(posedge clk) begin : pipeline
  Q1 = in;
  Q2 = Q1;
  Q3 = Q2;
end
Solution:
Q1→Q2→Q3 in the same cycle. That doesn’t model pipeline flops (each stage should capture the previous cycle’s value). Use non-blocking assignments (<=), preferably inside always_ff, to model sequential behavior.
RTL Coding Challenges
wr_en and rd_en are 1 in the same cycle?
out_valid is asserted, output (latest_value, second_largest_so_far). Example stream: 1, 4, 5, 2, 3 → on first out_valid: latest=3, second_largest=4; on next out_valid: latest=2, second_largest=4. Duplicates count as the second-largest value. State assumptions (bit width, signed/unsigned, duplicate handling, reset behavior, latency).
Solution:
Every new input:
1. Push input into S1
2. If input >= L1
     Push L1 into S2
     Push input into L1
   else if input >= S2
     Push input into S2
   else
     Push head S2 into S2
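The update rule above can be checked with a small behavioral model. This sketch reads S1 as the latest-value register, L1 as the largest-so-far register, and S2 as the second-largest register, which is an interpretation of the names in the steps; it also assumes unsigned data and a reset value of 0 for all three registers.

```python
class SecondLargestTracker:
    """Behavioral model: one push() call per accepted input sample."""

    def __init__(self):
        # Assumed reset state: all registers cleared (unsigned data).
        self.latest = 0   # S1 in the steps above
        self.largest = 0  # L1
        self.second = 0   # S2

    def push(self, value):
        self.latest = value
        if value >= self.largest:
            # New maximum (or a duplicate of it): old maximum becomes second.
            self.second = self.largest
            self.largest = value
        elif value >= self.second:
            self.second = value
        # Otherwise S2 simply holds its value.
        return (self.latest, self.second)
```

Feeding the example stream 1, 4, 5, 2, 3 leaves `latest = 3` and `second = 4`. Because the comparisons use `>=`, a repeated maximum such as 5, 5 yields a second-largest of 5, matching the "duplicates count" rule.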
Clk_50Mhz – 50 MHz input clock
Reset_n – Active-low reset
Data_in_1..4 – four 16-bit data inputs
Data_out – 16-bit data output
WE – write enable for selected input
Address – 2-bit selects which input to write
We get an async event.
If ≥1 event: Count as 1 event
If 0 event: Miss.
Requirements:
- Create a system that raises an error flag if there are 2 misses in every 40 cycles
- Modify it to raise the error flag for 2 misses in any 40-cycle window
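The difference between the two requirements is fixed (back-to-back) windows versus a sliding window. The sketch below models both in Python under assumptions the question leaves open: the input is one already-synchronized miss bit per cycle, the flag is combinational on the running count, and the function names are hypothetical.

```python
from collections import deque


def fixed_window_error(miss_per_cycle, window=40, threshold=2):
    """Per-cycle error flags when the miss counter restarts every `window` cycles."""
    flags = []
    count = 0
    for cycle, miss in enumerate(miss_per_cycle):
        if cycle % window == 0:
            count = 0  # new window: clear the counter
        count += int(miss)
        flags.append(count >= threshold)
    return flags


def sliding_window_error(miss_per_cycle, window=40, threshold=2):
    """Per-cycle error flags over the most recent `window` cycles."""
    recent = deque()  # models a `window`-deep shift register of miss bits
    count = 0
    flags = []
    for miss in miss_per_cycle:
        recent.append(int(miss))
        count += int(miss)
        if len(recent) > window:
            count -= recent.popleft()  # bit falling out of the window
        flags.append(count >= threshold)
    return flags
```

Misses at cycles 39 and 40 show the gap: they land in different fixed windows, so the first version never flags, while the sliding version flags at cycle 40. In hardware the sliding version costs a 40-bit shift register (or stored timestamps) plus an up/down counter instead of a single counter.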
a[15:0]+b[15:0] → s[15:0]; now require a[7:0]+b[7:0] → s[7:0] and a[15:8]+b[15:8] → s[15:8] using that single adder. State assumptions (single-cycle vs multi-cycle/time-multiplexed, latency allowed, carry behavior).
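One possible single-cycle approach, sketched here as a Python bit-level model, is to stop the carry from crossing bit 7 by clearing the top bit of each byte before the shared add and then patching those two bits back with XORs. This is one illustrative option and not necessarily the intended answer; it assumes each 8-bit lane wraps modulo 256 (carry-outs dropped), and the function names are hypothetical.

```python
def add16(a, b):
    """The single shared 16-bit adder (result truncated to 16 bits)."""
    return (a + b) & 0xFFFF


def dual_add8(a, b):
    """Two independent 8-bit sums from one pass through add16."""
    # With bit 7 and bit 15 cleared, the low lane can never carry into bit 8.
    partial = add16(a & 0x7F7F, b & 0x7F7F)
    # Bits 7 and 15 of `partial` hold the carry into each lane's MSB;
    # XOR with a^b at those positions completes the MSB sum bits.
    return partial ^ ((a ^ b) & 0x8080)
```

The cost is a few AND/XOR gates around the adder. A time-multiplexed alternative would reuse the adder over two cycles instead, trading latency for less extra logic.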
H(z) = Y(z)/X(z) = (1 + 2z^{-1} + z^{-2}) / (1 − 1.5z^{-1} + 1.5z^{-2}).
Specify structure (Direct Form I/II or transposed), fixed-point widths/quantization, overflow/saturation behavior, latency, and reset strategy.
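Reading the exponents in the transfer function as negative powers of z, cross-multiplying gives the difference equation y[n] = x[n] + 2x[n-1] + x[n-2] + 1.5y[n-1] − 1.5y[n-2]. The sketch below is a floating-point Direct Form I golden model in Python, the kind of reference an RTL implementation could be compared against; it assumes zero initial state and does not model fixed-point quantization or saturation.

```python
def iir_direct_form_1(samples):
    """Floating-point Direct Form I model of the difference equation above."""
    x1 = x2 = 0.0  # x[n-1], x[n-2]
    y1 = y2 = 0.0  # y[n-1], y[n-2]
    out = []
    for x0 in samples:
        y0 = x0 + 2.0 * x1 + x2 + 1.5 * y1 - 1.5 * y2
        out.append(y0)
        # Shift the delay lines.
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out
```

For a unit impulse the first four outputs are 1, 3.5, 4.75, 1.875. As transcribed, the denominator has complex poles of magnitude sqrt(1.5), which lies outside the unit circle, so the response grows over time; that makes the overflow/saturation decision in the fixed-point design especially important.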
STA/Static Timing Analysis
logic [63:0] c;
logic [63:0] b;
logic [63:0] a;
assign c = a + b;
- How does this synthesize?
- How will you optimize this design for timing?
- Change the RTL based on your new optimized design.
- Given the following unit area values for each of the gates in this design:
- Full Adder: 8 unit area
- Half Adder: 5 unit area
- XOR gate: 3 unit area
- AND gate: 1 unit area
- What will be the area for the old design?
- What will be the area for the optimized design?
- What will be the critical timing path of the old design?
- What will be the critical timing path of the optimized design?
RAM Design
Low-Power & Power Intent (UPF)
Why:
1) Area & Power: Latches are smaller/lower power than flops; savings multiply across thousands of ICGs.
2) Timing slack: A negative latch is transparent when clk=0, giving the enable nearly a full cycle to meet timing. A negedge FF gives ~½ cycle, tighter and riskier.
3) Glitch-free gating: In a standard ICG, latch output is ANDed with clk. While clk=0, the AND output stays low so enable changes can’t glitch the gated clock; latch closes at clk↑ and propagates a stable enable.
Using a posedge FF risks races at the AND; a negedge FF offers no benefit over the latch but costs more area/power.
Advanced RTL Topics
Things to think about:
a) What interfaces would you use? How many?
b) How would you arbitrate between requests?
c) How many ports does memory have?
d) What other blocks should your memory controller have?
Design Verification
Topics
Design Verification involves writing tests for digital logic modules to cover all the different use-cases.
Topics Covered
Digital Design Concepts (Same as in RTL Design)
C Programming (Same as in RTL Design)
Scripting
- Be very familiar with one scripting language
- Industry uses a lot of TCL and Perl, but Python works well for entry level interviews
- Questions usually involve large text files
- So read about how to parse information from text files, and maybe some efficient ways to do it
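A minimal sketch of the kind of text-file question described above, in Python. The log line format (`<time> <SEVERITY> [<module>] <message>`) and the function name are hypothetical; the point is to stream the file line by line instead of loading it all into memory, and to compile the regular expression once.

```python
import re
from collections import Counter

# Hypothetical line format: "<time> <SEVERITY> [<module>] <message>"
LINE_RE = re.compile(r"^\s*(\d+)\s+(INFO|WARNING|ERROR)\s+\[(\w+)\]\s+(.*)$")


def count_by_module_and_severity(lines):
    """Count log lines per (module, severity); non-matching lines are skipped."""
    counts = Counter()
    for line in lines:  # any iterable of strings, including an open file
        match = LINE_RE.match(line)
        if match:
            counts[(match.group(3), match.group(2))] += 1
    return counts
```

Passing an open file handle (`with open(path) as f: count_by_module_and_severity(f)`) reads lazily, so the same function works on multi-gigabyte simulation logs.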
Object Oriented Programming (OOP)
- Since a lot of SystemVerilog constructs are OOP based, understanding all the OOP concepts is important
- Sometimes this is completely skipped, but good to know anyway
- Be able to come up with code examples for each of the OOP principles (like Classes, Objects, Abstraction, Encapsulation, etc)
- Inheritance and Polymorphism are usually favorites (Finding out which class different objects belong to)
Things to Note
- DV interviews are usually challenging because you are expected to also be a good designer. So I suggest preparing all the content of RTL design too.
- Verilog design questions might be asked (Although they tend to be simpler)
- If you have previous DV experience (even in an internship), then you also need to work on UVM and other DV concepts that you might have used
- In this case, the RTL Design part will probably be skipped
Verification Methodologies & Strategies
UVM
Solution:
The UVM factory lets you register classes and create their instances through the factory so they can be overridden later, e.g. `uvm_object_utils, `uvm_component_utils, type_id::create("<inst_name>"). new() is the plain SystemVerilog constructor and bypasses the factory.
Solution:
Type (class) override and instance override
Solution:
run_test()
Solution:
Pointer to the actual interface. uvm_config_db is used for setting and getting the pointer to the actual interface.
Solution:
A sequencer that contains pointers to all other sequencers.
Solution:
Most commonly used phases: build_phase(), connect_phase(), run_phase(), check_phase(). check_phase() can be used to check the size of queues and flag an error if a queue is expected to be empty at the end of a simulation. run_phase() is a task while the rest are functions.
Solution:
Testbench Components:
- Test
- Environment
- Scoreboard
- Predictor (optional)
- Agent(s)
- Monitor
- Sequencer
- Driver
- Virtual sequencer (optional)
- Coverage monitor (applicable if you do class based coverage collection)
- Other models necessary in your testbench (e.g., memory model, etc.)
Sequencer-Driver Handshake: The sequencer generates transactions and sends them to the driver through a TLM (Transaction Level Modeling) port. The driver requests transactions from the sequencer using get_next_item() or try_next_item(), and when done processing, calls item_done() to signal completion back to the sequencer.
Note: All but 2 of my screening interviews had this question. The interviewers mainly want to see whether or not you understand some key components of a monitor. Interviewers don't care about syntax.
SystemVerilog Testbench & Constraints
In other words, within the 32-bit value, there should be no sequence of 5 contiguous bits that are all set or all unset.
Explanation:
- foreach (arr[i]) - Iterates through each bit position
- if (i <= 27) - Only check positions where we can fit 5 consecutive bits (positions 0-27)
- !(&arr[i+:5]) - Ensures no 5 consecutive 1's (AND of 5 bits is 0)
- !(&(~arr[i+:5])) - Ensures no 5 consecutive 0's (AND of inverted 5 bits is 0)
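The same sliding 5-bit window check can be written as a standalone reference function, for example to sanity-check randomized values outside the simulator. This is a Python sketch with a hypothetical function name, not the SystemVerilog constraint itself.

```python
def has_run_of_five(value):
    """True if any 5 contiguous bits of the 32-bit value are all 1s or all 0s."""
    value &= 0xFFFFFFFF
    for i in range(28):  # window start positions 0..27, as in the constraint
        window = (value >> i) & 0x1F
        if window == 0x1F or window == 0:
            return True
    return False
```

A legal randomized value is one for which `has_run_of_five` returns False, e.g. `0x33333333` (runs of two) or `0xAAAAAAAA` (alternating bits).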
Write SystemVerilog constraints to divide the elements of this queue into three new queues (q1, q2, and q3) such that:
- Every element from the original queue appears in exactly one of the three new queues
- All three queues together contain unique elements (no duplicates)
- Each queue must have at least one element
- You cannot use post_randomize() to perform the split; it must be handled entirely within the constraint block
How would you approach this problem?
Explanation:
- idx[15] - Array of shuffled indices to randomize element assignment
- unique {idx} - Ensures all indices are unique (no duplicates)
- q1.size() + q2.size() + q3.size() == array.size() - Ensures all elements are distributed
- q1.size() > 0; q2.size() > 0; q3.size() > 0 - Each queue has at least one element
- The assign_to_queue constraint uses the shuffled indices to assign elements from the original array to the three queues
- Exactly three elements are identical (i.e., one value appears three times)
- All the remaining seven elements are unique
How would you implement this constraint?
Explanation:
- val[10] - Array of 10 random unsigned bytes
- val3 - The value that appears exactly 3 times
- val[i] < 10 - Constrains each element to be less than 10
- val[i] != val3 -> val.sum() with (int'(item==val[i])) == 1 - If an element is not equal to val3, it must appear exactly once
- val.sum() with (int'(item==val3)) == 3 - val3 must appear exactly 3 times in the array
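A quick way to convince yourself the constraint is right is to check solver output against an independent reference. The Python sketch below (hypothetical function name) accepts an array only if it has 10 elements below the limit, one value appearing exactly three times, and seven values appearing once each.

```python
from collections import Counter


def is_valid_triple_array(val, limit=10):
    """Check the 'exactly one value three times, the other seven unique' rule."""
    if len(val) != 10 or any(not 0 <= v < limit for v in val):
        return False
    counts = sorted(Counter(val).values())
    # Eight distinct values in total: seven singles and one triple.
    return counts == [1] * 7 + [3]
```

With `val[i] < 10` there are only ten possible values and eight must be used, so the solution space is small but non-empty.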
What are some use cases for wait fork and disable fork? How would one kill thread_2() after fork-join_any below finishes and then continue waiting for remaining threads to finish? You can make changes to the code below if needed.
Upcasting vs downcasting. What does the "virtual" keyword do in SystemVerilog?
What will be the output of the following code block? Update the code block to print the lines below (in any order).
• thread id: 0
• thread id: 1
• thread id: 2
bit [7:0] input_array[];
int unsigned input_count;
rand bit [7:0] output_queues[$][];
constraint test_c { output_queues.size() == input_count; }
FIFO Design
a) A system has a 100MHz write clock. The write logic performs 16 write operations within 80 clock cycles. The write pattern is flexible (can be burst or random). What is the minimum read frequency required to ensure that the write operations are never backpressured?
b) After finding the frequency, what is the minimum depth that this FIFO should have?
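One common way to reason about parts (a) and (b) is sketched below in Python. It rests on stated assumptions rather than on the hidden solution: the read side pops one entry per read clock, the worst case is two 16-write bursts back to back (the end of one 80-cycle window followed immediately by the start of the next), and synchronizer latency is ignored. The function names are hypothetical.

```python
import math
from fractions import Fraction


def min_read_freq_hz(write_freq_hz, writes, window_cycles):
    """Read rate that matches the average write rate over one window."""
    return Fraction(write_freq_hz) * writes / window_cycles


def min_fifo_depth(write_freq_hz, read_freq_hz, writes):
    """Entries needed to absorb the worst-case back-to-back burst."""
    burst = 2 * writes                         # writes at full write-clock rate
    burst_time = Fraction(burst, write_freq_hz)  # seconds the burst lasts
    reads_during_burst = math.floor(burst_time * read_freq_hz)
    return burst - reads_during_burst
```

With the numbers in the question, `min_read_freq_hz(100_000_000, 16, 80)` gives 20 MHz, and `min_fifo_depth(100_000_000, 20_000_000, 16)` gives 32 − 6 = 26 entries; a real asynchronous FIFO would add a few entries for pointer synchronization.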
Solution:
Assuming the burst is 100 cycles (not bits): 1800 bits are written in 100 cycles while the read side drains (36/60 x 20/18) x 1800 = 1200 bits in the same time. That leaves 600 bits to buffer; dividing by 20 gives 30 entries.
To account for synchronizer delay (assuming the read side starts after a delay), add 3-4 more entries.
Solution:
Architecture:
• Register the output (out_r).
• Keep an odd-number counter starting at 3 and increment by 2.
• On each step: out_r ← out_r + odd_count.
Adders required: 2 (one for the odd counter, one for out_r + odd_count).
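The architecture above relies on the identity that consecutive perfect squares differ by consecutive odd numbers: (n+1)^2 − n^2 = 2n + 1. A short Python model makes this concrete; it assumes `out_r` resets to 1 (consistent with the odd counter starting at 3) and that the intended output sequence is the perfect squares, since the original question text is not shown here.

```python
def square_sequence(n_steps):
    """Model the two-adder datapath for n_steps clock cycles after reset."""
    out_r = 1      # assumed reset value: 1^2
    odd_count = 3  # next odd increment
    values = [out_r]
    for _ in range(n_steps):
        out_r = out_r + odd_count      # adder 1: accumulate
        odd_count = odd_count + 2      # adder 2: next odd number
        values.append(out_r)
    return values
```

Four steps after reset the register has taken the values 1, 4, 9, 16, 25, with no multiplier in the datapath.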
Bus Interconnect Design (AXI/APB/Custom)
Clock Domain Crossing (CDC)
- a) Recirculation-mux with a synchronized ready
- b) FSM-based synchronized req/ack (any frequency ratio, but limited throughput)
- c) Asynchronous FIFO (higher throughput, extra hardware)
DFT & Testability
Embedded Systems/Firmware
System Design & Architecture
- Block diagram of tasks and functionality
- RTOS vs no RTOS decision
- Data protection and passing mechanisms
- Reliable periodic data transfer while another task sends data at a non-deterministic rate
Protocols & Interconnects
Embedded Coding Challenges
pow(x, y) for integers with fast exponentiation. Handle negative y and overflow.
Questions:
- How would you design the system to count the number of people who have entered using this signal?
- How would you keep track of the current state of the turnstile within the microcontroller?
- Suppose a person stops halfway through the turnstile, keeping the signal high until they move again. How would you handle this condition to ensure the count remains accurate?
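A minimal sketch of one way to answer the three questions, as a Python model of the firmware logic. It assumes the signal is already debounced, goes high once per person, and is sampled periodically (or on a pin-change interrupt); the class and method names are hypothetical. Counting edges rather than levels is what keeps the count correct when someone stops halfway and holds the signal high.

```python
class TurnstileCounter:
    """Counts people by detecting rising edges on the turnstile signal."""

    def __init__(self):
        self.prev_level = 0  # turnstile state remembered between samples
        self.count = 0

    def sample(self, level):
        """Feed one sample of the signal (0 or 1); returns the running count."""
        level = 1 if level else 0
        if level == 1 and self.prev_level == 0:
            self.count += 1  # rising edge: a new person started passing
        self.prev_level = level
        return self.count
```

A long run of 1s counts once no matter how many samples it spans. Counting on the falling edge instead would count only completed passages; which edge is right depends on how the turnstile signal is defined.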
- Delimiters: any non-alphanumeric character (treat consecutive delimiters as one split)
- Do not use strtok/strtok_r/strsep. Use pointer arithmetic or your own scanning
- Time: O(n)
Function to implement (one of):
Output format (exact):
Examples:
Function signature:
Input parameters:
const struct OrderBatch *order_batch: a pointer to an `OrderBatch` struct to be serialized.
const size_t out_max_length: the number of bytes allocated in `out`. Do not put serialized data outside of the pre-allocated buffer.
Output parameters:
uint8_t *out: the pre-allocated buffer to store the serialized `OrderBatch` data.
Return:
FuncStatus_t: the status of serialization effort. See enum descriptions below.
Provided non-standard type definitions:
Structures:
FuncStatus_t:
Background:
The serialized output data will have the following format:
- Note 1: Assume the host is little-endian.
- Note 2: You may receive an `OrderBatch` where `order_count` is less than 1.
- Note 3: Payload Len is the number of bytes in the serialized input including the first 10 bytes (i.e., `0xFACE` and Payload Len).
Serialized Layout:
Examples:
Example 1
Inputs:
Output (hex dump):
Returns:
Function signature:
Description of function parameters:
Input parameters:
uint32_t numBytes: The number of bytes in the array named `data`.
uint8_t data[]: The byte stream of data to search.
Return:
-1: Returned if the given pattern (`0xFE6B2840`) is not found.
-2: Returned if the input data is `NULL` or the size of the `data` is insufficient to find the pattern (`0xFE6B2840`).
Otherwise: the pattern is found. Return the starting bit position of the pattern (`0xFE6B2840`).
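The return contract above can be exercised with a bit-level reference model before writing the C version. The Python sketch below is only that: it assumes bit position 0 is the most significant bit of `data[0]` (the original examples that would confirm the convention are not shown here), and the function name is hypothetical.

```python
PATTERN = 0xFE6B2840


def find_pattern_bit_offset(data, pattern=PATTERN):
    """Return the first bit offset of the 32-bit pattern, or -1 / -2 as specified."""
    if data is None or len(data) < 4:
        return -2  # NULL input or too short to hold 32 bits
    total_bits = len(data) * 8
    stream = int.from_bytes(data, "big")  # bit 0 = MSB of data[0] (assumed)
    for pos in range(total_bits - 32 + 1):
        shift = total_bits - 32 - pos
        if ((stream >> shift) & 0xFFFFFFFF) == pattern:
            return pos
    return -1
```

A C implementation would typically keep a 32-bit (or 64-bit) shift register, feed it one bit or one byte at a time, and compare after each shift, which gives the same result without big-integer arithmetic.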
Background:
Two C library functions, htonl() and ntohl(), are provided for endian conversion:
uint32_t htonl(uint32_t hostlong); (host to network order)
uint32_t ntohl(uint32_t netlong); (network to host order)
Note: Network byte order is big-endian, host byte order is little-endian.
Examples:
Example 1 - Byte Aligned:
Inputs:
Returns:
Example 2 - Non-byte Aligned:
This is the same as Example 1, left-shifted by 1 bit.
Inputs:
Returns:
Starter Code:
Function signature:
Existing API - read_8b:
Description of read_8b:
- Reads exactly 8 consecutive bytes from flash memory into buf
- Returns 0 on success, 1 when end of flash memory is reached
- mask is an 8-bit value where each bit corresponds to a byte in buf
- Bit i in mask corresponds to buf[i]
- If mask bit i is 1, then buf[i] is a "bad byte" (invalid)
- If mask bit i is 0, then buf[i] is valid
Mask bit mapping:
Example of mask usage:
If read_8b returns mask = 5 (binary 00000101):
- buf[0] is invalid (bit 0 = 1)
- buf[1] is valid (bit 1 = 0)
- buf[2] is invalid (bit 2 = 1)
- buf[3] through buf[7] are valid (bits 3-7 = 0)
Your read function requirements:
- Read n_bytes of valid data from flash memory
- Skip the first offset valid bytes before starting to store data in buffer
- Skip any "bad bytes" encountered during reading
- Return the actual number of valid bytes successfully read and stored in buffer
- Stop reading when end of flash memory is reached (when read_8b returns 1)
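The requirements above can be prototyped as a Python reference model before writing the C function. Everything about the interface here is a stand-in: `make_read_8b` fakes the flash API from a list of `(buf, mask)` pairs, `read_valid` returns the collected bytes (its length is the count the C function would return), and a read that reports end of flash is assumed to carry no valid data.

```python
def make_read_8b(chunks):
    """Build a stand-in for read_8b from a list of (8-byte buf, mask) pairs."""
    iterator = iter(chunks)

    def read_8b():
        try:
            buf, mask = next(iterator)
            return 0, bytes(buf), mask  # status 0: success
        except StopIteration:
            return 1, b"", 0            # status 1: end of flash

    return read_8b


def read_valid(read_8b, offset, n_bytes):
    """Collect up to n_bytes valid bytes after skipping `offset` valid bytes."""
    out = bytearray()
    to_skip = offset
    while len(out) < n_bytes:
        status, buf, mask = read_8b()
        if status == 1:
            break  # end of flash: return what we have
        for i in range(8):
            if (mask >> i) & 1:
                continue  # bad byte: neither stored nor counted toward offset
            if to_skip > 0:
                to_skip -= 1
                continue
            out.append(buf[i])
            if len(out) == n_bytes:
                break
    return bytes(out)
```

The key detail is that `offset` counts valid bytes only, so bad bytes are filtered before the skip logic runs. Any unread bytes left in the final 8-byte chunk are discarded in this model; a real driver may need to cache them between calls.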
Examples:
Example 1:
Input:
Output:
Example 2:
Input:
Output:
Example 3:
Input:
Output:
Starter Code:
Concurrency, Synchronization & OS
Memory & Drivers
Cache & Coherency
Advanced / Unique Questions
- Encode a string inside an image without visually distorting it
- Decoder retrieves the string from the encoded image
Analog/Mixed-Signal Design
Analog Design Fundamentals
Physical Design
Physical Design Fundamentals
Study Resources & Leetcode Numbers
🌐 Online Resources we use
- EDA Playground
  Online Verilog/SystemVerilog simulator
- montychoy.com
  Ultimate list of hardware engineering internship interview questions
- HDLBits
  Leetcode but RTL Ver.1 - I personally like this one better
- Chipdev
  Leetcode but RTL Ver.2
- srnvl/Embedded_SWE_Prep
  Embedded SWE Prep Github Repository
- FIFO Depth Calculation Made Easy (PDF)
  FIFO depth calculation guide
- RFIC Interview Questions
  RFIC interview questions
- Asynchronous FIFO Verilog Code
  Asynchronous FIFO Verilog Code
- RTL Design Practice Problems
  More RTL Interview Questions
- Verification Guide (UVM TestBench architecture)
  Coding UVM components
- FSM-Finite State Machine-Questions
  Some challenging FSM Questions
- VLSI and hardware engineering interview questions
  A painstakingly long question list (worth skimming through, there are a few interesting questions)
- DV Interview Prep Guide
  Github repository for DV Interview Prep Guide
- Example Interview Questions for a job in FPGA, VHDL, Verilog
  From nandland youtube channel
- Chipverify
  Good resource for SystemVerilog/UVM/Verification
- Electronics Interview Questions: STA part 1
  The typical STA interview question
- Electronics Interview Questions: STA part 2
  The 2nd typical STA interview question
- What is AXI
  AXI protocol basics playlist
- Technical Internship Interview Questions
  More interview questions
- Computer Architecture Youtube Playlist
  High Performance Computer Architecture (Udacity)
- Ultimate Folder
  Misc. list of VLSI interview questions
🔢 Leetcode Numbers
Most commonly asked LeetCode problems for hardware interviews.
Embedded/Firmware Tips
We recommend using C and Python to solve these LeetCode questions. If you have to choose only one language, use C.