Hash Map/Set
Notes and typical questions for hash map or hash set related problems
Notes
If the array content is within a specific range, we can use
int[]to replaceMapIf there are multiple
int[]and you need to group them, you can convertint[]toStringand useStringas key in theHashMapIf the result needs to be deduped, a 2/3 pointers solution may be easier to write with sorting the input array.
Understanding HashMap Time Complexity
A common question that always arises is: why are hashmap lookups considered O(1) in terms of time complexity, even in worst-case scenarios? This seems counterintuitive, especially considering that hash collisions can occur.
If we use a predetermined hash function, the worst-case time for hashmap operations could indeed be O(n). Why? Because someone could craft a set of keys that all hash to the same value, causing a chain of collisions. This would force the lookup to scan through all n elements, resulting in O(n) time complexity.
The key to achieving O(1) time complexity lies in randomization. Instead of using a fixed hash function like
h(x) = (constant_a . x + constant_b) % constant_prime, we can use a randomized approach. For example, we might choose random values for the parameters in our hash function each time we initialize our hashmap, such ash(x) = (random_a . x + random_b) % random_prime. (This is just one way to construct a hash function; there are many other types you can design.)This randomization makes it virtually impossible for someone to predict and exploit the hash function's behavior.
From a mathematical perspective, when analyzing the "expected runtime" of hashmap operations using a randomized hash function, it averages out to O(1). While some individual operations might take longer due to collisions, the overall average remains constant.
It's crucial to understand that when we say "expected worst-case time is O(1)", we're referring to the average over all possible random choices of the hash function, for any given input.
This isn't just theoretical—it’s applied in practice. For instance, Google’s Abseil library randomizes hash functions at the program start. This helps prevent attacks that exploit hash collisions and makes systems more secure. Randomization also ensures that software doesn't become dependent on a specific hash function. Hardcoding a hash function and never changing it makes future updates to improve security or performance challenging.
This concept illustrates a broader principle in system design: the power of introducing controlled randomness to improve system performance and security. It also relates to Hyrum's Law, which suggests that all observable behaviors of a system will eventually be depended on by somebody. By randomizing hash functions, we prevent dependencies on specific hash behaviors, making systems more robust and flexible.
Additionally, when we say "expected value," it's not just a random term; it is formally defined, similar to worst-case and average-case scenarios. You can read the definition and understand the concept here in probability theory: Expected value.
Typical Questions (28)
Common Hash Map/Set Questions (22)
LC 242. Valid Anagram
Optimal Answer. TC: O(n), SC: O(1)
🟠 LC 49. Group Anagrams
Advanced version of LC 242, try to do it without sorting
Optimal Answer. TC: O(n∗m), SC: O(n∗m)
LC 349. Intersection of Two Arrays
Optimal Answer. TC: O(n), SC: O(1)
LC 454. 4Sum II
Optimal Answer. TC: O(n2), SC: O(n2)
LC 383. Ransom Note
Optimal Answer. TC: O(n), SC: O(1)
⚪ LC 205. Isomorphic Strings
Optimal Answer. TC: O(n), SC: O(1)
LC 290. Word Pattern
It's similar to above LC 205.
There is a single HashMap solution, but I still chose 2 HashMaps for better readability.
N represents the length of
sand M represents the length ofpatternTC: O(N+M)
SC: O(N)
LC 438. Find All Anagrams in a String
This is actually a 2-pointer problem; my LC submission's time complexity is O(n)
🌟 LC 15. 3Sum
Although "2 sum" can be solved by
HashMap, this 3 sum is actually a 3-pointers problem; it's hard to dedup results without sorting the input arrayKey point:
Dedup result
Convert 3 sum to 2 sum variant (how many combinations to satisfy x+y==target?)
Optimal Answer. TC: O(n2), SC: O(1)
⚪ LC 18. 4Sum
Similar to above LC 15. 3Sum
Be careful about the pruning operation using
breakinside theforloop. It is very error-prone.Optimal Answer. TC: O(n3), SC: O(1)
🟠 LC 3404. Count Special Subsequences
This problem can be considered a more complex version of the '2 Sum' problem. In '2 Sum', we use a later element in the array to check if a corresponding earlier element exists in a hash map. Similarly, in this problem, we leverage later elements (
nums[s]/nums[r]) to determine relationships with earlier elements (nums[p]/nums[q])Optimal Answer. TC: O(n2), SC: O(n2)
LC 1679. Max Number of K-Sum Pairs
It's basically as same as 2 sum.
Optimal Answer. TC: O(n), SC: O(n)
LC 2215. Find the Difference of Two Arrays
Set Answer. TC: O(n), SC: O(n)
Map Answer. TC: O(n), SC: O(n)
LC 1207. Unique Number of Occurrences
Optimal Answer. TC: O(n), SC: O(n)
⚪ LC 1657. Determine if Two Strings Are Close
It becomes straightforward once you realize this can be solved using
Map. Be careful thatSetcannot be used here and sort a constant array is O(1) in TC!Optimal Answer. TC: O(n), SC: O(1)
🟠 LC 2352. Equal Row and Column Pairs
To use an
int[]as a key in aMap, convert the array to aStringusingArrays.toString(array)and use the resultingStringas the key.Optimal Answer. TC: O(n2), SC: O(n2)
🌟 🟠 LC 128. Longest Consecutive Sequence
It's hard to resolve it in O(n). This brute force solution first.
3 key points to get the optimal solution based on TC O(n3) brute force solution.
Use
Setto do O(1) look up.Visited each sequence only once
Skip elements using this condition:
if (set.contains(num-1)). In this case, we make sure we only visit the sequence when the current number can be the first number of this sequence.Only using this condition still cannot guarantee we visit each sequence only once because the first number of this sequence can be duplicated in the input array. Therefore, we need to iterate
Setinstead of the input array to dedup.
TC: O(n). Although it looks like O(n2), it is actually O(n+n). Inner
whileloop only visits each element ONCE during the entire algorithm runtime.SC: O(n)
LC 217. Contains Duplicate
Optimal Answer. TC: O(n), SC: O(n)
⚪ LC 219. Contains Duplicate II
Try to resolve it using
Setinstead ofMapOptimal Answer. TC: O(n), SC: O(min(n,k))
⚪ LC 1930. Unique Length-3 Palindromic Subsequences
Optimal Answer. TC: O(n), SC: O(1)
LC 389. Find the Difference
Optimal Answer. TC: O(n), SC: O(1)
LC 2043. Simple Bank System
Optimal Answer. For each method, TC: O(1), SC: O(1)
⚪ LC 966. Vowel Spellchecker
Assume C is the total content of
wordlistandqueries.TC: O(C)
SC: O(C)
⚪ LC 348. Design Tic-Tac-Toe
To achieve the optimal answer, always think about “What numeric quantity reaches an extreme when the answer happens?”
Try to express the win/answer condition as a formula, not a state.
Look for sum / difference / count / max / min / parity.
Optimal Answer. TC: O(1), SC: O(n)
⚪ LC 1169. Invalid Transactions
Optimal Answer. TC: O(n2), SC: O(n)
🟠 LC 3583. Count Special Triplets
I only came up with Approach 1 by myself, but it's unnecessarily complicated.
Core lesson learned: I realized frequency counting could help, but I got stuck because I could only easily obtain the left frequency. The key insight is that we can maintain TWO maps — a total frequency map and a prefix map — so the right frequency can be derived as
right = total - prefix.Approach 1: Binary Search + 1 Map
Answer. TC: O(nlogn), SC: O(n)
👍 Approach 2: 2 Maps
Optimal Answer. TC: O(n), SC: O(n)
LC 1396. Design Underground System
TC: O(1) for all functions
SC: O(P+S2), where S is the number of stations on the network, and P is the number of passengers making a journey concurrently during peak time.
⚪ LC 3531. Count Covered Buildings
I mistakenly used TreeSet to address this problem. Although this can work but it's not optimal answer. TreeSet is for neighbor search (this is just a PLAIN translation of the question). However, the problem only requires checking existence, not finding actual neighbors. This can be simplified to a boundary check: a building is covered if it is strictly between the min and max in its row and column. So instead of maintaining a full ordered set, we ONLY need to track min/max per row and column, reducing both complexity and overhead.
Optimal Answer. TC: O(n), SC: O(n)
TreeMap/TreeSet (6)
Notes
Use cases
TreeMapis useful for maintaining a dynamically sorted collection of elements, especially when the collection is frequently updated with new elements being added and old elements being removed.Typical queries include:
Next greater / next smaller element
smallest element ≥ target →
ceiling()largest element ≤ target →
floor()smallest element > target →
higher()largest element < target →
lower()
Closest value to target ⭐ (very common)
Check both:
floor(target)ceiling(target)
Choose the one with the smaller difference
Pattern: nearest neighbor search in ordered set
Maintaining the candidate set
Dynamic ordered set with sliding window/prefix range. Maintain a set of valid candidates. Then perform ordered queries on the set: nearest/predecessor/successor
When NOT to use TreeSet
If only checking existence (e.g.
lower()/higher() != null)→ consider reducing it to use min/max boundary instead.
This way can reduce the TreeSet O(logn)
getoperation to the O(1) check using boundaries.
🟠 LC 1825. Finding MK Average
addElement(int num): TC O(logn), SC: O(n), where n is the number of unique elements in the map.calculateMKAverage(): TC O(k), SC O(1)
🔴 LC 975. Odd Even Jump
This solution is based on DP + TreeMap and this question can also be solved by DP + Monotonic Stack
Optimal Solution. TC: O(n∗logn), SC: O(n)
🔴 LC 715. Range Module
This problem is super hard and has too many corner cases.
For a given range [left,right) of add or remove action, we actually ONLY care about the interval that overlaps with the left side of the given range and the interval that overlaps with the right side of the given range. These 2 intervals could be the same interval.
This solution is based on the comment of this solution in LC.
Let k be the number of intervals to be cleared and N be the size of the map.
TC:
addRange: O(k+logN)queryRange:O(logN)removeRange: O(k+logN)
SC: O(N)
⚪ LC 1146. Snapshot Array
get()andset(): O(logn)snap(): O(1)
LC 2349. Design a Number Container System
Another approach using Heap instead of TreeSet is also interesting.
TC:
change(...)is O(logn),find(...)is O(1)SC: O(n)
LC 681. Next Closest Time
Optimal Answer. TC: O(1), SC: O(1)
🟠 LC 2561. Rearranging Fruits
I initially misinterpreted the problem statement. After clarifying it, I solved most of the problem (~90%), but I overlooked the crucial indirect swap case.
Let n =
basket1.length=basket2.length, and let k = number of distinct values across both baskets.TC: O(n+k∗logk)
SC: O(k)
🟠 LC 1488. Avoid Flood in The City
I only came up with a segment tree solution based on intuition because I initially modeled the problem as a range query problem. However, the real core operation is not “range sum”, but:
Find the smallest zero-day index strictly greater than
prev(successor query).This is naturally solved by an ordered set (
TreeSet.higher(prev)), so the segment tree was an over-generalized solution.Approach 1 using Segment Tree:
Answer. TC: O(n∗(logn)2), SC: O(n)
Approach 2 using Improved Segment Tree:
Optimal Answer. TC: O(n∗logn), SC: O(n)
👍 Approach 3 using Tree Set:
Optimal Answer. TC: O(n∗logn), SC: O(n)
LC 2817. Minimum Absolute Difference Between Elements With Constraint
Optimal Answer. TC: O(nlogn), SC: O(n)
Last updated