Hash Map/Set

Notes and typical questions for hash map or hash set related problems

Notes

  • If the array content is within a specific range, we can use an int[] to replace a Map
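
    For example (a minimal sketch; the class name CharCounter is hypothetical), counting lowercase letters with an int[26] instead of a HashMap<Character, Integer>:

    ```java
    // Sketch: when keys fall in a known small range (here 'a'..'z'),
    // a plain int[] can replace a HashMap<Character, Integer>.
    class CharCounter {
        static int[] count(String s) {
            int[] freq = new int[26];
            for (char c : s.toCharArray()) {
                freq[c - 'a']++; // map 'a'..'z' to indices 0..25
            }
            return freq;
        }
    }
    ```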

  • If there are multiple int[] and you need to group them, you can convert each int[] to a String and use that String as the key in the HashMap
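
    For example (a sketch; ArrayKey is a hypothetical helper name), using Arrays.toString to turn an int[] into a usable HashMap key:

    ```java
    import java.util.Arrays;

    // Sketch: arrays don't implement value-based equals()/hashCode(), so an
    // int[] can't be used directly as a HashMap key. Serializing it to a
    // String gives two equal arrays the same key.
    class ArrayKey {
        static String key(int[] arr) {
            return Arrays.toString(arr); // {1, 2} -> "[1, 2]"
        }
    }
    ```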

  • If the result needs to be deduped, a 2/3-pointer solution may be easier to write after sorting the input array.

  • Understanding HashMap Time Complexity

    • A common question that always arises is: why are hashmap lookups considered O(1) in terms of time complexity, even in worst-case scenarios? This seems counterintuitive, especially considering that hash collisions can occur.

      If we use a predetermined hash function, the worst-case time for hashmap operations could indeed be O(n). Why? Because someone could craft a set of keys that all hash to the same value, causing a chain of collisions. This would force the lookup to scan through all n elements, resulting in O(n) time complexity.

      The key to achieving O(1) time complexity lies in randomization. Instead of using a fixed hash function like h(x) = (constant_a * x + constant_b) % constant_prime, we can use a randomized approach. For example, we might choose random values for the parameters in our hash function each time we initialize our hashmap, such as h(x) = (random_a * x + random_b) % random_prime. (This is just one way to construct a hash function; there are many other types you can design.)

      This randomization makes it virtually impossible for someone to predict and exploit the hash function's behavior.
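
      A minimal sketch of such a randomized hash family (the class name and the specific prime are illustrative choices, not any particular library's implementation):

      ```java
      import java.util.Random;

      // Sketch: h(x) = (a*x + b) mod p with a, b drawn at construction time,
      // so an adversary cannot precompute a set of colliding keys.
      class RandomizedHash {
          private static final long PRIME = 1_000_000_007L; // an illustrative large prime
          private final long a, b;

          RandomizedHash(Random rng) {
              a = 1 + rng.nextInt((int) (PRIME - 1)); // random_a in [1, p-1]
              b = rng.nextInt((int) PRIME);           // random_b in [0, p-1]
          }

          long hash(int x) {
              return Math.floorMod(a * x + b, PRIME); // non-negative even for negative x
          }
      }
      ```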

      From a mathematical perspective, when analyzing the "expected runtime" of hashmap operations using a randomized hash function, it averages out to O(1). While some individual operations might take longer due to collisions, the overall average remains constant.

      It's crucial to understand that when we say "expected worst-case time is O(1)", we're referring to the average over all possible random choices of the hash function, for any given input.

      This isn't just theoretical—it’s applied in practice. For instance, Google’s Abseil library randomizes hash functions at the program start. This helps prevent attacks that exploit hash collisions and makes systems more secure. Randomization also ensures that software doesn't become dependent on a specific hash function. Hardcoding a hash function and never changing it makes future updates to improve security or performance challenging.

      This concept illustrates a broader principle in system design: the power of introducing controlled randomness to improve system performance and security. It also relates to Hyrum's Law, which suggests that all observable behaviors of a system will eventually be depended on by somebody. By randomizing hash functions, we prevent dependencies on specific hash behaviors, making systems more robust and flexible.

      Additionally, "expected value" is not just an informal term; it is formally defined, similar to worst-case and average-case scenarios. You can read the definition in probability theory here: Expected value.

Typical Questions (28)

Common Hash Map/Set Questions (22)

  • LC 242. Valid Anagram

  • 🟠 LC 49. Group Anagrams

    • Advanced version of LC 242, try to do it without sorting

    • Optimal Answer. TC: O(n*m), SC: O(n*m)

  • LC 349. Intersection of Two Arrays

  • LC 454. 4Sum II

  • LC 383. Ransom Note

  • LC 205. Isomorphic Strings

  • LC 290. Word Pattern

    • It's similar to LC 205 above.

    • There is a single-HashMap solution, but I still chose 2 HashMaps for better readability.

    • Optimal Answer.

      • N represents the length of s and M represents the length of pattern

      • TC: O(N+M)

      • SC: O(N)

  • LC 438. Find All Anagrams in a String

    • This is actually a 2-pointer problem; my LC submission's time complexity is O(n)

  • 🌟 LC 15. 3Sum

    • Although "2 sum" can be solved with a HashMap, 3Sum is actually a 3-pointer problem; it's hard to dedup results without sorting the input array

    • Key point:

      • Dedup result

      • Convert 3 sum to 2 sum variant (how many combinations to satisfy x+y==target?)

    • Optimal Answer. TC: O(n^2), SC: O(1)
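
    A sketch of the sort + two-pointer approach with the dedup steps called out (hypothetical class name; the skips follow the standard pattern):

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Sketch: fix nums[i], then two-pointer scan for pairs summing to
    // -nums[i]; skip duplicate values at all three positions to dedup.
    class ThreeSum {
        static List<List<Integer>> threeSum(int[] nums) {
            Arrays.sort(nums);
            List<List<Integer>> res = new ArrayList<>();
            for (int i = 0; i < nums.length - 2; i++) {
                if (i > 0 && nums[i] == nums[i - 1]) continue; // dedup first number
                int left = i + 1, right = nums.length - 1;
                while (left < right) {
                    int sum = nums[i] + nums[left] + nums[right];
                    if (sum < 0) left++;
                    else if (sum > 0) right--;
                    else {
                        res.add(Arrays.asList(nums[i], nums[left], nums[right]));
                        while (left < right && nums[left] == nums[left + 1]) left++;   // dedup second
                        while (left < right && nums[right] == nums[right - 1]) right--; // dedup third
                        left++;
                        right--;
                    }
                }
            }
            return res;
        }
    }
    ```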

  • LC 18. 4Sum

    • Similar to LC 15. 3Sum above

    • Be careful about the pruning operation using break inside the for loop. It is very error-prone.

    • Optimal Answer. TC: O(n^3), SC: O(1)

  • 🟠 LC 3404. Count Special Subsequences

    • This problem can be considered a more complex version of the '2 Sum' problem. In '2 Sum', we use a later element in the array to check if a corresponding earlier element exists in a hash map. Similarly, in this problem, we leverage later elements (nums[s]/nums[r]) to determine relationships with earlier elements (nums[p]/nums[q])

    • Optimal Answer. TC: O(n^2), SC: O(n^2)

  • LC 1679. Max Number of K-Sum Pairs

  • LC 2215. Find the Difference of Two Arrays

  • LC 1207. Unique Number of Occurrences

  • LC 1657. Determine if Two Strings Are Close

    • It becomes straightforward once you realize this can be solved using a Map. Be careful that a Set cannot be used here, and that sorting a constant-size array is O(1) in TC!

    • Optimal Answer. TC: O(n), SC: O(1)

  • 🟠 LC 2352. Equal Row and Column Pairs

    • To use an int[] as a key in a Map, convert the array to a String using Arrays.toString(array) and use the resulting String as the key.

    • Optimal Answer. TC: O(n^2), SC: O(n^2)

  • 🌟 🟠 LC 128. Longest Consecutive Sequence

    • It's hard to solve this in O(n). Try the brute force solution first.

    • 3 key points to get to the optimal solution, starting from the O(n^3) brute force solution:

      • Use a Set to do O(1) lookups.

      • Visit each sequence only once

        • Skip elements using this condition: if (set.contains(num-1)). This way, we only visit a sequence when the current number can be the first number of that sequence.

        • This condition alone still cannot guarantee we visit each sequence only once, because the first number of a sequence can be duplicated in the input array. Therefore, we need to iterate over the Set instead of the input array to dedup.

    • Optimal Solution

      • TC: O(n). Although it looks like O(n^2), it is actually O(n+n): the inner while loop visits each element only ONCE during the entire algorithm runtime.

      • SC: O(n)
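
    The key points above can be sketched as (hypothetical class name):

    ```java
    import java.util.HashSet;
    import java.util.Set;

    // Sketch of the O(n) approach: iterate the set (dedup), start counting
    // only when num-1 is absent (i.e. num begins a sequence), and walk each
    // sequence exactly once.
    class LongestConsecutive {
        static int longest(int[] nums) {
            Set<Integer> set = new HashSet<>();
            for (int num : nums) set.add(num);
            int best = 0;
            for (int num : set) {                    // iterate the set, not the array
                if (set.contains(num - 1)) continue; // num is not a sequence start
                int len = 1;
                while (set.contains(num + len)) len++; // walk the sequence once
                best = Math.max(best, len);
            }
            return best;
        }
    }
    ```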

  • LC 217. Contains Duplicate

  • LC 219. Contains Duplicate II

  • LC 1930. Unique Length-3 Palindromic Subsequences

  • LC 389. Find the Difference

  • LC 2043. Simple Bank System

  • LC 966. Vowel Spellchecker

  • LC 348. Design Tic-Tac-Toe

    • To achieve the optimal answer, always think about “What numeric quantity reaches an extreme when the answer happens?”

      • Try to express the win/answer condition as a formula, not a state.

      • Look for sum / difference / count / max / min / parity.

    • Optimal Answer. TC: O(1), SC: O(n)
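
    A sketch of the counting formulation (a standard approach to LC 348: +1 for player 1, -1 for player 2, so a win happens exactly when some counter's absolute value reaches n):

    ```java
    // Sketch: track per-row/column/diagonal sums instead of board state.
    class TicTacToe {
        private final int[] rows, cols;
        private int diag, antiDiag;
        private final int n;

        TicTacToe(int n) {
            this.n = n;
            rows = new int[n];
            cols = new int[n];
        }

        // Returns the winning player (1 or 2), or 0 if no one has won yet.
        int move(int row, int col, int player) {
            int delta = (player == 1) ? 1 : -1;
            rows[row] += delta;
            cols[col] += delta;
            if (row == col) diag += delta;
            if (row + col == n - 1) antiDiag += delta;
            if (Math.abs(rows[row]) == n || Math.abs(cols[col]) == n
                    || Math.abs(diag) == n || Math.abs(antiDiag) == n) {
                return player;
            }
            return 0;
        }
    }
    ```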

  • LC 1169. Invalid Transactions

  • 🟠 LC 3583. Count Special Triplets

    • I only came up with Approach 1 by myself, but it's unnecessarily complicated.

    • Core lesson learned: I realized frequency counting could help, but I got stuck because I could only easily obtain the left frequency. The key insight is that we can maintain TWO maps — a total frequency map and a prefix map — so the right frequency can be derived as right = total - prefix.

    • Approach 1: Binary Search + 1 Map

    • 👍 Approach 2: 2 Maps
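
    A generic sketch of the two-map pattern on a simplified stand-in task (counting triplets i < j < k where all three values are equal — NOT the actual LC 3583 condition; class and method names are hypothetical):

    ```java
    import java.util.HashMap;
    import java.util.Map;

    // Sketch: maintain a total frequency map and a prefix frequency map;
    // while sweeping the middle index, the right-side frequency is derived
    // as right = total - prefix - 1 (the -1 excludes the middle element).
    class TwoMapSweep {
        static long countEqualTriplets(int[] nums) {
            Map<Integer, Integer> total = new HashMap<>();
            for (int num : nums) total.merge(num, 1, Integer::sum);

            Map<Integer, Integer> prefix = new HashMap<>();
            long count = 0;
            for (int num : nums) {
                long left = prefix.getOrDefault(num, 0);
                long right = total.get(num) - left - 1;
                count += left * right;
                prefix.merge(num, 1, Integer::sum);
            }
            return count;
        }
    }
    ```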

  • LC 1396. Design Underground System

    • Optimal Answer.

      • TC: O(1) for all functions

      • SC: O(P+S^2), where S is the number of stations on the network, and P is the number of passengers making a journey concurrently during peak time.

  • LC 3531. Count Covered Buildings

    • I mistakenly used a TreeSet to address this problem. Although it can work, it's not the optimal answer. TreeSet is for neighbor search (which is just a PLAIN translation of the question). However, the problem only requires checking existence, not finding actual neighbors. This can be simplified to a boundary check: a building is covered if it is strictly between the min and max in its row and column. So instead of maintaining a full ordered set, we ONLY need to track the min/max per row and column, reducing both complexity and overhead.

    • Optimal Answer. TC: O(n), SC: O(n)

TreeMap/TreeSet (6)

  • Notes

    • Use cases

      • TreeMap is useful for maintaining a dynamically sorted collection of elements, especially when the collection is frequently updated with new elements being added and old elements being removed.

      • Typical queries include:

        • Next greater / next smaller element

          • smallest element ≥ target → ceiling()

          • largest element ≤ target → floor()

          • smallest element > target → higher()

          • largest element < target → lower()

        • Closest value to target ⭐ (very common)

          • Check both:

            • floor(target)

            • ceiling(target)

          • Choose the one with the smaller difference

          • Pattern: nearest neighbor search in ordered set
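
        The closest-value pattern above, sketched with TreeSet's floor()/ceiling() (hypothetical helper name; either lookup may return null):

        ```java
        import java.util.TreeSet;

        // Sketch: check both floor(target) and ceiling(target),
        // then keep whichever is closer to target.
        class ClosestQuery {
            static Integer closest(TreeSet<Integer> set, int target) {
                Integer lo = set.floor(target);   // largest element <= target
                Integer hi = set.ceiling(target); // smallest element >= target
                if (lo == null) return hi;
                if (hi == null) return lo;
                return (target - lo <= hi - target) ? lo : hi;
            }
        }
        ```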

    • Maintaining the candidate set

      • Dynamic ordered set with sliding window/prefix range. Maintain a set of valid candidates. Then perform ordered queries on the set: nearest/predecessor/successor

    • When NOT to use TreeSet

      • If only checking existence (e.g. lower()/higher() != null)

        • → consider reducing it to a min/max boundary check instead.

        • This reduces the TreeSet's O(log n) lookup to an O(1) check using boundaries.
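
        A sketch of that reduction (hypothetical names; the check assumes x itself has already been added):

        ```java
        // Sketch: when the only query is "does anything exist strictly
        // below AND strictly above x?", tracking min/max replaces a TreeSet.
        class Boundary {
            int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;

            void add(int x) {
                min = Math.min(min, x);
                max = Math.max(max, x);
            }

            // Equivalent to (treeSet.lower(x) != null && treeSet.higher(x) != null)
            // for a set that contains x.
            boolean strictlyInside(int x) {
                return min < x && x < max;
            }
        }
        ```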

  • 🟠 LC 1825. Finding MK Average

    • Optimal Answer

      • addElement(int num): TC O(log n), SC O(n), where n is the number of unique elements in the map.

      • calculateMKAverage(): TC O(k), SC O(1)

  • 🔴 LC 975. Odd Even Jump

    • This solution is based on DP + TreeMap; the question can also be solved with DP + Monotonic Stack

    • Optimal Solution. TC: O(n log n), SC: O(n)

  • 🔴 LC 715. Range Module

    • This problem is super hard and has too many corner cases.

    • For a given range [left, right) of an add or remove action, we actually ONLY care about the interval that overlaps with the left side of the given range and the interval that overlaps with the right side. These 2 intervals could be the same interval.

    • Optimal Solution.

      • This solution is based on the comment of this solution in LC.

      • Let k be the number of intervals to be cleared and N be the size of the map.

      • TC:

        • addRange: O(k + log N)

        • queryRange: O(log N)

        • removeRange: O(k + log N)

      • SC: O(N)

  • LC 1146. Snapshot Array

  • LC 2349. Design a Number Container System

    • Another approach using Heap instead of TreeSet is also interesting.

    • Optimal Answer.

      • TC: change(...) is O(log n), find(...) is O(1)

      • SC: O(n)

  • LC 681. Next Closest Time

  • 🟠 LC 2561. Rearranging Fruits

    • I initially misinterpreted the problem statement. After clarifying it, I solved most of the problem (~90%), but I overlooked the crucial indirect swap case.

    • Optimal Answer.

      • Let n = basket1.length = basket2.length, and let k = the number of distinct values across both baskets.

      • TC: O(n + k log k)

      • SC: O(k)

  • 🟠 LC 1488. Avoid Flood in The City

    • I only came up with a segment tree solution based on intuition because I initially modeled the problem as a range query problem. However, the real core operation is not “range sum”, but:

      Find the smallest zero-day index strictly greater than prev (successor query).

      This is naturally solved by an ordered set (TreeSet.higher(prev)), so the segment tree was an over-generalized solution.

    • Approach 1 using Segment Tree:

    • Approach 2 using Improved Segment Tree:

    • 👍 Approach 3 using Tree Set:
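
    The successor query from Approach 3 reduces to a single TreeSet call (hypothetical wrapper and names):

    ```java
    import java.util.TreeSet;

    // Sketch: among stored zero-rain day indices, find the smallest one
    // strictly greater than prev. TreeSet.higher is an O(log n) successor query.
    class SuccessorQuery {
        static Integer firstDryDayAfter(TreeSet<Integer> dryDays, int prev) {
            return dryDays.higher(prev); // null if no such day exists
        }
    }
    ```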

  • LC 2817. Minimum Absolute Difference Between Elements With Constraint
