Hash Map/Set

Notes and typical questions for hash map or hash set related problems

Notes

  • If the array content is within a specific range, we can use an int[] to replace a Map
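
    For example (a minimal sketch; the class name CharCounter is hypothetical), counting lowercase letters with an int[26] instead of a HashMap<Character, Integer>:

    ```java
    // Sketch: when keys fall in a known small range (here 'a'..'z'),
    // a plain int[] can replace a HashMap<Character, Integer>.
    class CharCounter {
        static int[] count(String s) {
            int[] freq = new int[26];
            for (char c : s.toCharArray()) {
                freq[c - 'a']++; // map 'a'..'z' to indices 0..25
            }
            return freq;
        }
    }
    ```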

  • If there are multiple int[] and you need to group them, you can convert each int[] to a String and use that String as the key in the HashMap
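
    For example (a sketch; ArrayKey is a hypothetical helper name), using Arrays.toString to turn an int[] into a usable HashMap key:

    ```java
    import java.util.Arrays;

    // Sketch: arrays don't implement value-based equals()/hashCode(), so an
    // int[] can't be used directly as a HashMap key. Serializing it to a
    // String gives two equal arrays the same key.
    class ArrayKey {
        static String key(int[] arr) {
            return Arrays.toString(arr); // {1, 2} -> "[1, 2]"
        }
    }
    ```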

  • If the result needs to be deduped, a 2/3-pointer solution may be easier to write after sorting the input array.

  • Understanding HashMap Time Complexity

    • A common question that always arises is: why are hashmap lookups considered O(1) in terms of time complexity, even in worst-case scenarios? This seems counterintuitive, especially considering that hash collisions can occur.

      If we use a predetermined hash function, the worst-case time for hashmap operations could indeed be O(n). Why? Because someone could craft a set of keys that all hash to the same value, causing a chain of collisions. This would force the lookup to scan through all n elements, resulting in O(n) time complexity.

      The key to achieving O(1) time complexity lies in randomization. Instead of using a fixed hash function like h(x) = (constant_a * x + constant_b) % constant_prime, we can use a randomized approach. For example, we might choose random values for the parameters in our hash function each time we initialize our hashmap, such as h(x) = (random_a * x + random_b) % random_prime. (This is just one way to construct a hash function; there are many other types you can design.)

      This randomization makes it virtually impossible for someone to predict and exploit the hash function's behavior.
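
      A minimal sketch of such a randomized hash family (the class name and the specific prime are illustrative choices, not any particular library's implementation):

      ```java
      import java.util.Random;

      // Sketch: h(x) = (a*x + b) mod p with a, b drawn at construction time,
      // so an adversary cannot precompute a set of colliding keys.
      class RandomizedHash {
          private static final long PRIME = 1_000_000_007L; // an illustrative large prime
          private final long a, b;

          RandomizedHash(Random rng) {
              a = 1 + rng.nextInt((int) (PRIME - 1)); // random_a in [1, p-1]
              b = rng.nextInt((int) PRIME);           // random_b in [0, p-1]
          }

          long hash(int x) {
              return Math.floorMod(a * x + b, PRIME); // non-negative even for negative x
          }
      }
      ```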

      From a mathematical perspective, when analyzing the "expected runtime" of hashmap operations using a randomized hash function, it averages out to O(1). While some individual operations might take longer due to collisions, the overall average remains constant.

      It's crucial to understand that when we say "expected worst-case time is O(1)", we're referring to the average over all possible random choices of the hash function, for any given input.

      This isn't just theoretical—it’s applied in practice. For instance, Google’s Abseil library randomizes hash functions at the program start. This helps prevent attacks that exploit hash collisions and makes systems more secure. Randomization also ensures that software doesn't become dependent on a specific hash function. Hardcoding a hash function and never changing it makes future updates to improve security or performance challenging.

      This concept illustrates a broader principle in system design: the power of introducing controlled randomness to improve system performance and security. It also relates to Hyrum's Law, which suggests that all observable behaviors of a system will eventually be depended on by somebody. By randomizing hash functions, we prevent dependencies on specific hash behaviors, making systems more robust and flexible.

      Additionally, "expected value" is not just an informal term; it is formally defined, similar to worst-case and average-case scenarios. You can read the definition in probability theory here: Expected value.

Typical Questions (28)

Common Hash Map/Set Questions (22)

  • LC 242. Valid Anagram

  • 🟠 LC 49. Group Anagrams

    • Advanced version of LC 242, try to do it without sorting

    • Optimal Answer. TC: O(n*m), SC: O(n*m)

  • LC 349. Intersection of Two Arrays

  • LC 454. 4Sum II

  • LC 383. Ransom Note

  • LC 205. Isomorphic Strings

  • LC 290. Word Pattern

    • It's similar to LC 205 above.

    • There is a single-HashMap solution, but I still chose 2 HashMaps for better readability.

    • Optimal Answer.

      • N represents the length of s and M represents the length of pattern

      • TC: O(N+M)

      • SC: O(N)

  • LC 438. Find All Anagrams in a String

    • This is actually a 2-pointer problem; my LC submission's time complexity is O(n)

  • 🌟 LC 15. 3Sum

    • Although "2 sum" can be solved with a HashMap, 3Sum is actually a 3-pointer problem; it's hard to dedup results without sorting the input array

    • Key point:

      • Dedup result

      • Convert 3 sum to 2 sum variant (how many combinations to satisfy x+y==target?)

    • Optimal Answer. TC: O(n^2), SC: O(1)
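
    A sketch of the sort + two-pointer approach with the dedup steps called out (hypothetical class name; the skips follow the standard pattern):

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Sketch: fix nums[i], then two-pointer scan for pairs summing to
    // -nums[i]; skip duplicate values at all three positions to dedup.
    class ThreeSum {
        static List<List<Integer>> threeSum(int[] nums) {
            Arrays.sort(nums);
            List<List<Integer>> res = new ArrayList<>();
            for (int i = 0; i < nums.length - 2; i++) {
                if (i > 0 && nums[i] == nums[i - 1]) continue; // dedup first number
                int left = i + 1, right = nums.length - 1;
                while (left < right) {
                    int sum = nums[i] + nums[left] + nums[right];
                    if (sum < 0) left++;
                    else if (sum > 0) right--;
                    else {
                        res.add(Arrays.asList(nums[i], nums[left], nums[right]));
                        while (left < right && nums[left] == nums[left + 1]) left++;   // dedup second
                        while (left < right && nums[right] == nums[right - 1]) right--; // dedup third
                        left++;
                        right--;
                    }
                }
            }
            return res;
        }
    }
    ```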

  • LC 18. 4Sum

    • Similar to LC 15. 3Sum above

    • Be careful about the pruning operation using break inside the for loop. It is very error-prone.

    • Optimal Answer. TC: O(n^3), SC: O(1)

  • 🟠 LC 3404. Count Special Subsequences

    • This problem can be considered a more complex version of the '2 Sum' problem. In '2 Sum', we use a later element in the array to check if a corresponding earlier element exists in a hash map. Similarly, in this problem, we leverage later elements (nums[s]/nums[r]) to determine relationships with earlier elements (nums[p]/nums[q])

    • Optimal Answer. TC: O(n^2), SC: O(n^2)

  • LC 1679. Max Number of K-Sum Pairs

  • LC 2215. Find the Difference of Two Arrays

  • LC 1207. Unique Number of Occurrences

  • LC 1657. Determine if Two Strings Are Close

    • It becomes straightforward once you realize this can be solved using a Map. Be careful that a Set cannot be used here, and that sorting a constant-size array is O(1) in TC!

    • Optimal Answer. TC: O(n), SC: O(1)

  • 🟠 LC 2352. Equal Row and Column Pairs

    • To use an int[] as a key in a Map, convert the array to a String using Arrays.toString(array) and use the resulting String as the key.

    • Optimal Answer. TC: O(n^2), SC: O(n^2)

  • 🌟 🟠 LC 128. Longest Consecutive Sequence

    • It's hard to solve this in O(n). Try the brute force solution first.

    • 3 key points to get to the optimal solution, starting from the O(n^3) brute force solution:

      • Use a Set to do O(1) lookups.

      • Visit each sequence only once

        • Skip elements using this condition: if (set.contains(num-1)). This way, we only visit a sequence when the current number can be the first number of that sequence.

        • This condition alone still cannot guarantee we visit each sequence only once, because the first number of a sequence can be duplicated in the input array. Therefore, we need to iterate over the Set instead of the input array to dedup.

    • Optimal Solution

      • TC: O(n). Although it looks like O(n^2), it is actually O(n+n): the inner while loop visits each element only ONCE during the entire algorithm runtime.

      • SC: O(n)
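
    The key points above can be sketched as (hypothetical class name):

    ```java
    import java.util.HashSet;
    import java.util.Set;

    // Sketch of the O(n) approach: iterate the set (dedup), start counting
    // only when num-1 is absent (i.e. num begins a sequence), and walk each
    // sequence exactly once.
    class LongestConsecutive {
        static int longest(int[] nums) {
            Set<Integer> set = new HashSet<>();
            for (int num : nums) set.add(num);
            int best = 0;
            for (int num : set) {                    // iterate the set, not the array
                if (set.contains(num - 1)) continue; // num is not a sequence start
                int len = 1;
                while (set.contains(num + len)) len++; // walk the sequence once
                best = Math.max(best, len);
            }
            return best;
        }
    }
    ```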

  • LC 217. Contains Duplicate

  • LC 219. Contains Duplicate II

  • LC 1930. Unique Length-3 Palindromic Subsequences

  • LC 389. Find the Difference

  • LC 2043. Simple Bank System

  • LC 966. Vowel Spellchecker

  • LC 348. Design Tic-Tac-Toe

    • To achieve the optimal answer, always think about “What numeric quantity reaches an extreme when the answer happens?”

      • Try to express the win/answer condition as a formula, not a state.

      • Look for sum / difference / count / max / min / parity.

    • Optimal Answer. TC: O(1), SC: O(n)
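
    A sketch of the counting formulation (a standard approach to LC 348: +1 for player 1, -1 for player 2, so a win happens exactly when some counter's absolute value reaches n):

    ```java
    // Sketch: track per-row/column/diagonal sums instead of board state.
    class TicTacToe {
        private final int[] rows, cols;
        private int diag, antiDiag;
        private final int n;

        TicTacToe(int n) {
            this.n = n;
            rows = new int[n];
            cols = new int[n];
        }

        // Returns the winning player (1 or 2), or 0 if no one has won yet.
        int move(int row, int col, int player) {
            int delta = (player == 1) ? 1 : -1;
            rows[row] += delta;
            cols[col] += delta;
            if (row == col) diag += delta;
            if (row + col == n - 1) antiDiag += delta;
            if (Math.abs(rows[row]) == n || Math.abs(cols[col]) == n
                    || Math.abs(diag) == n || Math.abs(antiDiag) == n) {
                return player;
            }
            return 0;
        }
    }
    ```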

  • LC 1169. Invalid Transactions

  • 🟠 LC 3583. Count Special Triplets

    • I only came up with Approach 1 by myself, but it's unnecessarily complicated.

    • Core lesson learned: I realized frequency counting could help, but I got stuck because I could only easily obtain the left frequency. The key insight is that we can maintain TWO maps — a total frequency map and a prefix map — so the right frequency can be derived as right = total - prefix.

    • Approach 1: Binary Search + 1 Map

    • 👍 Approach 2: 2 Maps
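
    A generic sketch of the two-map pattern on a simplified stand-in task (counting triplets i < j < k where all three values are equal — NOT the actual LC 3583 condition; class and method names are hypothetical):

    ```java
    import java.util.HashMap;
    import java.util.Map;

    // Sketch: maintain a total frequency map and a prefix frequency map;
    // while sweeping the middle index, the right-side frequency is derived
    // as right = total - prefix - 1 (the -1 excludes the middle element).
    class TwoMapSweep {
        static long countEqualTriplets(int[] nums) {
            Map<Integer, Integer> total = new HashMap<>();
            for (int num : nums) total.merge(num, 1, Integer::sum);

            Map<Integer, Integer> prefix = new HashMap<>();
            long count = 0;
            for (int num : nums) {
                long left = prefix.getOrDefault(num, 0);
                long right = total.get(num) - left - 1;
                count += left * right;
                prefix.merge(num, 1, Integer::sum);
            }
            return count;
        }
    }
    ```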

  • LC 1396. Design Underground System

    • Optimal Answer.

      • TC: O(1) for all functions

      • SC: O(P+S^2), where S is the number of stations on the network, and P is the number of passengers making a journey concurrently during peak time.

  • LC 3531. Count Covered Buildings

    • I mistakenly used a TreeSet to address this problem. Although it can work, it's not the optimal answer. TreeSet is for neighbor search (which is just a PLAIN translation of the question). However, the problem only requires checking existence, not finding actual neighbors. This can be simplified to a boundary check: a building is covered if it is strictly between the min and max in its row and column. So instead of maintaining a full ordered set, we ONLY need to track the min/max per row and column, reducing both complexity and overhead.

    • Optimal Answer. TC: O(n), SC: O(n)

TreeMap/TreeSet (6)

  • Notes

    • Use cases

      • TreeMap is useful for maintaining a dynamically sorted collection of elements, especially when the collection is frequently updated with new elements being added and old elements being removed.

      • Typical queries include:

        • Next greater / next smaller element

          • smallest element ≥ target → ceiling()

          • largest element ≤ target → floor()

          • smallest element > target → higher()

          • largest element < target → lower()

        • Closest value to target ⭐ (very common)

          • Check both:

            • floor(target)

            • ceiling(target)

          • Choose the one with the smaller difference

          • Pattern: nearest neighbor search in ordered set
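
        The closest-value pattern above, sketched with TreeSet's floor()/ceiling() (hypothetical helper name; either lookup may return null):

        ```java
        import java.util.TreeSet;

        // Sketch: check both floor(target) and ceiling(target),
        // then keep whichever is closer to target.
        class ClosestQuery {
            static Integer closest(TreeSet<Integer> set, int target) {
                Integer lo = set.floor(target);   // largest element <= target
                Integer hi = set.ceiling(target); // smallest element >= target
                if (lo == null) return hi;
                if (hi == null) return lo;
                return (target - lo <= hi - target) ? lo : hi;
            }
        }
        ```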

    • Maintaining the candidate set

      • Dynamic ordered set with sliding window/prefix range. Maintain a set of valid candidates. Then perform ordered queries on the set: nearest/predecessor/successor

    • When NOT to use TreeSet

      • If only checking existence (e.g. lower()/higher() != null)

        • → consider reducing it to a min/max boundary check instead.

        • This reduces the TreeSet's O(log n) lookup to an O(1) check using boundaries.
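
        A sketch of that reduction (hypothetical names; the check assumes x itself has already been added):

        ```java
        // Sketch: when the only query is "does anything exist strictly
        // below AND strictly above x?", tracking min/max replaces a TreeSet.
        class Boundary {
            int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;

            void add(int x) {
                min = Math.min(min, x);
                max = Math.max(max, x);
            }

            // Equivalent to (treeSet.lower(x) != null && treeSet.higher(x) != null)
            // for a set that contains x.
            boolean strictlyInside(int x) {
                return min < x && x < max;
            }
        }
        ```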

  • 🟠 LC 1825. Finding MK Average

    • Optimal Answer

      • addElement(int num): TC O(log n), SC O(n), where n is the number of unique elements in the map.

      • calculateMKAverage(): TC O(k), SC O(1)

  • 🔴 LC 975. Odd Even Jump

    • This solution is based on DP + TreeMap; the question can also be solved with DP + Monotonic Stack

    • Optimal Solution. TC: O(n log n), SC: O(n)

  • 🔴 LC 715. Range Module

    • This problem is super hard and has too many corner cases.

    • For a given range [left, right) of an add or remove action, we actually ONLY care about the interval that overlaps with the left side of the given range and the interval that overlaps with the right side. These 2 intervals could be the same interval.

    • Optimal Solution.

      • This solution is based on the comment of this solution in LC.

      • Let k be the number of intervals to be cleared and N be the size of the map.

      • TC:

        • addRange: O(k + log N)

        • queryRange: O(log N)

        • removeRange: O(k + log N)

      • SC: O(N)

  • LC 1146. Snapshot Array

  • LC 2349. Design a Number Container System

    • Another approach using Heap instead of TreeSet is also interesting.

    • Optimal Answer.

      • TC: change(...) is O(log n), find(...) is O(1)

      • SC: O(n)

  • LC 681. Next Closest Time

  • 🟠 LC 2561. Rearranging Fruits

    • I initially misinterpreted the problem statement. After clarifying it, I solved most of the problem (~90%), but I overlooked the crucial indirect swap case.

    • Optimal Answer.

      • Let n = basket1.length = basket2.length, and let k = the number of distinct values across both baskets.

      • TC: O(n + k log k)

      • SC: O(k)

  • 🟠 LC 1488. Avoid Flood in The City

    • I only came up with a segment tree solution based on intuition because I initially modeled the problem as a range query problem. However, the real core operation is not “range sum”, but:

      Find the smallest zero-day index strictly greater than prev (successor query).

      This is naturally solved by an ordered set (TreeSet.higher(prev)), so the segment tree was an over-generalized solution.

    • Approach 1 using Segment Tree:

    • Approach 2 using Improved Segment Tree:

    • 👍 Approach 3 using Tree Set:
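
    The successor query from Approach 3 reduces to a single TreeSet call (hypothetical wrapper and names):

    ```java
    import java.util.TreeSet;

    // Sketch: among stored zero-rain day indices, find the smallest one
    // strictly greater than prev. TreeSet.higher is an O(log n) successor query.
    class SuccessorQuery {
        static Integer firstDryDayAfter(TreeSet<Integer> dryDays, int prev) {
            return dryDays.higher(prev); // null if no such day exists
        }
    }
    ```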

  • LC 2817. Minimum Absolute Difference Between Elements With Constraint
