How to find longest substring without repeating characters in Python?

Being in the top most asked questions in FAANG (or MAANG) and is comprised of more than one DSA concept. This question has a lot of complications leading to Time Limit Exceeded errors in coding platforms. Without any further delay, let’s discuss the various methods this question can be solved with starting from the easiest to the most optimal. Note that this question is a modification of the commonly found question on LeetCode: 3. Longest Substring Without Repeating Characters – https://leetcode.com/problems/longest-substring-without-repeating-characters/

Understanding The Question

Given a string s, find the longest substring without repeating characters.

A substring is a smaller string that fits on top of a string like each puzzle piece inside the whole puzzle. A substring is a string consisting of all the consecutive characters in the same order between two indexes of the parent string.
Only substrings will be selected for further checks with all distinct characters.

With this in mind, we have to select a substring whose size is the longest and consists of all unique characters.

Solution With Code In Python

Brute-Force Approach: Checking All Substrings

We take two nested loops with which we separate each substring from the original. Then we check whether each substring consists of all non-repeating characters or not. This is the worst approach to solve this problem before of its massive Time Complexity.

Time Complexity:

The nested loop to collect each substring: O(N*N) = O(N²).
Logic to check one substring has all unique characters: O(N).
Logic to check each substring having non repeating characters: O(N*N²) = O(N³).

Space Complexity:

If the logic for checking string having no repeating characters consist of using a set (most optimal logic): O(N).

Hence, this approach is the worst as not only it is taking O(N³) time complexity but also O(N) space complexity.

Solution Code in Python:


def uniqueChars(string):
    # Converting to a set and comparing the size takes O(N)
    # O(N) is the time complexity required to convert iterable to a set
    return len(set(string)) == len(string)

def bruteSolution(string : str) -> str:

    size = len(string)
    max_len = 0
    result_string = ""
    # Creating substrings

    for i in range(size):
        for j in range(i, size):
            # A substring is from i - j
            # String Slicing is also an O(M) complexity where M is substring size
            substring = string[i : j+1]

            if uniqueChars(substring):
                # substring contains only unique characters
                if(len(substring) > max_len):
                    result_string = substring
                    max_len = len(substring)

    return result_string

Code language: Python (python)

Better Approach – Two Pointer Sliding Window Approach

This approach requires Two Pointers hence reducing the O(N²) time complexity to O(N). Keeping track of the starting and ending of a substring, the head, and the tail, substituting the substring with these two pointers only. Alongside, this approach uses the Set data structure to store the characters of the substring.

Initially, both the pointers are pointing to the first index.
The tail pointer is incremented each step filling the Set with characters for the substring.
As a character is approached which is already present inside the Set, a loop is run to increment the head pointer and also remove the elements passing by this head pointer. This reduces the substring up to the point when the duplicate character is removed from the substring. Hence this current character now becomes the original as the previous repetition of this character has been removed from the Set.
Each operation checks whether the length of this new substring (calculated by [tail – head + 1]) is the longest. When the check is true, we store the head and tail of this substring in a separate variable.
At the end of all operations, we collect the resultant substring by iterating from head to tail indexes and creating it as a string, or just by slicing in Python.

Time Complexity:

As it only iterates through the string twice (worst case head goes upto tail making head traverse N once and tail traverse N once): O(N) + O(N) = O(2N).

Space Complexity:

For storing the characters inside the Set data structure: O(N).

Solution Code in Python:


def betterSolution(string : str) -> str:

    size = len(string)
    # Intializing the two pointers and the Set
    head = 0
    tail = 0
    # Substrings are not explicitly stored but is kept by this head and tail pointer
    chars = set()

    max_len = 1
    s = 0       # Starting index of the resultant substring
    e = 0       # Ending Index of the resultant substring
    # Both inclusive

    for tail in range(size):
        while string[tail] in chars:
            chars.remove(string[head])
            head += 1

        chars.add(string[tail])

        if max_len < (tail - head + 1):
            s = head
            e = tail
            max_len = e - s + 1

    result_string = string[s : e+1]
    return result_string

Code language: Python (python)

Most Optimal Solution: Advanced Sliding Window Two Pointer Approach

Instead of using the set, if a HashMap is used which to store the indexes, it will reduce the time complexity as the head would be jumping to the index next to the first duplicate character hence removing it.

An issue can arise that jumping the head pointer to the next index of the first duplicate, does not remove the characters before the duplicate residing inside the substring inside the HashMap. For this, we simply ignore the characters inside the HashMap whose index is before the head. This is because the substring only comprises of characters from head to tail index. Hence a new checking system must also be implemented to check whether a character is duplicate and is currently inside the substring.

Time Complexity:

Only the tail increments over the whole string: O(N).

Space Complexity:

For storing the characters inside the HashMap data structure: O(N).

Solution Code in Python:


def optimalSolution(string):

    size = len(string)
    head = 0
    tail = 0
    # Substrings are not explicitly stored but is kept by this head and tail pointer
    chars = dict()      # HashMap in Python

    max_len = 1
    s = 0       # Starting index of the resultant substring
    e = 0       # Ending Index of the resultant substring
    # Both inclusive

    for tail in range(size):
        if string[tail] in chars:
            # Current tail character already present inside HashMap
            if chars[string[tail]] >= head:
                # All characters between head and tail is inside current substring
                # If the character inside HashMap is after head index, then it is inside this current substring
                # Hence, the current tail is a duplicate character, reduce the substring
                head = chars[string[tail]] + 1

        chars[string[tail]] = max(chars.get(string[tail], 0), tail)

        if max_len < (tail - head + 1):
            s = head
            e = tail
            max_len = e - s + 1

    result_string = string[s : e+1]
    return result_string
Code language: Python (python)

Conclusion

Even though this question was a modification, the mechanics and logic were the same. Hence the approaches were similar to that of the original question.

Understanding The Question

Solution With Code In Python

Brute-Force Approach: Checking All Substrings

Time Complexity:

Space Complexity:

Solution Code in Python:

Better Approach – Two Pointer Sliding Window Approach

Time Complexity:

Space Complexity:

Solution Code in Python:

Most Optimal Solution: Advanced Sliding Window Two Pointer Approach

Time Complexity:

Space Complexity:

Solution Code in Python:

Conclusion

No comments so far