LeetCode - Implement strStr()

3 minute read

Problem description

description

Implement strStr().

Return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Example 1:

1
2
Input: haystack = "hello", needle = "ll"
Output: 2

Example 2:

1
2
Input: haystack = "aaaaa", needle = "bba"
Output: -1

Clarification:

What should we return when needle is an empty string? This is a great question to ask during an interview.

For the purpose of this problem, we will return 0 when needle is an empty string. This is consistent to C’s strstr() and Java’s indexOf().

Analysis

There is a straightforward solution which is to find the first character of needle in haystack, and check if there is a string in haystack. Here’s the code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
// Version 1
public class ImplementstrStr {
  public int strStr(String haystack, String needle) {
    if (needle == null || needle.isEmpty()){
      return 0;
    }

    if (haystack.length() < needle.length()){
      return -1;
    }

    int i = 0;

    while (i < haystack.length()){
      if (haystack.charAt(i) == needle.charAt(0)){
        int temp = i + 1, j = 1;
        boolean found = true;
        while (j < needle.length()){
          if (temp >= haystack.length() || haystack.charAt(temp) != needle.charAt(j)){
            found = false;
            break;
          }
          temp++;
          j++;
        }
        if (found){
          return i;
        }
      }
      i++;
    }

    return -1;
  }
}

However, this solution’s efficiency is low, with 2335 ms, faster than 5.00%. So we have to optimize it.

The first idea to optimize is to use String.equals.

Here’s the code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
// Version 2
public int strStr2(String haystack, String needle) {
    if (needle == null || needle.isEmpty()){
      return 0;
    }

    if (haystack.length() < needle.length()){
      return -1;
    }

    int i = 0;

    while (i < haystack.length()){
      if (haystack.charAt(i) == needle.charAt(0)){
        int start = i;
        int end = start + needle.length();
        if (end < haystack.length() && haystack.substring(start, end).equals(needle)){
          return start;
        }
      }
      i++;
    }

    return -1;
  }

If we use substring and equals to check if there’s a needle exists, it improve the efficiency significantly, which is 2 ms, faster than 49.04%.

So here’s the question, why it can improve so much just using substring and equals.

First guess is the charAt method took a lot of time, so we use two array to store the character of haystack and needle to prevent calling charAt, here’s the main code:

1
2
3
4
5
6
7
8
9
// Version 3
char[] chars1 = haystack.toCharArray();
char[] chars2 = needle.toCharArray();
.....

        if (temp >= haystack.length() || chars1[temp] != chars2[j]){
            found = false;
            break;
          }

And the result is 1037 ms, faster than 6.28%.

However, I looked at a sample 1ms solution in LeetCode and it uses the charAt, here’s the code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
// Version 4
class Solution {
    public int strStr(String haystack, String needle) {
        int n=haystack.length();
        int m = needle.length();
        if(needle.isEmpty())
            return 0;
        if(m>n)
            return -1;
        for(int i=0;i<=n-m;i++){
            int j;
            for(j=0;j<m;j++){
                if(haystack.charAt(i+j) != needle.charAt(j))
                    break;
            }
            if(j == m)
                return i;
        }
        return -1;
    }
}

The only difference is that my code have to check n times and his only have to check n-m times. And I modified the while loop in Version 3 to make

i <= haystack.length() - needle.length()

and result in 1 ms, faster than 63.05%

So I’m sure it’s because of the test cases in LeetCode but not charAt.

And the fastest code is to use Java’s default indexOf, I’ll write another post about this.

What to improve

  • boundary check when use index