LeetCode - Repeated DNA Sequences

less than 1 minute read

Problem description

description

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Example:

1
2
3
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Output: ["AAAAACCCCC", "CCCCCAAAAA"]

Analysis

The first idea is to use hashmap and substring to iterate the whole string for all substring with length of 10 and get the answer.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        HashMap<String, Integer> map = new HashMap<>();
        for (int i = 0; i<=s.length() -10;i++){
            String str = s.substring(i, i+10);
            map.put(str, map.getOrDefault(str, 0) + 1);
        }
        
        LinkedList<String> res = new LinkedList<>();
        for (Map.Entry<String,Integer> entry:map.entrySet()){
            if (entry.getValue() > 1){
                res.add(entry.getKey());
            }
        }
        
        return res;
    }
}

What to improve