Hello Programmers! In this post, you will learn how to solve the HackerRank Detect HTML Tags problem. This problem is part of the HackerRank Regex series.
One more thing: don't jump straight to the solutions. Try to solve HackerRank problems yourself first, and look at the solutions only if you are still stuck after several attempts.
HackerRank Detect HTML Tags Solution
Problem
In this challenge, we’re using regular expressions to detect the various tags used in an HTML document.
- Tags come in pairs. Some tag name, t, will have an opening tag, <t>, followed by some intermediate text, followed by a closing tag, </t>. The forward slash in a closing tag will always come before the tag name.
- The exception to this is self-closing tags, which consist of a single tag (not a pair) with a forward slash after the tag name: <p/>
Here are a few examples of tags:
- The p tag is for paragraphs: <p>This is a paragraph</p>
- There may be 1 or more spaces before or after a tag name: < p > This is also a paragraph</p>
- A void or empty tag involves an opening and closing tag with no intermediate characters: <p></p>
Some tags can also have attributes, such as the a tag, which is used to add a hyperlink to another document:
<a href = "http://www.google.com">Google</a>
In the above case, a is the tag name and href is an attribute having the value http://www.google.com.
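The tag forms described above (paired, self-closing, padded with spaces, and carrying attributes) can all be matched by one regular expression. The following is a minimal sketch in Python; the pattern and the helper name `tag_names` are assumptions made for illustration and are not part of the problem statement.

```python
import re

# Assumed pattern (not given by the problem): "<", optional spaces,
# an optional "/" for closing tags, optional spaces, then the tag name.
# Attributes and a self-closing "/" are consumed by [^>]* before ">".
TAG_PATTERN = re.compile(r"<\s*/?\s*([A-Za-z][A-Za-z0-9]*)[^>]*>")


def tag_names(fragment):
    """Return the tag names in one HTML fragment, in order of appearance."""
    return TAG_PATTERN.findall(fragment)


print(tag_names("<p>This is a paragraph</p>"))
print(tag_names("< p > This is also a paragraph</p>"))
print(tag_names("<p/>"))
print(tag_names('<a href = "http://www.google.com">Google</a>'))
```

Because only the name is captured, the attribute `href` and its value never reach the result. Opening and closing tags both report the same name, so duplicates still have to be removed afterwards.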
Task
Given N lines of HTML, find the tag names (ignore any attributes) and print them as a single line of lexicographically ordered, semicolon-separated values (e.g.: tag1;tag2;tag3).
Input Format
The first line contains an integer, N, the number of HTML fragments.
Each of the N subsequent lines contains a fragment of an HTML document.
Constraints
- 1 <= N <= 100
- Each fragment contains < 10000 ASCII characters.
- The fragments are chosen from Wikipedia, so analyzing and observing their markup structure may help.
- Leading and trailing spaces/indentation have been trimmed from the HTML fragments.
Output Format
Print a single line containing all of the unique tag names found in the input. Your output tags should be semicolon-separated and ordered lexicographically (i.e., alphabetically). Do not print the same tag name more than once.
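The output rule comes down to three steps: remove duplicates, sort, and join. Here is a small Python sketch, assuming the tag names have already been extracted into a list; the function name `format_tags` is hypothetical.

```python
def format_tags(names):
    """Deduplicate tag names, sort them lexicographically, join with ';'."""
    return ";".join(sorted(set(names)))


# Names in the order they might be collected, duplicates included.
print(format_tags(["p", "a", "a", "p", "div", "a", "a", "div"]))  # a;div;p
```

The separator is a bare semicolon with no surrounding spaces, which matches the sample output shown below.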
Sample Input
2
<p><a href="http://www.quackit.com/html/tutorial/html_links.cfm">Example Link</a></p>
<div class="more-info"><a href="http://www.quackit.com/html/examples/html_links_examples.cfm">More Link Examples...</a></div>
Sample Output
a;div;p
Explanation
The first line contains 2 tag names: {p, a}.
The second line contains 2 tag names: {div, a}.
Our set of unique tag names is {p, a, div}.
When we order these alphabetically and print them as semicolon-separated values, we get “a;div;p”.
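The explanation can be traced in code. The Python sketch below reads the sample input in the stated format (N on the first line, then N fragments), builds one set of names per line, and merges them. The regular expression and the function name `solve` are assumptions for illustration, not the official approach.

```python
import re

# Assumed pattern: optional spaces and an optional "/" before the name,
# then anything up to the closing ">".
TAG_PATTERN = re.compile(r"<\s*/?\s*([A-Za-z][A-Za-z0-9]*)[^>]*>")

SAMPLE_INPUT = """2
<p><a href="http://www.quackit.com/html/tutorial/html_links.cfm">Example Link</a></p>
<div class="more-info"><a href="http://www.quackit.com/html/examples/html_links_examples.cfm">More Link Examples...</a></div>
"""


def solve(text):
    """Return the per-line tag sets and the final semicolon-separated answer."""
    lines = text.splitlines()
    n = int(lines[0])  # first line: number of fragments
    per_line = [set(TAG_PATTERN.findall(line)) for line in lines[1:n + 1]]
    unique = set().union(*per_line)  # merge the per-line sets
    return per_line, ";".join(sorted(unique))


per_line, answer = solve(SAMPLE_INPUT)
for number, names in enumerate(per_line, start=1):
    print(number, sorted(names))
print(answer)  # a;div;p
```

Keeping one set per line mirrors the explanation step by step; a real submission could add every name to a single set directly.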
HackerRank Detect HTML Tags Solution in C++
#include <iostream>
#include <set>
#include <string>
using namespace std;

int main() {
    int n;
    cin >> n;
    set<string> tags;  // std::set keeps the names unique and sorted
    string aux;
    getline(cin, aux);  // discard the rest of the line that holds N
    while (n--) {
        getline(cin, aux);
        size_t size = aux.size();
        for (size_t i = 0; i < size; i++) {
            if (aux[i] != '<') continue;
            size_t j = i + 1;
            // Skip spaces and the slash of a closing tag before the name.
            while (j < size && (aux[j] == ' ' || aux[j] == '/')) j++;
            string s;
            // The bounds check stops the scan if the line ends before '>'.
            while (j < size && aux[j] != ' ' && aux[j] != '/' && aux[j] != '>') {
                s.push_back(aux[j++]);
            }
            if (!s.empty()) tags.insert(s);
        }
    }
    bool first = true;
    for (const string& tag : tags) {
        if (!first) cout << ";";
        first = false;
        cout << tag;
    }
    cout << endl;
    return 0;
}
HackerRank Detect HTML Tags Solution in Java
import java.util.*;
import java.util.regex.*;

public class Solution {

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        // A chunk that follows '<' counts as a tag if it starts with a name
        // and still contains the closing '>'.
        Pattern pattern = Pattern.compile("(\\w+)(| ).*>.*");
        ArrayList<String> tag = new ArrayList<String>();
        int testcase = in.nextInt();
        in.nextLine(); // consume the rest of the line that holds N
        for (int i = 0; i < testcase; i++) {
            String x = in.nextLine();
            // Splitting on '<' puts each tag name at the start of a chunk;
            // closing tags start with '/' and therefore do not match.
            String[] y = x.split("<");
            for (int j = 0; j < y.length; j++) {
                Matcher match = pattern.matcher(y[j]);
                if (match.matches() && !tag.contains(match.group(1))) {
                    tag.add(match.group(1));
                }
            }
        }
        Collections.sort(tag);
        System.out.println(String.join(";", tag));
    }
}
HackerRank Detect HTML Tags Solution in Python
# Python 3
import re

n = int(input())
res = []
for i in range(n):
    s = input()
    # Capture the name after '<' and optional spaces. Closing tags start
    # with '/' and are skipped; their names already appear in the
    # matching opening tags.
    for z in re.findall(r'<\s*(\w+) ?[^>]*>', s):
        if z not in res:
            res.append(z)
res.sort()
print(';'.join(res))
HackerRank Detect HTML Tags Solution in JavaScript
'use strict';

function processData(input) {
    var parse_fun = function (s) {
        return parseInt(s, 10);
    };
    var lines = input.split('\n');
    var n = parse_fun(lines.shift());
    var tagRE = /(?:<[ ]*([a-z][a-z0-9_]*)[^>]*>)/g;
    var data = lines.splice(0, n);
    var tags = {};
    data.forEach(function (s) {
        var arr = null;
        while ((arr = tagRE.exec(s)) != null) {
            tags[arr[1]] = 0;
        }
    });
    var res = [];
    for (var i in tags) {
        res.push(i);
    }
    res.sort();
    console.log(res.join(';'));
}

process.stdin.resume();
process.stdin.setEncoding("ascii");
var _input = "";
process.stdin.on("data", function (input) {
    _input += input;
});
process.stdin.on("end", function () {
    processData(_input);
});
HackerRank Detect HTML Tags Solution in PHP
<?php
$_fp = fopen("php://stdin", "r");

fscanf($_fp, "%d", $m);
$lines = array();
for ($i = 0; $i < $m; $i++) {
    $lines[] = trim(fgets($_fp));
}

// Group 1 is the tag name. Group 2 is the rest of the match:
// ">content</name>" for a paired tag, or just ">" otherwise.
$search = '/<\s*([a-z0-9]+)[^>]*(:?>(.*)<\/\\1\s*>|\\>)/i';
$matches = array();
$tags = array();
// Re-scan the captured remainders so that nested tags are found too.
while (preg_match_all($search, implode($lines), $matches)) {
    $tags = array_merge($tags, $matches[1]);
    $lines = $matches[2];
}
$tags = array_unique($tags);
sort($tags);
print implode(';', $tags) . PHP_EOL;
Disclaimer: This problem (Detect HTML Tags) was created by HackerRank, but the solutions are provided by BrokenProgrammers. This tutorial is for educational and learning purposes only.