HackerRank Detect HTML Tags Solution

Hello Programmers. In this post, you will learn how to solve the HackerRank Detect HTML Tags problem, which is part of the HackerRank Regex series.

One more thing: don’t jump straight to the solutions. First try to solve the HackerRank problems yourself, and look at the solutions only if you are still stuck after several attempts.

Problem

In this challenge, we're using regular expressions to detect the various tags used in an HTML document.

  • Tags come in pairs. Some tag name, t, will have an opening tag, <t>, followed by some intermediate text, followed by a closing tag, </t>. The forward slash in a closing tag will always come before the tag name.
  • The exception to this is self-closing tags, which consist of a single tag (not a pair) with a forward slash after the tag name: <p/>

Here are a few examples of tags:

  • The p tag is for paragraphs: <p>This is a paragraph</p>
  • There may be 1 or more spaces before or after a tag name: < p > This is also a paragraph</p>
  • A void or empty tag involves an opening and closing tag with no intermediate characters: <p></p>
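
The tag shapes listed above (paired, self-closing, padded with spaces, and empty) can all be matched with one pattern. The following is a minimal sketch rather than part of the original solutions; the names TAG_RE and tag_names are illustrative assumptions, and the pattern is only assumed to cover simple fragments like the ones shown here, not arbitrary HTML.

```python
import re

# Assumed pattern: '<', optional spaces, an optional '/' (closing tag),
# optional spaces, then the tag name captured in group 1.
TAG_RE = re.compile(r'<\s*/?\s*([A-Za-z][A-Za-z0-9]*)')

def tag_names(fragment):
    """Return the tag names found in one fragment, in order of appearance."""
    return TAG_RE.findall(fragment)

print(tag_names('<p>This is a paragraph</p>'))          # ['p', 'p']
print(tag_names('< p > This is also a paragraph</p>'))  # ['p', 'p']
print(tag_names('<p/>'))                                # ['p']
print(tag_names('<p></p>'))                             # ['p', 'p']
```

Opening and closing tags both report the same name here, so duplicates still have to be removed before printing.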

Some tags can also have attributes, such as the a tag, which is used to add a hyperlink to another document: 
<a href = "http://www.google.com">Google</a>

In the above case, a is the tag name and href is an attribute having the value http://www.google.com.
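
To make the split between tag name and attributes concrete, here is a small sketch that separates the two for the anchor example. The patterns OPEN_TAG_RE and ATTR_RE and the helper split_tag are hypothetical names introduced for illustration, and the attribute pattern assumes double-quoted values only.

```python
import re

# Hypothetical pattern: tag name in group 1, the attribute text before '>' in group 2.
OPEN_TAG_RE = re.compile(r'<\s*(\w+)([^>]*)>')
# Hypothetical pattern for name = "value" pairs (double-quoted values only).
ATTR_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')

def split_tag(fragment):
    """Return (tag_name, attributes) for the first opening tag, or None."""
    m = OPEN_TAG_RE.search(fragment)
    if m is None:
        return None
    return m.group(1), dict(ATTR_RE.findall(m.group(2)))

print(split_tag('<a href = "http://www.google.com">Google</a>'))
# ('a', {'href': 'http://www.google.com'})
```

For this challenge only the first element of the pair matters; the attribute text is matched just so it can be skipped.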

Task

Given N lines of HTML, find the tag names (ignore any attributes) and print them as a single line of lexicographically ordered, semicolon-separated values (e.g.: tag1;tag2;tag3).

Input Format

The first line contains an integer, N, the number of HTML fragments.
Each of the N subsequent lines contains a fragment of an HTML document.

Constraints

  • 1 <= N <= 100
  • Each fragment contains < 10000 ASCII characters.
  • The fragments are chosen from Wikipedia, so analyzing and observing their markup structure may help.
  • Leading and trailing spaces/indentation have been trimmed from the HTML fragments.

Output Format

Print a single line containing all of the unique tag names found in the input. Your output tags should be semicolon-separated and ordered lexicographically (i.e.: alphabetically). Do not print the same tag name more than once.
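
The output rule boils down to three steps: deduplicate, sort, and join. A minimal sketch, assuming the tag names have already been collected into a list (the function name format_tags is an illustrative assumption):

```python
def format_tags(names):
    """Deduplicate, sort lexicographically, and join with semicolons."""
    return ';'.join(sorted(set(names)))

print(format_tags(['p', 'a', 'div', 'a']))  # a;div;p
```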

Sample Input

2
<p><a href="http://www.quackit.com/html/tutorial/html_links.cfm">Example Link</a></p>
<div class="more-info"><a href="http://www.quackit.com/html/examples/html_links_examples.cfm">More Link Examples...</a></div>

Sample Output

a;div;p

Explanation

The first line contains 2 tag names: {p, a}.
The second line contains 2 tag names: {div, a}.
Our set of unique tag names is {p, a, div}.
When we order these alphabetically and print them as semicolon-separated values, we get “a;div;p”.
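
The explanation can be checked step by step on the sample input. This is a sketch under the assumption that matching opening tags alone is enough for this sample; OPEN_TAG and SAMPLE_LINES are names made up for the illustration.

```python
import re

SAMPLE_LINES = [
    '<p><a href="http://www.quackit.com/html/tutorial/html_links.cfm">Example Link</a></p>',
    '<div class="more-info"><a href="http://www.quackit.com/html/examples/html_links_examples.cfm">More Link Examples...</a></div>',
]

# Assumed pattern: matches opening tags only; closing tags start with '</'
# and are not matched because '/' is not a word character.
OPEN_TAG = re.compile(r'<\s*(\w+)[^>]*>')

per_line = [set(OPEN_TAG.findall(line)) for line in SAMPLE_LINES]
unique = set().union(*per_line)
result = ';'.join(sorted(unique))

print(sorted(per_line[0]))  # ['a', 'p']
print(sorted(per_line[1]))  # ['a', 'div']
print(result)               # a;div;p
```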

HackerRank Detect HTML Tags Solution in C++

#include <cctype>
#include <iostream>
#include <set>
#include <string>
using namespace std;

int main() {
    int n;
    cin >> n;
    string line;
    getline(cin, line);  // consume the rest of the first line
    set<string> tags;    // std::set keeps the names unique and sorted
    while (n--) {
        getline(cin, line);
        size_t size = line.size();
        for (size_t i = 0; i < size; i++) {
            if (line[i] != '<') continue;
            size_t j = i + 1;
            // Skip optional spaces and the slash of a closing tag.
            while (j < size && (line[j] == ' ' || line[j] == '/')) j++;
            // Read the tag name, never running past the end of the line.
            string name;
            while (j < size && isalnum(static_cast<unsigned char>(line[j]))) {
                name.push_back(line[j]);
                j++;
            }
            // Ignore things such as comments ("<!--") that have no name.
            if (!name.empty()) tags.insert(name);
        }
    }
    bool first = true;
    for (const string& tag : tags) {
        if (!first) cout << ";";
        first = false;
        cout << tag;
    }
    cout << endl;
    return 0;
}

HackerRank Detect HTML Tags Solution in Java

import java.util.*;
import java.util.regex.*;

public class Solution {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        // Opening or self-closing tag: '<', optional spaces, the tag name,
        // then anything up to the next '>'. Closing tags are not matched.
        Pattern pattern = Pattern.compile("<\\s*(\\w+)[^>]*>");
        // TreeSet keeps the names unique and in lexicographic order.
        Set<String> tags = new TreeSet<String>();
        int n = in.nextInt();
        in.nextLine(); // consume the rest of the first line
        for (int i = 0; i < n; i++) {
            Matcher match = pattern.matcher(in.nextLine());
            while (match.find()) {
                tags.add(match.group(1));
            }
        }
        System.out.println(String.join(";", tags));
    }
}

HackerRank Detect HTML Tags Solution in Python

import re

n = int(input())
tags = set()
for _ in range(n):
    line = input()
    # Capture the name of each opening or self-closing tag.
    # Closing tags start with "</" and do not match this pattern.
    tags.update(re.findall(r'<\s*(\w+)[^>]*>', line))
# Sort the unique names and join them with semicolons.
print(';'.join(sorted(tags)))

HackerRank Detect HTML Tags Solution in JavaScript

'use strict';
function processData(input) {
    var parse_fun = function (s) { return parseInt(s, 10); };
    var lines = input.split('\n');
    var n = parse_fun(lines.shift());
    var tagRE = /(?:<[ ]*([a-z][a-z0-9_]*)[^>]*>)/g;
    var data = lines.splice(0, n);
    var tags = {};
    data.forEach(function (s) {
        var arr = null;
        while ((arr = tagRE.exec(s)) != null) {
            tags[arr[1]] = 0;
        }
    });
    var res = [];
    for (var i in tags) { res.push(i); }
    res.sort();
    console.log(res.join(';'));
}
process.stdin.resume();
process.stdin.setEncoding("ascii");
var _input = "";
process.stdin.on("data", function (input) { _input += input; });
process.stdin.on("end", function () { processData(_input); });

HackerRank Detect HTML Tags Solution in PHP

<?php
$_fp = fopen("php://stdin", "r");
/* Enter your code here. Read input from STDIN. Print output to STDOUT */
fscanf($_fp, "%d", $m);
$lines = array();
for ($i = 0; $i < $m; $i++) {
    $lines[] = trim(fgets($_fp));
}
$search = '/<\s*([a-z0-9]+)[^>]*(?:>(.*)<\/\\1\s*>|\\>)/i';
$matches = array();
$tags = array();
while (preg_match_all($search, implode($lines), $matches)) {
    $tags = array_merge($tags, $matches[1]);
    $lines = $matches[2];
}
$tags = array_unique($tags);
sort($tags);
print implode(';', $tags) . PHP_EOL;

Disclaimer: This problem (Detect HTML Tags) is generated by HackerRank but the Solution is Provided by BrokenProgrammers. This tutorial is only for Educational and Learning purposes.
