The development of artificial intelligence increasingly involves highly specialized language models: the narrower the domain, the more accurate the model's answers within it. The darknet poses a particular challenge because many of its resources are inaccessible to ordinary browsers, so general-purpose language models tend to know little about it. To study the darknet, researchers in South Korea created the DarkBERT model, based on the RoBERTa architecture, with the goal of helping security researchers and law enforcement agencies.
The paper describing the model gives a general overview of the darknet and how AI can be applied to it. RoBERTa itself was developed back in 2019. DarkBERT's distinguishing feature is that, after training on darknet text, it can recognize the ways information is typically encoded in Dark Web messages and extract useful information from them.
Revisiting RoBERTa, the researchers found that it still had untapped potential: it had been undertrained in the early stages of its development. They crawled the Dark Web through the Tor anonymization network and then filtered the raw data (using techniques such as deduplication, category balancing, and text preprocessing) to build a training corpus. DarkBERT is the result of further training RoBERTa on this corpus.
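The filtering steps named above can be sketched as a simple pipeline. This is an illustrative reconstruction only, not code from the DarkBERT paper: the function names, the page/category data layout, and the choice to deduplicate by hashing normalized text are all assumptions.

```python
# Hypothetical sketch of the corpus-filtering steps described in the article:
# deduplication, category balancing, and light text preprocessing.
# Names and data layout are illustrative, not from the DarkBERT paper.
from collections import defaultdict
import hashlib
import random


def deduplicate(pages):
    """Drop pages whose whitespace-normalized, lowercased text was already seen."""
    seen, unique = set(), []
    for page in pages:
        normalized = " ".join(page["text"].split()).lower()
        digest = hashlib.sha256(normalized.encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(page)
    return unique


def balance_categories(pages, seed=0):
    """Downsample every category to the size of the smallest one."""
    by_cat = defaultdict(list)
    for page in pages:
        by_cat[page["category"]].append(page)
    cap = min(len(group) for group in by_cat.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_cat.values():
        balanced.extend(rng.sample(group, cap))
    return balanced


def preprocess(pages):
    """Collapse whitespace as a stand-in for heavier cleaning (masking IDs, etc.)."""
    return [{**p, "text": " ".join(p["text"].split())} for p in pages]


# Toy input: one near-duplicate market page and two forum pages.
raw = [
    {"text": "buy  here", "category": "market"},
    {"text": "buy here", "category": "market"},  # duplicate after normalization
    {"text": "forum post one", "category": "forum"},
    {"text": "forum post two", "category": "forum"},
]
corpus = preprocess(balance_categories(deduplicate(raw)))
```

On this toy input the pipeline keeps three unique pages, then trims the larger forum category down to match the single market page, leaving a balanced two-page corpus.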
The result was worth the effort: having learned the specific "language" of the dark web, DarkBERT outperforms other language models at researching and "understanding" it. Training and tuning of the model continue, and its results may improve further.
Source: Tom’s Hardware