Hi, I work as a data scientist and am looking for advice on my research. I am running a project to identify malicious smart contract addresses. My goal is to build machine learning models that predict the risk of each smart contract. To do this, I need to collect as many malicious smart contract addresses as possible and label them with risk tags.
​
**The steps I planned:**
1. Visit the websites of web3 auditing firms such as Hacken, OpenZeppelin, Consensys Diligence, Kudelski Security, ChainSecurity, and Halborn (halborn.com).
2. These websites publish blog posts about web3 security incidents as well as trusted web3 projects. Collect smart contract addresses from those articles and papers. Target chains are Ethereum, Arbitrum, Polygon, and Optimism.
3. Along with each address, gather the risk information associated with it. If the address is a malicious one, read the articles and papers closely to capture that risk information.
4. Assign risk tags to each address based on the collected risk information. Risk tags are a set of dozens of pre-defined true/false labels, for example “*honeypot*”, “*Slippage Modifiable*”, and “*Airdrop Scam*”.
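
To make steps 2 to 4 more concrete, here is a rough Python sketch of how the address extraction and tagging could look if automated. It assumes the article text has already been downloaded as a string. The names `RISK_TAGS`, `extract_addresses`, and `make_record` are placeholders I made up for illustration, and the tag list is only a small sample of the real one.

```python
import re

# Hypothetical sample of the pre-defined tag list (the real list has dozens).
RISK_TAGS = ["honeypot", "Slippage Modifiable", "Airdrop Scam", "Reentrancy"]

# An EVM address is "0x" followed by exactly 40 hex characters.
# The word boundaries keep 64-character transaction hashes from matching.
ADDRESS_RE = re.compile(r"\b0x[0-9a-fA-F]{40}\b")


def extract_addresses(text):
    """Return the unique addresses found in text, in first-seen order.

    Addresses are lowercased so that checksummed and non-checksummed
    spellings of the same address are treated as one entry.
    """
    seen = {}
    for match in ADDRESS_RE.findall(text):
        seen.setdefault(match.lower(), None)
    return list(seen)


def make_record(address, chain, true_tags):
    """Build one labeled row; every tag is False unless listed in true_tags."""
    unknown = set(true_tags) - set(RISK_TAGS)
    if unknown:
        raise ValueError(f"unknown risk tags: {sorted(unknown)}")
    record = {"address": address, "chain": chain}
    for tag in RISK_TAGS:
        record[tag] = tag in true_tags
    return record


# Usage: text copied from an article that links to Etherscan.
article_text = (
    "Vulnerable contract: "
    "https://etherscan.io/address/0xcbec1e9407f1910c86f261eaeac27d85c0479e8c"
)
records = [
    make_record(addr, "Ethereum", {"Reentrancy"})
    for addr in extract_addresses(article_text)
]
```

This only covers the mechanical part. Deciding which tags are true for a given address still requires reading the article, which is the part I am unsure how to scale.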
​
**Example:**
Let’s visit hacken.io, a web3 auditing company. They have a lot of blog posts about security incidents. In [this entry](https://hacken.io/insights/kyberswap-hack-explained/), they explain some notable examples of smart contract hacks.
Down in the middle, under the section “**The Flaw: Reentrancy in the Mint Function**”, you can find a “Vulnerable Contract” entry with a link labeled “KS2-RT: Smart Contract Tracker”.
Clicking that link leads you to [a page](https://etherscan.io/token/0xcbec1e9407f1910c86f261eaeac27d85c0479e8c) on Etherscan. In the middle of that page, there’s a link to the [contract page](https://etherscan.io/address/0xcbec1e9407f1910c86f261eaeac27d85c0479e8c).
Copy the contract address (*0xcbEc1e9407F1910c86F261eAeaC27d85c0479E8c*) and put it in the spreadsheet. Set the “*Reentrancy*” risk tag to “*true*”.
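
For reference, the spreadsheet row from this example could be written out as CSV roughly like this. This is only a sketch: the column layout in `COLUMNS` and the helper name `to_csv` are my own assumptions, and the tag columns are just the few tags mentioned in this post.

```python
import csv
import io

# Hypothetical column layout: address, chain, then one column per risk tag.
COLUMNS = [
    "address",
    "chain",
    "honeypot",
    "Slippage Modifiable",
    "Airdrop Scam",
    "Reentrancy",
]


def to_csv(rows):
    """Serialize rows to CSV text; tags missing from a row are written as False."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, restval=False, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# The row from the example above: only the Reentrancy tag is set to true.
row = {
    "address": "0xcbEc1e9407F1910c86F261eAeaC27d85c0479E8c",
    "chain": "Ethereum",
    "Reentrancy": True,
}
csv_text = to_csv([row])
```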
​
**Ask:**
I would like advice on:
* Whether this is a feasible project. I am aiming to collect 20,000 smart contract addresses along with their risk tags, so I would either outsource the work to many freelancers or automate it with a script.
* I am counting on the reports published by the web3 auditing websites listed above. However, I’m not sure they contain enough information. Is there a better way to collect addresses?
​
Thank you!
Source: self.CryptoCurrency