Open Access Paper
24 May 2022
Configurable image recognition framework design based on KNN and bit-based similarity model
Gang Lin, Yanchun Liang, Yonglin Chen, Wenbo Pan
Proceedings Volume 12260, International Conference on Computer Application and Information Security (ICCAIS 2021); 122601I (2022) https://doi.org/10.1117/12.2637494
Event: International Conference on Computer Application and Information Security (ICCAIS 2021), 2021, Wuhan, China
Abstract
CAPTCHAs are widely utilized on the Internet to partially protect against computer attacks, and text-based CAPTCHAs are commonly used. In order to enable a more flexible attack, this paper provides a framework with configurable options based on k-NN, comprising three major parts: preprocessing the binary image, building the standard library, and recognizing the image. The standard library is built from the training dataset, optionally dropping characters with a high similarity to existing items, and is then used against the testing dataset. A bit-based similarity model is proposed, in which "and" and "or" bit operations are executed and the similarity is the ratio of the two results. Finally, the framework is applied to four typical scenarios: the MNIST handwriting database, CAPTCHAs built by a CAPTCHA generator, online CAPTCHAs of the CNKI website, and CAPTCHAs within the open-source PHP DedeCMS; the average classification accuracy is 97.05%. As a result, the model is simple but effective, and the framework works well for text-based CAPTCHAs and handwritten numbers, which may prompt associated websites to pay more attention to their current authentication mechanisms. The framework also offers the flexibility to cover more algorithms and application scenarios by implementing different preprocessing logics according to the defined APIs.

1. INTRODUCTION

CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) are widely used on the Internet to distinguish between human users and computer programs [1-5]. Many kinds of CAPTCHAs are available, such as text-based, image-based, audio-based, video/animation-based, puzzle, graphical slider, face-recognition, etc. [6-9]. Text-based CAPTCHAs are frequently used by both enterprises and individuals due to their convenience and user-friendly look [10, 11]. There is much research on text-based CAPTCHAs, but little of it describes a generic framework for attacking CAPTCHAs, much less a framework flexible enough to meet different needs, such as different priorities among accuracy, time consumption, and the storage space of the standard library.

The k-NN (k-nearest neighbor) algorithm is a typical classification algorithm and one of the simplest among the well-known data mining algorithms such as C4.5, k-Means, SVM, Apriori, EM, and PageRank; its principle is also widely applied in other algorithms such as collaborative filtering and Rocchio [12]. The k-nearest neighbor or k-NN classification determines the decision boundary locally, where k is a parameter [13, 14]. In many cases, the 1-NN rule may be strictly better than the other k-NN (k > 1) rules [12, 15]. It is therefore important to design a similarity model for the k-NN algorithm with a high classification accuracy and a low storage requirement.

The standard process of identifying text-based CAPTCHAs consists of a segmentation process and a recognition process [16, 17], which can be designed in sequence or not. The segmentation-then-recognition approach executes segmentation first, and the final results largely depend on the correct number of segments [4]. In contrast, the segmentation-simultaneous-recognition approach scores all possible ways to segment a CAPTCHA and decides which combination is most likely to be correct through four major components: the Cut-point Detector, the Slicer, the Scorer, and the Arbiter [4]. Additionally, a generic approach based on Gabor filters uses a single segmentation and recognition strategy with two main steps, "extracting components" and "partition and recognition", where Gabor filters are applied along different directions to extract character components instead of intact characters from CAPTCHA images [2]. Although both the segmentation-simultaneous-recognition approach and the Gabor-filter approach work well for many types of CAPTCHAs, it is very difficult to integrate them with a generic framework based on k-NN. In order to make the framework a global rather than local function, the standard process, i.e., the segmentation-then-recognition approach, is used.

This paper is organized as follows. Firstly, the design of the binarization image recognition framework is introduced briefly, including the overall design and the key points of each process. Secondly, an implementation of the framework is illustrated with some key APIs, configuration files, the key logic of the k-NN algorithm, and a bit-based similarity model. Finally, the framework was applied to four scenarios with many different sub-scenarios each, and the execution time and accuracy rate of each run were gathered and analysed.

2. DESIGN OF IMAGE RECOGNITION FRAMEWORK

The framework consists of 3 major parts: preprocessing, building standard library, and recognition, as shown in Figure 1.

Figure 1.

Total Design of Image Recognition Framework.


2.1. Preprocessing

After each image is located, preprocessing is executed. The segmentation is based on the image after binarization and denoising, which converts an input image whose pixels define text and background into binary form, so that each pixel is either dark or white [18]. Proper binarization is very important for separating the foreground object from the background [19]. The foreground carries the digits, characters, or other valid information, while the background does not. Once the foreground is separated from the background, the image can be recognized by matching the shape of each foreground character one by one, as with handwritten characters and CAPTCHAs after preprocessing.
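The binarization step might be sketched as follows. The paper does not name a particular thresholding method, so a fixed global threshold is assumed here purely for illustration:

```python
def binarize(gray, threshold=128):
    """Convert a grayscale image (rows of 0-255 intensities) into a
    binary matrix: 1 for dark (foreground) pixels, 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

# A tiny 2x3 "image": dark pixels become foreground (1).
print(binarize([[12, 200, 90], [255, 30, 128]]))  # [[1, 0, 1], [0, 1, 0]]
```

In practice an adaptive threshold (e.g. Otsu's method) is often preferred for noisy CAPTCHA images, but the grouping of pixels into foreground and background is the same.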

2.2. Building standard library

All training images are annotated in advance of building the standard library. There are two different methods for judging whether a character of a training image should be added to the standard library. The first method applies no limits: the standard library contains all characters of the training images, so its size is always equal to the number of training characters. The second method sets a maximum threshold (MXT_S): a character is added only when its similarity to every existing item in the standard library is less than the threshold, so the size of the standard library is smaller and depends on the threshold value, as shown in Figure 2. Generally, the bigger the threshold, the bigger the standard library and the higher the accuracy. Method one can be seen as a special case of method two, since a threshold of 1 means no limits. The framework reads the threshold from a configuration file or database.
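The library-building rule above can be sketched as follows. Here `similarity` stands in for the bit-based model of the next subsection, and the equality-based similarity in the usage example is only a toy stand-in:

```python
def build_standard_library(training_chars, mxt_s, similarity):
    """Build the standard library from annotated (label, matrix) pairs,
    adding a character only when its similarity to every item already
    in the library stays below the MXT_S threshold."""
    library = []
    for label, matrix in training_chars:
        if all(similarity(matrix, kept) < mxt_s for _, kept in library):
            library.append((label, matrix))
    return library

def equality_similarity(a, b):
    # Toy stand-in for the bit-based model: 1 for identical, else 0.
    return 1.0 if a == b else 0.0

chars = [("a", (1, 0)), ("a", (1, 0)), ("b", (0, 1))]
lib = build_standard_library(chars, 0.95, equality_similarity)
print([label for label, _ in lib])  # ['a', 'b']: the duplicate is dropped
```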

Figure 2.

Design of Building Standard Library.


There are some popular similarity algorithms, such as the Pearson Correlation Coefficient, Cosine Similarity [20], and Jaccard Similarity [21]. The bit-based similarity model used by the framework is similar to cosine similarity. The image data after preprocessing are stored in memory in binary form, 0 for white and 1 for black, or vice versa. Two images are the same when every pixel matches at the same position, and they are similar when most pixels match and only a few differ. Given two images A and B, let P(A) and P(B) denote their binary matrices. The bitwise-or of P(A) and P(B) indicates the largest foreground range, like the union of P(A) and P(B), while the bitwise-and indicates the shared foreground area, like their intersection. The smaller the difference between the two results, the bigger the similarity. The bit-based similarity model therefore defines the similarity as the size of the shared foreground area of P(A) and P(B) divided by the size of their largest foreground range, i.e., the bitwise-and count divided by the bitwise-or count, as shown in equation (1). The higher the similarity value, the closer the two images or characters. Hence the similarity lies between 0 and 1, and S(A, A) = 1; in other words, A resembles itself 100%, for any size [21].

$S(A, B) = \dfrac{\operatorname{count}(P(A) \wedge P(B))}{\operatorname{count}(P(A) \vee P(B))}$ (1)
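A minimal sketch of equation (1), assuming each image's pixels are packed into a Python integer so the bitwise operators apply directly:

```python
def bit_similarity(a, b):
    """Equation (1): shared foreground (bitwise AND) divided by the
    union of foreground (bitwise OR), for two equal-sized binary
    images whose pixels are packed into Python integers."""
    union = a | b
    if union == 0:
        return 1.0  # two all-background images are identical
    return bin(a & b).count("1") / bin(union).count("1")

# Two 8-pixel patterns share 3 foreground bits out of a union of 5.
print(bit_similarity(0b11110000, 0b01111000))  # 0.6
print(bit_similarity(0b1010, 0b1010))          # 1.0
```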

2.3. Image recognition

The k-NN algorithm predicts which category each target sample belongs to by a majority vote of the k standard samples closest to the target. The bit-based similarity model is used, and k is configurable so that it can be changed for different scenarios. Once k is set, the top k standard items ranked by similarity are cached as candidates during recognition. Standard items are compared to the target one at a time, until a match exceeds the "direct recognition" similarity threshold or no more items remain to match. Among the top k standard items, the final value is determined by the majority of the candidate character values. If two or more candidate character values are tied for the majority, the value with the higher similarity is the final result. Therefore, when k is greater than 1, the final result does not always carry the highest similarity.
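The voting and tie-breaking rule can be sketched as follows, where `candidates` is assumed to be the (similarity, label) pairs already scored against the target character:

```python
from collections import Counter

def knn_predict(candidates, k):
    """Keep the top k candidates by similarity, take the majority
    label, and break ties in favour of the tied label that holds the
    higher similarity."""
    top = sorted(candidates, reverse=True)[:k]
    counts = Counter(label for _, label in top)
    best = max(counts.values())
    tied = {label for label, c in counts.items() if c == best}
    for sim, label in top:  # top is sorted by descending similarity
        if label in tied:
            return label

cands = [(0.9, "6"), (0.8, "8"), (0.7, "8"), (0.6, "6"), (0.5, "3")]
print(knn_predict(cands, 5))  # "6": tie between "6" and "8", 0.9 wins
print(knn_predict(cands, 3))  # "8": the majority beats the single best match
```

The second call illustrates the closing remark above: with k = 3 the result is "8" even though the single most similar item is a "6".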

The k-NN algorithm has the advantage of being easy and practical, with suitable accuracy [20]. Algorithms such as CNN (Condensed Nearest Neighbor), ball-tree, kd-tree (k-dimensional tree), and LSH (Locality Sensitive Hashing) can reduce storage space or speed up locating the k nearest neighbors [20, 22, 23]. In the framework, a byte array is used to reduce storage consumption. Each pixel of the preprocessed image is represented in binary form, e.g., 0 for white and 1 for black, and 8 pixels are grouped into one byte; the image data are stored in a byte array. Table 1 lists the binary array of a "six" after preprocessing, stored as a byte array of length 40. Storing each one or zero directly would need about 7 times more space, since a byte is the basic storage unit.
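The 8-pixels-per-byte packing can be sketched as below; most-significant-bit-first ordering and zero-padding of the final byte are assumptions, since the paper does not state the bit order:

```python
def pack_pixels(bits):
    """Pack a flat list of 0/1 pixels into bytes, 8 pixels per byte,
    most significant bit first; the final byte is zero-padded."""
    padded = bits + [0] * (-len(bits) % 8)
    return bytes(
        int("".join(map(str, padded[i:i + 8])), 2)
        for i in range(0, len(padded), 8)
    )

print(list(pack_pixels([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])))  # [240, 128]
print(len(pack_pixels([0] * 320)))  # 40 bytes for a 320-pixel character
```

The second call matches the Table 1 example: a 320-pixel character fits in a byte array of length 40.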

Table 1.

Array for a six after preprocessing.

Besides, one more feature is designed for quickly locating the associated standard items: the foreground area percentage. If every item in the standard library had to be matched for each character recognized, many items with small similarity values would waste time. In order to locate standard items with high similarity values as quickly as possible, a foreground-area feature is used in the framework. That is, the items in the standard library are grouped by foreground area, so that during each recognition different groups have different matching priorities and some groups with low priority are skipped. Let key be the feature value, as shown in equation (2), where s_i is the foreground area of item i and f is a customized function. A proper function strikes a good balance between the number of groups and the coverage percentage of characters to recognize per group. In general, the fewer the groups, the more items each group contains and the higher the coverage percentage of characters per group.

$key_i = f(s_i)$ (2)

A linear function is proposed, as shown in equation (3), where a is a coefficient between 0 and 1, such as 0.1. Given that S is the total area, including both foreground and background, the domain of the function is the open interval shown in equation (4), while the range is the closed interval shown in equation (5).

$f(s) = \lfloor a \cdot s \rfloor$ (3)
$s \in (0, S)$ (4)
$key \in [0, \lfloor a \cdot S \rfloor]$ (5)

For a character to be recognized, it is easy to get its key feature value according to equation (3). Once the key is calculated, the items in group key have the highest priority, and the priority value decreases with the distance between a group number and key. The priority value can be calculated according to equation (6), where gn_i is the group number of standard item i, key is the feature value of the character to be recognized, and MAX is a maximum that is greater than or equal to the total number of groups.

$p_i = \mathit{MAX} - |gn_i - key|$ (6)
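Equations (3) and (6) can be sketched together. The floor in `group_key` is an assumption (group numbers must be integers), and the foreground area of 57 in the usage example is arbitrary:

```python
import math

def group_key(foreground_area, a=0.1):
    """Equation (3): map a character's foreground area to an integer
    group number via a linear function."""
    return math.floor(a * foreground_area)

def priority(group_number, key, max_value):
    """Equation (6): the group matching the target's key gets the
    highest priority; priority falls with distance from the key.
    max_value (MAX) must be at least the total number of groups."""
    return max_value - abs(group_number - key)

key = group_key(57)           # hypothetical foreground area of 57 pixels
print(key)                    # 5
print(priority(5, key, 100))  # 100: same group, searched first
print(priority(8, key, 100))  # 97: three groups away
```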

There are two further threshold values in the framework, MXT_E and MXN_E. When the maximum similarity found so far is less than MXT_E and the number of different priorities matched is less than MXN_E, the groups at the next priority level are involved; otherwise, the recognition stops. The image result is returned when all characters are recognized, as shown in Figure 3.
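The extend-or-stop rule can be sketched as below; `groups_by_priority` is assumed to be the standard-library groups already sorted by descending priority, and the toy similarity in the usage example only stands in for the bit-based model:

```python
def recognize(groups_by_priority, similarity, target, mxt_i, mxt_e, mxn_e):
    """Match standard items tier by tier in descending priority order.
    Stop early on a direct hit (similarity >= MXT_I); after each tier,
    stop once the best similarity reaches MXT_E or MXN_E tiers have
    been searched."""
    best_sim, best_label = 0.0, None
    for tier, items in enumerate(groups_by_priority, start=1):
        for label, matrix in items:
            sim = similarity(matrix, target)
            if sim > best_sim:
                best_sim, best_label = sim, label
            if sim >= mxt_i:        # direct recognition
                return best_label
        if best_sim >= mxt_e or tier >= mxn_e:
            break
    return best_label

def toy_similarity(stored, target):
    # Illustrative stand-in for the bit-based model of Section 2.2.
    return 1 - abs(stored - target) / 10

groups = [[("a", 5)], [("b", 7)]]  # tiers already sorted by priority
print(recognize(groups, toy_similarity, 7, 0.95, 0.85, 5))  # "b"
```

With target 7, tier 1 only reaches similarity 0.8, which is below MXT_E with fewer than MXN_E tiers searched, so the next tier is involved and "b" is matched directly.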

Figure 3.

Design of character recognition.


3. API AND APPLICATION

3.1. API

There are 3 main interfaces and associated classes in the framework, as shown in Figure 4.

Figure 4.

Three main classes of framework.


3.2. Configuration values

According to the design, some common parameters are configured in advance, as shown in Table 2.

Table 2.

Some configuration values.

Name        Example   Comments
DATA_TYPE   1         1: mnist / 2: captcha / 3: cnki / 4: cms
MXT_I       0.95      Threshold of direct recognition
MXT_S       0.95      Threshold of standard library
MXT_E       0.85      Threshold of extending the search
MXN_E       5         Max ranges of priority to search
a           0.1       The coefficient for feature value of key
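The Table 2 parameters might be held in code as a simple mapping; this is a sketch, since in the framework the values actually come from a configuration file or database:

```python
# Example parameter set mirroring Table 2.
CONFIG = {
    "DATA_TYPE": 1,  # 1: mnist / 2: captcha / 3: cnki / 4: cms
    "MXT_I": 0.95,   # threshold of direct recognition
    "MXT_S": 0.95,   # threshold of standard library
    "MXT_E": 0.85,   # threshold of extending the search
    "MXN_E": 5,      # max ranges of priority to search
    "a": 0.1,        # coefficient for the feature value of key
}
```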

Additionally, all possible character values need to be pre-configured. For cnki, for example, there are 27 candidates in total, as shown in Table 3. The sequence number or index can be retrieved from pre-computed hash tables according to the expected or tagged character, and vice versa.

Table 3.

Relationship between sequence number and char.

No.  Char    No.  Char    No.  Char
0    2       9    C       18   N
1    3       10   D       19   M
2    4       11   E       20   P
3    5       12   F       21   R
4    6       13   G       22   S
5    8       14   H       23   T
6    9       15   J       24   W
7    A       16   K       25   X
8    B       17   L       26   Y
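The two pre-computed hash tables can be sketched from Table 3 as:

```python
# The 27 CNKI candidate characters, in sequence-number order per
# Table 3 (note the table lists N before M).
CHARS = "2345689ABCDEFGHJKLNMPRSTWXY"

# Character -> index, and index -> character lookups.
INDEX_OF = {ch: i for i, ch in enumerate(CHARS)}
CHAR_AT = dict(enumerate(CHARS))

print(INDEX_OF["A"])  # 7
print(CHAR_AT[18])    # N
```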

3.3. Application scenarios

The framework was applied to 4 scenarios: the MNIST handwriting database, CAPTCHAs built by the CAPTCHA generator, online CAPTCHAs of the CNKI website, and CAPTCHAs within the open-source PHP DedeCMS.

When making the standard set or performing recognition, preprocessing is the first step; e.g., the image "E93D" is converted into the four individual images "E", "9", "3" and "D", and the next step is locating their positions in the hash tables according to Table 3. With the hash tables, it is quick to locate the associated group according to the tagged value, as shown in Figure 5. Table 4 lists the test results for all 4 scenarios. The framework works well across the different scenarios, with an average accuracy of 97.05%. Generally, the accuracy differs slightly according to the configured parameters. Taking the CMS scenario as an example, the accuracy varies with different knn and MXT_E values, as shown in Figure 6.

Figure 5.

Preprocessing and locating standard items.


Figure 6.

Relationships between accuracy and MXT_E.


Table 4.

Test results for 4 different scenarios.

                              MNIST             CAPTCHAS          CNKI              CMS
knn/MXN_E                     1/10              1/5               1/5               1/5
MXT_I/MXT_S/MXT_E             0.95/0.9/0.8      0.995/0.995/0.95  0.995/0.995/0.95  0.95/0.95/0.8
Count of training/test/right  60000/10000/9667  6000/4000/3826    4000/8400/8085    700/1537/1531
Accuracy                      96.67%            95.65%            96.25%            99.61%
Testing/average time          46320.38/4.63     53892.36/13.47    41554.62/4.95     93.27/0.06

4. CONCLUSION

This paper provides a configurable image recognition framework based on k-NN and a bit-based similarity model, successfully applied to 4 different scenarios. For different scenarios, different parameters can be configured to get better results. Additionally, the framework can be extended to other scenarios conveniently by setting different parameters, or updated to involve more algorithms by adding new concrete classes that implement the interfaces described in this paper.

ACKNOWLEDGMENTS

The authors are grateful to the support of the Guangdong Key Disciplines Project (2021ZDJS138). This work was financially supported by 2021 University-level Teaching Quality Project supported by Zhuhai College of Science and Technology (Grant No. ZLGC20210203).

REFERENCES

[1] 

Thobhani, A., Gao, M. S., Hawbani, A., et al, “CAPTCHA recognition using deep learning with attached binary images,” Electronics, 9 (1522), 1 –19 (2020). Google Scholar

[2] 

Gao, H., Yan, J., Fang, C., Zhang, Z. and Li, J., “A simple generic attack on text captchas,” in Network & Distributed System Security Symp, 1 –14 (2016). Google Scholar

[3] 

George, D. et al., “A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs,” Science, 358 (6368), 1 –9 (2017). https://doi.org/10.1126/science.aag2612 Google Scholar

[4] 

Bursztein, E., Aigrain, J., Moscicki, A. and Mitchell, J. C., “The end is nigh: Generic solving of text-based CAPTCHAs,” in USENIX Conf. on Offensive Technologies, 1 –15 (2014). Google Scholar

[5] 

Hussain, R., Kumar, K. H., Gao, H. and Khan, I., “Recognition of merged characters in text based CAPTCHAs,” in Proc. of the 2016 3rd Inter. Conf. on Computing for Sustainable Global Development (INDIACom), 3917 –3921 (2016). Google Scholar

[6] 

Kumar, M. and Jindal, M. K., “A systematic survey on CAPTCHA recognition: Types, creation and breaking techniques,” Archives of Computational Methods in Engineering, 28 (4), 1 –30 (2021). Google Scholar

[7] 

Hernandez-Castro, C. J., R-Moreno, M. D. and Barrero, D. F., "Using JPEG to measure image continuity and break capy and other puzzle CAPTCHAs," IEEE Internet Computing, 19 (6), 46 –53 (2015). https://doi.org/10.1109/MIC.2015.127 Google Scholar

[8] 

Schryen, G., Wagner, G. and Schlegel, A., “Development of two novel face-recognition CAPTCHAs: A security and usability study,” Computers & Security, 60 95 –116 (2016). https://doi.org/10.1016/j.cose.2016.03.007 Google Scholar

[9] 

Xu Y., Reynaga G., Chiasson S., et al, “Security analysis and related usability of motion-based CAPTCHAs: Decoding codewords in motion,” IEEE Transactions on Dependable & Secure Computing, 11 (5), 480 –493 (2014). https://doi.org/10.1109/TDSC.2013.52 Google Scholar

[10] 

Wang, Y. and Lu, M., “An optimized system to solve text-based CAPTCHA,” International Journal of Artificial Intelligence and Applications (IJAIA), 9 (3), 19 –36 (2018). https://doi.org/10.5121/ijaia Google Scholar

[11] 

Bursztein, E., Martin, M. and Mitchell, J., “Text-based captcha strengths and weaknesses,” in Proc. of the 18th ACM Conf. on Computer and Communications Security CCS’11, 125 –137 (2011). Google Scholar

[12] 

Zhou, T., Yuan, F. and Zhuang, X., “Simplest Data Mining,” 31 –45 Publishing House of Electronics Industry, Beijing (2020). Google Scholar

[13] 

Manning, C. D., Raghavan, P. and Schütze H., “Introduction to Information Retrieval,” 200 –220 Beijing Posts & Telecom Press, Beijing (2010). Google Scholar

[14] 

Lin G., Liang Y., Fu X., Chen G., Cai S., “Design of a daily brief business report generator based on web scraping with KNN algorithm,” in Journal of Physics: Conference Series, 1 –7 (2019). Google Scholar

[15] 

Cover, T. M. and Hart, P. E., “Nearest neighbor pattern classification,” IEEE Transactions on Information Theory, 13 (1), 21 –27 (1967). https://doi.org/10.1109/TIT.1967.1053964 Google Scholar

[16] 

Chellapilla, K. and Simard, P., “Using machine learning to break visual human interaction proofs (HIPs),” Advances in Neural Information Processing Systems, 17 1 –8 (2004). Google Scholar

[17] 

Li Q. J., Mao Y. B. and Wang Z. Q., “A survey of CAPTCHA technology,” Journal of Computer Research and Development, 49 (3), 469 –480 (2012). Google Scholar

[18] 

Booth, R. R. and Phelps, M. J., "Image Binarization," (2016). Google Scholar

[19] 

Chaki, N., Shaikh, S. H. and Saeed, K., “A comprehensive survey on image binarization techniques,” Studies in Computational Intelligence, 560 5 –15 (2014). Google Scholar

[20] 

Lin, Z., "Principles and Applications of Big Data Technology," 286 –296 3rd edition, Beijing Posts & Telecom Press, Beijing (2021). Google Scholar

[21] 

Broder, A. Z., “On the resemblance and containment of documents,” in Proc. of the Compression and Complexity of Sequences, 21 –29 (1997). Google Scholar

[22] 

Hart P. E., “The condensed nearest neighbor rule,” IEEE Transactions on Information Theory, 18 515 –516 (1968). https://doi.org/10.1109/TIT.1968.1054155 Google Scholar

[23] 

Gaede, V. and Gunther, O., “Multidimensional access methods,” ACM Computing Surveys, 30 (2), 170 –231 (1998). https://doi.org/10.1145/280277.280279 Google Scholar