Could machine learning build a model for prime numbers?
This article is a repost from blog.sap.com.
When we think about a prime number, we think in terms of its divisibility properties.
Wikipedia: "A prime number (or a prime) is a natural number greater than 1 that is not a product of two smaller natural numbers. A natural number greater than 1 that is not prime is called a composite number."
I came in second place in the #SapHanaCloud #MachineLearning #Challenge.
I wrote this blog about the challenge and the solution.
The classification approach stuck in my head for a while, and then one day the prime number problem popped up.
Machine learning doesn’t solve all problems, but it’s worth exploring if it can.
“Humans are an API to ChatGPT.
ChatGPT is an API to Python.
Python is an API to C.
C is an API to assembly.
Assembly is an API to binary.
Binary is an API to physics.
Physics is an API to the machine that runs the universe. It's computation all the way down."
Can we express the prime number property as a classification problem?
Table of contents
1. Intervals with prime numbers
2. Intervals with the last digit of the prime
3. Conversion of class into binary
4. Conversion of intervals into binary
5. Splitting binary intervals into columns
6. Reorganizing data with sliding windows
7. Building the models with sliding windows
7.1 Model classification report
7.2 Model confusion matrix
7.3 Prediction classification report
7.4 Prediction confusion matrix
8. Building models with a validation data set
8.1 Model classification report
8.2 Model confusion matrix
8.3 Prediction classification report
8.4 Prediction confusion matrix
8.5 Model performance
9. Building models with early stopping
9.1 Model classification report
9.2 Model confusion matrix
9.3 Prediction classification report
9.4 Prediction confusion matrix
9.5 Model performance
10. Conclusion and further steps
11. Building an LSTM model with sliding windows
In this blog, I will only go through the main parts. The entire code is published on GitHub with my last run.
1. Intervals with prime numbers
Let's try to redefine the prime number property as a classification problem in binary form.
A prime number greater than 5 can only end with one of [1, 3, 7, 9]. Examples by intervals:
In the interval from 10 to 19, the primes are 11, 13, 17, 19.
In the interval from 20 to 29, the primes are 23, 29.
1->[11, 13, 17, 19]
2->[23, 29]
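As a sketch (not my exact code, which is on GitHub), the interval-to-primes mapping can be reproduced like this:

def is_prime(n):
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def primes_in_interval(i):
    # interval i covers the numbers 10*i .. 10*i + 9
    return [n for n in range(10 * i, 10 * i + 10) if is_prime(n)]

print(primes_in_interval(1))  # [11, 13, 17, 19]
print(primes_in_interval(2))  # [23, 29]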
2. Intervals with the last digit of the prime
The combination of last digits [1, 3, 7, 9] is the pattern, i.e. the class I have to predict.
1->[1, 3, 7, 9]
2->[3, 9]
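The reduction from primes to their last digits is a simple modulo; a minimal sketch:

def last_digits(primes):
    return [p % 10 for p in primes]

print(last_digits([11, 13, 17, 19]))  # [1, 3, 7, 9]
print(last_digits([23, 29]))          # [3, 9]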
3. Conversion of class into binary
For each possible ending: 1 if a prime in the interval ends with it, else 0.
1->[1, 1, 1, 1]
2->[0, 1, 0, 1]
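A sketch of the encoding, where each of the four possible endings (1, 3, 7, 9) becomes one bit:

def interval_class(primes):
    endings = {p % 10 for p in primes}
    return [1 if d in endings else 0 for d in (1, 3, 7, 9)]

print(interval_class([11, 13, 17, 19]))  # [1, 1, 1, 1]
print(interval_class([23, 29]))          # [0, 1, 0, 1]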
4. Conversion of intervals into binary
The interval number is converted into a binary string, and the class list is joined into a string.
00000000000001->1111
00000000000010->0101
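A sketch of both conversions; the 14-bit width is my assumption, chosen to match the examples above:

def encode_row(i, cls, width=14):
    # interval number as a zero-padded binary string, class list joined into a string
    return format(i, '0%db' % width), ''.join(str(b) for b in cls)

print(encode_row(1, [1, 1, 1, 1]))  # ('00000000000001', '1111')
print(encode_row(2, [0, 1, 0, 1]))  # ('00000000000010', '0101')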
5. Splitting binary intervals into columns
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]->[1111]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]->[0101]
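With pandas, splitting the binary string into one column per bit could look like this sketch:

import pandas as pd

intervals = pd.Series(['00000000000001', '00000000000010'])
# one feature column per bit of the interval's binary representation
X = intervals.apply(lambda b: pd.Series([int(c) for c in b]))
print(X.values.tolist())
# [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
#  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]]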
6. Reorganizing data with sliding windows
The sliding window shifts the previous interval and its prime class into the current row as additional features. The window can include any number of preceding rows.
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]->[1111]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1], [1, 1, 1, 1], # previous row
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]->[0101] # current row and class to predict
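A hedged sketch of this reshaping with pandas shift; the per-lag width of 18 columns (14 interval bits + 4 class bits) matches the feature counts in the shapes logged in step 7 (14, 32, 50, ...):

import pandas as pd

def add_sliding_window(bits, cls, i_shift):
    # bits: DataFrame with the 14 interval-bit columns
    # cls:  DataFrame with the 4 class-bit columns
    # lag k appends the 18 columns of row t-k to row t, so the
    # feature count grows as 14 + 18 * (i_shift - 1)
    parts = [bits]
    for k in range(1, i_shift):
        parts.append(pd.concat([bits, cls], axis=1).shift(k).add_suffix('_lag%d' % k))
    return pd.concat(parts, axis=1).dropna()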
7. Building the models with sliding windows
Fit with all training data, without a validation data set.
# In 93:
%%time
n_shifts = 24
for i_shift in range(1, n_shifts):
    X_train, X_test, y_train, y_test = f_Xy_train_Xy_test(i_shift, df)
    # ... fit and score the model for each window size (full code on GitHub)
# Out 93:
model score: 0.521 prediction score: 0.250 shift: 001 (9983, 14) (16, 14) (9983,) (16,)
model score: 0.730 prediction score: 0.375 shift: 002 (9982, 32) (16, 32) (9982,) (16,)
model score: 0.843 prediction score: 0.375 shift: 003 (9981, 50) (16, 50) (9981,) (16,)
model score: 0.907 prediction score: 0.312 shift: 004 (9980, 68) (16, 68) (9980,) (16,)
model score: 0.939 prediction score: 0.375 shift: 005 (9979, 86) (16, 86) (9979,) (16,)
model score: 0.970 prediction score: 0.250 shift: 006 (9978, 104) (16, 104) (9978,) (16,)
model score: 0.981 prediction score: 0.125 shift: 007 (9977, 122) (16, 122) (9977,) (16,)
model score: 0.988 prediction score: 0.312 shift: 008 (9976, 140) (16, 140) (9976,) (16,)
model score: 0.994 prediction score: 0.250 shift: 009 (9975, 158) (16, 158) (9975,) (16,)
model score: 0.997 prediction score: 0.188 shift: 010 (9974, 176) (16, 176) (9974,) (16,)
model score: 0.999 prediction score: 0.312 shift: 011 (9973, 194) (16, 194) (9973,) (16,)
model score: 0.999 prediction score: 0.438 shift: 012 (9972, 212) (16, 212) (9972,) (16,)
model score: 1.000 prediction score: 0.312 shift: 013 (9971, 230) (16, 230) (9971,) (16,)
model score: 1.000 prediction score: 0.438 shift: 014 (9970, 248) (16, 248) (9970,) (16,)
model score: 1.000 prediction score: 0.375 shift: 015 (9969, 266) (16, 266) (9969,) (16,)
model score: 1.000 prediction score: 0.562 shift: 016 (9968, 284) (16, 284) (9968,) (16,)
model score: 1.000 prediction score: 0.375 shift: 017 (9967, 302) (16, 302) (9967,) (16,)
model score: 1.000 prediction score: 0.312 shift: 018 (9966, 320) (16, 320) (9966,) (16,)
model score: 1.000 prediction score: 0.375 shift: 019 (9965, 338) (16, 338) (9965,) (16,)
model score: 1.000 prediction score: 0.375 shift: 020 (9964, 356) (16, 356) (9964,) (16,)
model score: 1.000 prediction score: 0.250 shift: 021 (9963, 374) (16, 374) (9963,) (16,)
model score: 1.000 prediction score: 0.438 shift: 022 (9962, 392) (16, 392) (9962,) (16,)
model score: 1.000 prediction score: 0.375 shift: 023 (9961, 410) (16, 410) (9961,) (16,)
CPU times: total: 1h 24min 1s
Wall time: 32min 49s
Model score = 1.000!
However, the maximum prediction score is only 0.562!
The model is capable of learning, and even memorizing, but it can't generalize that knowledge to predict!
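For reference, here is a minimal sketch of what each iteration could look like. It assumes an XGBClassifier (XGBoost is named explicitly in step 11) and my f_Xy_train_Xy_test helper; the exact code is on GitHub:

from xgboost import XGBClassifier

for i_shift in range(1, n_shifts):
    X_train, X_test, y_train, y_test = f_Xy_train_Xy_test(i_shift, df)
    model = XGBClassifier()
    model.fit(X_train, y_train)
    # "model score" is accuracy on the training data, "prediction score" on the test data
    print('model score: %.3f prediction score: %.3f shift: %03d'
          % (model.score(X_train, y_train), model.score(X_test, y_test), i_shift),
          X_train.shape, X_test.shape, y_train.shape, y_test.shape)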
7.1 Model classification report
# Out 95
precision recall f1-score support
0 1.00 1.00 1.00 3192
1 1.00 1.00 1.00 1114
2 1.00 1.00 1.00 1097
3 1.00 1.00 1.00 196
4 1.00 1.00 1.00 1126
5 1.00 1.00 1.00 578
6 1.00 1.00 1.00 202
7 1.00 1.00 1.00 86
8 1.00 1.00 1.00 1106
9 1.00 1.00 1.00 179
10 1.00 1.00 1.00 596
11 1.00 1.00 1.00 93
12 1.00 1.00 1.00 174
13 1.00 1.00 1.00 96
14 1.00 1.00 1.00 92
15 1.00 1.00 1.00 34
accuracy 1.00 9961
macro avg 1.00 1.00 1.00 9961
weighted avg 1.00 1.00 1.00 9961
# Out 96:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
# Out 97:
array(['0000', '0001', '0010', '0011', '0100', '0101', '0110', '0111',
'1000', '1001', '1010', '1011', '1100', '1101', '1110', '1111'],
dtype=object)
# 0 ->0000 # interval doesn't contain any prime
# 1 ->0001 # endings 0009 # interval has one prime, ending with 9
# 15->1111 # endings 1379 # interval has primes ending with 1, 3, 7, 9
7.2 Model confusion matrix
7.3 Prediction classification report
# Out 99:
precision recall f1-score support
0 0.45 0.71 0.56 7
1 1.00 0.00 0.00 2
2 0.00 1.00 0.00 0
4 0.00 1.00 0.00 0
5 1.00 0.00 0.00 1
8 0.50 0.25 0.33 4
10 1.00 0.00 0.00 2
accuracy 0.38 16
macro avg 0.56 0.42 0.13 16
weighted avg 0.64 0.38 0.33 16
7.4 Prediction confusion matrix
8. Building models with a validation data set
Increasing the sliding window and allowing overfitting to analyze model performance on the training data. Early stopping = 0.
# In 101:
%%time
# n_shifts = 24
for i_shift in range(1, n_shifts):
    X_train, X_test, y_train, y_test = f_Xy_train_Xy_test(i_shift, df)
    # ... fit with a validation data set (full code on GitHub)
# Out 101:
model score: 0.534 prediction score: 0.250 shift: 001 (9983, 14) (16, 14) (9983,) (16,)
model score: 0.675 prediction score: 0.188 shift: 002 (9982, 32) (16, 32) (9982,) (16,)
model score: 0.744 prediction score: 0.250 shift: 003 (9981, 50) (16, 50) (9981,) (16,)
model score: 0.783 prediction score: 0.312 shift: 004 (9980, 68) (16, 68) (9980,) (16,)
model score: 0.808 prediction score: 0.250 shift: 005 (9979, 86) (16, 86) (9979,) (16,)
model score: 0.810 prediction score: 0.438 shift: 006 (9978, 104) (16, 104) (9978,) (16,)
model score: 0.814 prediction score: 0.438 shift: 007 (9977, 122) (16, 122) (9977,) (16,)
model score: 0.816 prediction score: 0.375 shift: 008 (9976, 140) (16, 140) (9976,) (16,)
model score: 0.818 prediction score: 0.375 shift: 009 (9975, 158) (16, 158) (9975,) (16,)
model score: 0.821 prediction score: 0.375 shift: 010 (9974, 176) (16, 176) (9974,) (16,)
model score: 0.816 prediction score: 0.375 shift: 011 (9973, 194) (16, 194) (9973,) (16,)
model score: 0.825 prediction score: 0.438 shift: 012 (9972, 212) (16, 212) (9972,) (16,)
model score: 0.820 prediction score: 0.375 shift: 013 (9971, 230) (16, 230) (9971,) (16,)
model score: 0.825 prediction score: 0.375 shift: 014 (9970, 248) (16, 248) (9970,) (16,)
model score: 0.822 prediction score: 0.438 shift: 015 (9969, 266) (16, 266) (9969,) (16,)
model score: 0.821 prediction score: 0.438 shift: 016 (9968, 284) (16, 284) (9968,) (16,)
model score: 0.825 prediction score: 0.438 shift: 017 (9967, 302) (16, 302) (9967,) (16,)
model score: 0.818 prediction score: 0.250 shift: 018 (9966, 320) (16, 320) (9966,) (16,)
model score: 0.822 prediction score: 0.375 shift: 019 (9965, 338) (16, 338) (9965,) (16,)
model score: 0.819 prediction score: 0.188 shift: 020 (9964, 356) (16, 356) (9964,) (16,)
model score: 0.825 prediction score: 0.375 shift: 021 (9963, 374) (16, 374) (9963,) (16,)
model score: 0.819 prediction score: 0.500 shift: 022 (9962, 392) (16, 392) (9962,) (16,)
model score: 0.823 prediction score: 0.250 shift: 023 (9961, 410) (16, 410) (9961,) (16,)
CPU times: total: 1h 8min 9s
Wall time: 29min 35s
Building models with a validation data set doesn't increase the prediction score; the maximum prediction score is 0.438.
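A hedged sketch of the change, under the same XGBoost assumption as in step 7: part of the training data is held out as a validation set and monitored during fitting, without stopping early.

from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# hold out the last 20% of the training rows as a validation set
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.2, shuffle=False)
model = XGBClassifier(eval_metric='mlogloss')
model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], verbose=False)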
8.1 Model classification report
# Out 106:
precision recall f1-score support
0 0.77 0.90 0.83 3192
1 0.83 0.79 0.81 1114
2 0.86 0.78 0.82 1097
3 0.88 0.75 0.81 196
4 0.83 0.79 0.81 1126
5 0.91 0.79 0.84 578
6 0.87 0.80 0.83 202
7 0.97 0.70 0.81 86
8 0.81 0.82 0.82 1106
9 0.90 0.77 0.83 179
10 0.94 0.75 0.83 596
11 0.97 0.75 0.85 93
12 0.87 0.75 0.81 174
13 0.92 0.83 0.87 96
14 0.94 0.79 0.86 92
15 0.96 0.76 0.85 34
accuracy 0.82 9961
macro avg 0.89 0.78 0.83 9961
weighted avg 0.83 0.82 0.82 9961
8.2 Model confusion matrix
8.3 Prediction classification report
# Out 110:
precision recall f1-score support
0 0.36 0.57 0.44 7
1 0.00 0.00 0.00 2
2 0.00 1.00 0.00 0
4 0.00 1.00 0.00 0
5 1.00 0.00 0.00 1
8 1.00 0.00 0.00 4
10 1.00 0.00 0.00 2
11 0.00 1.00 0.00 0
accuracy 0.25 16
macro avg 0.42 0.45 0.06 16
weighted avg 0.60 0.25 0.19 16
8.4 Prediction confusion matrix
8.5 Model performance
Overfitting occurs after 20 epochs.
9. Building models with early stopping
Stopping training early to limit overfitting and analyze model performance on the validation data. Early stopping = 10.
# Out 113:
model score: 0.329 prediction score: 0.375 shift: 001 (9983, 14) (16, 14) (9983,) (16,)
model score: 0.373 prediction score: 0.438 shift: 002 (9982, 32) (16, 32) (9982,) (16,)
model score: 0.377 prediction score: 0.438 shift: 003 (9981, 50) (16, 50) (9981,) (16,)
model score: 0.402 prediction score: 0.375 shift: 004 (9980, 68) (16, 68) (9980,) (16,)
model score: 0.439 prediction score: 0.375 shift: 005 (9979, 86) (16, 86) (9979,) (16,)
model score: 0.467 prediction score: 0.312 shift: 006 (9978, 104) (16, 104) (9978,) (16,)
model score: 0.508 prediction score: 0.375 shift: 007 (9977, 122) (16, 122) (9977,) (16,)
model score: 0.529 prediction score: 0.375 shift: 008 (9976, 140) (16, 140) (9976,) (16,)
model score: 0.542 prediction score: 0.375 shift: 009 (9975, 158) (16, 158) (9975,) (16,)
model score: 0.579 prediction score: 0.500 shift: 010 (9974, 176) (16, 176) (9974,) (16,)
model score: 0.547 prediction score: 0.375 shift: 011 (9973, 194) (16, 194) (9973,) (16,)
model score: 0.570 prediction score: 0.375 shift: 012 (9972, 212) (16, 212) (9972,) (16,)
model score: 0.569 prediction score: 0.375 shift: 013 (9971, 230) (16, 230) (9971,) (16,)
model score: 0.628 prediction score: 0.312 shift: 014 (9970, 248) (16, 248) (9970,) (16,)
model score: 0.619 prediction score: 0.375 shift: 015 (9969, 266) (16, 266) (9969,) (16,)
model score: 0.617 prediction score: 0.438 shift: 016 (9968, 284) (16, 284) (9968,) (16,)
model score: 0.637 prediction score: 0.312 shift: 017 (9967, 302) (16, 302) (9967,) (16,)
model score: 0.626 prediction score: 0.438 shift: 018 (9966, 320) (16, 320) (9966,) (16,)
model score: 0.608 prediction score: 0.375 shift: 019 (9965, 338) (16, 338) (9965,) (16,)
model score: 0.667 prediction score: 0.312 shift: 020 (9964, 356) (16, 356) (9964,) (16,)
model score: 0.656 prediction score: 0.375 shift: 021 (9963, 374) (16, 374) (9963,) (16,)
model score: 0.669 prediction score: 0.375 shift: 022 (9962, 392) (16, 392) (9962,) (16,)
model score: 0.664 prediction score: 0.375 shift: 023 (9961, 410) (16, 410) (9961,) (16,)
CPU times: total: 22min 34s
Wall time: 10min 22s
Building models with a validation data set and early stopping doesn't increase the prediction score; the maximum prediction score is 0.500.
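The same sketch with early stopping added; early_stopping_rounds=10 is my assumption for "Early stopping = 10":

from xgboost import XGBClassifier

# training stops once the validation metric hasn't improved for 10 rounds
# (early_stopping_rounds=10 is an assumption, not the exact published code)
model = XGBClassifier(eval_metric='mlogloss', early_stopping_rounds=10)
model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], verbose=False)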
9.1 Model classification report
# Out 114:
precision recall f1-score support
0 0.59 0.87 0.70 3192
1 0.70 0.61 0.65 1114
2 0.70 0.60 0.64 1097
3 0.73 0.51 0.60 196
4 0.71 0.60 0.65 1126
5 0.84 0.56 0.67 578
6 0.77 0.48 0.59 202
7 0.93 0.50 0.65 86
8 0.66 0.60 0.63 1106
9 0.76 0.23 0.35 179
10 0.88 0.49 0.63 596
11 1.00 0.29 0.45 93
12 0.82 0.55 0.66 174
13 0.91 0.70 0.79 96
14 0.97 0.65 0.78 92
15 0.93 0.76 0.84 34
accuracy 0.66 9961
macro avg 0.81 0.56 0.64 9961
weighted avg 0.69 0.66 0.66 9961
9.2 Model confusion matrix
9.3 Prediction classification report
# Out 116:
precision recall f1-score support
0 0.42 0.71 0.53 7
1 1.00 0.00 0.00 2
4 0.00 1.00 0.00 0
5 1.00 0.00 0.00 1
6 0.00 1.00 0.00 0
8 1.00 0.25 0.40 4
10 0.00 0.00 0.00 2
accuracy 0.38 16
macro avg 0.49 0.42 0.13 16
weighted avg 0.62 0.38 0.33 16
9.4 Prediction confusion matrix
9.5 Model performance
10. Conclusion and further steps
Could an RNN Keras model improve prediction accuracy? What other feature re-engineering could increase the prediction accuracy?
Computations are performed locally, so resources are limited.
Could #SapHanaCloud #MachineLearning help to see what’s going on beyond the horizon of n_power = 10?
Is this another attempt to show that machine learning can’t build a function for prime numbers?
Update on 21.03.2022
11. Building an LSTM model with sliding windows
The LSTM model built with Keras and trained on data with a sliding window of 24 has lower accuracy than the XGBoost model.
Recall the results with XGBoost from step 7, Building the models with sliding windows:
Model score = 1.000!
However, the maximum prediction score was only 0.562!
Updated code with LSTM is on GitHub, published with my last run.
Importing libraries.
# In 120:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM, TimeDistributed
from keras.utils import to_categorical
import matplotlib.pyplot as plt
Data preparation.
# In 121:
n_shifts = 24 #sliding window
X_train, X_test, y_train, y_test = f_Xy_train_Xy_test(n_shifts, df)
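Note that for the LSTM the helper must return 3-D features: Keras LSTM layers expect input of shape (samples, timesteps, features), which is why the model below reads X_train.shape[1] and X_train.shape[2].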
LSTM model.
# In 132:
%%time
# create a sequential model
# https://keras.io/api/layers/recurrent_layers/lstm/
model = Sequential()
input_shape_timesteps = X_train.shape[1]
input_shape_feature = X_train.shape[2]
model.add(LSTM(32, input_shape=(input_shape_timesteps, input_shape_feature)))
# add two hidden layers with 16 and 8 units and relu activation
model.add(Dense(16, activation='relu'))
model.add(Dense(8, activation='relu'))
# add a dense output layer with 16 units and softmax activation
model.add(Dense(16, activation='softmax'))
# compile the model with categorical cross-entropy loss and adam optimizer
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# train the model for 50 epochs
history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test), verbose=0)
# evaluate the model on the training and testing data
train_loss, train_acc = model.evaluate(X_train, y_train, verbose=1)
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=1)
print(f"Training accuracy: {train_acc:.4f}")
print(f"Testing accuracy: {test_acc:.4f}")
Model classification report
# Out 136:
precision recall f1-score support
0 0.32 1.00 0.49 3192
1 1.00 0.00 0.00 1114
2 1.00 0.00 0.00 1097
3 1.00 0.00 0.00 196
4 1.00 0.00 0.00 1126
5 1.00 0.00 0.00 577
6 1.00 0.00 0.00 202
7 1.00 0.00 0.00 86
8 1.00 0.00 0.00 1106
9 1.00 0.00 0.00 179
10 1.00 0.00 0.00 596
11 1.00 0.00 0.00 93
12 1.00 0.00 0.00 174
13 1.00 0.00 0.00 96
14 1.00 0.00 0.00 92
15 1.00 0.00 0.00 34
accuracy 0.32 9960
macro avg 0.96 0.06 0.03 9960
weighted avg 0.78 0.32 0.16 9960
Prediction classification report
# Out 137:
precision recall f1-score support
0 0.44 1.00 0.61 7
1 1.00 0.00 0.00 2
5 1.00 0.00 0.00 1
8 1.00 0.00 0.00 4
10 1.00 0.00 0.00 2
accuracy 0.44 16
macro avg 0.89 0.20 0.12 16
weighted avg 0.75 0.44 0.27 16