חילוץ טקסט

חילוץ חלקי טקסט מתוך לוגים

בסיסי
Published

November 28, 2023

ברוכים הבאים לתרגיל החמישי!

הדאטה

התרגיל מורכב מטבלת לוגים לכל יוזר:

SELECT TOP 5 * 
FROM LOGS_Q5
5 records
user_id log_date log_text
1 2023-09-20 user 1 registered in 2018. Closure reason: Duplicate account.
2 2023-04-01 user 2 registered in 2022. Closure reason: Fraudulent behaviour.
3 2023-06-26 user 3 registered in 2016. Closure reason: Unsupported activity. Current status: open. Account has been re-opened
4 2023-06-21 user 4 registered in 2016. Closure reason: Did not provide relevant documents. Current status: open. Account has been re-opened
5 2023-02-04 user 5 registered in 2017. Closure reason: Unsupported activity.

שאלה 1

מתוך כל טקסט בלוג חלצו את השנה בה נרשם היוזר וסכמו את מספר המופעים של כל שנה. סדרו את הפלט לפי שנים (השנה העדכנית ביותר תופיע ראשונה וכן הלאה).

[פלט למטה]

שאלה 2

הלוגים מעט שונים אחד מהשני אבל בכולם יש סיבת סגירה של יוזר. חלצו את סיבות הסגירה השונות וסכמו אותן. את הפלט סדרו באופן יורד כך שהסיבה עם מספר המופעים הרב ביותר תופיע ראשונה.

רמז

שימו לב למבנה של הלוג — מה מופיע לפני הסיבה? מה יכול להופיע אחרי הסיבה? איך אפשר להיעזר בזה כדי לחלץ את מה שאנחנו צריכים?

נסו בעצמכם

אונליין

נסו בחלון טקסט למטה (SQLite):

-- EDIT THIS QUERY:
SELECT *
FROM LOGS_Q5
LIMIT 10;

נסו על המחשב שלכם

הצג קוד ליצירת טבלה

DROP TABLE IF EXISTS LOGS_Q5;

CREATE TABLE LOGS_Q5
    (user_id int, log_date datetime, log_text varchar(128))
;
    
INSERT INTO LOGS_Q5
    (user_id, log_date, log_text)
VALUES
    (1, '2023-09-20 00:00:00', 'user 1 registered in 2018. Closure reason: Duplicate account.'),
    (2, '2023-04-01 00:00:00', 'user 2 registered in 2022. Closure reason: Fraudulent behaviour.'),
    (3, '2023-06-26 00:00:00', 'user 3 registered in 2016. Closure reason: Unsupported activity. Current status: open. Account has been re-opened'),
    (4, '2023-06-21 00:00:00', 'user 4 registered in 2016. Closure reason: Did not provide relevant documents. Current status: open. Account has been re-opened'),
    (5, '2023-02-04 00:00:00', 'user 5 registered in 2017. Closure reason: Unsupported activity.'),
    (6, '2023-03-09 00:00:00', 'user 6 registered in 2022. Closure reason: Fraudulent behaviour. Current status: open. Account has been re-opened'),
    (7, '2023-01-17 00:00:00', 'user 7 registered in 2018. Closure reason: Fraudulent behaviour. Current status: account closed.'),
    (8, '2023-09-12 00:00:00', 'user 8 registered in 2019. Closure reason: Fraudulent behaviour. Current status: account closed.'),
    (9, '2023-01-17 00:00:00', 'user 9 registered in 2014. Closure reason: Duplicate account. Current status: account closed.'),
    (10, '2023-09-17 00:00:00', 'user 10 registered in 2018. Closure reason: Requested to close. Current status: account closed.'),
    (11, '2023-01-06 00:00:00', 'user 11 registered in 2014. Closure reason: Unsupported activity. Current status: open. Account has been re-opened'),
    (12, '2023-05-28 00:00:00', 'user 12 registered in 2015. Closure reason: Requested to close.'),
    (13, '2023-09-09 00:00:00', 'user 13 registered in 2015. Closure reason: Fraudulent behaviour.'),
    (14, '2023-04-08 00:00:00', 'user 14 registered in 2015. Closure reason: Duplicate account. Current status: open.'),
    (15, '2023-03-19 00:00:00', 'user 15 registered in 2018. Closure reason: Duplicate account.'),
    (16, '2023-10-09 00:00:00', 'user 16 registered in 2022. Closure reason: Fraudulent behaviour. Current status: account closed.'),
    (17, '2023-05-19 00:00:00', 'user 17 registered in 2015. Closure reason: Fraudulent behaviour. Current status: open. Account has been re-opened'),
    (18, '2023-01-22 00:00:00', 'user 18 registered in 2016. Closure reason: Unsupported activity. Current status: account closed.'),
    (19, '2023-08-30 00:00:00', 'user 19 registered in 2022. Closure reason: Fraudulent behaviour.'),
    (20, '2023-09-26 00:00:00', 'user 20 registered in 2016. Closure reason: Fraudulent behaviour. Current status: account closed.'),
    (21, '2023-04-20 00:00:00', 'user 21 registered in 2015. Closure reason: Did not provide relevant documents. Current status: account closed.'),
    (22, '2023-05-02 00:00:00', 'user 22 registered in 2020. Closure reason: Duplicate account. Current status: account closed.'),
    (23, '2023-04-18 00:00:00', 'user 23 registered in 2021. Closure reason: Duplicate account. Current status: account closed.'),
    (24, '2023-07-27 00:00:00', 'user 24 registered in 2020. Closure reason: Requested to close.'),
    (25, '2023-08-25 00:00:00', 'user 25 registered in 2020. Closure reason: Requested to close. Current status: open. Account has been re-opened'),
    (26, '2023-07-22 00:00:00', 'user 26 registered in 2022. Closure reason: Fraudulent behaviour.'),
    (27, '2023-01-28 00:00:00', 'user 27 registered in 2016. Closure reason: Unsupported activity. Current status: open.'),
    (28, '2023-05-20 00:00:00', 'user 28 registered in 2017. Closure reason: Requested to close. Current status: open.'),
    (29, '2023-09-10 00:00:00', 'user 29 registered in 2017. Closure reason: Unsupported activity.'),
    (30, '2023-05-28 00:00:00', 'user 30 registered in 2019. Closure reason: Did not provide relevant documents. Current status: open. Account has been re-opened'),
    (31, '2023-08-27 00:00:00', 'user 31 registered in 2018. Closure reason: Requested to close. Current status: account closed.'),
    (32, '2023-01-08 00:00:00', 'user 32 registered in 2014. Closure reason: Fraudulent behaviour.'),
    (33, '2023-07-20 00:00:00', 'user 33 registered in 2021. Closure reason: Unsupported activity. Current status: account closed.'),
    (34, '2023-01-31 00:00:00', 'user 34 registered in 2016. Closure reason: Duplicate account. Current status: open.'),
    (35, '2023-07-08 00:00:00', 'user 35 registered in 2019. Closure reason: Unsupported activity.'),
    (36, '2023-07-31 00:00:00', 'user 36 registered in 2016. Closure reason: Requested to close. Current status: account closed.'),
    (37, '2023-02-02 00:00:00', 'user 37 registered in 2019. Closure reason: Did not provide relevant documents. Current status: open.'),
    (38, '2023-03-07 00:00:00', 'user 38 registered in 2016. Closure reason: Fraudulent behaviour.'),
    (39, '2023-06-11 00:00:00', 'user 39 registered in 2017. Closure reason: Duplicate account. Current status: account closed.'),
    (40, '2023-05-21 00:00:00', 'user 40 registered in 2014. Closure reason: Did not provide relevant documents. Current status: account closed.'),
    (41, '2023-08-12 00:00:00', 'user 41 registered in 2020. Closure reason: Fraudulent behaviour. Current status: account closed.'),
    (42, '2023-09-09 00:00:00', 'user 42 registered in 2018. Closure reason: Duplicate account.'),
    (43, '2023-05-26 00:00:00', 'user 43 registered in 2014. Closure reason: Did not provide relevant documents.'),
    (44, '2023-03-06 00:00:00', 'user 44 registered in 2016. Closure reason: Duplicate account. Current status: account closed.'),
    (45, '2023-08-07 00:00:00', 'user 45 registered in 2016. Closure reason: Did not provide relevant documents. Current status: account closed.'),
    (46, '2023-10-04 00:00:00', 'user 46 registered in 2017. Closure reason: Fraudulent behaviour. Current status: account closed.'),
    (47, '2023-01-28 00:00:00', 'user 47 registered in 2019. Closure reason: Fraudulent behaviour. Current status: open. Account has been re-opened'),
    (48, '2023-01-17 00:00:00', 'user 48 registered in 2021. Closure reason: Did not provide relevant documents. Current status: open. Account has been re-opened'),
    (49, '2023-02-26 00:00:00', 'user 49 registered in 2015. Closure reason: Unsupported activity. Current status: account closed.'),
    (50, '2023-06-19 00:00:00', 'user 50 registered in 2015. Closure reason: Duplicate account.'),
    (51, '2023-01-13 00:00:00', 'user 51 registered in 2015. Closure reason: Duplicate account. Current status: open.'),
    (52, '2023-03-02 00:00:00', 'user 52 registered in 2017. Closure reason: Did not provide relevant documents.'),
    (53, '2023-01-14 00:00:00', 'user 53 registered in 2014. Closure reason: Fraudulent behaviour. Current status: account closed.'),
    (54, '2023-09-13 00:00:00', 'user 54 registered in 2018. Closure reason: Requested to close. Current status: open.'),
    (55, '2023-07-17 00:00:00', 'user 55 registered in 2017. Closure reason: Requested to close. Current status: open. Account has been re-opened'),
    (56, '2023-10-12 00:00:00', 'user 56 registered in 2018. Closure reason: Did not provide relevant documents. Current status: account closed.'),
    (57, '2023-09-27 00:00:00', 'user 57 registered in 2016. Closure reason: Duplicate account. Current status: open.'),
    (58, '2023-02-15 00:00:00', 'user 58 registered in 2017. Closure reason: Did not provide relevant documents. Current status: open.'),
    (59, '2023-10-26 00:00:00', 'user 59 registered in 2021. Closure reason: Duplicate account. Current status: account closed.'),
    (60, '2023-07-10 00:00:00', 'user 60 registered in 2020. Closure reason: Unsupported activity. Current status: open.'),
    (61, '2023-09-09 00:00:00', 'user 61 registered in 2020. Closure reason: Fraudulent behaviour.'),
    (62, '2023-07-12 00:00:00', 'user 62 registered in 2017. Closure reason: Fraudulent behaviour.'),
    (63, '2023-01-11 00:00:00', 'user 63 registered in 2017. Closure reason: Duplicate account. Current status: open.'),
    (64, '2023-02-27 00:00:00', 'user 64 registered in 2020. Closure reason: Unsupported activity. Current status: open.'),
    (65, '2023-07-12 00:00:00', 'user 65 registered in 2021. Closure reason: Did not provide relevant documents.'),
    (66, '2023-03-09 00:00:00', 'user 66 registered in 2014. Closure reason: Requested to close. Current status: open. Account has been re-opened'),
    (67, '2023-10-31 00:00:00', 'user 67 registered in 2022. Closure reason: Duplicate account.'),
    (68, '2023-06-26 00:00:00', 'user 68 registered in 2018. Closure reason: Duplicate account. Current status: open.'),
    (69, '2023-04-22 00:00:00', 'user 69 registered in 2022. Closure reason: Did not provide relevant documents. Current status: open.'),
    (70, '2023-09-22 00:00:00', 'user 70 registered in 2015. Closure reason: Unsupported activity.'),
    (71, '2023-10-04 00:00:00', 'user 71 registered in 2016. Closure reason: Duplicate account. Current status: open.'),
    (72, '2023-10-31 00:00:00', 'user 72 registered in 2020. Closure reason: Unsupported activity.'),
    (73, '2023-04-20 00:00:00', 'user 73 registered in 2020. Closure reason: Did not provide relevant documents. Current status: open.'),
    (74, '2023-10-15 00:00:00', 'user 74 registered in 2017. Closure reason: Duplicate account. Current status: account closed.'),
    (75, '2023-07-23 00:00:00', 'user 75 registered in 2021. Closure reason: Fraudulent behaviour.'),
    (76, '2023-03-15 00:00:00', 'user 76 registered in 2017. Closure reason: Duplicate account.'),
    (77, '2023-10-27 00:00:00', 'user 77 registered in 2019. Closure reason: Requested to close. Current status: account closed.'),
    (78, '2023-02-03 00:00:00', 'user 78 registered in 2022. Closure reason: Requested to close. Current status: account closed.'),
    (79, '2023-08-09 00:00:00', 'user 79 registered in 2019. Closure reason: Duplicate account.'),
    (80, '2023-05-19 00:00:00', 'user 80 registered in 2021. Closure reason: Requested to close.'),
    (81, '2023-01-02 00:00:00', 'user 81 registered in 2020. Closure reason: Duplicate account. Current status: open. Account has been re-opened'),
    (82, '2023-08-08 00:00:00', 'user 82 registered in 2020. Closure reason: Duplicate account. Current status: open. Account has been re-opened'),
    (83, '2023-04-25 00:00:00', 'user 83 registered in 2021. Closure reason: Duplicate account.'),
    (84, '2023-04-24 00:00:00', 'user 84 registered in 2020. Closure reason: Duplicate account.'),
    (85, '2023-01-15 00:00:00', 'user 85 registered in 2021. Closure reason: Fraudulent behaviour.'),
    (86, '2023-01-11 00:00:00', 'user 86 registered in 2020. Closure reason: Duplicate account. Current status: account closed.'),
    (87, '2023-08-08 00:00:00', 'user 87 registered in 2021. Closure reason: Fraudulent behaviour. Current status: account closed.'),
    (88, '2023-01-27 00:00:00', 'user 88 registered in 2015. Closure reason: Unsupported activity.'),
    (89, '2023-08-31 00:00:00', 'user 89 registered in 2020. Closure reason: Unsupported activity. Current status: account closed.'),
    (90, '2023-07-26 00:00:00', 'user 90 registered in 2021. Closure reason: Duplicate account. Current status: account closed.'),
    (91, '2023-07-10 00:00:00', 'user 91 registered in 2014. Closure reason: Unsupported activity. Current status: account closed.'),
    (92, '2023-07-18 00:00:00', 'user 92 registered in 2017. Closure reason: Fraudulent behaviour. Current status: account closed.'),
    (93, '2023-08-19 00:00:00', 'user 93 registered in 2021. Closure reason: Did not provide relevant documents. Current status: account closed.'),
    (94, '2023-01-26 00:00:00', 'user 94 registered in 2015. Closure reason: Duplicate account.'),
    (95, '2023-08-14 00:00:00', 'user 95 registered in 2019. Closure reason: Did not provide relevant documents. Current status: account closed.'),
    (96, '2023-05-11 00:00:00', 'user 96 registered in 2022. Closure reason: Fraudulent behaviour.'),
    (97, '2023-08-03 00:00:00', 'user 97 registered in 2020. Closure reason: Duplicate account. Current status: open. Account has been re-opened'),
    (98, '2023-01-17 00:00:00', 'user 98 registered in 2017. Closure reason: Unsupported activity. Current status: open.'),
    (99, '2023-10-08 00:00:00', 'user 99 registered in 2021. Closure reason: Did not provide relevant documents.'),
    (100, '2023-10-18 00:00:00', 'user 100 registered in 2019. Closure reason: Duplicate account. Current status: open. Account has been re-opened')
;

תשובות

(שימו לב שהפתרון נכתב ב-T-SQL ולא יעבוד בתיבת טקסט-SQL למעלה)

סרטון הסבר

תשובה 1

הצג פתרון
SELECT  
    registered_year = SUBSTRING(LOG_TEXT, CHARINDEX('registered in ', LOG_TEXT) + len('registered in.'), 4), -- LEN() ignores trailing spaces so a sapce is replaces with a period
    COUNT(*) AS n
FROM logs_q5
GROUP BY SUBSTRING(LOG_TEXT, CHARINDEX('registered in ', LOG_TEXT) + len('registered in.'), 4)
ORDER BY registered_year DESC;
9 records
registered_year n
2022 9
2021 13
2020 15
2019 9
2018 9
2017 14
2016 12
2015 11
2014 8

תשובה 2

הצג פתרון

WITH reason_extracted as (
    SELECT
    reason = REPLACE(
        SUBSTRING(
            log_text,
            CHARINDEX('closure reason: ', log_text) + LEN('closure reason: '),
            CASE
                WHEN CHARINDEX('Current status', log_text) = 0 THEN LEN(log_text)
            ELSE CHARINDEX('Current status', log_text) - CHARINDEX('closure reason: ',log_text) - LEN('closure reason: ')
                END),
        '.', '')
    FROM logs_q5
)

SELECT reason,
    COUNT(*) AS n
FROM reason_extracted
GROUP BY reason
ORDER BY n desc
5 records
reason n
Duplicate account 30
Fraudulent behaviour 23
Did not provide relevant documents 17
Unsupported activity 17
Requested to close 13