This expression, "awesome," is also a bit of a jab, but not a harsh one. At least not the way I use it. Anyway, GitLab has pulled off quite a feat and has been down for a few hours now. Why? Their Twitter says:
Apparently one of our dear fellow sysadmins (a colleague of sorts) went and deleted an important directory, and although all the repositories are still in place, the pull requests and such have been wiped. At least that's my reading of it. The backups aren't real-time, and apparently they have problems of their own too.
So why is it "awesome"? Because they've been transparent and reasonable about all of this, and because they've put up a live online document that walks us through every step of the repair:
GitLab.com Database Incident - 2017/01/31

This incident affected the database (including issues and merge requests) but not the git repo's (repositories and wikis).

Timeline (all times UTC):

2017/01/31 16:00/17:00 - 21:00
- YP is working on setting up pgpool and replication in staging, creates an LVM snapshot to get up to date production data to staging, hoping he can re-use this for bootstrapping other replicas. This was done roughly 6 hours before data loss.
- Getting replication to work is proving to be problematic and time consuming (estimated at ±20 hours just for the initial pg_basebackup sync). The LVM snapshot is not usable on the other replicas as far as YP could figure out.
- Work is interrupted due to this (as YP needs the help of another colleague who's not working this day), and due to spam/high load on GitLab.com

2017/01/31 21:00 - Spike in database load due to spam users - Twitter | Slack
- Blocked users based on IP address
- Removed a user for using a repository as some form of CDN, resulting in 47 000 IPs signing in using the same account (causing high DB load). This was communicated with the infrastructure and support team.
- Removed users for spamming (by creating snippets) - Slack
- Database load goes back to normal, some manual PostgreSQL vacuuming is applied here and there to catch up with a large amount of dead tuples.

2017/01/31 22:00 - Replication lag alert triggered in pagerduty Slack
- Attempts to fix db2, it's lagging behind by about 4 GB at this point
- db2.cluster refuses to replicate, /var/opt/gitlab/postgresql/data is wiped to ensure a clean replication
- db2.cluster refuses to connect to db1, complaining about max_wal_senders being too low. This setting is used to limit the number of WAL (= replication) clients
- YP adjusts max_wal_senders to 32 on db1, restarts PostgreSQL
- PostgreSQL complains about too many semaphores being open, refusing to start
- YP adjusts max_connections to 2000 from 8000, PostgreSQL starts again (despite 8000 having been used for almost a year)
- db2.cluster still refuses to replicate, though it no longer complains about connections; instead it just hangs there not doing anything
- At this point frustration begins to kick in. Earlier this night YP explicitly mentioned he was going to sign off as it was getting late (23:00 or so local time), but didn't due to the replication problems popping up all of a sudden.

2017/01/31 23:00-ish
- YP thinks that perhaps pg_basebackup is being super pedantic about there being an empty data directory, decides to remove the directory. After a second or two he notices he ran it on db1.cluster.gitlab.com, instead of db2.cluster.gitlab.com

2017/01/31 23:27
- YP terminates the removal, but it's too late. Of around 310 GB only about 4.5 GB is left - Slack
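To make the knobs named in that timeline a little more concrete, here is a minimal sketch of the replication pieces involved: the two postgresql.conf settings on the primary and a pg_basebackup re-seed on the replica. The hostnames, setting names, and data directory path come from the timeline itself; the values shown and the gitlab_replicator role name are my own assumptions for illustration, not GitLab's actual configuration.

    # postgresql.conf on the primary (db1.cluster.gitlab.com) -- illustrative values,
    # only the setting names come from the timeline above
    max_wal_senders = 32      # cap on WAL (replication) client connections
    max_connections = 2000    # lowered from 8000 so PostgreSQL starts within the semaphore limit

    # On the replica (db2.cluster.gitlab.com), after clearing its old data directory,
    # re-seed it from the primary; gitlab_replicator is a hypothetical replication role
    pg_basebackup -h db1.cluster.gitlab.com \
        -D /var/opt/gitlab/postgresql/data \
        -U gitlab_replicator -X stream -P

A full base backup like this is exactly the ±20-hour sync the timeline complains about, and pg_basebackup insists on an empty target data directory, which is why the "remove the directory" step existed in the first place.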
And on top of that, they're even live-streaming the recovery on YouTube. I actually like GitLab, and I think it's worth talking about it more and even switching to it. My main reason is that it lets us have private repositories too, and well, since it lets me host my non-free repositories there, I think it's only fair that I bring my free repositories over as well. That's independent of the technical arguments and the features it offers for day-to-day work.
Let's make the decision final right here: alongside the Radio Geek episode about GitHub, we'll also do an episode about GitLab. It's a really interesting company with interesting people behind it. I hope this stressful time is over for them soon.