
jcrespo (Jaime Crespo)
Sr Database Administrator

Projects (12)

User Details

User Since
May 11 2015, 8:31 AM (489 w, 3 h)
Availability
Available
IRC Nick
jynus
LDAP User
Jcrespo
MediaWiki User
JCrespo (WMF) [ Global Accounts ]

Recent Activity

Today

jcrespo added a comment to T375382: Post pc1013 crash.

It appears it was a hw error on memory leading to an uncorrectable memory error, leading to killing mysql:

Mon, Sep 23, 12:21 PM · DC-Ops, ops-eqiad, DBA
jcrespo created P69389 pc1013 dmesg.
Mon, Sep 23, 12:20 PM
jcrespo created P69388 pc1013 crash database log.
Mon, Sep 23, 12:09 PM
jcrespo added a comment to T375186: databases preswitchover checks.

Check the other host on puppet/icinga with notifications disabled, I think I saw others, but maybe those are being setup/decom: db2185/6/7.

Mon, Sep 23, 11:56 AM · Data-Persistence-SRE, Patch-For-Review, DBA
jcrespo updated the task description for T375186: databases preswitchover checks.
Mon, Sep 23, 11:37 AM · Data-Persistence-SRE, Patch-For-Review, DBA
jcrespo added a comment to T375186: databases preswitchover checks.

@ABran-WMF I see that, per T373579, the productionization has in theory finished and the host is pooled as a candidate master, but it has notifications disabled. Is that expected (e.g. hardware crash)?

Mon, Sep 23, 11:27 AM · Data-Persistence-SRE, Patch-For-Review, DBA
jcrespo added a subtask for T370962: Southward Datacenter Switchover (September 2024): T375186: databases preswitchover checks.
Mon, Sep 23, 10:58 AM · Patch-For-Review, Datacenter-Switchover, serviceops
jcrespo added a parent task for T375186: databases preswitchover checks: T370962: Southward Datacenter Switchover (September 2024).
Mon, Sep 23, 10:58 AM · Data-Persistence-SRE, Patch-For-Review, DBA
jcrespo updated subscribers of T375186: databases preswitchover checks.

@Scott_French wrote the patch:

Mon, Sep 23, 10:17 AM · Data-Persistence-SRE, Patch-For-Review, DBA
jcrespo updated the task description for T375186: databases preswitchover checks.
Mon, Sep 23, 10:15 AM · Data-Persistence-SRE, Patch-For-Review, DBA
jcrespo added a comment to T375144: ROW-based replicas broke with cleaned up heartbeat tables after setting up circular replication.

Given the remaining time before switchover

Mon, Sep 23, 8:41 AM · Data-Persistence-SRE, Patch-For-Review, DBA, Wikimedia-production-error
jcrespo updated the task description for T375186: databases preswitchover checks.
Mon, Sep 23, 8:36 AM · Data-Persistence-SRE, Patch-For-Review, DBA

Thu, Sep 19

jcrespo added a comment to T373105: Migrate servers in codfw racks D7 & D8 from asw to lsw.

Resumed ms backups on codfw.

Thu, Sep 19, 4:37 PM · SRE-swift-storage, collaboration-services, DC-Ops, ops-codfw, netops, Infrastructure-Foundations, SRE
jcrespo added a comment to T375186: databases preswitchover checks.

I forgot to mention, I think orchestrator has a similar tool, but I found in the past a tool like db-replication-tree useful for this kind of work (preparation) and later tuning after switchover:

image.png (1×1 px, 640 KB)

Thu, Sep 19, 3:47 PM · Data-Persistence-SRE, Patch-For-Review, DBA
jcrespo added a comment to T373105: Migrate servers in codfw racks D7 & D8 from asw to lsw.

ms backups on codfw are stopped. As usual, I am not asking for priority over my workmates, but if you could avoid leaving backup2007 for the end, I would appreciate it so I can restart them and finish my week soon (I won't be around tomorrow).

Thu, Sep 19, 2:53 PM · SRE-swift-storage, collaboration-services, DC-Ops, ops-codfw, netops, Infrastructure-Foundations, SRE
jcrespo closed T374972: Output test logs of production testing of the pre switchover tasks related to databases, a subtask of T371351: Automate the pre/post switchover tasks related to databases, as Resolved.
Thu, Sep 19, 2:23 PM · Patch-For-Review, Data-Persistence-SRE, DBA, Datacenter-Switchover
jcrespo closed T374972: Output test logs of production testing of the pre switchover tasks related to databases as Resolved.

This should be now done.

Thu, Sep 19, 2:23 PM
jcrespo added a comment to P69307 (An Untitled Masterwork).

Not super needed, but we can maybe add a note so we don't pool it accidentally or something? We don't use notes too often, and they were designed for things like this (awareness why it was depooled for an extended time).

Thu, Sep 19, 11:58 AM
jcrespo updated the task description for T375144: ROW-based replicas broke with cleaned up heartbeat tables after setting up circular replication.
Thu, Sep 19, 9:55 AM · Data-Persistence-SRE, Patch-For-Review, DBA, Wikimedia-production-error
jcrespo updated the task description for T375144: ROW-based replicas broke with cleaned up heartbeat tables after setting up circular replication.
Thu, Sep 19, 9:40 AM · Data-Persistence-SRE, Patch-For-Review, DBA, Wikimedia-production-error
jcrespo added a comment to T375144: ROW-based replicas broke with cleaned up heartbeat tables after setting up circular replication.

Not resolved- this is a blocker for switchover, and we haven't yet fixed it for future runs. This is an outstanding issue and we need to do something about it, even if it is no longer happening.

Thu, Sep 19, 9:24 AM · Data-Persistence-SRE, Patch-For-Review, DBA, Wikimedia-production-error

Wed, Sep 18

jcrespo created T375144: ROW-based replicas broke with cleaned up heartbeat tables after setting up circular replication.
Wed, Sep 18, 10:28 PM · Data-Persistence-SRE, Patch-For-Review, DBA, Wikimedia-production-error
jcrespo reopened T374972: Output test logs of production testing of the pre switchover tasks related to databases, a subtask of T371351: Automate the pre/post switchover tasks related to databases, as In Progress.
Wed, Sep 18, 9:23 PM · Patch-For-Review, Data-Persistence-SRE, DBA, Datacenter-Switchover
jcrespo reopened T374972: Output test logs of production testing of the pre switchover tasks related to databases as "In Progress".
Wed, Sep 18, 9:22 PM
jcrespo closed T374972: Output test logs of production testing of the pre switchover tasks related to databases, a subtask of T371351: Automate the pre/post switchover tasks related to databases, as Resolved.
Wed, Sep 18, 9:22 PM · Patch-For-Review, Data-Persistence-SRE, DBA, Datacenter-Switchover
jcrespo closed T374972: Output test logs of production testing of the pre switchover tasks related to databases as Resolved.
Wed, Sep 18, 9:22 PM
jcrespo added a comment to P69307 (An Untitled Masterwork).

To check tomorrow.

Wed, Sep 18, 9:04 PM
jcrespo created P69252 test-s4 patch.
Wed, Sep 18, 11:22 AM

Tue, Sep 17

jcrespo added a comment to T371351: Automate the pre/post switchover tasks related to databases.

I found an actual bug: this is failing:

Failed to run cookbooks.sre.switchdc.databases.finalize.FinalizeSection.clean_heartbeat: Failed to run 'DELETE FROM heartbeat WHERE server_id=180360463' on db1125.eqiad.wmnet
Tue, Sep 17, 4:38 PM · Patch-For-Review, Data-Persistence-SRE, DBA, Datacenter-Switchover
jcrespo added a comment to T374972: Output test logs of production testing of the pre switchover tasks related to databases.

I found an actual bug: this is failing:

Failed to run cookbooks.sre.switchdc.databases.finalize.FinalizeSection.clean_heartbeat: Failed to run 'DELETE FROM heartbeat WHERE server_id=180360463' on db1125.eqiad.wmnet
Tue, Sep 17, 4:37 PM
jcrespo added a comment to T371351: Automate the pre/post switchover tasks related to databases.

One additional comment about the process, not necessarily the script: the post-maintenance script is confusing, as it will be run post-maintenance, but the parameters will be in the direction of the maintenance (while replication will be flowing in the previous direction).

Tue, Sep 17, 4:24 PM · Patch-For-Review, Data-Persistence-SRE, DBA, Datacenter-Switchover
jcrespo added a comment to T374972: Output test logs of production testing of the pre switchover tasks related to databases.

I was able to see it fail, so the check works as expected (that's good):

**MASTER_TO db2230.codfw.wmnet MASTER STATUS is not stable, see the extended logs**
Failed to run cookbooks.sre.switchdc.databases.prepare.PrepareSection.wait_master_to_position: MASTER_TO db2230.codfw.wmnet MASTER STATUS is not stable, see the extended logs
Tue, Sep 17, 3:40 PM
jcrespo added a comment to T374972: Output test logs of production testing of the pre switchover tasks related to databases.

Minor usability point: given the 10 seconds of wait, I would add a print saying that it is happening. When there is 1 second of pause it is ok, but here I would explicitly print something informative such as "waiting 10 seconds to make sure all pending events/transactions/writes are caught up" so the operator feels ok. :-D

Tue, Sep 17, 3:26 PM
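
The usability suggestion above (announce a long pause instead of sleeping silently) could be sketched roughly as follows. This is only an illustration and not the cookbook's actual code: the function name `wait_with_notice` and its parameters are hypothetical, and the sleep and output functions are injectable purely so the behaviour can be checked without really waiting.

```python
import time
from typing import Callable


def wait_with_notice(
    seconds: int,
    reason: str,
    sleep: Callable[[float], None] = time.sleep,
    emit: Callable[[str], None] = print,
) -> str:
    """Tell the operator why and how long we pause, then pause.

    Hypothetical helper: name and signature are assumptions for illustration.
    """
    unit = "second" if seconds == 1 else "seconds"
    message = f"Waiting {seconds} {unit} {reason}"
    emit(message)   # print first, so the operator never sees a silent hang
    sleep(seconds)  # then actually wait
    return message
```

Called as `wait_with_notice(10, "to make sure all pending events/transactions/writes are caught up")`, it prints the notice before the pause begins.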
jcrespo removed projects from T374972: Output test logs of production testing of the pre switchover tasks related to databases: Infrastructure-Foundations, SRE-tools.

Removing tags to avoid IRC spam until tests complete.

Tue, Sep 17, 3:14 PM
jcrespo added a comment to T371351: Automate the pre/post switchover tasks related to databases.
Tue, Sep 17, 3:08 PM · Patch-For-Review, Data-Persistence-SRE, DBA, Datacenter-Switchover
jcrespo added a comment to T371351: Automate the pre/post switchover tasks related to databases.

The other thing I saw after T371351#10153483 is that on the next step, if I run the disabling of GTID twice, there is no error or warning.

Tue, Sep 17, 2:57 PM · Patch-For-Review, Data-Persistence-SRE, DBA, Datacenter-Switchover
jcrespo created T374972: Output test logs of production testing of the pre switchover tasks related to databases.
Tue, Sep 17, 2:52 PM
jcrespo added a comment to T371351: Automate the pre/post switchover tasks related to databases.

I am going to create a dedicated task for production testing, to avoid also noise here and on IRC.

Tue, Sep 17, 2:49 PM · Patch-For-Review, Data-Persistence-SRE, DBA, Datacenter-Switchover
jcrespo added a comment to T371351: Automate the pre/post switchover tasks related to databases.

I executed:

Tue, Sep 17, 2:31 PM · Patch-For-Review, Data-Persistence-SRE, DBA, Datacenter-Switchover
jcrespo added a comment to T374933: Add section alias for databases in the test-s1 and test-s4 sections.

However, it was removed before at c9fe19ccd39c89274f9f6f.

Tue, Sep 17, 11:46 AM · DBA
jcrespo updated subscribers of T374933: Add section alias for databases in the test-s1 and test-s4 sections.

Thoughts?

Tue, Sep 17, 11:35 AM · DBA
jcrespo added a subtask for T371351: Automate the pre/post switchover tasks related to databases: T374933: Add section alias for databases in the test-s1 and test-s4 sections.
Tue, Sep 17, 11:35 AM · Patch-For-Review, Data-Persistence-SRE, DBA, Datacenter-Switchover
jcrespo added a parent task for T374933: Add section alias for databases in the test-s1 and test-s4 sections: T371351: Automate the pre/post switchover tasks related to databases.
Tue, Sep 17, 11:35 AM · DBA
jcrespo created T374933: Add section alias for databases in the test-s1 and test-s4 sections.
Tue, Sep 17, 11:29 AM · DBA
jcrespo closed T374774: db1125 (test-s4) is broken as Resolved.

I will take over the hosts for unrelated testing, will reload data anyway from backups.

Tue, Sep 17, 9:37 AM · DBA

Mon, Sep 16

jcrespo awarded T237020: Ferm should log errors when failing to create all configured rules a Like token.
Mon, Sep 16, 3:45 PM · Infrastructure-Foundations, SRE

Thu, Sep 12

jcrespo closed T374610: db1171:s8 is having performance issues and lagging as Resolved.

After restart, the server looks way less io stressed.

Thu, Sep 12, 5:18 PM · DBA, Data-Persistence-Backup, database-backups
jcrespo added a comment to T373102: Migrate servers in codfw racks D1 & D2 from asw to lsw.

I've stopped codfw media backups.

Thu, Sep 12, 3:27 PM · SRE-swift-storage, collaboration-services, ops-codfw, netops, Infrastructure-Foundations, SRE, DC-Ops
jcrespo awarded T374600: Move db2139 replication source under the new codfw primary db a Love token.
Thu, Sep 12, 2:11 PM · Data-Persistence-SRE, DBA
jcrespo added a comment to T365717: [wikireplicas] Update Admin docs.

I wasn't sure if "cannot apply to wikireplicas" included Sanitariums or only clouddbs. If it also includes Sanitariums, what would be your recommended procedure for my example above -- moving MASTER_HOST of db1154 (sanitarium) from db1196 to db1206? Did you ever change the MASTER_HOST for a Sanitarium in the past?

Thu, Sep 12, 12:47 PM · Data-Persistence-SRE, cloud-services-team (FY2024/2025-Q1-Q2), Data-Persistence, Data-Services
jcrespo added a comment to T365717: [wikireplicas] Update Admin docs.

@jcrespo I have added your comment above to MariaDB#Manipulating_the_Replication_Tree.

Would the method in https://wikitech.wikimedia.org/wiki/Primary_database_switchover work for changing the source of replication of a Sanitarium host, e.g. moving the MASTER_HOST of db1154 (sanitarium) from db1196 to db1206? I don't want to do it right now, but I'm trying to understand what the procedure would be in case db1196 has an issue, and update the example at Sanitarium_and_clouddb_instances#Sanitarium's_primary_failover.

Thu, Sep 12, 12:32 PM · Data-Persistence-SRE, cloud-services-team (FY2024/2025-Q1-Q2), Data-Persistence, Data-Services
jcrespo added a comment to T374610: db1171:s8 is having performance issues and lagging.
[11:44] <jynus> I will restart now db1171:s7, and s8 on the same host in ~1h, when the dumps there finish
[11:44] <jynus> to apply the buffer pool change
Thu, Sep 12, 11:45 AM · DBA, Data-Persistence-Backup, database-backups
jcrespo added a comment to T374610: db1171:s8 is having performance issues and lagging.

Yeah, that would explain it. So root cause found. I still want to merge the patch to optimize memory assignment (future schema changes will happen there).

Thu, Sep 12, 11:36 AM · DBA, Data-Persistence-Backup, database-backups
jcrespo claimed T374610: db1171:s8 is having performance issues and lagging.
Thu, Sep 12, 11:30 AM · DBA, Data-Persistence-Backup, database-backups
jcrespo created T374610: db1171:s8 is having performance issues and lagging.
Thu, Sep 12, 11:28 AM · DBA, Data-Persistence-Backup, database-backups
jcrespo renamed T374600: Move db2139 replication source under the new codfw primary db from db2139 replication source to Move db2139 replication source under the new codfw primary db.
Thu, Sep 12, 11:28 AM · Data-Persistence-SRE, DBA
jcrespo added a comment to T365717: [wikireplicas] Update Admin docs.

This method is used: https://wikitech.wikimedia.org/wiki/Primary_database_switchover but it only works when switching working replication and with direct parent-child relationships, so it cannot apply to wikireplicas -and it may not be, due to skipped/modified/additional transactions due to filtering (which is why it is so hard to handle them)

Thu, Sep 12, 10:56 AM · Data-Persistence-SRE, cloud-services-team (FY2024/2025-Q1-Q2), Data-Persistence, Data-Services
jcrespo added a comment to T365717: [wikireplicas] Update Admin docs.

I wouldn't be responsible if I didn't tell you that GTID has been very error prone for us, and that it has been very unreliable, which is why I believe it is not used in production at the moment. GTID works well when it works well, and terribly when it doesn't. The only reason GTID is enabled in production is the InnoDB safe replication tracking on crash.

Thu, Sep 12, 10:40 AM · Data-Persistence-SRE, cloud-services-team (FY2024/2025-Q1-Q2), Data-Persistence, Data-Services

Wed, Sep 11

jcrespo added a comment to T373105: Migrate servers in codfw racks D7 & D8 from asw to lsw.

I will want to stop ms backups at codfw for backup2007 before it happens. No big deal if I don't do it (just some backups will be marked as failed and probably retried later), but that way we avoid extra failures.

Wed, Sep 11, 12:26 PM · SRE-swift-storage, collaboration-services, DC-Ops, ops-codfw, netops, Infrastructure-Foundations, SRE
jcrespo added a comment to T373102: Migrate servers in codfw racks D1 & D2 from asw to lsw.

I will want to stop ms backups at codfw for backup2011 before it happens. No big deal if I don't do it (just some backups will be marked as failed and probably retried later), but that way we avoid extra failures.

Wed, Sep 11, 12:26 PM · SRE-swift-storage, collaboration-services, ops-codfw, netops, Infrastructure-Foundations, SRE, DC-Ops
jcrespo awarded T374425: db2205 stuck replication/processlist a Like token.
Wed, Sep 11, 12:20 PM · DBA

Tue, Sep 10

Dzahn awarded T374410: Unstuck productionEqiad backup pool a Like token.
Tue, Sep 10, 4:55 PM · bacula, Data-Persistence-Backup
jcrespo closed T374410: Unstuck productionEqiad backup pool as Resolved.

Things are ok now, may tune more later. Will ask the deploy1002 issue separately.

Tue, Sep 10, 10:18 AM · bacula, Data-Persistence-Backup
jcrespo added a comment to T374425: db2205 stuck replication/processlist.

Let's rename it to the cause, not the suggested solution.

Tue, Sep 10, 10:06 AM · DBA
jcrespo renamed T374425: db2205 stuck replication/processlist from Reimage db2205/db2107 to db2205 stuck replication/processlist.
Tue, Sep 10, 10:02 AM · DBA
jcrespo added a comment to T363581: Build a machine-readable catalogue of mariadb tables in production.

Related: https://wikitech.wikimedia.org/wiki/Obsolete_or_unneeded_database_tables

Tue, Sep 10, 8:52 AM · DBA
jcrespo added a comment to T374410: Unstuck productionEqiad backup pool.

deploy1002.eqiad.wmnet backups failed. I am unsure if your team handles that, but do you happen to know whether that host no longer exists while its backups are still active? Can it be removed from puppet cache/config?

Tue, Sep 10, 3:09 AM · bacula, Data-Persistence-Backup
jcrespo added a comment to T374410: Unstuck productionEqiad backup pool.

Aside from making sure config was loaded and distributed, I had to do some additional work:

Tue, Sep 10, 2:57 AM · bacula, Data-Persistence-Backup
jcrespo added a comment to T374410: Unstuck productionEqiad backup pool.

This errored out as it was running while the config updated:

586570  Incr       2,922    28.05 G  Error    10-Sep-24 02:38 arclamp2001.codfw.wmnet-Monthly-1st-Tue-productionEqiad-arclamp-application-data
Tue, Sep 10, 2:40 AM · bacula, Data-Persistence-Backup
jcrespo triaged T374410: Unstuck productionEqiad backup pool as High priority.

CC @Dzahn

Tue, Sep 10, 2:23 AM · bacula, Data-Persistence-Backup
jcrespo created T374410: Unstuck productionEqiad backup pool.
Tue, Sep 10, 2:22 AM · bacula, Data-Persistence-Backup

Mon, Sep 9

jcrespo awarded T356788: thanos-query probedown due to OOM of both eqiad titan frontends a Like token.
Mon, Sep 9, 3:20 PM · SRE Observability (FY2024/2025-Q1), Sustainability (Incident Followup), SRE, observability
jcrespo added a comment to T369253: Alert email sent from backupmon1001 didn't reach engineer's google inbox (was: check-dbbackup-time sometimes doesn't send email alerts).

Thanks, everyone. I think @MatthewVernon's suggestion is fair, and something I should have done. I will update the code so that bounces get sent to root@. While I know mail is not reliable, I just found it weird that the same kind of message (as it is automated) got filtered only that one time.

Mon, Sep 9, 10:02 AM · Infrastructure-Foundations, Mail

Jul 9 2024

jcrespo added a comment to P65571 mediabackups resharding.
0 -> backup1004               0
1 -> backup1004               0
2 -> backup1004               0
3 -> backup1004 -> backup1005 1 (done)
4 -> backup1005 *             1
5 -> backup1005 *             1
6 -> backup1005 *  backup1006 2
7 -> backup1005 *  backup1006 2
8 -> backup1006               2
9 -> backup1006 -> backup1007 3 (done)
a -> backup1006 -> backup1007 3 (done)
b -> backup1006 -> backup1007 3 (done)
c -> backup1007 -> backup1011 4 (done)
d -> backup1007 -> backup1011 4 (done)
e -> backup1007 -> backup1011 4 (done)
f -> backup1007 -> backup1011 4 (done)
Jul 9 2024, 3:56 PM

Jul 8 2024

jcrespo closed T334069: Evaluate and decide the future of MinIO for media backups given the upgrade requirements and increase the available storage space as Resolved.

Resharding completed; the only thing pending is 2 running purge screens on ms-backup2001 and ms-backup2002 for purging leftovers. backup1011 & backup2011 will have to be complemented by backup1012 and backup2012 this quarter.

Jul 8 2024, 12:40 PM · Data-Persistence-Backup, media-backups
jcrespo closed T365607: Reprovision missing files due to backup1005 hw issues as Resolved.
Jul 8 2024, 12:21 PM · Data-Persistence-Backup, media-backups

Jul 4 2024

jcrespo renamed T369253: Alert email sent from backupmon1001 didn't reach engineer's google inbox (was: check-dbbackup-time sometimes doesn't send email alerts) from check-dbbackup-time sometimes doesn't send email alerts to Alert email sent from backupmon1001 didn't reach engineer's google inbox (was: check-dbbackup-time sometimes doesn't send email alerts).
Jul 4 2024, 8:16 AM · Infrastructure-Foundations, Mail
jcrespo edited projects for T369253: Alert email sent from backupmon1001 didn't reach engineer's google inbox (was: check-dbbackup-time sometimes doesn't send email alerts), added: Mail, Infrastructure-Foundations; removed observability, Data-Persistence-Backup, database-backups.

Running it manually it worked every time, so I am confused- it doesn't seem to be a script issue. Could it be a mailing subsystem issue?

Jul 4 2024, 8:14 AM · Infrastructure-Foundations, Mail
jcrespo created T369253: Alert email sent from backupmon1001 didn't reach engineer's google inbox (was: check-dbbackup-time sometimes doesn't send email alerts).
Jul 4 2024, 7:30 AM · Infrastructure-Foundations, Mail

Jul 3 2024

jcrespo changed the status of T334069: Evaluate and decide the future of MinIO for media backups given the upgrade requirements and increase the available storage space from Open to In Progress.

1 more week left to finish the resharding.

Jul 3 2024, 12:43 PM · Data-Persistence-Backup, media-backups
jcrespo triaged T334069: Evaluate and decide the future of MinIO for media backups given the upgrade requirements and increase the available storage space as High priority.
Jul 3 2024, 12:42 PM · Data-Persistence-Backup, media-backups
jcrespo placed T351895: Make it easy to retrieve disk usage trends on backup storage for hw provisioning up for grabs.
Jul 3 2024, 12:42 PM · database-backups, media-backups, bacula, Data-Persistence-Backup
jcrespo placed T313582: Migrate bacula director to new hardware and setup independent bacula directors/storage/metadata for each primary datacenter for increased redundancy up for grabs.

Backlog for when I come back.

Jul 3 2024, 12:41 PM · Patch-For-Review, Goal, bacula, Data-Persistence-Backup
jcrespo placed T330882: transferpy should not log cumin subcomands as ERRORs on a normal, succesful run up for grabs.
Jul 3 2024, 12:40 PM · Patch-For-Review, database-backups, Data-Persistence-Backup
jcrespo changed the status of T365607: Reprovision missing files due to backup1005 hw issues from Open to In Progress.
Jul 3 2024, 12:39 PM · Data-Persistence-Backup, media-backups
jcrespo added a comment to T365607: Reprovision missing files due to backup1005 hw issues.

5 million files left to recover!

Jul 3 2024, 12:38 PM · Data-Persistence-Backup, media-backups
jcrespo placed T283017: Create a dashboard for database backups monitoring/reporting up for grabs.

It would be nice to productionize this, but I haven't had the time so far.

Jul 3 2024, 12:38 PM · dbbackups-dashboard, Patch-For-Review, Goal, database-backups, Data-Persistence-Backup
jcrespo updated the task description for T365607: Reprovision missing files due to backup1005 hw issues.
Jul 3 2024, 12:38 PM · Data-Persistence-Backup, media-backups
jcrespo changed the status of T362509: Setup new dbprov hosts and decommission the old ones from Open to Stalled.
Jul 3 2024, 12:37 PM · Patch-For-Review, database-backups, Data-Persistence-Backup
jcrespo closed T200035: DB backup restore skip empty databases as Resolved.

This has been worked around with the mini-loader method of restoring backups, so I would call it resolved.

Jul 3 2024, 12:36 PM · Data-Persistence-Backup, Upstream
jcrespo updated the task description for T363812: Setup backups for es6, es7 and archive old read only backups.
Jul 3 2024, 12:24 PM · Patch-For-Review, database-backups, Data-Persistence-Backup
jcrespo closed T363812: Setup backups for es6, es7 and archive old read only backups as Resolved.

I will skip the "Remove dump user", as I think that may be useful and we will decide how to leave it long term when the es1, es2 & es3 backups are generated (with or without the user).

Jul 3 2024, 12:24 PM · Patch-For-Review, database-backups, Data-Persistence-Backup
jcrespo added a comment to T362509: Setup new dbprov hosts and decommission the old ones.

Let's wait a little bit before deleting the files on the old dbprovs just in case (I will do it when I come back).

Jul 3 2024, 11:50 AM · Patch-For-Review, database-backups, Data-Persistence-Backup
jcrespo updated subscribers of T362509: Setup new dbprov hosts and decommission the old ones.

@Volans @ABran-WMF FYI

Jul 3 2024, 11:49 AM · Patch-For-Review, database-backups, Data-Persistence-Backup

Jul 2 2024

jcrespo added a comment to T363812: Setup backups for es6, es7 and archive old read only backups.

es4 has already been archived on jobs 574899 and 574900, the two for es5 are running now. When finished, we will be able to close this ticket.

Jul 2 2024, 12:28 PM · Patch-For-Review, database-backups, Data-Persistence-Backup

Jul 1 2024

jcrespo added a comment to T365993: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2 - lsw1-e1-eqiad.

No action will be needed for backup1010 in the end.

Jul 1 2024, 1:52 PM · SRE-swift-storage, DBA, Data-Persistence, Infrastructure-Foundations, netops, SRE
jcrespo added a comment to T368907: Requesting GitLab account activation for [Davenyi].

@Davenyi please note you missed the options asked on the form, as seen above.

Jul 1 2024, 12:14 PM · GitLab (Account Approval), Release-Engineering-Team
jcrespo merged T368906: Requesting GitLab account activation for [YOUR DEVELOPER ACCOUNT USERNAME HERE] into T368907: Requesting GitLab account activation for [Davenyi].
Jul 1 2024, 12:13 PM · GitLab (Account Approval), Release-Engineering-Team
jcrespo merged task T368906: Requesting GitLab account activation for [YOUR DEVELOPER ACCOUNT USERNAME HERE] into T368907: Requesting GitLab account activation for [Davenyi].
Jul 1 2024, 12:12 PM · GitLab (Account Approval), Release-Engineering-Team
jcrespo added a comment to T344599: wikireplicas root access.

If I may @fnegri, the issue is that those hosts are in a way special, because they are pieces (data) of production (meaning here mediawiki) on cloud realm, so it may not be easy to solve with the current architecture. If there was an implementation where absolutely all non-public data and configuration was deleted on production side (e.g. a message protocol that cleans up everything and reconstructs them again on cloud network), that would solve all concerns- but that would be way more complex and will require a lot of work. And only now there is the start of a proper inventory where each table and column will document its privacy and concerns for global usage and editing.

Jul 1 2024, 11:56 AM · cloud-services-team (FY2024/2025-Q1-Q2), Data-Services, Infrastructure Security