summaryrefslogtreecommitdiffstats
path: root/libglusterfs
Commit message (Collapse)AuthorAgeFilesLines
* features/bitrot: tuanble object signing waiting time value for bitrotGaurav Kumar Garg2015-06-171-0/+3
| | | | | | | | | | | | | | | | | Currently bitrot using 120 second waiting time for object to be signed after all fop's released. This signing waiting time value should be tunable. Command for changing the signing waiting time will be #gluster volume bitrot <VOLNAME> signing-time <waiting time value in second> Change-Id: I89f3121564c1bbd0825f60aae6147413a2fbd798 BUG: 1231832 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/11105 (cherry picked from commit 554fa0c1315d0b4b78ba35a2d332d7ac0fd07d48) Reviewed-on: http://review.gluster.org/11235 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* changetimerecorder : port log messages to a new frameworkMohamed Ashiq2015-06-173-6/+502
| | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/#/c/10938/ Cherry picked from 00f9a61fe8884062c141edd662424625d349a377 >Change-Id: I66e7ccc5e62482c3ecf0aab302568e6c9ecdc05d >BUG: 1194640 >Signed-off-by: Mohamed Ashiq <ashiq333@gmail.com> >Reviewed-on: http://review.gluster.org/10938 >Tested-by: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Joseph Fernandes >Signed-off-by: Mohamed Ashiq <ashiq333@gmail.com> Change-Id: I66e7ccc5e62482c3ecf0aab302568e6c9ecdc05d BUG: 1217722 Signed-off-by: Mohamed Ashiq <ashiq333@gmail.com> Reviewed-on: http://review.gluster.org/11195 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Joseph Fernandes Tested-by: Joseph Fernandes
* libglusterfs: update glfs-message header for reserved segmentsHumble Devassy Chirammal2015-06-121-4/+36
| | | | | | | | | | | | | | | | | | all the logging patches will be dependent on this segment allocation. Sending this change as a seperate one to avoid the conflicts. This also include backport of http://review.gluster.org/#/c/10420/ Change-Id: Iedd72ab6dc3526f1a6b01828807b5e6b9edcba90 BUG: 1194640 Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com> Reviewed-on: http://review.gluster.org/10400 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> (cherry picked from commit cd4e9abee1270c6400037912aacbbf171c6897c0) Reviewed-on: http://review.gluster.org/10670 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Fix dht_setxattr to follow files under migrationNithya Balachandran2015-06-041-2/+3
| | | | | | | | | | | | | | | | If a file is under migration, then any xattrs created on it are lost post migration of the file. This is because the xattrs are set only on the cached subvol of the source and as the source is under migration, it becomes a linkto file post migration. Change-Id: Ib8e233b519cf954e7723c6e26b38fa8f9b8c85c0 BUG: 1225839 Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/10968 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* glusterd: Bump op version and max op version for 3.7.2Shyam2015-06-031-1/+4
| | | | | | | | | | | | | As 3.7.1 is released, and a DHT configuration option needs higher op version, bumping the gluster op-version to 3.7.2 (or 30702). Change-Id: Iaed9e49b86a195653ddca55994e2c2398b2ee3bc BUG: 1227887 Signed-off-by: Shyam <srangana@redhat.com> Reviewed-on: http://review.gluster.org/11071 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* libglusterfs: Copy d_len and dict as well into dst direntKrutika Dhananjay2015-06-032-3/+23
| | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/11026 Also, added memory allocation failure checks in light of the comments received @ http://review.gluster.org/#/c/10809/2/libglusterfs/src/gf-dirent.c, and http://review.gluster.org/#/c/10809/1/xlators/features/shard/src/shard.c Change-Id: Iad4736bba69e6977390f76d0c9eb64217a005b80 BUG: 1227576 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/11052 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* glusterd: Bump op version and max op version for 3.7.1Shyam2015-06-021-1/+4
| | | | | | | | | | | | | | | As 3.7 is released, and a DHT configuration option needs higher op version, bumping the gluster op-version to 3.7.1 (or 30701). Change-Id: I9747cf93b41be72e43077ed8e977e21eed99ccc3 BUG: 1225999 Signed-off-by: Shyam <srangana@redhat.com> Reviewed-upstream-on: http://review.gluster.org/10849 Reviewed-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/10975 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* quota: quota.conf backward compatibility fixvmallika2015-06-012-57/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a backport of http://review.gluster.org/#/c/10889/ > In release-3.7 the format of quota.conf is changed. > There is a backward compatibility issues during upgrade > 1) There can be an issue when peer sync between node-3.6 and node-3.7 > 2) If the user sets/removes limit, there is will different format of > file in node-3.6 and node-3.7 > > This patch fixes the issue: > 1) restrict the user to execute command quota enable, limit-usage, remove > 2) write quota.conf in older format if op-version is less than 3.6 > > Change-Id: Ib76f5a0a85394642159607a105cacda743e7d26b > BUG: 1223739 > Signed-off-by: vmallika <vmallika@redhat.com> > Reviewed-on: http://review.gluster.org/10889 > Tested-by: NetBSD Build System > Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Change-Id: I3d5075b96de1e125d30b0f6045389573aaaaecdb BUG: 1226153 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/10985 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
* tier/tier.t: Fixing tier.t crash in regression runsJoseph Fernandes2015-05-302-8/+17
| | | | | | | | | | | | | | | | | | | | | | | 1) If the database file exists a. Dont try re-creating the db schema b. Dont try re-configuring the db. 2) Dont assert in fini_db () when connection is NULL >> Signed-off-by: Joseph Fernandes <josferna@redhat.com> >> Reviewed-on: http://review.gluster.org/10870 >> Tested-by: NetBSD Build System >> Tested-by: Gluster Build System <jenkins@build.gluster.com> >> Reviewed-by: Vijay Bellur <vbellur@redhat.com> >> Signed-off-by: Joseph Fernandes <josferna@redhat.com> >>Change-Id: Idd2833f55caaa6b3a77d935d877d6c4d2994da6a Change-Id: Ia2e044fce4e41bccc8fdada1cb21f240fdbd55df BUG: 1225077 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/10992 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* protocol/client,server: Move EEXIST logs in mkdir and mknod to DEBUG levelKrutika Dhananjay2015-05-281-0/+4
| | | | | | | | | | | Backport of: http://review.gluster.org/10791 Change-Id: I096296f0b97f62f49577ca698ae34e28cce4a4b4 BUG: 1225919 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10973 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/bit-rot-stub: versioning of objects in write/truncate fop instead ↵Raghavendra Bhat2015-05-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of open * This patch brings in the changes where object versioning is done in write and truncate fops instead of tracking them in open and create fops. This model works for both regular and anonymous fds. It also removes the race associated with open calls, create and lookups. This patch follows the below method for object versioning and notifications: Before sending writev on the fd, increase the ongoing version first. This makes anonymous fd write similar to the regular fd write by having the ongoing version increased before doing the write. Do following steps to do versioning: 1) For anonymous fds set the fd context (so that release is invoked) and add the fd context to the list maintained in the inode context. For regular fds the above think would have been done in open itself. 2) Increase the on-disk ongoing version 3) Increase the in memory ongoing version and mark inode as non-dirty 3) Once versioning is successfully done send write operation. If versioning fails, then fail the write fop. 5) In writev_cbk mark inode as modified. > Change-Id: I7104391bbe076d8fc49b68745d2ec29a6e92476c > BUG: 1207979 > Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> > Reviewed-on: http://review.gluster.org/10233 > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Vijay Bellur <vbellur@redhat.com> Change-Id: I4bb86989b5fab02b9ed2950798b1a80e566f1024 BUG: 1220041 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/10722 Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com> Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/bitrot: Throttle filesystem scrubberVenky Shankar2015-05-104-24/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces multithreaded filesystem scrubber based on throttling option configured for a particular volume. The implementation "logically" breaks scanning and scrubbing with the number of scrubber threads auto-configured depending upon the throttle configuration. Scanning (crawling) is left single threaded (per brick) with entries scrubbed in bulk. On reaching this "bulk" watermark, scanner waits until entries are scrubbed. Bricks for a particular volume have a set of thread(s) assigned for scrubbing, with entries for each brick scrubbed in a round robin fashion to avoid scrub "stalls" when a brick (out of N bricks) is under active scrubbing. This mechanism helps us implement "pause/resume" with ease: all one need to do is to cleanup scrubber threads and let the main scanner thread "wait" untill scrubbing is resumed (where the scrubber thread(s) are spawned again), therefore continuing where we left off (unless we restart the deamons, where crawl initiates from root directory again, but I guess that's OK). [ NOTE: Throttling is optional for the signer daemon, without which it runs full throttle. However, passing "-DBR_RATE_LIMIT_SIGNER" predefined in CFLAGS enables CPU throttling (during checksum calculation) thereby avoiding high CPU usage. ] Subsequent patches would introduce CPU throttling during hash calculation for scrubber. > Change-Id: I5701dd6cd4dff27ca3144ac5e3798a2216b39d4f > BUG: 1207020 > Signed-off-by: Venky Shankar <vshankar@redhat.com> > Reviewed-on: http://review.gluster.org/10511 > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Vijay Bellur <vbellur@redhat.com> Change-Id: I5a125b2d0ac7dafd3e278b7fe4c6c9dd07af76dd Signed-off-by: Venky Shankar <vshankar@redhat.com> BUG: 1220041 Reviewed-on: http://review.gluster.org/10720 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
* features/bitrot: Follow xattr naming conventionsVenky Shankar2015-05-101-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | Instead of "trusted.glusterfs.bit-rot.*" use "trusted.bit-rot.*" NOTE: With this patch, data on existing volumes would be resigned (which should be OK as of now since we do not expect many users as of now :-)) > Change-Id: I926c7bca266a9c8f2cb35d57c4d0359aa5cecfa0 > BUG: 1170075 > Signed-off-by: Venky Shankar <vshankar@redhat.com> > Reviewed-on: http://review.gluster.org/10181 > Tested-by: NetBSD Build System > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Vijay Bellur <vbellur@redhat.com> Change-Id: I3c18d7dc2db4beaca6e8d8d231b4171a7b18795f Signed-off-by: Venky Shankar <vshankar@redhat.com> BUG: 1220041 Reviewed-on: http://review.gluster.org/10718 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
* core: Global timer-wheelVenky Shankar2015-05-104-3/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instantiate a process wide global instance of the timer wheel data structure. Spawning glusterfs* process with option arg "--global-timer-wheel" instantiates a global instance of timer-wheel under global context (->ctx). Translators can make use of this process wide instance [via a call to glusterfs_global_timer_wheel()] instead of maintaining an instance of their own and possibly consuming more memory. Linux kernel too has a single instance of timer wheel where subsystems such as IO, networking, etc.. make use of. Bitrot daemon would be early consumers of this: bitrot translator instances for multiple volumes would track objects belonging to their respective bricks in this global expiry tracking data structure. This is also a first step to move GlusterFS timer mechanism to use timer-wheel. > Change-Id: Ie882df607e07acaced846ea269ebf1ece306d6ae > BUG: 1170075 > Signed-off-by: Venky Shankar <vshankar@redhat.com> > Reviewed-on: http://review.gluster.org/10380 > Tested-by: NetBSD Build System > Reviewed-by: Vijay Bellur <vbellur@redhat.com> > Tested-by: Gluster Build System <jenkins@build.gluster.com> Change-Id: I35c840daa9996a059699f8ea5af54c76ede7e09c Signed-off-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> BUG: 1220041 Reviewed-on: http://review.gluster.org/10716 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* CTR/Libgfdb: Log typo fixJoseph Fernandes2015-05-101-2/+2
| | | | | | | | | | | Log typo fix for CTR Xlator and Libgfdb Change-Id: Ia39069a5ce9c48bbee937f1b5c5d749a30c9ac56 BUG: 1220100 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/10742 Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* dht: make lookup-unhashed=auto do something actually usefulJeff Darcy2015-05-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | The key concept here is to determine whether a directory is "clean" by comparing its last-known-good topology to the current one for the volume. These are stored as "commit hashes" on the directory and the volume root respectively. The volume's commit hash changes whenever a brick is added or removed, and a fix-layout is done. A directory's commit hash changes only when a full rebalance (not just fix-layout) is done on it. If all bricks are present and have a directory commit hash that matches the volume commit hash, then we can assume that every file is in its "proper" place. Therefore, if we look for a file in that proper place and don't find it, we can assume it's not on any other subvolume and *safely* skip the global (broadcast to all) lookup. Change-Id: Id6ce4593ba1f7daffa74cfab591cb45960629ae3 BUG: 1220064 Reviewed-on-master: http://review.gluster.org/#/c/7702/ Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Signed-off-by: Shyam <srangana@redhat.com> Reviewed-on: http://review.gluster.org/10729 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* bitrot/scrub: fix induced throttling in syncop_ftw_throttle()Venky Shankar2015-05-091-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Failing to reset scanning counter causes "incorrect" delay of around 50 seconds per directory entry. This causes scrubber to run extremely slowly. [ NOTE: This is a temporary fix. With the introduction of token bucket based throttling, inducing throttle via sleep() call would be unneeded. ] Also, fix logging messages in scrubber to log brick and full path of the object which is identified/marked as corrupted. > Change-Id: Id501bd15dcdbd8a09613f80f9d84050304740027 > BUG: 1170075 > Signed-off-by: Venky Shankar <vshankar@redhat.com> > Reviewed-on: http://review.gluster.org/10375 > Tested-by: NetBSD Build System > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> > Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com> Change-Id: I78f227f52f12549d62ecb35cbb70121424f7c2a7 BUG: 1220041 Reviewed-on: http://review.gluster.org/10714 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* core: use reference counting for mem_acct structuresJeff Darcy2015-05-0910-174/+193
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When freeing memory, our memory-accounting code expects to be able to dereference from the (previously) allocated block to its owning translator. However, as we have already found once in option validation and twice in logging, that translator might itself have been freed and the dereference attempt causes on of our daemons to crash with SIGSEGV. This patch attempts to fix that as follows: * We no longer embed a struct mem_acct directly in a struct xlator, but instead allocate it separately. * Allocated memory blocks now contain a pointer to the mem_acct instead of the xlator. * The mem_acct structure contains a reference count, manipulated in both the normal and translator allocate/free code using atomic increments and decrements. * Because it's now a separate structure, we can defer freeing the mem_acct until its reference count reaches zero (either way). * Some unit tests were disabled, because they embedded their own copies of the implementation for what they were supposedly testing. Life's too short to spend time fixing tests that seem designed to impede progress by requiring a certain implementation as well as behavior. Change-Id: Id929b11387927136f78626901729296b6c0d0fd7 BUG: 1219026 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/10417 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10723 Tested-by: NetBSD Build System
* cluster/afr : Prevent inode-evict during split-brain resolutionAnuradha2015-05-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/#/c/10134/ 1) Provided setfattr command to set timeout for split-brain choice. 2) If split-brain inspection/resolution is being done from the mount for a file, ref the inode when split-brain-choice is set. This inode will be unconditionally unref-ed after timeout seconds set by the user/default otherwise. 3) Updated the doc and testcase to reflect the changes. Change-Id: I15c9037dee28855f21e680e7e3632e1f48dba4e1 BUG: 1219388 Reviewed-on: http://review.gluster.org/10134 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/10679
* cluster/ec: data heal implementation for ecPranith Kumar K2015-05-081-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | Data self-heal: 1) Take inode lock in domain 'this->name:self-heal' on 0-0 range (full file), So that no other processes try to do self-heal at the same time. 2) Take inode lock in domain 'this->name' on 0-0 range (full file), 3) perform fxattrop+fstat and get the xattrs on all the bricks 3) Choose the brick with ec->fragment number of same version as source 4) Truncate sinks 5) Unlock lock taken in 2) 5) For each block take full file lock, Read from sources write to the sinks, Unlock 6) Take full file lock and see if the file is still sane copy i.e. File didn't become unusable while the bricks are offline. Update mtime to before healing 7) xattrop with -ve values of 'dirty' and difference of highest and its own version values for version xattr 8) unlock lock acquired in 6) 9) unlock lock acquired in 1) Change-Id: I6f4d42cd5423c767262c9d7bb5ca7767adb3e5fd BUG: 1216303 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10384 Reviewed-on: http://review.gluster.org/10692 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Fix dictionary compare functionPranith Kumar K2015-05-082-1/+91
| | | | | | | | | | | | | | | | | | | | | If both dicts are NULL then equal. If one of the dicts is NULL but the other has only ignorable keys then also they are equal. If both dicts are non-null then check if for each non-ignorable key, values are same or not. value_ignore function is used to skip comparing values for the keys which must be present in both the dictionaries but the value could be different. geo-rep's stime xattr doesn't need to be present in list xattr but when getxattr comes on stime xattr even if there aren't enough responses with the xattr we should still give out an answer which is maximum of the stimes available. Change-Id: I8de2ceaa2db785b797f302f585d88e73b154167d BUG: 1216303 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10078 Reviewed-on: http://review.gluster.org/10690 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System
* geo-rep: rename handling in dht volumeNithya Balachandran2015-05-082-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Background: Glusterfs changelogs are stored in each brick, which records the changes happened in that brick. Georep will run in all the nodes of master and processes changelogs "independently". Processing changelogs is in brick level, but all the fops will be replayed on "slave mount" point. Problem: With a DHT volume, in changelog "internal fops" are NOT recorded. For Rename case, Rename is recorded in "hashed" brick changelog. (DHT's internal fops like creating linkto file, unlink is NOT recorded). This lead us to inconsistent rename operations. For example, Distribute volume created with Two bricks B1, B2. //Consider master volume mounted @ /mnt/master and following operations executed: cd /mnt/master touch f1 // f1 falls on B1 Hash mv f1 f2 // f2 falls on B2 Hash // Here, Changelogs are recorded as below: @B1 CREATE f1 @B2 RENAME f1 f2 Here, race exists between Brick B1 and B2, say B2 will get executed first. Source file f1 itself is "NOT PRESENT", so it will go ahead and create f2 (Current implementation). We have this problem When rename falls in another brick and file is unlinked in Master. Similar kind of issue exists in following case too(multiple rename): CREATE f1 RENAME f1 f2 RENAME f2 f1 Solution: Instead of carrying out "changelogging" at "HASHED volume", carry out at the "CACHED volume". This way we have rename operations carried out where actual files are present. So,Changelog recorded as : @B1 CREATE f1 RENAME f1 f2 credit: sarumuga@redhat.com PS: Some of the races as the one below are _NOT_ fixed by this patch * f1 and f2 exist. B1 and B2 are their respective cached subvols. For both files hashed-subvol == cached-subvol * mv f1 f2 on master. * B1 has change-log entry of rename f1 f2 * rebalance migrates f2 from B1 and B2 * mv f2 f1 on master. * B2 has change-log entry of rename f2 f1 Since changelog entries (rename f1 f2) and (rename f2 f1) are processed independently by gsyncds, which of either f1 and f2 survives on slave is subject to race. Note that on master its file f1 with name f1 which survived. On slave it can be either file f1 with name f1 or file f2 with name f2 based on who wins the race of processing changelog. BUG: 1219412 Change-Id: I43725d69635e2ce065135691ef629014e8df7d50 Original-Author: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/10410 Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-on: http://review.gluster.org/10628 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ctr/xlator: Named lookup heal of pre-existing files, before ctr was ON.Joseph Fernandes2015-05-081-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: The CTR xlator records file meta (heat/hardlinks) into the data. This works fine for files which are created after ctr xlator is switched ON. But for files which were created before CTR xlator is ON, CTR xlator is not able to record either of the meta i.e heat or hardlinks. Thus making those files immune to promotions/demotions. Solution: The solution that is implemented in this patch is do ctr-db heal of all those pre-existent files, using named lookup. For this purpose we use the inode-xlator context variable option in gluster. The inode-xlator context variable for ctr xlator will have the following, a. A Lock for the context variable b. A hardlink list: This list represents the successful looked up hardlinks. These are the scenarios when the hardlink list is updated: 1) Named-Lookup: Whenever a named lookup happens on a file, in the wind path we copy all required hardlink and inode information to ctr_db_record structure, which resides in the frame->local variable. We dont update the database in wind. During the unwind, we read the information from the ctr_db_record and , Check if the inode context variable is created, if not we create it. Check if the hard link is there in the hardlink list. If its not there we add it to the list and send a update to the database using libgfdb. Please note: The database transaction can fail(and we ignore) as there already might be a record in the db. This update to the db is to heal if its not there. If its there in the list we ignore it. 2) Inode Forget: Whenever an inode forget hits we clear the hardlink list in the inode context variable and delete the inode context variable. Please note: An inode forget may happen for two reason, a. when the inode is delete. b. the in-memory inode is evicted from the inode table due to cache limits. 3) create: whenever a create happens we create the inode context variable and add the hardlink. The database updation is done as usual by ctr. 4) link: whenever a hardlink is created for the inode, we create the inode context variable, if not present, and add the hardlink to the list. 5) unlink: whenever a unlink happens we delete the hardlink from the list. 6) mknod: same as create. 7) rename: whenever a rename happens we update the hardlink in list. if the hardlink was not present for updation, we add the hardlink to the list. What is pending: 1) This solution will only work for named lookups. 2) We dont track afr-self-heal/dht-rebalancer traffic for healing. > http://review.gluster.org/#/c/10370/ > Cherry picked from commit cb11dd91a6cc296e4a3808364077f4eacb810e48 > Change-Id: Ia4bbaf84128ad6ce8c3ddd70bcfa82894c79585f > BUG: 1212037 > Signed-off-by: Joseph Fernandes <josferna@redhat.com> > Signed-off-by: Dan Lambright <dlambrig@redhat.com> > Reviewed-on: http://review.gluster.org/10370 > Reviewed-by: Jeff Darcy <jdarcy@redhat.com> > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Tested-by: NetBSD Build System > Reviewed-by: Vijay Bellur <vbellur@redhat.com> Change-Id: I367aa46c3f4b8f912248fb8be75866507f2538df BUG: 1219075 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/10370 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/10615 Tested-by: Vijay Bellur <vbellur@redhat.com>
* quota: support for inode quota in quota.confvmallika2015-05-074-0/+232
| | | | | | | | | | | | | | | | | | Currently when quota limit is set, corresponding gfid is set in quota.conf. This patch supports storing inode-quota limits in quota.conf and also stores additional byte for each gfid to differentiate between usage quota limit and inode quota limit. Change-Id: I444d7399407594edd280e640681679a784d4c46a BUG: 1218170 Signed-off-by: vmallika <vmallika@redhat.com> Signed-off-by: Sachin Pandit <spandit@redhat.com> Reviewed-on: http://review.gluster.org/10069 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Signed-off-by: Sachin Pandit <spandit@redhat.com> Reviewed-on: http://review.gluster.org/10524
* ctr/libgfdb: Performance enhancer for unlink/create/rename/link fopsJoseph Fernandes2015-05-072-30/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1) ctr_link_consistency option for ctr xaltor is provided so that the user can choose to switch it on or off. /* For link consistency we do a double update i.e mark the link * during the wind and during the unwind we update/delete the link. * This has a performance hit. We give a choice here whether we need * link consistency to be spoton or not using link_consistency flag. * This will have only one link update */ 2) In delete the wind time recording is moved to unwind path. /* Special performance case: * Updating wind time in unwind for delete. This is done here * as in the wind path we will not know whether its the last * link or not. For a last link there is not use to update any * wind or unwind time!*/ > http://review.gluster.org/#/c/10170/ > Cherry picked from commit 606d9734543208542afcf9df982bf2d560235ef6 > Change-Id: I209472fb816f939db4a868b97ba053b028f17ea6 > BUG: 1217786 > Signed-off-by: Joseph Fernandes <josferna@redhat.com> > Reviewed-on: http://review.gluster.org/10170 > Reviewed-by: Dan Lambright <dlambrig@redhat.com> > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Signed-off-by: Joseph Fernandes <josferna@redhat.com> Change-Id: I4a89ef80875f36cff91520f712e1f47fde258a63 BUG: 1219066 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/10170 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/10614 Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Upcall: Send stat as part of cache_invalidation notificationsSoumya Koduri2015-05-073-19/+40
| | | | | | | | | | | | | | | | | Have added support to send attributes of both entries and its parent (include oldparent in case of RENAME fop) in the same notification request to avoid multiple rpc requests. Also, made changes in gfapi to send parent object and its attributes changed in a single upcall event. Change-Id: I92833da3bcec38d65216921c2ce4d10367c32ef1 BUG: 1217711 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/10568 Tested-by: NetBSD Build System Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* Upcall: Process each of the upcall events separatelySoumya Koduri2015-05-071-3/+13
| | | | | | | | | | | | | | | | | | As suggested during the code-review of Bug1200262, have modified GF_CBK_UPCALL to be exlusively GF_CBK_CACHE_INVALIDATION. Thus, for any new upcall event, a new CBK procedure will be added. Also made changes to store upcall data separately based on the upcall event type received. BUG: 1217711 Change-Id: I0f5e53d6f5ece16aecb514a0a426dca40fa1c755 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/10049 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/10562 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com>
* rebalance: Introducing local crawl and parallel migrationSusant Palai2015-05-072-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current patch address two part of the design proposed. 1. Rebalance multiple files in parallel 2. Crawl only bricks that belong to the current node Brief design explanation for the above two points. 1. Rebalance multiple files in parallel: ------------------------------------- The existing rebalance engine is single threaded. Hence, introduced multiple threads which will be running parallel to the crawler. The current rebalance migration is converted to a "Producer-Consumer" frame work. Where Producer is : Crawler Consumer is : Migrating Threads Crawler: Crawler is the main thread. The job of the crawler is now limited to fix-layout of each directory and add the files which are eligible for the migration to a global queue in a round robin manner so that we will use all the disk resources efficiently. Hence, the crawler will not be "blocked" by migration process. Producer: Producer will monitor the global queue. If any file is added to this queue, it will dqueue that entry and migrate the file. Currently 20 migration threads are spawned at the beginning of the rebalance process. Hence, multiple file migration happens in parallel. 2. Crawl only bricks that belong to the current node: -------------------------------------------------- As rebalance process is spawned per node, it migrates only the files that belongs to it's own node for the sake of load balancing. But it also reads entries from the whole cluster, which is not necessary as readdir hits other nodes. New Design: As part of the new design the rebalancer decides the subvols that are local to the rebalancer node by checking the node-uuid of root directory prior to the crawler starts. Hence, readdir won't hit the whole cluster as it has already the context of local subvols and also node-uuid request for each file can be avoided. This makes the rebalance process "more scalable". Change-Id: I6f1b44086a09df8ca23935fd213509c70cc0c050 BUG: 1217381 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/10466 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: N Balachandran <nbalacha@redhat.com>
* libglusterfs: Fix cluster_entrylk retryPranith Kumar K2015-05-071-3/+3
| | | | | | | | | | | | Change-Id: I92ff46bae36d39a449d4bbaedc88a322992f65eb BUG: 1215265 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Reviewed-on: http://review.gluster.org/10391 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> (cherry picked from commit e6f2472d2434ab43a30720ef4de2e0abc0a3f4ac) Reviewed-on: http://review.gluster.org/10597 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* glusterd: remove replace brick with data migration support form cli/glusterdGaurav Kumar Garg2015-05-071-1/+0
| | | | | | | | | | | | | | | | | Replace-brick operation with data migration support have been deprecated from gluster. With this fix replace brick command will support only one commad gluster volume replace-brick <VOLNAME> <SOURCE-BRICK> <NEW-BRICK> {commit force} Change-Id: Ib81d49e5d8e7eaa4ccb5830cfec2bc081191b43b BUG: 1218602 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Reviewed-on: http://review.gluster.org/10577 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Kaushal M <kaushal@redhat.com>
* syncop: Implement syncop_fxattropPranith Kumar K2015-05-062-0/+23
| | | | | | | | | | | Backport of http://review.gluster.org/10382 BUG: 1216303 Change-Id: I4433002906efc6894b4ff8de8fefe8b7bc954dcf Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10438 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: Implement cluster-syncopPranith Kumar K2015-05-057-422/+2480
| | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/10240 This patch implements syncop equivalent for cluster of xlators. The xlators on which the fop needs to be performed is taken in input arguments to the functions and the responses are gathered and provided as the output. This idea is taken from afr-v2 self-heal implementation by Avati. BUG: 1216303 Change-Id: I189400ea5bb3205aae928a72afbb6c960968b65a Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10439 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: Don't log geo-rep safe errors in mount logsKotresh HR2015-05-041-0/+4
| | | | | | | | | | | | | | | | | | | | | EEXIST and ENOENT are safe errors for geo-replication. Since mkdir is captured in all the bricks of the changelog. mkdir is tried multiple times as per the number of bricks. The first one to process by gsyncd will succeed and all others will get EEXIST. Hence EEXIST is a safe error and can be ignored. Similarly ENOENT also in rm -rf case. And also gsyncd validates these errors and log them in master if it is genuine error. This is up with the patch http://review.gluster.org/#/c/10048/ Hence ignoring above said safe errors. BUG: 1217938 Change-Id: I1962a85f23fe5e30448ceec1b6ddcb5724ed5627 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/10184 Reviewed-on: http://review.gluster.org/10501 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Aravinda VK <avishwan@redhat.com>
* arbiter: load arbiter xlator on every 3rd brick of a replica 3 AFR subvolRavishankar N2015-05-031-0/+14
| | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/10257 Logic for adding the 'glusterd_brickinfo->group' member and using it to find the brick positon has been taken from http://review.gluster.org/#/c/9919. Thanks to Jeff Darcy for that. This patch is a part of the arbiter logic implementation for 3 way AFR details of which can be found at http://review.gluster.org/#/c/9656/ Change-Id: Idbfe4f29ee8e098e0102def8f38b32314316b188 BUG: 1217689 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/10479 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* nfs: make it possible to disable nfs.mount-rmtabNiels de Vos2015-05-032-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When there are many NFS-clients doing very often mount/unmount actions, the updating of the 'rmtab' can become a bottleneck and cause delays. In these situations, the output of 'showmount' may be less important than the responsiveness of the (un)mounting. By setting 'nfs.mount-rmtab' to the value "/-", the cache file is not updated anymore, and the entries are only kept in memory. Cherry picked from commit 331ef6e1a86bfc0a93f8a9dec6ad35c417873849: > BUG: 1169317 > Change-Id: I40c4d8d754932f86fb2b1b2588843390464c773d > Reported-by: Cyril Peponnet <cyril@peponnet.fr> > Signed-off-by: Niels de Vos <ndevos@redhat.com> > Reviewed-on: http://review.gluster.org/9223 > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: soumya k <skoduri@redhat.com> > Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> > Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> This change also contains the fixes to the test-case from: > > nfs: fix spurious failure in bug-1166862.t > > In some environments, "showmount" could return an NFS-client that does > not start with "1". This would cause the test-case to fail. The check is > incorrect, the number of lines should get counted instead. > > Also moving the test-case to the .../nfs/... subdirectory. > > Cherry picked from commit ee9b35a780607daddc2832b9af5ed6bf414aebc0: > BUG: 1166862 > Change-Id: Ic03aa8145ca57d78aea01564466e924b03bb302a > Signed-off-by: Niels de Vos <ndevos@redhat.com> > Reviewed-on: http://review.gluster.org/10419 > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Vijay Bellur <vbellur@redhat.com> > Change-Id: I40c4d8d754932f86fb2b1b2588843390464c773d BUG: 1215385 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10379 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: Implementation of sync lock as recursive lock to avoid crash.anand2015-04-302-54/+146
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem : In glusterd,we are using big lock which is implemented based on sync task frame work for thread synchronization and rcu lock for data consistency. sync task frame work swap the threads if there is no worker poll threads available,due to this rcu lock and rcu unlock was happening in different threads (urcu-bp will not allow this),resulting into glusterd crash. fix : To avoid releasing the sync lock(big lock) in between rcu critical section,implemented sync lock as recursive lock. More details: link : http://www.spinics.net/lists/gluster-devel/msg14632.html Change-Id: I2b56c1caf3f0470f219b1adcaf62cce29cdc6b88 BUG: 1216942 Signed-off-by: anand <anekkunt@redhat.com> Reviewed-on: http://review.gluster.org/10285 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com> (cherry picked from commit ada6b3a8800867934af57a57d5312f5a5d8374f0) Reviewed-on: http://review.gluster.org/10432 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* timer: Fix use after free issuePranith Kumar K2015-04-261-1/+5
| | | | | | | | | | BUG: 1215173 Change-Id: Ic0fcfa009d1e4828ffd245154cd3e042a1b2bc2e Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10368 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* core: avoid crashes in gf_msg dup-detection codeJeff Darcy2015-04-221-0/+13
| | | | | | | | | | | | | | | | | Use global_xlator for allocations so that we don't try to free objects belonging to an already-deleted translator (which will crash). Change-Id: Ie72a546e7770cf5cb8a8370e22448c8d09e3ab37 BUG: 1214220 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/10319 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/10330 Tested-by: NetBSD Build System
* quota/disperse: handle inode quotas in xattr aggregatevmallika2015-04-142-7/+25
| | | | | | | | | | | | | | | | | | | | with the inode quota feature, quota size is now increased from 64bit to 192bits which contains values of 'file size', 'file count' and 'dir count' This change in quota size xattr needs to be handled in disperse xattr aggregation Signed-off-by: vmallika <vmallika@redhat.com> Change-Id: I5fd28aa9f5b8b6cba83a98360236417a97ac16ee BUG: 1207967 Reviewed-on: http://review.gluster.org/10112 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Sachin Pandit <spandit@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* glusterd: Replace transaction peers listsKaushal M2015-04-131-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Transaction peer lists were used in GlusterD to peers belonging to a transaction. This was needed to prevent newly added peers performing partial transactions, which could be incorrect. This was accomplished by creating a seperate transaction peers list at the beginning of every transaction. A transaction peers list referenced the peerinfo data structures of the peers which were present at the beginning of the transaction. RCU protection of peerinfos referenced by the transaction peers list is a hard problem and difficult to do correctly. To have proper RCU protection of peerinfos, the transaction peers lists have been replaced by an alternative method to identify peers that belong to a transaction. The alternative method is to the global peers list along with generation numbers to identify peers that should belong to a transaction. This change introduces a global peer list generation number, and a generation number for each peerinfo object. Whenever a peerinfo object is created, the global generation number is bumped, and the peerinfos generation number is set to the bumped global generation. With the above changes, the algorithm to identify peers belonging to a transaction with RCU protection is as follows, - At the beginning of a transaction, the current global generation number is saved - To identify if a peers belonging to the transaction, - Start a RCU read critical section - For each peer in the global peers list, - If the peers generation number is not greater than the saved generation number, continue with the action on the peer - End the RCU read critical section The above algorithm guarantees that, - The peer list is not modified when a transaction is iterating through it - The transaction actions are only done on peers that were present when the transaction started But, as a transaction could iterate over the peers list multiple times, the algorithm cannot guarantee that same set of peers will be selected every time. A peer could get deleted between two iterations of the list within a transaction. This problem existed with transaction peers list as well, but unlike before now it will not lead to invalid memory access and potential crashes. This problem will be addressed seprately. This change was developed on the git branch at [1]. This commit is a combination of the following commits on the development branch. 52ded5b Add timespec_cmp 44aedd8 Add create timestamp to peerinfo 7bcbea5 Fix some silly mistakes 13e3241 Add start time to opinfo 17a6727 Use timestamp comparisions to identify xaction peers instead of a xaction peer list 3be05b6 Correct check for peerinfo age 70d5b58 Use read-critical sections for peer list iteration ba4dbca Use peerinfo timestamp checks in op-sm instead of xaction peer list d63f811 Add more peer status checks when iterating peers list in glusterd-syncop 1998a2a Timestamp based peer list traversal of mgmtv3 xactions f3c1a42 Remove transaction peer lists b8b08ee Remove unused labels 32e5f5b Remove 'npeers' usage a075fb7 Remove 'npeers' from mgmt-v3 framework 12c9df2 Use generation number instead of timestamps. 9723021 Remove timespec_cmp 80ae2c6 Remove timespec.h include a9479b0 Address review comments on 10147/4 [1]: https://github.com/kshlm/glusterfs/tree/urcu Change-Id: I9be1033525c0a89276f5b5d83dc2eb061918b97f BUG: 1205186 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/10147 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Anand Nekkunti <anekkunt@redhat.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
* build: make contrib/uuid dependency optionalNiels de Vos2015-04-108-37/+110
| | | | | | | | | | | | | | | | | | | On Linux systems we should use the libuuid from the distribution and not bundle and statically link the contrib/uuid/ bits. libglusterfs/src/compat-uuid.h has been introduced and should become an abstraction layer for different UUID APIs. Non-Linux operating systems should implement their compatibility layer there. Once all operating systems have an implementation in compat-uuid.h, we can remove contrib/uuid/ from the repository completely. Change-Id: I345e5357644be2521685e00358bb8c83c4ea0577 BUG: 1206587 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10129 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* quota/cli: validate quota hard-limit optionvmallika2015-04-102-16/+53
| | | | | | | | | | | | | | | | | | | Quota hard-limit is supported only upto: 9223372036854775807 (int 64) In CLI, it is allowed to set the value upto 16384PB (unsigned int 64), this is not a valid value as the xattrop for quota accounting and the quota enforcer operates on a signed int64 limit value. This patches fixes the problem in CLI and allows user to set the hard-limit value only from range 0 - 9223372036854775807 Change-Id: Ifce6e509e1832ef21d3278bacfa5bd71040c8cba BUG: 1206432 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/10022 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: avoid possibility of crash when ctx is nullHumble Devassy Chirammal2015-04-081-0/+3
| | | | | | | | | | | | | | ctx is passed to gf_log_inject_timer_event() and pass through to __gf_log_inject_timer_event() where the struct members are getting dereferenced, and can cause crash if the passed ctx is null. This patch avoids the issue. Change-Id: I153dbb5d3744898429139e3d40bb4f0e9093632a BUG: 1208118 Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com> Reviewed-on: http://review.gluster.org/10102 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* libglusterfs/syncop: Add xdata to all syncop callsRaghavendra Talur2015-04-083-169/+509
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for xdata in both the request and response path of syncops. Few calls like lookup already had the support; have renamed variables in few places to maintain uniformity. xdata passed downwards is known as xdata_in and xdata passed upwards is known as xdata_out. There is an old patch by Jeff Darcy at http://review.gluster.org/#/c/8769/3 which does the same for some selected calls. It also brings in xdata support at gfapi level. xdata support at gfapi level would be introduced in subsequent patches. Change-Id: I340e94ebaf2a38e160e65bc30732e8fe1c532dcc BUG: 1158621 Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: http://review.gluster.org/9859 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* nfs : introducing logging framework for nfs translatorManikandan Selvaganesh2015-04-081-0/+4
| | | | | | | | | | Change-Id: I3a47cdd06595c87da8e822d11683d68b43c11cda BUG: 1194640 Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com> Reviewed-on: http://review.gluster.org/9945 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* ctr : Fix for heating of files during promotion/demotionJoseph Fernandes2015-04-084-19/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fix will solve the heating of the files during the promotion or demotion. Promotion: ~~~~~~~~~ When a file gets promoted it get the current time stamp during creation only, but following writes or reads during the migration wont heat the file. Demotion: ~~~~~~~~ When a file gets demoted it get the wind/unwind time stamp is set to zero. The following writes or reads during the migration wont heat the file. What is remaining ? ~~~~~~~~~~~~~~~~~ Bug 1209129 ( https://bugzilla.redhat.com/show_bug.cgi?id=1209129 ) Inspite of this fix there is still a issue remaining, i.e the heat of the file is not keep intact during a internal rebalance activity i.e a rebalance within a tier. Change-Id: I01e82dc226355599732d40e699062cee7960b0a5 BUG: 1207867 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Signed-off-by: Dan Lambright <dlambrig@redhat.com> Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/10080 Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* bitrot/scrub: Scrubber fixesVenky Shankar2015-04-081-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a handful of problem with scrubber which are detailed below. Scrubber used to skip objects for verification due to missing fd iterface to fetch versioning extended attributes. Similar to the inode interface, an fd based interface in POSIX is now introduced. Moreover, this patch also fixes potential false reporting by scrubber due to: An object gets dirtied and signed when scrubber is busy calculatingobject checksum. This is fixed by caching the signed version when an object is first inspected for stalenes, i.e., during pre-compute stage. This version is used to verify checksum in the post-compute stage when the signatures are compared for possible corruption. Side effect of _not_ sending signature length during signing resulted in "truncated" signature to be set for an object. Now, at the time of signing, the signature length is sent and is used in place of invoking strlen() to get signature length (which could have possible 00s). The signature length itself is not persisted in the signature xattr, but is calculated on-the-fly by substracting the xattr length by the "structure" header size. Some of the log entries are made more meaningful (as and aid for debugging). Change-Id: I938bee5aea6688d5d99eb2640053613af86d6269 BUG: 1207624 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10118 Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: fix tier.c problems found prior to feature freezeDan Lambright2015-04-062-1/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch resolves tiering translator issues taken from the list in bug 1203776. These issues have been selected to be fixed first. The rest will be fixed in a subsequent patch (or are not a problem). 3. Replace hardcoded #defines of promote/demote file names 6. Use loc_wipe() in migrate_using_query_file() 9. Only promote/demote files on the same node on which they reside. 14. Replace calloc with GF_CALLOC in tier.c and ensure freeing done properly. 15. Handle if parse_query_str fails 22. Only load gfdb library on server side, remove SQL references from client. Change-Id: I6563b11e58ab2e4c6b1ce44db755781ad6d930fb BUG: 1203776 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/9987 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* Avoid conflict between contrib/uuid and system uuidEmmanuel Dreyfus2015-04-047-34/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | glusterfs relies on Linux uuid implementation, which API is incompatible with most other systems's uuid. As a result, libglusterfs has to embed contrib/uuid, which is the Linux implementation, on non Linux systems. This implementation is incompatible with systtem's built in, but the symbols have the same names. Usually this is not a problem because when we link with -lglusterfs, libc's symbols are trumped. However there is a problem when a program not linked with -lglusterfs will dlopen() glusterfs component. In such a case, libc's uuid implementation is already loaded in the calling program, and it will be used instead of libglusterfs's implementation, causing crashes. A possible workaround is to use pre-load libglusterfs in the calling program (using LD_PRELOAD on NetBSD for instance), but such a mechanism is not portable, nor is it flexible. A much better approach is to rename libglusterfs's uuid_* functions to gf_uuid_* to avoid any possible conflict. This is what this change attempts. BUG: 1206587 Change-Id: I9ccd3e13afed1c7fc18508e92c7beb0f5d49f31a Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/10017 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* iobuf: Do not call __iobuf_ref directlyRaghavendra Talur2015-04-021-2/+2
| | | | | | | | | | | | | | | | | | iobuf_get will be creating the iobuf and hence lock is not necessary to increment ref. However, it is a good practice to call iobuf_ref instead of __iobuf_ref so that we have a single point to get refs and this can be used later to do mem accounting etc. Change-Id: I1fd328c3c463c23fd5f6df505ccb5c86f6207f28 BUG: 1199075 Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: http://review.gluster.org/9812 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>